Re: [R-sig-DB] SQL escaping/quoting proposal
I've cleaned up the examples and made them work and put the results in https://gist.github.com/hadley/7057387 - that should make discussion a bit more concrete. I made a few more changes to https://gist.github.com/hadley/7057387 * the method is now based on the connection, not the driver, since for some drivers the escaping may vary between connections (e.g. ODBC) * I added a sample tableName function (which I don't think would be included in DBI), showing how you might accept multiple forms of table names, and do the right thing as much as possible. Hadley -- Chief Scientist, RStudio http://had.co.nz/ ___ R-sig-DB mailing list -- R Special Interest Group R-sig-DB@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db
Re: [R-sig-DB] SQL escaping/quoting proposal
On 13-10-18 12:44 PM, Hadley Wickham wrote: Hi all, The approach that DBI takes to escaping is sub-optimal: it tries to figure out if an R variable name matches an SQL reserved word, and if so munge it so that there's no longer a conflict. This creates a situation where there are some identifiers that are valid in R, and some that are valid in SQL and we have a complicated and bug prone approach to converting between them. Instead, I recommend taking an approach where identifiers (i.e. table and field names) are always quoted using the appropriate database syntax. I think this addresses my longtime wish that DBI would present a consistent interface on the R side wrt capitalization even though the db engines do different things in this regard. This would make changing among engines easier. If it does not achieve this, is it possible? Paul This not only avoids any problems with SQL reserved words, but it also ensures that every field name in R (even those containing spaces and other special characters) can be used in SQL. To achieve this change, I think we should to: * deprecate `make.db.names()`, `isSQLKeyword()`, and `SQLKeywords()` * add new generics `sqlQuoteString()` and `sqlQuoteIdentifier()`. The new generics would be defined on the driver object, and would come with default methods as follows: ``` setGeneric(sqlQuoteString, function(drv, x, ...) { standardGeneric(sqlQuoteString) }) setMethod(sqlQuoteString, DBIDriver, function(drv, x, ...) { x - gsub('', '', x, fixed = TRUE) paste('', x, '', sep = ) }) setGeneric(sqlQuoteIdentifer, function(drv, x, ...) { standardGeneric(sqlQuoteIdentifer) }) setMethod(sqlQuoteString, DBIDriver, function(drv, x, ...) { x - gsub(', '', x, fixed = TRUE) paste(', x, ', sep = ) }) ``` Individual implementations would be encouraged to provide methods that use the quoting functions provided by the client library, where available. Does anyone see any problems with this approach? Hadley ___ R-sig-DB mailing list -- R Special Interest Group R-sig-DB@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db
Re: [R-sig-DB] SQL escaping/quoting proposal
I've cleaned up the examples and made them work and put the results in https://gist.github.com/hadley/7057387 - that should make discussion a bit more concrete. Could you provide a bit more clarification on what the sqlQuoteIdentifier should do? How shall we deal when a vector of strings is passed? Is it right to assume that sqlQuoteIdentifier(drv, c('a', 'b')) should return a vector consisting of quoted results of individual element of the vector? Yes, I think that makes the most sense. How do we construct a reference to table with schema, or column with table? eg schema.table or table.column? More specifically, is it right to assume that sqlQuoteIdentifier is used for constructing individual part of the composite identifier? I think that would be up to the individual function author: you could assume that if a vector was passed then you should quote then concatenate together with .. Or you could assume that for more complicated references the user had already flagged that the input should not be escaped with sql(). You had a minor mistake in showing the default method (The name is both sqlQuoteString, and I am not sure which is intended for sqlQuoteIdentifier). Fixed. Another consideration is for the name of the function. Whether we should use sql prefix or use db prefix. I would like to know what others think for this point. I'm pretty sure it should be db to be consistent with the rest of the package. (And I've also added dbFetch as an alias since fetch is the _only_ function in DBI without the db prefix) Here, I think we can avoid problem in most cases, but there are still a bit cases where the encoding does not allow proper conversion. That's the problem of the database capability, there is not much things that the driver can do, though. Right, we can only support what the db can. The quoting function could also throw an error if it was not possible to quote the input in a safe way for the database. Hadley -- Chief Scientist, RStudio http://had.co.nz/ ___ R-sig-DB mailing list -- R Special Interest Group R-sig-DB@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db
Re: [R-sig-DB] SQL escaping/quoting proposal
Regarding (b), my reasoning is that when working with Postgres, I would want to allow Postgres to do its normal lower casing of unquoted identifiers. That is, `thisField` is a valid identifier, but passed in quoted case will be preserved, which means always having to quote the identifier in the future. Passed in quoted, Postgres will force it to `thisfield`, and if a client requests `thisField` unquoted, the correct field will be returned. It might be reasonable to opt in to this behaviour, but it's dangerous by default because R is case-sensitive. What happens if you're using dbWriteTable with this data.frame? df - data.frame(a = 1, A = 2) Hadley -- Chief Scientist, RStudio http://had.co.nz/ ___ R-sig-DB mailing list -- R Special Interest Group R-sig-DB@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db
Re: [R-sig-DB] SQL escaping/quoting proposal
Hi Hadley, Thanks to bring up this issue and I agree in the overall direction and have some minor concerns. Could you provide a bit more clarification on what the sqlQuoteIdentifier should do? How shall we deal when a vector of strings is passed? Is it right to assume that sqlQuoteIdentifier(drv, c('a', 'b')) should return a vector consisting of quoted results of individual element of the vector? How do we construct a reference to table with schema, or column with table? eg schema.table or table.column? More specifically, is it right to assume that sqlQuoteIdentifier is used for constructing individual part of the composite identifier? You had a minor mistake in showing the default method (The name is both sqlQuoteString, and I am not sure which is intended for sqlQuoteIdentifier). Another consideration is for the name of the function. Whether we should use sql prefix or use db prefix. I would like to know what others think for this point. This not only avoids any problems with SQL reserved words, but it also ensures that every field name in R (even those containing spaces and other special characters) can be used in SQL. Here, I think we can avoid problem in most cases, but there are still a bit cases where the encoding does not allow proper conversion. That's the problem of the database capability, there is not much things that the driver can do, though. -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan On 2013/10/19, at 1:44, Hadley Wickham wrote: Hi all, The approach that DBI takes to escaping is sub-optimal: it tries to figure out if an R variable name matches an SQL reserved word, and if so munge it so that there's no longer a conflict. This creates a situation where there are some identifiers that are valid in R, and some that are valid in SQL and we have a complicated and bug prone approach to converting between them. Instead, I recommend taking an approach where identifiers (i.e. table and field names) are always quoted using the appropriate database syntax. This not only avoids any problems with SQL reserved words, but it also ensures that every field name in R (even those containing spaces and other special characters) can be used in SQL. To achieve this change, I think we should to: * deprecate `make.db.names()`, `isSQLKeyword()`, and `SQLKeywords()` * add new generics `sqlQuoteString()` and `sqlQuoteIdentifier()`. The new generics would be defined on the driver object, and would come with default methods as follows: ``` setGeneric(sqlQuoteString, function(drv, x, ...) { standardGeneric(sqlQuoteString) }) setMethod(sqlQuoteString, DBIDriver, function(drv, x, ...) { x - gsub('', '', x, fixed = TRUE) paste('', x, '', sep = ) }) setGeneric(sqlQuoteIdentifer, function(drv, x, ...) { standardGeneric(sqlQuoteIdentifer) }) setMethod(sqlQuoteString, DBIDriver, function(drv, x, ...) { x - gsub(', '', x, fixed = TRUE) paste(', x, ', sep = ) }) ``` Individual implementations would be encouraged to provide methods that use the quoting functions provided by the client library, where available. Does anyone see any problems with this approach? Hadley -- Chief Scientist, RStudio http://had.co.nz/ ___ R-sig-DB mailing list -- R Special Interest Group R-sig-DB@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db ___ R-sig-DB mailing list -- R Special Interest Group R-sig-DB@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-db