Re: [sqlite] dates, times and R
Thanks for the idea but the interface already handles that and does it without special names. The last example in my last post shows that d was correctly typed in the output because the interface noticed that it had the same name as an input column. Other problems are that it would still not handle propagation through expressions and would require that the user use special names different than the names in the input. I appreciate these ideas but these or equally effective alternatives are already implemented and it is precisely these kludges that I was trying to avoid. With one R statement the user can switch back ends so unless sqlite works as smoothly as the alternative backends a user will choose one of those if they are doing a lot of date and datetime processing. For many applications the other advantages of sqlite would take precedence. The fact that sqlirte is included right in the R driver package is very convenient as it means there is nothing additional to install beyond the R driver. (H2 is also included in the R driver but in that case java needs to be installed.) Also the new windowing functions, CTEs and other features are great. Unfortunately in the widely applicable case of dates and date times the other databases just work and additional care needs to be taken with sqlite. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] dates, times and R
Monday, August 12, 2019, 1:06:00 PM, Gabor Grothendieck wrote: > The whole point of this is to make it as easy as possible for the user. > With other backends the database handles the types but with sqlite > the user has to get involved. > ... > sqldf("select d as d__Date, d + 1 as nextDay__Date from DF", method > = "name__class") > ##dnextDay > ## 1 2000-01-01 2000-01-02 NOTE: I don't know R, so I may be talking rubbish... I *think* the idea of whoever suggested it was not that the *user* should have to do things like the above, but that the INTERFACE between R and SQLite should do it. In my simplistic view the interface "knows" the input is a date ("as.Date(..)") and so would tack the "...__date" hint on to the column name when talking to SQLite. Coming back out of SQLite, the interface would recognize the "...__date" and turn the value into an R date. Graham Holden ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] dates, times and R
The whole point of this is to make it as easy as possible for the user. With other backends the database handles the types but with sqlite the user has to get involved. It is not a matter of storage. It is a matter of maintaining the type information on the database side and passing the type information back to R. Perhaps an example would help. Consider this. Here R passes the type information to the H2 database backend and H2 passes it back to R so that the output is correctly typed. This works as desired. library(RH2) # load H2 driver and H2 database library(sqldf) # load package DF <- data.frame(d = as.Date("2000-01-01")) # create data.frame with one column d sqldf("select d, d + 1 as nextDay from DF") ##dnextDay ## 1 2000-01-01 2000-01-02 Now if we switch from H2 to sqlite that would have not have worked. Any of these workaournds are currently implemented and work but are not optimal. library(sqldf) # if no driver loaded it assumes sqlite # Alternative 1 # method = "Date" means all returned numbers are regarded as Date sqldf("select d, d + 1 as nextDay from DF", method = "Date") ##dnextDay ## 1 2000-01-01 2000-01-02 # Alternative 2 # this is similar to what Simon suggested. # method = "name__class" means use the column name suffix # to determine the type sqldf("select d as d__Date, d + 1 as nextDay__Date from DF", method = "name__class") ##dnextDay ## 1 2000-01-01 2000-01-02 # Alternative 3 # convert it manually. It deduces that d is Date because an input # column of the same name exists as Date but that is not foolproof # and does not, for example, work for the nextDay column so we # convert it manually out <- sqldf("select d, d + 1 as nextDay from DF") out$nextDay <- as.Date(out$nextDay, origin = "1970-01-01") out ##dnextDay ## 1 2000-01-01 2000-01-02 On Sun, Aug 11, 2019 at 7:16 PM Keith Medcalf wrote: > > > On Sunday, 11 August, 2019 07:45, Gabor Grothendieck > wrote: > > >R supports Date and POSIXct (date/time) classes which are represented > >internally as days and seconds since the UNIX Epoch respectively; > >however, due to the class it knows to display and manipulate them as > >dates and datetimes rather than numbers. > > SQLite3 can store integers and doubles. Seems to me this is quite sufficient > for storing a "number of seconds" since the Unix Epoch or "number of days" > since the Unix epoch. What exactly is the problem? An IEEE-754 double > precision floating point number is certainly capable of storing all offsets > from the Unix Epoch (either as a days or seconds offset) with far more > precision that a human is likely able to discern (or with more precision than > the accuracy of most atomic clocks, for that matter). > > >If sqldf sends a Date or POSIXct to SQLite then it is sent as a > >number (days or seconds since the UNIX Epoch) but when it is sent > >back it cannot know that that number is supposed to represent a > >date or datetime. > > Why not? The column type declaration in the table is merely an "arbitrary > string", and the returned column names from a select are merely "arbitrary > strings". It seems like an "application deficiency" that it cannot set and > retrieve "arbitrary metadata" for its own internal use (such as providing > column type declarations in the CREATE TABLE or additional metadata in the > column name (in select statements). Many other "applications" do so without > difficulty (for example the default sqlite3 wrapper in Python). Are the R > programmers somehow particularly deficient in this regard? > > I quite often store "timestamps" using application dependent epochs and > offsets without difficulty (for example, the number of one-minute intervals > since the Unix epoch). Given that just about every programming system and > language ever invented seems to store datetime data as some type of offset > from some epoch, and each of them different, does not seem to bother > interoperability in the least. Once you know the Epoch and Offset interval, > conversion is rather simple arithmetic that most children learned in primary > school. > > -- > The fact that there's a Highway to Hell but only a Stairway to Heaven says a > lot about anticipated traffic volume. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] dates, times and R
On Sunday, 11 August, 2019 07:45, Gabor Grothendieck wrote: >R supports Date and POSIXct (date/time) classes which are represented >internally as days and seconds since the UNIX Epoch respectively; >however, due to the class it knows to display and manipulate them as >dates and datetimes rather than numbers. SQLite3 can store integers and doubles. Seems to me this is quite sufficient for storing a "number of seconds" since the Unix Epoch or "number of days" since the Unix epoch. What exactly is the problem? An IEEE-754 double precision floating point number is certainly capable of storing all offsets from the Unix Epoch (either as a days or seconds offset) with far more precision that a human is likely able to discern (or with more precision than the accuracy of most atomic clocks, for that matter). >If sqldf sends a Date or POSIXct to SQLite then it is sent as a >number (days or seconds since the UNIX Epoch) but when it is sent >back it cannot know that that number is supposed to represent a >date or datetime. Why not? The column type declaration in the table is merely an "arbitrary string", and the returned column names from a select are merely "arbitrary strings". It seems like an "application deficiency" that it cannot set and retrieve "arbitrary metadata" for its own internal use (such as providing column type declarations in the CREATE TABLE or additional metadata in the column name (in select statements). Many other "applications" do so without difficulty (for example the default sqlite3 wrapper in Python). Are the R programmers somehow particularly deficient in this regard? I quite often store "timestamps" using application dependent epochs and offsets without difficulty (for example, the number of one-minute intervals since the Unix epoch). Given that just about every programming system and language ever invented seems to store datetime data as some type of offset from some epoch, and each of them different, does not seem to bother interoperability in the least. Once you know the Epoch and Offset interval, conversion is rather simple arithmetic that most children learned in primary school. -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] dates, times and R
Actually sqldf has multiple heuristics and the one you suggested is already one of them (except for minor differences in syntax) but this has the disadvantage that the user must specify classes whereas if the user simply uses any of the other backends they don't have to. On Sun, Aug 11, 2019 at 10:52 AM Simon Slavin wrote: > > On 11 Aug 2019, at 2:45pm, Gabor Grothendieck wrote: > > > R supports Date and POSIXct (date/time) classes which are represented > > internally as days and seconds since the UNIX Epoch respectively; > > however, due to the class it knows to display and manipulate them as > > dates and datetimes rather than numbers. > > > > If sqldf sends a Date or POSIXct to SQLite then it is sent as a number > > (days or seconds since the UNIX Epoch) but when it is sent back it cannot > > know that that number is supposed to represent a date or datetime. There > > are some default heuristics (if any of the input tables have a column name > > the same name as an output column name then such output columns are > > automatically converted) but this is far from foolproof. > > Add a new heuristic that if the column name ends in '_Date' or '_ > POSIXct' then the value should be converted. When R creates a new SQLite > table, it should append '_Date' or '_POSIXct' to the column name it chooses > for classes of those types. Existing heuristics should continue to work. > > I do understand the value of having date/time types in SQLite, but it is not > easy to do while retaining backward compatibility. It'll have to wait for > SQLite4 or something. > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] dates, times and R
On 11 Aug 2019, at 2:45pm, Gabor Grothendieck wrote: > R supports Date and POSIXct (date/time) classes which are represented > internally as days and seconds since the UNIX Epoch respectively; > however, due to the class it knows to display and manipulate them as > dates and datetimes rather than numbers. > > If sqldf sends a Date or POSIXct to SQLite then it is sent as a number (days > or seconds since the UNIX Epoch) but when it is sent back it cannot know that > that number is supposed to represent a date or datetime. There are some > default heuristics (if any of the input tables have a column name the same > name as an output column name then such output columns are automatically > converted) but this is far from foolproof. Add a new heuristic that if the column name ends in '_Date' or '_ POSIXct' then the value should be converted. When R creates a new SQLite table, it should append '_Date' or '_POSIXct' to the column name it chooses for classes of those types. Existing heuristics should continue to work. I do understand the value of having date/time types in SQLite, but it is not easy to do while retaining backward compatibility. It'll have to wait for SQLite4 or something. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] dates, times and R
It's really useful that SQLite now supports window operations as that was one of the key features that R users needed to do many manipulations using SQLite. From the perspective of the R language there is really one particularly key feature left that prevents some users from easily using SQLite in many cases. That is the lack of date and time types. From the perspective of the sqldf package in R, the user writes: sqldf("select ...whatever...") and sqldf creates an empty sqlite data base, looks for all table names in the select statement, generates create table statements for them, uploads the tables to the new database, runs the statement, downloads the result and deletes the database. (This package uses a lower level R driver package for sqlite which may also be used directly.) R supports Date and POSIXct (date/time) classes which are represented internally as days and seconds since the UNIX Epoch respectively; however, due to the class it knows to display and manipulate them as dates and datetimes rather than numbers. If sqldf sends a Date or POSIXct to SQLite then it is sent as a number (days or seconds since the UNIX Epoch) but when it is sent back it cannot know that that number is supposed to represent a date or datetime. There are some default heuristics (if any of the input tables have a column name the same name as an output column name then such output columns are automatically converted) but this is far from foolproof. This is not a problem for the other backend databases that are supported since those backends have true date and datetime types so when R receives such objects it knows to convert them to R's similar types. Currently SQLite is the default backend but I normally recommend that users switch to the java based H2 backend if they are doing a lot of date and datetime processing for the above reason. A lot of data analysis does involve dates and date times so this covers a lot of applications. If SQLite were to support true date and datetime types, as do other databases, and not just functions which handle numbers as if they were dates or datetimes this problem would be addressed so this is a key remaining feature that I think SQLite needs. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users