Re: [sqlite] dates, times and R

2019-08-12 Thread Gabor Grothendieck
Thanks for the idea but the interface already handles that and does it
without special names. The last example in my last post shows that
d was correctly typed in the output because the interface noticed that
it had the same name as an input column.  Other problems are that it
would still not handle propagation through expressions and would require
that the user use special names different than the names in the
input.

I appreciate these ideas but these or equally effective alternatives are
already implemented and it is precisely these kludges that I was
trying to avoid.  With one R statement the user can switch back ends
so unless sqlite works as smoothly as the alternative backends
a user will choose one of those if they are doing a lot of date and
datetime processing.

For many applications the other advantages
of sqlite would take precedence.
The fact that sqlirte is included right in
the R driver package is very convenient as it means there is nothing
additional to install
beyond the R driver.  (H2 is also included in the R driver but in that
case java needs to be installed.) Also the new windowing functions, CTEs
and other features are great.
Unfortunately in the widely applicable case of dates and date times
the other databases just work and additional care needs to be taken
with sqlite.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dates, times and R

2019-08-12 Thread Graham Holden
Monday, August 12, 2019, 1:06:00 PM, Gabor Grothendieck 
 wrote:

> The whole point of this is to make it as easy as possible for the user.
> With other backends the database handles the types but with sqlite
> the user has to get involved.

> ...

>   sqldf("select d as d__Date, d + 1 as nextDay__Date from DF", method
> = "name__class")
>   ##dnextDay
>   ## 1 2000-01-01 2000-01-02

NOTE: I don't know R, so I may be talking rubbish...

I *think* the idea of whoever suggested it was not that the *user*
should have to do things like the above, but that the INTERFACE
between R and SQLite should do it. In my simplistic view the interface
"knows" the input is a date ("as.Date(..)") and so would tack the
"...__date" hint on to the column name when talking to SQLite. Coming
back out of SQLite, the interface would recognize the "...__date" and
turn the value into an R date.

Graham Holden



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dates, times and R

2019-08-12 Thread Gabor Grothendieck
The whole point of this is to make it as easy as possible for the user.
With other backends the database handles the types but with sqlite
the user has to get involved.

It is not a matter of storage.  It is a matter of maintaining the type
information
on the database side and passing the type information back to R.

Perhaps an example would help.  Consider this.  Here R passes the
type information to the H2 database backend and H2 passes it back to R
so that the output is correctly typed.  This works as desired.

  library(RH2) # load H2 driver and H2 database
  library(sqldf) # load package

  DF <- data.frame(d = as.Date("2000-01-01"))  # create data.frame
with one column d

  sqldf("select d, d + 1 as nextDay from DF")
  ##dnextDay
  ## 1 2000-01-01 2000-01-02

Now if we switch from H2 to sqlite that would have not have worked.
Any of these workaournds
are currently implemented and work but are not optimal.

  library(sqldf)  # if no driver loaded it assumes sqlite

  # Alternative 1
  # method = "Date" means all returned numbers are regarded as Date

  sqldf("select d, d + 1 as nextDay from DF", method = "Date")
  ##dnextDay
  ## 1 2000-01-01 2000-01-02

 # Alternative 2
  # this is similar to what Simon suggested.
  # method = "name__class" means use the column name suffix
  # to determine the type

  sqldf("select d as d__Date, d + 1 as nextDay__Date from DF", method
= "name__class")
  ##dnextDay
  ## 1 2000-01-01 2000-01-02

 # Alternative 3
  # convert it manually.  It deduces that d is Date because an input
  # column of the same name exists as Date but that is not foolproof
  # and does not, for example, work for the nextDay column so we
  # convert it manually

  out <- sqldf("select d, d + 1 as nextDay from DF")
  out$nextDay <- as.Date(out$nextDay, origin = "1970-01-01")
  out
  ##dnextDay
  ## 1 2000-01-01 2000-01-02

On Sun, Aug 11, 2019 at 7:16 PM Keith Medcalf  wrote:
>
>
> On Sunday, 11 August, 2019 07:45, Gabor Grothendieck 
>  wrote:
>
> >R supports Date and POSIXct (date/time) classes which are represented
> >internally as days and seconds since the UNIX Epoch respectively;
> >however, due to the class it knows to display and manipulate them as
> >dates and datetimes rather than numbers.
>
> SQLite3 can store integers and doubles.  Seems to me this is quite sufficient 
> for storing a "number of seconds" since the Unix Epoch or "number of days" 
> since the Unix epoch.  What exactly is the problem?  An IEEE-754 double 
> precision floating point number is certainly capable of storing all offsets 
> from the Unix Epoch (either as a days or seconds offset) with far more 
> precision that a human is likely able to discern (or with more precision than 
> the accuracy of most atomic clocks, for that matter).
>
> >If sqldf sends a Date or POSIXct to SQLite then it is sent as a
> >number (days or seconds since the UNIX Epoch) but when it is sent
> >back it cannot know that that number is supposed to represent a
> >date or datetime.
>
> Why not?  The column type declaration in the table is merely an "arbitrary 
> string", and the returned column names from a select are merely "arbitrary 
> strings".  It seems like an "application deficiency" that it cannot set and 
> retrieve "arbitrary metadata" for its own internal use (such as providing 
> column type declarations in the CREATE TABLE or additional metadata in the 
> column name (in select statements).  Many other "applications" do so without 
> difficulty (for example the default sqlite3 wrapper in Python).  Are the R 
> programmers somehow particularly deficient in this regard?
>
> I quite often store "timestamps" using application dependent epochs and 
> offsets without difficulty (for example, the number of one-minute intervals 
> since the Unix epoch).  Given that just about every programming system and 
> language ever invented seems to store datetime data as some type of offset 
> from some epoch, and each of them different, does not seem to bother 
> interoperability in the least.  Once you know the Epoch and Offset interval, 
> conversion is rather simple arithmetic that most children learned in primary 
> school.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
> lot about anticipated traffic volume.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dates, times and R

2019-08-11 Thread Keith Medcalf

On Sunday, 11 August, 2019 07:45, Gabor Grothendieck  
wrote:

>R supports Date and POSIXct (date/time) classes which are represented
>internally as days and seconds since the UNIX Epoch respectively;
>however, due to the class it knows to display and manipulate them as
>dates and datetimes rather than numbers.

SQLite3 can store integers and doubles.  Seems to me this is quite sufficient 
for storing a "number of seconds" since the Unix Epoch or "number of days" 
since the Unix epoch.  What exactly is the problem?  An IEEE-754 double 
precision floating point number is certainly capable of storing all offsets 
from the Unix Epoch (either as a days or seconds offset) with far more 
precision that a human is likely able to discern (or with more precision than 
the accuracy of most atomic clocks, for that matter).

>If sqldf sends a Date or POSIXct to SQLite then it is sent as a
>number (days or seconds since the UNIX Epoch) but when it is sent 
>back it cannot know that that number is supposed to represent a 
>date or datetime.

Why not?  The column type declaration in the table is merely an "arbitrary 
string", and the returned column names from a select are merely "arbitrary 
strings".  It seems like an "application deficiency" that it cannot set and 
retrieve "arbitrary metadata" for its own internal use (such as providing 
column type declarations in the CREATE TABLE or additional metadata in the 
column name (in select statements).  Many other "applications" do so without 
difficulty (for example the default sqlite3 wrapper in Python).  Are the R 
programmers somehow particularly deficient in this regard?

I quite often store "timestamps" using application dependent epochs and offsets 
without difficulty (for example, the number of one-minute intervals since the 
Unix epoch).  Given that just about every programming system and language ever 
invented seems to store datetime data as some type of offset from some epoch, 
and each of them different, does not seem to bother interoperability in the 
least.  Once you know the Epoch and Offset interval, conversion is rather 
simple arithmetic that most children learned in primary school.

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.







___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dates, times and R

2019-08-11 Thread Gabor Grothendieck
Actually sqldf has multiple heuristics and the one you suggested is
already one of them
(except for minor differences in syntax) but this has the disadvantage
that the user must
specify classes whereas if the user simply uses any of the other backends
 they don't have to.

On Sun, Aug 11, 2019 at 10:52 AM Simon Slavin  wrote:
>
> On 11 Aug 2019, at 2:45pm, Gabor Grothendieck  wrote:
>
> > R supports Date and POSIXct (date/time) classes which are represented
> > internally as days and seconds since the UNIX Epoch respectively;
> > however, due to the class it knows to display and manipulate them as
> > dates and datetimes rather than numbers.
> >
> > If sqldf sends a Date or POSIXct to SQLite then it is sent as a number 
> > (days or seconds since the UNIX Epoch) but when it is sent back it cannot 
> > know that that number is supposed to represent a date or datetime.   There 
> > are some default heuristics (if any of the input tables have a column name 
> > the same name as an output column name then such output columns are 
> > automatically converted) but this is far from foolproof.
>
> Add a new heuristic that if the column name ends in '_Date' or '_
> POSIXct' then the value should be converted.  When R creates a new SQLite 
> table, it should append '_Date' or '_POSIXct' to the column name it chooses 
> for classes of those types.  Existing heuristics should continue to work.
>
> I do understand the value of having date/time types in SQLite, but it is not 
> easy to do while retaining backward compatibility.  It'll have to wait for 
> SQLite4 or something.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] dates, times and R

2019-08-11 Thread Simon Slavin
On 11 Aug 2019, at 2:45pm, Gabor Grothendieck  wrote:

> R supports Date and POSIXct (date/time) classes which are represented
> internally as days and seconds since the UNIX Epoch respectively;
> however, due to the class it knows to display and manipulate them as
> dates and datetimes rather than numbers.
> 
> If sqldf sends a Date or POSIXct to SQLite then it is sent as a number (days 
> or seconds since the UNIX Epoch) but when it is sent back it cannot know that 
> that number is supposed to represent a date or datetime.   There are some 
> default heuristics (if any of the input tables have a column name the same 
> name as an output column name then such output columns are automatically 
> converted) but this is far from foolproof.

Add a new heuristic that if the column name ends in '_Date' or '_ 
POSIXct' then the value should be converted.  When R creates a new SQLite 
table, it should append '_Date' or '_POSIXct' to the column name it chooses for 
classes of those types.  Existing heuristics should continue to work.

I do understand the value of having date/time types in SQLite, but it is not 
easy to do while retaining backward compatibility.  It'll have to wait for 
SQLite4 or something.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] dates, times and R

2019-08-11 Thread Gabor Grothendieck
It's really useful that SQLite now supports window operations as that was
one of the key features that R users needed to do many manipulations
using SQLite.

From the perspective of the R language there is really one
particularly key feature left that prevents some users from easily
using SQLite in many cases.  That is the lack of date and time types.

From the perspective of the sqldf package in R, the user writes:

sqldf("select ...whatever...")

and sqldf creates an empty sqlite data base, looks for all table names
in the select statement, generates create table statements for them,
uploads the tables to the new database, runs the statement, downloads
the result and deletes the database. (This package uses a lower level
R driver package for sqlite which may also be used directly.)

R supports Date and POSIXct (date/time) classes which are represented
internally as days and seconds since the UNIX Epoch respectively;
however, due to the class it knows to display and manipulate them as
dates and datetimes rather than numbers.

If sqldf sends a Date or POSIXct to SQLite then it is sent as a number (days or
seconds since the UNIX Epoch) but when it is sent back it cannot know that
that number is supposed to represent a date or datetime.   There are some
default heuristics (if any of the input tables have a column name the same name
as an output column name then such output columns are automatically
converted) but this is far from foolproof.

This is not a problem for the other backend databases  that are
supported since those backends have true date and datetime types so
when R receives such objects it knows to convert them to R's similar types.
Currently SQLite is the default backend but I normally recommend that users
switch to the java based H2 backend if they are doing a lot of date and datetime
processing for the above reason. A lot of data analysis does involve
dates and date times so this covers a lot of applications.

If SQLite were to support true date and datetime types, as do other databases,
and not just functions which handle numbers as if they were dates or
datetimes this problem would be addressed so this is a key remaining feature
that I think SQLite needs.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users