Re: [HACKERS] binary representation of datatypes
Jeroen Vermeulen wrote: Matthieu Imbert wrote: scenario 1 - parse the textual representation of all results of requests to the database and convert textual timestamps to a binary format that i choose among those ones (number of microseconds since 2000-01-01, or a structure similar to pg_tm (but with microsecond precision), or a time-format similar to one defined in rfc1305, or something else) or scenario 2 - directly use pgsql binary timestamp format. I think the latter is far more efficient. I'm new to postgresql, but from what i understand, here are the conversions involved in both scenarios (hopping that my ascii art won't be garbled by your mail clients ;-) : scenario 1: .-. .--. .-. .--. .--. .--. .-. |timestamp| |pgsql | |timestamp| |pgsql | |timestamp | |my | |my | |storage |-|internal |-|storage |-|network |-|as |-|timestamp |-|timestamp| |in | |to| |in | |to| |textual | |conversion| |format | |database | |network | |network | |textual | |representation| |routines | | | |backend | |conversion| | | |conversion| | | | | | | | | |function | | | |function | | | | | | | '-' '--' '-' '--' '--' '--' '-' I think this scenario has two boxes too many. Why would the backend convert to network representation before converting to text? Jeroen You mean that when results are asked in textual representation (the default), data is sent on network directly as text? -- Matthieu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] binary representation of datatypes
Dear postgresql hackers, I would like to be able to get results from SQL commands directly in a binary format, instead of a textual one. Actually, I want to be able to get timestamps with their full precision (microsecond). googling around i found some threads on this mailing list about this: http://archives.postgresql.org/pgsql-interfaces/2007-05/msg00047.php http://archives.postgresql.org/pgsql-interfaces/2007-06/msg0.php http://archives.postgresql.org/pgsql-interfaces/2007-03/msg7.php From these threads and from postgresql source code, i figured out how to get timestamp data the way i need it: - make a PQexecParams asking for results in binary format. - convert the returned 64 bits integer from network representation to host representation (reverse the order of the bytes or do nothing, depending on the endianness of the platform) - the resulting 64 bits integer is the number of microseconds since 2000-01-01 - convert this number of microseconds as needed. (my test code currently only handles the case where timestamps are int64) This works great but i have a few questions: - Is the binary representation of data (especially timestamps) subject to change in the future? - wouldn't it be a good think if functions for dealing with this binary representation are made available to client code (for example: pq_getmsgint64 in src/backend/libpq/pqformat.c or timestamp2tm in src/interfaces/ecpg/pgtypeslib/timestamp.c). Doing so would ensure that client code does not have to reimplement things already correctly done in postgres (with all special cases, and correct error handling), and would not be broken if the internals change. Moreover it would remove from client code the burden to handle both cases of timestamp as int64 or timestamp as double. In short, what i would like (as a libpq client code writer), is a function which given an opaque binary representation of a timestamp returns me the timestamp as a number of microseconds since 2000-01-01, and a function which given a timestamp as a number of microseconds since 2000-01-01 returns me a structure similar to pg_tm, but without loss of information (with microseconds). Of course, this would be needed not only for timestamps but also for other types. If this is not possible, at least what i would like is to be sure that the code i write for converting timestamp binary representation will not be broken by future postgresql release, and is portable. best regards, -- Matthieu Imbert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] binary representation of datatypes
Michael Meskes wrote: On Tue, Oct 21, 2008 at 10:21:38AM +0200, Matthieu Imbert wrote: I would like to be able to get results from SQL commands directly in a binary format, instead of a textual one. Actually, I want to be able to get timestamps with their full precision (microsecond). Are you sure you cannot get those in textual mode? If so I wonder why I got some numbers in a quick test: ... [NO_PID]: ecpg_execute on line 37: query: select * from date_test where d = $1 ; with 1 parameter(s) on connection regress1 [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_execute on line 37: using PQexecParams [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: free_params on line 37: parameter 1 = 1966-01-17 [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_execute on line 37: correctly got 1 tuples with 2 fields [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_get_data on line 37: RESULT: 1966-01-17 offset: -1; array: yes [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_get_data on line 37: RESULT: 2000-07-12 17:34:29.140787 offset: -1; array: yes [NO_PID]: sqlca: code: 0, state: 0 ... What do I miss here? Michael Yes microseconds are available in textual mode but i do want to use binary mode. Let me explain why: - my data will be time series. So typical requests will return lots of timestamped data (mainly floats or int). - after extraction i need to have all timestamps stored in format convenient for calculations. I can accommodate different formats (for example: number of microseconds since 2000-01-01, or a structure similar to pg_tm (but with microsecond precision), or a time-format similar to one defined in rfc1305), but definitely storing timestamps as text is a no go for me. so i have two choices: scenario 1 - parse the textual representation of all results of requests to the database and convert textual timestamps to a binary format that i choose among those ones (number of microseconds since 2000-01-01, or a structure similar to pg_tm (but with microsecond precision), or a time-format similar to one defined in rfc1305, or something else) or scenario 2 - directly use pgsql binary timestamp format. I think the latter is far more efficient. I'm new to postgresql, but from what i understand, here are the conversions involved in both scenarios (hopping that my ascii art won't be garbled by your mail clients ;-) : scenario 1: .-. .--. .-. .--. .--. .--. .-. |timestamp| |pgsql | |timestamp| |pgsql | |timestamp | |my | |my | |storage |-|internal |-|storage |-|network |-|as |-|timestamp |-|timestamp| |in | |to| |in | |to| |textual | |conversion| |format | |database | |network | |network | |textual | |representation| |routines | | | |backend | |conversion| | | |conversion| | | | | | | | | |function | | | |function | | | | | | | '-' '--' '-' '--' '--' '--' '-' scenario 2: .-. .--. .-. .--. .-. |timestamp| |pgsql | |timestamp| |pgsql | |timestamp| |storage |-|internal |-|storage |-|network |-|official | |in | |to| |in | |to| |format | |database | |network | |network | |offical | | | |backend | |conversion| | | |conversion| | | | | |function | | | |function | | | '-' '--' '-' '--' '-' if i'm correct, it seems obvious that the second scenario is more efficient (and less ugly). In scenario 2, when talking about timestamp 'official' format, i mean timestamp expressed as number of microseconds since 2000-01-01. But of course, it only deserves this name 'official' if it is guaranteed to stay the same across postgresql versions and platforms -- Matthieu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers