Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
>> PostgreSQL has very powerful machinery for storing text in any encoding. So maybe it makes sense to add ENCODING as another column property, the same way COLLATION was added? Some other people in this community have suggested that, and the SQL standard suggests the same -- specifying a character encoding per column: CHAR(n) CHARACTER SET cs. Text operations should work automatically, as in memory all strings will be converted to the database encoding. This approach would also open the possibility of implementing custom ENCODINGs for the column data storage, like snappy compression or even BSON, gobs or protobufs for much more compact storage.
> Thanks for your idea, that sounds interesting, although I do not understand it that well.

The idea is very simple:

  CREATE DATABASE utf8_database ENCODING 'utf8';
  \c utf8_database
  CREATE TABLE a(
    id serial,
    ascii_data text ENCODING 'ascii',  -- will use ascii_to_utf8 to read and utf8_to_ascii to write
    koi8_data  text ENCODING 'koi8_r', -- will use koi8_r_to_utf8 to read and utf8_to_koi8_r to write
    json_data  json ENCODING 'bson'    -- will use bson_to_json to read and json_to_bson to write
  );

The problem with bson_to_json here is that it will probably not be possible to write JSON in koi8_r, for example. But that case is not even being considered in these discussions now. If the ENCODING machinery received not only the encoding name but also the type OID, it should be possible to write encoders for TYPEs and arrays of TYPEs (I had to do exactly that, using casts to bytea and protobufs, to minimize the storage size for an array of a composite type when writing a lot of data, which could afterwards be unpacked directly in the DB as normal database types).

I hope I made my point a little bit clearer.

Regards,

Valentine Gogichashvili
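[Editorial note: the per-column conversions this proposal relies on already exist in core, so the read/write paths can be exercised today. A minimal sketch, assuming a UTF8 server encoding; the string literals are illustrative:]

```sql
-- Round-trip a Cyrillic string through KOI8-R, the same conversion
-- routines (utf8_to_koi8_r / koi8_r_to_utf8) a per-column ENCODING
-- property would invoke on write and read.
SELECT convert_from(convert_to('тест', 'koi8_r'), 'koi8_r');

-- A character with no equivalent in the narrower charset fails,
-- which is exactly the broader-than-column-charset limitation
-- mentioned for the json/bson case above:
SELECT convert_to('日本語', 'koi8_r');  -- raises an error
```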
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Hi,

>> That may be what's important to you, but it's not what's important to me.
> National character types support may be important to some potential users of PostgreSQL and to the popularity of PostgreSQL, not to me. That's why national character support is listed in the PostgreSQL TODO wiki. We might be losing potential users just because their selection criteria include national character support.

The whole NCHAR business appeared as a hack for systems that did not have proper encoding support from the beginning. It would not be needed if all text had magically been stored in UNICODE or UTF from the beginning, and the idea of a character were the same as the idea of a rune, not a byte.

PostgreSQL has very powerful machinery for storing text in any encoding. So maybe it makes sense to add ENCODING as another column property, the same way COLLATION was added? It would make it possible to have a database that talks to clients in UTF8 but stores text and varchar data in the encoding that is most appropriate for the situation.

It would make it impossible (or complicated) to give the database a non-UTF8 default encoding (I wonder who would need that in this case), as conversions from broader charsets into the default database encoding would not be possible. One could define an additional DATABASE property like LC_ENCODING that would work for the ENCODING property of a column the way LC_COLLATE works for the COLLATE property of a column.

Text operations should work automatically, as in memory all strings will be converted to the database encoding.

This approach would also open the possibility of implementing custom ENCODINGs for the column data storage, like snappy compression or even BSON, gobs or protobufs for much more compact storage.

Regards,

-- Valentine Gogichashvili
Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?
On Sun, Oct 30, 2011 at 22:12, Eric Ridge <eeb...@gmail.com> wrote:
> Yes. It's basically a modifier to the star that immediately precedes it. In order to support excluding multiple columns, it needs parens:
>
>   SELECT foo.* EXCLUDING (foo.x, foo.y), bar.* EXCLUDING (bar.y), baz.z, (a+b) AS c
>
> but yes, that's what I'm thinking. I think doing this will require more changes to the grammar than I had first thought, because there'd be no point in supporting:
>
>   SELECT foo.* EXCLUDING (foo.* EXCLUDING foo.y) FROM ...
>
> It looks like the above would be implicitly allowed without a bit of extra work. But if you've got a complex query consisting of a few joins, it'd be nice to say:
>
>   SELECT * EXCLUDING (table1.*, table2.x) FROM table1 INNER JOIN table2 ...

Putting aside arguments like "it is not a good idea to use * because it produces unmaintainable code, especially when the table structure is later extended", I think this construct would be really nice for building ROWs, for example in plpgsql triggers or in conditions for big update statements:

  IF (NEW.* EXCLUDING (last_modified)) IS DISTINCT FROM (OLD.* EXCLUDING (last_modified)) THEN
    NEW.last_modified = clock_timestamp();
  END IF;

Right now I do not know of any really nice syntax to do that efficiently, and for some wide tables, building such structures by listing every field is completely impractical and makes the code unreadable. So I would definitely like to have the syntax you are suggesting, provided it gives a way to construct ROWs (RECORDs).

Regards,

-- Valentine Gogichashvili
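[Editorial note: the trigger condition above can be approximated today without EXCLUDING. A sketch assuming the hstore extension is installed (hstore can be built from a record, and its `-` operator drops a key); the table and column names are illustrative:]

```sql
-- Approximation of the "compare NEW and OLD except last_modified" trigger
-- using hstore instead of the proposed EXCLUDING syntax.
-- Assumes: CREATE EXTENSION hstore;
CREATE OR REPLACE FUNCTION touch_last_modified() RETURNS trigger AS $$
BEGIN
  IF hstore(NEW) - 'last_modified'::text
       IS DISTINCT FROM hstore(OLD) - 'last_modified'::text THEN
    NEW.last_modified := clock_timestamp();
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

This keeps the trigger independent of the table's column list, at the cost of a detour through hstore's text representation.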
Re: [HACKERS] confusing invalid UTF8 byte sequence error
> Has anyone else ever found this error message confusing:
>
>   ERROR: 22021: invalid byte sequence for encoding UTF8: 0xdb24
>
> I think what is really meant is better expressed like this:
>
>   ERROR: 22021: invalid byte sequence for encoding UTF8: 0xdb 0x24
>
> Otherwise it looks like a codepoint or a 16-bit word (endianness?) or who knows what.

Talking about this error, I find the lack of context in this error message even more confusing... the relative position or the surrounding characters would be really helpful when searching for a problem with encoding...

-- Valentine Gogichashvili
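[Editorial note: the message under discussion is easy to reproduce. A sketch; the bytes are the ones from the report (this assumes a server recent enough to accept hex bytea literals):]

```sql
-- 0xdb opens a two-byte UTF8 sequence, but 0x24 ('$') is not a valid
-- continuation byte, so decoding fails on this pair of bytes and the
-- server reports them in the "invalid byte sequence" error.
SELECT convert_from('\xdb24'::bytea, 'UTF8');
```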
Re: [HACKERS] Backup's from standby
>> What issues might we face if we take a backup (data dir + WAL files) on the standby without an LVM snapshot?
> The backup might be corrupted in arbitrary ways.

And what will happen if one issues pg_start_backup() on the master, then takes a file-level backup on the slave, and then issues pg_stop_backup() on the master again? As far as I remember this approach was working for me, provided that all the needed WAL files are transferred to the newly created DB copy as well.

-- Valentine Gogichashvili
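[Editorial note: the sequence being asked about can be sketched as follows, assuming WAL archiving is configured so that all segments written between the two calls are available to the copy; whether the resulting copy is actually safe is exactly the open question of this thread:]

```sql
-- On the master: enter backup mode (forces a checkpoint and
-- records the backup start location).
SELECT pg_start_backup('standby_file_copy');

-- On the standby: take the file-level copy of the data directory
-- (e.g. with rsync or tar) while the master stays in backup mode.

-- On the master: leave backup mode.
SELECT pg_stop_backup();
```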
Re: [HACKERS] How to define global variable in postgresql
> Hello. How can we define a global variable in PostgreSQL?

You can also use the global dictionaries in PL/Python, for example: http://www.postgresql.org/docs/9.0/static/plpython-sharing.html
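[Editorial note: the sharing mechanism behind the link above can be sketched like this, assuming PL/Python is installed. SD is private to one function; GD is shared by all PL/Python functions in the session:]

```sql
-- Session-level counter kept in the function's SD dictionary,
-- which persists between calls within one backend session.
CREATE OR REPLACE FUNCTION counter() RETURNS integer AS $$
if 'n' not in SD:
    SD['n'] = 0
SD['n'] += 1
return SD['n']
$$ LANGUAGE plpythonu;

SELECT counter();  -- 1
SELECT counter();  -- 2: the state survived between calls
```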
Re: [HACKERS] Improved JDBC driver part 2
Hi,

I cannot get the file:

  wget http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
  --2010-12-01 12:05:28--  http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
  Resolving www.rsmogura.net... 64.120.14.83
  Connecting to www.rsmogura.net|64.120.14.83|:80... connected.
  HTTP request sent, awaiting response... 404 Not Found
  2010-12-01 12:05:29 ERROR 404: Not Found.

On Tue, Nov 30, 2010 at 7:49 PM, Radosław Smogura <m...@smogura.eu> wrote:
> Hello,
>
> Maybe you are interested in what I have done with JDBC.
>
> === Original driver (Text mode) ===
>
> * Memory *
> 1. Memory usage improvements when using result set input streams (no unneeded memory copies) - needs a few touches for better performance.
> 2. Memory usage improvements for large data; it should be no problem to load a 1GB bytea[] with only 300MB of memory (the threshold size is still hardcoded).
>
> * JDBC 4 *
> 1. XMLs are now correctly transformed before being sent to the server - the previous version used plain text-file transformations, which is not enough.
> 2. In all modes (text/binary) XMLs are sent in binary mode, so the driver doesn't need to do special transformations (does it require libxml?), unless character streams are used.
> 3. JDBC4 exception throwing.
> 4. XML objects are readable only once; you can't reuse them. Updating from a result set (silently set to null on RS.updateRow() - it shouldn't be silent) returns null until refreshRow(), but you can write to them after load.
> 5. The target XML behavior is streaming, to avoid repeating the problems with bytea.
>
> * JDBC 4.1 *
> 1. Just started.
>
> * Others *
> 1. A few additional test cases, and a few utils for XML checking (plain string equality is too weak) - not good, but better.
> 2. Fixed a bug causing improper time(stamp) encoding for WITH TIME ZONE fields after changing the default time zone.
>
> === Binary mode ===
> 1. Reads for almost all data types, including arrays.
> 2. Writes for a few.
> 3. Much more restrictive checking when casting from one type to another.
> 4. Exceptions when casting from one type to an improper type.
> 5. Still, ResultSet.getString() for XML will return the XML - the spec prohibits this (X - base type conversion, x - possible conversion, no x - no base and no possible = no conversion).
> 6. No getPriviliges for metadata - no binary output for ACLs!!!
> 7. Many, many tests passed.
> 8. Data reading is faster for all reads (checked with a profiler against the original driver).
>
> The driver is here: http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
> It is currently JDK 6 compatible (this will change); the compressed patch is about 136kb gzipped.
>
> Kind regards, have a nice day
> -- Radosław Smogura
> http://www.softperience.eu

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DELETE with LIMIT (or my first hack)
Hi,

actually, introducing LIMIT and OFFSET with ORDER BY for DELETE/UPDATE would make it much easier to push data from one (say, queue) table to another, and to fetch chunks of queue entries while updating their status in one statement. Now I have to do SELECT ... ORDER BY ... LIMIT and then some magic with arrays of IDs and updates/deletes, or

  UPDATE ... WHERE id IN (SELECT ... ORDER BY ... LIMIT ...) RETURNING ...

to make that work. So it is still possible with the WHERE clause alone, though I am not quite sure it is as efficient as the direct approach would be.

And speaking of pushing data from one table to another, what I would really like to be able to do is something like:

  INSERT INTO ... DELETE FROM ... WHERE ... ORDER BY ... [LIMIT ...] RETURNING ...;

This would also be quite efficient when re-arranging data in table partitions (though LIMIT/OFFSET there would just be a nice-to-have for reducing the chunk sizes of the data being moved).

Additionally, we quite often need to clean up some log tables based not on timestamps but on the number of rows in those tables, leaving only the newest N records in a table... OFFSET would be really cool to have for that use case as well...

With best regards,

-- Valentine Gogichashvili
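[Editorial note: writable CTEs, added in PostgreSQL 9.1, let the move-between-tables case be written as a single statement, which comes close to the INSERT ... DELETE ... RETURNING wished for above. A sketch; the table and column names are illustrative:]

```sql
-- Move up to 100 of the oldest queue entries into an archive table
-- in one statement: the CTE deletes and RETURNING feeds the INSERT.
WITH moved AS (
    DELETE FROM queue
    WHERE id IN (SELECT id FROM queue ORDER BY created_at LIMIT 100)
    RETURNING *
)
INSERT INTO queue_archive SELECT * FROM moved;
```

The ORDER BY still lives in a sub-select rather than on the DELETE itself, so this is the workaround described above rather than the proposed syntax.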
Re: [HACKERS] final patch - plpgsql: for-in-array
Hi,

with the FOR e IN SELECT UNNEST(a) construct there is an issue, again related to the unnesting of composite type arrays:

  BEGIN;
  CREATE TYPE truple AS (i integer, a text, b text);
  DO $SQL$
  DECLARE
    start_time timestamp;
    t truple;
    ta truple[] := ARRAY( SELECT ROW(s.i, 'A' || s.i::text, 'B' || s.i::text)::truple
                          FROM generate_series(1, 1) AS s(i) );
    i integer := 1;
  BEGIN
    start_time := clock_timestamp();
    FOR t IN SELECT UNNEST(ta) LOOP
      RAISE INFO 't is %', t;
      i := i + 1;
    END LOOP;
    RAISE INFO 'looped in %', clock_timestamp() - start_time;
  END;
  $SQL$;
  ROLLBACK;

fails with:

  ERROR: invalid input syntax for integer: (1,A1,B1)
  CONTEXT: PL/pgSQL function inline_code_block line 8 at FOR over SELECT rows

So to unnest such an array one has to write SELECT * FROM UNNEST(a) to be able to loop over it; the same block works once the loop header is changed to:

  FOR t IN SELECT * FROM UNNEST(ta) LOOP

Is it a bug or a feature? And if the latter, then any work on optimizing the FOR e IN SELECT UNNEST(a) statement should probably also cover FOR e IN SELECT * FROM UNNEST(a).

Also, would the suggested FOR-IN-ARRAY construct loop over such composite type arrays?

Best regards,

-- Valentine Gogichashvili

On Thu, Nov 18, 2010 at 8:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Pavel Stehule <pavel.steh...@gmail.com> writes:
>> 2010/11/18 Tom Lane <t...@sss.pgh.pa.us>:
>>> The problem here is that FOR is a syntactic choke point: it's already overloaded with several different sub-syntaxes that are quite difficult to separate. Adding another one makes that worse, with the consequences that we might misinterpret the user's intent, leading either to misleading/unhelpful error messages or unexpected runtime behavior.
>> yes, this argument is correct - but we can rearrange the parser rules related to the FOR statement. It can be solved.
> No, it can't. The more things that can possibly follow FOR, the less likely that you correctly guess which one the user had in mind when faced with something that's not quite syntactically correct. Or maybe it *is* syntactically correct, only not according to the variant that the user thought he was invoking. We've seen bug reports of this sort connected with FOR already; in fact I'm pretty sure you've responded to a few yourself. Adding more variants *will* make it worse. We need a decent return on investment for anything we add here, and this proposal just doesn't offer enough benefit.
>
> regards, tom lane
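[Editorial note: PostgreSQL 9.1 ultimately added a FOREACH construct that iterates over an array directly, including arrays of composite types, which covers the use case discussed in this thread. A sketch reusing the composite type from the example above:]

```sql
-- FOREACH ... IN ARRAY (PostgreSQL 9.1+) iterates over array elements
-- without going through UNNEST, and handles composite element types.
DO $SQL$
DECLARE
  t  truple;  -- assumes CREATE TYPE truple AS (i integer, a text, b text)
  ta truple[] := ARRAY[ROW(1, 'A1', 'B1')::truple,
                       ROW(2, 'A2', 'B2')::truple];
BEGIN
  FOREACH t IN ARRAY ta LOOP
    RAISE INFO 't is %', t;
  END LOOP;
END;
$SQL$;
```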
Re: [HACKERS] [PERFORM] Cannot make GIN intarray index be used by the planner
Hello again,

I got the opclass for the index and it looks like it is the default one:

  myvideoindex=# select pg_opclass.*, pg_type.typname
  myvideoindex-#   from pg_index, pg_opclass, pg_type
  myvideoindex-#  where pg_index.indexrelid = 'idx_nonnulls_myintarray_int4_gin'::regclass
  myvideoindex-#    and pg_opclass.oid = any (pg_index.indclass::oid[])
  myvideoindex-#    and pg_type.oid = pg_opclass.opcintype;

   opcamid |  opcname  | opcnamespace | opcowner | opcintype | opcdefault | opckeytype | typname
  ---------+-----------+--------------+----------+-----------+------------+------------+---------
      2742 | _int4_ops |           11 |       10 |      1007 | t          |         23 | _int4
  (1 row)

The search_path is set to the following:

  myvideoindex=# show search_path;
     search_path
  ------------------
   versionA, public
  (1 row)

With best regards,

-- Valentine

On 5/9/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> [cc'ing pgsql-hackers since this is looking like a contrib/intarray bug]
>
> Valentine Gogichashvili <[EMAIL PROTECTED]> writes:
>> here is the DT
>
> That works fine for me in 8.2:
>
>   regression=# explain SELECT id, (myintarray_int4) FROM myintarray_table_nonulls WHERE ARRAY[8] @ myintarray_int4;
>                                                QUERY PLAN
>   ------------------------------------------------------------------------------------------------------------
>    Index Scan using idx_nonnulls_myintarray_int4_gin on myintarray_table_nonulls  (cost=0.00..8.27 rows=1 width=36)
>      Index Cond: ('{8}'::integer[] @ myintarray_int4)
>   (2 rows)
>
> What I am betting is that you've installed contrib/intarray in this database and that's bollixed things up somehow. In particular, intarray tries to take over the position of default gin opclass for int4[], and the opclass that it installs as default has operators named just like the built-in ones. If somehow your query is using pg_catalog.@ instead of intarray's public.@, then the planner wouldn't think the index is relevant.
>
> In a quick test your example still works with intarray installed, because what it's really created is public.@ (integer[], integer[]), which is an exact match and therefore takes precedence over the built-in pg_catalog.@ (anyarray, anyarray). But if, for example, you don't have public in your search_path, then the wrong operator would be chosen. Please look at the pg_index entry for your index, e.g.
>
>   select * from pg_index where indexrelid = 'versionA.idx_nonnulls_myintarray_int4_gin'::regclass;
>
> and see whether the index opclass is the built-in one or not.
>
> Note to hackers: we've already discussed that intarray shouldn't be trying to take over the default gin opclass, but I am beginning to wonder if it still has a reason to live at all. We should at least consider removing the redundant operators to avoid risks like this one.
>
> regards, tom lane

--
ვალენტინ გოგიჩაშვილი
Valentine Gogichashvili
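[Editorial note: one way to test the operator-resolution theory above is to schema-qualify the operator explicitly with the OPERATOR() syntax. A sketch; the schema and table names are taken from the output earlier in the thread and may need adjusting:]

```sql
-- Force the intarray operator regardless of search_path. If this form
-- produces the index scan while the unqualified query does not, the
-- planner was resolving the built-in pg_catalog operator instead of
-- the one the index's opclass knows about.
EXPLAIN
SELECT id, myintarray_int4
FROM myintarray_table_nonulls
WHERE ARRAY[8] OPERATOR(public.@) myintarray_int4;
```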