Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-22 Thread Valentine Gogichashvili


  PostgreSQL has very powerful possibilities for storing any kind of
 encoding. So maybe it makes sense to add ENCODING as another column
 property, the same way COLLATION was added?


 Some other people in this community have suggested that, and the SQL
 standard suggests the same -- specifying a character set for each column:
 CHAR(n) CHARACTER SET ch.


  Text operations should work automatically, as in memory all strings will
 be
 converted to the database encoding.

 This approach will also open the possibility of implementing custom
 ENCODINGs for the column data storage, like snappy compression or even
 BSON, gobs or protobufs for much more compact type storage.


 Thanks for your idea; it sounds interesting, although I don't understand
 it that well yet.


The idea is very simple:

CREATE DATABASE utf8_database ENCODING 'utf8';

\c utf8_database

CREATE TABLE a(
  id serial,
  ascii_data text ENCODING 'ascii',  -- will use ascii_to_utf8 to read and utf8_to_ascii to write
  koi8_data  text ENCODING 'koi8_r', -- will use koi8_r_to_utf8 to read and utf8_to_koi8_r to write
  json_data  json ENCODING 'bson'    -- will use bson_to_json to read and json_to_bson to write
);

The problem with bson_to_json here is that it will probably not be possible
to store JSON in koi8_r, for example. But that case is not even being
considered in these discussions at the moment.
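
For comparison, the conversions such a column ENCODING would invoke already
exist today as explicit function calls (a minimal sketch using the built-in
convert_to/convert_from/convert functions; the encoding names are examples):

SELECT convert_to('hello', 'LATIN1');            -- text -> bytea encoded in LATIN1
SELECT convert_from(convert_to('hello', 'LATIN1'),
                    'LATIN1');                   -- round-trip back to text
SELECT convert('hello'::bytea, 'UTF8', 'KOI8R'); -- re-encode a bytea from UTF8 to KOI8R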

If the ENCODING machinery received not only the encoding name but also the
type OID, it should be possible to write encoders for TYPEs and arrays of
TYPEs. (I had to do this using casts to bytea and protobuf to minimize the
storage size of an array of types when writing a lot of data, which could
afterwards be unpacked directly in the DB as normal database types.)

I hope I made my point a little bit clearer.

Regards,

Valentine Gogichashvili


Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread Valentine Gogichashvili
Hi,


  That may be what's important to you, but it's not what's important to
 me.


 Support for national character types may be important to some potential
 users of PostgreSQL and to the popularity of PostgreSQL, not to me.  That's
 why national character support is listed on the PostgreSQL TODO wiki.  We
 might be losing potential users simply because their selection criteria
 include national character support.


The whole NCHAR business appeared as a hack for systems that did not have
Unicode support from the beginning. It would not be needed if all text had
magically been stored in Unicode or UTF from the start and the idea of a
character were the same as the idea of a rune, not a byte.

PostgreSQL has very powerful possibilities for storing any kind of
encoding. So maybe it makes sense to add ENCODING as another column
property, the same way COLLATION was added?

It would make it possible to have a database that talks to the clients in
UTF8 and stores text and varchar data in the encoding that is most
appropriate for the situation.

It would make it impossible (or complicated) to give the database a
non-UTF8 default encoding (though I wonder who would need that in this
case), as conversions from broader charsets into a narrower default
database encoding would not be possible.

One could define an additional DATABASE property like LC_ENCODING that
would work for the ENCODING property of a column the way LC_COLLATE works
for the COLLATE property of a column.
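
A sketch of how that could look (entirely hypothetical syntax, mirroring
the existing LC_COLLATE/COLLATE pair; none of this exists today):

CREATE DATABASE d ENCODING 'UTF8' LC_ENCODING 'UTF8';

CREATE TABLE t(
  name text ENCODING 'latin1' -- columns without ENCODING would fall back to LC_ENCODING
);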

Text operations should work automatically, as in memory all strings will be
converted to the database encoding.

This approach will also open the possibility of implementing custom
ENCODINGs for the column data storage, like snappy compression or even
BSON, gobs or protobufs for much more compact type storage.

Regards,

-- Valentine Gogichashvili


Re: [HACKERS] Thoughts on SELECT * EXCLUDING (...) FROM ...?

2011-11-02 Thread Valentine Gogichashvili
On Sun, Oct 30, 2011 at 22:12, Eric Ridge eeb...@gmail.com wrote:


 Yes.  It's basically a modifier to the star that immediately precedes
 it.  In order to support excluding multiple columns, it needs parens:
 SELECT foo.* EXCLUDING (foo.x, foo.y), bar.* EXCLUDING (bar.y),
 baz.z, (a+b) AS c

 but yes, that's what I'm thinking.  I think doing this will require
 more changes to the grammar than I had first thought because there'd
 be no point in supporting:
 SELECT foo.* EXCLUDING (foo.* EXCLUDING foo.y) FROM ...
 It looks like the above would be implicitly allowed without a bit of extra
 work.

 But, if you've got a complex query consisting of a few joins, it'd be
 nice to say:
 SELECT * EXCLUDING (table1.*, table2.x) FROM table1 INNER JOIN table2
 ...


Putting aside arguments like "it is not a good idea to use * because it
produces unmaintainable code, especially when you extend the table
structure", I think this construct would be really nice for building ROWs,
for example in plpgsql triggers or in conditions for big UPDATE statements:

IF (NEW.* EXCLUDING (last_modified)) IS DISTINCT FROM (OLD.* EXCLUDING (last_modified)) THEN
  NEW.last_modified := clock_timestamp();
END IF;

As of now, I do not know of any really nice syntax to do that efficiently,
and for some wide tables, building such constructs by listing every field
they contain is completely impractical and makes the code unreadable.
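
One workaround that does exist today is to compare hstore projections of
the rows (a sketch, not the proposed syntax; it assumes the hstore
extension and its record-to-hstore cast available since 9.0):

-- Compare NEW and OLD with one column removed, using hstore's
-- delete-key operator:
IF (hstore(NEW) - 'last_modified'::text)
   IS DISTINCT FROM (hstore(OLD) - 'last_modified'::text) THEN
  NEW.last_modified := clock_timestamp();
END IF;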

So I would definitely like to have the syntax you are suggesting, provided
it also made it possible to construct ROWs (RECORDs) this way.

Regards,

-- Valentine Gogichashvili


Re: [HACKERS] confusing invalid UTF8 byte sequence error

2011-08-29 Thread Valentine Gogichashvili

 Has anyone else ever found this error message confusing:
 ERROR:  22021: invalid byte sequence for encoding UTF8: 0xdb24
 I think what is really meant is better expressed like this:
 ERROR:  22021: invalid byte sequence for encoding UTF8: 0xdb 0x24
 Otherwise it looks like a codepoint or a 16-bit word (endianness?) or
 who knows what.


Speaking of this error, I find the lack of context in this error message
even more confusing... the relative position or the surrounding characters
would be really helpful when hunting for an encoding problem...

-- Valentine Gogichashvili


Re: [HACKERS] Backup's from standby

2011-08-19 Thread Valentine Gogichashvili
  What issues might we face if we take a backup (data dir + WAL files)
  on the standby without an LVM snapshot?

 The backup might be corrupted in arbitrary ways.


And what happens if one issues pg_start_backup() on the master, then takes
a file-level backup on the slave, and then issues pg_stop_backup() on the
master again? As far as I remember this approach worked for me, provided
that all the needed WAL files were transferred to the newly created DB copy
as well.
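
The sequence in question, as a sketch (function names as in the 8.x/9.x
releases; the file-level copy of the standby's data directory happens
between the two calls):

-- On the master:
SELECT pg_start_backup('standby_file_backup');
-- ... copy the standby's data directory, plus all WAL files
-- ... needed to reach consistency ...
-- On the master again:
SELECT pg_stop_backup();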

-- Valentine Gogichashvili


Re: [HACKERS] How to define global variable in postgresql

2011-08-19 Thread Valentine Gogichashvili
 Hello. How can we define a global variable in PostgreSQL?

You can also use the global data structures in PL/Python, for example:

http://www.postgresql.org/docs/9.0/static/plpython-sharing.html
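
A minimal sketch of that approach: PL/Python's GD dictionary is shared by
all PL/Python functions within a session (per the page linked above), so it
can emulate a session-wide variable (the function name is just an example):

CREATE FUNCTION next_counter() RETURNS integer AS $$
    # GD persists across calls within the same backend session
    GD['n'] = GD.get('n', 0) + 1
    return GD['n']
$$ LANGUAGE plpythonu;

SELECT next_counter();  -- 1
SELECT next_counter();  -- 2, in the same session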


Re: [HACKERS] Improved JDBC driver part 2

2010-12-01 Thread Valentine Gogichashvili
Hi,

I cannot get the file:

wget http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
--2010-12-01 12:05:28--
http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
Resolving www.rsmogura.net... 64.120.14.83
Connecting to www.rsmogura.net|64.120.14.83|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2010-12-01 12:05:29 ERROR 404: Not Found.


On Tue, Nov 30, 2010 at 7:49 PM, Radosław Smogura m...@smogura.eu wrote:

 Hello,

 Maybe you are interested in what I have done with JDBC.

 === Original driver (Text mode) ===
 * Memory *
 1. Memory usage improvements when using result set input streams (no
 unneeded memory copies) - needs a few touches for bigger performance gains.
 2. Memory usage improvements for large data; it should be no problem to
 load a 1GB bytea[] with only 300MB of memory (threshold size still
 hardcoded).

 * JDBC 4 *
 1. XML is now correctly transformed before being sent to the server - the
 previous version used plain text-file transformations, which is not enough.
 2. In all modes (text/binary) XMLs are sent in binary mode, so the driver
 does not need to do a special transformation (does it require libxml?),
 unless character streams are used.
 3. JDBC4 exception throwing.
 4. XML objects are readable only once; you can't reuse them. Updating from
 a result set (silently set to null on RS.updateRow() - it shouldn't be
 silent) returns null until refreshRow(), but you can write to them after
 load.
 5. The target XML behavior is streaming, so as not to repeat the problems
 with bytea.

 * JDBC 4.1 *
 1. Just started.

 * Others *
 1. A few additional test cases. A few utilities for XML checking (plain
 string equality is not enough) - not good, but better.
 2. Fixed a bug causing improper time(stamp) encoding for WITH TIME ZONE
 fields after changing the default time zone.

 === Binary mode ===
 1. Read for almost all data types, including arrays.
 2. Write for a few.
 3. Much more restrictive checking when casting from one type to another.
 4. Exceptions when casting from one type to another, improper type.
 5. ResultSet.getString() for XML will still return XML - the spec prohibits
 this (X - base type conversion, x - possible conversion, no x - no base and
 no possible = no conversion).
 6. No getPrivileges for metadata - no binary output for ACL!!!
 7. Many, many tests passed.
 8. Data reading is faster for all reads (checked with a profiler against
 the original driver).

 The driver is here: http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
 It is currently JDK 6 compatible (it will not stay so); the compressed
 patch is about 136kB gzipped.

 Kind regards & have a nice day
 --
 Radosław Smogura
 http://www.softperience.eu




Re: [HACKERS] DELETE with LIMIT (or my first hack)

2010-12-01 Thread Valentine Gogichashvili
Hi,

Actually, introducing LIMIT and OFFSET with ORDER BY for DELETE/UPDATE
would make it much easier to push data from one table (say, a queue) to
another, and to fetch chunks of queue entries while updating their status
in one statement. Now I have to do SELECT ... ORDER BY ... LIMIT and then
some magic with arrays of IDs and updates/deletes, or UPDATE ... WHERE id
IN (SELECT ... ORDER BY ... LIMIT ...) RETURNING ... to make that work.
This is still possible with the WHERE clause, though I am not quite sure it
is as efficient as the direct approach would be. And speaking of pushing
data from one table to another, what I would really like to be able to do
is something like:

INSERT INTO ...
DELETE FROM ... WHERE... ORDER BY.. [LIMIT...]
RETURNING...;

This would also be quite efficient when re-arranging data in table
partitions (though the LIMIT/OFFSET there would just be a nice-to-have
possibility for reducing the chunk size of the data being moved).
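
For concreteness, a minimal sketch of the IN-subquery workaround mentioned
above (the queue table and its columns are hypothetical):

-- Grab a chunk of queue entries and mark them in one statement; this
-- works today, but needs the extra subquery instead of a direct
-- UPDATE ... ORDER BY ... LIMIT:
UPDATE queue
   SET status = 'processing'
 WHERE id IN (SELECT id
                FROM queue
               WHERE status = 'new'
               ORDER BY id
               LIMIT 100)
RETURNING id, payload;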

Additionally, we quite often need to clean up log tables based not on
timestamps but on the number of rows in those tables, leaving only the N
newest records in a table... OFFSET would be really nice to have for that
use case as well...
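
A sketch of that cleanup as it has to be written today (log_table and its
id column are hypothetical; with DELETE ... ORDER BY ... OFFSET it would
become a single direct statement):

-- Keep only the 1000 newest rows, deleting the rest:
DELETE FROM log_table
 WHERE id NOT IN (SELECT id
                    FROM log_table
                   ORDER BY id DESC
                   LIMIT 1000);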

With best regards,

-- Valentine Gogichashvili


Re: [HACKERS] final patch - plpgsql: for-in-array

2010-11-22 Thread Valentine Gogichashvili
Hi,

with the FOR e IN SELECT UNNEST(a) construct there is again an issue
related to the unnesting of composite-type arrays:

BEGIN;
CREATE TYPE truple AS (i integer, a text, b text);

DO $SQL$
DECLARE
  start_time timestamp;
  t truple;
  ta truple[] := ARRAY( select ROW(s.i, 'A' || (s.i)::text, 'B' ||
(s.i)::text )::truple from generate_series(1, 1) as s(i) );
  i integer := 1;
BEGIN
  start_time := clock_timestamp();
  FOR t IN SELECT UNNEST(ta) LOOP
raise info 't is %', t;
i := i + 1;
  END LOOP;
  RAISE INFO 'looped in %', clock_timestamp() - start_time;
END;
$SQL$;
ROLLBACK;

fails with ERROR:  invalid input syntax for integer: "(1,A1,B1)"
CONTEXT:  PL/pgSQL function "inline_code_block" line 8 at FOR over SELECT
rows

So to UNNEST such an array, one has to use SELECT * FROM UNNEST(a) to be
able to loop over it, like:

BEGIN;
CREATE TYPE truple AS (i integer, a text, b text);

DO $SQL$
DECLARE
  start_time timestamp;
  t truple;
  ta truple[] := ARRAY( select ROW(s.i, 'A' || (s.i)::text, 'B' ||
(s.i)::text )::truple from generate_series(1, 1) as s(i) );
  i integer := 1;
BEGIN
  start_time := clock_timestamp();
  FOR t IN SELECT * FROM UNNEST(ta) LOOP
raise info 't is %', t;
i := i + 1;
  END LOOP;
  RAISE INFO 'looped in %', clock_timestamp() - start_time;
END;
$SQL$;
ROLLBACK;

Is this a bug or a feature? And if the latter, then any work on optimizing
FOR e IN SELECT UNNEST(a) should probably also cover the FOR e IN SELECT *
FROM UNNEST(a) form. (Presumably SELECT UNNEST(ta) yields a single
composite-typed column, which FOR then tries to split across the fields of
t, hence the integer input error, while SELECT * FROM UNNEST(ta) expands
the composite into separate columns that match t's fields.)

Also, would the suggested FOR-IN-ARRAY construct loop over such
composite-type arrays?

Best regards,

-- Valentine Gogichashvili


On Thu, Nov 18, 2010 at 8:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Pavel Stehule pavel.steh...@gmail.com writes:
  2010/11/18 Tom Lane t...@sss.pgh.pa.us:
  The problem here is that FOR is a syntactic choke point: it's already
  overloaded with several different sub-syntaxes that are quite difficult
  to separate.  Adding another one makes that worse, with the consequences
  that we might misinterpret the user's intent, leading either to
  misleading/unhelpful error messages or unexpected runtime behavior.

  yes, this argument is correct - but we can rearrange the parser rules
  related to the FOR statement. It can be solved.

 No, it can't.  The more things that can possibly follow FOR, the less
 likely that you correctly guess which one the user had in mind when
 faced with something that's not quite syntactically correct.  Or maybe
 it *is* syntactically correct, only not according to the variant that
 the user thought he was invoking.  We've seen bug reports of this sort
 connected with FOR already; in fact I'm pretty sure you've responded to
 a few yourself.  Adding more variants *will* make it worse.  We need
 a decent return on investment for anything we add here, and this
 proposal just doesn't offer enough benefit.

regards, tom lane




Re: [HACKERS] [PERFORM] Cannot make GIN intarray index be used by the planner

2007-05-10 Thread Valentine Gogichashvili

Hello again,

I looked up the opclass for the index, and it looks like it is the default one:

myvideoindex=# select pg_opclass.*, pg_type.typname
myvideoindex-#   from pg_index, pg_opclass, pg_type
myvideoindex-#  where pg_index.indexrelid =
'idx_nonnulls_myintarray_int4_gin'::regclass
myvideoindex-#and pg_opclass.oid = any (pg_index.indclass::oid[] )
myvideoindex-#and pg_type.oid = pg_opclass.opcintype;

 opcamid |  opcname  | opcnamespace | opcowner | opcintype | opcdefault | opckeytype | typname
---------+-----------+--------------+----------+-----------+------------+------------+---------
    2742 | _int4_ops |           11 |       10 |      1007 | t          |         23 | _int4
(1 row)

The search_path is set to the following

myvideoindex=# show search_path;
   search_path
------------------
 versionA, public
(1 row)

With best regards,

-- Valentine

On 5/9/07, Tom Lane [EMAIL PROTECTED] wrote:


[cc'ing to pgsql-hackers since this is looking like a contrib/intarray
bug]

Valentine Gogichashvili [EMAIL PROTECTED] writes:
 here is the DT

That works fine for me in 8.2:

regression=# explain SELECT id, (myintarray_int4)
               FROM myintarray_table_nonulls
              WHERE ARRAY[8] @ myintarray_int4;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using idx_nonnulls_myintarray_int4_gin on myintarray_table_nonulls  (cost=0.00..8.27 rows=1 width=36)
   Index Cond: ('{8}'::integer[] @ myintarray_int4)
(2 rows)

What I am betting is that you've installed contrib/intarray in this
database and that's bollixed things up somehow.  In particular, intarray
tries to take over the position of default gin opclass for int4[],
and the opclass that it installs as default has operators named just
like the built-in ones.  If somehow your query is using pg_catalog.@
instead of intarray's public.@, then the planner wouldn't think the
index is relevant.

In a quick test your example still works with intarray installed, because
what it has really created is public.@ (integer[], integer[]), which is an
exact match and therefore takes precedence over the built-in
pg_catalog.@ (anyarray, anyarray).  But if for example you don't have
public in your search_path then the wrong operator would be chosen.
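
One way to check which operator the query actually resolves to is to
schema-qualify it explicitly with the OPERATOR() syntax (a sketch; the
table and column names are taken from the report above):

-- Force resolution to a specific schema's operator; if the plans for
-- public.@ and pg_catalog.@ differ, operator resolution is the culprit.
EXPLAIN SELECT id FROM myintarray_table_nonulls
 WHERE ARRAY[8] OPERATOR(public.@) myintarray_int4;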

Please look at the pg_index entry for your index, eg

select * from pg_index where indexrelid =
'versionA.idx_nonnulls_myintarray_int4_gin'::regclass;

and see whether the index opclass is the built-in one or not.

Note to hackers: we've already discussed that intarray shouldn't be
trying to take over the default gin opclass, but I am beginning to
wonder if it still has a reason to live at all.  We should at least
consider removing the redundant operators to avoid risks like this one.

regards, tom lane





--
ვალენტინ გოგიჩაშვილი
Valentine Gogichashvili