Re: [HACKERS] need info about extensibility in other databases

2017-11-08 Thread Oleg Bartunov
On Wed, Nov 8, 2017 at 2:37 PM, Li Song  wrote:
> Hi,
>
> When is the English version of "GiST programming tutorial" available?

Well, I wrote it many years ago, so it needs to be updated. For now,
you can use Google Translate, which seems to work fine:
https://translate.google.com/translate?sl=auto=en=y=_t=en=UTF-8=http%3A%2F%2Fwww.sai.msu.su%2F~megera%2Fpostgres%2Ftalks%2Fgist_tutorial.html=

Also, we have 7 papers about indexing, also in Russian; see
https://habrahabr.ru/company/postgrespro/

>
> Regards,
> Song
>
>
>
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] Custom compression methods

2017-11-05 Thread Oleg Bartunov
On Thu, Nov 2, 2017 at 6:02 PM, Craig Ringer  wrote:
> On 2 November 2017 at 17:41, Ildus Kurbangaliev
>  wrote:
>
>> In this patch, compression methods are suitable for MAIN and EXTENDED
>> storage, as in the current implementation in postgres. Just instead of
>> only LZ4, you can specify any other compression method.
>
> We've had this discussion before.
>
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
>
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.ga12...@alap2.anarazel.de
>

The proposed patch provides "pluggable" compression and lets users
decide on their own which algorithm to use.
The Postgres core is not responsible for any patent problems.
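As a rough illustration of what "pluggable" means here, per-column compression can be modeled as a registry of methods chosen per attribute. This is a hedged Python sketch only; zlib/lzma stand in for pglz/LZ4, and none of the names come from the actual patch:

```python
# Illustrative sketch of pluggable per-column compression, in the
# spirit of the proposed patch.  All names are invented for this
# example; the real work happens in C at the TOAST layer.
import zlib
import lzma

METHODS = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def toast_compress(value: bytes, method: str) -> bytes:
    """Compress a datum with the method configured for its column."""
    comp, _ = METHODS[method]
    return comp(value)

def toast_decompress(blob: bytes, method: str) -> bytes:
    """Reverse of toast_compress for the same method."""
    _, decomp = METHODS[method]
    return decomp(blob)
```

The real patch must additionally record which method produced each compressed datum inside the varlena header, which is where the concern quoted below about "consuming precious toast bits" comes from.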


> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.

Yes, pg_upgrade may be a problem.

>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>




Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Oleg Bartunov
On Mon, Oct 30, 2017 at 12:05 PM, Oleg Bartunov <obartu...@gmail.com> wrote:
> On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
> <w...@imaginaryindustries.com> wrote:
>> Hi there!
>>
>> I'm looking at implementing a custom indexing scheme, and I've been having
>> trouble understanding the proper approach.
>>
>> Basically, I need a BK tree, which is a tree-structure useful for indexing
>> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
>> across the hamming edit-distance of perceptual hashes, for fuzzy image
>> searching). I'm pretty sure a SP-GiST index is the correct index type, as my
>> tree is intrinsically unbalanced.
>>
>> I have a functional stand-alone implementation of a BK-Tree, and it works
>> very well, but the complexity of managing what is basically a external index
>> for my database has reached the point where it's significantly problematic,
>> and it seems to be it should be moved into the database.
>>
>> Anyways, looking at the contents of postgres/src/backend/access/spgist, it
>> looks pretty straightforward in terms of the actual C implementation, but
>> I'm stuck understanding how to "install" a custom SP-GiST implementation.
>> There are several GiST indexing implementations in the contrib directory,
>> but no examples for how I'd go about implementing a loadable SP-GiST index.
>>
>> Basically, my questions are:
>>
>> Is it possible to implement a SP-GiST indexing scheme as a loadable module?
>>
>> If so, how?
>> And is there an example I can base my implementation off of?
>
> Look at the RUM access method (https://github.com/postgrespro/rum), which we
> developed using the API available since 9.6.

Or even simpler: there is the contrib/bloom access method, which illustrates
developing an access method as an extension.

>
>
>>
>> I'm relatively comfortable with C (much moreso with C++), but I haven't
>> spent a lot of time looking at the postgresql codebase.  I don't think I
>> could start from a empty folder and make a properly-implemented module in
>> any reasonable period of time, so if I have a working example for some sort
>> of index that uses the same interfaces that would really help a lot.
>>
>> Thanks!
>> Connor
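For background, the BK-tree structure Connor describes can be sketched in a few lines. This is a stand-alone illustration in plain Python, not SP-GiST code; the node layout and names are invented for this sketch, with Hamming distance as the metric:

```python
# Illustrative BK-tree over the Hamming distance of integer perceptual
# hashes.  Each node keys its children by their distance to the node's
# own value, so the triangle inequality prunes whole subtrees.
def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count('1')

class BKTree:
    def __init__(self):
        self.root = None  # (value, {distance: child_node})

    def add(self, value):
        if self.root is None:
            self.root = (value, {})
            return
        node = self.root
        while True:
            d = hamming(value, node[0])
            if d not in node[1]:
                node[1][d] = (value, {})
                return
            node = node[1][d]

    def search(self, query, max_dist):
        """Return all stored values within max_dist of query."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            value, children = stack.pop()
            d = hamming(query, value)
            if d <= max_dist:
                results.append(value)
            # Triangle inequality: only edges in [d - max_dist,
            # d + max_dist] can lead to matches; everything else is
            # pruned.  The tree is intrinsically unbalanced, which is
            # why SP-GiST (not GiST) is the natural fit.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return results
```

Roughly speaking, an SP-GiST implementation would express `add()` through the choose/picksplit support functions and `search()` through inner_consistent pruning with the same distance bound.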




Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Oleg Bartunov
On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
 wrote:
> Hi there!
>
> I'm looking at implementing a custom indexing scheme, and I've been having
> trouble understanding the proper approach.
>
> Basically, I need a BK tree, which is a tree-structure useful for indexing
> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
> across the hamming edit-distance of perceptual hashes, for fuzzy image
> searching). I'm pretty sure a SP-GiST index is the correct index type, as my
> tree is intrinsically unbalanced.
>
> I have a functional stand-alone implementation of a BK-Tree, and it works
> very well, but the complexity of managing what is basically a external index
> for my database has reached the point where it's significantly problematic,
> and it seems to be it should be moved into the database.
>
> Anyways, looking at the contents of postgres/src/backend/access/spgist, it
> looks pretty straightforward in terms of the actual C implementation, but
> I'm stuck understanding how to "install" a custom SP-GiST implementation.
> There are several GiST indexing implementations in the contrib directory,
> but no examples for how I'd go about implementing a loadable SP-GiST index.
>
> Basically, my questions are:
>
> Is it possible to implement a SP-GiST indexing scheme as a loadable module?
>
> If so, how?
> And is there an example I can base my implementation off of?

Look at the RUM access method (https://github.com/postgrespro/rum), which we
developed using the API available since 9.6.


>
> I'm relatively comfortable with C (much moreso with C++), but I haven't
> spent a lot of time looking at the postgresql codebase.  I don't think I
> could start from a empty folder and make a properly-implemented module in
> any reasonable period of time, so if I have a working example for some sort
> of index that uses the same interfaces that would really help a lot.
>
> Thanks!
> Connor




Re: [HACKERS] Surjective functional indexes

2017-09-28 Thread Oleg Bartunov
On Thu, May 25, 2017 at 7:30 PM, Konstantin Knizhnik
 wrote:
> Right now Postgres determines whether an update operation touches an index or
> not based only on the set of affected columns.
> But in the case of functional indexes such a policy quite frequently leads to
> unnecessary index updates.
> For example, functional indexes are widely used for indexing JSON data:
> info->>'name'.
>
> JSON data may contain multiple attributes, and only a few of them may be
> affected by an update.
> Moreover, indexes are usually built on immutable attributes (like "id",
> "isbn", "name", ...).
>
> Functions like (info->>'name') are called "surjective" in mathematics.
> I have a strong feeling that most functional indexes are based on
> surjective functions.
> For such indexes the current PostgreSQL index update policy is very
> inefficient. It causes disabling of HOT updates
> and so leads to significant performance degradation.
>
> Without this patch Postgres is slower than Mongo on YCSB benchmark with (50%
> update,50 % select)  workload.
> And after applying this patch Postgres beats Mongo at all workloads.

I confirm that the patch helps for workload A of YCSB, but actually it
just extends the range of #clients where postgres outperforms mongodb
(see attached picture).  If we increase #clients > 100, postgres
quickly degrades not only for workload A, but even for workload B (5%
updates), while mongodb and mysql behave much, much better; but this is
another problem, which we will discuss in a different thread.
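The policy change under discussion can be modeled very simply: instead of asking "did any referenced column change?", ask "did the indexed expression's value change?". The following is an illustrative Python model with hypothetical names, not the patch's C code:

```python
# Illustrative model of the "surjective" index update policy: with the
# option on, recheck the indexed expression itself, so an update that
# leaves info->>'name' untouched does not force an index update and
# stays HOT-updatable.  All names here are invented for this sketch.
def index_needs_update(extract, old_row, new_row, surjective=True):
    if surjective:
        # Compare the function's value on the old and new tuples.
        return extract(old_row) != extract(new_row)
    # Simplified stand-in for the current column-set policy: any
    # change to the referenced data forces an index update.
    return old_row != new_row
```

The trade-off the proposal mentions is visible here: the surjective path must evaluate `extract` twice per update, which is why it can hurt for expensive functions and why the option can be disabled.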

>
> My proposal is to check the value of the function for functional indexes
> instead of just comparing the set of affected attributes.
> Obviously, for some complex functions it may have a negative effect on update
> speed.
> This is why I have added a "surjective" option to indexes. By default it is
> switched on for all functional indexes (based on my assumption
> that most functions used in functional indexes are surjective). But it is
> possible to explicitly disable it and decide whether an index
> needs to be updated or not based only on the set of affected attributes.
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>
>




Re: [HACKERS] [PATCH] Generic type subscripting

2017-09-25 Thread Oleg Bartunov
On Fri, Sep 22, 2017 at 3:51 PM, Peter Eisentraut
 wrote:
> On 9/21/17 11:24, Dmitry Dolgov wrote:
>> One last thing that I need to clarify. Initially there was an idea to
>> minimize changes in `pg_type`
>
> I see, but there is no value in that if it makes everything else more
> complicated.

+1



>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>




Re: [HACKERS] Add Roman numeral conversion to to_number

2017-09-17 Thread Oleg Bartunov
On 3 Aug 2017 16:29, "Oliver Ford"  wrote:

Adds to the to_number() function the ability to convert Roman numerals
to a number. This feature is on the formatting.c TODO list. It is not
currently implemented in either Oracle, MSSQL or MySQL so gives
PostgreSQL an edge :-)


I see a use for this in full text search, as a dictionary. It's useful for
indexing and searching historical documents. Probably it's better to have it
as a contrib module.


==Usage==

Call: to_number(numerals, 'RN') or to_number(numerals, 'rn').

Example: to_number('MMMXIII', 'RN') returns 3013. to_number('xiv',
'rn') returns 14.

The function is case insensitive for the numerals. It accepts a mix of
cases and treats them the same, so to_number('MCI', 'rn') and
to_number('McI', 'RN') both return 1101. The format mask must however
be either 'RN' or 'rn'. If there are other elements in the mask, those
other elements will be ignored. So to_number('MMM', 'FMRN') returns
3000.

Whitespace before the numerals is ignored.

==Validation==

The new function roman_to_int() in formatting.c performs the
conversion. It strictly validates the numerals based on the following
Roman-to-Arabic conversion rules:

1. The power-of-ten numerals (I, X, C, M) can be repeated up to three
times in a row. The beginning-with-5 numerals (V, L, D) can each
appear only once.

2. Subtraction from a power-of-ten numeral cannot occur if a
beginning-with-5 numeral appears later.

3. Subtraction cannot occur if the smaller numeral is less than a
tenth of the greater numeral (so IX is valid, but IC is invalid).

4. There cannot be two subtractions in a row.

5. A beginning-with-5 numeral cannot subtract.

If any of these rules are violated, an error is raised.
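For illustration, the strict rules above admit exactly the canonical spelling of each value in 1..3999, so the validation can be modeled as a lenient parse followed by a canonical round-trip check. This is a Python sketch of the semantics only, not the actual C code in formatting.c:

```python
# Sketch of roman_to_int() semantics.  Since the strict rules listed
# above allow only the canonical spelling of each value, regenerating
# the numeral and comparing enforces all five rules at once.
VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def int_to_roman(n):
    parts = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
             (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
             (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]
    out = []
    for v, sym in parts:
        while n >= v:
            out.append(sym)
            n -= v
    return ''.join(out)

def roman_to_int(s):
    s = s.strip().upper()          # case-insensitive; leading space ignored
    if not s or any(c not in VALUES for c in s):
        raise ValueError("invalid Roman numeral")
    total, i = 0, 0
    while i < len(s):              # lenient left-to-right parse
        v = VALUES[s[i]]
        if i + 1 < len(s) and VALUES[s[i + 1]] > v:
            total += VALUES[s[i + 1]] - v   # subtractive pair
            i += 2
        else:
            total += v
            i += 1
    # Strict validation: reject anything non-canonical (IC, IIII, VV, ...)
    if not 1 <= total <= 3999 or int_to_roman(total) != s:
        raise ValueError("invalid Roman numeral")
    return total
```

This mirrors the regression-test strategy described below: the round trip through the canonical form is exactly the to_char()-then-convert-back check.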

==Testing==

This has been tested on a Windows build of the master branch with
MinGW. The included regression tests positively test every value from
1 to 3999 (the Roman numeral max value) by calling the existing
to_char() function to get the Roman value, then converting it back to
an Arabic value. There are also negative tests for each invalid code
path and some positive mixed-case tests.

Documentation is updated to include this new feature.

==References==

http://sierra.nmsu.edu/morandi/coursematerials/RomanNumerals.html
Documents the strict Roman numeral standard.




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-09-17 Thread Oleg Bartunov
On 16 Sep 2017 02:32, "Nikita Glukhov" <n.glu...@postgrespro.ru> wrote:

On 15.09.2017 22:36, Oleg Bartunov wrote:

On Fri, Sep 15, 2017 at 7:31 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
>> On Fri, Sep 15, 2017 at 10:10 AM, Daniel Gustafsson <dan...@yesql.se>
>> wrote:
>>
>>> Can we expect a rebased version of this patch for this commitfest?
>>> Since it’s
>>> a rather large feature it would be good to get it in as early as we can
>>> in the
>>> process.
>>>
>> Again, given that this needs a "major" rebase and hasn't been updated
>> in a month, and given that the CF is already half over, this should
>> just be bumped to the next CF.  We're supposed to be trying to review
>> things that were ready to go by the start of the CF, not the end.
>>
> We are supporting v10 branch in our github repository
> https://github.com/postgrespro/sqljson/tree/sqljson_v10
>
> Since the first post we made a lot of changes, mostly because of
> better understanding the standard and availability of technical report
> (http://standards.iso.org/ittf/PubliclyAvailableStandards/c0
> 67367_ISO_IEC_TR_19075-6_2017.zip).
> Most important are:
>
> 1.We abandoned FORMAT support, which could confuse our users, since we
> have data types json[b].
>
> 2. We use XMLTABLE infrastructure, extended for  JSON_TABLE support.
>
> 3. Reorganize commits, so we could split one big patch by several
> smaller patches, which could be reviewed independently.
>
> 4. The biggest problem is documentation, we are working on it.
>
> Nikita will submit patches soon.
>

Attached archive with 9 patches rebased onto latest master.

0001-jsonpath-v02.patch:
 - jsonpath type
 - jsonpath execution on jsonb type
 - jsonpath operators for jsonb type
 - GIN support for jsonpath operators

0002-jsonpath-json-v02.patch:
 - jsonb-like iterators for json type
 - jsonpath execution on json type
 - jsonpath operators for json type

0003-jsonpath-extensions-v02.patch:
0004-jsonpath-extensions-tests-for-json-v02.patch:
 - some useful standard extensions with tests
 0005-sqljson-v02.patch:
 - SQL/JSON constructors (JSON_OBJECT[AGG], JSON_ARRAY[AGG])
 - SQL/JSON query functions (JSON_VALUE, JSON_QUERY, JSON_EXISTS)
 - IS JSON predicate

0006-sqljson-json-v02.patch:
 - SQL/JSON support for json type and tests

0007-json_table-v02.patch:
 - JSON_TABLE using XMLTABLE infrastructure

0008-json_table-json-v02.patch:
 - JSON_TABLE support for json type

0009-wip-extensions-v02.patch:
 - FORMAT JSONB
 - jsonb to/from bytea casts
 - jsonpath operators
 - some unfinished jsonpath extensions


Originally, JSON path was implemented only for jsonb type, and I decided to
add jsonb-like iterators for json type for json support implementation with
minimal changes in JSON path code.  This solution (see jsonpath_json.c from
patch 0002) looks a little dubious to me, so I separated json support into
independent patches.

The last WIP patch 0009 is unfinished and contains a lot of FIXMEs.  But
the ability to use arbitrary Postgres operators in JSON path with explicitly
specified types is rather interesting, and I think it should be shown now
to get some kind of pre-review.

We are supporting v11 and v10 branches in our github repository:

https://github.com/postgrespro/sqljson/tree/sqljson
https://github.com/postgrespro/sqljson/tree/sqljson_wip
https://github.com/postgrespro/sqljson/tree/sqljson_v10
https://github.com/postgrespro/sqljson/tree/sqljson_v10_wip


We provide a web interface to our build:
http://sqlfiddle.postgrespro.ru/#!21/



Attached patches can be produced simply by combining groups of consecutive
commits from these branches.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-09-15 Thread Oleg Bartunov
On Fri, Sep 15, 2017 at 7:31 PM, Robert Haas  wrote:
> On Fri, Sep 15, 2017 at 10:10 AM, Daniel Gustafsson  wrote:
>> Can we expect a rebased version of this patch for this commitfest?  Since 
>> it’s
>> a rather large feature it would be good to get it in as early as we can in 
>> the
>> process.
>
> Again, given that this needs a "major" rebase and hasn't been updated
> in a month, and given that the CF is already half over, this should
> just be bumped to the next CF.  We're supposed to be trying to review
> things that were ready to go by the start of the CF, not the end.

We are supporting v10 branch in our github repository
https://github.com/postgrespro/sqljson/tree/sqljson_v10

Since the first post we made a lot of changes, mostly because of
better understanding the standard and availability of technical report
(http://standards.iso.org/ittf/PubliclyAvailableStandards/c067367_ISO_IEC_TR_19075-6_2017.zip).
Most important are:

1. We abandoned FORMAT support, which could confuse our users, since we
have data types json[b].

2. We use XMLTABLE infrastructure, extended for JSON_TABLE support.

3. We reorganized commits, so we could split one big patch into several
smaller patches, which can be reviewed independently.

4. The biggest problem is documentation; we are working on it.

Nikita will submit patches soon.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-07-25 Thread Oleg Bartunov
On Mon, Jul 24, 2017 at 11:38 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jul 21, 2017 at 8:05 AM, Alexey Chernyshov
> <a.chernys...@postgrespro.ru> wrote:
>> the following patch transfers functionality from gevel module
>> (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for
>> analyzing GIN and GiST indexes to pageinspect. Gevel was originally
>> designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
>> GIN indexes.
>
> It's not clear from the web site in question that the relevant code is
> released under the PostgreSQL license.

git clone git://sigaev.ru/gevel

from README.gevel

License

Stable version, included into PostgreSQL distribution, released under
BSD license. Development version, available from this site, released under
the GNU General Public License, version 2 (June 1991)

We would be happy to use whatever license the community likes :)

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>




Re: [HACKERS] CTE inlining

2017-05-02 Thread Oleg Bartunov
On Mon, May 1, 2017 at 7:22 AM, Pavel Stehule  wrote:
>
>
> 2017-05-01 1:21 GMT+02:00 Andres Freund :
>>
>> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> > why we cannot to introduce GUC option - enable_cteoptfence ?
>>
>> Doesn't really solve the issue, and we've generally shied away from GUCs
>> that influence behaviour after a few bad experiences.  What if you want
>> one CTE inlined, but another one not?
>
>
> It change behave in same sense like enable_nestloop, enable_hashjoin, ...
> with same limits.

And then we recall  plan hints :)

>
> Regards
>
> Pavel
>
>>
>>
>> - Andres
>
>




Re: [HACKERS] [PATCH] few fts functions for jsonb

2017-03-31 Thread Oleg Bartunov
On 30 Mar 2017 23:43, "Dmitry Dolgov" <9erthali...@gmail.com> wrote:

On 31 March 2017 at 00:01, Andrew Dunstan 
wrote:
>
> I have just noticed as I was writing/testing the non-existent docs for
> this patch that it doesn't supply variants of to_tsvector that take a
> regconfig as the first argument. Is there a reason for that? Why
> should the json(b) versions be different from the text versions?

No, there is no reason, I just missed that. Here is a new version of the
patch (only the functions part)
to add those variants.


Congratulations on the patch being committed! Who will write the additional
documentation? I think we need to touch the FTS and JSON parts.


Re: [HACKERS] [GSoC] Push-based query executor discussion

2017-03-22 Thread Oleg Bartunov
On Wed, Mar 22, 2017 at 8:04 PM, Arseny Sher  wrote:

> > While I admire your fearlessness, I think the chances of you being
> > able to bring a project of this type to a successful conclusion are
> > remote.  Here is what I said about this topic previously:
> >
> > http://postgr.es/m/CA+Tgmoa=kzHJ+TwxyQ+vKu21nk3prkRjSdbhjubN7qvc8UKuG
> g...@mail.gmail.com
>
> Well, as I said, I don't pretend that I will support full functionality:
> >> instead, we should decide which part of this work (if any) is
> >> going to be done in the course of GSoC. Probably, all TPC-H queries
> >> with and without index support is a good initial target, but this
> >> needs to be discussed.
>
> I think that successfull completion of this project should be a clear
> and justified answer to the question "Is this idea is good enough to
> work on merging it into the master?", not the production-ready patches
> themselves. Nevertheless, of course project success criterion must be
> reasonably formalized -- e.g. implement nodes X with features Y, etc.
>

How many GSoC slots and possible students do we have?

Should we reject this interesting project, which is based on several years of
research by an academic group at the institute? Maybe it's better to help
him reformulate the scope of the project and let him work. I don't know
whether the results of a GSoC project have to be committed, but as a
research project it would certainly be useful for the community.


>
>
>


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-13 Thread Oleg Bartunov
On Mon, Mar 13, 2017 at 9:24 AM, Nico Williams 
wrote:

> On Thu, Mar 09, 2017 at 07:12:07PM +0100, Sven R. Kunze wrote:
> > From my day-to-day work I can tell, the date(time) type is the only
> missing
> > piece of JSON to make it perfect for business applications (besides,
> maybe,
> > a "currency" type).
>
> And a binary type.  And a chunked-string type (to avoid having to escape
> strings).  And an interval type.  And...
>

Let's first get this basic implementation into postgres; then we'll add
extensibility support not only for types, but also for operators.
Right now I see in our regression tests:

select _jsonpath_object(
'["10.03.2017 12:34 +1", "10.03.2017 12:35 +1", "10.03.2017 12:36 +1",
"10.03.2017 12:35 +2", "10.03.2017 12:35 -2"]',
'$[*].datetime("dd.mm.yyyy HH24:MI TZH") ? (@ < "10.03.2017 12:35
+1".datetime("dd.mm.yyyy HH24:MI TZH"))'
);
 _jsonpath_object
--
 "2017-03-10 14:34:00+03"
 "2017-03-10 13:35:00+03"
(2 rows)
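To make the expected output concrete, the same filter can be mirrored in plain Python. This is an illustrative model only, assuming the template stands for day.month.year with an hour-only timezone offset:

```python
# Mirrors the jsonpath filter above: keep the items strictly earlier
# than the bound, comparing as absolute points in time.
from datetime import datetime

def parse(s):
    # "+1" carries only offset hours; normalize to "+0100" for %z
    head, tz = s.rsplit(' ', 1)
    tz = '%s%02d00' % (tz[0], int(tz[1:]))
    return datetime.strptime('%s %s' % (head, tz), '%d.%m.%Y %H:%M %z')

vals = ["10.03.2017 12:34 +1", "10.03.2017 12:35 +1", "10.03.2017 12:36 +1",
        "10.03.2017 12:35 +2", "10.03.2017 12:35 -2"]
bound = parse("10.03.2017 12:35 +1")
before = [v for v in vals if parse(v) < bound]
# Two items qualify, matching the two rows above: 12:34 +1, and
# 12:35 +2 (which is 11:35 when shifted to the +1 offset).
```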


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-11 Thread Oleg Bartunov
On Fri, Mar 10, 2017 at 7:07 AM, Petr Jelinek 
wrote:

> On 09/03/17 19:50, Peter van Hardenberg wrote:
> > Anecdotally, we just stored dates as strings and used a convention (key
> > ends in "_at", I believe) to interpret them. The lack of support for
> > dates in JSON is well-known, universally decried... and not a problem
> > the PostgreSQL community can fix.
> >
>
> The original complain was about JSON_VALUE extracting date but I don't
> understand why there is problem with that, the SQL/JSON defines that
> behavior. The RETURNING clause there is more or less just shorthand for
> casting with some advanced options.
>

There is no problem with serializing dates, and SQL/JSON describes it rather
well. But there is no correct procedure to deserialize a date from a correct
JSON string, and the standard keeps silent about this. Now we understand
that date[time] is actually virtual, and the only use of it is in jsonpath
(filter) expressions.
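The asymmetry is easy to demonstrate: serialization loses the type, so a reader cannot distinguish a date from a plain string that merely looks like one. A small Python illustration (field names invented for the example):

```python
# After serialization nothing marks "shipped" as a date: both values
# deserialize as plain strings, so automatic date detection would
# silently mangle fields like "isbn" that happen to be date-shaped.
import json

doc = json.loads('{"isbn": "2017-03-10", "shipped": "2017-03-10"}')
types = {k: type(v).__name__ for k, v in doc.items()}
# Both come back as 'str'; only an explicit jsonpath .datetime()
# (or an external schema) can say which one is meant to be a date.
```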



>
> --
>   Petr Jelinek  http://www.2ndQuadrant.com/
>   PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-08 Thread Oleg Bartunov
On Wed, Mar 8, 2017 at 7:05 AM, David Steele  wrote:

> On 3/7/17 11:38 AM, Andres Freund wrote:
>
> <...>
>
> We have a plenty of time and we dedicate one full-time developer for
>>> this project.
>>>
>>
>> How about having that, and perhaps others, developer participate in
>> reviewing patches and getting to the bottom of the commitfest?  Should
>> we end up being done early, we can look at this patch...  There's not
>> been review activity corresponding to the amount of submissions from
>> pgpro...
>>
>
> This patch has been moved to CF 2017-07.
>

Yes. After committing XMLTABLE, we will in any case need to extend its
infrastructure to support JSON_TABLE.


>
> --
> -David
> da...@pgmasters.net
>


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-08 Thread Oleg Bartunov
On Wed, Mar 8, 2017 at 12:43 AM, Sven R. Kunze  wrote:

> Hi,
>
> about the datetime issue: as far as I know, JSON does not define a
> serialization format for dates and timestamps.
>
> On the other hand, YAML (as a superset of JSON) already supports a
> language-independent date(time) serialization format (
> http://yaml.org/type/timestamp.html).
>
> I haven't had a glance into the SQL/JSON standard yet and a quick search
> didn't reveal anything. However, reading your test case here
> https://github.com/postgrespro/sqljson/blob/5a8a241/src/
> test/regress/sql/sql_json.sql#L411 it seems as if you intend to parse all
> strings in the form of "-MM-DD" as dates. This is problematic in case a
> string happens to look like this but is not intended to be a date.
>

SQL/JSON defines methods in jsonpath; in particular:

| datetime <left paren> [ <JSON datetime template> ] <right paren>
| keyvalue <left paren> <right paren>

<JSON datetime template> ::=
<character string literal>

The datetime template is also specified in the standard (very rich):

<datetime template> ::=
{ <datetime template part> }...
<datetime template part> ::=
<datetime template field>
| <datetime template delimiter>
<datetime template field> ::=
<datetime template year>
| <datetime template rounded year>
| <datetime template month>
| <datetime template day of month>
| <datetime template day of year>
| <datetime template 12-hour>
| <datetime template 24-hour>
| <datetime template minute>
| <datetime template second of minute>
| <datetime template second of day>
| <datetime template fraction>
| <datetime template am/pm>
| <datetime template time zone hour>
| <datetime template time zone minute>
<datetime template delimiter> ::=
<minus sign>
| <period>
| <solidus>
| <comma>
| <apostrophe>
| <semicolon>
| <colon>
| <space>
<datetime template year> ::=
YYYY | YYY | YY | Y
<datetime template rounded year> ::=
RRRR | RR
<datetime template month> ::=
MM
<datetime template day of month> ::=
DD
<datetime template day of year> ::=
DDD
<datetime template 12-hour> ::=
HH | HH12
<datetime template 24-hour> ::=
HH24
<datetime template minute> ::=
MI
<datetime template second of minute> ::=
SS
<datetime template second of day> ::=
SSSSS
<datetime template fraction> ::=
FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
<datetime template am/pm> ::=
A.M. | P.M.
<datetime template time zone hour> ::=
TZH
<datetime template time zone minute> ::=
TZM


> Just for the sake of completeness: YAML solves this issue by omitting the
> quotation marks around the date string (just as JSON integers have no
> quotations marks around them).
>

Interesting idea, but we need to dig into the standard first.


>
> Regards,
> Sven
>


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-07 Thread Oleg Bartunov
On Fri, Mar 3, 2017 at 11:49 PM, David Steele <da...@pgmasters.net> wrote:

> Hi Oleg,
>
> On 2/28/17 2:55 PM, Pavel Stehule wrote:
> > 2017-02-28 20:08 GMT+01:00 Oleg Bartunov <obartu...@gmail.com
> >
> > Attached patch is an implementation of SQL/JSON data model from
> > SQL-2016 standard (ISO/IEC 9075-2:2016(E)), which was published
> > 2016-12-15 and is available only for purchase from ISO web site
> > (https://www.iso.org/standard/63556.html
> > <https://www.iso.org/standard/63556.html>). Unfortunately I didn't
> > find any public sources of the standard or any preview documents,
> > but Oracle implementation of json support in 12c release 2 is very
> > close
> > (http://docs.oracle.com/database/122/ADJSN/json-in-
> oracle-database.htm
> > <http://docs.oracle.com/database/122/ADJSN/json-in-
> oracle-database.htm>),
> > also we used https://livesql.oracle.com/  to understand some
> details.
>
> <...>
>
> > This is last commitfest for current release cycle - are you sure, so is
> > good idea to push all mentioned features?
>
> Implementing standards is always a goal of the PostgreSQL community, but
> this is a very large patch arriving very late in the release cycle with
> no prior discussion.
>

We discussed this in Brussels, but I agree, the patch is rather big.


>
> That the patch proposed follows a standard which will not be available
> to the majority of reviewers is very worrisome, let alone the sheer
> size.  While much of the code is new, I see many changes to core data
> structures that could very easily be destabilizing.
>

I don't know when the standard will be publicly available.


>
> I propose we move this patch to the 2017-07 CF so further development
> and review can be done without haste and as the standard becomes more
> accessible.
>

I wanted to have one more good feature in 10 and let postgres be on par
with its competitors.  SQL/JSON adds many interesting features, and users
will be disappointed if we postpone it for the next two years.  Let's wait
for the reviewers; probably they will find that the patch is not very
intrusive.  We have plenty of time, and we are dedicating one full-time
developer to this project.


>
> Regards,
> --
> -David
> da...@pgmasters.net
>


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-02-28 Thread Oleg Bartunov
On Tue, Feb 28, 2017 at 10:55 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> Hi
>
>
> 2017-02-28 20:08 GMT+01:00 Oleg Bartunov <obartu...@gmail.com>:
>
>> Hi there,
>>
>>
>> Attached patch is an implementation of SQL/JSON data model from SQL-2016
>> standard (ISO/IEC 9075-2:2016(E)), which was published 2016-12-15 and is
>> available only for purchase from ISO web site (
>> https://www.iso.org/standard/63556.html). Unfortunately I didn't find
>> any public sources of the standard or any preview documents, but Oracle
>> implementation of json support in 12c release 2 is very close (
>> http://docs.oracle.com/database/122/ADJSN/json-in-oracle-database.htm),
>> also we used https://livesql.oracle.com/  to understand some details.
>>
>> Postgres has already two json data types - json and jsonb and
>> implementing another json data type, which strictly conforms the standard,
>> would be not a good idea. Moreover, SQL standard doesn’t describe data
>> type, but only data model, which “comprises SQL/JSON items and SQL/JSON
>> sequences. The components of the SQL/JSON data model are:
>>
>> 1) An SQL/JSON item is defined recursively as any of the following:
>>
>> a) An SQL/JSON scalar, defined as a non-null value of any of the
>> following predefined (SQL) types:
>>
>> character string with character set Unicode, numeric, Boolean, or
>> datetime.
>>
>> b) An SQL/JSON null, defined as a value that is distinct from any value
>> of any SQL type.
>>
>> NOTE 122 — An SQL/JSON null is distinct from the SQL null value.
>>
>> c) An SQL/JSON array, defined as an ordered list of zero or more SQL/JSON
>> items, called the SQL/JSON
>>
>> elements of the SQL/JSON array.
>>
>> d) An SQL/JSON object, defined as an unordered collection of zero or more
>> SQL/JSON members….
>>
>> “
>>
>> Our jsonb corresponds to SQL/JSON with UNIQUE KEYS and implicit ordering
>> of keys and our main intention was to provide support of jsonb as a most
>> important and usable data type.
>>
>> We created repository for reviewing (ask for write access) -
>> https://github.com/postgrespro/sqljson/tree/sqljson
>>
>> Examples of usage can be found in src/test/regress/sql/sql_json.sql
>>
>> The whole documentation about json support should be reorganized and
>> added, and we plan to do this before release. We need help of community
>> here.
>>
>> Our goal is to provide support of main features of SQL/JSON to release
>> 10, as we discussed at developers meeting in Brussels (Andrew Dunstan has
>> kindly agreed to review the patch).
>>
>> We had not much time to develop the complete support, because of standard
>> availability), but hope all major features are here, namely, all nine
>> functions as described in the standard (but see implementation notes below):
>>
>> “All manipulation (e.g., retrieval, creation, testing) of SQL/JSON items
>> is performed through a number of SQL/JSON functions. There are nine such
>> functions, categorized as SQL/JSON retrieval functions and SQL/JSON
>> construction functions. The SQL/JSON retrieval functions are characterized
>> by operating on JSON data and returning an SQL value (possibly a Boolean
>> value) or a JSON value. The SQL/JSON construction functions return JSON
>> data created from operations on SQL data or other JSON data.
>>
>> The SQL/JSON retrieval functions are:
>>
>> — JSON_VALUE: extracts an SQL value of a predefined type from
>> a JSON text.
>>
>> — JSON_QUERY: extracts a JSON text from a JSON text.
>>
>> — JSON_TABLE: converts a JSON text to an SQL table.
>>
>> — IS JSON: tests whether a string value is or is not properly
>> formed JSON text.
>>
>> — JSON_EXISTS: tests whether an SQL/JSON path expression
>> returns any SQL/JSON items.
>>
>> The SQL/JSON construction functions are:
>>
>> — JSON_OBJECT: generates a string that is a serialization
>> of an SQL/JSON object.
>>
>> — JSON_ARRAY: generates a string that is a serialization of
>> an SQL/JSON array.
>>
>> — JSON_OBJECTAGG: generates, from an aggregation of
>> SQL data, a string that is a serialization
>> of an SQL/JSON object.
>>
>> — JSON_ARRAYAGG: generates, from an aggregation of
>> SQL data, a string that is a serialization
>> of an SQL/JSON array.
>>
>> A JSON-returning function is an SQL/JSON construction function or
>> JSON_QUERY.”
>>
>> The standard describes the SQL/JSON path language, which is used by SQL/JSON
>> query operators to query JSON. It defines path lang

[HACKERS] SQL/JSON in PostgreSQL

2017-02-28 Thread Oleg Bartunov
Hi there,


Attached patch is an implementation of SQL/JSON data model from SQL-2016
standard (ISO/IEC 9075-2:2016(E)), which was published 2016-12-15 and is
available only for purchase from ISO web site (
https://www.iso.org/standard/63556.html). Unfortunately I didn't find any
public sources of the standard or any preview documents, but Oracle
implementation of json support in 12c release 2 is very close (
http://docs.oracle.com/database/122/ADJSN/json-in-oracle-database.htm),
also we used https://livesql.oracle.com/  to understand some details.

Postgres already has two json data types - json and jsonb - and implementing
another json data type that strictly conforms to the standard would not be a
good idea. Moreover, the SQL standard doesn't describe a data type, but only a
data model, which “comprises SQL/JSON items and SQL/JSON sequences. The
components of the SQL/JSON data model are:

1) An SQL/JSON item is defined recursively as any of the following:

a) An SQL/JSON scalar, defined as a non-null value of any of the following
predefined (SQL) types:

character string with character set Unicode, numeric, Boolean, or datetime.

b) An SQL/JSON null, defined as a value that is distinct from any value of
any SQL type.

NOTE 122 — An SQL/JSON null is distinct from the SQL null value.

c) An SQL/JSON array, defined as an ordered list of zero or more SQL/JSON
items, called the SQL/JSON

elements of the SQL/JSON array.

d) An SQL/JSON object, defined as an unordered collection of zero or more
SQL/JSON members….

“

Our jsonb corresponds to SQL/JSON with UNIQUE KEYS and implicit ordering of
keys, and our main intention was to provide support for jsonb as the most
important and usable data type.

We created repository for reviewing (ask for write access) -
https://github.com/postgrespro/sqljson/tree/sqljson


Examples of usage can be found in src/test/regress/sql/sql_json.sql

The whole documentation about json support should be reorganized and extended,
and we plan to do this before release. We need the community's help here.

Our goal is to provide support for the main features of SQL/JSON in release 10,
as we discussed at the developers meeting in Brussels (Andrew Dunstan has
kindly agreed to review the patch).

We did not have much time to develop complete support (because of the
standard's availability), but we hope all major features are here, namely, all
nine functions as described in the standard (but see implementation notes below):

“All manipulation (e.g., retrieval, creation, testing) of SQL/JSON items is
performed through a number of SQL/JSON functions. There are nine such
functions, categorized as SQL/JSON retrieval functions and SQL/JSON
construction functions. The SQL/JSON retrieval functions are characterized
by operating on JSON data and returning an SQL value (possibly a Boolean
value) or a JSON value. The SQL/JSON construction functions return JSON
data created from operations on SQL data or other JSON data.

The SQL/JSON retrieval functions are:

— JSON_VALUE: extracts an SQL value of a predefined type from a
JSON text.

— JSON_QUERY: extracts a JSON text from a JSON text.

— JSON_TABLE: converts a JSON text to an SQL table.

— IS JSON: tests whether a string value is or is not properly
formed JSON text.

— JSON_EXISTS: tests whether an SQL/JSON path expression
returns any SQL/JSON items.

The SQL/JSON construction functions are:

— JSON_OBJECT: generates a string that is a serialization of
an SQL/JSON object.

— JSON_ARRAY: generates a string that is a serialization of
an SQL/JSON array.

— JSON_OBJECTAGG: generates, from an aggregation of
SQL data, a string that is a serialization
of an SQL/JSON object.

— JSON_ARRAYAGG: generates, from an aggregation of SQL
data, a string that is a serialization
of an SQL/JSON array.

A JSON-returning function is an SQL/JSON construction function or
JSON_QUERY.”

The standard describes the SQL/JSON path language, which is used by SQL/JSON
query operators to query JSON. It defines the path language as a string
literal. We implemented the path language as a JSONPATH data type, since other
approaches are not friendly to the planner and executor.

The functions and JSONPATH provide new functionality for json support, namely,
the ability to operate (in the way the standard specifies) on json structures
at the SQL-language level - a feature often requested by users.
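To make this concrete, here is a hypothetical sketch of how the nine functions
could be used, following the SQL:2016 syntax (illustrative only; the exact
syntax accepted by the patch may differ):

```sql
-- Retrieval
SELECT JSON_VALUE('{"a": 1}', '$.a' RETURNING int);        -- SQL scalar
SELECT JSON_QUERY('{"a": [1, 2, 3]}', '$.a');              -- JSON fragment
SELECT JSON_EXISTS('{"a": 1}', '$.b');                     -- path test
SELECT '{"a": 1}' IS JSON;                                 -- well-formedness test
SELECT * FROM JSON_TABLE('[{"x": 10}, {"x": 20}]', '$[*]'
                         COLUMNS (x int PATH '$.x')) AS jt;

-- Construction
SELECT JSON_OBJECT('a': 1, 'b': 'two');
SELECT JSON_ARRAY(1, 2, 'three');
```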

The patch consists of about 15000 insertions (about 5000 lines are from
tests), passes all regression tests, and doesn't touch critical parts, so we
hope, with the community's help, to bring it to a committable state.

Authors: Nikita Glukhov, Teodor Sigaev, Oleg Bartunov and Alexander Korotkov

Implementation notes:


   1. We didn't implement 'datetime' support, since it's not clearly
   specified in the standard.

   2. JSON_OBJECT/JSON_OBJECTAGG (KEY key VALUE value, ...) isn't implemented;
   only (key : value, ...) and (key VALUE value, ...) are supported, because
   of grammar conflicts with the leading KEY keyword.

   3. FORMAT (JSON|JSONB) in JSON_ARRAYAGG with a subquery isn't supported,
   because of grammar conflicts with the non-reserved word FORMAT.

   4. JSONPATH

Re: [HACKERS] [PATCH] few fts functions for jsonb

2017-02-28 Thread Oleg Bartunov
The proposed patch may not look very important, but I consider it an important
feature, which Oracle and Microsoft already have; that's why I asked Dmitry to
work on this and get it done before feature freeze. My comments follow below
the post.

On Tue, Feb 28, 2017 at 1:59 PM, Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> Hi all
>
> I would like to propose patch with a set of new small functions for fts in
> case of
> jsonb data type:
>
> * to_tsvector(config, jsonb) - make a tsvector from all string values and
>   elements of jsonb object. To prevent the situation, when tsquery can
> find a
>   phrase consisting of lexemes from two different values/elements, this
>   function will add an increment to position of each lexeme from every new
>   value/element.
>
> * ts_headline(config, jsonb, tsquery, options) - generate a headline
> directly
>   from jsonb object
>
> Here are the examples how they work:
>
> ```
> =# select to_tsvector('{"a": "aaa bbb", "b": ["ccc ddd"], "c": {"d": "eee
> fff"}}'::jsonb);
>to_tsvector
> -
>  'aaa':1 'bbb':2 'ccc':4 'ddd':5 'eee':7 'fff':8
> (1 row)
>
>
> =# select ts_headline('english', '{"a": "aaa bbb", "b": {"c": "ccc
> ddd"}}'::jsonb, tsquery('bbb & ddd & hhh'), 'StartSel = <, StopSel = >');
>  ts_headline
> --
>  aaa  ccc 
> (1 row)
> ```
>

> Any comments or suggestions?
>

1. Add json support.
2. ts_headline should return the original json with highlighting. As a first
try the proposed ts_headline could be ok, probably with a special option.
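For illustration, the position-increment idea can be sketched in Python (a
rough model of the proposal, not the actual C implementation; jsonb stores
object keys in sorted order, which the sketch approximates with a plain sort,
and real code would run the text-search parser and stemmer instead of split):

```python
def collect_strings(node):
    """Yield string values of a JSON document in jsonb traversal order."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key in sorted(node):          # approximation of jsonb key order
            yield from collect_strings(node[key])
    elif isinstance(node, list):
        for item in node:
            yield from collect_strings(item)

def jsonb_to_tsvector(doc):
    """Assign lexeme positions, leaving a gap between values so that a
    phrase query cannot match lexemes from two different values."""
    pos, lexemes = 0, []
    for value in collect_strings(doc):
        for word in value.split():        # stand-in for the ts parser/stemmer
            pos += 1
            lexemes.append((word, pos))
        pos += 1                          # the increment between values
    return lexemes

doc = {"a": "aaa bbb", "b": ["ccc ddd"], "c": {"d": "eee fff"}}
print(jsonb_to_tsvector(doc))
# → [('aaa', 1), ('bbb', 2), ('ccc', 4), ('ddd', 5), ('eee', 7), ('fff', 8)]
```

The gap at position 3 is why a phrase like 'bbb <-> ccc' cannot match, even
though 'bbb' and 'ccc' are adjacent in the concatenated text.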



> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-17 Thread Oleg Bartunov
On Tue, Jan 17, 2017 at 6:26 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Oleg Bartunov wrote:
> > Hi there,
> >
> > I just bought ISO/IEC 9075-2:2016
> > http://www.iso.org/iso/home/store/catalogue_tc/catalogue_
> detail.htm?csnumber=63556
> > to satisfy my interest on json support in SQL.  I am not ready to discuss
> > here implementation details, but there is one problem with the status of
> > this document, which is copyright protected.
> >
> > "All rights reserved. Unless otherwise specified, no part of this
> > publication may be reproduced or utilized otherwise in any form or by any
> > means, electronic or mechanical, including photocopying, or posting on
> the
> > internet or an intranet, without prior written permission. Permission can
> > be requested from either ISO at the address below or ISO’s member body in
> > the country of the requester."
> >
> > My question is should we ask ISO for permission to use my copy in our
> > community or  should we buy at least one another copy (for reviewer) ?
>
> With previous versions, we have relied on draft versions
> "indistinguishable from latest" posted by third parties.  I couldn't
> find such a version for sql2016 in an (admittedly very quick) search.
>

I found several mentions of such a draft, but failed to find one. I think we
can wait until we have a prototype or somebody shares a link.


> I doubt ISO would give the community permission to use your copy, but it
> wouldn't hurt to ask them.
>

> --
> Álvaro Herrera    https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


[HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-17 Thread Oleg Bartunov
Hi there,

I just bought ISO/IEC 9075-2:2016
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=63556
to satisfy my interest in json support in SQL.  I am not ready to discuss
implementation details here, but there is one problem with the status of
this document, which is copyright protected.

"All rights reserved. Unless otherwise specified, no part of this
publication may be reproduced or utilized otherwise in any form or by any
means, electronic or mechanical, including photocopying, or posting on the
internet or an intranet, without prior written permission. Permission can
be requested from either ISO at the address below or ISO’s member body in
the country of the requester."

My question is: should we ask ISO for permission to use my copy in our
community, or should we buy at least one more copy (for a reviewer)?

Regards,

Oleg


Re: [HACKERS] [PATCH] Generic type subscription

2016-10-05 Thread Oleg Bartunov
On Wed, Oct 5, 2016 at 6:48 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:

> On 5 October 2016 at 03:00, Oleg Bartunov <obartu...@gmail.com> wrote:
>
>>
>> have you ever run 'make check' ?
>>
>> =
>>  53 of 168 tests failed.
>> =
>>
>>
> Sure, how else could I write tests for this feature? But right now on my
> machine
> everything is ok (the same for `make installcheck`):
>
> $ make check
> 
> ===
>  All 168 tests passed.
> ===
>

Oops, something was wrong in my setup :)


Re: [HACKERS] [PATCH] Generic type subscription

2016-10-04 Thread Oleg Bartunov
On Sat, Oct 1, 2016 at 12:52 PM, Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > I've tried to compile this patch with current state of master (commit
> > 51c3e9fade76c12)  and found out that, when configured with
> --enable-cassert,
> > it doesn't pass make check.
>
> Thanks for the feedback. Yes, unexpectedly for me, `ExecEvalExpr` can
> return
> expanded `jbvArray` and `jbvObject` instead `jbvBinary` in both cases. It's
> interesting, that this doesn't break anything, but obviously violates
> the `pushJsonbValueScalar` semantics. I don't think `ExecEvalExpr` should
> be
> changed for jsonb, we can handle this situation in `pushJsonbValue`
> instead. I've
> attached a new version of patch with this modification.
>
>
have you ever run 'make check' ?

=
 53 of 168 tests failed.
=



> On 27 September 2016 at 19:08, Victor Wagner  wrote:
>
>> On Fri, 9 Sep 2016 18:29:23 +0700
>> Dmitry Dolgov <9erthali...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > Regarding to the previous conversations [1], [2], here is a patch
>> > (with some improvements from Nikita Glukhov) for the generic type
>> > subscription. It allows
>> > to define type-specific subscription logic for any data type and has
>> > implementations for the array and jsonb types. There are following
>> > changes in this
>> > patch:
>>
>> I've tried to compile this patch with current state of master (commit
>> 51c3e9fade76c12)  and found out that, when configured with
>> --enable-cassert, it doesn't pass make check.
>>
>> LOG:  server process (PID 4643) was terminated by signal 6: Aborted
>> DETAIL:  Failed process was running:
>> update test_jsonb_subscript set test_json['a'] = '{"b":
>> 1}'::jsonb;
>>
>>
>>
>> --
>>  Victor
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] Better tracking of free space during SP-GiST index build

2016-09-25 Thread Oleg Bartunov
On Sat, Sep 24, 2016 at 11:32 PM, Tomas Vondra
 wrote:
> On 09/22/2016 07:37 PM, Tom Lane wrote:
>>
>> Tomas Vondra  writes:
>>
>>> ... I've tried increasing the cache size to 768
>>> entries, with vast majority of them (~600) allocated to leaf pages.
>>> Sadly, this seems to only increase the CREATE INDEX duration a bit,
>>> without making the index significantly smaller (still ~120MB).
>>
>>
>> Yeah, that's in line with my results: not much further gain from a
>> larger cache.  Though if you were testing with the same IRRExplorer
>> data, it's not surprising that our results would match.  Would be
>> good to try some other cases...
>>
>
> Agreed, but I don't have any other data sets at hand. One possibility would
> be to generate something randomly (e.g. it's not particularly difficult to
> generate random IP addresses), but I'd much rather use some real-world data
> sets.

Tomas, I have one real dataset, which I used for testing spgist
(https://www.postgresql.org/message-id/caf4au4zxd2xov0a__fu7xohxsiwjzm1z2xhs-ffat1dzb9u...@mail.gmail.com)
Let me know if you need it.

>
>>>
>>>
>>> One thing I'd change is making the SpGistLUPCache dynamic, i.e.
>>> storing the size and lastUsedPagesMap on the meta page. That
>>> should allow us resizing the cache and tweak lastUsedPagesMap in
>>> the future.
>>
>>
>> Yeah, probably a good idea. I had thought of bumping
>> SPGIST_MAGIC_NUMBER again if we want to revisit the cache size; but
>> keeping it as a separate field won't add noticeable cost, and it
>> might save some trouble.
>>
>
> I see you plan to track only the cache size, while I proposed to track also
> the map, i.e. number of pages per category. I think that'd useful in case we
> come up with better values (e.g. more entries for leaf pages), or even
> somewhat adaptive way.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fwd: [Snowball-discuss] Greek stemmer

2016-08-25 Thread Oleg Bartunov
This is a chance to add a default configuration for the Greek language, if
somebody with good knowledge of Greek could follow this development.

Oleg

-- Forwarded message --
From: Oleg Smirnov 
Date: Thu, Aug 25, 2016 at 5:26 PM
Subject: [Snowball-discuss] Greek stemmer
To: "snowball-discu." 


Hi all,

 I have implemented a stemmer for Modern Greek language [1] based on a
thesis by G. Ntais [2] with improvements proposed by S. Saroukos [3]

 I'm pretty new to Snowball so it will be great if someone could
review my code. Any feedback is much appreciated.

 1. https://github.com/snowballstem/snowball/pull/44
 2. http://sais.se/mthprize/2007/ntais2007.pdf
 3. http://tampub.uta.fi/bitstream/handle/10024/80480/gradu03463.pdf

--
Regards,
Oleg Smirnov

___
Snowball-discuss mailing list
snowball-disc...@lists.tartarus.org
http://lists.tartarus.org/mailman/listinfo/snowball-discuss


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index Onlys Scan for expressions

2016-08-16 Thread Oleg Bartunov
On Tue, Aug 16, 2016 at 1:03 AM, Ildar Musin  wrote:
> Hi, hackers!
>
> There is a known issue that index only scan (IOS) can only work with simple
> index keys based on single attributes and doesn't work with index
> expressions. In this patch I propose a solution that adds support of IOS for
> index expressions. Here's an example:
>
> create table abc(a int, b int, c int);
> create index on abc ((a * 1000 + b), c);
>
> with t1 as (select generate_series(1, 1000) as x),
>  t2 as (select generate_series(0, 999) as x)
> insert into abc(a, b, c)
> select t1.x, t2.x, t2.x from t1, t2;
> vacuum analyze;
>
> Explain results with the patch:
>
> explain (analyze, buffers) select a * 1000 + b + c from abc where a * 1000 +
> b = 1001;
>QUERY PLAN
> -
>  Index Only Scan using abc_expr_c_idx on abc  (cost=0.42..4.45 rows=1
> width=4) (actual time=0.032..0.033 rows=1 loops=1)
>    Index Cond: ((((a * 1000) + b)) = 1001)
>Heap Fetches: 0
>Buffers: shared hit=4
>  Planning time: 0.184 ms
>  Execution time: 0.077 ms
> (6 rows)
>
> Before the patch it was:
>
> explain (analyze, buffers) select a * 1000 + b + c from abc where a * 1000 +
> b = 1001;
>  QUERY PLAN
> 
>  Index Scan using abc_expr_c_idx on abc  (cost=0.42..8.45 rows=1 width=4)
> (actual time=0.039..0.041 rows=1 loops=1)
>Index Cond: (((a * 1000) + b) = 1001)
>Buffers: shared hit=4
>  Planning time: 0.177 ms
>  Execution time: 0.088 ms
> (5 rows)
>
> This solution has limitations though: the restriction or the target
> expression tree (or its part) must match exactly the index. E.g. this
> expression will pass the check:
>
> select a * 1000 + b + 100 from ...
>
> but this will fail:
>
> select 100 + a * 1000 + b from ...
>
> because the parser groups it as:
>
> (100 + a * 1000) + b
>
> In this form it won't match any index key. Another case is when we create
> index on (a+b) and then make query like 'select b+a ...' or '... where b+a =
> smth' -- it won't match. This applies to regular index scan too. Probably it
> worth to discuss the way to normalize index expressions and clauses and work
> out more convenient way to match them.

pg_operator.oprcommutative ?
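
(For reference, the catalog column that records an operator's commutator is
pg_operator.oprcom; a normalization pass for matching index expressions could
consult it, e.g.:)

```sql
-- integer + integer lists itself as its own commutator, so (100 + x)
-- and (x + 100) could in principle be normalized to a single form
SELECT oprname, oprcom::regoperator
FROM pg_operator
WHERE oprname = '+'
  AND oprleft  = 'integer'::regtype
  AND oprright = 'integer'::regtype;
```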

> Anyway, I will be grateful if you take a look at the patch in attachment.
> Any comments and tips are welcome.
>
> Thanks!
>
> --
> Ildar Musin
> i.mu...@postgrespro.ru
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.6 phrase search distance specification

2016-08-11 Thread Oleg Bartunov
On Tue, Aug 9, 2016 at 9:59 PM, Ryan Pedela  wrote:
>
>

>  I would say that it is worth it to have a "phrase slop" operator (Apache
> Lucene terminology). Proximity search is extremely useful for improving
> relevance and phrase slop is one of the tools to achieve that.
>

It'd be great if you could explain what "phrase slop" is. I assume it's not
about search, but about relevance.
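
For context: in Lucene, "phrase slop" ("fat sat"~2) matches when the terms
occur within N positions of each other. The 9.6 tsquery distance operator <N>
matches lexemes exactly N positions apart (<-> is shorthand for <1>), so a
slop-style "within N" search currently has to be spelled as an OR of exact
distances, e.g.:

```sql
-- exact distance: true only if 'sat' is exactly 2 positions after 'fat'
SELECT to_tsvector('english', 'fat cat sat')
       @@ to_tsquery('english', 'fat <2> sat');

-- Lucene-style slop <= 2, spelled out as alternatives
SELECT to_tsvector('english', 'fat cat sat')
       @@ to_tsquery('english', 'fat <1> sat | fat <2> sat');
```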


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Forthcoming SQL standards about JSON and Multi-Dimensional Arrays (FYI)

2016-07-01 Thread Oleg Bartunov
On Wed, Jun 29, 2016 at 2:51 AM, Stefan Keller  wrote:
> Hi,
>
> FYI: I'd just like to point you to following two forthcoming standard
> parts from "ISO/IEC JTS 1/SC 32" comittee: one on JSON, and one on
> "Multi-Dimensional Arrays" (SQL/MDA).
>
> They define there some things different as already in PG. See also
> Peter Baumann's slides [1] and e.g. [2]

I'm very disappointed with this.


>
> :Stefan
>
> [1] 
> https://www.unibw.de/inf4/professors/geoinformatics/agile-2016-workshop-gis-with-nosql
> [2] 
> http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: About CMake v2

2016-06-30 Thread Oleg Bartunov
On Wed, Jun 29, 2016 at 7:23 PM, Yury Zhuravlev
 wrote:
> Hello Hackers.
>
> I decided to talk about the current state of the project:
> 1. Merge with 9.6 master. 2. plpython2, plpython3, plperl, pltcl, plsql all
> work correctly (all tests pass).
> 3. Works done for all contrib modules. 4. You can use gettext, .po->.mo will
> have converted by CMake.  5. All test pass under some Linux, FreeBSD,
> Solaris10 (on Sparc), Windows MSVC 2015. MacOSX I think not big trouble too.
> 6. Prototype for PGXS (with MSVC support) done.
> I think is very big progress but I came across one problem.
> I do not have access to many OSes and platforms. Each platform needs tests
> and small fixes. I can't develop and guarantee correctness without them.

Can we use our buildfarm infrastructure for this, Andrew?

>
> I think this is very important work which makes it easier further support
> Postgres but I can not do everything himself. It's physically impossible.
>
> I think without community support I can't do significantly more.
>
> Current version you can get here: https://github.com/stalkerg/postgres_cmake
>
> Thanks!
> --
> Yury Zhuravlev
>
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gin index on array of uuid

2016-06-29 Thread Oleg Bartunov
On Wed, Jun 29, 2016 at 6:17 AM, M Enrique 
wrote:

> What's a good source code entry point to review how this is working for
> anyarray currently? I am new to the postgres code. I spend some time
> looking for it but all I found is the following (which I have not been able
> to decipher yet).
>

Look at https://commitfest.postgresql.org/4/145/


>
> [image: pasted1]
>
> Thank you,
> Enrique
>
>
>
> On Tue, Jun 21, 2016 at 12:20 PM Tom Lane  wrote:
>
>> Enrique MailingLists  writes:
>> > Currently creating an index on an array of UUID involves defining an
>> > operator class. I was wondering if this would be a valid request to add
>> as
>> > part of the uuid-ossp extension? This seems like a reasonable operator
>> to
>> > support as a default for UUIDs.
>>
>> This makes me itch, really, because if we do this then we should logically
>> do it for every other add-on type.
>>
>> It seems like we are not that far from being able to have just one GIN
>> opclass on "anyarray".  The only parts of this declaration that are
>> UUID-specific are the comparator function and the storage type, both of
>> which could be gotten without that much trouble, one would think.
>>
>> > Any downsides to adding this as a default?
>>
>> Well, it'd likely break things at dump/reload time for people who had
>> already created a competing "default for _uuid" opclass manually.  I'm not
>> entirely sure, but possibly replacing the core opclasses with a single one
>> that is "default for anyarray" could avoid such failures.  We'd have to
>> figure out ambiguity resolution rules.
>>
>> regards, tom lane
>>
>


[HACKERS] Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-28 Thread Oleg Bartunov
On Tue, Jun 28, 2016 at 7:00 PM, Oleg Bartunov <obartu...@gmail.com> wrote:
> On Tue, Jun 28, 2016 at 9:32 AM, Noah Misch <n...@leadboat.com> wrote:
>> On Sun, Jun 26, 2016 at 10:22:26PM -0400, Noah Misch wrote:
>>> On Wed, Jun 15, 2016 at 11:08:54AM -0400, Noah Misch wrote:
>>> > On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
>>> > > On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
>>> > > > On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
>>> > > > > On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
>>> > > > > > [Action required within 72 hours.  This is a generic 
>>> > > > > > notification.]
>>> > > > > >
>>> > > > > > The above-described topic is currently a PostgreSQL 9.6 open 
>>> > > > > > item.  Teodor,
>>> > > > > > since you committed the patch believed to have created it, you 
>>> > > > > > own this open
>>> > > > > > item.  If some other commit is more relevant or if this does not 
>>> > > > > > belong as a
>>> > > > > > 9.6 open item, please let us know.  Otherwise, please observe the 
>>> > > > > > policy on
>>> > > > > > open item ownership[1] and send a status update within 72 hours 
>>> > > > > > of this
>>> > > > > > message.  Include a date for your subsequent status update.  
>>> > > > > > Testers may
>>> > > > > > discover new open items at any time, and I want to plan to get 
>>> > > > > > them all fixed
>>> > > > > > well in advance of shipping 9.6rc1.  Consequently, I will 
>>> > > > > > appreciate your
>>> > > > > > efforts toward speedy resolution.  Thanks.
>>> > > > > >
>>> > > > > > [1] 
>>> > > > > > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
>>> > > > >
>>> > > > > This PostgreSQL 9.6 open item is past due for your status update.  
>>> > > > > Kindly send
>>> > > > > a status update within 24 hours, and include a date for your 
>>> > > > > subsequent status
>>> > > > > update.  Refer to the policy on open item ownership:
>>> > > > > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
>>> > > >
>>> > > >IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long 
>>> > > >past due
>>> > > >for your status update.  Please reacquaint yourself with the policy on 
>>> > > >open
>>> > > >item ownership[1] and then reply immediately.  If I do not hear from 
>>> > > >you by
>>> > > >2016-06-16 07:00 UTC, I will transfer this item to release management 
>>> > > >team
>>> > > >ownership without further notice.
>>> > > >
>>> > > >[1] 
>>> > > >http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
>>> > >
>>> > > I'm working on it right now.
>>> >
>>> > That is good news, but it is not a valid status update.  In particular, it
>>> > does not specify a date for your next update.
>>>
>>> You still have not delivered the status update due thirteen days ago.  If I 
>>> do
>>> not hear from you a fully-conforming status update by 2016-06-28 03:00 UTC, 
>>> or
>>> if this item ever again becomes overdue for a status update, I will transfer
>>> the item to release management team ownership.
>>
>> This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
>> committer like to take ownership?  I see Teodor committed some things 
>> relevant
>> to this item just today, so the task may be as simple as verifying that those
>> commits resolve the item.  If this role interests you, please read this 
>> thread
>> and the policy linked above, then send an initial status update bearing a 
>> date
>> for your subsequent status update.  If the item does not have a permanent
>> owner by 2016-07-01 07:00 UTC, I will resolve the item by reverting all 
>> phrase
>> search commits.
>
> Teodor pushed three patches, two of th

[HACKERS] Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-28 Thread Oleg Bartunov
On Tue, Jun 28, 2016 at 9:32 AM, Noah Misch  wrote:
> On Sun, Jun 26, 2016 at 10:22:26PM -0400, Noah Misch wrote:
>> On Wed, Jun 15, 2016 at 11:08:54AM -0400, Noah Misch wrote:
>> > On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
>> > > On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
>> > > > On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
>> > > > > On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
>> > > > > > [Action required within 72 hours.  This is a generic notification.]
>> > > > > >
>> > > > > > The above-described topic is currently a PostgreSQL 9.6 open item. 
>> > > > > >  Teodor,
>> > > > > > since you committed the patch believed to have created it, you own 
>> > > > > > this open
>> > > > > > item.  If some other commit is more relevant or if this does not 
>> > > > > > belong as a
>> > > > > > 9.6 open item, please let us know.  Otherwise, please observe the 
>> > > > > > policy on
>> > > > > > open item ownership[1] and send a status update within 72 hours of 
>> > > > > > this
>> > > > > > message.  Include a date for your subsequent status update.  
>> > > > > > Testers may
>> > > > > > discover new open items at any time, and I want to plan to get 
>> > > > > > them all fixed
>> > > > > > well in advance of shipping 9.6rc1.  Consequently, I will 
>> > > > > > appreciate your
>> > > > > > efforts toward speedy resolution.  Thanks.
>> > > > > >
>> > > > > > [1] 
>> > > > > > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
>> > > > >
>> > > > > This PostgreSQL 9.6 open item is past due for your status update.  
>> > > > > Kindly send
>> > > > > a status update within 24 hours, and include a date for your 
>> > > > > subsequent status
>> > > > > update.  Refer to the policy on open item ownership:
>> > > > > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
>> > > >
>> > > >IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long 
>> > > >past due
>> > > >for your status update.  Please reacquaint yourself with the policy on 
>> > > >open
>> > > >item ownership[1] and then reply immediately.  If I do not hear from 
>> > > >you by
>> > > >2016-06-16 07:00 UTC, I will transfer this item to release management 
>> > > >team
>> > > >ownership without further notice.
>> > > >
>> > > >[1] 
>> > > >http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
>> > >
>> > > I'm working on it right now.
>> >
>> > That is good news, but it is not a valid status update.  In particular, it
>> > does not specify a date for your next update.
>>
>> You still have not delivered the status update due thirteen days ago.  If I 
>> do
>> not hear from you a fully-conforming status update by 2016-06-28 03:00 UTC, 
>> or
>> if this item ever again becomes overdue for a status update, I will transfer
>> the item to release management team ownership.
>
> This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
> committer like to take ownership?  I see Teodor committed some things relevant
> to this item just today, so the task may be as simple as verifying that those
> commits resolve the item.  If this role interests you, please read this thread
> and the policy linked above, then send an initial status update bearing a date
> for your subsequent status update.  If the item does not have a permanent
> owner by 2016-07-01 07:00 UTC, I will resolve the item by reverting all phrase
> search commits.

Teodor pushed three patches. Two of them fix the issues discussed in this
thread (handling of duplicates, and disabling the fallback to & for stripped
tsvectors), and the third addresses the precedence of the phrase-search tsquery
operator, which was discussed in a separate thread
(https://www.postgresql.org/message-id/flat/576AB63C.7090504%40sigaev.ru#576ab63c.7090...@sigaev.ru).

They all look good, but they need a small documentation patch. I will provide it later.
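For the record, a quick sanity check of the fixed behaviour could look like the
following (a hypothetical psql session against a build with those commits;
before the fixes both queries returned true):

select phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
-- should now be false: distance-zero phrase matches are no longer allowed

select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
-- should now be false: a stripped tsvector carries no positions,
-- so <-> no longer falls back to &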



>
> Thanks,
> nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-09 Thread Oleg Bartunov
On Thu, Jun 9, 2016 at 12:47 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Oleg Bartunov <obartu...@gmail.com> writes:
>> On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Another thing I noticed: if you test with tsvectors that don't contain
>>> position info, <-> seems to reduce to &, that is it doesn't enforce
>>> relative position:
>
>> yes, that's documented behaviour.
>
> Oh?  Where?  I've been going through the phrase-search documentation and
> copy-editing it today, and I have not found this stated anywhere.

Hmm, it looks like that is indeed missing. We have talked about this since 2008. Just found
http://www.sai.msu.su/~megera/postgres/talks/2009.pdf (slide 5) and
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf (slide 27)

We need to reach a consensus here, since there is no way to say "I don't know".
I'm inclined to agree with you that returning false is better in such a case.
That will point the user to the source of the problem.


>
> regards, tom lane




Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Oleg Bartunov
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane  wrote:
> Jean-Pierre Pelletier  writes:
>> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
>> matching consecutive words but it won't work for us if it cannot handle
>> consecutive *duplicate* words.
>
>> For example, the following returns true:select
>> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
>
>> Is this expected ?
>
> I concur that that seems like a rather useless behavior.  If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x".  So phrase search simply should not
> consider distance-zero matches.

What about a word with several infinitives?

select to_tsvector('en', 'leavings');
       to_tsvector
-------------------------
 'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
 ?column?
----------
 t
(1 row)


>
> The attached one-liner patch seems to fix this problem, though I am
> uncertain whether any other places need to be changed to match.
> Also, there is a regression test case that changes:
>
> *** /home/postgres/pgsql/src/test/regress/expected/tstypes.out  Thu May  5 
> 19:21:17 2016
> --- /home/postgres/pgsql/src/test/regress/results/tstypes.out   Tue Jun  7 
> 17:55:41 2016
> ***
> *** 897,903 
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>ts_rank_cd
>   
> !   0.0714286
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
> --- 897,903 
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>ts_rank_cd
>   
> !   0
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
>
>
> I'm not sure if this case is intentionally exhibiting the behavior that
> both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
> result simply wasn't thought about carefully.
>
> regards, tom lane
>




Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Oleg Bartunov
On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane  wrote:
> Another thing I noticed: if you test with tsvectors that don't contain
> position info, <-> seems to reduce to &, that is it doesn't enforce
> relative position:
>
> regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
>  ?column?
> --
>  t
> (1 row)
>
> regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
>  ?column?
> --
>  t
> (1 row)

yes, that's documented behaviour.


>
> I'm doubtful that this is a good behavior, because it seems like it can
> silently mask mistakes.  That is, applying <-> to a stripped tsvector
> seems like user error to me.  Actually throwing an error might be too
> much, but perhaps we should make such cases return false not true?

It's a question of convention. Returning false would quickly point the user to
the error, so that behaviour looks better.

>
> (This is against HEAD, without the patch I suggested yesterday.
> It strikes me that that patch might change this behavior, if the
> lexemes are all being treated as having position zero, but I have
> not checked.)

I haven't seen the patch yet.

>
> regards, tom lane




Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-08 Thread Oleg Bartunov
On Wed, Jun 8, 2016 at 9:01 PM, Jean-Pierre Pelletier
<jppellet...@e-djuster.com> wrote:
> If instead of casts, functions to_tsvector() and to_tsquery() are used,
> then the results is (I think ?) as expected:

Because the to_tsvector() function records the positions of the words.

>
> select to_tsvector('simple', 'cat bat fat rat') @@ to_tsquery('simple',
> 'cat <-> rat');
> or
> select to_tsvector('simple', 'rat cat bat fat') @@ to_tsquery('simple',
> 'cat <-> rat');
> returns "false"
>
> select to_tsvector('simple', 'cat rat bat fat') @@ to_tsquery('simple',
> 'cat <-> rat');
> returns "true"
>
> Jean-Pierre Pelletier
>
> -Original Message-
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Wednesday, June 8, 2016 1:12 PM
> To: Teodor Sigaev; Oleg Bartunov
> Cc: Jean-Pierre Pelletier; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@
> to_tsvector('simple', 'blue') be true ?
>
> Another thing I noticed: if you test with tsvectors that don't contain
> position info, <-> seems to reduce to &, that is it doesn't enforce
> relative position:
>
> regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
> ?column?
> --
>  t
> (1 row)
>
> regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
> ?column?
> --
>  t
> (1 row)
>
> I'm doubtful that this is a good behavior, because it seems like it can
> silently mask mistakes.  That is, applying <-> to a stripped tsvector
> seems like user error to me.  Actually throwing an error might be too
> much, but perhaps we should make such cases return false not true?
>
> (This is against HEAD, without the patch I suggested yesterday.
> It strikes me that that patch might change this behavior, if the lexemes
> are all being treated as having position zero, but I have not checked.)
>
> regards, tom lane




Re: [HACKERS] Precedence of new phrase search tsquery operator

2016-06-08 Thread Oleg Bartunov
On Wed, Jun 8, 2016 at 7:13 PM, Tom Lane  wrote:
> It appears that the new <-> operator has been made to have exactly the
> same grammatical precedence as the existing & (AND) operator.  Thus,
> for example, 'a & b <-> c'::tsquery means something different from
> 'b <-> c & a'::tsquery:
>
> regression=# select 'a & b <-> c'::tsquery;
>   tsquery
> ---
>  ( 'a' <-> 'c' ) & ( 'b' <-> 'c' )
> (1 row)
>
> regression=# select 'b <-> c & a'::tsquery;
> tsquery
> ---
>  ( 'b' <-> 'c' ) & 'a'
> (1 row)
>
> I find this surprising.  My intuitive feeling is that <-> ought to
> bind tighter than & (and therefore also tighter than |).  What's
> the reasoning for making it act like this?

Ah, now we remember :)  The idea of giving & and <-> equal precedence came from
the situation where <-> degenerates to & in the absence of positional
information. It looks like we mixed up two different things; we will fix it.

>
> regards, tom lane




Re: [HACKERS] Precedence of new phrase search tsquery operator

2016-06-08 Thread Oleg Bartunov
On Wed, Jun 8, 2016 at 7:13 PM, Tom Lane  wrote:
> It appears that the new <-> operator has been made to have exactly the
> same grammatical precedence as the existing & (AND) operator.  Thus,
> for example, 'a & b <-> c'::tsquery means something different from
> 'b <-> c & a'::tsquery:
>
> regression=# select 'a & b <-> c'::tsquery;
>   tsquery
> ---
>  ( 'a' <-> 'c' ) & ( 'b' <-> 'c' )
> (1 row)
>
> regression=# select 'b <-> c & a'::tsquery;
> tsquery
> ---
>  ( 'b' <-> 'c' ) & 'a'
> (1 row)
>
> I find this surprising.  My intuitive feeling is that <-> ought to
> bind tighter than & (and therefore also tighter than |).  What's
> the reasoning for making it act like this?

I don't remember, but it looks like a bug. I found another issue related to it.

If some dictionary returns two infinitives, like:

select * from to_tsquery('en','leavings');
      to_tsquery
----------------------
 'leavings' | 'leave'
(1 row)


then the result of the following query looks like a bug:

select to_tsquery('en', 'aa & leavings <-> tut');
                            to_tsquery
-------------------------------------------------------------------
 ( 'aa' <-> 'tut' ) & ( 'leavings' <-> 'tut' | 'leave' <-> 'tut' )
(1 row)

It should definitely be:

select to_tsquery('en', 'aa & leavings <-> tut');
                     to_tsquery
-----------------------------------------------------
 'aa' & ( 'leavings' <-> 'tut' | 'leave' <-> 'tut' )
(1 row)

So, yes, <-> should bind more tightly than &.
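Until the precedence is changed, a workaround is to group the phrase explicitly,
since parentheses override operator precedence in tsquery input. This is a
sketch; exact output formatting may differ:

select to_tsquery('en', 'aa & (leavings <-> tut)');
-- the parentheses should keep 'aa' out of the phrase rewriting,
-- forcing <-> to bind before &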

>
> regards, tom lane




Re: [HACKERS] Adding an alternate syntax for Phrase Search

2016-05-26 Thread Oleg Bartunov
On Thu, May 26, 2016 at 3:00 PM, Josh berkus  wrote:
> On 05/22/2016 06:53 PM, Teodor Sigaev wrote:
>>
>>> to_tsquery(' Berkus & "PostgreSQL Version 10.0" ')
>>>
>>> ... would be equivalent to:
>>>
>>> to_tsquery(' Berkus & ( PostgreSQL <-> version <-> 10.0 )')
>>
>> select to_tsquery('Berkus') && phraseto_tsquery('PostgreSQL Version 10.0');
>> does it as you wish
>
> Aha, you didn't mention this in your presentation.  That seems plenty
> good enough for 9.6.

Will add this to the slides.


>
> --
> --
> Josh Berkus
> Red Hat OSAS
> (any opinions are my own)
>
>




Re: [HACKERS] Just-in-time compiling things (was: asynchronous and vectorized execution)

2016-05-15 Thread Oleg Bartunov
On Sat, May 14, 2016 at 12:10 PM, Andreas Seltenreich
 wrote:
> Konstantin Knizhnik writes:
>
>> Latest information from the ISP RAS guys: they have made good progress
>> since February: they have rewritten most of the methods of Scan, Aggregate
>> and Join to the LLVM API.
>
> Is their work available somewhere?  I'm experimenting in that area as
> well, although I'm using libFirm instead of LLVM.  I wonder what their
> motivation to rewrite backend code in LLVM IR was, since I am following
> the approach of keeping the IR around when compiling the vanilla
> postgres C code, possibly inlining it during JIT and then doing
> optimizations on this IR.  That way the logic doesn't have to be
> duplicated.

I have discussed the availability of their work, and the consensus was that
their code will eventually be open-sourced, but not right now, since it is not
ready to be published. I'll meet their management (after PGCon) and discuss how
we can work together.

>
> regrads
> Andreas
>
>




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Oleg Bartunov
On Wed, May 11, 2016 at 5:20 PM, Bruce Momjian  wrote:
> I am giving a keynote at an IEEE database conference in Helsinki next
> week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
> because I accepted the Helsinki conference invitation before the PGCon
> Ottawa date was changed from June to May).
>
> As part of the keynote, I would like to mention areas where academia can
> help us.  The topics I can think of are:
>
> Query optimization
> Optimizer statistics
> Indexing structures
> Reducing function call overhead
> CPU locality
> Sorting
> Parallelism
> Sharding
>
> Any others?

Machine learning for adaptive planning
Distributed systems

>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Ancient Roman grave inscription +
>
>




Re: [HACKERS] Html parsing and inline elements

2016-04-30 Thread Oleg Bartunov
On Wed, Apr 13, 2016 at 6:57 PM, Marcelo Zabani  wrote:

> Hi, Tom,
>
> You're right, I don't think one can argue that the default parser should
> know HTML.
> How about your suggestion of there being an HTML parser, is it feasible? I
> ask this because I think that a lot of people store HTML documents these
> days, and although there probably aren't lots of HTML with words written
> along multiple inline elements, it would certainly be nice to have a proper
> parser for these use cases.
>
> What do you think?
>

I think it could be a useful separate parser. The problem, though, is how to
fully utilize it to improve ranking; for example, words in the title could be
considered more important than words in the body. Currently, the setweight()
function provides this separately from the parser.

The parser outputs a tokid and a token:

select * from ts_parse('default','parsertext');
 tokid |  token
-------+--------
13 | 
 1 | parser
13 | 
13 | 
 1 | text
13 | 
(6 rows)

If we changed the parser to also output a rank flag, we could use it to
assign different weights.
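Today the extra weighting has to be applied per document part with the existing
setweight() function, outside the parser. A sketch of the current approach
(standard API; the weight letters and sample strings are illustrative):

select setweight(to_tsvector('english', 'Page Title'), 'A') ||
       setweight(to_tsvector('english', 'some body text'), 'D');
-- title lexemes get weight A, body lexemes the default weight D;
-- ts_rank()/ts_rank_cd() can then rank title matches higher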



>
> On Wed, Apr 13, 2016 at 11:09 AM, Tom Lane  wrote:
>
>> Marcelo Zabani  writes:
>> > I was here wondering whether HTML parsing should separate tokens that
>> are
>> > not separated by spaces in the original text, but are separated by an
>> > inline element. Let me show you an example:
>>
>> > *SELECT to_tsvector('english', 'Helloneighbor, you are
>> > nice')*
>> > *Results:** "'ce':7 'hello':1 'n':5 'neighbor':2"*
>>
>> > "Hello" and "neighbor" should really be separated, because ** is a
>> block
>> > element, but "nice" should be a single word there, since there is no
>> visual
>> > separation when rendered (** and ** are inline elements).
>>
>> I can't imagine that we want to_tsvector to know that much about HTML.
>> It doesn't, really, even have license to assume that its input *is*
>> HTML.  So even if you see things that look like  and  in the
>> string, it could easily be XML or SGML or some other SGML-like markup
>> format with different semantics for the markup keywords.
>>
>> Perhaps it'd be sane to do something like this as long as the
>> HTML-specific behavior was broken out into a separate function.
>> (Or maybe it could be done within to_tsvector as a separate parser
>> or separate dictionary?)  But I don't think it should be part of
>> the default behavior.
>>
>> regards, tom lane
>>
>
>


Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-30 Thread Oleg Bartunov
On Fri, Apr 29, 2016 at 7:40 PM, Joshua D. Drake <j...@commandprompt.com>
wrote:

> On 04/29/2016 08:44 AM, Bruce Momjian wrote:
>
>> On Tue, Apr 12, 2016 at 11:07:04PM +0300, Oleg Bartunov wrote:
>>
>>> Our roadmap http://www.postgresql.org/developer/roadmap/ is the
>>> problem. We
>>> don't have clear roadmap and that's why we cannot plan future feature
>>> full
>>> release. There are several postgres-centric companies, which have most of
>>> developers, who do all major contributions. All these companies has their
>>> roadmaps, but not the community.
>>>
>>
>> I would be concerned if company roadmaps overtly affected the community
>> roadmap.  In general, I find company roadmaps to be very short-sighted
>> and quickly changed based on the demands of specific users/customers ---
>> something we don't want to imitate.
>>
>> We do want company roadmaps to affect the community roadmap, but in a
>> healthy, long-term way, and I think, in general, that is happening.
>>
>>
> The roadmap is not the problem it is the lack of cooperation. Many
> companies are now developing features in a silo and then presenting them to
> the community. Instead we should be working with those companies to have
> them develop transparently so others can be a part of the process.
>

We are working on our roadmap to put it into a form that can be presented to
the community. I think we'll publish it somewhere on the wiki.


>
> If the feature is going to be submitted to core anyway (or open source)
> why wouldn't we just do that? Why wouldn't EDB develop directly within the
> Pg infrastructure. Why wouldn't we build teams around the best and
> brightest between EDB, 2Q and Citus?
>

This is what I suggested. Features intended to be open source could be
discussed and developed together.



>
> Egos.
>
> Consider PgLogical, who is working on this outside of 2Q? Where is the git
> repo for it? Where is the bug tracker? Where is the mailing list? Oh, its
> -hackers, except that it isn't, is it?
>
> It used to be that everyone got together and worked together before the
> patch review process. Now it seems like it is a competition between
> companies to see whose ego can get the most inflated via press releases
> because they developed X for Y.
>
>
git log says it better than any press release :)


> If the companies were to come together and truly recognize that profit is
> the reward not the goal then our community would be much stronger for it.


I wouldn't limit this to companies; individual developers are highly welcome.
I'm afraid there are only a few of them.


>
>
> Sincerely,
>
> JD
>
>
> --
> Command Prompt, Inc.  http://the.postgres.company/
> +1-503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Everyone appreciates your honesty, until you are honest with them.
>


Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-19 Thread Oleg Bartunov
On Tue, Apr 19, 2016 at 1:49 PM, Simon Riggs  wrote:

> On 12 April 2016 at 06:51, Tom Lane  wrote:
>
>> Craig Ringer  writes:
>> > The other area where there's room for extension without throwing out the
>> > whole thing and rebuilding is handling of new top-level statements. We
>> can
>> > probably dispatch the statement text to a sub-parser provided by an
>> > extension that registers interest in that statement name when we
>> attempt to
>> > parse it and fail. Even then I'm pretty sure it won't be possible to do
>> so
>> > while still allowing multi-statements. I wish we didn't support
>> > multi-statements, but we're fairly stuck with them.
>>
>> Well, as I said, I've been there and done that.  Things get sticky
>> when you notice that those "new top-level statements" would like to
>> contain sub-clauses (e.g. arithmetic expressions) that should be defined
>> by the core grammar.  And maybe the extension would also like to
>> define additions to the expression grammar, requiring a recursive
>> callback into the extension.  It gets very messy very fast.
>
>
> As Tom says, we can't easily break it down into multiple co-operating
> pieces, so lets forget that as unworkable.
>
> What is possible is a whole new grammar... for example if we imagine
>
>  SET client_language_path = 'foo, postgresql'
>
> Works similar to search_path, but not userset. We try to parse incoming
> statements against the foo parser first, if that fails we try postgresql.
> The default setting would be simply 'postgresql', so no match -> syntax
> error.
>
>
That's interesting. I'd add a parse_error_handler, which actually processes
the syntax error:

SET client_language_path = 'foo, postgresql, parse_error_handler'


> We could make that easier by making the postgresql parser a plugin itself.
> So to produce a new one you just copy the files, modify them as needed then
> insert a new record into pg_language as an extension.
>
> --
> Simon Riggshttp://www.2ndQuadrant.com/
> 
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-04-12 Thread Oleg Bartunov
On Tue, Apr 12, 2016 at 9:25 PM, Josh berkus  wrote:

> On 04/12/2016 10:43 AM, Robert Haas wrote:
> > 1. Large backward compatibility breaks are bad.  Therefore, if any of
> > these things are absolutely impossible to do without major
> > compatibility breaks, we shouldn't do them at all.
>
> +1
>
> > 2. Small backward compatibility breaks are OK, but don't require doing
> > anything special to the version number.
>
> +1
>
> > 3. There's no value in aggregating many small backward compatibility
> > breaks into a single release.  That increases pain for users, rather
> > than decreasing it, and slows down development, too, because you have
> > to wait for the special magic release where it's OK to hose users.  We
> > typically have a few small backward compatibility breaks in each
> > release, and that's working fine, so I see little reason to change it.
>
> +1
>
> > 4. To the extent that I can guess what the things on Simon's list
> > means from what he wrote, and that's a little difficult because his
> > descriptions were very short, I think that everything on that list is
> > either (a) a bad idea or (b) something that we can do without any
> > compatibility break at all.
>
> +1
>
> Here's the features I can imagine being worth major backwards
> compatibility breaks:
>
> 1. Fully pluggable storage with a clean API.
>
> 2. Total elimination of VACUUM or XID freezing
>
> 3. Fully transparent-to-the user MM replication/clustering or sharding.
>
> 4. Perfect partitioning (i.e. transparent to the user, supports keys &
> joins, supports expressions on partition key, etc.)
>
> 5. Transparent upgrade-in-place (i.e. allowing 10.2 to use 10.1's tables
> without pg_upgrade or other modification).
>
> 6. Fully pluggable parser/executor with a good API
>
> That's pretty much it.  I can't imagine anything else which would
> justify imposing a huge upgrade barrier on users.  And, I'll point out,
> that in the above list:
>
> * nobody is currently working on anything in core except #4.
>

We are working on #3 (HA multimaster).


>
> * we don't *know* that any of the above items will require a backwards
> compatibility break.
>
> People keep talking about "we might want to break compatibility/file
> format one day".  But nobody is working on anything which will and
> justifies it.
>

Our roadmap, http://www.postgresql.org/developer/roadmap/, is the problem. We
don't have a clear roadmap, and that's why we cannot plan a feature-full future
release. There are several Postgres-centric companies, which employ most of the
developers who do all the major contributions. All these companies have their
own roadmaps, but the community does not. I think the 9.6 release is an
inflection point, where we should combine our roadmaps and release one for the
community. Then we could plan releases, and our customers would see what to
expect. I can't speak for other companies, but we have big demand for many
features from Russian customers, and we have to compete with other databases.
Having a community roadmap will help us work with customers and plan our
resources.


>
> --
> --
> Josh Berkus
> Red Hat OSAS
> (any opinions are my own)
>
>
>


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-04-01 Thread Oleg Bartunov
On Fri, Apr 1, 2016 at 5:24 PM, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > Teodor Sigaev wrote:
> >> may be <=>? it isn't used anywhere yet.
> >>
> >> select 'fat'::tsquery <=> 'cat';
> >> select 'fat <=> cat'::tsquery;
>> select 'fat <3> cat'::tsquery; -- for non-default distance.
>
> > Dunno.  That looks pretty "relationalish".
>
> The existing precedent in the geometric types is to use <->
> as an operator denoting distance.  Perhaps <-> and <3>
> would work here.
>
>
Looks like we have consensus on the arrows.


> regards, tom lane
>


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-04-01 Thread Oleg Bartunov
On Thu, Mar 31, 2016 at 9:14 PM, Alvaro Herrera 
wrote:

> What led you to choose the ? operator for the FOLLOWED BY semantics?
> It doesn't seem a terribly natural choice -- most other things seems to
> use ? as some sort of wildcard.  What about something like "...", so you
> would do
>   SELECT q @@ to_tsquery('fatal ... error');
> and
>   SELECT q @@ (tsquery 'fatal' ... tsquery 'error');
>
>
Originally it was $, but then we changed it to ?; we don't remember why.
During warm-up this morning we came up with another suggestion:

SELECT q @@ to_tsquery('fatal <> error');
and
SELECT q @@ to_tsquery('fatal <2> error');

How about this?
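To illustrate the proposed syntax (hypothetical at that point; <N> would mean
the second lexeme occurs exactly N positions after the first):

select to_tsvector('simple', 'fatal database error')
       @@ to_tsquery('simple', 'fatal <2> error');
-- should be true: 'error' is two positions after 'fatal'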



> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Re: [HACKERS] Draft release notes for next week's releases

2016-03-29 Thread Oleg Bartunov
On Mon, Mar 28, 2016 at 6:08 PM, Robert Haas  wrote:

> On Mon, Mar 28, 2016 at 10:24 AM, Tom Lane  wrote:
> > I'm also not exactly convinced by your implicit assumption that ICU is
> > bug-free.
>

> Noah spent some time looking at ICU back when he was at EnterpriseDB, and
> his conclusion was that ICU collations weren't stable across releases,
> which is pretty much the same problem we're running into with glibc
> collations.  Now it might still be true that they have the equivalent
> of strxfrm() and strcoll() and that those things behave consistently
> with each other, and that would be very good.  Everybody seems to
> agree it's faster, and that's good, too.  But I wonder what we do
> about the fact that, as with glibc, an ICU upgrade involves a REINDEX
> of every potentially affected index.  It seems like ICU has some
> facilities built into it that might be useful for detecting and
> handling such situations, but I don't understand them well enough to
> know whether they'd solve our versioning problems or how effectively
> they would do so, and I think there are packaging issues that tie into
> it, too.  http://userguide.icu-project.org/design mentions building
> with specific configure flags if you need to link with multiple server
> versions, and I don't know what operating system packagers typically
> do about that stuff.
>
> In any case, I agree that we'd be very unwise to think that ICU is
> necessarily going to be bug-free without testing it carefully.
>

Agreed.

In another thread I wrote:
"Ideally, we should benchmark all locales on all platforms for all kinds of
indexes. But that's a big project."


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Draft release notes for next week's releases

2016-03-29 Thread Oleg Bartunov
On Mon, Mar 28, 2016 at 1:36 PM, Thomas Kellerer <spam_ea...@gmx.net> wrote:

> Oleg Bartunov-2 wrote
> > But still, icu provides us abbreviated keys and collation stability,
>
> Does include ICU mean that collation handling is identical across
> platforms?
> E.g. a query on Linux involving string comparison would yield the same
> result on MacOS and Windows?
>

Yes, it does, and that's the most important point for us.


>
> If that is the case I'm all for it.
>
> Currently the different behaviour in handling collation aware string
> comparisons is a bug in my eyes from a user's perspective. I do understand
> and can accept the technical reasons for that, but it still feels odd that
> a
> query yields different results (with identical data) just because it runs
> on
> a different platform.
>
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/Draft-release-notes-for-next-week-s-releases-tp5895357p5895484.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
>
>


Re: [HACKERS] Dealing with collation and strcoll/strxfrm/etc

2016-03-29 Thread Oleg Bartunov
On Mon, Mar 28, 2016 at 5:57 PM, Stephen Frost <sfr...@snowman.net> wrote:

> All,
>
> Changed the thread name (we're no longer talking about release
> notes...).
>
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Oleg Bartunov <obartu...@gmail.com> writes:
> > > Should we start thinking about ICU ?
> >
> > Isn't it still true that ICU fails to meet our minimum requirements?
> > That would include (a) working with the full Unicode character range
> > (not only UTF16) and (b) working with non-Unicode encodings.  No doubt
> > we could deal with (b) by inserting a conversion, but that would take
> > a lot of shine off the performance numbers you mention.
> >
> > I'm also not exactly convinced by your implicit assumption that ICU is
> > bug-free.
>
> We have a wiki page about ICU.  I'm not sure that it's current, but if
> it isn't and people are interested then perhaps we should update it:
>
> https://wiki.postgresql.org/wiki/Todo:ICU
>
>
Good point, I forgot about this page.



> If we're going to talk about minimum requirements, I'd like to argue
> that we require whatever system we're using to have versioning (which
> glibc currently lacks, as I understand it...) to avoid the risk that
> indexes will become corrupt when whatever we're using for collation
> changes.  I'm pretty sure that's already bitten us on at least some
> RHEL6 -> RHEL7 migrations in some locales, even forgetting the issues
> with strcoll vs. strxfrm.
>

Agreed.


>
> Regarding key abbreviation and performance, if we are confident that
> strcoll and strxfrm are at least independently internally consistent
> then we could consider offering an option to choose between them.
> We'd need to identify what each index was built with to do so, however,
> as they would need to be rebuilt if the choice changes, at least
> until/unless they're made to reliably agree.  Even using only one or the
> other doesn't address the versioning problem though, which is a problem
> for all currently released versions of PG and is just going to continue
> to be an issue.
>

Ideally, we should benchmark all locales on all platforms for all kinds of
indexes. But that's a big project.


>
> Thanks!
>
> Stephen
>


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Oleg Bartunov
On Mon, Mar 28, 2016 at 2:06 PM, Peter Geoghegan <p...@heroku.com> wrote:

> On Mon, Mar 28, 2016 at 12:55 AM, Oleg Bartunov <obartu...@gmail.com>
> wrote:
> >  We'll post the patch.
>
> Cool.
>
> > Teodor made something to get abbreviated keys work as
> > I remember. I should say, that 27x improvement I got on my macbook. I
> will
> > check on linux.
>
> I think that Linux will be much faster. The stxfrm() blob produced by
> Mac OSX will have a horribly low concentration of entropy. For an 8
> byte Datum, you get only 2 distinguishing bytes. It's really, really
> bad. Mac OSX probably makes very little use of strxfrm() in practice;
> there are proprietary APIs that do something similar, but all using
> UTF-16 only.
>

Yes, Linux is much, much faster; I see no difference in performance using the
latest ICU 57.1. I tested on Ubuntu 14.04.4. Still, ICU provides us
abbreviated keys and collation stability, so let's add --with-icu.


>
> --
> Peter Geoghegan
>


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Oleg Bartunov
On Mon, Mar 28, 2016 at 1:21 PM, Peter Geoghegan <p...@heroku.com> wrote:

> On Mon, Mar 28, 2016 at 12:08 AM, Oleg Bartunov <obartu...@gmail.com>
> wrote:
> > Should we start thinking about ICU ? I compare Postgres with ICU and
> without
> > and found 27x improvement in btree index creation for russian strings.
> This
> > includes effect of abbreviated keys and ICU itself. Also, we'll get
> system
> > independent locale.
>
> I think we should. I want to develop a detailed proposal before
> talking about it more, though, because the idea is controversial.
>
> Did you use the FreeBSD ports patch? Do you have your own patch that
> you could share?
>

 We'll post the patch. Teodor made something to get abbreviated keys to work,
as I remember. I should say that the 27x improvement was on my MacBook; I will
check on Linux.

>
> I'm not surprised that ICU is so much faster, especially now that
> UTF-8 is not a second class citizen (it's been possible to build ICU
> to specialize all its routines to handle UTF-8 for years now). As you
> may know, ICU supports partial sort keys, and sort key compression,
> which may have also helped:
> http://userguide.icu-project.org/collation/architecture
>

>
> That page also describes how binary sort keys are versioned, which
> allows them to be stored on disk. It says "A common example is the use
> of keys to build indexes in databases". We'd be crazy to trust Glibc
> strxfrm() to be stable *on disk*, but ICU already cares deeply about
> the things we need to care about, because it's used by other database
> systems like DB2, Firebird, and in some configurations SQLite [1].
>
> Glibc strxfrm() is not great with codepoints from the Cyrillic
> alphabet -- it seems to store 2 bytes per code-point in the primary
> weight level. So ICU might also do better in your test case for that
> reason.
>

Yes, I see on this page that ICU is ~3 times faster for Russian text.
http://site.icu-project.org/charts/collation-icu4c48-glibc


>
> [1]
> https://www.sqlite.org/src/artifact?ci=trunk=ext/icu/README.txt
> --
> Peter Geoghegan
>


Re: [HACKERS] Draft release notes for next week's releases

2016-03-28 Thread Oleg Bartunov
On Mar 28, 2016 09:44, "Peter Geoghegan"  wrote:
>
> On Sat, Mar 26, 2016 at 4:34 PM, Tom Lane  wrote:
> > Probably the most discussion-worthy item is whether we can say
> > anything more about the strxfrm mess.  Should we make a wiki
> > page about that and have the release note item link to it?
>
> I think that there is an argument against doing so, which is that
> right now, all we have to offer on that are weasel words. However, I'm
> still in favor of a Wiki page, because I would not be at all surprised
> if our understanding of this problem evolved, and we were able to
> offer better answers in several weeks. Realistically, it will probably
> take at least that long before affected users even start to think
> about this.

Should we start thinking about ICU? I compared Postgres with and without ICU
and found a 27x improvement in btree index creation for Russian strings. This
includes the effect of abbreviated keys and ICU itself. Also, we'd get a
system-independent locale.
>
>
> --
> Peter Geoghegan
>
>


Re: [HACKERS] Bug in searching path in jsonb_set when walking through JSONB array

2016-03-23 Thread Oleg Bartunov
On Wed, Mar 23, 2016 at 6:37 AM, Vitaly Burovoy 
wrote:

> Hello, Hackers!
>
> While I was reviewed a patch with "json_insert" function I found a bug
> which wasn't connected with the patch and reproduced at master.
>
> It claims about non-integer whereas input values are obvious integers
> and in an allowed range.
> More testing lead to understanding it appears when numbers length are
> multiplier of 4:
>
> postgres=# select jsonb_set('{"a":[[],1,2,3],"b":[]}', '{"a", }',
> '"4"');
> ERROR:  path element at the position 2 is not an integer
>

Hmm, here is what I see on master:

select version();
                                                         version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6devel on x86_64-apple-darwin15.4.0, compiled by Apple LLVM
version 7.3.0 (clang-703.0.29), 64-bit
(1 row)

select jsonb_set('{"a":[[],1,2,3],"b":[]}', '{"a", }', '"4"');
             jsonb_set
------------------------------------
 {"a": [[], 1, 2, 3, "4"], "b": []}
(1 row)




>
> postgres=# select jsonb_set('{"a":[[],1,2,3],"b":[]}', '{"b", 1000}',
> '"4"');
> ERROR:  path element at the position 2 is not an integer
>
> postgres=# select jsonb_set('{"a":[[],1,2,3],"b":[]}', '{"a", -999}',
> '"4"');
> ERROR:  path element at the position 2 is not an integer
>
> postgres=# select jsonb_set('{"a":[[],1,2,3],"b":[]}', '{"a",
> 1000}', '"4"');
> ERROR:  path element at the position 2 is not an integer
>
> Close values are ok:
> postgres=# select jsonb_set('{"a":[[],1,2,3]}', '{"a", 0, 999}', '"4"');
> jsonb_set
> -
>  {"a": [["4"], 1, 2, 3]}
> (1 row)
>
> postgres=# select jsonb_set('{"a":[[],1,2,3]}', '{"a", 0, 1}', '"4"');
> jsonb_set
> -
>  {"a": [["4"], 1, 2, 3]}
> (1 row)
>
>
> Research lead to setPathArray where a string which is got via
> VARDATA_ANY but is passed to strtol which expects cstring.
>
> In case of string number length is not a multiplier of 4 rest bytes
> are padding by '\0', when length is a multiplier of 4 there is no
> padding, just garbage after the last digit of the value.
>
> Proposed patch in an attachment fixes it.
>
> There is a magic number "20" as a length of an array for copying key
> from a path before passing it to strtol. It is a maximal length of a
> value which can be parsed by the function. I could not find a proper
> constant for it. Also I found similar direct value in the code (e.g.
> in numeric.c).
>
> I've added a comment, I hope it is enough for it.
>
>
> --
> Best regards,
> Vitaly Burovoy
>
>
>
>


Re: [HACKERS] Background Processes and reporting

2016-03-15 Thread Oleg Bartunov
On Tue, Mar 15, 2016 at 7:43 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Mar 15, 2016 at 12:57 AM, Robert Haas 
> wrote:
>
>> On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund 
>> wrote:
>> > On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
>> >> > I have already shown [0, 1] the overhead of measuring timings in
>> linux on
>> >> > representative workload. AFAIK, these tests were the only one that
>> showed
>> >> > any numbers. All other statements about terrible performance have
>> been and
>> >> > remain unconfirmed.
>> >>
>> >> Of course, those numbers are substantial regressions which would
>> >> likely make it impractical to turn this on on a heavily-loaded
>> >> production system.
>> >
>> > A lot of people operating production systems are fine with trading a <=
>> > 10% impact for more insight into the system; especially if that
>> > configuration can be changed without a restart.  I know a lot of systems
>> > that use pg_stat_statements, track_io_timing = on, etc; just to get
>> > that. In fact there's people running perf more or less continuously in
>> > production environments; just to get more insight.
>> >
>> > I think it's important to get as much information out there without
>> > performance overhead, so it can be enabled by default. But I don't think
>> > it makes sense to not allow features in that cannot be enabled by
>> > default, *if* we tried to make them cheap enough beforehand.
>>
>> Hmm, OK.  I would have expected you to be on the other side of this
>> question, so maybe I'm all wet.  One point I am concerned about is
>> that, right now, we have only a handful of types of wait events.  I'm
>> very interested in seeing us add more, like I/O and client wait.  So
>> any overhead we pay here is likely to eventually be paid in a lot of
>> places - thus it had better be extremely small.
>>
>
> OK. Let's start to produce light, not heat.
>
> As I get we have two features which we suspect to introduce overhead:
> 1) Recording parameters of wait events which requires some kind of
> synchronization protocol.
> 2) Recording time of wait events because time measurements might be
> expensive on some platforms.
>
> Simultaneously there are machines and workloads where both of these
> features doesn't produce measurable overhead.  And, we're talking not about
> toy databases. Vladimir is DBA from Yandex which is in TOP-20 (by traffic)
> internet companies in the world.  They do run both of this features in
> production highload database without noticing any overhead of them.
>

> It would be great progress, if we decide that we could add both of these
> features controlled by GUC (off by default).
>

enable_waits_statistics?


>
> If we decide so, then let's start working on this. At first, we should
> construct list of machines and workloads for testing. Each list of machines
> and workloads would be not comprehensive. But let's find something that
> would be enough for testing of GUC controlled, off by default features.
> Then we can turn our conversation from theoretical thoughts to particular
> benchmarks which would be objective and convincing to everybody.
>

Vladimir, could you provide a test suite so other people can measure the
overhead on their machines?




>
> Otherwise, let's just add these features to the list of unwanted
> functionality and close this question.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-15 Thread Oleg Bartunov
On Mon, Mar 14, 2016 at 9:01 PM, David Steele  wrote:

> On 2/15/16 10:29 AM, Teodor Sigaev wrote:
>
> It's very pity but author is not able to continue work on this patch,
>> and I would like to raise this flag.
>>
>> I'd like to add some comments about patches:
>>
>> traversalValue patch adds arbitrary value assoсiated with branch in
>> SP-GiST tree walk. Unlike to recostructedValue it could be just pointer,
>> it havn't to be a regular pgsql type. Also, it could be used independly
>> from reconstructedValue. This patch is used both following attached
>> patches.
>>
>> range patch just switchs using reconstructedValue to traversalValue in
>> range opclasses. reconstructedValue was used just because it was an
>> acceptable workaround in case of range type. Range opclase stores a full
>> value in leafs and doesn't need to use reconstructedValue to return
>> tuple in index only scan.
>> See http://www.postgresql.org/message-id/5399.1343512...@sss.pgh.pa.us
>>
>> q4d patch implements index over boxes using SP-GiST. Basic idea was an
>> observation, range opclass thinks about one-dimensional ranges as 2D
>> points.
>> Following this idea, we can think that 2D box (what is 2 points or 4
>> numbers) could represent a 4D point. We hoped that this approach will be
>> much more effective than traditional R-Tree in case of many overlaps in
>> box's collection.
>> Performance results are coming shortly.
>>
>
> It appears that the issues raised in this thread have been addressed but
> the patch still has not gone though a real review.
>
> Anybody out there willing to take a crack at a review?  All three patches
> apply (with whitespace issues).
>

Emre Hasegeli has agreed to review the patch.


>
> Thanks,
> --
> -David
> da...@pgmasters.net
>


Re: [HACKERS] Background Processes and reporting

2016-03-12 Thread Oleg Bartunov
On Sat, Mar 12, 2016 at 12:45 AM, Andres Freund  wrote:

> On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
>
>

> > So, situation looks like dead-end.  I have no idea how to convince Robert
> > about any kind of advanced functionality of wait monitoring to
> PostgreSQL.
> > I'm thinking about implementing sampling extension over current
> > infrastructure just to make community see that it sucks. Andres, it would
> > be very nice if you have any idea how to move this situation forward.
>
> I've had my share of conflicts with Robert. But if I were in his shoes,
> targeted by this kind of rhetoric, I'd be very tempted to just ignore
> any further arguments from the origin.  So I think the way forward is
> for everyone to cool off, and to see how we can incrementally make
> progress from here on.
>
>
We are all very different people from different cultures, so online
discussion of ill-defined topics wouldn't work. Let's get back to work.


> > Another aspect is that EnterpriseDB offers waits monitoring in
> proprietary
> > fork [5].
>
>
> So?
>

So, Robert already has experience with the subject; probably he has had bad
experience with the EDB implementation and would like to see something better
in the community version. That's fair, and I accept his position.

Wait monitoring is one of the most popular requirements of Russian companies
that migrated from Oracle. The overwhelming majority of them use Linux, so I
suggest having a configure flag to include wait monitoring at compile time
(no wait monitoring by default), or a GUC variable that is also off by
default, so we have zero to minimal monitoring overhead. That way we'll
satisfy many enterprises and help them choose Postgres, get feedback from
production use, and have time to improve the feature.



>
> Greetings,
>
> Andres Freund
>
>
>


Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-11 Thread Oleg Bartunov
On Fri, Mar 11, 2016 at 7:11 PM, David Steele  wrote:

> On 2/10/16 12:50 PM, Konstantin Knizhnik wrote:
>
> > PostgresProffesional cluster teams wants to propose new version of
> > eXtensible Transaction Manager API.
> > Previous discussion concerning this patch can be found here:
> >
> >
> http://www.postgresql.org/message-id/f2766b97-555d-424f-b29f-e0ca0f6d1...@postgrespro.ru
>
> I see a lot of discussion on this thread but little in the way of
> consensus.
>
> > The API patch itself is small enough, but we think that it will be
> > strange to provide just API without examples of its usage.
>
> It's not all that small, though it does apply cleanly even after a few
> months.  At least that indicates there is not a lot of churn in this area.
>
> I'm concerned about the lack of response or reviewers for this patch.
> It may be because everyone believes they had their say on the original
> thread, or because it seems like a big change to go into the last CF, or
> for other reasons altogether.
>

We'll prepare an easy setup to play with our solutions, so any developer can
see how it works. Hopefully this weekend we'll post something about this.



>
> I think you should try to make it clear why this patch would be a win
> for 9.6.
>

Looks like the discussion shifted to a different thread; we'll answer here.



>
> Is anyone willing to volunteer a review or make an argument for the
> importance of this patch?
>
> --
> -David
> da...@pgmasters.net
>
>


Re: [HACKERS] WIP: Access method extendability

2016-03-11 Thread Oleg Bartunov
On Wed, Mar 9, 2016 at 8:31 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi!
>
> On Wed, Mar 9, 2016 at 3:27 PM, Alvaro Herrera 
> wrote:
>
>> Hi.  As I just said to Tomas Vondra: since your patch creates a new
>> object type, please make sure to add a case to cover it in the
>> object_address.sql test.  That verifies some things such as
>> pg_identify_object and related.
>>
>
> Good catch, thanks! Tests were added.
> I also introduced numbering into patch names to make evident the order to
> their application.
>
>
Nice to see progress! I hope to see Alexander's work in 9.6.

Teodor and I will show at PGCon a new GIN AM as an extension, optimized for
full-text search (millisecond FTS), which we gave up pushing into core.


> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>
>
>


Re: [HACKERS] The plan for FDW-based sharding

2016-03-11 Thread Oleg Bartunov
On Fri, Mar 11, 2016 at 9:09 AM, Bruce Momjian  wrote:

>
>
>
> 3.  I have tried to encourage others to get involved, with limited
> success.  I do think the FDW is perhaps the only reasonable way to get
> _built-in_ sharding.  The external sharding solutions are certainly
> viable, but external.  It is possible we will make all the FDW
> improvements, find out it doesn't work, but find out the improvements
> allow us to go in another direction.
>

I remember last summer's emails; we really wanted to participate in the
development, but it turned out all slots were occupied by EDB and NTT people.
We wanted to work on distributed transactions and proposed our XTM. Our
feeling from that discussion was that we were invited, but all doors were
closed. It was a very bad experience. Hopefully, we have now cleared up the
misunderstanding.


>
> There seems to be serious interest in how this idea came about, so let
> me say what I remember.
>

I think the idea was obvious, so let's not discuss this.


>
> As for why there is so much hostility, I think this is typical for any
> ill-defined feature development.  There was simmering hostility to the
> Windows port and pg_upgrade for many years because those projects were
> not easy to define and risky, and had few active developers.  The
> agreement was that work could continue as long as destabilization wasn't
> introduced.  Ideally everything would have a well-defined plan, it is
> sometimes hard to do.  Similar to our approach on parallelism (which is
> also super-important and doesn't many active developers), sometimes you
> just need to create infrastructure and see how well it solves problems.
>
>

Our XTM is yet another example of the infrastructure we need for work on
clustering. Should we wait for other smart guys to start thinking about
distributed transactions? We described our API at
https://wiki.postgresql.org/wiki/DTM; it is just a wrapper over existing
functions, but it will allow us, and hopefully others, to play with their
ideas. We did several prototypes, including FDW, to demonstrate the viability
of the API, and we plan to continue our work on built-in high availability
and multi-master. Of course, there will be a lot to learn, but it will be
much easier if XTM exists in core rather than as a separate patch, which is
really small anyway.


>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription +
>
>
>


Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-08 Thread Oleg Bartunov
On Tue, Mar 8, 2016 at 11:17 PM, Oleg Bartunov <obartu...@gmail.com> wrote:

>
>
> On Thu, Mar 3, 2016 at 11:45 AM, Emre Hasegeli <e...@hasegeli.com> wrote:
>
>> > Emre, I checked original thread and didn't find sample data. Could you
>> provide them for testing ?
>>
>> I found it on the Git history:
>>
>>
>> https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true
>>
>
> Thanks !
>
> spgist index creates 2 times faster than gist, but index size is
> noticeably  bugger
>
> \di+ route_*
> List of relations
>  Schema | Name | Type  |  Owner   | Table  |  Size  | Description
> +--+---+--+++-
>  public | route_gist   | index | postgres | routes | 96 MB  |
>  public | route_spgist | index | postgres | routes | 132 MB |
> (2 rows)
>
> Spgist index tree is much better  than gist - 12149 pages vs 1334760 !
>

I also noticed that SP-GiST is much faster than GiST for other inet
operators. I'd like to see it in 9.6.



>
>
>
> EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
> routes.route && hmm.route;
>QUERY PLAN
>
> 
>  Nested Loop  (cost=0.41..570430.27 rows=2338 width=7) (actual
> time=5.730..12085.747 rows=8127 loops=1)
>Buffers: shared hit=1334760
>->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
> time=0.013..0.528 rows=732 loops=1)
>  Buffers: shared hit=4
>->  Index Only Scan using route_gist on routes  (cost=0.41..550.26
> rows=22900 width=7) (actual time=2.491..16.503 rows=11 loops=732)
>  Index Cond: (route && (hmm.route)::inet)
>  Heap Fetches: 8127
>  Buffers: shared hit=1334756
>  Planning time: 0.827 ms
>  Execution time: 12086.513 ms
> (10 rows)
>
> EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
> routes.route && hmm.route;
>QUERY PLAN
>
> -
>  Nested Loop  (cost=0.41..588634.27 rows=2338 width=7) (actual
> time=0.043..12.150 rows=8127 loops=1)
>Buffers: shared hit=12149
>->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
> time=0.013..0.075 rows=732 loops=1)
>  Buffers: shared hit=4
>->  Index Only Scan using route_spgist on routes  (cost=0.41..575.13
> rows=22900 width=7) (actual time=0.011..0.015 rows=11 loops=732)
>  Index Cond: (route && (hmm.route)::inet)
>  Heap Fetches: 8127
>  Buffers: shared hit=12145
>  Planning time: 0.779 ms
>  Execution time: 12.603 ms
> (10 rows)
>
>
>
>
>


Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-08 Thread Oleg Bartunov
On Thu, Mar 3, 2016 at 11:45 AM, Emre Hasegeli  wrote:

> > Emre, I checked original thread and didn't find sample data. Could you
> provide them for testing ?
>
> I found it on the Git history:
>
>
> https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true
>

Thanks!

The SP-GiST index builds about 2 times faster than GiST, but the index size
is noticeably bigger:

\di+ route_*
                          List of relations
 Schema |     Name     | Type  |  Owner   | Table  |  Size  | Description
--------+--------------+-------+----------+--------+--------+-------------
 public | route_gist   | index | postgres | routes | 96 MB  |
 public | route_spgist | index | postgres | routes | 132 MB |
(2 rows)

The SP-GiST index tree is much better than GiST's: 12149 buffer hits vs 1334760!



EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.41..570430.27 rows=2338 width=7) (actual
time=5.730..12085.747 rows=8127 loops=1)
   Buffers: shared hit=1334760
   ->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
time=0.013..0.528 rows=732 loops=1)
 Buffers: shared hit=4
   ->  Index Only Scan using route_gist on routes  (cost=0.41..550.26
rows=22900 width=7) (actual time=2.491..16.503 rows=11 loops=732)
 Index Cond: (route && (hmm.route)::inet)
 Heap Fetches: 8127
 Buffers: shared hit=1334756
 Planning time: 0.827 ms
 Execution time: 12086.513 ms
(10 rows)

EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.41..588634.27 rows=2338 width=7) (actual
time=0.043..12.150 rows=8127 loops=1)
   Buffers: shared hit=12149
   ->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
time=0.013..0.075 rows=732 loops=1)
 Buffers: shared hit=4
   ->  Index Only Scan using route_spgist on routes  (cost=0.41..575.13
rows=22900 width=7) (actual time=0.011..0.015 rows=11 loops=732)
 Index Cond: (route && (hmm.route)::inet)
 Heap Fetches: 8127
 Buffers: shared hit=12145
 Planning time: 0.779 ms
 Execution time: 12.603 ms
(10 rows)


Re: [HACKERS] The plan for FDW-based sharding

2016-03-08 Thread Oleg Bartunov
On Tue, Mar 8, 2016 at 6:40 AM, Craig Ringer  wrote:



> Either that, or bless experimental features/API as an official concept.
> I'd quite like that myself - stuff that's in Pg, but documented as "might
> change or go away in the next release, experimental feature". As we're
> doing more stuff that spans multiple release cycles, where patches in a
> prior cycle might need revision based on what we learn in a later one, we
> might need more freedom to change things that're committed and user visible.
>
>
+1


> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-02 Thread Oleg Bartunov
On Thu, Mar 3, 2016 at 8:51 AM, Oleg Bartunov <obartu...@gmail.com> wrote:

>
>
> On Wed, Mar 2, 2016 at 11:56 PM, Emre Hasegeli <e...@hasegeli.com> wrote:
>
>> Attached patches add SP-GiST support to the inet datatypes.  The
>> operator class comes with a small change on the SP-GiST framework to allow
>> fixed number of child nodes.
>>
>> The index is like prefix tree except that it doesn't bother to split the
>> addresses into parts as text is split.  It also doesn't use labels to know
>> the part after the prefix, but relies on static node numbers.
>>
>>
> Thanks, Emre for interesting spgist. We are bit busy and will take a look
> on your patches when come to our spgist patch.
>
>

Emre, I checked the original thread and didn't find the sample data. Could
you provide it for testing?


> The GiST index released with version 9.4 performs really bad with real
>> world data.  SP-GiST works much better with the query posted to the
>> performance list [1] a while ago:
>>
>> > hasegeli=# SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn =
>> 2914;
>> > SELECT 732
>> >
>> > hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON
>> routes.route && hmm.route;
>> >QUERY
>> PLAN
>> >
>> 
>> >  Nested Loop  (cost=0.41..571742.27 rows=2248 width=7) (actual
>> time=12.643..20474.813 rows=8127 loops=1)
>> >->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
>> time=0.017..0.524 rows=732 loops=1)
>> >->  Index Only Scan using route_gist on routes  (cost=0.41..552.05
>> rows=22900 width=7) (actual time=4.851..27.948 rows=11 loops=732)
>> >  Index Cond: (route && (hmm.route)::inet)
>> >  Heap Fetches: 8127
>> >  Planning time: 1.507 ms
>> >  Execution time: 20475.605 ms
>> > (7 rows)
>> >
>> > hasegeli=# DROP INDEX route_gist;
>> > DROP INDEX
>> >
>> > hasegeli=# CREATE INDEX route_spgist ON routes USING spgist (route);
>> > CREATE INDEX
>> >
>> > hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON
>> routes.route && hmm.route;
>> >   QUERY PLAN
>> >
>> -
>> >  Nested Loop  (cost=0.41..588634.27 rows=2248 width=7) (actual
>> time=0.081..16.961 rows=8127 loops=1)
>> >->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
>> time=0.022..0.079 rows=732 loops=1)
>> >->  Index Only Scan using route_spgist on routes  (cost=0.41..575.13
>> rows=22900 width=7) (actual time=0.014..0.021 rows=11 loops=732)
>> >  Index Cond: (route && (hmm.route)::inet)
>> >  Heap Fetches: 8127
>> >  Planning time: 1.376 ms
>> >  Execution time: 15.936 ms
>>
>> [1]
>> http://www.postgresql.org/message-id/flat/alpine.DEB.2.11.1508251504160.31004@pyrite#alpine.DEB.2.11.1508251504160.31004@pyrite
>>
>>
>>
>>
>>
>


Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-02 Thread Oleg Bartunov
On Wed, Mar 2, 2016 at 11:56 PM, Emre Hasegeli  wrote:

> Attached patches add SP-GiST support to the inet datatypes.  The operator
> class comes with a small change on the SP-GiST framework to allow fixed
> number of child nodes.
>
> The index is like prefix tree except that it doesn't bother to split the
> addresses into parts as text is split.  It also doesn't use labels to know
> the part after the prefix, but relies on static node numbers.
>
>
Thanks, Emre, for the interesting SP-GiST patch. We are a bit busy and will
take a look at your patches when we come back to our own SP-GiST patch.


> The GiST index released with version 9.4 performs really bad with real
> world data.  SP-GiST works much better with the query posted to the
> performance list [1] a while ago:
>
> > hasegeli=# SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn =
> 2914;
> > SELECT 732
> >
> > hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON
> routes.route && hmm.route;
> >QUERY PLAN
> >
> 
> >  Nested Loop  (cost=0.41..571742.27 rows=2248 width=7) (actual
> time=12.643..20474.813 rows=8127 loops=1)
> >->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
> time=0.017..0.524 rows=732 loops=1)
> >->  Index Only Scan using route_gist on routes  (cost=0.41..552.05
> rows=22900 width=7) (actual time=4.851..27.948 rows=11 loops=732)
> >  Index Cond: (route && (hmm.route)::inet)
> >  Heap Fetches: 8127
> >  Planning time: 1.507 ms
> >  Execution time: 20475.605 ms
> > (7 rows)
> >
> > hasegeli=# DROP INDEX route_gist;
> > DROP INDEX
> >
> > hasegeli=# CREATE INDEX route_spgist ON routes USING spgist (route);
> > CREATE INDEX
> >
> > hasegeli=# EXPLAIN ANALYZE SELECT routes.route FROM routes JOIN hmm ON
> routes.route && hmm.route;
> >   QUERY PLAN
> >
> -
> >  Nested Loop  (cost=0.41..588634.27 rows=2248 width=7) (actual
> time=0.081..16.961 rows=8127 loops=1)
> >->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
> time=0.022..0.079 rows=732 loops=1)
> >->  Index Only Scan using route_spgist on routes  (cost=0.41..575.13
> rows=22900 width=7) (actual time=0.014..0.021 rows=11 loops=732)
> >  Index Cond: (route && (hmm.route)::inet)
> >  Heap Fetches: 8127
> >  Planning time: 1.376 ms
> >  Execution time: 15.936 ms
>
> [1]
> http://www.postgresql.org/message-id/flat/alpine.DEB.2.11.1508251504160.31004@pyrite#alpine.DEB.2.11.1508251504160.31004@pyrite
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov
On Mar 3, 2016 4:47 AM, "Michael Paquier"  wrote:
>
> On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
>  wrote:
> > If FDWs existed then Postgres XC/XL were being developed then I believe
they
> > would try to build full-featured prototype of FDW based sharding. If
this
> > prototype succeed then we could make a full roadmap.
>
> Speaking here with my XC hat, that's actually the case. A couple of
> years back when I worked on it, there were discussions about reusing
> FDW routines for the purpose of XC, which would have been roughly
> reusing postgres_fdw + the possibility to send XID, snapshot and
> transaction timestamp to the remote nodes after getting that from the
> GTM (global transaction manager ensuring global data visibility and
> consistency), and have the logic for query pushdown in the FDW itself
> when planning query on what would have been roughly foreign tables
> (not entering in the details here, those would have not been entirely
> foreign tables). At this point the global picture was not completely
> set, XC being based on 9.1~9.2 and the FDW base routines were not as
> extended as they are now. As history has told, this global picture has
> never showed up, though it would should XC have been merged with 9.3.
> The point is that XC would have moved as using the FDW approach, as a
> set of plugins.
>
> This was a reason behind this email of 2013 on -hackers actually:
>
http://www.postgresql.org/message-id/cab7npqtdjf-58wuf-xz01nkj7wf0e+eukggqhd0igvsod4h...@mail.gmail.com

Good to remember!

> Michael
>
>


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov
On Wed, Mar 2, 2016 at 4:36 AM, Tomas Vondra 
wrote:

Hi,
>
> On 03/01/2016 08:02 PM, Bruce Momjian wrote:
>
>> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
>>
>>> Note that I am not saying that other discussed approaches are any
>>> better, I am saying that we should know approximately what we
>>> actually want and not just beat FDWs with a hammer and hope sharding
>>> will eventually emerge and call that the plan.
>>>
>>
>> I will say it again --- FDWs are the only sharding method I can think
>> of that has a chance of being accepted into Postgres core.
>>
>
>
>
> While I disagree with Simon on various things, I absolutely understand why
> he was asking about a prototype, and some sort of analysis of what usecases
> we expect to support initially/later/never, and what pieces are missing to
> get the sharding working. IIRC at the FOSDEM Dev Meeting you've claimed
> you're essentially working on a prototype - once we have the missing FDW
> pieces, we'll know if it works. I disagree that - it's not a prototype if
> it takes several years to find the outcome.
>
>
Fully agree. We probably all need to help build a prototype in the period
between releases. I see no other way to resolve the situation.


>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov
On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas  wrote:

> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:
> > On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> >> > Two reasons:
> >> > 1. There is no ideal implementation of DTM which will fit all
> possible needs
> >> > and be  efficient for all clusters.
> >>
> >> Hmm, what is the reasoning behind that statement?  I mean, it is
> >> certainly true that there are some places where we have decided that
> >> one-size-fits-all is not the right approach.  Indexing, for example.
> >
> > Uh, is that even true of indexing?  While the plug-in nature of indexing
> > allows for easier development and testing, does anyone create plug-in
> > indexing that isn't shipped by us?  I thought WAL support was something
> > that prevented external indexing solutions from working.
>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.  We end up revising the index AM API pretty
>

We'd love to develop a new special index AM; that's why we are all for
pluggable WAL. I think there will be other AM developers once we open the
door for that.


> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.  I suspect that
> a transaction manager API would end up similarly situated.
>

I don't expect many other TM developers, so there is no problem with
improving the API. We started from practical needs and an analysis of many
academic papers. We spent a year playing with several prototypes to prove
our proposed API (expect more in several months). Everybody can download and
test them. I wish we could do the same with the FDW-based sharding solution.

Of course, we could fork Postgres as the XC/XL people did, and we certainly
will have to eventually if the community doesn't accept our proposal, since
it's very difficult to work on cross-release projects. But then there would
be no winners, so why do we keep aggressively misunderstanding each other? I
watched XC/XL for years and decided I didn't want to go down that path of
isolation from the community, so we chose to make the TM pluggable, stay
with the community, and let everybody prove their concepts. If you have
ideas on how to improve the TM API, we are open to them; if you believe it
is broken by design, help us fix it. I have my own understanding of FDW, but
I deliberately don't participate in some of the very hot discussions, simply
because I don't feel committed to working on it. Your group is very
enthusiastic about FDW, and that's fine as long as you improve FDW in a
general way; I'm very happy with the current work. I would prefer that you
show a prototype of the sharding solution that convinces us of its
functionality and performance. I agree with Tomas Vondra that we don't want
to wait years to see the result; we want to judge it based on a prototype,
which should be doable between releases. If you don't have enough resources
for this, let's do it together with the community. As far as I've seen,
nobody is against FDW sharding; people complained about it being presented
as "the only sharding solution" for Postgres, without proof.





>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>


Re: [HACKERS] WIP: Upper planner pathification

2016-03-01 Thread Oleg Bartunov
On Tue, Mar 1, 2016 at 3:11 PM, Teodor Sigaev  wrote:

> The basic point of this patch is to apply the generate-and-compare-Paths
>> paradigm to the planning steps after query_planner(), which only covers
>>
> ...
> > The present patch addresses this problem by inventing Path nodes to
> > represent every post-scan/join step
>
> I'm really glad to see that. Separating path nodes for later steps opens a
> new ways to optimize queries. For first glance, consider
> select * from a left outer join b on a.i = b.i limit 1;
> Limit node could be pushed down to scan over 'a' table if b.i is unique.
>

This patch opens a lot of possibilities for our ongoing project on indexing
subselects, which we plan to use for jsonb. Having it in 9.6 would certainly
facilitate that, so I'm +1 for this patch, even if we have to postpone 9.6 a
bit. I hope Robert, Teodor and the other reviewers can help Tom with it.


>
> --
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>WWW:
> http://www.sigaev.ru/
>
>
>
>


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Oleg Bartunov
On Fri, Feb 26, 2016 at 3:50 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov <obartu...@gmail.com>
> wrote:
> > I already several times pointed, that we need XTM to be able to continue
> > development in different directions, since there is no clear winner.
> > Moreover, I think there is no fits-all  solution and while I agree we
> need
> > one built-in in the core, other approaches should have ability to exists
> > without patching.
>
> I don't think I necessarily agree with that.  Transaction management
> is such a fundamental part of the system that I think making it
> pluggable is going to be really hard.  I understand that you've done
> several implementations based on your proposed API, and that's good as
> far as it goes, but how do we know that's really going to be general
> enough for what other people might need?


Right now the TM is hardcoded, so "what other people might need" doesn't
matter at all. We at least provide developers ("other people") the ability
to work on their own implementations, and the patch is safe and doesn't
sacrifice anything in core.



> And what makes us think we
> really need multiple transaction managers, anyway?



If you are brave enough to claim that one TM fits all, and you can teach the
existing TM to play well in various clustering environments within the
development period, which is short, then we probably don't need multiple
TMs. But that is too perfect to believe in; the practical solution is to let
multiple groups work on their own solutions.



> Even writing one
> good distributed transaction manager seems like a really hard project
> - why would we want to write two or three or five?
>

Again, right now it is simply impossible for any bright person to work on
DTMs. It's time to start working on a DTM, I believe. The fact that you
aren't thinking about distributed transaction support doesn't mean there are
no "other people" who have different ideas about Postgres's future. That's
why we propose this patch: let's play the game!



>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Oleg Bartunov
On Wed, Feb 24, 2016 at 12:17 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi, Bruce!
>
> The important point for me is to distinguish different kind of plans:
> implementation plan and research plan.
> If we're talking about implementation plan then it should be proven that
> proposed approach works in this case. I.e research should be already done.
> If we're talking about research plan then we should realize that result is
> unpredictable. And we would probably need to dramatically change our way.
>
> This two things would work with FDW:
> 1) Pull data from data nodes to coordinator.
> 2) Pushdown computations from coordinator to data nodes: joins, aggregates
> etc.
> It's proven and clear. This is good.
> Another point is that these FDW advances are useful by themselves. This is
> good too.
>
> However, the model of FDW assumes that communication happen only between
> coordinator and data node. But full-weight distributed optimized can't be
> done under this restriction, because it requires every node to communicate
> every other node if it makes distributed query faster. And as I get, FDW
> approach currently have no research and no particular plan for that.
>
> As I get from Robert Haas's talk (
> https://docs.google.com/viewer?a=v=sites=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0
> )
>
>> Before we consider repartitioning joins, we should probably get
>> everything previously discussed working first.
>> – Join Pushdown For Parallelism, FDWs
>> – PartialAggregate/FinalizeAggregate
>> – Aggregate Pushdown For Parallelism, FDWs
>> – Declarative Partitioning
>> – Parallel-Aware Append
>
>
> So, as I get we didn't ever think about possibility of data redistribution
> using FDW. Probably, something changed since that time. But I haven't heard
> about it.
>
> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian  wrote:
>
>> Second, as part of this staged implementation, there are several use
>> cases that will be shardable at first, and then only later, more complex
>> ones.  For example, here are some use cases and the technology they
>> require:
>>
>> 1. Cross-node read-only queries on read-only shards using aggregate
>> queries, e.g. data warehouse:
>>
>> This is the simplest to implement as it doesn't require a global
>> transaction manager, global snapshot manager, and the number of rows
>> returned from the shards is minimal because of the aggregates.
>>
>> 2. Cross-node read-only queries on read-only shards using non-aggregate
>> queries:
>>
>> This will stress the coordinator to collect and process many returned
>> rows, and will show how well the FDW transfer mechanism scales.
>>
>
> FDW would work for queries which fits pull-pushdown model. I see no plan
> to make other queries work.
>
>
>> 3. Cross-node read-only queries on read/write shards:
>>
>> This will require a global snapshot manager to make sure the shards
>> return consistent data.
>>
>> 4. Cross-node read-write queries:
>>
>> This will require a global snapshot manager and global snapshot manager.
>>
>
> At this point, it unclear why don't you refer work done in the direction
> of distributed transaction manager (which is also distributed snapshot
> manager in your terminology)
> http://www.postgresql.org/message-id/56bb7880.4020...@postgrespro.ru
>
>
>> In 9.6, we will have FDW join and sort pushdown
>> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
>> calability.html
>> ).
>> Unfortunately I don't think we will have aggregate
>> pushdown, so we can't test #1, but we might be able to test #2, even in
>> 9.5.  Also, we might have better partitioning syntax in 9.6.
>>
>> We need things like parallel partition access and replicated lookup
>> tables for more join pushdown.
>>
>> In a way, because these enhancements are useful independent of sharding,
>> we have not tested to see how well an FDW sharding setup will work and
>> for which workloads.
>>
>
> This is the point I agree. I'm not objecting against any single FDW
> advance, because it's useful by itself.
>
> We know Postgres XC/XL works, and scales, but we also know they require
>> too many code changes to be merged into Postgres (at least based on
>> previous discussions).  The FDW sharding approach is to enhance the
>> existing features of Postgres to allow as much sharding as possible.
>>
>
> This comparison doesn't seems correct to me. Postgres XC/XL supports data
> redistribution between nodes. And I haven't heard any single idea of
> supporting this in FDW. You are comparing not equal things.
>
>
>> Once that is done, we can see what workloads it covers and
>> decide if we are willing to copy the volume of code necessary
>> to implement all supported Postgres XC or XL workloads.
>> (The Postgres XL license now matches the Postgres license,
>> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
>> 

Re: [HACKERS] [WIP] speeding up GIN build with parallel workers

2016-02-17 Thread Oleg Bartunov
On Wed, Feb 17, 2016 at 6:55 PM, Constantin S. Pan  wrote:

> On Sat, 16 Jan 2016 01:38:39 +0300
> "Constantin S. Pan"  wrote:
>
> > The task of building GIN can require lots of time and eats 100 % CPU,
> > but we could easily make it use more than a 100 %, especially since we
> > now have parallel workers in postgres.
> >
> > The process of building GIN looks like this:
> >
> > 1. Accumulate a batch of index records into an rbtree in maintenance
> > work memory.
> >
> > 2. Dump the batch to disk.
> >
> > 3. Repeat.
> >
> > I have a draft implementation which divides the whole process between
> > N parallel workers, see the patch attached. Instead of a full scan of
> > the relation, I give each worker a range of blocks to read.
> >
> > This speeds up the first step N times, but slows down the second one,
> > because when multiple workers dump item pointers for the same key,
> > each of them has to read and decode the results of the previous one.
> > That is a huge waste, but there is an idea on how to eliminate it.
> >
> > When it comes to dumping the next batch, a worker does not do it
> > independently. Instead, it (and every other worker) sends the
> > accumulated index records to the parent (backend) in ascending key
> > order. The backend, which receives the records from the workers
> > through shared memory, can merge them and dump each of them once,
> > without the need to reread the records N-1 times.
> >
> > In current state the implementation is just a proof of concept
> > and it has all the configuration hardcoded, but it already works as
> > is, though it does not speed up the build process more than 4 times
> > on my configuration (12 CPUs). There is also a problem with temporary
> > tables, for which the parallel mode does not work.
>
> Hey Hackers!
>
> I have made some progress on the proposal (see the attached patch):
>
> 0. Moved some repeated code to functions (e.g. ginDumpAccumulator,
> ginbuildCommon).
>
> 1. Implemented results merging on backend.
>
> 2. Disabled the usage of parallel mode when creating index on temporary
> tables. No point in using parallel mode for temporary tables anyway,
> right?
>
> 3. Added GUC parameter to control the number of workers for GIN
> building.
>
> 4. Hit the 8x speedup limit. Made some analysis of the reasons (see the
> attached plot or the data file).
>
> In order to analyze the performance issues, I have made the following:
>
> create table t (k int, v int[]);
>
> create or replace
> function randarray(width int, low int, high int)
> returns int[] as
> $$
> select array(select (random()*(high-low) + low)::int
> from generate_series(1,width))
> $$ language sql;
>
> insert into t select k, randarray(3000, 0, 10)
> from generate_series(1, 10) as k;
>
> create index t_v_idx on t using gin (v);
>
> This creates 10 arrays of 3000 random numbers each. The random
> numbers are in range [0, 10]. Then I measure how long the gin
> building steps take. There are two steps: scan and merge.
>
> The results show that 'scan' step is sped up perfectly. But the
> 'merge' step takes longer as you increase the number of workers. The
> profiler shows that the bottleneck here is ginMergeItemPointers(), which
> I use to merge the results.
>
> Also, I did encounter the problem with workers deadlocking during
> heap_open, but that seems to have been resolved by Robert Haas in his
> commit regarding group locking.
>
> Please leave your feedback!
>

Here is my feedback (Mac OS X 10.11.3):

set gin_parallel_workers=2;
create index message_body_idx on messages using gin(body_tsvector);
LOG:  worker process: parallel worker for PID 5689 (PID 6906) was
terminated by signal 11: Segmentation fault
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the 

Re: [HACKERS] Figures in docs

2016-02-16 Thread Oleg Bartunov
On Wed, Feb 17, 2016 at 4:17 AM, Tatsuo Ishii  wrote:

> It seems there's no figures/diagrams in our docs. I vaguely recall that
> we used to have a few diagrams in our docs. If so, was there any
> technical reason to remove them?
>

I don't know the reason, but it's a shame we are still in SGML!

We already do our translations (as others do) in XML, using custom
scripting. XML gives us better integration with available tools and the
ability to insert graphics. Last time we asked on -docs about moving to XML,
Alexander demonstrated acceptable XML build speed, but there was no reply
from Peter, who is (?) responsible for our documentation infrastructure.
Probably we should just create a patch and submit it to a commitfest. You
can check this thread:
http://www.postgresql.org/message-id/1428009501118.85...@postgrespro.ru


> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>
>


Re: [HACKERS] tsearch_extras extension

2016-02-16 Thread Oleg Bartunov
On Wed, Feb 17, 2016 at 6:57 AM, Tim Abbott  wrote:

> Just following up here since I haven't gotten a reply -- I'd love to work
> with someone from the Postgres community on a plan to make the
> tsearch_extras functionality available as part of mainline postgres.
>
>
> -Tim Abbott
>
> On Wed, Feb 3, 2016 at 9:41 PM, Tim Abbott  wrote:
>
>> Hello,
>>
>> I'm a maintainer of the Zulip open source group chat application.  Zulip
>> depends on a small (~200 lines) postgres extension called tsearch_extras (
>> https://github.com/zbenjamin/tsearch_extras) that returns the actual
>> (offset, length) pairs of all the matches for a postgres full text search
>> query.  This extension allows Zulip to do its own highlighting of the full
>> text search matches, using a more complicated method than what Postgres
>> supports natively.
>>
>> I think tsearch_extras is probably of value to others using postgres
>> full-text search (and I'd certainly love to get out of the business of
>> maintaining an out-of-tree postgres extension), so I'm wondering if this
>> feature (or a variant of it) would be of interest to upstream?
>>
>> Thanks!
>>
>> -Tim Abbott
>>
>> (See
>> http://www.postgresql.org/message-id/flat/52c7186d.8010...@strangersgate.com#52c7186d.8010...@strangersgate.com
>> for the discussion on postgres mailing lists that caused us to write this
>> module in the first place.)
>>
>
Tim,

take a look at this patch (https://commitfest.postgresql.org/7/385/) and
contact the author. It contains everything you need for your purposes.

By the way, Stas, check the "Returned with feedback" status!
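For context on the gap Tim describes: the closest in-core facility today is
ts_headline, which returns the document text with matches wrapped in markers
rather than the (offset, length) pairs tsearch_extras provides. A minimal
sketch (the sample text and marker options here are just illustrative):

```sql
-- In-core highlighting: matches come back embedded in the returned text,
-- not as (offset, length) pairs, so the application cannot easily do its
-- own rendering; that is the gap tsearch_extras fills.
SELECT ts_headline('english',
                   'PostgreSQL full text search supports highlighting',
                   to_tsquery('english', 'highlight:*'),
                   'StartSel=<<, StopSel=>>');
```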


Regards,
Oleg


Re: [HACKERS] pglogical - logical replication contrib module

2016-02-16 Thread Oleg Bartunov
On Tue, Feb 16, 2016 at 5:38 PM, Bruce Momjian  wrote:

> On Tue, Jan 26, 2016 at 08:14:26PM -0800, Joshua Drake wrote:
> > On 12/31/2015 03:34 PM, Petr Jelinek wrote:
> > >Hi,
> > >
> > >I'd like to submit the replication solution which is based on the
> > >pglogical_output [1] module (which is obviously needed for this to
> > >compile).
> >
> > This is fantastic! However, history presents itself here and
> > PostgreSQL in the past has not "blessed" a single solution for
> > Replication. Obviously that changed a bit with streaming replication
> > but this is a bit different than that. As I understand it, PgLogical
> > is Logical Replication (similar to Slony and Londiste). I wouldn't
> > be surprised (although I don't know) if Slony were to start using
> > some of the pglogical_output module features in the future.
> >
> > If we were to accept PgLogical into core, it will become the default
> > blessed solution for PostgreSQL. While that is great in some ways
> > it is a different direction than the project has taken in the past.
> > Is this what we want to do?
>
> Replying late here, but I think with binary replication, we decided
> that, assuming you were happy with the features provided, our streaming
> binary replication solution was going to be the best and recommended way
> of doing it.
>
> I don't think we ever had that feeling with Slony or Londiste in that
> there were so many limitations and so many different ways of
> implementing logical replication that we never recommended a best way.
>
> So, the question is, do we feel that PgLogical is best and recommended
> way to do logical replication.  If it is, then having it in core makes
> sense.
>

DDL support is what is missing for now.


>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription +
>
>
>


[HACKERS] commitfest problem ?

2016-02-16 Thread Oleg Bartunov
This entry https://commitfest.postgresql.org/8/419/ contains unrelated
patches from another commitfest, I think.


Oleg


Re: [HACKERS] [WIP] ALTER ... OWNER TO ... CASCADE

2016-02-16 Thread Oleg Bartunov
On Mon, Feb 15, 2016 at 7:25 PM, Tom Lane  wrote:

> Teodor Sigaev  writes:
> >> So basically, a generic CASCADE facility sounds like a lot of work to
> >> produce something that would seldom be anything but a foot-gun.
>
> > DELETE FROM  or TRUNCATE could be a foot-gun too, but it's not a reason
> to
> > remove tham. I faced with problem when I tried to change owner of
> datadase with
> > all objects inside. Think, this feature could be useful although it
> should
> > restricted to superuser obly.
>
> That's a pretty weak argument, and I do not think you have thought through
> all the consequences.  It is not hard at all to imagine cases where using
> this sort of thing could be a security vulnerability.  Are you familiar
> with the reasons why Unix systems don't typically allow users to "give
> away" ownership of files?  The same problem exists here.
>

Yes, I remember AT&T and BSD :)



>
> To be concrete about it:
>
> 1. Alice does, say, "CREATE EXTENSION cube".
>
> 2. Bob creates a security-definer function owned by himself, using a
>"cube"-type parameter so that it's dependent on the extension.
>(It needn't actually do anything with that parameter.)
>
> 3. Alice does ALTER EXTENSION cube OWNER TO charlie CASCADE.
>
> 4. Bob now has a security-definer function owned by (and therefore
>executing as) Charlie, whose contents were determined by Bob.
>Game over for Charlie ... and for everyone else too, if Charlie is
>a superuser, which is not unlikely for an extension owner.
>
> The only way Alice can be sure that the ALTER EXTENSION is safe is if
> she manually inspects every dependent object, in which case she might
> as well not use CASCADE.
>
> Moreover, the use case you've sketched (ie, change ownership of all
> objects inside a database) doesn't actually have anything to do with
> following dependencies.  It's a lot closer to REASSIGN OWNED ... in
> fact, it's not clear to me why REASSIGN OWNED doesn't solve that
> use-case already.
>
> I remain of the opinion that this is a terrible idea.
>

+1. I also suggest checking REASSIGN OWNED.
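For the change-everything-in-a-database use case Teodor mentioned, REASSIGN
OWNED already covers it by following ownership rather than dependencies
(role names here are made up):

```sql
-- Run inside the database in question; transfers ownership of all objects
-- owned by role "alice" in the current database to role "charlie".
REASSIGN OWNED BY alice TO charlie;
```

Unlike a dependency-walking CASCADE, this cannot be tricked into giving away
objects created by some other user, which avoids the security-definer trap
Tom describes above.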


>
>
regards, tom lane
>
>
>


Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Oleg Bartunov
On Wed, Feb 10, 2016 at 7:21 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Oleg Bartunov <obartu...@gmail.com> writes:
> > It  looks like there is a very old bug in full text parser (somebody
> > pointed me on it), which appeared after moving tsearch2 into the core.
> The
> > problem is in how full text parser process hyphenated words. Our original
> > idea was to report hyphenated word itself as well as its parts and ignore
> > hyphen. That was how tsearch2 works.
>
> > This behaviour was changed after moving tsearch2 into the core:
> > 1. hyphen now reported by parser, which is useless.
> > 2.  Hyphenated words with numbers ('4-dot', 'dot-4')  processed
> differently
> > than ones with plain text words like 'four-dot', no hyphenated word
> itself
> > reported.
>
> > I think we should consider this as a bug and produce fix for all
> supported
> > versions.
>
> I don't see anything here that looks like a bug, more like a definition
> disagreement.  As such, I'd be pretty dubious about back-patching a
> change.  But it's hard to debate the merits when you haven't said exactly
> what you'd do instead.
>

Yeah, better to call it not a bug but an inconsistency. We definitely should
work on a better, "consistent" parser with predictable behaviour.


>
> I believe the commit you mention was intended to fix this inconsistency:
>
> http://www.postgresql.org/message-id/6269.1193184...@sss.pgh.pa.us
>
> so I would be against simply reverting it.  In any case, the examples
> given there make it look like there was already inconsistency about mixed
> words and numbers.  Do we really think that "4-dot" should be considered
> a hyphenated word?  I'm not sure.
>

I agree that we shouldn't just revert it. My idea is to work on a new parser
and leave the old one as-is for compatibility reasons. Fortunately, FTS is
flexible enough that we can add a new parser at any time as an extension.



>
> regards, tom lane
>


Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Oleg Bartunov
On Wed, Feb 10, 2016 at 7:45 PM, Mike Rylander <mrylan...@gmail.com> wrote:

> On Wed, Feb 10, 2016 at 4:28 AM, Oleg Bartunov <obartu...@gmail.com>
> wrote:
> > It  looks like there is a very old bug in full text parser (somebody
> pointed
> > me on it), which appeared after moving tsearch2 into the core.  The
> problem
> > is in how full text parser process hyphenated words. Our original idea
> was
> > to report hyphenated word itself as well as its parts and ignore hyphen.
> > That was how tsearch2 works.
> >
> > This behaviour was changed after moving tsearch2 into the core:
> > 1. hyphen now reported by parser, which is useless.
> > 2.  Hyphenated words with numbers ('4-dot', 'dot-4')  processed
> differently
> > than ones with plain text words like 'four-dot', no hyphenated word
> itself
> > reported.
> >
> > I think we should consider this as a bug and produce fix for all
> supported
> > versions.
> >
>
> The Evergreen project has long depended on tsearch2 (both as an
> extension and in-core FTS), and one thing we've struggled with is date
> range parsing such as birth and death years for authors in the form of
> 1979-2014, for instance.  Strings like that end up being parsed as two
> lexems, "1979" and "-2014".  We work around this by pre-normalizing
> strings matching /(\d+)-(\d+)/ into two numbers separated by a space
> instead of a hyphen, but if fixing this bug would remove the need for
> such a preprocessing step it would be a great help to us.  Would such
> strings be parsed "properly" into lexems of the form of "1979" and
> "2014" with you proposed change?
>
>
I'd love to treat all hyphenated "words" in one way, regardless of what
counts as "a word" (a number or plain text); namely, 'w1-w2' should be
reported as {'w1-w2', 'w1', 'w2'}. The problem is in the definition of
"word".


We'll definitely look at the parser again; fortunately, we could just fork
the default parser and develop a new one so as not to break compatibility.
You have a chance to help us produce a "consistent" view of which tokens
the new parser should recognize and how to process them.
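To make the intended rule concrete, here is a minimal sketch in Python
(illustrative only; this is not the actual parser code, and the function
name is made up):

```python
# Intended tokenization rule: a hyphenated word yields the whole word
# plus its parts, and the hyphen itself is never reported as a token.
# (Hypothetical helper, not PostgreSQL code.)
def tokenize_hyphenated(word):
    parts = word.split('-')
    # Not a well-formed hyphenated word: report it unchanged.
    if len(parts) < 2 or not all(parts):
        return [word]
    # Whole word first, then each part; no token for the hyphen.
    return [word] + parts

print(tokenize_hyphenated('dot-four'))   # ['dot-four', 'dot', 'four']
print(tokenize_hyphenated('4-dot'))      # ['4-dot', '4', 'dot']
print(tokenize_hyphenated('1979-2014'))  # ['1979-2014', '1979', '2014']
```

Under this rule, numeric and plain-text parts are treated uniformly, which
would also cover the 1979-2014 date-range case above.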





> Thanks!
>
> --
> Mike Rylander
>
> > After  investigation we found this commit:
> >
> > commit 73e6f9d3b61995525785b2f4490b465fe860196b
> > Author: Tom Lane <t...@sss.pgh.pa.us>
> > Date:   Sat Oct 27 19:03:45 2007 +
> >
> > Change text search parsing rules for hyphenated words so that digit
> > strings
> > containing decimal points aren't considered part of a hyphenated
> word.
> > Sync the hyphenated-word lookahead states with the subsequent
> > part-by-part
> > reparsing states so that we don't get different answers about how
> much
> > text
> > is part of the hyphenated word.  Per my gripe of a few days ago.
> >
> >
> > 8.2.23
> >
> > select tok_type, description, token from ts_debug('dot-four');
> >   tok_type   |  description  |  token
> > -+---+--
> >  lhword  | Latin hyphenated word | dot-four
> >  lpart_hword | Latin part of hyphenated word | dot
> >  lpart_hword | Latin part of hyphenated word | four
> > (3 rows)
> >
> > select tok_type, description, token from ts_debug('dot-4');
> >   tok_type   |  description  | token
> > -+---+---
> >  hword   | Hyphenated word   | dot-4
> >  lpart_hword | Latin part of hyphenated word | dot
> >  uint| Unsigned integer  | 4
> > (3 rows)
> >
> > select tok_type, description, token from ts_debug('4-dot');
> >  tok_type |   description| token
> > --+--+---
> >  uint | Unsigned integer | 4
> >  lword| Latin word   | dot
> > (2 rows)
> >
> > 8.3.23
> >
> > select alias, description, token from ts_debug('dot-four');
> >   alias  |   description   |  token
> > -+-+--
> >  asciihword  | Hyphenated word, all ASCII  | dot-four
> >  hword_asciipart | Hyphenated word part, all ASCII | dot
> >  blank   | Space symbols   | -
> >  hword_asciipart | Hyphenated word part, all ASCII | four
> > (4 rows)
> >
> > select alias, description, token from ts_debug('dot-4');
> >alias   |   description   | token
> > ---+-+---
> >  asciiword | Word, all ASCII | dot
> >  int   | Signed integer  | -4
> > (2 rows)
> >
> > select alias, description, token from ts_debug('4-dot');
> >alias   |   description| token
> > ---+--+---
> >  uint  | Unsigned integer | 4
> >  blank | Space symbols| -
> >  asciiword | Word, all ASCII  | dot
> > (3 rows)
> >
> >
> > Regards,
> > Oleg
>


Re: [HACKERS] old bug in full text parser

2016-02-10 Thread Oleg Bartunov
On Wed, Feb 10, 2016 at 12:28 PM, Oleg Bartunov <obartu...@gmail.com> wrote:

> It looks like there is a very old bug in the full text parser (somebody
> pointed me to it), which appeared after moving tsearch2 into the core. The
> problem is in how the full text parser processes hyphenated words. Our
> original idea was to report the hyphenated word itself as well as its
> parts, and to ignore the hyphen. That is how tsearch2 worked.
>
> This behaviour was changed after moving tsearch2 into the core:
> 1. The hyphen is now reported by the parser, which is useless.
> 2. Hyphenated words with numbers ('4-dot', 'dot-4') are processed
> differently from ones with plain text words like 'four-dot'; no hyphenated
> word itself is reported.
>
> I think we should consider this a bug and produce a fix for all supported
> versions.
>
> After  investigation we found this commit:
>
> commit 73e6f9d3b61995525785b2f4490b465fe860196b
> Author: Tom Lane <t...@sss.pgh.pa.us>
> Date:   Sat Oct 27 19:03:45 2007 +
>
> Change text search parsing rules for hyphenated words so that digit
> strings
> containing decimal points aren't considered part of a hyphenated word.
> Sync the hyphenated-word lookahead states with the subsequent
> part-by-part
> reparsing states so that we don't get different answers about how much
> text
> is part of the hyphenated word.  Per my gripe of a few days ago.
>
>
> 8.2.23
>
> select tok_type, description, token from ts_debug('dot-four');
>   tok_type   |  description  |  token
> -+---+--
>  lhword  | Latin hyphenated word | dot-four
>  lpart_hword | Latin part of hyphenated word | dot
>  lpart_hword | Latin part of hyphenated word | four
> (3 rows)
>
> select tok_type, description, token from ts_debug('dot-4');
>   tok_type   |  description  | token
> -+---+---
>  hword   | Hyphenated word   | dot-4
>  lpart_hword | Latin part of hyphenated word | dot
>  uint| Unsigned integer  | 4
> (3 rows)
>
> select tok_type, description, token from ts_debug('4-dot');
>  tok_type |   description| token
> --+--+---
>  uint | Unsigned integer | 4
>  lword| Latin word   | dot
> (2 rows)
>
> 8.3.23
>
> select alias, description, token from ts_debug('dot-four');
>   alias  |   description   |  token
> -+-+--
>  asciihword  | Hyphenated word, all ASCII  | dot-four
>  hword_asciipart | Hyphenated word part, all ASCII | dot
>  blank   | Space symbols   | -
>  hword_asciipart | Hyphenated word part, all ASCII | four
> (4 rows)
>
> select alias, description, token from ts_debug('dot-4');
>alias   |   description   | token
> ---+-+---
>  asciiword | Word, all ASCII | dot
>  int   | Signed integer  | -4
> (2 rows)
>
> select alias, description, token from ts_debug('4-dot');
>alias   |   description| token
> ---+--+---
>  uint  | Unsigned integer | 4
>  blank | Space symbols| -
>  asciiword | Word, all ASCII  | dot
> (3 rows)
>
>

Oh, one more bug, which existed even in tsearch2.

select tok_type, description, token from ts_debug('4-dot');
 tok_type |   description| token
--+--+---
 uint | Unsigned integer | 4
 lword| Latin word   | dot
(2 rows)




>
> Regards,
> Oleg
>


[HACKERS] old bug in full text parser

2016-02-10 Thread Oleg Bartunov
It looks like there is a very old bug in the full text parser (somebody
pointed me to it), which appeared after moving tsearch2 into the core. The
problem is in how the full text parser processes hyphenated words. Our
original idea was to report the hyphenated word itself as well as its
parts, and to ignore the hyphen. That is how tsearch2 worked.

This behaviour was changed after moving tsearch2 into the core:
1. The hyphen is now reported by the parser, which is useless.
2. Hyphenated words with numbers ('4-dot', 'dot-4') are processed
differently from ones with plain text words like 'four-dot'; no hyphenated
word itself is reported.

I think we should consider this a bug and produce a fix for all supported
versions.

After investigation we found this commit:

commit 73e6f9d3b61995525785b2f4490b465fe860196b
Author: Tom Lane 
Date:   Sat Oct 27 19:03:45 2007 +

Change text search parsing rules for hyphenated words so that digit
strings
containing decimal points aren't considered part of a hyphenated word.
Sync the hyphenated-word lookahead states with the subsequent
part-by-part
reparsing states so that we don't get different answers about how much
text
is part of the hyphenated word.  Per my gripe of a few days ago.


8.2.23

select tok_type, description, token from ts_debug('dot-four');
  tok_type   |  description  |  token
-+---+--
 lhword  | Latin hyphenated word | dot-four
 lpart_hword | Latin part of hyphenated word | dot
 lpart_hword | Latin part of hyphenated word | four
(3 rows)

select tok_type, description, token from ts_debug('dot-4');
  tok_type   |  description  | token
-+---+---
 hword   | Hyphenated word   | dot-4
 lpart_hword | Latin part of hyphenated word | dot
 uint| Unsigned integer  | 4
(3 rows)

select tok_type, description, token from ts_debug('4-dot');
 tok_type |   description| token
--+--+---
 uint | Unsigned integer | 4
 lword| Latin word   | dot
(2 rows)

8.3.23

select alias, description, token from ts_debug('dot-four');
  alias  |   description   |  token
-+-+--
 asciihword  | Hyphenated word, all ASCII  | dot-four
 hword_asciipart | Hyphenated word part, all ASCII | dot
 blank   | Space symbols   | -
 hword_asciipart | Hyphenated word part, all ASCII | four
(4 rows)

select alias, description, token from ts_debug('dot-4');
   alias   |   description   | token
---+-+---
 asciiword | Word, all ASCII | dot
 int   | Signed integer  | -4
(2 rows)

select alias, description, token from ts_debug('4-dot');
   alias   |   description| token
---+--+---
 uint  | Unsigned integer | 4
 blank | Space symbols| -
 asciiword | Word, all ASCII  | dot
(3 rows)


Regards,
Oleg


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-02-02 Thread Oleg Bartunov
On Tue, Feb 2, 2016 at 10:18 AM, Andreas Joseph Krogh <andr...@visena.com>
wrote:

> On Tuesday, 2 February 2016 at 04:22:57, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>
>
> On Mon, Feb 1, 2016 at 8:21 PM, Dmitry Ivanov <d.iva...@postgrespro.ru>
> wrote:
>>
>> This patch was originally developed by Teodor Sigaev and Oleg Bartunov in
>> 2009, so all credit goes to them. Any feedback is welcome.
>
> This is not a small patch:
> 28 files changed, 2441 insertions(+), 380 deletions(-)
> And the last CF of 9.6 should not contain rather large patches.
> --
> Michael
>
>
>
> OTOH; It would be extremely nice to get this into 9.6.
>

We'll see what the community decides.
Anyway, it's already in our distribution.



>
>
> --
> *Andreas Joseph Krogh*
>


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-02-02 Thread Oleg Bartunov
On Tue, Feb 2, 2016 at 2:04 PM, Alvaro Herrera 
wrote:

> Andreas Joseph Krogh wrote:
>
> > Which seems to indicate it has received a fair amount of testing and is
> quite
> > stable.
> > Hopefully it integrates into the 9.6 codebase without too much risk.
>
> Yes, yes, that's all very good, but we're nearing the closure of the 9.6
> development cycle and we only have one commitfest left.  If someone had
> lots of community brownie points because of doing lots of reviews of
> other people's patches, they might push their luck by posting this patch
> to the final commitfest.  But if that someone didn't, then it wouldn't
> be fair, and if I were the commitfest manager of that commitfest I would
> boot their patch to the 9.7-First commitfest.
>
> The current commitfest which I'm trying to close still has 24 patches in
> needs-review state and 11 patches ready-for-committer; the next one (not
> closed yet) has 40 patches that will need review.  That means a total of
> 75 patches, and those should all be processed ahead of this one.  The
> effort needed to process each of those patches is not trivial, and I'm
> sorry I have to say this but I don't see PostgresPro contributing enough
> reviews, even though I pinged a number of people there, so putting one
> more patch on the rest of the community's shoulders doesn't seem fair to
> me.
>

I'll talk about this.


>
> Everybody has their favorite patch that they want in the next release,
> but we only have so much manpower to review and integrate those patches.
> All review help is welcome.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Oleg Bartunov
On Fri, Jan 29, 2016 at 1:11 PM, Alvaro Herrera 
wrote:

> Artur Zakirov wrote:
>
> > What status of this patch? In commitfest it is "Needs review".
>
> "Needs review" means it needs a reviewer to go over it and, uh, review
> it.  Did I send an email to you prodding you to review patches?  I sent
> such an email to several people from PostgresPro, but I don't remember
> getting a response from anyone, and honestly I don't see you guys/gal
> doing much review on-list.  If you can please talk to your colleagues so
> that they look over your patch, while at the same time your review their
> patches, that would help not only this one patch but everyone else's
> patches as well.
>

I think Teodor is planning to review these patches.


>
> > Can this patch get the status "Ready for Commiter"?
>
> Sure, as soon as it has gotten enough review to say it's past the "needs
> review" phase.  Just like all patches.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


[HACKERS] thanks for FOSDEM/PGDay 2016 Developer Meeting

2016-01-28 Thread Oleg Bartunov
I read https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting
and would like to say thanks for such a nice summary of the meeting.

Oleg


Re: [HACKERS] New committer

2016-01-28 Thread Oleg Bartunov
On Thu, Jan 28, 2016 at 5:37 PM, Magnus Hagander 
wrote:

> Hello!
>
> The PostgreSQL core team would like to welcome Dean Rasheed as a new
> committer for the PostgreSQL project.
>
> Dean - welcome! Now let's see how quickly you can break the buildfarm!
>

Congratulations, of course !

Next time I'd like to see a short review of the new committer, so other
developers could see what they are missing :)


>
> --
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/
>


[HACKERS] strange behaviour of psql \e command

2016-01-01 Thread Oleg Bartunov
Recently,

I noticed that psql's \e command didn't work for me, complaining 'error
opening file'. I did some research and found that the following setting in
joe's configuration (if I choose joe as the editor) causes the problem:
-exask ^KX always confirms file name


That worked for decades :), so I'm wondering what changed, either in El
Capitan or in the psql code?

There is no problem with vi.

I use homebrew on El Capitan.

Oleg


Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2015-12-30 Thread Oleg Bartunov
On Wed, Dec 30, 2015 at 5:44 PM, Andres Freund  wrote:

> On 2015-12-30 11:37:19 -0300, Alvaro Herrera wrote:
> > Aleksander Alekseev wrote:
> >
> > > Here is a funny thing - benchmark results I shared 22.12.2015 are wrong
> > > because I forgot to run `make clean` after changing lwlock.h (autotools
> > > doesn't rebuild project properly after changing .h files).
> >
> > Running configure with --enable-depend should avoid this problem.
>
> I still maintain that --enable-depend should be on by default. We're
>

+1


> absurdly optimizing towards saving a handful of cycles in scenarios
> which are usually bottlenecked by other things (build boxes spend more
> time on tests and such), rather than optimizing for developer time. I
> don't know how many people have failed to set --enable-depend by now, but
> it definitely goes into several-hundreds-of-wasted-hours territory.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-19 Thread Oleg Bartunov
Tomas,

Have you seen
http://www.postgresql.org/message-id/4b4dd67f.9010...@sigaev.ru ?
I have a very limited internet connection (no graphics), so I may have
missed something.

Oleg

On Wed, Dec 16, 2015 at 4:15 AM, Tomas Vondra 
wrote:

> Hi,
>
> while working on the Hash Join improvements, I've been repeatedly running
> into the idea of bloom filter - various papers on hash joins mention bloom
> filters as a way to optimize access to the hash table by doing fewer
> lookups, etc.
>
> Sadly, I've been unable to actually see any real benefit of using a bloom
> filter, which I believe is mostly due to NTUP_PER_BUCKET=1, which makes the
> lookups much more efficient (so the room for bloom filter improvements is
> narrow).
>
> The one case where bloom filter might still help, and that's when the
> bloom filter fits into L3 cache (a few MBs) while the hash table (or more
> accurately the buckets) do not. Then there's a chance that the bloom filter
> (which needs to do multiple lookups) might help.
>
> But I think there's another case where bloom filter might be way more
> useful in Hash Join - when we do batching. What we do currently is that we
> simply
>
> 1) build the batches for the hash table (inner relation)
>
> 2) read the outer relation (usually the larger one), and split it
>into batches just like the hash table
>
> 3) while doing (2) we join the first batch, and write the remaining
>batches to disk (temporary files)
>
> 4) we read the batches one by one (for both tables) and do the join
>
> Now, imagine that only some of the rows in the outer table actually match
> a row in the hash table. Currently, we do write those rows into the
> temporary file, but with a bloom filter on the whole hash table (all the
> batches at once) we can skip that for some types of joins.
>
> For inner join we can immediately discard the outer row, for left join we
> can immediately output the row. In both cases we can completely eliminate
> the overhead with writing the tuple to the temporary file and then reading
> it again.
>
> The attached patch is a PoC of this approach - I'm pretty sure it's not
> perfectly correct (e.g. I only tried it with inner join), but it's good
> enough for demonstrating the benefits. It's rather incomplete (see the end
> of this e-mail), and I'm mostly soliciting some early feedback at this
> point.
>
> The numbers presented here are for a test case like this:
>
> CREATE TABLE dim (id INT, dval TEXT);
> CREATE TABLE fact (id INT, fval TEXT);
>
> INSERT INTO dim SELECT i, md5(i::text)
>   FROM generate_series(1,1000) s(i);
>
> -- repeat 10x
> INSERT INTO fact SELECT * FROM dim;
>
> and a query like this
>
> SELECT COUNT(fval) FROM fact JOIN dim USING (id) WHERE dval < 'a';
>
> with different values in the WHERE condition to select a fraction of the
> inner 'dim' table - this directly affects what portion of the 'fact' table
> has a matching row, and also the size of the hash table (and number of
> batches).
>
> Now, some numbers from a machine with 8GB of RAM (the 'fact' table has
> ~6.5GB of data, so there's actually quite a bit of memory pressure, forcing
> the temp files to disk etc.).
>
> With work_mem=16MB, it looks like this:
>
> batches  filter  select.  bloom   master  bloom/master
> -------------------------------------------------------
>    4       1      6.25%   23871   48631     49.09%
>    8       2     12.50%   25752   56692     45.42%
>    8       3     18.75%   31273   57455     54.43%
>   16       4     25.01%   37430   62325     60.06%
>   16       5     31.25%   39005   61143     63.79%
>   16       6     37.50%   46157   63533     72.65%
>   16       7     43.75%   53500   65483     81.70%
>   32       8     49.99%   53952   65730     82.08%
>   32       9     56.23%   55187   67521     81.73%
>   32       a     62.49%   64454   69448     92.81%
>   32       b     68.73%   66937   71692     93.37%
>   32       c     74.97%   73323   72060    101.75%
>   32       d     81.23%   76703   73513    104.34%
>   32       e     87.48%   81970   74890    109.45%
>   32       f     93.74%   86102   76257    112.91%
>
> The 'batches' means how many batches were used for the join, 'filter' is
> the value used in the WHERE condition, selectivity is the fraction of the
> 'dim' table that matches the condition (and also the 'fact'). Bloom and
> master are timings of the query in milliseconds, and bloom/master is
> comparison of the runtimes - so for example 49% means the hash join with
> bloom filter was running ~2x as fast.
>
> Admittedly, work_mem=16MB is quite low, but that's just a way to force
> batching. What really matters is the number of batches and selectivity (how
> many tuples we can eliminate 
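The idea quoted above (build a Bloom filter over the join keys of all inner
batches at once, then use it to discard non-matching outer rows before they
are spilled to temp files) can be sketched roughly as follows. This is an
illustrative Python sketch under invented names, not the actual patch, which
works inside the executor's hash join code:

```python
# Sketch of a Bloom filter used during hash-join batching: outer rows
# whose key cannot be in the hash table are dropped (inner join case)
# instead of being written to a per-batch temp file.
import hashlib

class BloomFilter:
    def __init__(self, nbits=1 << 16, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, key):
        # Derive nhashes independent bit positions from the key.
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # No false negatives; false positives are possible.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# Build the filter over ALL inner (hash table) batches at once ...
inner_keys = [1, 2, 3]
bf = BloomFilter()
for k in inner_keys:
    bf.add(k)

# ... then, while splitting the outer relation into batches, discard
# rows that cannot match instead of spilling them to a temp file.
outer_rows = [(1, 'a'), (4, 'b'), (2, 'c'), (9, 'd')]
kept = [row for row in outer_rows if bf.might_contain(row[0])]
# Rows with keys 4 and 9 are dropped unless they hit a (rare) false
# positive; rows with keys 1 and 2 are always kept.
```

For a left join the same test would let the executor emit the outer row
immediately instead of discarding it; either way the temp-file write and
re-read are avoided.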

Re: [HACKERS] Disabling an index temporarily

2015-12-12 Thread Oleg Bartunov
On Sun, Dec 13, 2015 at 1:16 AM, Jaime Casanova <
jaime.casan...@2ndquadrant.com> wrote:

> indexrelid = 'indexname'::regclass;


This works, but it might bloat the system catalog.


[HACKERS] new full text search configurations

2015-11-17 Thread Oleg Bartunov
I checked new snowball site http://snowballstem.org/ and found several new
stemmers appeared (as external contributions):


   - Irish and Czech
   - Object Pascal codegenerator for Snowball
   - Two stemmers for Romanian
   - Hungarian
   - Turkish
   - Armenian
   - Basque (Euskera)
   - Catalan

We don't have some of these in our list of default configurations. Since
these are external, not official, stemmers, it'd be nice if people looked
at and tested them. If they are fine, we can prepare new configurations
for 9.6.

 \dF
   List of text search configurations
   Schema   |Name|  Description
++---
 pg_catalog | danish | configuration for danish language
 pg_catalog | dutch  | configuration for dutch language
 pg_catalog | english| configuration for english language
 pg_catalog | finnish| configuration for finnish language
 pg_catalog | french | configuration for french language
 pg_catalog | german | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian| configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian| configuration for russian language
 pg_catalog | simple | simple configuration
 pg_catalog | spanish| configuration for spanish language
 pg_catalog | swedish| configuration for swedish language
 pg_catalog | turkish| configuration for turkish language
 public | english_ns |
(17 rows)


Re: [HACKERS] Note about comparation PL/SQL packages and our schema/extensions

2015-11-05 Thread Oleg Bartunov
On Thu, Nov 5, 2015 at 9:36 AM, Pavel Stehule 
wrote:

> Hi
>
> I had talk about possibility to implement PL/SQL packages in Postgres.
>
> The package concept is coming from ADA language and it is partially
> foreign/redundant element in SQL world. Oracle needs it for modularization,
> because schema plays different role there than in Postgres. My opinion
> about packages in Postgres is clean - the concept of schemas and extension
> is simple and just work. I don't see any big gap there. If we don't play
> Oracle compatibility game, then we don't need to implement class like
> Oracle package. But there are few features, that can help to PL/pgSQL
> developers - generally or with porting from Oracle.
>
> 1. The encapsulation and local scope - all objects in schema are
> accessible from other objects in schema  by default (can be rewritten by
> explicit granting). Local objects are visible only from objects in schema.
> This needs enhancing of our search_path mechanism.
>
> 2. The schema variables - a server side session (can be emulated now) and
> server side local schema session variables (doesn't exist) is pretty useful
> for storing some temp data or high frequent change data - and can
> significantly increase speed of some use cases. Now we emulate it via
> PLPerl shared array, but the encapsulation is missing.
>
> 3. The initialization routines - the routines called when any object from
> schema is used first time.
>
> All three features we can emulate relative simply in C, and probably for
> all mentioned points we have some workaround (less/more ugly) for PL/pgSQL.
> Can be nice do it cleanly in PLpgSQL too.
>

I'd say go ahead! Package support is one of the most requested features
from people migrating from Oracle.


>
> I don't think we need ADA/ | PL/SQL Syntax - we can enhance our extension
> mechanism to support mentioned points.
>
> Comments, notes?
>
> Regards
>
> Pavel
>


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2015-10-31 Thread Oleg Bartunov
On Sat, Oct 31, 2015 at 9:49 PM, Alexander Lebedev  wrote:

> Hello, Hacker.
>
> * [PATCH] add a box index to sp-gist
>
>   We have extended sp-gist with an index that keeps track of boxes
>
>   We have used ideas underlying sp-gist range implementation to
>   represent 2D boxes as points in 4D space. We use quad tree
>   analogue, but in 4-dimensional space. We call this tree q4d. Each
>   node of this tree is a box (a point in 4D space) which splits space
>   in 16 hyperrectangles.
>
>   Rationale: r-tree assumes that boxes we're trying to index don't
>   overlap much. When this assumption fails, r-tree performs badly,
>   while our proposal to represent a rectangle as a point in 4D space
>   solves this problem.
>
>   NB: the index uses traversalValue introduced in a separate patch.
>
> * [PATCH] add traversalValue in sp-gist
>
>   During implementation of box index for sp-gist we saw that we only
>   keep rectangles, but to determine traversal direction we may need
>   to know boundaries of a hyperrectangle. So we calculate them
>   on the fly and store them in traversalValue, available
>   when traversing child nodes, because otherwise we wouldn't be able to
>   calculate them from inside the inner_consistent function call.
>
>   This patch was written by Teodor Sigaev.
>
> * [PATCH] change reconstructValue -> traversalValue in range_spgist.c
>
>   During traversal, local context is
>   insufficient to pick traversal direction. As of now, this is worked
>   around with the help of reconstructValue. reconstructValue only
>   stores data of the same type as a tree node, that is, a range.
>
>   We have written a patch that calculates auxiliary values and makes
>   them accessible during traversal.
>
>   We then use traversalValue in a new box index and change
>   range_spgist.c to use it in place of reconstructValue.
>
>   NB: apply this patch after traversalValue patch.
>
>
Did you forget to show us the performance numbers?



>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] Did the "Full-text search in PostgreSQL in milliseconds" patches land?

2015-10-30 Thread Oleg Bartunov
On Thu, Oct 29, 2015 at 5:31 PM, Colin 't Hart  wrote:

> Hi,
>
> I've been reading
>
> wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf
> with interest and am wondering if these patches ever made it in to the
> "official" version of Postgresql?
>

Not all of the patches have been committed. There are two more patches we
need to convince the community to accept. We have them rebased against
head.



>
> I've tried doing some of the queries as described in the slides using
> 9.5b1 but I get the "No operator matches the given name and argument
> type(s)." error.
>
> Thanks,
>
> Colin
>


Re: [HACKERS] Postgres service stops when I kill client backend on Windows

2015-10-12 Thread Oleg Bartunov
On Mon, Oct 12, 2015 at 4:42 PM, Dmitry Vasilyev 
wrote:

> Hello, Amit!
>
> On Mon, 2015-10-12 at 11:25 +0530, Amit Kapila wrote:
>
> On Sun, Oct 11, 2015 at 9:12 PM, Tom Lane  wrote:
> >
> > Magnus Hagander  writes:
> > > On Sun, Oct 11, 2015 at 5:22 PM, Tom Lane  wrote:
> > >> I'm a bit suspicious that we may have leaked a handle to the shared
> > >> memory block someplace, for example.  That would explain why this
> > >> symptom is visible now when it was not in 2009.  Or maybe it's
> dependent
> > >> on some feature that we didn't test back then --- for instance, if
> > >> the logging collector is in use, could it have inherited a handle and
> > >> not closed it?
> >
> > > Even if we leaked it, it should go away when the other processes died.
> >
> > I'm fairly certain that we do not kill/restart the logging collector
> > during a database restart (because it's impossible to reproduce the
> > original stderr destination if we do).
>
> True and it seems this is the reason for issue we are discussing here.
> The reason why this happens is that during creation of shared memory
> (PGSharedMemoryCreate()), we duplicate the handle such that it
> become inheritable by all child processes.  Then during fork
> (syslogger_forkexec()->postmaster_forkexec()->internal_forkexec) we
> always inherit the handles which causes syslogger to get a copy of
> shared memory handle which it neither uses and nor closes it.
>
> I could easily reproduce the issue if logging collector is on and even if
> we try to increase the loop count or sleep time in PGSharedMemoryCreate(),
> it doesn't change the situation as the syslogger has a valid handle to
> shared memory.  One way to fix is to just close the shared memory handle
> in sys logger as we are not going to need it and attached patch which does
> this fixes the issue for me.  Another invasive fix in case we want to
> retain shared memory handle for some purpose (future requirement) could
> be to send some signal to syslogger in restart path so that it can release
> the shared memory handle.
>
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
> The specified patch with "ifdef WIN32" is working for me. Maybe it's
> necessary to check open handles from replication, for example?
>
>
Assuming the problem will be fixed, should we release Beta2 soon?



>
> --
> Dmitry Vasilyev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] bugs and bug tracking

2015-10-08 Thread Oleg Bartunov
On Thu, Oct 8, 2015 at 8:11 PM, Nathan Wagner  wrote:

> On Wed, Oct 07, 2015 at 03:06:50PM -0400, Stephen Frost wrote:
> > * Nathan Wagner (nw...@hydaspes.if.org) wrote:
> > > I have added full text searching to my tracker.  I only index the first
> > > 50 KB of each message.  There's apparently a one MB limit on that
> > > anyway, which a few messages exceed.  I figure anything important is
> > > probably in the first 50KB.  I could be wrong.  I could re-index fairly
> > > easily.  It seems to work pretty well.
>

We have a patch which eliminates the 1MB limit; it will be published soon.


> >
> > Note that we have FTS for the -bugs, and all the other, mailing lists..
>
> True, but that finds emails.  The search I have finds bugs (well, bug
> reports
> anyway).  Specifically, I have the following function:
>
> create or replace function bugvector(bugid bigint)
> returns tsvector language 'sql' as $$
> select tsvagg(
> setweight(to_tsvector(substr(body(msg), 1, 50*1024)), 'D')
> ||
> setweight(to_tsvector(header_value(msg, 'Subject')), 'C')
> )
> from emails
> where bug = $1
> $$ strict;
>
> which, as you can see, collects into one tsvector all the emails associated
> with that particular bug.  So a search hit is for the whole bug.  There's
> probably some search artifacts here.  I suspect a bug with a long email
> thread
> will be ranked higher than a one with a short thread.  Perhaps that's ok
> though.
>
>
It's possible to write a bug-specific parser for FTS. Also, order the
results by date submitted, so we will always have the originating message
first.



> --
> nw
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] No Issue Tracker - Say it Ain't So!]

2015-10-05 Thread Oleg Bartunov
On Mon, Oct 5, 2015 at 3:08 AM, Nathan Wagner  wrote:

> On Sun, Oct 04, 2015 at 04:30:49PM -0700, Josh Berkus wrote:
> > That would be the key part, wouldn't it?  Nice that you have [code to
> > store and parse email messages].
>
> Yeah.  It actually made most of the work pretty easy.  It's available
> with a bunch of other code at https://pd.if.org/git/ if anyone wants it.
> I did find a bug in my header processing though, so I'll need to commit
> that fix.
>
> > We'd also want a way to link a bug fix to a commit, and probably a way
> > to give the bug a list of searchable keywords (and add to that list).
>
> I've been thinking of hooking it up to the fti machinery and providing
> a search box.  I've never really used fti before, so this might be a
> good opportunity to learn it for real.
>

Nathan, there are many options in our FTS which can be useful, so +1 to
helping you.



>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-10-02 Thread Oleg Bartunov
On Tue, Sep 29, 2015 at 5:55 PM, Steve Crawford <
scrawf...@pinpointresearch.com> wrote:

> On Tue, Sep 29, 2015 at 7:16 AM, David Fetter  wrote:
>
>> ...What we're not fine with is depending on a proprietary system, no
>> matter what type of license, as infrastructure...
>>
>>
> Exactly. Which is why I was warning about latching onto features only
> available in the closed enterprise version.
>
> Cheers,
> Steve
>
>


If we have consensus on what we want, why not just hire some company to
develop it for us? I'm sure we could find such a company in Russia, and we
would even sponsor the postgres community and pay for the development.
There are other postgres companies which may join us. Or better, pay
through the pg foundation.

