[HACKERS] Vacuuming big btree indexes without pages with deleted items

2015-03-27 Thread Vladimir Borodin
Hi all.

I have described [0] a problem with delaying replicas after vacuuming a 
relation with big btree index. It stucks in replaying WAL record of type 
XLOG_BTREE_VACUUM like that (with lastBlockVacuumed 0):

rmgr: Btree   len (rec/tot): 20/52, tx:  0, lsn: 
4115/56126DC0, prev 4115/56126D90, bkp: , desc: vacuum: rel 
1663/16420/16796; blk 31222118, lastBlockVacuumed 0

Master writes this record to xlog in btvacuumscan [1] function after vacuuming 
of all index pages. And in case of no pages with deleted items xlog record 
would contain lastBlockVacuumed 0.

In btree_xlog_vacuum [2] replica reads all blocks from lastBlockVacuumed to 
last block of the index while applying this record because there is no api in 
the buffer manager to understand if the page is unpinned. 

So if the index is quite big (200+ GB in described case) it takes much time to 
do it. So the questions are:

1. Aren’t there still any api in buffer manager to understand that the page is 
not in shared_buffers without reading it?
2. Is it possible not to write to xlog record with lastBlockVacuumed 0 in some 
cases? For example, in case of not deleting any pages.

Or maybe there are some better ways of improving this situation?

[0] 
http://www.postgresql.org/message-id/fe82a9a7-0d52-41b5-a9ed-967f6927c...@simply.name
[1] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813
[2] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482

--
May the force be with you…
https://simply.name



[HACKERS] Implementing a join algorithm in Postgres

2015-03-27 Thread Ravi Kiran
Hi,

I have written a C program which reads from 3 files(Each file is table
having 2 columns and thousands of rows).The program is used to join those 3
tables and the algorithm which I have written will work only for those 3
files. Now I want to test this program for postgres. Can someone tell me
how to do that, where should I include this program so that I can
successfully implement that program in postgres.

Thank you.

-- 
Regards,

K.Ravikiran

ᐧ


Re: [HACKERS] WIP Patch for GROUPING SETS phase 1

2015-03-27 Thread Svenne Krap

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 18-03-2015 17:18, Svenne Krap wrote:

 I still need to check against the standard and I will run it against a
non-trivival production load... hopefully I will finish up my review
shortly after the weekend...

I am still on it, but a little delayed. I hope to get it done this weekend.

Svenne




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-27 Thread Haribabu Kommi
On Fri, Mar 13, 2015 at 1:33 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 3/4/15 1:34 AM, Haribabu Kommi wrote:
 On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi
 kommi.harib...@gmail.com wrote:
 + foreach(line, parsed_hba_lines)

 In the above for loop it is better to add check_for_interrupts to
 avoid it looping
 if the parsed_hba_lines are more.

 Updated patch is attached with the addition of check_for_interrupts in
 the for loop.

 I tried out your latest patch.  I like that it updates even in running
 sessions when the file is reloaded.

Thanks for the review. Sorry for late reply.

 The permission checking is faulty, because unprivileged users can
 execute pg_hba_settings() directly.

corrected.

 Check the error messages against the style guide (especially
 capitalization).

corrected.

 I don't like that there is a hard-coded limit of 16 options 5 pages away
 from where it is actually used.  That could be done better.

changed to 12 instead of 16.

 I'm not sure about the name pg_hba_settings.  Why not just pg_hba or
 pg_hba_conf if you're going for a representation that is close to the
 file (as opposed to pg_settings, which is really a lot more abstract
 than any particular file).

changed to pg_hba_conf.

 I would put the line_number column first.

changed.

 I continue to think that it is a serious mistake to stick special values
 like 'all' and 'replication' into the arrays without additional
 decoration.  That makes this view useless or at least dangerous for
 editing tools or tools that want to reassemble the original file.
 Clearly at least one of those has to be a use case.  Otherwise we can
 just print out the physical lines without interpretation.

It is possible to provide more than one keyword for databases or users.
Is it fine to use the text array for keyword databases and keyword users.

 The mask field can go, because address is of type inet, which can
 represent masks itself.  (Or should it be cidr then?  Who knows.)  The
 preferred visual representation of masks in pg_hba.conf has been
 address/mask for a while now, so we should preserve that.
 Additionally, you can then use the existing inet/cidr operations to do
 things like checking whether some IP address is contained in an address
 specification.

removed.

 I can't tell from the documentation what the compare_method field is
 supposed to do.  I see it on the code, but that is not a natural
 representation of pg_hba.conf.  In fact, this just supports my earlier
 statement.  Why are special values in the address field special, but not
 in the user or database fields?

 uaImplicitReject is not a user-facing authentication method, so it
 shouldn't be shown (or showable).

removed.

 I would have expected options to be split into keys and values.

 All that code to reassemble the options from the parsed struct
 representation seems crazy to me.  Surely, we could save the strings as
 we parse them?

I didn't get this point clearly. Can you explain it a bit more.

 I can't make sense of the log message pg_hba.conf not reloaded,
 pg_hba_settings may show stale information.  If load_hba() failed, the
 information isn't stale, is it?

 In any case, printing this to the server log seems kind of pointless,
 because someone using the view is presumably doing so because they can't
 or don't want to read the server log.  The proper place might be a
 warning when the view is actually called.

Changed accordingly as if the reload fails, further selects on the
view through a warning
as view data may not be proper like below.

There was some failure in reloading pg_hba.conf file. The pg_hba.conf
settings data may contains stale information

Here I attached latest patch with the fixes other than keyword columns.
I will provide the patch with keyword columns and documentation changes later.

Regards,
Hari Babu
Fujitsu Australia


Catalog_view_to_HBA_settings_patch_V8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Pluggable Parser

2015-03-27 Thread Rajeev rastogi
What is Pluggable Parser:
It is an option for users to select a different kind of parser to evaluate 
PostgreSQL for their business logic without much manual effort.

Why do we need?
As of now migration from other databases to PostgreSQL requires manual effort 
of translating SQL  PL/SQL to PostgreSQL equivalent SQL queries. Because of 
this manual effort in converting scripts, migration to PostgreSQL considered to 
be very slow and sometime  de-motivating also.  So if we allow to plug 
different database syntaxes with PostgreSQL, then it will be one of the strong 
motivating result for many DBAs to try PostgreSQL.

How to Do?
This can be realized by supporting new SQL/Stored procedures syntaxes in the 
parser corresponding to each other database syntax and plug the one needs to be 
evaluated by putting the corresponding so file in dynamic_library_path. 
Default will be PostgreSQL syntax.

Parser Interface:
/* Hook for plugins to get control in Parser */
typedef List * (*parser_hook_type) const char *str);
parser_hook_type parser_hook = NULL;
extern PGDLLIMPORT parser_hook_type parser_hook;

Make the parser entry point as function pointer (raw_parser); which can be 
loaded based on parser type. The parse_hook will be initialized with proper 
function during shared library loading, which is done only during server 
startup. By this way it can be ensured as only one parser which is provided by 
the user is used in the PostgreSQL.

Each Database syntax related parser can be implemented as part 
of contrib module to keep it separate from the core code.

To start with, I am planning to (For 2015-06 commitFest):

1.   Support infrastructure to allow plugging.

2.   Support of ORACLE SQL query syntaxes.


Please let me know if community will be interested in this or if there were 
already any discussion about this in past?

Please provide your opinion/suggestion.

Thanks and Regards,
Kumar Rajeev Rastogi


Re: [HACKERS] WIP: Split of hash index bucket

2015-03-27 Thread Antonin Houska
Antonin Houska a...@cybertec.at wrote:
 I'm still testing it. especially the concurrent access. There are probably
 bugs in the code, but it can help understand what I mean.

I've traced the most important cases of concurrent insertion into a bucket
split of which is in progress. A few related bugs fixed. Some tool to check
the index structure is needed yet, before performance testing makes
sense. That might include enhancement of contrib/pageinspect.

-- 
Antonin Houska
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
new file mode 100644
index 24b06a5..149bbcf
*** a/src/backend/access/hash/hash.c
--- b/src/backend/access/hash/hash.c
*** loop_top:
*** 572,577 
--- 572,599 
  			opaque = (HashPageOpaque) PageGetSpecialPointer(page);
  			Assert(opaque-hasho_bucket == cur_bucket);
  
+ 			/*
+ 			 * If the bucket participates in a split, give up.
+ 			 *
+ 			 * (Unlike the metapage copy, the flags at bucket level should
+ 			 * always be up-to-date.)
+ 			 *
+ 			 * TODO
+ 			 *
+ 			 * 1. Analyze if both buckets participating in the split impose
+ 			 * too severe restrictions, and if it makes sense to introduce
+ 			 * separate flags for old and new bucket. Also, would such a
+ 			 * restricted VACUUM still make sense?
+ 			 *
+ 			 * 2. Consider how statistics should reflect the fact that some
+ 			 * buckets are skipped because of split.
+ 			 */
+ 			if (opaque-hasho_flag  LH_BUCKET_SPLIT)
+ 			{
+ _hash_relbuf(rel, buf);
+ break;
+ 			}
+ 
  			/* Scan each tuple in page */
  			maxoffno = PageGetMaxOffsetNumber(page);
  			for (offno = FirstOffsetNumber;
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
new file mode 100644
index 63aaec9..de445c9
*** a/src/backend/access/hash/hashinsert.c
--- b/src/backend/access/hash/hashinsert.c
*** _hash_doinsert(Relation rel, IndexTuple
*** 37,42 
--- 37,43 
  	Page		page;
  	HashPageOpaque pageopaque;
  	Size		itemsz;
+ 	uint16  buckets_total;
  	bool		do_expand;
  	uint32		hashkey;
  	Bucket		bucket;
*** _hash_doinsert(Relation rel, IndexTuple
*** 123,129 
  		 */
  		BlockNumber nextblkno = pageopaque-hasho_nextblkno;
  
! 		if (BlockNumberIsValid(nextblkno))
  		{
  			/*
  			 * ovfl page exists; go get it.  if it doesn't have room, we'll
--- 124,131 
  		 */
  		BlockNumber nextblkno = pageopaque-hasho_nextblkno;
  
! 		if (BlockNumberIsValid(nextblkno) 
! 			!(pageopaque-hasho_flag  LH_BUCKET_SPLIT_LAST))
  		{
  			/*
  			 * ovfl page exists; go get it.  if it doesn't have room, we'll
*** _hash_doinsert(Relation rel, IndexTuple
*** 136,142 
  		else
  		{
  			/*
! 			 * we're at the end of the bucket chain and we haven't found a
  			 * page with enough room.  allocate a new overflow page.
  			 */
  
--- 138,145 
  		else
  		{
  			/*
! 			 * we're at the end of the bucket chain, or (during a split) right
! 			 * before redirection to the old bucket, and we haven't found a
  			 * page with enough room.  allocate a new overflow page.
  			 */
  
*** _hash_doinsert(Relation rel, IndexTuple
*** 151,157 
  			Assert(PageGetFreeSpace(page) = itemsz);
  		}
  		pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
! 		Assert(pageopaque-hasho_flag == LH_OVERFLOW_PAGE);
  		Assert(pageopaque-hasho_bucket == bucket);
  	}
  
--- 154,160 
  			Assert(PageGetFreeSpace(page) = itemsz);
  		}
  		pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
! 		Assert(pageopaque-hasho_flag  LH_OVERFLOW_PAGE);
  		Assert(pageopaque-hasho_bucket == bucket);
  	}
  
*** _hash_doinsert(Relation rel, IndexTuple
*** 173,180 
  	metap-hashm_ntuples += 1;
  
  	/* Make sure this stays in sync with _hash_expandtable() */
! 	do_expand = metap-hashm_ntuples 
! 		(double) metap-hashm_ffactor * (metap-hashm_maxbucket + 1);
  
  	/* Write out the metapage and drop lock, but keep pin */
  	_hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);
--- 176,184 
  	metap-hashm_ntuples += 1;
  
  	/* Make sure this stays in sync with _hash_expandtable() */
! 	buckets_total = metap-hashm_maxbucket + 1 + metap-hashm_split_count;
! 	do_expand = metap-hashm_split_count  HASH_MAX_SPLITS 
! 		metap-hashm_ntuples  (double) metap-hashm_ffactor * buckets_total;
  
  	/* Write out the metapage and drop lock, but keep pin */
  	_hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
new file mode 100644
index b775164..4345f29
*** a/src/backend/access/hash/hashovfl.c
--- b/src/backend/access/hash/hashovfl.c
***
*** 21,27 
  #include utils/rel.h
  
  
- static Buffer _hash_getovflpage(Relation rel, Buffer metabuf);
  static uint32 _hash_firstfreebit(uint32 

[HACKERS] GSoC 2015: SP-GIST for geometrical objects

2015-03-27 Thread Dima Ivanovskiy

Hello, I am Dmitrii, student of Moscow Institute of Physics and Technology

Abstract:
I chose project Indexing prolonged geometrical objects (i.e. boxes, circles, 
polygons, not points) with SP-GiST by mapping to 4d-space. 
According to the presentation
https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf
SP-GIST 3 times faster than GiST in some cases. But GIST supports geometrical 
data types: 
box, circle, polygon with operators:|   | @ @ @ | | ~ ~=
Popular spatial extension PostGIS doesn't include SP-GIST, but has a lot of 
geometrical features.

Project details:
After meeting with Alexander Korotkov, I wrote some plan. 
Using of K-D-tree and Quadtree in building index for geometrical data types can 
increase speed of search in some cases.
The main idea is representing 2-D geometrical objects in their bounding box. 
Set of 2-D boxes is 4-D space. 
New _ops will work with points from 4-D space, for example kd_box_ops, 
quad_circle_ops and will support all geometrical operators. 
After conversion object to their bounding box algo has set of tuples (x1, y1, 
x2, y2). 
Our goal is separate this space the most equally. If we talk about K-D-tree, on 
first step K-D-tree algorithm will split space in 2 parts by the first 
coordinate, in next step by the second coordinate etc., after 4-th coordinate 
we repeat this procedure. 
At the end we have index at geometrical objects and use traversal tree for 
every search operator. 

Postgresql has already has realization ideas of MBR in gist/gistproc.c. So I 
will transfer this realization to other type of tree.

Of cource, I assume that SP-GIST can be not the best decision of this problem. 
So after testing this clear methods, I will try to find more effective way. 
Maybe with using combination of different spatial tree structures.

Project Schedule:

until May 25

Read documentation and source code, clarify details of implementation.

1st month

Implement new '_ops' with all geometrical operators for box, circle, polygon

2nd month

Research new methods for increase speed of geometrical query

3rd month

Final refactoring, testing and submitting a patch.

Links:
http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html - about GIST
https://toster.ru/q/27135#answer_110197 - people need SP-GIST for cubes
http://www.slideshare.net/profyclub_ru/o-lt - presentation about indexes
http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf - working with 
geo objects



Re: [HACKERS] Pluggable Parser

2015-03-27 Thread Tom Lane
Rajeev rastogi rajeev.rast...@huawei.com writes:
 What is Pluggable Parser:
 It is an option for users to select a different kind of parser to evaluate 
 PostgreSQL for their business logic without much manual effort.

 Why do we need?
 As of now migration from other databases to PostgreSQL requires manual effort 
 of translating SQL  PL/SQL to PostgreSQL equivalent SQL queries. Because of 
 this manual effort in converting scripts, migration to PostgreSQL considered 
 to be very slow and sometime  de-motivating also.  So if we allow to plug 
 different database syntaxes with PostgreSQL, then it will be one of the 
 strong motivating result for many DBAs to try PostgreSQL.

While I don't have any strong reason to object to putting a hook where you
suggest, I think that the above represents an enormous oversale of the
benefits, which in actual fact are likely to be near zero.  Replacing
gram.y as you suggest will not allow more than the most trivial, cosmetic
grammar changes, because you'd still have to produce the same raw parse
trees as before.

Alternatively you could consider replacing both raw_parser() and
parse_analyze(), but then you're talking about maintaining a duplicate
copy of just about the whole of src/backend/parser/, which doesn't sound
terribly practical; it certainly would be unpleasant to maintain such a
thing across multiple PG releases.  And there will still be pretty strong
constraints on whether you could implement say a MySQL-workalike, because
you're still having to work with the Postgres execution engine.

In short, this proposal sounds a lot like a solution looking for a
problem.  I think it would be useful for you to pick a small number of
concrete alternative-grammar problems and think about how you could
provide hooks that would allow solving those issues without duplicating
90% of src/backend/parser/ first.

Here is an example of such a concrete problem: Oracle, I believe, uses
parentheses () rather than brackets [] for array subscripting expressions.
How would you allow that notation in Postgres?  Your original proposal
cannot do it because it's not a problem of the raw grammar, but rather
one for parse analysis --- you have to disambiguate array references
from function calls at the time of doing semantic analysis.

Another example you might consider is Oracle-style outer join notation,
which again will require significant surgery in parse analysis, not
just modifying the raw grammar.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug fix for missing years in make_date()

2015-03-27 Thread Pavel Stehule
2015-03-26 23:26 GMT+01:00 David Fetter da...@fetter.org:

 Folks,

 For reasons unclear, dates before the Common Era are disallowed in
 make_date(), even though about 2/3 of the underlying data type's range
 up until the present time fits that description.

 Please find attached a patch fixing same.


+1

Pavel



 Cheers,
 David.
 --
 David Fetter da...@fetter.org http://fetter.org/
 Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
 Skype: davidfetter  XMPP: david.fet...@gmail.com

 Remember to vote!
 Consider donating to Postgres: http://www.postgresql.org/about/donate


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] GSoC 2015: SP-GIST for geometrical objects

2015-03-27 Thread Arthur Silva
On Mar 27, 2015 11:08 AM, Dima Ivanovskiy dima...@mail.ru wrote:

 Hello, I am Dmitrii, student of Moscow Institute of Physics and Technology

 Abstract:

 I chose project Indexing prolonged geometrical objects (i.e. boxes,
circles, polygons, not points) with SP-GiST by mapping to 4d-space.
 According to the presentation
 https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf
 SP-GIST 3 times faster than GiST in some cases. But GIST supports
geometrical data types:
 box, circle, polygon with operators:|   | @ @ @ |
| ~ ~=
 Popular spatial extension PostGIS doesn't include SP-GIST, but has a lot
of geometrical features.

 Project details:

 After meeting with Alexander Korotkov, I wrote some plan.
 Using of K-D-tree and Quadtree in building index for geometrical data
types can increase speed of search in some cases.
 The main idea is representing 2-D geometrical objects in their bounding
box. Set of 2-D boxes is 4-D space.
 New _ops will work with points from 4-D space, for example kd_box_ops,
quad_circle_ops and will support all geometrical operators.
 After conversion object to their bounding box algo has set of tuples (x1,
y1, x2, y2).
 Our goal is separate this space the most equally. If we talk about
K-D-tree, on first step K-D-tree algorithm will split space in 2 parts by
the first coordinate, in next step by the second coordinate etc., after
4-th coordinate we repeat this procedure.
 At the end we have index at geometrical objects and use traversal tree
for every search operator.

 Postgresql has already has realization ideas of MBR in gist/gistproc.c.
So I will transfer this realization to other type of tree.

 Of cource, I assume that SP-GIST can be not the best decision of this
problem. So after testing this clear methods, I will try to find more
effective way. Maybe with using combination of different spatial tree
structures.

 Project Schedule:

 until May 25

 Read documentation and source code, clarify details of implementation.

 1st month

 Implement new '_ops' with all geometrical operators for box, circle,
polygon

 2nd month

 Research new methods for increase speed of geometrical query

 3rd month

 Final refactoring, testing and submitting a patch.


 Links:

 http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html - about
GIST
 https://toster.ru/q/27135#answer_110197 - people need SP-GIST for cubes
 http://www.slideshare.net/profyclub_ru/o-lt - presentation about indexes
 http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf - working
with geo objects

Nice proposal.

Dynamic Kdtrees can perform badly as the splitting median can get way off
as updates are coming. What are your thoughts about that?

Also what's up with the 4d space? I don't quite get it. These types are 2
or 3 dimensions.


[HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );

2015-03-27 Thread Fabrízio de Royes Mello
Hi all,

I'm tweaking some autovacuum settings in a table with high write usage but
with ALTER TABLE .. SET ( .. ) this task was impossible, so I did a catalog
update  (pg_class) to change reloptions.

Maybe it's a stupid doubt, but why we need to get an AccessExclusiveLock on
relation to set reloptions if we just touch in pg_class tuples
(RowExclusiveLock) ?

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] GSoC 2015 proposal. Bitmap Index-only Count

2015-03-27 Thread Alexander Korotkov
On Wed, Mar 25, 2015 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Anastasia Lubennikova lubennikov...@gmail.com writes:
  2015-03-24 18:01 GMT+04:00 Tom Lane t...@sss.pgh.pa.us:
  I wonder whether it'd be possible to teach GIN to support index_getnext
  instead.  Initially it would probably work only for cases where the
  index didn't have to return any columns ... but if we did it, maybe the
  door would be open to cases where GIN could reconstruct actual values.

  Another idea is to write index_getnext() for GIN which would return some
  fake tuple. Since there is no difference for COUNT aggregate what the
 tuple
  contains. COUNT just wants to know whether we have tuple that satisfy the
  qual.

 Well, yeah, that would be the idea (at least initially).  You don't have
 to return any real data unless you claim you can do so via amcanreturn.
 The planner is still capable of selecting an index-only scan as long as
 the query retrieves no columns.

 The trick would be to not return the same heap TID more than once per
 scan.  A zero-order implementation would be to construct the same bitmap
 we do now and then just provide a gingetnext function that scans through
 that.  That would be pretty awful in terms of scan startup time, so doing
 better would be nice; but perhaps it would be useful even in that form.


My ideal picture for FTS using GIN looks like this:

1) Have lexemes offsets in GIN stored with item pointers.
2) Calculate relevance using only GIN information without using heap.
3) Sort results by relevance in either GIN itself or executor node.
4) Get both TOP-N most relevant rows and total rows count (using index-only
scan) from single GIN index scan.

Implementing index_getnext() for GIN looks step forward for me because it
allows index only count and potentially could be used for ordered output.
However, it's unclear for me if it's feasible to have #4? Could we return
TOP-N results and total count from single GIN index scan?

--
With best regards,
Alexander Korotkov.


[HACKERS] Re[2]: [HACKERS] GSoC 2015: SP-GIST for geometrical objects

2015-03-27 Thread Dima Ivanovskiy

On Mar 27, 2015 11:08 AM, Dima Ivanovskiy  dima...@mail.ru  wrote:

 Hello, I am Dmitrii, student of Moscow Institute of Physics and Technology

 Abstract:

 I chose project Indexing prolonged geometrical objects (i.e. boxes, 
 circles, polygons, not points) with SP-GiST by mapping to 4d-space. 
 According to the presentation
  https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf
 SP-GIST 3 times faster than GiST in some cases. But GIST supports 
 geometrical data types: 
 box, circle, polygon with operators:|   | @ @ @ | | 
 ~ ~=
 Popular spatial extension PostGIS doesn't include SP-GIST, but has a lot of 
 geometrical features.

 Project details:

 After meeting with Alexander Korotkov, I wrote some plan. 
 Using of K-D-tree and Quadtree in building index for geometrical data types 
 can increase speed of search in some cases.
 The main idea is representing 2-D geometrical objects in their bounding box. 
 Set of 2-D boxes is 4-D space. 
 New _ops will work with points from 4-D space, for example kd_box_ops, 
 quad_circle_ops and will support all geometrical operators. 
 After conversion object to their bounding box algo has set of tuples (x1, 
 y1, x2, y2). 
 Our goal is separate this space the most equally. If we talk about K-D-tree, 
 on first step K-D-tree algorithm will split space in 2 parts by the first 
 coordinate, in next step by the second coordinate etc., after 4-th 
 coordinate we repeat this procedure. 
 At the end we have index at geometrical objects and use traversal tree for 
 every search operator. 

 Postgresql has already has realization ideas of MBR in gist/gistproc.c. So I 
 will transfer this realization to other type of tree.

 Of cource, I assume that SP-GIST can be not the best decision of this 
 problem. So after testing this clear methods, I will try to find more 
 effective way. Maybe with using combination of different spatial tree 
 structures.

 Project Schedule:

 until May 25

 Read documentation and source code, clarify details of implementation.

 1st month

 Implement new '_ops' with all geometrical operators for box, circle, polygon

 2nd month

 Research new methods for increase speed of geometrical query

 3rd month

 Final refactoring, testing and submitting a patch.


 Links:

  http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html - about GIST
  https://toster.ru/q/27135#answer_110197 - people need SP-GIST for cubes
  http://www.slideshare.net/profyclub_ru/o-lt - presentation about indexes
  http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf - working 
 with geo objects

Nice proposal. 
Dynamic Kdtrees can perform badly as the splitting median can get way off as 
updates are coming. What are your thoughts about that? 
Also what's up with the 4d space? I don't quite get it. These types are 2 or 3 
dimensions. 
I read spgist README  one more time . I didn't find  the mechanism for 
maintaining good balance after updates.
I think we can use Bkd-Tree,  
https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf . But It can 
be not the best solving.
I include Research time in 2nd month of timeline.

About 4d space. All these types are 2 dimensional.
Just as i n R-tree object is approximated by MBR. MBR for 2d-objects can be 
mapped to 4d-point. More general, nd-object MBR can be mapped into 2nd-point. 




[HACKERS] Rounding to even for numeric data type

2015-03-27 Thread Michael Paquier
Hi all,

A couple of days ago a bug has showed up regarding rounding of float here:
http://www.postgresql.org/message-id/flat/20150320194337.2573.72...@wrigleys.postgresql.org#20150320194337.2573.72...@wrigleys.postgresql.org
The result being that the version of rint() shipped in src/port was
not IEEE compliant when rounding to even (MSVC  2013 at least using
it), leading to inconsistent results depending on if the platform uses
src/port's rint() or the platform's one.

During this thread, Tom has raised as well that rounding for numeric
is not that IEEE-compliant:
http://www.postgresql.org/message-id/22366.1427313...@sss.pgh.pa.us

For example:
=# SELECT round(2.5::numeric), round(1.5::numeric),
round(0.5::numeric), round(-2.5::numeric);
 round | round | round | round
---+---+---+---
 3 | 2 | 1 |-3
(1 row)
=# SELECT round(2.5::float), round(1.5::float), round(0.5::float),
round(-2.5::float);
 round | round | round | round
---+---+---+---
 2 | 2 | 0 |-2
(1 row)

It sounds appealing to switch the default behavior to something that
is more IEEE-compliant, and not only for scale == 0. Now one can argue
as well that changing the default is risky for existing applications,
or the other way around that other RDBMs (?) are more compliant than
us for their equivalent numeric data type, and people get confused
when switching to Postgres.
An idea, from Dean, would be to have a new specific version for
round() able to do compliant IEEE rounding to even as well...

Opinions?
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans with btree_gist

2015-03-27 Thread Andreas Karlsson

On 03/26/2015 10:31 PM, Heikki Linnakangas wrote:

I've pushed Anastasia's patch to support index-only scans with GiST, and
it's time to add opclasses support for all the opclasses that are not
lossy. I think at least all the btree_gist opclasses need to be
supported, it would be pretty surprising if they didn't support
index-only scans, while some more complex opclasses did.

Attached is a WIP patch for that. It covers all the varlen types that
btree_gist supports, and int2, int4 and oid. The rest of the fixed-width
types should be just a matter of copy-pasting. I'll continue adding
those, but wanted to let people know I'm working on this.


Would it also be worth doing the same for the inet_ops class for 
inet/cidr? I have hacked a quick WIP patch which I believe should work, 
but have not looked into the index only scan code enough to be sure.


--
Andreas Karlsson
diff --git a/src/backend/utils/adt/network_gist.c b/src/backend/utils/adt/network_gist.c
index 14dd62b..0133032 100644
--- a/src/backend/utils/adt/network_gist.c
+++ b/src/backend/utils/adt/network_gist.c
@@ -105,6 +105,27 @@ typedef struct GistInetKey
 #define SET_GK_VARSIZE(dst) \
 	SET_VARSIZE_SHORT(dst, offsetof(GistInetKey, ipaddr) + gk_ip_addrsize(dst))
 
+Datum
+inet_gist_fetch(PG_FUNCTION_ARGS)
+{
+	GISTENTRY	*entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+	GistInetKey	*key = DatumGetInetKeyP(entry-key);
+	GISTENTRY	*retval;
+	inet		*dst;
+
+	dst = (inet *) palloc0(sizeof(inet));
+
+	ip_family(dst) = gk_ip_family(key);
+	ip_bits(dst) = gk_ip_minbits(key);
+	memcpy(ip_addr(dst), gk_ip_addr(key), ip_addrsize(dst));
+	SET_INET_VARSIZE(dst);
+
+	retval = palloc(sizeof(GISTENTRY));
+	gistentryinit(*retval, InetPGetDatum(dst), entry-rel, entry-page,
+  entry-offset, FALSE);
+
+	PG_RETURN_POINTER(retval);
+}
 
 /*
  * The GiST query consistency check
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 612a9d2..3b87f90 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -411,6 +411,7 @@ DATA(insert (	3550   869	869  4 3556 ));
 DATA(insert (	3550   869	869  5 3557 ));
 DATA(insert (	3550   869	869  6 3558 ));
 DATA(insert (	3550   869	869  7 3559 ));
+DATA(insert (   3550   869  869  9 3573 ));
 
 /* sp-gist */
 DATA(insert (	3474   3831 3831 1 3469 ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 77b7717..a96d369 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2240,6 +2240,8 @@ DATA(insert OID = 3555 (  inet_gist_compress	PGNSP PGUID 12 1 0 0 0 f f f f t f
 DESCR(GiST support);
 DATA(insert OID = 3556 (  inet_gist_decompress	PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 2281 _null_ _null_ _null_ _null_ inet_gist_decompress _null_ _null_ _null_ ));
 DESCR(GiST support);
+DATA(insert OID = 3573 (  inet_gist_fetch		PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 2281 _null_ _null_ _null_ _null_ inet_gist_fetch _null_ _null_ _null_ ));
+DESCR(GiST support);
 DATA(insert OID = 3557 (  inet_gist_penalty		PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 2281 2281 2281 2281 _null_ _null_ _null_ _null_ inet_gist_penalty _null_ _null_ _null_ ));
 DESCR(GiST support);
 DATA(insert OID = 3558 (  inet_gist_picksplit	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 2281 2281 2281 _null_ _null_ _null_ _null_ inet_gist_picksplit _null_ _null_ _null_ ));
diff --git a/src/include/utils/inet.h b/src/include/utils/inet.h
index 6694688..2d2bae4 100644
--- a/src/include/utils/inet.h
+++ b/src/include/utils/inet.h
@@ -123,6 +123,7 @@ extern int	bitncommon(const unsigned char *l, const unsigned char *r, int n);
 /*
  * GiST support functions in network_gist.c
  */
+extern Datum inet_gist_fetch(PG_FUNCTION_ARGS);
 extern Datum inet_gist_consistent(PG_FUNCTION_ARGS);
 extern Datum inet_gist_union(PG_FUNCTION_ARGS);
 extern Datum inet_gist_compress(PG_FUNCTION_ARGS);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rounding to even for numeric data type

2015-03-27 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 It sounds appealing to switch the default behavior to something that
 is more IEEE-compliant, and not only for scale == 0. Now one can argue
 as well that changing the default is risky for existing applications,
 or the other way around that other RDBMs (?) are more compliant than
 us for their equivalent numeric data type, and people get confused
 when switching to Postgres.

 An idea, from Dean, would be to have a new specific version for
 round() able to do compliant IEEE rounding to even as well...

I think confining the change to round() would be a fundamental error.
The main reason why round-to-nearest-even is IEEE standard is that it
reduces error accumulation over long chains of calculations, such as
in numeric's power and trig functions; if we go to the trouble of
implementing such a behavior, we certainly want to use it there.

I think the concern over backwards compatibility here is probably
overblown; but if we're sufficiently worried about it, a possible
compromise is to invent a numeric_rounding_mode GUC, so that people
could get back the old behavior if they really care.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2015: SP-GIST for geometrical objects

2015-03-27 Thread Arthur Silva
On Mar 27, 2015 6:41 PM, Dima Ivanovskiy dima...@mail.ru wrote:


 On Mar 27, 2015 11:08 AM, Dima Ivanovskiy dima...@mail.ru wrote:
 
  Hello, I am Dmitrii, student of Moscow Institute of Physics and
Technology
 
  Abstract:
 
  I chose project Indexing prolonged geometrical objects (i.e. boxes,
circles, polygons, not points) with SP-GiST by mapping to 4d-space.
  According to the presentation
  https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf
  SP-GIST 3 times faster than GiST in some cases. But GIST supports
geometrical data types:
  box, circle, polygon with operators:|   | @ @ @
| | ~ ~=
  Popular spatial extension PostGIS doesn't include SP-GIST, but has a
lot of geometrical features.
 
  Project details:
 
  After meeting with Alexander Korotkov, I wrote some plan.
  Using of K-D-tree and Quadtree in building index for geometrical data
types can increase speed of search in some cases.
  The main idea is representing 2-D geometrical objects in their
bounding box. Set of 2-D boxes is 4-D space.
  New _ops will work with points from 4-D space, for example kd_box_ops,
quad_circle_ops and will support all geometrical operators.
  After conversion object to their bounding box algo has set of tuples
(x1, y1, x2, y2).
  Our goal is separate this space the most equally. If we talk about
K-D-tree, on first step K-D-tree algorithm will split space in 2 parts by
the first coordinate, in next step by the second coordinate etc., after
4-th coordinate we repeat this procedure.
  At the end we have index at geometrical objects and use traversal tree
for every search operator.
 
  Postgresql has already has realization ideas of MBR in
gist/gistproc.c. So I will transfer this realization to other type of tree.
 
  Of cource, I assume that SP-GIST can be not the best decision of this
problem. So after testing this clear methods, I will try to find more
effective way. Maybe with using combination of different spatial tree
structures.
 
  Project Schedule:
 
  until May 25
 
  Read documentation and source code, clarify details of implementation.
 
  1st month
 
  Implement new '_ops' with all geometrical operators for box, circle,
polygon
 
  2nd month
 
  Research new methods for increase speed of geometrical query
 
  3rd month
 
  Final refactoring, testing and submitting a patch.
 
 
  Links:
 
  http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html -
about GIST
  https://toster.ru/q/27135#answer_110197 - people need SP-GIST for cubes
  http://www.slideshare.net/profyclub_ru/o-lt - presentation about
indexes
  http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf -
working with geo objects
 
 Nice proposal.

 Dynamic Kdtrees can perform badly as the splitting median can get way
off as updates are coming. What are your thoughts about that?

 Also what's up with the 4d space? I don't quite get it. These types are
2 or 3 dimensions.


 I read spgist README one more time. I didn't find the mechanism for
maintaining good balance after updates.
 I think we can use Bkd-Tree,
https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf. But It
can be not the best solving.
 I include Research time in 2nd month of timeline.

 About 4d space. All these types are 2 dimensional.
 Just as in R-tree object is approximated by MBR. MBR for 2d-objects can
be mapped to 4d-point. More general, nd-object MBR can be mapped into
2nd-point.



The reason I said that is because you pointed performance as one
motivation  factor (and the lack of balancing properties can degrade
performance really fast on larger indexes).

The Bkd variant seems interesting but I don't think spgist provides enough
abstraction to implement it.

A bounding box can still be inserted/queried with a 2d kdtree so I don't
know why you call it 4d. I assume it's a matter of naming.

Overall the proposal seems totally doable, although I have doubts about its
usefulness. But I'm no one... You should have some feedback from the
mentors soon.


Re: [HACKERS] Rounding to even for numeric data type

2015-03-27 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom I think the concern over backwards compatibility here is probably
 Tom overblown; but if we're sufficiently worried about it, a possible
 Tom compromise is to invent a numeric_rounding_mode GUC, so that
 Tom people could get back the old behavior if they really care.

I only see one issue with this, but it's a nasty one: do we really want
to make all numeric operations that might do rounding stable rather than
immutable?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers