Re: [HACKERS] Patch for fast gin cache performance improvement

2013-10-12 Thread Ian Link

I think it is desirable that this patch should be resubmitted for the next
CommitFest for further review and testing mentioned above.  So I'd like to mark
this patch as Returned with Feedback.  Is it OK?

Sounds like a good idea. Thanks for the review!
Ian Link



   	   
   	Etsuro Fujita  
  Thursday, October
 10, 2013 1:01 AM
  Ian Link wrote:
Although I asked this question, I've reconsidered about these
parameters, and it seems that these parameters not only make code
rather complex but are a little confusing to users.  So I'd like to propose
to introduce only one parameter:
fast_cache_size.  While users that give weight to update performance
for the fast update technique set this parameter to a large value,
users that give weight not only to update performance but to search
performance set this parameter to a small value.  What do you think about
this?
I think it makes sense to maintain this separation. If the user doesn't need
a per-index setting, they don't have to use the parameter. Since the parameter
is off by default, they don't even need to worry about it.
There might as well be one parameter for users that don't need fine-grained
control. We can document this and I don't think it will be confusing to the
user.

OK, though I'd like to hear the opinion of others.

4. In my understanding, the small value of
gin_fast_limit/fast_cache_size leads to the increase in GIN search
performance, which, however, leads to the decrease in GIN update
performance.  Am I right?  If so, I think the tradeoff should be noted in
the documentation.
I believe this is correct.

5. The following documents in Chapter 57. GIN Indexes need to be
updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and
Tricks
Sure, I can add something.

6. I would like to see the results for the additional test cases
(tsvectors).
I don't really have any good test cases for this available, and have very
limited
time for postgres at the moment. Feel free to create a test case, but I don't
believe I can at the moment. Sorry!

Unfortunately, I don't have much time to do such a thing, though I think we
should do that.  (In addition, we should do another performance test to make
clear an influence of fast update performance from introducing these parameters,
which would be required to determine the default values of these parameters.)

7. The commented-out elog() code should be removed.
Sorry about that, I shouldn't have submitted the patch with those still there.

I should have a new patch soonish, hopefully. Thanks for your feedback!

I think it is desirable that this patch should be resubmitted for the next
CommitFest for further review and testing mentioned above.  So I'd like to mark
this patch as Returned with Feedback.  Is it OK?

Thanks,

Best regards,
Etsuro Fujita



   	   
   	Ian Link  
  Monday, September
 30, 2013 3:09 PM
  

Hi Etsuro,
Sorry for the delay but I have been very busy with work. I have been 
away from postgres for a while, so I will need a little time to review 
the code and make sure I give you an informed response. I'll get back to
 you as soon as I am able. Thanks for understanding.
Ian Link 


  
   	   
   	Etsuro Fujita  
  Friday, September
 27, 2013 2:24 AM
  I wrote:
I had a look over this patch.  I think this patch is interesting and very
useful.
Here are my review points:

8. I think there are no issues in this patch.  However, I have one question:
how this patch works in the case where gin_fast_limit/fast_cache_size = 0?  In
this case, in my understanding, this patch inserts new entries into the
pending
list temporarily and immediately moves them to the main GIN data structure
using
ginInsertCleanup().  Am I right?  If so, that is obviously inefficient.

Sorry, There are incorrect expressions.  I mean gin_fast_limit  0 and
fast_cache_size = 0.

Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users.  So I'd like to propose to introduce only one parameter:
fast_cache_size.  While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value.  What do you think about this?

Thanks,

Best regards,
Etsuro Fujita



   	   
   	Etsuro Fujita  
  Thursday, 
September 26, 2013 6:02 AM
  Hi Ian,

This patch contains a performance improvement for the fast gin cache. As you
may know, the performance of the fast gin cache decreases with its size.
Currently, the size of the fast gin cache is tied to work_mem. The size of
work_mem can often be quite high. The large size of work_mem is inappropriate
for the fast gin cache size. Therefore, we created a separate cache size
called
gin_fast_limit. This global variable controls the size of the fast gin cache,
independently of work_mem. Currently, the default

Re: [HACKERS] Patch for fast gin cache performance improvement

2013-10-08 Thread Ian Link



Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users.  So I'd like to propose to introduce only one parameter:
fast_cache_size.  While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value.  What do you think about this?
I think it makes sense to maintain this separation. If the user doesn't 
need a per-index setting, they don't have to use the parameter. Since 
the parameter is off by default, they don't even need to worry about it. 
There might as well be one parameter for users that don't need 
fine-grained control. We can document this and I don't think it will be 
confusing to the user.



4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
to the increase in GIN search performance, which, however, leads to the decrease
in GIN update performance.  Am I right?  If so, I think the tradeoff should be
noted in the documentation.

I believe this is correct.

5. The following documents in Chapter 57. GIN Indexes need to be 
updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and Tricks

Sure, I can add something.


6. I would like to see the results for the additional test cases (tsvectors).
I don't really have any good test cases for this available, and have 
very limited time for postgres at the moment. Feel free to create a test 
case, but I don't believe I can at the moment. Sorry!



7. The commented-out elog() code should be removed.
Sorry about that, I shouldn't have submitted the patch with those still 
there.


I should have a new patch soonish, hopefully. Thanks for your feedback!
Ian

Ian Link wrote:

8. I think there are no issues in this patch.  However, I have one question: how
this patch works in the case where gin_fast_limit/fast_cache_size = 0?  In this
case, in my understanding, this patch inserts new entries into the pending list
temporarily and immediately moves them to the main GIN data structure using
ginInsertCleanup().  Am I right?  If so, that is obviously inefficient.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fast gin cache performance improvement

2013-09-30 Thread Ian Link
Hi Etsuro,
Sorry for the delay but I have been very busy with work. I have been 
away from postgres for a while, so I will need a little time to review 
the code and make sure I give you an informed response. I'll get back to
 you as soon as I am able. Thanks for understanding.
Ian Link 


   	   
   	Etsuro Fujita  
  Friday, September
 27, 2013 2:24 AM
  I wrote:
I had a look over this patch.  I think this patch is interesting and very
useful.
Here are my review points:

8. I think there are no issues in this patch.  However, I have one question:
how this patch works in the case where gin_fast_limit/fast_cache_size = 0?  In
this case, in my understanding, this patch inserts new entries into the
pending
list temporarily and immediately moves them to the main GIN data structure
using
ginInsertCleanup().  Am I right?  If so, that is obviously inefficient.

Sorry, There are incorrect expressions.  I mean gin_fast_limit  0 and
fast_cache_size = 0.

Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users.  So I'd like to propose to introduce only one parameter:
fast_cache_size.  While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value.  What do you think about this?

Thanks,

Best regards,
Etsuro Fujita



   	   
   	Etsuro Fujita  
  Thursday, 
September 26, 2013 6:02 AM
  Hi Ian,

This patch contains a performance improvement for the fast gin cache. As you
may know, the performance of the fast gin cache decreases with its size.
Currently, the size of the fast gin cache is tied to work_mem. The size of
work_mem can often be quite high. The large size of work_mem is inappropriate
for the fast gin cache size. Therefore, we created a separate cache size
called
gin_fast_limit. This global variable controls the size of the fast gin cache,
independently of work_mem. Currently, the default gin_fast_limit is set to
128kB.
However, that value could need tweaking. 64kB may work better, but it's hard
to say with only my single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries improve
from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself:
it should be attached. I can look into additional test cases (tsvectors) if
anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that it is
disabled. If the user does not specify a per-index limit, the index will
simply
use the global limit.

I had a look over this patch.  I think this patch is interesting and very
useful.  Here are my review points:

1. Patch applies cleanly.
2. make, make install and make check is good.
3. I did performance evaluation using your test queries with 64kB and 128kB of
gin_fast_limit (or fast_cache_size), and saw that both values achieved the
performance gains over gin_fast_limit = '256MB'.  64kB worked better than 128kB.
64kB improved from 1.057 ms to 0.075 ms.  Great!
4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
to the increase in GIN search performance, which, however, leads to the decrease
in GIN update performance.  Am I right?  If so, I think the tradeoff should be
noted in the documentation.
5. The following documents in Chapter 57. GIN Indexes need to be updated:
 * 57.3.1. GIN Fast Update Technique
 * 57.4. GIN Tips and Tricks
6. I would like to see the results for the additional test cases (tsvectors).
7. The commented-out elog() code should be removed.
8. I think there are no issues in this patch.  However, I have one question: how
this patch works in the case where gin_fast_limit/fast_cache_size = 0?  In this
case, in my understanding, this patch inserts new entries into the pending list
temporarily and immediately moves them to the main GIN data structure using
ginInsertCleanup().  Am I right?  If so, that is obviously inefficient.

Sorry for the delay.

Best regards,
Etsuro Fujita



   	   
   	Ian Link  
  Monday, June 17, 
2013 9:42 PM
  

  This

 patch contains a performance improvement for the fast gin cache. As you
 may know, the performance of the fast gin cache decreases with its 
size. Currently, the size of the fast gin cache is tied to work_mem. The
 size of work_mem can often be quite high. The large size of work_mem is
 inappropriate for the fast gin cache size. Therefore, we created a 
separate cache size called gin_fast_limit. This global variable controls
 the size of the fast gin cache, independently of work_mem. Currently, 
the default gin_fast_limit is set to 128kB. However, that value could 
need tweaking. 64kB may work better, but it's hard to say with only my 
single machine to test on.On

 my machine

Re: [HACKERS] Review: query result history in psql

2013-07-02 Thread Ian Link
Maciej - I can see your 
resistance to some kind of interactive mode. It would definitely be more
 code and create a less simple user interface. I would be perfectly 
happy if we left that part as it is now. 

However, I think it would be important to have a way of displaying the 
query history. Yes, you can search through your console backwards. You 
can still search through your old queries, but this would be a good 
alternative. What if the user reconnects after working for a while? 
Their old bash history might be gone. This would leave them with a big 
query history and no point of reference. Personally, I would find this 
feature very worthwhile. The query history wouldn't be crippled without 
it, but it would be a lot less flexible.


   	   
   	Pavel Stehule  
  Monday, July 01, 
2013 4:05 AM
  a idea is good,
 but I don't think, it can be useful with currentimplementation. How
 I can identify, what is correct answer for myquery? Have I remember
 twenty numbers and twenty queries?RegardsPavel
   	   
   	Maciej Gajewski  
  Monday, July 01, 
2013 4:01 AM
  When I tested this feature, I had 30 caches per 5 
minutes, and only a

few from these queries had a sense. Switch between off and on is not
user friendly. I believe so there can be other solution than mine, but
a possibility to friendly clean unwanted caches is necessary. If you know that you'll need the result of a query 
beforehand, you can always use SELECT ... INTO ... . No client-side 
features required.
This feature is intended for 
people running plenty of ad-hoc queries, when every result could 
potentially be useful.

  
   	   
   	Pavel Stehule  
  Monday, July 01, 
2013 1:31 AM
  2013/7/1 Maciej Gajewski maciej.gajews...@gmail.com:
I'm not really bought into some of the ideas.


but maybe some interactive mode should be usefull - so after
execution, and showing result, will be prompt if result should be
saved or not.
I like the idea, in addition to the ordinary mode. Personally, I would use
the ordinary mode, but I can see how 'interactive' would be useful.

This would require a complex change to the client code. And the result would
eventually become annoying: an interactive question after each and every
query. Currently, when turned on, every result is stored and simple
notification is printed.

.
When I tested this feature, I had 30 caches per 5 minutes, and only a
few from these queries had a sense. Switch between off and on is not
user friendly. I believe so there can be other solution than mine, but
a possibility to friendly clean unwanted caches is necessary.

 yes, the names :ans01, :ans02, ... miss semantics - How I can join

this name (and content) with some SQL query?
That makes sense. I think having part of / the whole query string would be
very helpful. Great suggestion!

The naming is obscure and non-informative, I agree. If you have a nice idea
how to make it better, I'd love to discuss it. But please remember that it
has one huge advantage: simplicity. The client is a classical command-line
tool, and as such it delegates some of the functionality to external
programs, like pager or terminal.


Personally, I don't see a strong price for all users without friendly
interface.

Regards

Pavel

I'm pretty sure that your terminal emulator has a 'find' function that would
allow you to quickly locate the variable and associated query in the
scrollback.

M


   	   
   	Maciej Gajewski  
  Monday, July 01, 
2013 1:23 AM
  I'm not really bought into some of the ideas.
but maybe some 
interactive mode should be usefull - so after

execution, and
 showing result, will be prompt if result should besaved or not.

I like the idea, in addition to the ordinary mode. 
Personally, I would use the ordinary mode, but I can see how 
'interactive' would be useful. 
This would require a complex change to the 
client code. And the result would eventually become annoying: an 
interactive question after each and every query. Currently, when turned 
on, every result is stored and simple notification is printed.
 

 yes, the 
names :ans01, :ans02, ... miss semantics - How I can join

this name (and
 content) with some SQL query?That makes 
sense. I think having part of / the whole query string would be very 
helpful. Great suggestion! 

The naming is obscure and non-informative, I agree. If you have
 a nice idea how to make it better, I'd love to discuss it. But please 
remember that it has one huge advantage: simplicity. The client is a 
classical command-line tool, and as such it delegates some of the 
functionality to external programs, like pager or terminal.
I'm pretty sure that your terminal
 emulator has a 'find' function that would allow you to quickly locate 
the variable and associated query in the scrollback.
M

  
   	   
   	ian link  
  Monday, July 01, 
2013 12:19 AM
  but maybe some 
interactive mode should be usefull - so after
execution, and
 showing result, will be prompt if result should besaved or not.
I

Re: [HACKERS] Review: query result history in psql

2013-07-02 Thread ian link

 I haven't been able to find any real documentation on this feature,
 other than the additions to the psql help.

You're right, I missed that in my review. I agree that it needs some more
documentation.

What is ans?

We covered that earlier in the email thread, but it's the current name for
the query history. I think we are going to change it to something a little
more descriptive. I was thinking qh for short or query-history.

I'm not sure if I'll be able to implement this feature any time soon,
 as I'm very busy at the moment and going for a business trip in few
 days.

I can try implementing the feature, I might have some time.




On Tue, Jul 2, 2013 at 5:36 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 7/2/13 5:53 AM, Maciej Gajewski wrote:
  In the meantime, I've applied your suggestions and moved the
  sting-manipulating functions to stringutils. Also fixed a tiny bug.
  Patch attached.

 I haven't been able to find any real documentation on this feature,
 other than the additions to the psql help.  Could someone write some
 mock documentation at least, so we know what the proposed interfaces and
 intended ways of use are?

 What is ans?




Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-07-02 Thread ian link
I'm fine with moving the operators over to functions. I just don't want to
implement anything that is against best practice. If we are OK with that
direction, I'll go ahead and start on the new patch.

Ian


On Mon, Jul 1, 2013 at 9:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Craig Ringer cr...@2ndquadrant.com writes:
  On 07/02/2013 02:39 AM, Robert Haas wrote:
  I'm actually
  not clear that it would be all that bad to assume fixed operator
  names, as we apparently do in a few places despite the existence of
  operator classes.  But if that is bad, then I don't know how using @+
  and @- instead helps anything.

  Personally I'm not clear why it's bad to reserve certain fundamental
  operators like '+' and '-', requiring that they have particular
 semantics.

 It is bad.  It's against project policy, not least because we have
 assorted *existing* datatypes for which obvious operator names like
 = do not have all the properties you might expect.

 If you need a more concrete example of why that sort of thinking is
 bad, you might consider the difference between  and ~~ for type text.
 If we hard-wired knowledge about operator behavior to operator names,
 it would be impossible for the system to understand that both of those
 operators represent sorting-related behaviors.

 Or to be even more concrete: if we allow RANGE to suppose that there's
 only one possible definition of + for a datatype, we're effectively
 supposing that there's only one possible sort ordering for that type.
 Which is already a wrong assumption, and has been since Postgres was
 still at Berkeley.  If you go this way, you won't be able to support
 both WINDOW ... ORDER BY foo USING  RANGE ... and WINDOW ... ORDER BY
 foo USING ~~ RANGE ... because you won't know which addition operator
 to apply.

 (And yeah, I'm aware that the SQL standard only expects RANGE to support
 sort keys that are of numeric, datetime, or interval type.  I would hope
 that we have higher expectations than that.  Even if we don't, it's not
 exactly hard to credit that people might have multiple ideas about how
 to sort interval values.)

 There are indeed still some places where we rely on operator names to
 mean something, but we need to get away from that idea not add more.
 Ideally, any property the system understands about an operator or
 function should be explicitly declared through opclass membership or
 some similar representation.  We've made substantial progress in that
 direction in the last fifteen years.  I don't want to reverse that
 progress in the name of minor expediencies, especially not ones that
 fail to support flexibility that has been in the system for a couple
 or three decades already.

 regards, tom lane



Re: [HACKERS] Review: query result history in psql

2013-07-02 Thread ian link

 I'm kinda not all that convinced that this feature does anything
 that's actually useful.  If you want to save your query results, you
 can just stick CREATE TEMP TABLE ans AS in front of your query.
 What does that not give you that this gives you?

Convenience. It auto-increments with each new query, giving you a different
temporary table for each query. Seems pretty helpful to me.


On Tue, Jul 2, 2013 at 6:16 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jul 2, 2013 at 8:16 PM, ian link i...@ilink.io wrote:
  We covered that earlier in the email thread, but it's the current name
 for
  the query history. I think we are going to change it to something a
 little
  more descriptive. I was thinking qh for short or query-history.
 
  I'm not sure if I'll be able to implement this feature any time soon,
  as I'm very busy at the moment and going for a business trip in few
  days.
 
  I can try implementing the feature, I might have some time.

 I'm kinda not all that convinced that this feature does anything
 that's actually useful.  If you want to save your query results, you
 can just stick CREATE TEMP TABLE ans AS in front of your query.
 What does that not give you that this gives you?

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-07-01 Thread ian link
Definitely not this week. Hopefully for next commit fest.


On Sun, Jun 30, 2013 at 9:56 PM, Josh Berkus j...@agliodbs.com wrote:

 On 06/30/2013 08:54 PM, ian link wrote:
  I found some time and I think I am up to speed now. I finally figured out
  how to add new operator strategies and made a little test operator for
  myself.
 
  It seems pretty clear that assuming '+' and '-' are addition and
  subtraction is a bad idea. I don't think it would be too tricky to add
  support for new operator strategies. Andrew Gierth suggested calling
 these
  new strategies offset - and offset +, which I think describes it
 pretty
  well. I assigned the operator itself to be @+ and @- but that can
  obviously be changed. If this sounds like a good path to you guys, I will
  go ahead and implement the operators for the appropriate types. Please
 let
  me know if I am misunderstanding something - I am still figuring stuff
 out
  :)
 
  Aside from the opclass stuff, there were some other important issues
  mentioned with the original RANGE support. I think I will address those
  after the opclass stuff is done.

 Are these things you plan to get done this week, or for next CommitFest?

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Re: [HACKERS] Review: query result history in psql

2013-07-01 Thread ian link

 but maybe some interactive mode should be usefull - so after
 execution, and showing result, will be prompt if result should be
 saved or not.

I like the idea, in addition to the ordinary mode. Personally, I would use
the ordinary mode, but I can see how 'interactive' would be useful.

 yes, the names :ans01, :ans02, ... miss semantics - How I can join

this name (and content) with some SQL query?

That makes sense. I think having part of / the whole query string would be
very helpful. Great suggestion!

Maciej, would you be able/have time to implement these? Or do you need any
help getting them done?



On Sun, Jun 30, 2013 at 11:35 PM, Pavel Stehule pavel.steh...@gmail.comwrote:

 Hello

 2013/7/1 ian link i...@ilink.io:
  Not sure about all of your suggestions. Let me see if I can clarify what
  you're looking for.
 
 
   * simply decision if content should be stored in history or not,
 
  Do you mean that the user should use a flag to place the result of a
 query
  into the history?
  like:
  --ans SELECT * FROM cities...
  Not sure if that's what you mean, but it seems kind of unnecesary. They
 can
  just hit the \ans flag beforehand.

 switching off on is not user friendly

 but maybe some interactive mode should be usefull - so after
 execution, and showing result, will be prompt if result should be
 saved or not.

 some like:

 \ans interactive
  SELECT * FROM pg_proc;

  result 

 should be saved last result [y, n]?
  y
 result is saved in :ans22

 


 
  * simply remove last entry (table) of history
 
  That could be useful. What do you think Maciej?

 yes, lot of queries is just +/- experiment and you don't would store result

 
   * queries should be joined to content, only name is not enough
 
  Don't know what you mean. Could you try re-wording that?
 

 yes, the names :ans01, :ans02, ... miss semantics - How I can join
 this name (and content) with some SQL query?

 I needs to reverese search in SQL of stored caches, and I need a
 information

 ans01  SELECT * FROM pg_proc
 ans02  SELECT * FROM ans02 WHERE ...
 ans03 ...

 Regards

 Pavel

  Ian
 
 
 
  On Fri, Jun 28, 2013 at 8:49 AM, Pavel Stehule pavel.steh...@gmail.com
  wrote:
 
  Hello
 
  I am not sure, this interface is really user friendly
 
  there is not possible searching in history, and not every query push
  to history some interesting content.
 
  It require:
 
  * simply decision if content should be stored in history or not,
  * simply remove last entry (table) of history
  * queries should be joined to content, only name is not enough
 
  Regards
 
  Pavel
 
  2013/6/28 Maciej Gajewski maciej.gajews...@gmail.com:
   Thanks for checking the patch!
  
   So what's left to fix?
   * Moving the escaping-related functions to separate module,
   * applying your corrections.
  
   Did I missed anything?
  
   I'll submit corrected patch after the weekend.
  
   M
  
 
 



Re: [HACKERS] Review: query result history in psql

2013-06-30 Thread ian link
Not sure about all of your suggestions. Let me see if I can clarify what
you're looking for.


  * simply decision if content should be stored in history or not,

Do you mean that the user should use a flag to place the result of a query
into the history?
like:
--ans SELECT * FROM cities...
Not sure if that's what you mean, but it seems kind of unnecesary. They can
just hit the \ans flag beforehand.

* simply remove last entry (table) of history

That could be useful. What do you think Maciej?

 * queries should be joined to content, only name is not enough

Don't know what you mean. Could you try re-wording that?

Ian



On Fri, Jun 28, 2013 at 8:49 AM, Pavel Stehule pavel.steh...@gmail.comwrote:

 Hello

 I am not sure, this interface is really user friendly

 there is not possible searching in history, and not every query push
 to history some interesting content.

 It require:

 * simply decision if content should be stored in history or not,
 * simply remove last entry (table) of history
 * queries should be joined to content, only name is not enough

 Regards

 Pavel

 2013/6/28 Maciej Gajewski maciej.gajews...@gmail.com:
  Thanks for checking the patch!
 
  So what's left to fix?
  * Moving the escaping-related functions to separate module,
  * applying your corrections.
 
  Did I missed anything?
 
  I'll submit corrected patch after the weekend.
 
  M
 



Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-30 Thread ian link
I found some time and I think I am up to speed now. I finally figured out
how to add new operator strategies and made a little test operator for
myself.

It seems pretty clear that assuming '+' and '-' are addition and
subtraction is a bad idea. I don't think it would be too tricky to add
support for new operator strategies. Andrew Gierth suggested calling these
new strategies offset - and offset +, which I think describes it pretty
well. I assigned the operator itself to be @+ and @- but that can
obviously be changed. If this sounds like a good path to you guys, I will
go ahead and implement the operators for the appropriate types. Please let
me know if I am misunderstanding something - I am still figuring stuff out
:)

Aside from the opclass stuff, there were some other important issues
mentioned with the original RANGE support. I think I will address those
after the opclass stuff is done.

Thanks!
Ian


On Sat, Jun 22, 2013 at 4:38 PM, ian link i...@ilink.io wrote:

 Thanks Craig! That definitely does help. I probably still have some
 questions but I think I will read through the rest of the code before
 asking. Thanks again!

 Ian

  Craig Ringer
  Friday, June 21, 2013 8:41 PM

 
  On 06/22/2013 03:30 AM, ian link wrote:
 
  Forgive my ignorance, but I don't entirely understand the problem. What
  does '+' and '-' refer to exactly?
 
  Consider RANGE 4.5 PRECEDING'.
 
  You need to be able to test whether, for the current row 'b', any given
  row 'a' is within the range (b - 4.5)  a = b . Not 100% sure about the
   vs = boundaries, but that's irrelevant for the example.
 
  To test that, you have to be able to do two things: you have to be able
  to test whether one value is greater than another, and you have to be
  able to add or subtract a constant from one of the values.
 
  Right now, the b-tree access method provides information on the ordering
  operators  = =  =  , which provides half the answer. But these
  don't give any concept of *distance* - you can test ordinality but not
  cardinality.
 
  To implement the different by 4.5 part, you have to be able to add 4.5
  to one value or subtract it from the other.
 
  The obvious way to do that is to look up the function that implements
  the '+' or '-' operator, and do:
 
  ((OPERATOR(+))(a, 4.5))  b AND (a = b)
 
  or
 
  ((OPERATOR(-))(b, 4.5))  a AND (a = b);
 
  The problem outlined by Tom in prior discussion about this is that
  PostgreSQL tries really hard not to assume that particular operator
  names mean particular things. Rather than knowing that + is always
  an operator that adds two values together; is transitive, symmetric and
  reflexive, PostgreSQL requires that you define an *operator class* that
  names the operator that has those properties.
 
  Or at least, it does for less-than, less-than-or-equals, equals,
  greater-than-or-equals, greater-than, and not-equals as part of the
  b-tree operator class, which *usually* defines these operators as  = =
 
  =  , but you could use any operator names you wanted if you really
 
  liked.
 
  Right now (as far as I know) there's no operator class that lets you
  identify operators for addition and subtraction in a similar way. So
  it's necessary to either add such an operator class (in which case
  support has to be added for it for every type), extend the existing
  b-tree operator class to provide the info, or blindly assume that +
  and - are always addition and subtraction.
 
  For an example of why such assumptions are a bad idea, consider matrix
  multiplication. Normally, a * b = b * a, but this isn't true for
  multiplication of matrices. Similarly, if someone defined a + operator
  as an alias for string concatenation (||), we'd be totally wrong to
  assume we could use that for doing range-offset windowing.
 
  So. Yeah. Operator classes required, unless we're going to change the
  rules and make certain operator names special in PostgreSQL, so that
  if you implement them they *must* have certain properties. This seems
  like a pretty poor reason to add such a big change.
 
  I hope this explanation (a) is actually correct and (b) is helpful.
 
  ian link
  Friday, June 21, 2013 12:30 PM

  Forgive my ignorance, but I don't entirely understand the problem. What
 does '+' and '-' refer to exactly?
  Thanks!
 
 
 
  Hitoshi Harada
  Friday, June 21, 2013 4:35 AM
 
 
 

 On 06/22/2013 03:30 AM, ian link wrote:
  Forgive my ignorance, but I don't entirely understand the problem. What
  does '+' and '-' refer to exactly?

 Consider RANGE 4.5 PRECEDING'.

 You need to be able to test whether, for the current row 'b', any given
 row 'a' is within the range (b - 4.5)  a = b . Not 100% sure about the
  vs = boundaries, but that's irrelevant for the example.

 To test that, you have to be able to do two things: you have to be able
 to test whether one value is greater than another, and you have to be
 able to add or subtract a constant from one of the values.

 Right now, the b-tree

Re: [HACKERS] Review: query result history in psql

2013-06-28 Thread ian link

 It's better to post a review as a reply to the message which contains
 the patch.

Sorry about that, I did not have the email in my inbox and couldn't figure
out how to use the old message ID to send a reply. Here is the thread:
http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com

The 'EscapeForCopy' was meant to mean 'Escape string in a format require by
 the COPY TEXT format', so 'copy' in the name refers to the escaping format,
 not the action performed by the function.


I see, that makes sense now. Keep it as you see fit, it's not a big deal in
my opinion.

 Some mathematical toolkits, like Matlab or Mathematica, automatically set
 a variable called 'ans' (short for answer) containing the result of the
 last operation. I was trying to emulate exactly this behaviour.


I've actually been using Matlab lately, which must be why the name made
sense to me intuitively. I don't know if this is the best name, however. It
kind of assumes that our users use Matlab/Octave/Mathematica. Maybe 'qhist'
or 'hist' or something?

The history is not erased. The history is always stored in the client's
 memory.

Ah, I did not pick up on that. Thank you for explaining it! That's actually
a very neat way of doing it. Sorry I did not realize that at first.

I was considering such a behaviour. But since the feature is turned off by
 default, I decided that whoever is using it, is aware of cost. Instead of
 truncating the history automatically (which could lead to a nasty
 surprise), I decided to equip the user with \ansclean , a command erasing
 the history. I believe that it is better to let the user decide when
 history should be erased, instead of doing it automatically.


I think you are correct. However, if we turn on the feature by default (at
some point in the future) the discussion should probably be re-visited.

This is  my first submitted patch, so I can't really comment on the
 process. But if you could add the author's email to CC, the message would
 be much easier to spot. I replied after two days only because I missed the
 message in the flood of other pgsql-hacker messages. I think I need to scan
 the list more carefully...

My fault, I definitely should have CC'd you.

As for the patch, I made a new version of the latest one you provided in
the original thread. Let me know if anything breaks, but it compiles fine
on my box. Thanks for the feedback!

Ian


query-history-review1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: query result history in psql

2013-06-28 Thread ian link
I just applied the patch to a clean branch from the latest master. I
couldn't get a segfault from using the new feature. Could you provide a
little more info to reproduce the segfault? Thanks


On Thu, Jun 27, 2013 at 11:28 PM, Pavel Stehule pavel.steh...@gmail.comwrote:

 Hello

 after patching I god segfault

 Program terminated with signal 11, Segmentation fault.
 #0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
 98  for (p = prompt_string;
 Missing separate debuginfos, use: debuginfo-install glibc-2.13-2.i686
 ncurses-libs-5.7-9.20100703.fc14.i686 readline-6.1-2.fc14.i386
 (gdb) bt
 #0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
 #1  0x0805786a in MainLoop (source=0xc45440) at mainloop.c:134
 #2  0x0805a68d in main (argc=2, argv=0xbfcf2894) at startup.c:336

 Regards

 Pavel Stehule

 2013/6/28 ian link i...@ilink.io:
  It's better to post a review as a reply to the message which contains
  the patch.
 
  Sorry about that, I did not have the email in my inbox and couldn't
 figure
  out how to use the old message ID to send a reply. Here is the thread:
 
 http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com
 
  The 'EscapeForCopy' was meant to mean 'Escape string in a format require
  by the COPY TEXT format', so 'copy' in the name refers to the escaping
  format, not the action performed by the function.
 
 
  I see, that makes sense now. Keep it as you see fit, it's not a big deal
 in
  my opinion.
 
   Some mathematical toolkits, like Matlab or Mathematica, automatically
 set
  a variable called 'ans' (short for answer) containing the result of
 the
  last operation. I was trying to emulate exactly this behaviour.
 
 
  I've actually been using Matlab lately, which must be why the name made
  sense to me intuitively. I don't know if this is the best name, however.
 It
  kind of assumes that our users use Matlab/Octave/Mathematica. Maybe
 'qhist'
  or 'hist' or something?
 
  The history is not erased. The history is always stored in the client's
  memory.
 
  Ah, I did not pick up on that. Thank you for explaining it! That's
 actually
  a very neat way of doing it. Sorry I did not realize that at first.
 
  I was considering such a behaviour. But since the feature is turned off
 by
  default, I decided that whoever is using it, is aware of cost. Instead
 of
  truncating the history automatically (which could lead to a nasty
 surprise),
  I decided to equip the user with \ansclean , a command erasing the
 history.
  I believe that it is better to let the user decide when history should
 be
  erased, instead of doing it automatically.
 
 
  I think you are correct. However, if we turn on the feature by default
 (at
  some point in the future) the discussion should probably be re-visited.
 
  This is  my first submitted patch, so I can't really comment on the
  process. But if you could add the author's email to CC, the message
 would be
  much easier to spot. I replied after two days only because I missed the
  message in the flood of other pgsql-hacker messages. I think I need to
 scan
  the list more carefully...
 
  My fault, I definitely should have CC'd you.
 
  As for the patch, I made a new version of the latest one you provided in
 the
  original thread. Let me know if anything breaks, but it compiles fine on
 my
  box. Thanks for the feedback!
 
  Ian
 
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 



Re: [HACKERS] Review: query result history in psql

2013-06-28 Thread ian link
No worries! :)


On Fri, Jun 28, 2013 at 12:20 AM, Pavel Stehule pavel.steh...@gmail.comwrote:

 Hello

 It's look like my bug - wrong compilation

 I am sorry

 Pavel

 2013/6/28 ian link i...@ilink.io:
  I just applied the patch to a clean branch from the latest master. I
  couldn't get a segfault from using the new feature. Could you provide a
  little more info to reproduce the segfault? Thanks
 
 
  On Thu, Jun 27, 2013 at 11:28 PM, Pavel Stehule pavel.steh...@gmail.com
 
  wrote:
 
  Hello
 
  after patching I god segfault
 
  Program terminated with signal 11, Segmentation fault.
  #0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
  98  for (p = prompt_string;
  Missing separate debuginfos, use: debuginfo-install glibc-2.13-2.i686
  ncurses-libs-5.7-9.20100703.fc14.i686 readline-6.1-2.fc14.i386
  (gdb) bt
  #0  0x0805aab4 in get_prompt (status=PROMPT_READY) at prompt.c:98
  #1  0x0805786a in MainLoop (source=0xc45440) at mainloop.c:134
  #2  0x0805a68d in main (argc=2, argv=0xbfcf2894) at startup.c:336
 
  Regards
 
  Pavel Stehule
 
  2013/6/28 ian link i...@ilink.io:
   It's better to post a review as a reply to the message which contains
   the patch.
  
   Sorry about that, I did not have the email in my inbox and couldn't
   figure
   out how to use the old message ID to send a reply. Here is the thread:
  
  
 http://www.postgresql.org/message-id/flat/caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com#caecsyxjri++t3pevdyzawh2ygx7kg9zrhx8kawtp1fxv3h0...@mail.gmail.com
  
   The 'EscapeForCopy' was meant to mean 'Escape string in a format
   require
   by the COPY TEXT format', so 'copy' in the name refers to the
 escaping
   format, not the action performed by the function.
  
  
   I see, that makes sense now. Keep it as you see fit, it's not a big
 deal
   in
   my opinion.
  
Some mathematical toolkits, like Matlab or Mathematica,
 automatically
   set
   a variable called 'ans' (short for answer) containing the result of
   the
   last operation. I was trying to emulate exactly this behaviour.
  
  
   I've actually been using Matlab lately, which must be why the name
 made
   sense to me intuitively. I don't know if this is the best name,
 however.
   It
   kind of assumes that our users use Matlab/Octave/Mathematica. Maybe
   'qhist'
   or 'hist' or something?
  
   The history is not erased. The history is always stored in the
 client's
   memory.
  
   Ah, I did not pick up on that. Thank you for explaining it! That's
   actually
   a very neat way of doing it. Sorry I did not realize that at first.
  
   I was considering such a behaviour. But since the feature is turned
 off
   by
   default, I decided that whoever is using it, is aware of cost.
 Instead
   of
   truncating the history automatically (which could lead to a nasty
   surprise),
   I decided to equip the user with \ansclean , a command erasing the
   history.
   I believe that it is better to let the user decide when history
 should
   be
   erased, instead of doing it automatically.
  
  
   I think you are correct. However, if we turn on the feature by default
   (at
   some point in the future) the discussion should probably be
 re-visited.
  
   This is  my first submitted patch, so I can't really comment on the
   process. But if you could add the author's email to CC, the message
   would be
   much easier to spot. I replied after two days only because I missed
 the
   message in the flood of other pgsql-hacker messages. I think I need
 to
   scan
   the list more carefully...
  
   My fault, I definitely should have CC'd you.
  
   As for the patch, I made a new version of the latest one you provided
 in
   the
   original thread. Let me know if anything breaks, but it compiles fine
 on
   my
   box. Thanks for the feedback!
  
   Ian
  
  
   --
   Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
   To make changes to your subscription:
   http://www.postgresql.org/mailpref/pgsql-hackers
  
 
 



[HACKERS] Review: query result history in psql

2013-06-25 Thread ian link
Hi Maciej,

I've been reviewing your patch for the ongoing commitfest. First let me say
that I think it's a great idea and provides some very useful functionality.

However, there are a few minor problems. There were a few
english/grammatical mistakes that I went ahead and fixed. Additionally, I
think some of the string manipulation might be placed outside of the main
ans.c file. I don't know if there's a better place for 'EscapeForCopy' and
'GetEscapedLen'. Not really a big deal, just an organizational idea. I also
changed 'EscapeForCopy' to 'EscapeAndCopy'. I think that better describes
the functionality. 'EscapeForCopy' kind of implies that another function is
needed to copy the string.

What does 'ans' stand for? I am not sure how it relates to the concept of a
query history. It didn't stop my understanding of the code, but I don't
know if a user will immediately know the meaning.

Probably the biggest problem is that the query history list is missing a
maximum size variable. I think this could be valuable for preventing users
from shooting themselves in the foot. If the user is running large queries,
they might accidentally store too much data. This probably somewhat of an
edge-case but I believe it is worth considering. We could provide a
sensible default limit (10 queries?) and also allow the user to change it.

Finally, is it worth resetting the query history every time a user
reconnects to the database? I can see how this might interrupt a user's
workflow. If the user suddenly disconnects (network connection interrupted,
etc) then they would lose their history. I think this is definitely up for
debate. It would add more management overhead (psql options etc) and might
just be unnecessary. However, with a sane limit to the size of the query
history, I don't know if there would be too many drawbacks from a storage
perspective.

Those issues aside - I think it's a great feature! I can add the
grammatical fixes I made whenever the final patch is ready. Or earlier,
whatever works for you. Also, this is my first time reviewing a patch, so
please let me know if I can improve on anything. Thanks!

Ian Link


Re: [HACKERS] Patch for fast gin cache performance improvement

2013-06-23 Thread Ian Link
I just tried it and my 
account works now. Thanks! 
Ian


   	   
   	Stefan Kaltenbrunner  
  Sunday, June 23, 
2013 2:05 AM
  have you tried 
to log in once to the main website per:http://www.postgresql.org/message-id/CABUevEyt9tQfcF7T2Uzcr8WeF9M=s8qSACuCmN5L2Et26=r...@mail.gmail.com?Stefan
   	   
   	ian link  
  Saturday, June 
22, 2013 7:03 PM
  Looks like my 
community login is still not working. No rush, just wanted to let you 
know. Thanks!Ian

  
   	   
   	Ian Link  
  Tuesday, June 18,
 2013 11:41 AM
  


No worries! I'll just wait until it's up again.
Thanks
Ian

  
   	   
   	Kevin Grittner  
  Tuesday, June 18,
 2013 9:15 AM
  Oops -- we seem
 to have a problem with new community logins at themoment, which 
will hopefully be straightened out soon. You mightwant to wait a 
few days if you don't already have a login.--Kevin GrittnerEnterpriseDB:
 http://www.enterprisedb.comThe Enterprise PostgreSQL Company
   	   
   	Kevin Grittner  
  Tuesday, June 18,
 2013 9:09 AM
  Ian Link i...@ilink.io wrote:

This patch contains a performance improvement for the fast gin
cache.

Our test queries improve from about 0.9 ms to 0.030 ms.

Impressive!

Thanks for reading and considering this patch!

Congratulations on your first PostgreSQL patch! To get it
scheduled for review, please add it to this page:

https://commitfest.postgresql.org/action/commitfest_view/open

You will need to get a community login (if you don't already have
one), but that is a quick and painless process. Choose an
appropriate topic (like "Performance") and reference the message ID
of the email to which you attached the patch. Don't worry about
the fields for reviewers, committer, or date closed. 

Sorry for the administrative overhead, but without it things can
fall through the cracks. You can find an overview of the review
process with links to more detail here:

http://wiki.postgresql.org/wiki/CommitFest

Thanks for contributing!






Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-22 Thread ian link
Thanks Craig! That definitely does help. I probably still have some
questions but I think I will read through the rest of the code before
asking. Thanks again!

Ian

 Craig Ringer
 Friday, June 21, 2013 8:41 PM

 On 06/22/2013 03:30 AM, ian link wrote:

 Forgive my ignorance, but I don't entirely understand the problem. What
 does '+' and '-' refer to exactly?

 Consider RANGE 4.5 PRECEDING'.

 You need to be able to test whether, for the current row 'b', any given
 row 'a' is within the range (b - 4.5)  a = b . Not 100% sure about the
  vs = boundaries, but that's irrelevant for the example.

 To test that, you have to be able to do two things: you have to be able
 to test whether one value is greater than another, and you have to be
 able to add or subtract a constant from one of the values.

 Right now, the b-tree access method provides information on the ordering
 operators  = =  =  , which provides half the answer. But these
 don't give any concept of *distance* - you can test ordinality but not
 cardinality.

 To implement the different by 4.5 part, you have to be able to add 4.5
 to one value or subtract it from the other.

 The obvious way to do that is to look up the function that implements
 the '+' or '-' operator, and do:

 ((OPERATOR(+))(a, 4.5))  b AND (a = b)

 or

 ((OPERATOR(-))(b, 4.5))  a AND (a = b);

 The problem outlined by Tom in prior discussion about this is that
 PostgreSQL tries really hard not to assume that particular operator
 names mean particular things. Rather than knowing that + is always
 an operator that adds two values together; is transitive, symmetric and
 reflexive, PostgreSQL requires that you define an *operator class* that
 names the operator that has those properties.

 Or at least, it does for less-than, less-than-or-equals, equals,
 greater-than-or-equals, greater-than, and not-equals as part of the
 b-tree operator class, which *usually* defines these operators as  = =

 =  , but you could use any operator names you wanted if you really

 liked.

 Right now (as far as I know) there's no operator class that lets you
 identify operators for addition and subtraction in a similar way. So
 it's necessary to either add such an operator class (in which case
 support has to be added for it for every type), extend the existing
 b-tree operator class to provide the info, or blindly assume that +
 and - are always addition and subtraction.

 For an example of why such assumptions are a bad idea, consider matrix
 multiplication. Normally, a * b = b * a, but this isn't true for
 multiplication of matrices. Similarly, if someone defined a + operator
 as an alias for string concatenation (||), we'd be totally wrong to
 assume we could use that for doing range-offset windowing.

 So. Yeah. Operator classes required, unless we're going to change the
 rules and make certain operator names special in PostgreSQL, so that
 if you implement them they *must* have certain properties. This seems
 like a pretty poor reason to add such a big change.

 I hope this explanation (a) is actually correct and (b) is helpful.

 ian link
 Friday, June 21, 2013 12:30 PM
 Forgive my ignorance, but I don't entirely understand the problem. What
does '+' and '-' refer to exactly?
 Thanks!



 Hitoshi Harada
 Friday, June 21, 2013 4:35 AM



 On 06/22/2013 03:30 AM, ian link wrote:
 Forgive my ignorance, but I don't entirely understand the problem. What
 does '+' and '-' refer to exactly?

Consider RANGE 4.5 PRECEDING'.

You need to be able to test whether, for the current row 'b', any given
row 'a' is within the range (b - 4.5)  a = b . Not 100% sure about the
 vs = boundaries, but that's irrelevant for the example.

To test that, you have to be able to do two things: you have to be able
to test whether one value is greater than another, and you have to be
able to add or subtract a constant from one of the values.

Right now, the b-tree access method provides information on the ordering
operators  = =  =  , which provides half the answer. But these
don't give any concept of *distance* - you can test ordinality but not
cardinality.

To implement the different by 4.5 part, you have to be able to add 4.5
to one value or subtract it from the other.

The obvious way to do that is to look up the function that implements
the '+' or '-' operator, and do:

((OPERATOR(+))(a, 4.5))  b AND (a = b)

or

((OPERATOR(-))(b, 4.5))  a AND (a = b);

The problem outlined by Tom in prior discussion about this is that
PostgreSQL tries really hard not to assume that particular operator
names mean particular things. Rather than knowing that + is always
an operator that adds two values together; is transitive, symmetric and
reflexive, PostgreSQL requires that you define an *operator class* that
names the operator that has those properties.

Or at least, it does for less-than, less-than-or-equals, equals,
greater-than-or-equals, greater-than, and not-equals as part of the
b-tree operator class, which *usually* defines

Re: [HACKERS] Patch for fast gin cache performance improvement

2013-06-22 Thread ian link
Looks like my community login is still not working. No rush, just wanted to
let you know. Thanks!

Ian


On Tue, Jun 18, 2013 at 11:41 AM, Ian Link i...@ilink.io wrote:


 No worries! I'll just wait until it's up again.
 Thanks
 Ian

   Kevin Grittner kgri...@ymail.com
  Tuesday, June 18, 2013 9:15 AM

 Oops -- we seem to have a problem with new community logins at the
 moment, which will hopefully be straightened out soon.  You might
 want to wait a few days if you don't already have a login.

 --
 Kevin Grittner
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
   Kevin Grittner kgri...@ymail.com
  Tuesday, June 18, 2013 9:09 AM

 Ian Link i...@ilink.io i...@ilink.io wrote:


 This patch contains a performance improvement for the fast gin
 cache.

 Our test queries improve from about 0.9 ms to 0.030 ms.

 Impressive!


 Thanks for reading and considering this patch!


 Congratulations on your first PostgreSQL patch!  To get it
 scheduled for review, please add it to this page:
 https://commitfest.postgresql.org/action/commitfest_view/open


 You will need to get a community login (if you don't already have
 one), but that is a quick and painless process.  Choose an
 appropriate topic (like Performance) and reference the message ID
 of the email to which you attached the patch.  Don't worry about
 the fields for reviewers, committer, or date closed.

 Sorry for the administrative overhead, but without it things can
 fall through the cracks.  You can find an overview of the review
 process with links to more detail here:

 http://wiki.postgresql.org/wiki/CommitFest

 Thanks for contributing!


   Ian Link i...@ilink.io
  Monday, June 17, 2013 9:42 PM
  *

 This patch contains a performance improvement for the fast gin cache. As
 you may know, the performance of the fast gin cache decreases with its
 size. Currently, the size of the fast gin cache is tied to work_mem. The
 size of work_mem can often be quite high. The large size of work_mem is
 inappropriate for the fast gin cache size. Therefore, we created a separate
 cache size called gin_fast_limit. This global variable controls the size of
 the fast gin cache, independently of work_mem. Currently, the default
 gin_fast_limit is set to 128kB. However, that value could need tweaking.
 64kB may work better, but it's hard to say with only my single machine to
 test on.

 On my machine, this patch results in a nice speed up. Our test queries
 improve from about 0.9 ms to 0.030 ms. Please feel free to use the test
 case yourself: it should be attached. I can look into additional test cases
 (tsvectors) if anyone is interested.

 In addition to the global limit, we have provided a per-index limit:
 fast_cache_size. This per-index limit begins at -1, which means that it is
 disabled. If the user does not specify a per-index limit, the index will
 simply use the global limit.

 I would like to thank Andrew Gierth for all his help on this patch. As
 this is my first patch he was extremely helpful. The idea for this
 performance improvement was entirely his. I just did the implementation.
 Thanks for reading and considering this patch!*


 Ian Link


postbox-contact.jpgcompose-unknown-contact.jpg

Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-21 Thread ian link
Forgive my ignorance, but I don't entirely understand the problem. What
does '+' and '-' refer to exactly?
Thanks!


On Fri, Jun 21, 2013 at 4:35 AM, Hitoshi Harada umi.tan...@gmail.comwrote:




 On Fri, Jun 21, 2013 at 3:20 AM, Craig Ringer cr...@2ndquadrant.comwrote:

 On 06/21/2013 05:32 PM, Hitoshi Harada wrote:

  I also later found that we are missing not only notion of '+' or '-',
  but also notion of 'zero value' in our catalog.  Per spec, RANGE BETWEEN
  needs to detect ERROR if the offset value is negative, but it is not
  always easy if you think about interval, numeric types as opposed to
  int64 used in ROWS BETWEEN.

 Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That
 should make sense for any type in which the concept of zero makes sense.


 Yeah, I mean, it needs to know if offset is negative or not by testing
 with zero.  So we need zero value or is_negative function for each type.

 Thanks,
 --
 Hitoshi Harada



Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-20 Thread Ian Link
I am currently looking 
into this feature. However, as I am quite new to Postgres, I think it 
might take me a while to get up to speed. Anyways, I would also 
appreciate another round of discussion on the future of the windowing 
functions. 

Ian Link


   	   
   	Craig Ringer  
  Thursday, June 
20, 2013 7:24 PM
  Hi allSince 8.4, 
PostgreSQL has had extremely useful window function support -but 
support for "RANGE PRECEDING / FOLLOWING" windows was dropped latein
 8.4's development in order to get the rest of the feature in, perhttp://archives.postgresql.org/pgsql-hackers/2010-02/msg00540.php.It
 looks like there was discussion of requiring a new opclass to bedeclared
 for types or otherwise extending opclasses to provide theinformation
 required for RANGE ... PRECEDING / FOLLOWING (http://www.postgresql.org/message-id/20100211201444.ga28...@svana.org
 ). I can't find any sign that it went anywhere beyond some broaddiscussion:http://www.postgresql.org/message-id/13993.1265920...@sss.pgh.pa.us
 atthe time.I've missed this feature more than once, and am 
curious about whetherany more recent changes may have made it 
cleaner to tackle this, orwhether consensus can be formed on adding 
the new entries to btree'sopclass to avoid the undesirable explicit 
lookups of the '+' and '-'oprators.Some question seems to 
remain open about how ranges overtimestamps/intervals should work, 
but this wasn't elaborated on.There's been interest in this, eg:http://pgsql.hackers.free-usenet.eu/[HACKERS]-range-intervals-in-window-function-frames_T66085695_S1http://grokbase.com/t/postgresql/pgsql-general/105a89gm2n/postgresql-9-0-support-for-range-value-preceding-window-functions




Re: [HACKERS] Support for RANGE ... PRECEDING windows in OVER

2013-06-20 Thread Ian Link
Thanks! The discussions 
have been useful, although I am currently just reviewing the code. 
I think a good starting point will be to refactor/imrpove the 
WinGetFuncArgInPartition and WinGetFuncArgInFrame functions.
Tom Lane wrote this about them before comitting the patch:

I'm
 not terribly happy with the changes you made in 
WinGetFuncArgInPartitionand
 WinGetFuncArgInFrame to force the window function mark to not gopast
 frame start in some modes. Not only is that pretty ugly, but Ithink
 it can mask bugs in window functions: it's an error for a windowfunction
 to fetch a row before what it has set its mark to be, but insome
 cases that wouldn't be detected because of this change. I thinkit
 would be better to revert those changes and find another method ofprotecting
 fetches needed to determine the frame head. One idea isto
 create a separate read pointer that tracks the frame head wheneveractual
 fetches of the frame head might be needed by update_frameheadpos.I
 committed it without changing that, but I think this should berevisited
 before trying to add the RANGE value PRECEDING/FOLLOWINGoptions,
 because those will substantially expand the number of caseswhere
 that hack affects the behavior. 

I am honestly not 100% certain why these functions have issues, but this
 seems a good place to start investigating.

Ian Link


   	   
   	Craig Ringer  
  Thursday, June 
20, 2013 7:37 PM
  Good to know, 
and welcome.I hope the links to the archived discussions on the 
matter were usefulto you.
   	   
   	Craig Ringer  
  Thursday, June 
20, 2013 7:24 PM
  Hi allSince 8.4, 
PostgreSQL has had extremely useful window function support -but 
support for "RANGE PRECEDING / FOLLOWING" windows was dropped latein
 8.4's development in order to get the rest of the feature in, perhttp://archives.postgresql.org/pgsql-hackers/2010-02/msg00540.php.It
 looks like there was discussion of requiring a new opclass to bedeclared
 for types or otherwise extending opclasses to provide theinformation
 required for RANGE ... PRECEDING / FOLLOWING (http://www.postgresql.org/message-id/20100211201444.ga28...@svana.org
 ). I can't find any sign that it went anywhere beyond some broaddiscussion:http://www.postgresql.org/message-id/13993.1265920...@sss.pgh.pa.us
 atthe time.I've missed this feature more than once, and am 
curious about whetherany more recent changes may have made it 
cleaner to tackle this, orwhether consensus can be formed on adding 
the new entries to btree'sopclass to avoid the undesirable explicit 
lookups of the '+' and '-'oprators.Some question seems to 
remain open about how ranges overtimestamps/intervals should work, 
but this wasn't elaborated on.There's been interest in this, eg:http://pgsql.hackers.free-usenet.eu/[HACKERS]-range-intervals-in-window-function-frames_T66085695_S1http://grokbase.com/t/postgresql/pgsql-general/105a89gm2n/postgresql-9-0-support-for-range-value-preceding-window-functions




Re: [HACKERS] Patch for fast gin cache performance improvement

2013-06-18 Thread Ian Link

No worries! I'll just wait until it's up again.
Thanks
Ian

   	   
   	Kevin Grittner  
  Tuesday, June 18,
 2013 9:15 AM
  Oops -- we seem
 to have a problem with new community logins at themoment, which 
will hopefully be straightened out soon. You mightwant to wait a 
few days if you don't already have a login.--Kevin GrittnerEnterpriseDB:
 http://www.enterprisedb.comThe Enterprise PostgreSQL Company
   	   
   	Kevin Grittner  
  Tuesday, June 18,
 2013 9:09 AM
  Ian Link i...@ilink.io wrote:

This patch contains a performance improvement for the fast gin
cache.

Our test queries improve from about 0.9 ms to 0.030 ms.

Impressive!

Thanks for reading and considering this patch!

Congratulations on your first PostgreSQL patch! To get it
scheduled for review, please add it to this page:

https://commitfest.postgresql.org/action/commitfest_view/open

You will need to get a community login (if you don't already have
one), but that is a quick and painless process. Choose an
appropriate topic (like "Performance") and reference the message ID
of the email to which you attached the patch. Don't worry about
the fields for reviewers, committer, or date closed. 

Sorry for the administrative overhead, but without it things can
fall through the cracks. You can find an overview of the review
process with links to more detail here:

http://wiki.postgresql.org/wiki/CommitFest

Thanks for contributing!


   	   
   	Ian Link  
  Monday, June 17, 
2013 9:42 PM
  

  This

 patch contains a performance improvement for the fast gin cache. As you
 may know, the performance of the fast gin cache decreases with its 
size. Currently, the size of the fast gin cache is tied to work_mem. The
 size of work_mem can often be quite high. The large size of work_mem is
 inappropriate for the fast gin cache size. Therefore, we created a 
separate cache size called gin_fast_limit. This global variable controls
 the size of the fast gin cache, independently of work_mem. Currently, 
the default gin_fast_limit is set to 128kB. However, that value could 
need tweaking. 64kB may work better, but it's hard to say with only my 
single machine to test on.On

 my machine, this patch results in a nice speed up. Our test queries 
improve from about 0.9 ms to 0.030 ms. Please feel free to use the test 
case yourself: it should be attached. I can look into additional test 
cases (tsvectors) if anyone is interested. In

 addition to the global limit, we have provided a per-index limit: 
fast_cache_size. This per-index limit begins at -1, which means that it 
is disabled. If the user does not specify a per-index limit, the index 
will simply use the global limit. I
 would like to thank Andrew Gierth for all his help on this patch. As 
this is my first patch he was extremely helpful. The idea for this 
performance improvement was entirely his. I just did the implementation.
 Thanks for reading and considering this patch! 
    
    
Ian Link