Re: [HACKERS] NUMERIC private methods?
On Sun, Mar 22, 2015 at 04:08:32AM +, Andrew Gierth wrote: Bruce == Bruce Momjian br...@momjian.us writes: ! However, calculations on typenumeric/type values is very slow arithmetic ... is, but calculations ... are Ah, good point. Fixed an applied. Thanks. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
On Thu, Dec 18, 2014 at 11:51:12PM -0500, Tom Lane wrote: If you're doing any sort of higher math or statistics, I stand by my statement that you'd better think rather than just blindly assume that numeric is going to be better for you. A moment's fooling about finds this example, which is pretty relevant to the formula we started this thread with: regression=# select (1234::numeric/1235) * 1235; ?column? --- 1234.0100 (1 row) regression=# select (1234::float8/1235) * 1235; ?column? -- 1234 (1 row) What it boils down to is that numeric is great for storing given decimal inputs exactly, and it can do exact addition/subtraction/multiplication on those too, but as soon as you get into territory where the result is fundamentally inexact it is *not* promised to be better than float8. In fact, it's designed to be more or less the same as float8; see the comments in select_div_scale. Based on the analysis above, I have written the attached patch to the NUMERIC docs to mention this. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml new file mode 100644 index edf636b..4a65971 *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** *** 498,508 para The type typenumeric/type can store numbers with a ! very large number of digits and perform calculations exactly. It is ! especially recommended for storing monetary amounts and other ! quantities where exactness is required. However, arithmetic on ! typenumeric/type values is very slow compared to the integer ! types, or to the floating-point types described in the next section. /para para --- 498,510 para The type typenumeric/type can store numbers with a ! very large number of digits. It is especially recommended for ! storing monetary amounts and other quantities where exactness is ! required. Calculations with typenumeric/type values yield exact ! results where possible, e.g. addition, subtraction, multiplication. ! However, calculations on typenumeric/type values is very slow ! compared to the integer types, or to the floating-point types ! described in the next section. /para para -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
Bruce == Bruce Momjian br...@momjian.us writes: ! However, calculations on typenumeric/type values is very slow arithmetic ... is, but calculations ... are -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
On Thu, Dec 18, 2014 at 11:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: What it boils down to is that numeric is great for storing given decimal inputs exactly, and it can do exact addition/subtraction/multiplication on those too, but as soon as you get into territory where the result is fundamentally inexact it is *not* promised to be better than float8. I think what it boils down to is that several people here (and I'll add my voice to the chorus) are saying, hey, numeric is really useful, and we'd like to be able to manipulate numerics without all the palloc and fmgr overhead, and your response appears to be to say, use float8. That may be the right answer in some cases, but certainly not in all. We could probably improve on this if we were to redesign the algorithms around a concept of decimal floating-point, rather than decimal fixed-point as it is now. But I'm not sure how well that would comport with the SQL standard. And I'm very not sure that we could still do it once we'd tied one hand behind our backs by virtue of exporting a bunch of the internals as public API. You make this argument every time somebody wants to drop static from a function or stick PGDLLIMPORT on a variable, and frankly I think it's pretty developer-hostile. I would much rather see us go ahead and do those things and if people later complain that we broke stuff, we'll go tell them to pound sand. That's what we said when people complained about relistemp - relpersistence, and the number of people who are affected by a change in the system catalogs has got to be a good two orders of magnitude more than the number of people who are going to notice a change in the C API. Giving developers the ability to write extensions that *might* get broken by future changes is a lot better than not giving them that ability in the first place. There's only a problem for the core project if we think we're bound not to break the APIs in a future release, and I don't feel so bound. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
Robert Haas robertmh...@gmail.com writes: I think what it boils down to is that several people here (and I'll add my voice to the chorus) are saying, hey, numeric is really useful, and we'd like to be able to manipulate numerics without all the palloc and fmgr overhead, and your response appears to be to say, use float8. That may be the right answer in some cases, but certainly not in all. Well, there are two components to what I'm saying. One is that the example David started with looks like it could use some better-informed consideration about which datatype to use. The other is that numeric leaves quite a lot to be desired still, and someday we might want to fix that, and that might require breaking the APIs you want to expose. You make this argument every time somebody wants to drop static from a function or stick PGDLLIMPORT on a variable, and frankly I think it's pretty developer-hostile. I would much rather see us go ahead and do those things and if people later complain that we broke stuff, we'll go tell them to pound sand. Really. I will remember that next time you bitch about us changing some extension-visible behavior, which AFAIR you are usually one of the first to complain about. Anyway, if we do this, I concur with Alvaro's suggestion that the additional exposure be in a header file named something like numeric_private.h, so that there's less room for complaint when we change it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
On Fri, Dec 19, 2014 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, there are two components to what I'm saying. One is that the example David started with looks like it could use some better-informed consideration about which datatype to use. The other is that numeric leaves quite a lot to be desired still, and someday we might want to fix that, and that might require breaking the APIs you want to expose. As I say, I'm OK with that. That file has been nearly-static for a really long time, so I don't expect a massive rewrite imminently. But if it happens, and it breaks things for people relying on those functions, tough luck for them. You make this argument every time somebody wants to drop static from a function or stick PGDLLIMPORT on a variable, and frankly I think it's pretty developer-hostile. I would much rather see us go ahead and do those things and if people later complain that we broke stuff, we'll go tell them to pound sand. Really. I will remember that next time you bitch about us changing some extension-visible behavior, which AFAIR you are usually one of the first to complain about. Really? I have bitched about what I see as nearly-pointless reorganizations of header files, because I personally place very low priority on speeding up incremental compiles, and most of those changes have no other benefit. But I can't remember ever bitching about function signature changes or changes to the types of global variables, or even the disappearance of previously-exposed types or global variables. And I daresay, given the nature of my responsibilities at EDB, such things have at least one order of magnitude more impact on me than they do on most extension maintainers. If it were up to me, I'd go stick PGDLLIMPORT on every global variable I could find; I estimate that would make things easier for me, say, five times as often as it would make them harder. The amount of time we've spent tracking down regressions on Windows because some loadable module of ours depended on a variable that wasn't PGDLLIMPORT'd is large - everything works fine on Linux, and sometimes survives cursory testing on Windows too, but then breaks in some obscure scenario. And there have also been cases where we've had to work around the absence of PGDLLIMPORT markings with ugly hacks and then that workaround code has turned out to be buggy. I would not argue for indiscriminately making global every function we have anywhere in the backend, but I would favor a policy of being significantly more liberal about it than we have been heretofore. Anyway, if we do this, I concur with Alvaro's suggestion that the additional exposure be in a header file named something like numeric_private.h, so that there's less room for complaint when we change it. Yes, I liked that proposal, too. Also, I'm not sure whether there's room to worry that making those function extern rather than static would defeat compiler optimizations that materially affect performance. That might just be paranoia on my part, but I've seen cases where it matters. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
On Thu, Dec 18, 2014 at 11:51:37PM -0300, Alvaro Herrera wrote: Robert Haas wrote: I think that's ridiculous. You're basically arguing that numeric doesn't offer meaningful advantages over float8, which flies in the face of the fact that essentially every database application I've ever seen uses numeric and I'm not sure I've ever seen one using float8. Nearly all database users prefer to store quantities like currency units in a type that is guaranteed not to lose precision. I think it's reasonable to expose NumericVar and the supporting function prototypes in, say, numeric_internal.h; normal applications that just want to operate on numerics as today can just include numeric.h, and continue to be at arms-length of the implementation details, while code that wants to optimize operations further can use numeric_internal.h and be very aware that they are subject to heavy breakage if we ever feel a need to change the internal API. While nothing can prevent negligence and pilot error, making it clear by the name of the included header that breakable stuff is being used seems like an excellent way to proceed. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
On 12/18/14, 9:21 AM, Tom Lane wrote: As it stands, no extension can use the numeric type in any non-trivial way without paying a large penalty for repeated pallocs and data copies. Given that the ability to write C extensions easily is one of pg's great strengths, this is a defect that should be corrected. If copying data/palloc is the root of numeric's performance problems then we need to address that, because it will provide benefit across the entire database. The pattern of (palloc; copy) is repeated throughout a large part of the codebase. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
On Thu, Dec 18, 2014 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Gierth and...@tao11.riddles.org.uk writes: Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom If you're concerned about arithmetic performance, there is a Tom very obvious fix here: use double. Independently of this specific example, the obvious issue there is that there are applications for which double is simply not acceptable. As the guy who last fooled with the numeric calculation algorithms in any major way, I'm painfully aware that numeric is not necessarily more accurate than double for anything more complicated than addition/subtraction/multiplication. The example that was shown upthread is pretty nearly a textbook case of something where I'd not believe that numeric offers any accuracy improvement without *very* careful investigation. In general, if your application is sufficiently arithmetic-intensive that you need to care about the speed of calculations, then it's not obvious which one to pick, and if I hear a knee-jerk simply not acceptable I'm simply going to conclude that you haven't actually studied the subject. I think that's ridiculous. You're basically arguing that numeric doesn't offer meaningful advantages over float8, which flies in the face of the fact that essentially every database application I've ever seen uses numeric and I'm not sure I've ever seen one using float8. Nearly all database users prefer to store quantities like currency units in a type that is guaranteed not to lose precision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
Robert Haas wrote: I think that's ridiculous. You're basically arguing that numeric doesn't offer meaningful advantages over float8, which flies in the face of the fact that essentially every database application I've ever seen uses numeric and I'm not sure I've ever seen one using float8. Nearly all database users prefer to store quantities like currency units in a type that is guaranteed not to lose precision. I think it's reasonable to expose NumericVar and the supporting function prototypes in, say, numeric_internal.h; normal applications that just want to operate on numerics as today can just include numeric.h, and continue to be at arms-length of the implementation details, while code that wants to optimize operations further can use numeric_internal.h and be very aware that they are subject to heavy breakage if we ever feel a need to change the internal API. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 18, 2014 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote: As the guy who last fooled with the numeric calculation algorithms in any major way, I'm painfully aware that numeric is not necessarily more accurate than double for anything more complicated than addition/subtraction/multiplication. The example that was shown upthread is pretty nearly a textbook case of something where I'd not believe that numeric offers any accuracy improvement without *very* careful investigation. I think that's ridiculous. You're basically arguing that numeric doesn't offer meaningful advantages over float8, which flies in the face of the fact that essentially every database application I've ever seen uses numeric and I'm not sure I've ever seen one using float8. Nearly all database users prefer to store quantities like currency units in a type that is guaranteed not to lose precision. If you're doing banking, you don't do anything except addition, subtraction, and multiplication. And that is what those users who want guaranteed precision are doing, and yeah numeric will make them happy. If you're doing any sort of higher math or statistics, I stand by my statement that you'd better think rather than just blindly assume that numeric is going to be better for you. A moment's fooling about finds this example, which is pretty relevant to the formula we started this thread with: regression=# select (1234::numeric/1235) * 1235; ?column? --- 1234.0100 (1 row) regression=# select (1234::float8/1235) * 1235; ?column? -- 1234 (1 row) What it boils down to is that numeric is great for storing given decimal inputs exactly, and it can do exact addition/subtraction/multiplication on those too, but as soon as you get into territory where the result is fundamentally inexact it is *not* promised to be better than float8. In fact, it's designed to be more or less the same as float8; see the comments in select_div_scale. We could probably improve on this if we were to redesign the algorithms around a concept of decimal floating-point, rather than decimal fixed-point as it is now. But I'm not sure how well that would comport with the SQL standard. And I'm very not sure that we could still do it once we'd tied one hand behind our backs by virtue of exporting a bunch of the internals as public API. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
Tom == Tom Lane t...@sss.pgh.pa.us writes: Hmm. You'd want to make add_var, mul_var etc. non-static? Tom -1 for that. possibly with more meaningful names. Tom If you're concerned about arithmetic performance, there is a Tom very obvious fix here: use double. Independently of this specific example, the obvious issue there is that there are applications for which double is simply not acceptable. As it stands, no extension can use the numeric type in any non-trivial way without paying a large penalty for repeated pallocs and data copies. Given that the ability to write C extensions easily is one of pg's great strengths, this is a defect that should be corrected. Tom (It would still be orders of magnitude slower, no matter how Tom much we were willing to destroy numeric.c's modularity Tom boundary.) There is no need to expose any details of NumericVar's implementation; it would suffice to provide an interface to allocate NumericVars, and access to the functions. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
On 12/16/2014 08:34 AM, David Fetter wrote: Folks, While noodling with some weighted statistics https://github.com/davidfetter/weighted_stats, I noticed I was having to jump through a lot of hoops because of all the private methods in numeric.c, especially NumericVar. Would there be some major objection to exposing NumericVar as an opaque blob? Hmm. You'd want to make add_var, mul_var etc. non-static? Looking at the weighed_stats code, this probably illustrates the hoops you had to jump through: /* sqrt((n/(n-1)) * ((s0*s2 - s1*s1)/(s0*s0)) */ result = DirectFunctionCall1( numeric_sqrt, DirectFunctionCall2( numeric_mul, DirectFunctionCall2( numeric_div, n_prime, DirectFunctionCall2( numeric_sub, n_prime, /* * This rather convoluted way to compute the value * 1 gives us a result which should have at least * as big a decimal scale as s_2 does, which should * guarantee that our result is as precise as the * input... */ DirectFunctionCall2( numeric_add, DirectFunctionCall2( numeric_sub, state-s_2, state-s_2 ), make_numeric(1) ) ) ), DirectFunctionCall2( numeric_div, DirectFunctionCall2( numeric_sub, DirectFunctionCall2( numeric_mul, state-s_0, state-s_2 ), DirectFunctionCall2( numeric_mul, state-s_1, state-s_1 ) ), DirectFunctionCall2( numeric_mul, state-s_0, state-s_0 ) ) ) ); As a start, it would help a lot to #define a few helper macros like: #define ADD(a, b) DirectFunctionCall2(numeric_add, a, b) #define MUL(a, b) DirectFunctionCall2(numeric_mul, a, b) in your extension. That would already make that a lot shorter. You might also be worrying about performance, though. The above snippet was from the aggregate's final function, which isn't performance critical, but you have some numeric operations in the transition function too. I wonder how big the impact really is, though. init_var_from_num and make_result look quite cheap, but certainly not free. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki Looking at the weighed_stats code, this probably illustrates Heikki the hoops you had to jump through: Actually that hoop-jumping expression is almost irrelevant. The part that hurts (and yes, it's performance that's at issue here, and not code aesthetics) is not being able to use NumericVar in the aggregate's transition variable, because that means that every computed intermediate value is palloc'd and pfree'd twice (once as the digit buffer of a NumericVar and again as a Numeric datum). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
On Tue, Dec 16, 2014 at 09:01:47AM +, Andrew Gierth wrote: Heikki == Heikki Linnakangas hlinnakan...@vmware.com writes: Heikki Looking at the weighed_stats code, this probably illustrates Heikki the hoops you had to jump through: Actually that hoop-jumping expression is almost irrelevant. Right. Not that it made things fun or easy (at least for me) to debug, as you can see by the git history. The part that hurts (and yes, it's performance that's at issue here, and not code aesthetics) is not being able to use NumericVar in the aggregate's transition variable, because that means that every computed intermediate value is palloc'd and pfree'd twice (once as the digit buffer of a NumericVar and again as a Numeric datum). Yep. Performance of NUMERIC might yet get some of the love it deserves. Perhaps something along the lines of a 128-bit default structure with a promotion/demotion scheme for larger representations. Until then, those of us writing extensions are stuck with heaps of extra instructions in it that could easily be trimmed away to good effect. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NUMERIC private methods?
Heikki Linnakangas hlinnakan...@vmware.com writes: On 12/16/2014 08:34 AM, David Fetter wrote: While noodling with some weighted statistics https://github.com/davidfetter/weighted_stats, I noticed I was having to jump through a lot of hoops because of all the private methods in numeric.c, especially NumericVar. Would there be some major objection to exposing NumericVar as an opaque blob? Hmm. You'd want to make add_var, mul_var etc. non-static? -1 for that. Looking at the weighed_stats code, this probably illustrates the hoops you had to jump through: /* sqrt((n/(n-1)) * ((s0*s2 - s1*s1)/(s0*s0)) */ If you're concerned about arithmetic performance, there is a very obvious fix here: use double. Is there some utterly compelling reason to use numeric, despite the fact that it's certain to be orders of magnitude slower? (It would still be orders of magnitude slower, no matter how much we were willing to destroy numeric.c's modularity boundary.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] NUMERIC private methods?
Folks, While noodling with some weighted statistics https://github.com/davidfetter/weighted_stats, I noticed I was having to jump through a lot of hoops because of all the private methods in numeric.c, especially NumericVar. Would there be some major objection to exposing NumericVar as an opaque blob? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers