On Thu, Jun 11, 2015 at 1:23 PM, Peter Geoghegan <[email protected]> wrote:
> I'm not sure if it's worth mentioning the "cheap equality for text"
> commit (e246b3d6eac09). I guess that it probably is, because it will
> help with things like index scans, too. Arguably that isn't a sorting
> thing (it's certainly not *just* a sorting thing).
I think we should really address this. Attached patch adds a new
release note item for it. It also adds to the documentation that
explains why users should prefer varchar(n)/text to character(n); the
lack of abbreviated key support now becomes a huge disadvantage for
character(n), whereas in previous versions the disadvantages were
fairly minor.
In passing, I updated the existing sort item to reflect that only
varchar(n), text, and numeric benefit from the abbreviation
optimization (not character types more generally + numeric), and added
a note on the effectiveness of the abbreviation optimization alone.
--
Peter Geoghegan
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 9d5ce95..bef9c6c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1106,7 +1106,7 @@ SELECT '52093.89'::money::numeric::float8;
<tip>
<para>
- There is no performance difference among these three types,
+ There is generally no performance difference among these three types,
apart from increased storage space when using the blank-padded
type, and a few extra CPU cycles to check the length when storing into
a length-constrained column. While
@@ -1114,9 +1114,11 @@ SELECT '52093.89'::money::numeric::float8;
advantages in some other database systems, there is no such advantage in
<productname>PostgreSQL</productname>; in fact
<type>character(<replaceable>n</>)</type> is usually the slowest of
- the three because of its additional storage costs. In most situations
- <type>text</type> or <type>character varying</type> should be used
- instead.
+ the three because of its additional storage costs, and
+ particularly because presently it does not take advantage of an
+ important optimization used when sorting. In most situations
+ <type>text</type> or <type>character varying</type> should be
+ used instead.
</para>
</tip>
diff --git a/doc/src/sgml/release-9.5.sgml b/doc/src/sgml/release-9.5.sgml
index 279fb11..9c97b9b 100644
--- a/doc/src/sgml/release-9.5.sgml
+++ b/doc/src/sgml/release-9.5.sgml
@@ -214,8 +214,32 @@
<listitem>
<para>
- Improve the speed of sorting character and numeric fields (Peter
- Geoghegan, Andrew Gierth, Robert Haas)
+ Improve the speed of sorting
+ <type>varchar(<replaceable>n</>)</type>, <type>text</type>,
+ and <type>numeric</type> fields (Peter Geoghegan, Andrew
+ Gierth, Robert Haas)
+ </para>
+
+ <para>
+ When there are multiple fields, the optimization is most
+ effective when most comparisons are resolved by comparing only
+ the first field.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Perform inexpensive binary equality pre-check for
+ <type>varchar(<replaceable>n</>)</type> and <type>text</type>
+ comparisons (Peter Geoghegan)
+ </para>
+
+ <para>
+ This improves the speed of operations where many
+ comparisons can be resolved with the pre-check, which may
+ include sort operations, merge joins, and index scans.
+ Multi-field sort operations with leading fields that have
+ relatively few distinct values will particularly benefit.
</para>
</listitem>
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers