Because there is no nice way in PostgreSQL (that I know of) to derive
a histogram after a join (on an intermediate result), currently
usingMostCommonValues is only enabled on a join when the outer (probe)
side is a table scan (seq scan only actually). See
getMostCommonValues (soon to be called
On Tue, Dec 23, 2008 at 2:21 AM, Bryce Cutt pandas...@gmail.com wrote:
Because there is no nice way in PostgreSQL (that I know of) to derive
a histogram after a join (on an intermediate result), currently
usingMostCommonValues is only enabled on a join when the outer (probe)
side is a table
On Tue, Dec 23, 2008 at 09:22:27AM -0500, Robert Haas wrote:
On Tue, Dec 23, 2008 at 2:21 AM, Bryce Cutt pandas...@gmail.com wrote:
Because there is no nice way in PostgreSQL (that I know of) to derive
a histogram after a join (on an intermediate result), currently
usingMostCommonValues is
It's equivalent to our assumption that distributions of values in
columns in the same table are independent. Making that assumption in
this case would probably result in occasional dramatic speed
improvements similar to the ones we've seen in less complex joins,
offset by just-as-occasional
On Tue, Dec 23, 2008 at 10:14:29AM -0500, Robert Haas wrote:
It's equivalent to our assumption that distributions of values in
columns in the same table are independent. Making that assumption in
this case would probably result in occasional dramatic speed
improvements similar to the ones
On Sun, Dec 21, 2008 at 10:25:59PM -0500, Robert Haas wrote:
[Some performance testing.]
I (finally!) have a chance to post my performance testing results... my
apologies for the really long delay. Excuses omitted
Unfortunately I'm not seeing wonderful speedups with the particular
queries I did
[Some performance testing.]
I ran this query 10x with this patch applied, and then 10x again with
enable_hashjoin_usestatmcvs set to false to disable the optimization:
select sum(1) from (select * from part, lineitem where p_partkey = l_partkey) x;
With the optimization enabled, the query took
Robert,
I thoroughly appreciate the constructive criticism.
The compile errors are due to my development process being convoluted.
I will endeavor to not waste your time in the future with errors
caused by my development process.
I have updated the code to follow the conventions and
Dr. Lawrence:
I'm still working on reviewing this patch. I've managed to load the
sample TPCH data from tpch1g1z.zip after changing the line endings to
UNIX-style and chopping off the trailing vertical bars. (If anyone is
interested, I have the results of pg_dump | bzip2 -9 on the resulting
Robert,
You do not need to use qgen.exe to generate queries as you are not
running the TPC-H benchmark test. Attached is an example of the 22
sample TPC-H queries according to the benchmark.
We have not tested using the TPC-H queries for this particular patch and
only use the TPC-H database
I have to admit that I haven't fully grokked what this patch is about
just yet, so what follows is mostly a coding style review at this
point. It would help a lot if you could add some comments to the new
functions that are being added to explain the purpose of each at a
very high level. There's
-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]
I'm a tad worried about what happens when the values that are frequently
occurring in the outer relation are also frequently occurring in the
inner (which hardly seems an improbable case). Don't you stand a severe
risk of
Lawrence, Ramon [EMAIL PROTECTED] writes:
We propose a patch that improves hybrid hash join's performance for
large multi-batch joins where the probe relation has skew.
...
The basic idea
is to keep build relation tuples in a small in-memory hash table that
have join values that are
On Wed, Nov 05, 2008 at 04:06:11PM -0800, Bryce Cutt wrote:
The error is caused by me Asserting against the wrong variable. I
never noticed this as I apparently did not have assertions turned on
on my development machine. That is fixed now and with the new patch
version I have attached all
On Wed, Nov 5, 2008 at 5:06 PM, Bryce Cutt [EMAIL PROTECTED] wrote:
The error is caused by me Asserting against the wrong variable. I
never noticed this as I apparently did not have assertions turned on
on my development machine. That is fixed now and with the new patch
version I have
On Thu, 2008-11-06 at 15:33 -0700, Joshua Tolley wrote:
Stay tuned.
Minor question on this patch. AFAICS there is another patch that seems
to be aiming at exactly the same use case. Jonah's Bloom filter patch.
Shouldn't we have a dust off to see which one is best? Or at least a
discussion to
On Thu, Nov 6, 2008 at 3:52 PM, Simon Riggs [EMAIL PROTECTED] wrote:
On Thu, 2008-11-06 at 15:33 -0700, Joshua Tolley wrote:
Stay tuned.
Minor question on this patch. AFAICS there is another patch that seems
to be aiming at exactly the same use case. Jonah's Bloom filter patch.
Shouldn't
-----Original Message-----
Minor question on this patch. AFAICS there is another patch that seems
to be aiming at exactly the same use case. Jonah's Bloom filter patch.
Shouldn't we have a dust off to see which one is best? Or at least a
discussion to test whether they overlap? Perhaps
On Thu, Nov 6, 2008 at 5:31 PM, Lawrence, Ramon [EMAIL PROTECTED] wrote:
-----Original Message-----
Minor question on this patch. AFAICS there is another patch that seems
to be aiming at exactly the same use case. Jonah's Bloom filter patch.
Shouldn't we have a dust off to see which one
On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
We propose a patch that improves hybrid hash join's performance for large
multi-batch joins where the probe relation has skew.
I'm running into problems with this patch. It applies cleanly, and the
technique you provided for
On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
We propose a patch that improves hybrid hash join's performance for large
multi-batch joins where the probe relation has skew.
I also recommend modifying docs/src/sgml/config.sgml to include the
enable_hashjoin_usestatmcvs
Joshua Tolley [EMAIL PROTECTED] writes:
On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
We propose a patch that improves hybrid hash join's performance for large
multi-batch joins where the probe relation has skew.
I also recommend modifying docs/src/sgml/config.sgml to
On Wed, Nov 5, 2008 at 8:20 AM, Tom Lane wrote:
Joshua Tolley writes:
On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
We propose a patch that improves hybrid hash join's performance for large
multi-batch joins where the probe
The error is caused by me Asserting against the wrong variable. I
never noticed this as I apparently did not have assertions turned on
on my development machine. That is fixed now and with the new patch
version I have attached all assertions are passing with your query and
my test queries. I
On Wed, Nov 05, 2008 at 04:06:11PM -0800, Bryce Cutt wrote:
The error is caused by me Asserting against the wrong variable. I
never noticed this as I apparently did not have assertions turned on
on my development machine. That is fixed now and with the new patch
version I have attached all
Joshua,
Thank you for offering to review the patch.
The easiest way to test would be to generate your own TPC-H data and
load it into a database for testing. I have posted the TPC-H generator
at:
http://people.ok.ubc.ca/rlawrenc/TPCHSkew.zip
The generator can produce skewed data sets. It was
On Sun, Nov 2, 2008 at 4:48 PM, Lawrence, Ramon [EMAIL PROTECTED] wrote:
Joshua,
Thank you for offering to review the patch.
The easiest way to test would be to generate your own TPC-H data and
load it into a database for testing. I have posted the TPC-H generator
at:
Lawrence, Ramon [EMAIL PROTECTED] writes:
The easiest way to test would be to generate your own TPC-H data and
load it into a database for testing. I have posted the TPC-H generator
at:
http://people.ok.ubc.ca/rlawrenc/TPCHSkew.zip
The generator can produce skewed data sets. It was produced
From: Tom Lane [mailto:[EMAIL PROTECTED]
What alternatives are there for people who do not run Windows?
regards, tom lane
The TPC-H generator is a standard code base provided at
http://www.tpc.org/tpch/. We have been able to compile this code on
Linux.
However, we
On Mon, Oct 20, 2008 at 4:42 PM, Lawrence, Ramon [EMAIL PROTECTED] wrote:
We propose a patch that improves hybrid hash join's performance for large
multi-batch joins where the probe relation has skew.
Project name: Histojoin
Patch file: histojoin_v1.patch
This patch implements the Histojoin
We propose a patch that improves hybrid hash join's performance for
large multi-batch joins where the probe relation has skew.
Project name: Histojoin
Patch file: histojoin_v1.patch
This patch implements the Histojoin join algorithm as an optional
feature added to the standard Hybrid Hash
31 matches