Re: Hash index build performance tweak from sorting

2022-11-23 David Rowley
On Thu, 24 Nov 2022 at 02:27, Simon Riggs wrote: > > On Wed, 23 Nov 2022 at 13:04, David Rowley wrote: > > I'd rather see this solved like v4 is doing it. > > Please do. No further comments. Thanks for your help Thanks. I pushed the v4 patch with some minor comment adjustments and also renamed

Re: Hash index build performance tweak from sorting

2022-11-23 Tomas Vondra
On 11/23/22 14:07, David Rowley wrote: > On Fri, 18 Nov 2022 at 03:34, Tomas Vondra > wrote: >> I did some simple benchmark with v2 and v3, using the attached script, >> which essentially just builds hash index on random data, with different >> data types and maintenance_work_mem values. And

Re: Hash index build performance tweak from sorting

2022-11-23 Simon Riggs
On Wed, 23 Nov 2022 at 13:04, David Rowley wrote: > After getting rid of the HashInsertState code and just adding bool > sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch > is much more simple: Seems good to me and I wouldn't argue with any of your comments. > and v4
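To show what a `bool sorted` argument buys, here is a minimal C sketch under stated assumptions. It models a hash page as an array of `uint32_t` hash keys kept in ascending order. `ToyHashPage`, `find_insert_pos()` and `page_add_hashkey()` are hypothetical names and are not PostgreSQL's real `_hash_pgaddtup()` signature, which works on index tuples in a buffer page. When the caller guarantees that keys arrive in hash order, as they do when a build reads from the sorted spool, the new key always belongs at the end of the page. The binary search and the shifting of existing entries can then be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_CAPACITY 256 /* hypothetical; real pages are limited by bytes */

typedef struct
{
    uint32_t keys[PAGE_CAPACITY]; /* hash keys, kept in ascending order */
    size_t   nkeys;
} ToyHashPage;

/* Binary search: return the first position whose key is > hashkey. */
static size_t
find_insert_pos(const ToyHashPage *page, uint32_t hashkey)
{
    size_t lo = 0, hi = page->nkeys;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (page->keys[mid] <= hashkey)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Hypothetical analogue of adding a tuple with a 'sorted' hint.
 * If 'sorted' is true, the caller promises hashkey >= every key already on
 * the page, so we append at the end without searching or shifting.
 * Returns the position the key was stored at.
 */
static size_t
page_add_hashkey(ToyHashPage *page, uint32_t hashkey, bool sorted)
{
    size_t pos;

    assert(page->nkeys < PAGE_CAPACITY);

    if (sorted)
    {
        /* cheap sanity check that the caller's promise holds */
        assert(page->nkeys == 0 || page->keys[page->nkeys - 1] <= hashkey);
        pos = page->nkeys;
    }
    else
    {
        pos = find_insert_pos(page, hashkey);
        /* shift larger keys one slot right to make room */
        for (size_t i = page->nkeys; i > pos; i--)
            page->keys[i] = page->keys[i - 1];
    }

    page->keys[pos] = hashkey;
    page->nkeys++;
    return pos;
}
```

A plain `bool` keeps the change small, which is the simplification the message above describes. The earlier versions of the patch passed a state object instead, trading a slightly larger API for room to add fields later without changing function signatures.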

Re: Hash index build performance tweak from sorting

2022-11-23 David Rowley
On Fri, 18 Nov 2022 at 03:34, Tomas Vondra wrote: > I did some simple benchmark with v2 and v3, using the attached script, > which essentially just builds hash index on random data, with different > data types and maintenance_work_mem values. And what I see is this > (median of 10 runs): > So to

Re: Hash index build performance tweak from sorting

2022-11-23 David Rowley
On Wed, 16 Nov 2022 at 17:33, Simon Riggs wrote: > > Thanks for the review, apologies for the delay in acting upon your comments. > > My tests show the sorted and random tests are BOTH 4.6% faster with > the v3 changes using 5-test avg, but you'll be pleased to know your > kit is about 15.5%

Re: Hash index build performance tweak from sorting

2022-11-17 Tomas Vondra
Hi, I did some simple benchmark with v2 and v3, using the attached script, which essentially just builds hash index on random data, with different data types and maintenance_work_mem values. And what I see is this (median of 10 runs): machine | data type | m_w_m | master | v2
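The results above are reported as the median of 10 runs. With an even number of runs there is no single middle value. One common convention, assumed in the C sketch below, is to sort the timings and average the two middle ones. The benchmark script itself is not shown in this listing and may compute it differently. `median_ms()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending */
static int
cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/*
 * Median of n timings (ms). For odd n this is the middle value; for even n
 * (e.g. 10 runs) it is the mean of the two middle values. The input array
 * is left untouched; a local copy is sorted instead.
 */
static double
median_ms(const double *timings, size_t n)
{
    double sorted[64];

    assert(n > 0 && n <= 64);
    memcpy(sorted, timings, n * sizeof(double));
    qsort(sorted, n, sizeof(double), cmp_double);

    if (n % 2 == 1)
        return sorted[n / 2];
    return (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}
```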

Re: Hash index build performance tweak from sorting

2022-11-15 Simon Riggs
On Wed, 21 Sept 2022 at 02:32, David Rowley wrote: > > I took this patch for a spin and saw a 2.5% performance increase using > the random INT test that Tom posted. The index took an average of > 7227.47 milliseconds on master and 7045.05 with the patch applied. Thanks for the review, apologies
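As a quick check on the quoted 2.5% figure: (7227.47 - 7045.05) / 7227.47 = 182.42 / 7227.47 ≈ 0.0252, which is about 2.5% less build time with the patch. The hypothetical helper below encodes that formula. It is not part of any benchmark script from the thread.

```c
#include <assert.h>

/*
 * Relative improvement of 'patched' over 'master', expressed as a
 * percentage of master's time (positive means the patch is faster).
 */
static double
pct_faster(double master_ms, double patched_ms)
{
    assert(master_ms > 0.0);
    return (master_ms - patched_ms) / master_ms * 100.0;
}
```

For example, `pct_faster(7227.47, 7045.05)` evaluates to roughly 2.52.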

Re: Hash index build performance tweak from sorting

2022-10-11 Michael Paquier
On Wed, Sep 21, 2022 at 12:43:15PM +0100, Simon Riggs wrote: > Thanks for tests and review. I'm just jumping on a plane, so may not > respond in detail until next Mon. Okay. If you have time to address that by next CF, that would be interesting. For now I have marked the entry as returned with

Re: Hash index build performance tweak from sorting

2022-09-21 Simon Riggs
On Wed, 21 Sept 2022 at 02:32, David Rowley wrote: > > On Tue, 2 Aug 2022 at 03:37, Simon Riggs wrote: > > Using the above test case, I'm getting a further 4-7% improvement on > > already committed code with the attached patch, which follows your > > proposal. > > > > The patch passes info via a

Re: Hash index build performance tweak from sorting

2022-09-20 David Rowley
On Tue, 2 Aug 2022 at 03:37, Simon Riggs wrote: > Using the above test case, I'm getting a further 4-7% improvement on > already committed code with the attached patch, which follows your > proposal. > > The patch passes info via a state object, useful to avoid API churn in > later patches. Hi

Re: Hash index build performance tweak from sorting

2022-08-30 Ranier Vilela
>It's a shame you only see 3%, but that's still worth it. Hi, I ran this test here: DROP TABLE hash_speed; CREATE unlogged TABLE hash_speed (x integer); INSERT INTO hash_speed SELECT random()*1000 FROM generate_series(1,1000) x; VACUUM Timing is on. CREATE INDEX ON hash_speed USING hash

Re: Hash index build performance tweak from sorting

2022-08-30 Simon Riggs
On Fri, 5 Aug 2022 at 20:46, David Zhang wrote: > > On 2022-08-01 8:37 a.m., Simon Riggs wrote: > > Using the above test case, I'm getting a further 4-7% improvement on > > already committed code with the attached patch, which follows your > > proposal. > > I ran two test cases: for committed

Re: Hash index build performance tweak from sorting

2022-08-05 David Zhang
On 2022-08-01 8:37 a.m., Simon Riggs wrote: Using the above test case, I'm getting a further 4-7% improvement on already committed code with the attached patch, which follows your proposal. I ran two test cases: for committed patch `hash_sort_by_hash.v3.patch`, I can see about 6 ~ 7%

Re: Hash index build performance tweak from sorting

2022-08-01 Simon Riggs
On Fri, 29 Jul 2022 at 13:49, Simon Riggs wrote: > > On Thu, 28 Jul 2022 at 19:50, Tom Lane wrote: > > > > Simon Riggs writes: > > > Thanks for the nudge. New version attached. > > > > I also see a speed improvement from this > > --- > > DROP TABLE IF EXISTS hash_speed; > > CREATE unlogged

Re: Hash index build performance tweak from sorting

2022-07-29 Simon Riggs
On Thu, 28 Jul 2022 at 19:50, Tom Lane wrote: > > Simon Riggs writes: > > Thanks for the nudge. New version attached. > > I also see a speed improvement from this, so pushed (after minor comment > editing). Thanks > I notice though that if I feed it random data, > > --- > DROP TABLE IF EXISTS

Re: Hash index build performance tweak from sorting

2022-07-28 Tom Lane
Simon Riggs writes: > Thanks for the nudge. New version attached. I also see a speed improvement from this, so pushed (after minor comment editing). I notice though that if I feed it random data, --- DROP TABLE IF EXISTS hash_speed; CREATE unlogged TABLE hash_speed (x integer); INSERT INTO

Re: Hash index build performance tweak from sorting

2022-07-28 Simon Riggs
On Wed, 27 Jul 2022 at 19:22, Tom Lane wrote: > > Simon Riggs writes: > > [ hash_sort_by_hash.v2.patch ] > > The cfbot says this no longer applies --- probably sideswiped by > Korotkov's sorting-related commits last night. Thanks for the nudge. New version attached. -- Simon Riggs

Re: Hash index build performance tweak from sorting

2022-07-27 Tom Lane
Simon Riggs writes: > [ hash_sort_by_hash.v2.patch ] The cfbot says this no longer applies --- probably sideswiped by Korotkov's sorting-related commits last night. regards, tom lane

RE: Hash index build performance tweak from sorting

2022-07-21 houzj.f...@fujitsu.com
On Monday, May 30, 2022 4:13 PM shiy.f...@fujitsu.com wrote: > > On Tue, May 10, 2022 5:43 PM Simon Riggs > wrote: > > > > On Sat, 30 Apr 2022 at 12:12, Amit Kapila > > wrote: > > > > > > Few comments on the patch: > > > 1. I think it is better to use DatumGetUInt32 to fetch the hash key > > >

RE: Hash index build performance tweak from sorting

2022-05-30 shiy.f...@fujitsu.com
On Tue, May 10, 2022 5:43 PM Simon Riggs wrote: > > On Sat, 30 Apr 2022 at 12:12, Amit Kapila > wrote: > > > > Few comments on the patch: > > 1. I think it is better to use DatumGetUInt32 to fetch the hash key as > > the nearby code is using. > > 2. You may want to change the below comment in

Re: Hash index build performance tweak from sorting

2022-05-10 Simon Riggs
On Sat, 30 Apr 2022 at 12:12, Amit Kapila wrote: > > Few comments on the patch: > 1. I think it is better to use DatumGetUInt32 to fetch the hash key as > the nearby code is using. > 2. You may want to change the below comment in HSpool > /* > * We sort the hash keys based on the buckets they
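Amit's comments concern how the build spool derives its sort keys. In PostgreSQL the hash key is extracted from a `Datum` with `DatumGetUInt32`, and the bucket is then computed from it. The standalone C sketch below takes a plain `uint32_t` instead of a `Datum`. Its mapping is modeled on `_hash_hashkey2bucket()`: mask the key with `highmask`, and if that lands past the current highest bucket (because the table is mid-way through doubling), mask with `lowmask` instead. `hashkey_to_bucket()` is a hypothetical name, and the assertion assumes consistent mask values (`lowmask` = `highmask >> 1`, `lowmask <= maxbucket <= highmask`).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit hash key to a bucket number, following the scheme used by
 * PostgreSQL's _hash_hashkey2bucket(): try the larger mask first, and fall
 * back to the smaller mask if that bucket does not exist yet.
 */
static uint32_t
hashkey_to_bucket(uint32_t hashkey, uint32_t maxbucket,
                  uint32_t highmask, uint32_t lowmask)
{
    uint32_t bucket = hashkey & highmask;

    if (bucket > maxbucket)
        bucket = bucket & lowmask;

    assert(bucket <= maxbucket); /* holds when the masks are consistent */
    return bucket;
}
```

For example, with buckets 0..5 (`maxbucket = 5`, `highmask = 7`, `lowmask = 3`), hash key 13 maps to bucket 5. Hash key 14 would map to 6, which does not exist yet, so it falls back to bucket 2.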

Re: Hash index build performance tweak from sorting

2022-05-04 Amit Kapila
On Mon, May 2, 2022 at 9:28 PM Simon Riggs wrote: > > On Sat, 30 Apr 2022 at 12:12, Amit Kapila wrote: > > > > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs > > wrote: > > > > > > Hash index pages are stored in sorted order, but we don't prepare the > > > data correctly. > > > > > > We sort the

Re: Hash index build performance tweak from sorting

2022-05-02 Simon Riggs
On Sat, 30 Apr 2022 at 12:12, Amit Kapila wrote: > > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs > wrote: > > > > Hash index pages are stored in sorted order, but we don't prepare the > > data correctly. > > > > We sort the data as the first step of a hash index build, but we > > forget to sort

Re: Hash index build performance tweak from sorting

2022-04-30 Amit Kapila
On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs wrote: > > Hash index pages are stored in sorted order, but we don't prepare the > data correctly. > > We sort the data as the first step of a hash index build, but we > forget to sort the data by hash as well as by hash bucket. > I was looking into
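The fix proposed at the start of the thread is a two-key sort: order entries by target bucket first, then by full hash value within each bucket. The entries for each bucket then reach the page in the same order the page keeps them in. The minimal C sketch below shows that ordering with `qsort`. `ToySortItem`, `cmp_bucket_then_hash()` and `sort_for_build()` are hypothetical names. In PostgreSQL the equivalent comparison lives in the tuplesort comparator used for hash index builds, whose exact code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
    uint32_t bucket;  /* target bucket, derived from hashkey */
    uint32_t hashkey; /* full 32-bit hash value */
} ToySortItem;

/* Primary key: bucket. Secondary key: hash value (the previously missing step). */
static int
cmp_bucket_then_hash(const void *a, const void *b)
{
    const ToySortItem *x = a;
    const ToySortItem *y = b;

    if (x->bucket != y->bucket)
        return x->bucket < y->bucket ? -1 : 1;
    if (x->hashkey != y->hashkey)
        return x->hashkey < y->hashkey ? -1 : 1;
    return 0;
}

static void
sort_for_build(ToySortItem *items, size_t n)
{
    qsort(items, n, sizeof(ToySortItem), cmp_bucket_then_hash);

    /* afterwards, hash keys are ascending within each bucket */
    for (size_t i = 1; i < n; i++)
        assert(items[i - 1].bucket < items[i].bucket ||
               (items[i - 1].bucket == items[i].bucket &&
                items[i - 1].hashkey <= items[i].hashkey));
}
```

With two buckets (bucket = hash & 1), the input hashes 9, 4, 3, 10 come out as (0, 4), (0, 10), (1, 3), (1, 9). Sorting by bucket alone would guarantee only the grouping, not the order within each group.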