Shouldn't the patch status be set to "Waiting on Author"?

(I was curious if this is a patch that I can review.)

Julien Rouhaud <rjuju...@gmail.com> wrote:

> On Wed, Jun 22, 2022 at 11:05:54PM +0000, Imseih (AWS), Sami wrote:
> > >    Can you describe how it's kept in sync, and how it makes sure that
> > >    the property is maintained over restart / gc?  I don't see any change
> > >    in the code for the 2nd part so I don't see how it could work (and
> > >    after a quick test it indeed doesn't).
> >
> > There is a routine called qtext_hash_sync which removes all entries from
> > the qtext_hash and reloads it with all the query ids from pgss_hash.
> > [...]
> > At all the points where it's called, an exclusive lock is held.  This
> > allows syncing all the queryids present in pgss_hash with qtext_hash.
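> >
> > A minimal sketch of what that could look like (illustrative only -- the
> > struct and field names below are assumptions, not the patch's actual code;
> > the caller must hold pgss->lock exclusively):
> >
> > typedef struct pgssQueryTextEntry
> > {
> >     uint64      queryid;        /* hash key */
> >     Size        query_offset;   /* offset of text in external file */
> >     int         query_len;      /* length of text, in bytes */
> > } pgssQueryTextEntry;
> >
> > static void
> > qtext_hash_sync(void)
> > {
> >     HASH_SEQ_STATUS hash_seq;
> >     pgssQueryTextEntry *qt;
> >     pgssEntry  *entry;
> >     bool        found;
> >
> >     /* remove every existing queryid -> offset mapping */
> >     hash_seq_init(&hash_seq, qtext_hash);
> >     while ((qt = hash_seq_search(&hash_seq)) != NULL)
> >         hash_search(qtext_hash, &qt->queryid, HASH_REMOVE, NULL);
> >
> >     /* re-add one mapping per live pgss_hash entry */
> >     hash_seq_init(&hash_seq, pgss_hash);
> >     while ((entry = hash_seq_search(&hash_seq)) != NULL)
> >     {
> >         qt = hash_search(qtext_hash, &entry->key.queryid,
> >                          HASH_ENTER, &found);
> >         if (!found)
> >         {
> >             qt->query_offset = entry->query_offset;
> >             qt->query_len = entry->query_len;
> >         }
> >     }
> > }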
> 
> So your approach is to let the current gc / file loading behavior happen as
> before and construct your mapping hash using the resulting query text / offset
> info.  That can't work.
> 
> > >    2nd part so I don't see how it could work (and after a quick test
> > >    it indeed doesn't).
> >
> > Can you tell me what test you used to determine it is not in sync?
> 
> What test did you use to determine it is in sync?  Have you checked how the
> gc / file loading actually work?
> 
> In my case I just checked the size of the query text file after running some
> script that makes sure that the same few queries are run by multiple
> different roles, then:
> 
> Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 559B
> pg_ctl restart
> Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 2383B
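> 
> Spelled out as commands (a sketch: the role names and the query are made
> up, and "stat -c %s" is GNU stat):
> 
> psql -c "SELECT pg_stat_statements_reset();"
> for r in role1 role2 role3; do
>     psql -U "$r" -c "SELECT count(*) FROM some_table;"
> done
> stat -c %s "$PGDATA/pg_stat_tmp/pgss_query_texts.stat"  # small: texts shared
> pg_ctl -D "$PGDATA" restart
> stat -c %s "$PGDATA/pg_stat_tmp/pgss_query_texts.stat"  # larger: dedup lost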
> 
> > >    Can you share more details on the benchmarks you did?  Did you also
> > >    run benchmarks on workloads that induce entry eviction, with and
> > >    without need for gc?  Eviction in pgss is already very expensive, and
> > >    making things worse just to save a bit of disk space doesn't seem
> > >    like a good compromise.
> >
> > Sorry, this was poorly explained by me. I went back and did some
> > benchmarks. Attached are the script and results. But here is a summary:
> > On an EC2 r5.2xlarge, the benchmark I performed is:
> > 1. create 10k tables
> > 2. create 5 users
> > 3. run a pgbench script that, per transaction, performs a select on a
> > randomly chosen table for each of the 5 users (sketched below).
> > 4. 2 variants of the test were executed: 1 variant with the default
> > pg_stat_statements.max = 5000, and one test with a larger
> > pg_stat_statements.max = 10000.
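> >
> > For reference, step 3 boils down to a pgbench script along these lines
> > (the table names are made up here; the attached script is authoritative):
> >
> >     \set tid random(1, 10000)
> >     SELECT count(*) FROM t:tid;
> >
> > run concurrently for each role, e.g.:
> >
> >     pgbench -n -c 1 -T 60 -U user1 -f select_random_table.sql &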
> 
> But you wrote:
> 
> > ##################################
> > ## pg_stat_statements.max = 15000
> > ##################################
> 
> So which one is it?
> 
> >
> > So 10-15% is not accurate. I originally tested on a less powerful machine.
> > For this benchmark I see a 6% increase in TPS (732k vs 683k) when a larger
> > pg_stat_statements.max is used and there are fewer gc/deallocations.
> > Both tests show a drop in gc/deallocations and a net increase in TPS.
> > Less GC makes sense since the external file has fewer duplicate SQLs.
> 
> On the other hand you're rebuilding the new query_offset hashtable every time
> there's an entry eviction, which seems quite expensive.
> 
> Also, as I mentioned you didn't change any of the heuristics in
> need_gc_qtexts().  So if the query texts are indeed deduplicated, doesn't it
> mean that gc will artificially be called less often?  The wanted target of
> "50% bloat" will become "50% bloat assuming no deduplication is done", and
> the average query text file size will stay the same whether the query texts
> are deduplicated or not.
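> 
> For reference, the unchanged heuristic is roughly the following (paraphrased
> from the current pg_stat_statements source, locking elided):
> 
> static bool
> need_gc_qtexts(void)
> {
>     Size    extent = pgss->extent;   /* current end of query text file */
> 
>     /* don't gc while the file stays under 512 bytes per possible entry */
>     if (extent < 512 * pgss_max)
>         return false;
> 
>     /* don't gc until the file is ~50% bloat vs. the mean query length */
>     if (extent < pgss->mean_query_len * pgss_max * 2)
>         return false;
> 
>     return true;
> }
> 
> With deduplicated texts, extent grows more slowly, so both thresholds are
> crossed later even though nothing changed in what they are supposed to
> measure.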
> 
> I'm wondering whether the improvements you see are due to the patch or simply
> due to artificially calling gc less often.  What are the results if instead
> of using vanilla pg_stat_statements you patch it to perform roughly the same
> number of gc as your version does?
> 
> Also, your benchmark workload is very friendly to your feature; what are the
> results with other workloads?  For instance, having the results with query
> texts that aren't artificially long would be interesting, after fixing the
> problems mentioned previously.
> 
> Also, you said that if you run that benchmark with a single user you don't
> see any regression.  I don't see how rebuilding an extra hashtable in
> entry_dealloc(), i.e. while holding an exclusive lwlock, could be free when
> it doesn't save any work elsewhere.
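> 
> In other words, the patched flow looks something like this (an illustrative
> sketch, not the patch's exact code):
> 
> static void
> entry_dealloc(void)
> {
>     /* caller holds pgss->lock exclusively for the whole function */
> 
>     /* ... existing logic: sort entries by usage, evict ~5% of them ... */
> 
>     /* added by the patch: full O(pgss_max) rebuild on every eviction */
>     qtext_hash_sync();
> }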
> 
> Looking at the script, you have:
> echo "log_min_messages=debug1" >> $PGDATA/postgresql.conf; \
> 
> Is that really necessary?  Couldn't you raise the gc message to a higher
> level for your benchmarking needs, or maybe expose some new counter in
> pg_stat_statements_info?  Have you done the benchmark using a debug build
> or a normal build?
> 
> 

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

