Extended stats - value not in MCV list

2021-05-01 Thread Pedro Luis Guzmán Hernández
Hi there,

I've just started using extended stats because the planner was giving me
terrible estimates for a certain table. MCV extended stats solved my
problem when values are in the extended MCV list, but estimates are still
terrible when they are not in the MCV list. Limited as my knowledge of
database internals is, I dug into the source code and found an important
difference in how these non-MCV values are handled in the single-column
and multi-column cases.
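
For context, I created the extended statistics roughly like this (the
table and column names here are made up for illustration):

    CREATE STATISTICS st_ab (ndistinct, mcv) ON a, b FROM t;
    ANALYZE t;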

For single columns, the estimate is calculated as follows:
selectivity = (1 - sum(MCV_frequencies)) / (distinct_values - count(MCV))
This seems to assume a uniform distribution of non-MCV values, and looks
like a sensible guess, at least to my amateur eyes.

For multi-column statistics it seems to me that the estimate is calculated
instead as:
selectivity = 1 - sum(MCV_frequencies)
This instead seems to assume that the value could potentially be present
in all the rows not covered by the MCV list. That is an adequate upper
bound, but it is also quite conservative compared to the single-column
estimate. In my specific case it yields a selectivity even higher than
that of some of the least frequent MCV values, a condition that the
single-column estimate actually checks for and compensates as an
additional step. I have both MCV and ndistinct extended stats, so I know
the distinct-values statistic is available.
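
To make the difference concrete with made-up numbers: say the MCV list
covers 60% of the rows with 100 entries, and there are 500 distinct
combinations in total. Then:

    -- single-column style estimate for a non-MCV value:
    --   (1 - 0.60) / (500 - 100) = 0.001
    -- multi-column estimate:
    --   1 - 0.60 = 0.4
    SELECT (1 - 0.60) / (500 - 100) AS single_column_style,
           1 - 0.60                 AS multi_column_style;

That is a factor of 400 between the two estimates.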

So I hope my question is clear from the above. How come the estimates are
calculated with such different approaches? Again, I have no experience
with database/query-planner development, so excuse me if I am overlooking
some obvious conceptual difference between single-column and multi-column
stats. The single-column estimate is actually described in the
documentation, but the multi-column estimate is not. If there is indeed a
good reason for this difference, I think it should be documented.

Thanks,
Pedro


Re: Access a newer Version of PGDB (v13) with an older libpq (v10 x86)

2021-05-01 Thread Adrian Klaver

On 5/1/21 3:59 AM, Wolfgang Rißler wrote:

This is my problem: I completely don't know how to start compiling my
own up-to-date 32-bit libpq on Windows (and I would like to do it with
VS 2019). For libpqxx there have been some hints on how to do so in the
past, and now there is a complete project which deals with compiling it
on Windows with VS and CMake. But I didn't find such hints for libpq or
the whole PostgreSQL DB.


Have you looked at below?:

https://www.postgresql.org/docs/current/install-windows.html
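
The short version, per those docs (an untested outline; exact steps will
vary with your setup), for building just libpq as 32-bit with the MSVC
build scripts would be something like:

    rem From a 32-bit (x86) VS 2019 developer command prompt,
    rem with Perl installed (the build scripts require it):
    git clone https://git.postgresql.org/git/postgresql.git
    cd postgresql
    git checkout REL_13_STABLE
    cd src\tools\msvc
    build libpq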




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Access a newer Version of PGDB (v13) with an older libpq (v10 x86)

2021-05-01 Thread Wolfgang Rißler

On 30.04.2021 at 16:16, Tom Lane wrote:

Wolfgang Rißler  writes:

The problem is that our application (IDE MS Visual Studio, C++) has to
be 32-bit, because of some old 32-bit DLLs which we can't kick out at
the moment.
So I compiled libpqxx with the last 32-bit libpq (which is v10).


Uh ... what's this about "last 32-bit libpq"?


OK, I meant the last ready-to-use packaged 32-bit libpq from EDB (~_0).



I can believe that a particular packager (EDB, say) might not be shipping
prebuilt 32-bit binaries anymore.  But if you are in a position to compile
your own libraries then you can certainly build any release you want as
32-bit.


This is my problem: I completely don't know how to start compiling my
own up-to-date 32-bit libpq on Windows (and I would like to do it with
VS 2019). For libpqxx there have been some hints on how to do so in the
past, and now there is a complete project which deals with compiling it
on Windows with VS and CMake. But I didn't find such hints for libpq or
the whole PostgreSQL DB.
Or is there another provider who supplies v13 32-bit binary installers
for Windows?





I would recommend trying to use a reasonably late-vintage libpq; we do
fix bugs in it on a regular basis.





The common stumbling block for cross-version situations is that the
client makes assumptions about system catalog contents that are not
valid in some other server release.  libpq proper doesn't really touch
the catalogs, so it's mostly impervious to that problem; but you'll need
to test your applications.


Of course we will. One thing is that we load and write byteas, and as
I read, there have been some changes. All other operations are less
problematic.


Thank you

--

- may the source be with you -




Re: -1/0 virtualtransaction

2021-05-01 Thread Mike Beachy
In case this helps anyone else, I found a simple way to get a rough
idea of what's going on, which is to run:

select (select count(distinct virtualtransaction) from pg_locks)
           as tx_with_locks,
       (select count(*) from pg_stat_activity where state = 'active')
           as active_tx,
       (select count(*) from pg_locks where virtualtransaction = '-1/0')
           as summarized_locks;

I disabled the part of my application that seemed to be causing
problems with too many writes (a background cleanup task) and then
triggered it from a separate process. I can see the number of
transactions with locks climbing when it hits a problematic item,
while the number of active transactions (of course) stays low.
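
To see how much headroom the formula Thomas mentions below allows, here
is a rough sketch against the current settings (adjust as needed):

    -- (max_connections + max_prepared_transactions) * 10:
    SELECT (current_setting('max_connections')::int
          + current_setting('max_prepared_transactions')::int) * 10
           AS serializablexact_slots;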

Mike

On Fri, Apr 30, 2021 at 4:53 PM Mike Beachy  wrote:
>
> On Fri, Apr 30, 2021 at 7:12 AM Thomas Munro  wrote:
> > But do you have lots of short overlapping transactions so that there
> > is never a moment where there are zero transactions running?
>
> Yeah, that almost certainly explains it.
>
> Thanks very much for the explanation about the summarized locks.
>
> > The number of SERIALIZABLEXACT objects is (max_connections +
> > max_prepared_transactions) * 10.  So, you could try increasing
> > max_connections (without increasing the actual number of connections)
> > to see if you can get to a point where you don't see these invalid
> > virtual xids, and then maybe it'll be able to clean up locks more
> > aggressively.
>
> Aha! I hadn't considered that some parameter besides
> max_pred_locks_per_transaction would come into play.  I'll give this a
> shot.
>
> Thanks,
> Mike




Re: database sorting algorithms.

2021-05-01 Thread Jian He
Thanks a lot.
I found this YouTube video (https://youtu.be/alJswNJ4P3U?t=1852),
in case you guys are interested.
It really clarifies the time complexity of MergeSort.

On Sat, May 1, 2021 at 3:19 PM Gavan Schneider  wrote:

> On 1 May 2021, at 17:06, Jian He wrote:
>
> I've been self-studying databases, and from databases I deep-dived into
> sorting algorithms.
>
> Databases can do an in-memory QuickSort; they also have an on-disk
> MergeSort.
>
> For MergeSort I follow this tutorial: https://youtu.be/6pV2IF0fgKY?t=1108
> (around 1 minute only)
>
> Also check https://en.wikipedia.org/wiki/Merge_sort
>
> But I am still not fully understanding n*log(n). I understand how many
> passes it will take; that is log(n).
> Yes, each pass will sort N elements.
> But I still don't get what the N stands for in n*log(n).
>
> So, answering the question…
> The ’n’ refers to the need to do something to each element at least
> once, so the sort time grows in simple proportion to the size of the
> list that needs to be sorted. Unfortunately that is not enough work to
> get the list sorted, so extra steps are needed. The log(n) indicates
> how many extra passes over the elements are needed. So the overall
> performance is proportional to the number of elements (N) multiplied by
> the log of the number of elements, viz., N * log(N).
>
> Regards
> Gavan Schneider
> ——
> Gavan Schneider, Sodwalls, NSW, Australia
> Explanations exist; they have existed for all time; there is always a
> well-known solution to every human problem — neat, plausible, and wrong.
> — H. L. Mencken, 1920
>


Re: database sorting algorithms.

2021-05-01 Thread Gavan Schneider

On 1 May 2021, at 17:06, Jian He wrote:

I've been self-studying databases, and from databases I deep-dived into
sorting algorithms.

Databases can do an in-memory QuickSort; they also have an on-disk
MergeSort.

For MergeSort I follow this tutorial: https://youtu.be/6pV2IF0fgKY?t=1108
(around 1 minute only)

Also check https://en.wikipedia.org/wiki/Merge_sort

But I am still not fully understanding n*log(n). I understand how many
passes it will take; that is log(n).
Yes, each pass will sort N elements.
But I still don't get what the N stands for in n*log(n).


So, answering the question…
The ’n’ refers to the need to do something to each element at least
once, so the sort time grows in simple proportion to the size of the
list that needs to be sorted. Unfortunately that is not enough work to
get the list sorted, so extra steps are needed. The log(n) indicates how
many extra passes over the elements are needed. So the overall
performance is proportional to the number of elements (N) multiplied by
the log of the number of elements, viz., N * log(N).
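
As a worked illustration (numbers made up): with N = 1,000,000 elements,
log2(N) is about 20, so

    N * log2(N) ≈ 1,000,000 * 20 = 20,000,000 element operations

versus N^2 = 10^12 for a quadratic sort.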


Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a 
well-known solution to every human problem — neat, plausible, and 
wrong.

— H. L. Mencken, 1920


Re: database sorting algorithms.

2021-05-01 Thread Francisco Olarte
Jian He:

On Sat, May 1, 2021 at 9:07 AM Jian He  wrote:
> Been self study Database, from database I deep dived into sorting algorithms.

Pick a good book, because this is a hairy topic with a lot of math and
other prior knowledge required.

> Databases can do in-memory QuickSort. It also has an on-disk MergeSort.

If you ignore top-N and other stuff and consider just full sorts,
databases use two types of sorting: internal sorts, which are the ones
used when the data fits in RAM, and external sorts, when they have to
spill to tape/disk/whatever.

For internal sorts you can use quicksort, heapsort, mergesort, even
insertion sort. Quicksort with a fallback to insertion sort is a popular
one (quicksort and merge sort are divide-and-conquer algorithms, but
once they have divided enough it is easier to fall back to a simpler
algorithm).

When the data no longer fits in RAM you need to do an external sort.
The classic way to do this is to sort chunks as big as fit in RAM, write
the presorted chunks to disk, and then merge those chunks. This is
called merge sort too, and is similar to an in-memory merge sort, but
not the same (when I've done it I used a two-way merge for in-memory
merge sorts, but an N-way merge for disk sorts; current memories are so
big that I have never needed more than one merge pass since the tape
era).

> For MergeSort I follow this tutorial: https://youtu.be/6pV2IF0fgKY?t=1108
> (around 1 minute only)
> But I am still not fully understanding n*log(n). I understand how many
> passes it will take; that is log(n).
> Yes, each pass will sort N elements.
> But I still don't get what the N stands for in n*log(n).

Basically you do log(N) passes over N elements.

> Why does each pass take N time to sort it?

I have no time to look at the video, but basic binary merge sort is
N log N because you have log N passes and each one must merge N
elements. Imagine you do an 8-element pure merge sort.
First you split it into 4+4 elements, recurse to sort them, and merge
them. Merging is an N-complexity op because you need to read the 8
elements.
Each 4-element half is split into 2+2, same thing; a merge needs four
reads, but you have two merges, so eight reads again.
Each 2-element piece is split into 1+1, two reads per merge, but you
have four chunks, so eight again.

Total: 3 passes ( log2(8) ) of eight reads each.

Real sorts are more complex, but the N is always present, as you
always need to at least read the full input set to sort.
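
The same thing in recurrence form (a standard derivation, nothing
database-specific):

    T(N) = 2 * T(N/2) + c*N,  with T(1) = O(1)
    =>  T(N) = c * N * log2(N) + O(N) = O(N log N)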

Note, in an external sort you may have N elements, of which only M fit
in RAM. You do C = N/M chunks; these need C * M log M = N log M time to
be sorted and C * M = N time to be written to and read from disk.
But to merge C chunks you basically do log C comparisons per element, so
add N * log C = N * (log N - log M). If you add appropriate constants
and add it all up, you'll find the final result is O(N log N).
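
Summing that up in one place (same notation, constants dropped):

    C = N / M                                     chunks
    chunk sorting:  C * M log M  =  N log M
    chunk I/O:      C * M        =  N             (each way)
    final merge:    N log C      =  N (log N - log M)
    total:          N log M + N (log N - log M)  =  N log N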

Francisco Olarte.




database sorting algorithms.

2021-05-01 Thread Jian He
I've been self-studying databases, and from databases I deep-dived into
sorting algorithms.

Databases can do an in-memory QuickSort; they also have an on-disk
MergeSort.

For MergeSort I follow this tutorial: https://youtu.be/6pV2IF0fgKY?t=1108
(around 1 minute only)

But I am still not fully understanding n*log(n). I understand how many
passes it will take; that is log(n).
Yes, each pass will sort N elements.
But I still don't get what the N stands for in n*log(n).
Why does each pass take N time to sort it?