Tom Lane wrote:

So I have some results. I have tested the query on both PostgreSQL 8.0.1 and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL result is 11,667.916 ms, MySQL result is 448.4 ms.

That's a fairly impressive discrepancy :-(, and even the slot_getattr()
patch that Atsushi Ogawa provided isn't going to close the gap.
(I got about a 4x speedup on Miroslav's example in my testing, which
leaves us still maybe 6x slower than MySQL.)

As I wrote, the comparison is not "fair". Here are the conditions:

"Both databases are running on the same machine (my laptop) and contain the same data. However there are some differences in the data table definitions:
1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I use 'enum'
2) in PostgreSQL in some cases I use connection fields that are not of the same type (smallint <-> integer (SERIAL)), in MySQL I use the same types"

For those not used to MySQL, enum is an integer "mapped" to a text string (that's how I see it). That means that when you have field such as enum('yes','no','DK'), in the table there are stored numbers 1, 2 and 3 which are mapped to the text values 'yes', 'no' and 'DK'. The description is not accurate (I'm not MySQL programmer, I didn't check it recently and I didn't inspect the code - I wouldn't understand it either) but I think it's not that important. What is important is the fact that MySQL has to work with some dozen fields that are numbers and PostgreSQL has to work with the same fields as varchar(). Some of the other fields are also varchars. This might (or might not) cause the speed difference. However I think that if devs figure out how to speed this up, other cases will benefit from the improvement too.

As I understood from the contributions of other, the 2) shouldn't have a great impact on the speed.

Looking at the post-patch profile for the test case, there is still
quite a lot of cycles going into tuple assembly and disassembly:

Each sample counts as 0.01 seconds.
% cumulative self self total time seconds seconds calls ms/call ms/call name 24.47 4.49 4.49 _mcount
8.01 5.96 1.47 9143692 0.00 0.00 ExecEvalVar
6.92 7.23 1.27 6614373 0.00 0.00 slot_deformtuple
6.54 8.43 1.20 9143692 0.00 0.00 slot_getattr
6.21 9.57 1.14 103737 0.01 0.03 ExecTargetList
5.56 10.59 1.02 103775 0.01 0.01 DataFill
3.22 11.18 0.59 103775 0.01 0.01 ComputeDataSize
2.83 11.70 0.52 ExecEvalVar
2.72 12.20 0.50 9094122 0.00 0.00 memcpy
2.51 12.66 0.46 encore
2.40 13.10 0.44 427448 0.00 0.00 nocachegetattr
2.13 13.49 0.39 103775 0.00 0.02 heap_formtuple
2.07 13.87 0.38 noshlibs
1.20 14.09 0.22 225329 0.00 0.00 _doprnt
1.20 14.31 0.22 msquadloop
1.14 14.52 0.21 chunks
0.98 14.70 0.18 871885 0.00 0.00 AllocSetAlloc
0.98 14.88 0.18 $$dyncall
0.76 15.02 0.14 594242 0.00 0.00 FunctionCall3
0.71 15.15 0.13 213312 0.00 0.00 comparetup_heap
0.65 15.27 0.12 6364 0.02 0.13 printtup
0.60 15.38 0.11 790702 0.00 0.00 pfree

(_mcount is profiling overhead, ignore it.)  It looks to me like just
about everything in the top dozen functions is there as a result of the
fact that join steps form new tuples that are the merge of their input
tuples.  Even our favorite villains, palloc and pfree, are down in the
sub-percent range.

I am guessing that the reason MySQL wins on this is that they avoid
doing any data copying during a join step.  I wonder whether we could
accomplish the same by taking Ogawa's patch to the next level: allow
a TupleTableSlot to contain either a "materialized" tuple as now,
or a "virtual" tuple that is simply an array of Datums and null flags.
(It's virtual in the sense that any pass-by-reference Datums would have
to be pointing to data at the next level down.)  This would essentially
turn the formtuple and deformtuple operations into no-ops, and get rid
of a lot of the associated overhead such as ComputeDataSize and
DataFill.  The only operations that would have to forcibly materialize
a tuple would be ones that need to keep the tuple till after they fetch
their next input tuple --- hashing and sorting are examples, but very
many plan node types don't ever need to do that.

I haven't worked out the details, but it seems likely that this could be
a relatively nonintrusive patch.  The main thing that would be an issue
would be that direct reference to slot->val would become verboten (since
you could no longer be sure there was a materialized tuple there).
I think this would possibly affect some contrib stuff, which is a strong
hint that it'd break some existing user-written code out there.


Mine thought is that I don't know what you are talking about :-) Now seriously, I am far below this level of knowledge. But I can contribute a test that (maybe) can help. I have rewritten the query so it JOINs the varchar() fields (in fact all fields except the IDPK) at the last INNER JOIN. Though there is one more JOIN, the query is more than 5 times faster (1975.312 ms) :-)

So my silly opinion is that if the planner could decide that there are too much time expensive fields that are not needed during performing JOINs and these could be JOINed at the last step, it would do it this way :-)

Below is the adjusted query and the EXPLAIN ANALYZE output. (Tom, you can run it on the data I have sent you and it should run without changes.)

SELECT AdDevicesSites.IDPK, AdDevicesSites2.AdDevicesSiteSizeIDFK, AdDevicesSites2.AdDevicesSiteRegionIDFK, AdDevicesSites2.AdDevicesSiteCountyIDFK, AdDevicesSites2.AdDevicesSiteCityIDFK, AdDevicesSites2.AdDevicesSiteDistrictIDFK, AdDevicesSites2.AdDevicesSiteStreetIDFK, AdDevicesSites2.AdDevicesSiteStreetDescriptionIDFK, AdDevicesSites2.AdDevicesSitePositionIDFK, AdDevicesSites2.AdDevicesSiteVisibilityIDFK, AdDevicesSites2.AdDevicesSiteStatusTypeIDFK, AdDevicesSites2.AdDevicesSitePartnerIdentificationOperatorIDFK, AdDevicesSites2.AdDevicesSitePartnerElectricitySupplierIDFK, AdDevicesSites2.AdDevicesSitePartnerMaintainerIDFK, AdDevicesSites2.AdDevicesSitePartnerStickerIDFK, AdDevicesSites2.CadastralUnitIDFK, AdDevicesSites2.MediaType, AdDevicesSites2.Mark, AdDevicesSites2.Amount, AdDevicesSites2.Distance, AdDevicesSites2.OwnLightening, AdDevicesSites2.LocationDownTown, AdDevicesSites2.LocationSuburb, AdDevicesSites2.LocationBusinessDistrict, AdDevicesSites2.LocationResidentialDistrict, AdDevicesSites2.LocationIndustrialDistrict, AdDevicesSites2.LocationNoBuildings, AdDevicesSites2.ParkWayHighWay, AdDevicesSites2.ParkWayFirstClassRoad, AdDevicesSites2.ParkWayOtherRoad, AdDevicesSites2.ParkWayStreet, AdDevicesSites2.ParkWayAccess, AdDevicesSites2.ParkWayExit, AdDevicesSites2.ParkWayParkingPlace, AdDevicesSites2.ParkWayPassangersOnly, AdDevicesSites2.ParkWayCrossRoad, AdDevicesSites2.PositionStandAlone, AdDevicesSites2.NeighbourhoodPublicTransportation, AdDevicesSites2.NeighbourhoodInterCityTransportation, AdDevicesSites2.NeighbourhoodPostOffice, AdDevicesSites2.NeighbourhoodNewsStand, AdDevicesSites2.NeighbourhoodAmenities, AdDevicesSites2.NeighbourhoodSportsSpot, AdDevicesSites2.NeighbourhoodHealthServiceSpot, AdDevicesSites2.NeighbourhoodShops, AdDevicesSites2.NeighbourhoodShoppingCenter, AdDevicesSites2.NeighbourhoodSuperMarket, AdDevicesSites2.NeighbourhoodPetrolStation, AdDevicesSites2.NeighbourhoodSchool, AdDevicesSites2.NeighbourhoodBank, AdDevicesSites2.NeighbourhoodRestaurant, AdDevicesSites2.NeighbourhoodHotel, AdDevicesSites2.RestrictionCigarettes, AdDevicesSites2.RestrictionPolitics, AdDevicesSites2.RestrictionSpirits, AdDevicesSites2.RestrictionSex, AdDevicesSites2.RestrictionOther, AdDevicesSites2.RestrictionNote, AdDevicesSites2.SpotMapFile, AdDevicesSites2.SpotPhotoFile, AdDevicesSites2.SourcePhotoTimeStamp, AdDevicesSites2.SourceMapTimeStamp, AdDevicesSites2.Price, AdDevicesSites2.WebPrice, AdDevicesSites2.CadastralUnitCode, AdDevicesSites2.BuildingNumber, AdDevicesSites2.ParcelNumber, AdDevicesSites2.GPSLatitude, AdDevicesSites2.GPSLongitude, AdDevicesSites2.GPSHeight, AdDevicesSites2.MechanicalOpticalCoordinates, AdDevicesSites2.Deleted, AdDevicesSites2.Protected, AdDevicesSites2.DateCreated, AdDevicesSites2.DateLastModified, AdDevicesSites2.DateDeleted, AdDevicesSites2.CreatedByUserIDFK, AdDevicesSites2.LastModifiedByUserIDFK, AdDevicesSites2.DeletedByUserIDFK, AdDevicesSites2.PhotoLastModificationDate, AdDevicesSites2.MapLastModificationDate, AdDevicesSites2.DateLastImported, AdDevicesSiteRegions.Name AS AdDevicesSiteRegionName, AdDevicesSiteCounties.Name AS AdDevicesSiteCountyName, AdDevicesSiteCities.Name AS AdDevicesSiteCityName, AdDevicesSiteStreets.Name AS AdDevicesSiteStreetName, AdDevicesSiteDistricts.Name AS AdDevicesSiteDistrictName, AdDevicesSiteStreetDescriptions.Name_cs AS AdDevicesSiteStreetDescriptionName_cs, AdDevicesSiteStreetDescriptions.Name_en AS AdDevicesSiteStreetDescriptionName_en, AdDevicesSiteSizes.Name AS AdDevicesSiteSizeName, SUBSTRING(AdDevicesSiteVisibilities.Name_cs, 3) AS AdDevicesSiteVisibilityName_cs, SUBSTRING(AdDevicesSiteVisibilities.Name_en, 3) AS AdDevicesSiteVisibilityName_en, AdDevicesSitePositions.Name_cs AS AdDevicesSitePositionName_cs, AdDevicesSitePositions.Name_en AS AdDevicesSitePositionName_en, AdDevicesSiteStatusTypes.Name_cs AS AdDevicesSiteStatusTypeName_cs, AdDevicesSiteStatusTypes.Name_en AS AdDevicesSiteStatusTypeName_en, PartnerIdentificationsOperator.Name AS PartnerIdentificationOperatorName, PartnersElectricitySupplier.Name AS PartnerElectricitySupplierName, PartnersMaintainer.Name AS PartnerMaintainerName, PartnersSticker.Name AS PartnerStickerName, CadastralUnits.Code AS CadastralUnitCodeNative, CadastralUnits.Name AS CadastralUnitName
FROM AdDevicesSites
LEFT JOIN AdDevicesSiteRegions ON AdDevicesSites.AdDevicesSiteRegionIDFK = AdDevicesSiteRegions.IDPK
LEFT JOIN AdDevicesSiteCounties ON AdDevicesSites.AdDevicesSiteCountyIDFK = AdDevicesSiteCounties.IDPK
LEFT JOIN AdDevicesSiteCities ON AdDevicesSites.AdDevicesSiteCityIDFK = AdDevicesSiteCities.IDPK
LEFT JOIN AdDevicesSiteStreets ON AdDevicesSites.AdDevicesSiteStreetIDFK = AdDevicesSiteStreets.IDPK
LEFT JOIN AdDevicesSiteStreetDescriptions ON AdDevicesSites.AdDevicesSiteStreetDescriptionIDFK = AdDevicesSiteStreetDescriptions.IDPK
LEFT JOIN AdDevicesSiteDistricts ON AdDevicesSites.AdDevicesSiteDistrictIDFK = AdDevicesSiteDistricts.IDPK
LEFT JOIN AdDevicesSiteSizes ON AdDevicesSites.AdDevicesSiteSizeIDFK = AdDevicesSiteSizes.IDPK
LEFT JOIN AdDevicesSiteVisibilities ON AdDevicesSites.AdDevicesSiteVisibilityIDFK = AdDevicesSiteVisibilities.IDPK
LEFT JOIN AdDevicesSitePositions ON AdDevicesSites.AdDevicesSitePositionIDFK = AdDevicesSitePositions.IDPK
LEFT JOIN AdDevicesSiteStatusTypes ON AdDevicesSites.AdDevicesSiteStatusTypeIDFK = AdDevicesSiteStatusTypes.IDPK
LEFT JOIN PartnerIdentifications AS PartnerIdentificationsOperator ON AdDevicesSites.AdDevicesSitePartnerIdentificationOperatorIDFK = PartnerIdentificationsOperator.IDPK
LEFT JOIN Partners AS PartnersElectricitySupplier ON AdDevicesSites.AdDevicesSitePartnerElectricitySupplierIDFK = PartnersElectricitySupplier.IDPK
LEFT JOIN Partners AS PartnersMaintainer ON AdDevicesSites.AdDevicesSitePartnerMaintainerIDFK = PartnersMaintainer.IDPK
LEFT JOIN Partners AS PartnersSticker ON AdDevicesSites.AdDevicesSitePartnerStickerIDFK = PartnersSticker.IDPK
LEFT JOIN CadastralUnits ON AdDevicesSites.CadastralUnitIDFK = CadastralUnits.IDPK
INNER JOIN AdDevicesSites AS AdDevicesSites2 ON AdDevicesSites.IDPK = AdDevicesSites2.IDPK


Hash Join  (cost=193867.68..235417.92 rows=142556 width=815) (actual 
time=1577.080..2200.677 rows=6364 loops=1)

 Hash Cond: ("outer".idpk = "inner".idpk)

 ->  Seq Scan on addevicessites addevicessites2  (cost=0.00..13118.56 
rows=142556 width=473) (actual time=40.650..49.195 rows=6364 loops=1)

 ->  Hash  (cost=186898.29..186898.29 rows=142556 width=350) (actual 
time=1534.080..1534.080 rows=0 loops=1)

       ->  Merge Right Join  (cost=184758.38..186898.29 rows=142556 width=350) 
(actual time=1187.653..1244.955 rows=6364 loops=1)

             Merge Cond: ("outer".idpk = "inner".cadastralunitidfk)

             ->  Index Scan using cadastralunits_pkey on cadastralunits  
(cost=0.00..314.49 rows=13027 width=31) (actual time=0.034..0.128 rows=63 loops=1)

             ->  Sort  (cost=184758.38..185114.77 rows=142556 width=325) 
(actual time=1187.582..1190.111 rows=6364 loops=1)

                   Sort Key: addevicessites.cadastralunitidfk

                   ->  Hash Left Join  (cost=93887.04..143074.76 rows=142556 
width=325) (actual time=502.584..1129.145 rows=6364 loops=1)

                         Hash Cond: ("outer".addevicessitepartnerstickeridfk = 

                         ->  Hash Left Join  (cost=93884.28..140933.65 
rows=142556 width=307) (actual time=502.388..1075.572 rows=6364 loops=1)

                               Hash Cond: ("outer".addevicessitepartnermaintaineridfk = 

                               ->  Hash Left Join  (cost=93881.51..138792.55 
rows=142556 width=289) (actual time=502.208..1023.802 rows=6364 loops=1)

                                     Hash Cond: 
("outer".addevicessitepartnerelectricitysupplieridfk = "inner".idpk)

                                     ->  Hash Left Join  
(cost=93878.75..136651.45 rows=142556 width=271) (actual time=502.041..969.959 
rows=6364 loops=1)

                                           Hash Cond: 
("outer".addevicessitepartneridentificationoperatoridfk = "inner".idpk)

                                           ->  Nested Loop Left Join  
(cost=93875.99..134510.35 rows=142556 width=253) (actual time=501.849..915.256 
rows=6364 loops=1)

                                                 Join Filter: 
("outer".addevicessitestatustypeidfk = "inner".idpk)

                                                 ->  Nested Loop Left Join  
(cost=93874.93..118471.74 rows=142556 width=228) (actual time=501.826..818.436 
rows=6364 loops=1)

                                                       Join Filter: 
("outer".addevicessitepositionidfk = "inner".idpk)

                                                       ->  Nested Loop Left 
Join  (cost=93873.90..108848.18 rows=142556 width=207) (actual 
time=501.802..737.137 rows=6364 loops=1)

                                                             Join Filter: 
("outer".addevicessitevisibilityidfk = "inner".idpk)

                                                             ->  Merge Right 
Join  (cost=93872.86..96017.09 rows=142556 width=177) (actual 
time=501.741..554.834 rows=6364 loops=1)

                                                                   Merge Cond: 
("outer".idpk = "inner".addevicessitesizeidfk)

                                                                   ->  Index 
Scan using addevicessitesizes_pkey on addevicessitesizes  (cost=0.00..5.62 
rows=110 width=14) (actual time=0.009..0.264 rows=110 loops=1)

                                                                   ->  Sort  
(cost=93872.86..94229.25 rows=142556 width=169) (actual time=501.697..504.267 
rows=6364 loops=1)

Key: addevicessites.addevicessitesizeidfk

Hash Left Join  (cost=57752.91..65764.23 rows=142556 width=169) (actual 
time=234.321..466.130 rows=6364 loops=1)

                                                                               Hash Cond: 
("outer".addevicessitedistrictidfk = "inner".idpk)

->  Hash Left Join  (cost=57743.17..63616.15 rows=142556 width=164) (actual 
time=233.456..421.267 rows=6364 loops=1)

                                                                                     Hash Cond: 
("outer".addevicessitestreetdescriptionidfk = "inner".idpk)

  ->  Hash Left Join  (cost=57634.86..62707.43 rows=142556 width=137) (actual 
time=223.396..362.945 rows=6364 loops=1)

Cond: ("outer".addevicessitestreetidfk = "inner".idpk)

        ->  Hash Left Join  (cost=57566.95..61844.20 rows=142556 width=119) 
(actual time=217.101..312.605 rows=6364 loops=1)

Hash Cond: ("outer".addevicessitecityidfk = "inner".idpk)

              ->  Merge Right Join  (cost=57561.85..59700.76 rows=142556 
width=110) (actual time=216.635..266.672 rows=6364 loops=1)

   Merge Cond: ("outer".idpk = "inner".addevicessitecountyidfk)

                    ->  Sort  (cost=6.19..6.48 rows=117 width=17) (actual 
time=0.350..0.389 rows=116 loops=1)

                             Sort Key: addevicessitecounties.idpk

                          ->  Seq Scan on addevicessitecounties  
(cost=0.00..2.17 rows=117 width=17) (actual time=0.008..0.146 rows=117 loops=1)

                    ->  Sort  (cost=57555.66..57912.05 rows=142556 width=99) 
(actual time=216.250..218.611 rows=6364 loops=1)

                             Sort Key: addevicessites.addevicessitecountyidfk

                          ->  Merge Right Join  (cost=33573.63..35712.03 
rows=142556 width=99) (actual time=173.646..199.036 rows=6364 loops=1)

               Merge Cond: ("outer".idpk = "inner".addevicessiteregionidfk)

                                ->  Sort  (cost=1.44..1.48 rows=15 width=23) 
(actual time=0.055..0.059 rows=13 loops=1)

                                         Sort Key: addevicessiteregions.idpk

                                      ->  Seq Scan on addevicessiteregions  
(cost=0.00..1.15 rows=15 width=23) (actual time=0.016..0.032 rows=15 loops=1)

                                ->  Sort  (cost=33572.19..33928.58 rows=142556 
width=82) (actual time=173.559..176.398 rows=6364 loops=1)

                                         Sort Key: 

                                      ->  Seq Scan on addevicessites  
(cost=0.00..13118.56 rows=142556 width=82) (actual time=62.345..164.783 rows=6364 

              ->  Hash  (cost=4.48..4.48 rows=248 width=17) (actual 
time=0.283..0.283 rows=0 loops=1)

                    ->  Seq Scan on addevicessitecities  (cost=0.00..4.48 
rows=248 width=17) (actual time=0.011..0.162 rows=138 loops=1)

        ->  Hash  (cost=60.53..60.53 rows=2953 width=34) (actual 
time=6.229..6.229 rows=0 loops=1)

              ->  Seq Scan on addevicessitestreets  (cost=0.00..60.53 rows=2953 
width=34) (actual time=0.010..3.816 rows=2984 loops=1)

  ->  Hash  (cost=96.85..96.85 rows=4585 width=43) (actual time=10.017..10.017 
rows=0 loops=1)

        ->  Seq Scan on addevicessitestreetdescriptions  (cost=0.00..96.85 
rows=4585 width=43) (actual time=0.008..6.371 rows=4585 loops=1)

->  Hash  (cost=8.59..8.59 rows=459 width=21) (actual time=0.815..0.815 rows=0 

  ->  Seq Scan on addevicessitedistricts  (cost=0.00..8.59 rows=459 width=21) 
(actual time=0.007..0.541 rows=382 loops=1)

                                                             ->  Materialize  
(cost=1.04..1.08 rows=4 width=36) (actual time=0.000..0.002 rows=4 loops=6364)

                                                                   ->  Seq Scan 
on addevicessitevisibilities  (cost=0.00..1.04 rows=4 width=36) (actual 
time=0.026..0.035 rows=4 loops=1)

                                                       ->  Materialize  
(cost=1.03..1.06 rows=3 width=27) (actual time=0.000..0.001 rows=3 loops=6364)

                                                             ->  Seq Scan on 
addevicessitepositions  (cost=0.00..1.03 rows=3 width=27) (actual 
time=0.007..0.010 rows=3 loops=1)

                                                 ->  Materialize  
(cost=1.05..1.10 rows=5 width=31) (actual time=0.000..0.002 rows=5 loops=6364)

                                                       ->  Seq Scan on 
addevicessitestatustypes  (cost=0.00..1.05 rows=5 width=31) (actual 
time=0.006..0.016 rows=5 loops=1)

                                           ->  Hash  (cost=2.61..2.61 rows=61 
width=34) (actual time=0.144..0.144 rows=0 loops=1)

                                                 ->  Seq Scan on 
partneridentifications partneridentificationsoperator  (cost=0.00..2.61 rows=61 
width=34) (actual time=0.007..0.103 rows=61 loops=1)

                                     ->  Hash  (cost=2.61..2.61 rows=61 
width=34) (actual time=0.122..0.122 rows=0 loops=1)

                                           ->  Seq Scan on partners 
partnerselectricitysupplier  (cost=0.00..2.61 rows=61 width=34) (actual 
time=0.004..0.072 rows=61 loops=1)

                               ->  Hash  (cost=2.61..2.61 rows=61 width=34) 
(actual time=0.136..0.136 rows=0 loops=1)

                                     ->  Seq Scan on partners 
partnersmaintainer  (cost=0.00..2.61 rows=61 width=34) (actual time=0.005..0.086 
rows=61 loops=1)

                         ->  Hash  (cost=2.61..2.61 rows=61 width=34) (actual 
time=0.143..0.143 rows=0 loops=1)

                               ->  Seq Scan on partners partnerssticker  
(cost=0.00..2.61 rows=61 width=34) (actual time=0.009..0.098 rows=61 loops=1)

Total runtime: 2210.937 ms

regards, tom lane

fn;quoted-printable:Miroslav =C5=A0ulc
org:StartNet s.r.o.
adr;quoted-printable;quoted-printable:;;Vrchlick=C3=A9ho 161/5;Praha 5;;150 00;=C4=8Cesk=C3=A1 republika
email;internet:[EMAIL PROTECTED]
tel;work:+420 257 225 602
tel;cell:+420 603 711 413

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to