[PATCHES] doc/FAQ_DEV: about profile

2005-09-17 Thread a_ogawa

There is a description of profiling in doc/FAQ_DEV:

 You can also compile with profiling to see what functions are taking
 execution time. The backend profile files will be deposited in the
 pgsql/data/base/dbname directory.

The backend profile files are now deposited in the pgsql/data directory
on my Linux machine. I think that we should correct doc/FAQ_DEV.

regards,

--- Atsushi Ogawa

--
[EMAIL PROTECTED]



Re: [PATCHES] wchareq improvement

2005-05-26 Thread a_ogawa

Bruce Momjian wrote:

 Patch applied with adjustment --- the second part of your patch that
 skips comparing the first byte seemed unnecessary.  It seemed likely
 to cause a cpu stall, so just doing the loop seemed faster.

 Did you test if the second part of your patch actually caused a speedup?

Well, I measured the performance today. As a result, I confirmed that the
second part of my patch is unnecessary, as you pointed out.
Thanks for the comment and for applying the patch.

 a_ogawa wrote:
 
  In SQL that uses the 'like' operator, wchareq is used to compare characters.
 
  At the head of wchareq, the length of the (multibyte) character is
  compared using pg_mblen. Therefore, pg_mblen is executed many times, and
  it becomes a bottleneck.

regards,

--- Atsushi Ogawa




Re: [PATCHES] AllocSetReset improvement

2005-05-16 Thread a_ogawa

Tom Lane [EMAIL PROTECTED] writes:
 a_ogawa [EMAIL PROTECTED] writes:
  It is a reasonable idea. However, the majority of the MemSet calls could
  not be avoided by this idea, because the per-tuple contexts are used at
  an early stage of the executor.

 Drat.  Well, what about changing that?  We could introduce additional
 contexts or change the startup behavior so that the ones that are
 frequently reset don't have any data in them unless you are working
 with pass-by-ref values inside the inner loop.

That might be possible. However, I think that we should confine this
change to aset.c for now.
I thought about it further: we can check whether the context has been used
since the last reset even when the blocks list is not empty. Please see
the attached patch.
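
A minimal sketch of that check, following aset.c's keeper-block convention
(the exact test is my guess at the patch, not a quote of it):

static void
AllocSetReset(MemoryContext context)
{
    AllocSet    set = (AllocSet) context;
    AllocBlock  block = set->blocks;

    /*
     * If the only block is the keeper block and its free pointer is still
     * at the post-reset position, nothing has been allocated here since
     * the last reset, so there is nothing to clear or free.
     */
    if (block == set->keeper &&
        block->next == NULL &&
        block->freeptr == ((char *) block) + ALLOC_BLOCKHDRSZ)
        return;

    /* ...otherwise clear the freelists and free/rewind blocks as before... */
}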

The effect of the patch that I measured is as follows:

o Execution time for executing the SQL ten times.
(1)Linux(CPU: Pentium III, Compiler option: -O2)
 - original: 24.960s
 - patched : 23.114s

(2)Linux(CPU: Pentium 4, Compiler option: -O2)
 - original: 8.730s
 - patched : 7.962s

(3)Solaris(CPU: Ultra SPARC III, Compiler option: -O2)
 - original: 37.0s
 - patched : 33.7s

regards,

---
Atsushi Ogawa

aset.c.patch
Description: Binary data



Re: [PATCHES] AllocSetReset improvement

2005-05-14 Thread a_ogawa

Tom Lane [EMAIL PROTECTED] writes:
  And I'm worried about adding even a small amount of overhead to
  palloc/pfree --- on the vast majority of the profiles I look at, those
  are more expensive than AllocSetReset.

  I don't worry about palloc, because the overhead increases only when
  malloc is executed in AllocSetAlloc. But I'm worried about pfree, too.
  However, when palloc/pfree were executed many times, I did not see a bad
  influence.

 In most of the tests I've looked at, palloc/pfree are executed far more
 often than AllocSetReset, and so adding even one instruction there to
 sometimes save a little work in AllocSetReset is a bad tradeoff.  You
 can't optimize to make just one test case look good.

I agree. I give up adding an instruction to palloc/pfree.

 I have another idea though: in the case you are looking at, I think
 that the context in question never gets any allocations at all, which
 means its blocks list stays null.  We could move the MemSet inside the
 "if (blocks)" test --- if there are no blocks allocated to the context,
 it surely hasn't got any chunks either, so the MemSet is unnecessary.
 That should give us most of the speedup without any extra cost in
 palloc/pfree.

It is a reasonable idea. However, the majority of the MemSet calls could
not be avoided by this idea, because the per-tuple contexts are used at an
early stage of the executor.

 function that calls MemoryContextReset   number    context    set->blocks
                                          of calls  address    is null
 --------------------------------------------------------------------------
 execTuplesMatch(execGrouping.c:65)       45        0x836dd28  false
 agg_fill_hash_table(nodeAgg.c:924)       50        0x836dd28  false
 ExecHashJoin(nodeHashjoin.c:108)         51        0x836dec0  false
 ExecHashJoin(nodeHashjoin.c:217)         50        0x836dec0  false
 ExecHashGetHashValue(nodeHash.c:669)     55        0x836dec0  false
 ExecScanHashBucket(nodeHash.c:785)       50        0x836dec0  false
 ExecScan(execScan.c:86)                  57        0x836df48  true

I am considering another idea: I think that we can change the behavior of
the context by switching the context's method table.

Simple versions of AllocSetAlloc and AllocSetReset are made. These can be
fast because they use neither the freelists nor the blocks list (except
the keeper block). When the context is initialized or reset, these new
functions are installed in the method table. When a freelist or a new
block is needed, the method table is switched back to the normal
functions; this is done by hooking pfree/repalloc. As a result, the extra
overhead in pfree occurs only once per context.
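
A minimal sketch of the switching (the simple-alloc body and the second
method table are my illustration, not the attached patch; AllocSetMethods,
AllocChunkGetPointer, and ALLOC_CHUNKHDRSZ come from aset.c):

static MemoryContextMethods AllocSetMethodsSimple;  /* fast variants */

static void *
AllocSetAllocSimple(MemoryContext context, Size size)
{
    AllocSet    set = (AllocSet) context;
    AllocBlock  block = set->keeper;
    Size        chunk_size = MAXALIGN(size) + ALLOC_CHUNKHDRSZ;

    if (chunk_size <= (Size) (block->endptr - block->freeptr))
    {
        /* Carve the chunk from the keeper block; no freelist needed. */
        AllocChunk  chunk = (AllocChunk) block->freeptr;

        block->freeptr += chunk_size;
        chunk->aset = (void *) set;
        chunk->size = MAXALIGN(size);
        return AllocChunkGetPointer(chunk);
    }

    /* Need a freelist or a new block: fall back to the normal table. */
    context->methods = &AllocSetMethods;
    return (*context->methods->alloc) (context, size);
}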

I think that this idea is effective for a context that is repeatedly
reset after small allocations, such as the per-tuple context. And I think
that the overhead imposed on a context that executes a lot of palloc/pfree
is very small.

The attached patch is a draft of that implementation. Tests and comments
in the source code are still insufficient.

regards,

---
Atsushi Ogawa

aset.c.patch
Description: Binary data



Re: [PATCHES] AllocSetReset improvement

2005-05-12 Thread a_ogawa

Tom Lane [EMAIL PROTECTED] writes:
 a_ogawa [EMAIL PROTECTED] writes:
  In SQL that executes aggregation, AllocSetReset is called many times
  and spends a lot of cycles.
  This patch saves the cycles spent by AllocSetReset.

 Hmm.  It doesn't seem like this could be a big win overall.  It's not
 possible to save a whole lot of cycles inside AllocSetReset, because if
 there isn't anything for it to do, it should fall through pretty quickly
 anyway.

I thought that I could avoid this MemSet in AllocSetReset:

MemSet(set->freelist, 0, sizeof(set->freelist));

My profile result in the previous mail is as follows:
  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call  s/call  name
  9.20     3.06      3.06  38500155    0.00    0.00  AllocSetReset

Therefore, sizeof(set->freelist) * (number of calls) =
 44 bytes * 38500155 = 1615 Mbytes.

 And I'm worried about adding even a small amount of overhead to
 palloc/pfree --- on the vast majority of the profiles I look at, those
 are more expensive than AllocSetReset.

I don't worry about palloc, because the overhead increases only when
malloc is executed in AllocSetAlloc. But I'm worried about pfree, too.
However, when palloc/pfree were executed many times, I did not see a bad
influence. The following is the result of executing 'select * from
accounts' 20 times.

original code:
Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds      calls   s/call   s/call  name
  6.79  4.03 4.03 9599 0.00 0.00  appendBinaryStringInfo
  6.57  7.93 3.90 50005879 0.00 0.00  AllocSetAlloc
  5.63 11.27 3.34 1000 0.00 0.00  printtup
  5.61 14.60 3.33 1000 0.00 0.00  slot_deform_tuple
  5.36 17.78 3.18 50001421 0.00 0.00  AllocSetFree

patched code:
Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds      calls   s/call   s/call  name
  8.07  4.78 4.78 9599 0.00 0.00  appendBinaryStringInfo
  7.23  9.06 4.28 50005879 0.00 0.00  AllocSetAlloc
  5.40 12.26 3.20 1000 0.00 0.00  printtup
  5.20 15.34 3.08 1000 0.00 0.00  slot_deform_tuple
  5.13 18.38 3.04 50001421 0.00 0.00  AllocSetFree

I think that it is difficult to measure the influence that this patch has
on palloc/pfree.

 I duplicated your test case to see where the reset calls were coming
 from, and got this:

 (Does this match your profile?  I only ran the query 5 times not 10.)

I'm sorry: my profile in the previous mail was from 11 runs, not 10. Your
profile and my profile match.

 This shows that the majority of the resets are coming from the hashjoin
 code not the aggregation code.

You are right. I measured where MemoryContextReset had been called
(the SQL was executed once):

  filename(line)        function                 number of calls
 -----------------------------------------------------------------
  execGrouping.c(65)    execTuplesMatch          45
  execScan.c(86)        ExecScan                 57
  nodeAgg.c(924)        agg_fill_hash_table      50
  nodeAgg.c(979)        agg_retrieve_hash_table  5
  nodeHash.c(669)       ExecHashGetHashValue     55
  nodeHash.c(785)       ExecScanHashBucket       50
  nodeHashjoin.c(108)   ExecHashJoin             51
  nodeHashjoin.c(217)   ExecHashJoin             50
 -----------------------------------------------------------------
  Total                                          3500013

Many of them come from hashjoin; the others come from grouping,
table/index scan, and aggregation by hash.
I also measured the number of times that the MemSet in AllocSetReset could
be avoided:

  avoided MemSet     358
  executed MemSet    7
 --------------------------
  Total              3500015

(The execution time of AllocSetReset is more than twice that of
MemoryContextReset, because there is also
MemoryContextResetAndDeleteChildren in PostgresMain.)

regards,

---
Atsushi Ogawa



[PATCHES] AllocSetReset improvement

2005-05-11 Thread a_ogawa

In SQL that executes aggregation, AllocSetReset is called many times and
spends a lot of cycles.
This patch saves the cycles spent by AllocSetReset.

The idea of the patch is to add a flag to AllocSetContext. This flag
shows whether AllocSetReset has any work to do.
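
A minimal sketch of the flag (the field name and placement are my
illustration; the real change is in the attached patch). AllocSetAlloc
would clear the flag whenever it hands out memory:

typedef struct AllocSetContext
{
    MemoryContextData header;   /* standard memory-context fields */
    /* ...existing fields: blocks, freelist[], keeper, and so on... */
    bool        isReset;        /* nothing allocated since last reset? */
} AllocSetContext;

static void
AllocSetReset(MemoryContext context)
{
    AllocSet    set = (AllocSet) context;

    if (set->isReset)
        return;                 /* the context is already clean */

    MemSet(set->freelist, 0, sizeof(set->freelist));
    /* ...free or rewind the block list as before... */
    set->isReset = true;
}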

The effect of the patch that I measured is as follows:

o Data for the test was created by 'pgbench -i -s 5'.

o Test SQL:
select b.bid, sum(a.abalance), avg(a.abalance)
from accounts a, branches b
where a.bid = b.bid
group by b.bid;

o I measured the time taken to execute the SQL ten times.
(1)Linux(CPU: Pentium III, Compiler option: -O2)
 - original: 31.310s
 - patched : 28.812s

(2)Linux(CPU: Pentium 4, Compiler option: -O2)
 - original: 8.953s
 - patched : 7.753s

(3)Solaris(CPU: Ultra SPARC III, Compiler option: -O2)
 - original: 41.8s
 - patched : 38.6s

o gprof result(Linux, Compiler option: -O2 -pg -DLINUX_PROFILE)
- original
Each sample counts as 0.01 seconds.
  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call  s/call  name
  9.20     3.06     3.06  38500155    0.00    0.00   AllocSetReset
  8.99     6.05     2.99  27500055    0.00    0.00   slot_deform_tuple
  7.40     8.51     2.46      4400    0.00    0.00   slot_getattr
  4.81    10.11     1.60  27500110    0.00    0.00   ExecEvalVar
  3.64    11.32     1.21  38500143    0.00    0.00   MemoryContextReset
  3.64    12.53     1.21   6007086    0.00    0.00   LWLockRelease
  3.31    13.63     1.10   5500079    0.00    0.00   heapgettup

- patched
Each sample counts as 0.01 seconds.
  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call  s/call  name
  8.76     2.82     2.82  27500055    0.00    0.00   slot_deform_tuple
  7.73     5.31     2.49      4400    0.00    0.00   slot_getattr
  4.72     6.83     1.52  27500110    0.00    0.00   ExecEvalVar
  4.32     8.22     1.39   5500011    0.00    0.00   ExecHashJoin
  4.28     9.60     1.38   6007086    0.00    0.00   LWLockRelease
  4.04    10.90     1.30  38500143    0.00    0.00   MemoryContextReset
  3.63    12.07     1.17   5500079    0.00    0.00   heapgettup
  3.04    13.05     0.98   5499989    0.00    0.00   ExecMakeFunctionResultNoSets
  2.67    13.91     0.86   5500110    0.00    0.00   ExecProject
  2.61    14.75     0.84      1100    0.00    0.00   advance_transition_function
  2.55    15.57     0.82  38500155    0.00    0.00   AllocSetReset

regards,

---
Atsushi Ogawa

aset.c.patch
Description: Binary data



[PATCHES] wchareq improvement

2005-04-12 Thread a_ogawa

In SQL that uses the 'like' operator, wchareq is used to compare
characters.

At the head of wchareq, the length of the (multibyte) character is
compared using pg_mblen. Therefore, pg_mblen is executed many times, and
it becomes a bottleneck.

This patch makes a short cut, and reduces the execution frequency of
pg_mblen.

test.sql:
select count(*) from accounts
where aid like '%1';
... (repeated 10 times)

test command:
$ psql -f test.sql

result of original code(compile option "-O2 -pg"):
--- 
Each sample counts as 0.01 seconds.
  %   cumulative   self               self    total
 time   seconds   seconds      calls  s/call  s/call  name
 7.82 0.32 0.32 17566930   0.00   0.00 pg_euc_mblen
 7.09 0.61 0.29 17566930   0.00   0.00 pg_mblen
 6.60 0.88 0.27  100   0.00   0.00 MBMatchText
 5.38 1.10 0.22  100   0.00   0.00 HeapTupleSatisfiesSnapshot
 5.13 1.31 0.21   90   0.00   0.00 ExecMakeFunctionResultNoSets
 4.89 1.51 0.20 17566930   0.00   0.00 pg_eucjp_mblen

result of patched code(compile option "-O2 -pg"):

Each sample counts as 0.01 seconds.
  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call  s/call  name
  8.56     0.32     0.32       100    0.00    0.00   MBMatchText
  7.75     0.61     0.29       100    0.00    0.00   HeapTupleSatisfiesSnapshot
  6.42     0.85     0.24       100    0.00    0.00   slot_deform_tuple
  5.88     1.07     0.22   8789050    0.00    0.00   pg_euc_mblen
  5.88     1.29     0.22       112    0.00    0.00   heapgettup
  5.61     1.50     0.21        90    0.00    0.00   ExecMakeFunctionResultNoSets

execution time(compile option "-O2"):
 original code: 4.795sec
 patched code:  4.496sec

regards,

--- Atsushi Ogawa



[PATCHES] wchareq improvement

2005-04-12 Thread a_ogawa

I forgot to attach the patch, so I am posting it once again.
In SQL that uses the 'like' operator, wchareq is used to compare
characters.

At the head of wchareq, the length of the (multibyte) character is
compared using pg_mblen. Therefore, pg_mblen is executed many times, and
it becomes a bottleneck.

This patch makes a short cut, and reduces the execution frequency of
pg_mblen. A sketch of the idea follows.
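
Roughly, the short cut looks like this (a sketch of the idea, not a quote
of the attached patch):

/* Check a pair of (possibly multibyte) characters for equality */
static int
wchareq(char *p1, char *p2)
{
    int         l;

    /* Short cut: if the first bytes differ, the characters differ. */
    if (*p1 != *p2)
        return 0;

    l = pg_mblen(p1);
    if (pg_mblen(p2) != l)
        return 0;

    /* Same length: compare the remaining bytes. */
    while (l--)
    {
        if (*p1++ != *p2++)
            return 0;
    }
    return 1;
}

Most character pairs compared by the 'like' machinery differ in their
first byte, so pg_mblen is reached far less often.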

test.sql:
select count(*) from accounts
where aid like '%1';
... (repeated 10 times)

test command:
$ psql -f test.sql

result of original code(compile option "-O2 -pg"):
--- 
Each sample counts as 0.01 seconds.
  %   cumulative   self               self    total
 time   seconds   seconds      calls  s/call  s/call  name
 7.82 0.32 0.32 17566930   0.00   0.00 pg_euc_mblen
 7.09 0.61 0.29 17566930   0.00   0.00 pg_mblen
 6.60 0.88 0.27  100   0.00   0.00 MBMatchText
 5.38 1.10 0.22  100   0.00   0.00 HeapTupleSatisfiesSnapshot
 5.13 1.31 0.21   90   0.00   0.00 ExecMakeFunctionResultNoSets
 4.89 1.51 0.20 17566930   0.00   0.00 pg_eucjp_mblen

result of patched code(compile option "-O2 -pg"):

Each sample counts as 0.01 seconds.
  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call  s/call  name
  8.56     0.32     0.32       100    0.00    0.00   MBMatchText
  7.75     0.61     0.29       100    0.00    0.00   HeapTupleSatisfiesSnapshot
  6.42     0.85     0.24       100    0.00    0.00   slot_deform_tuple
  5.88     1.07     0.22   8789050    0.00    0.00   pg_euc_mblen
  5.88     1.29     0.22       112    0.00    0.00   heapgettup
  5.61     1.50     0.21        90    0.00    0.00   ExecMakeFunctionResultNoSets

execution time(compile option "-O2"):
 original code: 4.795sec
 patched code:  4.496sec

regards,

--- Atsushi Ogawa

wchareq.patch
Description: Binary data



[PATCHES] improves ExecMakeFunctionResultNoSets

2005-03-22 Thread a_ogawa

The attached patch improves ExecMakeFunctionResultNoSets, etc.

This patch uses the InitFunctionCallInfoData macro instead of MemSet to
initialize FunctionCallInfoData.
The idea of this patch was discussed in the "FunctionCallN improvement"
thread (http://archives.postgresql.org/pgsql-hackers/2005-01/msg01054.php).
To achieve this, the InitFunctionCallInfoData macro was moved from fmgr.c
to fmgr.h.
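
Schematically, the change looks like this (a sketch; flinfo and nargs are
stand-in names, not the patch text):

FunctionCallInfoData fcinfo;

/* before: zero the whole struct, FUNC_MAX_ARGS-sized arrays included */
MemSet(&fcinfo, 0, sizeof(fcinfo));
fcinfo.flinfo = flinfo;

/* after: set only the fields that matter, skipping the big MemSet */
InitFunctionCallInfoData(fcinfo, flinfo, nargs, NULL, NULL);

The win comes from FunctionCallInfoData containing large per-argument
arrays that rarely need to be zeroed in full.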

test sql:
select substr(c.relname, 1, 10) from pg_class c, pg_am, pg_amop;
(pg_am and pg_amop are included only to increase the number of records.)

result of original code:
---
Each sample counts as 0.01 seconds.
  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call  s/call  name
 21.43     0.36     0.36    219911    0.00    0.00   ExecMakeFunctionResultNoSets
  7.14     0.48     0.12    219912    0.00    0.00   pg_mbstrlen_with_len
  6.25     0.58     0.10   1102916    0.00    0.00   AllocSetAlloc
  5.36     0.68     0.09   5936448    0.00    0.00   pg_euc_mblen
  5.36     0.77     0.09   5936448    0.00    0.00   pg_mblen

result of patched code:
---
Each sample counts as 0.01 seconds.
  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call  s/call  name
  7.52     0.10     0.10   5936448    0.00    0.00   pg_mblen
  7.14     0.20     0.10   1104587    0.00    0.00   AllocSetAlloc
  6.77     0.28     0.09    219912    0.00    0.00   text_substring
  6.39     0.37     0.09   1547723    0.00    0.00   AllocSetFreeIndex
  6.02     0.45     0.08    219912    0.00    0.00   pg_mbstrlen_with_len
  4.51     0.51     0.06   5936448    0.00    0.00   pg_euc_mblen
  4.51     0.57     0.06    442745    0.00    0.00   ExecProcNode
  4.51     0.63     0.06    219911    0.00    0.00   ExecMakeFunctionResultNoSets

regards,

--- Atsushi Ogawa

ExecMakeFunctionResultNoSets.patch
Description: Binary data



Re: [PATCHES] WIP: avoiding tuple construction/deconstruction overhead

2005-03-18 Thread a_ogawa

Tom Lane wrote:
 a_ogawa [EMAIL PROTECTED] writes:
  (1)We can improve comparetup_heap() by using TupleTableSlot instead of
  HeapTuple. Please see the attached patch.

 Did you measure any performance improvement from that?  I considered it
 but thought it would likely be a wash or a loss, because in most cases
 only one attribute will be pulled from a tuple during comparetup_heap.
 slot_getattr cannot improve on heap_getattr in that case, and is quite
 likely to be slower.

I measured the performance of heap_getattr and slot_getattr in
comparetup_heap.

I made a table which has ten varchar attributes, and loaded test data
(the attached file includes the SQL for this).
I carried out the following tests.

(case 1)
 test1: select * from sort_test order by v1 limit 100;
 test2: select * from sort_test order by v1, v2 limit 100;
 test3: select * from sort_test order by v1, v2, v3 limit 100;
 test4: select * from sort_test order by v1, v2, v3, v4 limit 100;
 test5: select * from sort_test order by v1, v2, v3, v4, v5 limit 100;

 result:test1test2test3test4test5
---
 heap_getattr  2.149s   2.602s   3.204s   3.830s   4.159s
 slot_getattr  2.523s   3.422s   3.977s   4.453s   4.721s

(case 2)
 test1: select * from sort_test order by v10 limit 100;
 test2: select * from sort_test order by v10, v9 limit 100;
 test3: select * from sort_test order by v10, v9, v8 limit 100;
 test4: select * from sort_test order by v10, v9, v8, v7 limit 100;
 test5: select * from sort_test order by v10, v9, v8, v7, v6 limit 100;

 result:test1test2test3test4test5
---
 heap_getattr  3.654s   5.549s   6.575s   7.367s   7.870s
 slot_getattr  4.027s   4.930s   5.249s   5.555s   5.756s

(case 3)
 test1: select * from sort_test order by v5 limit 100;
 test2: select * from sort_test order by v5, v6 limit 100;
 test3: select * from sort_test order by v5, v6, v7 limit 100;
 test4: select * from sort_test order by v5, v6, v7, v8 limit 100;
 test5: select * from sort_test order by v5, v6, v7, v8, v9 limit 100;

 result:test1test2test3test4test5
---
 heap_getattr  2.657s   4.207s   5.194s   6.179s  6.662s
 slot_getattr  3.126s   4.233s   4.806s   5.271s  5.557s

In most cases, heap_getattr is faster. slot_getattr is faster when the
following conditions hold:
 (1)The tuple has varlen attributes.
 (2)The sort key has more than two attributes.
 (3)The position of a sort key is far from the head of the tuple.
 (4)The sort-key data contains many repeated values.
Actually, it will be rare for all these conditions to hold.

Judging from these results, I think that we had better continue using
heap_getattr in comparetup_heap.

regards,

--- Atsushi Ogawa

make_test_data.sql
Description: Binary data



Re: [PATCHES] WIP: avoiding tuple construction/deconstruction overhead

2005-03-17 Thread a_ogawa

Tom Lane wrote:
 Attached is the current state of a patch to reduce the overhead of
 passing tuple data up through many levels of plan nodes.

It is a good idea.
I think that this patch improves the performance of the whole executor.

I have three comments.

(1)We can improve comparetup_heap() by using TupleTableSlot instead of
HeapTuple. Please see the attached patch.

(2)In ExecStoreTuple(), we can omit the initialization of slot->tts_nvalid.
If slot->tts_isempty is false, tts_nvalid is initialized by
ExecClearTuple(); if it is true, tts_nvalid is always zero.

(3)There is a description of slot->val in a comment in execTuples.c.
It had better be revised.

 Finally, I have made some progress towards making the tuple access
 routines consistently use "bool isNull" arrays as null markers, instead
 of the char 'n' or ' ' convention that was previously used in some but
 not all contexts.

I agree. I think that this change improves readability.

regards,

--- Atsushi Ogawa

compare_heap.patch
Description: Binary data



Re: [PATCHES] Cache last known per-tuple offsets to speed long tuple

2004-11-01 Thread a_ogawa

I have remade the patch for "Cache last known per-tuple offsets to speed
long tuple access" that is in the TODO list.

The points of this patch are as follows:
(1)Add fields to TupleTableSlot and TupleTableData.
These fields hold the tuple disassembly information.

(2)Add the code that initializes/cleans the new fields.
This code is added to execTuples.c.

(3)Add two functions to execQual.c.
One function is named slot_deformtuple and is just like
heap_deformtuple, except that it can resume from the previous execution
using the information encapsulated in the TupleTableSlot.

The other function is named slot_getattr. It is just like the
heap_getattr and fast_getattr macros, except that it is given a
TupleTableSlot, and it uses slot_deformtuple instead of nocachegetattr.

(4)ExecEvalVar uses the new function slot_getattr instead of
heap_getattr. (A sketch of slot_getattr follows this list.)
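
A minimal sketch of slot_getattr (the cache field names are assumptions
for illustration; the real functions are in the attached patch):

static Datum
slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
{
    /*
     * If the attribute is not yet in the slot's cache, extract attributes
     * up to attnum, resuming from where the last extraction stopped.
     */
    if (attnum > slot->tts_nvalid)
        slot_deformtuple(slot, attnum);

    *isnull = slot->tts_isnull[attnum - 1];
    return slot->tts_values[attnum - 1];
}

(System attributes and corner cases would still go through heap_getattr;
this shows only the cached path.)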

I executed the test below.
---
The table has 16,384 tuples and 200 columns. All data types are text.
The table name is aaa. The column names are t001...t200.
The executed SQL is:
select t100, t110, t120, t130, t140, t150,
       t160, t170, t180, t190, t200
from aaa;

The profile result of the original code is as follows.
---
Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls   s/call   s/call  name
 74.19  1.38 1.38   163846 0.00 0.00  nocachegetattr
  4.30  1.46 0.08   163840 0.00 0.00  FunctionCall3
  1.61  1.49 0.03   397750 0.00 0.00  AllocSetFreeIndex
  1.61  1.52 0.0316384 0.00 0.00  ExecTargetList
  1.08  1.54 0.02   344152 0.00 0.00  appendBinaryStringInfo

The profile result after applying the patch is as follows.
---
Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 30.38  0.24 0.24   163840 0.00 0.00  slot_deformtuple
 10.13  0.32 0.08   163840 0.00 0.00  FunctionCall3
  5.06  0.36 0.04   163840 0.00 0.00  slot_getattr
  5.06  0.40 0.0416384 0.00 0.00  heap_deformtuple
  3.80  0.43 0.0349159 0.00 0.00  ExecClearTuple

regards,

--- Atsushi Ogawa
[EMAIL PROTECTED]

ExecEvalVar.patch
Description: Binary data



Re: [PATCHES] Cache last known per-tuple offsets to speed long tuple

2004-10-31 Thread a_ogawa

Thank you for the advice.
I am going to remake the patch in order to make it simple.
The plan for the new patch is as follows.

(1)Add fields to TupleTableSlot and TupleTableData.
These fields hold the tuple disassembly information.

(2)Add the code that initializes/cleans the new fields.
This code is added to execTuples.c.

(3)Add two functions to execQual.c.
One function is just like heap_deformtuple. It is given a
TupleTableSlot, and it extracts the fields of the tuple incrementally,
using the new fields of the TupleTableSlot.

What "incrementally" means is shown by the following example.
Example: the tuple has 100 columns.
 - the first call, to get col5, will fill the first 5 positions in the
   array.
 - the next call, to get col75, will continue filling from 5 up to 75.
 - the next call, to get col60, will only refer to the array, because
   that attribute is already extracted.

The other function is just like heap_getattr and fast_getattr.
It is given a TupleTableSlot, and it uses the new function (like
heap_deformtuple) instead of nocachegetattr.

(4)ExecEvalVar uses the new function (like heap_getattr) instead of
heap_getattr.

With the new patch, only three files are due to be changed: tuptable.h,
execTuples.c, and execQual.c. (A sketch of the incremental extraction
follows.)
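
A minimal sketch of the incremental extraction (the field names on the
slot are assumptions; null-bitmap and alignment handling, which the real
code needs, are omitted here):

static void
slot_deformtuple(TupleTableSlot *slot, int natts)
{
    HeapTuple   tuple = slot->val;
    Form_pg_attribute *att = slot->ttc_tupleDescriptor->attrs;
    char       *tp = (char *) tuple->t_data + tuple->t_data->t_hoff;
    int         attnum = slot->tts_nvalid;  /* resume where we stopped */
    long        off = slot->tts_off;        /* cached byte offset */

    for (; attnum < natts; attnum++)
    {
        slot->tts_values[attnum] = fetchatt(att[attnum], tp + off);
        slot->tts_isnull[attnum] = false;
        off = att_addlength(off, att[attnum]->attlen, tp + off);
    }

    slot->tts_nvalid = attnum;
    slot->tts_off = off;
}

Because each call continues from tts_nvalid/tts_off, a later call for a
smaller attribute number costs nothing, which is what the col60 case in
the example above relies on.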

 BTW, why is it that your profile shows *more* calls to
 heap_deformtuple_incr after the patch than there were nocachegetattr
 calls before? 

Many of them are for printtup
(printtup -> heap_deformtuple -> heap_deformtuple_incr).
Since the code of heap_deformtuple and heap_deformtuple_incr is shared,
heap_deformtuple_incr shows more calls than nocachegetattr did.

If the calls of heap_deformtuple_incr made on behalf of printtup are
removed, heap_deformtuple_incr and nocachegetattr have the same number of
calls.

With my test accessing the columns in ascending order
(select t100, t110 ...), heap_deformtuple_incr and nocachegetattr have
the same number of calls.
If the columns are accessed in descending order (select t200, t190...),
the number of calls of heap_deformtuple_incr decreases sharply,
because the result is cached by the first call to get t200.

regards,

--- Atsushi Ogawa
[EMAIL PROTECTED]



[PATCHES] Cache last known per-tuple offsets to speed long tuple access

2004-10-30 Thread a_ogawa

I made a patch for "Cache last known per-tuple offsets to speed long
tuple access" that is in the TODO list.

This problem was discussed on the hackers list as "Terrible performance
on wide selects".
The point of this problem is nocachegetattr(), used from ExecEvalVar().
If a tuple has many columns, and it has varlen columns or null data,
the time spent in nocachegetattr() is O(N^2) in the number of fields.

I referred to the URL below for the implementation:
 http://archives.postgresql.org/pgsql-performance/2003-01/msg00262.php

The points of this patch are as follows (a sketch of the cache follows
this list):
(1)heap_deformtuple_incr() is added.
 This function can extract the attributes of a tuple incrementally.

(2)A cache which keeps the result of heap_deformtuple_incr()
 is added inside the TupleTableSlot.

(3)In ExecEvalVar(), heap_deformtuple_incr() is used in place of
 nocachegetattr(). This reduces the time from O(N^2) to O(N).
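
A minimal sketch of the cache inside TupleTableSlot (the field names are
assumptions for illustration, not necessarily those of the attached
patch):

typedef struct TupleTableSlot
{
    /* ...existing fields: val, ttc_tupleDescriptor, and so on... */

    /* cache of the last known per-tuple offsets */
    int         tts_nvalid;     /* number of attributes extracted so far */
    Datum      *tts_values;     /* extracted attribute values */
    bool       *tts_isnull;     /* extracted null flags */
    long        tts_off;        /* byte offset where extraction stopped */
} TupleTableSlot;

Each call of heap_deformtuple_incr() continues from tts_nvalid/tts_off
instead of rescanning the tuple from the beginning; fetching N attributes
therefore walks the tuple once, O(N), instead of O(N^2).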

In order to measure the effect, I executed the test below.
---
The table has 15,000 tuples and 200 columns. All data types are text.
The table name is aaa. The column names are t001...t200.
The executed SQL is:
select t100, t110, t120, t130, t140, t150,
       t160, t170, t180, t190, t200
from aaa;

The profile result of the original code is as follows.
---
Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls   s/call   s/call  name
 70.05  1.31 1.31   163846 0.00 0.00  nocachegetattr
  8.02  1.46 0.15   163840 0.00 0.00  FunctionCall3
  1.87  1.50 0.04   397763 0.00 0.00  AllocSetFreeIndex
  1.60  1.52 0.03   163840 0.00 0.00  ExecEvalVar
  1.34  1.55 0.03   200414 0.00 0.00  AllocSetAlloc

The profile result after applying the patch is as follows.
---
Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 39.73  0.29 0.29   180224 0.00 0.00  heap_deformtuple_incr
  9.59  0.36 0.07   163840 0.00 0.00  FunctionCall3
  6.85  0.41 0.0516384 0.00 0.02  ExecTargetList
  5.48  0.45 0.0423477 0.00 0.00  hash_any
  4.11  0.48 0.03   200414 0.00 0.00  AllocSetAlloc

Regards,

--- Atsushi Ogawa ([EMAIL PROTECTED])

deformtuple_cache.patch
Description: Binary data
