Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-09-11 Thread Mitsumasa KONDO
2014-09-11 22:01 GMT+09:00 k...@rice.edu k...@rice.edu:

 On Thu, Sep 11, 2014 at 09:37:07AM -0300, Arthur Silva wrote:
  I agree that there's no reason to fix an algorithm to it, unless maybe
 it's
  pglz.

Yes, it seems difficult to judge by algorithm performance alone.
We also have to consider source-code maintenance, quality, and other
factors.



 The big (huge) win for lz4 (not the HC variant) is the enormous compression
 and decompression speed. It compresses quite a bit faster (33%) than snappy
 and decompresses twice as fast as snappy.

Please show us the evidence. Other Postgres members posted their test results
together with their analysis.
That makes for an objective comparison.
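For reference, a benchmark along these lines would make the comparison
concrete. This is a minimal sketch of mine against the real lz4 API (block
size, input data, and iteration count are arbitrary assumptions); a fair
comparison would run the same harness against snappy:

#include <lz4.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    enum { SRC_SIZE = 8192 };          /* roughly one full-page write */
    char src[SRC_SIZE];
    char dst[LZ4_COMPRESSBOUND(SRC_SIZE)];
    char back[SRC_SIZE];
    int  csize = 0;

    for (int i = 0; i < SRC_SIZE; i++) /* compressible pseudo-data */
        src[i] = (char) (i % 64);

    clock_t t0 = clock();
    for (int i = 0; i < 100000; i++)
        csize = LZ4_compress_default(src, dst, SRC_SIZE, sizeof(dst));
    clock_t t1 = clock();
    for (int i = 0; i < 100000; i++)
        LZ4_decompress_safe(dst, back, csize, SRC_SIZE);
    clock_t t2 = clock();

    printf("compressed %d -> %d bytes\n", SRC_SIZE, csize);
    printf("compress:   %.3f s\n", (double) (t1 - t0) / CLOCKS_PER_SEC);
    printf("decompress: %.3f s\n", (double) (t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}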

Best Regards,
--
Mitsumasa KONDO


Re: [HACKERS] add modulo (%) operator to pgbench

2014-09-11 Thread Mitsumasa KONDO
2014-09-11 15:47 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:


 Hello Robert,

  I am not objecting to the functionality; I'm objecting to bolting on
 ad-hoc operators one at a time.  I think an expression syntax would
 let us do this in a much more scalable way.  If I had time, I'd go do
 that, but I don't.  We could add abs(x) and hash(x) and it would all
 be grand.


 Ok. I do agree that an expression syntax would be great!

Yes, that's not bad.

However, do we really need several modulo calculation methods? I don't think
so. I think the most intuitive modulo method is floor modulo, because for a
positive divisor its result is always non-negative, even with a negative
dividend, so the result is easier to predict than with the other methods.
That strong point is good for benchmark-script users.

But I don't have any strong opinion about this patch; not blocking :)

Best Regards
--
Mitsumasa KONDO


Re: [HACKERS] pgbench throttling latency limit

2014-09-10 Thread Mitsumasa KONDO
Hi,

I found a typo in your patch. Please confirm.

@line 239
- agg->sum2_lag = 0;
+ agg->sum_lag = 0;

And a back-patch would be welcome.

Best Regards,
--
Mitsumasa KONDO


Re: [HACKERS] add modulo (%) operator to pgbench

2014-09-08 Thread Mitsumasa KONDO
Hi,

Here is the review result.

#1. Patch compatibility
It applies to the latest master with a small hunk offset.

#2. Functionality
No problem.

#3. Documentation
I think the modulo operator's explanation should be placed last in the doc,
because the other operators are used more frequently.

#4. Algorithm
You proposed three modulo algorithms:
1. general (C-style truncated) modulo, 2. floor modulo, and 3. Euclidean modulo.

They calculate different values when the divisor or the remainder is negative.
Example calculations are below; a C sketch of the three variants follows the
examples.

1. general (C-style) modulo (patch1)
   5 %  3 =  2
   5 % -3 =  2
  -5 %  3 = -2

2. floor modulo (patch2, 3)
   5 %  3 =  2
   5 % -3 = -1
  -5 %  3 =  1

3. euclidean modulo (patch2)
   5 %  3 =  2
   5 % -3 =  2
  -5 %  3 =  1

That's all.
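For clarity, here is a minimal C sketch of the three variants (the function
names are mine, not the patches'):

int mod_trunc(int a, int b)     /* 1. C-style: sign follows the dividend */
{
    return a % b;               /* C99 defines % via truncating division */
}

int mod_floor(int a, int b)     /* 2. Knuth/floor: sign follows the divisor */
{
    int r = a % b;
    return (r != 0 && (r < 0) != (b < 0)) ? r + b : r;
}

int mod_euclid(int a, int b)    /* 3. Euclidean: result is never negative */
{
    int r = a % b;
    return (r < 0) ? r + (b < 0 ? -b : b) : r;
}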

If we want a random generator with equal probabilities and intuitive
behaviour, we should select floor modulo, as the examples above show: for a
positive divisor its result is always in [0, divisor), and negating both
operands simply mirrors the result. Methods 1 and 3 do not give both
properties.

I don't think many people need Euclidean modulo. If we add it, many people
will be confused, because they don't know that mathematical definition.

So I like patch 3, which is simple and practical.

If you agree with or reply to my comments, I will mark it ready for committer.

Best Regards,
--
Mitsumasa KONDO


Re: [HACKERS] add modulo (%) operator to pgbench

2014-09-08 Thread Mitsumasa KONDO
Hi Fabien-san,

Thank you for your fast work!

2014-09-08 23:08 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:


 Hello Mitsumasa-san,

  #3. Documentation
 I think the modulo operator's explanation should be placed last in the doc,
 because the other operators are used more frequently.


  So I like patch 3, which is simple and practical.


 Ok.

  If you agree with or reply to my comments, I will mark it ready for committer.


 Please find attached v4, which is v3 plus an improved documentation
 which is clearer about the sign of the remainder.


The attached patch seems to have no problem, but I'd like to comment on the
order of explanation of the five formulas.

A fixed version is attached. Please confirm it, and I will mark the patch
ready for committer.

Best regards,
--
Mitsumasa KONDO


pgbench-modulo-4-1.patch
Description: Binary data



Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-30 Thread Mitsumasa KONDO
Hi,

2014-08-31 8:10 GMT+09:00 Andres Freund and...@2ndquadrant.com:

 On 2014-08-31 01:50:48 +0300, Heikki Linnakangas wrote:

 If we're going to fsync between each file, there's no need to sort all the
  buffers at once. It's enough to pick one file as the target - like in my
  crude patch - and sort only the buffers for that file. Then fsync that
 file
  and move on to the next file. That requires scanning the buffers multiple
  times, but I think that's OK.

 I really can't see that working out. Production instances of postgres
 with large shared_buffers settings (say 96GB in one case) have tens of
 thousands of relations (~34500 in the same case). And that's a database
 with a relatively simple schema. I've seen much worse.

Yeah, that seems impossible in a single checkpointer process. The total
buffer-scan cost is higher than we expect. We would need a clever algorithm
for an efficient, distributed buffer scan using multiple processes or
threads; a sketch of the per-file approach under discussion is below.
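For illustration, a rough sketch of the per-file approach Heikki describes,
using hypothetical types and callbacks (DirtyBuf and flush_buffer are not
real PostgreSQL symbols); the repeated scan over all dirty buffers is exactly
the cost Andres worries about:

#include <stdlib.h>
#include <unistd.h>

typedef struct DirtyBuf
{
    int     file_id;    /* which relation segment the buffer belongs to */
    long    blockno;    /* block number within that file */
} DirtyBuf;

static int cmp_blockno(const void *a, const void *b)
{
    long la = ((const DirtyBuf *) a)->blockno;
    long lb = ((const DirtyBuf *) b)->blockno;
    return (la > lb) - (la < lb);
}

/* For one target file: collect its dirty buffers, sort, write, fsync. */
static void checkpoint_one_file(int fd, int file_id,
                                DirtyBuf *all, int nall,
                                void (*flush_buffer)(int fd, DirtyBuf *))
{
    DirtyBuf *mine = malloc(sizeof(DirtyBuf) * nall);
    int       n = 0;

    for (int i = 0; i < nall; i++)       /* one extra scan per file */
        if (all[i].file_id == file_id)
            mine[n++] = all[i];

    qsort(mine, n, sizeof(DirtyBuf), cmp_blockno);  /* sequential writes */

    for (int i = 0; i < n; i++)
        flush_buffer(fd, &mine[i]);

    fsync(fd);                           /* durable before the next file */
    free(mine);
}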

Regards,
--
Mitsumasa KONDO


Re: [HACKERS] posix_fadvise() and pg_receivexlog

2014-08-07 Thread Mitsumasa KONDO
Hi,

2014-08-07 13:47 GMT+09:00 Fujii Masao masao.fu...@gmail.com:

 On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  On 08/06/2014 08:39 PM, Fujii Masao wrote:
  The WAL files that pg_receivexlog writes will not be re-read soon
  basically,
  so we can advise the OS to release any cached pages when WAL file is
  closed. I feel inclined to change pg_receivexlog that way. Thought?
 
 
  -1. The OS should be smart enough to not thrash the cache by files that
 are
  written sequentially and never read.

The OS's buffer strategy is optimized for the general case. Have you
forgotten the discussion with the OS hackers half a year ago?


 Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe
 not,
 so I was thinking that posix_fadvise is called when the server closes WAL
 file.

That's right.


  If we go down this path, we'd need to
  sprinkle posix_fadvises into many, many places.

Why aim for perfection from the beginning?
That is how Postgres has always evolved, so I don't think this concern makes
sense.


  Anyway, who are we to say that they won't be re-read soon? You might e.g
  have a secondary backup site where you copy the files received by
  pg_receivexlog, as soon as they're completed.

 So whether posix_fadvise is called or not needs to be exposed as an
 user-configurable option. We would need to measure how useful exposing
 that is, though.

By the way, does the pg_receivexlog process call fsync() on every WAL commit?
If yes, I think we need a no-fsync (or reduced-fsync) option for better
performance; that is common in NoSQL storage.
If no, we need an fsync() option to gain reliability and data integrity.
A minimal sketch of the fadvise-on-close idea is below.
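For reference, a minimal sketch of the idea (simplified fd handling; this is
not the actual pg_receivexlog code): drop the cached pages when a finished
WAL segment is closed.

#include <fcntl.h>
#include <unistd.h>

static void close_walfile(int fd)
{
    fsync(fd);                       /* make the segment durable first */
#ifdef POSIX_FADV_DONTNEED
    /* advise the kernel that we will not re-read this segment soon */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
    close(fd);
}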


Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] add modulo (%) operator to pgbench

2014-08-06 Thread Mitsumasa KONDO
2014-08-06 23:38 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:


  Three different modulo operators seems like a lot for a language that
 doesn't even have a real expression syntax, but I'll yield to whatever
 the consensus is on this one.


 Here is a third simpler patch which only implements the Knuth's modulo,
 where the remainder has the same sign as the divisor.

 I would prefer this version 3 (one simple modulo based on Knuth
 definition) or if there is a problem version 2 (all 3 modulos). Version 1
  which provides a modulo compatible with C & SQL is really useless to me.

I like version 3; it is simple and practical, and it's enough for pgbench.
If someone wants a different modulo implementation, they can just change
their own source code.

Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] gaussian distribution pgbench -- splits v4

2014-08-01 Thread Mitsumasa KONDO
Hi,

2014-08-01 16:26 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr


  Maybe somebody who knows more math than I do (like you, probably!) can
 come up with something more clever.


 I can certainly suggest other formula, but that does not mean beautiful
 code, thus would probably be rejected. I'll see.

 An alternative to this whole process may be to hash/modulo a non uniform
 random value.

   id = 1 + hash(some-random()) % n

 But the hashing changes the distribution as it adds collisions, so I have
 to think about how to be able to control the distribution in that case, and
 what hash function to use.

I think we have to choose a reproducible method, because a benchmark always
needs robust and reproducible results. And if we implement this idea, we
might need a more accurate random generator, such as the Mersenne Twister;
the erand48 algorithm is slow and not very accurate. A sketch of the
hash/modulo idea is below.
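For illustration, a sketch of the "id = 1 + hash(some-random()) % n" idea.
The mixer below is the splitmix64 finalizer, chosen by me only as an example
of a cheap, reproducible hash (the thread does not settle on one). Note that
the mix itself is a bijection on 64-bit values, but the final "% n"
introduces collisions, which is exactly Fabien's concern about the
distribution changing:

#include <stdint.h>

static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Scatter a non-uniform key over [1, n] while keeping it reproducible. */
static int64_t scattered_id(int64_t nonuniform_key, int64_t n)
{
    return 1 + (int64_t) (mix64((uint64_t) nonuniform_key) % (uint64_t) n);
}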

By the way, I'm not sure how this topic relates to the command-line
option... well, whatever...

Regards,
--
Mitsumasa KONDO


Re: [HACKERS] gaussian distribution pgbench -- splits v4

2014-07-30 Thread Mitsumasa KONDO
Hi,

2014-07-31 5:18 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:

  I've committed the changes to pgbench.c and the documentation changes
 with some further wordsmithing.


 Ok, thanks a lot for your reviews and your help with improving the
 documentation.

Yeah, thanks to all the members involved.


  I don't think including the other changes in patch A is a good idea,


 Fine. It was mostly for testing and checking purposes.

Hmm... it does no harm to the pgbench source code, and in general a checking
script is useful for avoiding bugs.

 nor am I in favor of patch B.


 Yep.

No, patch B is still needed. Please tell me the reason; I don't like
decisions made on someone's feelings, and they need a logical rationale. Our
documentation is better than before, and I think it makes the decile
probabilities easy to understand.
This part of the discussion needs to continue...

Would providing these as additional contrib files be more acceptable?
 Something like tpc-b-gauss.sql... Otherwise there is no example available
 to show the feature.

I agree with including the test script and the command-line options. They do
no harm, and they are useful.

Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] gaussian distribution pgbench -- splits Bv6

2014-07-25 Thread Mitsumasa KONDO
Thanks for modifying the patch! I confirmed it, and it seems fine.

I think our latest patch addresses all the community comments,
so it is really ready for committer now.

Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] gaussian distribution pgbench -- splits v4

2014-07-24 Thread Mitsumasa KONDO
Hi,

Thank you for your great documentation and fix-up work!!!
It is very helpful for understanding our feature.

I added two features in gauss_B_4.patch.

1) Add gaussianProbability() function
It is analogous to exponentialProbability(), and the behaviour is the same
as before.

2) Add result of max/min percent of the range
It is almost the same as the --exponential option's output; however, the max
percent of the range is at the center of the distribution, and the min
percent of the range is at the far edges of the distribution.
Here is an example of the output:

+ pgbench_account's aid selected with a truncated gaussian distribution

+ standard deviation threshold: 5.0

+ decile percents: 0.0% 0.1% 2.1% 13.6% 34.1% 34.1% 13.6% 2.1% 0.1% 0.0%

+ probability of max/min percent of the range: 4.0% 0.0%


And I added an explanation of this to the documentation.
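For reference, the decile percents of the truncated gaussian can be
reproduced analytically from the normal CDF. This is my own sketch, not the
patch's code (which counts drawn values instead):

#include <math.h>
#include <stdio.h>

static double Phi(double x)              /* standard normal CDF */
{
    return 0.5 * (1.0 + erf(x / sqrt(2.0)));
}

int main(void)
{
    double t = 5.0;                      /* --gaussian=5 */
    double total = Phi(t) - Phi(-t);     /* mass inside [-t, t] */

    printf("decile percents:");
    for (int i = 0; i < 10; i++)
    {
        double a = -t + i * t / 5.0;     /* ten bins of width t/5 */
        double b = a + t / 5.0;
        printf(" %.1f%%", 100.0 * (Phi(b) - Phi(a)) / total);
    }
    printf("\n");
    return 0;
}

With t = 5.0 this prints the same 0.0% 0.1% 2.1% 13.6% 34.1% ... sequence as
the output above.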

I really appreciate your work!!!


Best regards,

--

Mitsumasa KONDO


gauss_B_5.patch
Description: Binary data



Re: [HACKERS] gaussian distribution pgbench

2014-07-18 Thread Mitsumasa KONDO
2014-07-18 5:13 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:


  However, ISTM that it is not the purpose of pgbench documentation to be a
 primer about what is an exponential or gaussian distribution, so the idea
 would yet be to have a relatively compact explanation, and that the
 interested but clueless reader would document h..self from wikipedia or a
 text book or a friend or a math teacher (who could be a friend as
 well:-).


 Well, I think it's a balance.  I agree that the pgbench documentation
 shouldn't try to substitute for a text book or a math teacher, but I
 also think that you shouldn't necessarily need to refer to a text book
 or a math teacher in order to figure out how to use pgbench.  Saying
 it's complicated, so we don't have to explain it would be a cop out;
 we need to *make* it simple.  And if there's no way to do that, then
 IMHO we should reject the patch in favor of some future patch that
 implements something that will be easy for users to understand.

   [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
 starting vacuum...end.
 transaction type: Exponential distribution TPC-B (sort of)
 scaling factor: 1
 exponential threshold: 10.0

 decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
 highest/lowest percent of the range: 9.5% 0.0%


 I don't have a clue what that means.  None.


 Maybe we could add in front of the decile/percent

 distribution of increasing account key values selected by pgbench:


 I still wouldn't know what that meant.  And it misses the point
 anyway: if the documentation is good, this will be unnecessary.  If
 the documentation is bad, a printout that tries to illustrate it by
 example is not an acceptable substitute.


 The decile description is quite classic when discussing statistics.

Yeah; maybe Fabien-san and I find it hard to believe that he doesn't know
decile percentages.
However, I agree that more description of deciles is needed.

For example, when we set the number of transactions to 10,000 (-t 10000),
the range of aid to 100,000,
and --exponential to 10, the decile percents are as follows:

decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

They mean that:
#number of accesses per aid range (from the decile percents):
  1 to 10,000       = 6,320 times
  10,001 to 20,000  = 2,330 times
  20,001 to 30,000  =   860 times
  ...
  90,001 to 100,000 =     0 times

#number of accesses per aid range (from the highest/lowest percent of the
range):
  1 to 1,000        =   950 times
  ...
  99,001 to 100,000 =     0 times

that's all.

This information makes the distribution of access probability easy to
understand, doesn't it?
Maybe Fabien-san and I have a mathematics background, so we consider decile
percentages common sense. But if they aren't common sense, I agree with
adding an explanation of them to the documents.

Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] gaussian distribution pgbench

2014-07-13 Thread Mitsumasa KONDO
Hi,

2014-07-04 19:05 GMT+09:00 Andres Freund and...@2ndquadrant.com:

 On 2014-07-04 11:59:23 +0200, Fabien COELHO wrote:
 
  Yea. I certainly disagree with the patch in it's current state because
 it
  copies the same 15 lines several times with a two word difference.
  Independent of whether we want those options, I don't think that's going
  to fly.
 
  I liked a simple static string for the different variants, which means
  replication. Factorizing out the (large) common part will mean malloc &
  sprintf. Well, why not.

 It sucks from a maintenance POV. And I don't see the overhead of malloc
 being relevant here...

  OTOH, we've almost reached the consensus that supporting gaussian
  and exponential options in \setrandom. So I think that you should
  separate those two features into two patches, and we should apply
  the \setrandom one first. Then we can discuss whether the other patch
  should be applied or not.
 
  Sounds like a good plan.
 
  Sigh. I'll do that as it seems to be a blocker...

I still agree with Fabien-san. I cannot understand why our logical proposal
isn't accepted...

I think we also need documentation about the actual mathematical
 behaviour of the randomness generators.
  + para
  +  With the gaussian option, the larger the
 replaceablethreshold/,
  +  the more frequently values close to the middle of the interval
 are drawn,
  +  and the less frequently values close to the replaceablemin/
 and
  +  replaceablemax/ bounds.
  +  In other worlds, the larger the replaceablethreshold/,
  +  the narrower the access range around the middle.
  +  the smaller the threshold, the smoother the access pattern
  +  distribution. The minimum threshold is 2.0 for performance.
  + /para

 The only way to actually understand the distribution here is to create a
 table, insert random values, and then look at the result. That's not a
 good thing.

That's right. Therefore, we created the command-line option to make the
parametrized gaussian distribution easy to understand. When you want to know
what a distribution parameter does, you can use the command-line option as
follows:

 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 10.0
decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

[nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 5.0
decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
highest/lowest percent of the range: 4.9% 0.0%

If you have a better method than ours, please share it with us.


  The caveat that I have is that without these options there is:
 
  (1) no return about the actual distributions in the final summary, which
  depend on the threshold value, and
 
  (2) no included mean to test the feature, so the first patch is less
  meaningful if the feature cannot be used simply and require a custom
 script.

 I personally agree that we likely want that as an additional
 feature. Even if just because it makes the results easier to compare.

If we can have a positive and logical discussion, I will agree with the
proposal to separate the patches.
However, I think the most opposed hacker decided based on his feelings...
Actually, he didn't answer our proposal for understanding the parametrized
distribution...
So I also think it is a blocker, and the command-line feature is also needed.
Besides, is there another good method? Please share it with us.

Best regards,
--
Mitsumasa KONDO


Re: [HACKERS] gaussian distribution pgbench

2014-06-17 Thread Mitsumasa KONDO
Hello Fabien-san,

I have checked your v13 patch and tested the new exponential-distribution
generation algorithm. It works fine, with little or no overhead compared to
the previous version.
Great work! And I agree with your proposal.

I'm also interested in your decile percents output, like the following:

 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=20
 ~
 decile percents: 86.5% 11.7% 1.6% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
 ~
 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
 ~
 decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
 ~
 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
 ~
 decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
 ~

It makes the exponential distribution easy to understand when I check the
exponential parameter, and I agree with it. So I created the same decile
percents output for the gaussian distribution.
Here are the examples.

 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=20
 ~
 decile percents: 0.0% 0.0% 0.0% 0.0% 50.0% 50.0% 0.0% 0.0% 0.0% 0.0%
 ~
 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=10
 ~
 decile percents: 0.0% 0.0% 0.0% 2.3% 47.7% 47.7% 2.3% 0.0% 0.0% 0.0%
 ~
 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=5
 ~
 decile percents: 0.0% 0.1% 2.1% 13.6% 34.1% 34.1% 13.6% 2.1% 0.1% 0.0%

I think it is easier to read than before; the decile percents sum to exactly 100%.


However, I don't like the highest/lowest percentage output, because users
will confuse it with the decile percentages, and nobody can understand those
digits at a glance.

Here is an example with exponential=5:
 [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
 ~
 decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
 highest/lowest percent of the range: 4.9% 0.0%
 ~

I could not understand the 4.9% and 0.0% the first time I saw them.
Only after checking the source code did I understand them :( That's not good
design... (Why does this parameter use 100?)
So I'd like to remove it, if you agree. That would be simpler.

The attached patch is the fixed version; please confirm it.
(Of course, the World Cup is on now, so I'm in no hurry at all.)

Best regards,
-- 
Mitsumasa KONDO
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
***
*** 41,46 
--- 41,47 
  #include <math.h>
  #include <signal.h>
  #include <sys/time.h>
+ #include <assert.h>
  #ifdef HAVE_SYS_SELECT_H
  #include <sys/select.h>
  #endif
***
*** 98,103  static int	pthread_join(pthread_t th, void **thread_return);
--- 99,106 
  #define LOG_STEP_SECONDS	5	/* seconds between log messages */
  #define DEFAULT_NXACTS	10		/* default nxacts */
  
+ #define MIN_GAUSSIAN_THRESHOLD		2.0	/* minimum threshold for gauss */
+ 
  int			nxacts = 0;			/* number of transactions per client */
  int			duration = 0;		/* duration in seconds */
  
***
*** 171,176  bool		is_connect;			/* establish connection for each transaction */
--- 174,187 
  bool		is_latencies;		/* report per-command latencies */
  int			main_pid;			/* main process id used in log filename */
  
+ /* gaussian distribution tests: */
+ double		stdev_threshold;   /* standard deviation threshold */
+ bool		use_gaussian = false;
+ 
+ /* exponential distribution tests: */
+ double		exp_threshold;   /* threshold for exponential */
+ bool		use_exponential = false;
+ 
  char	   *pghost = "";
  char	   *pgport = "";
  char	   *login = NULL;
***
*** 332,337  static char *select_only = {
--- 343,430 
  	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
  };
  
+ /* --exponential case */
+ static char *exponential_tpc_b = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts exponential :exp_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
+ 	"UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+ 
+ /* --exponential with -N case */
+ static char *exponential_simple_update = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts exponential :exp_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT

Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-17 Thread Mitsumasa KONDO
2014-03-17 21:12 GMT+09:00 Fujii Masao masao.fu...@gmail.com:

 On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas robertmh...@gmail.com
 wrote:
  On Sun, Mar 16, 2014 at 6:23 AM, MauMau maumau...@gmail.com wrote:
  The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
  Windows) as an example for archive_command.  However, cp/copy does not
 sync
  the copied data to disk.  As a result, the completed WAL segments would
 be
  lost in the following sequence:
 
  1. A WAL segment fills up.
 
  2. The archiver process archives the just filled WAL segment using
  archive_command.  That is, cp/copy reads the WAL segment file from
 pg_xlog/
  and writes to the archive area.  At this point, the WAL file is not
  persisted to the archive area yet, because cp/copy doesn't sync the
 writes.
 
  3. The checkpoint processing removes the WAL segment file from pg_xlog/.
 
  4. The OS crashes.  The filled WAL segment doesn't exist anywhere any
 more.
 
  Considering the reliable image of PostgreSQL and widespread use in
  enterprise systems, I think something should be done.  Could you give me
  your opinions on the right direction?  Although the doc certainly
 escapes by
  saying (This is an example, not a recommendation, and might not work
 on all
  platforms.), it seems from pgsql-xxx MLs that many people are following
  this example.
 
  * Improve the example in the documentation.
  But what command can we use to reliably sync just one file?
 
  * Provide some command, say pg_copy, which copies a file synchronously
 by
  using fsync(), and describes in the doc something like "for simple use
  cases, you can use pg_copy as the standard reliable copy command."
 
  +1.  This won't obviate the need for tools to manage replication, but
  it would make it possible to get the simplest case right without
  guessing.

 +1, too.

 And, what about making pg_copy call posix_fadvise(DONT_NEED) against the
 archived file after the copy? Also It might be good idea to support the
 direct
 copy of the file to avoid wasting the file cache.

You could use direct_cp:
http://directcp.sourceforge.net/direct_cp.html
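For illustration, a minimal sketch of what a pg_copy-style synchronous copy
could look like (copy_synced is a name of mine, error handling abbreviated;
pg_copy itself does not exist yet). Note that fsyncing the file alone is not
enough: the new directory entry must be made durable too.

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

static int copy_synced(const char *srcpath, const char *dstpath,
                       const char *dstdir)
{
    char    buf[65536];
    ssize_t n;
    int     src = open(srcpath, O_RDONLY);
    int     dst = open(dstpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
    int     dir = open(dstdir, O_RDONLY);

    if (src < 0 || dst < 0 || dir < 0)
        return -1;

    while ((n = read(src, buf, sizeof(buf))) > 0)
        if (write(dst, buf, n) != n)
            return -1;
    if (n < 0)
        return -1;

    if (fsync(dst) != 0)            /* the data reaches stable storage */
        return -1;
    if (fsync(dir) != 0)            /* and so does the directory entry */
        return -1;

    close(src); close(dst); close(dir);
    return 0;
}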

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] gaussian distribution pgbench

2014-03-15 Thread Mitsumasa KONDO
Oh, sorry, I forgot to include the URLs that the picture refers to.

http://en.wikipedia.org/wiki/Normal_distribution
http://en.wikipedia.org/wiki/Exponential_distribution

regards,
--
Mitsumasa KONDO


2014-03-15 17:50 GMT+09:00 Mitsumasa KONDO kondo.mitsum...@gmail.com:

 Hi

 2014-03-15 15:53 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:


 Hello Heikki,


  A couple of comments:

 * There should be an explicit \setrandom ... uniform option too, even
 though you get that implicitly if you don't specify the distribution


 Indeed. I agree. I suggested it, but it got lost.

 OK. If we keep to the SQL grammar, you are right. I will add it.


  * What exactly does the threshold mean? The docs informally explain
 that the larger the thresold, the more frequent values close to the middle
 of the interval are drawn, but that's pretty vague.


 There are explanations and computations as comments in the code. If it is
 about the documentation, I'm not sure that a very precise mathematical
 definition will help a lot of people, and might rather hinder
 understanding, so the doc focuses on an intuitive explanation instead.

 Yeah, I think we had better explain only the information necessary to use
 this feature. If we add mathematical theory to the docs, it will be too
 difficult for users, and it's a waste.


  * Does min and max really make sense for gaussian and exponential
 distributions? For gaussian, I would expect mean and standard deviation as
 the parameters, not min/max/threshold.


 Yes... and no:-) The aim is to draw an integer primary key from a table,
 so it must be in a specified range. This is approximated by drawing a
 double value with the expected distribution (gaussian or exponential) and
 project it carefully onto integers. If it is out of range, there is a loop
 and another value is drawn. The minimal threshold constraint (2.0) ensures
 that the probability of looping is low.

 I think this is difficult to understand from our text alone... so I created
 a picture that should help; please see it. A sketch of the draw-and-project
 loop described above is below.
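For reference, a minimal sketch of the rejection loop: draw a gaussian
double, retry while it falls outside [-t, t], then project it onto
[min, max]. gauss_rand() is a stand-in Box-Muller generator of mine, not the
pg_erand48-based code pgbench actually uses.

#include <math.h>
#include <stdlib.h>

static double gauss_rand(void)           /* Box-Muller, for illustration */
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

static long gaussian_in_range(long min, long max, double t)
{
    double z;
    do
        z = gauss_rand();                /* the loop is rare when t >= 2.0 */
    while (z <= -t || z >= t);
    /* project (-t, t) onto the integer key range */
    return min + (long) ((max - min + 1) * (z + t) / (2.0 * t));
}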



  * How about setting the variable as a float instead of integer? Would
 seem more natural to me. At least as an option.


 Which variable? The values set by setrandom are mostly used for primary
 keys. We really want integers in a range.

 I think he meant the threshold parameter. The threshold is a very sensitive
 parameter, so we need it to be a double. I think you will agree once you see
 the attached picture.

 regards,
 --
 Mitsumasa KONDO
 NTT Open Source Software Center



Re: [HACKERS] gaussian distribution pgbench

2014-03-15 Thread Mitsumasa KONDO
2014-03-15 19:04 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:


 Nice drawing!


   * How about setting the variable as a float instead of integer? Would

 seem more natural to me. At least as an option.


 Which variable? The values set by setrandom are mostly used for primary
 keys. We really want integers in a range.


 I think he meant the threshold parameter. The threshold is a very sensitive
 parameter, so we need it to be a double. I think you will agree once you see
 the attached picture.

 Oh, sorry... that was addressed to Heikki, not to you...


 I'm sure that the threshold must be a double, but I thought it was already
 the case, because of atof, the static variables which are declared double,
 and the threshold function parameters which are declared double as well,
 and the putVariable uses a %lf format...

I think it's correct. When we read a double argument with scanf(), we can
use the %lf format.


 Possibly I'm missing something?

Sorry. I think nothing is missing.

regards,
--
Mitsumasa KONDO


Re: [HACKERS] gaussian distribution pgbench

2014-02-14 Thread Mitsumasa KONDO
I added an exponential-distribution random generator (and a little
refactoring :) ).
I use the inverse transform method to create the distribution; it's a very
simple method based on -log(rand()). We can control the slope of the
distribution with a threshold parameter, in the same way as the gaussian
threshold.

usage example
  pgbench --exponential=NUM -S

The attached graph was created with exponential threshold = 5; you can see
the exponential distribution in it. The feature supports the -S and -N
options as well as custom scripts, so if we put
\setexponential [var] [min] [max] [threshold] in a transaction pattern file,
the distribution we want appears. A sketch of the inverse transform is below.
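For reference, a minimal sketch of the inverse-transform mapping as I
understand it (the rand()-based uniform source is illustrative, not the
patch's code):

#include <math.h>
#include <stdlib.h>

static long exponential_in_range(long min, long max, double threshold)
{
    /* uniform in [0, 1) */
    double u = (double) rand() / ((double) RAND_MAX + 1.0);

    /* inverse CDF of the exponential truncated at `threshold` */
    double x = -log(1.0 - u * (1.0 - exp(-threshold))) / threshold;

    return min + (long) ((max - min + 1) * x);   /* x stays in [0, 1) */
}

The density is proportional to exp(-threshold * x), so values near `min` are
drawn most often, and a larger threshold gives a steeper slope.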

We don't have much time to polish it... but I think most parts of the patch
are complete.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


gaussian_and_exponential_pgbench_v6.patch
Description: Binary data
attachment: exponential=5.png

gnuplot.sh
Description: Bourne shell script



Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-31 Thread Mitsumasa KONDO
2014-01-31 Peter Geoghegan p...@heroku.com

 On Thu, Jan 30, 2014 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  In reality, actual applications
  could hardly be further from the perfectly uniform distribution of
  distinct queries presented here.
 
  Yeah, I made the same point in different words.  I think any realistic
  comparison of this code to what we had before needs to measure a workload
  with a more plausible query frequency distribution.

 Even though that distribution just doesn't square with anybody's
 reality, you can still increase the pg_stat_statements.max setting to
 10k and the problem goes away at little cost (a lower setting is
 better, but a setting high enough to cache everything is best). But
 you're not going to have terribly much use for pg_stat_statements
  anyway; if you really do experience churn at that rate with 5,000
 possible entries, the module is ipso facto useless, and should be
 disabled.

I ran an extra test of your patch and mine with pg_stat_statements.max = 10k,
with the other settings and servers unchanged. Both are faster than the past
results.

method    |  try1  |  try2  |  try3
----------+--------+--------+-------
peter 3   |  6.769 |  6.784 |  6.785
method 5  |  6.770 |  6.774 |  6.761


I think the most significant overhead in pg_stat_statements is the
delete-and-insert cost of hash-table updates, not the LWLocks. If LWLocks
were the main overhead, we would see it with -S pgbench, because that has a
single SELECT pattern, which is the worst lock-conflict case; but we see no
such result.
I'm not sure about dynahash.c, but we can see how hash conflicts are handled
in that code. IMHO it may be heavy because it has to walk the collision list
and compare entries until it finds a non-conflicting one.

And the past results show that your patch's weakest point is that evicting
the oldest statement and inserting a new one is very expensive, as you know;
it amplifies the update (delete and insert) cost in the pg_stat_statements
table. So you proposed 10k as the new default max value. But that is not an
essential solution, because the old pg_stat_statements also performs well
there. And when we set max = 10k with your patch but want only the 1,000
most-used queries from pg_stat_statements, we have to run an ORDER BY query
with LIMIT 1000; the sort cost is relatively high, so the monitoring query
becomes slow and expensive. With the old code we can simply set
pg_stat_statements.max = 1000, and performance is not bad; that is the best
setting for getting information on the 1,000 most-used queries.


That's all my assumption.

Sorry for running only a few extra tests; I had no time in my office today.
If we want, I can run a 1/N-distribution pgbench test next week; I'll modify
my perl script a little to create multiple SQL files with various sleep
times.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] pg_basebackup and pg_stat_tmp directory

2014-01-31 Thread Mitsumasa KONDO
2014-01-31 Fujii Masao masao.fu...@gmail.com

 On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander mag...@hagander.net
 wrote:
  On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila amit.kapil...@gmail.com
  wrote:
 
  On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao masao.fu...@gmail.com
  wrote:
   Hi,
  
   The files in pg_stat_tmp directory don't need to be backed up because
   they are
   basically reset at the archive recovery. So I think it's worth
   changing pg_basebackup
   so that it skips any files in pg_stat_tmp directory. Thought?
 
  I think this is good idea, but can't it also avoid
  PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in
  pg_stat_tmp
 
 
  All stats files should be excluded. IIRC the
 PGSTAT_STAT_PERMANENT_TMPFILE
  refers to just the global one. You want to exclude based on
  PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc
  stats_temp_directory if it's in PGDATA.

 Attached patch changes basebackup.c so that it skips all files in both
 pg_stat_tmp
 and stats_temp_directory. Even when a user sets stats_temp_directory
 to the directory
 other than pg_stat_tmp, we need to skip the files in pg_stat_tmp. Because,
 per recent change of pg_stat_statements, the external query file is
 always created there.

+1.

Also, I'd like to skip the pg_log directory, for security reasons.
If you have time and can get the community to agree,
could you create such a patch after yours is committed?
I don't want to bother you.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Exposing currentTransactionWALVolume

2014-01-31 Thread Mitsumasa KONDO
Here are my review comments.

2014-01-15 Simon Riggs si...@2ndquadrant.com:

 Short patch to expose a function GetCurrentTransactionWALVolume() that
 gives the total number of bytes written to WAL by current transaction.

* It's a simple, good feature. It is useful for system management and for
forecasting the server specs (especially disk volume) a system will need.

* My tests show no measurable overhead.

* Compilation and unit tests show no problems.

User interface to this information discussed on separate thread, so
 that we don't check the baby out with the bathwater when people
 discuss UI pros and cons.

Did you get good opinions in the other thread?
I'd like SQL both for viewing the WAL volume and for resetting the counter.
Your patch seems to initialize the value only at server startup.
If we had a reset function, we could see database activity for each hour of
the day from the WAL volumes. Today we only see the number of transactions
and the database sizes; I'd like to see more detailed activity from the WAL
volume per minute or per hour.
It might be useful to hackers for performance improvement, too.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] pg_basebackup and pg_stat_tmp directory

2014-01-31 Thread Mitsumasa KONDO
2014-01-31 Fujii Masao masao.fu...@gmail.com:

 On Fri, Jan 31, 2014 at 10:18 PM, Mitsumasa KONDO
 kondo.mitsum...@gmail.com wrote:
  2014-01-31 Fujii Masao masao.fu...@gmail.com
 
  On Tue, Jan 28, 2014 at 5:51 PM, Magnus Hagander mag...@hagander.net
  wrote:
   On Tue, Jan 28, 2014 at 6:11 AM, Amit Kapila amit.kapil...@gmail.com
 
   wrote:
  
   On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao masao.fu...@gmail.com
   wrote:
Hi,
   
The files in pg_stat_tmp directory don't need to be backed up
 because
they are
basically reset at the archive recovery. So I think it's worth
changing pg_basebackup
so that it skips any files in pg_stat_tmp directory. Thought?
  
   I think this is good idea, but can't it also avoid
   PGSTAT_STAT_PERMANENT_TMPFILE along with temp files in
   pg_stat_tmp
  
  
   All stats files should be excluded. IIRC the
   PGSTAT_STAT_PERMANENT_TMPFILE
   refers to just the global one. You want to exclude based on
   PGSTAT_STAT_PERMANENT_DIRECTORY (and of course based on the guc
   stats_temp_directory if it's in PGDATA.
 
  Attached patch changes basebackup.c so that it skips all files in both
  pg_stat_tmp
  and stats_temp_directory. Even when a user sets stats_temp_directory
  to the directory
  other than pg_stat_tmp, we need to skip the files in pg_stat_tmp.
 Because,
  per recent change of pg_stat_statements, the external query file is
  always created there.
 
  +1.
 
   Also, I'd like to skip the pg_log directory, for security reasons.

 Yeah, I was thinking that, too. I'm not sure whether including log files
 in backup really increases the security risk, though. There are already
 very important data, i.e., database, in backups. Anyway, since
 the amount of log files can be very large and they are not essential
 for recovery, it's worth considering whether to exclude them. OTOH,
 I'm sure that some users prefer current behavior for some reasons.
 So I think that it's better to expose the pg_basebackup option
 specifying whether log files are included in backups or not.

I agree with you. Thanks a lot!

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-27 Thread Mitsumasa KONDO
2014-01-27 Andrew Dunstan and...@dunslane.net


 On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote:

 (2014/01/23 23:18), Andrew Dunstan wrote:

 What is more, if the square root calculation is affecting your
 benchmarks, I
 suspect you are benchmarking the wrong thing.

 I run another test that has two pgbench-clients in same time, one is
 select-only-query and another is executing 'SELECT *  pg_stat_statement'
 query in every one second. I used v6 patch in this test.



 The issue of concern is not the performance of pg_stat_statements, AUIU.
 The issue is whether this patch affects performance generally, i.e. is
 there a significant cost in collecting these extra stats. To test this you
 would compare two general pgbench runs, one with the patch applied and one
 without.

The other day I showed a first test result comparing runs without
pg_stat_statements and without the patch. They ran on the same server with
the same benchmark settings (clients and scale factor) as today's results.
If you merge and compare the results, you can confirm that my patch does not
affect performance.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-26 Thread Mitsumasa KONDO
2014-01-26 Simon Riggs si...@2ndquadrant.com

 On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote:
  On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp
 wrote:
  Rebased patch is attached.
 
  Does this fix the Windows bug reported by Kumar on 20/11/2013 ?

 Please respond.

Oh... very sorry...

The other day I tried to find Kumar's mail from 20/11/2013, but I couldn't
find it... Could you tell me the e-mail's title?
My patch is up to date with the latest 9.4 HEAD.

Regards,
--
Mitsumasa KONDO


Re: [HACKERS] Optimize kernel readahead using buffer access strategy

2013-12-12 Thread Mitsumasa KONDO
2013/12/12 Simon Riggs si...@2ndquadrant.com

 On 14 November 2013 12:09, KONDO Mitsumasa
 kondo.mitsum...@lab.ntt.co.jp wrote:

  For your information on the effect of this patch, I got pgbench results
  for an in-memory-sized database and an out-of-memory-sized database, with
  the postgresql.conf settings we always use. It seems to improve
  performance. And I think this feature is going to be necessary for the
  business intelligence workloads that will be realized by PostgreSQL
  version 10. I seriously believe Simon's presentation at PostgreSQL
  Conference Europe 2013! It was very exciting!!!

 Thank you.

 I like the sound of this patch, sorry I've not been able to help as yet.

 Your tests seem to relate to pgbench. Do we have tests on more BI related
 tasks?

Yes, of course! We will need other benchmark tests before concluding on this
patch. What kind of benchmarks do you have?

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Time-Delayed Standbys

2013-12-12 Thread Mitsumasa KONDO
2013/12/12 Simon Riggs si...@2ndquadrant.com

 On 12 December 2013 10:42, KONDO Mitsumasa
 kondo.mitsum...@lab.ntt.co.jp wrote:

  I agree with your request here, but I don't think negative values are
  the right way to implement that, at least it would not be very usable.
 
  I think that my proposal is the easiest and simplest way to solve this
  problem. And I believe that anyone who cannot calculate a time-zone
  difference won't be setting up a replication cluster across continents.
 
 
  My suggestion would be to add the TZ to the checkpoint record. This
  way all users of WAL can see the TZ of the master and act accordingly.
  I'll do a separate patch for that.
 
   That would be useful in other situations too. However, it might lead to a
  long and complicated discussion... I think our hope is to commit this
  patch in this commitfest or the next (final) one.

 Agreed on no delay for the delay patch, as shown by my commit.

Our forecast was very accurate...
Nice commit, thanks!

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Time-Delayed Standbys

2013-12-04 Thread Mitsumasa KONDO
2013/12/4 Andres Freund and...@2ndquadrant.com

 On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote:
  4) Start the slave and connect to it using psql and in another session
 I can see
  all archive recovery log
  Hmm... I had thought I had misread your email, but it reproduces again.
  When I set a small recovery_time_delay (=3), it seems to work correctly.
  However, when I set a long recovery_time_delay (=300), it didn't work.

  My reproduction log is as follows.
  [mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4  -p5432
  starting vacuum...end.
  transaction type: TPC-B (sort of)
  scaling factor: 10
  query mode: simple
  number of clients: 8
  number of threads: 4
  duration: 30 s
  number of transactions actually processed: 68704
  latency average: 3.493 ms
  tps = 2289.196747 (including connections establishing)
  tps = 2290.175129 (excluding connections establishing)
  [mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
  [mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
  server starting
  [mitsu-ko@localhost postgresql]$ LOG:  database system was shut down
 in recovery at 2013-12-03 10:26:41 JST
  LOG:  entering standby mode
  LOG:  consistent recovery state reached at 0/5C4D8668
  LOG:  redo starts at 0/5C4000D8
  [mitsu-ko@localhost postgresql]$ FATAL:  the database system is
 starting up
  FATAL:  the database system is starting up
  FATAL:  the database system is starting up
  FATAL:  the database system is starting up
  FATAL:  the database system is starting up
  [mitsu-ko@localhost postgresql]$ bin/psql -p6543
  psql: FATAL:  the database system is starting up
  [mitsu-ko@localhost postgresql]$ bin/psql -p6543
  psql: FATAL:  the database system is starting up
  I attached my postgresql.conf and recovery.conf. It will be reproduced.

 So, you brought up a standby and it took more time to become consistent
 because it waited on commits? That's the problem? If so, I don't think
 that's a bug?

When this happens, psql cannot connect to the standby server at all. I don't
think this behavior is good: the delay should only hold back the recovery
position, while old (delayed) table data remains visible. Being unable to
connect to the server is not the desired behavior.
If you think the current behavior is best, I will mark the patch ready for
committer, and the committer can improve it.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Time-Delayed Standbys

2013-12-04 Thread Mitsumasa KONDO
2013/12/4 Christian Kruse christ...@2ndquadrant.com

 You created a master node and a hot standby with 300 delay. Then
 you stopped the standby, did the pgbench and startet the hot standby
 again. It did not get in line with the master. Is this correct?

No. First, I started the master and executed pgbench. Second, I started the
standby with a 300ms(50min) delay. Then psql could not connect to the
standby server at all. I'm not sure why the standby did not start;
it might be because the delay feature interferes with the REDO loop during
the standby's first start-up.


 I don't see a problem here… the standby should not be in sync with the
 master, it should be delayed. I did step by step what you did and
 after 50 minutes (300ms) the standby was at the same level the
 master was.

I think we should be able to connect to the standby server at any time, even
with the delay option.


 Did I missunderstand you?

I'm not sure... You might be right, or there might be an even better way.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


Re: [HACKERS] Time-Delayed Standbys

2013-12-04 Thread Mitsumasa KONDO
2013/12/4 Andres Freund and...@2ndquadrant.com

 On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote:
  2013/12/4 Andres Freund and...@2ndquadrant.com
  When it happened, psql cannot connect standby server at all. I think this
  behavior is not good.
  It should only delay recovery position and can seen old delay table data.

 That doesn't sound like a good plan - even if the clients cannot connect
 yet, you can still promote the server.

I'm not sure I follow your argument, but hasn't the purpose of this patch
slipped away?

Just not taking delay into
 consideration at that point seems like it would possibly surprise users
 rather badly in situations they really cannot use such surprises.

Hmm... I think users would indeed be surprised...

I think the behavior would be easy to fix using a recovery flag, but we had
better wait for other comments.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


[HACKERS] Add accurate option to pgbench

2013-10-31 Thread Mitsumasa KONDO
Hi,

I created a pgbench patch that adds an accurate option for benchmarking and
submitted it to CF3.
It is a simple option for getting more accurate benchmark results and
avoiding misleading results in pgbench.

The logic of this option is as follows (a minimal sketch follows the list):
  1. execute a CLUSTER command to sort the records.
  2. execute a CHECKPOINT to flush dirty buffers out of shared_buffers.
  3. execute the sync command to flush dirty file caches in the OS.
  4. wait 10 seconds until the RAID cache is empty.
  5. execute a CHECKPOINT to reset checkpoint_timeout and checkpoint_segments.
  6. start the benchmark.
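Here is a minimal libpq sketch of the six steps above (error handling
omitted; the real patch runs inside pgbench itself, and a plain CLUSTER only
re-clusters previously clustered tables):

#include <stdlib.h>
#include <unistd.h>
#include <libpq-fe.h>

static void run(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);
    PQclear(res);
}

int main(void)
{
    PGconn *conn = PQconnectdb("");          /* default connection params */
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    run(conn, "CLUSTER");                    /* 1. sort records */
    run(conn, "CHECKPOINT");                 /* 2. flush shared_buffers */
    system("sync");                          /* 3. flush OS file caches */
    sleep(10);                               /* 4. let the RAID cache drain */
    run(conn, "CHECKPOINT");                 /* 5. reset checkpoint timers */
    /* 6. start the benchmark here */

    PQfinish(conn);
    return 0;
}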

Sample output is below.

[mitsu-ko@vm-kondo pgbench]$ ./pgbench -a
starting cluster...end.
starting checkpoint...end.
starting sync all buffers and wait 10 seconds...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
accurate mode: on
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average: 0.000 ms
tps = 187.677120 (including connections establishing)
tps = 236.417798 (excluding connections establishing)


I hope it will become a recommended pgbench option through community
development. However, it might be an overly careful procedure before
starting a benchmark. Please give me your comments.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


pgbench_accurate_option_v0.patch
Description: Binary data



[HACKERS] pg_fallocate

2013-10-31 Thread Mitsumasa KONDO
Hi,

I'd like to add the fallocate() system call to improve sequential read/write
performance. fallocate() differs from posix_fallocate(), whose fallback is a
zero-fill algorithm for reserving contiguous disk space; fallocate() reserves
contiguous disk space with much less overhead than posix_fallocate().

It will be needed for sorted checkpoints and a faster VACUUM command in the
near future. A minimal usage sketch is below.
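For illustration, a minimal Linux-only sketch of preallocating a 16 MB file
with fallocate() (the file name and size are arbitrary):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("segment.tmp", O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return 1;

    /* mode 0: plain allocation; the file size is extended to 16 MB */
    if (fallocate(fd, 0, 0, 16 * 1024 * 1024) != 0)
        perror("fallocate");

    close(fd);
    return 0;
}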


For more detailed information, please see the Linux manual.

I'm going sightseeing in Dublin with Ishii-san now :-)



Regards,

--

Mitsumasa KONDO

NTT Open Source Software


pg_fallocate_v0.patch
Description: Binary data



Re: [HACKERS] gaussian distribution pgbench

2013-09-23 Thread Mitsumasa KONDO
 However this pattern induces stronger cache effects which are maybe not
too realistic,

 because neighboring keys in the middle are more likely to be chosen.

I think your point is right. However, this is in effect a pseudo-benchmark,
so I think such a simple mechanism is also acceptable.


 Have you considered adding a randomization layer, that is once you have
a key in [1 ..  n] centered around n/2, then you perform a pseudo-random
transformation into the same  domain so that key values are scattered over
the whole domain?

Yes, I have also considered that. It could be realized by adding a linear
mapping array created by a random generator. However, the current erand48
algorithm is a low-accuracy, fossil algorithm, and I don't know whether it
would work well. If we implement it, we may need a more accurate random
generator algorithm such as the Mersenne Twister.


Regards,

--

Mitsumasa KONDO


Re: [HACKERS] gaussian distribution pgbench

2013-09-23 Thread Mitsumasa KONDO
 You had accidentally added to the CF In Progress.

Oh, I had completely misread this CF schedule :-)

Maybe Horiguchi-san is in the same situation...


However, thanks to your moving it, I become the first submitter in the next CF.

Thank you for moving it :-)

--

Mitsumasa KONDO


Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4

2013-04-26 Thread Mitsumasa KONDO
Let me explain this problem in more detail.

The problem is that a restartpoint creates an invalid WAL file during
archive recovery, but I cannot figure out why the invalid WAL file is
created in CreateRestartPoint(). My attached patch is really plain…

In the problem case, the first check in XLogFileReadAnyTLI() does not get an
fd, because the proper WAL file does not exist in the archive directory.
XLogFileReadAnyTLI()
 if (sources & XLOG_FROM_ARCHIVE)
 {
   fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_ARCHIVE, true);
   if (fd != -1)
   {
      elog(DEBUG1, "got WAL segment from archive");
      return fd;
   }
 }
 }

Next, it searches for the WAL file in pg_xlog. The invalid WAL file is
there, and its fd is returned.

XLogFileReadAnyTLI()
  if (sources & XLOG_FROM_PG_XLOG)
  {
 fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_PG_XLOG, true);
 if (fd != -1)
return fd;
  }

The returned fd becomes the readFile value. Of course readFile is then >= 0,
so we break out of the for-loop.

XLogPageRead
   readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2, sources);
   switched_segment = true;
   if (readFile >= 0)
      break;

Next is the problematic point: the invalid WAL file is read, and an error
occurs.

XLogPageRead
   if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
   {
      ereport(emode_for_corrupt_record(emode, *RecPtr),
              (errcode_for_file_access(),
               errmsg("could not seek in log file %u, segment %u to offset %u: %m",
                      readId, readSeg, readOff)));
      goto next_record_is_invalid;
   }
   if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
   {
      ereport(emode_for_corrupt_record(emode, *RecPtr),
              (errcode_for_file_access(),
               errmsg("could not read from log file %u, segment %u, offset %u: %m",
                      readId, readSeg, readOff)));
      goto next_record_is_invalid;
   }
   if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
      goto next_record_is_invalid;


I think Horiguchi-san's discovery is after this point.
We must fix CreateRestartPoint() so that it does not create invalid WAL files.

Best regards,

--
Mitsumasa KONDO