[PERFORM] Common slow query reasons - help with a special log

2011-12-10 Thread Daniel Cristian Cruz
Hi all,

I'm trying to figure out some common slow queries running on the server, by
analyzing the slow queries log.

I found debug_print_parse, debug_print_rewritten and debug_print_plan, which
are too verbose and log all queries.

I was thinking of something like a simple explain analyze, logged together
with the query, only for queries that exceed log_min_duration_statement.

Is there a way to configure PostgreSQL to get this kind of information, or
am I missing something? Would it be too hard to hack the sources and do it
by hand? I have never touched the PostgreSQL sources.

I'm planning to write a paper for my postgraduate course that needs this
information. The focus of my work will be the log data, not PostgreSQL
itself. If I succeed, maybe it can become a tool that helps all of us.

Thank you,
-- 
Daniel Cristian Cruz
クルズ クリスチアン ダニエル


Re: [PERFORM] Common slow query reasons - help with a special log

2011-12-10 Thread Andreas Kretschmer
Consider auto_explain; it's a contrib module, see
http://www.postgresql.org/docs/9.1/interactive/auto-explain.html
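
A minimal setup might look like this in postgresql.conf (an untested sketch;
parameter names as in the 9.1 docs, the 5s threshold is just an example):

```ini
# load the module at server start (requires a restart)
shared_preload_libraries = 'auto_explain'

# through 9.1, custom parameters must be declared first
custom_variable_classes = 'auto_explain'

# log the plan of any statement running longer than 5s
auto_explain.log_min_duration = '5s'
```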



Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Common slow query reasons - help with a special log

2011-12-10 Thread Tomas Vondra
There's the auto_explain contrib module that does exactly what you're
asking for. Anyway, explain analyze is quite expensive - think twice before
enabling it on a production server where you already have performance
issues.

Tomas





Re: [PERFORM] Common slow query reasons - help with a special log

2011-12-10 Thread Daniel Cristian Cruz
At work we have a 24-core server, with a load average around 2.5.

I don't know yet whether a system that uses some idle CPU to reduce the
load of a bad query identified early is a good or a bad trade.

Indeed, I don't know if my boss would let me test this in production too,
but it would be good to know how things work in "auto-pilot" mode.




-- 
Daniel Cristian Cruz
クルズ クリスチアン ダニエル


Re: [PERFORM] Common slow query reasons - help with a special log

2011-12-10 Thread Tomas Vondra
On 10.12.2011 23:40, Daniel Cristian Cruz wrote:
> At work we have a 24-core server, with a load average around 2.5.

A single query is processed by a single CPU, so even if the system is not
busy a single query may hit a CPU bottleneck. The real issue is the
instrumentation overhead - timing etc. On some systems (with a slow
gettimeofday) this may be a significant problem, as the query hits the CPU
boundary sooner.
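
The overhead is easy to get a feel for - a rough sketch (absolute numbers
vary a lot depending on the clocksource):

```python
import time

# Measure the approximate cost of one wall-clock read - the call that
# EXPLAIN ANALYZE-style instrumentation makes around every plan-node
# iteration (time.time() here is a stand-in for gettimeofday()).
N = 100_000
start = time.perf_counter()
for _ in range(N):
    time.time()
elapsed = time.perf_counter() - start

per_call_ns = elapsed / N * 1e9
print(f"~{per_call_ns:.0f} ns per clock read")
```

On a system with a slow clock source that per-call cost can dominate a
cheap plan node executed millions of times.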

> I don't know yet whether a system that uses some idle CPU to reduce the
> load of a bad query identified early is a good or a bad trade.

Not really, due to the "single query / single CPU" rule.

> Indeed, I don't know if my boss would let me test this at production
> too, but it could be good to know how things work in "auto-pilot" mode.

What I was pointing out is that you probably should not enable logging of
"explain analyze" output via "auto_explain.log_analyze = true". There are
three levels of detail:

1) basic, just log_min_duration_statement

2) auto_explain, without 'analyze' - just the explain plan

3) auto_explain, with 'analyze' - explain plan with actual values

Levels (1) and (2) are quite safe (unless the minimum execution time is
too low).
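
For a quick experiment you can also load it for a single session (as a
superuser) instead of server-wide - a sketch, with an example threshold:

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5s';  -- level (2): log the plan only
-- SET auto_explain.log_analyze = true;    -- level (3): adds actual rows/times,
--                                            with the overhead discussed above
```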

Tomas





Re: [PERFORM] Common slow query reasons - help with a special log

2011-12-10 Thread Daniel Cristian Cruz
2011/12/10 Tomas Vondra:

> On 10.12.2011 23:40, Daniel Cristian Cruz wrote:
> > At work we have a 24-core server, with a load average around 2.5.
>
> A single query is processed by a single CPU, so even if the system is
> not busy a single query may hit a CPU bottleneck. The real issue is the
> instrumentation overhead - timing etc. On some systems (with a slow
> gettimeofday) this may be a significant problem, as the query hits the
> CPU boundary sooner.
>

Yes, I forgot it will run in the same PID. Since analyze will cause all
queries to slow down, maybe the 24 cores could become overloaded.


>
> > I don't know yet whether a system that uses some idle CPU to reduce the
> > load of a bad query identified early is a good or a bad trade.
>
> Not really, due to the "single query / single CPU" rule.


I guess it will be a nice tool to run in the validation server.


> > Indeed, I don't know if my boss would let me test this in production too,
> > but it would be good to know how things work in "auto-pilot" mode.
>
> What I was pointing out is that you probably should not enable logging of
> "explain analyze" output via "auto_explain.log_analyze = true". There are
> three levels of detail:
>
> 1) basic, just log_min_duration_statement
>
> 2) auto_explain, without 'analyze' - just the explain plan
>
> 3) auto_explain, with 'analyze' - explain plan with actual values
>
> Levels (1) and (2) are quite safe (unless the minimum execution time is
> too low).
>

I would start with 5 seconds.

Reading the manual again, I saw that enabling analyze instruments all
queries, even the ones that weren't slower than 5 seconds. And I understood
that there is no way to enable it only for slow queries, since there is no
way to know a query is slow before it ends...

I read Bruce's blog about some features going multi-core. Could explain
analyze go multi-core too?

Another thing I noticed is that I almost never look at times in explain
analyze. I always look for row-count divergence and the methods used for
scans and joins when looking for ways to get better performance.

I had the nasty idea of putting a // before the gettimeofday call in the
explain analyze code (I guess it could be much more complicated than this).

Sure, it's ugly, but I think it could be an option for an explain analyze
"with no time", and in concept that's what I'm looking for.
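
For what it's worth, later PostgreSQL releases (9.2 and up) added exactly
this as a TIMING option for EXPLAIN - actual row counts without the
per-node clock reads (the table name below is made up):

```sql
-- Plan structure and actual row counts, but no per-node timing overhead
EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM some_table WHERE id < 100;
```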

-- 
Daniel Cristian Cruz
クルズ クリスチアン ダニエル


[PERFORM] copy vs. C function

2011-12-10 Thread Jon Nelson
I was experimenting with a few different methods of taking a line of text,
parsing it into a set of fields, and then getting that info into a table.

The first method involved writing a C program to parse a file, parse
the lines and output newly-formatted lines in a format that
postgresql's COPY function can use.
End-to-end, this takes 15 seconds for about 250MB (read 250MB, parse,
output new data to new file -- 4 seconds, COPY new file -- 10
seconds).

The next approach I took was to write a C function in postgresql to
parse a single TEXT datum into an array of C strings, and then use
BuildTupleFromCStrings. There are 8 columns involved.
Eliding the time it takes to COPY the (raw) file into a temporary
table, this method took 120 seconds, give or take.

The difference was /quite/ a surprise to me. What is the probability
that I am doing something very, very wrong?

NOTE: the code that does the parsing is actually the same, line-for-line;
the only difference is whether the routine is called by a postgresql
function or by a C program via main, so obviously the overhead is
elsewhere.
NOTE #2: We are talking about approximately 2.6 million lines.

I was testing:

\copy some_table from 'some_file.csv' with csv
vs.
insert into some_table select (some_func(line)).* from some_temp_table;

where some_func had been defined with (one) IN TEXT and (8) OUT params
of varying types.
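
One thing worth ruling out (a hypothetical sketch - the names are made up,
and this is a known gotcha in releases before PostgreSQL 10): `(f(x)).*` is
expanded to one call of the function per output column, so with 8 OUT
params the line may be parsed 8 times per row.

```sql
-- As written: some_func() may be evaluated once per output column
INSERT INTO some_table SELECT (some_func(line)).* FROM some_temp_table;

-- Classic workaround of the era: materialize the composite once
-- (OFFSET 0 prevents the subquery from being flattened away)
INSERT INTO some_table
SELECT (f).*
FROM (SELECT some_func(line) AS f FROM some_temp_table OFFSET 0) s;
```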

PostgreSQL 9.1.1 on Linux, x86_64

-- 
Jon



Re: [PERFORM] copy vs. C function

2011-12-10 Thread Craig Ringer

On 12/11/2011 09:27 AM, Jon Nelson wrote:
> The first method involved writing a C program to parse a file, parse
> the lines and output newly-formatted lines in a format that
> postgresql's COPY function can use.
> End-to-end, this takes 15 seconds for about 250MB (read 250MB, parse,
> output new data to new file -- 4 seconds, COPY new file -- 10
> seconds).

Why not `COPY tablename FROM '/path/to/myfifo'`?

Just connect your import program up to a named pipe (fifo) created with
`mknod myfifo p`, either by redirecting stdout or by open()ing the fifo for
writing. Then have Pg read from the fifo. You'll save a round of disk
writes and reads.
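
The plumbing can be sketched like this (the reader at the end stands in for
PostgreSQL running `COPY tablename FROM '/path/to/myfifo' WITH CSV`; all
names are made up):

```python
import os
import tempfile
import threading

def feed_fifo(path, lines):
    # Producer: what the import program would do - write rows into the fifo.
    # open() for writing blocks until a reader (here: COPY) opens the fifo.
    with open(path, "w") as fifo:
        for line in lines:
            fifo.write(line + "\n")

fifo_path = os.path.join(tempfile.mkdtemp(), "myfifo")
os.mkfifo(fifo_path)  # same effect as `mknod myfifo p`

rows = ["1,alice", "2,bob"]
writer = threading.Thread(target=feed_fifo, args=(fifo_path, rows))
writer.start()

# Consumer side - in the real setup this is PostgreSQL's COPY.
with open(fifo_path) as fifo:
    received = fifo.read().splitlines()
writer.join()

print(received)
```

No intermediate file ever touches disk, which is where the saved round of
writes and reads comes from.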

> The next approach I took was to write a C function in postgresql to
> parse a single TEXT datum into an array of C strings, and then use
> BuildTupleFromCStrings. There are 8 columns involved.
> Eliding the time it takes to COPY the (raw) file into a temporary
> table, this method took 120 seconds, give or take.
>
> The difference was /quite/ a surprise to me. What is the probability
> that I am doing something very, very wrong?

Have a look at how COPY does it within the Pg sources, see if that's any 
help. I don't know enough about Pg's innards to answer this one beyond 
that suggestion, sorry.


--
Craig Ringer



Re: [PERFORM] copy vs. C function

2011-12-10 Thread Sam Gendler
Start a transaction before the first insert and commit it after the last
one, and it will be much better. But I believe the COPY code path is
optimized to perform better than any set of queries can, even in a single
transaction.

Sent from my iPhone




Re: [PERFORM] copy vs. C function

2011-12-10 Thread Jon Nelson
On Sat, Dec 10, 2011 at 8:32 PM, Craig Ringer wrote:
> On 12/11/2011 09:27 AM, Jon Nelson wrote:
>>
>> The first method involved writing a C program to parse a file, parse
>> the lines and output newly-formatted lines in a format that
>> postgresql's COPY function can use.
>> End-to-end, this takes 15 seconds for about 250MB (read 250MB, parse,
>> output new data to new file -- 4 seconds, COPY new file -- 10
>> seconds).
>
> Why not `COPY tablename FROM /path/to/myfifo' ?

If I were to do this in any sort of production environment, that's
exactly what I would do. I was much more concerned about the /huge/
difference -- 10 seconds for COPY and 120 seconds for (INSERT INTO /
CREATE TABLE AS / whatever).

>> The next approach I took was to write a C function in postgresql to
>> parse a single TEXT datum into an array of C strings, and then use
>> BuildTupleFromCStrings. There are 8 columns involved.
>> Eliding the time it takes to COPY the (raw) file into a temporary
>> table, this method took 120 seconds, give or take.
>>
>> The difference was /quite/ a surprise to me. What is the probability
>> that I am doing something very, very wrong?
>
> Have a look at how COPY does it within the Pg sources, see if that's any
> help. I don't know enough about Pg's innards to answer this one beyond that
> suggestion, sorry.

Ack.

Regarding a subsequent email, I was using full transactions.


-- 
Jon



Re: [PERFORM] Common slow query reasons - help with a special log

2011-12-10 Thread Tomas Vondra
On 11.12.2011 02:27, Daniel Cristian Cruz wrote:
> I would start with 5 seconds.
> 
> Reading the manual again, I saw that enabling analyze instruments all
> queries, even the ones that weren't slower than 5 seconds. And I
> understood that there is no way to enable it only for slow queries, since
> there is no way to know a query is slow before it ends...

Yes, you can't predict how long a query will run until it actually
finishes, so you have to instrument all of them. Maybe this will change
because of the "faster than light" neutrinos, but let's stick with the
current laws of physics for now.

> I read Bruce's blog about some features going multi-core. Could explain
> analyze go multi-core too?

I don't think so. This is what Bruce mentioned as "parallel execution", and
that's the very hard part, requiring rearchitecting parts of the system.

> Another thing I noticed is that I almost never look at times in explain
> analyze. I always look for row-count divergence and the methods used for
> scans and joins when looking for ways to get better performance.
> 
> I had the nasty idea of putting a // before the gettimeofday call in the
> explain analyze code (I guess it could be much more complicated than this).

I was thinking about that too, but I've never done it. So I'm not sure
what else is needed.

Tomas
