Re: [PERFORM] Talking about optimizer, my long dream

2011-02-06 Thread Mladen Gogala

Please, don't include me on your emails. I unsubscribed from the list.


Cédric Villemain wrote:

2011/2/4 Frank Heikens frankheik...@mac.com:
  

On 04 Feb 2011, at 02:56 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote:

Віталій Тимчишин wrote:


Hi, all.

All this optimizer vs hint thread
  

There is no optimizer vs. hint. Hints are a necessary part of the
optimizer in all other databases.


That has nothing to do with PostgreSQL: PostgreSQL is PostgreSQL. It
doesn't have hints, and everybody knows it.

Without hints Postgres will not get
used in the company that I work for, period.


That's up to you, that's fine. But why did you start with PostgreSQL in the
first place? You knew PostgreSQL doesn't have hints and the wiki told you
hints are not wanted as well. When hints are an essential requirement for
your company, you should pick another product, EnterpriseDB Postgres Plus
for example.

I was willing to wait but
the fatwa against hints seems unyielding,


There is no fatwa. The PostgreSQL project prefers to spend resources on a
better optimizer to solve the real problems, not on hints for working around
the problems. That has nothing to do with any fatwa or religion.

so that's it. I am even
inclined to believe that deep down under the hood, this fatwa has an
ulterior motive, which disgusts me deeply. With hints, there would be
far fewer consulting gigs.


The consulting guys are the ones who love hints: they know they will have to come
back a month later because the old hint does more harm than good once the data
changes. And data will change over time.

You said it's so simple to implement hints in PostgreSQL, so please, show
us. Or ask/pay somebody to write this simple code for you to support hints,
nobody will ever stop you from doing that. When you have a use case that
proves the usage of hints will improve the performance of PostgreSQL and you
have some code that can be maintained by the PostgreSQL project, it might be
implemented in the contrib or even core. It's up to you, not somebody else.



Just in case you miss it:
http://www.sai.msu.su/~megera/wiki/plantuner

Btw, feel free to do it however you want; it is open source under the BSD license,
so you can take PostgreSQL, add hints, and go sell that to your boss.


  


Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance






  



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Talking about optimizer, my long dream

2011-02-04 Thread Mladen Gogala

Віталій Тимчишин wrote:

Hi, all.

All this optimizer vs hint thread
There is no optimizer vs. hint. Hints are a necessary part of the 
optimizer in all other databases. Without hints Postgres will not get 
used in the company that I work for, period. I was willing to wait but 
the fatwa against hints seems unyielding, so that's it. I am even 
inclined to believe that deep down under the hood, this fatwa has an 
ulterior motive, which disgusts me deeply. With hints, there would be 
far fewer consulting gigs.


Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Talking about optimizer, my long dream

2011-02-04 Thread Mladen Gogala

Shaun Thomas wrote:

On 02/04/2011 07:56 AM, Mladen Gogala wrote:

  

Hints are a necessary part of the
optimizer in all other databases. Without hints Postgres will not get
used in the company that I work for, period.



I've said repeatedly that EnterpriseDB, a fork of PostgreSQL, has the 
hints you seek, yet you seem to enjoy berating the PostgreSQL community 
as if it owes you something.


Also, we don't care if you don't use PostgreSQL. If I put something up 
for free, some random guy not taking it won't exactly hurt my feelings.


  
Shaun, I don't need to convince you or the Postgres community. I needed 
an argument to convince my boss.
My argument was that the sanctimonious and narrow-minded Postgres 
community is unwilling to even consider creating the tools I need for 
large porting projects, tools provided by other major databases. This 
discussion served my purpose wonderfully. The project is killed; here we 
part ways. No more problems for either of us. Good luck with the 
perfect optimizer, and good riddance. My only regret is the time 
I have wasted.


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Greg Smith wrote:

Mladen Gogala wrote:
  
The techies at big companies are the guys who will or will not make it 
happen. And these guys are not beginners.  Appeasing them may actually 
go a long way.



The PostgreSQL community isn't real big on appeasing people if it's at 
the expense of robustness or correctness, and this issue falls into that 
category.  
With all due respect, I don't see how the issue of hints falls into 
this category. As I explained, the mechanisms are already there, they're 
just not elegant enough. The verb "appease" doesn't quite convey the 
meaning that I had in mind. The phrase "target population" would have 
described what I wanted to say in a much better way.
There are downsides to that, but good things too.  Chasing 
after whatever made people happy regardless of its impact on the server 
code itself has in my mind contributed to why Oracle is so bloated and 
MySQL so buggy, to pick two examples from my favorite horse to whip.  
  
Well, those two databases are also used much more widely than Postgres, 
which means that they're doing something better than Postgres.


Hints are not even that complicated to program. The SQL parser should 
compile the list of hints into a table, and the optimizer should check 
whether any of the applicable access methods exist in the table. If one 
does, use it; if not, ignore it. This looks to me like a philosophical 
issue, not a programming issue. Basically, the current Postgres 
philosophy can be described like this: if the database were a gas stove, 
it would occasionally catch fire. However, bundling a fire extinguisher 
with the stove is somehow seen as bad. When the stove catches fire, the 
user is expected to report the issue and wait for a better stove to be 
developed. It is a very rough analogy, but a rather accurate one, too.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Chris Browne wrote:

It's worth looking back to what has already been elaborated on in the
ToDo.
  


And that is precisely what I am trying to contest.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Shaun Thomas wrote:

On 02/03/2011 10:38 AM, Mladen Gogala wrote:

  
It all boils down to the database. Hints, whether they're 
well-intentioned or not, effectively cover up bugs in the optimizer, 
planner, or some other approach the database is using to build its 
execution plan. 
Hints don't cover up bugs, they simply make it possible for the user to 
circumvent the bugs and keep the users happy. As I hinted before, this 
is actually a purist argument, made by someone who has never had to 
support a massive production database with many users for a living.
Your analogy is that PG is a gas stove, so bundle a fire 
extinguisher. Well, the devs believe that the stove should be upgraded 
to electric or possibly even induction to remove the need for the 
extinguisher.
  
In the meantime, the fire is burning. What should the hapless owner of 
the database application do? Tell the users that it will be better in 
the next version? As I've said before: hints are a make-it-or-break-it 
point. Without hints, I cannot consider Postgres for mission-critical 
projects. I am managing big databases for a living, and I flatter myself 
that after more than two decades of doing it, I am not too bad at it.


If they left hints in, it would just be one more thing to deprecate as 
the original need for the hint was removed. If you really need hints 
that badly, EnterpriseDB cobbled the Oracle syntax into the planner, and 
it seems to work alright. That doesn't mean it's right, just that it 
works. EnterpriseDB will now have to support those query hints forever, 
even if the planner gets so advanced they're effectively useless.
  


I don't foresee that happening in my lifetime. And I plan to go on for 
quite a while. There will always be optimizer bugs; users will be 
smarter and know more about their data than computer programs for the 
foreseeable future. What this attitude boils down to is that developers 
don't trust their users enough to give them control of the execution 
path. I profoundly disagree with that type of philosophy. DB2 also has 
hints: http://tinyurl.com/48fv7w7
So does SQL Server: 
http://www.sql-server-performance.com/tips/hints_general_p1.aspx
Finally, even Postgres's greatest open source competitor, MySQL, 
supports hints: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html


I must say that this purist attitude is extremely surprising to me. All 
the major DB vendors support optimizer hints, yet in the Postgres 
community they are considered bad with almost religious fervor.

The Postgres community is quite unique in its fatwa against hints.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Chris Browne wrote:

Well, the community declines to add hints until there is actual
consensus on a good way to add hints.
  
OK. That's another matter entirely. Who should make that decision? Is 
there a committee or a person who would be capable of making that decision?



Nobody has ever proposed a way to add hints where consensus was arrived
at that the way was good, so...
  


So, I will have to go back on my decision to use Postgres and 
reconsider MySQL? I would rather throw away the effort invested in 
studying Postgres than risk unfixable application downtime. I am 
not sure about the world domination thing, though. Optimizer hints are a 
big feature that everybody else has and Postgres does not, because 
of religious reasons.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Shaun Thomas wrote:
You missed the argument. The community, or at least the devs, see hints 
as an ugly hack. Do I agree? Not completely, but I can definitely 
understand the perspective. Saying every other vendor has hints is 
really just admitting every other vendor has a crappy optimizer. Is that 
something to be proud of?
  
This is funny. Everybody else has a crappy optimizer? That's a funny way 
of looking at the fact that every other major database supports hints. I 
would be tempted to call that a major missing feature, but the statement 
that "everybody else has a crappy optimizer" sounds kind of funny. No 
disrespect meant. It's not unlike claiming that the Earth is 6000 years old.




And here's something I never saw you consider: hints making performance 
worse. 
  
Sure. If you give me the steering wheel, there is a chance that I might 
drive the car off a cliff or even run someone over, but that doesn't mean 
that there is no need for having one. After all, we're talking about the 
ability to control the optimizer's decisions.


Want to remove a reverse index scan? Reindex with DESC on the column 
being reversed. That was added in 8.3. Getting too many calls for nested 
loops when a merge or hash would be faster? Increase the statistics 
target for the column causing the problems and re-analyze. Find an 
actual bug in the optimizer? Tell the devs and they'll fix it. Just stay 
current, and you get all those benefits. This is true for any database; 
bugs get fixed, things get faster and more secure.
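
For illustration, the maintenance being described there looks roughly like
this; the table and column names are hypothetical, not from the original
thread:

-- Reverse scans: build the index in descending order (available since 8.3).
CREATE INDEX orders_created_desc ON orders (created_at DESC);

-- Bad join choices: raise the per-column statistics target and re-analyze.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;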
  
In the meantime, the other databases provide hints which help me bridge 
the gap. As I said before: hints are there, even if they were not meant 
to be used that way. I can do things in a way that I consider very 
non-elegant. The hints are there because they are definitely needed. 
Yet, there is a religious zeal and a fatwa against them.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Josh Berkus wrote:

However, since this system wasn't directly compatible with Oracle Hints,
folks pushing for hints dropped the solution as unsatisfactory. This is
the discussion we have every time: the users who want hints specifically
want hints which work exactly like Oracle's, and aren't interested in a
system designed for PostgreSQL.  It's gotten very boring; it's like the
requests to support MySQL-only syntax.
  
Actually, I don't want Oracle hints. Oracle hints are ugly and 
cumbersome. I would prefer something like this:


http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

That should also answer the question about other databases supporting hints.
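
For reference, the MySQL syntax being pointed to looks like this; the table
and index names below are hypothetical:

SELECT * FROM orders USE INDEX (orders_created_idx)     -- consider only the named index
 WHERE created > '2011-01-01';
SELECT * FROM orders FORCE INDEX (orders_cust_idx)      -- like USE INDEX, but a table scan is treated as very expensive
 WHERE customer_id = 42;
SELECT * FROM orders IGNORE INDEX (orders_created_idx)  -- do not consider the named index
 WHERE created > '2011-01-01';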

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Mladen Gogala wrote:
Actually, I don't want Oracle hints. Oracle hints are ugly and 
cumbersome. I would prefer something like this:


http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

That should also answer the question about other databases supporting hints.
  


Sorry. I forgot that MySQL too is now an Oracle product.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Maciek Sakrejda wrote:

The hints are there because they are definitely needed. Yet, there is a
religious zeal and a fatwa against them.



The opposition is philosophical, not religious. There is no fatwa.
If you want a serious discussion, avoid inflammatory terms.


  
I don't want to insult anybody, but the whole thing does look strange. 
Maybe we can agree to remove that ridiculous "we don't want hints" note 
from the PostgreSQL wiki? That would make it look less like a, hmph, 
philosophical issue and more like a "not yet implemented" issue, 
especially if we keep in mind that hints are already here, in the form 
of the enable_<method> switches.
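
In concrete terms, that means something like the following minimal sketch,
run against the toy emp table that appears elsewhere in this thread:

BEGIN;
-- The enable_* switches are crude, transaction-scoped "hints": they
-- discourage a whole plan type rather than naming a specific index.
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM emp WHERE mgr = 7839;
COMMIT;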



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Mark Kirkwood wrote:

On 04/02/11 11:08, Josh Berkus wrote:
  

I don't think that's actually accurate.  Can you give me a list of
DBMSes which support hints other than Oracle?



DB2 LUW (Linux, Unix, Win32 code base) has hint profiles:

http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html


  

SQL Server and MySQL too.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Josh Berkus wrote:

I don't want to insult anybody, but the whole thing does look strange.
Maybe we can agree to remove that ridiculous "we don't want hints" note
from the PostgreSQL wiki? That would make it look less like a, hmph,
philosophical issue and more like a "not yet implemented" issue,
especially if we keep in mind that hints are already here, in the form
of the enable_<method> switches.



Link? There's a lot of stuff on the wiki.


  

http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want

No. 2 on the list.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Kevin Grittner wrote:

Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
  

Maybe we can agree to remove that ridiculous "we don't want hints"
note from the PostgreSQL wiki?

 
I'd be against that.  This is rehashed less frequently since that

went in.  Less wasted time and bandwidth with it there.
  


Well, the problem will not go away.  As I've said before, all other 
databases have that feature, and none of the reasons listed here 
convinced me that "everybody else has a crappy optimizer."  The problem 
may go away altogether if people stop using PostgreSQL.
 
  

That would make it look less like a, hmph, philosophical issue and
more like a "not yet implemented" issue,

 
Exactly what we don't want.
  

Who is we?


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Joshua D. Drake wrote:

On Thu, 2011-02-03 at 18:33 -0500, Mladen Gogala wrote:
  
 

 
Exactly what we don't want.
  
  

Who is we?



The majority of long term hackers.

  
If that is so, I don't see world domination in the future; exactly 
the opposite. A database whose creators don't trust their users cannot 
count on a very bright future. All other databases do have that 
feature. I must say, this debate gave me a good deal of stuff to think 
about.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Mladen Gogala

Robert Haas wrote:

On Thu, Feb 3, 2011 at 6:33 PM, Mladen Gogala mladen.gog...@vmsinfo.com wrote:
  

Kevin Grittner wrote:


Mladen Gogala mladen.gog...@vmsinfo.com wrote:
  

Maybe we can agree to remove that ridiculous "we don't want hints"
note from the PostgreSQL wiki?



 I'd be against that.  This is rehashed less frequently since that
went in.  Less wasted time and bandwidth with it there.
  

Well, the problem will not go away.  As I've said before, all other
databases have that feature, and none of the reasons listed here convinced me
that "everybody else has a crappy optimizer."  The problem may go away
altogether if people stop using PostgreSQL.



You seem to be asserting that without hints, problem queries can't be
fixed.  But you haven't offered any evidence for that proposition, and
it doesn't match my experience, or the experience of other people on
this list who have been using PostgreSQL for a very long time.  If you
want to seriously advance this conversation, you should (1) learn how
people who use PostgreSQL solve these problems and then (2) if you
think there are cases where those methods are inadequate, present
them, and let's have a discussion about it.  People in this community
DO change their mind about things - but they do so in response to
*evidence*.  You haven't presented one tangible example of where the
sort of hints you seem to want would actually help anything, and yet
you're accusing the people who don't agree with you of being engaged
in a religious war.  It seems to me that the shoe is on the other
foot.  Religion is when you believe something first and then look for
evidence to support it.  Science goes the other direction.

  
Actually, it is not unlike a religious dogma, only stating that "hints 
are bad". It even says so in the wiki. The arguments are:
1) Refusal to implement hints is motivated by distrust toward users, 
citing that some people may mess things up.
   Yes, they can, with and without hints.
2) All other databases have them. This is a major feature, and if I were 
in the MySQL camp, I would use it as an argument. Asking me for some 
proof is missing the point. All other databases have hints precisely 
because they are useful. The assertion that only Postgres is so smart 
that it can operate without hints doesn't match reality. As a matter of 
fact, Oracle RDBMS on the same machine will regularly beat PgSQL in 
performance. That has been my experience so far. I even posted counting 
query results.
3) Hints are a make-it-or-break-it feature. They're absolutely needed in 
fire-extinguishing situations.

I see no arguments to say otherwise, and as long as that ridiculous "we 
don't want hints" dogma is on the wiki, this is precisely what it is: a 
dogma. Dogmas do not change, and I am sorry that you don't see it that 
way. However, this discussion did convince me that I need to take 
another look at MySQL and tone down my engagement with the PostgreSQL 
community. This is my last post on the subject because the posts are 
becoming increasingly personal. This level of irritation is also 
characteristic of a religious community chastising a sinner. Let me 
remind you again: all other major databases have that possibility: 
Oracle, MySQL, DB2, SQL Server and Informix. Requiring a burden of proof 
about hints is equivalent to saying that all these databases are 
developed by idiots and have a crappy optimizer.
I am not going to back down, but I may stop using Postgres altogether. 
If that was your goal, you almost achieved it. Oh yes, and good luck 
with the world domination. If there is not enough common sense even to 
take down that stupid dogma on the wiki, there isn't much hope left.
With this post, my participation in this group is finished, for the 
foreseeable future.



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-02 Thread Mladen Gogala

Robert Haas wrote:
On Tue, Feb 1, 2011 
It would be pretty hard to make autoanalyze work on such tables

without removing some of the performance benefits of having such
tables in the first place - namely, the local buffer manager.  But you
could ANALYZE them by hand.

  
Not necessarily autoanalyze; some default rules for the situations when 
stats are not there should be put in place, like the following:

1) If there is a usable index on the temp table, use it.
2) If there isn't a usable index on the temp table and there is a join, 
make the temp table the first table in the nested loop join.

People are complaining about the optimizer not using the indexes all 
over the place; there should be a way to make the optimizer explicitly 
prefer the indexes, as was the case with Oracle's venerable RBO 
(rule-based optimizer). RBO didn't use statistics; it had a ranking of 
access methods and used the available access method with the highest 
rank. In practice, that translated into: if an index exists, use it.
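
Until something like that exists, the manual workaround looks like this
(a self-contained sketch; the table names are only for illustration):

CREATE TEMP TABLE tempids (id integer PRIMARY KEY);
INSERT INTO tempids SELECT g FROM generate_series(1, 1000) AS g;
ANALYZE tempids;   -- without this, the planner falls back on default guesses
EXPLAIN SELECT e.* FROM emp e JOIN tempids t ON t.id = e.empno;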



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Fwd: Re: [HACKERS] [PERFORM] Slow count(*) again...]

2011-02-02 Thread Mladen Gogala

I mistakenly replied to sender only.

Jon Nelson wrote:

However, sometimes using an index results in a HORRIBLE HORRIBLE plan.
I recently encountered the issue myself, and plopping an ANALYZE
$tablename in there, since I was using a temporary table anyway, made
all the difference. The planner switched from an index-based query to
a sequential scan, and a sequential scan was (is) vastly more
efficient in this particular case.
  


That can be fixed by modifying the query. One can write the query in 
such a way that the optimizer cannot use an index.
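
One common way of doing that (a sketch, not the query from the report above)
is to wrap the indexed column in a trivial expression, which makes the index
inapplicable:

-- The index is on empno, not on (empno + 0), so the planner cannot use it.
EXPLAIN SELECT * FROM emp WHERE empno + 0 = 7839;
-- For text columns the equivalent trick is WHERE col || '' = 'value'.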



Personally, I'd get rid of autovacuum/autoanalyze support on temporary
tables (they typically have short lives and are often accessed
immediately after creation, preventing the auto* stuff from being
useful anyway), *AND* every time I ask, I'm always told "make sure to
ANALYZE the table before you use it".

  
I consider that requirement very bad. I hate it when I have to do things 
like this:

try {
    $tmprows = array();
    $db->StartTrans();
    foreach ($result['matches'] as $doc => $docinfo) {
        $tmp = $result['matches'][$doc]['attrs']['created'];
        $tmprows[] = array(date($FMT, $tmp), $doc);
    }
    $db->Execute($TMPINS, $tmprows);
    $db->CommitTrans();

    // Why the heck is this needed?
    $db->Execute("analyze tempids");

    $tmprows = array();
    if ($result['total_found'] > $result['total']) {
        print "Total results: " . $result['total_found'] . "<br>";
        print "Returned results: " . $result['total'] . "<br>";
    }
    $result = array();
    $rs = $db->Execute($IGEN, array($beg, $end));
    show($fmt, $rs);
}
catch (Exception $e) {

The "analyze tempids" line makes my code ugly and slows it down.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions







--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-02 Thread Mladen Gogala

Greg Smith wrote:
Given that even Oracle kicked out the RBO a long time ago, I'm not so 
sure longing for those good old days will go very far.  I regularly see 
queries that were tweaked to always use an index run at 1/10 or less the 
speed of a sequential scan against the same data.  The same people 
complaining all over the place about this topic are also the sort who 
write them.  There are two main fallacies at play here that make this 
happen:
  
Oracle just gives an impression that RBO is gone. It's actually still 
there, even in 11.2:


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing 
options


SQL> alter session set optimizer_mode=rule;

Session altered.

Oracle people were just as puritanical as Postgres people, if not more 
so. However, the huge backlash made them reconsider the decision. RBO is 
officially de-supported, obsolete and despised, but it is also widely 
used, even in Oracle's own SYS schema. Oracle is having huge problems 
trying to move people to the cost-based optimizer, and they are not 
quite done yet.


1) Even if you use an index, PostgreSQL must still retrieve the 
associated table data to execute the query in order to execute its 
version of MVCC
  
Of course. Nobody contests that.  However, index scans for OLTP are 
indispensable. Sequential scans just don't do the trick in some situations.



2) The sort of random I/O done by index lookups can be as much as 50X as 
expensive on standard hard drives as sequential, if every block goes to 
physical hardware.
  


Greg, how many questions about queries not using an index have you seen? 
There is a reason why people keep asking that. The sheer number of 
questions like that on this group should tell you that there is a 
problem there.
There must be a relatively simple way of influencing optimizer 
decisions. With all due respect, I consider myself smarter than the 
optimizer. I'm 6'4", 235 lbs, so telling me that you disagree and that I 
am more stupid than a computer program would not be a smart thing to do. 
Please, do not misunderestimate me.


If I were to work on improving this area, it would be executing on some 
plans a few of us have sketched out for exposing some notion about what 
indexes are actually in memory to the optimizer.  There are more obvious 
fixes to the specific case of temp tables though.


  
I've had a run-in with a temporary table that I had to resolve by 
disabling hash join and merge join, and that really irritated me.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-02 Thread Mladen Gogala

Justin Pitts wrote:

With all
due respect, I consider myself smarter than the optimizer.  I'm 6'4, 235LBS
so telling me that you disagree and that I am more stupid than a computer
program,  would not be a smart thing to do. Please, do not misunderestimate
me.



I don't see computer programs make thinly veiled threats, especially
in a public forum.
I'll do what you claim is not the smart thing and disagree with you.
You are wrong.
You are dragging the signal-to-noise ratio of this discussion down.
You owe Greg an apology.
  
I apologize if that was understood as a threat. It was actually a joke. 
I thought that my use of the word "misunderestimate" made that 
abundantly clear. Apparently, G.W. doesn't have as many fans as I 
previously thought. Once again, it was a joke; I humbly apologize if 
that was misunderstood.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-02 Thread Mladen Gogala

On 2/2/2011 7:03 PM, Greg Smith wrote:

I think that's not quite the right question.  For every person like
yourself who is making an informed "the optimizer is really picking the
wrong index" request, I think there are more who are asking for that but
are not actually right that it will help.  I think you would agree that
this area is hard to understand, and easy to make mistakes about, yes?
So the right question is how many questions about queries not using an
index would have actually benefited from the behavior they asked for?
That's a much fuzzier and harder to answer question.

I agree that it would be nice to provide a UI for the informed.
Unfortunately, the problem I was pointing out is that doing so could, on
average, make PostgreSQL appear to run worse to people who use it.

Greg, I understand your concerns, but let me point out two things:
1) The basic mechanism is already there. PostgreSQL has a myriad of 
ways to actually control the optimizer. One, completely analogous to 
Oracle's mechanisms, is to control the cost of sequential vs. random page 
reads. The other, completely analogous to Oracle hints, is based on the 
group of switches for turning on and off various join and access 
methods. This also includes setting join_collapse_limit to 1, to force 
the desired join order. The third way is to actually make the optimizer 
work a lot harder by setting geqo_effort to 10 and 
default_statistics_target to 1000 or more, which will increase the size 
of histograms and increase the time and CPU spent on planning. I can 
literally force the plan of my choosing on the Postgres optimizer. The 
mechanisms are already there; I am only pleading for a more elegant 
version (see the sketch after this list).


2) The guys who may spread Postgres and help it achieve the desired 
world domination, discussed here the other day, are database 
administrators in the big companies. If you get people from JP Morgan 
Chase, Bank of America, Goldman Sachs or Lehman Brothers to start using 
Postgres for serious projects, the rest will follow suit. People 
from some of these companies have already been seen at NYC Postgres 
meetings.
Granted, MySQL started on the other end of the spectrum, by being used 
for ordering downloaded MP3 collections, but it has found its way into 
the corporate server rooms, too. The techies at big companies are the 
guys who will or will not make it happen. And these guys are not 
beginners. Appeasing them may actually go a long way.
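
Spelled out, the settings referred to in point 1 look like this (a
session-level sketch, not a recommendation of these particular values):

-- Cost model: make random I/O look cheap relative to sequential I/O.
SET random_page_cost = 1.0;
SET seq_page_cost = 2.0;

-- Plan-shape switches and a fixed join order.
SET enable_hashjoin = off;
SET enable_mergejoin = off;
SET join_collapse_limit = 1;            -- join in the order the query is written

-- Make the planner work harder.
SET geqo_effort = 10;
SET default_statistics_target = 1000;   -- takes effect for subsequently ANALYZEd tables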


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-01 Thread Mladen Gogala

On 2/1/2011 6:03 PM, Andrew Dunstan wrote:

Whether or not it's bad application design, it's ubiquitous, and we
should make it work as best we can, IMNSHO. This often generates
complaints about Postgres, and if we really plan for world domination
this needs to be part of it.


There are many other things to fix first. One of them would be optimizer 
decisions when a temp table is involved. I would also vote for a wait 
event interface, tracing, and hints, much rather than speeding up 
count(*). World domination will not be achieved by speeding up count(*); 
it will be achieved by providing overall performance akin to that of the 
player who has already achieved world domination. I believe that company 
is called Oracle Corp. or something like that?


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-01 Thread Mladen Gogala

Samuel Gendler wrote:


 
Don't listen to him.  He's got an oracle bias.

And bad sinuses, too.
 Slashdot already announced that NoSQL is actually going to dominate 
the world, so postgres has already lost that battle.  Everything 
postgres devs do now is just an exercise in relational masturbation. 
 Trust me.


I knew that there is some entertainment value on this list. Samuel, your 
point of view is very..., er, refreshing. Trust me.



--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Any experience using shake defragmenter?

2011-01-30 Thread Mladen Gogala
Did anyone try using shake while the cluster is active? Any problems 
with corruption or data loss? I ran the thing on my home directory and 
nothing was broken. I didn't develop any performance test, so I cannot 
vouch for the effectiveness of the procedure. Did anyone play with that? 
Any positive or negative things to say about shake?


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any experience using shake defragmenter?

2011-01-30 Thread Mladen Gogala

Mark Felder wrote:

Why do you feel the need to defrag your *nix box?


  
Let's stick to the original question and leave my motivation for some 
other time. Have you used the product? If you have, I'd be happy to hear 
about your experience with it.



--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any experience using shake defragmenter?

2011-01-30 Thread Mladen Gogala

Marcin Mirosław wrote:

W dniu 2011-01-30 22:31, Mark Felder pisze:
  

Why do you feel the need to defrag your *nix box?



I'm guessing, maybe he used filefrag and saw 3 extents? :)
The next question will be "which fs do you use?" and then the flame war 
will start :(
Regards

  
With all due respect, I don't want to start a fruitless flame war. I am 
asking those who have used it about their experiences with the product.  
Let's leave discussion of my motivation for some other time. I guess 
it's all about my unhappy childhood. If you have used the defragmenter, 
I'd be grateful for your experience.


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High load,

2011-01-28 Thread Mladen Gogala

Michael Kohl wrote:

We are already doing the logging part, we are just a bit behind on the
explain analyze part of things. One day soon...

  

There is, of course, the auto_explain module which will do that for you.
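
For reference, a minimal way to try it out in a single session (in production
it is normally preloaded via shared_preload_libraries in postgresql.conf):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';  -- log the plan of anything slower than this
SET auto_explain.log_analyze = on;            -- include actual row counts and timings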

--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: FW: [PERFORM] Queries becoming slow under heavy load

2011-01-28 Thread Mladen Gogala

Shaun Thomas wrote:

On 01/27/2011 11:12 PM, Anne Rosset wrote:

  

Thanks for your response.
We are over NFS for our storage ...



NFS? I'm not sure you know this, but NFS has major locking issues
that would make it a terrible candidate for hosting a database.

  
That depends on the implementation. A vendor-supported NAS running NFSv3 
or NFSv4 should be OK. There are other databases that can use it, too. 
Some databases even have a built-in NFS client.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Real vs Int performance

2011-01-27 Thread Mladen Gogala

On 1/27/2011 9:30 AM, Shaun Thomas wrote:

I'm not sure about orders of magnitude on the storage/index side, but my
tests gave us a 10% boost if just the keys are switched over to INT or
BIGINT.


Well, it depends on what you're doing. Searching by an integer vs. 
searching by a text string will probably not make much of a difference. 
However, if you are calculating sums or averages, there will be a huge 
difference.
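
A quick, unscientific way to see the aggregate-side difference for yourself
(a sketch; the relative timings will vary by version and hardware):

CREATE TEMP TABLE typetest AS
    SELECT g::integer AS i, g::real AS r, g::numeric AS n
    FROM generate_series(1, 1000000) AS g;
\timing on
SELECT sum(i) FROM typetest;   -- integer
SELECT sum(r) FROM typetest;   -- real
SELECT sum(n) FROM typetest;   -- numeric, usually the slowest of the three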


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Mladen Gogala

I have a table EMP, with 14 rows and a description like this:
scott=> \d+ emp
                            Table "public.emp"
  Column  |            Type             | Modifiers | Storage  | Description
----------+-----------------------------+-----------+----------+-------------
 empno    | smallint                    | not null  | plain    |
 ename    | character varying(10)       |           | extended |
 job      | character varying(9)        |           | extended |
 mgr      | smallint                    |           | plain    |
 hiredate | timestamp without time zone |           | plain    |
 sal      | double precision            |           | plain    |
 comm     | double precision            |           | plain    |
 deptno   | smallint                    |           | plain    |
Indexes:
    "emp_pkey" PRIMARY KEY, btree (empno)
    "emp_mgr_i" btree (mgr)
Foreign-key constraints:
    "fk_deptno" FOREIGN KEY (deptno) REFERENCES dept(deptno)
Has OIDs: no

scott=>

A recursive query doesn't use the existing index on mgr:

scott=> explain analyze
with recursive e(empno,ename,mgr,bossname,level) as (
select empno,ename,mgr,NULL::varchar,0 from emp where empno=7839
union
select emp.empno,emp.ename,emp.mgr,e.ename,e.level+1
from emp,e
where emp.mgr=e.empno)
select * from e;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 CTE Scan on e  (cost=20.59..23.21 rows=131 width=78) (actual time=0.020..0.143 rows=14 loops=1)
   CTE e
     ->  Recursive Union  (cost=0.00..20.59 rows=131 width=52) (actual time=0.018..0.128 rows=14 loops=1)
           ->  Seq Scan on emp  (cost=0.00..1.18 rows=1 width=10) (actual time=0.013..0.015 rows=1 loops=1)
                 Filter: (empno = 7839)
           ->  Hash Join  (cost=0.33..1.68 rows=13 width=52) (actual time=0.016..0.021 rows=3 loops=4)
                 Hash Cond: (public.emp.mgr = e.empno)
                 ->  Seq Scan on emp  (cost=0.00..1.14 rows=14 width=10) (actual time=0.001..0.004 rows=14 loops=4)
                 ->  Hash  (cost=0.20..0.20 rows=10 width=44) (actual time=0.004..0.004 rows=4 loops=4)
                       Buckets: 1024  Batches: 1  Memory Usage: 1kB
                       ->  WorkTable Scan on e  (cost=0.00..0.20 rows=10 width=44) (actual time=0.001..0.002 rows=4 loops=4)
 Total runtime: 0.218 ms
(12 rows)

scott=>

The optimizer will not use the index, not even when I turn off both hash 
and merge joins. This is not particularly important for a table with 14 
rows, but for a larger table this is a problem. The only way to actually 
force the use of the index is by disabling seqscan, but that chooses a 
wrong path again, because it reads the outer table by primary key, which 
will be very slow. A full table scan done through the primary key is 
probably the slowest thing around. I know about the PostgreSQL philosophy 
which says "hints are bad", and I deeply disagree with it, but would it 
be possible to have at least one parameter that would change the 
calculations in such a way that indexes are favored where they exist?


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Mladen Gogala

On 1/27/2011 10:45 AM, Kenneth Marshall wrote:

PostgreSQL will only use an index if the planner thinks that it
will be faster than the alternative, a sequential scan in this case.
For 14 rows, a sequential scan is 1 read and should actually be
faster than the index. Did you try the query using EXPLAIN ANALYZE
once with index and once without? What were the timings? If they
do not match reality, adjusting cost parameters would be in order.


I did. I even tried with an almost equivalent outer join:

explain analyze select e1.empno,e1.ename,e2.empno,e2.ename
from emp e1 left outer join emp e2 on (e1.mgr=e2.empno);
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..7.25 rows=14 width=16) (actual time=0.028..0.105 rows=14 loops=1)
   Join Filter: (e1.mgr = e2.empno)
   ->  Seq Scan on emp e1  (cost=0.00..2.14 rows=14 width=10) (actual time=0.006..0.010 rows=14 loops=1)
   ->  Materialize  (cost=0.00..2.21 rows=14 width=8) (actual time=0.001..0.003 rows=14 loops=14)
         ->  Seq Scan on emp e2  (cost=0.00..2.14 rows=14 width=8) (actual time=0.001..0.005 rows=14 loops=1)
 Total runtime: 0.142 ms
(6 rows)

This gives me the same result as the recursive version, minus the level 
column. I am porting an application from Oracle; there is a fairly large 
table that is accessed by CONNECT BY. Rewriting it as a recursive join 
is not a problem, but the optimizer doesn't really use the indexes.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Mladen Gogala

On 1/27/2011 10:51 AM, J Sisson wrote:

Also, if random_page_cost is set to default (4.0), the planner will
tend towards sequential scans.

scott=> show random_page_cost;
 random_page_cost
------------------
 1
(1 row)

scott=> show seq_page_cost;
 seq_page_cost
---------------
 2
(1 row)


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Mladen Gogala

On 1/27/2011 11:40 AM, Tom Lane wrote:

It is worth noting that EXPLAIN results should not be extrapolated
 to situations other than the one you are actually testing; for
 example, results on a toy-sized table cannot be assumed to apply to
 large tables.
Well, that's precisely what I tried. Bummer, I will have to copy a large 
table over.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Mladen Gogala

On 1/27/2011 3:10 PM, Igor Neyman wrote:


Mladen,

I don't think, this is exclusive Postgres feature.
I'm pretty sure, Oracle optimizer will do TABLE ACCESS (FULL) instead
of using index on 14-row table either.

Regards,
Igor Neyman


Well, let's see:

SQL> select * from v$version;

BANNER

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

Elapsed: 00:00:00.00
SQL> set autotrace on explain
SQL> with e(empno,ename,mgr,bossname,lev) as (
  2  select empno,ename,mgr,NULL,0 from emp where empno=7839
  3  union all
  4  select emp.empno,emp.ename,emp.mgr,e.ename,e.lev+1
  5  from emp,e
  6  where emp.mgr=e.empno)
  7  select * from e
  8  /

 EMPNO ENAME MGR BOSSNAME  LEV
-- -- -- -- --
  7839 KING  0
  7566 JONES7839 KING1
  7698 BLAKE7839 KING1
  7782 CLARK7839 KING1
  7499 ALLEN7698 BLAKE   2
  7521 WARD 7698 BLAKE   2
  7654 MARTIN   7698 BLAKE   2
  7788 SCOTT7566 JONES   2
  7844 TURNER   7698 BLAKE   2
  7900 JAMES7698 BLAKE   2
  7902 FORD 7566 JONES   2

 EMPNO ENAME MGR BOSSNAME  LEV
-- -- -- -- --
  7934 MILLER   7782 CLARK   2
  7369 SMITH7902 FORD3
  7876 ADAMS7788 SCOTT   3

14 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 2925328376

-----------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |        |    15 |   795 |     6  (17)| 00:00:56 |
|   1 |  VIEW                                      |        |    15 |   795 |     6  (17)| 00:00:56 |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST |        |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID             | EMP    |     1 |    24 |     1   (0)| 00:00:11 |
|*  4 |     INDEX UNIQUE SCAN                      | PK_EMP |     1 |       |     0   (0)| 00:00:01 |
|*  5 |    HASH JOIN                               |        |    14 |   798 |     5  (20)| 00:00:46 |
|   6 |     RECURSIVE WITH PUMP                    |        |       |       |            |          |
|   7 |     TABLE ACCESS FULL                      | EMP    |    14 |   336 |     3   (0)| 00:00:31 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("EMPNO"=7839)
   5 - access("EMP"."MGR"="E"."EMPNO")

Note
-----
   - SQL plan baseline "SQL_PLAN_1tmxjj25531vff51d791e" used for this statement


SQL> spool off


There is an INDEX UNIQUE SCAN on PK_EMP. Oracle will use an index.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Mladen Gogala

On 1/27/2011 3:37 PM, Scott Marlowe wrote:

On Thu, Jan 27, 2011 at 1:31 PM, Mladen Gogala
mladen.gog...@vmsinfo.com  wrote:

There is INDEX UNIQUE SCAN PK_EMP.  Oracle will use an index.

That's because Oracle has covering indexes.

I am not sure what you mean by "covering indexes", but I hope that for 
the larger table I have in mind, indexes will be used. For a small 
table like this, not using an index may actually be a better plan. I 
cannot compare because my development PostgreSQL cluster is on a much 
weaker machine than the development Oracle database.
I even looked into Wikipedia for the notion of "covering index", and it 
is defined as an index which contains all the data requested in a query. 
This is not the case here; EMP is not an index-organized table. The only 
index used was the primary key, which is also available in the PostgreSQL 
version of the table.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Mladen Gogala
---------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |    15 |   795 |    10  (10)| 00:01:36 |
|   1 |  VIEW                                      |      |    15 |   795 |    10  (10)| 00:01:36 |
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL                       | EMP  |     1 |    24 |     3   (0)| 00:00:31 |
|*  4 |    HASH JOIN                               |      |    14 |   798 |     7  (15)| 00:01:06 |
|   5 |     RECURSIVE WITH PUMP                    |      |       |       |            |          |
|   6 |     TABLE ACCESS FULL                      | EMP  |    14 |   336 |     3   (0)| 00:00:31 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("EMPNO"=7839)
   4 - access("E2"."MGR"="E1"."EMPNO")

SQL> spool off

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0 has a bias against indexes

2011-01-27 Thread Mladen Gogala

On 1/27/2011 4:25 PM, Scott Marlowe wrote:

On Oracle?  Then how can it get the values it needs without having to
hit the data store?


It can't. It does hit the data store.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Mladen Gogala

On 1/21/2011 12:09 PM, Robert Haas wrote:

Looks like the bad selectivity estimate there is what's killing it.
Not sure I completely understand why 9.0.2 is coming up with such a
bad estimate, though.



I would recommend setting default_statistics_target to 1024 and 
effective_cache_size to 20480MB, and seeing what happens.
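
In session form that suggestion amounts to something like this (a sketch;
re-run the problem query afterwards and compare the estimates):

SET default_statistics_target = 1024;   -- then re-ANALYZE the tables involved
SET effective_cache_size = '20480MB';
-- EXPLAIN ANALYZE the slow query again and compare row estimates to actuals.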


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-21 Thread Mladen Gogala

On 1/21/2011 12:51 PM, Robert Haas wrote:

I am starting to suspect that there is a bug in the join selectivity
logic in 9.0.  We've had a few complaints where the join was projected
to return more rows than the product of the inner side and outer side
of the join, which is clearly nonsense.  I read the function and I
don't see anything weird... and it clearly can't be too bad or we
would have had more complaints... but...


Well, the way to test it would be to take the function from 8.3, feed it 
the same arguments, and see if there is any difference in the results.


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] copy command and blobs

2011-01-20 Thread Mladen Gogala

Madhu Ramachandran wrote:

Hello,

I have a table with an OID column. I want to use the COPY command to 
insert a bunch of rows (1 million), but I am unable to specify the 
correct format for the oid type (I have .jpg files to be stored in this 
column).
Huh? oid is a keyword, an automatically generated row id, and is 
deprecated. You shouldn't be doing anything with it, much less copying it.



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-18 Thread Mladen Gogala
 width=4) (actual time=2.491..2.491 rows=273 loops=1)
 ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306 rows=273 loops=1)
 Total runtime: 553.601 ms
(33 rows)

Is there any other data I could post (pg_stat, ...) that would help?

thanx a lot.

  

Regards


Ing. Marcos Luís Ortíz Valmaseda
Linux User # 418229  PostgreSQL DBA
Centro de Tecnologías Gestión de Datos (DATEC)
http://postgresql.uci.cu
http://www.postgresql.org
http://it.toolbox.com/blogs/sql-apprentice






--
Achilleas Mantzios

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
  

Achilleas, here is the slow part from 9.0.2:

-  Hash Semi Join  
(cost=2768.00..5671.67 rows=1 width=12) (actual time=39.708..81.501 rows=1876 
loops=1)
  Hash Cond: 
(msold.marinerid = msold2.marinerid)
  Join Filter: ((msold2.id  
msold.id) AND (msold2.starttime  msold.starttime) AND ((msold.starttime - 
msold2.endtime) = '1 year 6 mons'::interval))
  -  Seq Scan on 
marinerstates msold  (cost=0.00..2889.32 rows=4590 width=20) (actual 
time=0.003..33.952 rows=2625 loops=1)
Filter: 
(((state)::text = 'Active'::text) AND ((starttime)::date = '2007-01-11'::date) 
AND ((COALESCE(endtime, now()))::date = '2006-07-15'::date))
  -  Hash  
(cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.613..39.613 rows=41250 
loops=1)
Buckets: 8192  
Batches: 1  Memory Usage: 2246kB
-  Seq Scan on marinerstates msold2  (cost=0.00..2251.66 rows=41307 width=24) (actual time=0.002..24.882 



The same part from 8.3.3 looks like this:

Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual 
time=0.017..80.153 rows=2625 loops=1)
  Filter: (((state)::text = 'Active'::text) AND 
((starttime)::date = '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date 
= '2006-07-15'::date))
-  Hash  (cost=630491.54..630491.54 rows=7103 width=23) 
(actual time=437.307..437.307 rows=12832 loops=1)
  -  Index Scan using mariner_pkey on mariner m  
(cost=628776.89..630491.54 rows=7103 width=23) (actual time=311.023..380.168 
rows=12832 loops=1)
Filter: ((NOT (hashed subplan)) AND 
((marinertype)::text = 'Mariner'::text))
SubPlan
  -  Unique  (cost=0.00..628772.30 rows=1834 
width=4) (actual time=0.129..303.981 rows=1454 loops=1)
-  Nested Loop  (cost=0.00..628767.72 
rows=1834 width=4) (actual time=0.120..289.961 rows=1835 loops=1)


This leads me to the conclusion that the queries differ significantly. 
The 8.3.3 plan mentions NOT (hashed subplan), I don't see it in 9.0.2, and 
the filtering conditions look different. Are you sure that the plans are 
from the same query?


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

2011-01-17 Thread Mladen Gogala

Achilleas Mantzios wrote:

From the whole set of the tests involved, it seems like the NOT IN version of 
the query runs slow
in any postgresql 9.0.2 tested.
  

Not only that, it will run slower even using Oracle 11.2 or MySQL 5.5.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-17 Thread Mladen Gogala

Tom Lane wrote:
If I'm not mistaken, that's a DB2-ish locution 


It could also be a part of the Oracle vernacular. I've seen queries like 
that running against Oracle RDBMS, too.



for a query with DISTINCT
ON, ie, you're looking for the row with highest _revision_created for
each value of id.  It might perform well on DB2, but it's going to
mostly suck on Postgres --- we don't optimize window-function queries
very much at all at the moment.  
Hmmm, what optimizations do you have in mind? I thought that window 
functions were just clever tricks with memory. Is anything to be 
expected for 9.0.x?




Try writing it with DISTINCT ON instead
of a window function, like so:
  

Wouldn't DISTINCT necessarily bring about a sort/merge?
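
For what it's worth, a minimal sketch of the rewrite being suggested, so the 
two forms can be compared side by side (the table name doc is hypothetical; 
id and _revision_created are the columns from the query under discussion):

-- window-function form: keep the newest revision per id ("doc" is a placeholder)
SELECT *
FROM (SELECT d.*,
             row_number() OVER (PARTITION BY id
                                ORDER BY _revision_created DESC) AS rn
      FROM doc d) s
WHERE rn = 1;

-- DISTINCT ON form of the same thing
SELECT DISTINCT ON (id) *
FROM doc
ORDER BY id, _revision_created DESC;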

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The good, old times

2011-01-14 Thread Mladen Gogala

Craig Ringer wrote:

On 01/12/2011 10:16 PM, Guillaume Cottenceau wrote:

  

What's your point and in what is it related to that ML?



Given the package names, I suspect this is a poorly-expressed complaint 
about the performance of downloads from the pgdg/psqlrpms site. If that 
was the original poster's intent, they would've been better served with 
a post that included some minimal details like:
  

Yes, it was a complaint about the download speed.



- Information abut their local connectivity
- mtr --report / traceroute output
- tests from other available hosts
  


As for the traceroute information, here it is:
traceroute yum.pgrpms.org
traceroute to yum.pgrpms.org (77.79.103.58), 30 hops max, 40 byte packets
1  216.169.135.254 (216.169.135.254)  0.389 ms  0.404 ms  0.451 ms
2  host189.131.26.216.vmsinfo.com (216.26.131.189)  9.355 ms  9.357 ms  
9.368 ms

3  v11.lc2.lou.peak10.net (216.26.190.10)  9.645 ms  9.645 ms  9.637 ms
4  ge-7-41.car1.Cincinnati1.Level3.net (4.53.64.41)  13.002 ms  13.002 
ms  13.018 ms
5  ae-2-5.bar1.Cincinnati1.Level3.net (4.69.132.206)  13.101 ms  13.098 
ms  13.087 ms
6  ae-10-10.ebr2.Chicago1.Level3.net (4.69.136.214)  22.096 ms  21.358 
ms  21.329 ms
7  ae-1-100.ebr1.Chicago1.Level3.net (4.69.132.41)  27.729 ms  10.812 
ms  24.132 ms
8  ae-2-2.ebr2.NewYork2.Level3.net (4.69.132.66)  34.008 ms  33.960 ms  
34.088 ms
9  ae-1-100.ebr1.NewYork2.Level3.net (4.69.135.253)  34.152 ms  35.353 
ms  37.068 ms
10  ae-4-4.ebr1.NewYork1.Level3.net (4.69.141.17)  36.998 ms  37.248 ms  
36.986 ms
11  ae-43-43.ebr2.London1.Level3.net (4.69.137.73)  107.031 ms 
ae-42-42.ebr2.London1.Level3.net (4.69.137.69)  104.624 ms  107.000 ms
12  ae-2-52.edge4.London1.Level3.net (4.69.139.106)  107.506 ms  106.993 
ms  180.229 ms

13   (195.50.122.174)  168.849 ms  160.917 ms  161.713 ms
14  static.turktelekom.com.tr (212.156.103.42)  176.503 ms  179.012 ms  
179.394 ms
15  gayrettepe-t3-1-gayrettepe-t2-1.turktelekom.com.tr (212.156.118.29)  
167.867 ms  167.870 ms  167.862 ms

16  88.255.240.110 (88.255.240.110)  167.515 ms  168.172 ms  165.829 ms
17  ns1.gunduz.org (77.79.103.58)  171.574 ms !X * *
[mgogala@lpo-postgres-d01 ~]$

Are there any good mirrors?  Apparently, there is something slow in the 
force.





If that wasn't the original poster's intent, perhaps it'd be worth a 
second try to explain what they were *trying* to say? Was it just a joke 
- 'cos if so, it was kinda flat.


--
Craig Ringer

  



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] queries with lots of UNIONed relations

2011-01-13 Thread Mladen Gogala

On 1/13/2011 5:41 PM, Robert Haas wrote:

You might be right, but I'm not sure.  Suppose that there are 100
inheritance children, and each has 10,000 distinct values, but none of
them are common between the tables.  In that situation, de-duplicating
each individual table requires a hash table that can hold 10,000
entries.  But deduplicating everything at once requires a hash table
that can hold 1,000,000 entries.

Or am I all wet?



Have you considered using Google's map-reduce framework for things like 
that? Union and group functions look like ideal candidates for such a 
thing. I am not sure whether map-reduce can be married to a relational 
database, but I must say that I was impressed with the speed of MongoDB. 
I am not suggesting that PostgreSQL should sacrifice its ACID compliance 
for speed, but Mongo sure does look like a speeding bullet.
On the other hand, the algorithms that have been parallelized for a long 
time are precisely the sort/merge and hash algorithms used for UNION and 
GROUP BY. This is what I have in mind:

http://labs.google.com/papers/mapreduce.html

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] The good, old times

2011-01-12 Thread Mladen Gogala

I am running a postgres update on one of my machines:

Downloading Packages:
(1/7): postgresql90-plpython-9.0.2-2PGDG.rhel5.x86_64.rp |  50 kB 
00:02 
(2/7): postgresql90-plperl-9.0.2-2PGDG.rhel5.x86_64.rpm  |  51 kB 
00:03 
(3/7): postgresql90-libs-9.0.2-2PGDG.rhel5.x86_64.rpm| 217 kB 
00:14 
(4/7): postgresql90-contrib-9.0.2-2PGDG.rhel5.x86_64.rpm | 451 kB 
00:40 
(5/7): postgresql90-9.0.2-2PGDG.rhel5.x86_64.rpm | 1.4 MB 
01:57 
(6/7): postgresql90-devel-9.0.2-2PGDG.rhel5.x86_64.rpm   | 1.6 MB 
02:48 
(7/7): postgresql90-se (68%) 44% [=   ] 7.0 kB/s | 2.2 MB 
06:33 ETA


7 kilobytes per second???  That brings back the times of the good, old 
9600 USR modems and floppy disks.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT .. WHERE NOT IN query running for hours

2011-01-10 Thread Mladen Gogala

Γιωργος Βαλκανας wrote:


Are there any particular semantics for the NOT IN statement that 
cause the correlated query to execute for every row of the outter 
query, as opposed to the NOT EXISTS ? Or are there any other 
practical reasons, related to IN / NOT IN, for this to be happening? 
Or is it simply due to implementation details of each RDBMS? I guess 
the former (or the 2nd one), since, as you say, this is common in most 
databases, but I would most appreciate an answer to clarify this.


Thanks again!

Best regards,
George





Well, I really hoped that Bruce, Robert or Greg would take this one on, 
but since there are no more qualified takers, I'll take a shot at it. 
With NOT IN (result of a correlated sub-query), the sub-query 
needs to be executed for every row matching the conditions on the 
driving table, while NOT EXISTS is just the complement of a join. It's 
all in the basic set theory which serves as the model for relational 
databases.
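
Purely as an illustration of the two forms (the table and column names below 
are hypothetical, not taken from George's schema):

-- correlated NOT IN: the inner result is checked for every qualifying outer row
SELECT o.id
FROM   orders o
WHERE  o.customer_id NOT IN (SELECT b.customer_id FROM blacklist b);

-- NOT EXISTS: a plain anti-join (equivalent to the above only when
-- b.customer_id can never be NULL)
SELECT o.id
FROM   orders o
WHERE  NOT EXISTS (SELECT 1 FROM blacklist b
                   WHERE  b.customer_id = o.customer_id);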


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT .. WHERE NOT IN query running for hours

2011-01-06 Thread Mladen Gogala

On 1/6/2011 9:36 PM, Γιωργος Βαλκανας wrote:


1) Why is it taking *so* long for the first query (with the NOT IN ) 
to do even the simple select?
Because NOT IN has to execute the correlated subquery for every row and 
then check whether the requested value is in the result set, usually by 
doing a sequential comparison. The NOT EXISTS plan is also bad because 
there is no index, but at least it can use a very fast and efficient hash 
algorithm. Indexing the hwdocid column on the Document table or, 
ideally, making it a primary key, should provide an additional boost to 
your query. If you already do have an index, you may consider setting 
enable_seqscan=false for this session, so that the hwdocid index will 
be used. It's common wisdom that in most cases NOT EXISTS will 
beat NOT IN. That is so all over the database world. I've seen it in 
Oracle applications, MS SQL applications and, of course, MySQL 
applications. Optimizing queries is far from trivial.
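
A minimal sketch of those two suggestions (only hwdocid and the Document 
table are taken from the thread; the index name is illustrative, so adjust 
to the real schema):

-- give the anti-join an index to probe (index name is illustrative)
CREATE INDEX document_hwdocid_idx ON document (hwdocid);

-- per-session planner nudge, not a permanent setting
SET enable_seqscan = off;

-- then re-run the query under EXPLAIN ANALYZE and compare the plans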


Μλαδεν Γογαλα

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU bound

2011-01-03 Thread Mladen Gogala

Jim Nasby wrote:

On Dec 20, 2010, at 12:47 AM, Mladen Gogala wrote:
  

Good time accounting is the most compelling reason for having a wait event 
interface, like Oracle. Without the wait event interface, one cannot really 
tell where the time is spent, at least not without profiling the database code, 
which is not an option for a production database.



Out of curiosity, have you tried using the information that Postgres exposes to 
dtrace? I suspect it comes close to what you can get directly out of Oracle...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



  
No, I haven't, but I looked it up in the documentation. I surmise, however, 
that I will have to build my software with --enable-dtrace, which is 
not enabled by default. This certainly looks promising.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] long wait times in ProcessCatchupEvent()

2010-12-29 Thread Mladen Gogala

On 12/29/2010 2:58 PM, Tom Lane wrote:

It happened for you on 8.4 too?  In that previous thread you were still
on 8.3.  If you did see it on 8.4 then it wasn't sinval ...

regards, tom lane

May I ask what exactly sinval is? I took a look at Craig's problem and 
your description, but I wasn't able to figure out what the sinval lock is 
and what it locks. I apologize if the question is stupid.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] concurrent IO in postgres?

2010-12-25 Thread Mladen Gogala

Jeff Janes wrote:

If the background writer cannot keep up, then the individual backends
start doing writes as well, so it isn't really serialized..

  
Is there any parameter governing that behavior? Can you tell me where in 
the code (version 9.0.2) I can find that? Thanks.
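
For reference, a hedged pointer rather than a definitive answer: the 
background writer's own pacing in 9.0 is controlled by a handful of GUCs 
(the defaults noted in the comments are the stock 9.0 values and are worth 
verifying on the actual server):

SHOW bgwriter_delay;            -- default 200ms: sleep between rounds
SHOW bgwriter_lru_maxpages;     -- default 100: max buffers written per round
SHOW bgwriter_lru_multiplier;   -- default 2.0: how far ahead of demand to clean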


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU bound

2010-12-21 Thread Mladen Gogala

On 12/20/2010 10:33 AM, James Cloos wrote:


And how exactly, given that the kernel does not know whether the CPU is
active or waiting on ram, could an application do so?

-JimC

That particular aspect will remain hidden; it's the domain of the hardware
architecture. Nevertheless, there are things, like waiting on I/O or
waiting on a lock, that would be extremely useful to account for.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Performance of PostgreSQL over NFS

2010-12-21 Thread Mladen Gogala
I was asked about the performance of PostgreSQL on NetApp; the protocol 
would be NFSv3.  Has anybody tried it? The database in question is a DW 
type, a bunch of documents indexed by Sphinx. Does anyone have any 
information?

--


Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance of PostgreSQL over NFS

2010-12-21 Thread Mladen Gogala

Rich wrote:
I am wondering why anyone would do that?  Too much overhead and not 
reliable enough.


Apparently, NetApp thinks that it is reliable; they've been selling that 
stuff for years.  I know that Oracle works with NetApp, they even have 
their own user-mode NFS client driver. I am not sure about PostgreSQL. 
Has anybody tried that?


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU bound

2010-12-19 Thread Mladen Gogala

On 12/19/2010 7:57 PM, James Cloos wrote:

RA == Royce Ausburnro...@inomial.com  writes:

RA  I notice that when restoring a DB on a laptop with an SDD,
RA  typically postgres is maxing out a CPU - even during a COPY.

The time the CPUs spend waiting on system RAM shows up as CPU
time, not as Wait time.  It could be just that the SSD is fast
enough that the RAM is now the bottleneck, although parsing
and text-to-binary conversions (especially for integers, reals
and anything stored as an integer) also can be CPU-intensive.

-JimC


Good time accounting is the most compelling reason for having a wait 
event interface, like Oracle. Without the wait event interface, one 
cannot really tell where the time is spent, at least not without 
profiling the database code, which is not an option for a production 
database.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index Bloat - how to tell?

2010-12-17 Thread Mladen Gogala

Robert Haas wrote:


This doesn't really match my definition of the word fragmentation, though...

  
Same here. However, I did run reindex on one table and this indicator 
did drop to 0. I will shoot an email to the author; he's probably 
smarter than I am and will be able to provide a reasonable explanation.


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index Bloat - how to tell?

2010-12-16 Thread Mladen Gogala

Dave Crooke wrote:
There is a plugin called pgstattuple which can be quite informative; 
however, it actually does a full scan of the table / index files, 
which may be a bit invasive depending on your environment and load.


http://www.postgresql.org/docs/current/static/pgstattuple.html

It's in the contrib (at least for 8.4), and so you have to import its 
functions into your schema using the script in the contrib directory.


Cheers
Dave

I tried it with one of my databases:


testtrack=# select * from pgstatindex('public.defects_pkey');
version | tree_level | index_size | root_block_no | internal_pages | 
leaf_pages | empty_pages | deleted_pages | avg_leaf_density | 
leaf_fragmentation


-+++---+++-+---+--+---
-
  2 |  1 | 827392 | 3 |  0 
|100 |   0 | 0 |70.12 
| 22

(1 row)


What is leaf_fragmentation? How is it defined? I wasn't able to find 
any definition of that number, or of how it is calculated. I verified that 
running reindex makes it 0:



testtrack=# reindex table public.defects;
REINDEX
testtrack=# select * from pgstatindex('public.defects_pkey');
version | tree_level | index_size | root_block_no | internal_pages | 
leaf_pages | empty_pages | deleted_pages | avg_leaf_density | 
leaf_fragmentation


-+++---+++-+---+--+---
-
  2 |  1 | 647168 | 3 |  0 
| 78 |   0 | 0 |89.67 
|  0

(1 row)


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index Bloat - how to tell?

2010-12-14 Thread Mladen Gogala

Can you explain this query a bit? It isn't at all clear to me.


Plugge, Joe R. wrote:

I have used this in the past ... run this against the database that you want to 
inspect.


SELECT
  current_database(), schemaname, tablename, /*reltuples::bigint, 
relpages::bigint, otta,*/
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS 
tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS 
wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric 
END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
  SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
  (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma 
END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, 
COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- 
very rough approximation, assumes all cols
  FROM (
SELECT
  ma,bs,schemaname,tablename,
  (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma 
END)))::numeric AS datahdr,
  (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma 
END))) AS nullhdr2
FROM (
  SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
  SELECT 1+count(*)/8
  FROM pg_stats s2
  WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename 
= s.tablename
) AS nullhdr
  FROM pg_stats s, (
SELECT
  (SELECT current_setting('block_size')::numeric) AS bs,
  CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 
END AS hdr,
  CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
  ) AS constants
  GROUP BY 1,2,3,4,5
) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND 
nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of John W Strange
Sent: Tuesday, December 14, 2010 8:48 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Index Bloat - how to tell?

How can you tell when your indexes are starting to get bloated and when you 
need to rebuild them?  I haven't seen a quick way to tell and am not sure if it's 
being tracked.



___

| John W. Strange | Investment Bank | Global Commodities Technology 


| J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C: 281-744-6476 | 
F: 713 236-

| john.w.stra...@jpmchase.com | jpmorgan.com






--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql

Re: [PERFORM] Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

2010-12-04 Thread Mladen Gogala

Tom Lane wrote:

Hmm.  Rules?  Triggers?  You seem to be assuming the problem is at the
planner stage but I'm not sure you've proven that.

regards, tom lane

  
Hmmm, I vaguely recollect a similar thread, started by me, although with 
fewer partitions. In my experience, the planner doesn't do a very good job 
with partitions, especially with things like min or max, which should 
not be resolved by a full table scan if there are indexes on the partitions.


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

2010-12-04 Thread Mladen Gogala

Tom Lane wrote:

Partitioning using these techniques will work well with up to
perhaps a hundred partitions; don't try to use many thousands of
	partitions. 


regards, tom lane
  
Hmmm, what happens if I need 10 years of data in monthly partitions? That 
would be 120 partitions. Can you please elaborate on that limitation? 
Are there any plans on lifting that restriction?
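
For context, a minimal sketch of one monthly partition under the 
inheritance-based scheme the documentation describes (the table and column 
names here are purely illustrative):

-- parent table (names are illustrative only)
CREATE TABLE measurement (
    logdate   date NOT NULL,
    peaktemp  integer
);

-- one child per month, constrained so constraint_exclusion can prune it
CREATE TABLE measurement_2010_12 (
    CHECK (logdate >= DATE '2010-12-01' AND logdate < DATE '2011-01-01')
) INHERITS (measurement);

CREATE INDEX measurement_2010_12_logdate_idx
    ON measurement_2010_12 (logdate);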


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mladen Gogala




Mario Splivalo wrote:

  
Yes, as Mladen Gogala had advised. No noticeable change in performance -
it's still slow :)
  


Declaring constraints as deferrable doesn't do anything by itself; you
have to actually set the constraints deferred to have an effect, and you
have to do it within a transaction block. If done outside of a
transaction block, there is no effect:

This is what happens when "set constraints" is issued outside the
transaction block:

scott=# create table test1
scott-# (col1 integer,
scott(#  constraint test1_pk primary key(col1) deferrable);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"test1_pk" for table "test1"
CREATE TABLE
Time: 41.218 ms
scott=# set constraints all deferred;   
SET CONSTRAINTS
Time: 0.228 ms
scott=# begin;  
BEGIN
Time: 0.188 ms
scott=#  insert into test1 values(1);   
INSERT 0 1
Time: 0.929 ms
scott=#  insert into test1 values(1);   
ERROR:  duplicate key value violates unique constraint "test1_pk"
DETAIL:  Key (col1)=(1) already exists.
scott=# end;
ROLLBACK
Time: 0.267 ms
scott=# 
  

It works like a charm when issued within the transaction block:
scott=# begin;  
BEGIN
Time: 0.202 ms
scott=# set constraints all deferred;   
SET CONSTRAINTS
Time: 0.196 ms
scott=#  insert into test1 values(1);   
INSERT 0 1
Time: 0.334 ms
scott=#  insert into test1 values(1);   
INSERT 0 1
Time: 0.327 ms
scott=# end;
ERROR:  duplicate key value violates unique constraint "test1_pk"
DETAIL:  Key (col1)=(1) already exists.
scott=# 

I was able to insert the same value twice, it only failed at the end of
the transaction.

  
But, just for the sake of clarification - I thought that DEFERRABLE would
matter if I do a lot of INSERTs, inside a FOR loop or something like
that. Since I'm doing INSERT INTO ... SELECT, does it make any difference?
  

You cannot tell which part takes a long time, select or insert, without
profiling. I certainly cannot do it over the internet.
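
If it helps, a minimal sketch of one way to profile it locally (the statement 
and table names are taken from the original post in this thread; run it in a 
throwaway transaction so the measurement does not keep the rows):

BEGIN;
EXPLAIN ANALYZE
    INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
                                drone_temperature, drone_pressure)
    SELECT * FROM tmpUpdate;
-- the SELECT part is timed in the plan nodes, while the FK checks show up
-- as separate "Trigger for constraint ..." times at the bottom
ROLLBACK;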

-- 
 
Mladen Gogala 
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions







[PERFORM] Clarification, please

2010-12-01 Thread Mladen Gogala
In Oracle, deferrable primary keys are enforced by non-unique indexes. 
That seems logical, because the index should tolerate duplicate values for 
the duration of the transaction:


   Connected to:
   Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
   With the Partitioning, OLAP, Data Mining and Real Application
   Testing options

   SQL> create table test1
  2  (col1 integer,
  3   constraint test1_pk primary key(col1) deferrable);

   Table created.

   Elapsed: 00:00:00.35
   SQL> select uniqueness from user_indexes where index_name='TEST1_PK';

   UNIQUENES
   -
   NONUNIQUE

PostgreSQL 9.0, however, creates a unique index:

   scott=# create table test1
   scott-# (col1 integer,
   scott(#  constraint test1_pk primary key(col1) deferrable);
   NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
   test1_pk for table test1
   CREATE TABLE
   Time: 67.263 ms
   scott=# select indexdef from pg_indexes where indexname='test1_pk';
indexdef
   --

CREATE UNIQUE INDEX test1_pk ON test1 USING btree (col1)
   (1 row)

When the constraint is deferred in the transaction block, however, it 
tolerates duplicate values until the end of transaction:


   scott=# begin;
   BEGIN

   Time: 0.201 ms
   scott=# set constraints test1_pk  deferred;
   SET CONSTRAINTS
   Time: 0.651 ms
   scott=# insert into test1 values(1);
   INSERT 0 1
   Time: 1.223 ms
   scott=# insert into test1 values(1);
   INSERT 0 1
   Time: 0.390 ms
   scott=# rollback;
   ROLLBACK
   Time: 0.254 ms
   scott=#


No errors here. How is it possible to insert the same value twice into a 
UNIQUE index? What's going on here?



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Clarification, please

2010-12-01 Thread Mladen Gogala

Richard Broersma wrote:


It looks like the check isn't performed until COMMIT.

  
So, the index is not actually updated until commit? Hmmm, that seems 
unlikely.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mladen Gogala

Mario Splivalo wrote:
I'll try what Pierre suggested, on whole new filesystem. This one did 
get quite filled with thousands of files that I deleted while the 
database was working.


Mario
  


Yes, that is a good idea. That's the reason why we need a 
defragmentation tool on Linux. Unfortunately, the only file system that 
currently has a decent defragmentation tool is XFS and that is a paid 
option, at least with Red Hat. Greg Smith has recently posted a 
wonderful review of PostgreSQL on various file systems:


http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html

There is an operating system which comes with a very decent extent-based 
file system and a defragmentation tool, included in the OS. The file 
system is called NTFS and the company is in the land of Redmond, WA, where 
the shadows lie. One OS to rule them all...


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mladen Gogala

Kenneth Marshall wrote:

Redhat6 comes with ext4 which is an extent based filesystem with
decent performance.

Ken
  
But e4defrag is still not available. And, of course, Red Hat 6 is still 
not available, either.  Maybe Red Hat 7 will do the trick? I assume it 
will work beautifully with PostgreSQL 15.0.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mladen Gogala

Kevin Grittner wrote:

Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
  
 
Been there, done that.  Not only was performance quite poor compared

to Linux, but reliability and staff time to manage things suffered
in comparison to Linux.
 
  
I must say that I am quite impressed with Windows 7 servers, especially 
the 64-bit version. Unfortunately, I don't have any PostgreSQL instances on 
those, but Exchange works very, very well. Also, personal impressions 
from clicking around and running office applications are quite good. Don't 
get me wrong, I am an old Unix/Linux hack and I would like nothing better 
than to see Linux succeed, but I don't like what I see.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Mladen Gogala
 5k new drones per CSV).


INSERT INTO public.drones (drone_id, drone_log_notice, coordinates, 
drone_temperature, drone_temperature)
SELECT drone_id, log_notice, point(coord_x, coord_y) as coordinates, 
temp, press FROM tmpNew;
INSERT INTO public.drones_history (sample_id, drone_id, 
drone_log_notice, drone_temperature, drone_pressure)

SELECT a_sample_id, drone_id, log_notice, temp, pressue FROM tmpNew;

This is also done in miliseconds.

Now, I 'update' data for the existing drones, and fill in the history 
table on those drones. First I create temporary table with just the 
changed rows:


CREATE TEMPORARY TABLE tmpUpdate ON COMMIT DROP AS
SELECT a_batch_id, t.drone_id, t.log_notice, t.temp, t.press
FROM temporary.drones t
JOIN public.drones p
ON t.drone_id = p.drone_id
WHERE p.drone_log_notice != t.log_notice OR p.temp != t.temp OR p.press 
!= t.press;


Now, that part is also fast. I usually have around 100-1000 drones that 
changed 'state', but sometimes I get even half of the drones changing 
states (around 50k), and creation of the tmpUpdate takes no more than ten 
to twenty milliseconds.


This is the slow part:
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, 
drone_temperature, drone_pressure)

SELECT * FROM tmpUpdate;

For 100 rows this takes around 2 seconds. For 1000 rows this takes 
around 40 seconds. For 5000 rows this takes around 5 minutes.
For 50k rows this takes around 30 minutes! Now this is where I start to lag, 
because I get a new CSV every 10 minutes or so.


And the last part is to update the actual drones table:
UPDATE public.drones p
SET drone_log_notice = t.log_notice, drone_temperature = t.temp, 
drone_pressure = t.press

FROM temporary.drones t
WHERE t.drone_id = p.drone_id
AND (t.log_notice != p.drone_log_notice OR t.temp != p.drone_temperature 
OR p.press != t.drone_pressure);


This is also very fast; even when almost half the table is updated, the 
UPDATE takes around 10 seconds. Usually it's around 30-50 ms.


The machine I'm doing this has 4 GB of RAM, dual-Xeon something (3GHz). 
Two SAS drives in mirror, capable of around 100 MB/s in sequential r/w 
(i know it means nothing, but just to get an idea).


Database is around 2 GB in size (pg_database_size). When I dump/recreate 
the database I can speed things up a bit, but after half a day of 
operations the INSERTs are slow again.
When I do a dump/restore of the database I get around 40-50 MB/sec 
reading/writing from the disk (COPYing data, PK/FK constraint creation), 
but when that INSERT gets stuck io-wait goes sky-high - iostat shows 
that Postgres is mainly reading from the disks, around 800k/sec - 1024k/sec.


I've set shared_buffers to 256M, work_mem to 96M, wal_buffers to 16M and 
checkpoint_segments to 16. I've turned off autovacuum; I do an 
analyze/vacuum after each insert job is done, after TRUNCATEing the 
temporary.drones table.


Out of despair I tried to set fsync=off, but that gave me just a small 
performance improvement.


When I remove foreign constraints (drones_history_fk__samples and 
drones_history_fk__drones) (I leave the primary key on drones_history) 
than that INSERT, even for 50k rows, takes no more than a second.


So, my question is - is there anything I can do to make INSERTS with PK 
faster? Or, since all the reference checking is done inside the 
procedure for loading data, shall I abandon those constraints entirely?


Mario


  



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Mladen Gogala
I thought that I had seen an announcement about SQL Server for Linux on 
04/01/2005? I cannot find the link right now, but I am quite certain that there 
was such an announcement.


From: pgsql-performance-ow...@postgresql.org 
pgsql-performance-ow...@postgresql.org
To: Tomas Vondra t...@fuzzy.cz
Cc: pgsql-performance@postgresql.org pgsql-performance@postgresql.org
Sent: Wed Nov 17 15:51:55 2010
Subject: Re: [PERFORM] Query Performance SQL Server vs. Postgresql

I have to concur.  SQL Server is written specifically and only for Windows; it is 
optimized for Windows.  Postgres is written for just about everything, trying 
to use common code, so there isn't much optimization because it has to be 
optimized based on the OS that is running it.  Check out your config and send 
it to us.  That would include the OS and hardware configs for both machines.

On Wed, Nov 17, 2010 at 3:47 PM, Tomas Vondra 
t...@fuzzy.czmailto:t...@fuzzy.cz wrote:
On 17.11.2010 05:47, Pavel Stehule wrote:
 2010/11/17 Humair Mohammed huma...@hotmail.commailto:huma...@hotmail.com:

 There are no indexes on the tables either in SQL Server or Postgresql - I am
 comparing apples to apples here. I ran ANALYZE on the postgresql tables,

Actually no, you're not comparing apples to apples. You've provided so
little information that you may be comparing apples to cucumbers or
maybe some strange animals.

1) info about the install

What OS is this running on? I guess it's Windows in both cases, right?

How much memory is there? What is the size of shared_buffers? The
default PostgreSQL setting is very, very limited; you have to bump
it to a much larger value.

What are the other important settings (e.g. the work_mem)?

2) info about the dataset

How large are the tables? I don't mean number of rows, I mean number of
blocks / occupied disk space. Run this query

SELECT relname, relpages, reltuples, pg_size_pretty(pg_table_size(oid))
FROM pg_class WHERE relname IN ('table1', 'table2');

3) info about the plan

Please, provide EXPLAIN ANALYZE output, maybe with info about buffers,
e.g. something like

EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ...

4) no indexes ?

Why have you decided not to use any indexes? If you want a decent
performance, you will have to use indexes. Obviously there is some
overhead associated with them, but it's premature optimization unless
you prove the opposite.

BTW I'm not a MSSQL expert, but it seems like it's building a bitmap
index on the fly, to synchronize parallelized query - PostgreSQL does
not support that.

regards
Tomas

--
Sent via pgsql-performance mailing list 
(pgsql-performance@postgresql.orgmailto:pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-16 Thread Mladen Gogala

Josh Berkus wrote:

On 11/16/10 12:39 PM, Greg Smith wrote:
  

I want to next go through and replicate some of the actual database
level tests before giving a full opinion on whether this data proves
it's worth changing the wal_sync_method detection.  So far I'm torn
between whether that's the right approach, or if we should just increase
the default value for wal_buffers to something more reasonable.



We'd love to, but wal_buffers uses sysV shmem.

  

Speaking of the SYSV SHMEM, is it possible to use huge pages?

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MVCC performance issue

2010-11-14 Thread Mladen Gogala

Marti Raudsepp wrote:



Another advantage of Oracle's approach seems that they need much less
tuple-level overhead. IMO the 23-byte tuple overhead is a much bigger
drawback in Postgres than table fragmentation.

Regards,
Marti

  
Oracle, however, does have a problem with ORA-1555 Snapshot too old, 
precisely because of their implementation of MVCC. In other words, if 
your query is running long and Oracle is not able to reconstruct the old 
rows from the UNDO segments, you're out of luck and your query will die. 
The greatest burden of the Postgres implementation is the fact that 
there is no row id, so that the table header and the indexes need to be 
updated much more frequently than is the case with Oracle.


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MVCC performance issue

2010-11-13 Thread Mladen Gogala

Craig Ringer wrote:

It sounds like you're describing Oracle-style MVCC, using redo logs.

http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/

  
Craig, this is an interesting blog page, making some valid points about 
the multiversioning vs. locking. The ATM example, however, is 
unrealistic and couldn't have happened the way the author describes. 
Oracle has the same write consistency mechanism as Postgres and it 
restarts the transaction if the transaction blocks were updated while 
the transaction was waiting. In other words, the wife's transaction 
would have been restarted before committing, the transaction would get 
the balance accurately and there wouldn't be a loss of $250.
Such an example is naive, sheer FUD. If that were the case, no bank in 
the whole wide world would be using Oracle, and many of them do; I dare 
say many more are using Oracle than Sybase. That means that they're not 
losing money if two spouses decide to withdraw money from a joint 
account simultaneously. Given the number of people in the world, I 
imagine that to be a rather common and ordinary situation for banks. 
The example is plain silly. Here is what I have in mind as write 
consistency:

http://www.postgresql.org/docs/9.0/static/transaction-iso.html#XACT-READ-COMMITTED:
 If the first updater rolls back, then its effects are negated and the 
second updater can proceed with updating the originally found row. If 
the first updater commits, the second updater will ignore the row if the 
first updater deleted it, otherwise it will attempt to apply its 
operation to the updated version of the row. The search condition of the 
command (the WHERE clause) is re-evaluated to see if the updated version 
of the row still matches the search condition.


Essentially the same behavior is described here, for Oracle:
http://tkyte.blogspot.com/2005/08/something-different-part-i-of-iii.html
Obviously, we cannot modify an old version of a block—when we go to 
modify a row, we must modify the current version of that block. 
Additionally, Oracle cannot just simply skip this row, as that would be 
an inconsistent read and unpredictable. What we’ll discover is that in 
such cases, Oracle will restart the write modification from scratch.


Postgres re-evaluates the WHERE condition, Oracle restarts the entire 
transaction, but neither MVCC mechanism would allow for the silly ATM 
example described in the blog. Both databases would have noticed the change 
in the balance, and both would have ended with the proper balance 
in the account.


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-11 Thread Mladen Gogala

Kenneth Marshall wrote:

I agree with the goal of avoiding the need for a GUC. This needs to
be as automatic as possible. One idea I had had was computing a value
for the amount of cache data in the system by keeping a sum or a
weighted sum of the table usage in the system. Smaller tables and
indexes would contribute a smaller amount to the total, while larger
indexes and tables would contribute a larger amount. Then by comparing
this running total to the effective_cache_size, set the random and
sequential costs for a query. This would allow the case of many 4MB
tables to favor disk I/O more than memory I/O. The weighting could
be a function of simultaneous users of the table. I know this is a
bit of hand-waving but some sort of dynamic feedback needs to be
provided to the planning process as system use increases.

Regards,
Ken

  
Kenneth, you seem to be concerned only with the accuracy of the planning 
process, not with plan stability. As a DBA who has to monitor real-world 
applications, I find things like an execution plan changing with 
the use of the system to be my worst nightmare. The part where you say 
that this needs to be as automatic as possible probably means that I 
will not be able to do anything about it if the optimizer, by any 
chance, doesn't get it right. That looks to me like an entirely wrong 
way to go.
When an application developer tunes the SQL, both he and I expect that SQL 
to always perform that way, not to change the execution plan because the 
system is utilized more than it was an hour ago. Nobody seems to have 
taken seriously my suggestion about having a parameter 
which would simply invent the percentage out of thin air, 
because it's obviously not accurate.
However, planner accuracy is not the only concern. Running 
applications on the system usually requires plan stability. Means of 
external control over the execution plan, DBA knobs and buttons that can 
be turned and pushed to produce the desired plan, are also very much desired.


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-11 Thread Mladen Gogala

Kevin Grittner wrote:

Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
  

create a definitive bias toward one type of the execution plan.

 
We're talking about trying to support the exact opposite.  
I understand this; that is precisely the reason for my intervention into 
a discussion of experts, which I am not.

This all
started because a database which was tuned for good response time
for relatively small queries against a hot portion of some tables
chose a bad plan for a weekend maintenance run against the full
tables.  We're talking about the possibility of adapting the cost
factors based on table sizes as compared to available cache, to more
accurately model the impact of needing to do actual disk I/O for
such queries.
  
Kevin, in my experience, the hardest thing to do is to tune so-called 
mixed-type databases. In practice, databases are usually separated: the OLTP 
database on one group of servers, the reporting database and the data 
warehouse on another group of servers. Postgres 9.0 has made a great 
stride toward such a possibility with the new replication facilities. 
Again, having an optimizer which will choose the plan completely 
accurately is, at least in my opinion, less important than having the 
possibility of manual control, the aforementioned knobs and buttons, 
to produce the same plan for the same statement. Trying to make the 
optimizer smart enough for all types of loads is akin to looking for the 
Holy Grail. Inevitably, you will face some hard questions, like the one 
about the airspeed velocity of an unladen swallow, and the whole search 
is likely to end in a pretty funny way, without producing the desired 
optimizing genie in the CPU.
 
This also is very different from trying to adapt queries to what

happens to be currently in cache.  As already discussed on a recent
thread, the instability in plans and the failure to get to an
effective cache set make that a bad idea.  The idea discussed here
would maintain a stable plan for a given query, it would just help
choose a good plan based on the likely level of caching.
  
Kevin, I am talking from the perspective of a DBA who is involved with 
production databases on a day-to-day basis. I am no expert, but I do 
believe I speak from the perspective of the users that Postgres has to win 
over in order to make further inroads into corporate server rooms. Without 
the possibility of such control and plan stability, it is hard for 
me to recommend more extensive use of PostgreSQL to my boss. Whatever 
solution is chosen, it needs to have knobs and buttons and produce 
plans that will not change when the CPU usage goes up.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-11 Thread Mladen Gogala

Tom Lane wrote:
More knobs and buttons is the Oracle way, 


True. Very true.


and the end result of that
process is that you have something as hard to use as Oracle.  


Also, you end up with something which is extremely reliable and 
adjustable to a variety of conditions.



That's
generally not thought of as desirable in this community.

regards, tom lane
  
Allow me to play the devil's advocate again. This community is still 
much, much smaller than even the MySQL community, much less Oracle's 
community. If growth of the community is the goal, copying a page or two 
from Oracle's book looks like a good idea to me. The only thing I 
dislike about Oracle is its price, not its complexity.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-10 Thread Mladen Gogala

On 11/10/2010 5:43 PM, Kevin Grittner wrote:

The only half-sane answer I've thought of is to apply a different
cost to full-table or full-index scans based on the ratio with
effective cache size.


The effective_cache_size approach is, in my humble opinion, the wrong method.  It 
would be much easier to have a parameter, let's call it 
optimizer_index_caching, which would give the assumed 
percentage of an index that is cached. In other words, if 
optimizer_index_caching were set to 80, the optimizer would assume that 
80% of any index is cached and would apply a different cost estimate. It's 
not exact, but it's simple and modifiable. It would also be a great tool 
in the hands of a DBA who has to manage an OLTP database or a DW database 
and would be able to create a definitive bias toward one type of 
execution plan.
I have to confess that the idea of such a parameter is not entirely 
mine: http://tinyurl.com/33gu4f6


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com



[PERFORM] Array interface

2010-11-08 Thread Mladen Gogala
I wrote a little Perl script, intended to test the difference that array 
inserts make with PostgreSQL. Imagine my surprise when a single-record 
insert into a local database was faster than batches of 100 records. 
Here are the two respective routines:


sub do_ssql
{
    my $exec_cnt = 0;
    while (<FL>)
    {
        chomp;
        my @row = split /$sep/;
        $sth->execute(@row);
        $exec_cnt++;
    }
    $dbh->commit();
    print "Insert executed $exec_cnt times.\n";
}

sub do_msql
{
    my $bsz = shift;
    die("Batch size must be > 0!\n") unless $bsz > 0;
    my $exec_cnt = 0;
    my @tstat;
    my (@col1, @col2, @col3);
    while (<FL>)
    {
        chomp;
        my @row = split /$sep/;
        push @col1, $row[0];
        push @col2, $row[1];
        push @col3, $row[2];
        if ($. % $bsz == 0)
        {
            my $tuples = $sth->execute_array({ArrayTupleStatus => \@tstat},
                                             \@col1, \@col2, \@col3);
            die("Multiple insert failed!\n") if (!$tuples);
            @col1 = ();
            @col2 = ();
            @col3 = ();
            $exec_cnt++;
        }

    }
    if ($#col1 >= 0)
    {
        my $tuples = $sth->execute_array({ArrayTupleStatus => \@tstat},
                                         \@col1, \@col2, \@col3);
        die("Multiple insert failed!\n") if (!$tuples);
        $exec_cnt++;
    }
    $dbh->commit();
    print "Insert executed $exec_cnt times.\n";
}


The variable $sth is a prepared  statement handle.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Array interface

2010-11-08 Thread Mladen Gogala
I wrote a little Perl script, intended to test the difference that array 
inserts make with PostgreSQL. Imagine my surprise when a single-record 
insert into a local database was faster than batches of 100 records. 
Here are the two respective routines:


sub do_ssql
{
    my $exec_cnt = 0;
    while (<FL>)
    {
        chomp;
        my @row = split /$sep/;
        $sth->execute(@row);
        $exec_cnt++;
    }
    $dbh->commit();
    print "Insert executed $exec_cnt times.\n";
}

sub do_msql
{
    my $bsz = shift;
    die("Batch size must be > 0!\n") unless $bsz > 0;
    my $exec_cnt = 0;
    my @tstat;
    my (@col1, @col2, @col3);
    while (<FL>)
    {
        chomp;
        my @row = split /$sep/;
        push @col1, $row[0];
        push @col2, $row[1];
        push @col3, $row[2];
        if ($. % $bsz == 0)
        {
            my $tuples = $sth->execute_array({ArrayTupleStatus => \@tstat},
                                             \@col1, \@col2, \@col3);
            die("Multiple insert failed!\n") if (!$tuples);
            @col1 = ();
            @col2 = ();
            @col3 = ();
            $exec_cnt++;
        }

    }
    if ($#col1 >= 0)
    {
        my $tuples = $sth->execute_array({ArrayTupleStatus => \@tstat},
                                         \@col1, \@col2, \@col3);
        die("Multiple insert failed!\n") if (!$tuples);
        $exec_cnt++;
    }
    $dbh->commit();
    print "Insert executed $exec_cnt times.\n";
}


The variable $sth is a prepared statement handle for the INSERT statement.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences

2010-11-05 Thread Mladen Gogala

Devrim GÜNDÜZ wrote:

On Fri, 2010-11-05 at 11:59 +0100, A B wrote:
  

If you just wanted PostgreSQL to go as fast as possible WITHOUT any
care for your data (you accept 100% dataloss and datacorruption if any
error should occur), what settings should you use then? 



You can initdb to ramdisk, if you have enough RAM. It will be fast, really.

  
That is approximately the same thing as the answer to the question of 
whether a Ford Taurus can reach 200 mph.

It can, just once, if you drive it off a cliff.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Array interface

2010-11-03 Thread Mladen Gogala

Conor Walsh wrote:


I generally suspect this is a Perl problem rather than a Postgres
problem, 


So do I. I had the same situation with Oracle, until John Scoles had the 
DBD::Oracle driver fixed and started utilizing the Oracle array interface.



but can't say more without code.  Maybe try pastebin if
you're having email censorship issues.

-Conor

  

I posted it to comp.databases.postgresql.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Test

2010-11-02 Thread Mladen Gogala

Can you hear me now?

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Query- Simple taking

2010-10-28 Thread Mladen Gogala

On 10/28/2010 10:42 AM, Robert Haas wrote:

I can believe that MySQL is faster, because they probably don't need
to do the bitmap heap scan.  There is a much-anticipated feature
called index-only scans that we don't have yet in PG, which would help
cases like this a great deal.

Yyesss! Any time frame on that? Can you make it into 9.0.2?

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Query- Simple taking

2010-10-28 Thread Mladen Gogala

On 10/28/2010 10:53 AM, Richard Broersma wrote:

On Thu, Oct 28, 2010 at 7:51 AM, Mladen Gogala
mladen.gog...@vmsinfo.com  wrote:


Yyesss! Any time frame on that? Can you make it into 9.0.2?

Maybe 9.1.0 or 9.2.0 :)  9.0's features are already frozen.


Well, with all this global warming around us, index scans may still thaw 
in time to make it into 9.0.2


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] temporary tables, indexes, and query plans

2010-10-27 Thread Mladen Gogala

On 10/27/2010 1:29 PM, Jon Nelson wrote:

I have an app which imports a lot of data into a temporary table, does
a number of updates, creates some indexes, and then does a bunch more
updates and deletes, and then eventually inserts some of the columns
from the transformed table into a permanent table.

Things were not progressing in a performant manner - specifically,
after creating an index on a column (INTEGER) that is unique, I
expected statements like this to use an index scan:

update foo set colA = 'some value' where indexed_colB = 'some other value'

but according to the auto_explain module (yay!) the query plan
(always) results in a sequential scan, despite only 1 row getting the
update.

In summary, the order goes like this:

BEGIN;
CREATE TEMPORARY TABLE foo ...;
copy into foo 
UPDATE foo  -- 4 or 5 times, updating perhaps 1/3 of the table all told
CREATE INDEX ... -- twice - one index each for two columns
ANALYZE foo;  -- didn't seem to help
UPDATE foo SET ... WHERE indexed_column_B = 'some value'; -- seq scan?
Out of 10 million rows only one is updated!
...

What might be going on here?

How big is your default statistics target? The default is rather small and 
doesn't produce very good or usable histograms.
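
As a rough sketch (not from the original thread; "foo" and "indexed_colb" 
stand in for the temporary table and column discussed above, and the target 
value is arbitrary), the sample size can be checked and raised for the 
session before re-analyzing:

    SHOW default_statistics_target;
    SET default_statistics_target = 1000;
    ANALYZE foo;
    SELECT attname, n_distinct, array_length(most_common_vals, 1)
      FROM pg_stats
     WHERE tablename = 'foo' AND attname = 'indexed_colb';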


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

2010-10-27 Thread Mladen Gogala

On 10/27/2010 1:48 PM, Scott Carey wrote:


It is almost always significantly faster than a direct bulk load into a table.
* The temp table has no indexes, the final table usually does, bulk operations 
on indexes are faster than per row operations.
* The final table might require both updates and inserts, doing these in bulk 
from a temp stage table is far faster than per row.
* You don't even have to commit after the merge from the temp table, and can 
loop until its all done, then commit -- though this can have table/index bloat 
implications if doing updates.


Scott, I find this very hard to believe. If you are inserting into a 
temporary table and then into the target table, you will do 2 inserts 
instead of just one. What you are telling me is that it is faster for me 
to drive from NYC to Washington DC by driving first to Miami and then 
from Miami to DC.



2) This is what I had in mind:

mgogala=# create table a(c1 int);
CREATE TABLE
mgogala=# create temporary table t1(c1 int) on commit delete rows;
CREATE TABLE
mgogala=# begin;
BEGIN
mgogala=# insert into t1 select generate_series(1,1000);
INSERT 0 1000
mgogala=# insert into a select * from t1;
INSERT 0 1000
mgogala=# commit;
COMMIT
mgogala=# select count(*) from a;
  count
---
   1000
(1 row)

mgogala=# select count(*) from t1;
  count
---
  0
(1 row)

The table is created with the on commit obliterate rows option, which means
that there is no need to do a truncate. The truncate command is
heavy artillery. Truncating a temporary table is like shooting ducks in
a duck pond with a howitzer.

???  Test it.  DELETE is slow, truncate is nearly instantaneous for normal 
tables.  For temp tables its the same thing.  Maybe in Oracle TRUNCATE is a 
howitzer, in Postgres its lightweight.


Truncate has a specific list of tasks to do:
1)  Lock the table in exclusive mode to prevent concurrent 
transactions on the table.

2)  Release the file space and update the table headers.
3)  Flush any buffers possibly residing in shared memory.
4)  Repeat the procedure on the indexes.

Of course, in the case of a normal table, all of these changes are logged, 
possibly producing WAL archives. That is still much faster than a delete, 
which depends on the number of rows that need to be deleted, but it is not 
exactly lightweight, either. In Postgres, truncate recognizes that the 
table is a temporary table, so it takes a few shortcuts, which makes the 
truncate faster:


1) No need to flush buffers.
2) Locking requirements are much less stringent.
3) No WAL archives are produced.

Temporary tables are completely different beasts in Oracle and Postgres. 
Yes, you are right, truncate of a temporary table is a big no-no in the 
Oracle world, especially in a RAC environment. However, I do find the ON 
COMMIT DELETE ROWS trick to be more elegant than the truncate. Here is 
the classic Tom Kyte on the topic of truncating temporary tables: 
http://tinyurl.com/29kph3p

NO. truncate is DDL. DDL is expensive. Truncation is something that 
should be done very infrequently. Now, I don't mean turn your truncates 
into DELETE's -- that would be even worse. I mean -- avoid having to 
truncate or delete every row in the first place. Use a transaction-based 
temporary table and upon commit, it'll empty itself.



Your loop above requires a commit after every 1000 rows.  What if you require 
that all rows are seen at once or not at all?  What if you fail part way 
through?  One big transaction is often a better idea and/or required.  
Especially in postgres, with no undo-log, bulk inserts in one large transaction 
work out very well -- usually better than multiple smaller transactions.


I don't contest that. I also prefer to do things in one big transaction, 
if possible.


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions





Re: [PERFORM] which one is faster

2010-10-26 Thread Mladen Gogala

On 10/26/2010 6:56 AM, AI Rumman wrote:

Which one is faster?
select count(*) from table
or
select count(id) from table
where id is the primary key.
PostgreSQL doesn't utilize the access methods known as FULL INDEX SCAN 
and FAST FULL INDEX SCAN, so the optimizer will generate a 
sequential scan in both cases. In other words, PostgreSQL will read the 
entire table when counting, no matter which form you use.
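
A quick way to verify this on a test table (hypothetical; any table t with a 
primary key id will do) is to compare the plans:

    EXPLAIN SELECT count(*)  FROM t;
    EXPLAIN SELECT count(id) FROM t;

Both should show an Aggregate sitting on top of a Seq Scan of t, since 
neither form can currently be answered from the index alone.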


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

2010-10-26 Thread Mladen Gogala

On 10/26/2010 11:41 AM, Merlin Moncure wrote:

yup, that's exactly what I mean -- this will give you more uniform
insert performance (your temp table doesn't even need indexes).  Every
N records (say 1) you send to permanent and truncate the temp
table.  Obviously, this is more fragile approach so weigh the
pros/cons carefully.

merlin


Truncate the temporary table? What horrible advice! All you need is for 
the temporary table to delete rows on commit.


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

2010-10-26 Thread Mladen Gogala

On 10/26/2010 5:27 PM, Jon Nelson wrote:

start loop:
   populate rows in temporary table
   insert from temporary table into permanent table
   truncate temporary table
   loop

I do something similar, where I COPY data to a temporary table, do
lots of manipulations, and then perform a series of INSERTS from the
temporary table into a permanent table.



1) It's definitely not faster because you have to insert into the 
temporary table, in addition to inserting into the permanent table.

2) This is what I had in mind:

mgogala=# create table a(c1 int);
CREATE TABLE
mgogala=# create temporary table t1(c1 int) on commit delete rows;
CREATE TABLE
mgogala=# begin;
BEGIN
mgogala=# insert into t1 select generate_series(1,1000);
INSERT 0 1000
mgogala=# insert into a select * from t1;
INSERT 0 1000
mgogala=# commit;
COMMIT
mgogala=# select count(*) from a;
 count
---
  1000
(1 row)

mgogala=# select count(*) from t1;
 count
---
 0
(1 row)

The table is created with the on commit obliterate rows option, which means 
that there is no need to do a truncate. The truncate command is 
heavy artillery. Truncating a temporary table is like shooting ducks in 
a duck pond with a howitzer.


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

2010-10-25 Thread Mladen Gogala
Profiling could tell you where the time is lost and where your 
program is spending its time. Having experience with both Oracle and Postgres, 
I don't feel that there is much of a difference in insert speed. I 
am not using C++; I am using scripting languages like Perl and PHP and, 
as far as inserts go, I don't see much of a difference. I have an 
application which inserts approximately 600,000 records per day into a 
PostgreSQL 9.0.1 database, in chunks of up to 60,000 records every hour. 
The table is partitioned and there are indexes on the underlying 
partitions. I haven't noticed any problems with inserts. Also, if I use 
COPY instead of the INSERT command, it is even faster. In 
addition to that, PostgreSQL doesn't support index-organized tables.
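
For reference, a minimal sketch of the COPY alternative (the table and file 
names are made up, not taken from the application described above):

    COPY occurrences (id, call_letters, airdate)
      FROM '/tmp/occurrences.csv' WITH CSV;
    -- or, from a client without server-side file access:
    -- \copy occurrences from 'occurrences.csv' with csv

COPY sends the whole file in a single command instead of one INSERT round 
trip per row, which is where most of the savings come from.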


Divakar Singh wrote:
Storage test was simple, but the data (seconds taken) for the INSERT test 
for PG vs Oracle with 1, 2, 3, 4 and 5 indexes was:

PG:
25
30
37
42
45



Oracle:

33
43
50
65
68

Rows inserted: 100,000
Above results show good INSERT performance of PG when using SQL 
procedures. But performance when I use C++ lib is very bad. I did that 
test some time back so I do not have data for that right now.



From: Scott Marlowe scott.marl...@gmail.com
To: Divakar Singh dpsma...@yahoo.com
Cc: pgsql-performance@postgresql.org
Sent: Mon, October 25, 2010 11:56:27 PM
Subject: Re: [PERFORM] Postgres insert performance and storage 
requirement compared to Oracle


On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh dpsma...@yahoo.com wrote:

 Hello Experts,
 My application uses Oracle DB, and makes use of OCI interface.
 I have been able to develop similar interface using postgreSQL library.
 However, I have done some tests but results for PostgreSQL have not been
 encouraging for a few of them.

Tell us more about your tests and results please.




--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index scan is not working, why??

2010-10-21 Thread Mladen Gogala

AI Rumman wrote:
I don't know why seq scan is running on the following query where the 
same query is giving index scan on other servers:

explain analyze
select *
from act
where act.acttype in ( 'Meeting','Call','Task');
  QUERY PLAN  

 Seq Scan on act (cost=0.00..13386.78 rows=259671 width=142) (actual 
time=0.013..484.572 rows=263639 loops=1)
  Filter: (((acttype)::text = 'Meeting'::text) OR ((acttype)::text = 
'Call'::text) OR ((acttype)::text = 'Task'::text))

 Total runtime: 732.956 ms
(3 rows)
Al, what percentage of the rows fit the above criteria? How big are 
your histograms?
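
For instance, a couple of quick checks along those lines (using the table 
and column from the post above; these queries are illustrative, not from 
the original thread):

    -- What fraction of the table matches the filter? If it is a large
    -- fraction, a sequential scan is usually the right plan.
    SELECT (SELECT count(*) FROM act
             WHERE acttype IN ('Meeting', 'Call', 'Task'))::float8
           / (SELECT count(*) FROM act) AS matching_fraction;

    -- How detailed are the statistics on the filtered column?
    SELECT attname, n_distinct, array_length(most_common_vals, 1) AS mcv_entries
      FROM pg_stats
     WHERE tablename = 'act' AND attname = 'acttype';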


--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Periodically slow inserts

2010-10-21 Thread Mladen Gogala

Gael Le Mignot wrote:

Hello,

We are  using PostgreSQL for  storing data and full-text  search indexes
for the webiste of a daily newspaper. We are very happy overall with the
results, but we have one weird behaviour that we would like to solve.

The problem is when we index objects into the full-text search part of
the database (which is a DELETE and then an INSERT into a specific table),
the INSERT sometimes takes a long time (from 10s to 20s), but the same
insert (and many other similar ones) is fast (below 0.2s).
  

Have you tried with strace, just to see where the time is spent?

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What is postmaster doing?

2010-10-20 Thread Mladen Gogala

Dimi Paun wrote:

Folks,

I am running into a problem with the postmaster: from time to time, it
runs for a long time. E.g., from top:

23425 postgres  20   0 22008  10m  10m R 99.9  0.5  21:45.87 postmaster

I'd like to figure out what it is doing. How can I figure out what
statement causes the problem? 


is there a way I can log all SQL statements to a file, together with the
time it took to execute them?

  
You can do one better: you can have the statements explained automatically, 
based on their execution time. There is a module called auto_explain:


http://www.postgresql.org/docs/8.4/static/auto-explain.html

For the log files, you can parse them using pgfouine and quickly find 
out the most expensive SQL statements.
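
A minimal sketch of both approaches (the thresholds are arbitrary, and 
auto_explain is a contrib module that has to be installed; it can also be 
preloaded for all sessions via shared_preload_libraries in postgresql.conf):

    -- log every statement slower than one second, together with its duration
    SET log_min_duration_statement = '1s';

    -- or have the plans of slow statements logged automatically
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '1s';
    SET auto_explain.log_analyze = true;   -- include actual run times (adds overhead)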




--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

2010-10-19 Thread Mladen Gogala

Scott Carey wrote:


If the cost to hash is 1200493, and it needs to probe the hash 20241 times, why would the total cost be 631471410?  The cost to probe can't be that big! A cost of 500 to probe and join?  
Why favor hashing the large table and probing with the small values rather than the other way around?
  



May I ask a stupid question: how is the query cost calculated? What are 
the units? I/O requests? CPU cycles? Monopoly money?



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Select count(*), the sequel

2010-10-18 Thread Mladen Gogala
There was some doubt as to the speed of doing select count(*) in 
PostgreSQL and Oracle.
To that end, I copied most of the Oracle table I used before to 
Postgres. Although the copy
wasn't complete, the resulting table is already significantly larger 
than the table it was copied from. The result still shows that Oracle is 
significantly faster:

Oracle result:

SQL> alter system flush buffer_cache;

System altered.

SQL> select /*+ full(NO) noparallel */ count(*) from ni_occurrence no;

  COUNT(*)
----------
 402062638

Elapsed: 00:03:16.45



Hints are necessary because the Oracle table is declared as parallel and I 
didn't want the PK index to be used for counting. Oracle has a good 
habit of using PKs for counting, if available.



SQL> select bytes/1048576 as MB
  2  from user_segments
  3  where segment_name='NI_OCCURRENCE';

        MB
----------
     35329

Elapsed: 00:00:00.85
SQL>

So, Oracle stores 402 million records in 35GB and counts them in 3 
minutes 16.45 seconds. The very same table was partially copied to 
Postgres; the copy died with ORA-01555 "snapshot too old" sometime this 
morning. I ran vacuumdb -f -z on the database after the copy completed 
and the results are below.


mgogala=# select count(*) from ni_occurrence;
   count
---
 382400476
(1 row)

Time: 221716.466 ms
mgogala=#
mgogala=# select 221/60::real;
 ?column?
--
 3.68
(1 row)

Time: 0.357 ms
mgogala=#
mgogala=# select pg_size_pretty(pg_table_size('ni_occurrence'));
 pg_size_pretty

 46 GB
(1 row)

Time: 0.420 ms
mgogala=#

The database wasn't restarted and no caches were flushed, so the comparison 
was done with a serious advantage for PostgreSQL. Postgres needed 3.68 
minutes to complete the count, which is about the same as Oracle, but still 
somewhat slower. Also, I am worried about the sizes. The Postgres table is 
11GB larger than the original, despite having less data. That was an 
unfair and unbalanced comparison, because Oracle's cache was flushed and 
Oracle was artificially restricted to a full table scan without 
the aid of parallelism. Here is the same result, with no hints and 
autotrace on, which shows what happens if I turn the hints off:


SQL> select count(*) from ni_occurrence no;

  COUNT(*)
----------
 402062638

Elapsed: 00:00:52.61

Execution Plan
----------------------------------------------------------
Plan hash value: 53476935

-----------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name                  | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |                       |     1 | 54001  (19)| 00:01:08 |        |      |            |
|  1 |  SORT AGGREGATE            |                       |     1 |            |          |        |      |            |
|  2 |   PX COORDINATOR           |                       |       |            |          |        |      |            |
|  3 |    PX SEND QC (RANDOM)     | :TQ1                  |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|  4 |     SORT AGGREGATE         |                       |     1 |            |          |  Q1,00 | PCWP |            |
|  5 |      PX BLOCK ITERATOR     |                       |  402M | 54001  (19)| 00:01:08 |  Q1,00 | PCWC |            |
|  6 |       INDEX FAST FULL SCAN | IDX_NI_OCCURRENCE_PID |  402M | 54001  (19)| 00:01:08 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------

It took just 52 seconds to count everything, but Oracle didn't even scan 
the table; it scanned a unique index, in parallel. That is the 
algorithmic advantage that forced me to restrict the execution plan with 
hints. My conclusion is that the speed of the full scan is OK, about the 
same as Oracle's. There are, however, three significant algorithmic 
advantages on Oracle's side:

1) Oracle can use indexes to calculate select count(*).
2) Oracle can use parallelism.
3) Oracle can use indexes in combination with parallel processing.



Here are the descriptions:

SQL> desc ni_occurrence
 Name                           Null?    Type
 ------------------------------ -------- ------------------
 ID                             NOT NULL NUMBER(22)
 PERMANENT_ID                   NOT NULL VARCHAR2(12)
 CALL_LETTERS                   NOT NULL VARCHAR2(5)
 AIRDATE                        NOT NULL DATE
 DURATION                       NOT NULL NUMBER(4)
 PROGRAM_TITLE                           VARCHAR2(360)
 COST                                    NUMBER(15)
 ASSETID                                 NUMBER(12)
 MARKET_ID                               NUMBER
 GMT_TIME                                DATE
 ORIG_ST_OCC_ID                          NUMBER
 EPISODE                                 VARCHAR2(450)
 IMPRESSIONS                             NUMBER

SQL>

Re: [PERFORM] Select count(*), the sequel

2010-10-18 Thread Mladen Gogala

 On 10/18/2010 3:58 AM, Vitalii Tymchyshyn wrote:

Hello.

Did you vacuum the postgresql DB before the count(*)? I ask this because
(unless the table was created and loaded in the same transaction) on the first
scan, postgresql has to write hint bits to the whole table. The second scan
may be way faster.

Best regards, Vitalii Tymchyshyn


Vitalii, yes, I did vacuum before the count.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Help with duration of statement: EXECUTE unnamed [PREPARE: COMMIT]

2010-10-18 Thread Mladen Gogala

Tom Lane wrote:

My guess would be overstressed disk subsystem.  A COMMIT doesn't require
much except fsync'ing the commit WAL record down to disk ... 
Doesn't the commit statement also release all the locks held by the 
transaction?


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Select count(*), the sequel

2010-10-16 Thread Mladen Gogala
NUMBER

SQL
mgogala=# \d ni_occurrence
   Table public.ni_occurrence
 Column |Type | Modifiers
+-+---
 id | bigint  | not null
 permanent_id   | character varying(12)   | not null
 call_letters   | character varying(5)| not null
 airdate| timestamp without time zone | not null
 duration   | smallint| not null
 program_title  | character varying(360)  |
 cost   | bigint  |
 assetid| bigint  |
 market_id  | bigint  |
 gmt_time   | timestamp without time zone |
 orig_st_occ_id | bigint  |
 episode| character varying(450)  |
 impressions| bigint  |
Indexes:
ni_occurrence_pk PRIMARY KEY, btree (id)

mgogala=#

The Oracle block size is 16k, the version is 10.2.0.5 RAC, 64-bit (is anybody still 
using 32-bit db servers?). Postgres is 9.0.1, 64-bit. Both machines are 
running Red Hat 5.5:



[mgog...@lpo-postgres-d01 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[mgog...@lpo-postgres-d01 ~]$

Linux lpo-postgres-d01 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 
2010 x86_64 x86_64 x86_64 GNU/Linux

[mgog...@lpo-postgres-d01 ~]$

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
The Leader in integrated Media Intelligence Solutions



Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Mladen Gogala

Samuel Gendler wrote:



On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala 
mladen.gog...@vmsinfo.com wrote:


 If working with partitioning, be very aware that PostgreSQL
optimizer has certain problems with partitions, especially with
group functions. If you want speed, everything must be prefixed
with partitioning column: indexes, expressions, joins. There is no
explicit star schema and creating hash indexes will not buy you
much, as a matter of fact, Postgres community is extremely
suspicious of the hash indexes and I don't see them widely used.
Having said that, I was able to solve the problems with my speed
and partitioning.


Could you elaborate on this, please? What do you mean by 'everything 
must be prefixed with the partitioning column'?


--sam
If you have a partitioned table part_tab, partitioned on the column 
item_date, and there is a global primary key in Oracle, let's call it 
item_id, then queries like select * from part_tab where item_id=12345 
will perform worse than queries that also include item_date:


select * from part_tab where item_id=12345 and item_date='2010-10-15'

This also applies to inserts and updates. Strictly speaking, the 
item_date column in the query above is not necessary; after all, the 
item_id column is the primary key. However, with range scans you will 
get much better results if you include the item_date column than if you 
use a combination of columns without it. The term "prefixed indexes" is 
borrowed from Oracle RDBMS and means that the leading column of the 
index is the column on which the table is partitioned. Oracle, as 
opposed to Postgres, has global indexes, that is, indexes that span all 
partitions. PostgreSQL only maintains indexes on each of the partitions 
separately. Oracle calls such per-partition indexes "local indexes" and 
defines them at the partitioned table level. Here is a brief and rather 
succinct explanation of the terminology:


http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php
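
To make the PostgreSQL side concrete, here is a minimal, hypothetical sketch 
using the inheritance-based partitioning available in 9.0, with 
constraint_exclusion = partition:

    CREATE TABLE part_tab (item_id bigint, item_date date);

    CREATE TABLE part_tab_2010_10 (CHECK (item_date >= DATE '2010-10-01'
                                      AND item_date <  DATE '2010-11-01'))
        INHERITS (part_tab);
    CREATE TABLE part_tab_2010_11 (CHECK (item_date >= DATE '2010-11-01'
                                      AND item_date <  DATE '2010-12-01'))
        INHERITS (part_tab);

    CREATE INDEX part_tab_2010_10_idx ON part_tab_2010_10 (item_date, item_id);
    CREATE INDEX part_tab_2010_11_idx ON part_tab_2010_11 (item_date, item_id);

    -- Scans every child table: the planner cannot prune on item_id alone.
    EXPLAIN SELECT * FROM part_tab WHERE item_id = 12345;

    -- Touches only the October partition: the CHECK constraints exclude the rest.
    EXPLAIN SELECT * FROM part_tab
     WHERE item_id = 12345 AND item_date = DATE '2010-10-15';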


Of course, there are other differences between Oracle partitioning and 
PostgreSQL partitioning. The main difference is $1/CPU.

I am talking from experience:

news=> \d moreover_documents
         Table "moreover.moreover_documents"
   Column|Type | Modifiers
--+-+---
document_id  | bigint  | not null
dre_reference| bigint  | not null
headline | character varying(4000) |
author   | character varying(200)  |
url  | character varying(1000) |
rank | bigint  |
content  | text|
stories_like_this| character varying(1000) |
internet_web_site_id | bigint  | not null
harvest_time | timestamp without time zone |
valid_time   | timestamp without time zone |
keyword  | character varying(200)  |
article_id   | bigint  | not null
media_type   | character varying(20)   |
source_type  | character varying(20)   |
created_at   | timestamp without time zone |
autonomy_fed_at  | timestamp without time zone |
language | character varying(150)  |
Indexes:
   moreover_documents_pkey PRIMARY KEY, btree (document_id)
Triggers:
   insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH ROW 
      EXECUTE PROCEDURE moreover_insert_trgfn()
Number of child tables: 8 (Use \d+ to list them.)

The child tables are, of course, partitions.

Here is the original:


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> desc moreover_documents
 Name                           Null?    Type
 ------------------------------ -------- ------------------
 DOCUMENT#                      NOT NULL NUMBER
 DRE_REFERENCE                  NOT NULL NUMBER
 HEADLINE                                VARCHAR2(4000)
 AUTHOR                                  VARCHAR2(200)
 URL                                     VARCHAR2(1000)
 RANK                                    NUMBER
 CONTENT                                 CLOB
 STORIES_LIKE_THIS                       VARCHAR2(1000)
 INTERNET_WEB_SITE#             NOT NULL NUMBER
 HARVEST_TIME                            DATE
 VALID_TIME                              DATE
 KEYWORD                                 VARCHAR2(200)
 ARTICLE_ID                     NOT NULL NUMBER
 MEDIA_TYPE                              VARCHAR2(20)
 CREATED_AT                              DATE
 SOURCE_TYPE                             VARCHAR2(50)
 PUBLISH_DATE                            DATE
 AUTONOMY_FED_AT                         DATE
 LANGUAGE                                VARCHAR2(150)

SQL>



I must say that it took me some time to get things right.

--

Mladen Gogala 
Sr. Oracle DBA

1500

Re: [PERFORM] Slow count(*) again...

2010-10-15 Thread Mladen Gogala

Jon Nelson wrote:


Well, I didn't quite mean that - having no familiarity with Oracle I
don't know what the alter system statement does, but I was talking
specifically about the linux buffer and page cache. 
  


Those are not utilized by Oracle.  This is a RAC instance, running on 
top of ASM, which is an Oracle volume manager, using raw devices. There 
is no file system on those disks:


SQL> select file_name from dba_data_files
  2  where tablespace_name='ADBASE_DATA';

FILE_NAME

+DGDATA/stag3/datafile/adbase_data.262.727278257
+DGDATA/stag3/datafile/adbase_data.263.727278741
+DGDATA/stag3/datafile/adbase_data.264.727280145
+DGDATA/stag3/datafile/adbase_data.265.727280683

[ora...@lpo-oracle-30 ~]$ $ORA_CRS_HOME/bin/crs_stat -l
NAME=ora.STAG3.STAG31.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.STAG3.STAG32.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.STAG3.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-30.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.LISTENER_LPO-ORACLE-30.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.ons
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-31.ASM2.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.LISTENER_LPO-ORACLE-31.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.ons
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31



The only way to flush the cache is the aforementioned alter system 
command. AFAIK, Postgres doesn't have anything like that. Oracle uses 
raw devices precisely to avoid double buffering.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-15 Thread Mladen Gogala

Neil Whelchel wrote:



That is why I suggested an estimate(*) that works like (a faster) count(*) 
except that it may be off a bit. I think that is what he was talking about 
when he wrote this.


  
The main problem with select count(*) is that it gets seriously 
misused. Using select count(*) to establish existence is bad for 
performance and for code readability.
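
A typical illustration (hypothetical table, not from this thread): checking 
existence with count(*) has to count every matching row, while EXISTS can 
stop at the first one and states the intent directly.

    -- anti-pattern: counts all matching rows just to compare with zero
    SELECT count(*) > 0 FROM orders WHERE customer_id = 42;

    -- better: stops as soon as one matching row is found
    SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = 42);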


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bogus startup cost for WindowAgg

2010-10-14 Thread Mladen Gogala

Ants Aasma wrote:

I hit an issue with window aggregate costing while experimenting with
providing a count of the full match along side a limited result set.
Seems that the window aggregate node doesn't take into account that it
has to consume the whole input before outputting the first row. When
this is combined with a limit, the resulting cost estimate is wildly
underestimated, leading to suboptimal plans.
  

What is your histogram size? That's defined by the
default_statistics_target in your postgresql.conf.
Check the column histograms like this:

   news=> select attname,array_length(most_common_vals,1)
   from pg_stats
   where tablename='moreover_documents_y2010m09';
  attname| array_length
   --+--
document_id  |   
dre_reference|   
headline | 1024

author   |  212
url  |   
rank |   59

content  | 1024
stories_like_this|   
internet_web_site_id | 1024

harvest_time | 1024
valid_time   | 1024
keyword  |   95
article_id   |   
media_type   |5

source_type  |1
created_at   | 1024
autonomy_fed_at  | 1024
language |   37
   (18 rows)

   news=> show default_statistics_target;
default_statistics_target
   ---
1024
   (1 row)

You will see that for most of the columns, the length of the histogram
array corresponds to the value of the default_statistics_target
parameter. For those that are smaller, the size is the total number of
distinct values in the column in the sample taken by the analyze command.
The longer the histogram, the better the plan. In this case, size does matter.
Note that there are no histograms for the document_id and dre_reference
columns. Those are the primary and unique keys; the optimizer can easily
guess the distribution of values.
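
For completeness, a sketch of how the target can be raised for a single 
column without touching the global setting (the column and the value are 
just examples):

    ALTER TABLE moreover_documents_y2010m09
        ALTER COLUMN keyword SET STATISTICS 1024;
    ANALYZE moreover_documents_y2010m09 (keyword);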

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

