Re: [HACKERS] Cached Query Plans (was: global prepared statements)

2008-04-12 Thread Perez
In article <[EMAIL PROTECTED]>,
 > PFC wrote:
> 
> > So, where to go from that ? I don't see a way to implement this without 
> > a (backwards-compatible) change to the wire protocol, because the clients 
> > will want to specify when a plan should be cached or not. Since the user  
> > should not have to name each and every one of the statements they want to 
> > use plan caching, I see the following choices :
> 


Doesn't Oracle do this now transparently to clients?  That is, I believe 
Oracle keeps a statement/plan cache in its shared memory segment (SGA) 
that greatly improves its performance at running queries that don't 
change very often.

From that point of view, Oracle at least sees benefits in doing this.  
From my POV, it would be a transparent performance enhancer for all 
those PHP and Rails apps out there.

With plan invalidation in 8.3 this becomes feasible for pgSQL to do as 
well.
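
To make the idea concrete, here is a minimal sketch of a transparent 
cache keyed on statement text, with invalidation when a table changes.  
It is not how Oracle or PostgreSQL implement this; PlanCache, get_plan, 
invalidate and toy_planner are hypothetical names, and the caller hands 
in the list of referenced tables only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class PlanCache:
    """Toy plan cache keyed on the exact statement text (hypothetical)."""
    plans: dict = field(default_factory=dict)  # sql text -> (plan, tables)
    hits: int = 0
    misses: int = 0

    def get_plan(self, sql, tables, planner):
        """Return a cached plan, or plan the statement once and remember it."""
        entry = self.plans.get(sql)
        if entry is not None:
            self.hits += 1
            return entry[0]
        self.misses += 1
        plan = planner(sql)
        self.plans[sql] = (plan, frozenset(tables))
        return plan

    def invalidate(self, table):
        """Drop every cached plan that depends on the given table."""
        stale = [sql for sql, (_, deps) in self.plans.items() if table in deps]
        for sql in stale:
            del self.plans[sql]
        return len(stale)


def toy_planner(sql):
    # Stand-in for the real planner; assumption for illustration only.
    return "plan(" + sql + ")"


cache = PlanCache()
query = "SELECT * FROM users WHERE id = $1"
first = cache.get_plan(query, ["users"], toy_planner)
second = cache.get_plan(query, ["users"], toy_planner)  # served from cache
dropped = cache.invalidate("users")                     # e.g. after ALTER TABLE
third = cache.get_plan(query, ["users"], toy_planner)   # planned again
```

The client never names a statement here; the statement text itself is 
the cache key, which is what would make it transparent to PHP and Rails 
apps.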

-arturo

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature freeze progress report

2007-05-01 Thread Arturo Perez
In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (Jim Nasby) wrote:

> Two more ideas for the manager, now that we seem to have consensus to  
> build one.
> 

One other thing a webapp would allow that would help grow the community:  
if the patches are all in a public place then reviewer wannabes can get 
their feet wet relatively easily.  

Some may argue this is already possible but I, personally, don't even 
know where to look for patches.

-arturo

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Buildfarm feature request: some way to track/classify failures

2007-03-20 Thread Arturo Perez

I don't know if this has come up yet, but in terms of tagging errors  
we might be able to use some machine learning techniques.



There are NLP/learning systems that interpret logs.  They learn over  
time what is normal and what isn't and can flag things that are  
abnormal.


For example, people are using support vector machine (SVM) analysis  
on log files to do intrusion detection.  Here's a link to a paper on  
intrusion detection called "Robust Anomaly Detection Using Support  
Vector Machines":  
http://wwwcsif.cs.ucdavis.edu/~liaoy/research/RSVM_Anomaly_journal.pdf


This paper from IBM gives some more background information on how  
such a thing might work:  
http://www.research.ibm.com/journal/sj/413/johnson.html


I have previously used an open source toolkit from CMU called rainbow  
to do this type of analysis.


-arturo




Re: [HACKERS] Hints (Was: Index Tuning Features)

2006-10-13 Thread Arturo Perez
In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (Andrew Sullivan) wrote:

> On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:
> > 
> > Some statistics are very hard to gather from a sample, e.g. the number
> > of distinct values in a column.
> 

> I like the suggestion, though, that there be ways to codify known
> relationships in the system in such a way that the optimizer can
> learn to use that information.  

Since there is already a genetic-algorithm based optimizer, is there any 
way to use that to gather information to improve statistics?

For example, put the GA optimizer into a mode where it tries some of the 
plans it comes up with and collects data on how they perform?
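
A rough sketch of what such a feedback mode might collect, assuming 
each trial plan reports the planner's estimated row count next to the 
row count actually observed.  This is not something the existing 
genetic optimizer does; PlanFeedback, record and correction are 
hypothetical names.

```python
from collections import defaultdict


class PlanFeedback:
    """Collects actual/estimated row ratios per predicate (hypothetical)."""

    def __init__(self):
        self._ratios = defaultdict(list)  # predicate text -> list of ratios

    def record(self, predicate, estimated_rows, actual_rows):
        """Store how far off the estimate was for one executed trial plan."""
        self._ratios[predicate].append(actual_rows / max(estimated_rows, 1))

    def correction(self, predicate):
        """Average ratio seen so far; 1.0 means no data, so no adjustment."""
        ratios = self._ratios.get(predicate)
        if not ratios:
            return 1.0
        return sum(ratios) / len(ratios)


feedback = PlanFeedback()
feedback.record("a.x = b.x", estimated_rows=100, actual_rows=400)
feedback.record("a.x = b.x", estimated_rows=100, actual_rows=200)
factor = feedback.correction("a.x = b.x")    # average of 4.0 and 2.0
unknown = feedback.correction("a.y = b.y")   # nothing recorded yet
```

The averaged ratio could then be used to scale later estimates for the 
same predicate.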

-arturo



Re: [HACKERS] Fixed length data types issue

2006-09-13 Thread Arturo Perez
In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (Jim Nasby) wrote:

> I'd love to have the ability to control toasting thresholds  
> manually. ... Being able to force a field to be  
> toasted before it normally would could drastically improve tuple  
> density without requiring the developer to use a 'side table' to  
> store the data.

+1 :-)

-arturo



Re: [HACKERS] An Idea for planner hints

2006-08-14 Thread Perez
In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] ("Jim C. Nasby") wrote:

> On Wed, Aug 09, 2006 at 08:31:42AM -0400, Perez wrote:
> > Every once in a while people talk about collecting better statistics, 
> > tracking multi-column correlations, etc.  But there never seems to be 
> > a way to collect that data/statistics.  
> > 
> > Would it be possible to determine the additional statistics the planner 
> > needs, modify the statistics table to have them and document how to 
> > insert data there?  We wouldn't have a good automated way to determine 
> > the information but a properly educated DBA could tweak things until 
> > they are satisfied.
> > 
> > At worst, if this new information is unpopulated then things would be as 
> > they are now.  But if a human can insert the right information then some 
> > control over the planner would be possible.
> > 
> > Is this a viable idea?  Would this satisfy those that need to control 
> > the planner immediately without code changes?
> 
> Sure, it's a Simple Matter of Code.
> 
> The real issue is figuring out what to do with these stats. I think all
> the estimator functions could use improvement, but no one's taken that
> on yet.

I thought, from watching the list for a while, that the planner 
statistics needed were known but that how to gather the statistics was 
not?

For example,  there is the discussion around multi-column correlation.  
I got the impression that we (you all) knew what to do with the 
stats but that there was no reliable way to get them.

So, the situation is that we need better stats, but we don't know how to 
collect them AND we don't know what they are either?  If we did know 
what to do then my idea and SMC would prevail?

If that's the case then it sounds to me like we should figure out the 
statistics we wish we had that the planner could work with.  Something 
for the 8.5 timeframe I guess :-)

-arturo



Re: [HACKERS] An Idea for planner hints

2006-08-13 Thread Perez
In article <[EMAIL PROTECTED]>,
 Perez <[EMAIL PROTECTED]> wrote:

> In article <[EMAIL PROTECTED]>,
>  [EMAIL PROTECTED] (Tom Lane) wrote:
> 
> > Martijn van Oosterhout writes:
> > > My main problem is that selectivity is the wrong measurement. What
> > > users really want to be able to communicate is:
> > 
> > > 1. If you join tables a and b on x, the number of resulting rows will be
> > > the number of rows selected from b (since b.x is a foreign key
> > > referencing a.x).
> > 
> > FWIW, I believe the planner already gets that case right, because a.x
> > will be unique and it should know that.  (Maybe not if the FK is across
> > a multi-column key, but in principle it should get it right.)
> > 
> > I agree though that meta-knowledge like this is important, and that
> > standard SQL frequently doesn't provide any adequate way to declare it.
> > 
> > regards, tom lane
> 
> 
> Every once in a while people talk about collecting better statistics, 
> tracking multi-column correlations, etc.  But there never seems to be 
> a way to collect that data/statistics.  
> 
> Would it be possible to determine the additional statistics the planner 
> needs, modify the statistics table to have them and document how to 
> insert data there?  We wouldn't have a good automated way to determine 
> the information but a properly educated DBA could tweak things until 
> they are satisfied.
> 
> At worst, if this new information is unpopulated then things would be as 
> they are now.  But if a human can insert the right information then some 
> control over the planner would be possible.
> 
> Is this a viable idea?  Would this satisfy those that need to control 
> the planner immediately without code changes?
> 
> -arturo

I didn't see any response to this idea so I thought I'd try again with a 
real email.

-arturo



Re: [HACKERS] An Idea for planner hints

2006-08-09 Thread Perez
In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (Tom Lane) wrote:

> Martijn van Oosterhout writes:
> > My main problem is that selectivity is the wrong measurement. What
> > users really want to be able to communicate is:
> 
> > 1. If you join tables a and b on x, the number of resulting rows will be
> > the number of rows selected from b (since b.x is a foreign key
> > referencing a.x).
> 
> FWIW, I believe the planner already gets that case right, because a.x
> will be unique and it should know that.  (Maybe not if the FK is across
> a multi-column key, but in principle it should get it right.)
> 
> I agree though that meta-knowledge like this is important, and that
> standard SQL frequently doesn't provide any adequate way to declare it.
> 
>   regards, tom lane


Every once in a while people talk about collecting better statistics, 
tracking multi-column correlations, etc.  But there never seems to be 
a way to collect that data/statistics.  

Would it be possible to determine the additional statistics the planner 
needs, modify the statistics table to have them and document how to 
insert data there?  We wouldn't have a good automated way to determine 
the information but a properly educated DBA could tweak things until 
they are satisfied.

At worst, if this new information is unpopulated then things would be as 
they are now.  But if a human can insert the right information then some 
control over the planner would be possible.

Is this a viable idea?  Would this satisfy those that need to control 
the planner immediately without code changes?
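
As a rough illustration of what I mean, assuming a hypothetical table 
of DBA-supplied combined selectivities that the estimator consults 
before falling back to treating columns as independent (estimate_rows, 
manual_stats and the default selectivity below are made up for this 
sketch, not taken from the real planner):

```python
DEFAULT_SELECTIVITY = 0.005  # assumed fallback for columns without stats


def estimate_rows(table_rows, columns, manual_stats, column_selectivity):
    """Estimate rows matching equality predicates on the given columns.

    manual_stats maps a frozenset of column names to a DBA-supplied
    combined selectivity.  If no entry exists, behave as today and
    multiply the per-column selectivities (independence assumption).
    """
    key = frozenset(columns)
    if key in manual_stats:
        return table_rows * manual_stats[key]
    selectivity = 1.0
    for column in columns:
        selectivity *= column_selectivity.get(column, DEFAULT_SELECTIVITY)
    return table_rows * selectivity


per_column = {"city": 0.01, "zip": 0.01}

# Unpopulated manual stats: same estimate as the independence assumption.
baseline = estimate_rows(10000, ["city", "zip"], {}, per_column)

# The DBA knows zip nearly determines city, so the combined selectivity
# is about that of zip alone.
overrides = {frozenset(["city", "zip"]): 0.01}
tuned = estimate_rows(10000, ["city", "zip"], overrides, per_column)
```

With nothing inserted the estimate is unchanged; with the override the 
estimate for the correlated pair goes from about 1 row to about 100.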

-arturo
