Re: [HACKERS] Concurrent connections in psql

2007-03-29 Thread Sailesh Krishnamurthy

We'd love this feature as it would really help us write better test cases !

Regards
Sailesh

--
Sailesh Krishnamurthy
Amalgamated Insight
[W] (650) 242-3503
[C] (650) 804-6585

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Gregory Stark
Sent: Tuesday, March 27, 2007 6:39 AM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Concurrent connections in psql


Would people be interested in this feature? There was some positive reaction
from users but I'm not sure how excited developers are about complicating the
logic in psql (which is already pretty tangled).

This code bitrotted severely when Tom added the cursor support to psql. I
don't mind redoing it if people want it though. I already did a first pass at
doing so but it wasn't clear to me how best to integrate it with that cursor
support change. I elected to treat each chunk of results from the cursor as a
separate result set, which makes it possible to switch connections between
chunks. That's nice but probably not really acceptable judging by how much
effort Tom went through in the cursor code to avoid having the chunks appear
as separate result sets. Probably I'll have to do more work in that area.

Are people interested in having this? The reason I think it's particularly
interesting is writing regression tests -- especially to test HOT cases.


Gregory Stark [EMAIL PROTECTED] writes:

 I mentioned this a while back; now that 8.2 is out perhaps others will be
 more interested in new code.

 Currently Postgres regression tests only test functionality within a single
 session. There are no regression tests that test the transaction semantics or
 locking behaviour across multiple transactions.

 I modified psql to allow you to open multiple connections and switch between
 them with a sort of csh job-control style interface. It actually works out
 pretty well. It's fairly easy to write regression tests for basic 2-client or
 3-client cases.

 The actual user interface may need some discussion though. I didn't want to
 play the name game so I just prefixed all my commands with c and figured we
 can always rename them later.

 And experience with actually writing the tests shows that the explicit \cwait
 command which was needed to eliminate (in practice if not in theory) race
 conditions in regression tests turns out to be more flexibility than
 necessary. Since you end up having to insert one in precisely predictable
 locations -- namely after every asynchronous command and after every
 connection switch -- perhaps it would be simpler to just have a \pset cwait
 command that automatically introduces timeouts in precisely those places.

 A brief explanation including an example regression test (the SAVEPOINT
 locking bug discovered recently) and the patch here:

   http://community.enterprisedb.com/concurrent/index.html

 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


Re: [HACKERS] how to add seconds to a TimestampTz

2007-03-15 Thread Sailesh Krishnamurthy
If you read the autovacuum_naptime into an Interval object once, why can't 
you just use timestamptz_pl_interval ? You won't be using the interval 
input/output repeatedly, surely.
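
Concretely, the call amounts to something like this (a sketch - the
surrounding names are made up, but DirectFunctionCall2 and
timestamptz_pl_interval are the real backend entry points):

#include "postgres.h"
#include "fmgr.h"
#include "utils/timestamp.h"

static Interval *naptime_interval;      /* parsed from the GUC once */

static TimestampTz
next_worker_time(TimestampTz current_time)
{
    /* timestamptz_pl_interval implements "timestamptz + interval";
     * calling it directly avoids both time_t and interval in/out */
    return DatumGetTimestampTz(
        DirectFunctionCall2(timestamptz_pl_interval,
                            TimestampTzGetDatum(current_time),
                            IntervalPGetDatum(naptime_interval)));
}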

Regards
Sailesh

--
Sailesh Krishnamurthy
Amalgamated Insight
[W] (650) 242-3503
[C] (650) 804-6585

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Alvaro Herrera
Sent: Wednesday, March 14, 2007 3:46 PM
To: Hackers
Subject: [HACKERS] how to add seconds to a TimestampTz

Is there a better way than going to time_t and back?  I am currently
using this:

db->next_worker =
    time_t_to_timestamptz(timestamptz_to_time_t(current_time) +
                          autovacuum_naptime);

(db->next_worker is a TimestampTz, as is current_time.
autovacuum_naptime is an integer number of seconds.)

but it doesn't seem clean, and the comments in the functions more or
less say that their use is discouraged.

I thought about doing it via the interval input/output but that's an awful
lot of work ...

Is this the first time this is going to be done in the backend?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


[HACKERS] Update on TelegraphCQ

2006-07-26 Thread Sailesh Krishnamurthy
Folks

I'm glad to be back on the pgsql-hackers mailing list. I'm writing to give
an update on the status of TelegraphCQ.

As Josh says, the TelegraphCQ project at UC Berkeley is more or less done as
most of the students on the project have graduated, or will be done this
summer. 

Having said that, some of us from the TelegraphCQ team have started a
company (Amalgamated Insight, Inc.) in order to commercialize an
enterprise-class stream processing engine. We're working on a much more
polished streaming engine that is based on the v8.1 code (among other more
application-oriented things). If anybody in the community is interested in
more details we'll be happy to have an off-line conversation, whether it's
about a potential use for our technology or a possible employment
opportunity. For instance, Neil Conway has interned with our team this
summer and we have benefited enormously from his experience. I'll also make
it a point to attend the next SFPUG meeting in case anybody local would like
to talk in vivo. Let me however say that we can't talk too much about
various aspects of our business model and/or distribution strategy (in other
words, I don't yet know if we will do open-source releases of everything we
build). 

 Josh Berkus said:

 Dragan,

 What are the possibilities (if any) for continuous dataflow streaming 
 with PostgreSQL v.8.1 ? Something like TelegraphCQ project,but it was 
 for v.7.3.Is there any alternatives for the latest version of PostgreSQL
?

 The TelegraphCQ team has stopped public development. So it's  
 pretty much waiting for someone to take on their code.

 
Regards
Sailesh
 
--
Sailesh Krishnamurthy
Amalgamated Insight
[W] (650) 267-2503
[C] (650) 804-6585


Re: [HACKERS] Call to build-in operator from new operator

2005-04-18 Thread Sailesh Krishnamurthy

As others pointed out, DirectFunctionCall2 is your friend. There is a
mailing list for TelegraphCQ. It's a Yahoo group - visit
http://groups.yahoo.com/group/telegraphcq 

About accessing data from a table in another database ... we need to
know more about what exactly you're doing. Please post on the tcq
mailing list.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Query optimizer 8.0.1 (and 8.0)

2005-02-21 Thread Sailesh Krishnamurthy

Sounds a bit like multi-dimensional clustering ... 

http://www.research.ibm.com/mdc/

After the ARC experience though ...

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Design notes for BufMgrLock rewrite

2005-02-14 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom and changing the buf_table hash table.  The only common
Tom operation that needs exclusive lock is reading in a page that
Tom was not in shared buffers already, which will require at
Tom least a kernel call and usually a wait for I/O, so it will be
Tom slow anyway.

Why not a separate lock per bucket chain in the hash table in addition
to the system-wide LWLock ? It's not so much that such an operation will be
slow anyway, but that such a slow operation will unnecessarily block
other operations.
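
A minimal sketch of the idea, assuming the 8.0-era LWLock API; the
table layout and names here are hypothetical, not the actual buf_table
code. Keeping NUM_BUCKETS a multiple of NUM_PARTITIONS means a bucket
always maps to the same partition:

#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/buf_internals.h"      /* BufferTag, BUFFERTAGS_EQUAL */

#define NUM_PARTITIONS 16               /* power of two */
#define NUM_BUCKETS    1024             /* multiple of NUM_PARTITIONS */

typedef struct HashEnt                  /* hypothetical bucket entry */
{
    BufferTag       tag;
    int             buf_id;
    struct HashEnt *next;
} HashEnt;

static HashEnt  *buckets[NUM_BUCKETS];
static LWLockId  partition_lock[NUM_PARTITIONS];

static int
buf_table_lookup(const BufferTag *tag, uint32 hash)
{
    LWLockId    lock = partition_lock[hash & (NUM_PARTITIONS - 1)];
    HashEnt    *e;
    int         result = -1;

    /* A slow exclusive holder (say, a page being read in) now blocks
     * only lookups that hash to the same partition, not the table. */
    LWLockAcquire(lock, LW_SHARED);
    for (e = buckets[hash % NUM_BUCKETS]; e != NULL; e = e->next)
    {
        if (BUFFERTAGS_EQUAL(e->tag, *tag))
        {
            result = e->buf_id;
            break;
        }
    }
    LWLockRelease(lock);
    return result;
}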

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] enforcing a plan (in brief)

2005-02-10 Thread Sailesh Krishnamurthy

Hicham

For your experiments (VLDB ? :-) your best bet for specifically bolting
on a plan (if you can't convince the optimizer to do the right thing)
is to hack it into the query planner.

I've done similar hacks in the past, but only in the TelegraphCQ code
and not in PostgreSQL. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Group-count estimation statistics

2005-01-28 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom The only real solution, of course, is to acquire cross-column
Tom statistics, but I don't see that happening in the near
Tom future.

Another approach is a hybrid hashing scheme where we use a hash table
until we run out of memory, at which time we start spilling to disk. In
other words, no longer use SortAgg at all ..

Under what circumstances will a SortAgg consume more IOs than a
hybrid hash strategy ?
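
To make the control flow concrete, here is a toy, self-contained
illustration of hybrid hashing (a tiny fixed "work_mem" and a single
level of spilling - none of this is pgsql executor code):

#include <stdio.h>
#include <stdlib.h>

#define MAX_GROUPS 4                    /* tiny "work_mem" */

typedef struct { int key; long count; } Group;

static Group groups[MAX_GROUPS];
static int   ngroups = 0;

static Group *
find_or_add(int key)
{
    int i;

    for (i = 0; i < ngroups; i++)
        if (groups[i].key == key)
            return &groups[i];
    if (ngroups < MAX_GROUPS)
    {
        groups[ngroups].key = key;
        groups[ngroups].count = 0;
        return &groups[ngroups++];
    }
    return NULL;                        /* "memory exhausted" */
}

/* One pass: aggregate resident groups, defer everything else. */
static void
agg_pass(FILE *in, FILE *spill)
{
    int key, i;

    while (fread(&key, sizeof key, 1, in) == 1)
    {
        Group *g = find_or_add(key);

        if (g != NULL)
            g->count++;                 /* aggregate in memory */
        else
            fwrite(&key, sizeof key, 1, spill); /* next pass */
    }
    for (i = 0; i < ngroups; i++)
        printf("key %d: count %ld\n", groups[i].key, groups[i].count);
    ngroups = 0;
}

int
main(void)
{
    FILE *in = fopen("keys.bin", "rb");     /* stream of raw ints */
    FILE *spill1 = tmpfile();
    FILE *spill2 = tmpfile();

    if (in == NULL || spill1 == NULL || spill2 == NULL)
        return 1;
    agg_pass(in, spill1);       /* pass 1: in-memory groups + spill */
    rewind(spill1);
    agg_pass(spill1, spill2);   /* pass 2: assumes spilled keys fit;
                                 * a real implementation partitions
                                 * the spill file and recurses */
    return 0;
}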

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Much Ado About COUNT(*)

2005-01-18 Thread Sailesh Krishnamurthy
 Jonah == Jonah H Harris [EMAIL PROTECTED] writes:

Jonah Replying to the list as a whole:

Jonah If this is such a bad idea, why do other database systems
Jonah use it?  As a businessperson myself, it doesn't seem
Jonah logical to me that commercial database companies would
Jonah spend money on implementing this feature if it wouldn't be
Jonah used.  Remember guys, I'm just trying to help.

Systems like DB2 don't implement versioning schemes. As a result there
is no need to worry about maintaining visibility in
indexes. Index-only plans are thus viable as they require no change in
the physical structure of the index and no overhead on
update/delete/insert ops. 

I don't know about Oracle, which I gather is the only commercial
system to have something like MVCC.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Much Ado About COUNT(*)

2005-01-18 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom People who hang around Postgres too long tend to think that
Tom MVCC is the obviously correct way to do things, but much of
Tom the rest of the world thinks differently ;-)

It works the other way too ... people who come from the locking world
find it difficult to wrap their heads around MVCC. A big part of this
is because Gray's original paper on transaction isolation defines the
different levels based on what kind of lock acquisitions they involve. 

A very nice alternative approach to defining transaction isolation is
"Generalized Isolation Level Definitions" by Adya, Liskov and O'Neil,
which appeared in ICDE 2000.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] plans for bitmap indexes?

2004-10-20 Thread Sailesh Krishnamurthy
 Gavin == Gavin Sherry [EMAIL PROTECTED] writes:

Gavin I'm uncertain about the potential benefit of
Gavin this. Isn't/shouldn't the effects of caching be assisting
Gavin us here?

It all depends on how large your table is, and how busy the system is
(other pressures on the cache). While it's difficult to plan for the
latter you can plan for the former. For the latter you could assume
that the effective cache size is some fraction of the real size to
account for the effects of other queries. Unless you are really sure
that you will never kick out a page from the buffer cache ..

I believe that for large enough tables this can certainly help .. it
sure is something that many other systems have implemented. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] plans for bitmap indexes?

2004-10-19 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom One huge advantage is that the actual heap visiting becomes
Tom efficient, eg you never visit the same page more than once.
Tom (What you lose is the ability to retrieve data in index
Tom order, so this isn't a replacement for existing indexscan
Tom methods, just another plan type to consider.)

Even without bitmap indexes, and without trying to use multiple indexes
etc., this (visiting a page only once) is useful.

In other words, I'd like to see the indexscan broken up into: (1) an
operator that returns a list of TIDs, (2) an operator that sorts the
TIDs, and (3) an operator that fetches heap tuples from the sorted TID
list.

Of course the resulting data from the heap will be out of order, but
that is oftentimes less important than unnecessarily visiting (and
possibly even re-fetching from disk) the same heap page twice for a
given index scan.
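
Step (2) is tiny. A minimal sketch, assuming the TIDs have been
captured in an array (the operator plumbing around it is hypothetical;
ItemPointerData and the accessor macros are the real ones):

#include "postgres.h"
#include "storage/itemptr.h"

/* Order TIDs physically: by heap block, then by line pointer. */
static int
tid_physical_cmp(const void *a, const void *b)
{
    ItemPointer ta = (ItemPointer) a;
    ItemPointer tb = (ItemPointer) b;
    BlockNumber ba = ItemPointerGetBlockNumber(ta);
    BlockNumber bb = ItemPointerGetBlockNumber(tb);

    if (ba != bb)
        return (ba < bb) ? -1 : 1;
    return (int) ItemPointerGetOffsetNumber(ta) -
           (int) ItemPointerGetOffsetNumber(tb);
}

static void
sort_tid_list(ItemPointerData *tids, int ntids)
{
    qsort(tids, ntids, sizeof(ItemPointerData), tid_physical_cmp);
    /* operator (3) then walks the array, touching each heap page once */
}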

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] APR 1.0 released

2004-10-08 Thread Sailesh Krishnamurthy
 Marc == Marc G Fournier [EMAIL PROTECTED] writes:

Marc On Thu, 7 Oct 2004, Bruce Momjian wrote:
  Added to TODO:
 
 * Consider parallel processing a single query
 
 This would involve using multiple threads or processes to do
 optimization, sorting, or execution of single query.  The major
 advantage of such a feature would be to allow multiple CPUs to
 work together to process a single query.

Marc Do we have 'make backend thread safe' listed yet?  As I
Marc recall it, until that gets done, parallelization of anything
Marc was considered to be a relatively onerous task, no?


You don't really need to parallelize in separate threads .. you can
have more than one process working on one query. This is in fact the
model that exploits SMPs in at least one commercial RDBMS.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] APR 1.0 released

2004-10-08 Thread Sailesh Krishnamurthy

IMHO the best references to parallelizing query plans are in the
Volcano papers. The Exchange operator is a really clean abstraction -
the idea is to place the Exchange operator in query plans and that way
you don't have to parallelize any other operator. Exchange takes care
of managing the IPC queues and also worries about whether or not you
have to, say, rehash the data, broadcast the data to all other
processes, or direct the data to a single node ...

I'd suggest reading the following paper:

Encapsulation of parallelism in the Volcano query processing system

By Goetz Graefe in SIGMOD 1990. 

Link: http://portal.acm.org/citation.cfm?id=98720

The above link also has references to Gamma but I really like the
exposition in the Volcano/Exchange work much better. 
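
As a toy illustration of why the abstraction is so clean - the queues
and the routing policy live entirely inside Exchange, so neither the
producer nor the consumer knows it is running in parallel (a
self-contained POSIX sketch, nothing like Volcano's actual code):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define NWORKERS 2

int
main(void)
{
    int fds[NWORKERS][2];
    int w, j, tuple;

    for (w = 0; w < NWORKERS; w++)
    {
        if (pipe(fds[w]) != 0)
            return 1;
        if (fork() == 0)
        {
            /* Consumer: its whole "input operator" is the queue. */
            long sum = 0;

            for (j = 0; j < w; j++)     /* drop inherited write ends */
                close(fds[j][1]);
            close(fds[w][1]);
            while (read(fds[w][0], &tuple, sizeof tuple) == sizeof tuple)
                sum += tuple;
            printf("worker %d partial sum: %ld\n", w, sum);
            _exit(0);
        }
        close(fds[w][0]);
    }

    /* Producer side of Exchange: hash-partition tuples across the
     * queues (this is the one spot where broadcast or single-node
     * routing would plug in instead). */
    for (tuple = 1; tuple <= 100; tuple++)
        write(fds[tuple % NWORKERS][1], &tuple, sizeof tuple);

    for (w = 0; w < NWORKERS; w++)
        close(fds[w][1]);
    while (wait(NULL) > 0)
        ;
    return 0;
}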

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] plans for bitmap indexes?

2004-10-08 Thread Sailesh Krishnamurthy
 Yann == Yann Michel [EMAIL PROTECTED] writes:

Yann O.K. I downloaded it :-) We will see if and how I can
Yann help

FYI .. in case you aren't aware already:

http://portal.acm.org/citation.cfm?id=98720

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] beta1 beta2 Windows heavy load

2004-09-14 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom instead of increasing it.  And I don't want to create a
Tom full-fledged alloc/free package for shared memory --- the
Tom bang-for-buck ratio for that is way too low.  So I'm inclined

I think I've said it before, but we actually have such a beast - a
shared memory MemoryContext that uses libmm (which apparently Apache
also uses). It should be fairly easy to patch in if it's worth it for
you folks.
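
For reference, the underlying OSSP libmm calls look roughly like this
(API details from memory - check mm(3) before relying on them; our
MemoryContext wrapper is not shown):

#include <mm.h>

int
main(void)
{
    /* One arena, created before forking; children inherit it. */
    MM *arena = mm_create(1024 * 1024, "/tmp/pg_shmem_arena");

    if (arena == NULL)
        return 1;

    /* malloc/free-style allocation inside the shared arena, so a
     * MemoryContext can hand out and reclaim shared memory at will. */
    char *buf = mm_malloc(arena, 8192);

    mm_free(arena, buf);
    mm_destroy(arena);
    return 0;
}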

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] APR 1.0 released

2004-09-13 Thread Sailesh Krishnamurthy
 CB == Christopher Browne [EMAIL PROTECTED] writes:

CB futile discussions ask for it.  Notably, on an SMP system, it
CB would be a neat idea for complex queries involving joins to
CB split themselves so that different parts run in separate
CB threads.

You don't really need threads for this. All you need is to have
multiple backends and use queues to exchange tuples at specific
points. This is much like the Exchange operator in Volcano. 

CB The other Way, Way Cool part would be for queries that are
CB scanning big tables to split the scans into unions of partial
CB scans, so that on an 8 CPU box you'd take the Big 4GB Table
CB and have 8 threads simultaneously scanning different parts of
CB it.  (And making ARC all the more important :-).)

Again this can be done without threads .. you just need inter-process
communication. 

(BTW, there is at least one commercial system that follows exactly
this model).

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] pulling projection up in plans

2004-08-16 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Hicham G. Elmongui [EMAIL PROTECTED] writes:
 is there a way to pull the projection operator to the top of
 the query plan? I wish there's a variable that can be set to do
 so.

Tom Could you be more specific about what you're hoping to
Tom accomplish?

This is just a suspicion, but I suspect that Hicham wants to avoid
tuples being projected in Scan operators .. in a SELECT * query,
projecting a tuple essentially causes a copy from the buffer pool to
the backend process's heap space. I guess it would work just fine to
have the tuple remain in the buffer and keep the buffer pinned.

(In TelegraphCQ we actually do this .. we also don't really
materialize join tuples - instead we have an intermediate tuple
format which is a list of pointers to the various source tuples. This
makes sense as we do shared query processing - i.e., different queries
with similar join predicates and different projection attributes use
the same physical join tuples - projection is the final operation on
the intermediate tuples when we return the tuples to clients. We did
this really to facilitate sharing and don't really have any hard
numbers on whether this would make sense in a general context with
pgsql. Actually IIRC we did consider this - whether it would make a
difference to pgsql - and did some crude perf studies and found that it didn't
really help if there was no sharing .. as is the case with pgsql). 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Extended customizing, SQL functions,

2004-06-02 Thread Sailesh Krishnamurthy
 pgsql == pgsql  [EMAIL PROTECTED] writes:

pgsql They have a database of information that is coming in at high
pgsql speed on a regular basis. One bit of information is a value. To
pgsql get this value they must perform SELECT sum(field) FROM
pgsql table. Well, this simply does not scale. They've used a trigger
pgsql system with a small summary table where they update the number
pgsql in the summary field. That works fine, except that after a few
pgsql thousand updates, the SELECT time takes a while. Then they have
pgsql to vacuum constantly. It just seems like an ugly and wasteful
pgsql process.

Sounds like something that TelegraphCQ can do well ..

http://telegraph.cs.berkeley.edu 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] On query rewrite

2004-05-28 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom This particular issue is handled as part of our Path
Tom enumeration mechanism, but the more hard-wired sorts of
Tom transformations that you are asking about live mostly in

Thanks again. To confirm: the actual cost comparison with plan
enumeration is a dynamic programming algorithm, is it not ?
Selinger-style with 2-way join paths enumerated, then 3-way using the
best 2-way etc. ? 

BTW, do lots of people use the GEQO ? 

Tom optimizer/prep/* and plan/planner.c.  In particular you
Tom probably want to look at prepjointree.c and prepqual.c.
Tom (Note prepqual also looks considerably different in CVS tip
Tom than in prior releases.)

Thanks .. I've extracted cvstip .. sigh .. one of these days I'll have
to do another merge with the TelegraphCQ code. You guys hack too much :-)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



[HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy

Hackers

Are there any default query rewrite rules that kick in, in the absence
of any user-defined RULE or VIEW ?

Also, is there any place that lists the interesting query rewrites
that PG does on queries for performance improvement ?

For instance, in the presence of a view or a subquery, does PG do a
subquery-to-join transformation ?

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy
 Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes:

 For instance, in the presence of a view or a subquery, does PG
 do a subquery to join transformation ?

Alvaro Yes, there are transformations of this sort, but they are
Alvaro not called query rewrite in the code's terminology, but
Alvaro optimization -- rewrite (rules and views) happens to the
Alvaro parsed statement, and the optimizer works on the output of
Alvaro rewriting.  So actually the optimizations happen whether or
Alvaro not there were rules or views.

Interesting .. so these are rule-based then ? Not cost-based ?

I understand that there is a cost-based optimizer anyway that does the
planning and selects the right plan .. but does this come _after_ all
these transformations ? Or does it happen along with the
transformations ? 

Alvaro The query's path is SQL -> parse -> rewrite -> optimize ->
Alvaro execute

Can you please point me to the code that indeed does such
transformations ? 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] On query rewrite

2004-05-27 Thread Sailesh Krishnamurthy
 Alvaro == Alvaro Herrera [EMAIL PROTECTED] writes:

 I understand that there is a cost-based optimizer anyway that
 does the planning and selects the right plan .. but does this
 come _after_ all these transformations ? Or does it happen
 along with the transformations ?

Alvaro No, there's no rules optimizer, only the cost-based one
Alvaro you already know of.

Okay ...

Alvaro The query's path is SQL -> parse -> rewrite -> optimize ->
Alvaro execute
  Can you please point me to the code that indeed does such
 transformations ?

Alvaro Sorry, I don't know the optimizer code.  You can find a
Alvaro lot of detail in backend/optimizer/README.  Probably you
Alvaro want to look at what happens to JOIN_IN nodes, for
Alvaro example, regarding the conversion of a

Couldn't find the README but I'm operating on an older source base.

Anyway, thanks for the tips. I think I found what I'm looking for: the
function is probably pull_up_subqueries .. and what it tries to do is
check if a subquery is simple or not .. simple means not having
Aggs or something more complicated. If that's the case, and if some
NULLable conditions are safe, then the subquery gets pulled up -
essentially, a subquery-to-join transformation.


Now my next question is more subtle .. 

Are these alternatives (pulling up vs not pulling up subqueries)
considered in different plans ? 

Because, if not, it seems that this is really just a query rewrite
.. it's just that it happens as part of the optimizer component. In
fact, there is an implicit rule here in operation (by rule, I don't
mean a pgsql RULE).

Are more such transformations spread around the optimizer component ?
Is there any reason to have it integrated with the planner as opposed
to having it be part of the rewrite component (apart from historical
.. plus the code is solid and works etc.) ? 

Sorry for all the questions .. as I stated before I'm working on a
chapter of a text book that is a case-study of pgsql (the 4th ed
contains case studies of Oracle, DB2 and MSSQL) and the 5th ed is
gonna have pgsql. 

Another question about regular RULE processing .. suppose after
applying a rule the resultant query tree is eligible for another rule,
does pgsql's rule system keep iterating over and over until it reaches
a fixed point or is there some heuristic in operation (just apply the
rules twice ..) ? From my cursory inspection of the code it looks like
the latter, but I'd like to know for sure. 

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] proposal: be smarter about i/o patterns in index scan

2004-05-19 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom For starters, read the previous discussions of this in the
Tom archives.

Tom Two questions you should have answers to before starting to
Tom implement, rather than after, are how to deal with locking
Tom considerations and what will be the implications of giving up
Tom the property that indexscans deliver sorted output.


I don't know about the former, but as to the latter, we should
certainly have the ability for both output sorted by key and
TID-List-Fetch ...

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] proposal: be smarter about i/o patterns in index scan

2004-05-17 Thread Sailesh Krishnamurthy

Yes, fetching a RID list from an index scan, sorting 'em and then
fetching from the table would be a very appreciable speedup for many
queries. I would imagine that all the commercial engines do this (DB2
calls it a sorted RID-list fetch) .. and this has in fact been
discussed on -hackers before.

One issue for this is that there could be a slip between the cup and
the lip .. i.e., between the fetch from the index, the sort, and the
fetch from the table, a vacuum could have taken place rendering TIDs
invalid. This should however not be a show-stopper .. surely all
that's needed is a lock against vacuuming in the presence of a
TID-list fetch.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



[HACKERS] Position available at the Telegraph project

2004-04-19 Thread Sailesh Krishnamurthy

Dear Hackers

I apologize in advance if this posting is construed as spam. 

As many of you are aware, the database research group at UC Berkeley
is building TelegraphCQ, a system for processing continuous queries
over data streams. TelegraphCQ was built with the PostgreSQL source
base.

We are looking for a full-time programmer to work with our group. We
believe that PostgreSQL hackers will be great candidates for such a
position. 

Appended to this email is a full description of the job requirements.

I encourage any hackers interested to apply for the position.

To apply for the position please do the following:

   - Point your web browser to http://jobs.berkeley.edu
   - Click on External applicants
   - Click on View job posting / apply for jobs

   - Check the Information Technology box, Choose Regular from
   Regular/Temporary, fill in telegraph in the Keyword box and hit
   the search button.

   - Click on Programmer/Analyst IV-Ucb, job #001090 

Please note: YOU HAVE TO APPLY FOR THE JOB THROUGH THE WEB SITE .. 


Job Description


Posting Title:      Programmer/Analyst IV-Ucb
Requisition:        001090
Department:         Electronics Research Lab
Location:           Main Campus-Berkeley
First Review Date:  03/29/2004

This requisition will remain open until filled.

Job Description:

The focus of the Telegraph research group is on developing a
general-purpose software infrastructure for dataflow applications.
Dataflow applications are prevalent in networked environments where
information from multiple sources must be flowed through various
networked machines and software services for integration and analysis.
The unpredictable nature of decoupled, networked machines and services
requires novel, very adaptive software infrastructures: the design and
study of such infrastructure is the goal of the Telegraph project.

The position is in support of the Telegraph project, with a focus on
integration with two other software projects currently underway in the
UC Berkeley database group: TinyDB, a query processing system that
runs on ad hoc networks of wireless sensor motes, and YFilter, a
distributed XML filtering and brokering system.

This position will represent the chief programmer for the Telegraph
project, with a focus on overall system architecture, documentation,
open-source software delivery, and demonstration development and
deployment.  This work will be done in collaboration with the students
and faculty developing the system.  The success of the work will be
best measured by the ability of our group to deploy interesting
functionality with Telegraph, and for others outside of our group to
successfully exploit our open source software in the development of
applications in a wide range of scenarios.  We want to make Telegraph
the system of choice for research groups that are experimenting with
data streaming and advanced dataflow applications.

This project involves a large amount of existing software developed in
the three projects (Telegraph, TinyDB, and YFilter), probably totaling
several hundred thousand lines of code.  The person holding the
position will need to understand database management internals, XML
processing and must be able to interface such systems with external
devices such as sensor networks.

Responsibilities:

Design, implement, test, deliver, release, and maintain the core
Telegraph system as a viable open source dataflow infrastructure. This
position will make all client-side design decisions: i.e. they will
decide on user interface design, client-server APIs, and client and
application design (there are a number of client applications as yet
undeveloped which will be the full responsibility of the programmer.)
The position will collaborate with students and faculty on server-side
design decisions.  The position will also be fully responsible for all
public code release and installer issues, for documentation, and for
testing and other quality assurance of both server and client code.

Design, implement, test, deliver and maintain a number of
demonstration applications over Telegraph- including conception,
execution, and deployment at venues such as international research
conferences, departmental meetings, CITRIS functions, etc.

Coordinate with students and faculty to integrate the core Telegraph
code with ongoing research extensions thereof, including participation
in research meetings of the Telegraph group.


Outreach activities to promote the adoption of Telegraph for future
application, including assistance to selected users as well as general
tutorials and presentations.

Requirements & Qualifications:

The position requires a broad-based skill and experience level with
the ability to quickly grasp the intricacies of complex software
systems.  Position 

Re: [HACKERS] linked list rewrite

2004-03-23 Thread Sailesh Krishnamurthy

Re changing APIs or not.

I agree with Tom that an incremental change is easier. More
importantly, it is also easier to test your implementation.

Even if everybody agrees that the API should be changed, IMO a better
way would be to first use your implementation with the old API and go
through some thorough testing to make sure it is AOK. Once we are
certain (or as certain as we can be) about that, you can changeover to
the new API. This way, when bugs show up we won't be left wondering if
it's because of the new API interacting with some hidden assumption at
the call-sites or if it is only the implementation. 

One step at a time seems safer to me.

I would think a new CVS branch would be the right way to merge your
stuff especially if you are going with the API change route. That way
you won't have to lock the tip and can incrementally work with a
broken branch until it is fine (and keep re-merging).

Which brings me to another question .. has anybody considered using
subversion instead of CVS ? 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



[HACKERS] Chapter on PostgreSQL in a book

2004-03-23 Thread Sailesh Krishnamurthy

Hackers

Along with some other folks I am co-authoring a chapter on PostgreSQL
in the next edition of "Database System Concepts" by Silberschatz,
Korth and Sudarshan (http://db-book.com)

This is in the form of a case study and will follow the pattern in
Chapters 25,26 and 27 (Part 8 in the Table of Contents available at
the above URL) that deal with Oracle, DB2 and MS SQL Server
respectively.

There are three sections, though, that are non-engine specific: 

 - Database Design and Querying Tools
 - Replication, Distribution and External Data
 - Database Administration Tools

I'm at a loss as to how much we should focus on these sections. Do we
use what's in GBorg ? Do the hackers have any suggestions ? 

There are 33 DBA tools and 19 Design tools in GBorg .. are there any
specific tools that are recommended ? 

Suggestions appreciated ..

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] subversion vs cvs

2004-03-23 Thread Sailesh Krishnamurthy
 Marc == Marc G Fournier [EMAIL PROTECTED] writes:

Marc On Tue, 23 Mar 2004, Sailesh Krishnamurthy wrote:
 Which brings me to another question .. has anybody considered
 using subversion instead of CVS ?

Marc Why?  not that I'm for a chance from something that isn't
Marc broken, but what advantages does subversion give us over
Marc what we already have?

I've had plenty of pain with cvs in terms of directories not being
first-class etc .. but I don't really contribute to pgsql so you guys
probably don't have the same experience. 

I was just curious as it looks like eventually subversion (or arch :-)
will be an alternative to cvs. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during

2004-03-21 Thread Sailesh Krishnamurthy
 Kevin == Kevin Brown [EMAIL PROTECTED] writes:

 The bigger problem though with this is that it makes the
 problem of list overflow much worse.  The hard part about
 shared memory management is not so much that the available
 space is small, as that the available space is fixed --- we
 can't easily change it after postmaster start.  The more finely

Again, I can suggest the shared memory MemoryContext we use in
TelegraphCQ that is based on the OSSP libmm memory manager. We use it
to grow and shrink shared memory at will.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Reducing expression evaluation overhead

2004-03-16 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom The idea I was toying with is to generate, not x = y with
Tom repeated copies of x, but placeholder = y where placeholder
Tom is a dummy expression tree node.  Then at runtime, the CASE
Tom code would evaluate the test expression once and save it into
Tom the econtext so the dummy node(s) could regurgitate that
Tom value.  We already have exactly such a mechanism in place to
Tom handle the VALUE keyword for domain check constraints; it'd
Tom be easy to duplicate it for CASE.

That is exactly what I was proposing. I implemented something like
this in TCQ and used a Const expression tree node. This was for
something we call grouped filters where we build an index on
predicates from multiple queries. So if you have a bunch of queries
(say ~1000) each with a predicate R.a ??? xxx, where ??? is one of
<, <=, =, >=, >, then we evaluate, using the predicate index, which
queries fail for each incoming tuple.
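
As a toy illustration of the predicate-index idea for a single
operator: keep the constants of all the "R.a < c_i" predicates sorted,
and one binary search per tuple decides all the queries at once (a
self-contained sketch, not the TCQ code):

#include <stdio.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Index of the first constant strictly greater than v. */
static int
first_greater(const int *consts, int n, int v)
{
    int lo = 0, hi = n;

    while (lo < hi)
    {
        int mid = (lo + hi) / 2;

        if (consts[mid] <= v)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

int
main(void)
{
    /* One constant per query: query i is "R.a < consts[i]". */
    int consts[] = { 10, 50, 20, 40, 30 };
    int nqueries = 5;
    int tuple_value = 25;
    int idx;

    qsort(consts, nqueries, sizeof(int), cmp_int);
    idx = first_greater(consts, nqueries, tuple_value);
    /* Queries with constant > tuple_value match; the rest fail. */
    printf("%d of %d queries match\n", nqueries - idx, nqueries);
    return 0;
}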

In a separate experiment we found that ExecEvalVar is particularly
expensive for us .. this is because we have an IntermediateHeapTuple
data structure to represent join tuples (in our framework, join orders
are not fixed) .. the IHT has a set of pointers to the constituent
tuples. This means that we have to do more work in ExecEvalVar
.. essentially one more lookup into the IHT. All this was only
possible because you guys kept around the varnoold and the attnoold !!

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Reducing expression evaluation overhead

2004-03-15 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom I'm not sure that this would let us catch up to what Arjen
Tom reports as MySQL's expression evaluation speed, but it should
Tom at least speed things up a bit with only fairly localized
Tom changes.

I like the idea of memoizing the switch with function pointers as I
don't think branch prediction helps much with varying switch arms
selected with different exprs. Also I agree that the delta of indirect
function invocation is probably small.

I've forgotten the syntax of CASE, but for the simple form isn't
expr=const going to be the same expr for each case arm ? If that's the
case, couldn't we actually save the value of expr in a Datum and then
reuse that (through a Const) in each of the other arms to evaluate the
actual exprs ? That should reduce the number of times ExecEvalVar (and
through it heap_getattr) is called.
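
A minimal sketch of the evaluation side, assuming hypothetical state
nodes and a hypothetical side-channel for the saved value (modeled on
the domain-VALUE mechanism, using the 8.0-style list API; none of this
is actual executor code):

#include "postgres.h"
#include "executor/executor.h"

typedef struct CaseWhenState            /* hypothetical */
{
    ExprState  *qualexpr;               /* "placeholder = const" */
    ExprState  *result;                 /* THEN expression */
} CaseWhenState;

typedef struct SimpleCaseState          /* hypothetical */
{
    ExprState  *testexpr;               /* the shared "expr" */
    List       *whenclauses;            /* list of CaseWhenState */
    ExprState  *defresult;              /* ELSE expression */
} SimpleCaseState;

/* Hypothetical side-channel standing in for a new econtext field;
 * the placeholder node's eval function reads both of these. */
static Datum case_test_datum;
static bool  case_test_isnull;

static Datum
eval_simple_case(SimpleCaseState *cstate, ExprContext *econtext,
                 bool *isNull)
{
    ListCell   *lc;

    /* Evaluate the shared test expression exactly once per tuple;
     * the placeholder in each arm just regurgitates the Datum, so
     * ExecEvalVar runs once instead of once per arm. */
    case_test_datum = ExecEvalExpr(cstate->testexpr, econtext,
                                   &case_test_isnull, NULL);

    foreach(lc, cstate->whenclauses)
    {
        CaseWhenState *wc = (CaseWhenState *) lfirst(lc);
        bool        qnull;
        Datum       qual = ExecEvalExpr(wc->qualexpr, econtext,
                                        &qnull, NULL);

        if (!qnull && DatumGetBool(qual))
            return ExecEvalExpr(wc->result, econtext, isNull, NULL);
    }
    return ExecEvalExpr(cstate->defresult, econtext, isNull, NULL);
}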

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Performance and WAL on big inserts/updates

2004-03-11 Thread Sailesh Krishnamurthy
 Marty == Marty Scholes [EMAIL PROTECTED] writes:

Marty Why have I not seen this in any database?
Marty There must be a reason.

For ARIES-style systems, logging parsed statements (commonly called
logical logging) is not preferred compared to logging data items
(physical or physiological logging). 

A major reason for this is that logical logs make recovery contingent
on being able to execute the parsed statements. This execution
might, however, not be possible if the system is itself not in a
consistent state .. as is normally the case during recovery. 

What if, for instance, it's the catalog tables that were hosed when
the system went down ? It may be difficult to execute the parsed
statements without the catalogs. 

For this reason, a major goal of ARIES was to have each and every data
object (tables/indexes) individually recoverable. So ARIES follows
page-oriented redo logging.

Having said that, page-oriented undo logging can be a pain when B-tree
pages split. For higher concurrency, ARIES uses logical undo
logging. In this case, the logs are akin to your parsed statement
idea.

In any case, the only places that parsed statements are useful, IMO, are
with searched updates that cause a large number of records to change
and with INSERT INTO ... SELECT statements.

Then, there is also the case that this, the parsed-statement
approach, is not a general solution. How would you handle UPDATE ...
WHERE CURRENT OF cursor scenarios ? In this case, there is some
application logic that determines the precise records that change and
how they change.

Ergo, it is my claim that while logical redo logging does have some
benefits, it is not a viable general solution.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Performance and WAL on big inserts/updates

2004-03-11 Thread Sailesh Krishnamurthy

(Just a note: my comments are not pg-specific .. indeed I don't know
much about pg recovery).

 Marty == Marty Scholes [EMAIL PROTECTED] writes:

Marty If the DB state cannot be put back to a consistent state
Marty prior to a SQL statement in the log, then NO amount of
Marty logging will help.  The idea is that the state can be put
Marty back to what it was prior to a particular log entry, be it
Marty raw datafile blocks or a SQL statement.

The point is that with redo logging, you can just blindly apply the
log to the data pages in question, without even really restarting the
database.

Note that in ARIES, recovery follows: (1) analysis, (2) redo
_everything_ since last checkpoint, (3) undo losers. 

So logging carefully will indeed help get the system to a consistent
state - actually after phase (2) above the system will be in precisely
the state during the crash .. and all that's left to do is undo all
the live transactions (losers). 

BTW, logging raw datafile blocks would be pretty gross (physical
logging) and so ARIES logs the changes to each tuple in logical
fashion .. so if only one column changes only that value (before and
after images) are logged. This is what's called physiological
logging.

Marty See above.  If this cannot be resolved prior to
Marty re-executing a statement in the log, then the problem is
Marty beyond ANY subsequent logging.

Not true ! By applying the log entries carefully you should be able to
restore the system to a consistent state. 

 Having said that, page-oriented undo logging can be a pain when
 B-tree pages split. For higher concurrency, ARIES uses logical
 undo logging. In this case, the logs are akin to your parsed
 statement idea.
 

Marty Yes, my experience exactly.  Maybe we are the only company
Marty on the planet that experiences this sort of thing.  Maybe

Well, logical undo is still at a much lower level than parsed
statements. Each logical undo log is something like "delete key 5 from
index xyz".

Marty Maybe this is not a traditional RDBMS app, but I am not
Marty in the mood to write my own storage infrastructure for it.

I agree that your app has a lot of updates .. it's just that I'm not
convinced that logical logging is a clean solution. 

I also don't have a solution for your problem :-)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets

2003-12-18 Thread Sailesh Krishnamurthy

We have a rude hack that spills hashed aggs to temp files to deal with
the case where there is not enough memory. I don't think that's an ideal
solution, but it certainly has the code to dump to file. I can post
the patch later in the day ..

(This is some code for our undergrad db class assignment. I was
planning to clean it up and submit it properly but I never got the
time)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] PostgreSQL port to pure Java?

2003-12-09 Thread Sailesh Krishnamurthy

We remain sceptical about writing an RDBMS in Java. The earlier
version of TelegraphCQ was in Java and turned out to be a bit of a
pain. 

Some more information:

Mehul A. Shah, Samuel Madden, Michael J. Franklin, Joseph
M. Hellerstein: Java Support for Data-Intensive Systems: Experiences
Building the Telegraph Dataflow System. SIGMOD Record 30(4): 103-114
(2001)

Apart from our group, the database research group in Wisconsin also
rewrote their Niagara system from Java to C++.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



[HACKERS] relation_byte_size()

2003-12-03 Thread Sailesh Krishnamurthy

Hackers

Here is the definition of relation_byte_size() in optimizer/path/costsize.c:

--
/*
 * relation_byte_size
 *    Estimate the storage space in bytes for a given number of tuples
 *    of a given width (size in bytes).
 */
static double
relation_byte_size(double tuples, int width)
{
    return tuples * (MAXALIGN(width) + MAXALIGN(sizeof(HeapTupleData)));
}

--

Shouldn't this be HeapTupleHeaderData and not HeapTupleData ? 

(Of course, from a costing perspective these shouldn't be very different but ...)
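That is, I'd expect the accounting to be (sketch of the one-line
change; HeapTupleHeaderData is the on-page header struct):

    /* per-tuple on-page overhead is the tuple header, not the
     * in-memory HeapTupleData struct */
    return tuples * (MAXALIGN(width) + MAXALIGN(sizeof(HeapTupleHeaderData)));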

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Materialized views proposal

2003-11-30 Thread Sailesh Krishnamurthy
 Hannu == Hannu Krosing [EMAIL PROTECTED] writes:

Hannu Neil Conway wrote on Sun, 30.11.2003 at 02:18:
 Jonathan Gardner [EMAIL PROTECTED] writes:  3) We
 would implement some sort of differential view update scheme 
 based on the paper Efficiently Updating Materialized
 Views[1].

Hannu Maybe the TelegraphCQ engine can give some ideas

Hannu http://telegraph.cs.berkeley.edu/

Hannu from brief reading, it seems that all they do is kind of
Hannu materialized views ;) - they call it Continuous Dataflow
Hannu Processing, i.e. queries that run continuously over
Hannu incoming data.


A fair portion of the community argues that everything that we do is
merely materialized views :-)

While materialized views are certainly related, there are however,
things that are quite different in processing continuous queries over
data streams. The first ideas on stream processing did come from work
on scalable triggers which is intimately related to materialized
views. Work on mviews isn't really concerned with things like windowed
joins and aggregate processing.

My advice is to take a very incremental approach to materialized view
implementation. There are essentially 2 pieces - one the maintenance
of the views and second the routing of queries to automagically use
the materialized views. The former is done by something like triggers
and the latter is through either a query-rewrite mechanism or
something like an optimizer choosing a different access method. 

As for the former, there are again two subdivisions - what changes you
propagate and when you apply 'em. What you propagate are the
deltas. So you have the following choices:

IPIA - Immediate Propagate, Immediate Apply 
IPDA - Immediate Propagate, Deferred Apply
DPDA - Deferred Propagate, Deferred Apply

DPIA - makes no sense ..

The easiest is DPDA .. in this model you essentially recompute the
view on demand. The next is IPIA .. with IPIA, you can choose to either
recompute the entire view or only figure out how to translate a delta
into appropriate changes in the mview. After that there is IPDA
.. this involves creating a delta table that is populated based on the
triggers. Every so often you process a set of deltas and recompute the
view (either in its entirety or in parts).

Another way to slice the problem is to limit the scope of queries that
can be used to define views. 

I suggest that first we don't consider joins .. and only consider
grouped aggregates over a single table. Joins are more hairy with
changes in one table potentially requiring a wholesale recomputation
of the join. 

Note that all this discussion is only about the first part of the
problem .. maintenance of the views. The other part, routing is almost
equally complicated .. there you have to solve the query subsumption
problem without also eliminating choices in your access plan. 

As people have said there are plenty of papers on this in the
literature. While I am no theorist I can certainly help with reading
the papers .. not every bit of a paper is very useful. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] detecting poor query plans

2003-11-26 Thread Sailesh Krishnamurthy
 Neil == Neil Conway [EMAIL PROTECTED] writes:

Neil It occurred to me that these kinds of poor planning
Neil decisions could easily be detected by PostgreSQL itself:
Neil after we've finished executing a plan, we can trivially
Neil compare the # of results produced by each node in the query
Neil tree with the # of results the planner expected that node to
Neil produce (look at EXPLAIN ANALYZE, for example). If the

Indeed. This is the approach being followed by the LeO project
(Learning Optimizer) at IBM Almaden. 

http://www.almaden.ibm.com/software/dm/SMART/leo.shtml

There is a vldb paper that describes it .. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the

2003-11-19 Thread Sailesh Krishnamurthy
 Greg == Greg Stark [EMAIL PROTECTED] writes:

Greg I think you're talking about situations like where x = ? or
Greg y = ?  or where x = ? and y = ?

Greg When both `x' and `y' are indexed. It's possible to do the
Greg index lookup, gather a list of tid pointers in some
Greg efficient format like a bit vector, and apply the and/or and
Greg any other clauses.

Yes, Index ANDing/ORing is useful whether or not the list of TIDs is
in an efficient format. Especially ORing for performing disjunctions.

Greg Oracle can do this, and it's useful in some cases when you
Greg have DSS-style where all the indexes have poor selectivity
Greg but using enough of them together gets you a reasonable
Greg number of records.

I guess this is the piece where variant indexes are useful -
essentially when you have a large number of matches for a given key.

I'm not sure how useful it is in practice - I've only read the
original research paper. 

Greg I think this is different from what he meant, but yes,
Greg bitmap indexes might be an interesting project. Like hash

You're right .. a sorted TLF is really something quite simple that can
make quite a difference in accessing a non-clustered index. I believe
this is something all the commercial guys do. Sorting the TIDs before
fetching 'em buys you buffer cache locality. When there are large
numbers of hits, it also buys you sequential scans where the file
system prefetcher can help. The additional overhead you pay is the
sorting cost. 


-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the

2003-11-19 Thread Sailesh Krishnamurthy
 Mike == Mike Mascari [EMAIL PROTECTED] writes:

Mike Robert Treat wrote:

 While some form of bitmapped indexing would be cool, other ideas might
 be to implement different buffer manager strategies. I was impressed by
 how quickly Jan was able to implement ARC over LRU, but there are a host
 of other strategies that could also be implemented. 

We already do that ! 

We have a first warm-up assignment for which they get 2 weeks and
have to change the strategy to MRU from LRU (in an earlier semester
they were assigned 2Q). The idea here is more to just get used to the
code and the debugger. 

Sadly the undergraduate OS class uses Java (horrors) as an
implementation language and many of our juniors and seniors are not as
comfortable with C programming (and pointers) as I'd like. The good
news is that they all pretty much got into the groove fast.


Re PITR, maybe that's an option - the thing is we are looking less at
a full semester-long project and more at a 3-4 week assignment where
students get to hack something, learn about the practical side of
what's in lecture, and learn to do some performance comparisons.

Mike If you go to sequential access of larger chunks of the disk, you will
Mike get 500 times more bandwidth—you can read or write the disk in a day.
Mike So programmers have to start thinking of the disk as a sequential
Mike device rather than a random access device.

Mike Isn't a TID-List-Fetch implementation a crucial first step in the
Mike right direction?

I believe so .. I think it's a clear win. I believe there are some
concurrency issues although I'm not sure .. what if there is a vacuum
that comes in between building the TID list and then doing a fetch ? 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the

2003-11-19 Thread Sailesh Krishnamurthy
 Robert == Robert Treat [EMAIL PROTECTED] writes:

Robert allowing indexes for searching nulls, or adding
Robert concurrency to GIST, or allowing non btree indexes to

Oh this has come up before on -hackers and I've been meaning to chime
in. 

Marcel Kornacker did implement concurrency for GiST - I confirmed as
much with Joe Hellerstein (his advisor). I know there's a paper he
wrote with C.Mohan on it. I don't know which version his
implementation was for.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the

2003-11-18 Thread Sailesh Krishnamurthy

PostgreSQL most definitely works great on Solaris x86 !

At UC Berkeley, we have our undergraduate students hack on the
internals of PostgreSQL in the upper-division Introduction to
Database Systems class ..

http://www-inst.eecs.berkeley.edu/~cs186/

The official platform is Solaris x86 - that's where the students get
accounts and they have to get their code working on that platform as
the TAs will only test and grade their submissions on Solaris x86.

(Besides, I also got TelegraphCQ running on Solaris x86 .. just for
kicks though .. and TelegraphCQ is based off of pgsql-7.3.2)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the

2003-11-18 Thread Sailesh Krishnamurthy
 Mike == Mike Mascari [EMAIL PROTECTED] writes:

Mike Robert Treat wrote:
 http://www-inst.eecs.berkeley.edu/~cs186/hwk0/index.html
 
 Are these screenshots of PgAccess on Mac OSX?

Yup .. that's from Joe Hellerstein, who was the instructor in the
Spring when I was a TA.

Mike It's pretty sad that Mike Stonebraker only has a salary of
Mike $15,000.  ;-)

We've got to do something now that he's a competitor :-)

Mike I also thought this SIGMOD article was a nice read:

Mike http://www.acm.org/sigmod/record/issues/0309/4.JHdbcourseS03.pdf

That's right .. it was a fun experience and I like to think that the
students enjoyed it. The best part was that we got to pick up 3 great
undergraduates for our research - and pgsql hacking experience is
invaluable in hacking TelegraphCQ. 

Mike How about extra credit for PITR?

One step at a time :-)

Actually a big problem is figuring out new pieces for the
projects. Most of the items in the TODO list are way too much for a
class project - we gave 'em 3 weeks to make the Hash GroupedAgg work
for large numbers of unique values (by using a form of hybrid hashing).

Another thing I toyed with was having an implementation of a
Tid-List-Fetch .. sorting a TID-list from an index and fetching the
records of the relation off the sorted list for better IO
performance. AFAICT something like this isn't present yet .. can pgsql
do this already ?

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Is there going to be a port to Solaris 9 x86 in the

2003-11-18 Thread Sailesh Krishnamurthy
 Chris == Christopher Kings-Lynne [EMAIL PROTECTED] writes:

 PostgreSQL most definitely works great on Solaris x86 !  At UC
 Berkeley, we have our undergraduate students hack on the
 internals of PostgreSQL in the upper-division Introduction to
 Database Systems class ..
 http://www-inst.eecs.berkeley.edu/~cs186/

Chris Hi Sailesh,

Chris You know what would be kind of cool?  If you could write a
Chris Guide to PostgreSQL to Teach Databases.

Chris eg. You could cover how to set up the server securely
Chris (eg. schemas for each person), etc.

Chris How to manage it all, handle upgrades, etc.  Mention what
Chris things are good to get students to hack on in the
Chris internals, etc.

Chris Could be a good techdocs.postgresql.org article.

Hmm .. this is probably a good idea. I ought to do it for the benefit
of future TAs here anyway - but I wasn't thinking on the lines of a
full-fledged article .. just burnt out with too much writing :-)

I'm just underwater until the end of the semester. If I don't come
through by the end of december do ping me. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Sailesh Krishnamurthy

DB2 supports cooked and raw file systems - SMS (System Manged Space)
and DMS (Database Managed Space) tablespaces. 

The DB2 experience is that DMS tends to outperform SMS but requires
considerable tuning and administrative overhead to see these wins. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] O_DIRECT in freebsd

2003-10-30 Thread Sailesh Krishnamurthy
 Jordan == Jordan Henderson [EMAIL PROTECTED] writes:

Jordan significantly better results.  I would not say it requires
Jordan considerable tuning, but an understanding of data, storage
Jordan and access patterns.  Additionally, these features did not
Jordan cause our group considerable administrative overhead.

I won't dispute the specifics. I have only worked on the DB2 engine -
never written an app for it nor administered it. You're right - the
bottom line is that you can get a significant performance advantage
provided you care enough to understand what's going on. 

Anyway, I merely responded to provide a data point. Will PostgreSQL
users/administrators care for additional knobs or is there a
preference for keep it simple, stupid ?

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-22 Thread Sailesh Krishnamurthy
 Josh == Josh Berkus [EMAIL PROTECTED] writes:

 Relational is all about theory and proving things
 mathematically correct. MV is all about engineering and
 getting the result. And if that means pinching all the best
 ideas we can find from relational, then we're engineers - of
 course we'll do it :-)

Josh Relational is all about preserving the *long-term*
Josh integrity, utility, and accessability of your data.  MV is
Josh all about getting an expedient result immediately and to
Josh heck with the future.

To emphasize the *long-term* - relational is all about separating
physical database representation from a logical view of the data. 

Josh, I'll be happy to meet at Intermezzo - after Nov 24 perhaps. I
have a conference deadline .. the papers that we write just to secure
funding - your tax dollars at work. Long live the NSF ! 

BTW, I'm not necessarily that far from your views. There is, however,
more to an XML database than just storing data - relational databases
do just fine there. The tools that are used to build database systems
have other uses - for instance XML message brokers. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-19 Thread Sailesh Krishnamurthy
 Josh == Josh Berkus [EMAIL PROTECTED] writes:

This is an unfair characterization of XML databases, and I can say
this without accusations of bias for I vastly prefer working with the
relational model. 

Josh Actually, amusingly enough, there is a body of theory
Josh backing XML databases, but it's not one any current devloper
Josh would want to admit to: the XML database is functionally
Josh identical to the Network Databases of the 1960's.  Of
Josh course, Network Databases gave way, historically, to
Josh Relational for good reason.

If you look at the academic research work, there have been gazillions
of recent papers on XML database technology. All the major database
vendors (Oracle, IBM and Microsoft) are investing fairly heavily in
core-engine XMLDB technology. 

Finally, while it is true that some of XML db technology is evocative
of network databases, XML databases are certainly more than network
databases. For one, they are semi-structured .. in addition they
present query language access to their data (although I'm not a big
fan of XQuery).

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-19 Thread Sailesh Krishnamurthy
 Christopher == Christopher Browne [EMAIL PROTECTED] writes:


Christopher Ah, but do papers honestly indicate the emergence
Christopher of some underlying theoretical model for which
Christopher fidelity could be evaluated?

Certainly. The model is that of semi-structured data, where often
times schema isn't very clearly defined. It's a different model from
the relational model - which I'm partial to. There are situations
where the XML model does make sense. 

Christopher Or merely that academics are looking to write papers
Christopher on whatever topics can attract research funding?

Bash academics if you want. The truth is that industry is also working
on it. 

As I said before, I have no axe to grind in this. I might be in
academia now, but frankly I couldn't give a toss about XML.

Christopher Half the articles in SIGOS have been about pretend
Christopher applications of Java to operating systems; why does
Christopher it seem likely that the database academics are any
Christopher less affected by this?

I think you are looking at the wrong publishing location :-) The
premier venues for the OS community are the SOSP and OSDI
conferences. Please look at the SOSP04 papers - you'll find fairly
good systems work.

BTW, I'm not sure which database papers you read - the premier venues
for database systems work are the SIGMOD and VLDB conferences. 

Christopher CODASYL had a query system, albeit something that
Christopher looked more like assembly language than anything

Please take a fair look at the XQuery data model and the XQuery
language before comparing it with CODASYL. I will not admit (at least
in public :-) to being a big fan of XQuery but that is because of
certain details, not anything fundamental. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom I tend to agree with the opinion that Oracle's architecture
Tom is based on twenty-year-old assumptions.  Back then it was
Tom reasonable to assume that database-specific algorithms could
Tom outperform a general-purpose operating system.  In today's
Tom environment that assumption is not a given.


In fact: 

   Michael Stonebraker: Operating System Support for Database Management. 
   CACM 24(7): 412-418 (1981)

   Abstract: 

 Several operating system services are examined with a
 view toward their applicability to support of database
 management functions. These services include buffer pool
 management; the file system; scheduling, process
 management, and interprocess communication; and
 consistency control.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] PostgreSQL not ACID compliant?

2003-09-26 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom AFAIK, no commercial database does predicate locking either,

True .. 

Tom so we all fall short of true serializability.  The usual
Tom solution if you need the sort of behavior you're talking
Tom about is to take a non-sharable write lock on the table you
Tom want to modify, so that only one transaction can do the

Not really. If you have B+-tree indexes on the table you can get by
with key-value locking (as in ARIES/KVL) and achieve some of the
effects of predicate locking to get true serializability without
losing too much concurrency. While this falls short in the general
case, it turns out to be pretty acceptable normally (when indexes are
present).
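
Very schematically - the names are invented, and real ARIES/KVL
distinguishes more lock modes than this - the reader's side looks like:

#include <stdio.h>

/* Schematic ARIES/KVL next-key locking over a sorted array standing
 * in for a B+-tree's leaf level.  lock_key() is a stand-in for the
 * real lock-manager call; the point is which keys get locked. */
static void lock_key(int key, char mode)
{
    printf("%c-lock key %d\n", mode, key);
}

/* Range scan [lo, hi]: S-lock each qualifying key plus the first key
 * past hi.  An insert into the range must take a lock on that same
 * "next key", so it conflicts with the reader -- phantoms are blocked
 * without a general predicate lock. */
static void range_scan(const int *keys, int n, int lo, int hi)
{
    int i = 0;
    while (i < n && keys[i] < lo)
        i++;
    while (i < n && keys[i] <= hi)
        lock_key(keys[i++], 'S');
    if (i < n)
        lock_key(keys[i], 'S');   /* the next key beyond the range */
}

int main(void)
{
    int leaf[] = {3, 7, 12, 19, 25};
    range_scan(leaf, 5, 5, 20);   /* locks 7, 12, 19, then next key 25 */
    return 0;
}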

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] this is in plain text (row level locks)

2003-07-30 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Personally I'd feel more comfortable with a shared-lock
Tom approach, if we could work out the scalability issues.  Dirty
Tom reads seem ... well ...  dirty.

Tom

I was going to do some experiments to measure the costs of our shared
memory MemoryContext.

Is there something specific you'd be interested in ? My first goal is
mainly to measure the amount of time it takes to allocate and
deallocate shared memory in the contested and uncontested cases.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] this is in plain text (row level locks)

2003-07-24 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom That doesn't work, unless you insist that the first backend
Tom can't exit its transaction until all the other ones are done.
Tom Which introduces its own possibilities for deadlock --- but
Tom even worse, how does the first backend *know* that the other
Tom ones are done?  You're right back where you started: it has
Tom to be possible to tell which backends have share-locked a
Tom particular row.

Is a count a solution ? 

The first backend gets the S lock on the row - I'm assuming you plan
to do it by recording it on the tuple and not in a shared memory lock
table, which means that you might have to unnecessarily write an
unmodified page if its buffer pool frame is stolen. 

The problem is that on commit time, you must carefully decrement the
count value of shared locks on any tuple that you own. This can be
accomplished by having each backend keep track of the list of files
and TIDs for any rows for which it acquired S locks. Is this the same
way that pgsql releases the X locks ? 
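
Schematically, the bookkeeping I have in mind (all names invented):

#include <stdio.h>

#define MAX_LOCKED 16

/* Toy share-lock counts: each backend remembers the rows it S-locked
 * and decrements exactly those counts at commit/abort time. */
typedef struct { int rel; int tid; } RowId;

static int   shared_count[4][8];        /* stand-in for per-tuple counts */
static RowId my_locks[MAX_LOCKED];      /* this backend's lock list */
static int   my_nlocks;

static void s_lock_row(RowId r)
{
    shared_count[r.rel][r.tid]++;       /* bump the on-tuple count */
    my_locks[my_nlocks++] = r;          /* remember it for commit time */
}

static void release_my_locks(void)
{
    for (int i = 0; i < my_nlocks; i++)
        shared_count[my_locks[i].rel][my_locks[i].tid]--;
    my_nlocks = 0;
}

int main(void)
{
    RowId r = {1, 3};
    s_lock_row(r);
    printf("count after lock:   %d\n", shared_count[1][3]);
    release_my_locks();
    printf("count after commit: %d\n", shared_count[1][3]);
    return 0;
}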

Bruce, I don't disagree that MVCC has the very nice property that
writers don't block readers. However, I don't buy that 2-phase
locking, with lock escalation is either unworkable because of too many
locks, or causes any extra pain for the user application (apart from
the fact that writers not blocking readers gives you more concurrency
at some very minor overhead of not being strictly serializable).

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] this is in plain text (row level locks)

2003-07-24 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Rod Taylor [EMAIL PROTECTED] writes:
 It may be best to have a locking manager run as a separate
 process.  That way it could store locks in ram or spill over to
 disk.

Tom Hmm, that might be workable.  We could imagine that in place
Tom of the HEAP_MARKED_FOR_UPDATE status bit, we have a this row
Tom is possibly locked hint bit.  Only if you see the bit set do
Tom you need to query the lock manager.  If the answer comes back

Why do you want to query the lock manager as a separate process ? 

Why not have the traditional approach of a lock table in shared
memory, growing and shrinking as appropriate, and have each individual
process update it (need to protect it with a latch of course).
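
That is, something of this shape, with a pthread mutex standing in for
the latch purely for illustration - the real table and latch would of
course live in shared memory, and all the names are invented:

#include <pthread.h>
#include <stdio.h>

#define TABSIZE 64

typedef struct { long tag; int nholders; } LockEnt;

static LockEnt table[TABSIZE];
static pthread_mutex_t latch = PTHREAD_MUTEX_INITIALIZER;

/* Find the entry for tag, or claim a free slot; NULL means the table
 * is full and the caller must escalate (e.g. to a table-level lock). */
static LockEnt *lookup(long tag)
{
    LockEnt *freeslot = NULL;
    for (int i = 0; i < TABSIZE; i++) {
        if (table[i].nholders > 0 && table[i].tag == tag)
            return &table[i];
        if (table[i].nholders == 0 && freeslot == NULL)
            freeslot = &table[i];
    }
    if (freeslot) freeslot->tag = tag;
    return freeslot;
}

static int acquire_share(long tag)
{
    pthread_mutex_lock(&latch);       /* the "latch" */
    LockEnt *e = lookup(tag);
    if (e) e->nholders++;
    pthread_mutex_unlock(&latch);
    return e != NULL;
}

int main(void)
{
    printf("acquired: %d\n", acquire_share(42));
    return 0;
}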

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] this is in plain text (row level locks)

2003-07-24 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
 Why not have the traditional approach of a lock table in shared
 memory, growing and shrinking as appropriate,

Tom Because we can't grow shared memory.  Whatever size we get at
Tom startup is what we're stuck with.  (I suppose we could try
Tom asking the kernel for additional segments, but there's every
Tom likelihood that that will fail.)

We implemented a Shared Memory MemoryContext using OSSP libmm (used in
Apache) for TelegraphCQ. 

If you think it's useful I can submit it as a patch.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] this is in plain text (row level locks)

2003-07-24 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
 We implemented a Shared Memory MemoryContext using OSSP libmm
 (used in Apache) for TelegraphCQ.

Tom How portable is that?  What kind of real-world performance do
Tom you get?

As portable as libmm I guess. We've only tried it on Linux and MacOS
X. I understand that the Apache guys use it - so it should be fairly
portable. 

http://www.ossp.org/pkg/lib/mm/

Actually I've also tried it successfully on SPARC Solaris. 

I believe that a user of our first TelegraphCQ release (that we did
not announce) tried it on FreeBSD. Sean Chittenden has also tried it
on FreeBSD, but I'm not sure if he built it and got it to work properly.

As for our SHMemoryContext code itself, it's pretty much a straight
copy of the code in aset.c and mctx.c, with appropriate changes where
we call the mm_alloc functions and also use mm_lock and mm_unlock for
protection. 
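
In outline each allocator entry point just brackets the underlying mm
calls, something like this (a minimal sketch with signatures from
memory - check the libmm docs for the exact API):

#include <stdio.h>
#include <mm.h>   /* OSSP libmm */

/* The pool is created before forking; all attached processes then
 * allocate from it, bracketing allocator calls with the mm lock. */
int main(void)
{
    MM *pool = mm_create(1024 * 1024, "/tmp/mm.demo");  /* 1 MB segment */
    if (pool == NULL)
        return 1;

    mm_lock(pool, MM_LOCK_RW);           /* protect allocator metadata */
    void *chunk = mm_malloc(pool, 256);  /* visible to attached procs */
    mm_unlock(pool);

    printf("allocated %p from the shared pool\n", chunk);
    mm_destroy(pool);
    return 0;
}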

As for performance, I'm in the midst of a performance study, but it's
not a micro-benchmark of the shmem allocator. After I get some numbers
I'll start profiling our code.

One problem is that on some platforms the locking implementation uses
the file-system. This is not in our control and we just trust mm to do
the best possible. 

Given that we rely on shared query execution (we run multiple queries
in a single adaptive query plan) we don't have much choice - we must
rely on shared memory.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] this is in plain text (row level locks)

2003-07-24 Thread Sailesh Krishnamurthy
 Bruce == Bruce Momjian [EMAIL PROTECTED] writes:

 -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh

Bruce  ^^^ Watch out, that code from
Bruce Berkeley usually is a mess.  :-)

LOL ! 

That's why we release the code - in the hope of getting real hackers
playing with it :-)

Bruce Seriously, though, it would be good to see what you guys
Bruce are up to.

Our code is based on 7.3.2 pgsql and available at:
 
  http://telegraph.cs.berkeley.edu/telegraphcq 

If the specific shm stuff is interesting, I can easily take it out and
submit it. If you prefer to just look at the code in the tarball
that's fine too. 

Or you can browse our cvs repository at:

   http://triplerock.cs.berkeley.edu:8080/viewcvs/viewcvs.cgi/

Look for shmctx.c and shmset.c in src/backend/utils/mmgr 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] this is in plain text (row level locks)

2003-07-23 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Jenny - [EMAIL PROTECTED] writes:
 I am trying to acquire row-level locks in postgresql. I try doing
 this: select * from students where name='Larry' for update;
 But by looking at the holding array of proclock, I've noticed
 that by doing this only AccessShareLock gets acquired, which is
 a table-level lock.

Tom Row-level locks are not recorded in proclock --- they are
Tom implemented by marking the individual tuple on-disk.  If we
Tom tried to record them in shared memory, it'd be very easy to
Tom run out of shared memory, because you could be holding row
Tom locks on a large number of tuples.

Of course, other database systems do this without too much hassle
.. including relying on lock escalation (move up to page/table level
locks) when the number of locks grow too large. 

Does pgsql only record X locks on the individual tuples on-disk or
does it do so for S locks as well ? 

Not that I dislike the idea - Toby Lehman suggested this in his
Ph.D. thesis in the mid-eighties for main-memory databases (where you
don't take the write penalty).

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] table-level and row-level locks.

2003-07-22 Thread Sailesh Krishnamurthy

Once more unto the breach -

Could you please abstain from sending HTML email to the list ? 
Many thanks !

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] ANNOUNCEMENT: Availability of TelegraphCQ v0.2 (BETA release)

2003-07-17 Thread Sailesh Krishnamurthy

ANNOUNCEMENT: Availability of TelegraphCQ v0.2 (BETA release)
-------------------------------------------------------------

The Telegraph Team at UC Berkeley is pleased to announce the immediate
availability of TelegraphCQ v0.2 (a BETA release). TelegraphCQ is a
system for processing long-running continuous queries over data
streams. TelegraphCQ is implemented as a significant modification to
the PostgreSQL open-source database system. Specifically, TelegraphCQ
v0.2 is based on the PostgreSQL 7.3.2 code-base.

For more information on the Telegraph project please visit:

   http://telegraph.cs.berkeley.edu

To download TelegraphCQ and for more information, please visit:

   http://telegraph.cs.berkeley.edu/telegraphcq

Here is the README.TelegraphCQ from the source distribution:

--------------------------------------------------------------------
This file contains information on TelegraphCQ 0.2 (a Beta release). 

What's new (since TelegraphCQ 0.1)
----------------------------------

- Better stability: Lots of bugs have been squashed

- CSV Wrapper: An easy to use wrapper that accepts simple comma
separated text as input and supports all PostgreSQL and user-defined
types. There is now no need to write C code to develop a new wrapper
if your data source produces textual data.

- Pull Wrappers: If you want to pull (lookup) data from external
sources (such as web pages) based on query predicates, you can use the
new infrastructure for the Pull Wrapper.

- Introspective Query Processing: The dynamic catalog is a new feature
in TelegraphCQ. It lets users query three special purpose internal
data streams to get a continuous view of how the TelegraphCQ system is
behaving.

- Applications: Included with this distribution are a few applications
that we demonstrated at the ACM FCRC/SIGMOD 2003 Conference in San
Diego. We have integrated data from CalTrans (California Department of
Transportation), CHP (California Highway Patrol) and a local network
interface (using the tcpdump utility). We have also included graphical
visualizers that run some interesting queries over data streams from
these sources.

Request
-------

If you are using TelegraphCQ for any project (research or otherwise),
we'd like to know ! 

  - If you are making any performance measurements of TelegraphCQ and
  are publishing it or disseminating it in some way, please do let us
  know.

  - We welcome bug fixes and code contributions of any kind. Please
  send all patches to [EMAIL PROTECTED]

Caveat Emptor
-------------

The current version of TelegraphCQ (0.2) is still a Beta release and
is very much research code. We haven't optimized it for
performance. Also, you will need to create a fresh database to use it
as it will not work against an existing PostgreSQL database. You are
free to use pgdump to export and then import your data from an
existing PostgreSQL database. 

- Archived streams: Note that although TelegraphCQ 0.2 supports
archived streams, we currently do not support access of historical
data. This will be fixed in our next release. 

For more information
--------------------


Getting Started: 
  
  A TelegraphCQ Getting Started guide can be found in
  docs/TelegraphCQ/gettingstarted.html

  The Getting Started guide also has links to other information such
  as DDL/DML syntax, examples etc. Note that the documentation
  installed in your TelegraphCQ installation directory contains
  documentation for the new syntax changes (CREATE/ALTER/DROP STREAM,
  the WINDOW clause in DML) integrated into the existing PostgreSQL
  documentation.

The Telegraph Project:

  http://telegraph.cs.berkeley.edu

Supported Platforms
-------------------

TelegraphCQ is supported on Linux x86 (various distributions including
Red Hat Linux 8,9 and Debian) and MacOS X 10.2, the platforms on which
it was developed. In addition, it has been minimally tested on the
SPARC Solaris platform. 

In general, TelegraphCQ should run anywhere PostgreSQL runs. However,
there are significant differences in how we use shared memory and
synchronization primitives such as semaphores. This is likely to cause
some hiccups on platforms where we have not tested it yet. We welcome
patches that help in porting TelegraphCQ. 


Please send all comments (and patches) to [EMAIL PROTECTED]

--------------------------------------------------------------------




-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[HACKERS] find_typedef on Linux

2003-07-15 Thread Sailesh Krishnamurthy

Friends

src/tools/find_typedef has in its prolog:

   # This script attempts to find all typedef's in the postgres binaries
   # by using 'nm' to report all typedef debugging symbols.
   # 
   # For this program to work, you must have compiled all binaries with 
   # debugging symbols.
   #
   # This is run on BSD/OS 4.0, so you may need to make changes.

I'm trying to run find_typedef on Linux and I don't get any output
(I'm on a 7.3.2 base). I tried playing with different options to
objdump (tried --syms instead of --stabs) to no avail. 

Is there something I need to pass to configure to ensure that all
binaries are built appropriately ? That is, something _other_ than
--enable-debug as I always have that anyways. 

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] agg/order-by question

2003-07-12 Thread Sailesh Krishnamurthy

Consider the explain for the following queries ..

sample=# explain select a, count(*) from foo group by a order by a;
                               QUERY PLAN
-------------------------------------------------------------------------
 Aggregate  (cost=69.83..77.33 rows=100 width=4)
   ->  Group  (cost=69.83..74.83 rows=1000 width=4)
         ->  Sort  (cost=69.83..72.33 rows=1000 width=4)
               Sort Key: a
               ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=4)
(5 rows)
 
sample=# explain select a, count(*) from foo group by a order by a desc;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Sort  (cost=80.65..80.90 rows=100 width=4)
   Sort Key: a
   ->  Aggregate  (cost=69.83..77.33 rows=100 width=4)
         ->  Group  (cost=69.83..74.83 rows=1000 width=4)
               ->  Sort  (cost=69.83..72.33 rows=1000 width=4)
                     Sort Key: a
                     ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=4)
(7 rows)
 

In the first case pgsql doesn't have a Sort on top because the Sort
below the Group produces the right interesting order (using the
System-R term). In the second case however, since the order-by clause
demands desc there is a Sort tagged on on top. 

Now, instead of doing this, isn't it better to just have a similar
plan as in the first case, but just change the lower Sort to be
descending ? It doesn't affect the Group and the Agg in any way ..

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] agg/order-by question

2003-07-12 Thread Sailesh Krishnamurthy
 Bruno == Bruno Wolff III writes:

Bruno You might try this in 7.4. I am pretty sure a change was
Bruno made a couple of weeks ago to let group by work with either
Bruno sort order. Also hash aggragates have been available for
Bruno quite a while in 7.4. This is a better plan when there are
Bruno only a small number of distinct values.

Gotcha ! Thanks. 

TelegraphCQ is still on the 7.3.2 code base .. after doing one hellish
merge in March, I'm not too eager to do another, although merging more
often is likely going to be less painful.

I knew about the hash-aggregates - we had set spilling of
hash-aggregates to disk for large numbers of distinct values (with a
crude form of recursive partitioning) as a course project for our
undergraduate database class at Berkeley. When I get some time, I want
to clean up my solution code and contribute it as a patch. I don't
think that will be before the end of summer though.

BTW, some systems prefer sorted grouped-aggregates to hashed
grouped-aggregates - even for small numbers of distinct values. How it works is
to just update the running aggregates in place in the sort tournament
tree. The only requirement is to be able to compute aggregates of
aggregates, so that partial aggs for the same distinct values across
different runs can be merged. The advantage is that you get sorted
grouped aggregation for the same cost of unsorted hash-grouped
agg. The disadvantage is that you lose the modularity of the sort.
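
The aggregates-of-aggregates requirement is trivial for the standard
ones; schematically (names invented):

#include <stdio.h>

/* Partial-aggregate state for COUNT/SUM/AVG.  Merging two partials
 * for the same group key is component-wise addition, which is all
 * that is needed to combine per-run partial aggs as sorted runs are
 * merged. */
typedef struct { long count; double sum; } AggState;

static AggState agg_merge(AggState a, AggState b)
{
    AggState out = { a.count + b.count, a.sum + b.sum };
    return out;
}

int main(void)
{
    AggState run1 = { 3, 30.0 };   /* partial for key K from run 1 */
    AggState run2 = { 2, 12.0 };   /* partial for key K from run 2 */
    AggState m = agg_merge(run1, run2);
    printf("count=%ld avg=%g\n", m.count, m.sum / m.count);
    return 0;
}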


-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] 2PC: discussion in comp.arch

2003-07-01 Thread Sailesh Krishnamurthy

From Bill Todd's post:

This is the simple 'two-phase commit, presumed-abort' mechanism.  It has no
problems guaranteeing distributed consistency, but does suffer from the
problem that if the coordinator *never* recovers some of the other nodes may
be left 'in doubt' about the fate of the transaction.  In practice, with
replicated data 'never' recovering is virtually impossible, but just waiting
for some human to replace failed hardware can stall things enough that a
more complex 'three-phase commit' model exists where nodes need not wait for
the coordinator to recover.  Another manner in which that issue is addressed
is by having a 'back-up' coordinator to which coordination responsibility
can be transferred; a third is by having a robust coordinator - e.g., whose
storage can be 'failed over' to another separate system which can continue
operation.

This is exactly what some of us have been saying here. As long as the
co-ordinator is capable of recovering, there is no danger of
in-doubt transactions stalling a subordinate indefinitely. 

While I'm not sure if any database system implements 3PC, certainly
2PC-PA is implemented by most database vendors (and is part of the XA
standard). Presumed Abort (PA) is preferred to Presumed Commit (PC)
because PA involves less log sync operations for the common case of
read-only transactions. 2PC certainly has various real-world
applications, as with the integration of message-queueing systems. 

Alternatively, if there is a plan to support serious federated
capabilities within pgsql (by expanding the current dblink prototype
for instance) the issue of 2PC will come into play if pgsql supports
remote write operations.

Anyways, this is all moot if integration with enterprise transaction
systems is not an important goal of pgsql. If there is no explicit
need for such features amongts users of pgsql, I see no need in
polluting the codebase with unnecessary complexity. 

Let's just not confuse "no need for XYZ functionality" with "XYZ
functionality is lame and can never work in practice".

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Two weeks to feature freeze

2003-06-22 Thread Sailesh Krishnamurthy

Let me add my two cents ..

I think something like PostgreSQL needs two test suites - an
acceptance test (to ensure that checkins don't break functionality)
and a regression test suite.

What we call the regression test suite is really an acceptance
test. Tom Lane is absolutely right in asserting that a test suite that
takes a week to run will mean that people won't test at
all. Personally, I can (and have for many years) tolerate acceptance
test suites that take upto an hour. 

The existence of such an acceptance test should not however obviate
the presence of a wider regression test suite. It should be fine to
have an entire suite of regression test buckets take a week to run,
because you only start running them once you have a release candidate
or equivalent. 

Of course, setting up a regression test suite takes effort. There is
no need, however, to spend umpteen amounts of time writing the
buckets. What can be done is to incrementally build it up. So whenever
we have a significant new feature, somebody (preferably not the key
developers) could take the time to set up a set of test cases that try
to test it thoroughly. It's okay if such a test bucket takes 10-15
minutes to run. Then this can get rolled up into the regression suite
while a very small representative test case makes it to the
acceptance test suite. 

Of course, in the open source world, these things take resources and
are not easy to do. 

I certainly think that the base regression test suite is great. We
have clearing the pgsql regression test a checkin requirement for
TelegraphCQ developers as our goal is to not break pgsql
functionality.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Two weeks to feature freeze

2003-06-22 Thread Sailesh Krishnamurthy
 Bruce == Bruce Momjian [EMAIL PROTECTED] writes:

Bruce Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:  The question
 was whether 2PC is useful.  The question wasn't if an 
 unreliable 2PC was useful.
 
 My question is whether there is such a thing as reliable 2PC.
 I sure don't see how you build that.

Bruce Other databases use 2PC --- are you saying none of them are
Bruce reliable?

And they use them for both federated read/write (what you refer to as
distributed access through dblink) and for clustered configurations. 

I'm not sure if I understand Tom's beef - I think he is concerned
about what happens if a subordinate does not respond to a prepare
message. I would assume that the co-ordinator would not let the commit
go through until it has received confirmations from every
subordinate. The application's commit will remain blocked against the
co-ordinator when this takes place.

That said, I agree that 2PC (and variants) is rather heavy weight when
in widely distributed configurations. 

(Although I guess in practice, many people use Presumed Abort and not
vanilla 2PC as PA results in fewer log flushes for read-only
transactions.)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Two weeks to feature freeze

2003-06-22 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
 I'm not sure if I understand Tom's beef - I think he is
 concerned about what happens if a subordinate does not respond
 to a prepare message. I would assume that the co-ordinator
 would not let the commit go through until it has received
 confirmations from every subordinate.

Tom No.  I want to know what the subordinate does when it's
Tom promised to commit and the co-ordinator never responds.
Tom AFAICS the subordinate is screwed --- it can't commit, and it
Tom can't abort, and it can't expect to make progress
Tom indefinitely on other work while it's holding locks for the
Tom not-quite-committed transaction.

Okay I understand what you mean now.

AFAIK the general way things happen is that each site has a recovery
procedure that kicks in after a crash. If the co-ordinator crashes
(which could be before or after it sends out COMMIT messages to some
of the subordinates), its recovery manager will bring the system up,
read the log and reconstruct information about all uncommitted transactions
in virtual storage.

If a Xact is in the PREPARE stage it will periodically send a message
to the co-ordinator asking about what happened to the transaction in
question. Once the co-ordinator has come back online it can respond to
the query.
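
Presumed Abort is also what keeps those recovery-time inquiries cheap
to answer; a toy sketch (the names and the one-entry "log" are
invented):

#include <stdio.h>

/* Presumed-abort inquiry handling: after a crash the coordinator
 * answers COMMIT only for xids it finds a commit record for; with no
 * record at all the answer is ABORT -- the "presumption" that spares
 * the coordinator from force-logging anything for aborted xacts. */
enum Fate { FATE_UNKNOWN, FATE_COMMITTED };

static enum Fate log_lookup(long xid)
{
    return xid == 100 ? FATE_COMMITTED : FATE_UNKNOWN;
}

static const char *answer_inquiry(long xid)
{
    return log_lookup(xid) == FATE_COMMITTED
        ? "COMMIT"
        : "ABORT";   /* no record found => presume abort */
}

int main(void)
{
    printf("xid 100 -> %s\n", answer_inquiry(100));
    printf("xid 101 -> %s\n", answer_inquiry(101));
    return 0;
}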

Of course in the case of a co-ordinator going out of action totally
and remaining unconnected this is not a viable solution. 

If you're making the case that 2PC is not viable on very wide area
networks with intermittent connectivity, I agree whole-heartedly. 

That said, 2PC (and its children, PA and PC) have their place, and are
indeed used in many systems. 

For instance, say you are rigging up a message queueing infrastructure
(like MQ-series) to your database (say with NOTIFY), you'd at least
like to have the db act as a co-ordinator with the MQ.

Or the parallel cluster example I gave earlier. Clustered linux boxes
are definitely here although no open-source DBMS offers a parallel
solution.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] [HACKERS] psql

2003-06-20 Thread Sailesh Krishnamurthy
 Nailah == Nailah Ogeer [EMAIL PROTECTED] writes:

Nailah Well here's the thing. Before i was trying to use
Nailah ShmemInitStruct in buf_init.c. The problem with this is
Nailah that you can't shrink or grow shared memory. That is why i
Nailah switched over and just used malloc. So i seem to be in a

We've implemented a Shared Memory MemoryContext in TelegraphCQ. We
used the opensource libmm from the Apache project. Maybe you can try
using it - it's fairly easy to use. The current version on the web is
based off of 7.2 code, but I hope to refresh with a beta based on 7.3
code in the next few weeks.

http://telegraph.cs.berkeley.edu/telegraphcq

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Help needed in testing my code.

2003-05-28 Thread Sailesh Krishnamurthy

Why don't you use elog(LOG, ...) instead of printf ?
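
Something like this, inside the backend where postgres.h is already in
scope (the variable names are just placeholders):

elog(LOG, "reached foo_scan: ntuples = %d", ntuples);    /* to the server log */
elog(NOTICE, "reached foo_scan: ntuples = %d", ntuples); /* also to the client */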

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] Help needed in testing my code.

2003-05-28 Thread Sailesh Krishnamurthy
 Srikanth == Srikanth M Srikanth writes:

Srikanth Dear Sir I am new to postgres and don't know the exact
Srikanth procedure of testing my code, okay I will use elog, but
Srikanth please tell me the procedure of testing that.

Srikanth 

As you're a big fan of stealing code, why don't you grep the source
for occurences of elog(LOG or elog(NOTICE. 

elog(NOTICE should send NOTICE messages to the psql client (I think !)
while elog(LOG will only send it to the logfile you might have
specified when starting the postmaster 

Srikanth for example,pls tell what should i do after i add code,
Srikanth what make files should i run and where should i place
Srikanth the new executables.

Well you should start out by RTFM .. read how to build postgres. I run
make install in my postgres directory (right above src/).

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] some more docbook help

2003-03-12 Thread Sailesh Krishnamurthy

Hi Gang

I can sense that I'm _really_ close to getting docbook working, but
not quite there yet. I'm hoping somebody can tell me where I'm
screwing up !

I'm on a Red Hat Linux 7.3 system. Although I do have openjade
installed I was finding it difficult to figure out where to point
stylesheets to postgres, so instead I decided to build openjade (and
so OpenSP) from source and configure it _exactly_ as described in
doc-toolsets.html and doc-build.html in the pgsql-7.3 docs.

Although a motley mix of sgml stuff was initially installed in
/usr/share/sgml, I created a /usr/local/share/sgml directory and
pointed openjade at it while building, as follows:

 $ ./configure --enable-default-catalog=/usr/local/share/sgml/catalog

Subsequently I followed all instructions in doc-toolsets.html for the
other packages.

I set DOCBOOKSTYLE to /usr/local/share/sgml before running configure
for postgres .. 

I saw an old usenet message (exchange with Bruce Momjian) which
suggests that the SGML_CATALOG_FILES env. variable must be set to:

 export SGML_CATALOG_FILES=/somewhere/docbook31/docbook.cat

I set DOCBOOKINDEX to be /usr/bin/collateindex.pl 

Now finally I go to the doc/src/sgml directory and try:

gmake admin.html > new.out 2> new.err

This however results in a make error. 

The output file is:

/usr/bin/perl /usr/bin/collateindex.pl -f -g -t 'Index' -i 'bookindex' -o bookindex.sgml HTML.index
/usr/bin/perl /usr/bin/collateindex.pl -f -g -t 'Index' -i 'setindex' -x -o setindex.sgml HTML.index
openjade  -D . -D ./ref -c /usr/local/share/sgml/catalog -d stylesheet.dsl -i output-html -t sgml book-decl.sgml admin.sgml

The error file however starts off with:

Processing HTML.index...
0 entries loaded...
0 entries ignored...
Done.
Processing HTML.index...
0 entries loaded...
0 entries ignored...
Done.
openjade:/usr/local/share/sgml/docbook31/docbook.cat:63:79:E: end of entity in comment
openjade:/usr/local/share/sgml/docbook31/docbook.cat:62:20:E: cannot find data; tried /usr/local/share/sgml/docbook31/data, ./data, ./ref/data
openjade:/usr/local/share/sgml/docbook31/docbook.cat:63:79:E: end of entity in comment
openjade:/usr/local/share/sgml/docbook31/docbook.cat:62:20:E: cannot find data; tried /usr/local/share/sgml/docbook31/data, ./data, ./ref/data
openjade:book-decl.sgml:1:55:W: cannot generate system identifier for public text -//OASIS//DTD DocBook V3.1//EN
openjade:book-decl.sgml:11:0:E: reference to entity BOOK for which no system identifier could be generated
openjade:book-decl.sgml:1:0: entity was defined here
openjade:book-decl.sgml:11:0:E: DTD did not contain element declaration for document type name
openjade:admin.sgml:5:9:E: there is no attribute ID
openjade:admin.sgml:5:16:E: element BOOK undefined
openjade:admin.sgml:9:7:E: element TITLE undefined

snip whole bunch of similar errors

openjade:/usr/local/share/sgml/docbook31/docbook.cat:62:20:E: cannot find data; tried /usr/local/share/sgml/docbook31/data, ./data, ./ref/data
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON86'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'ROWE87'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON87a'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON87b'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON90a'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON89'
openjade:/usr/local/share/sgml/docbook-dsssl-1.77/html/dblink.dsl:203:1:E: XRef LinkEnd to missing ID 'STON90b'
gmake: *** [admin.html] Error 1

Does anybody have any suggestions ? Am I doing something horribly
wrong ? 

Many thanks ! 

-- 
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] some more docbook help

2003-03-12 Thread Sailesh Krishnamurthy

Phew .. after some more struggling with docbook I think I finally
managed to get it working. Posting here to help other folks googling
through usenet archives.

My problem was that I had DOCBOOKSTYLE set to /usr/local/share/sgml -
the directory which contained the catalog file. However, inspite of
this environment variable, configure insisted on setting DOCBOOKSTYLE
to /usr/local/share/sgml/docbook31 

I finally fixed it by creating a file
/usr/local/share/sgml/docbook31/catalog as:

CATALOG ../dsssl/catalog
CATALOG docbook.cat
CATALOG ../docbook-dsssl-1.77/catalog

This works fine thankfully. 

Now I have another question - and this is probably incredibly stupid.

I want to generate the PostgreSQL documentation with some extra
add-ons (DDL and DML we have added for streaming data and continuous
long-running queries). I would like the add-ons to show in some
changed fashion - say perhaps in bright red to show that this is new,
not-yet-very-well-tested (:-) functionality specific to our system
(TelegraphCQ) and not part of the base PostgreSQL features.

How would I go about doing something like this in SGML ?

I understand the whole process of SGML documentation as separating
content from presentation. So I imagine I should surround our new
stuff with some kind of tag, that when used with an HTML (or other)
stylesheet generates appropriately changed documentation. 

I hope I am on the right track ! Could somebody help me please ? 

Many thanks ! 

-- 
Sailesh
http://www.cs.berkeley.edu/~sailesh


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] Detecting corrupted pages earlier

2003-02-17 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Postgres has a bad habit of becoming very confused if the
Tom page header of a page on disk has become corrupted.  In
Tom particular, bogus values in the pd_lower field tend to make

I haven't read this piece of pgsql code very carefully so I apologize
if what I suggest is already present.

One standard solution to handle disk page corruption is the use of
consistency bits.

The idea is that the bit that starts every 256th byte of a page is a
consistency bit. In a 8K page, you'd have 32 consistency bits.  If the
page is in a consistent state, then all 32 bits will be either 0 or
1. When a page is written to disk, the actual bit in each c-bit
position is copied out and placed in the header (end/beginning) of the
page. With a 8K page, there will be one word that contains the
actual bit. Then the c-bits are all either set or reset depending on
the state when the page was last read: if on read time the c-bits were
set, then on write time they are reset. So when you read a page, if
some of the consistency bits are set and some others are reset then
you know that there was a corruption.

This is of course based on the assumption that most disk arms manage
to atomically write 256 bytes at a time. 
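
A sketch of the write-side stamping and the read-side check (the
layout and all names are invented for illustration):

#include <stdio.h>
#include <string.h>

/* Consistency-bit sketch for an 8K page: the first bit of every 256th
 * byte is a c-bit (32 of them).  On write the true data bits are
 * stashed in one header word and every c-bit is forced to the same
 * polarity; on read, mixed c-bits mean a torn or corrupted page. */
#define PAGESZ 8192
#define STRIDE 256
#define NCBITS (PAGESZ / STRIDE)   /* 32 */

typedef struct {
    unsigned char data[PAGESZ];
    unsigned saved_bits;           /* the real bits, one per c-bit slot */
} Page;

static void page_prepare_write(Page *p, int polarity)
{
    p->saved_bits = 0;
    for (int i = 0; i < NCBITS; i++) {
        unsigned char *b = &p->data[i * STRIDE];
        p->saved_bits |= (unsigned)(*b >> 7) << i;   /* stash real bit */
        *b = polarity ? (*b | 0x80) : (*b & 0x7f);   /* force c-bit */
    }
}

static int page_consistent(const Page *p)
{
    int ones = 0;
    for (int i = 0; i < NCBITS; i++)
        ones += p->data[i * STRIDE] >> 7;
    return ones == 0 || ones == NCBITS;   /* mixed => torn write */
}

int main(void)
{
    Page p;
    memset(&p, 0xAB, sizeof p);
    page_prepare_write(&p, 1);
    printf("clean page consistent: %d\n", page_consistent(&p));
    p.data[5 * STRIDE] &= 0x7f;           /* simulate a partial write */
    printf("torn page consistent:  %d\n", page_consistent(&p));
    return 0;
}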

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] set_ps_display on solaris x86

2003-02-13 Thread Sailesh Krishnamurthy

Our students are (unfortunately) on solaris x86 (unfortunate because I
have to do another round of testing before I deploy pgsql code for
them to hack).

Sadly, set_ps_display does not seem to have any effect in solaris
x86. At least ps only reports multiple postmaster processes and
arguments. 

Is there some magic incantation to be chanted to get this to work ? 

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] PG_TEMP_FILES_DIR

2003-02-13 Thread Sailesh Krishnamurthy

Quick question for the group 

I'm assuming that the PG_TEMP_FILES_DIR for BufFile temps is
automatically under the PGDATA directory. Is that correct ?

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] docbook and postgresql

2003-01-24 Thread Sailesh Krishnamurthy

Hi folks

I'm trying to build the documentation for pgsql (so that I can change
it for the stuff we are building) and I'm having trouble finding the
necessary docbook stuff.

I looked at:

   http://developer.postgresql.org/docs/postgres/doc-build.html

checking for onsgmls... onsgmls
checking for openjade... openjade
checking for DocBook V3.1... yes
checking for DocBook stylesheets... /usr/share/sgml/docbook/
checking for sgmlspl... sgmlspl

I set my DOCBOOKSTYLE to /usr/share/sgml/docbook 

However, I don't seem to have anything that looks like nwalsh-modular
in my system (the example in the above web page). As a result I get
this:

sailesh:datafix~/work/postgres/doc/src/sgml: cd
sailesh:datafix~: cd work/postgres/
sailesh:datafix~/work/postgres: cd doc/src/sgml
sailesh:datafix~/work/postgres/doc/src/sgml: gmake admin.html
/usr/bin/perl /usr/share/sgml/docbook//bin/collateindex.pl -f -g -t 'Index' -o bookindex.sgml -N
Can't open perl script /usr/share/sgml/docbook//bin/collateindex.pl: No such file or directory
gmake: *** [bookindex.sgml] Error 2
sailesh:datafix~/work/postgres/doc/src/sgml: 

So, sorry for the newbie question, but if somebody could point me to
what exactly I need to install, given that configure thinks that I do
have docbook-3.1, I'd be obliged. Thanks !

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Implementing a new Join Algorithm

2003-01-05 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Anagh Lal [EMAIL PROTECTED] writes:
 I am trying to test a new join algorithm by implementing it on
 Postgresql.  It would be great if you could give me some start
 off pointers so as to where all in the source code I will have
 to make changes.

Tom Lots of places ;-).

Tom You will find that a full-text indexing tool is an essential
Tom aid for working with the Postgres source code.  I am partial

I've had great success with cscope. Especially with the XEmacs
interfaces. 

I think in fact the easy part is the executor stuff because it's
nicely localized. Getting the planner to choose a new option is a
little messier. 

As part of the TelegraphCQ project we've implemented what smells like
a symmetric hash join. Actually it's more like an N-way symmetric hash
join. This is done using the new SteM operator which can work in a
classical iterator model plan. With the Eddy operator we can avoid
static dataflows.

Since we were a bit chary of jumping headlong into the optimizer, we
cheated. What we do is let the postgres optimizer do its thing and
produce a plan with Hash Joins. Then we walk the plan and replace Hash
Join nodes with our new SteM nodes. We call our conversion code in
pg_exec_query_string() right after pg_plan_query()

It's more complicated than that because we're actually implementing
the Eddy operator. But if your goal is just to try out your nice new
join algorithm, this would probably work and be a quick fix to get you
started.
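
In outline the rewrite is just a recursive walk over the finished plan
- the node layout below is invented, but the real code does the
equivalent on pgsql's Plan structs right after pg_plan_query():

#include <stdio.h>

/* Schematic version of the post-planner rewrite: walk the plan tree
 * and splice a new operator in wherever the optimizer chose a hash
 * join. */
typedef enum { T_SEQSCAN, T_HASHJOIN, T_STEM } NodeTag;

typedef struct PlanNode {
    NodeTag tag;
    struct PlanNode *left, *right;
} PlanNode;

static PlanNode *replace_hashjoins(PlanNode *node)
{
    if (node == NULL)
        return NULL;
    node->left  = replace_hashjoins(node->left);
    node->right = replace_hashjoins(node->right);
    if (node->tag == T_HASHJOIN)
        node->tag = T_STEM;    /* keep the children, swap the operator */
    return node;
}

int main(void)
{
    PlanNode scan1 = { T_SEQSCAN,  NULL,   NULL };
    PlanNode scan2 = { T_SEQSCAN,  NULL,   NULL };
    PlanNode join  = { T_HASHJOIN, &scan1, &scan2 };
    printf("root tag after rewrite: %d (T_STEM=%d)\n",
           replace_hashjoins(&join)->tag, T_STEM);
    return 0;
}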

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] Question about bit.h and bit.c

2003-01-05 Thread Sailesh Krishnamurthy

I have a small nit

Why is it that bit.h is in src/include/utils and bit.c is in
src/backend/lib ? 

I can never for the life of me remember which is in which :-)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Question about bit.h and bit.c

2003-01-05 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom Sailesh Krishnamurthy [EMAIL PROTECTED] writes:
 Why is it that bit.h is in src/include/utils and bit.c is in
 src/backend/lib ?

Tom Possibly a more interesting question is why haven't we
Tom ditched them both ... AFAICT none of the bit.c routines are
Tom used anymore.

True. I just searched and found the only uses of the bitmask functions
(now greatly expanded) in our code :-)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Threads

2003-01-05 Thread Sailesh Krishnamurthy
 Shridhar == Shridhar Daithankar [EMAIL PROTECTED] writes:

Shridhar On Saturday 04 January 2003 03:20 am, you wrote:
 I am sure, many of you would like to delete this message
 before reading,  hold on. :-)
 
 I'm afraid most posters did not read the message.  Those who
 replied
 
 Why bother? did not address your challenge:

Shridhar Our challenges may be..;-)

Not having threading does reduce some of the freedom we've had
in our work. But then we have ripped the process model a fair bit and
we have the freedom of an entirely new process to deal with data
streams entering the system and we're experimenting with threading for
asynchronous I/O there.

However, in general I agree with the spirit of the previous messages
in this thread that threading isn't the main issue for PG.

One thing that I missed so far in the threading thread. Context
switches are (IMHO) far cheaper between threads, because you save TLB
flushes. Whether this makes a real difference in a data intensive
application, I don't know. I wonder how easy it is to measure the x86
counters to see TLB flushes/misses.

In a database system, even if one process dies, I'd be very chary of
trusting it. So I am not too swayed by the fact that a
process-per-connection gets you better isolation. 

BTW, many commercial database systems also use per-process models on
Unix. However they are very aggressive with connection sharing and
reuse - even to the point of reusing the same process for multiple
active connections .. maybe at transaction boundaries. Good when a
connection is maintained for a long duration with short-lived
transactions separated by fair amounts of time. 

Moreover, in db2 for instance, the same code base is used for both
per-thread and per-process models - in other words, the entire code is
MT-safe, and the scheduling mechanism is treated as a policy (Win32 is
MT, and some Unices MP). AFAICT though, postgres code, such as perhaps
the memory contexts, is not MT-safe (of course the bufferpool/shmem
accesses are safe). 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] BITMAP Index support (and other DSS info.)

2003-01-02 Thread Sailesh Krishnamurthy
 Shahbaz == Shahbaz Chaudhary [EMAIL PROTECTED] writes:

Shahbaz There are bound to be people in the academia (grad
Shahbaz students, professors of CS, etc.) on this mailing list,
Shahbaz yet I see few RDBMS courses using postgresql as an
Shahbaz example.  If people still have connections to
Shahbaz universities, it would seem that inviting researchers to
Shahbaz use PGSQL for their experiments will quickly make it
Shahbaz comparable to Oracle/etc.  This would be specifically

At Berkeley, PostgreSQL is used for projects in the upper division
undergraduate database systems class. At least it was used this past
Fall, and we plan to use it in the Spring as well (I will be
TA'ng). The projects involved using pg as a back-end and a small
buffer replacement policy assignment.

http://www-inst.eecs.berkeley.edu/~cs186/

In addition, we (the database systems research group) are using the
PostgreSQL code base (7.2.1) to build TelegraphCQ, our new system to
process continuous queries over data streams. Preliminary paper here:

http://www.cs.berkeley.edu/~franklin/Papers/TCQcidr03.pdf

No, it's not really close to a release yet ... :-)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])