Re: [HACKERS] Improving count(*)

2005-11-17 Thread Varun Kacholia
  Seems like a better solution. I can finish the patch pretty soon. I need
  to contact the original author, who has disappeared, but I'll send it over
  to you.

 Sounds good. I wondered where he'd gone to.

Still here :-)
Just got swamped with too much work that the tablesample
patch got paged out. Gavin has a working version of the patch.
I can give a hand if need be.


Thanks

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Adding a new node to the executor

2005-09-03 Thread Varun Kacholia
Hey guys,
 I am taking my chances of posting a devel doubt on this list (couldn't find
any other relevant list). I think I have spent too much time on
this and need a fresh pair of eyes to look in it..

So this is my situation:
1. I have added a new field to RangeTblEntry (bool tablesample).

2. I have created a new executor node (this is to do system sampling
 [Ref: Query sampling thread]). In short, this new node goes over a table
 block by block, skipping some blocks as required.

3. Have made the appropriate changes to the parser+planner so that the
information about a tablesample is passed from the parser to the executor. 

But the RangeTblEntry received by my executor node does
not have the tablesample set, inspite of the fact that I do set it in
parse_relation.c:addRangeTableEntry().

I have made the appropriate changes to copyfuncs.c, but  still in vain..
Any help would be highly appreciated. I can send a diff if someone is
interested.

Thanks
Varun

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Query Sampling

2005-08-29 Thread Varun Kacholia
  I assume you realise that Bernoulli sampling is currently possibly using
 the random() function and setseed() ?
Yes, select * from table where random()  x, does the job.
 
 I can't see why TABLESAMPLE effects a sequential scan *only*, in all
 cases. I agree that there seems little point in sampling rows from a
 table when it is already sufficiently restricted that the query could
 use an index.
 AFAICS this clause would potentially effect Index and Bitmap scans also,
 and would be required for full correctness to the standard.

As I see it, there are 3 ways of implementing the sample operator:
1. modify node[Seq|Tid|Index|..]scan.c to consider sampling
2. create new nodes for each of the possible scans..sequential, index, tid et al
3. support sequential scan only for sampling.

(1) does not seem to be attractive, while (2) is a lot of work to
begin with. I was
planning to start with (3) and approach to (2) in the long run.
I would appreciate your opinion on this.

 You would need to be careful to sample the whole table though, rather
 than to follow the temptation to just scan the first X% of it. The start
 of a table has a tendency to be dead tuples, which was an error that the
 sampling logic in 7.4 made, so it would be wise to avoid repeating that.
Correct. I have that in mind.
 
 I'd be willing to lend a hand over the coming year - since 8.1 just went
 beta we can expect a good few months before the next code deadline.
Great! I would appreciate an extra hand.

Thanks
Varun 



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Query Sampling

2005-08-27 Thread Varun Kacholia
Hi everybody,
  I would like to add query sampling support to postgresql (atleast as a part of
 my project, if someone feels strongly against checking it in the main branch).
I have been going over the code and I do see a lot of sampling stuff
in backend/commands/analyze.c. However, I plan to add sampling support
to the
executor, allowing the following types of queries:

SELECT STORE, AVG(SALES) FROM TRANSACTIONS TABLESAMPLE
 BERNOULLI(10) REPEATABLE(5) GROUP BY STORE

(This is supported by DB2).

For starters I think this should be doable in the executor by cannibalizing
nodeSeqscan.c and adding sampling support to it.
However I am concerned about the planner optimizations as it might decide
to run an index scan (instead of a sequential scan) for a particular
base relation.
My question is: Is there any easy way of forcing the optimizer to
choose sequential
scan for a particular relation? (I apologize if this is documented in
the planner code
as I am still going over it).
I would appreciate any other comments.

Thanks much,
Varun

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org