Re: [HACKERS] Improving count(*)
Seems like a better solution. I can finish the patch pretty soon. I need to contact the original author, who has disappeared, but I'll send it over to you. Sounds good. I wondered where he'd gone to. Still here :-) Just got swamped with too much work that the tablesample patch got paged out. Gavin has a working version of the patch. I can give a hand if need be. Thanks ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Adding a new node to the executor
Hey guys, I am taking my chances of posting a devel doubt on this list (couldn't find any other relevant list). I think I have spent too much time on this and need a fresh pair of eyes to look in it.. So this is my situation: 1. I have added a new field to RangeTblEntry (bool tablesample). 2. I have created a new executor node (this is to do system sampling [Ref: Query sampling thread]). In short, this new node goes over a table block by block, skipping some blocks as required. 3. Have made the appropriate changes to the parser+planner so that the information about a tablesample is passed from the parser to the executor. But the RangeTblEntry received by my executor node does not have the tablesample set, inspite of the fact that I do set it in parse_relation.c:addRangeTableEntry(). I have made the appropriate changes to copyfuncs.c, but still in vain.. Any help would be highly appreciated. I can send a diff if someone is interested. Thanks Varun ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Query Sampling
I assume you realise that Bernoulli sampling is currently possibly using the random() function and setseed() ? Yes, select * from table where random() x, does the job. I can't see why TABLESAMPLE effects a sequential scan *only*, in all cases. I agree that there seems little point in sampling rows from a table when it is already sufficiently restricted that the query could use an index. AFAICS this clause would potentially effect Index and Bitmap scans also, and would be required for full correctness to the standard. As I see it, there are 3 ways of implementing the sample operator: 1. modify node[Seq|Tid|Index|..]scan.c to consider sampling 2. create new nodes for each of the possible scans..sequential, index, tid et al 3. support sequential scan only for sampling. (1) does not seem to be attractive, while (2) is a lot of work to begin with. I was planning to start with (3) and approach to (2) in the long run. I would appreciate your opinion on this. You would need to be careful to sample the whole table though, rather than to follow the temptation to just scan the first X% of it. The start of a table has a tendency to be dead tuples, which was an error that the sampling logic in 7.4 made, so it would be wise to avoid repeating that. Correct. I have that in mind. I'd be willing to lend a hand over the coming year - since 8.1 just went beta we can expect a good few months before the next code deadline. Great! I would appreciate an extra hand. Thanks Varun ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Query Sampling
Hi everybody, I would like to add query sampling support to postgresql (atleast as a part of my project, if someone feels strongly against checking it in the main branch). I have been going over the code and I do see a lot of sampling stuff in backend/commands/analyze.c. However, I plan to add sampling support to the executor, allowing the following types of queries: SELECT STORE, AVG(SALES) FROM TRANSACTIONS TABLESAMPLE BERNOULLI(10) REPEATABLE(5) GROUP BY STORE (This is supported by DB2). For starters I think this should be doable in the executor by cannibalizing nodeSeqscan.c and adding sampling support to it. However I am concerned about the planner optimizations as it might decide to run an index scan (instead of a sequential scan) for a particular base relation. My question is: Is there any easy way of forcing the optimizer to choose sequential scan for a particular relation? (I apologize if this is documented in the planner code as I am still going over it). I would appreciate any other comments. Thanks much, Varun ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org