Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote On 5 November 2013 14:28, Leonardo Francalanci <m_lists@...> wrote: Either my SQL is not correct (likely), or my understanding of the minmax index is not correct (even more likely), or the minmax index is not usable in a random-inputs scenario. Please show

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote So in the use case you describe, the min max index would require a scan of only 25% of the table, not the 80% described earlier for random inserts. In my experience, people wish to keep data for much longer periods and so the percentage of scan required would drop lower than

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Jeremy Harris wrote Surely there's good correlation between IMSI and IMEI, so have a separate table to translate one to (a group of) the others, and halve the indexes on your main table? Yes; unfortunately both are not always available; but it's something we are thinking about (it requires logic

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote From our discussions here, IMHO there is a strong case for avoiding btrees completely for larger historical data tables. That isn't something I had even considered as desirable before this conversation but ISTM now that taking that approach will be more fruitful than

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote Can you CLUSTER against a minmax index? Not in this release, at least in my understanding. It's not yet possible to do an ordered fetch, so the cluster scan probably won't work. As per the patch I helped write, CLUSTER should use the sequential heap scan+sort when it

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Andres Freund-3 wrote On 2013-11-04 11:27:33 -0500, Robert Haas wrote: On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire <klaussfreire@...> wrote: Such a thing would help COPY, so maybe it's worth a look. I have little doubt that a deferred insertion buffer of some kind could help

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote Everybody on this thread is advised to look closely at Min Max indexes before starting any further work. MinMax will give us access to many new kinds of plan, plus they are about as close to perfectly efficient, by which I mean almost zero overhead, with regard to inserts

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote Minmax indexes seem to surprise many people, so broad generalisations aren't likely to be useful. I think the best thing to do is to publish some SQL requests that demonstrate in detail what you are trying to achieve and test them against minmax indexes. That way we can
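A minimal sketch of the kind of SQL test being asked for here. Minmax indexes later shipped in PostgreSQL 9.5 under the name BRIN, so the demo below uses BRIN syntax; the table and the random-key column are illustrative, not taken from the thread:

    -- random keys (the contested case): each block's [min, max] range ends
    -- up very wide, so a BRIN/minmax scan prunes few blocks
    create table calls as
      select (random() * 1e9)::bigint as imsi,
             now() - (random() * interval '30 days') as start_ts
      from generate_series(1, 1000000);
    create index calls_imsi_brin on calls using brin (imsi);
    explain (analyze, buffers)
      select * from calls where imsi = 123456789;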

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote On 5 November 2013 09:57, Leonardo Francalanci <m_lists@...> wrote: While I do believe in testing (since "In theory there is no difference between theory and practice. In practice there is"), I would like to know the properties of the minmax index before trying it. What

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote Min-max indexes always require a sequential scan of the min-max index itself when querying. I'm worried about the number of heap pages that will be scanned. My understanding is that given the random input, the index will not be selective enough, and will end up requiring a

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote real data isn't truly random Well, let's try normal_rand: create table t1 as select trunc(normal_rand(100, 50, 3)) as n, generate_series(1, 100) as i; with cte as (select min(n) as minn, max(n) as maxn, i/100 from t1 group by i/100),
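The preview above cuts the query off; a runnable reconstruction, with the final aggregate being one plausible completion (normal_rand comes from the tablefunc extension, and the row counts are as shown in the preview):

    create extension if not exists tablefunc;
    -- one normally-distributed key per row, plus a sequence number
    create table t1 as
      select trunc(normal_rand(100, 50, 3)) as n,
             generate_series(1, 100) as i;
    -- per block of 100 rows, how wide is the [min, max] range of n?
    with cte as (
      select min(n) as minn, max(n) as maxn, i/100 as blk
      from t1 group by i/100
    )
    select avg(maxn - minn) from cte;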

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote Well, of course, they're not magic pixie dust. Of course they aren't. I think they can make a difference in a sequential-input scenario. But I'm not the one who said that they are fit to solve the problems that I and other people are talking about in this thread. Claudio

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Jeff Janes wrote Some experiments I did a few years ago showed that applying sorts to the data to be inserted could be helpful even when the sort batch size was as small as one tuple per 5 pages of existing index. Maybe even less. Cool!!! Do you have any idea/hint on how I could try and
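A sketch of one way to try that idea at the SQL level, with a hypothetical staging table: sort each batch by the indexed key before inserting, so the btree page touches cluster together instead of landing randomly:

    -- names are illustrative; the point is only the ORDER BY on the key
    insert into calls (imsi, start_ts)
    select imsi, start_ts
    from calls_staging
    order by imsi;          -- matches the btree on calls(imsi)
    truncate calls_staging;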

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-31 Thread Leonardo Francalanci
Jeff Janes wrote True, but that is also true of indexes created in bulk. It all has to reach disk eventually-- [...] If the checkpoint interval is as long as the partitioning period, then hopefully the active index buffers get re-dirtied while protected in shared_buffers, and only get

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-31 Thread Leonardo Francalanci
Gavin Flower-2 wrote How about being able to mark indexes: 'MEMORY ONLY' to make them not go to disk and 'PERSISTENT | TRANSIENT' to mark if they should be recreated on machine bootup? I would love that. But: 1) I'd like to make some tests with a memory drive, and confirm that in

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Hmm, you realise Alvaro is working on MinMax indexes in this release? They are very efficient with regard to index inserts and specially designed for use on large tables. Prior work by Heikki on Grouped Item Tuples was a way of reducing the size of indexes, yet still allowing uniqueness

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Presumably the data you are inserting isn't actually random. Please describe the use case you are considering in more detail and some view on how frequent that is, with some examples. Once we understand the use case and agree it is important, we might solve problems. Collecting call data
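A hypothetical shape of the use case described in this thread (call detail records; all names invented for illustration): rows arrive in time order, but the keys users search on are effectively random with respect to insertion order, which is exactly what hurts the btrees:

    create table cdr (
      imsi     bigint,        -- subscriber id: random wrt arrival order
      imei     bigint,        -- handset id: also random
      start_ts timestamptz,   -- arrival order follows this column
      duration interval
    );
    create index cdr_imsi on cdr (imsi);
    create index cdr_imei on cdr (imei);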

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
What is the reason for needing such fast access to individual groups of records? Sure sounds like the NSA or similar ;-) Users need to search all calls originated from/to a user or from/to a specific mobile phone to answer/analyze customers' probl... ok, I give up: I work for the NSA ;) In

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
LSM-trees seem patent free. I'm no expert, and I only gave it a quick look some time ago: it looked to me very complicated to get right... and as far as I remember you don't get that much gain unless you go multi-level, which would complicate things further. Please somebody advise patent status of

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote The index insertions should be fast until the size of the active part of the indexes being inserted into exceeds shared_buffers by some amount (what that amount is would depend on how much dirty data the kernel is willing to allow in the page cache before it starts suffering

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote Are partitions read-only once time has moved on, or can stragglers show up that need to be inserted into older partitions? You could periodically merge older partitions into larger tables, index those aggregated tables, then transactionally disinherit the old partitions

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Point being: hardware is marching along pretty fast (after 20+ years of stagnation) and it's dangerous (IMO) to make big software investments based on the situation on the ground *today*. Yes, that's a good point.

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote You could periodically merge older partitions into larger tables, index those aggregated tables, then transactionally disinherit the old partitions and inherit the new aggregated one. This would keep the value of K down, at the expense of re-writing data multiple times (but
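A sketch of the merge-and-disinherit cycle Jeff describes, using the inheritance-based partitioning of that era; the table names are invented:

    begin;
    -- roll several old daily partitions into one monthly table
    create table calls_2013_10 as
      select * from calls_d01
      union all select * from calls_d02;   -- ... and so on
    create index on calls_2013_10 (imsi);
    -- swap the partitions transactionally
    alter table calls_d01 no inherit calls;
    alter table calls_d02 no inherit calls;
    alter table calls_2013_10 inherit calls;
    commit;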

[HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Hi, I don't see much interest in insert-efficient indexes. These are the ones I've found: - LSM-tree (used by Cassandra and SQLite4?) - Y-Tree (http://www.bossconsulting.com/oracle_dba/white_papers/DW%20in%20oracle/P23%20(ytree%20index%20structure%20for%20DWs).pdf ) - Fractal indexes

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Before getting too excited about some new academic index type, it's worth noting the sad state in which hash indexes have languished for years. Nobody's bothered to add WAL support, let alone do any other real work on them.  The non-btree index types that have been getting love are the ones

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Another point to add: I don't really see btree as a barrier to performance for most of the problems I face.  The real barriers to database performance are storage, contention, and query planning. Ehm, that's true for regular OLTP stuff, which I understand is what most (95%?) of people

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
They should, in theory, be faster than btrees -- O(1) not O(log N) page fetches per lookup.  In practice they don't seem to be faster, and nobody's bothered to find out exactly why.  Again, this isn't a terribly encouraging precedent for implementing some other index type that's supposed to

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
I bet you've mis-diagnosed the problem.  Btrees don't have a problem keeping up with 50m records; your problem is that after a certain point your page cache can't keep up with the pseudo-random I/O patterns and you start seeing faults to storage. [...]  This has nothing to do with the btree

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Hmm, you realise Alvaro is working on MinMax indexes in this release? They are very efficient with regard to index inserts and specially designed for use on large tables. Prior work by Heikki on Grouped Item Tuples was a way of reducing the size of indexes, yet still allowing uniqueness

Re: [HACKERS] Page Checksums

2011-12-22 Thread Leonardo Francalanci
Agreed. I do agree with Heikki that it really ought to be the OS's problem, but then we thought that about dtrace and we're still waiting for that or similar to be usable on all platforms (+/- 4 years). My point is that it looks like this is going to take 1-2 years in PostgreSQL, so it looks

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
I can't help in this discussion, but I have a question: how different would this feature be from filesystem-level CRC, such as the one available in ZFS and btrfs?

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
On 21/12/2011 16.19, Stephen Frost wrote: * Leonardo Francalanci (m_li...@yahoo.it) wrote: I can't help in this discussion, but I have a question: how different would this feature be from filesystem-level CRC, such as the one available in ZFS and btrfs? Depends on how much you trust

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
I think what I meant was: isn't this going to be useless in a couple of years (if, say, btrfs becomes available)? Or does it actually give something that the FS will never be able to give? Yes, it will help you find/address bugs in the filesystem. These things are not unheard of... It sounds to me

Re: [HACKERS] WIP: Collecting statistics on CSV file data

2011-10-18 Thread Leonardo Francalanci
New API AnalyzeForeignTable I didn't look at the patch, but I'm using CSV foreign tables with named pipes to get near-realtime KPIs calculated by PostgreSQL. Of course, pipes can be read just once, so I wouldn't want an automatic analyze of foreign tables...
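A sketch of the named-pipe trick mentioned here, using file_fdw (available since 9.1); the pipe path and columns are invented:

    -- shell: mkfifo /var/run/kpi.pipe ; a producer keeps writing CSV to it
    create extension file_fdw;
    create server csv_srv foreign data wrapper file_fdw;
    create foreign table kpi_feed (ts timestamptz, metric text, value numeric)
      server csv_srv
      options (filename '/var/run/kpi.pipe', format 'csv');
    -- each scan consumes whatever the producer has written so far,
    -- which is why an automatic ANALYZE on such a table would be unwelcome
    select metric, avg(value) from kpi_feed group by metric;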

Re: [HACKERS] Is there any plan to add unsigned integer types?

2011-09-26 Thread Leonardo Francalanci
Compression is an interesting topic: the guys over at TokuDB are making some wild claims... I'm curious if they are real, and what the real tradeoffs are. I don't know how much of the performance they claim comes from compression and how much from the different indexing technique they use

Re: [HACKERS] Your Postgresql 9.2 patch

2011-06-28 Thread Leonardo Francalanci
Leonardo, Your patch: use less space in xl_xact_commit ... has been waiting on an updated version from you for 10 days now. Do you think you're likely to complete it for this CommitFest? I sent an email on the subject:

Re: [HACKERS] Your Postgresql 9.2 patch

2011-06-28 Thread Leonardo Francalanci
I've nearly finished editing prior to commit, so no worries. Thank you, let me know if I can help. Leonardo

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-16 Thread Leonardo Francalanci
With regards to the naming, I think it would be better if we kept XLOG_XACT_COMMIT record exactly as it is now, and make the second record an entirely new record called XLOG_XACT_COMMIT_FASTPATH. That way we retain backwards compatibility. If you'd like to rework like that please,

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-16 Thread Leonardo Francalanci
The important thing is that we retain backwards compatibility with current XLOG_XACT_COMMIT. I'm not worried what we call the other one. Ok, let me see if I got it right: #define XLOG_XACT_COMMIT 0x00 should become: #define XLOG_XACT_COMMIT_WITH_INFO 0x00 and I'll add a

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-15 Thread Leonardo Francalanci
We don't need to be in a hurry here. As the reviewer I'm happy to give Leonardo some time, obviously no more than the end of the commit fest. If he doesn't respond at all, I'll do it, but I'd like to give him the chance and the experience if possible. Sorry I couldn't update the patch

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-15 Thread Leonardo Francalanci
On Tue, Jun 14, 2011 at 2:31 PM, Simon Riggs si...@2ndquadrant.com wrote: Well, we certainly have the option to review and commit the patch any time up until feature freeze. However, I don't want the CommitFest application to be full of entries for patches that are not actually being

Re: [HACKERS] use less space in xl_xact_commit patch

2011-06-15 Thread Leonardo Francalanci
On Wed, May 25, 2011 at 3:05 PM, Simon Riggs si...@2ndquadrant.com wrote: Leonardo, can you submit an updated version of this patch today that incorporates Simon's suggestion? Mmmh, maybe it was simpler than I thought; I must be missing something... patch attached How can I test

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-31 Thread Leonardo Francalanci
I think we need a detailed design document for how this is all going to work. We need to not only handle the master properly but also handle the slave properly. Consider, for example, the case where the slave begins to replay the transaction, reaches a restartpoint after replaying some

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-31 Thread Leonardo Francalanci
Well, I sort of assumed the design was OK, too, but the more we talk about this WAL-logging stuff, the less convinced I am that I really understand the problem. :-( I see. In fact, I think nobody thought about restart points... To sum up: 1) everything seems ok when in the wal_level =

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-30 Thread Leonardo Francalanci
Why is it necessary to replay the operation only on the slave? Can we just use XLOG_HEAP_NEWPAGE? Uh, I don't know why, but I thought I shouldn't log a page on the master, since all the pages are already there and fsync-ed. But if it does no harm, I can easily use XLOG_HEAP_NEWPAGE (of

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-27 Thread Leonardo Francalanci
From: Noah Misch n...@leadboat.com - the patch is missing the "send all table pages to the standby" part; is there some code I can use as a base? Nothing comes to mind as especially similar. I guess I have to generate some special log type that is only replayed by standby servers.

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-25 Thread Leonardo Francalanci
From: Simon Riggs si...@2ndquadrant.com I can't find a clear discussion of what you are trying to do, and how, just a URL back to a complex discussion on another topic. While trying to write a patch to allow changing an unlogged table into a logged one, I had to add another int field to

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-25 Thread Leonardo Francalanci
commitlog_lessbytes02.patch Description: Binary data

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-25 Thread Leonardo Francalanci
Sorry, email sent without body. Fixed some English mistakes. commitlog_lessbytes02.patch Description: Binary data

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-20 Thread Leonardo Francalanci
I'll try to sum up what I understood: 1) the standby keeps the lock, so no problem with stray files coming from the unlogged-to-logged log replay, as the table can't be read during the operation 2) calling ResetUnloggedRelations before ProcArrayApplyRecoveryInfo would remove the problem of the

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-20 Thread Leonardo Francalanci
- the patch is missing the "send all table pages to the standby" part; is there some code I can use as a base? I guess I have to generate some special log type that is only replayed by standby servers. Maybe I could use log_newpage, but instead of XLOG_HEAP_NEWPAGE I could use something like

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-19 Thread Leonardo Francalanci
On Wed, May 18, 2011 at 04:02:59PM +0100, Leonardo Francalanci wrote: By the time the startup process releases the AccessExclusiveLock acquired by the proposed UNLOGGED -> normal conversion process, that relfilenode needs to be either fully copied or unlinked all over again

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-18 Thread Leonardo Francalanci
int counts[1]; /* variable-length array of counts, xinfo flags define length of array and meaning of counts */ Damn, that's much cleaner than what I did. I don't know why I stuck with the idea that it had to be: int array int array ... instead of: int int ... array array ... which

Re: [HACKERS] use less space in xl_xact_commit patch

2011-05-18 Thread Leonardo Francalanci
this is a second version: now using int counts[1]; /* variable-length array of counts */ in xl_xact_commit to keep track of the number of different arrays at the end of the struct. Waiting for feedback... Leonardo commitlog_lessbytes00.patch Description: Binary data

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-18 Thread Leonardo Francalanci
By the time the startup process releases the AccessExclusiveLock acquired by the proposed UNLOGGED -> normal conversion process, that relfilenode needs to be either fully copied or unlinked all over again. (Alternately, find some other way to make sure queries don't read the half-copied

[HACKERS] use less space in xl_xact_commit patch

2011-05-16 Thread Leonardo Francalanci
Hi, following the conversation at http://postgresql.1045698.n5.nabble.com/switch-UNLOGGED-to-LOGGED-tp4290461p4382333.html I tried to remove some bytes from xl_xact_commit. The way I did it needs palloc+memcpy. I guess it could be done reusing the memory for smgrGetPendingDeletes. But I

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-10 Thread Leonardo Francalanci
Yes, that seems like a very appealing approach. There is plenty of bit-space available in xinfo, and we could reserve a bit each for nrels, nsubxacts, and nmsgs, with set meaning that an integer count of that item is present and clear meaning that the count is omitted from the structure

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-10 Thread Leonardo Francalanci
I don't think making xinfo shorter will save anything, because whatever follows it is going to be a 4-byte quantity and therefore 4-byte aligned. Oops, didn't notice it. I'll split xinfo into: uint16 xinfo; uint16 presentFlags; I guess it helps with the reading? I mean, instead

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-08 Thread Leonardo Francalanci
On Fri, Apr 22, 2011 at 4:13 AM, Leonardo Francalanci m_li...@yahoo.it wrote: Maybe you should change xl_act_commit to have a separate list of rels to drop the init fork for (instead of mixing those with the list of files to drop as a whole). I tried to follow your suggestion

Re: [HACKERS] Unlogged tables, persistent kind

2011-04-26 Thread Leonardo Francalanci
If that 1% is random (not time/transaction related), usually you'd rather have an empty table. Why do you think it would be random? Heap blocks would be zeroed if they were found to be damaged, following a crash. If you erase full blocks, you have no idea what data you erased; it could

Re: [HACKERS] Unlogged tables, persistent kind

2011-04-25 Thread Leonardo Francalanci
The only data we can't rebuild is the heap. So what about an option for UNlogged indexes on a LOGged table? It would always preserve data, and it would 'only' cost rebuilding the indexes in case of an unclean shutdown. I think it would give a boost in performance for all those cases

Re: [HACKERS] Unlogged tables, persistent kind

2011-04-25 Thread Leonardo Francalanci
The amount of data loss on a big table will be 1% of the data loss caused by truncating the whole table. If that 1% is random (not time/transaction related), usually you'd rather have an empty table. In other words: is a table that is not consistent with anything else in the db useful?

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-22 Thread Leonardo Francalanci
Maybe you should change xl_act_commit to have a separate list of rels to drop the init fork for (instead of mixing those with the list of files to drop as a whole). I tried to follow your suggestion, thank you very much. Here's a first attempt at the patch. I tested it with: create

Re: [HACKERS] What Index Access Method Functions are really needed?

2011-04-22 Thread Leonardo Francalanci
another question regarding indexes. Sadly I can't find enough info in the documentation. Which of the functions are needed in order for an index to work? All of them. Maybe I completely misunderstood the question, but some functions are optional, such as amgetbitmap, right?

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-18 Thread Leonardo Francalanci
I think I coded a very basic version of the UNLOGGED to LOGGED patch (only the wal_level=minimal case for the moment). To remove the INIT fork, I changed PendingRelDelete to have a flag (bool onlyInitFork) so that the delete would remove only the INIT fork at commit. Everything works (note the

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-16 Thread Leonardo Francalanci
If the master crashes while a transaction that used CREATE TABLE is unfinished, both the master and the standby will indefinitely retain identical, stray (not referenced by pg_class) files. The catalogs do reference the relfilenode of each unlogged relation; currently, that

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-11 Thread Leonardo Francalanci
But re-reading it, I don't understand: what's the difference between creating a new regular table and crashing before emitting the abort record, and converting an unlogged table to logged and crashing before emitting the abort record? How do the standby servers handle a CREATE TABLE

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-04-09 Thread Leonardo Francalanci
I'm pretty sure we wouldn't accept a patch for a feature that would only work with wal_level=minimal, but it might be a useful starting point for someone else to keep hacking on. I understand. Reading your post at http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php I thought

[HACKERS] switch UNLOGGED to LOGGED

2011-04-08 Thread Leonardo Francalanci
Hi, I read the discussion at http://archives.postgresql.org/pgsql-hackers/2011-01/msg00315.php From what I can understand, going from unlogged to logged (and vice versa) in the wal_level == minimal case is not too complicated. Suppose I try to write a patch that allows ALTER TABLE tablename SET
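For reference, this is the syntax the patch aims at; a variant of the feature eventually landed in PostgreSQL 9.5:

    create unlogged table batch_load (id int, payload text);
    -- ... bulk load without WAL overhead ...
    alter table batch_load set logged;   -- 9.5+ syntax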

Re: [HACKERS] Fwd: What do these terms mean in the SOURCE CODE?

2010-11-21 Thread Leonardo Francalanci
Here's my single biggest tip for newcomers to the Postgres source: if you don't use ctags, glimpse, or some other tool that can quickly show you all references to a given identifier, go out and get one. It's one of the easiest ways to learn about things. I guess Eclipse is the best tool

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
On the other hand, if I use a similar test case to my original one (i.e. the tables are much wider) then the query planning takes 1.42 seconds in 9.1 with this patch instead of about 4.7 seconds as we observed it using PostgreSQL 9.0.0. The beginning of the gprof output now looks like

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
but I don't get any gain from the patch... explain time is still around 250 ms. Tried with 9000 partitions, time is still 2 secs. Small correction: I tried with 3000 partitions (FOR i IN 0..3000 ...) and got 250ms with both versions, with 9000 partitions 2 secs (again no gain from the
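A sketch of the kind of test being run here: thousands of child tables, one CHECK constraint each, then time the planner. The format() call assumes 9.1+; the schema is invented:

    create table parent (id int, v int);
    do $$
    begin
      for i in 0..3000 loop
        execute format(
          'create table child_%s (check (id = %s)) inherits (parent)',
          i, i);
      end loop;
    end $$;
    -- constraint exclusion dominates planning time at this scale
    explain select * from parent where id = 5;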

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
This is going to be dominated by constraint exclusion checking. There's basically no fix for that except a more explicit representation of the partitioning rules. Damn, I knew that was going to be more complicated :) So in which case does this patch help? I guess in a multi-index

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
Hmm, maybe I spoke too soon. With 9000 child tables I get a profile like this: Well, the 9000-table-test-case was meant to check the difference in performance with/without the patch... I don't see the reason for trying to optimize such an unrealistic case. BTW can someone explain to me

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-29 Thread Leonardo Francalanci
Cases with lots of irrelevant indexes. Zoltan's example had 4 indexes per child table, only one of which was relevant to the query. In your test case there are no irrelevant indexes, which is why the runtime didn't change. Mmh... I must be doing something wrong. It looks to me it's not

Re: [HACKERS] Custom aggregation function that creates an array

2010-10-23 Thread Leonardo Francalanci
Before you start coding, have you looked over what's in contrib/intarray ? There's nothing that fulfills my needs there, but I guess it would be the perfect place to watch for code examples! Thank you (I think I made my custom aggregation function work, but I'll look into intarray code to

[HACKERS] Custom aggregation function that creates an array

2010-10-22 Thread Leonardo Francalanci
Hi, I want to write a custom agg function that, given an int4 index, increments the element at that index of an array and, at the end, returns the array. The array will always be int4[]. I need it in C, since plpgsql is way slower (and I need to use it on 5M+ rows). I did it, but I also need to
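A plpgsql sketch of the aggregate's shape (the thread's point is that a C version is much faster; the names are invented):

    create or replace function incr_at(state int4[], idx int4)
    returns int4[] language plpgsql as $$
    begin
      state[idx] := coalesce(state[idx], 0) + 1;  -- 1-D arrays auto-extend
      return state;
    end $$;

    create aggregate count_at(int4) (
      sfunc = incr_at,
      stype = int4[],
      initcond = '{}'
    );

    -- usage: histogram of bucket numbers 1..10
    select count_at(bucket)
    from (select (random() * 9)::int4 + 1 as bucket
          from generate_series(1, 1000)) s;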

[HACKERS] On the usefulness of hint bits

2010-10-11 Thread Leonardo Francalanci
Hi, I was wondering what the advantage of having hint bits is for OLAP-style workloads, that is, when the number of transactions is not that high. If I got it right, in 10 pg_clog pages we can store the status of more than 320,000 transactions (2 bits per transaction, so one 8 KB page covers 32,768). That's a lot, in a very small space (80KB?). So I

Re: [HACKERS] On the usefulness of hint bits

2010-10-11 Thread Leonardo Francalanci
Reduction of contention for pg_clog access, for one thing. If you read the archives, you'll find that pg_clog access contention has been shown to be one cause of context swap storms. Having to go to clog for every single tuple access would make that orders of magnitude worse. Ok; is it

Re: [HACKERS] On the usefulness of hint bits

2010-10-11 Thread Leonardo Francalanci
I wonder if we could improve this with some sort of process-local cache - not to get rid of hint bits, just to reduce pg_clog contention. We might easily end up testing the same XID many times during the same table scan. I guess that's my scenario... not that many transactions, so even

Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-10-08 Thread Leonardo Francalanci
Applied with some significant editorialization. The biggest problem I found was that the code for expression indexes didn't really work, and would leak memory like there's no tomorrow even when it did work. Sorry I couldn't write it the way it was supposed to be... I'll look at the differences

Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-10-04 Thread Leonardo Francalanci
It sounds like the costing model might need a bit more work before we commit this. I tried again the simple sql tests I posted a while ago, and I still get the same ratios. I've tested the applied patch on a dual opteron + disk array Solaris machine. I really don't get how a laptop hard

Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-10-01 Thread Leonardo Francalanci
I ran a few more performance tests on this patch. Here's what I got for the tests Leonardo posted originally: * 2M rows: 22 seconds for seq. scan, 24 seconds for index scan * 5M rows: 139 seconds for seq. scan, 97 seconds for index scan * 10M rows: 256 seconds for seq. scan, 611

Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-09-29 Thread Leonardo Francalanci
10% is nothing. I was expecting this patch would give an order-of-magnitude improvement, or something like that, in the worst cases of the current code (highly unsorted input). Yes. It should be 10x faster than the ordinary method in the worst cases. Here's my post with a (very simple)

Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-09-29 Thread Leonardo Francalanci
Here's my post with a (very simple) performance test: http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php I think the 10M rows test is more in line with what we want (83s vs. 646s). Can someone else test the patch to see if what I found is still valid? I don't think it makes
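A reconstruction, under assumed sizes, of the simple test the linked post describes: load unsorted data, build an index, then time CLUSTER (which, with the patch, can choose seq scan + sort over an index scan):

    create table t (k int, filler char(200));
    insert into t
      select (random() * 100000)::int, 'x'
      from generate_series(1, 10000000);
    create index t_k on t (k);
    \timing on
    cluster t using t_k;   -- patched: seq scan + sort wins on unsorted input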

Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-07-21 Thread Leonardo Francalanci
I think writetup_rawheap() and readtup_rawheap() are a little complex, but should work as long as there is no padding between t_len and t_self in the HeapTupleData struct. - It might be cleaner if you write the total item length and tuple data separately. - (char *) tuple +