Andrew, > Arguably this might exaggerate the effect quite significantly. Users > will want to know the real time effect on a complete COPY. Depending on > how much the pasing is in the total time your 20% improvement in parsing > might only be a small fraction of 20% improvement in COPY.
Arguably has already been argued. We knew this because we profiled the entire COPY process and came up with this approx. breakdown for a specific case: Parsing: 25% Attribute Conversion: 40% Data Insertion: 35% Net copy rate: 8 MB/s on a filesystem that does 240 MB/s So - if we speed up parsing by 500% or 450%, the end result is about a 20-30% speed increase in the overall process. Note that we're still a *long* way from getting anywhere near the limit of the I/O subsystem at 12 MB/s. Oracle can probably get 5-8 times this data rate, if not better. The attribute conversion logic is also very slow and needs similar improvements. The reason we focused first on Parsing is that our MPP version of Bizgres will reach data loading rates approaching the parsing speed, so we needed to improve that part to get it out of the way. We will continue to improve COPY speed in Bizgres so that we can provide comparable COPY performance to Oracle and MySQL. > Like you, I'm happy we have seen a 5 times improvement in parsing. Is it > possible you can factor out something smallish from your patch that > might make up the balance? That parsing was 25% of the workload was traceable to a 3 main things: 1) Per character acquisition from source instead of buffering 2) Frequent interruption of the parsing pipeline to handle attribute conversion 3) Lack of micro-parallelism in the character finding logic Tom's patch took our contribution from (1) and (2) and his improvements, and he rejected (3). The net result is that we lost performance from the lack of (3) but gained performance from his improvements of (1) and (2). I believe that re-introducing (3) may bring us from 100 MB/s to 150 MB/s parsing speed. - Luke ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq