from:"David Jarvis"

Re: [PERFORM] Analysis Function

2010-06-16 Thread David Jarvis

> Fair enough. How about something like make_timestamp? It's at least > shorter and easier than construct :-) > Agreed. Dave

Re: [PERFORM] Analysis Function

2010-06-14 Thread David Jarvis

> > Does it make sense to allow minutes when hours isn't specified? Or > For time, 00 seems a reasonable default for all values; clearly document the defaults. Also, having a default makes the code simpler than plus . (Not to mention explaining it.) ;-) SELECT to_timestamp( minutes := 19 ) -- er

Re: [PERFORM] Analysis Function

2010-06-13 Thread David Jarvis

Hi, It's not immediately obvious what the default value of "timezone" > will be? > The system's locale, like now(); documentation can clarify. By named parameter, I meant default value. You could construct a timestamp variable using: construct_timestamp( year := 1900, hour := 1 ) When I read

Re: [PERFORM] Analysis Function

2010-06-13 Thread David Jarvis

Hi, We had a little chat about this with Magnus. It's pretty surprising that > there's no built-in function to do this, we should consider adding one. > I agree; you should be able to create a timestamp or a date from integer values. Others, apparently, have written code. The implementation I did

Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis

Hi, I added an explicit cast in the SQL: dateserial(extract(YEAR FROM m.taken)::int,'||p_month1||','||p_day1||') d1, dateserial(extract(YEAR FROM m.taken)::int,'||p_month2||','||p_day2||') d2 The function now takes three integer parameters; there was no performance loss. Thank y

Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis

Hi, Tom. extract(YEAR FROM m.taken) I thought that returned a double precision? Dave

Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis

Hi, Here is code to convert dates from integers without string concatenation: Edit dateserial.c: #include "postgres.h" #include "utils/date.h" #include "utils/nabstime.h" #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif Datum dateserial(PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1 (dateserial); Datu

Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis

Hi, The C function returns a DateADT, which is a typedef for int32, but the > CREATE FUNCTION statement claims that it returns 'text'. > That'll do it. Thank you! but whether that is faster or slower I don't know. But I > don't see why this query needs to be fast in the first > place. It seems t

Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis

Hi, Tim. Have you tested DATE_TRUNC()? > Not really; it returns a full timestamp and I would still have to concatenate strings. My goal is to speed up the following code (where *p_*parameters are user inputs): *date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1,

Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis

Hi, To avoid string concatenation using dates, I figured I could write a C function: #include "postgres.h" #include "fmgr.h" #include "utils/date.h" #include "utils/nabstime.h" #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif Datum dateserial (PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1 (dateserial);

Re: [PERFORM] Analysis Function

2010-06-10 Thread David Jarvis

Hi, Andy. I assume you are doing this in a loop? Many Many Many times? cuz: > Yes. Here are the variations I have benchmarked (times are best of three): Variation #0 -no date field- Explain: http://explain.depesz.com/s/Y9R Time: 2.2s Variation #1 date('1960-1-1') Explain: http://explain.depes

Re: [PERFORM] Analysis Function

2010-06-10 Thread David Jarvis

Hi, I found a slow part of the query: SELECT * date(extract(YEAR FROM m.taken)||'-1-1') d1,* * date(extract(YEAR FROM m.taken)||'-1-31') d2* FROM climate.city c, climate.station s, climate.station_category sc, climate.measurement m WHERE c.id = 5148 AND ... Date extraction is 3.2 se

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-02 Thread David Jarvis

Hi, Hmm, that's nice, though I cannot but wonder whether the exclusive lock > required by CLUSTER is going to be a problem in the long run. > Not an issue; the inserts are one-time (or very rare; at most: once a year). > Hm, keep in mind that if the station clause alone is not selective > enou

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-02 Thread David Jarvis

Sorry, Alvaro. I was contemplating using a GIN or GiST index as a way of optimizing the query. Instead, I found that re-inserting the data in order of station ID (the primary look-up column) and then CLUSTER'ing on the station ID, taken date, and category index increased the speed by an order of

Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread David Jarvis

Hi, Bryan. Thanks for the notes. I thought about using a prepared statement, but I cannot find any examples of using a PREPARE statement from within a function, and don't really feel like tinkering around to figure it out. Performance is at the point where the Java/PHP bridge and JasperReports ar

Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread David Jarvis

Salut, Cédric. I wonder what the plan will be if you replace sc.taken_* in : > m.taken BETWEEN sc.taken_start AND sc.taken_end > by values. It might help the planner... > That is a fairly important restriction. I will try making it * (year1||'-01-01')::date*, but I have no constant value for it -

Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread David Jarvis

Hi, Bryan. I was just about to reply to the thread, thanks for asking. Clustering was key. After rebooting the machine (just to make sure absolutely nothing was cached), I immediately ran a report on Toronto: 5.25 seconds! Here's what I did: 1. Created a new set of tables that matched the old

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis

I was told to try OVERLAPS instead of checking years. The query is now: SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) as amount FROM climate.city c, climate.station s, climate.station_category sc, climate.measurement m WHERE c.id = 5148 AND earth_di

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis

Hi, sc.taken_end <= '1996-12-31'::date AND > m.taken BETWEEN sc.taken_start AND sc.taken_end AND > > category of data at a certain time. But I'm afraid this makes the planning > much more difficult, as the select from measurements depend on the data > returned by other parts of the query (rows fro

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis

Hi, Kevin. below something in the range of 1.5 to 2 is probably not going to be > a good choice for the mix as a whole. > Good to know; thanks. > This should probably be set to something on the order of 3GB. This > will help the optimizer make more intelligent choices about when use > of the i

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis

Hi, Alexey. Is it necessary to get the data as far as 1900 all the time ? Maybe there is > a possibility to aggregate results from the past years if they are constant. > This I have done. I created another table (station_category) that associates stations with when they started to take measuremen

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis

Hi, Rob. I tried bumping the effective_cache_size. It made no difference. My latest attempt at forcing PostgreSQL to use the indexes involved two loops: one to loop over the stations, the other to extract the station data from the measurement table. The outer loop executes in 1.5 seconds. The inn

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis

Hi, Tom. Yes, that is what happened, making the tests rather meaningless, and giving me the false impression that the indexes were being used. They were but only because of cached results. When multiple users making different queries, the performance will return to ~80s per query. I also tried Ke

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis

Hi, Kevin. Thanks for the response. It sounds as though the active portion of your database is pretty > much cached in RAM. True? > I would not have thought so; there are seven tables, each with 39 to 43 million rows as: CREATE TABLE climate.measurement ( id bigserial NOT NULL, taken date

Re: [PERFORM] Random Page Cost and Planner

2010-05-24 Thread David Jarvis

Hi, I changed the date comparison to be based on year alone: extract(YEAR FROM sc.taken_start) >= 1900 AND extract(YEAR FROM sc.taken_end) <= 2009 AND The indexes are now always used; if someone wants to explain why using the numbers works (a constant) but using a date (another constant?

[PERFORM] Random Page Cost and Planner

2010-05-24 Thread David Jarvis

Hi, I wrote a query (see below) that extracts climate data from weather stations within a given radius of a city using the dates for which those weather stations actually have data. The query uses the measurement table's only index: CREATE UNIQUE INDEX measurement_001_stc_idx ON climate.measure

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-23 Thread David Jarvis

Hi, The problem is now solved (in theory). Well, it's not the functions per se that's the problem, it's the lack of > a useful index on the expression. > The measurement table indexes (on date and weather station) were not being used because the only given date ranges (e.g., 1900 - 2009) were ca

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-22 Thread David Jarvis

Hi, certainly understand that you wouldn't want to partition by year. It > Definitely not. > does strike me that perhaps you could partition by day ranges, but you'd > I don't think that will work; users can choose any day range, with the most common as Jan 1 - Dec 31, followed by seasonal ra

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-22 Thread David Jarvis

Hi, CREATE INDEX measurement_01_001_y_idx >> ON climate.measurement_01_001 >> USING btree >> (date_part('year'::text, taken)); >> >> Is that equivalent to what you suggest? >> > > No. It is not the same function, so Postgres has no way to know it produces > the same results (if it does). > Thi

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread David Jarvis

Hi, Yeb. This is starting to go back to the design I used with MySQL: - YEAR_REF - Has year and station - MONTH_REF - Has month, category, and yea referencer - MEASUREMENT - Has month reference, amount, and day Normalizing by date parts was fast. Partitioning the tables by year won't do

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

Hi, (An EXPLAIN ANALYSE would be better here). Look at the expected number of > stations "Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual time=0.053..41.173 rows=78 loops=1)" " Join Filter: ((6371.009::double precision * sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: "Result (cost=0.00..21781.18 rows=8090 width=28)" " -> Append (cost=0.00..21781.18 rows=8090 width=28)" "-> Seq Scan on measur

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

Hi, Something in here really smells fishy to me. Those extract's above are > working on values which are from the table.. Why aren't you using these > functions to figure out how to construct the actual dates based on the > values provided by the *user*..? > Because I've only been using Postgre

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

Hi, check (taken >= '1913-12-01' and taken <= '1913-12-31') > I don't think I want to constrain by year, for a few reasons: 1. There are a lot of years -- over 110. 2. There will be more years added (both in the future for 2010 and in the past as I get data from other sources). Currently

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

Hi, ~300 million measurements ~12000 stations (not 7 as I mentioned before) ~5500 cities some serious data tho, at least. Basically, PG is sequentially scanning > through all of the tables in your partitioning setup. What is > constraint_exclusion set to? What version of PG is this? Do th

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

The greatest() expression reduces to either the current year (year + 0) or the next year (year + 1) by taking the sign of the difference in start/end days. This allows me to derive an end date, such as: Dec 22, 1900 to Mar 22, 1901 Then I check if the measured date falls between those two dates.

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

Hi, I was still referring to the measurement table. You have an index on > stationid, but still seem to be getting a sequential scan. Maybe the planner > does not realise that you are selecting a small number of stations. Posting > an EXPLAIN ANALYSE would really help here. > Here is the result f

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date That uses

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

What if I were to have the application pass in two sets of date ranges? For the condition of Dec 22 to Mar 22: Dec 22 would become: - Dec 22 - Dec 31 Mar 22 would become: - Jan 1 - Mar 22 The first range would always be for the current year; the second range would always be for the year

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

When using MySQL, the performance was okay (~5 seconds per query) using: date( concat_ws( '-', y.year, m.month, d.day ) ) between -- Start date. date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND -- End date. Calculated by checking to see if the end date wraps -- into the

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis

Hi, I have posted an image of the user inputs here: http://i.imgur.com/MUkuZ.png The problem is that I am given a range of days (Dec 22 - Mar 22) over a range of years (1900 - 2009) and the range of days can span from one year to the next. This is not the same as saying Dec 22, 1900 to Mar 22, 2

[PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-19 Thread David Jarvis

Hi, I recently switched to PostgreSQL from MySQL so that I can use PL/R for data analysis. The query in MySQL form (against a more complex table structure) takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish, as it takes over a minute. I think I have the correct table structu

42 matches

Mail list logo