Re: [PERFORM] application of KNN code to US zipcode searches?

2011-10-29 Thread Mark Stosberg
Table "public.zipcodes" Column| Type | Modifiers --+---+--- zipcode | character varying(5) | not null lonlat_point | geometry(Point,4326) | Indexes: "zipcodes_pkey" PRIMARY KEY, btree (zipco

Re: [PERFORM] application of KNN code to US zipcode searches?

2011-02-17 Thread Mark Stosberg
On 02/17/2011 03:17 PM, Oleg Bartunov wrote: > Mark, > > we are investigating pgsphere (http://pgsphere.projects.postgresql.org/) to see if > we could add KNN support. Great, thanks Oleg. I'll be happy to test it when something is ready. Mark

Re: [PERFORM] application of KNN code to US zipcode searches?

2011-02-17 Thread Mark Stosberg
I tried again to use KNN for a real-world query, and I was able to get an approximately 6x speed-up vs. the cube search or earthdistance methods (from 300 ms to 50 ms). I had to make some notable changes for the KNN index to be considered. - Of course, I had to switch to using basic po
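A sketch of the shape such a 9.1 KNN query takes, using the built-in point type and the <-> distance operator (table and column names are assumptions, not from the post):

    CREATE INDEX pets_lonlat_idx ON pets USING gist (lonlat);  -- lonlat is a point

    SELECT pet_id
    FROM pets
    ORDER BY lonlat <-> point '(-118.41, 34.10)'
    LIMIT 20;

The planner only considers the KNN index path when the ORDER BY is exactly an indexed-column <-> constant expression, which is why rewrites like those described above are needed.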

Re: [PERFORM] application of KNN code to US zipcode searches?

2011-02-17 Thread Mark Stosberg
> PostGIS, geometry columns, and UTM.. I'm not sure where they are wrt > adding KNN support, but it's something they've been anxious to have for > a while, so I expect support will come quickly. I've looked into this a little more. One approach seems to be to project the lat/long pairs on to a f
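A hedged sketch of that projection idea with PostGIS: transform the SRID 4326 (lat/long) geometries into a planar SRID (UTM zone 16N, 32616, picked only as an example) so that distances come out in meters:

    SELECT zipcode
    FROM zipcodes
    ORDER BY ST_Transform(lonlat_point, 32616)
             <-> ST_Transform(ST_SetSRID(ST_MakePoint(-85.0, 40.0), 4326), 32616)
    LIMIT 10;

Note that ordering on ST_Transform() output cannot use an index on lonlat_point; in practice the projected geometry would be stored in its own GiST-indexed column.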

Re: [PERFORM] application of KNN code to US zipcode searches?

2011-02-17 Thread Mark Stosberg
> I thought the benefit of KNN was that you could retrieve the rows in > distance order, so that a query for the closest 20 locations (for > example) would be very fast. I wouldn't have expected it to be > helpful when you're selecting all the rows regardless of distance. Kevin, Thanks for the

[PERFORM] application of KNN code to US zipcode searches?

2011-02-17 Thread Mark Stosberg
We perform over 1,000,000 searches each day for "adoptable shelter pets near your zipcode". We already have adequate performance for these searches using the "cube" contrib, but the new KNN work in 9.1 seemed like it might be a promising way to speed this up even further. I installed PostgreSQL 9

[PERFORM] Re: getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-04 Thread Mark Stosberg
On 02/03/2011 10:57 AM, gnuo...@rcn.com wrote: > For PG to prosper in the future, it has to embrace the > multi-core/processor/SSD machine at the query level As the person who brought up the original concern, I'll add that "multi-core at the query level" really isn't important for us. Most of our

Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Mark Stosberg
On 02/03/2011 10:54 AM, Oleg Bartunov wrote: > Mark, > > you could try the gevel module to get the structure of the GiST index and see whether > items are distributed more or less homogeneously (see the different levels). You > can visualize the index like http://www.sai.msu.su/~megera/wiki/Rtree_Index > Also, if your searches

[PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements

2011-02-03 Thread Mark Stosberg
Each night we run over 100,000 "saved searches" against PostgreSQL 9.0.x. These are all complex SELECTs using "cube" functions to perform a geo-spatial search to help people find adoptable pets at shelters. All of our machines in development and production have at least 2 cores in them, and I'm
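Since a single query runs on a single core, the usual workaround is application-level parallelism: several client connections, each claiming a disjoint slice of the batch. A hypothetical sketch (table and column names invented):

    -- worker 0 of 4; workers 1-3 substitute 1, 2, and 3
    SELECT search_id, criteria
    FROM saved_searches
    WHERE search_id % 4 = 0;

Each worker then executes its searches over its own connection, letting PostgreSQL schedule the backends across cores.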

Re: [PERFORM] debugging handle exhaustion and 15 min/ 5mil row delete

2010-05-07 Thread Mark Stosberg
> You can use TRUNCATE instead of DELETE. TRUNCATE is more efficient and > faster than DELETE. Thanks for the suggestion. However, TRUNCATE is not compatible with Slony, and we also have some rows which remain in the table. > Now, we need more information about your system to give you a certain > so

[PERFORM] debugging handle exhaustion and 15 min/ 5mil row delete

2010-05-07 Thread Mark Stosberg
your help! Mark

Re: [PERFORM] Help me put 2 Gigs of RAM to use

2009-12-10 Thread Mark Stosberg
Thanks for the response, Matthew. > On Thu, 10 Dec 2009, Mark Stosberg wrote: > > What I'm noticing is that while the FreeBSD server has 4 Gigs of > > memory, there are rarely ever more than 2 in use -- the memory use > > graph is rather constant. My goal is

[PERFORM] Fw: Help me put 2 Gigs of RAM to use

2009-12-10 Thread Mark Stosberg
(This message is a second attempt.)

Re: [PERFORM] recommendations for web/db connection pooling or DBD::Gofer reviews

2008-04-10 Thread Mark Stosberg
Under heavy load, Apache has the usual failure mode of spawning so many threads/processes and database connections that it just exhausts all the memory on the webserver and also kills the database. As usual, I would use lighttpd as a frontend (also serving static files) to handle the l

Re: [PERFORM] bulk insert performance problem

2008-04-08 Thread Mark Stosberg
Christian Bourque wrote: Any idea? Are there any other improvements I could make? Are you using the "COPY" syntax in the import script, or individual insert statements? Using COPY will always be *much* faster. I believe COPY always appends to tables rather than replacing the contents; you can
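A minimal example of the difference (table name and file path are hypothetical):

    -- one round trip per row: slow
    INSERT INTO pets_import (name, zipcode) VALUES ('Rex', '47374');

    -- bulk load from a server-side file:
    COPY pets_import FROM '/tmp/pets.csv' WITH CSV;

    -- or client-side, from psql:
    -- \copy pets_import FROM 'pets.csv' WITH CSV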

[PERFORM] Re: what worked: performance improvements for geo-spatial searching on FreeBSD

2008-04-08 Thread Mark Stosberg
Oleg Bartunov wrote: Mark, do you know about our sky segmentation code Q3C? See details: http://www.sai.msu.su/~megera/wiki/SkyPixelization We use it for billions of objects in the database and are quite happy. Oleg, Thanks for the response. That sounds interesting, but it's not clear to me how I would

[PERFORM] peer review of PgBouncer / Slony design

2008-04-07 Thread Mark Stosberg
> I would stick to proven postgresql technologies such as pgbouncer. Thanks for the fast recommendation, Joshua. I'll consider it. Our application is a Slony-replicated web/db project with two slaves. Does this design seem sensible? - Run one pgbouncer server on the master, with settings to
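For reference, a minimal pgbouncer.ini sketch for this kind of setup; every name and number below is an illustrative assumption, not a setting from the thread:

    [databases]
    production = host=127.0.0.1 port=5432 dbname=production

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    pool_mode = transaction
    max_client_conn = 400
    default_pool_size = 50

With transaction pooling, 400 client connections can share a much smaller set of real backends.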

[PERFORM] recommendations for web/db connection pooling or DBD::Gofer reviews

2008-04-07 Thread Mark Stosberg
When traffic to our PostgreSQL-backed website spikes, the first resource we see being exhausted is the DB slots on the master server (currently set to about 400). I expect that as new Apache/mod_perl children are being put to use, they are creating new database connections. I'm interested in rec

[PERFORM] what worked: performance improvements for geo-spatial searching on FreeBSD

2008-04-07 Thread Mark Stosberg
The http://www.1-800-save-a-pet.com site is hosted with FreeBSD and PostgreSQL and has a geo-spatial search as a central feature. One thing that made a substantial performance improvement was switching from the "geo_distance()" search in the earthdistance contrib, to use the "cube" based geo-spati
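The pattern in question, roughly (column names assumed; the follow-up posts in these threads show an earth_coords cube column): earth_box() produces a bounding volume a GiST index can search, and earth_distance() filters to the exact radius:

    -- 250 miles, in meters (1609.344 m per mile)
    SELECT pet_id
    FROM pets
    WHERE earth_box(ll_to_earth(39.83, -84.89), 250 * 1609.344) @> earth_coords
      AND earth_distance(ll_to_earth(39.83, -84.89), earth_coords)
          < 250 * 1609.344;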

Re: [PERFORM] Writing a "search engine" for a pgsql DB

2007-02-26 Thread Mark Stosberg
> > Now see, this is exactly the kind of sagely advice I was hoping for! :) > > I'll look into tsearch2, and failing that for some reason, I love the > keyword table idea. For example keyword search code, you can try this package: http://downloads.sourceforge.net/cascade/cascade-devel-pieces-1.1

Re: [PERFORM] Writing a "search engine" for a pgsql DB

2007-02-26 Thread Mark Stosberg
Madison Kelly wrote: > > I think the more direct question I was trying to get at is "How do you > build a 'relevance' search engine? One where results are returned/sorted > by relevance of some sort?". At this point, the best I can think of > would be to perform multiple queries; first matching
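One way to get relevance-sorted results in a single query, sketched with PostgreSQL's integrated full-text search (8.3+ syntax; this 2007 thread would have used the equivalent tsearch2 contrib functions). Table and column names are assumptions:

    SELECT title,
           ts_rank(to_tsvector('english', body), q) AS relevance
    FROM documents,
         to_tsquery('english', 'adoptable & pets') AS q
    WHERE to_tsvector('english', body) @@ q
    ORDER BY relevance DESC
    LIMIT 20;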

Re: [PERFORM] Writing a "search engine" for a pgsql DB

2007-02-26 Thread Mark Stosberg
Joshua D. Drake wrote: > Madison Kelly wrote: >> Hi all, >> >> I am asking in this list because, at the end of the day, this is a >> performance question. >> >> I am looking at writing a search engine of sorts for my database. I >> have only ever written very simple search engines before which

Re: [PERFORM] low memory usage reported by 'top' indicates poor tuning?

2007-02-26 Thread Mark Stosberg
Joshua D. Drake wrote: > Mark Stosberg wrote: >> Hello, >> >> I'm trying to make sense of the memory usage reported by 'top', compared >> to what "pg_database_size" shows. Here's one result: > > > You are missing the

[PERFORM] low memory usage reported by 'top' indicates poor tuning?

2007-02-26 Thread Mark Stosberg
Hello, I'm trying to make sense of the memory usage reported by 'top', compared to what "pg_database_size" shows. Here's one result:

    select pg_size_pretty(pg_database_size('production'));
     pg_size_pretty
    ----------------
     6573 MB

Now, looking at memory use with "top", there is a lot of memory tha

[PERFORM] Using the 8.2 autovacuum values with 8.1

2007-02-22 Thread Mark Stosberg
Thanks to everyone for the feedback about vacuuming. It's been very useful. The pointers to the pgstattuple and Pgfouine tools were also helpful. I'm now considering the following plan for trying Autovacuuming again with 8.1. I'd like any peer review you have to offer of the following: 1. First,
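For concreteness, a sketch of what the first step might look like in 8.1's postgresql.conf, using the values 8.2 ships as defaults (8.1's own defaults were twice as high); treat these as a starting point, not the thread's tested settings:

    autovacuum = on
    stats_start_collector = true          # both required for autovacuum in 8.1
    stats_row_level = true
    autovacuum_vacuum_threshold = 500
    autovacuum_analyze_threshold = 250
    autovacuum_vacuum_scale_factor = 0.2
    autovacuum_analyze_scale_factor = 0.1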

Re: [PERFORM] Auto-Vacuum in 8.1 was ineffective for me. 8.2 may work better?

2007-02-21 Thread Mark Stosberg
Alvaro Herrera wrote: > Mark Stosberg wrote: >> When I upgraded a busy database system to PostgreSQL 8.1, I was excited >> about AutoVacuum, and promptly enabled it, and turned off the daily >> vacuum process. >> >> ( >> I set the following, as well

[PERFORM] Auto-Vacuum in 8.1 was ineffective for me. 8.2 may work better?

2007-02-21 Thread Mark Stosberg
When I upgraded a busy database system to PostgreSQL 8.1, I was excited about AutoVacuum, promptly enabled it, and turned off the daily vacuum process. (I set the following, as well as the option to enable autovacuuming: stats_start_collector = true, stats_row_level = true.) I could see in t

[PERFORM] How to avoid vacuuming a huge logging table

2007-02-21 Thread Mark Stosberg
Our application has a table that is only logged to, and infrequently used for reporting. There are generally no deletes or updates. Recently, the sheer size (an estimated 36 million rows) caused a serious problem because it prevented a "vacuum analyze" on the whole database from finishing in a timely
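One low-tech way out, sketched with hypothetical names: vacuum the active tables individually instead of database-wide, and leave the append-only log table for an occasional off-hours pass:

    VACUUM ANALYZE pets;
    VACUUM ANALYZE shelters;
    -- activity_log (36M rows, insert-only) is skipped here and
    -- vacuumed separately during a quiet window

Since the table sees almost no updates or deletes, it generates little dead-row bloat for vacuum to reclaim.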

Re: [PERFORM] How to debug performance problems

2007-02-21 Thread Mark Stosberg
Ray Stell wrote: > I'd like to have a toolbox prepared for when performance goes south. > I'm clueless. Would someone mind providing some detail about how to > measure these four items Craig listed: > > 1. The first thing is to find out which query is taking a lot of time. > > 2. A long-running
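A starting point for item 1, hedged: have the server log slow statements, then examine the worst offender's plan. The threshold and query below are illustrative:

    # postgresql.conf -- log any statement running longer than 1000 ms
    log_min_duration_statement = 1000

    -- then, in psql, inspect the plan of the logged query:
    EXPLAIN ANALYZE
    SELECT count(*) FROM pets WHERE pet_state = 'available';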

Re: [PERFORM] reindex vs 'analyze'

2007-02-16 Thread Mark Stosberg
Tom Lane wrote: > Mark Stosberg <[EMAIL PROTECTED]> writes: >> Your suggestion about the pet_state index was right on. I tried >> "Analyze" on it, but still got the same bad estimate. However, I then >> used "reindex" on that index, and that fixed

Re: [PERFORM] reindex vs 'analyze'

2007-02-14 Thread Mark Stosberg
On Wed, Feb 14, 2007 at 01:07:23PM -0500, Tom Lane wrote: > Mark Stosberg <[EMAIL PROTECTED]> writes: > > Your suggestion about the pet_state index was right on. I tried > > "Analyze" on it, but still got the same bad estimate. However, I then > > used "

[PERFORM] reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)

2007-02-14 Thread Mark Stosberg
Merlin Moncure wrote: On 2/14/07, Tom Lane <[EMAIL PROTECTED]> wrote: There are two things wrong here: first, that the estimated row count is only 20% of actual; it should certainly not be that far off for such a simple condition. I wonder if your vacuum/analyze procedures are actually working.
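The checks this thread converges on, sketched (the pets table is from the thread; the index name is a guess): compare the planner's stored estimate to reality, refresh statistics, and rebuild the index if the estimate stays wrong:

    SELECT reltuples::bigint AS estimated_rows, relpages
    FROM pg_class WHERE relname = 'pets';

    SELECT count(*) FROM pets;          -- actual row count

    ANALYZE pets;                       -- refresh planner statistics
    REINDEX INDEX pets_pet_state_idx;   -- hypothetical index name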

Re: [PERFORM] cube operations slower than geo_distance() on production server

2007-02-13 Thread Mark Stosberg
On Tue, Feb 13, 2007 at 09:31:18AM -0500, Merlin Moncure wrote: > > >my mistake, i misunderstood what you were trying to do...can you try > >removing the 'order by radius' and see if it helps? if not, we can try > >working on this query some more. There is a better, faster way to do > >this, I'm s

Re: [PERFORM] cube operations slower than geo_distance() on production server

2007-02-12 Thread Mark Stosberg
Merlin-- Thanks so much for your help. Some follow-ups are below. Merlin Moncure wrote:
>
>> Here's the basic query I'm using:
>> SELECT
>>   -- 1609.344 is a constant for "meters per mile"
>>   cube_distance( (SELECT earth_coords from zipcodes WHERE zipcode = '90210'),
>>                  earth_coords ) / 1609.344
>>

Re: [PERFORM] cube operations slower than geo_distance() on production server

2007-02-12 Thread Mark Stosberg
Merlin Moncure wrote: > On 2/10/07, Mark Stosberg <[EMAIL PROTECTED]> wrote: >> >> With the help of some of this list, I was able to successfully set up >> and benchmark a cube-based replacement for geo_distance() calculations. >> >> On a development box

Re: [PERFORM] Can anyone make this code tighter? Too slow, Please help!

2007-02-09 Thread Mark Stosberg
[EMAIL PROTECTED] wrote: > I have this function in my C#.NET app that goes out to find the > business units for each event and returns a string (for my report). > I'm finding that for larger reports it takes too long and times out. > > Does anyone know how I can speed this process up? Is this cod

[PERFORM] cube operations slower than geo_distance() on production server

2007-02-09 Thread Mark Stosberg
With the help of some of this list, I was able to successfully set up and benchmark a cube-based replacement for geo_distance() calculations. On a development box, the cube-based variations benchmarked consistently running in about 1/3 of the time of the geo_distance() equivalents. After settin

Re: [PERFORM] Speed up this query

2007-02-08 Thread Mark Stosberg
[EMAIL PROTECTED] wrote: > Hi all, > > I'm fairly new to SQL, so this is probably a dumb way to form this > query, but I don't know another. > > I want to see the usernames of all the users who have logged on > today. "users" is my table of users with ids and usernames. > "session_stats" is m
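A sketch of the usual answer, assuming session_stats carries a user_id foreign key and a login timestamp (names invented): join and de-duplicate rather than loop:

    SELECT DISTINCT u.username
    FROM users u
    JOIN session_stats s ON s.user_id = u.id
    WHERE s.login_time >= current_date;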

[PERFORM] tip: faster sorting for proximity queries by using cube_distance()

2007-02-07 Thread Mark Stosberg
Hello, I wanted to share something else I learned in my proximity search work. One of my requirements is to order by the distance that matches are found from the center point. When I did this using earth_distance(), the benefit of the earth_box() technique over the old geo_distance became minimal as
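The trick being described, sketched against the earth_coords column seen elsewhere in these threads (other names assumed): filter with the indexable earth_box() test, but sort on cube_distance(), which is cheap enough to keep the ordered query fast:

    SELECT pet_id
    FROM pets
    WHERE earth_box(ll_to_earth(39.83, -84.89), 10 * 1609.344) @> earth_coords
    ORDER BY cube_distance(ll_to_earth(39.83, -84.89), earth_coords);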

Re: [PERFORM] explain analyze output: vacuuming made a big difference.

2007-02-06 Thread Mark Stosberg
Mark Stosberg wrote: > > I'm reading the explain analyze output correctly myself, nearly all of > the time spent is related to the 'pets' table, but I can't see what to > do about it. Something about typing that message jarred my brain to think to try: VACUUM FULL

Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

2007-02-06 Thread Mark Stosberg
Bruno Wolff III wrote: > > Some people here may be able to tell you more if you show us explain > analyze output. Here is my explain analyze output. Some brief context of what's going on. The goal is to find "Pets Near You". We join the pets table on the shelters table to get a zipcode, and then
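The join shape described, with assumed key names: pets joins to shelters for a zipcode, which joins to zipcodes for coordinates to search against:

    SELECT p.pet_id
    FROM pets p
    JOIN shelters s ON s.shelter_id = p.shelter_id
    JOIN zipcodes z ON z.zipcode = s.zipcode
    WHERE earth_box(ll_to_earth(39.83, -84.89), 10 * 1609.344)
          @> z.earth_coords;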

Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)

2007-02-05 Thread Mark Stosberg
Merlin Moncure wrote: > On 2/5/07, Mark Stosberg <[EMAIL PROTECTED]> wrote: >> Bruno Wolff III wrote: >> > On Sat, Feb 03, 2007 at 14:00:26 -0500, >> > Mark Stosberg <[EMAIL PROTECTED]> wrote: >> >> I'm using geo_distance() from contrib/ear

Re: [PERFORM] optimizing a geo_distance() proximity query

2007-02-05 Thread Mark Stosberg
Bruno Wolff III wrote: > On Sat, Feb 03, 2007 at 14:00:26 -0500, > Mark Stosberg <[EMAIL PROTECTED]> wrote: >> I'm using geo_distance() from contrib/earthdistance and would like to find a >> way to speed up the geo distance calculation if possible. This is for a

[PERFORM] optimizing a geo_distance() proximity query

2007-02-03 Thread Mark Stosberg
Hello, I'm using geo_distance() from contrib/earthdistance and would like to find a way to speed up the geo distance calculation if possible. This is for a proximity search: "Show me adoptable pets within 250 miles of this zipcode". I've researched a number of approaches to this, but none seem as wor
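For reference, the baseline shape of such a geo_distance() search (names assumed): geo_distance() takes two built-in points and returns statute miles, and being a plain function call it cannot use an index, so every candidate row pays for a spherical-distance evaluation:

    SELECT p.pet_id
    FROM zipcodes origin, zipcodes z, pets p
    WHERE origin.zipcode = '90210'
      AND z.zipcode = p.zipcode
      AND geo_distance(origin.lonlat, z.lonlat) < 250;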

[PERFORM] Using statement_timeout as a performance tool?

2007-02-01 Thread Mark Stosberg
Hello, I'm working on setting up replication with Slony, and will soon have a slave that a lot of SELECT traffic will be sent to (over 500k/day). The primary query we need to run is somewhat complex, but seems to complete on average in well under a second. However, every so often (less than 1 in
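The idea, sketched: cap the outliers so they fail fast instead of pinning a backend, and have the application catch the error and retry or degrade gracefully. The five-second value is illustrative:

    SET statement_timeout = 5000;   -- milliseconds; 0 disables the limit
    -- run the big replicated SELECT here; if it exceeds 5 seconds it is
    -- cancelled with ERROR: canceling statement due to statement timeout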

Re: [PERFORM] Most effective tuning choices for busy website?

2005-06-14 Thread Mark Stosberg
Neil Conway wrote: > Mark Stosberg wrote: >> I've used PQA to analyze my queries and am happy overall with how they are >> running. About 55% of the query time is going to variations of the pet >> searching query, which seems like where it should be going. The query is

[PERFORM] Most effective tuning choices for busy website?

2005-06-05 Thread Mark Stosberg
Hello, I'm the fellow who was interviewed in the fall about using PostgreSQL on 1-800-Save-A-Pet.com: http://techdocs.postgresql.org/techdocs/interview-stosberg.php The site traffic continues to grow, and we are now seeing parts of the day where the CPU load (according to MRTG graphs) on the data

[PERFORM] seeking consultant for high performance, complex searching with Postgres web app

2004-04-20 Thread Mark Stosberg
[EMAIL PROTECTED] for more information. Thanks! Mark