Hi folks,
I have a system that racks up about 40M log lines per day. I'm able to COPY
the log files into a PostgreSQL table that looks like this:
CREATE TABLE activity_unlogged
(
  user_id character(24) NOT NULL,
  client_ip inet,
  hr_timestamp timestamp without time zone,
  locale character varying
);
Am I wrong in that assumption?
On Thu, Mar 1, 2012 at 10:40 AM, Peter van Hardenberg p...@pvh.ca wrote:
On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi
alessan...@path.com wrote:
Now, I want to reduce that data to get the last activity that was
performed
by each user in any given hour. It should fit
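A sketch of that reduction using DISTINCT ON (the target table name activity
and the date_trunc('hour', ...) bucketing are assumptions, not from the thread):

INSERT INTO activity
SELECT DISTINCT ON (user_id, date_trunc('hour', hr_timestamp))
       user_id, client_ip, hr_timestamp, locale
FROM activity_unlogged
ORDER BY user_id, date_trunc('hour', hr_timestamp), hr_timestamp DESC;

Note that when two rows share the exact same hr_timestamp, DISTINCT ON keeps
an arbitrary one of them, which is relevant to the tie problem discussed below.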
On Thu, Mar 1, 2012 at 10:51 AM, Kevin Grittner kevin.gritt...@wicourts.gov
wrote:
Alessandro Gagliardi alessan...@path.com wrote:
hr_timestamp timestamp without time zone,
In addition to the responses which more directly answer your question, I
feel I should point out that timestamp without time zone is usually the
wrong choice here; timestamp with time zone avoids ambiguity about which
zone an entry was recorded in.
Hah! Yeah, that might work. Except that I suck at grep. :(
Perhaps that's a weakness I should remedy.
On Thu, Mar 1, 2012 at 10:35 AM, Craig James cja...@emolecules.com wrote:
On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi
alessan...@path.com wrote:
Hi folks,
I have a system
Interesting solution. If I'm not mistaken, this does solve the problem of
having two entries for the same user at the exact same time (which violates
my PK constraint) but it does so by leaving both of them out (since there
is no au1.hr_timestamp > au2.hr_timestamp in that case). Is that right?
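For comparison, a NOT EXISTS formulation (my sketch, not necessarily the
exact query under discussion) keeps both tied rows instead of dropping them:

SELECT au1.*
FROM activity_unlogged au1
WHERE NOT EXISTS (
  SELECT 1 FROM activity_unlogged au2
  WHERE au2.user_id = au1.user_id
    AND date_trunc('hour', au2.hr_timestamp) = date_trunc('hour', au1.hr_timestamp)
    AND au2.hr_timestamp > au1.hr_timestamp
);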
Ah, yes, that makes sense. Thank you!
On Thu, Mar 1, 2012 at 11:39 AM, Claudio Freire klaussfre...@gmail.com wrote:
On Thu, Mar 1, 2012 at 4:35 PM, Alessandro Gagliardi
alessan...@path.com wrote:
Interesting solution. If I'm not mistaken, this does solve the problem of
having two entries
I have a database where I virtually never delete and almost never do
updates. (That might change in the future, but for now it's okay to assume
updates never happen.) As such, it seems like it might be worth it to
set autovacuum=off, or at least make it so vacuuming hardly ever occurs.
Actually,
I should have been more clear. I virtually never delete or do updates, but
I insert *a lot*. So the table does change quite a bit, but only in one
direction.
I was unable to disable autovacuum universally (due to the
cant_change_runtime_param error) but I was able to disable it on individual
tables. Are there any other settings I should look at for a database that
is being used in this way? Any cache sizes I should be messing with? Etc.
Thank you,
-Alessandro
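For reference, the per-table toggle mentioned above uses the standard
storage-parameter syntax (table name illustrative):

ALTER TABLE seen_its SET (autovacuum_enabled = false);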
On Thu, Feb 23, 2012 at 9:45 AM, Thom Brown t...@linux.com wrote:
On 23 February 2012 17:35, Alessandro Gagliardi alessan...@path.com
wrote:
I should have been more clear. I virtually
On Thu, Feb 23, 2012 at 10:01 AM, Steve Crawford
scrawf...@pinpointresearch.com wrote:
The documentation has information like "This parameter can only be set in
the postgresql.conf file or on the server command line." that will tell
you in advance which settings will fail when you attempt to change them at
runtime.
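You can also check this programmatically: the pg_settings system view has a
context column that says where each setting may be changed. For example:

SELECT name, context FROM pg_settings WHERE name = 'autovacuum';
-- context = 'sighup' means it cannot be changed from a client session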
On Thu, Feb 23, 2012, Peter van Hardenberg p...@pvh.ca wrote:
On Thu, Feb 23, 2012 at 10:38 AM, Alessandro Gagliardi
alessan...@path.com wrote:
around the same time as disabling auto-vacuum, so that could account for
the
coincidental speed up). I'm not sure what else I could be doing wrong.
It's
definitely
On Thu, Feb 23, 2012 at 11:07 AM, Andy Colson a...@squeakycode.net wrote:
That depends on whether you have triggers that are doing selects. But in
general you are correct, analyze won't help inserts.
I do have some, actually. I have a couple of trigger functions like:
CREATE OR REPLACE FUNCTION
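The definition is cut off above; as a sketch of the kind of trigger that
matters here (a BEFORE INSERT trigger whose body runs a SELECT; all names
and logic hypothetical):

CREATE OR REPLACE FUNCTION ignore_unknown_users()
RETURNS trigger AS $$
BEGIN
  -- this lookup is the kind of SELECT that makes planner statistics
  -- (and thus ANALYZE) relevant even for a pure-insert workload
  IF NOT EXISTS (SELECT 1 FROM users WHERE user_id = NEW.user_id) THEN
    RETURN NULL;  -- returning NULL from a BEFORE trigger skips the row
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;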
On Thu, Feb 23, 2012 at 11:26 AM, Steve Crawford
scrawf...@pinpointresearch.com wrote:
You need to rethink things a bit. Databases can fail in all sorts of ways
and can slow down during bursts of activity, data dumps, etc. You may need
to investigate some form of intermediate buffering.
On Thu, Feb 23, 2012 at 1:37 PM, Steve Crawford
scrawf...@pinpointresearch.com wrote:
It's possible that you might get a nice boost by wrapping the inserts into
a transaction:
begin;
insert into...;
insert into...;
insert into...;
...
commit;
This only requires all that disk-intensive commit work to happen once for
the whole batch instead of once per insert.
On Thu, Feb 23, 2012 at 1:11 PM, Peter van Hardenberg p...@heroku.com wrote:
My hunch is still that your issue is lock contention.
How would I check that? I tried looking at pg_locks but I don't know what
to look for.
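A starting point is to look for ungranted locks (a sketch; pg_locks is a
standard system view):

SELECT locktype, relation::regclass AS relation, mode, pid
FROM pg_locks
WHERE NOT granted;

If inserts are waiting on one another, rows will show up here while the
slowdown is happening.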
We have many customers who do much more than this throughput, though
I'm
On 2/20/12 2:06 PM, Alessandro Gagliardi wrote:
But first I just want to know if people
think that this might be a viable solution or if I'm barking up the wrong
tree.
Batching is usually helpful for inserts, especially if there's a unique
key on a very large table involved.
True. I implemented the SAVEPOINTs solution across the board. We'll see
what kind of difference it makes. If it's fast enough, I may be able to do
without that.
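For reference, the SAVEPOINT pattern in question looks roughly like this
(a sketch assuming a unique key on (user_id, moment_id); values are
placeholders):

BEGIN;
SAVEPOINT sp;
INSERT INTO seen_its VALUES ('user_a', 'moment_1', now());
RELEASE SAVEPOINT sp;
SAVEPOINT sp;
INSERT INTO seen_its VALUES ('user_a', 'moment_1', now());  -- duplicate fails
ROLLBACK TO SAVEPOINT sp;  -- undoes only the failed insert; the batch survives
COMMIT;

In practice the ROLLBACK TO SAVEPOINT would be issued by the client code
whenever an insert raises a unique-violation error.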
On Tue, Feb 21, 2012 at 3:53 PM, Samuel Gendler
sgend...@ideasculptor.com wrote:
On Tue, Feb 21, 2012 at 9:59 AM, Alessandro Gagliardi wrote:
Ah, that did make a big difference! It went from taking 10x as long to
taking only 1.5x as long (about what I would have expected, if not
better). Thank you!
On Fri, Feb 17, 2012 at 9:29 PM, Ants Aasma ants.aa...@eesti.ee wrote:
On Feb 17, 2012 8:35 PM, Alessandro Gagliardi alessan...@path.com wrote:
be a viable solution or if I'm barking up the wrong
tree.
Thanks!
-Alessandro
On Fri, Feb 17, 2012 at 10:34 AM, Alessandro Gagliardi
alessan...@path.com wrote:
CREATE TABLE seen_its (
  user_id character(24) NOT NULL,
  moment_id character(24) NOT NULL,
  created timestamp without time zone
);
Comparing
SELECT DISTINCT(user_id) FROM blocks JOIN seen_its USING (user_id) WHERE
seen_its.created BETWEEN (now()::date - interval '8 days')::timestamp AND
now()::date::timestamp
to
SELECT DISTINCT(user_id) FROM seen_its WHERE created BETWEEN (now()::date -
interval '8 days')::timestamp AND now()::date::timestamp
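Both forms do a range scan on seen_its.created, so a plain btree index
there is the obvious supporting index (my suggestion, not from the thread;
whether it beats a seq scan depends on how much of the table eight days
covers):

CREATE INDEX seen_its_created_idx ON seen_its (created);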
at 10:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Alessandro Gagliardi alessan...@path.com writes:
Still slow as mud: http://explain.depesz.com/s/Zfn
Now I've got indices on created, timezone, created at time zone timezone,
and (created at time zone timezone)::date. Clearly the problem isn't a
lack of indexes.
Here's my query:
SELECT foursquare.name, foursquare.city, COUNT(moment_id) AS popularity
FROM foursq_categories
JOIN foursquare USING (foursq_id)
JOIN places USING (foursq_id)
JOIN blocks USING (block_id)
WHERE "primary"
AND (created at time zone timezone)::date = 'yesterday'
AND (country =
CREATE INDEX blocks_private_idx ON blocks USING btree (private);
CREATE INDEX blocks_shared_idx ON blocks USING btree (shared);
CREATE INDEX blocks_timezone_idx ON blocks USING btree (timezone);
On Thu, Feb 9, 2012 at 11:46 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Alessandro Gagliardi
(or blocks_created_at_timezone_idx). How do I make that happen?
On Thu, Feb 9, 2012 at 12:15 PM, Kevin Grittner kevin.gritt...@wicourts.gov
wrote:
Alessandro Gagliardi alessan...@path.com wrote:
(Actually, I originally did try one on (created at time zone
timezone)::date but couldn't figure out how to phrase it.)
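For the record, the phrasing that usually trips people up is the extra
parentheses an expression index needs (a sketch; it also requires created
to be timestamptz, so that the cast to date is immutable):

CREATE INDEX blocks_local_date_idx
  ON blocks (((created AT TIME ZONE timezone)::date));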
On Thu, Feb 2, 2012 at 6:52 AM, Merlin Moncure mmonc...@gmail.com wrote:
also, is effective_cache_size set to a reasonable value?
Yeah, it's 153kB
a memory limit and paging out.
Is that right?
On Tue, Jan 31, 2012 at 3:43 PM, Alessandro Gagliardi
alessan...@path.com wrote:
I just got a pointer on presenting EXPLAIN ANALYZE in a more human
friendly fashion (thanks, Agent M!): http://explain.depesz.com/s/A9S
From this it looks like the bottleneck is the Seq Scan.
I've already lowered random_page_cost to 2. Maybe I should lower it to 1.5?
Actually 60K should be plenty for my purposes anyway.
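For what it's worth, random_page_cost can be tried per session and then
persisted per database without touching postgresql.conf, which matters on
Heroku (database name illustrative):

SET random_page_cost = 1.5;                      -- this session only
ALTER DATABASE mydb SET random_page_cost = 1.5;  -- default for new sessions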
On Wed, Feb 1, 2012 at 10:35 AM, Scott Marlowe scott.marl...@gmail.com wrote:
On Wed, Feb 1, 2012 at 11:19 AM, Alessandro Gagliardi
alessan...@path.com wrote:
On Wed, Feb 1, 2012 at 11:04 AM, Bob Lunney bob_lun...@yahoo.com wrote:
Possibly. What does
psql show work_mem;
say?
100MB
cool. But that's definitely secondary compared to getting the
above query to run faster.
Thank you very much for any help!
-Alessandro Gagliardi
I should have realized that as I exploited that limitation in three of my
tables. Gradually adding those indices now; will report on what kind of
difference it makes.
On Tue, Jan 31, 2012 at 1:22 PM, Alessandro Gagliardi
alessan...@path.com wrote:
My slow query today is somewhat more complex
...
Total runtime: 73511.072 ms
Please let me know if there is any way to make this more efficient.
Thank you,
-Alessandro
On Tue, Jan 31, 2012 at 2:10 PM, Alessandro Gagliardi
alessan...@path.com wrote:
Looks like I missed a key sentence
in http://www.postgresql.org/docs/9.0/static/ddl
That was a bit faster, but still very slow. Here's the EXPLAIN:
http://explain.depesz.com/s/ZdF
On Tue, Jan 31, 2012 at 2:53 PM, Alessandro Gagliardi
alessan...@path.com wrote:
I changed the query a bit so the results would not change over the
course of the day to:
SELECT relname, emotion
using pgAdmin3, though I doubt that's relevant.
As for *GUC Settings*: Again, I don't know what this is. Whatever Heroku
defaults to is what I'm using.
Thank you in advance!
-Alessandro Gagliardi
On Mon, Jan 30, 2012 at 4:13 PM, Alessandro Gagliardi
alessan...@path.com wrote:
So, here's the query:
SELECT private, COUNT(block_id) FROM blocks
WHERE created > 'yesterday' AND shared IS FALSE
GROUP BY private
What confuses me is that though this is a largish table (millions of
rows
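One idiomatic option for a query shaped like the one above is a partial
index matching the fixed predicate (my suggestion, not something proposed
in the thread):

CREATE INDEX blocks_created_unshared_idx
  ON blocks (created)
  WHERE shared IS FALSE;

That keeps the index small and lets the range condition on created use it
directly.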
klaussfre...@gmail.com wrote:
On Mon, Jan 30, 2012 at 5:35 PM, Alessandro Gagliardi
alessan...@path.com wrote:
To answer your (non-)question about Heroku, it's a cloud service, so I
don't
host PostgreSQL myself. I'm not sure how much I can mess with things like
GUC settings since I don't even have access to the configuration files.
On Mon, Jan 30, 2012 at 1:25 PM, Josh Berkus j...@agliodbs.com wrote:
You can do SHOW random_page_cost yourself right now, too.
4
I also tried SHOW seq_page_cost and that's 1.
Looking at
http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-RANDOM-PAGE-COST
I wonder if I should try reducing random_page_cost?
that enable_seqscan is determining whether or not the data is
being cached
On Mon, Jan 30, 2012 at 1:13 PM, Fernando Hevia fhe...@gmail.com wrote:
On Mon, Jan 30, 2012 at 17:35, Alessandro Gagliardi
alessan...@path.comwrote:
Well that was a *lot* faster:
HashAggregate (cost=156301.82..156301.83 rows=2
, Alessandro Gagliardi
alessan...@path.com wrote:
Looking at
http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-RANDOM-PAGE-COST
I wonder if I should try reducing random_page_cost?
Yes, try lowering it. Generally speaking, random_page_cost should
always be >= seq_page_cost.
Got it (with a little bit of klutzing around). :) Thanks!
On Mon, Jan 30, 2012 at 2:24 PM, Scott Marlowe scott.marl...@gmail.com wrote:
On Mon, Jan 30, 2012 at 3:19 PM, Scott Marlowe scott.marl...@gmail.com
wrote:
On Mon, Jan 30, 2012 at 2:55 PM, Alessandro Gagliardi
alessan...@path.com