Matthew Schumacher wrote:
> John A Meinel wrote:
>
>
>>Surely this isn't what you have. You have *no* loop here, and you have
>>stuff like:
>> AND
>>(bayes_token_tmp) NOT IN (SELECT token FROM bayes_token);
>>
>>I'm guessing this isn't your last version of the function.
>>
>>As far as putting
John A Meinel wrote:
> Surely this isn't what you have. You have *no* loop here, and you have
> stuff like:
> AND
> (bayes_token_tmp) NOT IN (SELECT token FROM bayes_token);
>
> I'm guessing this isn't your last version of the function.
>
> As far as putting the CREATE TEMP TABLE inside th
Matthew Schumacher wrote:
> Matthew Schumacher wrote:
>
>>Tom Lane wrote:
>>
>>
>>
>>>I don't really see why you think that this path is going to lead to
>>>better performance than where you were before. Manipulation of the
>>>temp table is never going to be free, and IN (sub-select) is always
>>>
Matthew Schumacher wrote:
> Tom Lane wrote:
>
>
>>I don't really see why you think that this path is going to lead to
>>better performance than where you were before. Manipulation of the
>>temp table is never going to be free, and IN (sub-select) is always
>>inherently not fast, and NOT IN (sub-
Tom Lane wrote:
> I don't really see why you think that this path is going to lead to
> better performance than where you were before. Manipulation of the
> temp table is never going to be free, and IN (sub-select) is always
> inherently not fast, and NOT IN (sub-select) is always inherently
> awful.
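For reference, the standard workaround on planners of this era was to rewrite the NOT IN (sub-select) as an anti-join; a minimal sketch against the table names used in this thread (joining on token alone for brevity; the real table keys on id plus token):

    SELECT t.token
      FROM bayes_token_tmp t
      LEFT JOIN bayes_token b ON b.token = t.token
     WHERE b.token IS NULL;

    -- or, equivalently:
    SELECT t.token
      FROM bayes_token_tmp t
     WHERE NOT EXISTS (SELECT 1 FROM bayes_token b WHERE b.token = t.token);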
John A Meinel <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I don't really see why you think that this path is going to lead to
>> better performance than where you were before.
> So for an IN (sub-select), does it actually pull all of the rows from
> the other table, or is the planner smart enough
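One way to answer that empirically is to EXPLAIN the statement; a sketch using this thread's table names (plan wording varies by version):

    EXPLAIN
    SELECT t.token
      FROM bayes_token_tmp t
     WHERE t.token IN (SELECT token FROM bayes_token);
    -- A hash-based plan node (e.g. "Hash IN Join" or a "hashed subplan"
    -- filter) means the sub-select is loaded once into memory rather than
    -- re-scanned for every outer row.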
John A Meinel wrote:
> Matthew Schumacher wrote:
>
> I recommend that you drop and re-create the temp table. There is no
> reason to have it around, considering you delete and re-add everything.
> That means you never have to vacuum it, since it always only contains
> the latest rows.
Whenever I
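A sketch of the drop-and-re-create approach John recommends; ON COMMIT DROP (available in PostgreSQL of this vintage) makes the cleanup automatic, so the table never accumulates dead rows:

    BEGIN;
    CREATE TEMP TABLE bayes_token_tmp (token bytea) ON COMMIT DROP;
    -- ... load this message's tokens, run the updates/inserts ...
    COMMIT;  -- the temp table disappears here, so it never needs a VACUUM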
Tom Lane wrote:
> Matthew Schumacher <[EMAIL PROTECTED]> writes:
>
>> for i in array_lower(intokenary, 1) .. array_upper(intokenary, 1)
>> LOOP
>>   _token := intokenary[i];
>>   INSERT INTO bayes_token_tmp VALUES (_token);
>> END LOOP;
>
>
>> UPDATE
>>   bayes_token
>> SET
>>   spam_count
Matthew Schumacher <[EMAIL PROTECTED]> writes:
> for i in array_lower(intokenary, 1) .. array_upper(intokenary, 1)
> LOOP
>   _token := intokenary[i];
>   INSERT INTO bayes_token_tmp VALUES (_token);
> END LOOP;
> UPDATE
>   bayes_token
> SET
>   spam_count = greatest_int(spam_co
Matthew Schumacher wrote:
> Okay,
>
> Here is the status of the SA updates and a question:
>
> Michael got SA changed to pass an array of tokens to the proc so right
> there we gained a ton of performance due to connections and transactions
> being grouped into one per email instead of one per token.
What I really want to do is have the token array available as a record
so that I can query against it, but not have it take up the resources of
a real table. If I could copy from an array into a record then I can
even get rid of the loop. Anyone have any thoughts on how to do this?
You could
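One loop-free possibility, assuming the generate_series() function that arrived in PostgreSQL 8.0: expand the array by generating its subscripts as a set, e.g.

    INSERT INTO bayes_token_tmp (token)
    SELECT intokenary[i]
      FROM generate_series(array_lower(intokenary, 1),
                           array_upper(intokenary, 1)) AS i;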
Okay,
Here is the status of the SA updates and a question:
Michael got SA changed to pass an array of tokens to the proc so right
there we gained a ton of performance due to connections and transactions
being grouped into one per email instead of one per token.
Now I am working on making the pro
Jim C. Nasby wrote:
>I'm not sure who's responsible for DBI::Pg (Josh?), but would it make
>sense to add better support for bytea to DBI::Pg? ISTM there should be a
>better way of doing this than adding gobs of \'s.
>
>
It has support for binding a bytea parameter, but in this case we're
trying
On Mon, Aug 01, 2005 at 01:28:24PM -0800, Matthew Schumacher wrote:
> PFC wrote:
> >
> >
> >> select put_tokens2(1, '{"\\246\\323\\061\\332\\277"}', 1, 1, 1);
> >
> >
> > Try adding more backslashes until it works (seems that you need \\\\
> > or something).
> > Doesn't DBI convert t
PFC wrote:
>
>
>> select put_tokens2(1, '{"\\246\\323\\061\\332\\277"}', 1, 1, 1);
>
>
> Try adding more backslashes until it works (seems that you need \\\\
> or something).
> Doesn't DBI convert the language types to postgres quoted forms on its
> own?
>
You're right, I am find
select put_tokens2(1, '{"\\246\\323\\061\\332\\277"}', 1, 1, 1);
Try adding more backslashes until it works (seems that you need \\\\ or
something).
Doesn't DBI convert the language types to postgres quoted forms on its
own?
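The escaping here is layered, which is why the two-backslash form above keeps failing. Assuming 8.0's default escape-string rules, each parser strips one level: the string-literal scanner turns \\\\ into \\, the array-element parser turns \\ into \, and the bytea input routine then reads \246 as a single octet. So a working call would need four backslashes per byte; a sketch, not the exact fix settled on in the thread:

    select put_tokens2(1, '{"\\\\246\\\\323\\\\061\\\\332\\\\277"}', 1, 1, 1);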
Tom Lane wrote:
>
> Revised insertion procedure:
>
>
> CREATE or replace FUNCTION put_tokens (_id INTEGER,
>                                        _tokens BYTEA[],
>                                        _spam_count INTEGER,
>                                        _ham_count INTEGER,
>                                        _atime INTEGER)
Tom Lane wrote:
> Michael Parker <[EMAIL PROTECTED]> writes:
>
>>sub bytea_esc {
>>  my ($str) = @_;
>>  my $buf = "";
>>  foreach my $char (split(//,$str)) {
>>    if (ord($char) == 0) { $buf .= "\\\\000"; }
>>    elsif (ord($char) == 39) { $buf .= "\\\\047"; }
>>    elsif (ord($char) == 92) { $b
Michael Parker <[EMAIL PROTECTED]> writes:
> sub bytea_esc {
>   my ($str) = @_;
>   my $buf = "";
>   foreach my $char (split(//,$str)) {
>     if (ord($char) == 0) { $buf .= "\\\\000"; }
>     elsif (ord($char) == 39) { $buf .= "\\\\047"; }
>     elsif (ord($char) == 92) { $buf .= "\\\\134"; }
>
Michael Parker <[EMAIL PROTECTED]> writes:
> The next hurdle, and I've just posted to the DBD::Pg list, is
> escaping/quoting the token strings.
If you're trying to write a bytea[] literal, I think the most reliable
way to write the individual bytes is
\\\\nnn
where nnn is *octal*. The id
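A quick way to sanity-check the octal form for a plain bytea (one escaping layer fewer than the array case), again assuming escape-string rules:

    SELECT length('\\246\\323\\061\\332\\277'::bytea);  -- 5 octets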
Hi All,
As a SpamAssassin developer, who by my own admission has real problem
getting PostgreSQL to work well, I must thank everyone for their
feedback on this issue. Believe me when I say what is in the tree now
is a far cry from what used to be there, orders of magnitude faster
for sure. I thi
Jim C. Nasby wrote:
> On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
> > Ok, here is the current plan.
> > Change the spamassassin API to pass a hash of tokens into the storage
> > module, pass the tokens to the proc as an array, start a transaction,
> > load the tokens into a temp table us
On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
> Ok, here is the current plan.
>
> Change the spamassassin API to pass a hash of tokens into the storage
> module, pass the tokens to the proc as an array, start a transaction,
> load the tokens into a temp table using copy, sele
Ok, here is the current plan.
Change the spamassassin API to pass a hash of tokens into the storage
module, pass the tokens to the proc as an array, start a transaction,
load the tokens into a temp table using copy, select the tokens distinct
into the token table for new tokens, update the token t
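A condensed sketch of that plan in SQL, with placeholder parameters (_id, _spam_count, _ham_count, _atime) standing in for the procedure's arguments; the COPY itself would be driven from the client side:

    CREATE TEMP TABLE bayes_token_tmp (token bytea) ON COMMIT DROP;
    COPY bayes_token_tmp FROM STDIN;   -- bulk-load this message's tokens

    -- insert only the genuinely new tokens
    INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)
    SELECT DISTINCT _id, t.token, 0, 0, _atime
      FROM bayes_token_tmp t
     WHERE NOT EXISTS (SELECT 1 FROM bayes_token b
                        WHERE b.id = _id AND b.token = t.token);

    -- then bump the counters for every token in the batch
    UPDATE bayes_token
       SET spam_count = bayes_token.spam_count + _spam_count,
           ham_count  = bayes_token.ham_count  + _ham_count,
           atime      = _atime
      FROM bayes_token_tmp t
     WHERE bayes_token.id = _id AND bayes_token.token = t.token;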
John Arbash Meinel wrote:
>Matthew Schumacher wrote:
>
>
>
>>All it's doing is trying the update before the insert to get around the
>>problem of not knowing which is needed. With only 2-3 of the queries
>>implemented I'm already back to running about the same speed as the
>>original SA proc th
Matthew Schumacher wrote:
>Tom Lane wrote:
>
>
>
>>I looked into this a bit. It seems that the problem when you wrap the
>>entire insertion series into one transaction is associated with the fact
>>that the test does so many successive updates of the single row in
>>bayes_vars. (VACUUM VERBOSE
Matthew Schumacher wrote:
>All it's doing is trying the update before the insert to get around the
>problem of not knowing which is needed. With only 2-3 of the queries
>implemented I'm already back to running about the same speed as the
>original SA proc that is going to ship with SA 3.1.0.
>
>A
Tom Lane wrote:
> I looked into this a bit. It seems that the problem when you wrap the
> entire insertion series into one transaction is associated with the fact
> that the test does so many successive updates of the single row in
> bayes_vars. (VACUUM VERBOSE at the end of the test shows it cl
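The churn Tom describes is visible directly: under MVCC every UPDATE of that one bayes_vars row leaves a dead tuple behind, e.g.

    VACUUM VERBOSE bayes_vars;
    -- the reported dead row versions grow with each per-token UPDATE,
    -- one reason batching the work per message pays off so well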
Matthew Schumacher <[EMAIL PROTECTED]> writes:
> One thing that is still very puzzling to me is why this runs so much
> slower when I put the data.sql in a transaction. Obviously transactions
> are acting different when you call a proc a zillion times vs an insert
> query.
I looked into this a bit.
Ok, here is where I'm at, I reduced the proc down to this:
CREATE FUNCTION update_token (_id INTEGER,
                              _token BYTEA,
                              _spam_count INTEGER,
                              _ham_count INTEGER,
                              _atime INTEGER)
RETU
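The preview cuts off there; purely as a sketch of the shape such a reduced function takes (not Matthew's actual body), the update-then-insert idiom discussed in this thread would look like:

    CREATE OR REPLACE FUNCTION update_token (_id INTEGER,
                                             _token BYTEA,
                                             _spam_count INTEGER,
                                             _ham_count INTEGER,
                                             _atime INTEGER)
    RETURNS VOID AS '
    BEGIN
        -- try the update first; FOUND tells us whether it hit a row
        UPDATE bayes_token
           SET spam_count = spam_count + _spam_count,
               ham_count  = ham_count  + _ham_count,
               atime      = _atime
         WHERE id = _id AND token = _token;
        IF NOT FOUND THEN
            INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)
            VALUES (_id, _token, _spam_count, _ham_count, _atime);
        END IF;
        RETURN;
    END;
    ' LANGUAGE plpgsql;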
Andrew McMillan <[EMAIL PROTECTED]> writes:
> On Fri, 2005-07-29 at 09:37 -0800, Matthew Schumacher wrote:
>> How often should this table be vacuumed, every 5 minutes?
> I would be tempted to vacuum after each e-mail, in this case.
Perhaps the bulk of the transient states should be done in a temp
Andrew McMillan wrote:
>
> For the data in question (i.e. bayes scoring) it would seem that not
> much would be lost if you did have to restore your data from a day old
> backup, so perhaps fsync=false is OK for this particular application.
>
> Regards,
> And
On Fri, 2005-07-29 at 09:37 -0800, Matthew Schumacher wrote:
> >
> > On my laptop this takes:
> >
> > real    1m33.758s
> > user    0m4.285s
> > sys     0m1.181s
> >
> > One interesting effect is the data in bayes_vars has a huge number of
> > updates and needs vacuum _frequently_. After the ru
> Also, this test goes a bit faster with sync turned off, if mysql isn't
> using sync that would be why it's so much faster. Anyone know what the
> default for mysql is?
For InnoDB I think it's like Postgres (only slower); for MyISAM it's no
fsync, no transactions, no crash tolerance of any kind.
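For an apples-to-apples comparison it helps to pin down what each server is actually doing. On the PostgreSQL side fsync is a server-wide setting (postgresql.conf), not something a session can flip:

    SHOW fsync;  -- check what the benchmark actually ran with
    -- postgresql.conf:
    --   fsync = off    -- faster, but risks data loss on a crash, the
    --                  -- trade-off discussed above for the bayes data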
Josh Berkus wrote:
>Dennis,
>
>
>
>> EXCEPTION WHEN unique_violation THEN
>>
>>
>
>I seem to remember that catching an exception in a PL/pgSQL procedure was a
>large performance cost. It'd be better to do UPDATE ... IF NOT FOUND.
>
>
>
Actually, he was doing an implicit UPDATE IF NOT FOUND.
Andrew McMillan wrote:
> On Thu, 2005-07-28 at 16:13 -0800, Matthew Schumacher wrote:
>
>>Ok, I finally got some test data together so that others can test
>>without installing SA.
>>
>>The schema and test dataset is over at
>>http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz
>>
>>I hav
Dennis,
> EXCEPTION WHEN unique_violation THEN
I seem to remember that catching an exception in a PL/pgSQL procedure was a
large performance cost. It'd be better to do UPDATE ... IF NOT FOUND.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
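For contrast, the exception-based form being cautioned against looks roughly like this inside a PL/pgSQL function; each BEGIN ... EXCEPTION block sets up a subtransaction, which is the cost Josh refers to (names assumed from the thread's schema):

    BEGIN
        INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)
        VALUES (_id, _token, _spam_count, _ham_count, _atime);
    EXCEPTION WHEN unique_violation THEN
        UPDATE bayes_token
           SET spam_count = spam_count + _spam_count
         WHERE id = _id AND token = _token;
    END;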
On Wed, 27 Jul 2005, Matthew Schumacher wrote:
> Then they do this to insert the token:
>
> INSERT INTO bayes_token (
>   id,
>   token,
>   spam_count,
>   ham_count,
>   atime
> ) VALUES (
>   ?,
>   ?,
>   ?,
>   ?,
>   ?
> ) ON DUPLICATE KEY
>   UPDATE
>     spam_count = GREATEST(spam_count +
On Thu, 28 Jul 2005, Matthew Schumacher wrote:
> Gavin Sherry wrote:
>
> >
> > I had a look at your data -- thanks.
> >
> > I have a question though: put_token() is invoked 120596 times in your
> > benchmark... for 616 messages. That's nearly 200 queries (not even
> > counting the 1-8 (??) inside
On Thu, 2005-07-28 at 16:13 -0800, Matthew Schumacher wrote:
>
> Ok, I finally got some test data together so that others can test
> without installing SA.
>
> The schema and test dataset is over at
> http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz
>
> I have a pretty fast machine w
Gavin Sherry wrote:
>
> I had a look at your data -- thanks.
>
> I have a question though: put_token() is invoked 120596 times in your
> benchmark... for 616 messages. That's nearly 200 queries (not even
> counting the 1-8 (??) inside the function itself) per message. Something
> doesn't seem right.
On Thu, 28 Jul 2005, Matthew Schumacher wrote:
> Karim Nassar wrote:
> > On Wed, 2005-07-27 at 14:35 -0800, Matthew Schumacher wrote:
> >
> >
> >>I put the rest of the schema up at
> >>http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone
> >>needs to see it too.
> >
> >
> > Do you h
Karim Nassar wrote:
> On Wed, 2005-07-27 at 14:35 -0800, Matthew Schumacher wrote:
>
>
>>I put the rest of the schema up at
>>http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone
>>needs to see it too.
>
>
> Do you have sample data too?
>
Ok, I finally got some test data together
> I'm not sure how much this has been discussed on the list, but wasn't
> able to find anything relevant in the archives.
>
> The new Spamassassin is due out pretty soon. They are currently testing
> 3.1.0pre4. One of the things I hope to get out of this release is bayes
> word stats moved to a
On Wed, 2005-07-27 at 14:35 -0800, Matthew Schumacher wrote:
> I put the rest of the schema up at
> http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone
> needs to see it too.
Do you have sample data too?
--
Karim Nassar
Collaborative Computing Lab of NAU
Office: (928) 523 5868 -
Matthew Schumacher <[EMAIL PROTECTED]> writes:
> After playing with various indexes and what not I simply am unable to
> make this procedure perform any better. Perhaps someone on the list can
> spot the bottleneck and reveal why this procedure isn't performing that
> well or ways to make it bette
Josh Berkus wrote:
> Matt,
>
> Well, it might be because we don't have a built-in GREATEST or LEAST prior to
> 8.1. However, it's pretty darned easy to construct one.
I was more talking about min() and max() but yea, I think you knew where
I was going with it...
>
> Well, there's the genera
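Constructing the missing pair for pre-8.1 really is short; a minimal sketch, named to match the greatest_int() already used elsewhere in this thread:

    CREATE FUNCTION greatest_int (integer, integer) RETURNS integer
        LANGUAGE sql IMMUTABLE AS
        'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;';

    CREATE FUNCTION least_int (integer, integer) RETURNS integer
        LANGUAGE sql IMMUTABLE AS
        'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;';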
Matt,
> UPDATE bayes_vars SET
>   $token_count_update
>   newest_token_age = GREATEST(newest_token_age, ?),
>   oldest_token_age = LEAST(oldest_token_age, ?)
> WHERE id = ?
>
>
> I think the reason why the procedure was written for postgres was
> because of the greatest and least statements perf
Josh Berkus wrote:
> Matt,
>
>
>>After playing with various indexes and what not I simply am unable to
>>make this procedure perform any better. Perhaps someone on the list can
>>spot the bottleneck and reveal why this procedure isn't performing that
>>well or ways to make it better.
>
>
> Wel
Matt,
> After playing with various indexes and what not I simply am unable to
> make this procedure perform any better. Perhaps someone on the list can
> spot the bottleneck and reveal why this procedure isn't performing that
> well or ways to make it better.
Well, my first thought is that this