On a side note, the literal_column change worked fine performance-wise,
allowing me to remove the literal_binds/compile step.
I guess I am 50/50 on which is cleaner. Is either one more advantageous
from a SQL injection perspective?
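For reference, a minimal sketch of the two shapes being compared, using a
hypothetical single-column table and current SQLAlchemy syntax:

from sqlalchemy import literal_column, select
from sqlalchemy.sql import table, column

t = table("t", column("college"))
v = "888"

# Variant 1: inline the value as literal SQL text (no bind parameter);
# this is only as injection-safe as the quoting applied to v.
print(select(t).where(t.c.college == literal_column("'%s'" % v)))

# Variant 2: keep bind parameters in the statement, then render them
# inline at compile time; the dialect handles the literal escaping.
stmt = select(t).where(t.c.college == v)
print(stmt.compile(compile_kwargs={"literal_binds": True}))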
SQL Injection is not a primary concern at the moment for this proc
Tried doing various conversions on the pk values as they enter the
statement:
1. to bytes
2. to ascii
3. to latin1 (technically the same encoding as the extract source before
entering the db)
None of these yielded a performance improvement for the non-compiled
version.
I have read that this
So inspecting the elements of the tuple, they are both str, hence unicode.
Are you saying that if I convert those values to bytes it could improve
performance?
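For concreteness, the sort of conversion being discussed might look like
this (pk_tuple is a hypothetical name):

# str -> bytes, using latin-1 to match the extract source encoding
# mentioned above
pk_tuple = ("888", "1234567")
pk_bytes = tuple(v.encode("latin-1") for v in pk_tuple)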
> I'd not bother with the literal_binds and just use a literal value:
>
> pkf = [(col == literal_column("'%s'" % v)) for (col, v) in zip(cols, x)]
On Thu, Aug 31, 2017 at 2:19 PM, Ken MacKenzie wrote:
> So here is my solution as of now. As far as I can tell it hits the marks of:
>
> 1. DB Neutral
> 2. Performance (9k records deleted in 5 seconds)
> 3. Generic
>
> (excuse the print statements still there for testing)
>
> Would welcome comments
So here is my solution as of now. As far as I can tell it hits the marks of:
1. DB Neutral
2. Performance (9k records deleted in 5 seconds)
3. Generic
(excuse the print statements still there for testing)
Would welcome comments on this proposed solution. I have learned a lot
from the advice in this thread.
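The solution code itself is truncated out of this digest. A rough
reconstruction from the pieces quoted in the thread (cls.SQL_PK, the
literal_column suggestion, and batching) might look like this; chunks() and
pk_tuples are assumed names, and this is not the author's verbatim code:

from sqlalchemy import and_, or_, literal_column, delete

def chunks(seq, size):
    # simple batching helper (assumed, not from the original post)
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

cols = [getattr(cls, colname) for colname in cls.SQL_PK]
table = cls.__table__
for batch in chunks(pk_tuples, 1000):
    or_cond = []
    for x in batch:
        pkf = [(col == literal_column("'%s'" % v)) for col, v in zip(cols, x)]
        or_cond.append(and_(*pkf))
    s.execute(delete(table).where(or_(*or_cond)))  # s as in the thread's session
s.commit()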
Ending output from batch size of 10:
501 - 0:24:35.558103
491 - 0:24:37.197976
481 - 0:24:38.909318
471 - 0:24:40.645801
461 - 0:24:41.954807
451 - 0:24:43.628955
441 - 0:24:45.246469
431 - 0:24:46.662228
421 - 0:24:48.192343
411 - 0:24:49.839395
401 - 0:24:51.513340
391 - 0:24:53.153668
381 - 0:2
I tried a batch size as small as ten, and each step took about 2 seconds. I set
it up to show me the time between each batch for that run. Lowering the batch
size actually seems to worsen performance, which is not what I expected.
Sent from my iPad
> On Aug 31, 2017, at 5:42 AM, Simon King wrote:
Yes, the double quote was a typo, sorry about that.
Your point about a non-varchar PK is why I will double-check other DBs. SQL
Server lets it fly, but I think you are right about Postgres.
Sent from my iPad
> On Aug 31, 2017, at 5:36 AM, Simon King wrote:
>
> The generic version would look something like this:
I'm pretty sure the bad performance is due to pyodbc (or maybe SQL
Server) not handling *huge* numbers of bind parameters gracefully. You
are generating a query with (batch_size * n_primary_key_columns) bind
parameters, which even in your original version meant 2000 bind
parameters.
Try with a batch size of 10 and see whether that changes things.
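Concretely, with the numbers from this thread (the two-column composite PK
is an assumption based on the college/check example further down):

batch_size = 1000
n_primary_key_columns = 2   # assumed: the (college, check) pair
print(batch_size * n_primary_key_columns)   # 2000 bind parameters in one statement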
The generic version would look something like this:
table = cls.__table__
pkconditions = []
for pk in cls.SQL_PK:
    pkconditions.append(table.c[pk] == sa.bindparam(pk))
condition = sa.and_(*pkconditions)
statement = sa.delete(table, whereclause=condition)
batchparams = [dict(zip(cls.SQL_PK, v)) for v in id_batch]
session.execute(statement, batchparams)  # list of dicts -> executemany
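For what it's worth, a self-contained sketch of the same executemany-style
batched delete, runnable against an in-memory SQLite database (the table
and sample data here are made up for illustration):

import sqlalchemy as sa

engine = sa.create_engine("sqlite://", echo=True)
metadata = sa.MetaData()

# made-up table with a two-column composite primary key
t = sa.Table(
    "records", metadata,
    sa.Column("college", sa.String, primary_key=True),
    sa.Column("check", sa.String, primary_key=True),
)
metadata.create_all(engine)

SQL_PK = ("college", "check")
id_batch = [("888", "1234567"), ("888", "1234568")]
batchparams = [dict(zip(SQL_PK, v)) for v in id_batch]

condition = sa.and_(*[t.c[pk] == sa.bindparam(pk) for pk in SQL_PK])
statement = sa.delete(t).where(condition)

with engine.begin() as conn:
    conn.execute(sa.insert(t), batchparams)   # seed sample rows (executemany)
    conn.execute(statement, batchparams)      # delete them (executemany)

Passing a list of dicts to execute() is what triggers DBAPI executemany
here, so the whole batch goes over in a single call.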
When I get in tomorrow I will take a look at some of these options. I also
intend to spin up a quick MySQL or postgresql db to test if my textual solution
can work against an alternate target.
Sent from my iPad
> On Aug 30, 2017, at 6:20 PM, Jonathan Vanasco wrote:
>
> Looking at your code and considering the size of your database...
Looking at your code and considering the size of your database, I've run
into performance issues under PostgreSQL a few times that may be related...
they were usually because very slight text changes to a query can
drastically alter the server's query plan.
The cause/fixes were:
1. Index
When you call create_engine:
e = create_engine(url, echo=True)
include the "echo=True" part.
Then look at the SQL being emitted.
That's the important part here.
On Wed, Aug 30, 2017 at 5:09 PM, Ken MacKenzie wrote:
> So the big difference between the methods when seen with print(q)
>
> Textual SQL is building a statement with the values inline
So the big difference between the methods when seen with print(q):
Textual SQL is building a statement with the values inline,
such as where college=888 and check=1234567
The methods using the column object are building with parameters:
where college = (?) and check = (?)
That is all I can see different.
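A hedged illustration of those two shapes (hypothetical table, current
SQLAlchemy syntax; pyodbc displays the bound version with ? rather than
:college):

from sqlalchemy import select, text, bindparam
from sqlalchemy.sql import table, column

t = table("t", column("college"), column("check"))

# Textual SQL: the values are baked into the statement string itself.
print(select(t).where(text("college=888 and check=1234567")))

# Column objects: the values travel separately as bind parameters.
print(select(t).where(t.c.college == bindparam("college")))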
With a batch size of 1 based on the original selection, no improvement.
I am going to see what SQL this is creating.

2328928 function calls (2085124 primitive calls) in 1084.925 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
I am noticing the 16 execute calls. There should only be 10 with a batch
size of 1000 and a record count of 9,000 and some change.
Are you just wanting to see the SQL via a print(q)?
On Wednesday, August 30, 2017 at 2:01:31 PM UTC-4, Mike Bayer wrote:
>
> it looks like you have 16 SQL statements
I tried an alternate version:

@classmethod
def dict_delete2(cls, dict_in):
    """
    Delete records from the database
    based on a dictionary keyed by
    PK tuple.
    """
    s = dbc.get_session()
    keys_in = ["-".join(k) for k in dict_in.keys()]
it looks like you have 16 SQL statements that take a very different
amount of time depending on format. Can you please paste the output
with echo=True on create_engine() so that the difference in SQL
statements can be seen?
On Wed, Aug 30, 2017 at 12:30 PM, Ken MacKenzie wrote:
> Not sure how I
Not sure how I would iterate through a non-predetermined number of primary
key columns.
I guess part of me is wondering: although textual SQL is not inherently
DB-neutral, how different is the where field = 'value' syntax between the
DB targets?
On Wednesday, August 30, 2017 at 12:07:52 PM UTC-4,
You could also try using executemany:
http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
I think it would look something like this:
table = cls.__table__
condition = sa.and_(
    table.c.pk1 == sa.bindparam('pk1'),
    table.c.pk2 == sa.bindparam('pk2'),
)
statement = sa.delete(table, whereclause=condition)
Sorted Textual SQL profile:

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    16    1.840    0.115    1.840    0.115  {method 'execute' of 'pyodbc.Cursor' objects}
  9572    0.196    0.000    0.788    0.000  elements.py:1860(_construct)
 57700    0.189    0.000
Sorted:

       ncalls   tottime  percall   cumtime  percall  filename:lineno(function)
           16  1053.794   65.862  1053.794   65.862  {method 'execute' of 'pyodbc.Cursor' objects}
348916/348311     0.539    0.000     0.556    0.000  {built-in method isinstance}
160920/160918     0.313    0.000     0.313    0
After the current sorted profile finishes I will revert to the textual
version and run a profile on that. I expect this to take another 10-15
minutes.
At present the batch size is set to 1000, total record count is just over
9000 in these tests.
The reason for 1000 was at firs
After I finish the current profile run I will revert and profile that one
for comparison. I am expecting another 10-15 minutes for this to finish.
On Wednesday, August 30, 2017 at 11:18:13 AM UTC-4, Simon King wrote:
>
> It would be interesting to see the profile of the textual SQL version.
> I
It would be interesting to see the profile of the textual SQL version.
It looks like most of the time is being spent inside pyodbc, rather
than SQLAlchemy, so I guess it must be something to do with the
processing of bind parameters. How many parameters are being sent in
per query? i.e., what is len(
Rerunning with -s "time" for the %prun call, and with a full run this time,
as the previous one was partial.
In total the full run should delete 9000+ records.
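For context, the profiling call in IPython would look roughly like this
(MyModel is a stand-in for the actual class):

%prun -s time MyModel.dict_delete2(dict_in)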
So I implemented this version with one change. I moved this line:
cols = [getattr(cls, colname) for colname in cls.SQL_PK]
to the beginning, before the loop (actually before the outer loop that
parses the batch).
However, the new version is MUCH slower than the original, so I profiled it
and t
On Tue, Aug 29, 2017 at 9:49 PM, Ken MacKenzie wrote:
> I have a query I have constructed and I had to deal with a composite primary
> key to select items;
>
> q = s.query(cls)
> or_cond = []
> for x in id_batch:
>
> pkf = [text(f + "=='" + v + "'") for f, v in zip(cls.SQL_PK, x)]
I have a query I have constructed and I had to deal with a composite
primary key to select items:

q = s.query(cls)
or_cond = []
for x in id_batch:
    # (the "==" here is the typo acknowledged earlier in the digest)
    pkf = [text(f + "=='" + v + "'") for f, v in zip(cls.SQL_PK, x)]
    and_cond = (and_(*pkf))
    or_cond.append(and_cond)
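For comparison, the column-object variant discussed earlier in the digest
presumably looked something like this (reconstructed from the cols line
quoted above, not verbatim):

from sqlalchemy import and_, or_

cols = [getattr(cls, colname) for colname in cls.SQL_PK]
or_cond = []
for x in id_batch:
    pkf = [(col == v) for col, v in zip(cols, x)]
    or_cond.append(and_(*pkf))
q = s.query(cls).filter(or_(*or_cond))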