Re: MySQL index hints
On Thu, Aug 26, 2010 at 12:09 AM, Michael Manfrewrote: > > On Aug 24, 9:30 pm, Russell Keith-Magee > wrote: >> I don't follow your logic here. > > I guess I'm more of the mindset of query hints having as much to do > with the stored data as the schema and most hints would probably be a > premature optimization or specific to a failing with a specific > database version. I do acknowledge that there are situations where > being able to ship hint corrections for known issues would be useful, > but I can't escape from the thought that those situations would attach > an expectation about which database backend should, or should probably > not be used with the app. A reasonable objection. The problem is that a query is contained in code; if I (as an end-user) discover that a particular query needs to be hinted, I don't have an entry point. Putting a with_hint() keyword into the query provides that entry point, but then means that we need to be able to account for the case where the same app is deployed under MySQL and Postgres. A namespace is the best way I can think to acheive this. Of course, I'm open to any other suggestion for how we can separate the location where the hint is required from the actual hint definition. >> And that's the proposal on the table, except that you would need to >> specify that this is a MSSQL hint, since the hint you're making isn't >> of any use to a user of any database other than MSSQL. > > I'm assuming the current proposal will allow for arbitrary namespaces > that are defined simply by a backend willing to look for the specific > name. If not, there would need to be discussion about which namespaces > are defined by the core and how unsupported backends would register > new namespaces. Correct. The current proposal makes the namespace the last part of the backend's import path (e.g., mysql for the django.db.backends.mysql backend). This does have the potential for collisions if you write a custom mysql backend in a different location. However, there isn't a huge population of backends, so the practical likelihood of collision is relatively small. Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Aug 24, 9:30 pm, Russell Keith-Mageewrote: > I don't follow your logic here. I guess I'm more of the mindset of query hints having as much to do with the stored data as the schema and most hints would probably be a premature optimization or specific to a failing with a specific database version. I do acknowledge that there are situations where being able to ship hint corrections for known issues would be useful, but I can't escape from the thought that those situations would attach an expectation about which database backend should, or should probably not be used with the app. > And that's the proposal on the table, except that you would need to > specify that this is a MSSQL hint, since the hint you're making isn't > of any use to a user of any database other than MSSQL. I'm assuming the current proposal will allow for arbitrary namespaces that are defined simply by a backend willing to look for the specific name. If not, there would need to be discussion about which namespaces are defined by the core and how unsupported backends would register new namespaces. Regards, Michael Manfre -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Wed, Aug 25, 2010 at 5:11 AM, Michael Manfrewrote: > I don't think namespacing is a valid objection against query hints. By > design, query hints are specific to a database version and if a > generic django app were to include them, then the author is doing > something wrong or setting the expectation of only supporting a very > specific environment. I don't follow your logic here. If I'm writing a reusable app, I define some models, and I define the views that use those models. Those views issue queries. Therefore, I know both the tables that are involved, and the queries that will be issued on those tables. If I'm really tuning things, I might even be using custom SQL, shipped with my project, to create custom indices for those tables. Since I'm in control of the table and how the table is used, I should also be in a position to provide any query plan hints that will assist the known usage patterns. If I'm distributing my application to many people, different people in my audience will be using different databases. That puts me in a position of (potentially) having to provide hints for multiple databases as part of my shipped application. Therefore, there is a namespace problem. I need MySQL hints to be applied for users that have MySQL. I need Oracle hints to be applied for users that have Oracle, ignoring the MySQL hints. I need Postgres to ignore any hint that is applied. On top of that, there's a possibility that a hint name (and value format) for one database might clash with another. I don't know of any specific examples of overlap, but it isn't hard to manufacture one using generic hinting keywords like "index = foo". > Any core implementation for supporting this > should be minimalist and only provide a mechanism for passing some > kind of data from the QuerySet to the SQLCompiler that each backend > can optionally choose to do something with it. > > I would be thrilled to be able to do use > > with_hints('HASH GROUP', 'FAST 10') > > and have it set > > Query.query_hints = ['HASH GROUP', 'FAST 10'] > > I'm currently patching in support so I can do something like this for > django-mssql. And that's the proposal on the table, except that you would need to specify that this is a MSSQL hint, since the hint you're making isn't of any use to a user of any database other than MSSQL. Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
I don't think namespacing is a valid objection against query hints. By design, query hints are specific to a database version and if a generic django app were to include them, then the author is doing something wrong or setting the expectation of only supporting a very specific environment. Any core implementation for supporting this should be minimalist and only provide a mechanism for passing some kind of data from the QuerySet to the SQLCompiler that each backend can optionally choose to do something with it. I would be thrilled to be able to do use with_hints('HASH GROUP', 'FAST 10') and have it set Query.query_hints = ['HASH GROUP', 'FAST 10'] I'm currently patching in support so I can do something like this for django-mssql. Regards, Michael Manfre On Jul 22, 12:03 am, Russell Keith-Mageewrote: > On Fri, Jul 16, 2010 at 10:43 AM, Simon Litchfield > wrote: > > Russ -- > > > Firstly, re namespacing. No worries, let's just keep it RDBMS- > > specific, ie -- > > > MySQL - with_hints(use={}, force={}, ignore={}) > > Oracle - with_hints(index={}, ...) > > > That gives plenty of name space to breathe in. > > True, I don't think it's a really nice API. Why is "index" a command > for Oracle? What happens if some other backend accepts something with > a "index"? Personally, I'd rather see this be an explicit namespace: > > with_hints(mysql={...}, oracle={...}) > > That guarantees that there isn't any cross-database clashes with hint > names, only with database backend names. A backend can choose look at > the full list of hints and determine which hint set it should use; > potentially, it could even look at another backend's hints and act on > them (although I don't know why this would be helpful, except perhaps > in the case of a custom MySQL backend that extends the base > capabilities). > > > > > Next -- > > >> Using the filter/exclude name sounds interesting, but skips the first > >> step of putting an index hint on the origin table. For example, > >> Book.objects.filter(author__name='foo') -- now put an index hint on > >> Book; now put one on Author. If Author and Book are an a m2m relation > >> -- put a hint on the m2m table. > > > How about -- > > > {'book': 'book_idx', 'author': 'author_idx', 'author__name': > > 'm2m_idx'} > > > Or, if you want to cover absolutely every possibility -- > > > {'book': 'book_idx', 'author__name__author': 'author_idx', > > 'author__name': 'm2m_idx'} > > > That is, require indexes on the right side of joins to be named > > explicitly, just in case Author is used in another join. > > > Sounds complicated, but obviously edge case. Common usage will be > > simple, eg {'book':'book_idx'}. > > > Let me know what you think. > > What about when the query constructs multiple joins on a related > table? How does the hint get named and applied then? Is it a > consistent "a join between A and B will always get hint X"? Or do > different joins need different hints? > > Yours, > Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Thu, Jul 22, 2010 at 12:59 PM, Santiago Perezwrote: >> I'd be happy to use raw(); but then you lose len(), slicing, >> pagination, filter chaining, etc. My problem came about because admin >> change lists were unusably slow on large tables. With_hints allowed a >> simple monkey patch to admin, dropping 2s per page down to ~0.01s. > > How about a raw_hints() or something that expects you to insert > additional SQL code that won't affect the result so it can still > return a QuerySet that supports len(), slicing, etc...? > > -- > You received this message because you are subscribed to the Google Groups > "Django developers" group. > To post to this group, send email to django-develop...@googlegroups.com. > To unsubscribe from this group, send email to > django-developers+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-developers?hl=en. > > You mean extra() ;) Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." -- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
> I'd be happy to use raw(); but then you lose len(), slicing, > pagination, filter chaining, etc. My problem came about because admin > change lists were unusably slow on large tables. With_hints allowed a > simple monkey patch to admin, dropping 2s per page down to ~0.01s. How about a raw_hints() or something that expects you to insert additional SQL code that won't affect the result so it can still return a QuerySet that supports len(), slicing, etc...? -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Fri, Jul 16, 2010 at 10:43 AM, Simon Litchfieldwrote: > Russ -- > > Firstly, re namespacing. No worries, let's just keep it RDBMS- > specific, ie -- > > MySQL - with_hints(use={}, force={}, ignore={}) > Oracle - with_hints(index={}, ...) > > That gives plenty of name space to breathe in. True, I don't think it's a really nice API. Why is "index" a command for Oracle? What happens if some other backend accepts something with a "index"? Personally, I'd rather see this be an explicit namespace: with_hints(mysql={...}, oracle={...}) That guarantees that there isn't any cross-database clashes with hint names, only with database backend names. A backend can choose look at the full list of hints and determine which hint set it should use; potentially, it could even look at another backend's hints and act on them (although I don't know why this would be helpful, except perhaps in the case of a custom MySQL backend that extends the base capabilities). > Next -- > >> Using the filter/exclude name sounds interesting, but skips the first >> step of putting an index hint on the origin table. For example, >> Book.objects.filter(author__name='foo') -- now put an index hint on >> Book; now put one on Author. If Author and Book are an a m2m relation >> -- put a hint on the m2m table. > > How about -- > > {'book': 'book_idx', 'author': 'author_idx', 'author__name': > 'm2m_idx'} > > Or, if you want to cover absolutely every possibility -- > > {'book': 'book_idx', 'author__name__author': 'author_idx', > 'author__name': 'm2m_idx'} > > That is, require indexes on the right side of joins to be named > explicitly, just in case Author is used in another join. > > Sounds complicated, but obviously edge case. Common usage will be > simple, eg {'book':'book_idx'}. > > Let me know what you think. What about when the query constructs multiple joins on a related table? How does the hint get named and applied then? Is it a consistent "a join between A and B will always get hint X"? Or do different joins need different hints? Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
Simon -- Slicing result sets is clearly the developer's responsibility IMHO. Even with hard limits, MySQL still fails miserably in many situations; hence the proposal to support hinting from the ORM. Russ -- Firstly, re namespacing. No worries, let's just keep it RDBMS- specific, ie -- MySQL - with_hints(use={}, force={}, ignore={}) Oracle - with_hints(index={}, ...) That gives plenty of name space to breathe in. Next -- > Using the filter/exclude name sounds interesting, but skips the first > step of putting an index hint on the origin table. For example, > Book.objects.filter(author__name='foo') -- now put an index hint on > Book; now put one on Author. If Author and Book are an a m2m relation > -- put a hint on the m2m table. How about -- {'book': 'book_idx', 'author': 'author_idx', 'author__name': 'm2m_idx'} Or, if you want to cover absolutely every possibility -- {'book': 'book_idx', 'author__name__author': 'author_idx', 'author__name': 'm2m_idx'} That is, require indexes on the right side of joins to be named explicitly, just in case Author is used in another join. Sounds complicated, but obviously edge case. Common usage will be simple, eg {'book':'book_idx'}. Let me know what you think. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Thu, Jul 8, 2010 at 3:51 PM, Simon Riggswrote: > On Mon, 2010-07-05 at 00:59 -0700, Simon Litchfield wrote: >> > If you can come up with answers to these points, I might get >> > interested. 1 and 2 are fairly trivial; I can think of some obvious >> > answers for 3, but 4 is the big problem, and will require some >> serious >> > research and consideration. >> >> Well, I'm glad you like the with_hints() approach. Items 1-3 are easy. >> Re 4 though, every db does it differently. In practice, most don't >> need hints like MySQL does , because their query optimisers do a >> much better job. > > The big problem I see is that hints are simply the wrong approach to > handling this issue, which I do see as an important one. > > The SQL optimizer can't work out how you're going to handle the queryset > if all you mention is the filter(). SQL being a set-based language the > optimizer won't know whether you wish to retrieve 0, 1 or 1 million > rows. In many cases it actively avoids using the index for what it > thinks will be larger retrievals. > That's categorically untrue. One of the major functions of an optimizer is too try to figure out the approximate result size so it can better establish index vs. data cost. > The number of rows required from the queryset is an important part of > defining the access path. Hints are only required because we didn't > specify the queryset in sufficient detail for the optimizer to > understand what we wanted. > Uhh, querysets don't really generate SQL that's in any way different from what a normal person would write. > I would say the correct approach here is to use slicing, since this will > add a LIMIT clause and provide full usage information to the SQL > optimizer about the queryset. > > Can I suggest an auto-slice parameter so that we can add LIMIT N onto > the bottom of every query by default, if a slice is not already defined. > That would work for all DBMS. > And auto_slice parameter to what exactly? > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Training and Services > > -- > You received this message because you are subscribed to the Google Groups > "Django developers" group. > To post to this group, send email to django-develop...@googlegroups.com. > To unsubscribe from this group, send email to > django-developers+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-developers?hl=en. > > Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire "The people's good is the highest law." -- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Mon, 2010-07-05 at 00:59 -0700, Simon Litchfield wrote: > > If you can come up with answers to these points, I might get > > interested. 1 and 2 are fairly trivial; I can think of some obvious > > answers for 3, but 4 is the big problem, and will require some > serious > > research and consideration. > > Well, I'm glad you like the with_hints() approach. Items 1-3 are easy. > Re 4 though, every db does it differently. In practice, most don't > need hints like MySQL does , because their query optimisers do a > much better job. The big problem I see is that hints are simply the wrong approach to handling this issue, which I do see as an important one. The SQL optimizer can't work out how you're going to handle the queryset if all you mention is the filter(). SQL being a set-based language the optimizer won't know whether you wish to retrieve 0, 1 or 1 million rows. In many cases it actively avoids using the index for what it thinks will be larger retrievals. The number of rows required from the queryset is an important part of defining the access path. Hints are only required because we didn't specify the queryset in sufficient detail for the optimizer to understand what we wanted. I would say the correct approach here is to use slicing, since this will add a LIMIT clause and provide full usage information to the SQL optimizer about the queryset. Can I suggest an auto-slice parameter so that we can add LIMIT N onto the bottom of every query by default, if a slice is not already defined. That would work for all DBMS. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Mon, Jul 5, 2010 at 11:19 PM, Simon Litchfieldwrote: >> I would like to know how you're validating your assertion that MySQL >> is the most used backend. It doesn't match my experience or >> observation. > > Nobody knows for sure. I'd put my money on it though. > >> The fact that this is a MySQL-specific issue is perhaps the biggest >> impediment to my enthusiasm about this specific feature (as proposed). >> I've spent enough time in the past covering for the inadequacies of >> MySQL. I don't particularly relish the thought of adding more time to >> that particular ledger. :-) > > I know, I've seen you pulling your hair out :-) > >> Having an implementation ignore kwargs is the obvious approach to >> providing this feature cross platform. The issue for me is what >> exactly the kwargs should be, and how they would be interpreted. It >> isn't clear to be how 'index="my_index"' maps to the application of >> USE INDEX, IGNORE INDEX or FORCE INDEX in a query, or how the >> dictionary syntax would map to the situation where you have two >> appearances of a given table in a query. > > Great, so lets define the syntax then and away we go. > > 1) I'd say index= would map to MySQL's USE INDEX and Oracle's index(). > The others you mention there would be MySQL-specific kwargs. Or, more > simply, ignore and force could be '-index' and '+index'. Bit like we > use + and - for order_by. The problem with sharing a common keyword is that an index hint specified for Oracle won't be useful (or even legal, necessarily) for MySQL. If I'm going to ship my reusable app, and I know that one particular query will perform badly under MySQL, it would be good to be able to ship with whatever hinting strategy will help. However, that same hint may not be appropriate for other backends, even if those backends do support hinting. Effectively, there's a namespace issue here; we need to be able to specify "MySQL" hints independently of "Oracle" hints. As a further complication -- as of Django 1.2, database backends are a fully qualified path, so "mysql" isn't necessarily sufficient to identify a backend (although I'm slightly more comfortable putting this sort of collision in the category of "acceptable risk" if there's no obvious way to avoid it). As for the "+/-" syntax; I'm not wild about it. I'm not an Oracle expert, but from a quick inspection it appears that there are dozens of optimizations that could potentially be applied, and none of them fall into the "USE/FORCE/IGNORE" triptych (many of them don't even apply to indexes directly). Even on MySQL -- optimizations like "USE INDEX x FOR ORDER BY" won't be covered by a simple syntax. I'm not expecting that you will provide a patch that will implement all possible hints, but I don't want to adopt a syntax that will cut off future options. I want to have some confidence that it would be *possible* to implement Oracle hinting or MySQL FOR ORDER BY hinting if they were so inclined. > 2) The multiple table issue is edge case but worth supporting if it > doesn't complicate the syntax too much. Rather than use models as keys > we could use the filter/exclude name; eg {'relatedmodel__field': > 'index'} would apply 'index' specifically to the 'relatedmodel__field' > join. Using the filter/exclude name sounds interesting, but skips the first step of putting an index hint on the origin table. For example, Book.objects.filter(author__name='foo') -- now put an index hint on Book; now put one on Author. If Author and Book are an a m2m relation -- put a hint on the m2m table. I'd need to see a more complete set of worked examples to see how this would pan out. Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Wed, Jul 7, 2010 at 8:07 AM, Kevin Howertonwrote: >> For a very high traffic project backed by 20+ DB servers > Slightly OT, but MySQL performance related... > > If you are write heavy, there's another issue that I imagine would > bring significant gains on a MyISAM setup of that proportion... and > possibly other databases to a lesser degree (the locking mechanisms > with MyISAM are different) Yes - this is a well known limitation of MyISAM tables. This is a situation where you need to pick the right tool for the job; if you're on a write heavy site, MyISAM probably isn't going to be a good choice. > Basically, right now the ORM will UPDATE or INSERT based off of a > SELECT query it does when you hit save(). So even if the ORM should > already know it's an UPDATE it will run this logic and these queries. > > There's "force_update=True" which I imagine would skip the SELECT, but > I was looking into it because the logic here has some other > side-effects as well. > > One of the side effects is that if you change a PK, and try to save it > you end up with two rows. I guess the PK probably shouldn't be > mutable, or we can fix that logic and possible see some gains in > speed. If you've got any ideas on how to optimize the object saving process, we're happy to entertain them -- but keep in mind Django's backwards compatibility policy. Creating new objects by manually setting the primary key is a well established idiom. We can't eliminate this behavior without breaking backwards compatibility. Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
> For a very high traffic project backed by 20+ DB servers Slightly OT, but MySQL performance related... If you are write heavy, there's another issue that I imagine would bring significant gains on a MyISAM setup of that proportion... and possibly other databases to a lesser degree (the locking mechanisms with MyISAM are different). Basically, right now the ORM will UPDATE or INSERT based off of a SELECT query it does when you hit save(). So even if the ORM should already know it's an UPDATE it will run this logic and these queries. There's "force_update=True" which I imagine would skip the SELECT, but I was looking into it because the logic here has some other side-effects as well. One of the side effects is that if you change a PK, and try to save it you end up with two rows. I guess the PK probably shouldn't be mutable, or we can fix that logic and possible see some gains in speed. I have some tests somewhere to try and illustrate how MyISAM locks, and there might be a couple open tickets relating to side effects... if there's any interest... kind of waiting until things settle with the refactor, as I have some ideas I want to try out. -k -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
> I would like to know how you're validating your assertion that MySQL > is the most used backend. It doesn't match my experience or > observation. Nobody knows for sure. I'd put my money on it though. > The fact that this is a MySQL-specific issue is perhaps the biggest > impediment to my enthusiasm about this specific feature (as proposed). > I've spent enough time in the past covering for the inadequacies of > MySQL. I don't particularly relish the thought of adding more time to > that particular ledger. :-) I know, I've seen you pulling your hair out :-) > Having an implementation ignore kwargs is the obvious approach to > providing this feature cross platform. The issue for me is what > exactly the kwargs should be, and how they would be interpreted. It > isn't clear to be how 'index="my_index"' maps to the application of > USE INDEX, IGNORE INDEX or FORCE INDEX in a query, or how the > dictionary syntax would map to the situation where you have two > appearances of a given table in a query. Great, so lets define the syntax then and away we go. 1) I'd say index= would map to MySQL's USE INDEX and Oracle's index(). The others you mention there would be MySQL-specific kwargs. Or, more simply, ignore and force could be '-index' and '+index'. Bit like we use + and - for order_by. 2) The multiple table issue is edge case but worth supporting if it doesn't complicate the syntax too much. Rather than use models as keys we could use the filter/exclude name; eg {'relatedmodel__field': 'index'} would apply 'index' specifically to the 'relatedmodel__field' join. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Mon, Jul 5, 2010 at 3:59 PM, Simon Litchfieldwrote: >> If you can come up with answers to these points, I might get >> interested. 1 and 2 are fairly trivial; I can think of some obvious >> answers for 3, but 4 is the big problem, and will require some serious >> research and consideration. > > Well, I'm glad you like the with_hints() approach. Items 1-3 are easy. > Re 4 though, every db does it differently. In practice, most don't > need hints like MySQL does , because their query optimisers do a > much better job. > > I'd be happy to use raw(); but then you lose len(), slicing, > pagination, filter chaining, etc. My problem came about because admin > change lists were unusably slow on large tables. With_hints allowed a > simple monkey patch to admin, dropping 2s per page down to ~0.01s. > > MySQL is still the most used backend AFAIK, and it's the one that > really needs the hints (poor old thing it is). I would like to know how you're validating your assertion that MySQL is the most used backend. It doesn't match my experience or observation. The fact that this is a MySQL-specific issue is perhaps the biggest impediment to my enthusiasm about this specific feature (as proposed). I've spent enough time in the past covering for the inadequacies of MySQL. I don't particularly relish the thought of adding more time to that particular ledger. :-) > Adding with_hints() > will only serve to encourage support from other backends too, where > possible. > > Maybe we can make the with_hints() syntax more generic with both > common and backend-specific kwargs -- > > .with_hints(index='my_index') (string implies index on queryset base > model) > .with_hints(index={Model:'my_index', RelatedModel:'index2'}) > (dictionary allows defining index on a per-model basis) > > So the Oracle backend, for example, could implement -- > > .with_hints(hash=True, index={Model:'my_index', > RelatedModel:'index2'}) ( > > On Oracle there are dozens of possible hints, so I'd say unsupported > kwargs could simply be ignored. This would ensure seamless database > portability. Having an implementation ignore kwargs is the obvious approach to providing this feature cross platform. The issue for me is what exactly the kwargs should be, and how they would be interpreted. It isn't clear to be how 'index="my_index"' maps to the application of USE INDEX, IGNORE INDEX or FORCE INDEX in a query, or how the dictionary syntax would map to the situation where you have two appearances of a given table in a query. Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Mon, Jul 5, 2010 at 3:08 PM, hinnackwrote: > Thats interesting. > Can you explain, how the search keyword made it into the source? > Entry.objects.filter(headline__search="+Django -jazz Python") > SELECT ... WHERE MATCH(tablename, headline) AGAINST (+Django -jazz > Python IN BOOLEAN MODE); > Seems to be very MySQL specific... Yes; As implemented, __search is MySQL specific, and the API is covered by our backwards compatibility policy, so we will continue to support it. However, that doesn't mean it should be used as an example or excuse for other database-specific features in the ORM. The only explanation I can give for this particular feature is age. The __search keyword was added over 4 years ago (Revision r3073, ticket #573). Django was still in a period of rapid growth at that time. Some of the features that were added at that time might not survive if they were proposed today (either because of hindsight, or because of changes in the perceived role of the ORM over time). If a fulltext search operator were proposed today, it might still be accepted -- after all, a query of text data using a rich full-text search syntax is something that transcends SQL and could be considered a primitive operation on object data. However, before I would accept such a proposal, I would need to be convinced that the proposed syntax would be supported on other backends. Postgres 8.3, for instance, has fulltext search indexing capabilities. I would also need to be convinced that the proposed syntax was rich enough to potentially cover the range of features provided by MySQL. As it stands, we only support queries "IN BOOLEAN MODE", not "IN NATURAL LANGUAGE MODE" or "WITH QUERY EXPANSION". Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Mon, 2010-07-05 at 00:59 -0700, Simon Litchfield wrote: > I'd be happy to use raw(); but then you lose len(), slicing, > pagination, filter chaining, etc. My problem came about because admin > change lists were unusably slow on large tables. With_hints allowed a > simple monkey patch to admin, dropping 2s per page down to ~0.01s. So the underlying problem is: how to rewrite a query (whether generated by my code, someone else's code, or a mixture) so that it performs better on my database. There will always be third party code which generates queries that are really bad for performance, depending on your project or your database. This also applies to tickets like http://code.djangoproject.com/ticket/11604 This makes me think that what we need is a mechanism to do an equivalent to your monkey patching that is easier and more general. Some kind of "replace the auto-generated SQL with this SQL" method could be really useful. Could this be solved by writing your own cursor wrapper which checks the SQL against a list and rewrites as necessary? (You could set it up quickly like this: http://chris-lamb.co.uk/2010/06/01/appending-request-url-sql-statements-django/ That still involves monkey patching, but it is possible to do the same thing by writing a database backend). The major drawback is fragility - doing a replacement at this level could easily break, for example if you added a field to a model. Adequate testing could catch this if your cursor wrapper had a debug mode that could report what was replaced. If we tried to do the replacement at a higher level, then you run into the problem of how to recognise a query and replace it, which could be a much more expensive operation if we are trying to compare objects of type django.db.models.sql.query.Query, for example, and you also might have the problem of not having full access to SQL statements. However, if this method does work, if would be good to document how to do it, and provide some better builtin hooks if necessary. Luke -- "We may not return the affection of those who like us, but we always respect their good judgement." -- Libbie Fudim Luke Plant || http://lukeplant.me.uk/ -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Jul 5, 5:08 pm, hinnackwrote: > Thats interesting. > Can you explain, how the search keyword made it into the source? > Entry.objects.filter(headline__search="+Django -jazz Python") > SELECT ... WHERE MATCH(tablename, headline) AGAINST (+Django -jazz > Python IN BOOLEAN MODE); > Seems to be very MySQL specific... Yes Henrik, it's true- this is a classic example of a *very useful* MySQL-specific feature already in the ORM. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
> If you can come up with answers to these points, I might get > interested. 1 and 2 are fairly trivial; I can think of some obvious > answers for 3, but 4 is the big problem, and will require some serious > research and consideration. Well, I'm glad you like the with_hints() approach. Items 1-3 are easy. Re 4 though, every db does it differently. In practice, most don't need hints like MySQL does , because their query optimisers do a much better job. I'd be happy to use raw(); but then you lose len(), slicing, pagination, filter chaining, etc. My problem came about because admin change lists were unusably slow on large tables. With_hints allowed a simple monkey patch to admin, dropping 2s per page down to ~0.01s. MySQL is still the most used backend AFAIK, and it's the one that really needs the hints (poor old thing it is). Adding with_hints() will only serve to encourage support from other backends too, where possible. Maybe we can make the with_hints() syntax more generic with both common and backend-specific kwargs -- .with_hints(index='my_index') (string implies index on queryset base model) .with_hints(index={Model:'my_index', RelatedModel:'index2'}) (dictionary allows defining index on a per-model basis) So the Oracle backend, for example, could implement -- .with_hints(hash=True, index={Model:'my_index', RelatedModel:'index2'})( On Oracle there are dozens of possible hints, so I'd say unsupported kwargs could simply be ignored. This would ensure seamless database portability. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
Thats interesting. Can you explain, how the search keyword made it into the source? Entry.objects.filter(headline__search="+Django -jazz Python") SELECT ... WHERE MATCH(tablename, headline) AGAINST (+Django -jazz Python IN BOOLEAN MODE); Seems to be very MySQL specific... regards Henrik On 4 Jul., 17:01, Russell Keith-Mageewrote: > On Sun, Jul 4, 2010 at 1:10 PM, Simon Litchfield wrote: > > For those of us using MySQL, having large tables, whilst also wanting > > queryset goodness -- > >http://code.djangoproject.com/ticket/11003 > > > It goes a little something like this -- > > Model.objects.filter(field=value).with_hints('my_index') > > > All those in favour say I. > > Nay. :-) > > Or, at least, not until you demonstrate how the same syntax is > interpreted in Postgres, SQLite, Oracle, any other SQL database, and > potentially on MongoDB, Cassandra, BigTable, or other noSQL datastore. > > This was the biggest problem when I closed #11003 originally; it > remains the biggest problem. I'm not in favor of adding an ORM > extension that is only useful for MySQL's query optimizer. That > defeats the purpose of having a cross-database ORM syntax. If you are > at the point of optimizing your system and you discover that you have > a dataset and query that actually require database-specific fine > tuning, I don't see it as an unreasonable requirement that you crank > out some database-specific raw SQL. > > Django's ORM isn't intended to be perfect for all situations. It's > intended to make the 80% case trivial so you can bootstrap a site > without having to worry about the little details. Once you've hit that > point and you're tuning performance, you're going to need to pay close > attention to the little details like exactly how your site uses SQL. > Attempting to find a way to express every possible SQL statement, > including backend-specific optimizations, into an ORM syntax is folly. > After all, we already have a syntax that is able to express every > possible SQL statement... it's called SQL :-) > > That said -- I'm not fundamentally opposed to the idea of introducing > hinting tools to the ORM, as long as: > * they are demonstrably rich enough to get the job done, > * have an interpretation that is compatible with the cross-platform, > reusable-app message that underpins Django's design philosophy. > > To that end, the 'with_hint()' proposal appeals to me a lot more than > the previous suggestion of Yet Another Extra Argument. However, the > patch as provided: > > 1) Doesn't contain tests or docs > 2) Will break queries if the user includes the with_hint() clause on > any database other than MySQL > 3) Even then, only supports the USE INDEX keyword, not FORCE INDEX or > IGNORE INDEX > 4) Doesn't suggest how hinting strategies would be implemented on a > database other than MySQL > > If you can come up with answers to these points, I might get > interested. 1 and 2 are fairly trivial; I can think of some obvious > answers for 3, but 4 is the big problem, and will require some serious > research and consideration. > > I would also add that if this is a query extension that you really do > want, it's something that should be possible to add by writing a > custom database backend that extends Django's MySQL backend and > provides a custom Query base class -- and if this can't be done > relatively easily, then a refactoring of the sql.Query or SQLCompiler > interface to allow easier query rewriting by a backend *is* something > I could get behind very easily. > > Yours, > Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: MySQL index hints
On Sun, Jul 4, 2010 at 1:10 PM, Simon Litchfieldwrote: > For those of us using MySQL, having large tables, whilst also wanting > queryset goodness -- > http://code.djangoproject.com/ticket/11003 > > It goes a little something like this -- > Model.objects.filter(field=value).with_hints('my_index') > > All those in favour say I. Nay. :-) Or, at least, not until you demonstrate how the same syntax is interpreted in Postgres, SQLite, Oracle, any other SQL database, and potentially on MongoDB, Cassandra, BigTable, or other noSQL datastore. This was the biggest problem when I closed #11003 originally; it remains the biggest problem. I'm not in favor of adding an ORM extension that is only useful for MySQL's query optimizer. That defeats the purpose of having a cross-database ORM syntax. If you are at the point of optimizing your system and you discover that you have a dataset and query that actually require database-specific fine tuning, I don't see it as an unreasonable requirement that you crank out some database-specific raw SQL. Django's ORM isn't intended to be perfect for all situations. It's intended to make the 80% case trivial so you can bootstrap a site without having to worry about the little details. Once you've hit that point and you're tuning performance, you're going to need to pay close attention to the little details like exactly how your site uses SQL. Attempting to find a way to express every possible SQL statement, including backend-specific optimizations, into an ORM syntax is folly. After all, we already have a syntax that is able to express every possible SQL statement... it's called SQL :-) That said -- I'm not fundamentally opposed to the idea of introducing hinting tools to the ORM, as long as: * they are demonstrably rich enough to get the job done, * have an interpretation that is compatible with the cross-platform, reusable-app message that underpins Django's design philosophy. To that end, the 'with_hint()' proposal appeals to me a lot more than the previous suggestion of Yet Another Extra Argument. However, the patch as provided: 1) Doesn't contain tests or docs 2) Will break queries if the user includes the with_hint() clause on any database other than MySQL 3) Even then, only supports the USE INDEX keyword, not FORCE INDEX or IGNORE INDEX 4) Doesn't suggest how hinting strategies would be implemented on a database other than MySQL If you can come up with answers to these points, I might get interested. 1 and 2 are fairly trivial; I can think of some obvious answers for 3, but 4 is the big problem, and will require some serious research and consideration. I would also add that if this is a query extension that you really do want, it's something that should be possible to add by writing a custom database backend that extends Django's MySQL backend and provides a custom Query base class -- and if this can't be done relatively easily, then a refactoring of the sql.Query or SQLCompiler interface to allow easier query rewriting by a backend *is* something I could get behind very easily. Yours, Russ Magee %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
MySQL index hints
For those of us using MySQL, having large tables, whilst also wanting queryset goodness -- http://code.djangoproject.com/ticket/11003 It goes a little something like this -- Model.objects.filter(field=value).with_hints('my_index') All those in favour say I. Simon -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-develop...@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.