RE: [PHP] Paging and permissions
On Tue, 2011-02-08 at 14:36 +0200, Arno Kuhl wrote: I'm hoping some clever php gurus have been here before and are willing to share some ideas. I have a site where articles are assigned to categories in containers. An article can be assigned to only one category per container, but one or more containers. Access permissions can be set per article, per category and/or per container, for one or more users and/or user groups. If an article is assigned to 10 categories and only one of those has a permission denying access, then the article can't be accessed even if browsing through one of the other 9 categories. Currently everything works fine, with article titles showing when browsing through category or search result lists, and a message is displayed when the article is clicked if it cannot be viewed because of a permission. Now there's a requirement to not display the article title in category lists and search results if it cannot be viewed. I'm stuck with how to determine the number of results for paging at the start of the list or search. The site is quite large (20,000+ articles and growing) so reading the entire result set and sifting through it with permission rules for each request is not an option. But it might be an option if done once at the start of each search or list request, and then use that temporary modified result set for subsequent requests on the same set. I thought of saving the set to a temporary db table or file (not sure about overhead of serializing/unserializing large arrays). A sizing exercise based on the recordset returned for searches and lists shows a max of about 150MB for 20,000 articles and 380MB for 50,000 articles that needs to be saved temporarily per search or list request - in the vast majority of cases the set will be *much* smaller but it needs to cope with the worst case, and still do so a year down the line. All this extra work because I can't simply get an accurate number of results for paging, because of permissions! So my questions are: 1. Which is better (performance) for this situation: file or db? 2. How do I prepare a potentially very large data set for file or fast writing to a new table (ie I obviously don't want to write it record by record) 3. Are there any other alternatives worth looking at? TIA Cheers Arno How are you determining (logically, not in code) when an article is allowed to be read? Assume an article on user permissions in mysql is in a container called 'databases' and in a second one called 'security' and both containers are in a category called 'computers' Now get a user called John who is in a group called 'db admins' and that group gives him permissions to view all articles in the 'databases' container and any articles in any container in the 'computers' category. Now assume John also has explicit user permissions revoking that right to view the article in any container. What I'm getting at is what's the order of privilege for rights? Do group rights for categories win out over those for containers, or do individual user rights trump all of them overall? I think once that's figured out, a lot can be done inside the query itself to minimise the impact on the script getting the results. Thanks, Ash http://www.ashleysheridan.co.uk --- The simple structure is articles in categories, categories in containers, only one article per container/category, in one or more containers. If an article permission explicitly allows or denies access then the permission applies, otherwise the container/s and category/s permissions are checked. The permission checks user access first then group. A user can belong to multiple groups. There's no query to handle this that can return a neat recordset for paging. Currently the complete checks are only done for an article request. The category list only checks access to the category and the container it belongs to, so the list is either displayed in its entirety (including titles of articles that can't be viewed) or not at all, and obviously the paging works perfectly because the total number of titles is known up front and remains constant for subsequent requests. If I use read-ahead to make allowance for permissions and remove paging (just keep prev/next) the problem goes away. Or I could use best-guess paging, which could range from 100% accurate to 99% wrong. At first glance that's not really acceptable, but I noticed recently Google does the same thing with their search results. First prize is to work out a proper solution that is fast and accurate and works on fairly large results, and I'm still hoping for some
RE: [PHP] Paging and permissions
On Wed, 2011-02-09 at 13:03 +0200, Arno Kuhl wrote: On Tue, 2011-02-08 at 14:36 +0200, Arno Kuhl wrote: I'm hoping some clever php gurus have been here before and are willing to share some ideas. I have a site where articles are assigned to categories in containers. An article can be assigned to only one category per container, but one or more containers. Access permissions can be set per article, per category and/or per container, for one or more users and/or user groups. If an article is assigned to 10 categories and only one of those has a permission denying access, then the article can't be accessed even if browsing through one of the other 9 categories. Currently everything works fine, with article titles showing when browsing through category or search result lists, and a message is displayed when the article is clicked if it cannot be viewed because of a permission. Now there's a requirement to not display the article title in category lists and search results if it cannot be viewed. I'm stuck with how to determine the number of results for paging at the start of the list or search. The site is quite large (20,000+ articles and growing) so reading the entire result set and sifting through it with permission rules for each request is not an option. But it might be an option if done once at the start of each search or list request, and then use that temporary modified result set for subsequent requests on the same set. I thought of saving the set to a temporary db table or file (not sure about overhead of serializing/unserializing large arrays). A sizing exercise based on the recordset returned for searches and lists shows a max of about 150MB for 20,000 articles and 380MB for 50,000 articles that needs to be saved temporarily per search or list request - in the vast majority of cases the set will be *much* smaller but it needs to cope with the worst case, and still do so a year down the line. All this extra work because I can't simply get an accurate number of results for paging, because of permissions! So my questions are: 1. Which is better (performance) for this situation: file or db? 2. How do I prepare a potentially very large data set for file or fast writing to a new table (ie I obviously don't want to write it record by record) 3. Are there any other alternatives worth looking at? TIA Cheers Arno How are you determining (logically, not in code) when an article is allowed to be read? Assume an article on user permissions in mysql is in a container called 'databases' and in a second one called 'security' and both containers are in a category called 'computers' Now get a user called John who is in a group called 'db admins' and that group gives him permissions to view all articles in the 'databases' container and any articles in any container in the 'computers' category. Now assume John also has explicit user permissions revoking that right to view the article in any container. What I'm getting at is what's the order of privilege for rights? Do group rights for categories win out over those for containers, or do individual user rights trump all of them overall? I think once that's figured out, a lot can be done inside the query itself to minimise the impact on the script getting the results. Thanks, Ash http://www.ashleysheridan.co.uk --- The simple structure is articles in categories, categories in containers, only one article per container/category, in one or more containers. If an article permission explicitly allows or denies access then the permission applies, otherwise the container/s and category/s permissions are checked. The permission checks user access first then group. A user can belong to multiple groups. There's no query to handle this that can return a neat recordset for paging. Currently the complete checks are only done for an article request. The category list only checks access to the category and the container it belongs to, so the list is either displayed in its entirety (including titles of articles that can't be viewed) or not at all, and obviously the paging works perfectly because the total number of titles is known up front and remains constant for subsequent requests. If I use read-ahead to make allowance for permissions and remove paging (just keep prev/next) the problem goes away. Or I could use best-guess paging, which could range from 100% accurate to 99% wrong. At first glance that's not really acceptable, but I noticed recently Google does the same thing with their search results. First prize is to work out a proper solution that is fast
RE: [PHP] Paging and permissions
Instead of serializing the articles, you only need their IDs. Using $sql .= ' where id in (' . implode(',', $ids) . ')'; you can load the data for a page of results in a single query. Storing the IDs is much cheaper than the articles. If the permissions are fairly static (i.e. access for user X to article Y doesn't change every two minutes) you could create a calculated permission table as a many-to-many between user and article. Here's the logic flow for a query: 1. Run the query to find matching article IDs 2. Load permissions from table for all IDs 3. For each article without a calculated permission, calculate it and insert a row (do a batch insert to save time) If you flag the query in the middle tier as having been processed as above, you can join to the calculated permissions each time you need another page. The downside is that the code that runs the queries has to operate in two modes: raw and joined to the permissions. If most users end up querying for all articles, the table could grow. Plus you need to purge rows any time the permissions for an article/user changes which could get fairly complicated. David --- Storing only the IDs is way cheaper than storing the entire resultset, and I'd been thinking along the same lines. Getting a complete list of valid IDs in the first place is turning out to be a different matter. The permissions for article/user aren't that straight-forward, and in fact the most common permissions are group/category and group/container, where an article can be assigned to one or more category/containers. Using a temporary permission table could be the solution. Thanks. Cheers Arno -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
RE: [PHP] Paging and permissions
On Wed, 2011-02-09 at 13:27 +0200, Arno Kuhl wrote: Instead of serializing the articles, you only need their IDs. Using $sql .= ' where id in (' . implode(',', $ids) . ')'; you can load the data for a page of results in a single query. Storing the IDs is much cheaper than the articles. If the permissions are fairly static (i.e. access for user X to article Y doesn't change every two minutes) you could create a calculated permission table as a many-to-many between user and article. Here's the logic flow for a query: 1. Run the query to find matching article IDs 2. Load permissions from table for all IDs 3. For each article without a calculated permission, calculate it and insert a row (do a batch insert to save time) If you flag the query in the middle tier as having been processed as above, you can join to the calculated permissions each time you need another page. The downside is that the code that runs the queries has to operate in two modes: raw and joined to the permissions. If most users end up querying for all articles, the table could grow. Plus you need to purge rows any time the permissions for an article/user changes which could get fairly complicated. David --- Storing only the IDs is way cheaper than storing the entire resultset, and I'd been thinking along the same lines. Getting a complete list of valid IDs in the first place is turning out to be a different matter. The permissions for article/user aren't that straight-forward, and in fact the most common permissions are group/category and group/container, where an article can be assigned to one or more category/containers. Using a temporary permission table could be the solution. Thanks. Cheers Arno You can get the same set of results with a join betwixt the tables, and it should be slightly faster than creating a temporary table if you've got your indexes right. Thanks, Ash http://www.ashleysheridan.co.uk
[PHP] Paging and permissions
I'm hoping some clever php gurus have been here before and are willing to share some ideas. I have a site where articles are assigned to categories in containers. An article can be assigned to only one category per container, but one or more containers. Access permissions can be set per article, per category and/or per container, for one or more users and/or user groups. If an article is assigned to 10 categories and only one of those has a permission denying access, then the article can't be accessed even if browsing through one of the other 9 categories. Currently everything works fine, with article titles showing when browsing through category or search result lists, and a message is displayed when the article is clicked if it cannot be viewed because of a permission. Now there's a requirement to not display the article title in category lists and search results if it cannot be viewed. I'm stuck with how to determine the number of results for paging at the start of the list or search. The site is quite large (20,000+ articles and growing) so reading the entire result set and sifting through it with permission rules for each request is not an option. But it might be an option if done once at the start of each search or list request, and then use that temporary modified result set for subsequent requests on the same set. I thought of saving the set to a temporary db table or file (not sure about overhead of serializing/unserializing large arrays). A sizing exercise based on the recordset returned for searches and lists shows a max of about 150MB for 20,000 articles and 380MB for 50,000 articles that needs to be saved temporarily per search or list request - in the vast majority of cases the set will be *much* smaller but it needs to cope with the worst case, and still do so a year down the line. All this extra work because I can't simply get an accurate number of results for paging, because of permissions! So my questions are: 1. Which is better (performance) for this situation: file or db? 2. How do I prepare a potentially very large data set for file or fast writing to a new table (ie I obviously don't want to write it record by record) 3. Are there any other alternatives worth looking at? TIA Cheers Arno
Re: [PHP] Paging and permissions
On Tue, Feb 8, 2011 at 7:36 AM, Arno Kuhl ak...@telkomsa.net wrote: I'm hoping some clever php gurus have been here before and are willing to share some ideas. I have a site where articles are assigned to categories in containers. An article can be assigned to only one category per container, but one or more containers. Access permissions can be set per article, per category and/or per container, for one or more users and/or user groups. If an article is assigned to 10 categories and only one of those has a permission denying access, then the article can't be accessed even if browsing through one of the other 9 categories. Currently everything works fine, with article titles showing when browsing through category or search result lists, and a message is displayed when the article is clicked if it cannot be viewed because of a permission. Now there's a requirement to not display the article title in category lists and search results if it cannot be viewed. I'm stuck with how to determine the number of results for paging at the start of the list or search. The site is quite large (20,000+ articles and growing) so reading the entire result set and sifting through it with permission rules for each request is not an option. But it might be an option if done once at the start of each search or list request, and then use that temporary modified result set for subsequent requests on the same set. I thought of saving the set to a temporary db table or file (not sure about overhead of serializing/unserializing large arrays). A sizing exercise based on the recordset returned for searches and lists shows a max of about 150MB for 20,000 articles and 380MB for 50,000 articles that needs to be saved temporarily per search or list request - in the vast majority of cases the set will be *much* smaller but it needs to cope with the worst case, and still do so a year down the line. All this extra work because I can't simply get an accurate number of results for paging, because of permissions! So my questions are: 1. Which is better (performance) for this situation: file or db? have you timed it yourself? 2. How do I prepare a potentially very large data set for file or fast writing to a new table (ie I obviously don't want to write it record by record) Even the db's cant insert as fast as the function is presented to it, and it can respond, so again...timeit 3. Are there any other alternatives worth looking at? This is a question for the experienced php developers. But the above is applicable. TIA Cheers Arno -- According to theoretical physics, the division of spatial intervals as the universe evolves gives rise to the fact that in another timeline, your interdimensional counterpart received helpful advice from me...so be eternally pleased for them. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Paging and permissions
On Tue, 2011-02-08 at 14:36 +0200, Arno Kuhl wrote: I'm hoping some clever php gurus have been here before and are willing to share some ideas. I have a site where articles are assigned to categories in containers. An article can be assigned to only one category per container, but one or more containers. Access permissions can be set per article, per category and/or per container, for one or more users and/or user groups. If an article is assigned to 10 categories and only one of those has a permission denying access, then the article can't be accessed even if browsing through one of the other 9 categories. Currently everything works fine, with article titles showing when browsing through category or search result lists, and a message is displayed when the article is clicked if it cannot be viewed because of a permission. Now there's a requirement to not display the article title in category lists and search results if it cannot be viewed. I'm stuck with how to determine the number of results for paging at the start of the list or search. The site is quite large (20,000+ articles and growing) so reading the entire result set and sifting through it with permission rules for each request is not an option. But it might be an option if done once at the start of each search or list request, and then use that temporary modified result set for subsequent requests on the same set. I thought of saving the set to a temporary db table or file (not sure about overhead of serializing/unserializing large arrays). A sizing exercise based on the recordset returned for searches and lists shows a max of about 150MB for 20,000 articles and 380MB for 50,000 articles that needs to be saved temporarily per search or list request - in the vast majority of cases the set will be *much* smaller but it needs to cope with the worst case, and still do so a year down the line. All this extra work because I can't simply get an accurate number of results for paging, because of permissions! So my questions are: 1. Which is better (performance) for this situation: file or db? 2. How do I prepare a potentially very large data set for file or fast writing to a new table (ie I obviously don't want to write it record by record) 3. Are there any other alternatives worth looking at? TIA Cheers Arno How are you determining (logically, not in code) when an article is allowed to be read? Assume an article on user permissions in mysql is in a container called 'databases' and in a second one called 'security' and both containers are in a category called 'computers' Now get a user called John who is in a group called 'db admins' and that group gives him permissions to view all articles in the 'databases' container and any articles in any container in the 'computers' category. Now assume John also has explicit user permissions revoking that right to view the article in any container. What I'm getting at is what's the order of privilege for rights? Do group rights for categories win out over those for containers, or do individual user rights trump all of them overall? I think once that's figured out, a lot can be done inside the query itself to minimise the impact on the script getting the results. Thanks, Ash http://www.ashleysheridan.co.uk
Re: [PHP] Paging and permissions
On Tue, Feb 8, 2011 at 1:36 PM, Arno Kuhl ak...@telkomsa.net wrote: I'm hoping some clever php gurus have been here before and are willing to share some ideas. I have a site where articles are assigned to categories in containers. An article can be assigned to only one category per container, but one or more containers. Access permissions can be set per article, per category and/or per container, for one or more users and/or user groups. If an article is assigned to 10 categories and only one of those has a permission denying access, then the article can't be accessed even if browsing through one of the other 9 categories. Currently everything works fine, with article titles showing when browsing through category or search result lists, and a message is displayed when the article is clicked if it cannot be viewed because of a permission. Now there's a requirement to not display the article title in category lists and search results if it cannot be viewed. I'm stuck with how to determine the number of results for paging at the start of the list or search. The site is quite large (20,000+ articles and growing) so reading the entire result set and sifting through it with permission rules for each request is not an option. But it might be an option if done once at the start of each search or list request, and then use that temporary modified result set for subsequent requests on the same set. I thought of saving the set to a temporary db table or file (not sure about overhead of serializing/unserializing large arrays). A sizing exercise based on the recordset returned for searches and lists shows a max of about 150MB for 20,000 articles and 380MB for 50,000 articles that needs to be saved temporarily per search or list request - in the vast majority of cases the set will be *much* smaller but it needs to cope with the worst case, and still do so a year down the line. All this extra work because I can't simply get an accurate number of results for paging, because of permissions! So my questions are: 1. Which is better (performance) for this situation: file or db? 2. How do I prepare a potentially very large data set for file or fast writing to a new table (ie I obviously don't want to write it record by record) 3. Are there any other alternatives worth looking at? TIA Cheers Arno Seems to me you make your setup needlessly complicated and restrictive. And it's bad form to display articles in search results that aren't allowed to be viewed.. Tell us more about why you want it to be so restrictive, i just don't understand it.
Re: [PHP] Paging and permissions
On Tue, Feb 8, 2011 at 4:36 AM, Arno Kuhl ak...@telkomsa.net wrote: But it might be an option if done once at the start of each search or list request, and then use that temporary modified result set for subsequent requests on the same set. Instead of serializing the articles, you only need their IDs. Using $sql .= ' where id in (' . implode(',', $ids) . ')'; you can load the data for a page of results in a single query. Storing the IDs is much cheaper than the articles. If the permissions are fairly static (i.e. access for user X to article Y doesn't change every two minutes) you could create a calculated permission table as a many-to-many between user and article. Here's the logic flow for a query: 1. Run the query to find matching article IDs 2. Load permissions from table for all IDs 3. For each article without a calculated permission, calculate it and insert a row (do a batch insert to save time) If you flag the query in the middle tier as having been processed as above, you can join to the calculated permissions each time you need another page. The downside is that the code that runs the queries has to operate in two modes: raw and joined to the permissions. If most users end up querying for all articles, the table could grow. Plus you need to purge rows any time the permissions for an article/user changes which could get fairly complicated. On Tue, Feb 8, 2011 at 5:17 AM, Tolas Anon tolas...@gmail.com wrote: And it's bad form to display articles in search results that aren't allowed to be viewed. On Tue, Feb 8, 2011 at 4:36 AM, Arno Kuhl ak...@telkomsa.net wrote: Now there's a requirement to not display the article title in category lists and search results if it cannot be viewed. :) David