Hi Brian,

On 19/09/2017 21:20, Brian Mountford wrote:
Mario,

I went back and read your draft about result sorting and paging, because I am looking to implement paging in our RDAP server. But I have a comment: It seems like some of these operations will not scale well. For instance, the count parameter will cause the database to do something like an SQL SELECT COUNT(*) operation. That generally results in a scan through the table (or at least an index), I think.

The draft does not aim to explore all the possible implementations of the count parameter. It only states that implementing the new parameters is technically feasible, because counting, paging and sorting operators are currently supported by the major RDBMSs as well as by NoSQL DBMSs. If you are not satisfied with the performance of the standard SQL operators, you can adopt a different solution. If you are considering an RDBMS, there are a number of tips for obtaining the number of results, and many comparisons of the performance of the different approaches are available. Obviously, the RDAP search paths must at least be mapped onto indexed db fields, because otherwise not only the count but the row selection itself can take a lot of time. Under such conditions, SQL SELECT count(*) performs well even if the query generates a huge result set.
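As a rough illustration of a count over an indexed search field, here is a minimal sketch. It assumes SQLite, a hypothetical table named domains whose ldhName column is the primary key (and therefore indexed), and a hypothetical helper count_domains; none of these come from the draft. The prefix search is written as a range condition so that the database can answer it from the index.

```python
import sqlite3

def count_domains(conn, prefix):
    """Count domains whose ldhName starts with `prefix`.

    The prefix match is expressed as a range on the indexed column.
    LDH names are ASCII, so prefix + U+FFFF is a safe upper bound.
    """
    upper = prefix + "\uffff"
    row = conn.execute(
        "SELECT COUNT(*) FROM domains WHERE ldhName >= ? AND ldhName < ?",
        (prefix, upper),
    ).fetchone()
    return row[0]

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (ldhName TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO domains VALUES (?)",
    [("exam.org",), ("example.com",), ("example.it",), ("nic.it",), ("test.it",)],
)
matches = count_domains(conn, "exa")
```

How well this behaves on a large table still depends on the DBMS and on the selectivity of the search pattern, so it should be measured rather than assumed.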

There are some other factors to consider when you decide to implement RDAP searches in general: for example, which types of searches are allowed at the different access levels and, consequently, whether it is worth setting up an appropriate technical strategy to obtain the best performance for every type of search. Basically, the decisions you make in your RDAP profile affect your RDAP implementation.


Likewise, paging by means of an offset, as you suggest, might result in scanning through the result set from the start each time a new page is requested. Might it be more efficient to have the server return, using your clever link technique, some sort of cursor value indicating the start of the next page? Of course, if an implementer wanted the cursor to be the index of the next row in the result set, it could pass the offset as the cursor token, but by defining the cursor to be an opaque string without defined meaning, you would not lock in all implementers to that scheme.

The only cursor that comes to my mind would be based on the values of the sorted field (you need a default one if no value for sortby is provided). This is the most common way to simulate paging. For example, if you want to search for a set of domains sorted by ldhName in ascending order and the page contains 100 domains, you would use the ldhName of the 100th domain to build a where condition (e.g. "ldhName > ldhName of the 100th domain of the current page") for the query producing the next page.
In this case, the value of the ldhName property is the cursor.
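
A minimal sketch of this kind of cursor, assuming SQLite and a hypothetical domains table with a unique ldhName column. The function name next_page and the small page size are illustrative only and are not part of the draft.

```python
import sqlite3

def next_page(conn, cursor=None, limit=100):
    """Return (names, next_cursor) for domains sorted by ldhName ascending.

    `cursor` is the last ldhName of the previous page, or None for the first page.
    """
    if cursor is None:
        rows = conn.execute(
            "SELECT ldhName FROM domains ORDER BY ldhName LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT ldhName FROM domains WHERE ldhName > ? "
            "ORDER BY ldhName LIMIT ?",
            (cursor, limit),
        ).fetchall()
    names = [r[0] for r in rows]
    # A short page means the result set is exhausted. If the total is an
    # exact multiple of `limit`, one extra (empty) page will be requested.
    next_cursor = names[-1] if len(names) == limit else None
    return names, next_cursor

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (ldhName TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO domains VALUES (?)",
    [("a.it",), ("b.it",), ("c.it",), ("d.it",), ("e.it",)],
)
page1, cur1 = next_page(conn, limit=2)
page2, cur2 = next_page(conn, cursor=cur1, limit=2)
page3, cur3 = next_page(conn, cursor=cur2, limit=2)
```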
However, I see some drawbacks:

- the property or properties making up the cursor have to form a unique key, otherwise you will run into inconsistencies; ldhName is a unique key, but if you want to sort by registrationDate, you have to generate the cursor as the concatenation of registrationDate and ldhName, and this makes building the where condition much more complicated
- building the where condition depends on both the sorting order and the scrolling direction; the server might also decide to provide a link to the previous page
- using a cursor does not allow you to jump across the results as you like; the server provides a link to the next page, but you might want to jump N objects forward
- if you use a cursor, you have no perception of the page's position within the result set; you can derive it from the limit value and the number of pages scrolled, but it's not intuitive; the proposed counting/paging parameters give you this perception
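
The first two drawbacks can be seen in a small sketch. It assumes SQLite, a hypothetical domains table with registrationDate and ldhName columns, and an opaque token built as base64-encoded JSON; the token format and all function names are assumptions made for illustration. Note how the where condition needs a tie-break on ldhName and how the comparison operator changes with the sorting order.

```python
import base64
import json
import sqlite3

def encode_cursor(reg_date, ldh_name):
    """Pack the composite key into an opaque, URL-safe token."""
    raw = json.dumps([reg_date, ldh_name]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_cursor(token):
    reg_date, ldh_name = json.loads(base64.urlsafe_b64decode(token))
    return reg_date, ldh_name

def page_by_date(conn, token=None, limit=100, descending=False):
    """Return (rows, next_token) sorted by registrationDate, then ldhName."""
    op = "<" if descending else ">"
    direction = "DESC" if descending else "ASC"
    sql = "SELECT registrationDate, ldhName FROM domains"
    params = []
    if token is not None:
        reg_date, ldh_name = decode_cursor(token)
        # registrationDate alone is not unique, so ldhName breaks ties.
        sql += (f" WHERE registrationDate {op} ?"
                f" OR (registrationDate = ? AND ldhName {op} ?)")
        params += [reg_date, reg_date, ldh_name]
    sql += f" ORDER BY registrationDate {direction}, ldhName {direction} LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    next_token = encode_cursor(*rows[-1]) if len(rows) == limit else None
    return rows, next_token

# Hypothetical sample data with repeated registration dates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE domains (ldhName TEXT PRIMARY KEY, registrationDate TEXT)"
)
conn.executemany(
    "INSERT INTO domains VALUES (?, ?)",
    [("a.it", "2017-01-01"), ("b.it", "2017-01-01"), ("c.it", "2017-01-01"),
     ("d.it", "2017-02-01"), ("e.it", "2017-03-01")],
)
asc1, t1 = page_by_date(conn, limit=2)
asc2, t2 = page_by_date(conn, token=t1, limit=2)
desc1, d1 = page_by_date(conn, limit=2, descending=True)
desc2, d2 = page_by_date(conn, token=d1, limit=2, descending=True)
```

A link to the previous page would need yet another variant of the same condition, with the operators and the ordering reversed.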

So the use of a cursor raises some issues too. It could be more efficient under some specific conditions, but it is less flexible, less intuitive and more complicated than offset. Anyway, here again, I think that everything depends on the search capabilities the RDAP server allows and, consequently, on the size of the result sets. IMHO RDAP searches should not deal with huge result sets (hundreds of thousands or millions of objects). The count parameter should be used to evaluate the query precision, and personally I believe that additional parameters should be introduced in RDAP to filter and further restrict a result set. Another solution for dealing with big collections of objects in RDAP could be the use of partial responses (please see my mail to the WG from yesterday). Ultimately, if result sets are not huge (fewer than a hundred thousand domains), offset is not a problem. I ran some tests on searches dealing with 50000 to 60000 domains: I specified the parameter fieldSet="id" (only objectClassName and ldhName in the domainSearchResults array) and set the number of domains per page first to 1000 and then to 5000.
I didn't find any relevant performance issues.
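
For comparison, a rough illustration of offset-based paging combined with a count. This is not the code used in the tests above: it assumes SQLite, a hypothetical domains table and a hypothetical offset_page helper. It shows how the total, together with limit and offset, gives the position of the page within the result set and lets a client jump to any page.

```python
import sqlite3

def offset_page(conn, offset=0, limit=100):
    """Return one page of ldhNames plus its position in the result set."""
    total = conn.execute("SELECT COUNT(*) FROM domains").fetchone()[0]
    rows = conn.execute(
        "SELECT ldhName FROM domains ORDER BY ldhName LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return {
        "total": total,
        "page": offset // limit + 1,            # 1-based page number
        "pages": (total + limit - 1) // limit,  # ceiling division
        "names": [r[0] for r in rows],
    }

# Hypothetical sample data: 25 domains, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (ldhName TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO domains VALUES (?)",
    [(f"d{i:02d}.example",) for i in range(25)],
)
# Jump straight to the third page without visiting the first two.
result = offset_page(conn, offset=20, limit=10)
```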

Thanks for your interest in the draft.
Regards,
Mario



Thanks.

Regards,
Brian


_______________________________________________
regext mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/regext


--
Dr. Mario Loffredo
Servizi Internet e Sviluppo Tecnologico
CNR - Istituto di Informatica e Telematica
via G. Moruzzi 1, I-56124 PISA, Italy
E-Mail: [email protected]
Phone: +39 050 3153497
Web: http://www.iit.cnr.it/mario.loffredo
