Re: [regext] Registration Data Access Protocol (RDAP) Query Parameters for Result Sorting and Paging

Mario Loffredo Wed, 20 Sep 2017 09:46:13 -0700

Hi Brian,

On 19/09/2017 21:20, Brian Mountford wrote:

Mario,
I went back and read your draft about result sorting and paging, because I am looking to implement paging in our RDAP server. But I have a comment: it seems like some of these operations will not scale well. For instance, the count parameter will cause the database to do something like an SQL SELECT COUNT(*) operation. That generally results in a scan through the table (or at least an index), I think.

The draft does not aim to explore all the possible implementations of the count parameter. It only states that implementing the new parameters is technically feasible, because counting, paging and sorting operators are currently supported by major RDBMSs as well as NoSQL DBMSs. If you are not satisfied with the performance of the standard SQL operators, you can adopt different solutions. If you are considering an RDBMS, there are a number of tips to follow in order to obtain the number of results, and many comparisons of their performance are available. Obviously, the RDAP search paths must at least be mapped onto indexed database fields, because not only counting but also row selection can take a lot of time. Under such conditions, SQL SELECT count(*) performs well even if the query generates a huge result set.
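As a rough illustration of a search path mapped onto an indexed field, here is a minimal Python/sqlite3 sketch. The table name domain, its columns ldh_name and registration_date, and the sample data are all hypothetical. The COUNT(*) query and the LIMIT/OFFSET query only stand in for the count and offset/limit parameters. They are not the draft's normative mapping, and whether a particular predicate can actually use the index depends on the database engine.

```python
import sqlite3

# Hypothetical schema: ldh_name is the PRIMARY KEY, so SQLite maintains an
# index on it; registration_date is filled only to make the data realistic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain (ldh_name TEXT PRIMARY KEY, registration_date TEXT)")
conn.executemany(
    "INSERT INTO domain VALUES (?, ?)",
    [(f"example{i:04d}.it", f"2017-01-{i % 10 + 1:02d}") for i in range(1000)],
)

def count_matches(conn, pattern):
    """Rough equivalent of the count parameter: number of matching domains."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM domain WHERE ldh_name LIKE ?", (pattern,)
    ).fetchone()
    return n

def fetch_page_offset(conn, pattern, offset, limit):
    """Offset paging: skip `offset` matching rows, return the next `limit`."""
    rows = conn.execute(
        "SELECT ldh_name FROM domain WHERE ldh_name LIKE ? "
        "ORDER BY ldh_name LIMIT ? OFFSET ?",
        (pattern, limit, offset),
    ).fetchall()
    return [name for (name,) in rows]

total = count_matches(conn, "example%")
page = fetch_page_offset(conn, "example%", 100, 100)
print(total, page[0], page[-1])  # 1000 example0100.it example0199.it
```

Note that OFFSET still makes the engine step over the skipped rows before returning the page.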

There are some other facts to consider when you decide to implement RDAP searches in general. For example, you need to decide which types of searches are allowed at the different access levels and, consequently, whether it is worth setting up a dedicated technical strategy to obtain the best performance for each type of search. Basically, the decisions you take in your RDAP profile affect your RDAP implementation.

Likewise, paging by means of an offset, as you suggest, might result in scanning through the result set from the start each time a new page is requested. Might it be more efficient to have the server return, using your clever link technique, some sort of cursor value indicating the start of the next page? Of course, if an implementer wanted the cursor to be the index of the next row in the result set, it could pass the offset as the cursor token. But by defining the cursor to be an opaque string without defined meaning, you would not lock all implementers into that scheme.
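Here is a minimal Python sketch of the opaque-cursor idea. It assumes the server serializes whatever paging state it needs (an offset, or the last sort key) as JSON and wraps it in URL-safe base64. The helper names, the host rdap.example and the cursor query parameter used in the "next" link are all hypothetical and are not defined by the draft.

```python
import base64
import json
from urllib.parse import urlencode

def encode_cursor(state):
    """Pack server-side paging state into an opaque, URL-safe token."""
    raw = json.dumps(state, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_cursor(token):
    """Recover the paging state; only the server needs to understand it."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))

def next_link(base_url, query, state):
    """Build a 'next' link carrying the token (hypothetical 'cursor' parameter)."""
    params = dict(query, cursor=encode_cursor(state))
    return {"rel": "next", "href": f"{base_url}?{urlencode(params)}"}

# Two implementers, same opaque interface: one stores an offset, one a sort key.
offset_token = encode_cursor({"offset": 200})
key_token = encode_cursor({"after": "example0199.it"})
print(decode_cursor(offset_token), decode_cursor(key_token))
print(next_link("https://rdap.example/domains", {"name": "example*"}, {"offset": 200}))
```

Base64 only hides the structure of the state. A server that must not trust client-supplied state would typically also sign or encrypt the token.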

The only cursor that comes to my mind would be based on the values of the sorted field (you need a default one if no value for sortby is provided). This is the most common way to simulate paging. For example, suppose you want to search for a set of domains sorted by ldhName in ascending order, with 100 domains per page. You should use the ldhName of the 100th domain to build a WHERE condition (e.g. "ldhName > ldhName of the 100th domain of the current page") in the query producing the next page.

In this case, the value of the ldhName property is the cursor.
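To follow the ldhName example, here is a minimal keyset ("seek") paging sketch in Python with sqlite3. The last ldhName on the current page becomes the cursor, and the next page is selected with ldh_name > cursor. The table, column and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; ldh_name is unique, so it can serve as the cursor on its own.
conn.execute("CREATE TABLE domain (ldh_name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO domain VALUES (?)", [(f"example{i:04d}.it",) for i in range(1000)]
)

def next_page(conn, after, limit):
    """Return one page sorted by ldh_name ascending, starting after `after`.

    Returns (names, cursor): cursor is the last ldh_name on the page, or
    None when the page is empty (end of the result set).
    """
    if after is None:
        rows = conn.execute(
            "SELECT ldh_name FROM domain ORDER BY ldh_name LIMIT ?", (limit,)
        )
    else:
        rows = conn.execute(
            "SELECT ldh_name FROM domain WHERE ldh_name > ? "
            "ORDER BY ldh_name LIMIT ?",
            (after, limit),
        )
    names = [name for (name,) in rows]
    return names, (names[-1] if names else None)

# Walk the whole result set 100 domains at a time.
seen, cursor = [], None
while True:
    page, cursor = next_page(conn, cursor, 100)
    if not page:
        break
    seen.extend(page)
print(len(seen), seen[100])  # 1000 example0100.it
```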
However, I see some drawbacks:

- the property or properties of the cursor have to form a unique key, otherwise you will run into inconsistencies. ldhName is a unique key, but if you want to sort by registrationDate, you have to generate the cursor as the concatenation of registrationDate and ldhName, and this makes building the WHERE condition much more complicated;
- building the WHERE condition depends on both the sorting order and the scrolling direction; the server might decide to also provide a link to the previous page;
- a cursor does not let you jump across the results as you like: the server provides a link to the next page, but you might want to jump N objects forward;
- with a cursor you have no perception of the page position within the result set. You can derive it from the limit value and the number of pages scrolled, but it's not intuitive; the proposed counting/paging parameters give you this perception.
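The first two drawbacks can be made concrete with a small Python/sqlite3 sketch, assuming a hypothetical domain table with registration_date and ldh_name columns. Because registrationDate alone is not unique, the cursor is the pair (registration_date, ldh_name). Both the comparison operator and the ORDER BY direction flip depending on the sort order and on whether the client is scrolling forward or backward. This is just one way to write the condition, not a method taken from the draft.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain (ldh_name TEXT PRIMARY KEY, registration_date TEXT)")
# Hypothetical data: only 10 distinct dates, so many domains share a date.
conn.executemany(
    "INSERT INTO domain VALUES (?, ?)",
    [(f"example{i:04d}.it", f"2017-01-{i % 10 + 1:02d}") for i in range(1000)],
)

def page_by_date(conn, cursor, limit, descending=False, backward=False):
    """One page sorted by (registration_date, ldh_name).

    cursor is the (registration_date, ldh_name) pair of the last row of the
    current page when scrolling forward, or of its first row when scrolling
    backward; None means "first page".
    """
    # Scrolling backward over an ascending sort means scanning a descending
    # sort (and vice versa), so the two flags combine with XOR.
    scan_desc = descending != backward
    op = "<" if scan_desc else ">"
    order = "DESC" if scan_desc else "ASC"
    sql = "SELECT registration_date, ldh_name FROM domain"
    params = []
    if cursor is not None:
        date, name = cursor
        # ldh_name breaks ties between rows with the same registration_date.
        sql += (f" WHERE registration_date {op} ?"
                f" OR (registration_date = ? AND ldh_name {op} ?)")
        params += [date, date, name]
    sql += f" ORDER BY registration_date {order}, ldh_name {order} LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    if backward:
        rows.reverse()  # present the page in the requested sort order
    return rows

first = page_by_date(conn, None, 100)
second = page_by_date(conn, first[-1], 100)
back = page_by_date(conn, second[0], 100, backward=True)
print(second[0], back == first)  # ('2017-01-02', 'example0001.it') True
```

Even in this simplified form, the condition needs three bound parameters and two flags. The same WHERE clause for an offset is just "OFFSET ?".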

So the use of a cursor raises some issues too. It could be more efficient under some specific conditions, but it is less flexible, less intuitive and more complicated than an offset.

Anyway, here again, I think that everything depends on the search capabilities the RDAP server allows and, consequently, on the size of the result sets. IMHO RDAP searches should not deal with huge result sets (hundreds of thousands or millions of objects). The count parameter should be used to evaluate the precision of the query. Personally, I believe that additional parameters should be introduced in RDAP to filter and further restrict a result set. Another solution for dealing with big collections of objects in RDAP could be the use of partial responses (please see my mail to the WG from yesterday).

In the end, if result sets are not huge (fewer than a hundred thousand domains), offset is not a problem. I ran some tests on searches dealing with 50000/60000 domains. I specified the parameter fieldSet="id" (only objectClassName and ldhName in the domainSearchResults array) and set the number of domains per page first to 1000 and then to 5000. I didn't find any relevant performance issues.
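When the count and the offset/limit values are known, the page position that a cursor hides becomes simple arithmetic. Below is a short Python sketch of that calculation. page_position is a hypothetical helper, and the 50000-domain figure only reuses the order of magnitude of the tests mentioned above.

```python
def page_position(total_count, offset, limit):
    """Return (current_page, total_pages), both counted from 1.

    Assumes offset is a multiple of limit, as when a client follows
    next/previous links or jumps by whole pages.
    """
    current_page = offset // limit + 1
    total_pages = (total_count + limit - 1) // limit  # ceiling division
    return current_page, total_pages

# 50000 matching domains, 1000 or 5000 per page.
print(page_position(50000, 0, 1000))      # (1, 50)
print(page_position(50000, 49000, 1000))  # (50, 50)
print(page_position(50000, 10000, 5000))  # (3, 10)
```

Jumping N objects forward is just offset + N, which is the flexibility a next-page cursor does not offer.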

Thanks for your interest in the draft.
Regards,
Mario


Thanks.

Regards,
Brian


_______________________________________________
regext mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/regext



--
Dr. Mario Loffredo
Servizi Internet e Sviluppo Tecnologico
CNR - Istituto di Informatica e Telematica
via G. Moruzzi 1, I-56124 PISA, Italy
E-Mail: [email protected]
Phone: +39 050 3153497
Web: http://www.iit.cnr.it/mario.loffredo
