Hi Brian,
On 19/09/2017 21:20, Brian Mountford wrote:
Mario,
I went back and read your draft about result sorting and paging,
because I am looking to implement paging in our RDAP server. But I
have a comment: It seems like some of these operations will not scale
well. For instance, the count parameter will cause the database to do
something like an SQL SELECT COUNT(*) operation. That generally
results in a scan through the table (or at least an index), I think.
The draft does not aim to explore all the possible implementations of
the count parameter. It only states that implementing the new
parameters is technically feasible because counting, paging and sorting
operators are currently supported by major RDBMSs as well as NoSQL
DBMSs. If you are not satisfied with the performance of the standard SQL
operators, you can adopt different solutions. If you are considering an
RDBMS, there are a number of tips for obtaining the number of results
efficiently, and many performance comparisons between the different
approaches are available. Obviously, the RDAP search paths must at least
be mapped onto indexed database fields because not only counting but
also row selection can take a long time.
Under such conditions, SQL SELECT count(*) performs well even if the
query generates a huge result set.
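To make the count and offset mechanics concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column and function names are hypothetical. The pattern uses GLOB, whose '*' wildcard matches the RDAP partial-match character; whether a given DBMS actually uses the index for such a pattern should be checked against its query plan.

```python
import sqlite3

def make_demo_db(n):
    """Build an in-memory 'domain' table with n rows (hypothetical schema).

    The UNIQUE constraint makes SQLite create an index on ldhName.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domain (id INTEGER PRIMARY KEY, ldhName TEXT NOT NULL UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO domain (ldhName) VALUES (?)",
        ((f"example{i:05d}.it",) for i in range(n)),
    )
    return conn

def search_page(conn, pattern, offset, limit):
    """Return (total count, page of ldhNames) for a GLOB search paged by offset."""
    # First query: the total number of matches (what a count parameter exposes).
    total = conn.execute(
        "SELECT COUNT(*) FROM domain WHERE ldhName GLOB ?", (pattern,)
    ).fetchone()[0]
    # Second query: one page sorted by ldhName, skipping `offset` matching rows.
    rows = conn.execute(
        "SELECT ldhName FROM domain WHERE ldhName GLOB ? "
        "ORDER BY ldhName LIMIT ? OFFSET ?",
        (pattern, limit, offset),
    ).fetchall()
    return total, [r[0] for r in rows]
```

With make_demo_db(250), search_page(conn, "example001*", 50, 20) reports 100 matches and returns example00150.it through example00169.it.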
There are some other facts to consider when you decide to implement RDAP
searches in general: for example, which types of searches are allowed
according to the different access levels and, consequently, whether it
is worth setting up an appropriate technical strategy to obtain the best
performance for every type of search.
Basically, the decisions you take in your RDAP profile affect your RDAP
implementation.
Likewise, paging by means of an offset, as you suggest, might result
in scanning through the result set from the start each time a new page
is requested. Might it be more efficient to have the server return,
using your clever link technique, some sort of cursor value indicating
the start of the next page? Of course, if an implementer wanted the
cursor to be the index of the next row in the result set, it could
pass the offset as the cursor token, but by defining the cursor to be
an opaque string without defined meaning, you would not lock in all
implementers to that scheme.
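A minimal sketch of the opaque cursor idea, assuming the server simply serializes its own paging state (an offset, or the last sort key) into a base64url token. The "cursor" query parameter and the helper names are hypothetical and not part of the draft.

```python
import base64
import json

def encode_cursor(state: dict) -> str:
    """Serialize server-side paging state into an opaque, URL-safe token."""
    raw = json.dumps(state, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # Strip '=' padding so the token needs no escaping in a query string.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

def decode_cursor(token: str) -> dict:
    """Inverse of encode_cursor; restores the padding stripped above."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded.encode("ascii")))

def next_page_link(base_url: str, state: dict) -> str:
    """Build a 'next' link carrying the cursor (parameter name is hypothetical)."""
    return f"{base_url}&cursor={encode_cursor(state)}"
```

Clients never look inside the token, so the server can later switch from an offset to a key-based state without changing the interface. Base64 only hides the structure; a server that must not trust client-supplied state would also need to sign or validate the token.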
The only cursor that comes to mind would be based on the values of the
sorted field (you need a default one if a value for sortby is not
provided). This is the most common way to simulate paging. For example,
if you want to search for a set of domains sorted by ldhName in
ascending order and each page contains 100 domains, you should use the
ldhName of the 100th domain to build a WHERE condition (e.g. "ldhName >
ldhName of the 100th domain in the current page") for the query
producing the next page.
In this case, the value of the ldhName property is the cursor.
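The ldhName-based cursor described above can be sketched as follows (Python with sqlite3; the schema and function names are hypothetical). Each page is fetched with "ldhName > <last ldhName of the current page>" instead of an OFFSET.

```python
import sqlite3

def build_demo_db(names):
    """In-memory 'domain' table keyed by ldhName (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE domain (ldhName TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO domain (ldhName) VALUES (?)", ((n,) for n in names))
    return conn

def page_after(conn, cursor, limit):
    """Return (page, next_cursor), using the last ldhName of a page as the cursor.

    cursor is None for the first page. next_cursor is None once a short page
    is returned; if the last page is exactly full, one extra (empty) request
    is needed to detect the end.
    """
    if cursor is None:
        rows = conn.execute(
            "SELECT ldhName FROM domain ORDER BY ldhName LIMIT ?", (limit,)
        ).fetchall()
    else:
        # No rows are skipped: the index on ldhName lets the DBMS start
        # directly after the cursor value.
        rows = conn.execute(
            "SELECT ldhName FROM domain WHERE ldhName > ? ORDER BY ldhName LIMIT ?",
            (cursor, limit),
        ).fetchall()
    page = [r[0] for r in rows]
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor
```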
However, I see some drawbacks:
- the property or properties making up the cursor must form a unique
key, otherwise you will run into inconsistencies; ldhName is a unique
key, but if you want to sort by registrationDate, you have to build the
cursor as the concatenation of registrationDate and ldhName, and this
makes building the WHERE condition much more complicated
- building the WHERE condition depends on both the sorting order and
the scrolling direction, since the server might also decide to provide
a link to the previous page
- using a cursor does not let you jump across the results as you want;
the server provides a link to the next page, but you might want to jump
N objects forward
- if you use a cursor, you have no perception of the page position
within the result set; you can derive it from the limit value and the
number of pages scrolled, but it's not intuitive, whereas the proposed
counting/paging parameters give you this perception
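The first two drawbacks show up clearly in a sketch of a composite (registrationDate, ldhName) cursor. This is an illustration with assumed names, not part of the draft: the comparison operator and the ORDER BY direction both flip with the sorting order and the scrolling direction, and ldhName is needed as a tie-breaker because registrationDate alone is not unique.

```python
import sqlite3

def keyset_sql(sort_order, direction, limit):
    """Build the query for a (registrationDate, ldhName) cursor.

    sort_order: "asc" or "desc", the order the client asked for.
    direction:  "next" or "prev", the page wanted relative to the cursor.
    Returns (sql, reverse): reverse=True means the fetched rows must be
    flipped to restore the client's order.
    """
    # The key space is scanned upwards for asc/next and desc/prev.
    upwards = (sort_order == "asc") == (direction == "next")
    op = ">" if upwards else "<"
    order = "ASC" if upwards else "DESC"
    # ldhName breaks ties between domains with the same registrationDate.
    sql = (
        "SELECT registrationDate, ldhName FROM domain "
        f"WHERE registrationDate {op} ? "
        f"OR (registrationDate = ? AND ldhName {op} ?) "
        f"ORDER BY registrationDate {order}, ldhName {order} "
        f"LIMIT {int(limit)}"
    )
    return sql, direction == "prev"

def fetch_page(conn, cursor, sort_order, direction, limit):
    """cursor is the (registrationDate, ldhName) pair of the boundary row."""
    sql, reverse = keyset_sql(sort_order, direction, limit)
    reg_date, ldh_name = cursor
    rows = conn.execute(sql, (reg_date, reg_date, ldh_name)).fetchall()
    return rows[::-1] if reverse else rows
```

For "next" the cursor is the last row of the current page; for "prev" it is the first row. Jumping N objects forward has no equivalent here, which is the third drawback.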
So the use of a cursor raises some issues too. It could be more
efficient under some specific conditions but it is less flexible, less
intuitive and more complicated than offset.
Anyway, here again, I think that everything depends on the search
capabilities the RDAP server allows and, consequently, on the size of
the result sets. IMHO RDAP searches should not deal with huge result
sets (hundreds of thousands or millions of objects).
The count parameter should be used to evaluate the query precision, and
personally I believe that additional parameters should be introduced in
RDAP to further filter and restrict a result set.
Another solution to deal with big collections of objects in RDAP could
be the use of partial responses (please see the mail I sent to the WG
yesterday).
In short, if result sets are not huge (fewer than a hundred thousand
domains), offset is not a problem.
I ran some tests on searches dealing with 50000/60000 domains: I
specified the parameter fieldSet="id" (only objectClassName and ldhName
in the domainSearchResults array) and set the number of domains per
page first to 1000 and then to 5000. I didn't find relevant performance
issues.
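For reference, the page counts in these tests follow from simple arithmetic: 60000 domains give 60 pages at 1000 per page and 12 pages at 5000 per page. The sketch below, assuming count, offset and limit behave as discussed in this thread (function names are hypothetical), also shows how they give the page position that a cursor alone cannot.

```python
def page_position(count, offset, limit):
    """Return (current page, total pages), both 1-based, from count/offset/limit."""
    total_pages = (count + limit - 1) // limit  # ceiling division
    current_page = offset // limit + 1
    return current_page, total_pages

def page_offsets(count, limit):
    """Offset of every page, e.g. for building links that jump to any page."""
    return list(range(0, count, limit))
```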
Thanks for your interest in the draft.
Regards,
Mario
Thanks.
Regards,
Brian
_______________________________________________
regext mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/regext
--
Dr. Mario Loffredo
Servizi Internet e Sviluppo Tecnologico
CNR - Istituto di Informatica e Telematica
via G. Moruzzi 1, I-56124 PISA, Italy
E-Mail: [email protected]
Phone: +39 050 3153497
Web: http://www.iit.cnr.it/mario.loffredo