2007/1/12, Jean-Francois Dockes <[EMAIL PROTECTED]>:


Just a few opinions/comments/votes on recent issues:

- Need for a query-closing call and backend resource management issues: It
  is up to the backend to manage its resources, and decide how processing
  should be split between Query() and GetHitProperties().



  To make things easier, I am in favour of a CloseQuery() call which
  well-behaved applications will use, and also of specifying that
  query_handles can become stale, and that applications should then
  restart the query (which opens the question of error reports which is
  still a blank area).



Check.  It seems people agree with you on this. I'll update the wiki.
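
To make the stale-handle idea concrete, here is a minimal Python sketch of
how a well-behaved client might restart a query and close it when done.
Everything in it is an assumption for illustration: FakeBackend,
StaleHandleError and Session are hypothetical names, and the draft does not
yet say how a stale handle would be reported.

```python
class StaleHandleError(Exception):
    """Hypothetical error raised when a query_handle is no longer valid."""


class FakeBackend:
    """In-memory stand-in for a search backend (not a real API)."""

    def __init__(self, documents):
        self._documents = documents
        self._open = {}          # query_handle -> list of matching hits
        self._next_handle = 1

    def query(self, query_string):
        handle = self._next_handle
        self._next_handle += 1
        self._open[handle] = [d for d in self._documents if query_string in d]
        return handle

    def get_hits(self, handle, offset, limit):
        if handle not in self._open:
            raise StaleHandleError(handle)
        return self._open[handle][offset:offset + limit]

    def close_query(self, handle):
        # Closing an unknown or already stale handle is a harmless no-op.
        self._open.pop(handle, None)

    def expire_all(self):
        # Simulates the backend reclaiming resources on its own.
        self._open.clear()


class Session:
    """Client-side wrapper that restarts the query once if the handle is stale."""

    def __init__(self, backend, query_string):
        self.backend = backend
        self.query_string = query_string
        self.handle = backend.query(query_string)

    def fetch(self, offset, limit):
        try:
            return self.backend.get_hits(self.handle, offset, limit)
        except StaleHandleError:
            # Restart the query and retry exactly once.
            self.handle = self.backend.query(self.query_string)
            return self.backend.get_hits(self.handle, offset, limit)

    def close(self):
        self.backend.close_query(self.handle)
```

Note that after a restart the result set may differ from the one the stale
handle pointed at, so the client cannot assume offsets still line up.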


- CountHits() / GetHitProperties() racy-ness: It is up to the backend to
  maintain consistency inside a single opened query, the current interface
  allows it (unlike the previous one using the query string as a bad
  query_handle).

  Ideally the Query() call would open some kind of database snapshot which
  would be preserved as long as the query_handle is valid. This may be
  feasible or not with the current backends, which are expected to just
  "do their best", which the current draft does not prevent. Aren't things
  such as CountHits() usually considered to only return estimates anyway?


Well. It could be noted in the wiki that CountHits is not guaranteed to
return the correct number (especially on large result sets).
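
If CountHits is only an estimate, a client should not use it as a loop
bound. A rough Python sketch of paging that stops on a short page instead
(iter_hits and fetch_page are hypothetical names; fetch_page stands in for a
GetHitProperties call bound to one query handle):

```python
def iter_hits(fetch_page, page_size=10):
    """Yield hits page by page, stopping when a page comes back short.

    fetch_page(offset, limit) is assumed to return a list of at most
    `limit` hits. The hit count is never consulted.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # short (or empty) page: no more results
        offset += page_size
```

The estimate can still be shown to the user ("about 30 hits") without
affecting how many rows are actually fetched.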


- GetHitProperties result list as map or sequence: as Fabrice wrote, the
  object identifiers are not useful. The results are requested as slices
  from an ordered list (offset/limit), and should be returned as a
  simple sequence or array of (propertyName=>propertyValue) maps.

  Magnus initially proposed the response to be:
   "A map mapping each hit (sequence number) to a map of property-list of
     values pairs."

  I think that the sequence number can be kept implicit:

    Query (in s query_string, out i query_handle)
    GetHitProperties ( in i query_handle, in i offset, in i limit,
                       in as properties, out (sequence of maps) response )



The return value could be stripped of all maps and use the same ordering of
properties as in the properties input value. For example, the call:

 GetHitProperties (query_handle, 0, 2, ["uri", "dc:title", "mime"])

could return:

[
["file:///home/mikkel/delta_comp.pdf", "Delta Complexes", "application/pdf"],
["file:///home/mikkel/summa.svg", "Summa Logo", "image/svg+xml"]
]

From an optimization point of view this is probably the best we can get.
This is also how Tracker currently does it, and it is relatively easy to
work with.
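
As a rough illustration of why the positional form is easy to work with, a
consumer that prefers maps can rebuild them from the requested property
list. This is only a Python sketch; rows_to_maps is a hypothetical helper,
not part of any proposal:

```python
def rows_to_maps(properties, rows):
    """Pair each positional row with the requested property names."""
    maps = []
    for row in rows:
        if len(row) != len(properties):
            raise ValueError("row length does not match requested properties")
        maps.append(dict(zip(properties, row)))
    return maps


properties = ["uri", "dc:title", "mime"]
rows = [
    ["file:///home/mikkel/delta_comp.pdf", "Delta Complexes", "application/pdf"],
    ["file:///home/mikkel/summa.svg", "Summa Logo", "image/svg+xml"],
]
hits = rows_to_maps(properties, rows)
# hits[1]["dc:title"] is "Summa Logo"
```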

The reason why I'm hesitating to go for this solution is the live api. It
would be really nice to be able to use the same data structures here. The
live api however has a need to be able to tell the consumer that *this
particular hit* has become invalid.

A way around this could be to always have the first element in the response
list be a unique hit identifier. Or the last element for that matter - this
way the returned properties would have the same indices as the requested
properties.

We could ease up on the global-identifier thing, and just let the identifier
be relative to the given query handle.
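
A sketch of what the "identifier as last element" variant might look like on
the consumer side, in Python. The names (split_row, LiveResults) and the use
of small integers as hit identifiers are assumptions on my part; the only
thing taken from the text above is that the identifier comes last and is
only meaningful within one query handle.

```python
def split_row(properties, row):
    """Split a row into (property map, hit_id); the hit id is the last element."""
    if len(row) != len(properties) + 1:
        raise ValueError("expected one value per property plus a hit id")
    *values, hit_id = row
    return dict(zip(properties, values)), hit_id


class LiveResults:
    """Hits for a single query handle, keyed by query-relative hit id."""

    def __init__(self, properties):
        self.properties = properties
        self.hits = {}  # hit_id -> {property name: value}

    def add_rows(self, rows):
        for row in rows:
            props, hit_id = split_row(self.properties, row)
            self.hits[hit_id] = props

    def invalidate(self, hit_id):
        """Drop a hit the live api reported as invalid; True if it was known."""
        return self.hits.pop(hit_id, None) is not None
```

Because the identifier is last, row[i] still corresponds to properties[i]
for every requested property.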

- Using URI as key: as previously stated I think that this is a bad idea.


+1

- Accessing Snippets individually: no need for GetSnippets(), use:
  GetHitProperties(query_handle, offset, 1, ["Snippet"])


As far as I can tell, this is the general consensus...
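
For clients that still want a snippet helper, it can be a thin wrapper on
their side. A small Python sketch, assuming the positional row format
discussed above; get_snippet and the get_hit_properties callable are
hypothetical stand-ins for whatever binding the client uses:

```python
def get_snippet(get_hit_properties, query_handle, offset):
    """Fetch the snippet of a single hit via GetHitProperties with limit 1."""
    rows = get_hit_properties(query_handle, offset, 1, ["Snippet"])
    if not rows:
        return None  # offset is past the end of the result list
    return rows[0][0]  # one row, one requested property
```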

Cheers,
Mikkel

PS: Be sure to check out the query language proposal at
http://wiki.freedesktop.org/wiki/WasabiQueryLanguage
_______________________________________________
xdg mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xdg