Sherman Monroe wrote:
Daniel,
I see some interesting concepts worth exploring here, e.g. using
windows (with paging inside the window). But as I refine my query,
there isn't any apparent context that orients me in the data. E.g. how
does one box/set relate to the others.
The dependency between the boxes is recorded, but exposing it to the
user in a simple way is not a trivial matter. Each box (set)
is really dependent on a chain of previous operations, so in general it
may be a very long list of function compositions.
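To make the point about chains of operations concrete, here is a minimal Python sketch. It is not Explorator's actual code: the Box class, its fields, and the helper functions are hypothetical names chosen for illustration. Each box keeps the operation and the parent boxes that produced it, so its full history is a nested expression that grows with every step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A result set plus the operation and parent boxes that produced it (hypothetical model)."""
    name: str
    items: frozenset
    op: str = "search"
    parents: tuple = ()

    def lineage(self) -> str:
        """Render the chain of operations behind this box as a nested expression."""
        if not self.parents:
            return f"{self.op}({self.name})"
        return f"{self.op}(" + ", ".join(p.lineage() for p in self.parents) + ")"


def intersect(a: Box, b: Box, name: str) -> Box:
    # The new box remembers both operands as its parents.
    return Box(name, a.items & b.items, "intersect", (a, b))


def difference(a: Box, b: Box, name: str) -> Box:
    return Box(name, a.items - b.items, "difference", (a, b))


# Placeholder contents; real boxes would hold resources returned by queries.
a = Box("A", frozenset({"d1", "d2", "d3"}))
b = Box("B", frozenset({"d2", "d3", "d4"}))
c = intersect(a, b, "C")
d = difference(a, c, "D")
print(d.lineage())
```

Even after two operations the expression is already nested, which gives a feel for why a long exploration session is hard to summarize on screen.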
I think the biggest contribution is not so much the interface aspects
that you refer to, but the way you can form the various sets (boxes)
through various operations - the SPO, which allows you to do arbitrary
matches for <s,p,o> triples, plus union/intersection/difference, plus
de-referencing, plus faceted interface on either an arbitrary set of
chosen properties (and applied to any set) or automatically generated
facets.
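As a rough illustration of what an arbitrary <s,p,o> match means, here is a small Python sketch over an in-memory set of triples. The match function, the ex: identifiers, and the sample triples are all made up for this example; the real system sends such patterns to a SPARQL engine rather than filtering in memory.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# Hypothetical data, not taken from any real repository.
triples = {
    ("ex:drug1", "ex:category", "ex:Hypoglycemic"),
    ("ex:drug1", "ex:foodInteraction", "Avoid alcohol"),
    ("ex:drug2", "ex:category", "ex:Hypoglycemic"),
}

# Fix P and O, leave S open: "which subjects are in this category?"
hypoglycemics = {s for s, _, _ in match(triples, p="ex:category", o="ex:Hypoglycemic")}
print(sorted(hypoglycemics))
```

The resulting set of subjects can then be fed into union, intersection, or difference like any other set.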
Here is a simple interesting scenario:
Find a drug for hypoglycemia that can be prescribed to a known alcohol
abuser.
Click on menu->repositories, add drugbank sparql endpoint
(http://www4.wiwiss.fu-berlin.de/drugbank/sparql) limit 50 (sometimes
we've been getting timeouts; just try again and eventually it works. We
have a locally loaded version of these repositories, but we haven't
finished building the index for the full text search yet; we are still
figuring out how to build this index in Virtuoso).
search for hypoglycemic (call it Set A)
search for avoid alcohol (call it Set B)
click on A, click on the intersection symbol, click on set B, click on
"=". (call it set C).
Click on A, click on S, click on "-".
You've computed the set of drugs associated with hypoglycemic (A),
intersected it with the set of drugs which should not be taken with
alcohol (B) to get C, and then computed the difference between A and C.
What remains is the set of hypoglycemic drugs that may be taken with
alcohol.
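The same set algebra can be written out in a few lines of Python. The drug names below are placeholders, not real search results; only the operations mirror the steps above.

```python
# Placeholder members; the real sets come from the two full text searches.
A = {"drug_1", "drug_2", "drug_3"}  # hits for "hypoglycemic"
B = {"drug_2", "drug_4"}            # hits for "avoid alcohol"

C = A & B        # hypoglycemic drugs that should not be taken with alcohol
result = A - C   # hypoglycemic drugs with no alcohol warning

print(sorted(result))
```

Note that A - (A & B) is always the same set as A - B, so the intermediate set C is mainly useful when you also want to inspect the excluded drugs.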
If you extend the scenario a bit, you can repeat the same
reasoning for "antidepressant", to get the set of drugs which are
antidepressants and may be taken with alcohol.
Going further (but here I don't have the medical knowledge to
formulate it properly), I could try to determine which diabetes and
antidepressant drugs could be prescribed together (I'd need to determine
dangerous interactions between candidates obtained in the previous steps).
and so on...
I notice you're using Sesame, do you think it can scale? I tried
selecting several repositories at once, but the system seems to hang
awhile (couple of minutes) before returning results.
We use both Sesame (through its Java interface) and Virtuoso (regular
http SPARQL interface), depending on the size of the dataset (e.g.,
dbpedia is on Virtuoso). You may have also realized you can add any
arbitrary external endpoint as well.
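For readers unfamiliar with the plain http SPARQL interface: in the SPARQL Protocol a query can be sent as a GET request with the query text in the "query" parameter. The Python sketch below only builds such a URL for the drugbank endpoint mentioned above and does not send it. The helper name sparql_get_url and the catch-all query are illustrative assumptions, not the queries Explorator actually issues.

```python
from urllib.parse import urlencode

ENDPOINT = "http://www4.wiwiss.fu-berlin.de/drugbank/sparql"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL; the query travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Illustrative query only: any 50 triples from the repository.
url = sparql_get_url(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 50")
print(url)
```

Pasting a URL like this into a browser or an http client is a quick way to check how long the endpoint itself takes to answer, independently of any interface built on top of it.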
The problems you report are not really due to Explorator, but rather
to the engines themselves, and the particular repositories. If you try
to issue the same queries (notice there are many queries necessary to
present the information in the form it appears on the screen), you will
see they also take a while to respond. In fact, we'd be very interested
in seeing how to optimize such queries. Samur, my former student, will
elaborate on this in a separate message, for those interested.
(we might take this offline if it becomes too specific, although I feel
the problems we face are the same anyone who wishes to build "user
friendly" interfaces to RDF data would face...)
Cheers
D