(My answer does not help very much, but anyway :)

I am not sure a spreadsheet view works well for deeply nested data
structures (you end up with too many columns).

I would use a CONSTRUCT to retrieve a graph of
patient-findings-findingType, then do some post-processing to build a
facetable data structure.
See this example:
http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html
which is built from this JSON:
http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/data.json#
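
To make the idea a bit more concrete, here is a rough sketch of what I
mean, not a tested solution for your store. The CONSTRUCT query is only
an illustration built from the names in your message (prefix
declarations omitted, and it is not executed here). The function
build_facets, the (subject, predicate, object) string tuples, the sample
rdfs:subClassOf triples and the "at most one direct superclass per
class" simplification are all my assumptions.

```python
from collections import defaultdict

# Illustrative CONSTRUCT query (assumption: one level of rdfs:subClassOf is
# enough; deeper hierarchies would need the class tree fetched separately).
CONSTRUCT_QUERY = """
CONSTRUCT {
  ?pat ec:Has_Finding ?finding .
  ?finding a ?findingType .
  ?findingType rdfs:subClassOf ?super .
}
WHERE {
  ?pat ec:Has_Finding ?finding .
  ?finding a ?findingType .
  OPTIONAL { ?findingType rdfs:subClassOf ?super . }
}
"""


def build_facets(triples, categories):
    """Group each patient's finding types under the categories of interest.

    triples: iterable of (subject, predicate, object) strings.
    categories: set of class names that should become facets/columns.
    """
    parent = {}                    # class -> direct superclass (assumed unique)
    finding_type = {}              # finding -> class
    findings = defaultdict(list)   # patient -> [finding, ...]
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parent[s] = o
        elif p == "rdf:type":
            finding_type[s] = o
        elif p == "ec:Has_Finding":
            findings[s].append(o)

    def category_of(cls):
        # Walk up the class hierarchy until a category of interest is found.
        seen = set()
        while cls is not None and cls not in seen:
            if cls in categories:
                return cls
            seen.add(cls)
            cls = parent.get(cls)
        return None

    facets = {}
    for patient, items in findings.items():
        row = defaultdict(list)
        for finding in items:
            cls = finding_type.get(finding)
            cat = category_of(cls)
            if cat is not None:
                row[cat].append(cls)
        facets[patient] = dict(row)
    return facets


# Sample data taken from the message below; the subClassOf triples are assumed.
SAMPLE_TRIPLES = [
    ("patient1", "ec:Has_Finding", "Dyspnea1"),
    ("Dyspnea1", "rdf:type", "nci:Dyspnea_Score_2"),
    ("patient1", "ec:Has_Finding", "Dysphagia1"),
    ("Dysphagia1", "rdf:type", "nci:Dysphagia_Score_1"),
    ("patient1", "ec:Has_Finding", "Dyspnea2"),
    ("Dyspnea2", "rdf:type", "nci:Dyspnea_Score_3"),
    ("patient2", "ec:Has_Finding", "Dyspnea3"),
    ("Dyspnea3", "rdf:type", "nci:Dyspnea_Score_2"),
    ("patient2", "ec:Has_Finding", "Hypertension1"),
    ("Hypertension1", "rdf:type", "nci:Hypertension"),
    ("nci:Dyspnea_Score_2", "rdfs:subClassOf", "nci:Dyspnea_Score"),
    ("nci:Dyspnea_Score_3", "rdfs:subClassOf", "nci:Dyspnea_Score"),
    ("nci:Dysphagia_Score_1", "rdfs:subClassOf", "nci:Dysphagia_Score"),
]
CATEGORIES = {"nci:Dyspnea_Score", "nci:Dysphagia_Score", "nci:Hypertension"}

facets = build_facets(SAMPLE_TRIPLES, CATEGORIES)
```

The point of doing it this way is that the store only answers one flat
query, and the split into "columns" happens in your own code, where a
patient with two Dyspnea findings simply gets a list with two entries
instead of extra result rows.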



On Thu, Jul 25, 2013 at 11:07 AM, <[email protected]> wrote:

> Hi,
>
> I am running into some performance issues and was wondering if I was
> approaching the problem from the correct angle or if there is something
> more efficient.
>
> I have a property "Has_Finding" which is used to assert things about
> patients. There are different kinds of findings, e.g. Dyspnea, Dysphagia,
> Death, Hypertension ... So my data looks like this:
>
> patient1 ec:Has_Finding Dyspnea1 . Dyspnea1 a nci:Dyspnea_Score_2 .
> patient1 ec:Has_Finding Dysphagia1 . Dysphagia1 a nci:Dysphagia_Score_1 .
> patient1 ec:Has_Finding Dyspnea2 . Dyspnea2 a nci:Dyspnea_Score_3 .
> patient2 ec:Has_Finding Dyspnea3 . Dyspnea3 a nci:Dyspnea_Score_2 .
> patient2 ec:Has_Finding Hypertension1 . Hypertension1 a nci:Hypertension .
> etc.
>
> My users want to know about findings. I am offering a GUI-based query tool
> and am generating the Sparql queries automatically. The users are unaware
> of Sparql or anything like that.
>
> So I can easily get all the findings with the simple sparql query:
>
> SELECT ?pat ?findingType
> WHERE { ?pat ec:Has_Finding ?finding . ?finding a ?findingType. }
>
> The problem is that the user has to figure out what finding a particular
> result row is talking about by looking at the value of ?findingType and by
> string comparison (or something unreliable like that) find out that this is
> actually representing a Dyspnea Score. Or a Dysphagia Score etc. before
> he/she can work with the value itself. But the next row could already be
> representing something else. This way of analyzing the result requires that
> the user has some semantic knowledge about the data.
>
> So I would like to make the query return the different types of findings
> as "columns" instead of "rows".
>
> SELECT ?pat ?dyspneaType ?dysphagiaType ?hypertType
> WHERE {
>   ?pat a ec:Patient .
>   OPTIONAL { ?pat ec:Has_Finding ?dyspnea . ?dyspnea a ?dyspneaType .
> ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . }
>   OPTIONAL { ?pat ec:Has_Finding ?dysphagia . ?dysphagia a ?dysphagiaType
> . ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . }
>   OPTIONAL { ?pat ec:Has_Finding ?hypert . ?hypert a ?hypertType .
> ?hypertType rdfs:subClassOf* nci:Hypertension . }
> }
>
> Of course there are more than just 3 different finding types. When
> executing these queries, I noticed that the more of these rows I am adding,
> the longer the execution time gets. Which is expected, but from a certain
> point it seems to increase by a factor of 2 or more.
>
> Asking for 1 or 2 types of findings takes 5 seconds; adding a 3rd
> takes 10 seconds. After 5 we are up to 40 seconds. At 10 we are at 2-3
> minutes and around 15 it takes so long that it is not feasible anymore. I
> am currently only testing on a fraction of my data (about 1000 patients).
> This query is a little simplified since there are other properties that I
> use, e.g. Has_Id, Has_Date_Observed. But these are all datatype properties
> pointing to literals so I left them out for sake of simplicity.
>
> My questions are:
> 1. Is that a common/expected query?
> 2. Is there a different way of achieving this without making the query
> analyze these rdfs:subClassOf* triples?
> 3. Should I just use the first, general query and then do some
> post-processing in my code to split it up into columns before passing it to
> the user?
>
> Thanks for any help!
> Wolfgang
>
>
