(My answer does not help very much, but anyway :) I am not sure the spreadsheet view is ok for very nested data structures (that makes too many columns in the end).
I would use a CONSTRCUT to retrieve a graph of patient-findings-findingType, then do some post-processing to build a facetable data structure. cf this example: http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html coming from that JSON: http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/data.json# On Thu, Jul 25, 2013 at 11:07 AM, <[email protected]> wrote: > Hi, > > I am running into some performance issues and was wondering if I was > approaching the problem from the correct angle or if there is something > more efficient. > > I have a property "Has_Finding" which is used to assert things about > patients. There are different kinds of findings, e.g. Dyspnea, Dysphagia, > Death, Hypertension ... So my data looks this: > > patient1 ec:Has_Finding Dyspnea1 . Dyspnea1 a nci:Dyspnea_Score_2 . > patient1 ec:Has_Finding Dysphagia1 . Dysphagia1 a nci:Dysphagia_Score_1. > patient1 ec:Has_Finding Dyspnea2 . Dyspnea2 a nci:Dyspnea_Score_3. > patient2 ec:Has_Finding Dyspnea3. Dyspnea3 a nci:Dyspnea_Score_2. > patient2 ec:Has_Finding Hypertension1 . Hypertension1 a nci:Hypertension . > etc. > > My users want to know about findings. I am offering a GUI-based query tool > and am generating the Sparql queries automatically. The users are unaware > of Sparql or anything like that. > > So I can easily get all the findings with the simple sparql query: > > SELECT ?pat ?findingType > WHERE { ?pat ec:Has_Finding ?finding . ?finding a ?findingType. } > > The problem is that the user has to figure out what finding a particular > result row is talking about by looking at the value of ?findingType and by > string comparison (or something unreliable like that) find out that this is > actually representing a Dyspnea Score. Or a Dysphagia Score etc. before > he/she can work with the value itself. But the next row could already be > representing something else. This way of analyzing the result requires that > the user has some semantic knowledge about the data. > > So I would like to make the query return the different types of findings > as "columns" instead of "rows". > > SELECT ?pat ?dypsneaType ?dysphagiaType ?hypertType > WHERE { > ?pat a ec:Patient . > OPTIONAL { ?pat ec:Has_Finding ?dyspnea . ?dyspnea a ?dyspneaType . > ?dyspneaType rdfs:subClassOf* nci:Dyspnea_Score . } > OPTIONAL { ?pat ec:Has_Finding ?dysphagia . ?dysphagia a ?dysphagiaType > . ?dysphagiaType rdfs:subClassOf* nci:Dysphagia_Score . } > OPTIONAL { ?pat ec:Has_Finding ?hypert . ?hypert a ?hyertType . > ?hypertType rdfs:subClassOf* nci:Hypertension . } > } > > Of course there are more than just 3 different finding types. When > executing these queries, I noticed that the more of these rows I am adding, > the longer the execution time gets. Which is expected, but from a certain > point it seems to increase by a factor of 2 or more. > > While asking for 1 or 2 types of findings takes 5 seconds. Adding a 3rd > takes 10 seconds. After 5 we are up to 40 seconds. At 10 we are at 2-3 > minutes and around 15 it takes so long that it is not feasible anymore. I > am currently only testing on a fraction of my data (about 1000 patients). > This query is a little simplified since there are other properties that I > use, e.g. Has_Id, Has_Date_Observed. But these are all datatype properties > pointing to literals so I left them out for sake of simplicity. > > My questions are: > 1. Is that a common/expected query? > 2. Is there a different way of achieving this without making the query > analyze these rdfs:subClassOf* triples. > 3. Should I just use the first, general query and then do some > post-processing in my code to split it up into columns before passing it to > the user? > > Thanks for any help! > Wolfgang > >
