On 30/01/2021 00:44, Jeffrey Kenneth Tyzzer wrote:
I have a query for academic articles that looks like this:
--Query 1--
CONSTRUCT {
  ?publication a ?citeType ;
    cite:title ?publicationTitle ;
    cite:issn ?issn ;
    cite:eissn ?eissn ;
    cite:container-title ?journalTitle ;
    cite:author [ cite:rank ?authorRank ; cite:family ?authorLastName ; cite:given ?authorFirstName ]
}
WHERE {
  GRAPH <http://test/pubs/> {
    BIND(<http://test/1483451> AS ?publication)
    VALUES (?citeType ?citeText ?cslType) {
      (bibo:AcademicArticle "journal-article" "article-journal")
      (bibo:Book "book" "book")
      (vivo:ConferencePaper "conference" "paper-conference")
      (bibo:Chapter "chapter" "chapter")
    }
    ?publication a ?citeType ;
      rdfs:label ?publicationTitle ;
      vivo:relatedBy [ a vivo:Authorship ; vivo:rank ?authorRank ;
        vivo:relates [ a vcard:Individual ;
          vcard:hasName [ a vcard:Name ; vcard:last_name ?authorLastName ; vcard:first_name ?authorFirstName ] ] ]
    OPTIONAL {
      ?publication vivo:hasPublicationVenue ?journal .
      ?journal bibo:eissn ?eissn ;
        bibo:issn ?issn ;
        rdfs:label ?journalTitle ;
    }
  }
}
--End query 1--
What’s important to note is that there are four authors of the article and two ISSNs and
titles for its journal (it switched names a few decades ago but the eISSN didn’t change). As
you can see, the authors are retrieved and CONSTRUCTed using blank nodes (the data model
incidentally is Article <--> Authorship <--> Individual --> Name).
The problem I’m having is that, because there are two ISSNs and two titles,
the CONSTRUCT is returning 16 author triples (4 authors x 2 ISSNs x 2 titles),
i.e.:
--Query 1 output--
<http://test/1483451>
a bibo:AcademicArticle ;
cite:author [ cite:family "Rafter" ;
cite:given "J" ;
cite:rank 1
] ;
cite:author [ cite:family "Benham" ;
cite:given "K" ;
cite:rank 3
] ;
cite:author [ cite:family "Mastro" ;
cite:given "RHR" ;
cite:rank 4
] ;
cite:author [ cite:family "Andersen" ;
cite:given "D" ;
cite:rank 2
] ;
cite:author [ cite:family "Mastro" ;
cite:given "RHR" ;
cite:rank 4
] ;
cite:author [ cite:family "Andersen" ;
cite:given "D" ;
cite:rank 2
] ;
cite:author [ cite:family "Andersen" ;
cite:given "D" ;
cite:rank 2
] ;
cite:author [ cite:family "Mastro" ;
cite:given "RHR" ;
cite:rank 4
] ;
cite:author [ cite:family " Benham " ;
cite:given "K" ;
cite:rank 3
] ;
cite:author [ cite:family "Rafter" ;
cite:given "J" ;
cite:rank 1
] ;
cite:author [ cite:family " Benham " ;
cite:given "K" ;
cite:rank 3
] ;
cite:author [ cite:family "Andersen" ;
cite:given "D" ;
cite:rank 2
] ;
cite:author [ cite:family "Mastro" ;
cite:given "RHR" ;
cite:rank 4
] ;
cite:author [ cite:family "Rafter" ;
cite:given "J" ;
cite:rank 1
] ;
cite:author [ cite:family " Benham " ;
cite:given "K" ;
cite:rank 3
] ;
cite:author [ cite:family "Rafter" ;
cite:given "J" ;
cite:rank 1
] ;
cite:container-title "Journal of the American Ceramic Society" , "Advanced Ceramic Materials" ;
cite:eissn "1551-2916" ;
cite:issn "0883-5551" , "0002-7820" ;
cite:title "Synthesis and sintering behavior of spinel nanoparticles" .
--End query 1 output--
If I comment out the bibo:issn ?issn and rdfs:label ?journalTitle patterns in
the WHERE clause, or if I don’t use the [ cite:rank ?authorRank ; cite:family
?authorLastName ; cite:given ?authorFirstName ] structure in the CONSTRUCT, I
get what I expect:
--Query 1a output--
<http://test/1483451>
a bibo:AcademicArticle ;
cite:author [ cite:family "Andersen" ;
cite:given "D" ;
cite:rank 2
] ;
cite:author [ cite:family " Benham " ;
cite:given "K" ;
cite:rank 3
] ;
cite:author [ cite:family "Mastro" ;
cite:given "RHR" ;
cite:rank 4
] ;
cite:author [ cite:family "Rafter" ;
cite:given "J" ;
cite:rank 1
] ;
cite:eissn "1551-2916" ;
cite:title "Synthesis and sintering behavior of spinel nanoparticles" .
--End query 1a output--
If I switch the CONSTRUCT to a SELECT I see (and would expect) 16 rows, but I was
not anticipating the CONSTRUCT to behave like this (i.e., to express such a
product of the triples). Can one of you kindly explain what’s going on under
the covers and whether there’s a remedy for this behavior?
CONSTRUCT works like this:
+ execute WHERE as a SELECT *
+ result model = empty graph
+ feed the SELECT rows one at a time into the template to produce RDF
+ add each template instantiation to the result model
+ return result model
Your query has ?citeText in these rows, and you have a blank node in the template.
Each time the template is used, you get a fresh blank node.
16 rows, 16 blank nodes, 16 unique "cite:author [ ... ]" property-values.
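The steps above can be imitated in a few lines of plain Python. This is only a rough sketch of the idea, not how any SPARQL engine is actually implemented: the `rows` list is a hand-built stand-in for the WHERE solutions (4 authors x 2 ISSNs x 2 titles), the `construct` function and the `_:bN` labels are invented for illustration, and only part of the template is modelled.

```python
from itertools import count, product

# Hypothetical stand-in for the WHERE solutions: 4 authors x 2 ISSNs x 2 titles.
authors = [(1, "Rafter", "J"), (2, "Andersen", "D"), (3, "Benham", "K"), (4, "Mastro", "RHR")]
issns = ["0883-5551", "0002-7820"]
titles = ["Journal of the American Ceramic Society", "Advanced Ceramic Materials"]

rows = [
    {"rank": r, "family": f, "given": g, "issn": i, "journalTitle": t}
    for (r, f, g), i, t in product(authors, issns, titles)
]

def construct(rows):
    """Instantiate the template once per row and add the triples to a set."""
    graph = set()      # result model = empty graph
    fresh = count()    # source of fresh blank node labels
    pub = "<http://test/1483451>"
    for row in rows:
        b = f"_:b{next(fresh)}"  # the template's blank node is new for every row
        graph.add((pub, "cite:issn", row["issn"]))
        graph.add((pub, "cite:container-title", row["journalTitle"]))
        graph.add((pub, "cite:author", b))
        graph.add((b, "cite:rank", row["rank"]))
        graph.add((b, "cite:family", row["family"]))
        graph.add((b, "cite:given", row["given"]))
    return graph

graph = construct(rows)
author_nodes = {o for (_, p, o) in graph if p == "cite:author"}
issn_values = {o for (_, p, o) in graph if p == "cite:issn"}
print(len(rows), len(author_nodes), len(issn_values))  # 16 16 2
```

The ground triples (ISSN, container title) collapse because a graph is a set of triples, so the same triple added twice is stored once. The author triples do not collapse, because each one points at a different blank node.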
If the WHERE is
WHERE {
SELECT DISTINCT <only variables used in the template>
WHERE {
...
}
}
and specifically not ?citeText, you will get less duplication.
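As a rough illustration of what the SELECT DISTINCT projection does to the rows, here is a plain-Python sketch. It makes several assumptions: the rows are invented, `extra` is a hypothetical variable that is bound in WHERE but never used in the template, and `select_distinct` is a made-up helper, not a library call.

```python
from itertools import product

# Variables that appear in the (simplified) template.
TEMPLATE_VARS = ("rank", "issn", "journalTitle")

def select_distinct(rows, variables):
    """Project each row onto `variables` and drop duplicate rows, keeping order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[v] for v in variables)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(variables, key)))
    return out

# Hypothetical solutions: 4 author ranks x 2 ISSNs x 2 titles, each repeated for
# two values of "extra", an invented variable that the template never uses.
rows = [
    {"rank": r, "issn": i, "journalTitle": t, "extra": e}
    for r, i, t, e in product(range(1, 5), ["issn-a", "issn-b"], ["title-a", "title-b"], ["x", "y"])
]

projected = select_distinct(rows, TEMPLATE_VARS)
print(len(rows), len(projected))  # 32 16
```

In this sketch the projection removes the repetition caused by the unused variable. Rows that differ only in variables the template does use (here the ISSN and the journal title) remain distinct, so each of them still produces its own blank node.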
Andy
PS Could you make the query more readable and also include the prefixes
so the reader can read it in the email or take it and parse it locally.
Thanks.
Thanks much.
--Jeff