Hi everyone,

I'm one of the developers behind Freesound (https://freesound.org), a sound sharing website. We use Solr as our search engine, and I'm currently experimenting with a new feature that I'd like to implement with it.

In summary, we have a Solr index with one document per sound in our database, and we do standard search operations on it. I'd now like to add child documents to each of the main documents, holding specific information about the sound at different points in time. For example, a main document has basic properties like the sound title and tags, and then N child documents that each have a timestamp field and some extra information associated with that timestamp. Here is a simplified example of a document that could be indexed (in practice my child documents would also include dense vector fields):
[
  {
    "ID": "1",
    "title": "Recording of a street ambience",
    "tags": ["urban", "ambience", "dogs", "birds"],
    "duration": "1:21",
    "events": [
      { "ID": "1/events#0", "timestamp": "0:23", "event_description": "Dog barking" },
      { "ID": "1/events#1", "timestamp": "0:47", "event_description": "Bird calls" },
      { "ID": "1/events#2", "timestamp": "1:05", "event_description": "Dog barking" },
      ...
    ]
  },
  ...
]

What I want to achieve is a query that matches child documents and sorts them by some score, while faceting on parent document fields. For example, I want to get all documents in which a "Dog barking" event happens (and if a document has 2 such events, as in the example, I want it returned 2 times), sorted by the score of the child document, and with faceting data for, e.g., the "duration" field (which belongs to the parent document).

One solution would be to duplicate all the parent document fields in every child document at index time. This would work, but I would end up with a lot of redundant information in the index. What I think would work best is a way to extend the child documents with the fields of the parent at query time, so that I could specify the field list with something like "fl=timestamp,event_description,__parent__.duration". Is that possible?

I also tried other approaches that might work, such as the parent query parser, which returns parent documents whose child documents match some criteria. However, this does not tell me which of the child documents matched the query, and it does not sort the results as expected because the score is not propagated to the parent document.

That is all, thanks a lot for the support!

Cheers,
frederic

--
Frederic Font - ffont.github.io
Music Technology Group, UPF - mtg.upf.edu <https://www.upf.edu/web/mtg/>
Freesound - freesound.org
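
P.S. To make the index-time duplication workaround mentioned above concrete, here is a minimal Python sketch of what I mean. It only reshapes the example document into flat per-event documents and does not talk to Solr at all. The helper name `denormalize` and the `parent_` field prefix are made up for illustration, and the time values are kept as plain strings as in the simplified example.

```python
from typing import Any


def denormalize(parent: dict[str, Any], child_key: str = "events",
                prefix: str = "parent_") -> list[dict[str, Any]]:
    """Return one flat document per child, each carrying a copy of the parent fields.

    Hypothetical helper: the name and the `parent_` prefix are assumptions,
    not anything defined by Solr.
    """
    # Parent fields to copy: everything except the nested children and the ID,
    # which is handled separately below.
    parent_fields = {
        f"{prefix}{name}": value
        for name, value in parent.items()
        if name not in (child_key, "ID")
    }
    flat_docs = []
    for child in parent.get(child_key, []):
        doc = dict(child)                   # child fields: ID, timestamp, event_description
        doc[f"{prefix}ID"] = parent["ID"]   # pointer back to the parent sound
        doc.update(parent_fields)           # the redundant copy of the parent fields
        flat_docs.append(doc)
    return flat_docs


sound = {
    "ID": "1",
    "title": "Recording of a street ambience",
    "tags": ["urban", "ambience", "dogs", "birds"],
    "duration": "1:21",
    "events": [
        {"ID": "1/events#0", "timestamp": "0:23", "event_description": "Dog barking"},
        {"ID": "1/events#1", "timestamp": "0:47", "event_description": "Bird calls"},
        {"ID": "1/events#2", "timestamp": "1:05", "event_description": "Dog barking"},
    ],
}

flat_docs = denormalize(sound)
for doc in flat_docs:
    print(doc["ID"], doc["event_description"], doc["parent_duration"])
```

With documents shaped like this, a plain query on `event_description` could facet on `parent_duration` directly, but every event repeats the title, tags and duration of its sound, which is exactly the redundancy I would like to avoid.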