Hi everyone,

I'm one of the developers behind Freesound (https://freesound.org), a
sound sharing website. We use Solr as our search engine, and I'm currently
experimenting with a new feature that I'd like to implement with it. In
summary, we have a Solr index with one document per sound in our database,
and we run standard search operations against it.
However, I'd like to add child documents to each of the main documents
that hold specific information about the sound at different points in
time. For example, a main document has basic properties like sound title
and tags, and then N child documents that each have a timestamp field and
some extra information associated with that timestamp. Here is a
simplified example of a document that could be indexed (normally my child
documents would also include dense vector fields):

[
  {
    "ID": "1",
    "title": "Recording of a street ambience",
    "tags": ["urban", "ambience", "dogs", "birds"],
    "duration": "1:21",
    "events": [{
        "ID": "1/events#0",
        "timestamp": "0:23",
        "event_description": "Dog barking"
      },{
        "ID": "1/events#1",
        "timestamp": "0:47",
        "event_description": "Bird calls"
      },{
        "ID": "1/events#2",
        "timestamp": "1:05",
        "event_description": "Dog barking"
      },
      ...
    ]
  },
  ...
]

What I want to achieve is a query that matches child documents and sorts
them by their score, while faceting on parent document fields. For
example, I want to get all documents in which a "Dog barking" event
happens (if a document has 2 such events, as in the example, I want it
returned 2 times), sorted by the score of the matching child document,
and with faceting data for a parent-level field such as "duration".
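
To make the desired behaviour concrete, here is a small in-memory Python
sketch. It is not Solr code, only an illustration of the result I am after.
The scores are made-up placeholders, durations and timestamps are written in
seconds, and counting facets once per matching child (rather than once per
parent) is an assumption.

```python
from collections import Counter

# Hypothetical stand-in for the index: one parent document with nested events.
# Scores are invented for illustration; times are in seconds.
sounds = [
    {
        "ID": "1",
        "title": "Recording of a street ambience",
        "duration": 81,
        "events": [
            {"ID": "1/events#0", "timestamp": 23,
             "event_description": "Dog barking", "score": 0.7},
            {"ID": "1/events#1", "timestamp": 47,
             "event_description": "Bird calls", "score": 0.9},
            {"ID": "1/events#2", "timestamp": 65,
             "event_description": "Dog barking", "score": 0.8},
        ],
    },
]


def search_events(sounds, description):
    """Match child documents, attach the parent's duration, sort by child score."""
    hits = []
    for parent in sounds:
        for event in parent["events"]:
            if event["event_description"] == description:
                # One hit per matching child, extended with a parent field.
                hits.append({**event, "__parent__.duration": parent["duration"]})
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    # Facet on the parent field, counted once per matching child.
    facets = Counter(hit["__parent__.duration"] for hit in hits)
    return hits, facets


hits, duration_facets = search_events(sounds, "Dog barking")
```

Here the same sound shows up twice in `hits`, once per matching event, and
`duration_facets` is computed from the parent's field.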

One solution would be to duplicate all the parent document fields in every
child document at index time. This would work, but it would put a lot of
redundant information in the index.
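
As a rough sketch of that duplication step, assuming it is done in Python
before sending documents to Solr: the function name `denormalize` and the
`parent_` field prefix are hypothetical choices, and times are again written
in seconds.

```python
def denormalize(parent, parent_fields=("title", "tags", "duration")):
    """Flatten one nested document into child documents that carry copies
    of the selected parent fields (hypothetical "parent_" prefix)."""
    flat = []
    for event in parent.get("events", []):
        doc = dict(event)  # copy so the original child is left untouched
        for field in parent_fields:
            doc["parent_" + field] = parent[field]
        flat.append(doc)
    return flat


sound = {
    "ID": "1",
    "title": "Recording of a street ambience",
    "tags": ["urban", "ambience", "dogs", "birds"],
    "duration": 81,
    "events": [
        {"ID": "1/events#0", "timestamp": 23, "event_description": "Dog barking"},
        {"ID": "1/events#1", "timestamp": 47, "event_description": "Bird calls"},
        {"ID": "1/events#2", "timestamp": 65, "event_description": "Dog barking"},
    ],
}

flat_docs = denormalize(sound)
```

Every child ends up with its own copy of the title, tags and duration, which
is exactly the redundancy I would like to avoid.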

What I think would work best is a way to extend the child documents with
the fields of their parent at query time. I'd like to specify the field
list with something like
"fl=timestamp,event_description,__parent__.duration". Is that possible?

I also tried other approaches, such as the parent query parser, which
returns parent documents whose child documents match some criteria. This
has two problems: it does not tell me which of the child documents matched
the query, and it does not sort the results as expected, because the score
is not propagated to the parent document.
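
For reference, the kind of request I mean looks roughly like the sketch
below, which only builds the query string and does not contact Solr. The
helper name `parent_query_params` is hypothetical, and the "all parents"
filter assumes a schema that uses the `_nest_path_` field for nested
documents, which may not match every setup.

```python
from urllib.parse import parse_qs, urlencode


def parent_query_params(child_query, all_parents="*:* -_nest_path_:*"):
    """Build request parameters for a {!parent} block join query.

    `all_parents` must match every parent document and no child document;
    the default assumes nested documents indexed with `_nest_path_`.
    """
    return {
        "q": '{!parent which="%s"}%s' % (all_parents, child_query),
        "fl": "ID,title,duration",
        "facet": "true",
        "facet.field": "duration",
    }


params = parent_query_params('event_description:"Dog barking"')
query_string = urlencode(params)  # URL-encoded form of the parameters
decoded = parse_qs(query_string)  # round trip, to inspect what is sent
```

Faceting on "duration" works with this request because the results are
parent documents, but each sound is returned only once no matter how many of
its events matched.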

That is all, thanks a lot for the support!

Cheers,

frederic





--
Frederic Font - ffont.github.io
Music Technology Group, UPF - mtg.upf.edu <https://www.upf.edu/web/mtg/>
Freesound - freesound.org
