Hi,

I'm using Solr 4 RC, and my documents look like this:

<doc>
  <id>123</id>
  <name>Folder name</name>
</doc>
<doc>
  <id>abc</id>
  <name>Document 1</name>
  <parentid>123</parentid>
</doc>
<doc>
  <id>def</id>
  <name>Document 2</name>
  <parentid>123</parentid>
</doc>

In this example, two documents are in the same folder. When querying
documents I also need the name of the folder each document is in, so when
"Document 2" above is part of the result set from Solr, I need the name of
its parent folder, which is "Folder name" in this case. For speed reasons
I'd like to get both the search results and the folder names in a single
query. Documents in a result set will normally be in different folders, but
several of them can of course be in the same folder. I'm using SolrCloud
with one shard, and the solution is scaled for 50+ million documents. I
will normally ask for 10-20 hits per query.

Possible solutions:
1. Grouping. But this seems to work only on fields within the documents in
the result set, not on parent documents.
2. Join. But this gives me either just the parent folders or just the
documents, not both.
3. Denormalize the data and store the folder name on each document. The
drawback is that all documents in a folder must be reindexed whenever the
folder name changes.
4. Multiple queries. This is of course possible, but I'd like to avoid it
for speed reasons.
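
To make options 2 and 4 concrete, here is a rough client-side sketch in
Python. It only builds query parameters and merges results; it does not
talk to Solr. The function names, the "foldername" key and the shape of the
hit dictionaries are my own assumptions for illustration, and the ids are
assumed to contain no quote characters.

```python
def parent_join_params(user_query, rows=20):
    """Option 2: Solr's join query parser returns only the parent folders
    of the documents matching user_query, not the documents themselves."""
    return {
        "q": "{!join from=parentid to=id}" + user_query,
        "fl": "id,name",
        "rows": rows,
    }


def folder_lookup_query(hits):
    """Option 4, second step: one query for the distinct parent folders
    of a page of hits. Returns None when no hit has a parentid."""
    parent_ids = sorted({h["parentid"] for h in hits if "parentid" in h})
    if not parent_ids:
        return None
    # Assumes ids contain no double quotes (hypothetical simplification).
    return "id:(" + " OR ".join('"%s"' % p for p in parent_ids) + ")"


def attach_folder_names(hits, folders):
    """Copy each hit and add the name of its parent folder (or None)."""
    names = {f["id"]: f["name"] for f in folders}
    return [dict(h, foldername=names.get(h.get("parentid"))) for h in hits]
```

With 10-20 hits per page the second query matches at most 10-20 folder ids,
so it should be cheap, but it is still a second round trip.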

What would be the best solution here? Is a single-query solution possible
at all, or is denormalization the way to go?

Any help on this would be greatly appreciated :-)


Best,
Stein Gran
