Hi,
Off the top of my head (it probably does not solve the problem completely, but it may trigger some brainstorming): index chapters and include page break tokens in the text. Use highlighting to return matches, and make sure the fragment size is large enough to include a page break token. In such a scenario you should use slop for phrase searches, since a page break token sits between the last word of one page and the first word of the next...
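A minimal Python sketch of this idea. It assumes a few things not stated above: a hypothetical marker format `PAGEBREAK<n>` placed at the start of page n, a hypothetical field named `text`, and an analyzer that keeps the marker as a single token. The sketch builds the chapter text, builds the query parameters (phrase slop of 1 so a phrase can still match across the extra marker position), and resolves each highlighted term in a returned fragment to a page:

```python
import re

# Hypothetical marker: "PAGEBREAK<n>" is placed at the start of page n.
MARKER = re.compile(r"PAGEBREAK(\d+)")
# Highlighted terms, assuming the highlighter's default <em>...</em> tags.
HIT = re.compile(r"<em>(.*?)</em>")


def build_chapter_text(pages, first_page=1):
    """Join a chapter's pages into one field value, prefixing every page
    with its marker so the stored text (and thus fragments) carry page numbers."""
    return " ".join(f"PAGEBREAK{n} {text}"
                    for n, text in enumerate(pages, start=first_page))


def search_params(phrase, slop=1, fragsize=500):
    """Solr request parameters for a highlighted phrase search.
    slop=1 lets a phrase match across a page break, because the marker
    occupies one token position between the two pages."""
    return {
        "q": f'text:"{phrase}"~{slop}',  # "text" is a hypothetical field name
        "hl": "true",
        "hl.fl": "text",
        "hl.fragsize": str(fragsize),    # large enough to include a marker
    }


def pages_of_hits(fragment):
    """Return (term, page) for each highlighted term in a fragment.
    page is None when the fragment contains no marker to anchor it."""
    markers = [(m.start(), int(m.group(1))) for m in MARKER.finditer(fragment)]
    result = []
    for hit in HIT.finditer(fragment):
        before = [page for pos, page in markers if pos < hit.start()]
        after = [page for pos, page in markers if pos > hit.start()]
        if before:
            page = before[-1]    # last marker before the hit names its page
        elif after:
            page = after[0] - 1  # hit precedes the first marker in the fragment
        else:
            page = None          # fragment too short to tell
        result.append((hit.group(1), page))
    return result
```

For example, the fragment `the quick <em>brown</em> PAGEBREAK4 <em>fox</em> jumps` resolves to `[("brown", 3), ("fox", 4)]`, i.e. the phrase spans pages 3 and 4. Note that slop also loosens phrase matching everywhere else in the chapter, which is one of the weak points of this approach.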

The more I write, the less I like it, but I won't delete it...

Regards,
Emir

On 01.03.2016 12:56, Zaccheo Bagnati wrote:
Hi all,
I'm searching for ideas on how to define the schema and how to perform queries
in this use case: we have to index books, each book is split into chapters,
and chapters are split into pages (pages represent the original page cutting
in the printed version). We should show the results grouped by book, then by
chapter (within the same book) and by page (within the same chapter). As far
as I know, we have 2 options:

1. Index pages as SOLR documents. In this way we could theoretically
retrieve chapters (and books?) using grouping, but
     a. we would miss matches across two contiguous pages (page cutting is
only due to typographical needs, so concepts can be split across pages, as in
printed books)
     b. I don't know if it is possible in SOLR to group results on two
different levels (books and chapters)

2. Index chapters as SOLR documents. In this case we will get the right
matches, but how do we obtain the matching pages? (We need pages because the
client can only display pages.)
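Both options can be prototyped client-side. The Python sketch below is only an illustration under stated assumptions. For option 1, page documents are assumed to carry hypothetical `book_id`, `chapter_id` and `page` fields, and the two grouping levels are built on the client from the returned hits, whatever SOLR's grouping can or cannot do. For option 2, each chapter is assumed to be available as its list of page texts, so page start offsets can be computed and any match offset mapped back to a page. The regex scan in the hypothetical `pages_for_phrase` is a naive stand-in that ignores the field's analysis chain (stemming, stop words); real match offsets would have to come from elsewhere, for example term vectors with offsets enabled on the field.

```python
import re
from bisect import bisect_right


# Option 1: pages are documents; nest hits as book -> chapter -> pages.
def group_pages(hits):
    """hits: dicts with hypothetical book_id, chapter_id and page keys, in
    result (relevance) order. Dicts keep insertion order, so books and
    chapters appear in the order of their best-ranked page."""
    books = {}
    for hit in hits:
        chapters = books.setdefault(hit["book_id"], {})
        chapters.setdefault(hit["chapter_id"], []).append(hit["page"])
    return books


# Option 2: chapters are documents; map match offsets back to pages.
def page_starts(pages):
    """Character offset where each page begins once the pages are joined
    with single spaces into the chapter text."""
    starts, pos = [], 0
    for text in pages:
        starts.append(pos)
        pos += len(text) + 1  # +1 for the joining space
    return starts


def page_for_offset(starts, offset, first_page=1):
    """Page number containing the given character offset."""
    return first_page + bisect_right(starts, offset) - 1


def pages_for_phrase(pages, phrase, first_page=1):
    """(first_page, last_page) for every case-insensitive occurrence of
    phrase in the joined chapter text."""
    text = " ".join(pages)
    starts = page_starts(pages)
    spans = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        spans.append((page_for_offset(starts, m.start(), first_page),
                      page_for_offset(starts, m.end() - 1, first_page)))
    return spans
```

With pages `["the quick brown", "fox jumps"]`, `pages_for_phrase(pages, "brown fox")` returns `[(1, 2)]`: the match across the page break is kept, which is what option 2 offers. The option 1 nesting, on the other hand, only sees the rows actually fetched, so per-book and per-chapter counts reflect that window rather than the full result set.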

We have been struggling with this problem for a long time and we're not
able to find a suitable solution, so I'm asking whether someone has ideas or
has already solved a similar issue.
Thanks


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
