The problem: If we index a monograph in Solr, there's no way to convert search results into page-level hits. The solution: have a "paged-text" fieldtype which keeps track of page divisions as it indexes, and reports page-level hits in the search results.

------------------------------------------------------------------------
Key: SOLR-380
URL: https://issues.apache.org/jira/browse/SOLR-380
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Tricia Williams
Priority: Minor

"Paged-Text" FieldType for Solr

A chance to dig into the guts of Solr. The problem: If we index a monograph in Solr, there's no way to convert search results into page-level hits. The solution: have a "paged-text" fieldtype which keeps track of page divisions as it indexes, and reports page-level hits in the search results.

The input would contain page milestones: <page id="234"/>. As Solr processed the tokens (using its standard tokenizers and filters), it would concurrently build a structural map of the item, indicating which term position marked the beginning of which page: <page id="234" firstterm="14324"/>. This map would be stored in an unindexed field in some efficient format.

At search time, Solr would retrieve term positions for all hits that are returned in the current request, and use the stored map to determine page ids for each term position. The results would imitate the results for highlighting, something like:

    <lst name="pages">
      <lst name="doc1">
        <int name="pageid">234</int>
        <int name="pageid">236</int>
      </lst>
      <lst name="doc2">
        <int name="pageid">19</int>
      </lst>
    </lst>
    <lst name="hitpos">
      <lst name="doc1">
        <lst name="234">
          <int name="pos">14325</int>
        </lst>
      </lst>
      ...
    </lst>
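The position-to-page lookup described above could be sketched roughly as follows. This is only an illustration of the idea, not Solr code: the class PageMap and its methods addPage, pageFor and pagesFor are hypothetical names, and the term positions for pages 235 and 236 are made up for the example (only page 234 at position 14324 and the hit at 14325 come from the proposal). It assumes the structural map is available in memory as a sorted map from each page's first term position to its page id, so that a hit position resolves to the page with the greatest first-term position not exceeding it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that maps term positions to page ids; not part of Solr. */
final class PageMap {
    // First term position of each page -> page id, kept sorted by position.
    private final TreeMap<Integer, String> firstTermToPage = new TreeMap<>();

    /** Index time: record a page milestone and the position of its first term. */
    void addPage(String pageId, int firstTerm) {
        firstTermToPage.put(firstTerm, pageId);
    }

    /** Search time: page containing this term position, or null if it precedes every milestone. */
    String pageFor(int termPosition) {
        Map.Entry<Integer, String> entry = firstTermToPage.floorEntry(termPosition);
        return entry == null ? null : entry.getValue();
    }

    /** Distinct page ids for a document's hit positions, in order of first appearance. */
    List<String> pagesFor(int... hitPositions) {
        List<String> pages = new ArrayList<>();
        for (int position : hitPositions) {
            String page = pageFor(position);
            if (page != null && !pages.contains(page)) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        PageMap map = new PageMap();
        map.addPage("234", 14324); // from the proposal: <page id="234" firstterm="14324"/>
        map.addPage("235", 14790); // assumed position, for illustration only
        map.addPage("236", 15210); // assumed position, for illustration only
        System.out.println(map.pagesFor(14325, 15300)); // prints [234, 236]
    }
}
```

A sorted map keeps each lookup logarithmic in the number of pages. How the map would actually be serialized into the unindexed field, and how hit positions would be obtained from the index, is left open here just as it is in the proposal.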