[jira] Commented: (SOLR-380) There's no way to convert search results into page-level hits of a "structured document".

Erik Hatcher (JIRA) Wed, 17 Oct 2007 02:35:12 -0700

    [ https://issues.apache.org/jira/browse/SOLR-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535489 ]


Erik Hatcher commented on SOLR-380:
-----------------------------------

> The idea was to use dynamic fields (e.g. page_1, page_2, page_3... page_N) to
> store the text of each page in a single document. The problem is that
> currently Solr does not support "glob" style field expansion in query
> parameters (e.g. qf=page_*), so you would end up having to specify the entire
> list of page fields in your query, which is impractical. There is already an
> open issue related to this particular problem (SOLR-247) but nobody has had
> time to look into it.

In this case, a copyField from page_* into an unstored "contents" field would 
do the trick, and would also make it easy to query across pages.  Optionally, a 
position increment gap on that field would keep phrase queries from matching 
across "page" boundaries.
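
To illustrate the position increment gap point, here is a toy Python model 
(not Solr or Lucene code). It assumes each page arrives as a separate value of 
a multi-valued field and is tokenized on whitespace. The function names and the 
gap value of 100 are hypothetical, chosen only for the example.

```python
def assign_positions(pages, gap=0):
    """Assign a term position to every token of a multi-valued field.

    `gap` is added between consecutive values (pages), loosely mimicking
    what a position increment gap does at index time.
    """
    positions = []
    next_pos = 0
    for page in pages:
        for token in page.lower().split():
            positions.append((token, next_pos))
            next_pos += 1
        next_pos += gap  # leave a hole in the position space between pages
    return positions


def phrase_matches(positions, phrase):
    """Return True if the words of `phrase` occur at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    index = {}
    for token, pos in positions:
        index.setdefault(token, set()).add(pos)
    return any(
        all((start + i) in index.get(word, set()) for i, word in enumerate(words))
        for start in index.get(words[0], set())
    )


pages = ["the quick brown", "fox jumps"]
print(phrase_matches(assign_positions(pages, gap=0), "brown fox"))
print(phrase_matches(assign_positions(pages, gap=100), "brown fox"))
```

With no gap the phrase "brown fox" matches even though it straddles two pages. 
With a gap, the two words are no longer adjacent in position space, so only 
phrases that fall inside a single page match.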

> There's no way to convert search results into page-level hits of a 
> "structured document".
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-380
>                 URL: https://issues.apache.org/jira/browse/SOLR-380
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Tricia Williams
>            Priority: Minor
>
> "Paged-Text" FieldType for Solr
> A chance to dig into the guts of Solr. The problem: If we index a monograph 
> in Solr, there's no way to convert search results into page-level hits. The 
> solution: have a "paged-text" fieldtype which keeps track of page divisions 
> as it indexes, and reports page-level hits in the search results.
> The input would contain page milestones: <page id="234"/>. As Solr processed 
> the tokens (using its standard tokenizers and filters), it would concurrently 
> build a structural map of the item, indicating which term position marked the 
> beginning of which page: <page id="234" firstterm="14324"/>. This map would 
> be stored in an unindexed field in some efficient format.
> At search time, Solr would retrieve term positions for all hits that are 
> returned in the current request, and use the stored map to determine page ids 
> for each term position. The results would imitate the results for 
> highlighting, something like:
> <lst name="pages">
>   <lst name="doc1">
>     <int name="pageid">234</int>
>     <int name="pageid">236</int>
>   </lst>
>   <lst name="doc2">
>     <int name="pageid">19</int>
>   </lst>
> </lst>
> <lst name="hitpos">
>   <lst name="doc1">
>     <lst name="234">
>       <int name="pos">14325</int>
>     </lst>
>   </lst>
>   ...
> </lst>
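
The structural map described in the quoted issue could work roughly as in the 
following Python sketch. This is only an illustration under stated 
assumptions, not the proposed Solr implementation: the map is assumed to be a 
list of (first term position, page id) pairs sorted by position, and every 
function name here is hypothetical.

```python
from bisect import bisect_right


def build_page_map(pages):
    """Build [(first_term_position, page_id), ...] from (page_id, tokens) pairs."""
    page_map = []
    position = 0
    for page_id, tokens in pages:
        page_map.append((position, page_id))
        position += len(tokens)
    return page_map


def page_for_position(page_map, position):
    """Return the id of the last page whose first term is at or before `position`."""
    first_terms = [first for first, _ in page_map]
    i = bisect_right(first_terms, position) - 1
    if i < 0:
        raise ValueError("position precedes the first page")
    return page_map[i][1]


def hits_by_page(page_map, hit_positions):
    """Group hit term positions by the page that contains them."""
    grouped = {}
    for position in hit_positions:
        grouped.setdefault(page_for_position(page_map, position), []).append(position)
    return grouped


# Page 234 starting at term 14324 and the hit at 14325 come from the issue text;
# the second entry (page 235 at term 14900) is made up for the example.
page_map = [(14324, "234"), (14900, "235")]
print(hits_by_page(page_map, [14325]))
```

The lookup is a binary search, so resolving a hit costs O(log N) in the number 
of pages. Note that this sketch assigns any position past the start of the 
last page to that last page, since the map does not record where the document 
ends.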

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
