[jira] Commented: (SOLR-380) There's no way to convert search results into page-level hits of a "structured document".

Pieter Berkel (JIRA) Tue, 16 Oct 2007 23:48:21 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535426
 ]


Pieter Berkel commented on SOLR-380:
------------------------------------

There was a recent discussion surrounding a similar problem on solr-user:
http://www.nabble.com/Structured-Lucene-documents-tf4234661.html#a12048390

The idea was to use dynamic fields (e.g. page_1, page_2, page_3... page_N) to 
store the text of each page in a single document.  The problem is that 
currently Solr does not support "glob" style field expansion in query 
parameters (e.g. qf=page_* ) so you would end up having to specify the entire 
list of page fields in your query, which is impractical.  There is already an 
open issue related to this particular problem (SOLR-247) but nobody has had 
time to look into it.

In terms of returning term position information, this seems somehow (albeit 
loosely) related to highlighting, is there any way you could use the existing 
functionality to achieve your goal? (definitely would be a hack though)


> There's no way to convert search results into page-level hits of a 
> "structured document".
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-380
>                 URL: https://issues.apache.org/jira/browse/SOLR-380
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Tricia Williams
>            Priority: Minor
>
> "Paged-Text" FieldType for Solr
> A chance to dig into the guts of Solr. The problem: If we index a monograph 
> in Solr, there's no way to convert search results into page-level hits. The 
> solution: have a "paged-text" fieldtype which keeps track of page divisions 
> as it indexes, and reports page-level hits in the search results.
> The input would contain page milestones: <page id="234"/>. As Solr processed 
> the tokens (using its standard tokenizers and filters), it would concurrently 
> build a structural map of the item, indicating which term position marked the 
> beginning of which page: <page id="234" firstterm="14324"/>. This map would 
> be stored in an unindexed field in some efficient format.
> At search time, Solr would retrieve term positions for all hits that are 
> returned in the current request, and use the stored map to determine page ids 
> for each term position. The results would imitate the results for 
> highlighting, something like:
> <lst name="pages">
> &nbsp;&nbsp;<lst name="doc1">
> &nbsp;&nbsp;&nbsp;&nbsp;                <int name="pageid">234</int>
> &nbsp;&nbsp;&nbsp;&nbsp;                <int name="pageid">236</int>
> &nbsp;&nbsp;        </lst>
> &nbsp;&nbsp;        <lst name="doc2">
> &nbsp;&nbsp;&nbsp;&nbsp;                <int name="pageid">19</int>
> &nbsp;&nbsp;        </lst>
> </lst>
> <lst name="hitpos">
> &nbsp;&nbsp;        <lst name="doc1">
> &nbsp;&nbsp;&nbsp;&nbsp;                <lst name="234">
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                        <int 
> name="pos">14325</int>
> &nbsp;&nbsp;&nbsp;&nbsp;                </lst>
> &nbsp;&nbsp;        </lst>
> &nbsp;&nbsp;        ...
> </lst>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-380) There's no way to convert search results into page-level hits of a "structured document".

Reply via email to