[ 
https://issues.apache.org/jira/browse/SOLR-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847936#action_12847936
 ] 

Trey Grainger commented on SOLR-1837:
-------------------------------------

Re: bugs in Luke that result in missing terms - I recently fixed one such bug, 
and indeed it was located in the DocReconstructor - if you are aware of others 
then please report them using the Luke issue tracker.

I just pulled down the most recent Luke code, and it does look like that 
recent fix was made to cover the bug I saw.  Unfortunately, the fix results in 
a null ref for me on my index.  I'll open an issue, as it looks like all that's 
needed is an extra null check.

Re: Document reconstruction is a very IO-intensive operation, so I would advise 
against using it on a production system, and also it produces inexact results 
(because analysis is usually a lossy operation).

I hear you about it being IO-intensive.  There are also other admin tools in 
Solr which do similarly intensive operations (the schema browser, for example, 
which generates a list of all fields and a distribution of terms within those 
fields).  The intent of the tool is for one-off debugging, not for any kind of 
automated querying, but I'll try to do some tests to see to what degree this 
tool is affecting our current production systems (I have not seen any 
noticeable effect thus far).

Also, regarding the process being lossy: in this case, that is kind of the 
point of the tool (in my use) - to see what has actually been put into the 
index vs. what was in the document sent to the engine.  For example, if I index 
a field with the text "Wi-fi hotspots are a life-saver" with payloads on parts 
of speech, as well as stemming, I want to be able to see something like:
"wi [1] / fi [1] | wifi [1] / hotspot [1] / are [2] / a [3] / life [1] / saver [1] | lifesaver [1]"

With no payloads, this would simply be
"wi / fi | wifi / hotspots | hotspot / are / a / life / saver | lifesaver"
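
A minimal sketch of how that rendering could work, assuming " / " separates 
token positions and " | " separates terms that share a position.  Everything 
here is hypothetical and for illustration only: the class, the Posting type, 
and the position numbers are made up, and no Lucene or Solr API is called.  A 
real tool would read terms, positions, and payloads from the index instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the code attached to this issue.
public class DocumentInspectorSketch {

    /** One indexed term occurrence; payload is null when none was stored. */
    public static final class Posting {
        final String term;
        final int position;
        final String payload;

        public Posting(String term, int position, String payload) {
            this.term = term;
            this.position = position;
            this.payload = payload;
        }
    }

    /** Joins terms at the same position with " | " and positions with " / ". */
    public static String render(List<Posting> postings) {
        // TreeMap keeps positions in ascending order regardless of input order.
        Map<Integer, List<String>> byPosition = new TreeMap<>();
        for (Posting p : postings) {
            String label = (p.payload == null) ? p.term : p.term + " [" + p.payload + "]";
            byPosition.computeIfAbsent(p.position, k -> new ArrayList<>()).add(label);
        }
        List<String> slots = new ArrayList<>();
        for (List<String> terms : byPosition.values()) {
            slots.add(String.join(" | ", terms));
        }
        return String.join(" / ", slots);
    }

    public static void main(String[] args) {
        // Assumed positions for "Wi-fi hotspots are a life-saver" (no payloads).
        List<Posting> postings = List.of(
            new Posting("wi", 0, null),
            new Posting("fi", 1, null),
            new Posting("wifi", 1, null),
            new Posting("hotspots", 2, null),
            new Posting("hotspot", 2, null),
            new Posting("are", 3, null),
            new Posting("a", 4, null),
            new Posting("life", 5, null),
            new Posting("saver", 6, null),
            new Posting("lifesaver", 6, null));
        // Prints: wi / fi | wifi / hotspots | hotspot / are / a / life / saver | lifesaver
        System.out.println(render(postings));
    }
}
```

Passing a non-null payload string (for example "1") for a posting appends it in 
square brackets after the term, as in the payload example above.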

So I had initially named the tool the Solr Document Reconstructor, after the 
name you gave to the tool in Luke.  Based on your comments, I think it might be 
less confusing for me to call it something like "Document Inspector", since it 
is not truly reconstructing the original document.

I'll try to get what I have pushed up today so you can check it out if you 
want.  Thanks for your great work on that tool!

> Reconstruct a Document (stored fields, indexed fields, payloads)
> ----------------------------------------------------------------
>
>                 Key: SOLR-1837
>                 URL: https://issues.apache.org/jira/browse/SOLR-1837
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis, web gui
>    Affects Versions: 1.5
>         Environment: All
>            Reporter: Trey Grainger
>            Priority: Minor
>             Fix For: 1.5
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> One Solr feature I've been sorely in need of is the ability to inspect an 
> index for any particular document.  While the analysis page is good when you 
> have specific content and a specific field/type you want to test the 
> analysis process for, once a document is indexed it is not currently possible 
> to easily see what is actually sitting in the index.
> One can use the Lucene Index Browser (Luke), but this has several limitations 
> (gui only, doesn't understand solr schema, doesn't display many non-text 
> fields in human readable format, doesn't show payloads, some bugs lead to 
> missing terms, exposes features dangerous to use in a production Solr 
> environment, slow or difficult to check from a remote location, etc.).  The 
> document reconstruction feature of Luke provides the base for what can become 
> a much more powerful tool when coupled with Solr's understanding of a schema, 
> however.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
