Re: Solr with Unknown Lucene Index?
Having found some code that searches a Lucene index, I see that the only analyzer referenced is Lucene.Net.Analysis.Standard.StandardAnalyzer. How can I map this in Solr? The example schema doesn't seem to mention it, and specifying 'text' or 'string' for every field doesn't seem to help.

Thanks
Lee

On 22/01/2011 21:50, Erick Erickson wrote:

Sorry, I was out of town for a while. Luke just reads stuff; it doesn't try to interpret any schema. Solr makes certain assumptions about what *should* be in the index based on the schema. So getting Solr to just use a Lucene index really involves knowing that Lucene used, say, a StandardAnalyzer followed by a LowerCaseFilter followed by for some field. And there's no way I know of to find that information out from a raw Lucene index.

If you don't get things to match, your results will...er...vary. But perhaps you can guess well enough to make it work, although upgrading will be a problem.

I really think your effort would be best spent finding the original indexing or querying code if at all possible, seeing how that code defined the analysis chain for each field, and using that as the basis for a close-enough schema.

Best
Erick

On Thu, Jan 20, 2011 at 3:59 AM, Lee Goddard lee...@gmail.com wrote:

Thanks, Erick. I think my question comes down to, 'how does Luke know how to read the indexes?' I will try the Luke mailing list.

Cheers
Lee

On 19/01/2011 17:49, Erick Erickson wrote:

I don't really think this is possible/reasonable. There's nothing fixed about a Lucene index; you could index a field in different documents with any number of analysis chains. The tricky part here will be, as you've discovered, finding a way to match the Solr schema closely enough to get your desired results. Are you sure there's no way to re-index the data? Or find the original code that indexed it?

Best
Erick

On Wed, Jan 19, 2011 at 3:22 AM, Lee Goddard lee...@gmail.com wrote:

I have to use some Lucene indexes, and Solr looks like the perfect solution. However, all I know about the Lucene indexes is what Luke tells me, and simply setting the schema to represent all fields as text does not seem to be working -- though as this is my first Solr, I am not sure if that is due to some other issue. Is there some way to ascertain how the Solr schema should describe the Lucene fields?

Many thanks in anticipation
Lee
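Erick's point that the schema must reproduce the original analysis chain, and Lee's observation that declaring fields as 'string' did not help, can be illustrated with a toy model in plain Java (no Lucene involved). It assumes the original code tokenized and lowercased values at index time; the `tokenizeAndLowercase` helper is hypothetical and far simpler than Lucene's StandardAnalyzer. A 'string'-style field turns the whole query into one untouched term, which never matches the individual lowercased terms that were actually indexed.

```java
import java.util.*;

public class StringVsText {
    // Hypothetical, heavily simplified stand-in for a tokenizing, lowercasing chain.
    // NOT Lucene's StandardAnalyzer: it only splits on non-alphanumerics and lowercases.
    static List<String> tokenizeAndLowercase(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // A 'string'-style field: the whole value becomes a single, unmodified term.
    static List<String> keepAsSingleTerm(String text) {
        return List.of(text);
    }

    // True only if every query term appears among the indexed terms.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Terms as the (assumed) original tokenizing code would have written them.
        Set<String> indexed = new HashSet<>(tokenizeAndLowercase("Solr with Unknown Lucene Index"));

        String userQuery = "Lucene Index";
        boolean asString = matches(indexed, keepAsSingleTerm(userQuery));
        boolean asText = matches(indexed, tokenizeAndLowercase(userQuery));

        System.out.println("query analyzed like a 'string' field: " + asString); // false
        System.out.println("query analyzed like the index chain:  " + asText);   // true
    }
}
```

The reverse mismatch also occurs: if a field was indexed untokenized, running a tokenizing chain over the query splits it into terms the index never contains.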
Re: Solr with Unknown Lucene Index?
: Having found some code that searches a Lucene index, the only analyzers
: referenced are Lucene.Net.Analysis.Standard.StandardAnalyzer.
:
: How can I map this in Solr? The example schema doesn't seem to mention this,
: and specifying 'text' or 'string' for every field doesn't seem to help.

1) that analyzer seems to be a Lucene.Net analyzer, so the java equivalent would be org.apache.lucene.analysis.standard.StandardAnalyzer

2) the example schema.xml demonstrates how to use an existing Analyzer implementation...

    <!-- One can also specify an existing Analyzer class that has a
         default constructor via the class attribute on the analyzer element
    <fieldType name="text_greek" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
    </fieldType>
    -->

3) i'm getting the sense from your comments that you aren't very familiar with lucene/solr in general. An important thing to understand is that just because the code that created the index only ever uses StandardAnalyzer doesn't mean it will make sense to use that analyzer on every field when attempting to search that field from solr -- some fields may have been indexed w/o using any analysis, some may be numeric fields with special encoding, some may be compressed, etc...

Trying to reverse engineer what the schema should look like to open any arbitrary index requires a lot of understanding about how that index was built. It's easy to just dump the terms found in an index w/o knowing anything about where those terms came from (that's what Luke does), but that doesn't help you recognize things like "this list of X words were treated as stop words, and don't appear in the index, so my query analyzer needs to be configured with those same X words".

In short: you can easily make solr *read* the index (just like luke), but that won't necessarily help you *use* the index in a meaningful way.

-Hoss
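Hoss's stop-word example can be sketched in plain Java, independent of Lucene. The stop list and the `analyze` helper below are hypothetical, and the sketch assumes a query that requires every term to be present (AND semantics). If the index-time chain dropped stop words but the query-time chain does not, the query asks for terms that were never written to the index.

```java
import java.util.*;

public class StopWordMismatch {
    // Hypothetical stop-word list; the real index may have used a different one (or none).
    static final Set<String> STOP_WORDS = Set.of("the", "of", "a");

    // Lowercase, split on non-alphanumerics, and drop any term in the given stop list.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) terms.add(t);
        }
        return terms;
    }

    // Assumed AND semantics: every query term must have been indexed.
    static boolean allTermsIndexed(Set<String> indexed, List<String> queryTerms) {
        return indexed.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Index-time chain removed stop words, so only {lord, rings} reached the index.
        Set<String> indexed = new HashSet<>(analyze("The Lord of the Rings", STOP_WORDS));

        List<String> withoutStopList = analyze("lord of the rings", Set.of());
        List<String> withSameStopList = analyze("lord of the rings", STOP_WORDS);

        System.out.println(allTermsIndexed(indexed, withoutStopList));  // false
        System.out.println(allTermsIndexed(indexed, withSameStopList)); // true
    }
}
```

Dumping the terms (as Luke does) would show only "lord" and "rings"; nothing in that list reveals which words were removed, which is why the stop list has to come from the original code.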
Solr with Unknown Lucene Index?
I have to use some Lucene indexes, and Solr looks like the perfect solution. However, all I know about the Lucene indexes is what Luke tells me, and simply setting the schema to represent all fields as text does not seem to be working -- though as this is my first Solr, I am not sure if that is due to some other issue. Is there some way to ascertain how the Solr schema should describe the Lucene fields?

Many thanks in anticipation
Lee
Re: Solr with Unknown Lucene Index?
I don't really think this is possible/reasonable. There's nothing fixed about a Lucene index; you could index a field in different documents with any number of analysis chains. The tricky part here will be, as you've discovered, finding a way to match the Solr schema closely enough to get your desired results. Are you sure there's no way to re-index the data? Or find the original code that indexed it?

Best
Erick

On Wed, Jan 19, 2011 at 3:22 AM, Lee Goddard lee...@gmail.com wrote:

I have to use some Lucene indexes, and Solr looks like the perfect solution. However, all I know about the Lucene indexes is what Luke tells me, and simply setting the schema to represent all fields as text does not seem to be working -- though as this is my first Solr, I am not sure if that is due to some other issue. Is there some way to ascertain how the Solr schema should describe the Lucene fields?

Many thanks in anticipation
Lee
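Erick's remark that one field can be indexed with different analysis chains in different documents can be made concrete with a hypothetical two-document index, using plain Java maps as a stand-in for Lucene's term dictionary. The same `title` value was lowercased in document 1 and indexed verbatim in document 2. Nothing in the stored terms records which chain produced them, so any single query-time chain (and therefore any single Solr fieldType) finds only one of the two documents.

```java
import java.util.*;

public class MixedChains {
    // Returns the ids of documents whose indexed terms contain exactly this term.
    static List<Integer> search(Map<Integer, Set<String>> index, String term) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> e : index.entrySet()) {
            if (e.getValue().contains(term)) hits.add(e.getKey());
        }
        return hits;
    }

    // Hypothetical "title" field written by two different chains.
    static Map<Integer, Set<String>> buildIndex() {
        Map<Integer, Set<String>> title = new TreeMap<>();
        title.put(1, Set.of("lucene"));  // doc 1: "Lucene" lowercased at index time
        title.put(2, Set.of("Lucene"));  // doc 2: same value indexed verbatim
        return title;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> title = buildIndex();
        System.out.println(search(title, "lucene")); // [1]  (query chain lowercases)
        System.out.println(search(title, "Lucene")); // [2]  (query chain leaves case alone)
    }
}
```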