Re: Solr with Unknown Lucene Index?

2011-01-24 Thread Lee Goddard
Having found some code that searches a Lucene index, I see that the only
analyzer referenced is Lucene.Net.Analysis.Standard.StandardAnalyzer.


How can I map this in Solr? The example schema doesn't seem to mention
this, and specifying 'text' or 'string' for every field doesn't seem to
help.


Thanks
Lee

On 22/01/2011 21:50, Erick Erickson wrote:
Sorry, I was out of town for a while. Luke just reads stuff; it doesn't
try to interpret any schema. Solr makes certain assumptions about what
*should* be in the index based on the schema. So getting Solr to just
use a Lucene index really involves knowing what Lucene used, say, a
StandardAnalyzer followed by a LowerCaseFilter followed by ... for some
field. And there's no way I know of to find that information out from a
raw Lucene index.

If you don't get things to match, your results will...er...vary. But
perhaps you can guess well enough to make it work, although upgrading
will be a problem.

I really think your effort would be best spent finding the original
indexing or querying code if at all possible, seeing how that code
defined the analysis chain for each field, and using that as a basis
for creating a close enough schema.
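As a rough illustration only: if the original code turns out to have
used Lucene's StandardAnalyzer for a field, a Solr field type that
approximates that chain with tokenizer and filter factories might look
like the sketch below. The type name text_std is hypothetical, the exact
behavior of the standard tokenizer and filter varies between Lucene
versions, and stopwords.txt is assumed to contain the same stop words
the original analyzer used (Solr's example file is not guaranteed to
match Lucene's built-in English list).

```xml
<!-- hypothetical field type approximating Lucene's StandardAnalyzer -->
<fieldType name="text_std" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- intended to tokenize the way StandardAnalyzer does -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- StandardFilter token normalization (behavior is version dependent) -->
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- assumed: stopwords.txt holds exactly the stop words used at index time -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

Each field in the index would still need its own type chosen this way,
which is why knowing the original analysis chain per field matters.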



Best
Erick

On Thu, Jan 20, 2011 at 3:59 AM, Lee Goddard <lee...@gmail.com> wrote:


Thanks, Erick. I think my question comes down to: how does Luke know
how to read the indexes? I will try the Luke mailing list.

Cheers
Lee


Re: Solr with Unknown Lucene Index?

2011-01-24 Thread Chris Hostetter

: Having found some code that searches a Lucene index, I see that the only
: analyzer referenced is Lucene.Net.Analysis.Standard.StandardAnalyzer.
: 
: How can I map this in Solr? The example schema doesn't seem to mention this,
: and specifying 'text' or 'string' for every field doesn't seem to help.

1) That analyzer seems to be a Lucene.Net analyzer, so the Java equivalent
would be org.apache.lucene.analysis.standard.StandardAnalyzer.

2) The example schema.xml demonstrates how to use an existing Analyzer
implementation...

<!-- One can also specify an existing Analyzer class that has a
     default constructor via the class attribute on the analyzer element
<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
-->
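Applied to this case, a sketch might declare a field type that wraps the
Java StandardAnalyzer directly. The names text_standard and body are
hypothetical; the fieldType would go inside <types> and the field inside
<fields> in schema.xml. This assumes the Solr version in use can
instantiate StandardAnalyzer through the class attribute, and that the
Lucene.Net index is in a format Solr's bundled Lucene can read.

```xml
<!-- hypothetical: wrap Lucene's StandardAnalyzer as-is for a text field -->
<fieldType name="text_standard" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

<!-- hypothetical field name; it must match the field name Luke shows -->
<field name="body" type="text_standard" indexed="true" stored="true"/>
```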

3) I'm getting the sense from your comments that you aren't very familiar
with Lucene/Solr in general. An important thing to understand is that
just because the code that created the index only ever uses
StandardAnalyzer doesn't mean it will make sense to use that analyzer on
every field when attempting to search that field from Solr -- some fields
may have been indexed w/o using any analysis, some may be numeric fields
with special encoding, some may be compressed, etc...
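To illustrate the point, fields that were not analyzed, or were written
as numeric fields, would need different types from the analyzed text. A
sketch with hypothetical field names, assuming the numeric field was
indexed with Lucene's NumericField at its default precision step of 4
(Solr's trie types must use the same precisionStep for the indexed terms
to match). Solr may not decode values stored by NumericField, so only
the indexed terms are addressed here.

```xml
<!-- un-analyzed field: the whole value was indexed as a single term -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<!-- assumed: indexed via NumericField with precisionStep 4 -->
<fieldType name="tint4" class="solr.TrieIntField" precisionStep="4" omitNorms="true" positionIncrementGap="0"/>

<!-- hypothetical field names -->
<field name="id" type="string" indexed="true" stored="true"/>
<!-- stored="false": decoding NumericField stored values is not addressed here -->
<field name="year" type="tint4" indexed="true" stored="false"/>
```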

Trying to reverse engineer what the schema should look like to open any
arbitrary index requires a lot of understanding about how that index was
built. It's easy to just dump the terms found in an index w/o knowing
anything about where those terms came from (that's what Luke does), but
that doesn't help you recognize things like "these X words were treated
as stop words and don't appear in the index, so my query analyzer needs
to be configured with those same X words."
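In schema.xml terms, a sketch of that requirement: the query-time
analyzer has to drop the same stop words the indexing code dropped;
otherwise queries containing those words can fail to match. The type
name text_legacy and the file original_stopwords.txt are hypothetical;
the file would have to hold exactly the list used when the index was
built, one word per line.

```xml
<fieldType name="text_legacy" class="solr.TextField">
  <!-- index-time chain: only used if new documents are added through Solr -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="original_stopwords.txt"/>
  </analyzer>
  <!-- query-time chain: must remove the same stop words the original indexer removed -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="original_stopwords.txt"/>
  </analyzer>
</fieldType>
```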

In short: you can easily make Solr *read* the index (just like Luke), but
that won't necessarily help you *use* the index in a meaningful way.
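For the *read* part, a minimal sketch: Solr opens the Lucene index found
in the index subdirectory of its data directory, so one approach is to
copy the existing index files there and point solrconfig.xml at that
data directory. The path below is hypothetical, and this only works if
the index format is one that Solr's bundled Lucene version can read.

```xml
<!-- solrconfig.xml: hypothetical path; the existing Lucene files go in /path/to/solr/data/index -->
<dataDir>/path/to/solr/data</dataDir>
```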

-Hoss


Solr with Unknown Lucene Index?

2011-01-19 Thread Lee Goddard
I have to use some Lucene indexes, and Solr looks like the perfect 
solution.


However, all I know about the Lucene indexes is what Luke tells me, and
simply setting the schema to represent all fields as text does not seem
to be working -- though as this is my first Solr, I am not sure whether
that is due to some other issue.


Is there some way to ascertain how the Solr schema should describe the 
Lucene fields?


Many thanks in anticipation
Lee


Re: Solr with Unknown Lucene Index?

2011-01-19 Thread Erick Erickson
I don't really think this is possible/reasonable. There's nothing fixed
about a Lucene index; you could index a field in different documents with
any number of analysis chains. The tricky part here will be, as you've
discovered, finding a way to match the Solr schema closely enough to get
your desired results.

Are you sure there's no way to re-index the data? Or find the original code
that indexed it?

Best
Erick
