Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

2013-10-28 Thread Benson Margulies
I'm working on a tool that wants to construct analyzers 'at arms length' -- a bit like from a Solr schema -- so that multiple dueling analyzers could be in their own class loaders at one time. I want to just define a simple configuration for char filters, tokenizer, and token filter. So it would be,

Re: Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

2013-10-28 Thread Benson Margulies
OK, so, here I go again making a public idiot of myself. Could it be that the tokenizer factory is 'relatively recent' as in since 4.1? On Mon, Oct 28, 2013 at 7:39 AM, Benson Margulies wrote: > I'm working on a tool that wants to construct analyzers 'at arms length' -- > a bit like from a Solr

RE: Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

2013-10-28 Thread Uwe Schindler
Hi Benson, the base factory class and the abstract Tokenizer, TokenFilter and CharFilter factory classes are all in Lucene's analyzers-common module (since 4.0). They are no longer part of Solr. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@t

Custom Segment Element

2013-10-28 Thread Geoff Cooney
Hi, We build a custom parallel index for geo fields containing a z-order based tree structure. This was originally developed against 3.x but we are looking at upgrading to Lucene 4.x now. In the current implementation, we inject ourselves into the IndexingChain on indexing. The challenge is on me

Re: Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

2013-10-28 Thread Benson Margulies
Just how 'experimental' is the SPI system at this point, if that's a reasonable question? On Mon, Oct 28, 2013 at 8:41 AM, Uwe Schindler wrote: > Hi Benson, > > the base factory class and the abstract Tokenizer, TokenFilter and > CharFilter factory classes are all in Lucene's analyzers-common

Re: Why is there a token filter factory abstraction but not a tokenizer factory abstraction in Lucene?

2013-10-28 Thread Benson Margulies
We have been in the habit of naming classes on the theory that Java packages are doing work in the namespace. So, we'd name a class: com.basistech..BaseLinguisticsTokenFilterFactory So that means that our name in the SPI system is just 'BaseLinguistics'. That seems a bit problematic. I don't s
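
The naming concern in this message can be illustrated with a minimal sketch. It assumes only what the message states: the SPI name is the simple class name with the factory suffix stripped, so the package plays no part. The helper `shortName` and the `com.example` packages are hypothetical and are not part of Lucene.

```java
public class SpiNameSketch {
    /**
     * Hypothetical helper (not Lucene API): drops the package and the
     * given suffix from a fully qualified class name.
     */
    public static String shortName(String className, String suffix) {
        // Keep only the simple class name; the package is discarded.
        String simple = className.substring(className.lastIndexOf('.') + 1);
        if (!simple.endsWith(suffix)) {
            throw new IllegalArgumentException(
                simple + " does not end with " + suffix);
        }
        return simple.substring(0, simple.length() - suffix.length());
    }

    public static void main(String[] args) {
        String suffix = "TokenFilterFactory";
        // Two classes in different (made-up) packages...
        String a = shortName("com.example.one.BaseLinguisticsTokenFilterFactory", suffix);
        String b = shortName("com.example.two.BaseLinguisticsTokenFilterFactory", suffix);
        // ...end up with the same short name, which is the collision risk.
        System.out.println(a);
        System.out.println(a.equals(b));
    }
}
```

Because the package is dropped, two vendors that pick the same simple class name would end up with the same short name under this scheme.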

Anyone interested in a worked-out example of the SPIs for analyzer components?

2013-10-28 Thread Benson Margulies
I just built myself a sort of Solr-schema-in-a-test-tube. It's a class that builds a classloader on some JAR files and then uses the SPI mechanism to manufacture Analyzer objects made out of tokenizers and filters. I can make this visible in GitHub, or even attach it to a JIRA, if anyone is intere
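
The general pattern described here (a class loader over a set of JARs, then SPI discovery inside it) might look roughly like the sketch below. This is not the code from the message and does not use Lucene's own SPI loader; it uses the JDK's `java.util.ServiceLoader` as a stand-in, and `ComponentFactory` and `discover` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class IsolatedSpiSketch {
    /** Hypothetical stand-in for an analysis component factory. */
    public interface ComponentFactory {
        String name();
    }

    /**
     * Builds a class loader over the given JAR URLs and returns the names
     * of all ComponentFactory providers registered via META-INF/services.
     */
    public static List<String> discover(URL[] jars) {
        // The parent must be the loader that defines the interface, so
        // providers in the child loader implement the same Class object.
        ClassLoader parent = ComponentFactory.class.getClassLoader();
        List<String> names = new ArrayList<>();
        try (URLClassLoader loader = new URLClassLoader(jars, parent)) {
            // ServiceLoader is lazy, so iterate while the loader is open.
            for (ComponentFactory f : ServiceLoader.load(ComponentFactory.class, loader)) {
                names.add(f.name());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // With no JARs on the path, no providers are registered.
        System.out.println(discover(new URL[0]));
    }
}
```

Giving each analyzer configuration its own `URLClassLoader` is what would let several conflicting implementations coexist in one JVM, at the cost of sharing only the interface types defined in the parent loader.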

Lucene Corrupt Index Exception

2013-10-28 Thread arminder01
Hi, We have integrated Lucene with our program and one of the users is facing the Lucene Corrupt Index Exception. When I ran the CheckIndex command, I got the following result... followed by... Any idea what could have caused this index corruption? I will fix the index using the CheckIndex c

Re: Lucene Corrupt Index Exception

2013-10-28 Thread Michael McCandless
Hi, I only see whitespace under "following result..." and "followed by...". Were there any interesting exceptions during indexing? Mike McCandless http://blog.mikemccandless.com On Mon, Oct 28, 2013 at 5:21 PM, arminder01 wrote: > Hi, > > We have integrated Lucene with our program and one of

Re: Lucene Corrupt Index Exception

2013-10-28 Thread arminder01
Hi Mike, Thanks for your reply. I have removed the raw formatting from the text. Please let me know if you can see the complete text now. Thanks! Armin -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-Corrupt-Index-Exception-tp4098138p4098152.html Sent from the Lucen

Re: Lucene Corrupt Index Exception

2013-10-28 Thread Michael McCandless
Hmm, I still don't see the details in your email, but clicking through to Nabble I could see them: 2 of 8: name=_1bs4 docCount=19 compound=true hasProx=true numFiles=2 size (MB)=0.017 diagnostics = {optimize=false, mergeFactor=10, os.version=6.1, os=Windows Server 2008 R2, luce