handling token created/deleted events in an Index

2008-06-16 Thread Mathieu Lecarme
With LUCENE-1297, the SpellChecker will be able to choose how to estimate the distance between two words. Here are some other enhancements: * The ability to synchronize the main index and the SpellChecker index. Handling token creation is easy; a simple TokenFilter can do the work. But fo
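Catching token creation is easy. Detecting that a term has disappeared from the main index is the harder half of keeping the SpellChecker index in sync. The sketch below is plain Java, not Lucene API. It keeps a per-term document count: a term counts as "created" when its count goes from 0 to 1 and as "deleted" when it drops back to 0. The class `TermRefCounter` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): counts how many documents contain
// each term, so a term's first appearance and last disappearance can be reported
// to a secondary index such as a spellchecker dictionary.
final class TermRefCounter {
    private final Map<String, Integer> docFreq = new HashMap<>();
    final List<String> created = new ArrayList<>(); // terms whose count went 0 -> 1
    final List<String> deleted = new ArrayList<>(); // terms whose count went 1 -> 0

    // Call when a document is added; a term repeated inside one document counts once.
    void addDocument(List<String> tokens) {
        for (String term : new HashSet<>(tokens)) {
            int n = docFreq.getOrDefault(term, 0) + 1;
            docFreq.put(term, n);
            if (n == 1) created.add(term);
        }
    }

    // Call with the same tokens when that document is deleted.
    void deleteDocument(List<String> tokens) {
        for (String term : new HashSet<>(tokens)) {
            Integer n = docFreq.get(term);
            if (n == null) continue; // unknown term: nothing to undo
            if (n == 1) {
                docFreq.remove(term);
                deleted.add(term);
            } else {
                docFreq.put(term, n - 1);
            }
        }
    }
}
```

This approach requires the deleted document's tokens to be available at deletion time, for example by storing them or by re-analyzing the stored text. That requirement is exactly what makes deletions harder than creations.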

Re: WebLuke - include Jetty in Lucene binary distribution?

2008-04-25 Thread Mathieu Lecarme
markharw00d wrote: Any word on getting this committed as a contrib? I haven't really changed the code since the message below. I can commit pretty much the contents of the zip file below any time you want. Do folks still feel comfortable with the "bloat" this adds to the Lucene source distro? T

Re: Storing phrases in index

2008-04-10 Thread Mathieu Lecarme
palexv wrote: Thanks! Can you help me get the ShingleFilter class? It is absent in version 2.3.1. How can I get it? It's in the SVN version. You can backport it, or build your own with a Stack. M.
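If you build your own instead of backporting ShingleFilter, the core idea is a bounded window of recent tokens. Each new token closes one shingle per window size. The sketch below is plain Java over a token list, not the ShingleFilter API, and it uses an `ArrayDeque` as the "stack". The class name `Shingles` and the choice to emit only shingles of two or more words are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ShingleFilter: builds word n-grams ("shingles")
// of size 2..maxSize from a token list, using a bounded window in a deque.
final class Shingles {
    static List<String> build(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        ArrayDeque<String> window = new ArrayDeque<>();
        for (String token : tokens) {
            window.addLast(token);
            if (window.size() > maxSize) window.removeFirst(); // keep only the last maxSize tokens
            // Emit every shingle (size >= 2) that ends with the current token, shortest first.
            List<String> w = new ArrayList<>(window);
            for (int start = w.size() - 2; start >= 0; start--) {
                out.add(String.join(" ", w.subList(start, w.size())));
            }
        }
        return out;
    }
}
```

For example, `Shingles.build(Arrays.asList("the", "quick", "brown"), 3)` yields "the quick", "quick brown" and "the quick brown". Indexing those strings as terms of an extra field is one way to store phrases as terms.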

Re: Storing phrases in index

2008-04-09 Thread Mathieu Lecarme
palexv wrote: Hello all. I have a question for those advanced in Lucene. I have a set of phrases which I need to store in an index. Is there a way of storing phrases as terms in the index? What is the best way of writing such an index? Should this field be tokenized or not tokenized? What is the best wa

Re: Optimise Indexing time using lucene..

2008-04-09 Thread Mathieu Lecarme
lucene4varma wrote: Hi all, I am new to Lucene and am using it for text search in my web application, and for that I need to index records in a database. We are using a JDBC directory to store the indexes. Now the problem is, when I start the process of indexing the records for the first time it

Re: shingles and punctuations

2008-04-08 Thread Mathieu Lecarme
's up to your app to know, unfortunately :-( I think the WikipediaTokenizer is currently the only one in Lucene using flags. On Apr 6, 2008, at 10:43 PM, Mathieu Lecarme wrote: I'll use Token flags to specify the first token in a sentence, but how does it work? How flag collisi
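The reply says the meaning of Token flags is up to the application. Flags are an int, so the usual way to keep several meanings from colliding is to give each one its own bit. The plain-Java sketch below shows that convention. The constant names `SENTENCE_START`, `PUNCTUATION` and `PROPER_NOUN`, and the helper class `TokenFlags`, are hypothetical and not defined by Lucene.

```java
// Hypothetical flag constants: each meaning owns a distinct bit, so several
// can be combined in one int without overwriting each other.
final class TokenFlags {
    static final int SENTENCE_START = 1;      // binary 001
    static final int PUNCTUATION    = 1 << 1; // binary 010
    static final int PROPER_NOUN    = 1 << 2; // binary 100

    static int set(int flags, int flag)     { return flags | flag; }
    static int clear(int flags, int flag)   { return flags & ~flag; }
    static boolean has(int flags, int flag) { return (flags & flag) != 0; }
}
```

Distinct bits prevent collisions within one application. Nothing prevents two independently written filters from picking the same bit, though, which is why the convention has to be agreed on by the app.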

Re: shingles and punctuations

2008-04-06 Thread Mathieu Lecarme
u need sentence detection to take place further upstream. Then you could use the Token type or Token flags to indicate punctuation, sentences, whatever and we could patch the shingle filter to ignore these things, or break and move onto the next one. -Grant On Apr 6, 2008, at 7:23 PM, Mathieu Leca
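Grant's suggestion is to have an upstream sentence detector mark tokens and then have the shingle filter break at those marks. The plain-Java sketch below, not Lucene API, shows the "break" variant: when a token starts a new sentence, the window is emptied, so no shingle spans two sentences. `MarkedToken` and `SentenceShingles` are hypothetical. The boolean field stands in for whatever Token type or flag the detector would set.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical token carrying a "starts a new sentence" marker, standing in
// for a Token whose type or flags were set by an upstream sentence detector.
final class MarkedToken {
    final String text;
    final boolean sentenceStart;
    MarkedToken(String text, boolean sentenceStart) {
        this.text = text;
        this.sentenceStart = sentenceStart;
    }
}

final class SentenceShingles {
    // Builds shingles of size 2..maxSize without ever crossing a sentence boundary.
    static List<String> build(List<MarkedToken> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        ArrayDeque<String> window = new ArrayDeque<>();
        for (MarkedToken t : tokens) {
            if (t.sentenceStart) window.clear(); // break: forget the previous sentence
            window.addLast(t.text);
            if (window.size() > maxSize) window.removeFirst();
            List<String> w = new ArrayList<>(window);
            for (int start = w.size() - 2; start >= 0; start--) {
                out.add(String.join(" ", w.subList(start, w.size())));
            }
        }
        return out;
    }
}
```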

shingles and punctuations

2008-04-06 Thread Mathieu Lecarme
The new ShingleFilter is very helpful for fetching groups of words, but it doesn't handle punctuation or any other separation. If you feed it multiple sentences, you will get shingles that start in one sentence and end in the next. In order to avoid that, you can handle token positions, if there is

Re: WordNet synonyms overhead

2008-03-18 Thread Mathieu Lecarme
Harald Näger wrote: Hi, I am especially interested in the WordNet synonym expansion that was discussed in the "Lucene in Action" book. Is there anyone here on the list who has experience with this approach? I'm curious about how much the synonym expansion will increase the size of an in

Re: [jira] Created: (LUCENE-1229) NGramTokenFilter optimization in query phase

2008-03-14 Thread Mathieu Lecarme
Hiroaki Kawai (JIRA) wrote: NGramTokenFilter optimization in query phase Key: LUCENE-1229 URL: https://issues.apache.org/jira/browse/LUCENE-1229 Project: Lucene - Java Issue Type: Improvement
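The snippet stops before the issue's details. The sketch below therefore illustrates one plausible reading of a query-phase n-gram optimization, and this reading is an assumption, not necessarily what the LUCENE-1229 patch does. The index contains every n-gram of each word. A query, however, may only need a subset of n-grams that still covers every character, which means fewer query clauses. `NGrams.all` and `NGrams.covering` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;

final class NGrams {
    // Every n-gram of the word, as an index-time n-gram filter would emit them.
    static List<String> all(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) out.add(word.substring(i, i + n));
        return out;
    }

    // Assumed query-time subset: step by n, then add one gram aligned to the end
    // when the length is not a multiple of n, so every character is still covered.
    static List<String> covering(String word, int n) {
        List<String> out = new ArrayList<>();
        if (word.length() < n) return out;
        int i = 0;
        for (; i + n <= word.length(); i += n) out.add(word.substring(i, i + n));
        if (i < word.length()) out.add(word.substring(word.length() - n));
        return out;
    }
}
```

With bigrams, "abcdef" is indexed as five grams but queried with three ("ab", "cd", "ef"). Every query gram is among the indexed ones, so a matching document still matches.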

Re: an API for synonym in Lucene-core

2008-03-13 Thread Mathieu Lecarme
is foundation. M. Otis Gospodnetic wrote: Grant, I think Mathieu is hinting at his JIRA contribution (I looked at it briefly the other day, but haven't had the chance to really understand it). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message ---

an API for synonym in Lucene-core

2008-03-12 Thread Mathieu Lecarme
Why doesn't Lucene have a clean synonym API? The WordNet contrib is not an answer: it provides an interface for its own needs, and most of the world doesn't speak English. Compass provides a tool, just like Solr. Lucene is the framework for applications like Solr, Nutch or Compass, so why not backport l
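To make the request concrete, here is a minimal sketch of what a language-neutral synonym contract could look like. Any backend could sit behind it: WordNet, a French thesaurus, or stemming-based groups. `SynonymSource` and `MapSynonymSource` are hypothetical names. They are not Lucene, Solr or Compass APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical language-neutral contract for synonym providers.
interface SynonymSource {
    List<String> synonymsOf(String term);
}

// Simple in-memory implementation built from groups of mutual synonyms.
// A word added to a second group replaces its earlier entry (kept simple on purpose).
final class MapSynonymSource implements SynonymSource {
    private final Map<String, List<String>> groups = new HashMap<>();

    void addGroup(String... words) {
        for (String w : words) {
            List<String> others = new ArrayList<>();
            for (String o : words) if (!o.equals(w)) others.add(o);
            groups.put(w, others);
        }
    }

    @Override
    public List<String> synonymsOf(String term) {
        return groups.getOrDefault(term, Collections.emptyList());
    }
}
```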

[jira] Commented: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-03-07 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576415#action_12576415 ] Mathieu Lecarme commented on LUCENE-1190: - A simpler preview of Lexicon feat

Re: [jira] Commented: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-03-02 Thread Mathieu Lecarme
Hmm, the quote and question disappeared. On 2 Mar 08 at 13:32, Mathieu Lecarme (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574214#action_12574214 ] Mathieu Lecarme commente

[jira] Commented: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-03-02 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574214#action_12574214 ] Mathieu Lecarme commented on LUCENE-1190: - With a FuzzyQuery, for example,

[jira] Commented: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-02-29 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573907#action_12573907 ] Mathieu Lecarme commented on LUCENE-1190: - New features: helper to extend q

[jira] Updated: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-02-29 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-1190: Attachment: aphone+lexicon.patch > a lexicon object for merging spellchecker and synon

[jira] Updated: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-02-25 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-1190: Attachment: aphone+lexicon.patch > a lexicon object for merging spellchecker and synon

[jira] Created: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-02-25 Thread Mathieu Lecarme (JIRA)
Type: New Feature Components: contrib/*, Search Affects Versions: 2.3 Reporter: Mathieu Lecarme Attachments: aphone+lexicon.patch Some Lucene features need a list of reference words. Spellchecking is the basic example, but synonyms are another use. Other tools can
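The issue proposes a single word list that can serve both spellchecking and stem-based "synonyms". The sketch below assumes a simple design and is not the patch's code. It groups every known word by its stem. The word list answers "is this a known word?", and the stem groups answer "which known words are variants of this one?". `Lexicon` is a hypothetical class, and the stemmer is passed in as a function so any real stemmer could be plugged in.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical lexicon: all known words grouped by stem, so one structure can
// back both a spellchecker word list and stem-based variant expansion.
final class Lexicon {
    private final Function<String, String> stemmer;
    private final Map<String, Set<String>> byStem = new TreeMap<>();

    Lexicon(Function<String, String> stemmer) { this.stemmer = stemmer; }

    void add(String word) {
        byStem.computeIfAbsent(stemmer.apply(word), k -> new TreeSet<>()).add(word);
    }

    // Known words sharing the given word's stem (read-only view).
    Set<String> variants(String word) {
        Set<String> words = byStem.get(stemmer.apply(word));
        return words == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(words);
    }

    // Spellchecker-style lookup: is this exact word in the lexicon?
    boolean contains(String word) {
        return variants(word).contains(word);
    }
}
```

A toy stemmer such as `w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w` is enough to try it out. With that stemmer, "avion" and "avions" end up in the same group.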

[jira] Updated: (LUCENE-956) phonem conversion from aspell dictionnary

2008-02-21 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-956: --- Attachment: aphone.patch New version, with more languages (bg, br, da, de, el, en, fo, fr, is

Re: Need help for ordering results by specific order

2007-07-19 Thread Mathieu Lecarme
? > Can I sort (maybe score it at indexing time)? > Mathieu Lecarme wrote: >> Have a look at the book "Lucene in Action", ch. 6.1: "using custom sort method". >> SortComparatorSource might be your friend. Lucene selecting

Re: Need help for ordering results by specific order

2007-07-18 Thread Mathieu Lecarme
Have a look at the book "Lucene in Action", ch. 6.1: "using custom sort method". SortComparatorSource might be your friend. Lucene selects the documents, and you sort them however you want. M. On 18 Jul 07 at 10:29, savageboy wrote: Hi, I am new to Lucene. I have a project for search engin
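The advice separates selection (Lucene) from ordering (the application). The plain-Java sketch below shows only the ordering half, applied to hits that have already been retrieved. It is not the SortComparatorSource API. `CustomOrder`, its nested `Hit` class and the `category` field are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration (not Lucene's SortComparatorSource): reorder
// already-selected hits by an application-defined ranking of a field value.
final class CustomOrder {
    static final class Hit {
        final String id;
        final String category;
        Hit(String id, String category) { this.id = id; this.category = category; }
    }

    static List<Hit> sort(List<Hit> hits, List<String> preferredOrder) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < preferredOrder.size(); i++) rank.put(preferredOrder.get(i), i);
        List<Hit> sorted = new ArrayList<>(hits);
        // Unlisted categories go last; List.sort is stable, so ties keep the
        // incoming (e.g. relevance) order.
        sorted.sort(Comparator.comparingInt(h -> rank.getOrDefault(h.category, Integer.MAX_VALUE)));
        return sorted;
    }
}
```

Inside Lucene, the same comparison logic would sit behind a custom sort comparator so that sorting happens during the search rather than afterwards.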

Re: for a better spellchecker

2007-07-13 Thread Mathieu Lecarme
The SpellChecker code mixes indexing functions, n-gram processing, and querying functions. Extending it will not produce clean code. Would it make sense to first refactor the SpellChecker code to extract the dictionary-reading functions and the indexing/searching functions? SpellChecker will get a method to add

[jira] Updated: (LUCENE-956) phonem conversion from aspell dictionnary

2007-07-11 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-956: --- Attachment: aphone.patch > phonem conversion from aspell dictionn

[jira] Created: (LUCENE-956) phonem conversion from aspell dictionnary

2007-07-11 Thread Mathieu Lecarme (JIRA)
Affects Versions: 2.2 Reporter: Mathieu Lecarme First step to improve SpellChecker's suggestions: phoneme conversion for different languages. The conversion code is built from the aspell file description. The patch contains classes for handling English, French, Walloon and Swedish. If
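Aspell's phonetic tables are ordered rewrite rules from letter sequences to sounds. The toy sketch below keeps only that core idea. At each position the first matching rule wins, and unmatched letters pass through. It ignores the context markers and priorities of the real aspell format. The rules in the usage example are illustrative, not aspell's actual French table, and `ToyPhonet` is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy phonetic converter in the spirit of aspell phonet tables: ordered
// "letters -> sound" rules tried at each position; the first match wins.
// aspell's context markers and rule priorities are deliberately not modeled.
final class ToyPhonet {
    private final Map<String, String> rules = new LinkedHashMap<>(); // keeps rule order

    ToyPhonet rule(String from, String to) { rules.put(from, to); return this; }

    String convert(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            boolean matched = false;
            for (Map.Entry<String, String> r : rules.entrySet()) {
                if (s.startsWith(r.getKey(), i)) {
                    out.append(r.getValue());
                    i += r.getKey().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) out.append(s.charAt(i++)); // no rule: keep the letter as-is
        }
        return out.toString();
    }
}
```

With the rules `EAU -> O`, `AU -> O`, `PH -> F`, `QU -> K` and `C -> K`, the spellings "bateau" and "bato" both convert to "BATO", and "phoque" becomes "FOKE". Longer rules are listed first so that "EAU" wins over "AU".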

build.xml for a contrib which depends on another contrib

2007-07-10 Thread Mathieu Lecarme
The first version of the aspell-format phoneme converter in Java is almost finished. The source builds with Ant, but in the Lucene trunk the build fails: it depends on SpellChecker, which is built afterwards. How can I fix it? A static spellChecker.jar in my contrib's lib? A "depends" i

for a better spellchecker

2007-07-06 Thread Mathieu Lecarme
Currently, SpellChecker uses the trigram algorithm to find similar words. It works well for keyboard fumbles, but not well enough for short words, or for languages like French where the same sound can be written in several ways. Spellchecking is a classic computing task, and aspell provides some nice and
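The weakness described above can be seen with a generic trigram similarity. The sketch uses the Dice coefficient over sets of 3-character grams. It shows the general n-gram technique, not SpellChecker's actual code or scoring, and the class `Trigrams` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative trigram similarity (Dice coefficient over trigram sets).
// General n-gram technique only; not a copy of SpellChecker's implementation.
final class Trigrams {
    static Set<String> grams(String word) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 3 <= word.length(); i++) out.add(word.substring(i, i + 3));
        return out;
    }

    static double dice(String a, String b) {
        Set<String> ga = grams(a), gb = grams(b);
        if (ga.isEmpty() && gb.isEmpty()) return a.equals(b) ? 1.0 : 0.0;
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }
}
```

The typo "bateua" scores 0.5 against "bateau", while the phonetically identical French spelling "bato" scores only 1/3. A very short word such as "o" has no trigrams at all, so it scores 0 against "eau". Both gaps motivate adding a phonetic step.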

Re: [jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-28 Thread Mathieu Lecarme
Any news about the integration of this patch? M. Mathieu Lecarme (JIRA) wrote: > [ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] > Mathieu Lecarme upd

[jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-13 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-906: --- Attachment: elision-0.2.patch All suggested corrections are done. > Elision filter

[jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-13 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-906: --- Attachment: (was: elision-0.2.patch) > Elision filter for simple french analyz

[jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-13 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-906: --- Attachment: elision-0.2.patch All suggested corrections are done. > Elision filter

[jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-05 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-906: --- Attachment: elision.patch > Elision filter for simple french analyz

[jira] Created: (LUCENE-906) Elision filter for simple french analyzing

2007-06-05 Thread Mathieu Lecarme (JIRA)
Reporter: Mathieu Lecarme If you don't want to use stemming, StandardAnalyzer misses some French peculiarities like elision. "l'avion", which means "the plane", must be tokenized as "avion" (plane). This filter could be used with other Latin language
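The core of such a filter is a per-token rewrite. If the token starts with a known elided article followed by an apostrophe, drop that prefix. The sketch below is plain Java and not the patch's code. The article list is an assumption chosen for illustration, and both the ASCII apostrophe and the typographic one are accepted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch of elision stripping for French tokens: "l'avion" -> "avion".
// The article list is an illustrative assumption, not the patch's list.
final class Elision {
    static final Set<String> ARTICLES = new HashSet<>(
        Arrays.asList("l", "d", "j", "m", "n", "s", "t", "c", "qu"));

    static String strip(String token) {
        // Find the first apostrophe, ASCII or typographic (right single quotation mark).
        int idx = -1;
        for (int i = 0; i < token.length(); i++) {
            char ch = token.charAt(i);
            if (ch == '\'' || ch == '\u2019') { idx = i; break; }
        }
        if (idx <= 0) return token; // no apostrophe, or nothing before it
        String prefix = token.substring(0, idx).toLowerCase(Locale.ROOT);
        return ARTICLES.contains(prefix) ? token.substring(idx + 1) : token;
    }
}
```

Words like "aujourd'hui" are left intact because "aujourd" is not in the article list.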

using a french specific analyser without stemming

2007-06-04 Thread Mathieu Lecarme
For a project with a lot of Lucene searching (via Compass), I had some trouble with stemming. Stemming is nice for widening the search range, but it makes completion strange, so FrenchAnalyzer was not usable. A simpler StandardAnalyzer does the job right, except for some French specifics like elision.