I submitted a patch to handle Aspell phonetic rules. You can find it in JIRA.
On Thu, 4 Sep 2008 17:07:09 +0300, "Cam Bazz" <[EMAIL PROTECTED]> wrote:
> let me rephrase the problem. I already have a set of bad words. I want to
> avoid people inputting typos of the bad words.
> for example 'shit'
Yes. You can store data in the Lucene index without searching on it: that is
your simdocid.
M.
On Fri, 19 Sep 2008 16:00:20 +0800 (CST), xh sun
<[EMAIL PROTECTED]> wrote:
> Hi all,
>
> How can I implement this scenario in Lucene?
>
> suppose every document has three fields: docid, doctext and simdocid
Lucene is just an index. Where do you want to store your data? In a DB,
flat files, documents with a URL, in Lucene?
M.
On Fri, 19 Sep 2008 16:25:27 +0800 (CST), xh sun
<[EMAIL PROTECTED]> wrote:
> Thank you. Mathieu.
>
> But the hits don't include the document doc02 i
Have a look at Compass: http://www.compass-project.org/
It's one of the easiest ways to mix a DB and Lucene.
M.
On Wed, 1 Oct 2008 00:43:57 -0700 (PDT), agatone <[EMAIL PROTECTED]>
wrote:
>
> Hi,
> I asked this question already on "lucene-general" list but also got
> advised
> to ask here too.
>
Crawling a DB is not a good idea. Indexing while writing/deleting is
clever.
Doing it inside the DB is a solution.
Java users like ORM. Compass plugs Lucene indexing into the ORM's
transaction. If something is written or deleted, Lucene is aware of it.
Compass is open source.
M.
On Wed, 1 Oct 2008 09:12:41 -0300,
Compass handles that nicely.
You can query Lucene first and build an IN (...) clause for your SQL DB.
Or you can query SQL first and handle the result with a bitset in Lucene.
M.
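A minimal sketch of the two directions mentioned above, in plain Java with no Lucene or JDBC calls. The class and method names (`SqlLuceneBridge`, `inClause`, `toBitSet`, `keepAllowed`) are hypothetical, and the ids are assumed to be small non-negative integers that can be used directly as bit positions; with a real Lucene filter you would first have to map database ids to Lucene document numbers.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene or Compass.
final class SqlLuceneBridge {

    // Direction 1: ids found by a Lucene search become a parameterized
    // "column IN (?, ?, ...)" clause; bind the ids on the statement afterwards.
    static String inClause(String column, int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("need at least one id");
        }
        return column + " IN ("
                + String.join(", ", Collections.nCopies(idCount, "?")) + ")";
    }

    // Direction 2: ids returned by the SQL query are marked in a BitSet.
    static BitSet toBitSet(List<Integer> sqlIds) {
        BitSet bits = new BitSet();
        for (int id : sqlIds) {
            bits.set(id);
        }
        return bits;
    }

    // The BitSet then acts as a filter over the ids of the search hits.
    static List<Integer> keepAllowed(List<Integer> hitIds, BitSet allowed) {
        List<Integer> kept = new ArrayList<>();
        for (int id : hitIds) {
            if (allowed.get(id)) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

The IN clause is fine for a few hundred ids; for large result sets the bitset direction is usually the cheaper one, since the filter costs one bit per document.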
On Thu, 23 Oct 2008 14:27:53 +0200, Niels Ott <[EMAIL PROTECTED]>
wrote:
> Hi everybody,
>
> I need to query for documents
you stem both the search query and the text while indexing, so only "flash" is
indexed when "flashing" is read.
If you don't want to hurt your index with half words, you can use a second
index, just like for spelling:
http://blog.garambrogne.net/index.php?post/2008/03/07/A-lexicon-approach-for-Lucene-index
M.
Thomas Arni wrote:
> Hello Luceners
>
> I have started a new project and need to index PDF documents.
> There are several projects around that can extract the content,
> like pdfbox, xpdf and pjclassic.
>
> As far as I can tell from the FAQs and examples, all these
> tools allow simple text ex
sandeep chawla wrote:
> Hi,
>
> I am working on a search application. This application requires me to
> implement a stop filter
> using a stop word list. I have implemented a stop filter using Lucene's API.
>
> I want to take my application one step further.
>
> I want to remove all the words
Laxmilal Menaria wrote:
> Hello Everyone,
>
> I want to search 'abc-d' as an exact keyword, not 'abc d'. KeywordAnalyzer can
> be used for this purpose. StandardAnalyzer creates different tokens for
> 'abc-d': 'abc' and 'd'.
> But I cannot use this, because I am indexing the content of a text fil
fuzzy are simply not indexed.
If you want to search quickly with fuzzy search, you should index words
and their n-grams; it's the "did you mean" pattern.
You first select the indexed words which share n-grams with the query word,
the distance is computed with Levenshtein, and you use this word as a
synon
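The n-gram plus Levenshtein pattern described above can be sketched roughly as follows. This is plain Java with an in-memory map standing in for the second Lucene index; `NgramSuggester` and its methods are hypothetical names, and the n-gram size and distance threshold are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a word/n-gram index.
final class NgramSuggester {
    private final int n;
    private final Map<String, Set<String>> gramToWords = new HashMap<>();

    NgramSuggester(int n) {
        this.n = n;
    }

    // All character n-grams of a word, e.g. "abc" with n=2 gives [ab, bc].
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Index time: register the word under each of its n-grams.
    void add(String word) {
        for (String gram : ngrams(word, n)) {
            gramToWords.computeIfAbsent(gram, k -> new HashSet<>()).add(word);
        }
    }

    // Classic two-row dynamic-programming edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Query time: candidates are words sharing at least one n-gram with the
    // query; only those within maxDistance are kept, closest first.
    List<String> suggest(String query, int maxDistance) {
        Set<String> candidates = new HashSet<>();
        for (String gram : ngrams(query, n)) {
            candidates.addAll(gramToWords.getOrDefault(gram, Collections.emptySet()));
        }
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (levenshtein(query, candidate) <= maxDistance) {
                result.add(candidate);
            }
        }
        result.sort(Comparator.comparingInt((String c) -> levenshtein(query, c))
                .thenComparing(Comparator.<String>naturalOrder()));
        return result;
    }
}
```

The point of the n-gram lookup is that the expensive distance computation only runs on the few words that share a fragment with the query, not on the whole vocabulary.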
Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix".
So, this is some kind of "wildcard fuzzy" but not real fuzzy anymore.
I understand the optimization, but right now I can hardly imagine a
reasonable use case. Who cares whether the Levenshtein distance is at the
beginning, middle
christophe blin wrote:
Hi,
thanks for the pointer to the elision filter, but I am currently stuck with
lucene-core-2.2.0 found in the maven2 central repository (which does not
contain this class). I'll watch for an upgrade to 2.3 in the future.
You can backport it easily with copy-paste.
M.
--
Markus Fischer wrote:
Hi,
[Resent: guess I sent the first before I completed my subscription,
just in case it comes up twice ...]
the subject may be a bit weird but I couldn't find a better way to
describe a problem I'm trying to solve.
If I'm not mistaken, one factor of scoring is the
Yes, I've found a tester!
A patch was submitted for this kind of job:
https://issues.apache.org/jira/browse/LUCENE-1190
And here is the svn work in progress :
https://admin.garambrogne.net/subversion/revuedepresse/trunk/src/java/lexicon
And the web version :
https://admin.garambrogne.net/projets
[EMAIL PROTECTED] wrote:
If you want something from an index it has to be IN the
index. So, store a
summary field in each document and make sure that field is part of the
query.
And how could one create automatically such a summary?
Have a look at http://alias-i.com/lingpipe/index.h
Dharmalingam wrote:
I am working on some sort of search mechanism to link a requirement (i.e. a
query) to source code files (i.e., documents). For that purpose, I indexed
the source code files using Lucene. Contrary to traditional natural language
search scenario, we search for code files that
Petite Abeille wrote:
A proposal for a Lua entry for the "Google Summer of Code" '08:
A Lua implementation of Lucene.
For me, Lua is just glue between C-coded objects, a super config file,
as used in lighttpd or WoW.
Will Lulu work on top of Lucy?
Did I miss something?
M.
--
Grant Ingersoll wrote:
On Feb 29, 2008, at 5:39 AM, Mathieu Lecarme wrote:
Petite Abeille wrote:
A proposal for a Lua entry for the "Google Summer of Code" '08:
A Lua implementation of Lucene.
For me, Lua is just glue between C-coded objects, a super config
fil
Hi
Mathieu Lecarme wrote:
On a related topic, I'm also searching for a way to suggest
alternate spellings of words to the user when we find a word
which is used very infrequently in the index or is not in the index
at all. I'm based in Austria; when I e.g. search for
"r
The easiest way is to split the index by Document. In Lucene, an index
contains Documents and an inverted index of Terms. If you want to put Terms
in different places, each Document will be duplicated on each index, with
only a part of its Terms.
How will you manage node failure in your network?
They were so
On 2 March 08, at 03:05, 仇寅 wrote:
Hi,
I agree with your point that it is easier to partition the index by document.
But the partition-by-keyword approach has much greater scalability than the
partition-by-document approach. Each query involves communicating with a
constant number of nodes; whi
he documents to be indexed are not necessarily web pages. They are mostly
files stored on each node's file system.
Node failures are also handled by replicas. The index for each term will be
replicated on multiple nodes, whose nodeIDs are near each other. This
mechanism is handled
There's no syntax to restore stemmed word. Stemming is done while
reading the news, so the index never knows the complete word.
I submitted a patch for that:
https://issues.apache.org/jira/browse/LUCENE-1190
Be careful, rssbandit uses the .NET Lucene, not the Java version.
M.
secou wrote:
Hi,
k and diff log should be the right approach.
M.
仇寅 wrote:
Hi Mathieu,
You were right. In the early stage, I only intend to implement the basic
TermQuery and BooleanQuery functions. Fuzzy match and partial match require
more complicated algorithms. Cache consistency will certainly be my concern.
Not sure, you might want to ask on Nutch. From a strict language
standpoint, the notion of a stopword in my mind is a bit dubious. If
the word really has no meaning, then why does the language have it to
begin with? In a search context, it has been treated as of minimal
use in the early da
Borgman, Lennart wrote:
Is there any possibility to use a thesaurus or an ontology when
indexing/searching with Lucene?
Yes, the WordNet contrib does that. And with a token filter, it's easy to
use your own.
What do you want to do?
M.
---
https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java/lexicon/src/java/org/apache/lucene/lexicon/QueryUtils.java
M.
Itamar Syn-Hershko wrote:
Hi all,
I'm looking for the best way to inflate a query, so a query like: "synchronous AND colour" -- will become something lik
Here is a POC about using Lucene, via Compass, from PHP or Python (other
languages will come later), with only XML configuration, object
notation, and native use of scripting language.
http://blog.garambrogne.net/index.php?post/2008/03/11/Using-Compass-without-dirtying-its-hands-with-java
It's
Raghu Ram wrote:
Hi all,
I guess this question is a bit off track. Are there any language
identification modules inside Lucene? If not, can somebody please suggest
a good one?
Thank you.
Nutch provides a tool for that, using n-gram patterns, just like OO.o does.
M.
---
Dragon Fly wrote:
Hi,
I'd like to find out if I can do the following with Lucene (on Windows).
On server A:
- An index writer creates/updates the index. The index is physically stored on
server A.
- An index searcher searches against the index.
On server B:
- Maps to the index directory.
Itamar Syn-Hershko wrote:
For what it's worth, I did something similar in my BidiAnalyzer so I can
index both Hebrew/Semitic texts and English/Latin words without switching
analyzers, giving each the proper treatment. I did it simply by testing the
first char and looking at its numeric value -
Raghu Ram wrote:
To complicate it further... the text for which language identification has
to be done is small, in most cases a short sentence like "I like Pepsi".
Can something be done for this?
Drinking water?
More seriously, if n-gram pattern language guessing is too ambiguous,
sear
luceneuser wrote:
Hi All,
I need help retrieving results based on relevance + freshness. As of
now, I get results based on only one of the two, either relevance or freshness.
How can I achieve this? Lucene retrieves results by relevance but also
fetches old results. I need more relevan
milu07 wrote:
Hello,
My machine runs Ubuntu 7.10. I am working with Apache Lucene. I am done with
the indexer and tried the command-line Searcher (the default command line
included in Lucene package: http://lucene.apache.org/java/2_3_1/demo2.html).
When I use this at command line:
java Searcher
Ivan Vasilev wrote:
Hi Guys,
Has anybody integrated the Spell Checker contributed to Lucene?
http://blog.garambrogne.net/index.php?post/2008/03/07/A-lexicon-approach-for-Lucene-index
https://issues.apache.org/jira/browse/LUCENE-1190
I need advice on where to get a free dictionary file (one
Ivan Vasilev wrote:
Thanks Mathieu for your help!
The contribution that you have made to Lucene by this patch seems to
be great, but the hunspell dictionary is under LGPL which the lawyer
of our company does not like.
It's the spell tool used by OpenOffice and Firefox. Data must be
Ivan Vasilev wrote:
Thanks Mathieu,
I tried to check out but without success. Anyway I can do it manually,
but as the contribution is still not approved by Lucene, our chiefs
will not want it to be included in our project for now.
It's the right decision. I hope the third patch will be
Wojtek H wrote:
Hi all,
Snowball stemmers are part of Lucene, but for a few languages only. We
have documents in various languages and so need stemmers for many
languages (in particular Polish). One of the ideas is to use ispell
dictionaries. There are ispell dicts for many languages and so thi
Marjan Celikik wrote:
Hi everyone,
I know that there are packages that support the "Did you mean ... ?"
search feature with Lucene, which tries to find the most suitable
correct-word query. However, so far I haven't encountered the opposite
search feature: given a correct query, find all docum
Marjan Celikik wrote:
Mathieu Lecarme wrote:
You have to iterate over your query: if it's a BooleanQuery, keep it;
if it's a TermQuery, replace it with a BooleanQuery containing all variants
of the Term with Occur.SHOULD.
M.
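To make the "iterate over your query" idea concrete, here is a minimal sketch that uses a tiny stand-in query tree instead of the real Lucene classes. `QueryExpansion`, `Bool`, `Term`, `Clause` and the variants map are all hypothetical; they only mirror the roles of BooleanQuery, TermQuery and Occur, and the sketch rebuilds a copy of each boolean node while recursing into it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a query tree; not the Lucene API.
final class QueryExpansion {
    enum Occur { MUST, SHOULD }

    interface Node { }

    // Leaf: plays the role of a TermQuery.
    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class Clause {
        final Node node;
        final Occur occur;
        Clause(Node node, Occur occur) { this.node = node; this.occur = occur; }
        @Override public String toString() {
            return (occur == Occur.MUST ? "+" : "") + node;
        }
    }

    // Branch: plays the role of a BooleanQuery.
    static final class Bool implements Node {
        final List<Clause> clauses = new ArrayList<>();
        Bool add(Node node, Occur occur) {
            clauses.add(new Clause(node, occur));
            return this;
        }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < clauses.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(clauses.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Walk the tree: branches are kept (and recursed into), leaves with known
    // variants become a boolean node of SHOULD clauses.
    static Node expand(Node node, Map<String, List<String>> variants) {
        if (node instanceof Bool) {
            Bool copy = new Bool();
            for (Clause clause : ((Bool) node).clauses) {
                copy.add(expand(clause.node, variants), clause.occur);
            }
            return copy;
        }
        Term term = (Term) node;
        List<String> alternatives = variants.getOrDefault(term.text, Collections.emptyList());
        if (alternatives.isEmpty()) {
            return term;
        }
        Bool any = new Bool().add(term, Occur.SHOULD);
        for (String alternative : alternatives) {
            any.add(new Term(alternative), Occur.SHOULD);
        }
        return any;
    }
}
```

With a variants map containing "colour" -> ["color"] (an illustrative entry), a query of two required terms keeps its shape and only the matching leaf is widened.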
Thanks.. however I don't fully understand what
Marjan Celikik wrote:
Mathieu Lecarme wrote:
wever I don't fully understand what you mean by "iterate over
your query". I would like a conceptual answer on how this is done with
Lucene, not a technical one..
Your query is a tree, with BooleanQuery as branches and other que
[EMAIL PROTECTED] wrote:
The need is:
I have millions of entries in database, each entry is in such format (more or
less)
ID NameDescription start (number) stop(number)
Currently my application uses the database to do search, queries are in the
following format:
Select * fr
Use ShingleFilter.
I'm working on a wider SpellChecker, I'll post a third patch soon.
https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java
M.
dreampeppers99 wrote:
Hi,
I have two questions about this GREAT tool.. (framework, library...
"whatever")
Well I decide put spe
On 8 Apr 08, at 18:34, Karl Wettin wrote:
dreampeppers99 wrote:
1. Why do I need to pass a Directory object (mandatory) to the constructor of
SpellChecker?
Mainly because it is a nasty piece of code. But it does a good job.
Because SpellChecker uses a Directory to store data. It can be
FSDirectory
I'm cool :) I just think you are overcomplicating things.
Yes... I can use two words and OR
Suppose I query on this
The Lord of Rings: Return of King
The Lord of Rings: Fellowship
The Lord of Rings: The Two towers
The Lord of Weapons
The Lord of War
Suppose a user searches for: "The Lord of Rings
Allen Atamer wrote:
My dictionary filter currently implements next() and everything works well
when dictionary entries are replaced one-to-one. For example: Can =>
Canada.
A problem arises when I try to replace it with more than one word. Going
through next() I encounter "shutdown". But
Have a look at Compass.
M.
Prashant Saraf wrote:
Hi,
We are planning to provide search functionality in a
web-based application. Can we use Lucene to search data from
databases like Oracle and MS SQL?
Thanks and Regards
प्रशांत सराफ
(Prashant Saraf)
S
Have a look at Compass 2.0M3
http://www.kimchy.org/searchable-cascading-mapping/
Your multiple indexes will be nice for massive writes. With a classical
read/write ratio, Compass will be much easier.
M.
Rajesh parab wrote:
Hi,
We are using Lucene 2.0 to index data stored inside
relational dat
Antony Bowesman wrote:
We're planning to archive email over many years and have been looking
at using DB to store mail meta data and Lucene for the indexed mail
data, or just Lucene on its own with email data and structure stored
as XML and the raw message stored in the file system.
For so
On 11 Apr 08, at 19:29, Rajesh parab wrote:
Thanks for these pointers Mathieu.
We have earlier looked at Compass, but the main issue
with database index is DB vendor support for BLOB
locator. I understand that Oracle provides this
support to get the partial data from BLOB, but I guess
Regarding data and its relationships - the use case I
am trying to solve is to partition my data into 2
indexes, a primary index that will contain the majority
of the data and it is fairly static. The secondary
index will have related information for the same data
set in primary index and this relate
Rafael Turk wrote:
Hi Folks,
I'm trying to load Google Web 1T 5 Gram to Lucene. (This corpus contains
English word n-grams and their observed frequency counts. The length of the
n-grams ranges from unigrams (single words) to five-grams.)
I'm loading each n-gram (each row is an n-gram) as an
Rafael Turk wrote:
Hi Mathieu,
*What do you want to do?*
A spell checker and related keyword suggestion
Here is a spell checker which I am trying to finalize:
https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java
If you want an ngram => popularity map, just us
Alex Chew wrote:
Hi,
Does somebody have experience building a distributed application with Lucene
and Hadoop/HFS?
Lucene 2.3.1 does not seem to expose an HFSDirectory.
Any advice will be appreciated.
Regards,
Alex
Have a look at Nutch.
M.
--
am doing wrong?
Thank you.
______
Mathieu Decaffmeyer
Internet communications are not secure and therefore Fortis Banque Luxembourg
S.A. does not accept legal responsibility for the contents of this message. The
inform
help.
__
Mathieu Decaffmeyer
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 23, 2007 2:01 PM
To: java-user@lucene.apache.org
Subject: Re: Low hits
* This message comes from the Internet Network *
What ve
score for titles of the web pages for the people I
develop.
I will try Luke, but for some reason I can't install it at my company.
Can someone give me some suggestions on what I should do?
Thank you.
__
Mathieu Decaffmeyer
Web Developer
Fortis B
Hi,
I have one index with one document with title "Logistics"
I have a second index with the same document with title "Logistics" and
other documents (some contain the word "Logistics" as well)
If I execute a search title:Logistics in the first index, I get 0.31
for the document with title "Lo
Hi, I have an array of Hits objects.
I want to merge the different Hits objects in the array into one Hits
object.
Is this possible?
Thank you for any help!
__
.
-Original Message-
From: Nicolas Lalevée [mailto:[EMAIL PROTECTED]
Sent: Monday, January 29, 2007 12:15 PM
To: java-user@lucene.apache.org
Subject: Re: Merge Hits
On Monday 29 January 2007 12:08, DECAFFMEYER MATHIEU wrote:
> Hi, I h
On Monday 29 January 2007 13:33, DECAFFMEYER MATHIEU wrote:
> Thank you for your response,
> Actually I want to merge the Hits to get a better score,
> For example when the user enters Hello
> I want to merge :
> title:Hello
> headlines:Hello
> summary:Hello
> content:H
IndexSearcher.explain(). That'll
tell you why.
Erik
On Jan 29, 2007, at 4:43 AM, DECAFFMEYER MATHIEU wrote:
> Hi,
>
> I have one index with one document with title "Logistics"
>
> I have a second index with the same document with title "Logistics"
>
Mon, 29 Jan 2007 21:52:58 +0100
: From: Soeren Pekrul <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Re: Score
:
: DECAFFMEYER MATHIEU wrote:
: >
: > Both are the same document but in different indexes,
: > the only difference i
equivalent ?!
Thank you.
______
Mathieu Decaffmeyer
Web Developer
Fortis Banque Luxembourg
IS Retail Banking - Web Content Management
Mobile : 0032 479 / 69 . 42 . 96
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 30, 2007
earch on this word I keep having a score of a bit more than
0.
Why is my boost not working?
Thank you.
______
Mathieu Decaffmeyer
Sorry I have it working ...
__
Mathieu Decaffmeyer
From: DECAFFMEYER MATHIEU [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 31, 2007 11:04 AM
To: java-user@lucene.apache.org
Subject: Boost
Hi, I have exactly the same question.
Correct me if I'm wrong:
it seems that I can do any I/O operations on the index while querying
because of the open IndexReader.
So if I had the same situation as gui (the poster of the thread), I can
just delete the old index while people query on it?
Then b
Thank you, Chris, for your support.
__
Matt
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 01, 2007 12:54 AM
To: java-user@lucene.apache.org
Subject: RE: Score
Hi,
I have a list of filenames like
Corporate.htm
Logistics.htm
Merchant.htm
that need to be deleted.
For now I give this list to my search application, which reads the
index and gives the results; if the path contains one of the
filenames, I don't display this hit ... Not really proper
closed and reopened.
Erick
On 2/1/07, DECAFFMEYER MATHIEU <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have a list of filenames like
> Corporate.htm
> Logistics.htm
> Merchant.htm
>
> that need to be deleted.
>
> For now on I give this list to my Search
:
index.deleteDocuments(field name, field value);
_
From: DECAFFMEYER MATHIEU [mailto:[EMAIL PROTECTED]
Sent: 01 February 2007 09:53
To: java-user@lucene.apache.org
Subject: Deleting document by file name
Hi,
I have a list of filenames
Hi all,
I have simple questions for which I can't find an answer by googling :
1)
I want to add headlines for a document :
Field headlinesField = new Field("headlines", headlines,
Field.Store.YES, Field.Index.TOKENIZED);
But how do I separate the headlines from each other?
Let's say I want to ad
ose are just the fields that demo uses, your application can
use any field it needs, like "headlines" above.
Otis
- Original Message
From: DECAFFMEYER MATHIEU <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, February 2, 2007 9:03:50 AM
Subject: Adding head
Hi,
The score depends on
1. the query
2. the matched document
3. the index.
I don't really understand why the index must influence the score (why it
has been implemented that way).
Let's say I have this page Logistics.htm
The word "experience" appears just once in it.
It will get a high sc
Hi,
I need to merge indexes.
If I want the user to see the changes (the merged indexes), I heard I
need to close the index reader and re-open it again.
But I will need to do this every x minutes for some reasons,
so I wondered what could happen if a user does a query just when a re-open
of the read
My question is what happens when a re-opening of the reader occurs at
the same time as a user runs a query on the index. And are there solutions
for this?
__
Matt
-Original Message-
From: Michael McCandless [mailto:[EMAIL PROTECTED]
Sent: Thursda
the user is executing a query"...
Erick
On 2/22/07, DECAFFMEYER MATHIEU <[EMAIL PROTECTED]> wrote:
>
> My question is what happens when a re-opening of the reader occurs at
> the same time as a user runs a query on the index. And are there solutions
> for this?
>
>
Hi,
I store the Lucene index of my web applications on a file system.
Often, I need to add to this index another index, also stored on the file
system.
I have three questions:
* What is the best way to do this?
Open an IndexReader on this incoming index on the file system
and use addIndexes(IndexR
Hi,
While updating my index I have the following error :
[3/1/07 9:44:19:214 CET] 76414c82 SystemErr R java.io.IOException:
Lock obtain timed out:
[EMAIL PROTECTED]:\TEMP\lucene-b56f455aea0a705baecaa4411d590aa2-write.lock
[3/1/07 9:44:19:214 CET] 76414c82 SystemErr R at
org.apache.l
I deleted the lock file, now it seems to work ...
When can such an error happen ?
__
Matt
From: DECAFFMEYER MATHIEU [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 01, 2007 9:56 AM
To: java-user@lucene.apache.org
nalyzer = new StandardAnalyzer();
}
What I want to achieve is to be able to use an English stemmer,
but I can't find any method to associate my stemmer with my Analyzer.
I appreciate any help, thank you.
______
Mathieu Decaffmeyer
Web Developer
Fortis Banqu
I needed this myself not long ago.
Here is a piece of code to get an Analyzer that will use a tokenizer and
an English stemmer (for "bears" it will also return "bear" and vice
versa):
private static Analyzer createEnglishAnalyzer() {
return new Analyzer() {
public TokenStream tokenSt
Hi,
I need to merge several indexes (I call them incremental index) with my
main index.
Each incremental index can contain the same URLs as the main index,
that's why I have a list of URLs to update, which I will delete from the
main index before merging with an incremental index.
I have also
Hi,
I have put this question as "urgent" because I notice I don't often
get answers.
If I'm asking the wrong way, please tell me...
Before I delete a document I search for it in the index to be sure there is
a hit (via a Term object).
When I find a hit I delete the document (with the same Term
you would get some valuable
information from it.
http://www.linuxforums.org/forum/linux-newbie/6322-asking-good-questions-2-a.html
Erick
On 3/13/07, DECAFFMEYER MATHIEU <[EMAIL PROTECTED]> wrote:
>
>
> Hi,
>
> I have put this question as "urgent" because I
deletion would return 0.
> On 3/13/07, DECAFFMEYER MATHIEU <[EMAIL PROTECTED]> wrote:
>>
>> Before I delete a document I search it in the index to be sure there is a
>> hit (via a Term object),
>> When I find a hit I delete the document (with the same Term object),
>
Hi,
I am parsing this file called Logistics.htm
I have a field named "headlines" that contains word "clients" among others.
When I don't put a boost on this field, I get a score of 0.06 when searching for
clients.
Then when I put a boost of "10", I get a score of 0.21
Yet I was expecting a score
Have a look at the opensearch.org specification; your auto-completion
will work with IE7 and Firefox 2.
JSON serialization is quicker than XML stuff.
Be careful to limit the number of responses.
A search for "test*" works very well in my project with tens of thousands
of documents.
Begin completion onl
If you do that, you enumerate every term!
If you use an alphabetically sorted collection, you can stop when the
matches stop, but you still have to test every term before the first match.
Lucene gives you tools to match the beginning of a term, just use them!
M.
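As a rough illustration of why a sorted structure helps with prefix matching, here is a plain-Java sketch that seeks straight to the first candidate instead of testing every earlier term. `PrefixTerms` and `withPrefix` are hypothetical names; in Lucene itself a PrefixQuery, or a term enumeration positioned at the prefix, plays this role, so this only shows the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical helper over an in-memory sorted set of terms.
final class PrefixTerms {

    // Returns at most `limit` terms starting with `prefix`, in sorted order.
    static List<String> withPrefix(NavigableSet<String> sortedTerms, String prefix, int limit) {
        List<String> matches = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; nothing before it is visited.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            // Terms sharing a prefix are contiguous, so the first miss ends the scan.
            if (!term.startsWith(prefix) || matches.size() >= limit) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }
}
```

The `limit` parameter reflects the advice to cap the number of completions returned to the browser.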
On 8 June 07, at 14:57, Patrick Turcotte wrote:
H
Why not use Document?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/Document.html
HTMLDocument manages HTML stuff like encoding, headers, and other
specifics.
Nutch uses specific word tools (http://lucene.apache.org/nutch/apidocs/org/ap
You can work as with Lucene spelling.
Use a specific index with each word as a Document, boosted by something
proportional to its number of occurrences (with log and math magic).
The magical stuff is n Fields with starting n-grams, not stored, not
tokenized.
For example, if you want to index the word "carott",
if you don't use the same tokenizer for indexing and searching, you will
have troubles like this.
Mixing exact match (with ") and wildcard (*) is a strange idea.
Typographical rules say that you have a space after a comma, no?
Is your field tokenized?
M.
Renaud Waldura wrote:
> My very simple
Your request seems to be a two-step query.
First step: you select images, and then collections.
Second step: you sort the collections.
Can BitVector help you?
M.
Antoine Baudoux wrote:
> Hi,
>
> I'm developing an image database. Each Lucene document
> representing an image contains (among ot
The first step is to feed a Set with "collection".
The second step is to sort it.
With a SortedSet, you can do that, can't you?
M.
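The two steps could look roughly like this in plain Java. Everything here is an assumption made for illustration: `CollectionSorter` and `ImageHit` are hypothetical names, each hit is assumed to carry a collection name and a numeric date, and "newest picture first" is just one possible sorting rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper; ImageHit stands in for the stored fields of a search hit.
final class CollectionSorter {

    static final class ImageHit {
        final String collection;
        final long date;
        ImageHit(String collection, long date) {
            this.collection = collection;
            this.date = date;
        }
    }

    static List<String> collectionsByNewest(List<ImageHit> hits) {
        // Step 1: feed a map keyed by collection, remembering its newest picture date.
        final Map<String, Long> newest = new HashMap<>();
        for (ImageHit hit : hits) {
            newest.merge(hit.collection, hit.date, Long::max);
        }
        // Step 2: a SortedSet orders the collections, newest first, then by name.
        Comparator<String> byNewest = Comparator
                .comparing((String c) -> newest.get(c))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder());
        SortedSet<String> sorted = new TreeSet<>(byNewest);
        sorted.addAll(newest.keySet());
        return new ArrayList<>(sorted);
    }
}
```

This only stays cheap while the number of distinct collections is small; the hits themselves are read once and never sorted.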
Antoine Baudoux wrote:
> Could you be more precise? I don't understand what you mean.
>
>
>
> On 15 Jun 2007, at 17:20, Mathieu Lecarme wrote:
ith at most 300 elements
you can sort it with strange rules.
M.
Antoine Baudoux wrote:
> The problem is that I want Lucene to do the sorting, because the query
> would return thousands of results, and I'm displaying documents one
> page at a time.
>
>
> On 15 Jun 2007, at 17
e rules.
>>
>> M.
>>
>> Antoine Baudoux wrote:
>>> The problem is that I want Lucene to do the sorting, because the query
>>> would return thousands of results, and I'm displaying documents one
>>> page at a time.
>>>
>>>
Walt explained differently what I said.
Lucene can be used efficiently for selecting objects, without sorting
or scoring anything; then, with the id stored in Lucene, you can sort
them yourself with a simple Sortable implementation.
The only limit is that Lucene must not give you too many results, with
your
Compass uses a trick to manage parent-child indexing.
What if you index "collection" with a Date field, which is the newest
picture inside, and put all the pictures' keywords on their collection?
Then, with a keyword search, you will find the collection with the
highest number of tag occurrences and date s
Lee Li Bin wrote:
> Hi,
>
> I still have a problem searching for Chinese words.
> The XML file which is the data source and the analyzer have already been encoded.
> I have tested StandardAnalyzer, CJKAnalyzer, and ChineseAnalyzer, but I
> still can't get any results.
>
> 1. Do we need any encoding