from:"\"Nicolas Maisonneuve\""

Re: Lucene Vs Ixiasoft

2004-12-08 Thread Nicolas Maisonneuve

Hi,
think first about the relevance of the retrieval model in these two search engines for
XML document retrieval.

Lucene is a classic full-text search engine using the vector space
model. This model is efficient for indexing unstructured documents
(like plain text files), but it was not made for structured documents like XML.
There is an XML demo in the Lucene sandbox, but it's not very
efficient because it doesn't take advantage of the document structure
in the indexing and ranking model, so it loses semantic information
and relevance.

I don't know Ixiasoft; check its documentation to see how it indexes and
ranks XML documents.

nicolas 
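A minimal sketch of the vector space idea described above, assuming plain whitespace tokenization and raw term-frequency weights (Lucene's real weighting also involves idf, norms and boosts, which are left out here). Each text becomes a bag of term counts, and similarity is the cosine of the angle between the two vectors. The vector records only which terms occur and how often, not where in an XML tree they occur, which is exactly the structural information the message says gets lost. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Vector space sketch: texts become term-frequency vectors and similarity is the
// cosine of the angle between them. Illustration only, not Lucene's scoring code.
final class CosineSketch {

    // Lower-case and split on whitespace, counting each term's occurrences.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); 0 when either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue().doubleValue() * other.doubleValue();
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termFrequencies("xml retrieval");
        Map<String, Integer> doc = termFrequencies("structured xml document retrieval");
        System.out.println(cosine(query, doc)); // about 0.707
    }
}
```

The same "xml" term would count identically whether it came from a title element or a footnote, which is why a flat model like this cannot rank by structure.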

On Wed, 8 Dec 2004 14:20:45 -0500, Praveen Peddi
<[EMAIL PROTECTED]> wrote:
> Does anyone know about the Ixiasoft server? It's an XML repository/search engine.
> If anyone knows about it, does he/she also know how it compares to Lucene?
> Which is faster?
> 
> Praveen
> **
> Praveen Peddi
> Sr Software Engg, Context Media, Inc.
> email:[EMAIL PROTECTED]
> Tel:  401.854.3475
> Fax:  401.861.3596
> web: http://www.contextmedia.com
> **
> Context Media- "The Leader in Enterprise Content Integration"
> 
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: dotLucene (port of Jakarta Lucene to C#)

2004-12-01 Thread Nicolas Maisonneuve

Hi George,
is the C# Lucene faster than Java Lucene? (It seems to me
that C# is faster than Java, isn't it?)

nicolas maisonneuve



On Sun, 28 Nov 2004 21:08:30 -0500, George Aroush <[EMAIL PROTECTED]> wrote:
> Hi folks,
> 
> I am pleased to announce the availability of dotLucene 1.4.0 RC1.  dotLucene
> is a complete port of Jakarta Lucene to C#.  The port is almost a
> line-by-line port and it includes the demos as well as all the JUnit tests.
> An index created by dotLucene is cross-compatible with Jakarta Lucene and
> vice versa.
> 
> Please visit http://sourceforge.net/projects/dotlucene/ to learn more about
> dotLucene and to download the source code.
> 
> Best regards,
> 
> -- George Aroush
> 


Re: Filter for a search refinement

2004-11-21 Thread Nicolas Maisonneuve

Hmm, just a question:

- in the normal IndexSearcher search method
there is an if (score > 0.0f && filter.get(doc)) { doc in the hits } check
- but in QueryFilter, there isn't a minimum score condition

Is that normal or not?

nicolas
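A minimal sketch of search refinement with a minimum-score condition, assuming the previous hits are held in a java.util.BitSet indexed by document number and that the new search's scores arrive as a plain float array (a hypothetical stand-in for a real scorer; none of these names are Lucene API). A document survives only if its new score is positive and its bit is set in the previous result set, which is the kind of minimum-score condition the question refers to.

```java
import java.util.BitSet;

// Search refinement sketch: the previous result set, stored as a BitSet indexed by
// document number, filters a new search. scores[doc] is a hypothetical stand-in for
// what a real scorer would report for each document.
final class RefinementSketch {

    // Keep a document only if its new score is positive AND it was among the
    // previous hits (a null filter means "no restriction").
    static BitSet collect(float[] scores, BitSet previousHits) {
        BitSet hits = new BitSet(scores.length);
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0.0f && (previousHits == null || previousHits.get(doc))) {
                hits.set(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet previous = new BitSet();
        previous.set(1);
        previous.set(2);
        previous.set(4);
        float[] newScores = {0.5f, 0.8f, 0.0f, 0.3f, 0.9f};
        // doc 2 was a previous hit but now scores 0; docs 0 and 3 were not previous hits
        System.out.println(collect(newScores, previous)); // prints {1, 4}
    }
}
```

Dropping the score test from collect would let document 2 through even though the new query gives it no score at all.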



On Sun, 21 Nov 2004 14:34:00 +0100, Nicolas Maisonneuve
<[EMAIL PROTECTED]> wrote:
> Yes... it's the same kind of feature (I didn't see this Filter, shame on
> me!), but my method may be faster, because QueryFilter launches an internal
> search and my method doesn't.
> 
> nicolas
> 
> 
> 
> 
> On Sun, 21 Nov 2004 05:06:12 -0500, Erik Hatcher
> <[EMAIL PROTECTED]> wrote:
> > Nicolas - how does your filter differ from the capabilities available
> > from the built-in QueryFilter?  It seems at first glance to be nearly
> > the same thing.
> >
> > Erik
> >
> >
> >
> >
> > On Nov 21, 2004, at 4:52 AM, Nicolas Maisonneuve wrote:
> >
> > > I developed a filter that restricts a search to the hits of a previous
> > > search (search refinement)
> > >
> > > see the patch http://issues.apache.org/bugzilla/show_bug.cgi?id=32334
> > >
> > > Nicolas Maisonneuve
>


Re: Filter for a search refinement

2004-11-21 Thread Nicolas Maisonneuve

Yes... it's the same kind of feature (I didn't see this Filter, shame on
me!), but my method may be faster, because QueryFilter launches an internal
search and my method doesn't.

nicolas



On Sun, 21 Nov 2004 05:06:12 -0500, Erik Hatcher
<[EMAIL PROTECTED]> wrote:
> Nicolas - how does your filter differ from the capabilities available
> from the built-in QueryFilter?  It seems at first glance to be nearly
> the same thing.
> 
> Erik
> 
> 
> 
> 
> On Nov 21, 2004, at 4:52 AM, Nicolas Maisonneuve wrote:
> 
> > I developed a filter that restricts a search to the hits of a previous
> > search (search refinement)
> >
> > see the patch http://issues.apache.org/bugzilla/show_bug.cgi?id=32334
> >
> > Nicolas Maisonneuve


hasFieldFilter contribution

2004-11-03 Thread Nicolas Maisonneuve




I developed a
Filter that restricts search results to documents that have terms in specific
fields
(currently there is no way to search Lucene documents on this kind of criterion,
i.e. the presence or absence of values in specific fields).
 
nicolas 
 
package org.apache.lucene.search;


import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import java.util.BitSet;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermDocs;
import java.util.*;


/**
 * A Filter that restricts search results to documents that have terms in specific fields
 * (OR semantics: documents that have terms in field1 or in field2)
 * @author Nicolas Maisonneuve
 */
public class HasFieldFilter
extends Filter {

private Set fieldnames;


/**
 * Creates a filter for the given fields.
 * @param fieldnames String[] an array of field names
 */
public HasFieldFilter (String[] fieldnames) {
this.fieldnames=new HashSet();
for (int i=0; i

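The rest of the filter is cut off above. As a hedged sketch of the OR semantics its Javadoc describes, here is the same logic over a hypothetical in-memory inverted index (field name to term to document numbers) rather than Lucene's IndexReader; in Lucene the inner loops would presumably walk TermEnum and TermDocs for each field, as the imports suggest. Flipping the result gives the "absent" case: documents with no term in any of the requested fields. The class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// HasFieldFilter idea over a HYPOTHETICAL in-memory inverted index:
// field name -> (term -> document numbers). Not Lucene's IndexReader API.
final class HasFieldSketch {

    // OR semantics: a document is set if any requested field holds at least one term for it.
    static BitSet bits(Map<String, Map<String, List<Integer>>> index, Set<String> fieldNames, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (String field : fieldNames) {
            Map<String, List<Integer>> terms = index.get(field);
            if (terms == null) {
                continue; // field not in the index: contributes no documents
            }
            for (List<Integer> postings : terms.values()) {
                for (int doc : postings) {
                    result.set(doc);
                }
            }
        }
        return result;
    }

    // The "absent" case: documents with no term in any of the requested fields.
    static BitSet missing(Map<String, Map<String, List<Integer>>> index, Set<String> fieldNames, int maxDoc) {
        BitSet result = bits(index, fieldNames, maxDoc);
        result.flip(0, maxDoc);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<Integer>>> index = Map.of(
            "author", Map.of("smith", List.of(0, 3)),
            "isbn", Map.of("123", List.of(2)));
        Set<String> fields = Set.of("author", "isbn");
        System.out.println(bits(index, fields, 5));    // prints {0, 2, 3}
        System.out.println(missing(index, fields, 5)); // prints {1, 4}
    }
}
```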
new version of spell checker

2004-10-21 Thread Nicolas Maisonneuve

UPDATE
- sort fixed (the sort order was reversed!)
- gram size set dynamically (depending on the length of the word)
- use the FuzzyQuery score: ((edit distance)/(length of word))
- new Dictionary interface + LuceneDictionary and PlaintextDictionary implementations
- addWords method replaced by indexDictionary(Dictionary dic)
- new public method: boolean exist(word)
- build.xml added

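A minimal sketch of two items in the update list: n-gram extraction whose gram sizes grow with word length, and ranking candidates by edit distance divided by word length (lower is closer, so candidates are sorted in ascending order). The thresholds in minGram/maxGram are assumptions chosen for illustration, not necessarily the values the spell checker uses, and the candidate list is hypothetical; in the real spell checker the candidates would presumably come from its n-gram index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SpellSketch {

    // ASSUMED thresholds: words longer than 5 chars use 3- and 4-grams,
    // 5-char words use 2- and 3-grams, shorter words use 1- and 2-grams.
    static int minGram(int length) { return length > 5 ? 3 : (length == 5 ? 2 : 1); }
    static int maxGram(int length) { return length > 5 ? 4 : (length == 5 ? 3 : 2); }

    // Every n-gram of the word for each n in [minGram, maxGram].
    static List<String> grams(String word) {
        List<String> out = new ArrayList<>();
        int len = word.length();
        for (int n = minGram(len); n <= maxGram(len); n++) {
            for (int i = 0; i + n <= len; i++) {
                out.add(word.substring(i, i + n));
            }
        }
        return out;
    }

    // Classic Levenshtein distance, keeping only two rows of the table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Edit distance divided by word length (guarded against empty words): 0 means identical.
    static float distanceRatio(String word, String candidate) {
        return (float) editDistance(word, candidate) / Math.max(1, word.length());
    }

    // Ascending sort: the closest candidate comes first.
    static List<String> suggest(String word, List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(c -> distanceRatio(word, c)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(grams("lucene"));
        System.out.println(suggest("lucene", List.of("lucid", "lucine", "lucene"))); // prints [lucene, lucine, lucid]
    }
}
```

Because a smaller ratio means a closer match, sorting in descending order would put the worst suggestion first.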
see the wiki page http://wiki.apache.org/jakarta-lucene/SpellChecker

1 - Could we put the spell checker in the sandbox? It'll be easier to maintain than
with the Bugzilla/wiki process.

2 - Jonathan Hager: could you test this version with your dictionary and tell me the
results?

3 - I'm looking for a French dictionary; does someone have a URL where I could download one?

Thanks to Jonathan Hager and Aad Nales for your suggestions/observations ;-)

Nicolas Maisonneuve

Spell checker

2004-10-11 Thread Nicolas Maisonneuve

Hi Lucene users,
I developed a spell checker for Lucene inspired by David Spencer's code.

see the wiki doc: http://wiki.apache.org/jakarta-lucene/SpellChecker

Nicolas Maisonneuve

Re: a search like Google

2004-02-15 Thread Nicolas Maisonneuve




Hi,
 
>This will give you (+title:i +title:love +title:lucene)^2 (+author:i
>+author:love +author:lucene) (+content:i +content:love
>+content:lucene)
This is not the same thing as
+(title:i^2 author:i content:i) +(title:love^2 author:love content:love)
+(title:lucene^2 author:lucene content:lucene)
because in the first query all the terms must occur in the same field, whereas
in the second each term only needs to occur in one of the fields.
 
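To make the difference concrete, here is a toy evaluation of the two query shapes on a single document, ignoring boosts and scores and assuming terms are already analyzed. The document has "love" only in the title and "i" and "lucene" only in the content: it fails the first shape (no single field holds all three terms) but satisfies the second (each term occurs in some field). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy matcher for the two query shapes, ignoring boosts and scoring.
// A document is modelled as field name -> set of (already analyzed) terms.
final class QueryShapeSketch {

    // Shape 1: (+f:t1 +f:t2 ...) for some field f, i.e. all terms in ONE field.
    static boolean allTermsInOneField(Map<String, Set<String>> doc, List<String> fields, List<String> terms) {
        for (String field : fields) {
            if (doc.getOrDefault(field, Set.of()).containsAll(terms)) {
                return true;
            }
        }
        return false;
    }

    // Shape 2: +(f1:t f2:t ...) for every term t, i.e. each term in AT LEAST ONE field.
    static boolean eachTermInSomeField(Map<String, Set<String>> doc, List<String> fields, List<String> terms) {
        for (String term : terms) {
            boolean found = false;
            for (String field : fields) {
                if (doc.getOrDefault(field, Set.of()).contains(term)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("title", "author", "content");
        List<String> terms = List.of("i", "love", "lucene");
        // "love" only in the title, "i" and "lucene" only in the content.
        Map<String, Set<String>> doc = Map.of(
            "title", Set.of("love"),
            "content", Set.of("i", "lucene"));
        System.out.println(allTermsInOneField(doc, fields, terms));  // prints false
        System.out.println(eachTermInSomeField(doc, fields, terms)); // prints true
    }
}
```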
David Spencer's solution is good, but we also want to use the Lucene
query syntax (phrase queries, prefix queries, boolean queries, etc.),
and to use the full Lucene syntax, we have to hack
the parser.

See my fulltextParser code; I made a parser:
package org.apache.lucene.queryParser;



/**
 * Title: 
 * Description: 
 * Copyright: Copyright (c) 2003
 * Company: 
 * @author Maisonneuve Nicolas
 * @version 1.0
 */
import java.io.IOException;
import java.io.StringReader;
import java.util.Vector;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.CharStream;
import org.apache.lucene.queryParser.FastCharStream;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParserConstants;
import org.apache.lucene.queryParser.QueryParserTokenManager;
import org.apache.lucene.queryParser.Token;
import org.apache.lucene.queryParser.TokenMgrError;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.WildcardQuery;


public class fulltextParser
implements QueryParserConstants {

private static final int CONJ_NONE=0;

private static final int CONJ_AND=1;

private static final int CONJ_OR=2;

private static final int MOD_NONE=0;

private static final int MOD_NOT=10;

private static final int MOD_REQ=11;

public static final int DEFAULT_OPERATOR_OR=0;

public static final int DEFAULT_OPERATOR_AND=1;



/** The actual operator that parser uses to combine query terms */
private int operator=DEFAULT_OPERATOR_AND;



/**
 * Whether terms of wildcard and prefix queries are to be automatically
 * lower-cased or not.  Default is true.
 */
boolean lowercaseWildcardTerms=true;

Analyzer analyzer;

String field;

String[] fields;

Float[] boosts;

int phraseSlop=0;



/** Parses a query string, returning a {@link org.apache.lucene.search.Query}.
 *  @param query	the query string to be parsed.
 *  @param fields	the default fields for query terms.
 *  @param analyzer   used to find terms in the query text.
 *  @throws ParseException if the parsing fails
 */
static public Query parse (String query, String fields[], Analyzer analyzer) throws ParseException {
try {
fulltextParser parser=new fulltextParser(fields, analyzer);
return parser.parse(query);
}
catch(TokenMgrError tme) {
throw new ParseException(tme.getMessage());
}
}

/** Parses a query string, returning a {@link org.apache.lucene.search.Query}.
 *  @param query	the query string to be parsed.
 *  @param fields	the default fields for query terms.
 *  @param boost	the boost of each field in the fields parameter
 *  @param analyzer   used to find terms in the query text.
 *  @throws ParseException if the parsing fails
 */
static public Query parse (String query, String fields[], Float boost[], Analyzer analyzer) throws ParseException {
try {
fulltextParser parser=new fulltextParser(fields, boost, analyzer);
return parser.parse(query);
}
catch(TokenMgrError tme) {
throw new ParseException(tme.getMessage());
}
}



/** Constructs a query parser.
 *  @param fields	the default fields for query terms; fields[0] is used as the default field.
 *  @param a   used to find terms in the query text.
 */
public fulltextParser (String[] fields, Analyzer a) {
this(fields, null, a);
}


public fulltextParser (String[] fields, Float boosts[], Analyzer a) {
this(new FastCharStream(new StringReader("")));
analyzer=a;
this.fields=fields;
this.boosts=boosts;
field=fields[0];
}



/** Parses a query string, returning a
 * Query.
 *  @param query	the query string to be parsed.
 *  @throws ParseException if the parsing fails
 *  @throws TokenMgrError if the parsing fails
 */
public Query parse (String query) throws ParseException, TokenMgrError {
ReInit(new FastCharStream(new StringReader(query)));
return Query(field);
}



/**
 * Sets the default slop for phrases.  If zero, then exact phras

a search like Google

2004-02-12 Thread Nicolas Maisonneuve

Hi,
I have an index with the fields:
title
author
content

I would like to do the same type of search as Google (a form with a text field). When the
user searches for "i love lucene" (not a phrase query, just the text in the
text field), I would like to search all the index fields, but with a specific boost
for each field. In this example: title weight=2, author=1, content=1.

The resulting query would be (I suppose the default operator is "and"): +(title:i^2 author:i
content:i) +(title:love^2 author:love content:love) +(title:lucene^2 author:lucene
content:lucene)

But must I modify the QueryParser, or is there a different way to do this?
(I modified the QueryParser and it works, but if there is a cleaner way to do
this, I'll take it!)

nicolas maisonneuve
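A minimal sketch of the expansion described in this message, done as string rewriting before the text reaches a query parser. The class and method names are hypothetical; the sketch splits only on whitespace, does not escape query-syntax characters, and leaves analysis (lower-casing, stop words) to whatever parser and analyzer eventually receive the string, so a group made only of stop words is not handled. A boost of 1 is omitted, as in the example query.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Expands each typed word into a required group of per-field clauses, e.g.
// "i love lucene" -> +(title:i^2.0 author:i content:i) +(title:love^2.0 ...) ...
final class MultiFieldExpansion {

    static String expand(String userText, Map<String, Float> fieldBoosts) {
        StringJoiner query = new StringJoiner(" ");
        for (String word : userText.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields an empty query string
            }
            StringJoiner group = new StringJoiner(" ", "+(", ")");
            for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
                String clause = e.getKey() + ":" + word;
                // Only print a boost when it differs from the default of 1.
                if (e.getValue() != 1.0f) {
                    clause += "^" + e.getValue();
                }
                group.add(clause);
            }
            query.add(group.toString());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps the field order stable
        boosts.put("title", 2.0f);
        boosts.put("author", 1.0f);
        boosts.put("content", 1.0f);
        System.out.println(expand("i love lucene", boosts));
    }
}
```

Each word becomes one required group, so a document matches as long as every word appears in at least one field.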

spans directory in the CVS version

2004-02-11 Thread Nicolas Maisonneuve

Hi,
recently, a new subdirectory "spans" appeared in the search directory. What is it, and
how do I use it?

thanks in advance
nicolas maisonneuve

featues page in the Lucene web site

2004-02-09 Thread Nicolas Maisonneuve

Hi,
it would be great if a page listing all the features of Lucene were created on the Apache
Lucene site!

The SourceForge website (http://lucene.sourceforge.net/features.html) has
such a page... but is it up to date?

thanks in advance
nicolas maisonneuve

Re: difference in javadoc and faq similarity expression

2004-01-19 Thread Nicolas Maisonneuve

But in the Javadoc expression there is no TF-IDF weight for the query, just
for the document, while the cosine uses both... hmm, strange.

I have a report to write about Lucene and I don't know
which formula to write in the paper or how to explain it.



- Original Message - 
From: "Karl Koch" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Sunday, January 18, 2004 11:54 PM
Subject: Re: difference in javadoc and faq similarity expression


> I would rely on the JavaDoc since this one is up to date. The latest version
> 1.3 final is just a few weeks old. Some entries in the FAQ, however, are still
> from 2001...
>
> Cheers,
> Karl
>
> > Hi,
> > I'm having trouble finding the correspondence between the Javadoc and FAQ
> > similarity expressions.
> >
> > In the Similarity Javadoc:
> >
> > score(q,d) = Sum [tf(t in d) * idf(t) * getBoost(t.field in d) *
> > lengthNorm(t.field in d) * coord(q,d) * queryNorm(q)]
> >
> > In the FAQ:
> >
> > score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) *
> > coord_q_d
> >
> > In FAQ | In Javadoc
> > 1 / norm_q = queryNorm(q)
> > 1 / norm_d_t = lengthNorm(t.field in d)
> > coord_q_d = coord(q,d)
> > boost_t = getBoost(t.field in d)
> > idf_t = idf(t)
> > tf_d = tf(t in d)
> >
> > But where is the Javadoc counterpart of the FAQ expression "tf_q"?
> >
> > nicolas
> >
> > - Original Message - 
> > From: "Nicolas Maisonneuve" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > Sent: Sunday, January 18, 2004 9:33 PM
> > Subject: Re: theorical informations
> >
> >
> > > Thanks Karl!
> > >
> > > - Original Message - 
> > > From: "Karl Koch" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > > Sent: Sunday, January 18, 2004 9:22 PM
> > > Subject: Re: theorical informations
> > >
> > >
> > > > Actually, finding an answer to this question is not really important. More
> > > > important is whether you can do what you want with it. If your result comes from
> > > > a prob. model or a vector space model, who cares if you just want to give a
> > > > query and get back a hit list of results?
> > > >
> > > > Possibly some people here will strongly disagree... ;-) (?)
> > > >
> > > > Karl
> > > >
> > > > > Hello Nicolas,
> > > > >
> > > > > I am sure you mean IR (Information Retrieval) Model. Lucene implements a
> > > > > Vector Space Model with integrated Boolean Model. This means the Boolean
> > > > > model is integrated with a Boolean query language but mapped into the
> > > > > Vector Space. Therefore you have ranking even though the traditional
> > > > > Boolean model does not support this. Cosine similarity is used to measure
> > > > > similarity between documents and the query. You can find this in a very
> > > > > long discussion here when you search the archive...
> > > > >
> > > > > Karl
> > > > >
> > > > > > Hi,
> > > > > > I have two theoretical questions:
> > > > > >
> > > > > > I searched the mailing list for the R.I. model implemented in Lucene,
> > > > > > but found no precise answer.
> > > > > >
> > > > > > 1) What is the R.I. model implemented in Lucene? (e.g. Boolean Model,
> > > > > > Vector Model, Probabilistic Model, etc.)
> > > > > >
> > > > > > 2) What is the theoretical similarity function implemented in Lucene?
> > > > > > (Euclidean, Cosine, Jaccard, Dice)
> > > > > >
> > > > > > (Why isn't this important information on the Lucene web site or in the
> > > > > > FAQ?)
> > > > >
> > > > > -- 
> > > > > +++ GMX - the first address for Mail, Message, More +++
> > > > > Until 31.1.: TopMail + digital camera for only 29 EUR http://www.gmx.net/topmail
> > >

difference in javadoc and faq similarity expression

2004-01-18 Thread Nicolas Maisonneuve

Hi,
I'm having trouble finding the correspondence between the Javadoc and FAQ
similarity expressions.

In the Similarity Javadoc:

score(q,d) = Sum [tf(t in d) * idf(t) * getBoost(t.field in d) *
lengthNorm(t.field in d) * coord(q,d) * queryNorm(q)]

In the FAQ:

score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) *
coord_q_d

In FAQ | In Javadoc
1 / norm_q = queryNorm(q)
1 / norm_d_t=lengthNorm(t.field in d)
coord_q_d=coord(q,d)
boost_t=getBoost(t.field in d)
idf_t=idf(t)
tf_d=tf(t in d)

But where is the Javadoc counterpart of the FAQ expression "tf_q"?

nicolas
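One possible reading of where tf_q goes, offered as an assumption rather than as a description of Lucene's internals: if a term repeated in the query simply yields several clauses, each contributing once, then summing over clauses equals the FAQ's sum over distinct terms weighted by tf_q. The sketch checks that identity numerically. termWeight is a hypothetical map holding, per term, the product of every other factor for one fixed document (idf, tf_d, norms, boost); coord and the query normalization are left aside, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// termWeight is a HYPOTHETICAL per-term product of every factor except tf_q
// (idf, tf_d, norms, boost) for one fixed document; coord and query
// normalization are left out.
final class QueryTfSketch {

    // One contribution per query clause; a repeated term appears as several clauses.
    static double sumOverClauses(List<String> clauses, Map<String, Double> termWeight) {
        double score = 0.0;
        for (String term : clauses) {
            score += termWeight.getOrDefault(term, 0.0);
        }
        return score;
    }

    // FAQ form: one contribution per distinct term, multiplied by its query frequency tf_q.
    static double sumWithQueryTf(List<String> clauses, Map<String, Double> termWeight) {
        Map<String, Integer> tfq = new HashMap<>();
        for (String term : clauses) {
            tfq.merge(term, 1, Integer::sum);
        }
        double score = 0.0;
        for (Map.Entry<String, Integer> e : tfq.entrySet()) {
            score += e.getValue() * termWeight.getOrDefault(e.getKey(), 0.0);
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("lucene", 0.5, "search", 0.25);
        List<String> query = List.of("lucene", "search", "lucene"); // tf_q(lucene) = 2
        System.out.println(sumOverClauses(query, weights)); // prints 1.25
        System.out.println(sumWithQueryTf(query, weights)); // prints 1.25
    }
}
```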

- Original Message - 
From: "Nicolas Maisonneuve" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Sunday, January 18, 2004 9:33 PM
Subject: Re: theorical informations


> Thanks Karl!
>
> - Original Message - 
> From: "Karl Koch" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Sunday, January 18, 2004 9:22 PM
> Subject: Re: theorical informations
>
>
> > Actually, finding an answer to this question is not really important. More
> > important is whether you can do what you want with it. If your result comes from
> > a prob. model or a vector space model, who cares if you just want to give a
> > query and get back a hit list of results?
> >
> > Possibly some people here will strongly disagree... ;-) (?)
> >
> > Karl
> >
> > > Hello Nicolas,
> > >
> > > I am sure you mean IR (Information Retrieval) Model. Lucene implements a
> > > Vector Space Model with integrated Boolean Model. This means the Boolean
> > > model is integrated with a Boolean query language but mapped into the
> > > Vector Space. Therefore you have ranking even though the traditional
> > > Boolean model does not support this. Cosine similarity is used to measure
> > > similarity between documents and the query. You can find this in a very
> > > long discussion here when you search the archive...
> > >
> > > Karl
> > >
> > > > Hi,
> > > > I have two theoretical questions:
> > > >
> > > > I searched the mailing list for the R.I. model implemented in Lucene,
> > > > but found no precise answer.
> > > >
> > > > 1) What is the R.I. model implemented in Lucene? (e.g. Boolean Model,
> > > > Vector Model, Probabilistic Model, etc.)
> > > >
> > > > 2) What is the theoretical similarity function implemented in Lucene?
> > > > (Euclidean, Cosine, Jaccard, Dice)
> > > >
> > > > (Why isn't this important information on the Lucene web site or in the
> > > > FAQ?)
> > > >

Re: theorical informations

2004-01-18 Thread Nicolas Maisonneuve

Thanks Karl!

- Original Message - 
From: "Karl Koch" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Sunday, January 18, 2004 9:22 PM
Subject: Re: theorical informations


> Actually, finding an answer to this question is not really important. More
> important is whether you can do what you want with it. If your result comes from
> a prob. model or a vector space model, who cares if you just want to give a
> query and get back a hit list of results?
>
> Possibly some people here will strongly disagree... ;-) (?)
>
> Karl
>
> > Hello Nicolas,
> >
> > I am sure you mean IR (Information Retrieval) Model. Lucene implements a
> > Vector Space Model with integrated Boolean Model. This means the Boolean
> > model is integrated with a Boolean query language but mapped into the
> > Vector Space. Therefore you have ranking even though the traditional
> > Boolean model does not support this. Cosine similarity is used to measure
> > similarity between documents and the query. You can find this in a very
> > long discussion here when you search the archive...
> >
> > Karl
> >
> > > Hi,
> > > I have two theoretical questions:
> > >
> > > I searched the mailing list for the R.I. model implemented in Lucene,
> > > but found no precise answer.
> > >
> > > 1) What is the R.I. model implemented in Lucene? (e.g. Boolean Model,
> > > Vector Model, Probabilistic Model, etc.)
> > >
> > > 2) What is the theoretical similarity function implemented in Lucene?
> > > (Euclidean, Cosine, Jaccard, Dice)
> > >
> > > (Why isn't this important information on the Lucene web site or in the
> > > FAQ?)
> > >

theorical informations

2004-01-18 Thread Nicolas Maisonneuve

Hi,
I have two theoretical questions:

I searched the mailing list for the R.I. model implemented in Lucene, but found no precise
answer.

1) What is the R.I. model implemented in Lucene? (e.g. Boolean Model, Vector
Model, Probabilistic Model, etc.)

2) What is the theoretical similarity function implemented in Lucene? (Euclidean, Cosine,
Jaccard, Dice)

(Why isn't this important information on the Lucene web site or in the FAQ?)

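For comparison with the candidates listed in question 2, a minimal sketch of three of them: Jaccard and Dice treat query and document as sets of terms, while Euclidean distance compares weighted term vectors (it is a distance, so smaller means more similar). Karl Koch's reply quoted in this thread states that Lucene uses cosine similarity; these measures are shown only to make the alternatives concrete. The class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SimilarityMeasures {

    // Jaccard: |A intersect B| / |A union B| over sets of terms (0 for two empty sets).
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        return (double) intersectionSize(a, b) / union.size();
    }

    // Dice: 2 |A intersect B| / (|A| + |B|) (0 for two empty sets).
    static double dice(Set<String> a, Set<String> b) {
        int total = a.size() + b.size();
        return total == 0 ? 0.0 : 2.0 * intersectionSize(a, b) / total;
    }

    // Euclidean distance between weighted term vectors; a missing term counts as weight 0.
    static double euclidean(Map<String, Double> a, Map<String, Double> b) {
        Set<String> terms = new HashSet<>(a.keySet());
        terms.addAll(b.keySet());
        double sum = 0.0;
        for (String t : terms) {
            double d = a.getOrDefault(t, 0.0) - b.getOrDefault(t, 0.0);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("xml", "lucene", "search");
        Set<String> doc = Set.of("lucene", "search", "index", "query");
        System.out.println(jaccard(query, doc)); // prints 0.4
        System.out.println(dice(query, doc));
        System.out.println(euclidean(Map.of("x", 3.0), Map.of("y", 4.0))); // prints 5.0
    }
}
```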
IndexReader.document(int i)

2004-01-17 Thread Nicolas Maisonneuve

Hi,
I would like to know,
in IndexReader.document(int i),
what this number i is.
Is the first document the oldest document indexed
and the last one the most recent? (Then we could sort by date easily.)

Thanks in advance

nico

RE: Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve


- Original Message - 
From: "Nicolas Maisonneuve" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, January 15, 2004 3:58 PM
Subject: Re: Betreff: Copy Directory to Directory function ( backup)


> Thanks! The copy function works,
> but I have a problem.
> I use a scheduled task to back up the index;
> for the test, a backup is made every 15 seconds.
> Sometimes, during the backup process,
> when I clean a directory with:
> Directory target=FSDirectory.getDirectory(target_backup_dir, true);
> I get an exception:
> java.io.IOException: couldn't delete segments
>  at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
>  at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:151)
>  at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:132)
>  at lab.crip5.ECR.cocoon.components.IndexBackupJob.backup(IndexBackupJob.java:135)
>
> The exception only happens sometimes.
>
> My backup function is simple:
>
>   private void backup (String index_to_backup) throws Exception {
>     getLogger().info("begin backup index "+index_to_backup+" at "+new Date()+"...");
>
>     // get the directory of the index
>     Directory source=index_manager.getIndex(index_to_backup).getDirectory();
>
>     // select target backup directory
>     File target_backup_dir=select_backup(index_to_backup);
>
>     // clean the old index (create=true erases the current contents)
>     Directory target=FSDirectory.getDirectory(target_backup_dir, true);
>
>     // backup
>     copy(source, target);
>
>     target.close();
>
>     getLogger().info("end backup index "+index_to_backup+" at "+new Date()+"...ok");
>   }
>
> - Original Message - 
> From: "Nicolas Maisonneuve" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, January 15, 2004 3:21 PM
> Subject: Fw: Betreff: Copy Directory to Directory function ( backup)
>
>
> >
> > - Original Message - 
> > From: "Nick Smith" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Thursday, January 15, 2004 2:58 PM
> > Subject: Betreff: Copy Directory to Directory function ( backup)
> >
> >
> > > Hi Nico,
> > >This is the method that I use for backing up my indices...
> > >
> > > Good Luck!
> > >
> > > Nick
> > >
> > >   /**
> > >* Copy contents of dir, erasing current contents.
> > >*
> > >* This can be used to write a memory-based index to disk.
> > >*
> > >* @param dir a Directory value
> > >* @exception IOException if an error occurs
> > >*/
> > >   public void copyDir(Directory dir) throws IOException {
> > > // remove current contents of directory
> > > create();
> > >
> > > final String[] ar = dir.list();
> > > for (int i = 0; i < ar.length; i++)
> > > {
> > >   // make place on disk
> > >   OutputStream os = createFile(ar[i]);
> > >   // read current file
> > >   InputStream is = dir.openFile(ar[i]);
> > >
> > >   final int MAX_CHUNK_SIZE = 131072;
> > >   byte[] buf = new byte[MAX_CHUNK_SIZE];
> > >   int remainder = (int)is.length();
> > >   while (remainder > 0) {
> > > int chunklen = (remainder > MAX_CHUNK_SIZE ? MAX_CHUNK_SIZE : remainder);
> > > is.readBytes(buf, 0, chunklen);
> > > os.writeBytes(buf, chunklen);
> > > remainder -= chunklen;
> > >   }
> > >
> > >   // graceful cleanup
> > >   is.close();
> > >   os.close();
> > > }
> > >   }
> > >
> > >
> > >
> >
> >
> >
> >
>





Re: Betreff: Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve

Thanks! The copy function works,
but I have a problem.
I use a scheduled task to back up the index;
for the test, a backup is made every 15 seconds.
Sometimes, during the backup process,
when I clean a directory with:
Directory target=FSDirectory.getDirectory(target_backup_dir, true);
I get an exception:
java.io.IOException: couldn't delete segments
 at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
 at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:151)
 at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:132)
 at lab.crip5.ECR.cocoon.components.IndexBackupJob.backup(IndexBackupJob.java:135)

The exception only happens sometimes.

My backup function is simple:

  private void backup (String index_to_backup) throws Exception {
    getLogger().info("begin backup index "+index_to_backup+" at "+new Date()+"...");

    // get the directory of the index
    Directory source=index_manager.getIndex(index_to_backup).getDirectory();

    // select target backup directory
    File target_backup_dir=select_backup(index_to_backup);

    // clean the old index (create=true erases the current contents)
    Directory target=FSDirectory.getDirectory(target_backup_dir, true);

    // backup
    copy(source, target);

    target.close();

    getLogger().info("end backup index "+index_to_backup+" at "+new Date()+"...ok");
  }

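Nicolas's first idea in this thread was a plain file-system copy of the index files. A minimal sketch of that approach with java.nio.file, assuming the index directory is flat (no subdirectories), that no IndexWriter modifies the index while the copy runs, and that the backup directory holds nothing but a previous backup, since its files are deleted first (much as getDirectory(..., true) cleans a directory). The class name is hypothetical. On some platforms, notably Windows, deleting a file that is still open elsewhere fails, so anything reading the backup directory should be closed before the next run.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class FileBackupSketch {

    // List the regular files directly inside dir (subdirectories are ignored).
    private static List<Path> regularFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        return files;
    }

    static void backup(Path indexDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        // Clean the old backup first, so files that no longer exist in the index do not linger.
        for (Path old : regularFiles(backupDir)) {
            Files.delete(old);
        }
        // Copy every index file, keeping its name.
        for (Path file : regularFiles(indexDir)) {
            Files.copy(file, backupDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java FileBackupSketch <indexDir> <backupDir>");
            return;
        }
        backup(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```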
- Original Message - 
From: "Nicolas Maisonneuve" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 15, 2004 3:21 PM
Subject: Fw: Betreff: Copy Directory to Directory function ( backup)


>
> - Original Message - 
> From: "Nick Smith" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, January 15, 2004 2:58 PM
> Subject: Betreff: Copy Directory to Directory function ( backup)
>
>
> > Hi Nico,
> >This is the method that I use for backing up my indices...
> >
> > Good Luck!
> >
> > Nick
> >
> >   /**
> >* Copy contents of dir, erasing current contents.
> >*
> >* This can be used to write a memory-based index to disk.
> >*
> >* @param dir a Directory value
> >* @exception IOException if an error occurs
> >*/
> >   public void copyDir(Directory dir) throws IOException {
> > // remove current contents of directory
> > create();
> >
> > final String[] ar = dir.list();
> > for (int i = 0; i < ar.length; i++)
> > {
> >   // make place on disk
> >   OutputStream os = createFile(ar[i]);
> >   // read current file
> >   InputStream is = dir.openFile(ar[i]);
> >
> >   final int MAX_CHUNK_SIZE = 131072;
> >   byte[] buf = new byte[MAX_CHUNK_SIZE];
> >   int remainder = (int)is.length();
> >   while (remainder > 0) {
> > int chunklen = (remainder > MAX_CHUNK_SIZE ? MAX_CHUNK_SIZE : remainder);
> > is.readBytes(buf, 0, chunklen);
> > os.writeBytes(buf, chunklen);
> > remainder -= chunklen;
> >   }
> >
> >   // graceful cleanup
> >   is.close();
> >   os.close();
> > }
> >   }
> >
> >
> >
>
>
>
>





Fw: Betreff: Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve


- Original Message - 
From: "Nick Smith" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 15, 2004 2:58 PM
Subject: Betreff: Copy Directory to Directory function ( backup)


> Hi Nico,
>This is the method that I use for backing up my indices...
>
> Good Luck!
>
> Nick
>
>   /**
>* Copy contents of dir, erasing current contents.
>*
>* This can be used to write a memory-based index to disk.
>*
>* @param dir a Directory value
>* @exception IOException if an error occurs
>*/
>   public void copyDir(Directory dir) throws IOException {
> // remove current contents of directory
> create();
>
> final String[] ar = dir.list();
> for (int i = 0; i < ar.length; i++)
> {
>   // make place on disk
>   OutputStream os = createFile(ar[i]);
>   // read current file
>   InputStream is = dir.openFile(ar[i]);
>
>   final int MAX_CHUNK_SIZE = 131072;
>   byte[] buf = new byte[MAX_CHUNK_SIZE];
>   int remainder = (int)is.length();
>   while (remainder > 0) {
> int chunklen = (remainder > MAX_CHUNK_SIZE ? MAX_CHUNK_SIZE : remainder);
> is.readBytes(buf, 0, chunklen);
> os.writeBytes(buf, chunklen);
> remainder -= chunklen;
>   }
>
>   // graceful cleanup
>   is.close();
>   os.close();
> }
>   }
>
>
>





Re: Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve

Hmm, yes,
but I don't want to open an IndexWriter for this,
and there is the performance question when the index is big.

- Original Message - 
From: "Karsten Konrad" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, January 15, 2004 2:20 PM
Subject: Re: Copy Directory to Directory function ( backup)

Hi,

An elegant method is to create an empty directory and merge
the index to be copied into it using IndexWriter.addIndexes(),
which takes an array of Directory objects. This way, you do not
have to deal with files at all.

Regards,

Karsten

-----Original Message-----
From: Nicolas Maisonneuve [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 15, 2004 13:28
To: [EMAIL PROTECTED]
Subject: Copy Directory to Directory function ( backup)

Hi,
I would like to back up an index.

1) My first idea is to make a file-system copy of all the files,
but the FSDirectory class has no public method that tells where the
directory is located. A simple method like
public File getDirectoryFile() {
  return directory;
}
would be great.
2) So I decided to write a copy(Directory source, Directory target) method.
I have seen the openFile() and createFile() methods, but I don't know how
to use them (see my function; it throws an exception):

private void copy(Directory source, Directory target) throws IOException {
String[] files = source.list();
for(int i=0; i

Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve

Hi,
I would like to back up an index.

1) My first idea is to make a file-system copy of all the files,
but the FSDirectory class has no public method that tells where the
directory is located. A simple method like
public File getDirectoryFile() {
  return directory;
}
would be great.
2) So I decided to write a copy(Directory source, Directory target) method.
I have seen the openFile() and createFile() methods, but I don't know how
to use them (see my function; it throws an exception):

private void copy(Directory source, Directory target) throws IOException {
String[] files = source.list();
for(int i=0; i
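Idea 1 above, a plain file-system copy, needs only the index directory's path. Assuming that path is already known (for example, because the application opened the index from it) and that no IndexWriter changes the index while the copy runs, a minimal sketch with java.nio.file could look like this. IndexFileBackup and copyIndexFiles are hypothetical names, and the file names in main() are dummy placeholders, not real index files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class IndexFileBackup {
    /**
     * Copies every regular file in sourceDir into targetDir (created if missing)
     * and returns how many files were copied. Assumes no writer modifies
     * sourceDir during the copy; otherwise the backup may be inconsistent.
     */
    static int copyIndexFiles(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // Overwrite older backup files of the same name.
                    Files.copy(file, targetDir.resolve(file.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("index");
        Files.write(src.resolve("file1"), new byte[] {1, 2, 3});
        Files.write(src.resolve("file2"), new byte[] {4, 5});
        Path dst = Files.createTempDirectory("backup").resolve("copy");
        System.out.println(copyIndexFiles(src, dst) + " files copied");
    }
}
```

Because the helper takes the path as a parameter, it sidesteps the missing accessor on FSDirectory; the caller supplies the same path it used to open the index.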

create a getQuery in the Hits Class

2003-09-19 Thread Nicolas Maisonneuve

Hi,
The Hits class has a query property but no public method to get it.

It would be great if you added this:
public final Query getQuery() {
  return this.query;
}

StandardTokenizer problem

2003-09-04 Thread Nicolas Maisonneuve

Hi,
When I use StandardTokenizer to parse, for example, "I.B.M",
the type of the Token is HOST and not ACRONYM.

Why?

In StandardTokenizer.jj:

  // acronyms: U.S.A., I.B.M., etc.
  // use a post-filter to remove dots
| <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >

  // hostname
| <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >

"I.B.M" can be either a host or an acronym, so there is a problem, isn't there?

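The two grammar rules, ACRONYM as `<ALPHA> "." (<ALPHA> ".")+` and HOST as `<ALPHANUM> ("." <ALPHANUM>)+`, can be approximated with java.util.regex to see why "I.B.M" comes out as HOST. This is a sketch under stated assumptions: ALPHA is treated as a run of Unicode letters and ALPHANUM as a run of letters or digits (the real grammar spells out its LETTER and DIGIT ranges differently), and the rule choice mimics JavaCC, where the longest match wins and a tie goes to the rule listed first. AcronymVsHost and tokenType are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcronymVsHost {
    // Approximation of <ALPHA> "." (<ALPHA> ".")+ : every letter group needs a trailing dot.
    static final Pattern ACRONYM = Pattern.compile("\\p{L}+\\.(\\p{L}+\\.)+");
    // Approximation of <ALPHANUM> ("." <ALPHANUM>)+ : dots only between groups.
    static final Pattern HOST = Pattern.compile("[\\p{L}\\p{Nd}]+(\\.[\\p{L}\\p{Nd}]+)+");

    // Length of the prefix of text matched by p, or 0 if p does not match at the start.
    static int prefixLength(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.lookingAt() ? m.end() : 0;
    }

    // JavaCC picks the longest match; on a tie, the rule listed first (ACRONYM) wins.
    static String tokenType(String text) {
        int a = prefixLength(ACRONYM, text);
        int h = prefixLength(HOST, text);
        if (a == 0 && h == 0) {
            return "NONE";
        }
        return a >= h ? "ACRONYM" : "HOST";
    }

    public static void main(String[] args) {
        System.out.println("I.B.M  -> " + tokenType("I.B.M"));
        System.out.println("I.B.M. -> " + tokenType("I.B.M."));
    }
}
```

On "I.B.M" the ACRONYM approximation can only consume "I.B." (four characters) because the final "M" has no dot after it, while HOST consumes all five characters, so HOST wins. With the trailing dot, "I.B.M." lets ACRONYM match all six characters and it is chosen instead.
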
----- Original Message -----
From: "petite_abeille" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, September 04, 2003 3:19 PM
Subject: Re: Lucene app to index Java code


> Hi Erik,
> 
> On Thursday, Sep 4, 2003, at 15:03 Europe/Zurich, Erik Hatcher wrote:
> 
> > - XDoclet could be used to sweep through Java code and build a 
> > text/XML file as richly as you'd like from the information there 
> > (complete with JavaDoc tags, which Zapata will miss :)),
> 
> Correct. This happens to be on purpose :) Does XDoclet build an 
> "intertwingled" object graph of your code along the way? Performing a 
> plain search on a code base is pretty trivial... what seems to be more 
> interesting would be to put that in context.
> 
> Zapata does something along the lines of what MagicHat does for 
> Objective-C:
> 
> http://homepage.mac.com/petite_abeille/MagicHat/
> 
> But from the sound of what Otis is saying this is not what you guys are 
> looking for... back to the pampa then...
> 
> Cheers,
> 
> PA.

Avalon IndexWriter

2003-08-29 Thread Nicolas Maisonneuve

Hi,
I would like to know whether anyone has written an Avalon IndexWriter.

Thanks in advance.

Re: Lucene Vs Ixiasoft

Re: dotLucene (port of Jakarta Lucene to C#)

Re: Filter for a search refinement

Re: Filter for a search refinement

hasFieldFilter contribution

new version of spell checker

Spell checker

Re: a search like Google

a search like Google

spans directory in the CVS version

featues page in the Lucene web site

Re: difference in javadoc and faq similarity expression

difference in javadoc and faq similarity expression

Re: theorical informations

theorical informations

IndexReader.document(int i)

RE: Copy Directory to Directory function ( backup)

Re: Betreff: Copy Directory to Directory function ( backup)

Fw: Betreff: Copy Directory to Directory function ( backup)

Re: Copy Directory to Directory function ( backup)

Copy Directory to Directory function ( backup)

create a getQuery in the Hits Class

StandardTokenizer problem

Avalon IndexWriter
