Re: Lucene Vs Ixiasoft

2004-12-08 Thread Nicolas Maisonneuve
hi,
first think about how well the retrieval model of each of these two search
engines suits XML document retrieval.

Lucene is a classic full-text search engine using the vector space
model. This model is efficient for indexing unstructured documents
(like plain text files) but was not made for structured documents like XML.
There is an XML demo in the Lucene sandbox, but it is not very
effective because it doesn't take advantage of the document structure
in the indexing and ranking model, so it loses semantic information
and relevance.

I don't know Ixiasoft; check its documentation to see how it indexes and
ranks XML documents.

nicolas 
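
To make the point about document structure concrete, here is a minimal sketch of one possible way to keep some of it: every element path becomes its own field name, so text under book/title is kept apart from text under book/author. This mapping is only an assumption for illustration (it is not what the sandbox demo is known to do), and the names XmlPathFields and flatten are hypothetical. It uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: flattens an XML document into (element path -> text) pairs.
final class XmlPathFields {

    /** Maps each element path (e.g. "book/title") to the text found directly under it. */
    static Map<String, String> flatten(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> fields = new LinkedHashMap<>();
            Node root = doc.getDocumentElement();
            walk(root, root.getNodeName(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    private static void walk(Node element, String path, Map<String, String> fields) {
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                text.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                // descend, extending the path with the child element name
                walk(child, path + "/" + child.getNodeName(), fields);
            }
        }
        String trimmed = text.toString().trim();
        if (!trimmed.isEmpty()) {
            // repeated elements with the same path are concatenated
            fields.merge(path, trimmed, (a, b) -> a + " " + b);
        }
    }
}
```

Each resulting pair could then be indexed as a separate field, so a query can target one part of the structure instead of the whole text.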

On Wed, 8 Dec 2004 14:20:45 -0500, Praveen Peddi
[EMAIL PROTECTED] wrote:
 Does anyone know about the Ixiasoft server? It's an XML repository/search engine.
 If anyone knows about it, do they also know how it compares to Lucene?
 Which is faster?
 
 Praveen
 **
 Praveen Peddi
 Sr Software Engg, Context Media, Inc.
 email:[EMAIL PROTECTED]
 Tel:  401.854.3475
 Fax:  401.861.3596
 web: http://www.contextmedia.com
 **
 Context Media- The Leader in Enterprise Content Integration
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: dotLucene (port of Jakarta Lucene to C#)

2004-12-01 Thread Nicolas Maisonneuve
hi George,
is the C# Lucene faster than the Java Lucene? (because it seems to me
that C# is faster than Java, isn't it?)

nicolas maisonneuve



On Sun, 28 Nov 2004 21:08:30 -0500, George Aroush [EMAIL PROTECTED] wrote:
 Hi folks,
 
 I am pleased to announce the availability of dotLucene 1.4.0 RC1.  dotLucene
 is a complete port of Jakarta Lucene to C#.  The port is almost a
 line-by-line port and it includes the demos as well as all the JUnit tests.
 An index created by dotLucene is cross compatible with Jakarta Lucene and
 vice versa.
 
 Please visit http://sourceforge.net/projects/dotlucene/ to learn more about
 dotLucene and to download the source code.
 
 Best regards,
 
 -- George Aroush
 



Re: Filter for a search refinement

2004-11-21 Thread Nicolas Maisonneuve
yes... it's the same kind of feature (I hadn't seen this Filter,
shame on me!)
but my method may be faster, because QueryFilter launches an internal
search and my method does not

nicolas



On Sun, 21 Nov 2004 05:06:12 -0500, Erik Hatcher
[EMAIL PROTECTED] wrote:
 Nicolas - how does your filter differ from the capabilities available
 from the built-in QueryFilter?  It seems at first glance to be nearly
 the same thing.
 
 Erik
 
 
 
 
 On Nov 21, 2004, at 4:52 AM, Nicolas Maisonneuve wrote:
 
  I developed a filter that restricts a search to the hits of a previous
  search (search refinement)
 
  see the patch http://issues.apache.org/bugzilla/show_bug.cgi?id=32334
 
  Nicolas Maisonneuve
 



Re: Filter for a search refinement

2004-11-21 Thread Nicolas Maisonneuve
hmm, just a question...

- in the normal IndexSearcher method
there is an if (score > 0.0F && filter.get(doc)) { doc in the hits }

- but in the QueryFilter, there isn't a minimum score condition

is this normal or not?

nicolas
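
The idea discussed in this thread can be sketched without Lucene, using only java.util.BitSet: the hits of a previous search are kept as a bit set, and a refined search keeps only the new hits whose bit is set. The class and method names below are hypothetical, and applying the score > 0 condition while building the bit set is an assumption that mirrors the condition quoted above, not a statement about what QueryFilter does.

```java
import java.util.BitSet;

// Hypothetical sketch of search refinement with bit sets (no Lucene classes involved).
final class RefinementSketch {

    /** Marks every document that matched with a score above zero. */
    static BitSet hitsToBits(int[] docIds, float[] scores, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int i = 0; i < docIds.length; i++) {
            if (scores[i] > 0.0f) {   // minimum score condition
                bits.set(docIds[i]);
            }
        }
        return bits;
    }

    /** Keeps only the new hits that were already part of the previous hits. */
    static BitSet refine(BitSet previousHits, BitSet newHits) {
        BitSet refined = (BitSet) newHits.clone();
        refined.and(previousHits);
        return refined;
    }
}
```

Reusing the stored bit set avoids running the first query again, which is the speed argument made earlier in the thread.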






hasFieldFilter contribution

2004-11-03 Thread Nicolas Maisonneuve



I developed a
Filter that restricts search results to documents that have terms in specific
fields
(because currently Lucene offers no way to search for documents by the
presence or absence of values in specific fields)

nicolas 

package org.apache.lucene.search;

import java.io.IOException;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

/**
 * A Filter that restricts search results to documents that have terms in specific fields
 * (OR operator: the documents that have terms in field1 or in field2)
 * @author Nicolas Maisonneuve
 */
public class HasFieldFilter extends Filter {

    private Set fieldnames;

    /**
     * @param fieldnames String[] an array of field names
     */
    public HasFieldFilter(String[] fieldnames) {
        this.fieldnames = new HashSet();
        for (int i = 0; i < fieldnames.length; i++) {
            this.fieldnames.add(fieldnames[i]);
        }
    }

    public BitSet bits(IndexReader reader) throws IOException {
        final BitSet bits = new BitSet(reader.maxDoc());
        TermDocs termDocs = reader.termDocs();
        try {
            // for each requested field
            Iterator iter = fieldnames.iterator();
            while (iter.hasNext()) {
                String field = (String) iter.next();
                // open an enumerator positioned at the first term of this field
                // (skipTo() on one shared enumerator can only move forward)
                TermEnum enumerator = reader.terms(new Term(field, ""));
                try {
                    // restrict docs to the terms of this field
                    while (enumerator.term() != null
                            && enumerator.term().field().equals(field)) {
                        termDocs.seek(enumerator.term());
                        while (termDocs.next()) {
                            bits.set(termDocs.doc());
                        }
                        if (!enumerator.next()) {
                            break;
                        }
                    }
                } finally {
                    enumerator.close();
                }
            }
        } finally {
            termDocs.close();
        }
        return bits;
    }
}


new version of spell checker

2004-10-21 Thread Nicolas Maisonneuve
UPDATE
- sort fixed (the sort order was inverted!)
- set the gram size dynamically (depending on the length of the word)
- use the FuzzyQuery score: ((edit distance)/(length of word))
- new Dictionary interface + LuceneDictionary and PlaintextDictionary implementations
- replace the addWords method with indexDictionary(Dictionary dic)
- add a new public method: boolean exist(word)
- add a build.xml

see the wiki page http://wiki.apache.org/jakarta-lucene/SpellChecker
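
A minimal sketch of two items from the update list, in plain Java. The gram-size thresholds are hypothetical (the post only says the size depends on the word length), and the score is written as 1 - (edit distance)/(length of word) so that a higher value means a closer word; that inversion is an assumption, not taken from the spell checker code. SpellSketch and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: dynamic n-gram size and a normalized edit-distance score.
final class SpellSketch {

    /** Assumed rule: longer words are cut into longer grams. */
    static int gramSize(int wordLength) {
        if (wordLength > 5) return 3;
        if (wordLength == 5) return 2;
        return 1;
    }

    /** All contiguous grams of the word, using the size chosen for its length. */
    static List<String> grams(String word) {
        int n = gramSize(word.length());
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    /** Classic Levenshtein distance, computed with two rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1 - (edit distance / length of word); higher means closer. */
    static double score(String word, String candidate) {
        if (word.isEmpty()) return candidate.isEmpty() ? 1.0 : 0.0;
        return 1.0 - (double) editDistance(word, candidate) / word.length();
    }
}
```

With these assumed thresholds, a six-letter word such as lucene is cut into 3-grams, while a four-letter word is cut into single letters.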

1 - Could we put the spell checker in the sandbox? It would be easier to maintain than
going through the Bugzilla/wiki process.

2 - Jonathan Hager: could you test this version with our dictionary and tell me the
results?

3 - I am looking for a French dictionary; does someone have a URL where I could download one?

thanks to Jonathan Hager and Aad Nales for your suggestions / observations ;-)

Nicolas Maisonneuve


Spell checker

2004-10-11 Thread Nicolas Maisonneuve
hi Lucene users,
I developed a spell checker for Lucene, inspired by David Spencer's code

see the wiki doc: http://wiki.apache.org/jakarta-lucene/SpellChecker

Nicolas Maisonneuve

a search like Google

2004-02-12 Thread Nicolas Maisonneuve
hi,
I have an index with the fields:
title
author
content

I would like to offer the same type of search as Google (a form with a single text field). When the
user searches for i love lucene (not a phrase query, just the text typed in the
text field), I would like to search all the index fields, but with a specific boost
for each field. In this example: title boost=2, author=1, content=1.

The resulting query would be (I suppose the default operator is AND): +(title:i^2 author:i
content:i) +(title:love^2 author:love content:love) +(title:lucene^2 author:lucene
content:lucene)

But must I modify the QueryParser, or is there a different way to do this?
(I modified the QueryParser and it works, but if there is a cleaner way to do
this, I'll take it!)

nicolas maisonneuve
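
One possible alternative to modifying the QueryParser, sketched here under assumptions: build the expanded query string yourself and hand it to an unmodified parser afterwards. The class MultiFieldQuerySketch and its methods are hypothetical; words are split on whitespace only, query-syntax characters are not escaped, and a boost of 1 is left implicit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: expands free text into one required clause per word,
// each clause searching every field with its own boost.
final class MultiFieldQuerySketch {

    /** The field boosts used in the example of this post. */
    static Map<String, Integer> exampleBoosts() {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("title", 2);
        boosts.put("author", 1);
        boosts.put("content", 1);
        return boosts;
    }

    static String build(String userInput, Map<String, Integer> fieldBoosts) {
        StringBuilder query = new StringBuilder();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (query.length() > 0) query.append(' ');
            query.append("+(");   // every word is required (AND semantics)
            boolean first = true;
            for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
                if (!first) query.append(' ');
                first = false;
                query.append(e.getKey()).append(':').append(word);
                if (e.getValue() != 1) query.append('^').append(e.getValue());
            }
            query.append(')');
        }
        return query.toString();
    }
}
```

Calling build("i love lucene", exampleBoosts()) produces the query string written out in the post above.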





spans directory in the CVS version

2004-02-11 Thread Nicolas Maisonneuve
hi,
recently a new subdirectory, spans, appeared in the search directory. What is it and
how do I use it?

thanks in advance
nicolas maisonneuve

Re: difference in javadoc and faq similarity expression

2004-01-19 Thread Nicolas Maisonneuve
but in the Javadoc expression there is no TF-IDF weight for the query, just
for the document, whereas the cosine measure uses both... hmm, strange

I have a report to write about Lucene and I don't know
which formula to put in the paper or how to explain it



- Original Message - 
From: Karl Koch [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Sunday, January 18, 2004 11:54 PM
Subject: Re: difference in javadoc and faq similarity expression


 I would rely on the JavaDoc since this one is up to date. The latest
 version, 1.3 final, is just a few weeks old. Some entries in the FAQ however
 are still from 2001...

 Cheers,
 Karl




Re: theorical informations

2004-01-18 Thread Nicolas Maisonneuve
thanks Karl !

- Original Message - 
From: Karl Koch [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Sunday, January 18, 2004 9:22 PM
Subject: Re: theorical informations


 Actually, finding an answer to this question is not really important. More
 important is whether you can do what you want with it. If your result comes
 from a probabilistic model or a vector space model, who cares if you just want
 to give a query and get back a hit list of results?

 Possibly some people here will strongly disagree... ;-) (?)

 Karl

  Hello Nicolas,

  I am sure you mean IR (Information Retrieval) model. Lucene implements a
  Vector Space Model with an integrated Boolean Model. This means the Boolean
  model is integrated through a Boolean query language but mapped into the
  vector space. Therefore you have ranking even though the traditional Boolean
  model does not support it. Cosine similarity is used to measure similarity
  between documents and the query. You can find this in a very long discussion
  here when you search the archive...
 
  Karl
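
The cosine similarity mentioned above can be sketched in a few lines of plain Java. This is the textbook measure over term-weight vectors, not Lucene's actual scoring code; the class name CosineSketch and the choice of a Map from term to weight are assumptions for illustration.

```java
import java.util.Map;

// Textbook cosine similarity between two sparse term-weight vectors.
final class CosineSketch {

    /** dot(query, document) / (|query| * |document|), or 0 when a vector is empty. */
    static double cosine(Map<String, Double> query, Map<String, Double> document) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : query.entrySet()) {
            Double weight = document.get(e.getKey());
            if (weight != null) {
                dot += e.getValue() * weight;   // only shared terms contribute
            }
        }
        double norms = norm(query) * norm(document);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    /** Euclidean length of the vector. */
    static double norm(Map<String, Double> vector) {
        double sum = 0.0;
        for (double w : vector.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

Two vectors pointing in the same direction score 1 regardless of their length, and vectors with no term in common score 0.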
 
   hi,
   I have 2 theoretical questions:

   I searched the mailing list for the R.I. model implemented in Lucene,
   but found no precise answer.

   1) What is the R.I. model implemented in Lucene? (e.g. Boolean Model,
   Vector Model, Probabilistic Model, etc.)

   2) What is the theoretical similarity function implemented in Lucene?
   (Euclidean, Cosine, Jaccard, Dice)

   (Why is this important information not on the Lucene web site or in the
   FAQ?)
  
 



difference in javadoc and faq similarity expression

2004-01-18 Thread Nicolas Maisonneuve
hi,
I am having trouble finding the correspondence between the similarity
expressions in the Javadoc and in the FAQ.

In the Similarity Javadoc:

score(q,d) = Sum [ tf(t in d) * idf(t) * getBoost(t.field in d) *
lengthNorm(t.field in d) * coord(q,d) * queryNorm(q) ]

In the FAQ:

score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) *
coord_q_d

In FAQ       | In Javadoc
1 / norm_q   | queryNorm(q)
1 / norm_d_t | lengthNorm(t.field in d)
coord_q_d    | coord(q,d)
boost_t      | getBoost(t.field in d)
idf_t        | idf(t)
tf_d         | tf(t in d)

But where is the Javadoc expression corresponding to tf_q in the FAQ expression?

nicolas
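
For the report, the Javadoc expression can be evaluated by hand with a small sketch. The factor definitions below (square-root tf, logarithmic idf, and so on) are how I understand the default Similarity implementation of that period; treat them as assumptions and check them against the Javadoc of the version you use. ScoreSketch and its parameter layout are hypothetical.

```java
// Hypothetical sketch evaluating the Javadoc scoring expression for one document.
final class ScoreSketch {

    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }

    static double coord(int overlap, int maxOverlap) { return overlap / (double) maxOverlap; }

    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    /**
     * Arrays hold one entry per query term that matches the document.
     * coord and queryNorm do not depend on the term, so they are applied once
     * outside the sum, which gives the same value as writing them inside it.
     */
    static double score(int[] freqs, int[] docFreqs, int[] fieldLengths, double[] boosts,
                        int queryTerms, int numDocs, double sumOfSquaredWeights) {
        double sum = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            sum += tf(freqs[i]) * idf(docFreqs[i], numDocs) * boosts[i]
                    * lengthNorm(fieldLengths[i]);
        }
        return sum * coord(freqs.length, queryTerms) * queryNorm(sumOfSquaredWeights);
    }
}
```

Note that the sketch has no tf_q factor: it follows the Javadoc expression literally, which is exactly the difference asked about above.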




IndexReader.document(int i)

2004-01-17 Thread Nicolas Maisonneuve
hi,
I would like to know:
in IndexReader.document(int i),
what is this number i?
Is the first document the oldest document indexed
and the last the newest (so that we can easily sort by date)?

thanks in advance

nico 

Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve
hi,
I would like to back up an index.

1) My first idea was to make a file-system copy of all the files,
but the FSDirectory class has no public method to find out where the
directory is located. A simple method like
public File getDirectoryFile() {
    return directory;
}
would be great.
2) So I decided to write a copy(Directory source, Directory target) method.
I have seen the openFile() and createFile() methods,
but I don't know how to use them (see my function below; it throws an Exception):

private void copy (Directory source, Directory target) throws IOException {
    String[] files=source.list();
    for(int i=0; i<files.length; i++) {
        InputStream in=source.openFile(files[i]);
        OutputStream out=target.createFile(files[i]);
        byte c;

        while((c=in.readByte())!=-1) {
            out.writeByte(c);
        }
        in.close();
        out.close();
    }
}

Could someone help me, please?
nico 


Re: Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve
hmm, yes
but I don't want to open an IndexWriter for this,
and there is the performance question when the index is big

- Original Message - 
From: Karsten Konrad [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, January 15, 2004 2:20 PM
Subject: AW: Copy Directory to Directory function ( backup)



Hi,

an elegant method is to create an empty directory and merge
the index to be copied into it, using addIndexes(Directory[]) of
IndexWriter. This way, you do not have to deal with files
at all.

Regards,

Karsten




Fw: Betreff: Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve

- Original Message - 
From: Nick Smith [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, January 15, 2004 2:58 PM
Subject: Betreff: Copy Directory to Directory function ( backup)


 Hi Nico,
 This is the method that I use for backing up my indices...

 Good Luck!

 Nick

   /**
    * Copy contents of <code>dir</code>, erasing current contents.
    *
    * This can be used to write a memory-based index to disk.
    *
    * @param dir a <code>Directory</code> value
    * @exception IOException if an error occurs
    */
   public void copyDir(Directory dir) throws IOException {
     // remove current contents of directory
     create();

     final String[] ar = dir.list();
     for (int i = 0; i < ar.length; i++)
     {
       // make place on disk
       OutputStream os = createFile(ar[i]);
       // read current file
       InputStream is = dir.openFile(ar[i]);

       final int MAX_CHUNK_SIZE = 131072;
       byte[] buf = new byte[MAX_CHUNK_SIZE];
       int remainder = (int)is.length();
       while (remainder > 0) {
         int chunklen = (remainder > MAX_CHUNK_SIZE ? MAX_CHUNK_SIZE : remainder);
         is.readBytes(buf, 0, chunklen);
         os.writeBytes(buf, chunklen);
         remainder -= chunklen;
       }

       // graceful cleanup
       is.close();
       os.close();
     }
   }
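
The same chunked-copy idea can be sketched with the JDK's own java.io streams, which are not the Lucene store classes used above. It also hints at why a loop that compares a byte against -1 is fragile: -1 is a valid byte value (0xFF), so it cannot serve as an end-of-file marker for a method that returns a byte, whereas java.io's read returns an int count and uses -1 only for end of stream. The class name ChunkedCopySketch is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: chunked copy using java.io streams (not Lucene's store API).
final class ChunkedCopySketch {

    static final int MAX_CHUNK_SIZE = 131072;

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[MAX_CHUNK_SIZE];
        long total = 0;
        int read;
        // read() returns the number of bytes placed in buf, or -1 at end of stream
        while ((read = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, read);
            total += read;
        }
        return total;
    }
}
```

Copying in large chunks instead of byte by byte also addresses the performance concern for big indexes.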










Re: Betreff: Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve
thanks! the copy function works
but I still have trouble.
I use a scheduled task to back up the index.
For the test, a backup is made every 15 seconds,
and sometimes, during the backup process,
when I clean a directory with:
Directory target=FSDirectory.getDirectory(selected_backup_dir, true);
I get an Exception:
java.io.IOException: couldn't delete segments
 at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
 at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:151)
 at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:132)
 at lab.crip5.ECR.cocoon.components.IndexBackupJob.backup(IndexBackupJob.java:135)

the exception only happens sometimes

my backup function is simple:

  private void backup (String index_to_backup) throws Exception {
    getLogger().info("begin backup index " + index_to_backup + " at " + new Date() + "...");

    // get the directory of the index
    Directory source=index_manager.getIndex(index_to_backup).getDirectory();

    // select target backup directory
    File target_backup_dir=select_backup(index_to_backup);

    // clean the old index
    Directory target=FSDirectory.getDirectory(target_backup_dir, true);

    // backup
    copy(source, target);

    target.close();

    getLogger().info("end backup index " + index_to_backup + " at " + new Date() + "...ok");
  }




RE: Copy Directory to Directory function ( backup)

2004-01-15 Thread Nicolas Maisonneuve

- Original Message - 
From: Nicolas Maisonneuve [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, January 15, 2004 3:58 PM
Subject: Re: Betreff: Copy Directory to Directory function ( backup)





create a getQuery in the Hits Class

2003-09-19 Thread Nicolas Maisonneuve
hi,
in the Hits class, we have a query property but no public method to get it.

It would be great if you added this:
public final Query getQuery() {
return this.query;
}

StandardTokenizer problem

2003-09-04 Thread Nicolas Maisonneuve
hi,
when I use StandardTokenizer
to parse, for example, I.B.M
the type of the Token is HOST and not ACRONYM

WHY???

in StandardTokenizer.jj

  // acronyms: U.S.A., I.B.M., etc.
  // use a post-filter to remove dots
| <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >

  // hostname
| <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >

I.B.M can be a host or an acronym, so there is a problem, no?
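
The two rules can be tried out with plain java.util.regex, as a rough sketch. The assumptions are loud ones: ALPHA and ALPHANUM are approximated with ASCII letters and digits only, and the tokenizer is assumed to pick the rule with the longest match, with the rule listed first winning a tie. TokenTypeSketch is a hypothetical name, not part of Lucene.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: approximates the ACRONYM and HOST rules with regular expressions.
final class TokenTypeSketch {

    // <ALPHA> "." (<ALPHA> ".")+   (every letter group must be followed by a dot)
    private static final Pattern ACRONYM =
            Pattern.compile("[A-Za-z]+\\.(?:[A-Za-z]+\\.)+");

    // <ALPHANUM> ("." <ALPHANUM>)+   (dots only between groups)
    private static final Pattern HOST =
            Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    /** Length of the match starting at the beginning of the text, or 0 if none. */
    static int prefixLength(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.lookingAt() ? m.end() : 0;
    }

    /** Longest match wins; on a tie the rule listed first (ACRONYM) wins. */
    static String classify(String text) {
        int acronym = prefixLength(ACRONYM, text);
        int host = prefixLength(HOST, text);
        if (acronym == 0 && host == 0) return "NONE";
        return acronym >= host ? "ACRONYM" : "HOST";
    }
}
```

Under these assumptions, I.B.M without a final dot is fully covered only by the HOST rule, because the ACRONYM rule requires a dot after every letter group, while I.B.M. with the trailing dot is classified as ACRONYM.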

- Original Message - 
From: petite_abeille [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, September 04, 2003 3:19 PM
Subject: Re: Lucene app to index Java code


 Hi Erik,
 
 On Thursday, Sep 4, 2003, at 15:03 Europe/Zurich, Erik Hatcher wrote:
 
  - XDoclet could be used to sweep through Java code and build a 
  text/XML file as richly as you'd like from the information there 
  (complete with JavaDoc tags, which Zapata will miss :)),
 
 Correct. This happens to be on purpose :) Does XDoclet build an
 intertwingled object graph of your code along the way? Performing a
 plain search on a code base is pretty trivial... what seems to be more
 interesting would be to put that in context.
 
 Zapata does something along the line of what MagicHat does for 
 Objective-C:
 
 http://homepage.mac.com/petite_abeille/MagicHat/
 
 But from the sound of what Otis is saying this is not what you guys are 
 looking for... back to the pampa then...
 
 Cheers,
 
 PA.
 
 



Avalon IndexWriter

2003-08-29 Thread Nicolas Maisonneuve
hi,
I would like to know if someone has written an Avalon IndexWriter.

thanks in advance.