Re: Lucene Vs Ixiasoft
Hi,

First consider how well each engine's retrieval model fits XML document retrieval. Lucene is a classic full-text search engine based on the vector space model. That model is efficient for indexing unstructured documents (such as plain text files) but was not designed for structured documents such as XML. There is an XML demo in the Lucene sandbox, but it is not very effective: it does not take advantage of the document structure in either the indexing or the ranking model, so it loses semantic information and relevance.

I don't know Ixiasoft; check its documentation to see how it indexes and ranks XML documents.

nicolas

On Wed, 8 Dec 2004 14:20:45 -0500, Praveen Peddi <[EMAIL PROTECTED]> wrote:
> Does anyone know about Ixiasoft server? It's an XML repository/search engine.
> If anyone knows about it, does he/she also know how it compares to Lucene?
> Which is faster?
>
> Praveen
> **
> Praveen Peddi
> Sr Software Engg, Context Media, Inc.
> email:[EMAIL PROTECTED]
> Tel: 401.854.3475
> Fax: 401.861.3596
> web: http://www.contextmedia.com
> **
> Context Media- "The Leader in Enterprise Content Integration"

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: dotLucene (port of Jakarta Lucene to C#)
Hi George,

Is the C# Lucene faster than the Java Lucene? (It seems to me that C# is faster than Java, isn't it?)

nicolas maisonneuve

On Sun, 28 Nov 2004 21:08:30 -0500, George Aroush <[EMAIL PROTECTED]> wrote:
> Hi folks,
>
> I am pleased to announce the availability of dotLucene 1.4.0 RC1. dotLucene
> is a complete port of Jakarta Lucene to C#. The port is almost a
> line-by-line port and it includes the demos as well as all the JUnit tests.
> An index created by dotLucene is cross compatible with Jakarta Lucene and
> vice versa.
>
> Please visit http://sourceforge.net/projects/dotlucene/ to learn more about
> dotLucene and to download the source code.
>
> Best regards,
>
> -- George Aroush
Re: Filter for a search refinement
Hmm, just a question:

- in the normal IndexSearcher search method there is a check of the form
    if (score >0.0F || filter.get(doc)) { doc in the hit }
- but in QueryFilter there is no minimum-score condition.

Is that normal or not?

nicolas

On Sun, 21 Nov 2004 14:34:00 +0100, Nicolas Maisonneuve <[EMAIL PROTECTED]> wrote:
> Yes... it's the same kind of feature (I hadn't seen this Filter, shame on me),
> but my method may be faster: QueryFilter launches an internal
> search, and my method does not.
>
> nicolas
Re: Filter for a search refinement
Yes... it's the same kind of feature (I hadn't seen this Filter, shame on me), but my method may be faster: QueryFilter launches an internal search, and my method does not.

nicolas

On Sun, 21 Nov 2004 05:06:12 -0500, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Nicolas - how does your filter differ from the capabilities available
> from the built-in QueryFilter? It seems at first glance to be nearly
> the same thing.
>
> Erik
>
> On Nov 21, 2004, at 4:52 AM, Nicolas Maisonneuve wrote:
>
> > I developed a filter that restricts a search to the hits of a previous
> > search (search refinement).
> >
> > See the patch: http://issues.apache.org/bugzilla/show_bug.cgi?id=32334
> >
> > Nicolas Maisonneuve
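For readers following the thread, search refinement boils down to intersecting two sets of document ids. Here is a minimal sketch of that idea with plain java.util.BitSet and no Lucene classes; the class and method names are hypothetical and this is not the code from the patch:

```java
import java.util.BitSet;

final class RefinementSketch {

    /** Turns a list of matching document ids into a bit set. */
    static BitSet toBits(int[] docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    /** Keeps only the new hits that were also hits of the previous search. */
    static BitSet refine(BitSet previousHits, int[] newHits) {
        BitSet refined = toBits(newHits);
        refined.and(previousHits); // set intersection
        return refined;
    }

    public static void main(String[] args) {
        BitSet first = toBits(new int[] {1, 3, 5, 8});
        System.out.println(refine(first, new int[] {3, 4, 8, 9}));
    }
}
```

Keeping the previous hits as a bit set means the refinement itself needs no second search, which is presumably the speed argument made above.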
hasFieldFilter contribution
I developed a Filter that restricts search results to documents that have terms in specific fields. Lucene currently offers no way to search on the presence or absence of values in a specific field.

nicolas

package org.apache.lucene.search;

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import java.util.BitSet;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermDocs;
import java.util.*;

/**
 * A Filter that restricts search results to documents that have terms in specific fields
 * (OR operator: the documents that have terms in field1 or in field2)
 * @author Nicolas Maisonneuve
 */
public class HasFieldFilter extends Filter {

  private Set fieldnames;

  /**
   * @param fieldnames String[] an array of field names
   */
  public HasFieldFilter (String[] fieldnames) {
    this.fieldnames=new HashSet();
    for (int i=0; i
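The core idea of such a filter can be sketched without Lucene. The example below assumes a toy in-memory postings map (keys of the form "field:text", values listing the documents that contain the term) in place of a real IndexReader; the class name, the map layout and the `bits` helper are all hypothetical. A real implementation would presumably walk the index's terms instead.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class HasFieldSketch {

    /**
     * Returns the documents that have at least one term in any of the given
     * fields (OR semantics). postings maps "field:text" to document ids.
     */
    static BitSet bits(Map<String, int[]> postings, int maxDoc, String... fieldNames) {
        Set<String> wanted = new HashSet<>(Arrays.asList(fieldNames));
        BitSet result = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            String key = entry.getKey();
            String field = key.substring(0, key.indexOf(':'));
            if (!wanted.contains(field)) {
                continue; // term belongs to a field we are not interested in
            }
            for (int doc : entry.getValue()) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("title:lucene", new int[] {0, 2});
        postings.put("author:nicolas", new int[] {1});
        postings.put("content:search", new int[] {3});
        System.out.println(bits(postings, 4, "title", "author"));
    }
}
```

Documents that only have terms in other fields (document 3 here) stay out of the bit set, so a search run through such a filter never returns them.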
new version of spell checker
UPDATE
- sort fixed (the sort order was reversed!)
- gram size set dynamically (depending on the length of the word)
- use the FuzzyQuery score: ((edit distance)/(length of word))
- new Dictionary interface + LuceneDictionary and PlaintextDictionary implementations
- addWords method replaced by indexDictionary(Dictionary dic)
- new public method: boolean exist(word)
- build.xml added

See the wiki page: http://wiki.apache.org/jakarta-lucene/SpellChecker

1 - Could we put the spell checker in the sandbox? It would be easier to maintain there than through the Bugzilla/wiki process.
2 - Jonathan Hager: could you test this version with your dictionary and tell me the results?
3 - I am looking for a French dictionary. Does anyone have a URL where I could download one?

Thanks to Jonathan Hager and Aad Nales for your suggestions / observations ;-)

Nicolas Maisonneuve
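To make two of the items above concrete (gram size depending on word length, and a score of edit distance over word length), here is a small stand-alone sketch. It is not the spell checker's code: the length thresholds in `gramSize` are hypothetical values picked for illustration, and the edit distance is a textbook Levenshtein implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class SpellSketch {

    /** Hypothetical rule: longer words are split into longer grams. */
    static int gramSize(int wordLength) {
        if (wordLength > 5) return 3;
        if (wordLength == 5) return 2;
        return 1;
    }

    /** All substrings of length n, in order of appearance. */
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    /** Classic Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Edit distance divided by the word length: 0.0 means identical. */
    static double normalizedDistance(String word, String candidate) {
        if (word.isEmpty()) return candidate.isEmpty() ? 0.0 : 1.0;
        return (double) editDistance(word, candidate) / word.length();
    }

    public static void main(String[] args) {
        String word = "lucene";
        System.out.println(grams(word, gramSize(word.length())));
        System.out.println(normalizedDistance(word, "lucine"));
    }
}
```

Candidates found through shared grams can then be sorted by this normalized distance, smallest first.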
Spell checker
Hi Lucene users,

I developed a spell checker for Lucene, inspired by David Spencer's code.

See the wiki doc: http://wiki.apache.org/jakarta-lucene/SpellChecker

Nicolas Maisonneuve
Re: a search like Google
Hi,

> This will give you (+title:i +title:love +title:lucene)^2 (+author:i +author:love +author:lucene)
> (+content:i +content:love +content:lucene)

This is not the same thing as

(title:i^2 author:i content:i) +(title:love^2 author:love content:love) +(title:lucene^2 author:lucene content:lucene)

because the first query requires all the terms to appear in a single field, whereas in the second each term only has to match in one of the fields.

David Spencer's solution is good, but with it we cannot use the Lucene query syntax (phrase queries, prefix queries, boolean operators, etc.). To support the full Lucene syntax we have to hack the parser. See my fulltextParser code; I wrote a parser:

package org.apache.lucene.queryParser;

/**
 * Title:
 * Description:
 * Copyright: Copyright (c) 2003
 * Company:
 * @author Maisonneuve Nicolas
 * @version 1.0
 */
import java.io.IOException;
import java.io.StringReader;
import java.util.Vector;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.CharStream;
import org.apache.lucene.queryParser.FastCharStream;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParserConstants;
import org.apache.lucene.queryParser.QueryParserTokenManager;
import org.apache.lucene.queryParser.Token;
import org.apache.lucene.queryParser.TokenMgrError;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.WildcardQuery;

public class fulltextParser implements QueryParserConstants {

  private static final int CONJ_NONE=0;
  private static final int CONJ_AND=1;
  private static final int CONJ_OR=2;

  private static final int MOD_NONE=0;
  private static final int MOD_NOT=10;
  private static final int MOD_REQ=11;

  public static final int DEFAULT_OPERATOR_OR=0;
  public static final int DEFAULT_OPERATOR_AND=1;

  /** The actual operator that parser uses to combine query terms */
  private int operator=DEFAULT_OPERATOR_AND;

  /**
   * Whether terms of wildcard and prefix queries are to be automatically
   * lower-cased or not. Default is true.
   */
  boolean lowercaseWildcardTerms=true;

  Analyzer analyzer;
  String field;
  String[] fields;
  Float[] boosts;
  int phraseSlop=0;

  /** Parses a query string, returning a {@link org.apache.lucene.search.Query}.
   * @param query the query string to be parsed.
   * @param fields the default fields for query terms.
   * @param analyzer used to find terms in the query text.
   * @throws ParseException if the parsing fails
   */
  static public Query parse (String query, String fields[], Analyzer analyzer) throws ParseException {
    try {
      fulltextParser parser=new fulltextParser(fields, analyzer);
      return parser.parse(query);
    }
    catch(TokenMgrError tme) {
      throw new ParseException(tme.getMessage());
    }
  }

  /** Parses a query string, returning a {@link org.apache.lucene.search.Query}.
   * @param query the query string to be parsed.
   * @param fields the default fields for query terms.
   * @param boost the boost of each field in the fields parameter
   * @param analyzer used to find terms in the query text.
   * @throws ParseException if the parsing fails
   */
  static public Query parse (String query, String fields[], Float boost[], Analyzer analyzer) throws ParseException {
    try {
      fulltextParser parser=new fulltextParser(fields, boost, analyzer);
      return parser.parse(query);
    }
    catch(TokenMgrError tme) {
      throw new ParseException(tme.getMessage());
    }
  }

  /** Constructs a query parser.
   * @param fields the default fields for query terms.
   * @param a used to find terms in the query text.
   */
  public fulltextParser (String[] fields, Analyzer a) {
    this(fields, null, a);
  }

  public fulltextParser (String[] fields, Float boosts[], Analyzer a) {
    this(new FastCharStream(new StringReader("")));
    analyzer=a;
    this.fields=fields;
    this.boosts=boosts;
    field=fields[0];
  }

  /** Parses a query string, returning a Query.
   * @param query the query string to be parsed.
   * @throws ParseException if the parsing fails
   * @throws TokenMgrError if the parsing fails
   */
  public Query parse (String query) throws ParseException, TokenMgrError {
    ReInit(new FastCharStream(new StringReader(query)));
    return Query(field);
  }

  /**
   * Sets the default slop for phrases. If zero, then exact phras
a search like Google
Hi,

I have an index with the fields:
title
author
content

I would like to offer the same kind of search as Google (a form with a single text field). When the user searches for "i love lucene" (not a phrase query, just the text typed in the text field), I would like to search in all the index fields, but with a specific boost for each field. In this example: title=2, author=1, content=1.

The resulting query would be (I suppose the default operator is "and"):

(title:i^2 author:i content:i) +(title:love^2 author:love content:love) +(title:lucene^2 author:lucene content:lucene)

Must I modify the QueryParser, or is there a different way to do this? (I modified the QueryParser and it works, but if there is a cleaner way to do this, I'll take it!)

nicolas maisonneuve
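One way to picture the expansion asked for here is a helper that rewrites the user's words into a query string before handing it to a parser. The sketch below is only that, a string-building illustration under simple assumptions: words are split on whitespace, nothing is escaped or analyzed, every word group is marked as required, and the class and method names are hypothetical.

```java
final class MultiFieldQuerySketch {

    /**
     * Expands each word of the user input into a required group that
     * searches the word in every field, with an optional boost per field.
     */
    static String expand(String userInput, String[] fields, float[] boosts) {
        StringBuilder sb = new StringBuilder();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input produces an empty query
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("+(");
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(fields[i]).append(':').append(word);
                if (boosts[i] != 1.0f) {
                    sb.append('^').append(boosts[i]);
                }
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"title", "author", "content"};
        float[] boosts = {2f, 1f, 1f};
        System.out.println(expand("i love lucene", fields, boosts));
    }
}
```

The obvious limit of this approach is that it only handles bare words; phrases, wildcards or operators typed by the user would need real parsing.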
spans directory in the CVS version
Hi,

A new subdirectory, "spans", recently appeared in the search directory. What is it, and how is it used?

Thanks in advance,

nicolas maisonneuve
features page in the Lucene web site
Hi,

It would be great if a page listing all the features of Lucene were created on the Apache Lucene site. The SourceForge website has such a page (http://lucene.sourceforge.net/features.html), but is it up to date?

Thanks in advance,

nicolas maisonneuve
Re: difference in javadoc and faq similarity expression
But in the Javadoc expression there is no TF-IDF weight for the query, only for the document, and the cosine measure uses both... hmm, strange.

I have a report to write about Lucene, and I don't know which formula to put in the paper or how to explain it.

----- Original Message -----
From: "Karl Koch" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Sunday, January 18, 2004 11:54 PM
Subject: Re: difference in javadoc and faq similarity expression

> I would rely on the JavaDoc since this one is up to date. The latest version
> 1.3 final is just a few weeks old. Some entries in the FAQ however are still
> from 2001...
>
> Cheers,
> Karl
difference in javadoc and faq similarity expression
Hi,

I am having trouble finding the correspondence between the Javadoc and the FAQ similarity expressions.

In the Similarity Javadoc:

score(q,d) = Sum [ tf(t in d) * idf(t) * getBoost(t.field in d) * lengthNorm(t.field in d) * coord(q,d) * queryNorm(q) ]

In the FAQ:

score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d

In FAQ        | In Javadoc
1 / norm_q    = queryNorm(q)
1 / norm_d_t  = lengthNorm(t.field in d)
coord_q_d     = coord(q,d)
boost_t       = getBoost(t.field in d)
idf_t         = idf(t)
tf_d          = tf(t in d)

But where is the Javadoc counterpart of the FAQ's "tf_q"?

nicolas

----- Original Message -----
From: "Nicolas Maisonneuve" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Sunday, January 18, 2004 9:33 PM
Subject: Re: theorical informations

> thanks Karl !
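To see how the factors of the Javadoc expression combine, here is a toy calculation. It is a sketch under stated assumptions, not Lucene's code: the concrete tf, idf and lengthNorm definitions (square root, logarithm, inverse square root) are chosen for illustration, coord and queryNorm are applied once outside the sum, and all names are hypothetical.

```java
final class ScoreSketch {

    // Assumed factor definitions, for illustration only.
    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    /**
     * freqs[i]    = frequency of query term i in the document's field
     * docFreqs[i] = number of documents containing query term i
     */
    static double score(int[] freqs, int[] docFreqs, int numDocs,
                        int fieldLength, double fieldBoost, double queryNorm) {
        double sum = 0.0;
        int overlap = 0;
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] == 0) {
                continue; // query term absent from the document
            }
            overlap++;
            sum += tf(freqs[i]) * idf(docFreqs[i], numDocs) * fieldBoost * lengthNorm(fieldLength);
        }
        return sum * coord(overlap, freqs.length) * queryNorm;
    }

    public static void main(String[] args) {
        // Two query terms, only the first occurs in the document (4 times).
        System.out.println(score(new int[] {4, 0}, new int[] {0, 5}, 1, 4, 1.0, 1.0));
    }
}
```

Written this way, each query term contributes once to the sum, so nothing in the loop plays the role of the FAQ's tf_q; any query-side weighting would have to enter through queryNorm or a boost.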
Re: theorical informations
Thanks Karl!

----- Original Message -----
From: "Karl Koch" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Sunday, January 18, 2004 9:22 PM
Subject: Re: theorical informations

> Actually, finding an answer to this question is not really important. More
> important is if you can do what you want with it. If your result comes from a
> prob. model or a vector space model, who cares if you just want to give a
> query and get back a hit list of results?
>
> Possibly some people here will strongly disagree... ;-) (?)
>
> Karl
>
> > Hello Nicolas,
> >
> > I am sure you mean IR (Information Retrieval) Model. Lucene implements a
> > Vector Space Model with integrated Boolean Model. This means the Boolean
> > model is integrated with a Boolean query language but mapped into the
> > Vector Space. Therefore you have ranking even though the traditional
> > Boolean model does not support this. Cosine similarity is used to measure
> > similarity between documents and the query. You can find this in a very
> > long discussion here when you search the archive...
> >
> > Karl
> >
> > > Hi,
> > > I have 2 theoretical questions:
> > >
> > > I searched the mailing list for the IR model implemented in Lucene,
> > > but found no precise answer.
> > >
> > > 1) What is the IR model implemented in Lucene? (e.g. Boolean Model,
> > > Vector Model, Probabilistic Model, etc.)
> > >
> > > 2) What is the theoretical similarity function implemented in Lucene?
> > > (Euclidean, Cosine, Jaccard, Dice)
> > >
> > > (Why is this important information not on the Lucene web site or in
> > > the FAQ?)
theorical informations
Hi,

I have 2 theoretical questions. I searched the mailing list for the IR (information retrieval) model implemented in Lucene, but found no precise answer.

1) What is the IR model implemented in Lucene? (e.g. Boolean Model, Vector Model, Probabilistic Model, etc.)

2) What is the theoretical similarity function implemented in Lucene? (Euclidean, Cosine, Jaccard, Dice)

(Why is this important information not on the Lucene web site or in the FAQ?)
IndexReader.document(int i)
Hi,

In IndexReader.document(int i), what is the number i? Is the first document the oldest one indexed and the last document the most recent? (If so, we could easily sort by date.)

Thanks in advance,

nico
RE: Copy Directory to Directory function ( backup)
----- Original Message -----
From: "Nicolas Maisonneuve" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, January 15, 2004 3:58 PM
Subject: Re: Betreff: Copy Directory to Directory function ( backup)

> Thanks! The copy function works, but I have a problem.
> I use a scheduled task to back up the index; for the test, a backup is
> made every 15 seconds. Sometimes, during the backup, when I clean a
> directory with
>   Directory target=FSDirectory.getDirectory(selected_backup_dir, true);
> I get an exception:
>   java.io.IOException: couldn't delete segments
>     at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
>     at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:151)
>     at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:132)
>     at lab.crip5.ECR.cocoon.components.IndexBackupJob.backup(IndexBackupJob.java:135)
>
> The exception only happens sometimes.
>
> My backup function is simple:
>
>   private void backup (String index_to_backup) throws Exception {
>     getLogger().info("begin backup index "+index_to_backup+" at "+new Date()+"...");
>
>     // get the directory of the index
>     Directory source=index_manager.getIndex(index_to_backup).getDirectory();
>
>     // select target backup directory
>     File target_backup_dir=select_backup(index_to_backup);
>
>     // clean the old index
>     Directory target=FSDirectory.getDirectory(target_backup_dir, true);
>
>     // backup
>     copy(source, target);
>
>     target.close();
>
>     getLogger().info("end backup index "+index_to_backup+" at "+new Date()+"...ok");
>   }
Re: Betreff: Copy Directory to Directory function ( backup)
Thanks! The copy function works, but I have a problem.

I use a scheduled task to back up the index; for the test, a backup is made every 15 seconds. Sometimes, during the backup, when I clean a directory with

  Directory target=FSDirectory.getDirectory(selected_backup_dir, true);

I get an exception:

  java.io.IOException: couldn't delete segments
    at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
    at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:151)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:132)
    at lab.crip5.ECR.cocoon.components.IndexBackupJob.backup(IndexBackupJob.java:135)

The exception only happens sometimes.

My backup function is simple:

  private void backup (String index_to_backup) throws Exception {
    getLogger().info("begin backup index "+index_to_backup+" at "+new Date()+"...");

    // get the directory of the index
    Directory source=index_manager.getIndex(index_to_backup).getDirectory();

    // select target backup directory
    File target_backup_dir=select_backup(index_to_backup);

    // clean the old index
    Directory target=FSDirectory.getDirectory(target_backup_dir, true);

    // backup
    copy(source, target);

    target.close();

    getLogger().info("end backup index "+index_to_backup+" at "+new Date()+"...ok");
  }

----- Original Message -----
From: "Nicolas Maisonneuve" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 15, 2004 3:21 PM
Subject: Fw: Betreff: Copy Directory to Directory function ( backup)

> ----- Original Message -----
> From: "Nick Smith" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, January 15, 2004 2:58 PM
> Subject: Betreff: Copy Directory to Directory function ( backup)
>
> > Hi Nico,
> > This is the method that I use for backing up my indices...
> >
> > Good Luck!
> >
> > Nick
> >
> > /**
> >  * Copy contents of dir, erasing current contents.
> >  *
> >  * This can be used to write a memory-based index to disk.
> >  *
> >  * @param dir a Directory value
> >  * @exception IOException if an error occurs
> >  */
> > public void copyDir(Directory dir) throws IOException {
> >   // remove current contents of directory
> >   create();
> >
> >   final String[] ar = dir.list();
> >   for (int i = 0; i < ar.length; i++)
> >   {
> >     // make place on disk
> >     OutputStream os = createFile(ar[i]);
> >     // read current file
> >     InputStream is = dir.openFile(ar[i]);
> >
> >     final int MAX_CHUNK_SIZE = 131072;
> >     byte[] buf = new byte[MAX_CHUNK_SIZE];
> >     int remainder = (int)is.length();
> >     while (remainder > 0) {
> >       int chunklen = (remainder > MAX_CHUNK_SIZE ? MAX_CHUNK_SIZE : remainder);
> >       is.readBytes(buf, 0, chunklen);
> >       os.writeBytes(buf, chunklen);
> >       remainder -= chunklen;
> >     }
> >
> >     // graceful cleanup
> >     is.close();
> >     os.close();
> >   }
> > }
Fw: Betreff: Copy Directory to Directory function ( backup)
----- Original Message -----
From: "Nick Smith" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 15, 2004 2:58 PM
Subject: Betreff: Copy Directory to Directory function ( backup)

> Hi Nico,
> This is the method that I use for backing up my indices...
>
> Good Luck!
>
> Nick
>
> /**
>  * Copy contents of dir, erasing current contents.
>  *
>  * This can be used to write a memory-based index to disk.
>  *
>  * @param dir a Directory value
>  * @exception IOException if an error occurs
>  */
> public void copyDir(Directory dir) throws IOException {
>   // remove current contents of directory
>   create();
>
>   final String[] ar = dir.list();
>   for (int i = 0; i < ar.length; i++)
>   {
>     // make place on disk
>     OutputStream os = createFile(ar[i]);
>     // read current file
>     InputStream is = dir.openFile(ar[i]);
>
>     final int MAX_CHUNK_SIZE = 131072;
>     byte[] buf = new byte[MAX_CHUNK_SIZE];
>     int remainder = (int)is.length();
>     while (remainder > 0) {
>       int chunklen = (remainder > MAX_CHUNK_SIZE ? MAX_CHUNK_SIZE : remainder);
>       is.readBytes(buf, 0, chunklen);
>       os.writeBytes(buf, chunklen);
>       remainder -= chunklen;
>     }
>
>     // graceful cleanup
>     is.close();
>     os.close();
>   }
> }
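For comparison, the same chunked-copy loop can be written against the plain java.io streams. This is a stand-in, not the method quoted above: it reads until end of stream instead of using a known file length, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ChunkCopySketch {

    /** Copies everything from in to out in fixed-size chunks; returns the byte count. */
    static long copy(InputStream in, OutputStream out, int chunkSize) throws IOException {
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        // read() may return fewer bytes than requested, so write exactly n.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[300000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out, 131072);
        System.out.println(copied + " bytes copied");
    }
}
```

Whichever stream classes are used, the caller still has to make sure nothing else is writing to the source files while the copy runs.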
Re: Copy Directory to Directory function ( backup)
Hmm, yes, but I don't want to open an IndexWriter for this, and there is the question of performance when the index is big.

----- Original Message -----
From: "Karsten Konrad" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, January 15, 2004 2:20 PM
Subject: AW: Copy Directory to Directory function ( backup)

Hi,

an elegant method is to create an empty directory and merge the index to be copied into it, using addIndexes(Directory[]) of IndexWriter. This way, you do not have to deal with files at all.

Regards,

Karsten

----- Original Message -----
From: Nicolas Maisonneuve [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 15 January 2004 13:28
To: [EMAIL PROTECTED]
Subject: Copy Directory to Directory function ( backup)

Hi,

I would like to back up an index.

1) My first idea was to make a file-system copy of all the index files, but the FSDirectory class has no public method that tells where the directory is located. A simple method like

  public File getDirectoryFile() { return directory; }

would be great.

2) So I decided to write a copy(Directory source, Directory target) method. I have seen the openFile() and createFile() methods, but I don't know how to use them (see my function; it throws an exception):

  private void copy (Directory source, Directory target) throws IOException {
    String[] files=source.list();
    for(int i=0; i
Copy Directory to Directory function ( backup)
Hi,

I would like to back up an index.

1) My first idea was to make a file-system copy of all the index files, but the FSDirectory class has no public method that tells where the directory is located. A simple method like

  public File getDirectoryFile() { return directory; }

would be great.

2) So I decided to write a copy(Directory source, Directory target) method. I have seen the openFile() and createFile() methods, but I don't know how to use them (see my function; it throws an exception):

  private void copy (Directory source, Directory target) throws IOException {
    String[] files=source.list();
    for(int i=0; i
create a getQuery in the Hits Class
Hi,

The Hits class has a query property but no public method to get it. It would be great if you added this:

  public final Query getQuery() { return this.query; }
StandardTokenizer problem
Hi,

When I use StandardTokenizer to parse, for example, "I.B.M", the type of the resulting Token is HOST and not ACRONYM. Why?

In StandardTokenizer.jj:

  // acronyms: U.S.A., I.B.M., etc.
  // use a post-filter to remove dots
| <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >

  // hostname
| <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >

"I.B.M" can be a host or an acronym, so there is a problem, no?

----- Original Message -----
From: "petite_abeille" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, September 04, 2003 3:19 PM
Subject: Re: Lucene app to index Java code

> Hi Erik,
>
> On Thursday, Sep 4, 2003, at 15:03 Europe/Zurich, Erik Hatcher wrote:
>
> > - XDoclet could be used to sweep through Java code and build a
> > text/XML file as richly as you'd like from the information there
> > (complete with JavaDoc tags, which Zapata will miss :)),
>
> Correct. This happens to be on purpose :) Does XDoclet build an
> "intertwingled" object graph of your code along the way? Performing a
> plain search on a code base is pretty trivial... what seems to be more
> interesting would be to put that in context.
>
> Zapata does something along the line of what MagicHat does for
> Objective-C:
>
> http://homepage.mac.com/petite_abeille/MagicHat/
>
> But from the sound of what Otis is saying this is not what you guys are
> looking for... back to the pampa then...
>
> Cheers,
>
> PA.
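Regarding the ACRONYM versus HOST question in the StandardTokenizer message above: reading the two grammar rules as regular expressions suggests an explanation, since the acronym rule seems to want a dot after every letter group, including the last one. The sketch below approximates both rules with java.util.regex. It assumes plain ASCII letters and digits (the grammar's character classes are presumably broader), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class TokenTypeSketch {

    // Approximation of the acronym rule: letters ".", then one or more (letters ".").
    static final Pattern ACRONYM = Pattern.compile("[A-Za-z]+\\.(?:[A-Za-z]+\\.)+");

    // Approximation of the hostname rule: alphanumeric runs separated by ".".
    static final Pattern HOST = Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    /** Classifies a whole token, trying the acronym rule first. */
    static String type(String text) {
        if (ACRONYM.matcher(text).matches()) return "ACRONYM";
        if (HOST.matcher(text).matches()) return "HOST";
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println("I.B.M  -> " + type("I.B.M"));
        System.out.println("I.B.M. -> " + type("I.B.M."));
    }
}
```

Under these approximations, "I.B.M" without a trailing dot can only satisfy the hostname pattern, while "I.B.M." satisfies the acronym one.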
Avalon IndexWriter
Hi,

I would like to know whether someone has written an Avalon IndexWriter component.

Thanks in advance.