Lock timeout should show the index it failed on...
Just an RFE... if a lock times out, the exception should probably include the name of the FSDirectory (or note that it's a RAMDirectory). I'm lazy, so this is a reminder either for myself to do this or to wait until one of you guys takes care of it :) Kevin -- Please reply using PGP. http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/ Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965 AIM/YIM - sfburtonator, Web - http://peerfear.org/ GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
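A minimal sketch of what such a message could look like (the class and method names here are hypothetical illustrations, not Lucene's actual lock code):

```java
// Hypothetical sketch: composing a lock-timeout message that names the
// directory, as the RFE suggests. This is NOT Lucene's real Lock class.
public class LockTimeoutMessage {

    // Build an error message that includes a description of the directory
    // the lock lives in (an FSDirectory path, or the string "RAMDirectory")
    static String format(String directoryDescription, long timeoutMillis) {
        return "Lock obtain timed out after " + timeoutMillis
                + " ms in " + directoryDescription;
    }

    public static void main(String[] args) {
        System.out.println(format("FSDirectory@/var/index", 1000));
        System.out.println(format("RAMDirectory", 1000));
    }
}
```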
Re: code works with 1.3-rc1 but not with 1.3-final??
Or use IndexWriter.setUseCompoundFile(true) to reduce the number of files created by Lucene. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#setUseCompoundFile(boolean) =Matt Kevin A. Burton wrote: > Dan wrote: > > I get the following error when running the index with 1.3-final: IOException: /home/danl001/index-Mar-22-14_31_30/_ni.f43 (Too many open files) > Read the FAQ... You need to increase your file handles. [snip]
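For reference, enabling the compound file format is a one-liner at index time. A sketch against the Lucene 1.3 API (the index path and analyzer choice are placeholders):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Sketch (Lucene 1.3 API): merge the many per-segment files into compound
// (.cfs) files, greatly reducing open file handles. Path is a placeholder.
public class CompoundFileExample {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/myindex",
                new StandardAnalyzer(), true);
        writer.setUseCompoundFile(true);  // set before adding documents
        // ... addDocument() calls ...
        writer.optimize();
        writer.close();
    }
}
```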
Re: code works with 1.3-rc1 but not with 1.3-final??
Dan wrote: > I get the following error when running the index with 1.3-final: IOException: /home/danl001/index-Mar-22-14_31_30/_ni.f43 (Too many open files) No... it's you... ;) Read the FAQ and then raise your open-file limit with ulimit -n (try something like 1024). You need to increase your file handles. Chances are you never noticed this before, but the problem was still present. If you're on a Linux box you would be amazed to find out that you're only about 200 file handles away from running out of your per-user file handle quota. You might have to su to root to change this. RedHat is stricter because it enforces per-user resource limits via PAM's pam_limits module (/etc/security/limits.conf). Debian's defaults are configured better here. Also, a Google query would have solved this for you very quickly ;) Kevin
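The advice above, sketched as shell commands (the limit value and file paths vary by distribution; 1024 is just an illustrative number):

```shell
# Show the current per-process open-file limit
ulimit -n

# Raise the soft limit for the current shell; this can only go up to the
# hard limit shown by `ulimit -Hn`. Raising the hard limit needs root.
ulimit -n 1024 2>/dev/null || echo "need root (or a higher hard limit) to raise it"

# On many Linux distributions the persistent per-user limits live in
# /etc/security/limits.conf (the PAM resource-limits module alluded to
# in the message above); editing that file requires root.
```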
termPosition does not iterate properly in Lucene 1.3 rc1
Lucene does not iterate through the termPositions on one of my indexed data sources. It used to iterate properly through this data source, but not anymore. I tried a different indexed data source and it iterates properly. The Lucene index directory does not have any lock files either. My code is as follows: TermPositions termPos = reader.termPositions(aTerm); while (termPos.next()) { // get doc String docID = reader.document(termPos.doc()).get(keyName); ... } Is there anything wrong with that? Thanks for your help, Allen
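For comparison, a fuller TermPositions loop usually looks like this. A sketch against the Lucene 1.3 API; the index path, field names, and term value are placeholders standing in for Allen's `reader`, `aTerm`, and `keyName`:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

// Sketch (Lucene 1.3 API): iterate the documents containing a term, then
// the positions within each document (freq() positions per document).
public class TermPositionsExample {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/tmp/myindex");
        Term aTerm = new Term("contents", "lucene");
        TermPositions termPos = reader.termPositions(aTerm);
        while (termPos.next()) {
            String docID = reader.document(termPos.doc()).get("id");
            // positions must be consumed per document, freq() times
            for (int i = 0; i < termPos.freq(); i++) {
                int position = termPos.nextPosition();
                System.out.println(docID + " @ " + position);
            }
        }
        termPos.close();
        reader.close();
    }
}
```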
code works with 1.3-rc1 but not with 1.3-final??
I have some code that creates a Lucene index. It has been working fine with lucene-1.3-rc1.jar, but I wanted to upgrade to lucene-1.3-final.jar. I did this and the indexer breaks. I get the following error when running the index with 1.3-final: Optimizing the index IOException: /home/danl001/index-Mar-22-14_31_30/_ni.f43 (Too many open files) Indexed 884 files in 8 directories Index creation took 242 seconds % So it appears that the code that uses 1.3-final breaks on the call to optimize(). Does anyone know what is wrong? Again, the ONLY change between the working version and the version that breaks on optimize is the jar file I use. lucene-1.3-rc1.jar works; lucene-1.3-final.jar doesn't. Weird, huh? I've tested this on both Unix (Solaris) and on Windows. In both cases, I'm using JDK 1.4.2_03.
Re: Final Hits
Terry, I'm still quite curious how you plan to take advantage of a subclassable Hits. Are you going to create your own IndexSearcher which returns your subclass somehow? You could use a HitCollector (which is what is used under the covers of the Hits-returning methods anyway) to emulate whatever it is you're trying to do, I suspect. As for 'final': Doug did a great thing by designing Lucene tight and controlled, with private/package-scoped access and final modifiers in lots of places. There is no technical issue with removing the final, but we would need to see a pretty compelling, detailed reason to do so. Erik On Mar 22, 2004, at 7:56 AM, Terry Steichen wrote: > ... [snip]
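For what it's worth, the HitCollector route Erik mentions looks roughly like this. A sketch against the Lucene 1.3 API; the index path, field, and per-hit logic are placeholders:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Sketch (Lucene 1.3 API): receive every matching doc id and raw score
// yourself, instead of going through the final Hits class.
public class HitCollectorExample {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/tmp/myindex");
        TermQuery query = new TermQuery(new Term("contents", "lucene"));
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                // do here whatever a Hits subclass would have done
                System.out.println("doc " + doc + " scored " + score);
            }
        });
        searcher.close();
    }
}
```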
Re: Final Hits
Erik, There are a number of different possibilities which I'm still evaluating. But if there is some significant reason for *not* subclassing Hits (performance?), that will have a major bearing on whether the approach I'm evaluating makes sense. So, let me rephrase my question: Is the "final" nature of Hits due to some performance reason, or simply because no one has previously expressed any interest in subclassing it? Or, putting it in reverse, is there any technical problem likely to arise from removing the "final" attribute(s)? Regards, Terry - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, March 22, 2004 7:06 AM Subject: Re: Final Hits > How exactly would you take advantage of a subclassable Hits class? > ... [snip]
Re: Specification of the Key words to be searched
Re-directing to lucene-user list. One way of doing this is by writing a custom Analyzer that throws away words you don't want to index (see an example of a custom Analyzer in the jGuru FAQ). Another way would be to just re-use the existing Analyzers and add words you don't want indexed to the Analyzer's stop list. Otis --- jitender ahuja <[EMAIL PROTECTED]> wrote: > Sir, >I am implementing lucene for a database as part of my masters' > project. I desire to reduce the index directory size by specifying > the key words to be indexed for the "Text" field specified as Reader > type. This Key words' specification, if possible, will further reduce > the Index directory size, but am unable to figure out how to do the > same. > Kindly specify the means to achieve the same. > > Regards, > Jitender
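Otis's second suggestion (reusing an existing analyzer with your own stop list) can be as simple as this. A sketch against the Lucene 1.3 API; the word list and index path are placeholders:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Sketch (Lucene 1.3 API): any word in the stop list is dropped at
// analysis time, which shrinks the index. The list here is illustrative.
public class StopListExample {
    public static void main(String[] args) throws Exception {
        String[] stopWords = { "the", "a", "an", "of", "and", "to" };
        IndexWriter writer = new IndexWriter("/tmp/myindex",
                new StandardAnalyzer(stopWords), true);
        // ... addDocument() calls; stop words never reach the index ...
        writer.close();
    }
}
```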
Re: Demoting results
On Fri, 2004-03-19 at 11:58, Doug Cutting wrote: > Doug Cutting wrote: > >> On Thu, 2004-03-18 at 13:32, Doug Cutting wrote: > >> > >>> Have you tried assigning these very small boosts (0 < boost < 1) and > >>> assigning other query clauses relatively large boosts (boost > 1)? > > > > I don't think you understood my proposal. You should try boosting the > > documents when you add them. Instead of adding a "doctype" field with > > "good" and "bad" values, use Document.setBoost(0.01) at index time. > > Sorry. My mistake. You did understand my proposal, it was just a bad > proposal. Boosting documents is a better approach, but is less > flexible. I think the final proposal in my previous message might be > the best approach (defining a custom coordination function for these > query clauses). Thanks for the ideas - I love the flexibility of Lucene that there are so many ways to accomplish what at first seemed so difficult. Boris
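Doug's index-time suggestion, sketched in code (Lucene 1.3-era API; the boost value and field contents are placeholders):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch (Lucene 1.3-era API): demote a document at index time instead
// of tagging it with a "doctype" field and boosting query clauses.
public class DocumentBoostExample {
    public static void main(String[] args) {
        Document doc = new Document();
        doc.add(Field.Text("contents", "some less important content"));
        doc.setBoost(0.01f);  // scores for this doc are scaled way down
        // writer.addDocument(doc);  // then add it as usual
    }
}
```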
Re: Indexing japanese PDF documents
Yes he did, but I was away the past couple of days. As this is more of a PDFBox issue I responded in the PDFBox forums; please follow the thread there if you are interested. Ben On Mon, 22 Mar 2004, Otis Gospodnetic wrote: > I have not tried these other tools yet. > Have you asked Ben Litchfield, the PDFBox author, about handling of > Japanese text? > ... [snip]
Re: Indexing japanese PDF documents
I have not tried these other tools yet. Have you asked Ben Litchfield, the PDFBox author, about handling of Japanese text? Otis --- Chandan Tamrakar <[EMAIL PROTECTED]> wrote: > Have anyone tried using PDFBox library for parsing a japanese > documents ? Or do i need to use other parser like xPDF, Jpedal ? > ... [snip]
Re: Final Hits
How exactly would you take advantage of a subclassable Hits class? On Mar 21, 2004, at 6:01 AM, Terry Steichen wrote: Does anyone know why the Hits class is final (thus preventing it from being subclassed)? Regards, Terry
Re: CJK Analyzer indexing japanese word document
Hi Scott, Thanks for your advice. I am now using POI to convert Word documents, and I made sure to convert into Unicode before handing text to Lucene for indexing; it is working perfectly fine. Which parser is best for parsing PDF documents? I tried PDFBox but it seems it doesn't work well with Japanese characters. Any suggestion? Thanks. - Original Message - From: "Scott Smith" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, March 17, 2004 4:27 AM Subject: RE: CJK Analyzer indexing japanese word document > I have used this analyzer with Japanese and it works fine. In fact, I'm > currently doing English, several western European languages, traditional > and simplified Chinese and Japanese. I throw them all in the same index > and have had no problem other than my users wanted the search limited by > language. I solved that problem by simply adding a keyword field to the > Document which has the 2-letter language code. I then automatically add > the term indicating the language as an additional constraint when the > user specifies the search. > > You do need to be sure that the Shift-JIS gets converted to unicode > before you put it in the Document (and pass it to the analyzer). > Internally, I believe lucene wants everything in unicode (as any good > java program would). Originally, I had problems with Asian languages and > eventually determined my xml parser wasn't translating my Shift-JIS, > Big5, etc. to unicode. Once I fixed that, life was good. > > -Original Message- > From: Che Dong [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 16, 2004 8:31 AM > To: Lucene Users List > Subject: Re: CJK Analyzer indexing japanese word document > > Some Korean friends tell me they use it successfully for Korean, so I > think it also works for Japanese. Mostly the problem is locale settings. > > Please check the weblucene project for xml indexing samples: > http://sourceforge.net/projects/weblucene/ > > Che Dong > - Original Message - > From: "Chandan Tamrakar" <[EMAIL PROTECTED]> > To: <[EMAIL PROTECTED]> > Sent: Tuesday, March 16, 2004 4:31 PM > Subject: CJK Analyzer indexing japanese word document > > > > I am using a CJKAnalyzer from the apache sandbox. I have set the java > > file.encoding setting to SJIS > > and i am able to index and search the japanese html page. I can see the > > index dumps as i expected. However when i index a word document containing > > japanese characters it is not indexing as expected. Do I need to change > > anything with the CJKTokenizer and CJKAnalyzer classes? > > I have been able to index a word document with StandardAnalyzer. > > > > thanks in advance > > chandan
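Scott's language-field approach, sketched in code (Lucene 1.3 API; the field names and the language code are placeholders):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Sketch (Lucene 1.3 API): tag each document with its language code at
// index time, then AND a language constraint onto every user query.
public class LanguageFilterExample {
    public static void main(String[] args) {
        // At index time: a Keyword field is stored and indexed untokenized
        Document doc = new Document();
        doc.add(Field.Keyword("lang", "ja"));
        doc.add(Field.Text("contents", "..."));

        // At search time: require the user's language alongside their query
        Query userQuery = new TermQuery(new Term("contents", "lucene"));
        BooleanQuery query = new BooleanQuery();
        query.add(userQuery, true, false);                             // required
        query.add(new TermQuery(new Term("lang", "ja")), true, false); // required
    }
}
```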
Indexing japanese PDF documents
I am using the latest PDFBox library for parsing. I can parse English documents successfully, but when I parse a document containing English and Japanese I do not get what I expected. Has anyone tried using the PDFBox library for parsing Japanese documents? Or do I need to use another parser like xPDF or JPedal? Thanks in advance, Chandan
Re: SpanXXQuery Usage
Otis, Can you give me/us a rough idea of what these are supposed to do? It's hard to extrapolate the terse unit test code into much of a general notion. I searched the archives with little success. Regards, Terry - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, March 22, 2004 2:46 AM Subject: Re: SpanXXQuery Usage > Only in unit tests, so far. > > Otis > > --- Terry Steichen <[EMAIL PROTECTED]> wrote: > > Is there any documentation (other than that in the source) on how to > > use the new SpanxxQuery features? Specifically: SpanNearQuery, > > SpanNotQuery, SpanFirstQuery and SpanOrQuery? > > > > Regards, > > > > Terry
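Pending real documentation, the unit tests boil down to usage like this. A sketch against the span API in the then-current Lucene sources; the field, terms, slop, and position bound are all placeholders:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Sketch (span API from the Lucene sources of the time): span queries
// match ranges of term positions rather than whole documents.
public class SpanQueryExample {
    public static void main(String[] args) {
        SpanQuery lucene = new SpanTermQuery(new Term("contents", "lucene"));
        SpanQuery users = new SpanTermQuery(new Term("contents", "users"));

        // match where the two terms occur within 3 positions, in order
        SpanNearQuery near = new SpanNearQuery(
                new SpanQuery[] { lucene, users }, 3, true);

        // match only when "lucene" occurs within the first 5 positions
        SpanFirstQuery first = new SpanFirstQuery(lucene, 5);
    }
}
```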