Lock timeout should show the index it failed on...
Just an RFE... if a lock times out, the exception should probably include the name of the FSDirectory (or note that it's a RAMDirectory). I'm lazy, so this is a reminder either for myself to do this or to wait until one of you guys takes care of it :) Kevin -- Please reply using PGP. http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/ Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965 AIM/YIM - sfburtonator, Web - http://peerfear.org/ GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
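A minimal sketch of what such a message could look like (the class and method names here are hypothetical illustrations, not Lucene's actual lock code):

```java
// Hypothetical sketch: composing a lock-timeout message that names the
// directory, as the RFE suggests. This is NOT Lucene's real Lock class.
public class LockTimeoutMessage {

    // Build an error message that includes a description of the directory
    // the lock lives in (an FSDirectory path, or the string "RAMDirectory")
    static String format(String directoryDescription, long timeoutMillis) {
        return "Lock obtain timed out after " + timeoutMillis
                + " ms in " + directoryDescription;
    }

    public static void main(String[] args) {
        System.out.println(format("FSDirectory@/var/index", 1000));
        System.out.println(format("RAMDirectory", 1000));
    }
}
```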
Re: code works with 1.3-rc1 but not with 1.3-final??
Or use IndexWriter.setUseCompoundFile(true) to reduce the number of files created by Lucene. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#setUseCompoundFile(boolean) =Matt Kevin A. Burton wrote: > Dan wrote: > > I get the following error when running the index with 1.3-final: IOException: /home/danl001/index-Mar-22-14_31_30/_ni.f43 (Too many open files) > Read the FAQ... You need to increase your file handles. [snip]
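For reference, enabling the compound file format is a one-liner at index time. A sketch against the Lucene 1.3 API (the index path and analyzer choice are placeholders):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Sketch (Lucene 1.3 API): merge the many per-segment files into compound
// (.cfs) files, greatly reducing open file handles. Path is a placeholder.
public class CompoundFileExample {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/myindex",
                new StandardAnalyzer(), true);
        writer.setUseCompoundFile(true);  // set before adding documents
        // ... addDocument() calls ...
        writer.optimize();
        writer.close();
    }
}
```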
Re: code works with 1.3-rc1 but not with 1.3-final??
Dan wrote: > I get the following error when running the index with 1.3-final: IOException: /home/danl001/index-Mar-22-14_31_30/_ni.f43 (Too many open files) No... it's you... ;) Read the FAQ and then raise your open-file limit with ulimit -n (try something like 1024). You need to increase your file handles. Chances are you never noticed this before, but the problem was still present. If you're on a Linux box you would be amazed to find out that you're only about 200 file handles away from running out of your per-user file handle quota. You might have to su to root to change this. RedHat is stricter because it enforces per-user resource limits via PAM's pam_limits module (/etc/security/limits.conf). Debian's defaults are configured better here. Also, a Google query would have solved this for you very quickly ;) Kevin
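The advice above, sketched as shell commands (the limit value and file paths vary by distribution; 1024 is just an illustrative number):

```shell
# Show the current per-process open-file limit
ulimit -n

# Raise the soft limit for the current shell; this can only go up to the
# hard limit shown by `ulimit -Hn`. Raising the hard limit needs root.
ulimit -n 1024 2>/dev/null || echo "need root (or a higher hard limit) to raise it"

# On many Linux distributions the persistent per-user limits live in
# /etc/security/limits.conf (the PAM resource-limits module alluded to
# in the message above); editing that file requires root.
```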
termPosition does not iterate properly in Lucene 1.3 rc1
Lucene does not iterate through the termPositions on one of my indexed data sources. It used to iterate properly through this data source, but not anymore. I tried a different indexed data source and it iterates properly. The Lucene index directory does not have any lock files either. My code is as follows: TermPositions termPos = reader.termPositions(aTerm); while (termPos.next()) { // get doc String docID = reader.document(termPos.doc()).get(keyName); ... } Is there anything wrong with that? Thanks for your help, Allen
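For comparison, a fuller TermPositions loop usually looks like this. A sketch against the Lucene 1.3 API; the index path, field names, and term value are placeholders standing in for Allen's `reader`, `aTerm`, and `keyName`:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

// Sketch (Lucene 1.3 API): iterate the documents containing a term, then
// the positions within each document (freq() positions per document).
public class TermPositionsExample {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/tmp/myindex");
        Term aTerm = new Term("contents", "lucene");
        TermPositions termPos = reader.termPositions(aTerm);
        while (termPos.next()) {
            String docID = reader.document(termPos.doc()).get("id");
            // positions must be consumed per document, freq() times
            for (int i = 0; i < termPos.freq(); i++) {
                int position = termPos.nextPosition();
                System.out.println(docID + " @ " + position);
            }
        }
        termPos.close();
        reader.close();
    }
}
```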
code works with 1.3-rc1 but not with 1.3-final??
I have some code that creates a Lucene index. It has been working fine with lucene-1.3-rc1.jar, but I wanted to upgrade to lucene-1.3-final.jar. I did this and the indexer breaks. I get the following error when running the index with 1.3-final: Optimizing the index IOException: /home/danl001/index-Mar-22-14_31_30/_ni.f43 (Too many open files) Indexed 884 files in 8 directories Index creation took 242 seconds % So it appears that the code that uses 1.3-final breaks on the call to optimize(). Does anyone know what is wrong? Again, the ONLY change between the working version and the version that breaks on optimize is the jar file I use. lucene-1.3-rc1.jar works; lucene-1.3-final.jar doesn't. Weird, huh? I've tested this on both Unix (Solaris) and on Windows. In both cases, I'm using JDK 1.4.2_03.
Re: Final Hits
Terry, I'm still quite curious how you plan to take advantage of a subclassable Hits. Are you going to create your own IndexSearcher which returns your subclass somehow? You could use a HitCollector (which is what is used under the covers of the Hits-returning methods anyway) to emulate whatever it is you're trying to do, I suspect. As for 'final': Doug did a great thing by designing Lucene tight and controlled, with private/package-scoped access and final modifiers in lots of places. There is no technical issue with removing the final, but we would need to see a pretty compelling, detailed reason to do so. Erik On Mar 22, 2004, at 7:56 AM, Terry Steichen wrote: > ... [snip]
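For what it's worth, the HitCollector route Erik mentions looks roughly like this. A sketch against the Lucene 1.3 API; the index path, field, and per-hit logic are placeholders:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Sketch (Lucene 1.3 API): receive every matching doc id and raw score
// yourself, instead of going through the final Hits class.
public class HitCollectorExample {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/tmp/myindex");
        TermQuery query = new TermQuery(new Term("contents", "lucene"));
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                // do here whatever a Hits subclass would have done
                System.out.println("doc " + doc + " scored " + score);
            }
        });
        searcher.close();
    }
}
```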
Re: Final Hits
Erik, There are a number of different possibilities which I'm still evaluating. But if there is some significant reason for *not* subclassing Hits (performance?), that will have a major bearing on whether the approach I'm evaluating makes sense. So, let me rephrase my question: Is the "final" nature of Hits due to some performance reason, or simply because no one has previously expressed any interest in subclassing it? Or, putting it in reverse, is there any technical problem likely to arise from removing the "final" attribute(s)? Regards, Terry - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, March 22, 2004 7:06 AM Subject: Re: Final Hits > How exactly would you take advantage of a subclassable Hits class? > ... [snip]
Re: Specification of the Key words to be searched
Re-directing to lucene-user list. One way of doing this is by writing a custom Analyzer that throws away words you don't want to index (see an example of a custom Analyzer in the jGuru FAQ). Another way would be to just re-use the existing Analyzers and add words you don't want indexed to the Analyzer's stop list. Otis --- jitender ahuja <[EMAIL PROTECTED]> wrote: > Sir, >I am implementing lucene for a database as part of my masters' > project. I desire to reduce the index directory size by specifying > the key words to be indexed for the "Text" field specified as Reader > type. This Key words' specification, if possible, will further reduce > the Index directory size, but am unable to figure out how to do the > same. > Kindly specify the means to achieve the same. > > Regards, > Jitender
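Otis's second suggestion (reusing an existing analyzer with your own stop list) can be as simple as this. A sketch against the Lucene 1.3 API; the word list and index path are placeholders:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Sketch (Lucene 1.3 API): any word in the stop list is dropped at
// analysis time, which shrinks the index. The list here is illustrative.
public class StopListExample {
    public static void main(String[] args) throws Exception {
        String[] stopWords = { "the", "a", "an", "of", "and", "to" };
        IndexWriter writer = new IndexWriter("/tmp/myindex",
                new StandardAnalyzer(stopWords), true);
        // ... addDocument() calls; stop words never reach the index ...
        writer.close();
    }
}
```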
Re: Demoting results
On Fri, 2004-03-19 at 11:58, Doug Cutting wrote: > Doug Cutting wrote: > >> On Thu, 2004-03-18 at 13:32, Doug Cutting wrote: > >> > >>> Have you tried assigning these very small boosts (0 < boost < 1) and > >>> assigning other query clauses relatively large boosts (boost > 1)? > > > > I don't think you understood my proposal. You should try boosting the > > documents when you add them. Instead of adding a "doctype" field with > > "good" and "bad" values, use Document.setBoost(0.01) at index time. > > Sorry. My mistake. You did understand my proposal, it was just a bad > proposal. Boosting documents is a better approach, but is less > flexible. I think the final proposal in my previous message might be > the best approach (defining a custom coordination function for these > query clauses). Thanks for the ideas - I love the flexibility of Lucene that there are so many ways to accomplish what at first seemed so difficult. Boris
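Doug's index-time suggestion, sketched in code (Lucene 1.3-era API; the boost value and field contents are placeholders):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch (Lucene 1.3-era API): demote a document at index time instead
// of tagging it with a "doctype" field and boosting query clauses.
public class DocumentBoostExample {
    public static void main(String[] args) {
        Document doc = new Document();
        doc.add(Field.Text("contents", "some less important content"));
        doc.setBoost(0.01f);  // scores for this doc are scaled way down
        // writer.addDocument(doc);  // then add it as usual
    }
}
```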
Re: Indexing japanese PDF documents
Yes he did, but I was away the past couple of days. As this is more of a PDFBox issue I responded in the PDFBox forums; please follow the thread there if you are interested. Ben On Mon, 22 Mar 2004, Otis Gospodnetic wrote: > I have not tried these other tools yet. > Have you asked Ben Litchfield, the PDFBox author, about handling of > Japanese text? > ... [snip]
Re: Indexing japanese PDF documents
I have not tried these other tools yet. Have you asked Ben Litchfield, the PDFBox author, about handling of Japanese text? Otis --- Chandan Tamrakar <[EMAIL PROTECTED]> wrote: > Have anyone tried using PDFBox library for parsing a japanese > documents ? Or do i need to use other parser like xPDF, Jpedal ? > ... [snip]
Re: Final Hits
How exactly would you take advantage of a subclassable Hits class? On Mar 21, 2004, at 6:01 AM, Terry Steichen wrote: Does anyone know why the Hits class is final (thus preventing it from being subclassed)? Regards, Terry
Re: CJK Analyzer indexing japanese word document
Hi Scott, Thanks for your advice. I am now using POI to convert Word documents, and I made sure to convert into Unicode before handing text to Lucene for indexing; it is working perfectly fine. Which parser is best for parsing PDF documents? I tried PDFBox but it seems it doesn't work well with Japanese characters. Any suggestion? Thanks. - Original Message - From: "Scott Smith" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, March 17, 2004 4:27 AM Subject: RE: CJK Analyzer indexing japanese word document > I have used this analyzer with Japanese and it works fine. In fact, I'm > currently doing English, several western European languages, traditional > and simplified Chinese and Japanese. I throw them all in the same index > and have had no problem other than my users wanted the search limited by > language. I solved that problem by simply adding a keyword field to the > Document which has the 2-letter language code. I then automatically add > the term indicating the language as an additional constraint when the > user specifies the search. > > You do need to be sure that the Shift-JIS gets converted to unicode > before you put it in the Document (and pass it to the analyzer). > Internally, I believe lucene wants everything in unicode (as any good > java program would). Originally, I had problems with Asian languages and > eventually determined my xml parser wasn't translating my Shift-JIS, > Big5, etc. to unicode. Once I fixed that, life was good. > > -Original Message- > From: Che Dong [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 16, 2004 8:31 AM > To: Lucene Users List > Subject: Re: CJK Analyzer indexing japanese word document > > Some Korean friends tell me they use it successfully for Korean, so I > think it also works for Japanese. Mostly the problem is locale settings. > > Please check the weblucene project for xml indexing samples: > http://sourceforge.net/projects/weblucene/ > > Che Dong > - Original Message - > From: "Chandan Tamrakar" <[EMAIL PROTECTED]> > To: <[EMAIL PROTECTED]> > Sent: Tuesday, March 16, 2004 4:31 PM > Subject: CJK Analyzer indexing japanese word document > > > > I am using a CJKAnalyzer from the apache sandbox. I have set the java > > file.encoding setting to SJIS > > and i am able to index and search the japanese html page. I can see the > > index dumps as i expected. However when i index a word document containing > > japanese characters it is not indexing as expected. Do I need to change > > anything with the CJKTokenizer and CJKAnalyzer classes? > > I have been able to index a word document with StandardAnalyzer. > > > > thanks in advance > > chandan
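Scott's language-field approach, sketched in code (Lucene 1.3 API; the field names and the language code are placeholders):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Sketch (Lucene 1.3 API): tag each document with its language code at
// index time, then AND a language constraint onto every user query.
public class LanguageFilterExample {
    public static void main(String[] args) {
        // At index time: a Keyword field is stored and indexed untokenized
        Document doc = new Document();
        doc.add(Field.Keyword("lang", "ja"));
        doc.add(Field.Text("contents", "..."));

        // At search time: require the user's language alongside their query
        Query userQuery = new TermQuery(new Term("contents", "lucene"));
        BooleanQuery query = new BooleanQuery();
        query.add(userQuery, true, false);                             // required
        query.add(new TermQuery(new Term("lang", "ja")), true, false); // required
    }
}
```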
Indexing japanese PDF documents
I am using the latest PDFBox library for parsing. I can parse English documents successfully, but when I parse a document containing English and Japanese I do not get what I expected. Has anyone tried using the PDFBox library for parsing Japanese documents? Or do I need to use another parser like xPDF or JPedal? Thanks in advance, Chandan
Re: SpanXXQuery Usage
Otis, Can you give me/us a rough idea of what these are supposed to do? It's hard to extrapolate the terse unit test code into much of a general notion. I searched the archives with little success. Regards, Terry - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, March 22, 2004 2:46 AM Subject: Re: SpanXXQuery Usage > Only in unit tests, so far. > > Otis > > --- Terry Steichen <[EMAIL PROTECTED]> wrote: > > Is there any documentation (other than that in the source) on how to > > use the new SpanxxQuery features? Specifically: SpanNearQuery, > > SpanNotQuery, SpanFirstQuery and SpanOrQuery? > > > > Regards, > > > > Terry
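Pending real documentation, the unit tests boil down to usage like this. A sketch against the span API in the then-current Lucene sources; the field, terms, slop, and position bound are all placeholders:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Sketch (span API from the Lucene sources of the time): span queries
// match ranges of term positions rather than whole documents.
public class SpanQueryExample {
    public static void main(String[] args) {
        SpanQuery lucene = new SpanTermQuery(new Term("contents", "lucene"));
        SpanQuery users = new SpanTermQuery(new Term("contents", "users"));

        // match where the two terms occur within 3 positions, in order
        SpanNearQuery near = new SpanNearQuery(
                new SpanQuery[] { lucene, users }, 3, true);

        // match only when "lucene" occurs within the first 5 positions
        SpanFirstQuery first = new SpanFirstQuery(lucene, 5);
    }
}
```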