Hi Guys,
Apologies.
I am NOT using the sorting code
hits = multiSearcher.search(query, new Sort(new SortField(filename,
SortField.STRING)));
but using multiSearcher.search(query)
in the Core Files setup, and I am still getting the error.
More advice required..
Karthik
The exception "too many files open" means:
- the searcher object is not closed after query execution
- too few file handles are available
Regards
J.
Karthik N S
On Nov 10, 2004, at 2:17 AM, [EMAIL PROTECTED] wrote:
Otis or Erik, do you know if a Reader continuously opening should cause
the
Writer to fail with a "Lock obtain timed out" error?
No need to address individuals here.
With the information provided, I have no idea what the issue may be.
There
Hi Guys
Apologies.
That's why somebody on the forum asked me to switch to
: 40 merged indexes [1000 subindexes each] + MultiSearcher /
ParallelSearcher + search on the Content field only for 2
The problem of too many files open was solved, since now there were only 40
On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
Hi Guys
Apologies..
No need to apologize for asking questions.
History
1st type : 4 subindexes + MultiSearcher + search on the Content
Field
You've got 40,000 indexes aggregated under a MultiSearcher and you're
wondering why you're
Hi all,
I had a similar problem with JDK 1.4.1. Doug had sent me a patch, which I am
attaching. The following is the mail from Doug:
It sounds like the ThreadLocal in TermInfosReader is not getting
correctly garbage collected when the TermInfosReader is collected.
Researching a bit, this was a bug in
Hi!
I've left out custom stopwords from my index using the
StopAnalyzer(customstopwords).
Now, when I try to search the index the same way
(StopAnalyzer(customstopwords)), it seems to act
strange:
This query works as expected:
validword AND stopword
(throws out the stopword part and searches
Thanks Justin, it works fine
- Original Message -
From: Justin Swanhart [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 09, 2004 7:41 PM
Subject: Re: Searching in keyword field ?
You can add the category keyword multiple times to a document.
On Wednesday 10 November 2004 10:46, Sanyi wrote:
This query seems to crash:
stopword AND validword
(java.lang.ArrayIndexOutOfBoundsException: -1)
I think this has been fixed in the development version (which will become
Lucene 1.9).
Regards
Daniel
--
http://www.danielnaber.de
Sanyi writes:
This query works as expected:
validword AND stopword
(throws out the stopword part and searches for validword)
This query seems to crash:
stopword AND validword
(java.lang.ArrayIndexOutOfBoundsException: -1)
Maybe it can't handle the case if it had to remove the very
Thanks for your replies, guys.
Now, I was trying to locate the latest patch for this problem group, and the
last thread I've
read about this is:
http://issues.apache.org/bugzilla/show_bug.cgi?id=25820
It ends with an open question from Morus:
If you want me to change the patch, let me know. That
Hi Guys
Apologies..
Yes Erik,
The day I switched from Lucene 1.3.1 to Lucene 1.4.1, we have been using the
CompoundFile format via
writer.setUseCompoundFile(true);
Some more advice please.
Thx in advance
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Hi Rupinder Singh Mazara,
Apologies.
Can you paste the code into the mail instead of an attachment...
[ Because I am not able to get the attachment on the company's mail ]
Thx in advance
Karthik
-Original Message-
From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED]
Karthik,
I think the core problem in your case is the use of compound files. It would
be best to switch it off,
or alternatively issue an optimize as soon as the indexing is over.
I am copying the file contents between file tags; the patch is to be
applied to TermInfosReader.java. This
was done
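A sketch of the two alternatives Rupinder suggests, assuming the Lucene 1.4 API (the index path and analyzer choice are placeholders, not from the original mails):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Option 1: write plain multi-file segments instead of compound files.
IndexWriter writer = new IndexWriter("/path/to/index",
        new StandardAnalyzer(), true);  // true = create a new index
writer.setUseCompoundFile(false);
// ... add documents ...

// Option 2: merge everything down to a single segment once indexing
// is over, which also reduces the number of files open at search time.
writer.optimize();
writer.close();
```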
Sanyi writes:
Thanks for your replies, guys.
Now, I was trying to locate the latest patch for this problem group, and
the last thread I've
read about this is:
http://issues.apache.org/bugzilla/show_bug.cgi?id=25820
It ends with an open question from Morus:
If you want me to change the
But the fix seems to be included in 1.4.2.
see
http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene/CHANGES.txt?rev=1.96.2.4
item 5
Thank you! I'm just downloading 1.4.2.
I hope it'll work ;)
Sanyi
On Monday 08 November 2004 11:30, Joachim Arrasz wrote:
So now we are looking for search and index filters for Lucene that
are able to integrate our OpenOffice files into the search results as well.
I don't know of any existing solutions, but it's not so difficult to write
one: Extract the ZIP file
Hi Daniel,
I don't know of any existing solutions, but it's not so difficult to write
one: Extract the ZIP file using Java's built-in ZIP classes and parse
content.xml and meta.xml. I'm not sure if whitespace issues might become
tricky, e.g. two paragraphs could be in the file as
<p>one</p><p>two</p>,
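A minimal sketch of the extraction step Daniel describes, using only the JDK's built-in ZIP classes (the entry name content.xml follows the OpenOffice.org file format; parsing the extracted XML is left out):

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Read the content.xml entry out of an OpenOffice document,
// which is just a ZIP archive.
public class OdfContentReader {
    public static String readContentXml(String odfPath) throws Exception {
        ZipInputStream zip = new ZipInputStream(new FileInputStream(odfPath));
        try {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if ("content.xml".equals(e.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toString("UTF-8");
                }
            }
        } finally {
            zip.close();
        }
        return null; // no content.xml entry found
    }
}
```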
Redirecting to lucene-user, which is more appropriate.
I'm not sure what exactly the question is here, but:
Parse your XML document and for each p element you encounter create a
new Document instance, and then populate its fields with some data,
like the URI data you mentioned.
If you parse with
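One way to sketch that loop, assuming the JDK's DOM parser and the Lucene 1.4 API (the field names "uri" and "contents" and the file name are illustrative, not from the original mail):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.lucene.document.Field;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// One Lucene Document per <p> element in the XML file.
org.w3c.dom.Document xml = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse("input.xml");
NodeList paragraphs = xml.getElementsByTagName("p");
for (int i = 0; i < paragraphs.getLength(); i++) {
    Element p = (Element) paragraphs.item(i);
    // Works for simple text-only paragraphs.
    String text = p.getFirstChild() != null
            ? p.getFirstChild().getNodeValue() : "";
    org.apache.lucene.document.Document doc =
            new org.apache.lucene.document.Document();
    doc.add(Field.Keyword("uri", "input.xml#p" + i));
    doc.add(Field.Text("contents", text));
    // writer.addDocument(doc);
}
```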
On Wednesday 10 November 2004 15:18, Joachim Arrasz wrote:
Why should I parse
meta.xml? I thought content.xml should be enough.
It contains the file's title, keywords, author, etc. (those are not in
content.xml).
Regards
Daniel
--
http://www.danielnaber.de
I need to index Word, Excel and Power Point files.
Is this the place to start?
http://jakarta.apache.org/poi/
Is there something better?
Thanks,
Luke
That's one place to start. The other one would be textmining.org, at
least for Word files.
I used both POI and Textmining API in Lucene in Action, and the latter
was much simpler to use. You can also find some comments about both
libs in lucene-user archives. People tend to like Textmining API
Thanks Otis. I am looking forward to this book. Any idea when it may be
released?
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 11:54 AM
Subject: Re: Indexing MS Files
That's one place to start.
What's the simplest way to merge 2 or more indexes into one large
index.
Thanks in advance,
Ravi.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
No need to address individuals here.
Sorry about that. I just respect
the knowledge that you and Otis have about Lucene so that's why I was asking
you specifically.
With the information provided, I have no idea what
the issue may be.
Running the small sample file that is attached to the
As Manning publications said, you should be able to get it for your
grandma this Christmas.
Otis
--- Luke Shannon [EMAIL PROTECTED] wrote:
Thanks Otis. I am looking forward to this book. Any idea when it may
be
released?
- Original Message -
From: Otis Gospodnetic [EMAIL
Use IndexWriter's addIndexes(Directory[]) call.
Otis
--- Ravi [EMAIL PROTECTED] wrote:
Whats's the simplest way to merge 2 or more indexes into one large
index.
Thanks in advance,
Ravi.
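Otis's suggestion, sketched against the Lucene 1.4 API (the index paths are placeholders):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Merge two existing indexes into a new one with addIndexes(Directory[]).
Directory[] sources = new Directory[] {
    FSDirectory.getDirectory("/path/index1", false),
    FSDirectory.getDirectory("/path/index2", false)
};
IndexWriter writer = new IndexWriter("/path/merged",
        new StandardAnalyzer(), true);  // true = create the target index
writer.addIndexes(sources);
writer.close();
```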
I used the OpenOffice API to convert all Word and Excel versions.
For me it's the solution for complex Word and Excel documents.
http://api.openoffice.org/
Good luck !
// UNO API
import com.sun.star.bridge.XUnoUrlResolver;
import com.sun.star.uno.XComponentContext;
import com.sun.star.uno.UnoRuntime;
Thanks. Grandmas around the world will certainly be surprised this
Christmas.
- Original Message -
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 12:18 PM
Subject: Re: Indexing MS Files
As Manning publications said,
I am working on debugging an existing Lucene implementation.
Before I started, I built a demo to understand Lucene. In my demo I indexed
the entire content hierarchy all at once, and then optimized this index and
used it for queries. It was time consuming but very simple.
The code I am currently
This looks great. Thank you Thierry!
- Original Message -
From: Thierry Ferrero [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, November 10, 2004 12:23 PM
Subject: Re: Indexing MS Files
I used OpenOffice API to convert all Word and Excel version.
For me
Uh, I hate to market it, but it's in the book. But you don't have
to wait for it, as there already is a Lucene demo that does what you
described. I am not sure if the demo always recreates the index or
whether it deletes and re-adds only the new and modified files, but if
it's the former,
Don't worry, regardless of what I learn in this forum I am telling my
company to get me a copy of that bad boy when it comes out (which, as far as
I am concerned, can't be soon enough). I will pay for grandma's myself.
I think I have reviewed the code you are referring to and have something
similar
Hi,
With the information provided, I have no
idea what the issue
may be.
Is there some information that I should post that will help determine
why Lucene is giving me this error?
Thanks.
--- Lucene Users List [EMAIL PROTECTED]
wrote:
On Nov 10, 2004, at 2:17 AM, [EMAIL
On Nov 10, 2004, at 5:48 PM, [EMAIL PROTECTED] wrote:
Hi,
With the information provided, I have no
idea what the issue
may be.
Is there some information that I should post that will help determine
why Lucene is giving me this error?
You mentioned posting code - though I don't recall getting an
I have an application that I run monthly that indexes 40 million documents into
6 indexes, then uses a multisearcher. The advantage for me is that I can have
multiple writers indexing 1/6 of that total data reducing the time it takes to
index by about 5X.
-Original Message-
From: Luke
Whoops! Looks like my attachment didn't make it through. I'm
re-attaching my simple test app.
Thanks.
--- Erik Hatcher [EMAIL PROTECTED] wrote:
On Nov 10, 2004, at 5:48 PM, [EMAIL PROTECTED]
wrote:
Hi,
With the information provided, I have no
idea what the issue
may be.
Is
I added it to Bugzilla like you suggested:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32171
Let me know if you see any way to get around this issue.
--- Lucene
Users List [EMAIL PROTECTED] wrote:
Whoops! Looks like my
attachment didn't make it through. I'm
re-attaching my simple
I just ran the code you provided. On my puny PowerBook (Mac OS X
10.3.5) it dies in much less than 5 minutes.
I do not know what the issue is, but certainly the actions the program
is taking are atypical. Opening and closing an IndexWriter repeatedly
is certainly expensive on large indexes.
I just added a Thread.sleep(1000) in the writer thread and it has run
for quite some time, and is still running as I send this.
Erik
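Erik's point about repeated open/close can be sketched like this (Lucene 1.4 API assumed; the contrast is illustrative, not the poster's actual test app):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

// Expensive pattern: a fresh IndexWriter per document or tiny batch.
// Cheaper pattern: one long-lived writer for the whole run.
void indexBatch(Document[] docs) throws Exception {
    IndexWriter writer = new IndexWriter("/path/to/index",
            new StandardAnalyzer(), false);  // false = append to existing index
    for (int i = 0; i < docs.length; i++) {
        writer.addDocument(docs[i]);
    }
    writer.optimize();  // merge segments once, at the end
    writer.close();
}
```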
On Nov 10, 2004, at 8:02 PM, [EMAIL PROTECTED] wrote:
I added it to Bugzilla like you suggested:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32171
Hello,
Our program accepts input in the form of Lucene query syntax from the user,
but we wish to perform additional tasks such as thesaurus expansion. So I
want to manipulate the Query object that results from parsing.
My question is, is the result of the Query#rewrite
Hi Otis,
Please let me know what the HEAD version of Lucene is?
Actually, I'm considering the advantages of storing documents using Lucene stored
fields - for my search engine.
I've tested with thousands of documents and see that retrieving a document (in this
case an XML file) with Lucene is a little bit
We have one large index for a document repository of 800,000 documents.
The size of the index is 800MB. When we do searches against the index,
it takes 300-500ms for a single search. We wanted to test the
scalability and tried 100 parallel searches against the index with the
same query and the
Hello,
100 parallel searches going against a single index on a single disk
means a lot of disk seeks all happening at once. One simple way of
working around this is to load your FSDirectory into RAMDirectory.
This should be faster (could you report your
observations/comparisons?). You can also
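Otis's RAMDirectory suggestion, sketched (Lucene 1.4 API assumed; the index path is a placeholder, and note this copies the entire index into heap memory):

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

// Copy the on-disk index into memory once; subsequent searches
// avoid disk seeks entirely.
RAMDirectory ramDir = new RAMDirectory(
        FSDirectory.getDirectory("/path/to/index", false));
IndexSearcher searcher = new IndexSearcher(ramDir);
```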
Hello,
HEAD version means that you should check out Lucene straight out of
CVS. How to work with CVS is another story, probably described
somewhere on jakarta.apache.org site.
Otis
--- Nhan Nguyen Dang [EMAIL PROTECTED] wrote:
Hi Otis,
Please let me know what HEAD version of Lucene is?
Yes, I tried that too and it worked. The issue is that our
Operations folks plan to install this on a pretty busy box and I
was hoping that Lucene wouldn't cause issues if it only had a
small slice of the CPU.
Guess I'll tell them to buy a bigger box! Unless you have any
other ideas. I'm
Does it take 800MB of RAM to load that index into a
RAMDirectory? Or are only some of the files loaded into RAM?
--- Otis Gospodnetic [EMAIL PROTECTED] wrote:
Hello,
100 parallel searches going against a single index on a single
disk
means a lot of disk seeks all happening at once. One
Hi!
First of all, I've read about BooleanQuery$TooManyClauses, so I know that it
has a 1024-clause
limit by default, which is good enough for me, but I still think it behaves
strangely.
Example:
I have an index with about 20Million documents.
Let's say that there are about 3000 variants in the