On Apr 26, 2004, at 5:16 PM, Norton, James wrote:
Thanks for the reply. I had reached the same conclusion as you regarding the analyzer for
queries (no multiple tokens per position), but I would still regard the behaviour of QueryParser as incorrect.
I agree that it is odd, but given that
There is a Berkeley DB implementation of Lucene's Directory in the
jakarta-lucene-sandbox repository.
Erik
On Apr 26, 2004, at 8:35 PM, Yukun Song wrote:
As is known, Lucene currently uses flat files to store its index information.
Does anyone have ideas or resources for combining a database (like
As Lucene implements its own concept of a document, it is not tied to indexing a
particular type of data source.
It's up to you to write a tool that can browse your database and then submit
the data as Lucene documents to the Lucene indexer.
For example, if your database contains a
Tuan Jean Tee wrote:
Has anyone implemented an open source web crawler with Lucene? I have
a dynamic website and am looking at adding a search tool. Your
advice is very much appreciated.
There is a crawler included with Apache Lenya:
http://cocoon.apache.org/lenya/
my XML files contain something like
<date>
<year>2004</year><month>04</month><day>27</day>...
</date>
and I would like to sort by this date.
So I guess I need to modify the Documentparser and generate something like
a millisecond field and then sort by this, correct?
Has anyone done something like this yet?
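One possible approach, sketched under assumptions: instead of a millisecond field, combine the year, month and day elements into a single zero-padded yyyyMMdd key. Because every key has the same width, string order matches chronological order, and the integer form (e.g. 20040427) should also fit Lucene's INT sort type. The class and method names are hypothetical, and the XML parsing is left out; the parts are assumed to have already been read from the <year>, <month> and <day> elements.

```java
import java.util.Locale;

/** Hypothetical helper: a fixed-width yyyyMMdd key built from separate date parts. */
final class DayKey {
    private DayKey() {}

    /** Returns e.g. "20040427" for year 2004, month 4, day 27. */
    static String of(int year, int month, int day) {
        if (year < 0 || year > 9999 || month < 1 || month > 12 || day < 1 || day > 31) {
            throw new IllegalArgumentException("bad date parts: " + year + "/" + month + "/" + day);
        }
        // Zero-padding keeps every key 8 characters long, so string order equals date order.
        return String.format(Locale.ROOT, "%04d%02d%02d", year, month, day);
    }

    /** The same key as an int (e.g. 20040427), small enough for an integer sort field. */
    static int asInt(int year, int month, int day) {
        return Integer.parseInt(of(year, month, day));
    }
}
```

Text such as "04" from the XML can be passed through Integer.parseInt first, which yields 4.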
Hi
I wondered if anyone knows whether it is possible to search ONLY the 100 (or
whatever) most recently added documents to a lucene index? I know that once
I have all my results ordered by ID number in Hits I could then just display
the required amount, but I wondered if there is a way to
Here's my two cents on this:
Either way you will need to combine the date into one field, but if you use a
millisecond representation you will not be able to use the FLOAT sort type
and you'll have to use STRING sort (slower) because the millisecond
representation is longer than FLOAT allows, so you
You may be able to jimmy the bit filter to produce the most recent 100, but
really, keeping your fetch count at 100 and ordering by DOC should be
sufficient.
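The FLOAT limitation Nader mentions can be checked in plain Java, without Lucene. A float has a 24-bit significand, so near an April 2004 timestamp (about 1.08 x 10^12 ms since the epoch) adjacent float values are 65,536 ms apart, and timestamps that close together collapse to the same float. The timestamp used in the checks is an illustrative assumption (2004-04-27T00:00:00Z), and the class name is hypothetical.

```java
/** Hypothetical demo: why millisecond timestamps cannot be sorted reliably as floats. */
final class FloatTimestamp {
    private FloatTimestamp() {}

    /** True if two millisecond timestamps become indistinguishable once cast to float. */
    static boolean collide(long millisA, long millisB) {
        return (float) millisA == (float) millisB;
    }

    /** Gap between adjacent float values near the given timestamp, in milliseconds. */
    static float spacingMillis(long millis) {
        return Math.ulp((float) millis);
    }
}
```

This is why the options discussed in this thread are a STRING sort on a fixed-width value or a coarser unit, such as a day, that fits exactly.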
-----Original Message-----
From: Alan Smith [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 4:03 PM
To: [EMAIL PROTECTED]
Subject:
If you know the id of the last document in the index
(I don't know the best way to get it),
you could probably use a range query,
something like finding all docs with the id in [lastId-100 TO lastId].
Maybe you should make sure that the lower limit is non-negative, though.
Just a thought.
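Ioan's range-query idea could look roughly like this, under assumptions: each document is presumed to carry its own untokenized id field (the field name "id" and the 10-digit padding width are hypothetical), which is not the same thing as Lucene's internal document number. Lucene compares terms as strings, so ids would need to be zero-padded to a fixed width at indexing time for the range to select numerically adjacent ids. The sketch uses lastId - (n - 1) as the lower bound so the inclusive range spans exactly n ids, and clamps it at zero.

```java
import java.util.Locale;

/** Hypothetical helper for a "last n ids" range query over a zero-padded id field. */
final class RecentIdRange {
    private static final int WIDTH = 10; // assumed padding width; must match what was indexed

    private RecentIdRange() {}

    /** Zero-pads an id so that string order matches numeric order. */
    static String pad(long id) {
        if (id < 0) throw new IllegalArgumentException("negative id: " + id);
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", id);
    }

    /** Builds an inclusive QueryParser-style range covering at most n ids ending at lastId. */
    static String query(String field, long lastId, int n) {
        long lower = Math.max(0L, lastId - (n - 1)); // clamp so the lower bound is never negative
        return field + ":[" + pad(lower) + " TO " + pad(lastId) + "]";
    }
}
```

If documents have been deleted, such a range can match fewer than n documents, since deleted ids leave gaps.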
Nader S. Henein wrote:
Here's my two cents on this:
Either way you will need to combine the date into one field, but if you use a
millisecond representation you will not be able to use the FLOAT sort type
and you'll have to use STRING sort (slower) because the millisecond
representation is longer than
Are the DOC ids sequential, or just unique and ascending? I'm thinking like
a good little Oracle boy, so does anyone know?
-----Original Message-----
From: Ioan Miftode [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 4:55 PM
To: Lucene Users List
Subject: Re: searching only part of an
I think that if you include the indexing timestamp in the Document you
create when indexing, you could sort on this and only pick the first 100.
Regards,
Terry
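Terry's suggestion could be sketched as follows, assuming the indexing time is stored as a fixed-width UTC string (the "yyyyMMddHHmmss" pattern is an assumption) so that a STRING sort orders it chronologically. The second method only mimics, in plain Java, what a reverse sort followed by taking the first n hits would return; it is not a Lucene call, and the class name is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper for a sortable indexing-time field. */
final class IndexTimestamp {
    private IndexTimestamp() {}

    /** Formats an indexing time as a fixed-width UTC string whose string order is chronological. */
    static String format(Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // avoid local-time and DST surprises
        return fmt.format(when);
    }

    /** Mimics a reverse (newest-first) sort followed by taking the first n results. */
    static List<String> newest(List<String> stamps, int n) {
        List<String> sorted = new ArrayList<>(stamps);
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```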
----- Original Message -----
From: Alan Smith [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 8:02 AM
Subject:
On Apr 27, 2004, at 9:00 AM, Nader S. Henein wrote:
Are the DOC ids sequential, or just unique and ascending? I'm thinking
like a good little Oracle boy, so does anyone know?
They are unique and ascending.
Gaps in ids exist when documents are removed, and then the ids are
squeezed back to
So if Alan wants to limit it to the first 100, he can't really use a range
search unless he can guarantee that the index is optimized after deletes,
but if his deletion rounds are anything like mine (every 2 mins), then
optimizing it at each delete will make searching the index really slow.
On Apr 27, 2004, at 9:49 AM, Nader S. Henein wrote:
So if Alan wants to limit it to the first 100, he can't really use a
range search unless he can guarantee that the index is optimized after
deletes, but if his deletion rounds are anything like mine (every 2 mins),
then optimizing it at
Hello,
I am using Lucene 1.3 and I ran into the following exception:
java.lang.IndexOutOfBoundsException: More than 32 required/prohibited clauses in query.
at org.apache.lucene.search.BooleanScorer.add(BooleanScorer.java:98)
Is there any easy way to fix/adjust this (like the
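For context: in Lucene 1.3, BooleanScorer assigns each required or prohibited clause one bit of a 32-bit int mask, which is where the limit of 32 comes from. One workaround, offered here as an assumption rather than an official fix, is to nest clauses: split the required terms into groups of at most 32, build one sub-BooleanQuery per group, and add each sub-query to the parent as a single required clause (an AND of ANDs matches the same documents, though scores may differ). Prohibited terms could be grouped the same way by making them optional clauses of a sub-query and prohibiting that sub-query, since NOT (a OR b) equals (NOT a) AND (NOT b). Only the grouping step is shown; the Lucene calls are omitted and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits clauses into groups no larger than BooleanScorer's 32-bit mask. */
final class ClauseGroups {
    static final int MAX_REQUIRED_OR_PROHIBITED = 32;

    private ClauseGroups() {}

    /** Splits clauses into consecutive groups of at most maxPerGroup elements. */
    static <T> List<List<T>> split(List<T> clauses, int maxPerGroup) {
        if (maxPerGroup < 1) throw new IllegalArgumentException("maxPerGroup must be >= 1");
        List<List<T>> groups = new ArrayList<>();
        for (int start = 0; start < clauses.size(); start += maxPerGroup) {
            int end = Math.min(start + maxPerGroup, clauses.size());
            // Copy so each group is independent of the source list.
            groups.add(new ArrayList<>(clauses.subList(start, end)));
        }
        return groups;
    }
}
```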
On Apr 27, 2004, at 10:24 AM, Erik Hatcher wrote:
On Apr 27, 2004, at 9:49 AM, Nader S. Henein wrote:
So if Alan wants to limit it to the first 100, he can't really use a
range search unless he can guarantee that the index is optimized after
deletes, but if his deletion rounds are anything
Dear all,
Has anyone had experience using Lucene with data stored
in MS SQL Server 2000?
How do indexing and searching work in that case?
Thanks,
Holger
Can you provide a simple test case that shows this problem?
Did you reindex when upgrading?
On Apr 27, 2004, at 11:31 AM, Ioan Miftode wrote:
I recently upgraded to lucene 1.4 RC2 because I needed some
sorting capabilities. However some phrase searches don't
work anymore (the hits don't even
Yukun Song wrote:
As is known, Lucene currently uses flat files to store its index information.
Does anyone have ideas or resources for combining a database (like MySQL or
PostgreSQL) and Lucene instead of the current flat index file formats?
A few folks have implemented an SQL-based Lucene Directory, but
Beware of storing timestamps (DateFields, I guess) in Lucene, if you
intend to use range queries (xxx TO yyy).
Otis
--- Michael Wechner [EMAIL PROTECTED] wrote:
my XML files contain something like
<date>
<year>2004</year><month>04</month><day>27</day>...
</date>
and I would like to sort by this date.
Hello
I have documents in XML in which, for each word, I have 4 positions (top,
bottom, left and right) that would let me highlight this word in a JPG
image. I want to index these XML documents and highlight the results of
the queries in the image, so I need to store these positions for each
Beware of storing timestamps (DateFields, I guess) in Lucene, if you
intend to use range queries (xxx TO yyy).
Why?
We have attributes that contain iso8601 date strings and when indexing:
Date date = isoConv.parse(value, new ParsePosition(0));
String dateString =
Because having small time units like milliseconds will result in Range
query expanding to a large number of BooleanQueries, if you have a lot
of documents with unique time stamps. Rounding the timestamp to
minutes, hours, or days, can drastically reduce the number of unique
time stamps, hence
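Otis's point can be illustrated without Lucene: a RangeQuery is rewritten into a BooleanQuery with one clause per distinct term that falls in the range, so what matters is how many distinct values were indexed. The sketch below counts distinct values at millisecond and at day resolution; the sample timestamps (one per minute from 2004-04-27T00:00:00Z) are made up for illustration, and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo: how rounding timestamps shrinks the number of distinct terms. */
final class TimestampResolution {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    private TimestampResolution() {}

    /** Rounds a millisecond timestamp down to the start of its UTC day. */
    static long roundToDay(long millis) {
        return Math.floorDiv(millis, MILLIS_PER_DAY) * MILLIS_PER_DAY;
    }

    /** Number of distinct terms the given timestamps would produce, with or without rounding. */
    static int distinctTerms(long[] millis, boolean roundToDay) {
        Set<Long> terms = new HashSet<>();
        for (long m : millis) {
            terms.add(roundToDay ? roundToDay(m) : m);
        }
        return terms.size();
    }
}
```

With 1,000 per-minute timestamps from a single day, millisecond resolution yields 1,000 distinct terms (so a range over them expands to 1,000 clauses), while day resolution yields one.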
Otis Gospodnetic wrote:
Because having small time units like milliseconds will result in Range
query expanding to a large number of BooleanQueries, if you have a lot
of documents with unique time stamps. Rounding the timestamp to
minutes, hours, or days, can drastically reduce the number of
Using Lucene 1.4 rc2 I've run into a fatal problem: certain
PhraseQueries cause a Read Past EOF exception (see below), while other
PhraseQueries enter an infinite loop due to a negative bufferLength
field in CSInputStream. Environment is WinXP, JDK 1.4.2. The index is
large, incorporating
On Apr 27, 2004, at 2:09 PM, Robert Koberg wrote:
Otis Gospodnetic wrote:
Because having small time units like milliseconds will result in Range
query expanding to a large number of BooleanQueries, if you have a lot
of documents with unique time stamps. Rounding the timestamp to
minutes, hours,
Erik Hatcher wrote:
On Apr 27, 2004, at 2:09 PM, Robert Koberg wrote:
Otis Gospodnetic wrote:
Because having small time units like milliseconds will result in Range
query expanding to a large number of BooleanQueries, if you have a lot
of documents with unique time stamps. Rounding the
Or if I overlooked some previous post or thread that covers this please help
me track it down.
Thank you,
Tate
-Original Message-
From: Tate Avery [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 27, 2004 10:20 AM
To: [EMAIL PROTECTED]
Subject: BooleanScorer - 32 required/prohibited
Thank you Doug, the latest CVS works fine.
ioan
At 12:23 PM 4/27/2004, you wrote:
Ioan Miftode wrote:
I recently upgraded to lucene 1.4 RC2 because I needed some
sorting capabilities. However some phrase searches don't
work anymore (the hits don't even have the terms I'm searching on).
Try
On Apr 27, 2004, at 3:41 PM, Robert Koberg wrote:
Oops, I meant to write DateField.timeToString which I use when
querying. If I use DateField.dateToString when indexing but
timeToString when searching is that a bad practice? I do only need
month, day and year. So should I be indexing with
Erik Hatcher wrote:
On Apr 27, 2004, at 3:41 PM, Robert Koberg wrote:
Oops, I meant to write DateField.timeToString which I use when
querying. If I use DateField.dateToString when indexing but
timeToString when searching is that a bad practice? I do only need
month, day and year. So should I
Robert Koberg wrote:
Ah. Great - thanks! I see you added it to the wiki. Thanks again :)
I guess you mean
http://wiki.apache.org/jakarta-lucene/IndexingDateFields
Thanks as well
Michi
This is perfect in my case since iso8601 is in the format:
2004-04-27T01:23:33
Luckily so far, from my
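Because the ISO 8601 form Robert quotes is fixed-width and runs from the largest unit to the smallest, its string order is already chronological, so a day-resolution term can be cut straight from the string. This sketch assumes every value uses exactly the "2004-04-27T01:23:33" form with no time-zone offset (offsets would need normalizing first); the class name is hypothetical, and dropping the hyphens is optional.

```java
/** Hypothetical helper: a day-resolution key taken directly from an ISO 8601 string. */
final class IsoDayKey {
    private IsoDayKey() {}

    /** "2004-04-27T01:23:33" -> "20040427": keeps only year, month and day. */
    static String fromIso8601(String iso) {
        if (iso.length() < 10 || iso.charAt(4) != '-' || iso.charAt(7) != '-') {
            throw new IllegalArgumentException("not an ISO 8601 date: " + iso);
        }
        return iso.substring(0, 4) + iso.substring(5, 7) + iso.substring(8, 10);
    }
}
```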
I am having a problem with using a network path for the index directory.
If I use a path of the form //server/indexdir, the IndexWriter finds it and
indexes documents, but the IndexSearcher throws an exception saying it is
not a valid path.
I cannot use a local path as I need to be able to
On Tue, Apr 27, 2004 at 09:15:05AM -0700, Doug Cutting wrote:
Yukun Song wrote:
As is known, Lucene currently uses flat files to store its index information.
Does anyone have ideas or resources for combining a database (like MySQL or
PostgreSQL) and Lucene instead of the current flat index file
Incze Lajos wrote:
Could anybody summarize the technical pros and cons of a DB-based
directory over the flat files? (What I see at the moment is that, at the cost of a
possibly significant performance penalty, you get an index that is available over the
network to multiple Lucene engines, if I'm right.)
I'm assuming what you have is an Eclipse plugin that is making use of the Eclipse help
system. If what you are doing is relying on the Lucene Eclipse plugin, you may want to
look at the help system anyway, since it will give you an example of an Eclipse plugin
that uses the Lucene plugin.
On Tue, Apr 27, 2004 at 02:46:22PM -0700, Doug Cutting wrote:
Incze Lajos wrote:
Could anybody summarize the technical pros and cons of a DB-based
directory over the flat files? (What I see at the moment is that, at the cost of a
possibly significant performance penalty, you get an index
As far as I know, LARM is defunct. I read somewhere, perhaps apocryphal, that
Clemens got a job which wasn't supportive of his continued development on LARM.
AFAIK there aren't any other active developers of LARM (at least at the time it
branched off to SF).
Otis recently posted to use Nutch
I assume you are using a Wintel platform. You may map the directory where your
indexes are kept using a persistent connection (this can be done with the NET USE
command at the command prompt). This keeps the network connection open; otherwise
Windows will close the connection after
I suggest you look at:
http://www.manageability.org/blog/stuff/open-source-web-crawlers-java
From what I know of Nutch, it's meant as the basis for a competitor to the
big search engines (i.e. Google). For a small web site, it might be
overkill, especially if it requires you to build from CVS