Jay Malaluan wrote:
Hi,
You can check out Nutch at http://lucene.apache.org/nutch/.
also see
http://incubator.apache.org/projects/droids.html
Cheers
Michael
Regards,
Jay Joel Malaluan
Haroldo Nascimento-2 wrote:
Hi,
Is there any crawler that integrates with a Lucene index?
Kesarkar, Dipak wrote:
Hi,
I am using OpenCms 7.0.5 with the Lucene search engine.
I need to index XML content, for which I have the following field
configuration in opencms-search.xml …
Unfortunately I don't have any knowledge of OpenCms, but I think you'd
rather want to ask there (or have …
You can also create a Lucene field using a Reader, if the String is
really too large to materialize at once. Such fields cannot be stored
though.
But, if the String really is so large, I would worry about the end
user's experience (normally you want a Document to be a rather
bite-sized …
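As a rough illustration of the Reader-based field described above, here is a minimal sketch against a modern Lucene API, where TextField accepts a Reader (the 2.x-era equivalent was new Field(name, reader)); the class name, method, and file path are made up for the example:

    import java.io.IOException;
    import java.io.Reader;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;

    public class ReaderFieldExample {
        // Index a very large text source without materializing it as one String.
        static void addLargeDoc(IndexWriter writer, String path) throws IOException {
            Reader reader = Files.newBufferedReader(Paths.get(path));
            Document doc = new Document();
            // A field built from a Reader is indexed and tokenized, but it
            // cannot be stored: the stream is consumed during addDocument().
            doc.add(new TextField("body", reader));
            writer.addDocument(doc);
        }
    }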
The ~25 GB represents about 100 million events, averaging about 250 bytes each.
The indexed and searchable values are normal things: small bits of text (8-10
bytes usually), longs, ints, etc.
Also, this 25 GB is a per-day size, which is why expanding the values in it to
ASCII is problematic from …
Yes, that should work. Stream the file, converting each record to a
Lucene Document. All of the fields should probably be indexed only
(not stored) for size reasons, and then you could have a single stored
but not indexed field that would be the offset into your binary file.
-Yonik
On Fri, Jan …
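A sketch of the pattern Yonik describes (indexed-only search fields plus a single stored-only file offset), written against a modern Lucene API; the EventRecord type and field names are hypothetical stand-ins for whatever the binary records actually contain:

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;

    public class EventIndexer {
        // Hypothetical parsed view of one binary event record.
        record EventRecord(String host, long timestamp) {}

        static void indexRecord(IndexWriter writer, EventRecord rec, long fileOffset)
                throws IOException {
            Document doc = new Document();
            // Searchable fields: indexed but NOT stored, to keep the index small.
            doc.add(new StringField("host", rec.host(), Field.Store.NO));
            doc.add(new LongPoint("timestamp", rec.timestamp()));
            // The single stored (but not indexed) field: a pointer back into
            // the original binary file, so hits can be read from there.
            doc.add(new StoredField("offset", fileOffset));
            writer.addDocument(doc);
        }
    }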
Hi Paul, have you tried persisting the binaries in Base64 format and then
indexing them?
As you are aware, Base64 is a robust representation, used for example in
email attachments.
Thanks
Shashi
- Original Message
From: Paul Feuer
To: java-user@lucene.apache.org
Sent: Thursday, January …
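For what Shashi is proposing, a minimal sketch using java.util.Base64 (a modern stand-in; that class did not exist in 2009, when Commons Codec was the usual choice). Note Uwe's objection downthread that this text does not tokenize into anything usefully searchable:

    import java.util.Base64;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;

    public class Base64Indexer {
        static Document toDocument(byte[] eventBytes) {
            // Encode the raw event as Base64 so it survives as a String.
            String encoded = Base64.getEncoder().encodeToString(eventBytes);
            Document doc = new Document();
            // Caveat: a Base64 blob has no token boundaries, so a standard
            // analyzer sees one huge token -- see Uwe's reply below.
            doc.add(new TextField("event_b64", encoded, Field.Store.NO));
            return doc;
        }
    }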
Hello,
I googled, searched this forum, and read the manual, but I'm not sure what
the best practice for Lucene search would be.
I have an e-commerce application with about 10 MySQL tables for my products,
and I have an index (which is working fine) with about 10 fields for every
product. Is it a …
The binary events in the file are parsable by both our Java server-side
processes and the clients of these processes, so we need to keep the data in
the binary format.
./paul
Sent from my Verizon Wireless BlackBerry
-Original Message-
From: Shashi Kant
Date: Fri, 30 Jan 2009 06:3…
Hello
I would store normalised data in MySQL and index only searchable content in
Lucene.
Regards
Nilesh
From: ilwes
To: java-user@lucene.apache.org
Sent: Friday, 30 January, 2009 15:08:10
Subject: Best Practice for Lucene Search
Hello,
I googled, sea…
That answer is fine, but there are others. We store denormalized data
in Lucene, as you are doing, for display on web pages, because we can
get it out of Lucene much faster than we can get it out of the various
tables in the database. The database is not as fast as it might be,
quite possibly slow …
We do it in the same way. We have our RDBMS for administering our
metadata/data. The search frontend for end users works completely with
Lucene/panFMP (www.pangaea.de). We marshal all our relational data to XML
files and index their contents using Lucene. But the XML file is also stored
in Lucene as s…
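A sketch of that store-the-blob pattern (index-only search fields plus the full marshaled XML as a stored field), against a modern Lucene API; the field names and the extraction step are hypothetical:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.document.TextField;

    public class XmlBlobIndexer {
        static Document fromXml(String xmlSource, String title, String fullText) {
            Document doc = new Document();
            // Fields extracted from the XML: indexed for matching, not stored.
            doc.add(new TextField("title", title, Field.Store.NO));
            doc.add(new TextField("fulltext", fullText, Field.Store.NO));
            // The complete marshaled XML, stored verbatim so the search
            // frontend can render hits without going back to the RDBMS.
            doc.add(new StoredField("xml", xmlSource));
            return doc;
        }
    }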
Do you have a reasonable expectation that performance is going
to be a problem? The reason I ask is that I'm always suspicious
of efficiency arguments when "things are working fine". Unless and
until you can confidently predict that you're going to hit a
performance issue, do it the easiest way possible.
Unless I am missing something, I'm not sure I see the issue here. You can
convert to Base64 purely for indexing purposes and leave the original binary
as-is.
- Original Message
From: Paul Feuer
To: Lucene User List ; Shashi Kant
Sent: Friday, January 30, 2009 10:12:33 AM
Subject: Re: in…
Expanding 25+ GB per day is not ideal. If it's possible to index the binary
directly, as it sounds like it might be, we'll just do that.
I think what I was missing was that I hadn't seen AbstractField, which seems
like it has the stuff I need (if indeed Field is used as I assume it is).
./paul
Sent from my Verizon Wireless BlackBerry
Hi Shashi,
What is the sense of this? The Base64-encoded documents cannot be tokenized
and searched. To do this, they must be indexed as plain text. If you want to
store the original binary values as document data in the index, you could
also store them additionally as byte[] in the raw binary form.
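A sketch of storing the raw bytes as Uwe suggests, using the modern StoredField/BytesRef API (the 2.x-era equivalent was a binary Field constructor); the field name and class are hypothetical:

    import java.io.IOException;
    import java.util.Arrays;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.util.BytesRef;

    public class RawBytesExample {
        // At index time: store the raw bytes directly, no Base64 round-trip.
        static void addRaw(Document doc, byte[] eventBytes) {
            doc.add(new StoredField("raw", eventBytes));
        }

        // At search time: read the stored bytes back out of a hit.
        static byte[] readRaw(IndexSearcher searcher, int docId) throws IOException {
            BytesRef ref = searcher.doc(docId).getBinaryValue("raw");
            return Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + ref.length);
        }
    }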
Hi Uwe, I was suggesting writing a custom tokenizer. In the worst case it
would be a character per token; it might not be a very pretty solution, but
it should do the job.
What do you think?
Thanks
Shashi
On Fri, Jan 30, 2009 at 12:57 PM, Uwe Schindler wrote:
> Hi Shashi,
>
> What is the sense of th…
Assume I have an index of size 20G and a main memory of 1G.
I do the following steps in order:
* Open an IndexSearcher on the directory.
* Serve searches from that directory.
Meanwhile (while the IndexSearcher is still open on the directory), the
following operations are performed concurrently.
As long as the IndexReader of the searcher is not reopened, the searcher
will not see any changes and so will not crash.
This works because all changes are written to an extra file (.del for
deleted docs). The concurrent reader will not see those changes. If the
index is then optimized in parallel …
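A sketch of that point-in-time behavior, using the modern DirectoryReader API (in the 2.4-era API this role was played by IndexReader.reopen()):

    import java.io.IOException;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;

    public class ReopenExample {
        static void demo(Directory dir) throws IOException {
            // Point-in-time view: this reader sees the index as of right now.
            DirectoryReader reader = DirectoryReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);

            // ... deletes, merges, optimize run concurrently via an IndexWriter;
            // 'searcher' keeps serving from the old segments, unaffected ...

            // Only an explicit reopen picks up the changes.
            DirectoryReader changed = DirectoryReader.openIfChanged(reader);
            if (changed != null) {
                reader.close();
                reader = changed;
                searcher = new IndexSearcher(reader);
            }
        }
    }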
I've tried o.a.s.a.EnglishPorterFilterFactory, which creates
org.tartarus.snowball.ext.EnglishStemmer,
but didn't have any success... I'd like to find "went" and "gone" when
I query "go".
Thank you,
Koji
Erick Erickson wrote:
If thou wast to investigate the stemmers, would that work? I conf…
I don't expect this to work at all. Stemmers apply heuristics to try to
fold words into their stem. They are notoriously incapable of handling
irregular forms of a word. You'd need to look more at a synonym
list for words like your example.
Best
Erick
On Fri, Jan 30, 2009 at 7:25 PM, Koji Sekiguchi wrote:
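A sketch of the synonym approach Erick suggests, mapping irregular forms onto their base word. This uses SynonymGraphFilter from modern Lucene; in the 2009-era stack the equivalent lived in Solr as SynonymFilterFactory with a synonyms.txt entry such as "go,went,gone". The word list and analyzer composition are illustrative only (graph filters are usually applied at query time):

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
    import org.apache.lucene.analysis.synonym.SynonymMap;
    import org.apache.lucene.util.CharsRef;

    public class IrregularFormsAnalyzer {
        static Analyzer build() throws IOException {
            SynonymMap.Builder b = new SynonymMap.Builder(true);
            // Stemmers never reach these forms; map them explicitly onto
            // "go", keeping the original token too (includeOrig = true).
            b.add(new CharsRef("went"), new CharsRef("go"), true);
            b.add(new CharsRef("gone"), new CharsRef("go"), true);
            final SynonymMap map = b.build();

            return new Analyzer() {
                @Override
                protected TokenStreamComponents createComponents(String fieldName) {
                    Tokenizer src = new StandardTokenizer();
                    TokenStream ts = new LowerCaseFilter(src);
                    ts = new SynonymGraphFilter(ts, map, true);
                    return new TokenStreamComponents(src, ts);
                }
            };
        }
    }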