10B documents is a lot of data.
One index per file won't scale: you will not be able to open all the indexes at
the same time (file-handle limits, memory limits, etc.), and if you
search through them sequentially, it will take a long time.
Unless, in your use case, you always know the file you are searching in.
I suppose that's fair enough. Some quick googling suggests that this has
been asked many times with pretty much the same response. Sorry to
add to the noise.
On Tue, Dec 6, 2011 at 9:34 PM, Darren Govoni wrote:
> I asked here[1] and it said "Ask again later."
>
> [1] http://8ball.tridelphia.net/
>
I asked here[1] and it said "Ask again later."
[1] http://8ball.tridelphia.net/
On 12/06/2011 08:46 PM, Jamie Johnson wrote:
Thanks Robert. Is there a timetable for that? I'm trying to gauge
whether it is appropriate to push for my organization to move to the
current lucene 4.0 implementation (we're using solr cloud which is
built against trunk) or if it's expected there will be changes to what
is currently on trunk.
On Tue, Dec 6, 2011 at 6:41 PM, Jamie Johnson wrote:
> Is there a timetable for when it is expected to be finalized?
It will be finalized when Lucene 4.0 is released.
--
lucidimagination.com
Is there a timetable for when it is expected to be finalized? I'm not
looking for an exact date, just an approximation (next month, 2
months, 6 months, etc.)
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
I need to implement a "quick and dirty" or "poor man's" translation of a
foreign language document by looking up each word in a dictionary and replacing
it with the English translation. So what I need is to tokenize the original
foreign text into words and then access each word, look it up, and get its translation.
There are utilities floating around for getting output from analyzers
- would that help? I think there are some in LIA, probably others
elsewhere. The idea being that you grab the stored fields from the
index, pass them through your analyzer, grab the output and use that.
Or can you do something
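The dictionary-lookup translation described above can be sketched in plain Java. This is a minimal illustration, not the Lucene analyzer route suggested in the reply: whitespace splitting stands in for an analyzer's token stream, and the dictionary entries are hypothetical placeholders (a real dictionary would be loaded from a file).

```java
import java.util.HashMap;
import java.util.Map;

public class PoorMansTranslator {
    // Hypothetical dictionary; a real one would be loaded from disk.
    static final Map<String, String> DICT = new HashMap<>();
    static {
        DICT.put("hola", "hello");
        DICT.put("mundo", "world");
    }

    // Split on whitespace (standing in for an analyzer's token stream),
    // look each token up, and fall back to the original word when missing.
    static String translate(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (out.length() > 0) out.append(' ');
            out.append(DICT.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("Hola mundo")); // hello world
    }
}
```

Swapping the whitespace split for a real analyzer's TokenStream, as the reply suggests, would also handle punctuation and stemming.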
I'm still struggling with this.
I've tried to implement the solution mentioned in previous reply, but
unfortunately there is a blocking issue with this:
I cannot find a way to create another index from the source index in a
way that the new index has the field values in it. The only way to copy
Try taking a look at the patch, but on a quick glance it doesn't
look like the underlying code has changed much.
But note the whole point of this is that optimize is overused,
given its former name. Why do you want to keep using it?
Best
Erick
On Tue, Dec 6, 2011 at 1:04 AM, KARTHIK SHIVAKUMAR wrote:
I had a similar problem. The problem was the "-" char, which is a special
char for Lucene. You can try indexing the data in lowercase and use
WhitespaceAnalyzer for both indexing and searching over the field. One
other option is replace "-" with "_" when indexing and searching. This way,
your data
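The normalization suggested above can be sketched as follows. This is a minimal, hedged illustration, not Lucene API code: the key point is that the exact same transform (lowercase, then replace "-" with "_") must be applied both at index time and at query time so the stored terms and the query terms match byte-for-byte.

```java
public class HyphenNormalizer {
    // Apply identical normalization at index time and at query time:
    // lowercase everything and replace '-' (special to the query parser)
    // with '_', which is an ordinary character.
    static String normalize(String s) {
        return s.toLowerCase().replace('-', '_');
    }

    public static void main(String[] args) {
        String indexed = normalize("NB-ARC"); // nb_arc
        String queried = normalize("nb-arc"); // nb_arc
        System.out.println(indexed.equals(queried)); // true
    }
}
```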
Try QueryParser.setLowercaseExpandedTerms(false). QueryParser will
lowercase terms in prefix etc queries by default.
If that doesn't work, and it were my problem, I'd just lowercase
everything, everywhere. Life's too short to mess around with case
issues.
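Ian's point about lowercased expanded terms can be illustrated without Lucene. The sketch below uses a sorted set as a stand-in for the term dictionary (the names are mine, not Lucene API): a prefix scan is case-sensitive, so a query-parser-lowercased prefix like "nb-ar" misses a term indexed as "NB-ARC".

```java
import java.util.TreeSet;

public class PrefixLookup {
    // Sorted term dictionary, standing in for how an index stores
    // unanalyzed terms byte-for-byte.
    static final TreeSet<String> TERMS = new TreeSet<>();

    // Case-sensitive prefix match over the sorted terms, roughly what a
    // prefix query does against the term dictionary.
    static boolean anyTermWithPrefix(String prefix) {
        String ceil = TERMS.ceiling(prefix);
        return ceil != null && ceil.startsWith(prefix);
    }

    public static void main(String[] args) {
        TERMS.add("NB-ARC"); // indexed as-is, e.g. by KeywordAnalyzer
        System.out.println(anyTermWithPrefix("NB-AR")); // true
        System.out.println(anyTermWithPrefix("nb-ar")); // false: lowercased prefix misses
    }
}
```

This is exactly why either the expanded-term lowercasing must be turned off or everything must be lowercased consistently on both sides.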
--
Ian.
On Tue, Dec 6, 2011 at 8:12 AM
Hi Danil,
Thank you for your suggestions.
We will have approximately half a million documents per file, so using your
calculation, 20,000 files * 500,000 = 10,000,000,000. And we are likely to get
more files in the future, so a scalable solution is most desirable.
The document IDs are not unique
How many documents are there in the system?
Approximate it by: (number of files) * avg(docs/file).
From my understanding your queries will just be lookups by document ID.
(Q: are those IDs unique between files? or do you need to filter by filename?)
If that will be the only use case, then maybe you should
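One common way to make pure ID lookups scale (a sketch under my own assumptions, not necessarily what the truncated suggestion above was going to propose) is to partition the documents into a fixed number of sub-indexes by a hash of the document ID, so a lookup only opens one index instead of thousands. The shard count of 64 here is a hypothetical choice.

```java
public class ShardRouter {
    // Number of sub-indexes; a hypothetical value, tune for your hardware.
    static final int NUM_SHARDS = 64;

    // Route a document ID to a shard. Both indexing and lookup must use
    // this same function, so an ID query touches exactly one sub-index.
    static int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), NUM_SHARDS);
    }

    public static void main(String[] args) {
        int s = shardFor("DOC-123456");
        System.out.println(s >= 0 && s < NUM_SHARDS); // true
    }
}
```

If the IDs are not unique across files, the filename can be concatenated into the routing key so that (filename, ID) pairs stay together.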
Hi Guys,
Thank you very much for your answers.
I will do some profiling on memory usage, but is there any documentation on how
Lucene uses/allocates memory?
Best wishes,
Rui Wang
On 6 Dec 2011, at 06:11, KARTHIK SHIVAKUMAR wrote:
> hi
>
>>> would the memory usage go through the roof?
Dear Lucene-users,
I am a bit puzzled over this. I have a query which should return some
documents; if I use Luke, I obtain hits using
org.apache.lucene.analysis.KeywordAnalyzer.
This is the query:
domain:NB-AR*
(I have data indexed using:
doc.add(new Field("domain", "NB-ARC", Field.Store.YES, ...)))