ocs) with similar parameters
> as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 processors, 40 with
> hyperthreading, 64G Memory) and study indexing speed on HDD, SSD and NVMe.
> While I do see benefit when switching from HDD to SSD, there is not much
> noticeable benefit m
AM
To: java-user@lucene.apache.org
Cc: Anahita Shayesteh-SSI
Subject: Re: Lucene indexing speed on NVMe drive
: Hi. I am studying Lucene performance and in particular how it benefits from
faster I/O such as SSD and NVMe.
: parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20
: proc
: Hi. I am studying Lucene performance and in particular how it benefits from
faster I/O such as SSD and NVMe.
: parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20
: processors, 40 with hyperthreading, 64G Memory) and study indexing speed
...
: I get best performance
study indexing speed on HDD, SSD and NVMe.
While I do see benefit when switching from HDD to SSD, there is not much
noticeable benefit moving to NVMe.
I get best performance (200GB/hour) with 20 indexing threads, increasing number
of threads to 40 hurts performance. Similarly increasing
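As a rough unit check on the figure quoted above, here is a minimal Java sketch that converts the reported 200GB/hour into MB/s. It assumes decimal units (1 GB = 1000 MB), and the class and method names are made up for illustration. The result, about 55.6 MB/s, is only the volume of input indexed per second; it is not a measurement of the actual disk write rate.

```java
import java.util.Locale;

// Illustrative helper, not part of Lucene or of the original thread.
final class ThroughputSketch {
    /** Converts a rate in GB/hour to MB/s, assuming 1 GB = 1000 MB. */
    static double gbPerHourToMbPerSec(double gbPerHour) {
        return gbPerHour * 1000.0 / 3600.0;
    }

    public static void main(String[] args) {
        // 200 GB/hour is the best indexing rate reported in the message.
        double rate = gbPerHourToMbPerSec(200.0);
        System.out.println(String.format(Locale.ROOT, "%.1f MB/s", rate));
    }
}
```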
> forceMerge().
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Shai Erera [mailto:ser...@gmail.com]
> > Sent: Thursday, August 07, 2014
.@gmail.com]
> Sent: Thursday, August 07, 2014 4:11 PM
> To: java-user@lucene.apache.org
> Subject: Re: improve indexing speed with nomergepolicy
>
> Yes, currently an MP isn't a "live" setting on IndexWriter, meaning you pass
> it
> at construction time an
Yes, currently an MP isn't a "live" setting on IndexWriter, meaning you
pass it at construction time and don't change it afterwards. I wonder if
after LUCENE-5711 we can move MergePolicy to LiveIndexWriterConfig and fix
IndexWriter to not hold on to it, but rather pull it from the config.
Not sure
16:05
From: "Jon Stewart"
To: java-user@lucene.apache.org
Subject: Re: improve indexing speed with nomergepolicy
Related, how does one change the MergePolicy on an IndexWriter (e.g.,
use NoMergePolicy during batch indexing, then change to something
better once finished with batch)
Related, how does one change the MergePolicy on an IndexWriter (e.g.,
use NoMergePolicy during batch indexing, then change to something
better once finished with batch)? It looks like the MergePolicy is set
through IndexWriterConfig but I don't see a way to update an IWC on an
IW.
Thanks,
Jon
O
many thanks again. this was a good tip.
after switching from FSDirectory to NRTCachingDirectory queries run at double
speed.
Sascha
Sent: Thursday, 07 August 2014 at 14:54
From: "Sascha Janz"
To: java-user@lucene.apache.org
Subject: Aw: Re: improve indexing
many thanks for the tip with NRTCachingDirectory. didn't know that.
i will try it .
Sascha
Sent: Thursday, 07 August 2014 at 13:37
From: "Shai Erera"
To: "java-user@lucene.apache.org"
Subject: Re: improve indexing speed with nomergepolicy
Using NoMerge
Using NoMergePolicy for online indexes is usually not recommended. You want
to use NoMP in case where you build an index in a batch job, then in the
end before the index is "published" you run a forceMerge or maybeMerge
(with a real MergePolicy).
For online indexes, i.e. indexes that are being sea
hi,
i try to speed up our indexing process. we use SearcherManager with applydeletes
to get near real time Reader.
we have not really "much" incoming documents, but the documents must be updated
from time to time and the amount of documents to be updated could be quite
large.
i tried some tes
Hi Uwe,
> Die, Maven, die :-)
Well, I for myself have a love-hate-relationship to maven: its simple
and works nice for deps management. also others can set it up quickly
and IDE support is nice. But sometimes it does a bit too much
(unexpected ;)) or is too complicated to customize.
> (I assum
Hi,
> > I mean my benchmarks show up
> > to 300% improvement with 4.x versus older versions so something is
> > weird ie. non-realistic here or there is a bug so lets figure this
> > out. Can you profile your app and see if you find something suspicious?
> > I'll try now and report back.
>
> It s
> I mean my benchmarks show up
> to 300% improvement with 4.x versus older versions so something is
> weird ie. non-realistic here or there is a bug so lets figure this
> out. Can you profile your app and see if you find something suspicious?
> I'll try now and report back.
It seems to be largely
Hi Simon,
answers below.
>> It does not seem to be an 'IO related issue' because using RAMDirectory
>> results in the same times.
>> And indexing via Luc4 with only one thread shouldn't be slower than 3.5 (?)
> it could be since we use a different term dictionary impl which is
> more expensive in
rt. You should add some more randomness
>> or reality to your test.
>>
>> simon
>>
>> On Tue, Jan 3, 2012 at 5:56 PM, Peter K wrote:
>>> Hi,
>>>
>>> I recently switched an experimental project from Lucene 3.5 to 4.0 from
>>> 6th
"_type", "test", StringField.TYPE_STORED);
>>
>> Without them Lucene 4 is faster**. Here is a recreation using different
>> branches for every lucene version:
>> https://github.com/karussell/lucene-tmp
>> Or is there something wrong with my too sim
hub.com/karussell/lucene-tmp
> Or is there something wrong with my too simplistic scenario?
>
> Furthermore: How could I further improve Lucene 4.0 indexing speed?
> (I already read through the performance list on the wiki)
>
> Regards,
> Peter.
>
> *
> ope
my too simplistic scenario?
Furthermore: How could I further improve Lucene 4.0 indexing speed?
(I already read through the performance list on the wiki)
Regards,
Peter.
*
open jdk 1.6.0_20 (but also confirmed with latest java6 from oracle)
ubuntu/10.10 linux/2.6.35-31 i686, 2GB ram
**
lucene
mat need to parse to extract information
>> after
>> >> that i had to index.
>> >> Single thread process one file at a time then i decided to use multi
>> >> threads when the main thread that loops the directory and pass the
>> file
>> >> int
decided to use multi
> >> threads when the main thread that loops the directory and pass the file
> >> into pool of worker threads using a queue
> >> all of the which share same index writer, How ever there is no any
> >> significant changes in indexing speed
using a queue
>> all of the which share same index writer, How ever there is no any
>> significant changes in indexing speed
>>
>> Any hints I am doing wrong or any suggestion
>>
>>
>> Thanks
>> Antony
>>
>
> --
cess one file at a time then i decided to use multi
> threads when the main thread that loops the directory and pass the file
> into pool of worker threads using a queue
> all of the which share same index writer, How ever there is no any
> significant changes in indexing speed
>
>
directory and pass the file
into a pool of worker threads using a queue,
all of which share the same index writer. However, there is no
significant change in indexing speed.
Any hints about what I am doing wrong, or any suggestions?
Thanks
Antony
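The setup described above (one thread walking the directory, a queue, and a pool of workers sharing one writer) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the poster's code: `DocSink` is a hypothetical stand-in for a thread-safe writer such as Lucene's IndexWriter, and `parse` is a placeholder for the real file parsing. The one design point it illustrates is that the parsing runs inside the worker task; if the main thread parsed each file before queueing it, the workers would have little left to do in parallel. Whether that is the cause of the flat timings reported here cannot be told from the message.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedWriterSketch {
    /** Hypothetical stand-in for a thread-safe writer shared by all workers. */
    interface DocSink {
        void add(String parsedDoc);
    }

    /** Placeholder parse step; a real one would read the file and extract fields. */
    static String parse(String fileName) {
        return "parsed:" + fileName;
    }

    /** Parses and adds every file on a fixed pool of workers; returns how many were added. */
    static int indexAll(List<String> fileNames, DocSink sink, int threads) {
        // The executor's internal queue plays the role of the hand-off queue.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger added = new AtomicInteger();
        for (String name : fileNames) {
            pool.submit(() -> {
                // Parsing happens here, in the worker, so it runs concurrently.
                sink.add(parse(name));
                added.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return added.get();
    }
}
```

A caller would pass the list of file names, a sink that forwards to the shared writer, and a thread count chosen by measurement rather than by guess.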
On Tue, 2011-05-31 at 08:52 +0200, Maciej Klimczuk wrote:
> I did some testing with 3.1.0 demo on Windows and encountered some strange
> behaviour. I tried to index ~6 small text documents using the demo.
> - First trial took about 18 minutes.
> - Second and third trial took about 2 minutes.
Hello everyone
I did some testing with 3.1.0 demo on Windows and encountered some strange
behaviour. I tried to index ~6 small text documents using the demo.
- First trial took about 18 minutes.
- Second and third trial took about 2 minutes.
I then made another test on other, bigger docum
: feedback: Indexing speed improvement lucene 2.2->2.3.1
Sorry for my ignorance, I am looking for
NgramStemFilter specifically.
Are you suggesting that it's the same as NGramTokenFilter? Does it have
stemming in it?
Thanks again.
Jay
Otis Gospodnetic wrote:
Sorry, I wrote this stu
dependent searching.
Regards Uwe
-Original Message-
From: yu [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 26 March 2008 05:26
To: java-user@lucene.apache.org
Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Sorry for my ignorance, I am looking for
Ng
ect: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Hi Otis,
I checked that contrib before and could not find NgramStemFilter. Am I
missing other contrib?
Thanks for the link!
Jay
Otis Gospodnetic wrote:
Hi Jay,
Sorry, lapsus calami, that would be Lucene *contrib*.
Hav
>
To: java-user@lucene.apache.org
Sent: Wednesday, March 26, 2008 12:04:33 AM
Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Hi Otis,
I checked that contrib before and could not find NgramStemFilter. Am I
missing other contrib?
Thanks for the link!
Jay
Otis Gospodne
/index.html
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Jay <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, March 25, 2008 6:15:54 PM
Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Sorry, I could
a-user@lucene.apache.org
Sent: Tuesday, March 25, 2008 6:15:54 PM
Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Sorry, I could not find the filter in the 2.3 API class list (core +
contrib + test). I am not aware of a lucene config file either. Could you
please tell me where it
- Original Message
From: Jay <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, March 25, 2008 1:32:24 PM
Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Hi Uwe,
I am curious what NGramStemFilter is? Is it a combination of porter
stemming an
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Jay <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, March 25, 2008 1:32:24 PM
Subject: Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1
Hi Uwe,
I am curious
21 March 2008 16:25
To: java-user@lucene.apache.org
Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
Hi Uwe,
Could you tell what Analyzer do you use when you marked so big indexing
speedup?
If you use StandardAnalyzer (that uses StandardTokenizer) may be the
reason is in i
EMAIL PROTECTED]
Sent: Tuesday, 25 March 2008 17:13
To: java-user@lucene.apache.org
Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
Uwe,
This is a little off thread-topic, but I was wondering how your
search relevance and search performance has fared with this
bigr
query, scoring in a doc the greatest bigramms clusters covering the phrase
> token.
>
> Best Regards
>
> Uwe
>
> -Original Message-
> From: Ivan Vasilev [mailto:[EMAIL PROTECTED]
> Sent: Friday, 21 March 2008 16:25
> To: java-user@lucene.apache.
ing in a doc the greatest bigramms clusters
covering the phrase token.
Best Regards
Uwe
-Original Message-
From: Ivan Vasilev [mailto:[EMAIL PROTECTED]
Sent: Friday, 21 March 2008 16:25
To: java-user@lucene.apache.org
Subject: Re: feedback: Indexing speed improvement lucene 2.
ava-user@lucene.apache.org
Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
Hi Uwe,
Could you tell what Analyzer do you use when you marked so big
indexing
speedup?
If you use StandardAnalyzer (that uses StandardTokenizer) may be the
reason is in it. You can see the pre last report in
mms clusters covering the phrase
token.
Best Regards
Uwe
-Original Message-
From: Ivan Vasilev [mailto:[EMAIL PROTECTED]
Sent: Friday, 21 March 2008 16:25
To: java-user@lucene.apache.org
Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1
Hi Uwe,
Could you tel
Hi Uwe,
Could you tell what Analyzer do you use when you marked so big indexing
speedup?
If you use StandardAnalyzer (that uses StandardTokenizer) may be the
reason is in it. You can see the pre last report in the thread "Indexing
Speed: 2.3 vs 2.2 (real world numbers)". Accord
That is a nice result -- thanks for reporting this Uwe!
Mike
On Mar 1, 2008, at 3:45 AM, Uwe Goetzke wrote:
This week I switched the lucene library version on one customer system.
The indexing time went down from 46m32s to 16m20s for the complete task
including optimisation. Great Job
This week I switched the lucene library version on one customer system.
The indexing time went down from 46m32s to 16m20s for the complete task
including optimisation. Great Job!
We index product catalogs from several suppliers, in this case around
56,000 product groups and 360,000 products
On Monday 04 February 2008 21:51:39 Michael McCandless wrote:
> Even pre-2.3, you should have seen gains by adding threads, if indeed
> your hardware has good concurrency.
>
> And definitely with the changes in 2.3, you should see gains by
> adding threads.
With regards to this, I have been wonder
fared:
On a 2.17 million document index, a recent test gave indexing time to be:
* lucene 2.2: 4.83 hours
* lucene 2.3: 26 minutes
About a factor of 11 speedup. Holy smokes! Great work folks.
-jake
--
View this message in context:
http://www.nabble.com/Indexing-Speed%3A-2
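The "factor of 11" quoted above can be checked with a line of arithmetic: 4.83 hours is 289.8 minutes, and 289.8 / 26 is roughly 11.1. A minimal Java sketch of that check follows; the class and method names are invented for illustration, and only the two timings come from the message.

```java
// Illustrative helper, not part of Lucene or of the original thread.
final class SpeedupSketch {
    /** Ratio of the old running time to the new one, both in minutes. */
    static double speedup(double beforeMinutes, double afterMinutes) {
        return beforeMinutes / afterMinutes;
    }

    public static void main(String[] args) {
        double before = 4.83 * 60.0; // lucene 2.2: 4.83 hours, in minutes
        double after = 26.0;         // lucene 2.3: 26 minutes
        System.out.println(speedup(before, after));
    }
}
```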
Note that in particular, we use the StandardTokenizer as part of our analyzer
chain, which means it has the switch from the JavaCC version to the JFlex based
code, which I'm betting is a substantial part of that speedup.
-jake
On Feb 3, 2008 2:11 PM, Briggs <[EMAIL PROTECTED]> wrote:
> Damn, r
.17 million document index, a recent test gave indexing time to
> > be:
> >
> > * lucene 2.2: 4.83 hours
> > * lucene 2.3: 26 minutes
> >
> > About a factor of 11 speedup. Holy smokes! Great work folks.
> >
> >
> > -jake
> >
>
t
> fared:
>
> On a 2.17 million document index, a recent test gave indexing time to
> be:
>
> * lucene 2.2: 4.83 hours
> * lucene 2.3: 26 minutes
>
> About a factor of 11 speedup. Holy smokes! Great work folks.
>
>
> -jake
>
>
--
Yeah, I should have mentioned - this was merely with a jar replacement, we
haven't gotten around to doing fun 2.3-related stuff like making sure our
domain-specific tokenizers use next(Token), as well as making sure we set
all of our buffer sizes by RAM used.
We tried multithreading the process, a
Damn, really? I haven't had the opportunity to test this yet. Has
anyone else seen this kind of improvement?
On Feb 3, 2008 2:57 PM, Jake Mannix <[EMAIL PROTECTED]> wrote:
> Hello all,
> I know you lucene devs did a lot of work on indexing performance in 2.3,
> and I just tested it out last
Awesome! We are glad to hear that :)
You might be able to make it even faster with the steps here:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Mike
Jake Mannix wrote:
Hello all,
I know you lucene devs did a lot of work on indexing performance in 2.3,
and I just tested i
Hello all,
I know you lucene devs did a lot of work on indexing performance in 2.3,
and I just tested it out last thursday, so I thought I'd let you know how it
fared:
On a 2.17 million document index, a recent test gave indexing time to be:
* lucene 2.2: 4.83 hours
* lucene 2.3: 26 m
On 10-Sep-07, at 5:59 AM, Laxmilal Menaria wrote:
Hello Everyone,
I have created a Index Application using Java lucene 2.0 in java and
Lucene.Net 2.0 in VB.net. Both application have same logic. But when I have
indexed a database with 14000 rows from both application and same machine, I
sur
On Monday 10 September 2007 14:59, Laxmilal Menaria wrote:
> I have created a Index Application using Java lucene 2.0 in java and
> Lucene.Net 2.0 in VB.net. Both application have same logic. But when I
> have indexed a database with 14000 rows from both application and same
> machine, I surprised
Hello Everyone,
I have created a Index Application using Java lucene 2.0 in java and
Lucene.Net 2.0 in VB.net. Both application have same logic. But when I have
indexed a database with 14000 rows from both application and same machine, I
surprised that Java lucene took (198 Seconds) more than doub
less than
> 2
> > minutes for 14000 records. Then I indexed the same data using Lucene2.2.
> > It
> > took about 4 minutes. I got affected indexing speed on using Lucene2.2.
> > The
> > indexing code is same. I just updated the lucenejar.
> >
> > What can b
ilal Menaria wrote:
>
> > Hello everyone,
> >
> > I have indexed a mysql database using Lucene2.0. It was taking less
> > than 2
> > minutes for 14000 records. Then I indexed the same data using
> > Lucene2.2. It
> > took about 4 minutes. I got affected
minutes for 14000 records. Then I indexed the same data using Lucene2.2.
> It
> took about 4 minutes. I got affected indexing speed on using Lucene2.2.
> The
> indexing code is same. I just updated the lucenejar.
>
> What can be done to improve the indexing speed. Please le
data using Lucene2.2. It
took about 4 minutes. I got affected indexing speed on using Lucene2.2. The
indexing code is same. I just updated the lucenejar.
What can be done to improve the indexing speed. Please let me know asap...
--
Thanks in advance,
Laxmilal menaria
http://www.c
Hello everyone,
I have indexed a mysql database using Lucene2.0. It was taking less than 2
minutes for 14000 records. Then I indexed the same data using Lucene2.2. It
took about 4 minutes. I got affected indexing speed on using Lucene2.2. The
indexing code is same. I just updated the lucenejar
Here: http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html
- Original Message
From: [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Monday, February 27, 2006 4:24:27 AM
Subject: RE: Indexing speed
maxBufferedDocs parameters. You can also look for my article about
maxBufferedDocs parameters. You can also look for my article about indexing
with Lucene (link in the Wiki), which includes code for playing with various
parameters and explains what's going on, etc.
Sorry, but where is this link?
Where is your article located? Please give me the URL.
-
maxBufferedDocs parameters. You can also look for my article about indexing
with Lucene (link in the Wiki), which includes code for playing with various
parameters and explains what's going on, etc.
Sorry, but where is this link?
with Lucene (link in the Wiki), which includes code for playing with various
parameters and explains what's going on, etc.
Otis
- Original Message
From: revati joshi <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Fri 24 Feb 2006 12:41:33 PM EST
Subject: Indexi
revati joshi wrote:
hi all,
I just wanted to know how to increase the speed of indexing of files.
I tried it by using Multithreading approach but couldn't get much better
performance.
It was the same as in usual sequential indexing. Is there any other approach
to get better Inde
hi all,
I just wanted to know how to increase the speed of indexing of files.
I tried it by using Multithreading approach but couldn't get much better
performance.
It was the same as in usual sequential indexing. Is there any other approach
to get better Indexing performance incas
68 matches