Thanks Otis,
we will look into these issues again, slightly deeper. Network
problems are not likely, but the DB, I do not know; this is a huge select
... we will try to scan the DB without indexing, just to see if it can
sustain the rate... But gut feeling says, nope, this is not the one.
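
A DB-only scan like that could be timed with something along these
lines. This is only a sketch: `windowed_rates` and its parameters are
made-up names, and `records` stands in for whatever cursor or iterator
the real select returns. It just consumes the rows and reports
records/second per window, so a sudden drop on the DB side would show
up without Solr being involved at all.

```python
import time
from typing import Callable, Iterable, List


def windowed_rates(records: Iterable,
                   window: int = 100_000,
                   clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Consume `records` without indexing them and return the
    records/second rate for each full window of `window` records."""
    rates: List[float] = []
    count = 0
    start = clock()
    for _ in records:
        count += 1
        if count == window:
            now = clock()
            elapsed = now - start
            # Guard against a zero interval on very coarse clocks.
            rates.append(window / elapsed if elapsed > 0 else float("inf"))
            count = 0
            start = now
    return rates
```

If the per-window rates stay flat over the whole 40 Mio rows, the DB
and the network are most likely off the hook.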

IO saturation would surprise me, but you never know. It might very
well be that the SSD is somehow having problems with this sustained
throughput.

8 cores... no, this was a single update thread.

We left the default index settings (do not tweak if it works :)
<ramBufferSizeMB>32</ramBufferSizeMB>

32MB sounds like a lot for our documents (100 bytes average on-disk
size). Assuming RAM efficiency of 50% (?), we land at 100k buffered
documents. Yes, this is kind of smallish, as we fill up the ramBuffer
every ~3 seconds (our Analyzers surprised me with 30k+ records per
second).

256 will do the job, ~24 seconds should be plenty of "idle" time for
IO-OS-JVM to sort out MMAP issues, if any (Windows was never an MMAP
performance champion when using it from Java, but once you dance
around it, it works ok)...
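
The back-of-envelope numbers above can be reproduced with a small
sketch like the one below. The function names are made up for
illustration, and the RAM efficiency is a guess rather than a measured
value. The ~100k documents / ~3 seconds / ~24 seconds figures come out
with an efficiency nearer 30% at 34k records per second; at a full 50%
the buffer would hold more documents and take longer to fill.

```python
def buffered_docs(ram_buffer_mb: float,
                  avg_doc_bytes: float,
                  ram_efficiency: float) -> float:
    """Rough number of documents that fit in the RAM buffer.

    ram_efficiency is an assumed fraction of the buffer that ends up
    holding actual document data (a guess, not a measured value).
    """
    return ram_buffer_mb * 1024 * 1024 * ram_efficiency / avg_doc_bytes


def seconds_to_fill(ram_buffer_mb: float,
                    avg_doc_bytes: float,
                    ram_efficiency: float,
                    docs_per_second: float) -> float:
    """Rough time between flushes at a steady ingest rate."""
    docs = buffered_docs(ram_buffer_mb, avg_doc_bytes, ram_efficiency)
    return docs / docs_per_second


# 100-byte documents, 34k records/second, assumed 30% efficiency
print(round(buffered_docs(32, 100, 0.3)))              # about 100k documents
print(round(seconds_to_fill(32, 100, 0.3, 34_000), 1))   # about 3 seconds
print(round(seconds_to_fill(256, 100, 0.3, 34_000), 1))  # about 24 seconds
```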


Max JVM heap on this test was 768m and memory never went above 500m,
using -XX:-UseParallelGC ... this is definitely not a GC problem.

cheers,
eks


On Sun, Sep 25, 2011 at 6:20 AM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
> eks,
>
> This is clear as day - you're using Winblows!  Kidding.
>
> I'd:
> * watch IO with something like vmstat 2 and see if the rate drops correlate 
> to increased disk IO or IO wait time
> * monitor the DB from which you were pulling the data - maybe the DB or the 
> server that runs it had issues
> * monitor the network over which you pull data from DB
>
> If none of the above reveals the problem I'd still:
> * grab all data you need to index and copy it locally
> * index everything locally
>
> Out of curiosity, how big is your ramBufferSizeMB and your -Xmx?
> And on that 8-core box you have ~8 indexing threads going?
>
> Otis
> ----
> Sematext is Hiring -- http://sematext.com/about/jobs.html
>
>
>
>
>>________________________________
>>From: eks dev <eks...@yahoo.co.uk>
>>To: solr-user <solr-user@lucene.apache.org>
>>Sent: Saturday, September 24, 2011 3:18 PM
>>Subject: Update ingest rate drops suddenly
>>
>>just looking for hints where to look for...
>>
>>We were testing single threaded ingest rate on solr, trunk version on
>>atypical collection (a lot of small documents), and we noticed
>>something we are not able to explain.
>>
>>Setup:
>>We use defaults for index settings, windows 64 bit, jdk 7 U2. on SSD,
>>machine with enough memory and 8 cores.   Schema has 5 stored fields,
>>4 of them indexed no positions no norms.
>>Average net document size (optimized index size / number of documents)
>>is around 100 bytes.
>>
>>On a test with 40 Mio documents:
>>- we had update ingest rate  on first 4,4Mio documents @  incredible
>>34k records / second...
>>- then it dropped, suddenly to 20k records per second and this rate
>>remained stable (variance 1k) until...
>>- we hit 13Mio, where ingest rate dropped again really hard, from one
>>instant in time to another to 10k records per second.
>>
>>it stayed there until we reached the end @40Mio (slightly reducing, to
>>ca 9k, but this is not long enough to see trend).
>>
>>Nothing unusual happening with JVM memory (saw-tooth 200-450M, fully
>>regular). CPU in turn was following the ingest rate trend, indicating
>>that we were waiting on something. No searches, no commits, nothing.
>>
>>autoCommit was turned off. Updates were streaming directly from the database.
>>
>>-----
>>I did not expect something like this, knowing lucene merges in
>>background. Also, having such sudden drops in ingest rate is
>>indicative that we are not leaking something (a drop would have been
>>much more gradual). Maybe it is some caches, but why two really
>>significant drops? 33k/sec to 20k and then to 10k... We would love to
>>keep it @34k/second :)
>>
>>I am not really acquainted with the new MergePolicy and flushing
>>settings, but I suspect this is something there we could tweak.
>>
>>Could it be that Windows is somehow, hmm, quirky with the solr default
>>directory on win64/jvm (I think it is MMAP by default)... We did not
>>saturate IO with such small documents, I guess; it is just a couple
>>of Gig over 1-2 hours.
>>
>>All in all, it works well, but is having such hard update ingest rate
>>drops normal?
>>
>>Thanks,
>>eks.
>>
>>
>>
