Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-20 Thread David Smiley
Ah, Solr wants a Codec*Factory* whereas you supplied the class name of the Codec. And of course your codec is a WIP I assume; you didn't customize the stored fields to not use compression yet. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Mon, Mar

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-20 Thread Fikavec F
I tested "streaming expressions" ('expr=search(test_collection,q="*:*",fl="id, text_sn",sort="id asc",rows=1600)') on a collection with one shard and small documents: the server takes a long time to prepare the response before the data transfer begins (it looks like when the collection consisted of 8
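For readers who want to repeat this kind of test, here is a minimal sketch of how the expression quoted above could be turned into a request URL for the collection's /stream handler. The base URL `http://localhost:8983/solr` and the helper name `stream_url` are assumptions for illustration and do not come from the thread.

```python
from urllib.parse import urlencode


def stream_url(base_url, collection, expr):
    """Build a GET URL that sends a streaming expression to /stream."""
    # urlencode percent-encodes quotes, commas and spaces in the expression.
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})


# Expression copied from the message; host and port are assumed.
expr = 'search(test_collection,q="*:*",fl="id, text_sn",sort="id asc",rows=1600)'
url = stream_url("http://localhost:8983/solr", "test_collection", expr)
print(url)
```

Encoding the whole expression as one `expr` parameter avoids shell and URL quoting problems with the embedded double quotes.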

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-19 Thread David Smiley
On Sun, Mar 19, 2023 at 8:39 AM Fikavec F wrote: > I was able to create a collection with "solr.SimpleTextCodecFactory" > codecFactory and Solr can process (return) only 2x more documents per second > from it (214 410 documents per second vs 115 000 "solr.SchemaCodecFactory" > with

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-19 Thread Fikavec F
I was able to create a collection with "solr.SimpleTextCodecFactory" codecFactory and Solr can process (return) only 2x more documents per second from it (214 410 documents per second vs 115 000 "solr.SchemaCodecFactory" with compression). I expected much, much more, because this is a simple

Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-14 Thread David Smiley
You can get parallelism by sharding -- use numShards=8 or whatever number of CPUs you have. In your performance analysis, you did not speak of memory; you spoke as if there are only two factors at play -- CPU and Disk. Compression of data on disk is used in order to allow the OS's disk cache to
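As a rough illustration of the sharding suggestion, the sketch below builds a Collections API CREATE request with `numShards=8`. The host, the collection name `test_collection`, the `replicationFactor` value and the helper name are assumptions, not details from the thread.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor=1):
    """Build a Collections API CREATE URL for a sharded collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,  # one shard per CPU, as suggested above
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?" + urlencode(params)


# Host and collection name are assumed for the example.
print(create_collection_url("http://localhost:8983/solr", "test_collection", 8))
```

Each shard is searched independently, which is what allows a single request to use more than one CPU on the server side.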

RE: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-14 Thread Fikavec F
Thank you for working on the Solr performance issues raised here. LZ4 is a great solution, but let's look at how things are today. As far as I understand, uncompressed fields have been abandoned since version 4.1 (early 2013). At that time, 15,000 RPM SAS disks produced 350 MB/s or 150-180 IOPS

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-12 Thread Ishan Chattopadhyaya
> Again, it's worth being aware that what you are doing is very far afield > from what a search engine is *for*. So yeah... performance may not be so > great. Solr users want top-X documents sorted by something, and/or maybe > some facets/stats summarizing fields. Not all docs. Optimizing

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-12 Thread David Smiley
There is compression of stored data; I don't think it makes sense to disable it. The default compression is LZ4, which is the "BEST_SPEED" option offered by Lucene compared to others. Back in 2015, when the article you quoted was written, this faster option wasn't available. I don't see a no-compression option:

RE: RE: Re: Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-12 Thread Fikavec F
I checked how increasing outputAggregationSize and outputBufferSize in jetty.xml affects the data transfer speed. - outputBufferSize - sets the size of the buffer into which response content is aggregated before being sent to the client. A larger buffer can improve performance

Re: Re: Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-09 Thread Noble Paul
@Fikavec Can you please post the above comments on the PR? It would serve as a record. On Thu, Mar 9, 2023 at 9:07 AM Fikavec F wrote: > Thank you for your work Noble Paul, it's very interesting. > You were so attentive that you noticed and replaced "else if (val > instanceof Double) {

RE: Re: Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-08 Thread Fikavec F
Thank you for your work Noble Paul, it's very interesting.   You were so attentive that you noticed and replaced "else if (val instanceof Double) { gen.writeNumber(val.floatValue()); }" with "else if (val instanceof Double) { gen.writeNumber(val.doubleValue()); }". But I just copied the code (Line

Re: Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-08 Thread Noble Paul
I've opened the PR https://github.com/apache/solr/pull/1440 Some tests are still failing. I'll address them later. On Tue, Mar 7, 2023 at 1:56 PM David Smiley wrote: > Fantastic! > I really appreciate you working with the community on this one. > > ~ David Smiley > Apache Lucene/Solr Search

Re: Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-06 Thread David Smiley
Fantastic! I really appreciate you working with the community on this one. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Mon, Mar 6, 2023 at 6:32 PM Fikavec F wrote: > Thank you, you are very kind. I took measurements on two physical servers >

RE: Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-06 Thread Fikavec F
Thank you, you are very kind. I took measurements on two physical servers with a 10 Gigabit link. The speed and time of fully fetching a 10 Gb collection (one shard; empty "Accept-Encoding: " header; collection with only id and string stored fields) are as follows: original wt=json - 419 Mb/s

Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-06 Thread Gus Heck
Keep in mind Yonik's law of half-baked patches: "A half-baked patch with no documentation, no tests and no backwards compatibility is better than no patch at all." On Sun, Mar 5, 2023 at 5:43 PM Fikavec F wrote: > Thanks. In the coming days I will conduct testing and measurements on real >

Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-05 Thread Ishan Chattopadhyaya
Here it is, Noble. https://github.com/Fikavec/NewAndModifiedSolrResponseWriters On Mon, 6 Mar, 2023, 7:49 am Noble Paul, wrote: > Hi, > > You can just point me to your repo and I can open a proper PR with that > code > > On Mon, Mar 6, 2023, 9:43 AM Fikavec F wrote: > > > Thanks. In the coming

Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-05 Thread Noble Paul
Hi, You can just point me to your repo and I can open a proper PR with that code On Mon, Mar 6, 2023, 9:43 AM Fikavec F wrote: > Thanks. In the coming days I will conduct testing and measurements on real > hardware. >Unfortunately my code is not ready to become part of the project >

RE: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-05 Thread Fikavec F
Thanks. In the coming days I will conduct testing and measurements on real hardware. Unfortunately my code is not ready to become part of the project directly: this is a very sensitive place for changes, I am not a Java developer, and I am not deeply familiar with the workings of internal Solr

Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-05 Thread Ishan Chattopadhyaya
Wow, that's absolutely fantastic. Would you like to contribute it upstream, maybe a PR? On Sun, 5 Mar, 2023, 9:21 pm Fikavec F, wrote: > Thanks to everyone for your help (especially Mikhail Khludnev and Michael > Gibney)! > >I rewrote JSONResponseWriter using fasterxml jackson library >

RE: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-03-05 Thread Fikavec F
Thanks to everyone for your help (especially Mikhail Khludnev and Michael Gibney)! I rewrote JSONResponseWriter using the FasterXML Jackson library (com.fasterxml.jackson.core.JsonGenerator, I took the SmileResponseWriter code as a basis) and added it to the collection as my own queryResponseWriter

Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-27 Thread Ishan Chattopadhyaya
> Does Solr have a project to track basic performance metrics from version to version, similar to Rally (the macro-benchmarking framework for Elasticsearch) ? There's an ongoing effort to have

RE: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-27 Thread Fikavec F
Thank you for helping me figure out the performance of "Response Writers". If you wish, you can close the topic or move it to issues under the title performance of non-binary types of "Response Writers" (if in your opinion the difference of 4-8 times between wt=javabin and wt=json is not normal).

Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-27 Thread Michael Gibney
This is all really interesting, thanks for the detailed feedback (esp. for trying out 9.1)! I think the optimization I mentioned from 9.1 is not relevant, but I should have mentioned: the optimization in 9.1 only applies to `*:*` with sort-by-score. Including the `sort=[anything-other-than-score]`

Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-27 Thread Ishan Chattopadhyaya
Fikavec, thanks for your thorough analysis. It will help Solr. To be clear, 8.11 vs 9.1 Solr has a degradation from 3.66 Gbps to 1.5 Gbps with wt=javabin? If yes, this requires fixing. Can you please help us reproduce it? On Mon, 27 Feb, 2023, 6:41 am Fikavec F, wrote: > David Smiley, sorry

RE: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-26 Thread Fikavec F
David Smiley, sorry for my terminology, I’m used to calling a full fetch of data in small parts from a DB table (collection) "scrolling". Of course, in Solr cursors (cursorMark) are designed for this and I use them. Large "rows" values in my examples (measurements) are needed to show the speed at

Re: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-26 Thread Mikhail Khludnev
> > Isn't there a deserializer of Solr javabin format in python/json Well, you can try to marry https://solr.apache.org/guide/solr/latest/query-guide/response-writers.html#smile-response-writer + https://github.com/jhosmer/PySmile On Mon, Feb 27, 2023 at 12:46 AM Fikavec F wrote: > Thank you

RE: Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-26 Thread Fikavec F
Thank you for your help with slow single threaded data receiving from Solr. Today I was able to reach a speed of 3Gigabit+ and got results that may be useful in the future. I turned out to be wrong in assuming that the main problem is in the FastWriter output buffer, but this was the most obvious

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-26 Thread David Smiley
You used the word "scroll" a lot. Can you elaborate? Search is generally optimized for returning top-X where X is not large. My suspicion is that you want lots of results back. You might want to use cursorMark as described here:
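A minimal sketch of the cursorMark pattern mentioned above: start with `cursorMark=*`, sort on the unique key, and keep requesting pages until the returned `nextCursorMark` stops changing. The `fetch_page` callable and the in-memory `fake_solr` stand-in are hypothetical; in real use `fetch_page` would issue an HTTP request to the collection's /select handler and return the parsed JSON.

```python
def fetch_all(fetch_page, rows=1000, sort="id asc"):
    """Yield every document by following cursorMark until it stops changing."""
    cursor = "*"  # initial cursor value defined by Solr
    while True:
        params = {"q": "*:*", "sort": sort, "rows": rows, "cursorMark": cursor}
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor


def fake_solr(params):
    """Hypothetical stand-in for Solr; the cursor is simply an offset here."""
    docs = [{"id": str(i)} for i in range(5)]
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = docs[start:start + params["rows"]]
    next_cursor = str(start + len(page)) if page else params["cursorMark"]
    return {"response": {"docs": page}, "nextCursorMark": next_cursor}


ids = [doc["id"] for doc in fetch_all(fake_solr, rows=2)]
print(ids)
```

Passing the transport in as a callable keeps the paging logic independent of the HTTP client and the `wt` format being measured.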

Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-26 Thread Michael Gibney
> I'm not sure there's a shortcut bypassing ordering results through heap To expand on this a bit: the behavior Mikhail describes changes as of solr 9.1 (https://issues.apache.org/jira/browse/SOLR-14765), which introduces exactly the proposed bypass. The extra overhead (pre-9.1) scales linear wrt

Re: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-25 Thread Mikhail Khludnev
As said above, the speed of streaming a file to a socket via OS internals is hardly achievable with Java code crunching bytes through the heap. Using RamDirectory might put pressure on GC; it's better to stick with the default one and leave enough RAM for the file cache. Regarding the actual params:

RE: Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-25 Thread Fikavec F
Thanks for the patch for testing. I could not see significant improvements on virtual machines; I will try again this week on servers. I tried the following values for buffers: 65536 - 64 KB, 262144 - 256 KB, 524288 - 512 KB, 1048576 - 1 MB, 4194304 - 4 MB, 16777216 - 16 MB, 33554432 - 32 MB, 67108864 -

RE: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-25 Thread Fikavec F
Today I created a 1GB collection with RAMDirectoryFactory and only two fields in the schema: id and text_sn (stored, unindexed, without docValues). In solrconfig.xml: ... and in section: ${solr.lock.type:single} In managed-schema: ... ... Receiving all data from it is also carried out at slow speeds of

Re: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-25 Thread Mikhail Khludnev
Hi, I made it tunable. It should pick the value up from the OS environment variable BUFSIZE=8192, and the Java property -DBUFSIZE=8192. It logs the actual value at INFO for FastWriter https://github.com/mkhludnev/lucene-solr/commit/f0ed425bcbb16c50d01f3ff7fba3879148f50568 Here's a jar

RE: Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-24 Thread Fikavec F
jetty.xml is used in the default Solr configuration; a file downloaded over HTTP from nginx may be faster because /etc/nginx/sites-available/default contains: location / { aio threads; sendfile on; } I reproduced this in Ubuntu on virtual machines on a home PC: # 0 set /etc/sysctl.conf settings from

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-24 Thread Mikhail Khludnev
Hello, Can you rebuild the jar with a bigger buffer and benchmark to confirm the hypothesis? On Fri, Feb 24, 2023 at 6:34 PM Fikavec F wrote: > I installed Solr 8.11.1 (SOLR_JAVA_MEM="-Xms31g -Xmx31g") into a RAM disk > on a high-performance server with 10-Gigabit network adapters. Jumbo Frames >

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-24 Thread Kevin Risden
> > But data recieving speed on simple solr scroll with query *:* on 250Gb > collection (10 shards) by id never speeds up 200 Megabits without jetty > tuning and 350 Megabits with jetty tuning (10GB files from tuned solr jetty > (like /mnt/ramdisk/solr/server/solr-webapp/webapp/testfile.bin)

Re: Low untunable default FastWriter output buffer - possible reason for slow single threaded data receiving from Solr on 1Gigabit+ networks while scroll, search etc

2023-02-24 Thread Ishan Chattopadhyaya
Nice catch! Sounds reasonable to me that it should be configurable. Would you like to open a JIRA to submit a patch for this (or would you rather someone else pick it up)? On Fri, Feb 24, 2023 at 9:04 PM Fikavec F wrote: > I installed Solr 8.11.1 (SOLR_JAVA_MEM="-Xms31g -Xmx31g") into ram