Thompsonbry.systap added a comment.

In terms of the machine shape, the general guidelines you give are appropriate. However, here is how it plays out in terms of GC.

Large heaps => long GC pauses. So you want to keep the JVM heap fairly small (4G to 8G). Analytic queries can use the native C process heap for hash index joins and (in the future) for storing intermediate solutions. So the actual C process heap (for the JVM) can be bigger than the JVM heap.

If you are bulk loading data then you want more write cache buffers. Those are 1MB buffers. You can have anywhere from 6 to 1000s of them. This also helps for bulk load onto disks that cannot reorder writes (SATA).
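
The sizing guidance above can be turned into rough arithmetic. The sketch below is only an illustration: the function name, the 64 GiB machine, and the 8 GiB native heap figure are assumptions made up for the example, and it assumes the write cache buffers are accounted for separately from the JVM heap. Only the 1MB buffer size and the 4G to 8G heap range come from the comment.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# From the comment: each write cache buffer is 1MB.
WRITE_CACHE_BUFFER_BYTES = 1 * MIB


def fs_cache_bytes(total_ram, jvm_heap, write_cache_buffers, native_query_heap):
    """Estimate the RAM left over for the OS file system cache.

    Hypothetical helper: subtracts the JVM heap, the write cache buffers,
    and the native heap used by analytic queries from the total RAM.
    """
    process_bytes = (
        jvm_heap
        + write_cache_buffers * WRITE_CACHE_BUFFER_BYTES
        + native_query_heap
    )
    if process_bytes > total_ram:
        raise ValueError("process memory exceeds the RAM on the machine")
    return total_ram - process_bytes


# Assumed example: 64 GiB machine, 8 GiB JVM heap, 1000 write cache
# buffers, 8 GiB of native heap for analytic queries.
remaining = fs_cache_bytes(64 * GIB, 8 * GIB, 1000, 8 * GIB)
print(f"{remaining / GIB:.2f} GiB left for the file system cache")
```

On the JVM side the heap cap itself is normally set with the standard -Xmx option (for example -Xmx8g); everything the function returns is memory the operating system can use for caching.
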
The rest of that RAM is going to buffer the file system and decrease IO Wait.

Some of our customers also use warmup procedures to avoid poor cold start performance. There are a couple of aspects to the cold start issue. One is just that things are slow because they are on the disk. Another is that the JVM has not yet optimized the code. Yet another is that the data has a longer dwell time during query execution because it takes longer to execute the query. This makes the GC overhead higher for cold disk / cold JVM scenarios.

One warmup procedure is just to copy the journal file to /dev/null. Just get it into the OS cache. Another is to run

http://.../bigdata/status?dumpJournal&dumpPages=true

This will run through all of the indices and visit all of their pages, and it provides some interesting reporting.

We have been discussing a warmup procedure based on this but which only visits the non-leaf nodes of the indices. After that warmup, reading any leaf would take just a single IO. That should eliminate most of the IO Wait and GC burden associated with slamming a cold node.

And if you are load balancing across nodes, then you can obviously just load balance based on metrics and gradually shift more load to a node as it heats up.

Thanks,
Bryan

TASK DETAIL
  https://phabricator.wikimedia.org/T90116
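
The first warmup procedure mentioned in the comment (copying the journal file to /dev/null so that it ends up in the OS cache) amounts to reading the file once from start to end and throwing the bytes away. A minimal sketch of that idea follows; the function name and the 8 MiB chunk size are arbitrary choices for the example, and the path of the journal file is whatever your installation uses.

```python
def warm_file(path, chunk_size=8 * 1024 * 1024):
    """Read a file sequentially and discard the data.

    Hypothetical helper: the reads pull the file's pages into the OS
    file system cache, as far as free RAM allows. Returns the number
    of bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            total += len(chunk)
    return total
```

Calling warm_file on the journal path before sending queries to a cold node has the same effect as the copy to /dev/null. It only helps if the machine has enough free RAM to keep the file cached.
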
