How many concurrent queries do you have? In other words, how many threads does your executor have? If it has several, then I understand why CPU load goes up to the maximum.
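To make the number of concurrent queries explicit, a closed-loop benchmark could look roughly like the sketch below: a fixed pool of N threads, each issuing the query back to back and timing only the call itself, so time spent waiting in an executor queue never enters the numbers. This is only a sketch under assumptions: the class name ClosedLoopBenchmark, the helpers run and percentile, and the in-memory weighted-average stand-in are made up for illustration. In the real test the Runnable would presumably wrap cache.query(query).getAll().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Ignite or of the original benchmark.
class ClosedLoopBenchmark {

    // Keeps the stand-in computation from being optimized away.
    static volatile double sink;

    /**
     * Starts `threads` workers. Each worker calls `query` `iterations` times
     * and records the duration of every call in nanoseconds.
     * Returns all recorded durations, sorted ascending.
     */
    static List<Long> run(int threads, int iterations, Runnable query) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<List<Long>> worker = () -> {
                List<Long> local = new ArrayList<>(iterations);
                for (int i = 0; i < iterations; i++) {
                    long start = System.nanoTime(); // taken right before the call, not at scheduling time
                    query.run();
                    local.add(System.nanoTime() - start);
                }
                return local;
            };
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(worker));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get());
            }
            Collections.sort(all);
            return all;
        } finally {
            pool.shutdown();
        }
    }

    /** Nearest-rank percentile over an ascending-sorted list, p in (0, 100]. */
    static long percentile(List<Long> sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.size()) - 1;
        return sorted.get(Math.max(0, Math.min(idx, sorted.size() - 1)));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real query: a local weighted average over two arrays.
        int n = 500_000;
        double[] field1 = new double[n];
        double[] field2 = new double[n];
        for (int i = 0; i < n; i++) {
            field1[i] = i % 100;
            field2[i] = 1 + (i % 7);
        }
        Runnable query = () -> {
            double num = 0, den = 0;
            for (int i = 0; i < n; i++) {
                num += field1[i] * field2[i];
                den += field2[i];
            }
            sink = num / den;
        };

        run(4, 20, query);                       // unmeasured warmup round
        List<Long> nanos = run(4, 100, query);   // N = 4 threads, 100 calls each
        System.out.printf("median: %.3f ms, 90th percentile: %.3f ms%n",
                percentile(nanos, 50) / 1e6, percentile(nanos, 90) / 1e6);
    }
}
```

With this shape the concurrency level is simply the thread count passed to run, so it can be raised step by step until latency starts to degrade.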
I am also not sure about the measurements. All your queries get scheduled immediately and initialize startTime when they are scheduled, so the time they spend queued is still counted. Correct? If yes, I would suggest you rewrite your benchmark: just start N threads and have each thread submit your query and measure the time.

Please set "peerClassLoadingEnabled" to false.

Please also provide the cache configuration and MyObject.java.

--Yakov

2016-08-03 16:47 GMT+03:00 Piubelli, Manuel <[email protected]>:

> Hi Yakov,
>
> Thank you very much for your reply, addressing a few questions:
>
> > “Is it correct that you run your query in a loop”
>
> The query is run asynchronously, as I want to simulate multiple clients hitting the cluster at the same time (see queries/second) so its technically not a loop but just scheduled callables.
>
> > “giving enough time for the whole cluster to warmup and only then take the final measurements?”
>
> I have a non-measured warmup round firing queries for 5 minutes before starting the measurements.
>
> > I also do not understand why CPU load is 400% which may be interpreted as full (correct?). This means that at least 4 threads are busy on each node, but when you broadcast your query it is processed with only 1 thread on each node.
>
> Yes, I noticed *up to* 400% which is full (my box has 4 Cores), I would explain this with the fact that the ignite cluster is hit with other requests while it is processing the first, would that explain it?
>
> > Having this in mind you can try launching 1 node per 1 core on each server - this will split your data set and will lower the amount of work for each node.
>
> Would this mean that instances compete more for the same cores in a high throughput scenario? Is there a way to have one node restricted to one process – or should I lower the size of the thread pool?
>
> Code (SQL example):
>
> //Async Test Runner
> final Queue<Integer> timings = new LinkedBlockingQueue<Integer>();
> for (long i = 0; i < requestsPerSecond * testTime; i++) {
>     PerformanceTest test = PerformanceTestFactory.getIgnitePerformanceTest(ignite,testName,timings);
>     executor.schedule(test, 0, TimeUnit.SECONDS);
>     Thread.sleep(1000/requestsPerSecond);
> }
> executor.shutdownNow();
>
> //Runnable class
> public abstract class PerformanceTest implements Runnable {
>
>     protected Ignite ignite;
>     private Queue<Integer> timings;
>     private long startTime;
>     protected IgniteCache<String,BinaryObject> cache;
>
>     public PerformanceTest(Ignite ignite,Queue<Integer> timings) {
>         super();
>         this.ignite = ignite;
>         this.timings = timings;
>         this.cache = ignite.cache("MyObjCache").withKeepBinary();
>         this.startTime = System.currentTimeMillis();
>     }
>
>     @Override
>     public void run() {
>         runTest();
>         this.timings.add((int) (System.currentTimeMillis() - this.startTime));
>     }
>
>     public abstract void runTest();
>
> }
>
> //Runnable subclass i.e. Test
> public class SQLQueryPerformanceTest extends PerformanceTest{
>
>     private static final String queryString = "select SUM(field1 * field2)/SUM(field2) as perf from MyObj ";
>     private final SqlFieldsQuery query;
>
>     public SQLQueryPerformanceTest(Ignite ignite, Queue<Integer> timings) {
>         super(ignite, timings);
>         this.query = new SqlFieldsQuery(queryString);
>     }
>
>     @Override
>     public void runTest() {
>         this.cache.query(query).getAll();
>     }
> }
>
> Ignite configuration:
>
> <bean abstract="true" id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>     <!-- Set to true to enable distributed class loading for examples, default is false. -->
>     <property name="peerClassLoadingEnabled" value="true"/>
>
>     <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
>     <property name="discoverySpi">
>         <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>             <property name="ipFinder">
>                 <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
>                     <property name="addresses">
>                         <list>
>                             <!-- In distributed environment, replace with actual host IP address. -->
>                             <value>127.0.0.1:47500..47509</value>
>                         </list>
>                     </property>
>                 </bean>
>             </property>
>         </bean>
>     </property>
> </bean>
>
> *From:* Yakov Zhdanov [mailto:[email protected]]
> *Sent:* 03 August 2016 13:37
> *To:* [email protected]
> *Subject:* Re: Ignite performance
>
> Manuel,
>
> The numbers you are seeing are pretty strange to me.
>
> Is it correct that you run your query in a loop giving enough time for the whole cluster to warmup and only then take the final measurements?
>
> I also do not understand why CPU load is 400% which may be interpreted as full (correct?). This means that at least 4 threads are busy on each node, but when you broadcast your query it is processed with only 1 thread on each node. Having this in mind you can try launching 1 node per 1 core on each server - this will split your data set and will lower the amount of work for each node. However question with high CPU utilization is still open. Can you please provide stack for those threads if they are Ignite threads. You can follow these instructions - https://blogs.oracle.com/jiechen/entry/analysis_against_jvm_thread_dump
>
> Please tell me what machines you are running this test on.
> I would ask you to do all measurements on hardware machines (not virtual) giving all resources to Ignite.
>
> Please also share your code and configuration for cluster nodes.
>
> --Yakov
>
> 2016-08-03 12:49 GMT+03:00 Piubelli, Manuel <[email protected]>:
>
> Hello,
>
> I am currently benchmarking Apache Ignite for a near real-time application and simple operations seem to be excessively slow for a relatively small sample size. *The following is giving the setup details and timings - please see 2 questions at the bottom.*
>
> Setup:
>
> · Cache mode: Partitioned
> · Number of server nodes: 3
> · CPUs: 4 per node (12)
> · Heap size: 2GB per node (6GB)
>
> The first use case is computing the weighted average over two fields of the object at different rates.
>
> First method is to run a SQL style query:
>
> ...
> query = new SqlFieldsQuery("select SUM(field1*field2)/SUM(field2) from MyObject");
> cache.query(query).getAll();
> ....
>
> The observed timings are:
>
> Cache: 500,000 Queries/second: 10
> Median: 428ms, 90th percentile: 13,929ms
>
> Cache: 500,000 Queries/second: 50
> Median: 191,465ms, 90th percentile: 402,285ms
>
> Clearly this is queuing up with an enormous latency (>400 ms), a simple weighted average computation on a single jvm (4 Cores) takes 6 ms.
>
> The second approach is to use the IgniteCompute to broadcast Callables across nodes and compute the weighted average on each node, reducing at the caller, latency is only marginally better, throughput improves but still at unusable levels.
>
> Cache: 500,000 Queries/second: 10
> Median: 408ms, 90th percentile: 507ms
>
> Cache: 500,000 Queries/second: 50
> Median: 114,155ms, 90th percentile: 237,521ms
>
> A few things i noticed during the experiment:
>
> · No disk swapping is happening
> · CPUs run at up to 400%
> · Query is split up in two different weighted averages (map reduce)
> · Entries are evenly split across the nodes
> · No garbage collections are triggered with each heap size around 500MB
>
> To my questions:
>
> 1. *Are these timings expected or is there some obvious setting i am missing? I could not find benchmarks on similar operations.*
>
> 2. *What is the advised method to run fork-join style computations on ignite without moving data?*
>
> Thank you
>
> Manuel
