Hi Yakov,

Thanks again for your help, I guess my goal here was to measure how a cluster 
behaves if requests come at a certain rate (hence including the 
queuing/slowdown was intentional), the Executor has 128 threads just to make 
sure Ignite’s pipes are kept full, which should be equivalent of starting a 
thread at each call (given the ignite cluster has only 12 CPUs), is there 
something I am missing?

For comparison purposes I have now measured time purely around the query 
(vertical latency), here is what I see at different # of elements in the cache:

Elements

50

75

90

95

99

100

5000

142

143

144

146

155

157

10000

152

153

154

158

175

177

50000

154

157

161

166

178

179

100000

167

170

180

185

188

189


There must be some significant overhead as the increase of timing for more 
elements is very flat. MyObject and CacheConfig enclosed here:

public class MyObject implements Serializable{

       @QuerySqlField
       private long id;

       @QuerySqlField
       private String objectID;

       @QuerySqlField(index = true)
       private String field0;

       @QuerySqlField
       private long field1;

       @QuerySqlField
       private double field2;

       public MyObject (){
       }

       public MyObject (long id,String objectID, String field0, long field1, 
double field2) {
              super();
              this.id = id;
              this. objectID = objectID;
              this. field0= field0;
              this. field1= field1;
              this. field2= field2;
              ;
       }
// Getters for each field
}

public static CacheConfiguration<String, Ord> cache() {
       CacheConfiguration<String, Ord> cfg = new CacheConfiguration<String, 
Ord>("ord");
       // Index the words and their counts,
       // so we can use them for fast SQL querying.
       cfg.setIndexedTypes(String.class, MyObject.class);
       cfg.setCacheMode(CacheMode.PARTITIONED);
       // Sliding window of 5 seconds.
       cfg.setExpiryPolicyFactory(FactoryBuilder.factoryOf(new 
EternalExpiryPolicy()));
return cfg;
}

Thanks
Manuel

From: Yakov Zhdanov [mailto:[email protected]]
Sent: 03 August 2016 15:28
To: Piubelli, Manuel [ICG-IT]
Cc: [email protected]
Subject: Re: Ignite performance

How many concurrent queries do you have? Or in other words - how many threads 
does your executor have? If it has several ones then I understand why CPU load 
goes up to max.

I am also not sure about the measurements. All your queries get scheduled 
immediately and initialize startTime at schedule, but the time they stay queued 
is still accounted. Correct? If yes, I would suggest you rewrite your 
benchmark. Just start N threads and make each thread submit your query 
measuring the time.

Please set "peerClassLoading" to false. Please provide cache configuration as 
well and MyObject.java

--Yakov

2016-08-03 16:47 GMT+03:00 Piubelli, Manuel 
<[email protected]<mailto:[email protected]>>:
Hi Yakov,

Thank you very much for your reply, addressing a few questions:

> “Is it correct that you run your query in a loop”

The query is run asynchronously, as I want to simulate multiple clients hitting 
the cluster at the same time (see queries/second) so its technically not a loop 
but just scheduled callables.

> “giving enough time for the whole cluster to warmup and only then take the 
> final measurements?”

I have a non-measured warmup round firing queries for 5 minutes before starting 
the measurements.

> I also do not understand why CPU load is 400% which may be interpreted as 
> full (correct?). This means that at least 4 threads are busy on each node, 
> but when you broadcast your query it is processed with only 1 thread on each 
> node.

Yes, I noticed up to 400% which is full (my box has 4 Cores), I would explain 
this with the fact that the ignite cluster is hit with other requests while it 
is processing the first, would that explain it?

>Having this in mind you can try launching 1 node per 1 core on each server - 
>this will split your data set and will lower the amount of work for each node.

Would this mean that instances compete more for the same cores in a high 
throughput scenario? Is there a way to have one node restricted to one process 
– or should I lower the size of the thread pool?

Code (SQL example) :
//Async Test Runner
final Queue<Integer> timings = new LinkedBlockingQueue<Integer>();
              for (long i = 0; i < requestsPerSecond * testTime; i++) {
                     PerformanceTest test = 
PerformanceTestFactory.getIgnitePerformanceTest(ignite,testName,timings);
                     executor.schedule(test, 0, TimeUnit.SECONDS);
                     Thread.sleep(1000/requestsPerSecond);
              }
              executor.shutdownNow();

//Runnable class
public abstract class PerformanceTest implements Runnable {

       protected Ignite ignite;
       private Queue<Integer> timings;
       private long startTime;
       protected IgniteCache<String,BinaryObject> cache;

       public PerformanceTest(Ignite ignite,Queue<Integer> timings) {
              super();
              this.ignite = ignite;
              this.timings = timings;
              this.cache = ignite.cache("MyObjCache").withKeepBinary();
              this.startTime = System.currentTimeMillis();
       }

       @Override
       public void run() {
              runTest();
              this.timings.add((int) (System.currentTimeMillis() - 
this.startTime));
       }

       public abstract void runTest();

}
//Runnable subclass i.e. Test
public class SQLQueryPerformanceTest extends PerformanceTest{

       private static final String queryString = "select SUM(field1 * 
field2)/SUM(field2) as perf from MyObj ";
       private final SqlFieldsQuery query;

       public SQLQueryPerformanceTest(Ignite ignite, Queue<Integer> timings) {
              super(ignite, timings);
              this.query = new SqlFieldsQuery(queryString);
       }

       @Override
       public void runTest() {
              this.cache.query(query).getAll();
       }
}

Ignite configuration:
<bean abstract="true" id="ignite.cfg" 
class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Set to true to enable distributed class loading for examples, 
default is false. -->
        <property name="peerClassLoadingEnabled" value="true"/>

        <!-- Explicitly configure TCP discovery SPI to provide list of initial 
nodes. -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean 
class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                        <property name="addresses">
                            <list>
                                <!-- In distributed environment, replace with 
actual host IP address. -->
                                <value>127.0.0.1:47500..47509</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>


From: Yakov Zhdanov [mailto:[email protected]<mailto:[email protected]>]
Sent: 03 August 2016 13:37
To: [email protected]<mailto:[email protected]>
Subject: Re: Ignite performance

Manuel,

The numbers you are seeing are pretty strange to me.

Is it correct that you run your query in a loop giving enough time for the 
whole cluster to warmup and only then take the final measurements?
I also do not understand why CPU load is 400% which may be interpreted as full 
(correct?). This means that at least 4 threads are busy on each node, but when 
you broadcast your query it is processed with only 1 thread on each node. 
Having this in mind you can try launching 1 node per 1 core on each server - 
this will split your data set and will lower the amount of work for each node. 
However question with high CPU utilization is still open. Can you please 
provide stack for those threads if they are Ignite threads. You can follow 
these instructions - 
https://blogs.oracle.com/jiechen/entry/analysis_against_jvm_thread_dump<https://urldefense.proofpoint.com/v2/url?u=https-3A__blogs.oracle.com_jiechen_entry_analysis-5Fagainst-5Fjvm-5Fthread-5Fdump&d=CwMFaQ&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=TmHAzydJEwZXF4nTEyGZO2lJF7c9EsVzP6DtLQQqOVQ&m=B78AXF0DliAyJs9kSHwzknrhfV3OSCx-EADJCFVe5Qs&s=eyvb9bLoPlaPiuMo7oQfUhIu_7_zwwdtYudCAxmnpyY&e=>

Please tell me what machines you are running this test on. I would ask you to 
do all measurements on hardware machines (not virtual) giving all resources to 
Ignite.

Please also share your code and configuration for cluster nodes.

--Yakov

2016-08-03 12:49 GMT+03:00 Piubelli, Manuel 
<[email protected]<mailto:[email protected]>>:

Hello,
I am currently benchmarking Apache Ignite for a near real-time application and 
simple operations seem to be excessively slow for a relatively small sample 
size. The following is giving the setup details and timings - please see 2 
questions at the bottom.
Setup:
•         Cache mode: Partitioned
•         Number of server nodes: 3
•         CPUs: 4 per node (12)
•         Heap size: 2GB per node (6GB)
The first use case is computing the weighted average over two fields of the 
object at different rates.
First method is to run a SQL style query:
...
query = new SqlFieldsQuery("select SUM(field1*field2)/SUM(field2) from 
MyObject");
cache.query(query).getAll();
....
The observed timings are:
Cache: 500,000 Queries/second: 10
Median: 428ms, 90th percentile: 13,929ms
Cache: 500,000 Queries/second: 50
Median: 191,465ms, 90th percentile: 402,285ms
Clearly this is queuing up with an enormous latency (>400 ms), a simple 
weighted average computation on a single jvm (4 Cores) takes 6 ms.
The second approach is to use the IgniteCompute to broadcast Callables across 
nodes and compute the weighted average on each node, reducing at the caller, 
latency is only marginally better, throughput improves but still at unusable 
levels.
Cache: 500,000 Queries/second: 10
Median: 408ms, 90th percentile: 507ms
Cache: 500,000 Queries/second: 50
Median: 114,155ms, 90th percentile: 237,521ms
A few things i noticed during the experiment:
•         No disk swapping is happening
•         CPUs run at up to 400%
•         Query is split up in two different weighted averages (map reduce)
•         Entries are evenly split across the nodes
•         No garbage collections are triggered with each heap size around 500MB
To my questions:
1.    Are these timings expected or is there some obvious setting i am missing? 
I could not find benchmarks on similar operations.
2.    What is the advised method to run fork-join style computations on ignite 
without moving data?

Thank you

Manuel


Reply via email to