Hello,
* This post is mainly for the attention of any V8 Developers who might be
monitoring this group. However, all thoughts and suggestions are welcome!
First, some background. I'm developing an interface to a database
specifically for the Node.js environment. The database in question can
operate as an embedded entity, so I am using the published Node.js/V8 API
to communicate with it via its own C API. I acknowledge that this is a
slightly unusual approach in that the interface between JavaScript and
other databases is often created over TCP infrastructure. However, with the
embedded architecture there is, quite understandably, an expectation of
high performance - certainly much higher than what could otherwise be
achieved over TCP.
The requirement for high performance has led me to spend some time
analyzing the throughput of various aspects of the V8 API. As a result, I
have found that there is a particular problem/bottleneck in marshaling
string-based data between the JavaScript environment and C/C++ - which of
course is an essential part of establishing close-coupled lines of
communication at this level.
The following simple benchmark illustrates this performance issue.
Basically, I use the following simple Node.js/JavaScript code to call the
'db.benchmark()' method 1,000,000,000 times and record the time taken.
Although the code implies that a connection to the database is made, no
database is used, or even loaded, in these tests. The source code for the
various incarnations of the 'db.benchmark()' method is included together
with the timing results obtained.
JavaScript Benchmark Code:
var my_database = require('db_api');
var db = new my_database.db_api();
var max = 1000000000;
var d1 = new Date();
var d1_ms = d1.getTime();
console.log("d1: " + d1.toISOString());
for (var n = 0; n < max; n++) {
   db.benchmark("Input String");
}
var d2 = new Date();
var d2_ms = d2.getTime();
var diff = d2_ms - d1_ms;
console.log("\nd2: " + d2.toISOString());
console.log("diff: " + diff + " secs: " + (diff / 1000));
First, let’s create a baseline by commenting out the benchmark call in the
JavaScript code …
//db.benchmark("Input String");
Results:
d1: 2016-05-16T09:55:01.919Z
d2: 2016-05-16T09:55:06.589Z
diff: 4670 secs: 4.67
Now let’s create a second baseline by reinstating the JavaScript call
'db.benchmark("Input String")' but making the C++ implementation of the
benchmark method an empty stub that does absolutely nothing ...
// (assumes the usual Node.js addon boilerplate: #include <node.h> and
// 'using namespace v8;')
static void benchmark(const FunctionCallbackInfo<Value>& args)
{
   // interact with a database via its C API
   return;
}
Results:
d1: 2016-05-16T09:59:18.915Z
d2: 2016-05-16T09:59:38.933Z
diff: 20018 secs: 20.018
This tells us that calling a C/C++ method/function that does absolutely
nothing (no inputs or outputs to process) is moderately expensive on its
own.
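Subtracting the empty-loop baseline, that is (20018 - 4670) ms spread over
1,000,000,000 calls, i.e. roughly 15 ns per no-op native call.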
Next, let’s accept a single string argument and copy it to a C character
buffer for use with the database API …
static void benchmark(const FunctionCallbackInfo<Value>& args)
{
   char c_input[256];
   // copy the JavaScript string argument into a C buffer (no length check
   // needed here: the benchmark always passes the same short, fixed input)
   Local<String> input = args[0]->ToString();
   input->WriteUtf8(c_input);
   // interact with a database via its C API
   return;
}
Results:
d1: 2016-05-16T10:02:54.355Z
d2: 2016-05-16T10:04:26.832Z
diff: 92477 secs: 92.477
We see a significant performance hit simply from marshaling a JavaScript
string into a C buffer.
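Relative to the no-op baseline, that is (92477 - 20018) ms over
1,000,000,000 calls, i.e. roughly 72 ns per call just for the
ToString()/WriteUtf8() step.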
Next, let’s generate an output for JavaScript derived from a string held in
a C buffer …
static void benchmark(const FunctionCallbackInfo<Value>& args)
{
   char c_output[256];
   Isolate* isolate = args.GetIsolate();
   HandleScope scope(isolate);
   // interact with a database via its C API
   strcpy(c_output, "Output String");
   Local<String> output = String::NewFromUtf8(isolate, c_output);
   args.GetReturnValue().Set(output);
   return;
}
Results:
d1: 2016-05-16T10:14:30.399Z
d2: 2016-05-16T10:17:16.905Z
diff: 166506 secs: 166.506
Creating a JavaScript string resource from a C buffer is also expensive.
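Relative to the no-op baseline, that is (166506 - 20018) ms over
1,000,000,000 calls, i.e. roughly 146 ns per call for NewFromUtf8() plus
setting the return value.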
Finally, let's put it all together by accepting a string argument and
returning a string output …
static void benchmark(const FunctionCallbackInfo<Value>& args)
{
   char c_input[256], c_output[256];
   Isolate* isolate = args.GetIsolate();
   HandleScope scope(isolate);
   Local<String> input = args[0]->ToString();
   input->WriteUtf8(c_input);
   // interact with a database via its C API
   strcpy(c_output, "Output String");
   Local<String> output = String::NewFromUtf8(isolate, c_output);
   args.GetReturnValue().Set(output);
   return;
}
Results:
d1: 2016-05-16T10:28:47.164Z
d2: 2016-05-16T10:32:58.709Z
diff: 251545 secs: 251.545
This last experiment is fairly representative of the 'real world':
string-based data sent to the database (update operations) and string-based
data returned (retrieval operations).
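Relative to the no-op baseline this works out at roughly 232 ns per call,
which is broadly consistent with the sum of the input (~72 ns) and output
(~146 ns) costs measured above.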
As you can see, there is a significant cost in marshaling data between C
string buffers and internal V8 string data types/constructs (and vice
versa). Is there an alternative way of doing this? On the input side,
simply getting a pointer to the raw input data would probably work fine for
the purpose of interacting with an outgoing C API. Likewise, on the output
side, is there a faster way to generate V8 strings from C character buffers?
Alternatively, are there plans to improve the performance of the existing
functionality?
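For what it's worth, here is a rough sketch of the kind of alternative I have
in mind, using only calls that exist in the V8 headers shipped with recent
Node.js. It assumes the payloads are plain ASCII/Latin-1 (true for this
benchmark) and I haven't verified how much it actually saves; the intent is
simply to avoid the UTF-8 transcoding on input and to give V8 an explicit
length on output so it doesn't have to walk the buffer itself …
static void benchmark(const FunctionCallbackInfo<Value>& args)
{
   char c_input[256], c_output[256];
   Isolate* isolate = args.GetIsolate();
   HandleScope scope(isolate);
   // Input: if the string is one-byte (Latin-1/ASCII), copy the raw
   // characters with WriteOneByte() rather than transcoding to UTF-8.
   Local<String> input = args[0]->ToString();
   int len = input->Length();
   if (len < (int) sizeof(c_input) - 1 && input->ContainsOnlyOneByte()) {
      input->WriteOneByte((uint8_t *) c_input, 0, len, String::NO_NULL_TERMINATION);
      c_input[len] = '\0';
   }
   else {
      input->WriteUtf8(c_input);
   }
   // interact with a database via its C API
   strcpy(c_output, "Output String");
   int out_len = (int) strlen(c_output);  // in practice the C API would supply this
   // Output: build the result as a one-byte string with an explicit length,
   // so V8 performs neither a strlen() nor a UTF-8 decode.
   Local<String> output = String::NewFromOneByte(
      isolate, (const uint8_t *) c_output,
      NewStringType::kNormal, out_len).ToLocalChecked();
   args.GetReturnValue().Set(output);
   return;
}
Even so, this still involves one full copy in each direction, which is why
I'm wondering whether there is a sanctioned way to get at the raw string
storage directly, or to hand V8 an external string resource, without going
through these copies.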
When developing this software I was expecting calls to the database
(particularly update operations) to be the rate-limiting part. However,
differential benchmarks have demonstrated that operations on the database
are, on their own, several orders of magnitude faster than the mechanics of
marshaling data between the V8/JavaScript and C/C++ environments. This was
a bit of a surprise, to be honest. Finally, for the curious, what's the
database? It's InterSystems Caché.
Many thanks for reading this and thanks in advance for any thoughts or
suggestions!
Chris.
Director
M/Gateway Developments Ltd
http://www.mgateway.com