[issue37587] JSON loads performance improvement for long strings

2019-10-17 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-10-17 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 9c11029bb41caab5576f354fbf808a5e91325bb0 by Inada Naoki in branch 
'master':
bpo-37587: json: Use _PyUnicodeWriter when scanning string. (GH-15591)
https://github.com/python/cpython/commit/9c11029bb41caab5576f354fbf808a5e91325bb0


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-29 Thread Inada Naoki


Inada Naoki  added the comment:

@mpaolini  I don't have enough time in these weeks.  Would you try PR-15591?

I confirmed up to 4x speedup.  But I'm afraid about there is performance 
regression in simple cases.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-29 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +15267
pull_request: https://github.com/python/cpython/pull/15591

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini


Marco Paolini  added the comment:

ujson (https://github.com/esnme/ultrajson) instead is faster when decoding 
non-ascii in the same example above, so it is likely there is room for 
improvement...

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini

Marco Paolini  added the comment:

ops sorry here's the right commands

python -m pyperf timeit -s 'import json;' -s  'c = "a"; s = json.dumps(c * 
(2**10 // (len(json.dumps(c)) - 2)))' 'json.loads(s)' -o ascii2k.json
python -m pyperf timeit -s 'import json;' -s  'c = "€"; s = json.dumps(c * 
(2**10 // (len(json.dumps(c)) - 2)))' 'json.loads(s)' -o nonascii2k.json

Mean +- std dev: [ascii2k] 3.69 us +- 0.05 us -> [nonascii2k] 12.4 us +- 0.1 
us: 3.35x slower (+235%)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini

Marco Paolini  added the comment:

also worth noting escape sequences for non-ascii characters are slower, even 
when encoded length is the same.

python -m pyperf timeit -s 'import json;' -s  'c = "€"; s = json.dumps(c * 
(2**10 // len(json.dumps(c)) - 2))' 'json.loads(s)' -o nonascii2k.json

python -m pyperf timeit -s 'import json;' -s  'c = "a"; s = json.dumps(c * 
(2**10 // len(json.dumps(c)) - 2))' 'json.loads(s)' -o ascii2k.json

Mean +- std dev: [ascii2k] 2.59 us +- 0.04 us -> [nonascii2k] 9.98 us +- 0.12 
us: 3.86x slower (+286%)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini


Marco Paolini  added the comment:

I also confirm Inada's patch further improves performance!

All my previous benchmarks were done with gcc and PGO optimizations performed 
only with test_json task... maybe this explains the weird results?

I tested the performance of new master 69f37bcb28d7cd78255828029f895958b5baf6ff 
with *all* PGO task reverting my original patch:

iff --git a/Modules/_json.c b/Modules/_json.c
index 112903ea57..9b63167276 100644
--- a/Modules/_json.c
+++ b/Modules/_json.c
@@ -442,7 +442,7 @@ scanstring_unicode(PyObject *pystr, Py_ssize_t end, int 
strict, Py_ssize_t *next
 if (d == '"' || d == '\\') {
 break;
 }
-if (d <= 0x1f && strict) {
+if (strict && d <= 0x1f) {
 raise_errmsg("Invalid control character at", pystr, next);
 goto bail;
 }

... and surprise...

Mean +- std dev: [69f37bcb28d7cd78255828029f895958b5baf6ff] 5.29 us +- 0.07 us 
-> [69f37bcb28d7cd78255828029f895958b5baf6ff-patched] 5.11 us +- 0.03 us: 1.04x 
faster (-4%)

should we revert my original patch entirely now? Or am I missing something?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-08 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 2a570af12ac5e4ac5575a68f8739b31c24d01367 by Inada Naoki in branch 
'master':
bpo-37587: optimize json.loads (GH-15134)
https://github.com/python/cpython/commit/2a570af12ac5e4ac5575a68f8739b31c24d01367


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-05 Thread Inada Naoki


Inada Naoki  added the comment:

And I confirmed performance improvement by my patch (GH-15134) on all of 4 
compilers.

$ ./python -m pyperf timeit  -s "import json; x = json.dumps({'k': '1' * 2 ** 
20})" "json.loads(x)"

old:  9211e2
new:  8a758f
opt2: 284e47

gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0

old:  Mean +- std dev: 721 us +- 0 us
new:  Mean +- std dev: 722 us +- 0 us
opt2: Mean +- std dev: 715 us +- 1 us


gcc-7 (Ubuntu 7.4.0-8ubuntu1) 7.4.0

old:  Mean +- std dev: 1.03 ms +- 0.00 ms
new:  Mean +- std dev: 726 us +- 0 us
opt2: Mean +- std dev: 715 us +- 0 us


clang version 7.0.1-8 (tags/RELEASE_701/final)

old:  Mean +- std dev: 721 us +- 1 us
new:  Mean +- std dev: 722 us +- 0 us
opt2: Mean +- std dev: 715 us +- 1 us


clang version 8.0.0-3 (tags/RELEASE_800/final)

old:  Mean +- std dev: 721 us +- 0 us
new:  Mean +- std dev: 721 us +- 1 us
opt2: Mean +- std dev: 715 us +- 0 us

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-05 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +14873
pull_request: https://github.com/python/cpython/pull/15134

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-08-05 Thread Inada Naoki


Inada Naoki  added the comment:

I tried without PGO and confirmed performance improved on GCC 7.2.0.
No change on other compiler versions.


$ ./python -m pyperf timeit  -s "import json; x = json.dumps({'k': '1' * 2 ** 
20})" "json.loads(x)"

old: 9211e2
new: 8a758f

gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0

old: Mean +- std dev: 721 us +- 0 us
new: Mean +- std dev: 722 us +- 0 us


gcc-7 (Ubuntu 7.4.0-8ubuntu1) 7.4.0

old: Mean +- std dev: 1.03 ms +- 0.00 ms
new: Mean +- std dev: 726 us +- 0 us


clang version 7.0.1-8 (tags/RELEASE_701/final)

old: Mean +- std dev: 721 us +- 1 us
new: Mean +- std dev: 722 us +- 0 us


clang version 8.0.0-3 (tags/RELEASE_800/final)

old: Mean +- std dev: 721 us +- 0 us
new: Mean +- std dev: 721 us +- 1 us

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki


Inada Naoki  added the comment:

This issue is very compiler sensitive.
Please don't report performance without compiler version and PGO option.

Now I'm facing strange behavior. pyperf reports slower time (1ms) for PGO 
builds, although disasm looks good.
But it's 2:30am already...  Please wait for a few days.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki


Inada Naoki  added the comment:

I'm sorry, I was wrong.  PGO did very nice job on all cases.
gcc allocates `c` to register in the hot loop.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki


Inada Naoki  added the comment:

I tested before, after, Steve's patch, and my patch with gcc 8.3.0 and PGO+LTO.
https://gist.github.com/methane/f6077bd1b0b04d40a9c790d9ed670a44#file-gcc-8-3-0-pgo-md

Surprisingly, there is no difference.  Even my patch didn't help register 
allocation when PGO is enabled.

I will run same test with other compilers & PGO (enabled|disabled).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Steve Dower


Steve Dower  added the comment:

> compiler stores the `c` to stack every time

The disassembly we looked at didn't do this, so it may just be certain 
compilers. Perhaps we can actually use the register keyword to help them out? :)


Here's a slightly altered one that doesn't require rescanning for the sake of 
the error message:

Py_UCS4 c = 0, minc = strict ? 0x20 : 0x00;
for (next = end; next < len; next++) {
c = PyUnicode_READ(kind, buf, next);
if (c == '"' || c == '\\' || c < minc) {
break;
}
}
if (c < minc) {
raise_errmsg("Invalid control character at", pystr, next);
goto bail;
}

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki


Inada Naoki  added the comment:

Since scope of "c" is very wide, and there is even `` in the scope, compiler 
stores the `c` to stack every time on:

  c = PyUnicode_READ(kind, buf, next);

That is the bottleneck.  `if (strict && ...)` is not the bottleneck.

My patch used a new variable with tight scope so compiler can bypass store it 
to the stack.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Marco Paolini


Marco Paolini  added the comment:

@steve.dower yes, that's what made me discard that experiment we did during the 
sprint.

Ok will test your new patch soon

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Steve Dower


Steve Dower  added the comment:

Oh, we also need to capture "next"... but then again, since the success case is 
far more common, I'd be okay with scanning again to find it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Steve Dower


Steve Dower  added the comment:

While you're testing patches, can you try this version too?

Py_UCS4 c = 0, minc = 0x20;
for (next = end; next < len; next++) {
c = PyUnicode_READ(kind, buf, next);
if (c == '"' || c == '\\') {
break;
}
minc = c < minc ? c : minc;
}
if (strict && minc <= 0x1f) {
raise_errmsg("Invalid control character at", pystr, next);
goto bail;
}

When we tried this, the conditional expression became a "cmovl" operator which 
removed 3-4 branches from within the loop entirely, and it was 18% better than 
the baseline (which has now moved...)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Inada Naoki


Inada Naoki  added the comment:

Wait... there is no benchmark for the "minimum change".

I tested 4 compilers, and provide much better patch in 
https://bugs.python.org/issue37587#msg348114

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread miss-islington


miss-islington  added the comment:


New changeset 9265a877426af4fa5c44cc8482e0198806889350 by Miss Islington (bot) 
in branch '3.8':
bpo-37587: Make json.loads faster for long strings (GH-14752)
https://github.com/python/cpython/commit/9265a877426af4fa5c44cc8482e0198806889350


--
nosy: +miss-islington

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Nick Coghlan


Nick Coghlan  added the comment:

I went ahead and merged the minimal PR and flagged it for backporting to 3.8 - 
it's an obviously beneficial change, that clearly does less work on each pass 
through the loop.

Even if you are doing non-strict parsing of a string that consists entirely of 
invalid characters, you'll get the same level of performance that the previous 
code offered for all strict parsing.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread miss-islington


Change by miss-islington :


--
pull_requests: +14783
pull_request: https://github.com/python/cpython/pull/15022

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Nick Coghlan


Nick Coghlan  added the comment:


New changeset 8a758f5b99c5fc3fd32edeac049d7d4a4b7cc163 by Nick Coghlan (Marco 
Paolini) in branch 'master':
bpo-37587: Make json.loads faster for long strings (GH-14752)
https://github.com/python/cpython/commit/8a758f5b99c5fc3fd32edeac049d7d4a4b7cc163


--
nosy: +ncoghlan

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini


Marco Paolini  added the comment:

I forgot to mention, I was inspired by @christian.heimes 's talk at EuroPython 
2019 
https://ep2019.europython.eu/talks/es2pZ6C-introduction-to-low-level-profiling-and-tracing/
 (thanks!)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini


Marco Paolini  added the comment:

I am also working on a different patch that uses the "pcmpestri" SSE4 processor 
instruction, it looks like this for now.

While at it I realized there is (maybe) another potential speedup: avoiding the 
ucs4lib_find_max_char we do for each chunk of the string ( that entails 
scanning the string in memory one more time)... anyways that's another (much 
longer) story, probably for another issue?


```
diff --git a/Modules/_json.c b/Modules/_json.c
index 38beb6f50d..25b1cf4a99 100644
--- a/Modules/_json.c
+++ b/Modules/_json.c
@@ -400,6 +400,38 @@ _build_rval_index_tuple(PyObject *rval, Py_ssize_t idx) {
 Py_CLEAR(chunk); \
 }
 
+
+inline unsigned int
+_fast_search(const void *needle, unsigned int needle_len, const void 
*haystack, unsigned int haystack_len)
+{
+  unsigned int pos;
+  __asm__ __volatile__("movq (%1), %%xmm1;\n"
+   "mov %2, %%eax;\n"
+   "movq %3, %%r8;\n"
+   "mov %4, %%edx;\n"
+   ".intel_syntax noprefix;\n"
+   "loop: pcmpestri xmm1, [r8], 0;\n" /* 0 = equal any */
+   /* "pcmpestri %%mm1, (%%r8), $0;\n" /\* 0 = equal any 
*\/ */
+   ".att_syntax prefix;\n"
+   "cmp $15, %%ecx;\n"
+   "jbe found;\n"
+   "sub $16, %%edx;\n"
+   "jnge notfound;\n"
+   "add $16, %%r8;\n"
+   "jmp loop;\n"
+   "notfound: movl %4, %%ecx;\n"
+   "jmp exit;\n"
+   "found: mov %4, %%eax;\n"
+   "sub %%edx, %%eax;\n"
+   "add %%eax, %%ecx;\n"
+   "exit: mov %%ecx, %0;\n"
+   :"=m"(pos)
+   :"r"(needle), "r"(needle_len), "r"(haystack), 
"r"(haystack_len)
+   :"%eax", "%edx", "%ecx", "%r8", "%xmm1");
+  return pos;
+}
+
+
 static PyObject *
 scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t 
*next_end_ptr)
 {
@@ -431,17 +463,26 @@ scanstring_unicode(PyObject *pystr, Py_ssize_t end, int 
strict, Py_ssize_t *next
 PyErr_SetString(PyExc_ValueError, "end is out of bounds");
 goto bail;
 }
+char needle[2];
+needle[0] = '"';
+needle[1] = '\\';
 while (1) {
 /* Find the end of the string or the next escape */
 Py_UCS4 c = 0;
-for (next = end; next < len; next++) {
+if (kind == PyUnicode_1BYTE_KIND) {
+  next = _fast_search(needle, 2, buf+end, len-end) + end;
+  if (next < len)
 c = PyUnicode_READ(kind, buf, next);
-if (c == '"' || c == '\\') {
-break;
-}
-else if (strict && c <= 0x1f) {
-raise_errmsg("Invalid control character at", pystr, next);
-goto bail;
+} else {
+for (next = end; next < len; next++) {
+c = PyUnicode_READ(kind, buf, next);
+if (c == '"' || c == '\\') {
+break;
+}
+else if (strict && c <= 0x1f) {
+raise_errmsg("Invalid control character at", pystr, next);
+goto bail;
+}
 }
 }
 if (!(c == '"' || c == '\\')) {
```

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini


Marco Paolini  added the comment:

On gcc, running the tests above, the only change that is relevant for speedup 
is switching around the strict check. Removing the extra MOV related to the 
outer "c" variable is not significant (at least on gcc and the few tests I did)

Unfortunately I had to change the patch we did together during the sprint 
because it was breaking the strict check logic...

I updated my PR accordingly, kept only the bare minimum.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-23 Thread Steve Dower


Steve Dower  added the comment:

Marco has a newer patch with better performance that we came up with at the 
sprints, but apparently it hasn't been pushed yet. Hopefully he'll get that up 
soon and we can review it instead - the current PR wasn't as reliably good as 
initial testing suggested.

--
versions: +Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-18 Thread Inada Naoki


Change by Inada Naoki :


--
versions:  -Python 3.6, Python 3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-18 Thread Inada Naoki

Inada Naoki  added the comment:

Some compilers produce inefficient code for PR-14752.
I wrote another patch which is friendly to more compilers.


$ perf record ./python -m pyperf timeit  -s "import json; x = json.dumps({'k': 
'1' * 2 ** 20})" "json.loads(x)"

# PR-14752

gcc-7 (Ubuntu 7.4.0-8ubuntu1) 7.4.0
Mean +- std dev: 1.11 ms +- 0.01 ms

```
   │ scanstring_unicode():
   │ c = PyUnicode_READ(kind, buf, next);
 11.92 │270:   movzbl (%r15,%r8,1),%eax
   │ if (c == '"' || c == '\\') {
 27.97 │   cmp$0x22,%eax
   │ c = PyUnicode_READ(kind, buf, next);
 29.22 │   mov%eax,0x34(%rsp)
   │ if (c == '"' || c == '\\') {
  0.46 │ ↑ je ef
  0.02 │   cmp$0x5c,%eax
   │ ↑ je ef
   │ if (c <= 0x1f && invalid < 0) {
   │   cmp$0x1f,%eax
  0.00 │ ↓ ja 297
   │   test   %rdx,%rdx
   │   cmovs  %r8,%rdx
   │ for (next = end; next < len; next++) {
 29.49 │297:   add$0x1,%r8
   │   cmp%r8,%r12
  0.92 │ ↑ jne270
```

gcc-8 (Ubuntu 8.3.0-6ubuntu1) 8.3.0
Mean +- std dev: 712 us +- 1 us
```
   │ c = PyUnicode_READ(kind, buf, next);
   │188:   movzbl 0x0(%rbp,%rbx,1),%eax
   │   mov%eax,0x34(%rsp)
   │ if (c == '"' || c == '\\') {
   │   cmp$0x22,%eax
   │ ↓ je 1d0
   │   nop
  0.00 │1a0:   cmp$0x5c,%eax
   │ ↓ je 1d0
   │ if (c <= 0x1f && invalid < 0) {
   │   cmp$0x1f,%eax
 49.84 │ ↓ ja 1b1
   │   test   %rdx,%rdx
   │   cmovs  %rbx,%rdx
   │ for (next = end; next < len; next++) {
   │1b1:   add$0x1,%rbx
  0.00 │   cmp%rbx,%r15
   │ ↑ je ff
   │ c = PyUnicode_READ(kind, buf, next);
  0.61 │   movzbl 0x0(%rbp,%rbx,1),%eax
 49.53 │   mov%eax,0x34(%rsp)
   │ if (c == '"' || c == '\\') {
  0.01 │   cmp$0x22,%eax
   │ ↑ jne1a0
  0.00 │   nop
```

clang version 7.0.1-8 (tags/RELEASE_701/final)
Mean +- std dev: 951 us +- 1 us

```
   │ c = PyUnicode_READ(kind, buf, next);
  9.76 │110:   movzbl (%r12,%r13,1),%eax
  9.47 │   mov%eax,0xc(%rsp)
  8.85 │   cmp$0x22,%eax
   │ if (c == '"' || c == '\\') {
   │ ↓ je 170
  8.78 │   cmp$0x5c,%al
   │ ↓ je 170
   │ if (c <= 0x1f && invalid < 0) {
  9.16 │   cmp$0x20,%al
  9.09 │   mov%rdx,%rcx
  9.16 │   cmovb  %r13,%rcx
  9.00 │   test   %rdx,%rdx
  8.78 │   cmovs  %rcx,%rdx
   │ for (next = end; next < len; next++) {
  9.09 │   add$0x1,%r13
   │   cmp%r15,%r13
  8.86 │ ↑ jl 110
   │ ↓ jmp170
   │   nop
```

clang version 8.0.0-3 (tags/RELEASE_800/final)
Mean +- std dev: 953 us +- 0 us

```
   │ c = PyUnicode_READ(kind, buf, next);
 10.04 │100:   movzbl (%r15,%r14,1),%eax
  9.27 │   mov%eax,0x4(%rsp)
  8.87 │   cmp$0x22,%eax
   │ if (c == '"' || c == '\\') {
   │ ↓ je 160
  8.78 │   cmp$0x5c,%al
   │ ↓ je 160
   │ if (c <= 0x1f && invalid < 0) {
  8.97 │   cmp$0x20,%al
  8.97 │   mov%rdx,%rcx
  8.89 │   cmovb  %r14,%rcx
  8.81 │   test   %rdx,%rdx
  9.14 │   cmovs  %rcx,%rdx
   │ for (next = end; next < len; next++) {
  9.25 │   add$0x1,%r14
   │   cmp%rdi,%r14
  8.99 │ ↑ jl 100
   │ ↓ jmp160
   │   nop
```

# modified

```
/* Find the end of the string or the next escape */
Py_UCS4 c;
{
Py_UCS4 d = 0;
for (next = end; next < len; next++) {
d = PyUnicode_READ(kind, buf, next);
if (d == '"' || d == '\\') {
break;
}
if (d <= 0x1f && strict) {
raise_errmsg("Invalid control character at", pystr, next);
goto bail;
}
}
c = d;
}
```


gcc-7 (Ubuntu 7.4.0-8ubuntu1) 7.4.0
Mean +- std dev: 708 us +- 1 us

```
   │ for (next = end; next < len; next++) {
 20.29 │170:   add$0x1,%rbx
  0.31 │   cmp%rbx,%r12
   │ ↓ je 1b0
   │ d = PyUnicode_READ(kind, buf, next);
 44.48 │179:   movzbl 0x0(%rbp,%rbx,1),%eax
   │ if (d == '"' || d == '\\') {
  5.38 │   cmp$0x22,%eax
   │ ↓ je 2c0
 23.82 │   cmp$0x5c,%eax
   │ ↓ je 2c0
   │ if (d <= 0x1f && strict) {
   │   cmp$0x1f,%eax
  5.68 │ ↑ ja 170
   │   test   %r13d,%r13d
 

[issue37587] JSON loads performance improvement for long strings

2019-07-18 Thread Inada Naoki


Inada Naoki  added the comment:

> 1. remove the mov entirely. It is not needed inside the loop and it is only 
> needed later, outside the loop to access the variable

How can we lazy "movDWORD PTR [rsp+0x44],eax"?

--
nosy: +inada.naoki

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-14 Thread Steve Dower


Change by Steve Dower :


--
nosy: +steve.dower

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


Marco Paolini  added the comment:

Here's the real world example

$ ls -hs events-100k.json 
84M events-100k.json

+---+-+-+
| Benchmark | vanilla-bpo-events-100k | patched-bpo-events-100k |
+===+=+=+
| timeit| 985 ms  | 871 ms: 1.13x faster (-12%) |
+---+-+-+

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


Marco Paolini  added the comment:

Also on my real workload (loading 60GB jsonl file containing mostly strings) I 
measured a 10% improvement

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


Change by Marco Paolini :


--
nosy: +ezio.melotti, rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


Change by Marco Paolini :


--
keywords: +patch
pull_requests: +14547
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/14752

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


New submission from Marco Paolini :

I analysed the performance of json.loads in one production workload we have.

Spy-py tells me the majority of time is spent into C json module (see 
events.svg)

Digging deeper, linux perf tells me hottest loop (where 20%+ of the time is 
spent) in _json.scanstring_unicode, in this loop:

189:   movzx  eax,BYTE PTR [rbp+rbx*1+0x0]
   movDWORD PTR [rsp+0x44],eax
   cmpeax,0x22
   je 22f
   cmpeax,0x5c
   je 22f
   test   r13d,r13d
   je 180
   cmpeax,0x1f

which is related to this code in Modules/_json.c


/* Find the end of the string or the next escape */
Py_UCS4 c = 0;
for (next = end; next < len; next++) {
c = PyUnicode_READ(kind, buf, next);
if (c == '"' || c == '\\') {
break;
}
else if (strict && c <= 0x1f) {
raise_errmsg("Invalid control character at", pystr, next);
goto bail;
}
}

Two optimisations can be done:

1. remove the mov entirely. It is not needed inside the loop and it is only 
needed later, outside the loop to access the variable
2. switch around the strict check (in the second if) because strict defaults to 
1 so it will likely pass the test, while the likelyness of finding an invalid 
character is lower

Running the pyperformance json_loads benchmark I get:

++--+-+
| Benchmark  | vanilla-pyperf-pgo38 | patched-pyperf-pgo38|
++==+=+
| json_loads | 54.9 us  | 53.9 us: 1.02x faster (-2%) |
++--+-+


A micro benchmark on a 1MB long json string gives better results:

python -m pyperf timeit -s "import json; x = json.dumps({'k': '1' * 2 ** 20})" 
"json.loads(x)"

+---++-+
| Benchmark | vanilla-1m | patched-1m  |
+===++=+
| timeit| 2.62 ms| 2.39 ms: 1.10x faster (-9%) |
+---++-+

--
components: Library (Lib)
files: events.svg
messages: 347832
nosy: christian.heimes, mpaolini, pablogsal
priority: normal
severity: normal
status: open
title: JSON loads performance improvement for long strings
type: performance
versions: Python 3.6, Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file48476/events.svg

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com