Hi again,
Andrew fixed the "biggest" memory consumption issue for a few weeks ago.
Now I've tested with the same configuration but enabled x509 certificate
authentication.
I've run pluto via valgrind for a few days. Pluto was build from the
master branch (last commit 72f2f4fac52d4fe661c696e965cca756d3ce78a9).
In this mail I've highlighted the "definitely lost" entries from the
valgrind log.
I've tried to fix a couple of them, but with mixed result. Not sure if I
can fix the more intricate ones based on my current knowledge of pluto.
The full valgrind log can be found via the following link:
https://s3-eu-west-1.amazonaws.com/ingate-dump/pluto_valgrind2.txt
1)
==2991== 17 bytes in 1 blocks are definitely lost in loss record 59 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x1B1697: alloc_bytes_raw (alloc.c:95)
==2991== by 0x1B1B0D: alloc_bytes (alloc.c:201)
==2991== by 0x1BB9C4: lsw_nss_setup (lswnss.c:50)
==2991== by 0x126169: pluto_init_nss (plutomain.c:412)
==2991== by 0x126169: main (plutomain.c:1488)
lswnss.c:50
nssdir = alloc_bytes(strlen(configdir) + strlen(sql) + 1, "(ignore)
nssdir");
nssdir is only free'd in the error path.
2)
==2991== 32 bytes in 1 blocks are definitely lost in loss record 133 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x1B1697: alloc_bytes_raw (alloc.c:95)
==2991== by 0x1B1B0D: alloc_bytes (alloc.c:201)
==2991== by 0x192E2E: chunk_from_symkey (crypt_symkey.c:388)
==2991== by 0x19540E: calc_skeyseed_v2 (ikev2_prf.c:187)
==2991== by 0x19540E: calc_dh_v2 (ikev2_prf.c:260)
==2991== by 0x1A1B47: pluto_do_crypto_op (pluto_crypt.c:270)
==2991== by 0x1A2644: pluto_crypto_helper (pluto_crypt.c:365)
==2991== by 0x1A28C0: pluto_helper_thread (pluto_crypt.c:1011)
==2991== by 0x5A4BDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==2991== by 0x685A1AC: clone (in /usr/lib64/libc-2.17.so)
crypt_symkey.c:388
void *bytes = alloc_bytes(wrapped_key.len, name);
The allocated bytes that is returned via the chunk_t is not free'd.
3)
==2991== 32 bytes in 1 blocks are definitely lost in loss record 134 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x1B1697: alloc_bytes_raw (alloc.c:95)
==2991== by 0x1B1B0D: alloc_bytes (alloc.c:201)
==2991== by 0x192E2E: chunk_from_symkey (crypt_symkey.c:388)
==2991== by 0x195443: calc_skeyseed_v2 (ikev2_prf.c:192)
==2991== by 0x195443: calc_dh_v2 (ikev2_prf.c:260)
==2991== by 0x1A1B47: pluto_do_crypto_op (pluto_crypt.c:270)
==2991== by 0x1A2644: pluto_crypto_helper (pluto_crypt.c:365)
==2991== by 0x1A28C0: pluto_helper_thread (pluto_crypt.c:1011)
==2991== by 0x5A4BDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==2991== by 0x685A1AC: clone (in /usr/lib64/libc-2.17.so)
crypt_symkey.c:388
void *bytes = alloc_bytes(wrapped_key.len, name);
The allocated bytes that is returned via the chunk_t is not free'd.
4)
==2991== 256 bytes in 2 blocks are definitely lost in loss record 566 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x8E7827F: PORT_Alloc_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x552DEC2: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x55303C8: PK11_DeriveWithTemplate (in /usr/lib64/libnss3.so)
==2991== by 0x55307B1: PK11_DeriveWithFlags (in /usr/lib64/libnss3.so)
==2991== by 0x192C2E: symkey_from_symkey (crypt_symkey.c:302)
==2991== by 0x19542C: calc_skeyseed_v2 (ikev2_prf.c:190)
==2991== by 0x19542C: calc_dh_v2 (ikev2_prf.c:260)
==2991== by 0x1A1B47: pluto_do_crypto_op (pluto_crypt.c:270)
==2991== by 0x1A2644: pluto_crypto_helper (pluto_crypt.c:365)
==2991== by 0x1A28C0: pluto_helper_thread (pluto_crypt.c:1011)
==2991== by 0x5A4BDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==2991== by 0x685A1AC: clone (in /usr/lib64/libc-2.17.so)
crypt_symkey.c:302
PK11SymKey *result = PK11_DeriveWithFlags(base_key, derive, ¶m,
target, operation,
key_size, flags);
The returned result isn't free'd
5)
==2991== 288 (256 direct, 32 indirect) bytes in 2 blocks are definitely
lost in loss record 572 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x8E7827F: PORT_Alloc_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x552DEC2: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x55303C8: PK11_DeriveWithTemplate (in /usr/lib64/libnss3.so)
==2991== by 0x5530720: PK11_Derive (in /usr/lib64/libnss3.so)
==2991== by 0x192A6D: merge_symkey_bytes (crypt_symkey.c:220)
==2991== by 0x1938FE: symkey_from_bytes (crypt_symkey.c:420)
==2991== by 0x13D204: final_symkey (ike_alg_nss_prf_ops.c:181)
==2991== by 0x193E9E: crypt_prf_final_symkey (crypt_prf.c:128)
==2991== by 0x194C3B: ikev2_prfplus (ikev2_prf.c:292)
==2991== by 0x195965: ikev2_child_sa_keymat (ikev2_prf.c:390)
==2991== by 0x192462: ikev2_derive_child_keys (ikev2_crypto.c:147)
crypt_symkey.c:220
PK11SymKey *result = PK11_Derive(base_key, derive, &data_param, target,
operation, key_size);
It seems that the result is free'd in ikev2_crypto.c:160
release_symkey(__func__, "keymat", &keymat);
I guess there are still references to the PK11SymKey keymat. Not sure.
6)
==2991== 800 (512 direct, 288 indirect) bytes in 4 blocks are definitely
lost in loss record 638 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x8E7827F: PORT_Alloc_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x552DEC2: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x55303C8: PK11_DeriveWithTemplate (in /usr/lib64/libnss3.so)
==2991== by 0x5530720: PK11_Derive (in /usr/lib64/libnss3.so)
==2991== by 0x192A6D: merge_symkey_bytes (crypt_symkey.c:220)
==2991== by 0x1938FE: symkey_from_bytes (crypt_symkey.c:420)
==2991== by 0x13D204: final_symkey (ike_alg_nss_prf_ops.c:181)
==2991== by 0x193E9E: crypt_prf_final_symkey (crypt_prf.c:128)
==2991== by 0x194CE3: ikev2_prfplus (ikev2_prf.c:304)
==2991== by 0x194F8B: ikev2_ike_sa_keymat (ikev2_prf.c:366)
==2991== by 0x195272: calc_skeyseed_v2 (ikev2_prf.c:142)
==2991== by 0x195272: calc_dh_v2 (ikev2_prf.c:260)
Similar to 2, 3 and 4?
7)
==2991== 1,636 (56 direct, 1,580 indirect) bytes in 1 blocks are
definitely lost in loss record 692 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x1B1697: alloc_bytes_raw (alloc.c:95)
==2991== by 0x1B1B0D: alloc_bytes (alloc.c:201)
==2991== by 0x17CEC4: ikev2_collect_fragment (ikev2.c:761)
==2991== by 0x17F8F0: process_v2_packet (ikev2.c:1252)
==2991== by 0x19FD77: comm_handle (demux.c:234)
==2991== by 0x19FD77: comm_handle_cb (demux.c:193)
==2991== by 0x6329A13: event_base_loop (in
/usr/lib64/libevent-2.0.so.5.1.9)
==2991== by 0x165568: main_loop (server.c:663)
==2991== by 0x165568: call_server (server.c:798)
==2991== by 0x126499: main (plutomain.c:1714)
ikev2.c:761
frag = alloc_thing(struct ikev2_frag, "ikev2_frag");
Should frag be free'd in ikev2_collect_fragment?
8)
==2991== 2,288 (2,048 direct, 240 indirect) bytes in 1 blocks are
definitely lost in loss record 726 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x908F3AE: PL_ArenaAllocate (in /usr/lib64/libplds4.so)
==2991== by 0x8E7851F: PORT_ArenaAlloc_Util (in
/usr/lib64/libnssutil3.so)
==2991== by 0x8E76BC7: SECITEM_CopyItem_Util (in
/usr/lib64/libnssutil3.so)
==2991== by 0x55AC4B3: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x5591BD7: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x555373F: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x557E716: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x557EE3A: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x5581666: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x5506DA1: CERT_PKIXVerifyCert (in /usr/lib64/libnss3.so)
==2991== by 0x1AC665: vfy_chain_pkix (nss_cert_vfy.c:383)
==2991== by 0x1AC665: verify_and_cache_chain (nss_cert_vfy.c:490)
Not sure what's going on here.
9)
==2991== 4,152 bytes in 1 blocks are definitely lost in loss record 751
of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x1B1697: alloc_bytes_raw (alloc.c:95)
==2991== by 0x1B1B0D: alloc_bytes (alloc.c:201)
==2991== by 0x19FEB8: alloc_md (msgdigest.c:86)
==2991== by 0x1802D4: ikev2_crypto_start (ikev2_parent.c:318)
==2991== by 0x18343F: ikev2parent_outI1 (ikev2_parent.c:619)
==2991== by 0x15548B: initiate_a_connection (initiate.c:331)
==2991== by 0x1566D2: initiate_connection (initiate.c:357)
==2991== by 0x19DA5E: whack_process (rcv_whack.c:521)
==2991== by 0x19E634: whack_handle (rcv_whack.c:738)
==2991== by 0x6329A13: event_base_loop (in
/usr/lib64/libevent-2.0.so.5.1.9)
==2991== by 0x165568: main_loop (server.c:663)
==2991== by 0x165568: call_server (server.c:798)
ikev2_parent.c:318
fake_md = alloc_md();
The fake_md isn't free'd.
Not sure if this is related to commit:
123f7a32699dd3838d296083610418d84b9064ef
10)
==2991== 9,248 (8,304 direct, 944 indirect) bytes in 2 blocks are
definitely lost in loss record 771 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x1B1697: alloc_bytes_raw (alloc.c:95)
==2991== by 0x1B1B0D: alloc_bytes (alloc.c:201)
==2991== by 0x19FEB8: alloc_md (msgdigest.c:86)
==2991== by 0x1802D4: ikev2_crypto_start (ikev2_parent.c:318)
==2991== by 0x18343F: ikev2parent_outI1 (ikev2_parent.c:619)
==2991== by 0x168179: timer_event_cb (timer.c:750)
==2991== by 0x6329A13: event_base_loop (in
/usr/lib64/libevent-2.0.so.5.1.9)
==2991== by 0x165568: main_loop (server.c:663)
==2991== by 0x165568: call_server (server.c:798)
==2991== by 0x126499: main (plutomain.c:1714)
Same as 9?
11)
==2991== 11,440 (360 direct, 11,080 indirect) bytes in 5 blocks are
definitely lost in loss record 778 of 805
==2991== at 0x4C2B975: calloc (vg_replace_malloc.c:711)
==2991== by 0x8E78351: PORT_ZAlloc_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x8E783AB: PORT_NewArena_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x550B7B9: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x550E4C5: PK11_GenerateKeyPairWithOpFlags (in
/usr/lib64/libnss3.so)
==2991== by 0x550E8C8: PK11_GenerateKeyPairWithFlags (in
/usr/lib64/libnss3.so)
==2991== by 0x55083FD: SECKEY_CreateDHPrivateKey (in
/usr/lib64/libnss3.so)
==2991== by 0x13B73C: nss_modp_calc_ke (ike_alg_nss_modp.c:71)
==2991== by 0x1A3894: calc_ke (crypt_ke.c:68)
==2991== by 0x1A1BCF: pluto_do_crypto_op (pluto_crypt.c:255)
==2991== by 0x1A2644: pluto_crypto_helper (pluto_crypt.c:365)
==2991== by 0x1A28C0: pluto_helper_thread (pluto_crypt.c:1011)
ike_alg_nss_modp.c:71
*privk = SECKEY_CreateDHPrivateKey(&dh_params,pubk,
lsw_return_nss_password_file_info());
privk doesn't seem to get free'd.
12)
==2991== 13,728 (432 direct, 13,296 indirect) bytes in 6 blocks are
definitely lost in loss record 784 of 805
==2991== at 0x4C2B975: calloc (vg_replace_malloc.c:711)
==2991== by 0x8E78351: PORT_ZAlloc_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x8E783AB: PORT_NewArena_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x550C07C: ??? (in /usr/lib64/libnss3.so)
==2991== by 0x550E595: PK11_GenerateKeyPairWithOpFlags (in
/usr/lib64/libnss3.so)
==2991== by 0x550E8C8: PK11_GenerateKeyPairWithFlags (in
/usr/lib64/libnss3.so)
==2991== by 0x55083FD: SECKEY_CreateDHPrivateKey (in
/usr/lib64/libnss3.so)
==2991== by 0x13B73C: nss_modp_calc_ke (ike_alg_nss_modp.c:71)
==2991== by 0x1A3894: calc_ke (crypt_ke.c:68)
==2991== by 0x1A1BCF: pluto_do_crypto_op (pluto_crypt.c:255)
==2991== by 0x1A2644: pluto_crypto_helper (pluto_crypt.c:365)
==2991== by 0x1A28C0: pluto_helper_thread (pluto_crypt.c:1011)
Same as 11?
13)
==2991== 18,868 (16,608 direct, 2,260 indirect) bytes in 4 blocks are
definitely lost in loss record 791 of 805
==2991== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==2991== by 0x1B1697: alloc_bytes_raw (alloc.c:95)
==2991== by 0x1B1B0D: alloc_bytes (alloc.c:201)
==2991== by 0x19FEB8: alloc_md (msgdigest.c:86)
==2991== by 0x19FCB4: comm_handle (demux.c:230)
==2991== by 0x19FCB4: comm_handle_cb (demux.c:193)
==2991== by 0x6329A13: event_base_loop (in
/usr/lib64/libevent-2.0.so.5.1.9)
==2991== by 0x165568: main_loop (server.c:663)
==2991== by 0x165568: call_server (server.c:798)
==2991== by 0x126499: main (plutomain.c:1714)
demux.c:230
md = alloc_md();
md->iface = ifp;
if (read_packet(md))
process_packet(&md);
release_any_md(&md);
md seems to get free'd. Not sure what's going on here.
14)
==2991== 42,656 bytes in 5,332 blocks are definitely lost in loss record
795 of 805
==2991== at 0x4C2B975: calloc (vg_replace_malloc.c:711)
==2991== by 0x8E78351: PORT_ZAlloc_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x553C911: CERT_ImportCerts (in /usr/lib64/libnss3.so)
==2991== by 0x1AC090: crt_tmp_import (nss_cert_vfy.c:194)
==2991== by 0x1AC090: verify_and_cache_chain (nss_cert_vfy.c:474)
==2991== by 0x130526: pluto_process_certs.isra.3 (x509.c:766)
==2991== by 0x130BD4: ikev2_decode_cert (x509.c:877)
==2991== by 0x17D662: ikev2_decode_peer_id_and_certs (ikev2.c:1300)
==2991== by 0x184E6F: ikev2_parent_inI2outR2_tail (ikev2_parent.c:3438)
==2991== by 0x184E6F: ikev2_parent_inI2outR2_continue
(ikev2_parent.c:3381)
==2991== by 0x1A239A: handle_helper_answer (pluto_crypt.c:910)
==2991== by 0x6329A13: event_base_loop (in
/usr/lib64/libevent-2.0.so.5.1.9)
==2991== by 0x165568: main_loop (server.c:663)
==2991== by 0x165568: call_server (server.c:798)
==2991== by 0x126499: main (plutomain.c:1714)
See 8.
15)
==2991== 65,088 bytes in 8,136 blocks are definitely lost in loss record
802 of 805
==2991== at 0x4C2B975: calloc (vg_replace_malloc.c:711)
==2991== by 0x8E78351: PORT_ZAlloc_Util (in /usr/lib64/libnssutil3.so)
==2991== by 0x553C911: CERT_ImportCerts (in /usr/lib64/libnss3.so)
==2991== by 0x1AC090: crt_tmp_import (nss_cert_vfy.c:194)
==2991== by 0x1AC090: verify_and_cache_chain (nss_cert_vfy.c:474)
==2991== by 0x130526: pluto_process_certs.isra.3 (x509.c:766)
==2991== by 0x130BD4: ikev2_decode_cert (x509.c:877)
==2991== by 0x17D662: ikev2_decode_peer_id_and_certs (ikev2.c:1300)
==2991== by 0x185800: ikev2parent_inR2 (ikev2_parent.c:4081)
==2991== by 0x17F93B: process_v2_packet (ikev2.c:1268)
==2991== by 0x19FD77: comm_handle (demux.c:234)
==2991== by 0x19FD77: comm_handle_cb (demux.c:193)
==2991== by 0x6329A13: event_base_loop (in
/usr/lib64/libevent-2.0.so.5.1.9)
==2991== by 0x165568: main_loop (server.c:663)
==2991== by 0x165568: call_server (server.c:798)
Same as 14?
Thanks and regards,
Erik
On 2017-03-03 03:12, Erik Andersson wrote:
Thanks Andrew!
I've re-built with your recent commits and it seems that your efforts
have solved the issue.
I've run for a few hours and the memory consumption doesn't rise :)
Regards,
Erik
On 2017-03-02 22:30, Andrew Cagney wrote:
I didn't do any archaeology
I think two things were at play.
The first change:
commit 02768e7cd76a465443f13f3cbf6864db268f0b52
Date: Mon Nov 12 21:18:54 2007 -0500
#845 - beginning of DH calculation and SKEYSEED derivation.
would copy skeyseed into a wirechunk:
setwirechunk_fromchunk(skr->skeyseed, skeyseed, skr);
but never copied the field back out. This set the trap.
(I suspect this was ok as deleting the wirechunk SKR would reclaim the
memory).
The second change:
commit 944c9a31c1e4dff1ab92cdf9c85629b7270a6157
Date: Tue Feb 18 04:15:35 2014 -0500
smoother nss:
Many NSS pointers were passed through chunks making to code confusing
and brittle. No longer.
A side-effect is that this change makes it harder to ditch NSS.
replaced the wirechunk with individual pointer moves. Since the old
code didn't extract the field skeyseed, the new code didn't either;
falling into the trap :-)
I've pushed a fix. For 3.19, something like:
@@ -143,6 +142,7 @@ static void calc_skeyseed_v2(struct pcr_skeyid_q
*skq,
PK11SymKey *finalkey = ikev2_ike_sa_keymat(skq->prf, skeyseed_k,
ni, nr, spii, spir,
total_keysize);
+ free_any_symkey("skeyseed_", &skeyseed_k);
size_t next_byte = 0;
might be sufficient (but I've not tested it).
Andrew
_______________________________________________
Swan-dev mailing list
[email protected]
https://lists.libreswan.org/mailman/listinfo/swan-dev
_______________________________________________
Swan-dev mailing list
[email protected]
https://lists.libreswan.org/mailman/listinfo/swan-dev