Re: [PATCH v3] mlx4_core: allocate ICM memory in page size chunks
On 5/22/2018 8:33 AM, Tariq Toukan wrote:
> On 18/05/2018 12:45 AM, Qing Huang wrote:
>> On 5/17/2018 2:14 PM, Eric Dumazet wrote:
>>> On 05/17/2018 01:53 PM, Qing Huang wrote:
>>>> When a system is under memory pressure (high usage with fragments),
>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>> memory management to enter slow path doing memory compact/migration
>>>> ops in order to complete high order memory allocations.
>>>>
>>>> When that happens, user processes calling uverb APIs may get stuck
>>>> for more than 120s easily even though there are a lot of free pages
>>>> in smaller chunks available in the system.
>>>>
>>>> Syslog:
>>>> ...
>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>> ...
>>>
>>> NACK on this patch.
>>>
>>> You have been asked repeatedly to use kvmalloc()
>>>
>>> This is not a minor suggestion.
>>>
>>> Take a look at
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d8c13f2271ec5178c52fbde072ec7b562651ed9d
>>
>> Would you please take a look at how table->icm is being used in the
>> mlx4 driver? It's a meta data used for individual pointer variable
>> referencing, not as data frag or in/out buffer. It has no need for
>> contiguous phy. memory.
>>
>> Thanks.
>
> NACK.
>
> This would cause a degradation when iterating the entries of table->icm.
> For example, in mlx4_table_get_range.

E.g.

int mlx4_table_get_range(struct mlx4_dev *dev, struct mlx4_icm_table *table,
			 u32 start, u32 end)
{
	/* number of objects covered by one ICM chunk */
	int inc = MLX4_TABLE_CHUNK_SIZE / table->obj_size;
	int err;
	u32 i;

	for (i = start; i <= end; i += inc) {
		err = mlx4_table_get(dev, table, i);
		if (err)
			goto fail;
	}

	return 0;
	...
}

E.g. an mtt obj is 8 bytes, so a 4KB ICM block would hold 512 mtt objects.
So you will have to allocate 512 more mtt objects in order to have the
table->icm pointer increment by 1 to fetch the next pointer value. So 256K
mtt objects are needed in order to traverse the table->icm pointer across
a page boundary in the call stacks.

Considering mlx4_table_get_range() is only used in the control path, there
is no significant gain from using kvzalloc vs. vzalloc for table->icm.

Anyway, if a user makes sure the mlx4 driver is loaded very early and
doesn't remove and reload it afterwards, we should have enough (and not
wasting) contiguous phy mem for the table->icm allocation. I will use
kvzalloc to replace vzalloc and send a V4 patch.

Thanks,
Qing

> Thanks,
> Tariq
>
>>> And you'll understand some people care about this.
>>>
>>> Strongly.
>>>
>>> Thanks.
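
The page-boundary arithmetic in the reply above can be checked with a small user-space sketch. It assumes 4KB ICM chunks, 4KB pages, 8-byte mtt entries, and 8-byte pointers in table->icm (a 64-bit system); the constant and function names are made up for illustration and are not mlx4 symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes taken from the discussion; illustrative names only. */
enum {
	ICM_CHUNK_BYTES = 4096,	/* one page-sized ICM chunk */
	PAGE_BYTES      = 4096,	/* assumed system page size */
	MTT_OBJ_BYTES   = 8,	/* size of one mtt object */
};

/* Objects covered by one ICM chunk, i.e. by one table->icm slot. */
size_t objs_per_chunk(size_t chunk_bytes, size_t obj_bytes)
{
	assert(obj_bytes > 0);
	return chunk_bytes / obj_bytes;
}

/* Chunk pointers that fit in one page of the table->icm array. */
size_t ptrs_per_page(size_t page_bytes, size_t ptr_bytes)
{
	assert(ptr_bytes > 0);
	return page_bytes / ptr_bytes;
}

/* Objects walked before the table->icm index moves into the next page. */
size_t objs_per_icm_page(size_t ptr_bytes)
{
	return objs_per_chunk(ICM_CHUNK_BYTES, MTT_OBJ_BYTES) *
	       ptrs_per_page(PAGE_BYTES, ptr_bytes);
}
```

With these assumptions, objs_per_chunk(4096, 8) gives 512 and objs_per_icm_page(8) gives 512 * 512 = 262144, which matches the "256K mtt objects" figure quoted in the reply.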
Re: [PATCH v3] mlx4_core: allocate ICM memory in page size chunks
On 18/05/2018 12:45 AM, Qing Huang wrote:
> On 5/17/2018 2:14 PM, Eric Dumazet wrote:
>> On 05/17/2018 01:53 PM, Qing Huang wrote:
>>> When a system is under memory pressure (high usage with fragments),
>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>> memory management to enter slow path doing memory compact/migration
>>> ops in order to complete high order memory allocations.
>>>
>>> When that happens, user processes calling uverb APIs may get stuck
>>> for more than 120s easily even though there are a lot of free pages
>>> in smaller chunks available in the system.
>>>
>>> Syslog:
>>> ...
>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>> ...
>>
>> NACK on this patch.
>>
>> You have been asked repeatedly to use kvmalloc()
>>
>> This is not a minor suggestion.
>>
>> Take a look at
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d8c13f2271ec5178c52fbde072ec7b562651ed9d
>
> Would you please take a look at how table->icm is being used in the
> mlx4 driver? It's a meta data used for individual pointer variable
> referencing, not as data frag or in/out buffer. It has no need for
> contiguous phy. memory.
>
> Thanks.

NACK.

This would cause a degradation when iterating the entries of table->icm.
For example, in mlx4_table_get_range.

Thanks,
Tariq

>> And you'll understand some people care about this.
>>
>> Strongly.
>>
>> Thanks.
Re: [PATCH v3] mlx4_core: allocate ICM memory in page size chunks
On 5/17/2018 2:14 PM, Eric Dumazet wrote:
> On 05/17/2018 01:53 PM, Qing Huang wrote:
>> When a system is under memory pressure (high usage with fragments),
>> the original 256KB ICM chunk allocations will likely trigger kernel
>> memory management to enter slow path doing memory compact/migration
>> ops in order to complete high order memory allocations.
>>
>> When that happens, user processes calling uverb APIs may get stuck
>> for more than 120s easily even though there are a lot of free pages
>> in smaller chunks available in the system.
>>
>> Syslog:
>> ...
>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>> oracle_205573_e:205573 blocked for more than 120 seconds.
>> ...
>
> NACK on this patch.
>
> You have been asked repeatedly to use kvmalloc()
>
> This is not a minor suggestion.
>
> Take a look at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d8c13f2271ec5178c52fbde072ec7b562651ed9d

Would you please take a look at how table->icm is being used in the mlx4
driver? It's a meta data used for individual pointer variable referencing,
not as data frag or in/out buffer. It has no need for contiguous phy.
memory.

Thanks.

> And you'll understand some people care about this.
>
> Strongly.
>
> Thanks.
Re: [PATCH v3] mlx4_core: allocate ICM memory in page size chunks
On 05/17/2018 01:53 PM, Qing Huang wrote:
> When a system is under memory pressure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter slow path doing memory compact/migration
> ops in order to complete high order memory allocations.
>
> When that happens, user processes calling uverb APIs may get stuck
> for more than 120s easily even though there are a lot of free pages
> in smaller chunks available in the system.
>
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
>

NACK on this patch.

You have been asked repeatedly to use kvmalloc()

This is not a minor suggestion.

Take a look at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d8c13f2271ec5178c52fbde072ec7b562651ed9d

And you'll understand some people care about this.

Strongly.

Thanks.
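
For readers following the "high order memory allocations" point in the patch description, here is a rough user-space sketch of how a chunk size maps to a page allocation order. It assumes 4KB pages; alloc_order() is a hypothetical helper written out for illustration, in the spirit of the kernel's get_order() but not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that (page_bytes << order) >= size_bytes.
 * An order-N allocation needs 2^N physically contiguous pages.
 */
unsigned int alloc_order(size_t size_bytes, size_t page_bytes)
{
	unsigned int order = 0;

	assert(page_bytes > 0);
	while ((page_bytes << order) < size_bytes)
		order++;
	return order;
}
```

Under the 4KB-page assumption, a 256KB chunk works out to order 6 (64 contiguous pages), while a page-sized chunk is order 0, which is why the smaller chunks can still be satisfied when memory is fragmented.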