Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Jiang, Dave


> On Aug 2, 2017, at 10:25 PM, Koul, Vinod  wrote:
> 
>> On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
>> 
>> 
>>> On Aug 2, 2017, at 9:58 PM, Koul, Vinod  wrote:
>>> 
>>>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
>>>> 
>>>> 
>>>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
>>>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>>>>>> Do we need a new API / new function, or new capability?
>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap
>>>>> 
>>>>> 
>>>>> 
>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>>>> to be similar with DMA_MEMSET_SG.
>>>> 
>>>> I'm ok with that if Vinod is.
>>> 
>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>>> or all :). We should have done bitfields for this though...
>> 
>> Add DMA_MEMCPY_SG to transaction type. 
> 
> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
> scatterlist to scatterlist copy which is used to check for
> device_prep_dma_sg() calls
> 
Right. But we are doing flat buffer to/from scatterlist, not sg to sg, so we
need something separate from what DMA_SG is used for.


> -- 
> ~Vinod
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Vinod Koul
On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
> 
> 
> > On Aug 2, 2017, at 9:58 PM, Koul, Vinod  wrote:
> > 
> >> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
> >> 
> >> 
> >>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> >>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
> >>>> Do we need a new API / new function, or new capability?
> >>> Hmmm...you are right. I wonder if we need something like DMA_SG cap
> >>> 
> >>> 
> >>> 
> >>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> >>> to be similar with DMA_MEMSET_SG.
> >> 
> >> I'm ok with that if Vinod is.
> > 
> > So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
> > or all :). We should have done bitfields for this though...
> 
> Add DMA_MEMCPY_SG to transaction type. 

Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
scatterlist to scatterlist copy which is used to check for
device_prep_dma_sg() calls

-- 
~Vinod


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Jiang, Dave


> On Aug 2, 2017, at 9:58 PM, Koul, Vinod  wrote:
> 
>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
>> 
>> 
>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>>>>> Do we need a new API / new function, or new capability?
>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap
>>>> 
>>>> 
>>> 
>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>> to be similar with DMA_MEMSET_SG.
>> 
>> I'm ok with that if Vinod is.
> 
> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
> or all :). We should have done bitfields for this though...

Add DMA_MEMCPY_SG to transaction type. 

> 
>> 
>>> 
>>> enum dma_transaction_type {
>>>DMA_MEMCPY,
>>>DMA_XOR,
>>>DMA_PQ,
>>>DMA_XOR_VAL,
>>>DMA_PQ_VAL,
>>>DMA_MEMSET,
>>>DMA_MEMSET_SG,
>>>DMA_INTERRUPT,
>>>DMA_SG,
>>>DMA_PRIVATE,
>>>DMA_ASYNC_TX,
>>>DMA_SLAVE,
>>>DMA_CYCLIC,
>>>DMA_INTERLEAVE,
>>> /* last transaction type for creation of the capabilities mask */
>>>DMA_TX_TYPE_END,
>>> };
>>> 
> 
> -- 
> ~Vinod


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Vinod Koul
On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
> 
> 
> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> > On 8/2/2017 4:52 PM, Dave Jiang wrote:
> >>> Do we need a new API / new function, or new capability?
> >> Hmmm...you are right. I wonder if we need something like DMA_SG cap
> >>
> >>
> > 
> > Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> > to be similar with DMA_MEMSET_SG.
> 
> I'm ok with that if Vinod is.

So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
or all :). We should have done bitfields for this though...

> 
> > 
> > enum dma_transaction_type {
> > DMA_MEMCPY,
> > DMA_XOR,
> > DMA_PQ,
> > DMA_XOR_VAL,
> > DMA_PQ_VAL,
> > DMA_MEMSET,
> > DMA_MEMSET_SG,
> > DMA_INTERRUPT,
> > DMA_SG,
> > DMA_PRIVATE,
> > DMA_ASYNC_TX,
> > DMA_SLAVE,
> > DMA_CYCLIC,
> > DMA_INTERLEAVE,
> > /* last transaction type for creation of the capabilities mask */
> > DMA_TX_TYPE_END,
> > };
> > 

-- 
~Vinod


Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt

2017-08-02 Thread Dan Williams
[ adding Tim and Ying who have also been looking at swap optimization
and rw_page interactions ]

On Wed, Aug 2, 2017 at 5:13 PM, Minchan Kim  wrote:
> Hi Ross,
>
> On Wed, Aug 02, 2017 at 04:13:59PM -0600, Ross Zwisler wrote:
>> On Fri, Jul 28, 2017 at 10:31:43AM -0700, Matthew Wilcox wrote:
>> > On Fri, Jul 28, 2017 at 10:56:01AM -0600, Ross Zwisler wrote:
>> > > Dan Williams and Christoph Hellwig have recently expressed doubt about
>> > > whether the rw_page() interface made sense for synchronous memory drivers
>> > > [1][2].  It's unclear whether this interface has any performance benefit
>> > > for these drivers, but as we continue to fix bugs it is clear that it does
>> > > have a maintenance burden.  This series removes the rw_page()
>> > > implementations in brd, pmem and btt to relieve this burden.
>> >
>> > Why don't you measure whether it has performance benefits?  I don't
>> > understand why zram would see performance benefits and not other drivers.
>> > If it's going to be removed, then the whole interface should be removed,
>> > not just have the implementations removed from some drivers.
>>
>> Okay, I've run a bunch of performance tests with the PMEM and with BTT entry
>> points for rw_page() in a swap workload, and in all cases I do see an
>> improvement over the code when rw_page() is removed.  Here are the results
>> from my random lab box:
>>
>>   Average latency of swap_writepage()
>> +------+------------+---------+-------------+
>> |      | no rw_page | rw_page | Improvement |
>> +------+------------+---------+-------------+
>> | PMEM |     5.0 us |  4.7 us |          6% |
>> +------+------------+---------+-------------+
>> |  BTT |     6.8 us |  6.1 us |         10% |
>> +------+------------+---------+-------------+
>>
>>   Average latency of swap_readpage()
>> +------+------------+---------+-------------+
>> |      | no rw_page | rw_page | Improvement |
>> +------+------------+---------+-------------+
>> | PMEM |     3.3 us |  2.9 us |         12% |
>> +------+------------+---------+-------------+
>> |  BTT |     3.7 us |  3.4 us |          8% |
>> +------+------------+---------+-------------+
>>
>> The workload was pmbench, a memory benchmark, run on a system where I had
>> severely restricted the amount of memory in the system with the 'mem' kernel
>> command line parameter.  The benchmark was set up to test more memory than I
>> allowed the OS to have so it spilled over into swap.
>>
>> The PMEM or BTT device was set up as my swap device, and during the test I got
>> a few hundred thousand samples of each of swap_writepage() and
>> swap_readpage().  The PMEM/BTT device was just memory reserved with the
>> memmap kernel command line parameter.
>>
>> Thanks, Matthew, for asking for performance data.  It looks like removing this
>> code would have been a mistake.
>
> At Christoph Hellwig's suggestion, I made a quick patch which does swap IO
> without dynamic bio allocation. It's not yet a formal patch worth sending
> mainline, but I believe it's enough to test the improvement.
>
> Could you test the patchset on pmem and btt without rw_page?
>
> For the patch to work, block drivers need to declare themselves as synchronous
> IO devices via BDI_CAP_SYNC. If that's hard, you can just make every swap IO
> take the synchronous path by removing the condition check
>
> if (!(sis->flags & SWP_SYNC_IO)) in swap_[read|write]page.
>
> Patchset is based on 4.13-rc3.
>
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 856d5dc02451..b1c5e9bf3ad5 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -125,9 +125,9 @@ static inline bool is_partial_io(struct bio_vec *bvec)
>  static void zram_revalidate_disk(struct zram *zram)
>  {
> revalidate_disk(zram->disk);
> -   /* revalidate_disk reset the BDI_CAP_STABLE_WRITES so set again */
> +   /* revalidate_disk reset the BDI capability so set again */
> zram->disk->queue->backing_dev_info->capabilities |=
> -   BDI_CAP_STABLE_WRITES;
> +   (BDI_CAP_STABLE_WRITES|BDI_CAP_SYNC);
>  }
>
>  /*
> @@ -1096,7 +1096,7 @@ static int zram_open(struct block_device *bdev, fmode_t mode)
>  static const struct block_device_operations zram_devops = {
> .open = zram_open,
> .swap_slot_free_notify = zram_slot_free_notify,
> -   .rw_page = zram_rw_page,
> +   // .rw_page = zram_rw_page,
> .owner = THIS_MODULE
>  };
>
> diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> index 854e1bdd0b2a..05eee145d964 100644
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -130,6 +130,7 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
>  #define BDI_CAP_STABLE_WRITES  0x0008
>  #define BDI_CAP_STRICTLIMIT0x0010
>  #define BDI_CAP_CGROUP_WRITEBACK 0x0020
> +#define BDI_CAP_SYNC   0x0040

Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt

2017-08-02 Thread Minchan Kim
Hi Ross,

On Wed, Aug 02, 2017 at 04:13:59PM -0600, Ross Zwisler wrote:
> On Fri, Jul 28, 2017 at 10:31:43AM -0700, Matthew Wilcox wrote:
> > On Fri, Jul 28, 2017 at 10:56:01AM -0600, Ross Zwisler wrote:
> > > Dan Williams and Christoph Hellwig have recently expressed doubt about
> > > whether the rw_page() interface made sense for synchronous memory drivers
> > > [1][2].  It's unclear whether this interface has any performance benefit
> > > for these drivers, but as we continue to fix bugs it is clear that it does
> > > have a maintenance burden.  This series removes the rw_page()
> > > implementations in brd, pmem and btt to relieve this burden.
> > 
> > Why don't you measure whether it has performance benefits?  I don't
> > understand why zram would see performance benefits and not other drivers.
> > If it's going to be removed, then the whole interface should be removed,
> > not just have the implementations removed from some drivers.
> 
> Okay, I've run a bunch of performance tests with the PMEM and with BTT entry
> points for rw_page() in a swap workload, and in all cases I do see an
> improvement over the code when rw_page() is removed.  Here are the results
> from my random lab box:
> 
>   Average latency of swap_writepage()
> +------+------------+---------+-------------+
> |      | no rw_page | rw_page | Improvement |
> +------+------------+---------+-------------+
> | PMEM |     5.0 us |  4.7 us |          6% |
> +------+------------+---------+-------------+
> |  BTT |     6.8 us |  6.1 us |         10% |
> +------+------------+---------+-------------+
> 
>   Average latency of swap_readpage()
> +------+------------+---------+-------------+
> |      | no rw_page | rw_page | Improvement |
> +------+------------+---------+-------------+
> | PMEM |     3.3 us |  2.9 us |         12% |
> +------+------------+---------+-------------+
> |  BTT |     3.7 us |  3.4 us |          8% |
> +------+------------+---------+-------------+
> 
> The workload was pmbench, a memory benchmark, run on a system where I had
> severely restricted the amount of memory in the system with the 'mem' kernel
> command line parameter.  The benchmark was set up to test more memory than I
> allowed the OS to have so it spilled over into swap.
> 
> The PMEM or BTT device was set up as my swap device, and during the test I got
> a few hundred thousand samples of each of swap_writepage() and
> swap_readpage().  The PMEM/BTT device was just memory reserved with the
> memmap kernel command line parameter.
> 
> Thanks, Matthew, for asking for performance data.  It looks like removing this
> code would have been a mistake.

At Christoph Hellwig's suggestion, I made a quick patch which does swap IO
without dynamic bio allocation. It's not yet a formal patch worth sending
mainline, but I believe it's enough to test the improvement.

Could you test the patchset on pmem and btt without rw_page?

For the patch to work, block drivers need to declare themselves as synchronous
IO devices via BDI_CAP_SYNC. If that's hard, you can just make every swap IO
take the synchronous path by removing the condition check

if (!(sis->flags & SWP_SYNC_IO)) in swap_[read|write]page.

Patchset is based on 4.13-rc3.


diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 856d5dc02451..b1c5e9bf3ad5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -125,9 +125,9 @@ static inline bool is_partial_io(struct bio_vec *bvec)
 static void zram_revalidate_disk(struct zram *zram)
 {
revalidate_disk(zram->disk);
-   /* revalidate_disk reset the BDI_CAP_STABLE_WRITES so set again */
+   /* revalidate_disk reset the BDI capability so set again */
zram->disk->queue->backing_dev_info->capabilities |=
-   BDI_CAP_STABLE_WRITES;
+   (BDI_CAP_STABLE_WRITES|BDI_CAP_SYNC);
 }
 
 /*
@@ -1096,7 +1096,7 @@ static int zram_open(struct block_device *bdev, fmode_t mode)
 static const struct block_device_operations zram_devops = {
.open = zram_open,
.swap_slot_free_notify = zram_slot_free_notify,
-   .rw_page = zram_rw_page,
+   // .rw_page = zram_rw_page,
.owner = THIS_MODULE
 };
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 854e1bdd0b2a..05eee145d964 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -130,6 +130,7 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
 #define BDI_CAP_STABLE_WRITES  0x0008
 #define BDI_CAP_STRICTLIMIT0x0010
 #define BDI_CAP_CGROUP_WRITEBACK 0x0020
+#define BDI_CAP_SYNC   0x0040
 
 #define BDI_CAP_NO_ACCT_AND_WRITEBACK \
(BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_ACCT_WB)
@@ -177,6 +178,11 @@ long wait_iff_congested(struct pglist_data *pgdat, int sync, long timeout);
 int pdflush_proc_obsolete(struct ctl_table *table, int write,
void __user 



Re: [PATCH 0/3] remove rw_page() from brd, pmem and btt

2017-08-02 Thread Ross Zwisler
On Fri, Jul 28, 2017 at 10:31:43AM -0700, Matthew Wilcox wrote:
> On Fri, Jul 28, 2017 at 10:56:01AM -0600, Ross Zwisler wrote:
> > Dan Williams and Christoph Hellwig have recently expressed doubt about
> > whether the rw_page() interface made sense for synchronous memory drivers
> > [1][2].  It's unclear whether this interface has any performance benefit
> > for these drivers, but as we continue to fix bugs it is clear that it does
> > have a maintenance burden.  This series removes the rw_page()
> > implementations in brd, pmem and btt to relieve this burden.
> 
> Why don't you measure whether it has performance benefits?  I don't
> understand why zram would see performance benefits and not other drivers.
> If it's going to be removed, then the whole interface should be removed,
> not just have the implementations removed from some drivers.

Okay, I've run a bunch of performance tests with the PMEM and with BTT entry
points for rw_page() in a swap workload, and in all cases I do see an
improvement over the code when rw_page() is removed.  Here are the results
from my random lab box:

  Average latency of swap_writepage()
+------+------------+---------+-------------+
|      | no rw_page | rw_page | Improvement |
+------+------------+---------+-------------+
| PMEM |     5.0 us |  4.7 us |          6% |
+------+------------+---------+-------------+
|  BTT |     6.8 us |  6.1 us |         10% |
+------+------------+---------+-------------+

  Average latency of swap_readpage()
+------+------------+---------+-------------+
|      | no rw_page | rw_page | Improvement |
+------+------------+---------+-------------+
| PMEM |     3.3 us |  2.9 us |         12% |
+------+------------+---------+-------------+
|  BTT |     3.7 us |  3.4 us |          8% |
+------+------------+---------+-------------+

The workload was pmbench, a memory benchmark, run on a system where I had
severely restricted the amount of memory in the system with the 'mem' kernel
command line parameter.  The benchmark was set up to test more memory than I
allowed the OS to have so it spilled over into swap.

The PMEM or BTT device was set up as my swap device, and during the test I got
a few hundred thousand samples of each of swap_writepage() and
swap_readpage().  The PMEM/BTT device was just memory reserved with the
memmap kernel command line parameter.

Thanks, Matthew, for asking for performance data.  It looks like removing this
code would have been a mistake.


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Dave Jiang


On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>>> Do we need a new API / new function, or new capability?
>> Hmmm...you are right. I wonder if we need something like DMA_SG cap
>>
>>
> 
> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> to be similar with DMA_MEMSET_SG.

I'm ok with that if Vinod is.

> 
> enum dma_transaction_type {
>   DMA_MEMCPY,
>   DMA_XOR,
>   DMA_PQ,
>   DMA_XOR_VAL,
>   DMA_PQ_VAL,
>   DMA_MEMSET,
>   DMA_MEMSET_SG,
>   DMA_INTERRUPT,
>   DMA_SG,
>   DMA_PRIVATE,
>   DMA_ASYNC_TX,
>   DMA_SLAVE,
>   DMA_CYCLIC,
>   DMA_INTERLEAVE,
> /* last transaction type for creation of the capabilities mask */
>   DMA_TX_TYPE_END,
> };
> 


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Sinan Kaya
On 8/2/2017 4:52 PM, Dave Jiang wrote:
>> Do we need a new API / new function, or new capability?
> Hmmm...you are right. I wonder if we need something like DMA_SG cap
> 
> 

Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
to be similar with DMA_MEMSET_SG.

enum dma_transaction_type {
DMA_MEMCPY,
DMA_XOR,
DMA_PQ,
DMA_XOR_VAL,
DMA_PQ_VAL,
DMA_MEMSET,
DMA_MEMSET_SG,
DMA_INTERRUPT,
DMA_SG,
DMA_PRIVATE,
DMA_ASYNC_TX,
DMA_SLAVE,
DMA_CYCLIC,
DMA_INTERLEAVE,
/* last transaction type for creation of the capabilities mask */
DMA_TX_TYPE_END,
};

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Dave Jiang


On 08/02/2017 12:22 PM, Sinan Kaya wrote:
> On 8/2/2017 2:41 PM, Dave Jiang wrote:
>>  if (queue_mode == PMEM_Q_MQ) {
>> +chan = dma_find_channel(DMA_MEMCPY);
>> +if (!chan) {
>> +queue_mode = PMEM_Q_BIO;
>> +dev_warn(dev, "Forced back to PMEM_Q_BIO, no DMA\n");
>> +}
> 
> We can't expect all MEMCPY capable hardware to support this feature, right?
> 
> Do we need a new API / new function, or new capability?

Hmmm...you are right. I wonder if we need something like DMA_SG cap


> 
>> +}
> 
> 


Re: [PATCH v2] dm: allow device-mapper to operate without dax support

2017-08-02 Thread kbuild test robot
Hi Dan,

[auto build test ERROR on dm/for-next]
[also build test ERROR on v4.13-rc3 next-20170802]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]

url:
https://github.com/0day-ci/linux/commits/Dan-Williams/dm-allow-device-mapper-to-operate-without-dax-support/20170802-155255
base:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git for-next
config: sh-sdk7786_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh

All errors (new ones prefixed by >>):

   drivers/md/dm-table.o: In function `device_dax_write_cache_enabled':
   dm-table.c:(.text+0xab4): undefined reference to `dax_write_cache_enabled'
   drivers/md/dm-table.o: In function `dm_table_set_restrictions':
>> (.text+0x23d0): undefined reference to `dax_write_cache'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Sinan Kaya
On 8/2/2017 2:41 PM, Dave Jiang wrote:
>   if (queue_mode == PMEM_Q_MQ) {
> + chan = dma_find_channel(DMA_MEMCPY);
> + if (!chan) {
> + queue_mode = PMEM_Q_BIO;
> + dev_warn(dev, "Forced back to PMEM_Q_BIO, no DMA\n");
> + }

We can't expect all MEMCPY capable hardware to support this feature, right?

Do we need a new API / new function, or new capability?

> + }




Re: [PATCH v3 0/2] dax, dm: stop requiring dax for device-mapper

2017-08-02 Thread Mike Snitzer
On Wed, Aug 02 2017 at  1:57pm -0400,
Dan Williams  wrote:

> Changes since v2 [1]:
> * rebase on -next to integrate with commit 273752c9ff03 "dm, dax: Make
>   sure dm_dax_flush() is called if device supports it" (kbuild robot)
> * fix CONFIG_DAX dependencies to upgrade CONFIG_DAX=m to CONFIG_DAX=y
>   (kbuild robot)
> 
> [1]: https://www.spinics.net/lists/kernel/msg2570522.html
> 
> ---
> 
> Bart points out that the DAX core is unconditionally enabled if
> device-mapper is enabled. Add some config machinery and some
> stub-static-inline routines to allow dax infrastructure to be deleted
> from device-mapper at compile time.
> 
> Since this depends on commit 273752c9ff03 that's already in -next, this
> should go through the device-mapper tree.

Commit 273752c9ff03eb83856601b2a3458218bb949e46 is upstream as of
v4.13-rc3 -- so no real need to have this go via linux-dm.git

That said, I don't mind picking it up once we are satisfied with the
implementation.  I'll start reviewing shortly.

Mike


[PATCH v2 3/5] dmaengine: add SG support to dmaengine_unmap

2017-08-02 Thread Dave Jiang
This provides support for unmapping a scatterlist with
dmaengine_unmap_data. Only 1 scatterlist per direction is
supported. The DMA address array of the 2-or-fewer-entry DMA unmap
data structure has been overloaded in order to store the
SG pointer(s).

Signed-off-by: Dave Jiang 
---
 drivers/dma/dmaengine.c   |   45 -
 include/linux/dmaengine.h |4 
 2 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index d9118ec..81d10c97 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -1126,16 +1126,35 @@ static void dmaengine_unmap(struct kref *kref)
 {
struct dmaengine_unmap_data *unmap = container_of(kref, typeof(*unmap), kref);
struct device *dev = unmap->dev;
-   int cnt, i;
+   int cnt, i, sg_nents;
+   struct scatterlist *sg;
+
+   sg_nents = dma_unmap_data_sg_to_nents(unmap, unmap->map_cnt);
+   if (sg_nents) {
+   i = 0;
+   cnt = 1;
+   sg = (struct scatterlist *)unmap->addr[i];
+   dma_unmap_sg(dev, sg, sg_nents, DMA_TO_DEVICE);
+   } else {
+   cnt = unmap->to_cnt;
+   for (i = 0; i < cnt; i++)
+   dma_unmap_page(dev, unmap->addr[i], unmap->len,
+   DMA_TO_DEVICE);
+   }
+
+   sg_nents = dma_unmap_data_sg_from_nents(unmap, unmap->map_cnt);
+   if (sg_nents) {
+   sg = (struct scatterlist *)unmap->addr[i];
+   dma_unmap_sg(dev, sg, sg_nents, DMA_FROM_DEVICE);
+   cnt++;
+   i++;
+   } else {
+   cnt += unmap->from_cnt;
+   for (; i < cnt; i++)
+   dma_unmap_page(dev, unmap->addr[i], unmap->len,
+   DMA_FROM_DEVICE);
+   }
 
-   cnt = unmap->to_cnt;
-   for (i = 0; i < cnt; i++)
-   dma_unmap_page(dev, unmap->addr[i], unmap->len,
-  DMA_TO_DEVICE);
-   cnt += unmap->from_cnt;
-   for (; i < cnt; i++)
-   dma_unmap_page(dev, unmap->addr[i], unmap->len,
-  DMA_FROM_DEVICE);
cnt += unmap->bidi_cnt;
for (; i < cnt; i++) {
if (unmap->addr[i] == 0)
@@ -1179,6 +1198,10 @@ static int __init dmaengine_init_unmap_pool(void)
size = sizeof(struct dmaengine_unmap_data) +
   sizeof(dma_addr_t) * p->size;
 
+   /* add 2 more entries for SG nents overload */
+   if (i == 0)
+   size += sizeof(dma_addr_t) * 2;
+
p->cache = kmem_cache_create(p->name, size, 0,
 SLAB_HWCACHE_ALIGN, NULL);
if (!p->cache)
@@ -1205,6 +1228,10 @@ dmaengine_get_unmap_data(struct device *dev, int nr, gfp_t flags)
return NULL;
 
memset(unmap, 0, sizeof(*unmap));
+   /* clear the overloaded sg nents entries */
+   if (nr < 3)
+   memset(&unmap->addr[nr], 0,
+   DMA_UNMAP_SG_ENTS * sizeof(dma_addr_t));
kref_init(&unmap->kref);
unmap->dev = dev;
unmap->map_cnt = nr;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 060f152..ba978dd 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -475,6 +475,10 @@ struct dmaengine_unmap_data {
dma_addr_t addr[0];
 };
 
+#define DMA_UNMAP_SG_ENTS  2
+#define dma_unmap_data_sg_to_nents(x, n) x->addr[n]
+#define dma_unmap_data_sg_from_nents(x, n) x->addr[n+1]
+
 /**
  * struct dma_async_tx_descriptor - async transaction descriptor
  * ---dma generic offload fields---



[PATCH v2 1/5] dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels

2017-08-02 Thread Dave Jiang
Commit 7618d0359c16 ("dmaengine: ioatdma: Set non RAID channels to be
private capable") makes all non-RAID ioatdma channels private so they can be
requested by dma_request_channel(). With PQ CAP support going away for
ioatdma, this would make all channels private. To support the usage of
ioatdma for the blk-mq implementation of pmem we need as many channels as we
can share in order to be high performing. Thus reverting the patch.

Signed-off-by: Dave Jiang 
---
 drivers/dma/ioat/init.c |3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index 6ad4384..e437112 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1163,9 +1163,6 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca)
}
}
 
-   if (!(ioat_dma->cap & (IOAT_CAP_XOR | IOAT_CAP_PQ)))
-   dma_cap_set(DMA_PRIVATE, dma->cap_mask);
-
err = ioat_probe(ioat_dma);
if (err)
return err;



[PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Dave Jiang
Add DMA support for pmem blk reads. This provides a significant CPU
reduction for large memory reads while maintaining good performance. DMAs
are triggered by testing against bio_multiple_segment(), so small I/Os
(4k or less?) are still performed by the CPU in order to reduce latency. By
default the pmem driver will use blk-mq with DMA.

Numbers below are measured against pmem simulated via DRAM using
memmap=NN!SS.  DMA engine used is the ioatdma on Intel Skylake Xeon
platform.  Keep in mind the performance for actual persistent memory
will differ.
Fio 2.21 was used.

64k: 1 task queuedepth=1
CPU Read:  7631 MB/s   99.7% CPU    DMA Read:  2415 MB/s    54% CPU
CPU Write: 3552 MB/s    100% CPU    DMA Write: 2173 MB/s    54% CPU

64k: 16 tasks queuedepth=16
CPU Read:  36800 MB/s  1593% CPU    DMA Read:  29100 MB/s  607% CPU
CPU Write: 20900 MB/s  1589% CPU    DMA Write: 23400 MB/s  585% CPU

2M: 1 task queuedepth=1
CPU Read:  6013 MB/s   99.3% CPU    DMA Read:  7986 MB/s  59.3% CPU
CPU Write: 3579 MB/s    100% CPU    DMA Write: 5211 MB/s  58.3% CPU

2M: 16 tasks queuedepth=16
CPU Read:  18100 MB/s  1588% CPU    DMA Read:  21300 MB/s 180.9% CPU
CPU Write: 14100 MB/s  1594% CPU    DMA Write: 20400 MB/s 446.9% CPU

Signed-off-by: Dave Jiang 
---
 drivers/nvdimm/pmem.c |  214 +++--
 1 file changed, 204 insertions(+), 10 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 98e752f..c8f7a2f 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -32,6 +32,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "pmem.h"
 #include "pfn.h"
 #include "nd.h"
@@ -41,12 +43,24 @@ enum {
PMEM_Q_MQ = 1,
 };
 
-static int queue_mode = PMEM_Q_BIO;
+static int queue_mode = PMEM_Q_MQ;
 module_param(queue_mode, int, 0444);
-MODULE_PARM_DESC(queue_mode, "Pmem Queue Mode (0=BIO, 1=BLK-MQ)");
+MODULE_PARM_DESC(queue_mode, "Pmem Queue Mode (0=BIO, 1=BLK-MQ & DMA)");
+
+static int queue_depth = 128;
+module_param(queue_depth, int, 0444);
+MODULE_PARM_DESC(queue_depth, "I/O Queue Depth for multi queue mode");
+
+/* typically maps to number of DMA channels/devices per socket */
+static int q_per_node = 8;
+module_param(q_per_node, int, 0444);
+MODULE_PARM_DESC(q_per_node, "Hardware queues per node\n");
 
 struct pmem_cmd {
struct request *rq;
+   struct dma_chan *chan;
+   int sg_nents;
+   struct scatterlist sg[];
 };
 
 static struct device *to_dev(struct pmem_device *pmem)
@@ -277,6 +291,159 @@ static void pmem_release_disk(void *__pmem)
put_disk(pmem->disk);
 }
 
+static void nd_pmem_dma_callback(void *data,
+   const struct dmaengine_result *res)
+{
+   struct pmem_cmd *cmd = data;
+   struct request *req = cmd->rq;
+   struct request_queue *q = req->q;
+   struct pmem_device *pmem = q->queuedata;
+   struct nd_region *nd_region = to_region(pmem);
+   struct device *dev = to_dev(pmem);
+   int rc = 0;
+
+   if (res) {
+   enum dmaengine_tx_result dma_err = res->result;
+
+   switch (dma_err) {
+   case DMA_TRANS_READ_FAILED:
+   case DMA_TRANS_WRITE_FAILED:
+   case DMA_TRANS_ABORTED:
+   dev_dbg(dev, "bio failed\n");
+   rc = -ENXIO;
+   break;
+   case DMA_TRANS_NOERROR:
+   default:
+   break;
+   }
+   }
+
+   if (req->cmd_flags & REQ_FUA)
+   nvdimm_flush(nd_region);
+
+   blk_mq_end_request(cmd->rq, rc);
+}
+
+static int pmem_handle_cmd_dma(struct pmem_cmd *cmd, bool is_write)
+{
+   struct request *req = cmd->rq;
+   struct request_queue *q = req->q;
+   struct pmem_device *pmem = q->queuedata;
+   struct device *dev = to_dev(pmem);
+   phys_addr_t pmem_off = blk_rq_pos(req) * 512 + pmem->data_offset;
+   void *pmem_addr = pmem->virt_addr + pmem_off;
+   struct nd_region *nd_region = to_region(pmem);
+   size_t len;
+   struct dma_device *dma = cmd->chan->device;
+   struct dmaengine_unmap_data *unmap;
+   dma_cookie_t cookie;
+   struct dma_async_tx_descriptor *txd;
+   struct page *page;
+   unsigned int off;
+   int rc;
+   enum dma_data_direction dir;
+   dma_addr_t dma_addr;
+
+   if (req->cmd_flags & REQ_FLUSH)
+   nvdimm_flush(nd_region);
+
+   unmap = dmaengine_get_unmap_data(dma->dev, 2, GFP_NOWAIT);
+   if (!unmap) {
+   dev_dbg(dev, "failed to get dma unmap data\n");
+   rc = -ENOMEM;
+   goto err;
+   }
+
+   /*
+* If reading from pmem, writing to scatterlist,
+* and if writing to pmem, reading from scatterlist.
+*/
+   dir = is_write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+   cmd->sg_nents = blk_rq_map_sg(req->q, req, cmd->sg);
+   if (cmd->sg_nents < 1) {
+   rc 

[PATCH v2 4/5] libnvdimm: Adding blk-mq support to the pmem driver

2017-08-02 Thread Dave Jiang
Adding blk-mq support to the pmem driver in addition to the existing
direct bio support. This allows for hardware offloading via DMA engines.
By default the bio method is enabled; the blk-mq support can be turned
on via the module parameter queue_mode=1.

Signed-off-by: Dave Jiang 
---
 drivers/nvdimm/pmem.c |  137 +
 drivers/nvdimm/pmem.h |3 +
 2 files changed, 119 insertions(+), 21 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c544d46..98e752f 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -31,10 +31,24 @@
 #include 
 #include 
 #include 
+#include <linux/blk-mq.h>
 #include "pmem.h"
 #include "pfn.h"
 #include "nd.h"
 
+enum {
+   PMEM_Q_BIO = 0,
+   PMEM_Q_MQ = 1,
+};
+
+static int queue_mode = PMEM_Q_BIO;
+module_param(queue_mode, int, 0444);
+MODULE_PARM_DESC(queue_mode, "Pmem Queue Mode (0=BIO, 1=BLK-MQ)");
+
+struct pmem_cmd {
+   struct request *rq;
+};
+
 static struct device *to_dev(struct pmem_device *pmem)
 {
/*
@@ -239,9 +253,13 @@ static const struct dax_operations pmem_dax_ops = {
.direct_access = pmem_dax_direct_access,
 };
 
-static void pmem_release_queue(void *q)
+static void pmem_release_queue(void *data)
 {
-   blk_cleanup_queue(q);
+   struct pmem_device *pmem = (struct pmem_device *)data;
+
+   blk_cleanup_queue(pmem->q);
+   if (queue_mode == PMEM_Q_MQ)
+   blk_mq_free_tag_set(&pmem->tag_set);
 }
 
 static void pmem_freeze_queue(void *q)
@@ -259,6 +277,54 @@ static void pmem_release_disk(void *__pmem)
put_disk(pmem->disk);
 }
 
+static int pmem_handle_cmd(struct pmem_cmd *cmd)
+{
+   struct request *req = cmd->rq;
+   struct request_queue *q = req->q;
+   struct pmem_device *pmem = q->queuedata;
+   struct nd_region *nd_region = to_region(pmem);
+   struct bio_vec bvec;
+   struct req_iterator iter;
+   int rc = 0;
+
+   if (req->cmd_flags & REQ_FLUSH)
+   nvdimm_flush(nd_region);
+
+   rq_for_each_segment(bvec, req, iter) {
+   rc = pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len,
+   bvec.bv_offset, op_is_write(req_op(req)),
+   iter.iter.bi_sector);
+   if (rc < 0)
+   break;
+   }
+
+   if (req->cmd_flags & REQ_FUA)
+   nvdimm_flush(nd_region);
+
+   blk_mq_end_request(cmd->rq, rc);
+
+   return rc;
+}
+
+static int pmem_queue_rq(struct blk_mq_hw_ctx *hctx,
+   const struct blk_mq_queue_data *bd)
+{
+   struct pmem_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
+
+   cmd->rq = bd->rq;
+
+   blk_mq_start_request(bd->rq);
+
+   if (pmem_handle_cmd(cmd) < 0)
+   return BLK_MQ_RQ_QUEUE_ERROR;
+   else
+   return BLK_MQ_RQ_QUEUE_OK;
+}
+
+static const struct blk_mq_ops pmem_mq_ops = {
+   .queue_rq   = pmem_queue_rq,
+};
+
 static int pmem_attach_disk(struct device *dev,
struct nd_namespace_common *ndns)
 {
@@ -272,9 +338,9 @@ static int pmem_attach_disk(struct device *dev,
struct nd_pfn_sb *pfn_sb;
struct pmem_device *pmem;
struct resource pfn_res;
-   struct request_queue *q;
struct gendisk *disk;
void *addr;
+   int rc;
 
/* while nsio_rw_bytes is active, parse a pfn info block if present */
if (is_nd_pfn(dev)) {
@@ -303,17 +369,47 @@ static int pmem_attach_disk(struct device *dev,
return -EBUSY;
}
 
-   q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
-   if (!q)
-   return -ENOMEM;
+   if (queue_mode == PMEM_Q_MQ) {
+   pmem->tag_set.ops = &pmem_mq_ops;
+   pmem->tag_set.nr_hw_queues = nr_online_nodes;
+   pmem->tag_set.queue_depth = 64;
+   pmem->tag_set.numa_node = dev_to_node(dev);
+   pmem->tag_set.cmd_size = sizeof(struct pmem_cmd);
+   pmem->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+   pmem->tag_set.driver_data = pmem;
+
+   rc = blk_mq_alloc_tag_set(&pmem->tag_set);
+   if (rc < 0)
+   return rc;
+
+   pmem->q = blk_mq_init_queue(&pmem->tag_set);
+   if (IS_ERR(pmem->q)) {
+   blk_mq_free_tag_set(&pmem->tag_set);
+   return -ENOMEM;
+   }
 
-   if (devm_add_action_or_reset(dev, pmem_release_queue, q))
-   return -ENOMEM;
+   if (devm_add_action_or_reset(dev, pmem_release_queue, pmem)) {
+   blk_mq_free_tag_set(&pmem->tag_set);
+   return -ENOMEM;
+   }
+   } else if (queue_mode == PMEM_Q_BIO) {
+   pmem->q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
+   if (!pmem->q)
+   return -ENOMEM;
+
+   if (devm_add_action_or_reset(dev, 

[PATCH v2 2/5] dmaengine: ioatdma: dma_prep_memcpy_sg support

2017-08-02 Thread Dave Jiang
Adding ioatdma support to copy from a physically contiguous buffer to a
provided scatterlist and vice versa. This is used to support
reading/writing persistent memory in the pmem driver.

Signed-off-by: Dave Jiang 
---
 drivers/dma/ioat/dma.h|4 +++
 drivers/dma/ioat/init.c   |1 +
 drivers/dma/ioat/prep.c   |   57 +
 include/linux/dmaengine.h |5 
 4 files changed, 67 insertions(+)

diff --git a/drivers/dma/ioat/dma.h b/drivers/dma/ioat/dma.h
index 56200ee..6c08b06 100644
--- a/drivers/dma/ioat/dma.h
+++ b/drivers/dma/ioat/dma.h
@@ -370,6 +370,10 @@ struct dma_async_tx_descriptor *
 ioat_dma_prep_memcpy_lock(struct dma_chan *c, dma_addr_t dma_dest,
   dma_addr_t dma_src, size_t len, unsigned long flags);
 struct dma_async_tx_descriptor *
+ioat_dma_prep_memcpy_sg_lock(struct dma_chan *c,
+   struct scatterlist *sg, unsigned int sg_nents,
+   dma_addr_t dma_addr, bool to_sg, unsigned long flags);
+struct dma_async_tx_descriptor *
 ioat_prep_interrupt_lock(struct dma_chan *c, unsigned long flags);
 struct dma_async_tx_descriptor *
 ioat_prep_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index e437112..f82d3bb 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1091,6 +1091,7 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca)
 
dma = &ioat_dma->dma_dev;
dma->device_prep_dma_memcpy = ioat_dma_prep_memcpy_lock;
+   dma->device_prep_dma_memcpy_sg = ioat_dma_prep_memcpy_sg_lock;
dma->device_issue_pending = ioat_issue_pending;
dma->device_alloc_chan_resources = ioat_alloc_chan_resources;
dma->device_free_chan_resources = ioat_free_chan_resources;
diff --git a/drivers/dma/ioat/prep.c b/drivers/dma/ioat/prep.c
index 243421a..d8219af 100644
--- a/drivers/dma/ioat/prep.c
+++ b/drivers/dma/ioat/prep.c
@@ -159,6 +159,63 @@ ioat_dma_prep_memcpy_lock(struct dma_chan *c, dma_addr_t dma_dest,
return &desc->txd;
 }
 
+struct dma_async_tx_descriptor *
+ioat_dma_prep_memcpy_sg_lock(struct dma_chan *c,
+   struct scatterlist *sg, unsigned int sg_nents,
+   dma_addr_t dma_addr, bool to_sg, unsigned long flags)
+{
+   struct ioatdma_chan *ioat_chan = to_ioat_chan(c);
+   struct ioat_dma_descriptor *hw = NULL;
+   struct ioat_ring_ent *desc = NULL;
+   dma_addr_t dma_off = dma_addr;
+   int num_descs, idx, i;
+   struct scatterlist *s;
+   size_t total_len = 0, len;
+
+
+   if (test_bit(IOAT_CHAN_DOWN, &ioat_chan->state))
+   return NULL;
+
+   /*
+* The upper layer will guarantee that each entry does not exceed
+* xfercap.
+*/
+   num_descs = sg_nents;
+
+   if (likely(num_descs) &&
+   ioat_check_space_lock(ioat_chan, num_descs) == 0)
+   idx = ioat_chan->head;
+   else
+   return NULL;
+
+   for_each_sg(sg, s, sg_nents, i) {
+   desc = ioat_get_ring_ent(ioat_chan, idx + i);
+   hw = desc->hw;
+   len = sg_dma_len(s);
+   hw->size = len;
+   hw->ctl = 0;
+   if (to_sg) {
+   hw->src_addr = dma_off;
+   hw->dst_addr = sg_dma_address(s);
+   } else {
+   hw->src_addr = sg_dma_address(s);
+   hw->dst_addr = dma_off;
+   }
+   dma_off += len;
+   total_len += len;
+   dump_desc_dbg(ioat_chan, desc);
+   }
+
+   desc->txd.flags = flags;
+   desc->len = total_len;
+   hw->ctl_f.int_en = !!(flags & DMA_PREP_INTERRUPT);
+   hw->ctl_f.fence = !!(flags & DMA_PREP_FENCE);
+   hw->ctl_f.compl_write = 1;
+   dump_desc_dbg(ioat_chan, desc);
+   /* we leave the channel locked to ensure in order submission */
+
+   return &desc->txd;
+}
 
 static struct dma_async_tx_descriptor *
 __ioat_prep_xor_lock(struct dma_chan *c, enum sum_check_flags *result,
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 5336808..060f152 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -694,6 +694,7 @@ struct dma_filter {
  * @device_prep_dma_memset: prepares a memset operation
  * @device_prep_dma_memset_sg: prepares a memset operation over a scatter list
  * @device_prep_dma_interrupt: prepares an end of chain interrupt operation
+ * @device_prep_dma_memcpy_sg: prepares memcpy between scatterlist and buffer
  * @device_prep_slave_sg: prepares a slave dma operation
  * @device_prep_dma_cyclic: prepare a cyclic dma operation suitable for audio.
  * The function takes a buffer of size buf_len. The callback function will
@@ -776,6 +777,10 @@ struct dma_device {
struct scatterlist *dst_sg, unsigned int dst_nents,
struct 

[PATCH v2 0/5] Adding blk-mq and DMA support to pmem block driver

2017-08-02 Thread Dave Jiang
v2:
- Make dma_prep_memcpy_* into one function per Dan.
- Addressed various comments from Ross with code formatting and etc.
- Replaced open code with offset_in_page() macro per Johannes.

The following series adds blk-mq support to the pmem block driver
and also adds infrastructure code to ioatdma and dmaengine in order to
support copying to and from scatterlists when processing block
requests provided by blk-mq. Using the DMA engines available on certain
platforms allows us to drastically reduce CPU utilization while
maintaining acceptable performance. Experiments on a DRAM-backed pmem
block device showed that utilizing the DMA engine is beneficial. Users
can revert to the original behavior by passing queue_mode=0 to the
nd_pmem kernel module if desired.

---

Dave Jiang (5):
  dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels
  dmaengine: ioatdma: dma_prep_memcpy_sg support
  dmaengine: add SG support to dmaengine_unmap
  libnvdimm: Adding blk-mq support to the pmem driver
  libnvdimm: add DMA support for pmem blk-mq


 drivers/dma/dmaengine.c   |   45 +-
 drivers/dma/ioat/dma.h|4 +
 drivers/dma/ioat/init.c   |4 -
 drivers/dma/ioat/prep.c   |   57 
 drivers/nvdimm/pmem.c |  331 ++---
 drivers/nvdimm/pmem.h |3 
 include/linux/dmaengine.h |9 +
 7 files changed, 420 insertions(+), 33 deletions(-)

--
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v3 0/2] dax, dm: stop requiring dax for device-mapper

2017-08-02 Thread Dan Williams
Changes since v2 [1]:
* rebase on -next to integrate with commit 273752c9ff03 "dm, dax: Make
  sure dm_dax_flush() is called if device supports it" (kbuild robot)
* fix CONFIG_DAX dependencies to upgrade CONFIG_DAX=m to CONFIG_DAX=y
  (kbuild robot)

[1]: https://www.spinics.net/lists/kernel/msg2570522.html

---

Bart points out that the DAX core is unconditionally enabled if
device-mapper is enabled. Add some config machinery and some
stub-static-inline routines to allow dax infrastructure to be deleted
from device-mapper at compile time.

Since this depends on commit 273752c9ff03 that's already in -next, this
should go through the device-mapper tree.
---

Dan Williams (2):
  dax: introduce CONFIG_DAX_DRIVER
  dm: allow device-mapper to operate without dax support


 arch/powerpc/platforms/Kconfig |1 +
 drivers/block/Kconfig  |1 +
 drivers/dax/Kconfig|4 +++-
 drivers/md/Kconfig |2 +-
 drivers/md/dm-linear.c |6 ++
 drivers/md/dm-stripe.c |6 ++
 drivers/md/dm.c|   10 ++
 drivers/nvdimm/Kconfig |1 +
 drivers/s390/block/Kconfig |1 +
 include/linux/dax.h|   30 --
 10 files changed, 50 insertions(+), 12 deletions(-)


[PATCH v3 2/2] dm: allow device-mapper to operate without dax support

2017-08-02 Thread Dan Williams
Rather than have device-mapper directly 'select DAX', let the fact that
BLK_DEV_PMEM selects DAX act as a gate for the device-mapper dax
support. We arrange for all the dax core routines to compile to nops
when CONFIG_DAX=n. With that in place we can simply handle the
alloc_dax() error as expected and ifdef out the other device-mapper-dax
support code.

Now, if dax is provided by a leaf driver, that driver may only arrange
to compile the dax core as a module. Since device-mapper dax support is
consumed by the always-built-in portion of the device-mapper
implementation, we need to upgrade from DAX=m to DAX=y.

Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Reported-by: Bart Van Assche 
Reported-by: kbuild test robot 
Signed-off-by: Dan Williams 
---
 drivers/md/Kconfig |2 +-
 drivers/md/dm-linear.c |6 ++
 drivers/md/dm-stripe.c |6 ++
 drivers/md/dm.c|   10 ++
 include/linux/dax.h|   30 --
 5 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 4a249ee86364..8ebf09e99006 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -195,12 +195,12 @@ config MD_CLUSTER
 source "drivers/md/bcache/Kconfig"
 
 config BLK_DEV_DM_BUILTIN
+   select DAX if DAX_DRIVER
bool
 
 config BLK_DEV_DM
tristate "Device mapper support"
select BLK_DEV_DM_BUILTIN
-   select DAX
---help---
  Device-mapper is a low level volume manager.  It works by allowing
  people to specify mappings for ranges of logical sectors.  Various
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 41971a090e34..8804e278e834 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -154,6 +154,7 @@ static int linear_iterate_devices(struct dm_target *ti,
return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
+#if IS_ENABLED(CONFIG_DAX)
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
@@ -197,6 +198,11 @@ static void linear_dax_flush(struct dm_target *ti, pgoff_t pgoff, void *addr,
return;
dax_flush(dax_dev, pgoff, addr, size);
 }
+#else
+#define linear_dax_direct_access NULL
+#define linear_dax_copy_from_iter NULL
+#define linear_dax_flush NULL
+#endif
 
 static struct target_type linear_target = {
.name   = "linear",
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index a0375530b07f..eeb6c784dc4f 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -311,6 +311,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
 }
 
+#if IS_ENABLED(CONFIG_DAX)
 static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
@@ -369,6 +370,11 @@ static void stripe_dax_flush(struct dm_target *ti, pgoff_t pgoff, void *addr,
return;
dax_flush(dax_dev, pgoff, addr, size);
 }
+#else
+#define stripe_dax_direct_access NULL
+#define stripe_dax_copy_from_iter NULL
+#define stripe_dax_flush NULL
+#endif
 
 /*
  * Stripe status:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 2edbcc2d7d3f..70fa48f4d3a3 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1713,7 +1713,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
 static struct mapped_device *alloc_dev(int minor)
 {
int r, numa_node_id = dm_get_numa_node();
-   struct dax_device *dax_dev;
+   struct dax_device *dax_dev = NULL;
struct mapped_device *md;
void *old_md;
 
@@ -1779,9 +1779,11 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
 
-   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
-   if (!dax_dev)
-   goto bad;
+   if (IS_ENABLED(CONFIG_DAX)) {
+   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
+   if (!dax_dev)
+   goto bad;
+   }
md->dax_dev = dax_dev;
 
add_disk(md->disk);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index eb0bff6f1eab..59575b8e638e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -27,16 +27,39 @@ extern struct attribute_group dax_attribute_group;
 
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *dax_get_by_host(const char *host);
+struct dax_device *alloc_dax(void *private, const char *host,
+   const struct dax_operations *ops);
 void put_dax(struct dax_device *dax_dev);
+void kill_dax(struct dax_device *dax_dev);
+void dax_write_cache(struct dax_device *dax_dev, bool wc);
+bool dax_write_cache_enabled(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
return 

[PATCH v3 1/2] dax: introduce CONFIG_DAX_DRIVER

2017-08-02 Thread Dan Williams
In support of allowing device-mapper to compile out idle/dead code when
there are no dax providers in the system, introduce the DAX_DRIVER
symbol. This is selected by all leaf drivers that device-mapper might be
layered on top of. This allows device-mapper to 'select DAX', i.e. upgrade
it from DAX=m to DAX=y, when a provider is present.

Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Gerald Schaefer 
Cc: Benjamin Herrenschmidt 
Cc: Mike Snitzer 
Cc: Bart Van Assche 
Signed-off-by: Dan Williams 
---
 arch/powerpc/platforms/Kconfig |1 +
 drivers/block/Kconfig  |1 +
 drivers/dax/Kconfig|4 +++-
 drivers/nvdimm/Kconfig |1 +
 drivers/s390/block/Kconfig |1 +
 5 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 4fd64d3f5c44..4561340c1f92 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -296,6 +296,7 @@ config AXON_RAM
tristate "Axon DDR2 memory device driver"
depends on PPC_IBM_CELL_BLADE && BLOCK
select DAX
+   select DAX_DRIVER
default m
help
  It registers one block device per Axon's DDR2 memory bank found
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 8ddc98279c8f..e1b6f4d2a716 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -324,6 +324,7 @@ config BLK_DEV_SX8
 config BLK_DEV_RAM
tristate "RAM block device support"
select DAX if BLK_DEV_RAM_DAX
+   select DAX_DRIVER if BLK_DEV_RAM_DAX
---help---
  Saying Y here will allow you to use a portion of your RAM memory as
  a block device, so that you can make file systems on it, read and
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index b79aa8f7a497..9bf940eb9c06 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,3 +1,6 @@
+config DAX_DRIVER
+   bool
+
 menuconfig DAX
tristate "DAX: direct access to differentiated memory"
select SRCU
@@ -16,7 +19,6 @@ config DEV_DAX
  baseline memory pool.  Mappings of a /dev/daxX.Y device impose
  restrictions that make the mapping behavior deterministic.
 
-
 config DEV_DAX_PMEM
tristate "PMEM DAX: direct access to persistent memory"
depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 5bdd499b5f4f..afe4018d76cf 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -21,6 +21,7 @@ config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
select DAX
+   select DAX_DRIVER
select ND_BTT if BTT
select ND_PFN if NVDIMM_PFN
help
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index 31f014b57bfc..3f563f2f33d6 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -15,6 +15,7 @@ config BLK_DEV_XPRAM
 config DCSSBLK
def_tristate m
select DAX
+   select DAX_DRIVER
prompt "DCSSBLK support"
depends on S390 && BLOCK
help



Re: [PATCH v2] dm: allow device-mapper to operate without dax support

2017-08-02 Thread kbuild test robot
Hi Dan,

[auto build test ERROR on dm/for-next]
[also build test ERROR on v4.13-rc3 next-20170802]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Dan-Williams/dm-allow-device-mapper-to-operate-without-dax-support/20170802-155255
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git 
for-next
config: s390-defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=s390 

All errors (new ones prefixed by >>):

   drivers/md/dm.o: In function `cleanup_mapped_device':
>> drivers/md/dm.c:1684: undefined reference to `kill_dax'
>> drivers/md/dm.c:1685: undefined reference to `put_dax'
   drivers/md/dm.o: In function `close_table_device':
   drivers/md/dm.c:651: undefined reference to `put_dax'
   drivers/md/dm.o: In function `open_table_device':
>> drivers/md/dm.c:637: undefined reference to `dax_get_by_host'
   drivers/md/dm.o: In function `dm_dax_flush':
>> drivers/md/dm.c:1003: undefined reference to `dax_get_private'
   drivers/md/dm.o: In function `dm_dax_copy_from_iter':
   drivers/md/dm.c:979: undefined reference to `dax_get_private'
   drivers/md/dm.o: In function `dm_dax_direct_access':
   drivers/md/dm.c:951: undefined reference to `dax_get_private'
   drivers/md/dm.o: In function `alloc_dev':
>> drivers/md/dm.c:1783: undefined reference to `alloc_dax'
   drivers/md/dm-table.o: In function `device_dax_write_cache_enabled':
   drivers/md/dm-table.c:1644: undefined reference to `dax_write_cache_enabled'
   drivers/md/dm-table.o: In function `dm_table_set_restrictions':
   drivers/md/dm-table.c:1822: undefined reference to `dax_write_cache'
   drivers/md/dm-linear.o: In function `linear_dax_flush':
>> drivers/md/dm-linear.c:197: undefined reference to `bdev_dax_pgoff'
>> drivers/md/dm-linear.c:199: undefined reference to `dax_flush'
   drivers/md/dm-linear.o: In function `linear_dax_copy_from_iter':
   drivers/md/dm-linear.c:183: undefined reference to `bdev_dax_pgoff'
>> drivers/md/dm-linear.c:185: undefined reference to `dax_copy_from_iter'
   drivers/md/dm-linear.o: In function `linear_dax_direct_access':
   drivers/md/dm-linear.c:168: undefined reference to `bdev_dax_pgoff'
>> drivers/md/dm-linear.c:171: undefined reference to `dax_direct_access'
   drivers/md/dm-stripe.o: In function `stripe_dax_flush':
>> drivers/md/dm-stripe.c:369: undefined reference to `bdev_dax_pgoff'
>> drivers/md/dm-stripe.c:371: undefined reference to `dax_flush'
   drivers/md/dm-stripe.o: In function `stripe_dax_copy_from_iter':
   drivers/md/dm-stripe.c:350: undefined reference to `bdev_dax_pgoff'
>> drivers/md/dm-stripe.c:352: undefined reference to `dax_copy_from_iter'
   drivers/md/dm-stripe.o: In function `stripe_dax_direct_access':
   drivers/md/dm-stripe.c:330: undefined reference to `bdev_dax_pgoff'
>> drivers/md/dm-stripe.c:333: undefined reference to `dax_direct_access'

vim +1684 drivers/md/dm.c

4a0b4ddf2 Mike Snitzer    2010-08-12  1672  
0f20972f7 Mike Snitzer    2015-04-28  1673  static void cleanup_mapped_device(struct mapped_device *md)
0f20972f7 Mike Snitzer    2015-04-28  1674  {
0f20972f7 Mike Snitzer    2015-04-28  1675  	if (md->wq)
0f20972f7 Mike Snitzer    2015-04-28  1676  		destroy_workqueue(md->wq);
0f20972f7 Mike Snitzer    2015-04-28  1677  	if (md->kworker_task)
0f20972f7 Mike Snitzer    2015-04-28  1678  		kthread_stop(md->kworker_task);
0f20972f7 Mike Snitzer    2015-04-28  1679  	mempool_destroy(md->io_pool);
0f20972f7 Mike Snitzer    2015-04-28  1680  	if (md->bs)
0f20972f7 Mike Snitzer    2015-04-28  1681  		bioset_free(md->bs);
0f20972f7 Mike Snitzer    2015-04-28  1682  
f26c5719b Dan Williams    2017-04-12  1683  	if (md->dax_dev) {
f26c5719b Dan Williams    2017-04-12 @1684  		kill_dax(md->dax_dev);
f26c5719b Dan Williams    2017-04-12 @1685  		put_dax(md->dax_dev);
f26c5719b Dan Williams    2017-04-12  1686  		md->dax_dev = NULL;
f26c5719b Dan Williams    2017-04-12  1687  	}
f26c5719b Dan Williams    2017-04-12  1688  
0f20972f7 Mike Snitzer    2015-04-28  1689  	if (md->disk) {
0f20972f7 Mike Snitzer    2015-04-28  1690  		spin_lock(&_minor_lock);
0f20972f7 Mike Snitzer    2015-04-28  1691  		md->disk->private_data = NULL;
0f20972f7 Mike Snitzer    2015-04-28  1692  		spin_unlock(&_minor_l



Re: [PATCH] nvdimm: avoid bogus -Wmaybe-uninitialized warning

2017-08-02 Thread Arnd Bergmann
On Wed, Aug 2, 2017 at 12:23 AM, Ross Zwisler
 wrote:
> On Tue, Aug 01, 2017 at 02:45:34PM -0700, Andrew Morton wrote:
>> On Tue,  1 Aug 2017 13:48:48 +0200 Arnd Bergmann  wrote:
>> > --- a/drivers/nvdimm/nd.h
>> > +++ b/drivers/nvdimm/nd.h
>> > @@ -392,8 +392,10 @@ static inline bool nd_iostat_start(struct bio *bio, 
>> > unsigned long *start)
>> >  {
>> > struct gendisk *disk = bio->bi_bdev->bd_disk;
>> >
>> > -   if (!blk_queue_io_stat(disk->queue))
>> > +   if (!blk_queue_io_stat(disk->queue)) {
>> > +   *start = 0;
>> > return false;
>> > +   }
>> >
>> > *start = jiffies;
>> > generic_start_io_acct(bio_data_dir(bio),
>>
>> Well that's sad.
>>
>> The future of btt-remove-btt_rw_page.patch and friends is shrouded in
>> mystery, but if we proceed that way then yes, I guess we'll need to
>> work around such gcc glitches.
>>
>> But let's not leave apparently-unneeded code in place without telling
>> people why it is in fact needed?
>
> Maybe it's just cleaner to initialize 'start' in all the callers, so we don't
> have a mysterious line and have to remember why it's there / comment it?

I considered that but decided that would be worse, since it shuts up more
potential warnings about actual uninitialized use of the variable, and is
slightly harder for the compiler to optimize away. You also end up having
to add a comment in multiple places. Note that Andrew already added
a comment when he applied my patch to his mmotm tree.

 Arnd