Wingate Dunross: Leaders of Executive Search

2017-10-26 Thread Nicholas Meyler
Dear Entrepreneurs/VCs, Hiring Principals, Fellow Scientists/Engineers and HR 
People:

Are you, or your Company, currently trying to fill a particularly challenging 
yet vital position?

Nothing is more valuable to a Corporation than 'Human Capital', and a top 
Recruiter can save you and your company both time and money by filling a key 
position quickly, without countless man-hours wasted on a futile effort. 
Enlisting a consultative, collaborative Recruiter with an Engineering degree 
can be a huge advantage, yielding rapid connections with a superior echelon of 
candidates; so can enlisting a Recruiter whose research skills were formed and 
honed at one of the top-ranked academic departments in the World.

I offer both qualities, and even provide 'no-risk' guarantees of satisfaction 
on ALL retained searches. Whether you need to fill a junior position or find a 
top-level senior scientist with hundreds of patents, I can demonstrate a 
track record of prior success. Please contact me if you have a hiring need that 
requires special insight, dedication and intelligence to resolve.

Best Regards and Thank You,

Nicholas Meyler
GM/President, Technology
Wingate Dunross Associates, Inc.
ph (818)597-3200 ext. 211
ni...@wdsearch.com



Partial List of Successfully-Completed Engineering Searches:
3D Printing Mechanical Engineer at IrOs LLC
Analytical Chemist at NanoH2O
Board of Scientific Advisors Member at QSI Nanotech
Business Development Manager at Koch-Knight
CEO at RSET Technologies
CEO/COO at Eutricity
CEO/VP Sales and Marketing at Opta
Chief Chemical Engineer at CoolPlanet Biofuels
Chief Product Officer at Nexa3D
CTO at Unidym
Director of Nonvolatile Memory Business Development at Nanosys
Director of Quantum Computational Chemistry at Nanostellar
Director of Technical Marketing at Analogix
Director of Technical Sales at Bitboys
Director of Western Region Sales at S3
Director of WW Sales at NanoInk
DRAM Marketing Manager at Mitsubishi
Electrical Equipment Engineer at Diamond Foundry
Executive VP at Chroma Energy
Field Applications Engineer (Optical Ethernet) at TE Connectivity
Field Applications Engineer at ATI
Field Applications Engineer at Cyrix
Field Applications Engineer at Genesis Microchip
Field Applications Engineer at MediaQ
Field Applications Engineer at NanoH2O
Field Applications Engineer at Weitek
Machine-Vision Scientist at TE Connectivity
Materials Scientist/Rheologist at IrOs LLC
Microfluidics Scientist at Bio-Rad Laboratories
Principal Systems Architect
QLEDs Device Physicist at NanoPhotonica
Senior CMOS Process Integration Engineer at Nantero
Senior Scientist High-Temp Materials at Morgan AM
Technical Marketing Manager at ATI
Technical Marketing Manager at Auravision
Technical Marketing Manager at C-Cube Microsystems
VP Business Development at MicroOptical Corp
VP Business Development at Ubiquitous Energy
VP of Chemical Engineering at Nantero
VP Intellectual Property and Licensing at Nantero
VP of Government Business Development at Nantero/Lockheed
VP of WW PR/Marcomm at 3Dfx
VP of WW Sales and Marketing at 3Dfx
VP Products at Unidym
VP Sales and Marketing at Memsic
VP Sales and Marketing at Millennial.net

Selected Accomplishments:
10 retained software placements at Rasna (3rd fastest-growing startup in the 
Nation, later sold to PTC for $500 million). My first retained search (1989) 
identified, in under two weeks, a candidate from a graduate program at RPI 
with studies in Differential Geometry; he accepted the job and spent 12 years 
with the company, rising to VP. No other search firm in the World had been 
able to produce any candidates at all in the several months prior to my work.
 
21 placements at Nantero (featured on the cover of Scientific American as 
revolutionary nanotechnology, half of company acquired by Lockheed Martin). I 
am a proud shareholder, as well.  Yes, we do accept stock options as fees, if 
desired... 

4 placements at NanoH2O (sold to LG Chem for $200 million) 

10 retained placements at MicroDisplay, Inc. (miniature high-res LCD chips)
 
12 placements at TE Connectivity (world’s leading connectivity company)
 
Placed a prolific Inventor with 391 granted patents and 356 patent applications 
still in process (extracted from Micron), with a $1 million sign-on bonus. He 
led the company to $1 billion+ in revenues by solving key production issues. 

Placed prolific Inventor with 170 patents in conductive ink chemistry, etc. 
(extracted from Xerox)



___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less' support

2017-10-26 Thread Williams, Dan J
On Thu, 2017-10-26 at 12:58 +0200, Jan Kara wrote:
> On Fri 20-10-17 11:31:48, Christoph Hellwig wrote:
> > On Fri, Oct 20, 2017 at 09:47:50AM +0200, Christoph Hellwig wrote:
> > > I'd like to brainstorm how we can do something better.
> > > 
> > > How about:
> > > 
> > > If we hit a page with an elevated refcount in truncate / hole punch
> > > etc for a DAX file system we do not free the blocks in the file system,
> > > but add it to the extent busy list.  We mark the page as delayed
> > > free (e.g. page flag?) so that when it finally hits refcount zero we
> > > call back into the file system to remove it from the busy list.
> > 
> > Brainstorming some more:
> > 
> > Given that on a DAX file there shouldn't be any long-term page
> > references after we unmap it from the page table and don't allow
> > get_user_pages calls why not wait for the references for all
> > DAX pages to go away first?  E.g. if we find a DAX page in
> > truncate_inode_pages_range that has an elevated refcount we set
> > a new flag to prevent new references from showing up, and then
> > simply wait for it to go away.  Instead of a busy-wait we can
> > do this through a few hashed waitqueues in dev_pagemap.  And in
> > fact put_zone_device_page already gets called when putting the
> > last page so we can handle the wakeup from there.
> > 
> > In fact, if we can't find a page flag for the 'stop new callers'
> > thing, we could probably come up with a way to do that through
> > dev_pagemap somehow, but I'm not sure how efficient that would
> > be.
> 
> We were talking about this yesterday with Dan so some more brainstorming
> from us. We can implement the solution with an extent busy list in ext4
> relatively easily - we already have such a list currently, similar to XFS.
> There would be some modifications needed but nothing too complex. The
> biggest downside of this solution I see is that it requires a per-filesystem
> solution for busy extents - ext4 and XFS are reasonably fine, however btrfs
> may have problems and ext2 will definitely need some modifications.
> Invisible used blocks may be surprising to users at times, although given
> that page refs should be relatively short-term, that should not be a big issue.
> But are we guaranteed page refs are short term? E.g. if someone creates
> v4l2 videobuf in MAP_SHARED mapping of a file on DAX filesystem, page refs
> can be rather long-term similarly as in RDMA case. Also freeing of blocks
> on page reference drop is another async entry point into the filesystem
> which could unpleasantly surprise us but I guess workqueues would solve
> that reasonably fine.
> 
> WRT waiting for page refs to be dropped before proceeding with truncate (or
> punch hole for that matter - that case is even nastier since we don't have
> i_size to guard us). What I like about this solution is that it is very
> visible there's something unusual going on with the file being truncated /
> punched and so problems are easier to diagnose / fix from the admin side.
> So far we have guarded hole punching from concurrent faults (and
> get_user_pages() does fault once you do unmap_mapping_range()) with
> I_MMAP_LOCK (or its equivalent in ext4). We cannot easily wait for page
> refs to be dropped under I_MMAP_LOCK as that could deadlock - the most
> obvious case Dan came up with is when GUP obtains ref to page A, then hole
> punch comes grabbing I_MMAP_LOCK and waiting for page ref on A to be
> dropped, and then GUP blocks on trying to fault in another page.
> 
> I think we cannot easily prevent new page references from being grabbed as
> you write above, since nobody expects stuff like get_page() to fail. But I
> think that unmapping the relevant pages and then preventing them from being
> faulted in again is workable and stops GUP as well. The problem with that,
> though, is
> what to do with page faults to such pages - you cannot just fail them for
> hole punch, and you cannot easily allocate new blocks either. So we are
> back at a situation where we need to detach blocks from the inode and then
> wait for page refs to be dropped - so some form of busy extents. Am I
> missing something?
> 

No, that's a good summary of what we talked about. However, I did go
back and give the new lock approach a try and was able to get my test
to pass. The new locking is not pretty especially since you need to
drop and reacquire the lock so that get_user_pages() can finish
grabbing all the pages it needs. Here are the two primary patches in
the series, do you think the extent-busy approach would be cleaner?

---

commit 5023d20a0aa795ddafd43655be1bfb2cbc7f4445
Author: Dan Williams 
Date:   Wed Oct 25 05:14:54 2017 -0700

mm, dax: handle truncate of dma-busy pages

get_user_pages() pins file backed memory pages for access by dma
devices. However, it only pins the memory pages not the page-to-file
offset association. If a file is truncated the pages are mapped out of
the file and dma may continue indefinitely into 

Re: [bug report] libnvdimm: clear the internal poison_list when clearing badblocks

2017-10-26 Thread Dan Williams
On Thu, Oct 26, 2017 at 3:29 AM, Dan Carpenter  wrote:
> Hello Vishal Verma,
>
> The patch e046114af5fc: "libnvdimm: clear the internal poison_list
> when clearing badblocks" from Sep 30, 2016, leads to the following
> static checker warning:

Thanks for the report Dan, we'll take a look.


Re: [PATCH 17/17] xfs: support for synchronous DAX faults

2017-10-26 Thread Dave Chinner
On Thu, Oct 26, 2017 at 05:48:04PM +0200, Jan Kara wrote:
> On Wed 25-10-17 09:23:22, Dave Chinner wrote:
> > On Tue, Oct 24, 2017 at 05:24:14PM +0200, Jan Kara wrote:
> > > From: Christoph Hellwig 
> > > 
> > > Return IOMAP_F_DIRTY from xfs_file_iomap_begin() when asked to prepare
> > > blocks for writing and the inode is pinned, and has dirty fields other
> > > than the timestamps.
> > 
> > That's "fdatasync dirty", not "fsync dirty".
> 
> Correct.
> 
> > IOMAP_F_DIRTY needs a far better description of its semantics than
> > "/* block mapping is not yet on persistent storage */" so we know
> > exactly what filesystems are supposed to be implementing here. I
> > suspect that what it really is meant to say is:
> > 
> > /*
> >  * IOMAP_F_DIRTY indicates the inode has uncommitted metadata to
> >  * written data and requires fdatasync to commit to persistent storage.
> >  */
> 
> I'll update the comment. Thanks!
> 
> > []
> > 
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index f179bdf1644d..b43be199fbdf 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -33,6 +33,7 @@
> > >  #include "xfs_error.h"
> > >  #include "xfs_trans.h"
> > >  #include "xfs_trans_space.h"
> > > +#include "xfs_inode_item.h"
> > >  #include "xfs_iomap.h"
> > >  #include "xfs_trace.h"
> > >  #include "xfs_icache.h"
> > > @@ -1086,6 +1087,10 @@ xfs_file_iomap_begin(
> > >   trace_xfs_iomap_found(ip, offset, length, 0, &imap);
> > >   }
> > >  
> > > + if ((flags & IOMAP_WRITE) && xfs_ipincount(ip) &&
> > > + (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> > > + iomap->flags |= IOMAP_F_DIRTY;
> > 
> > This is the very definition of an inode that is "fdatasync dirty".
> > 
> > H, shouldn't this also be set for read faults, too?
> 
> No, read faults don't need to set IOMAP_F_DIRTY since the user cannot write
> any data to the page that they'd then want to be persistent. The only reason why
> I thought it could be useful for a while was that it would be nice to make
> MAP_SYNC mapping provide the guarantee that data you see now is the data
> you'll see after a crash

Isn't that the entire point of MAP_SYNC? i.e. That when we return
from a page fault, the app knows that the data and its underlying
extent is on persistent storage?

> but we cannot provide that guarantee for RO
> mapping anyway if someone else has the page mapped as well. So I just
> decided not to return IOMAP_F_DIRTY for read faults.

If there are multiple MAP_SYNC mappings to the inode, I would have
expected that they all sync all of the data/metadata on every page
fault, regardless of who dirtied the inode. An RO mapping doesn't
mean the data/metadata on the inode can't change, it just means it
can't change through that mapping.  Running fsync() to guarantee the
persistence of that data/metadata doesn't actually change any
data.

IOWs, if read faults don't guarantee the mapped range has stable
extents on a MAP_SYNC mapping, then I think MAP_SYNC is broken
because it's not giving consistent guarantees to userspace. Yes, it
works fine when only one MAP_SYNC mapping is modifying the inode,
but the moment we have concurrent operations on the inode that
aren't MAP_SYNC or O_SYNC this goes out the window

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] mmap.2: Add description of MAP_SHARED_VALIDATE and MAP_SYNC

2017-10-26 Thread Ross Zwisler
On Thu, Oct 26, 2017 at 03:24:02PM +0200, Jan Kara wrote:
> On Tue 24-10-17 15:10:07, Ross Zwisler wrote:
> > On Tue, Oct 24, 2017 at 05:24:15PM +0200, Jan Kara wrote:
> > > Signed-off-by: Jan Kara 
> > 
> > This looks unchanged since the previous version?
> 
> Ah, thanks for checking. I forgot to commit modifications. Attached is
> really updated patch.
> 
>   Honza
> -- 
> Jan Kara 
> SUSE Labs, CR

> From 59eeec2998ed9b3840aab951f213148cb1d053a5 Mon Sep 17 00:00:00 2001
> From: Jan Kara 
> Date: Thu, 19 Oct 2017 14:44:55 +0200
> Subject: [PATCH] mmap.2: Add description of MAP_SHARED_VALIDATE and MAP_SYNC
> 
> Signed-off-by: Jan Kara 

Looks good, you can add: 

Reviewed-by: Ross Zwisler 


Re: [PATCH 17/17] xfs: support for synchronous DAX faults

2017-10-26 Thread Jan Kara
On Wed 25-10-17 09:23:22, Dave Chinner wrote:
> On Tue, Oct 24, 2017 at 05:24:14PM +0200, Jan Kara wrote:
> > From: Christoph Hellwig 
> > 
> > Return IOMAP_F_DIRTY from xfs_file_iomap_begin() when asked to prepare
> > blocks for writing and the inode is pinned, and has dirty fields other
> > than the timestamps.
> 
> That's "fdatasync dirty", not "fsync dirty".

Correct.

> IOMAP_F_DIRTY needs a far better description of its semantics than
> "/* block mapping is not yet on persistent storage */" so we know
> exactly what filesystems are supposed to be implementing here. I
> suspect that what it really is meant to say is:
> 
> /*
>  * IOMAP_F_DIRTY indicates the inode has uncommitted metadata to
>  * written data and requires fdatasync to commit to persistent storage.
>  */

I'll update the comment. Thanks!

> []
> 
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index f179bdf1644d..b43be199fbdf 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -33,6 +33,7 @@
> >  #include "xfs_error.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_trans_space.h"
> > +#include "xfs_inode_item.h"
> >  #include "xfs_iomap.h"
> >  #include "xfs_trace.h"
> >  #include "xfs_icache.h"
> > @@ -1086,6 +1087,10 @@ xfs_file_iomap_begin(
> > trace_xfs_iomap_found(ip, offset, length, 0, &imap);
> > }
> >  
> > +   if ((flags & IOMAP_WRITE) && xfs_ipincount(ip) &&
> > +   (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> > +   iomap->flags |= IOMAP_F_DIRTY;
> 
> This is the very definition of an inode that is "fdatasync dirty".
> 
> H, shouldn't this also be set for read faults, too?

No, read faults don't need to set IOMAP_F_DIRTY since the user cannot write
any data to the page that they'd then want to be persistent. The only reason why
I thought it could be useful for a while was that it would be nice to make
MAP_SYNC mapping provide the guarantee that data you see now is the data
you'll see after a crash but we cannot provide that guarantee for RO
mapping anyway if someone else has the page mapped as well. So I just
decided not to return IOMAP_F_DIRTY for read faults.

But now that I look at the XFS implementation again, it misses handling
of VM_FAULT_NEEDDSYNC in xfs_filemap_pfn_mkwrite() (ext4 gets this right).
I'll fix this by using __xfs_filemap_fault() for xfs_filemap_pfn_mkwrite()
as well since it mostly duplicates it anyway... Thanks for inquiring!

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: Enabling peer to peer device transactions for PCIe devices

2017-10-26 Thread Petrosyan, Ludwig


- Original Message -
> From: "David Laight" 
> To: "Petrosyan, Ludwig" , "Logan Gunthorpe" 
> 
> Cc: "Alexander Deucher" , "linux-kernel" 
> , "linux-rdma"
> , "linux-nvdimm" , 
> "Linux-media" ,
> "dri-devel" , "linux-pci" 
> , "John Bridgman"
> , "Felix Kuehling" , "Serguei 
> Sagalovitch"
> , "Paul Blinzer" , 
> "Christian Koenig" ,
> "Suravee Suthikulpanit" , "Ben Sander" 
> 
> Sent: Tuesday, 24 October, 2017 16:58:24
> Subject: RE: Enabling peer to peer device transactions for PCIe devices

> Please don't top post, write shorter lines, and add the odd blank line.
> Big blocks of text are hard to read quickly.
> 

OK this time I am very short. 
peer2peer works

Ludwig


Re: [PATCH] mmap.2: Add description of MAP_SHARED_VALIDATE and MAP_SYNC

2017-10-26 Thread Jan Kara
On Tue 24-10-17 15:10:07, Ross Zwisler wrote:
> On Tue, Oct 24, 2017 at 05:24:15PM +0200, Jan Kara wrote:
> > Signed-off-by: Jan Kara 
> 
> This looks unchanged since the previous version?

Ah, thanks for checking. I forgot to commit modifications. Attached is
really updated patch.

Honza
-- 
Jan Kara 
SUSE Labs, CR
>From 59eeec2998ed9b3840aab951f213148cb1d053a5 Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Thu, 19 Oct 2017 14:44:55 +0200
Subject: [PATCH] mmap.2: Add description of MAP_SHARED_VALIDATE and MAP_SYNC

Signed-off-by: Jan Kara 
---
 man2/mmap.2 | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/man2/mmap.2 b/man2/mmap.2
index 47c3148653be..b38ee6809327 100644
--- a/man2/mmap.2
+++ b/man2/mmap.2
@@ -125,6 +125,21 @@ are carried through to the underlying file.
 to the underlying file requires the use of
 .BR msync (2).)
 .TP
+.BR MAP_SHARED_VALIDATE " (since Linux 4.15)"
+The same as
+.B MAP_SHARED
+except that
+.B MAP_SHARED
+mappings ignore unknown flags in
+.IR flags .
+In contrast, when creating a mapping of
+.B MAP_SHARED_VALIDATE
+mapping type, the kernel verifies all passed flags are known and fails the
+mapping with
+.BR EOPNOTSUPP
+otherwise. This mapping type is also required in order to use certain mapping
+flags.
+.TP
 .B MAP_PRIVATE
 Create a private copy-on-write mapping.
 Updates to the mapping are not visible to other processes
@@ -134,7 +149,10 @@ It is unspecified whether changes made to the file after the
 .BR mmap ()
 call are visible in the mapped region.
 .PP
-Both of these flags are described in POSIX.1-2001 and POSIX.1-2008.
+.B MAP_SHARED
+and
+.B MAP_PRIVATE
+are described in POSIX.1-2001 and POSIX.1-2008.
 .PP
 In addition, zero or more of the following values can be ORed in
 .IR flags :
@@ -352,6 +370,21 @@ option.
 Because of the security implications,
 that option is normally enabled only on embedded devices
 (i.e., devices where one has complete control of the contents of user memory).
+.TP
+.BR MAP_SYNC " (since Linux 4.15)"
+This flag is available only with
+.B MAP_SHARED_VALIDATE
+mapping type. Mappings of
+.B MAP_SHARED
+type will silently ignore this flag.
+This flag is supported only for files supporting DAX (direct mapping of persistent
+memory). For other files, creating mapping with this flag results in
+.B EOPNOTSUPP
+error. Shared file mappings with this flag provide the guarantee that while
+some memory is writeably mapped in the address space of the process, it will
+be visible in the same file at the same offset even after the system crashes or
+is rebooted. This allows users of such mappings to make data modifications
+persistent in a more efficient way using appropriate CPU instructions.
 .PP
 Of the above flags, only
 .B MAP_FIXED
-- 
2.12.3
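As a rough illustration of how userspace might use the two new flags described
in the patch above, here is a minimal sketch. The fallback policy and the
helper's name are mine, not part of the patch, and the #defines are only needed
where libc headers predate these flags (the values match the x86 kernel UAPI):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may not define these yet. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: try to create a MAP_SYNC mapping, falling back to
 * a plain MAP_SHARED mapping when the file or kernel does not support it.
 * Returns 1 if MAP_SYNC is active (stores are persistent once CPU caches
 * are flushed), 0 on fallback (msync() is still required), -1 on error.
 */
static int map_sync_or_fallback(int fd, size_t len, void **out)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p != MAP_FAILED) {
		*out = p;
		return 1;
	}
	/* EOPNOTSUPP: file is not DAX; EINVAL: kernel predates MAP_SHARED_VALIDATE */
	if (errno != EOPNOTSUPP && errno != EINVAL)
		return -1;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;
	*out = p;
	return 0;
}
```

On a non-DAX filesystem the first mmap() fails with EOPNOTSUPP exactly as the
man-page text above specifies, and the helper degrades to the traditional
MAP_SHARED + msync() persistence model.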



[bug report] libnvdimm: add support for clear poison list and badblocks for device dax

2017-10-26 Thread Dan Carpenter
Hello Dave Jiang,

The patch 006358b35c73: "libnvdimm: add support for clear poison list
and badblocks for device dax" from Apr 7, 2017, leads to the
following static checker warning:

drivers/nvdimm/bus.c:852 nd_pmem_forget_poison_check()
warn: we tested 'nd_dax' before and it was 'false'

drivers/nvdimm/bus.c
   835  static int nd_pmem_forget_poison_check(struct device *dev, void *data)
   836  {
   837  struct nd_cmd_clear_error *clear_err =
   838  (struct nd_cmd_clear_error *)data;
   839  struct nd_btt *nd_btt = is_nd_btt(dev) ? to_nd_btt(dev) : NULL;
   840  struct nd_pfn *nd_pfn = is_nd_pfn(dev) ? to_nd_pfn(dev) : NULL;
   841  struct nd_dax *nd_dax = is_nd_dax(dev) ? to_nd_dax(dev) : NULL;
   ^^
nd_dax is set here.

   842  struct nd_namespace_common *ndns = NULL;
   843  struct nd_namespace_io *nsio;
   844  resource_size_t offset = 0, end_trunc = 0, start, end, pstart, pend;
   845  
   846  if (nd_dax || !dev->driver)
^^
We return if it's non-NULL

   847  return 0;
   848  
   849  start = clear_err->address;
   850  end = clear_err->address + clear_err->cleared - 1;
   851  
   852  if (nd_btt || nd_pfn || nd_dax) {
   853  if (nd_btt)
   854  ndns = nd_btt->ndns;
   855  else if (nd_pfn)
   856  ndns = nd_pfn->ndns;
   857  else if (nd_dax)
 ^^
but the rest of the function assumes it can be true.  Perhaps we plan
to enable it in the future?  It's not clear to me.

   858  ndns = nd_dax->nd_pfn.ndns;
   859  
   860  if (!ndns)
   861  return 0;
   862  } else
   863  ndns = to_ndns(dev);
   864  
   865  nsio = to_nd_namespace_io(&ndns->dev);
   866  pstart = nsio->res.start + offset;
   867  pend = nsio->res.end - end_trunc;
   868  
   869  if ((pstart >= start) && (pend <= end))
   870  return -EBUSY;
   871  
   872  return 0;
   873  
   874  }

regards,
dan carpenter
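Reduced to its essentials, the pattern the checker is flagging looks like the
sketch below. The function and return values are illustrative, not the driver's
actual code: the early nd_dax return makes the later else-if branch dead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative reduction of the flagged pattern: after the early return,
 * nd_dax is known to be NULL, so the final else-if arm is unreachable.
 */
static int forget_poison_pattern(const void *nd_btt, const void *nd_pfn,
				 const void *nd_dax)
{
	if (nd_dax)		/* early exit: nd_dax is NULL from here on */
		return 0;

	if (nd_btt || nd_pfn || nd_dax) {	/* third test is always false */
		if (nd_btt)
			return 1;	/* ndns = nd_btt->ndns */
		else if (nd_pfn)
			return 2;	/* ndns = nd_pfn->ndns */
		else if (nd_dax)
			return 3;	/* unreachable */
	}
	return 4;			/* ndns = to_ndns(dev) */
}
```

No combination of inputs can reach the `return 3` arm, which is exactly what
the warning says: either the early return or the dead branch is the bug.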


Re: [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less' support

2017-10-26 Thread Jan Kara
On Fri 20-10-17 11:31:48, Christoph Hellwig wrote:
> On Fri, Oct 20, 2017 at 09:47:50AM +0200, Christoph Hellwig wrote:
> > I'd like to brainstorm how we can do something better.
> > 
> > How about:
> > 
> If we hit a page with an elevated refcount in truncate / hole punch
> > etc for a DAX file system we do not free the blocks in the file system,
> > but add it to the extent busy list.  We mark the page as delayed
> > free (e.g. page flag?) so that when it finally hits refcount zero we
> > call back into the file system to remove it from the busy list.
> 
> Brainstorming some more:
> 
> Given that on a DAX file there shouldn't be any long-term page
> references after we unmap it from the page table and don't allow
> get_user_pages calls why not wait for the references for all
> DAX pages to go away first?  E.g. if we find a DAX page in
> truncate_inode_pages_range that has an elevated refcount we set
> a new flag to prevent new references from showing up, and then
> simply wait for it to go away.  Instead of a busy way we can
> do this through a few hashed waitqueued in dev_pagemap.  And in
> fact put_zone_device_page already gets called when putting the
> last page so we can handle the wakeup from there.
> 
> In fact, if we can't find a page flag for the 'stop new callers'
> thing, we could probably come up with a way to do that through
> dev_pagemap somehow, but I'm not sure how efficient that would
> be.

We were talking about this yesterday with Dan so some more brainstorming
from us. We can implement the solution with an extent busy list in ext4
relatively easily - we already have such a list currently, similar to XFS.
There would be some modifications needed but nothing too complex. The
biggest downside of this solution I see is that it requires a per-filesystem
solution for busy extents - ext4 and XFS are reasonably fine, however btrfs
may have problems and ext2 will definitely need some modifications.
Invisible used blocks may be surprising to users at times, although given
that page refs should be relatively short-term, that should not be a big issue.
But are we guaranteed page refs are short term? E.g. if someone creates
v4l2 videobuf in MAP_SHARED mapping of a file on DAX filesystem, page refs
can be rather long-term similarly as in RDMA case. Also freeing of blocks
on page reference drop is another async entry point into the filesystem
which could unpleasantly surprise us but I guess workqueues would solve
that reasonably fine.

WRT waiting for page refs to be dropped before proceeding with truncate (or
punch hole for that matter - that case is even nastier since we don't have
i_size to guard us). What I like about this solution is that it is very
visible there's something unusual going on with the file being truncated /
punched and so problems are easier to diagnose / fix from the admin side.
So far we have guarded hole punching from concurrent faults (and
get_user_pages() does fault once you do unmap_mapping_range()) with
I_MMAP_LOCK (or its equivalent in ext4). We cannot easily wait for page
refs to be dropped under I_MMAP_LOCK as that could deadlock - the most
obvious case Dan came up with is when GUP obtains ref to page A, then hole
punch comes grabbing I_MMAP_LOCK and waiting for page ref on A to be
dropped, and then GUP blocks on trying to fault in another page.

I think we cannot easily prevent new page references from being grabbed as
you write above, since nobody expects stuff like get_page() to fail. But I
think that unmapping the relevant pages and then preventing them from being
faulted in again is workable and stops GUP as well. The problem with that,
though, is
what to do with page faults to such pages - you cannot just fail them for
hole punch, and you cannot easily allocate new blocks either. So we are
back at a situation where we need to detach blocks from the inode and then
wait for page refs to be dropped - so some form of busy extents. Am I
missing something?

Honza
-- 
Jan Kara 
SUSE Labs, CR


[bug report] libnvdimm: clear the internal poison_list when clearing badblocks

2017-10-26 Thread Dan Carpenter
Hello Vishal Verma,

The patch e046114af5fc: "libnvdimm: clear the internal poison_list
when clearing badblocks" from Sep 30, 2016, leads to the following
static checker warning:

drivers/nvdimm/core.c:601 nvdimm_forget_poison()
warn: potential integer overflow from user 'start + len'

drivers/nvdimm/core.c
   597  void nvdimm_forget_poison(struct nvdimm_bus *nvdimm_bus, phys_addr_t start,
   598  unsigned int len)
   599  {
   600  struct list_head *poison_list = &nvdimm_bus->poison_list;
   601  u64 clr_end = start + len - 1;
  ^^^
These come from __nd_ioctl(), and it looks like they haven't been
checked before we call this function.  It's hard for me to read this
function well enough to say for sure that the overflow is harmless.

Please review?

   602  struct nd_poison *pl, *next;
   603  
   604  spin_lock(&nvdimm_bus->poison_lock);
   605  WARN_ON_ONCE(list_empty(poison_list));
   606  
   607  /*
   608   * [start, clr_end] is the poison interval being cleared.
   609   * [pl->start, pl_end] is the poison_list entry we're comparing
   610   * the above interval against. The poison list entry may need
   611   * to be modified (update either start or length), deleted, or
   612   * split into two based on the overlap characteristics
   613   */
   614  
   615  list_for_each_entry_safe(pl, next, poison_list, list) {
   616  u64 pl_end = pl->start + pl->length - 1;
   617  
   618  /* Skip intervals with no intersection */
   619  if (pl_end < start)
   620  continue;
   621  if (pl->start >  clr_end)
   622  continue;
   623  /* Delete completely overlapped poison entries */
   624  if ((pl->start >= start) && (pl_end <= clr_end)) {
   625  list_del(&pl->list);

regards,
dan carpenter
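A defensive check of the kind this report is asking for might look like the
sketch below. The helper name and its placement are hypothetical; in the
kernel the validation would belong in or before __nd_ioctl(), where the
user-supplied start and len arrive:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical validation helper: reject a clear-poison request whose end
 * address, start + len - 1, would wrap around. Mirrors the u64 arithmetic
 * used to compute clr_end in nvdimm_forget_poison().
 */
static bool clear_range_valid(uint64_t start, unsigned int len)
{
	if (len == 0)
		return false;	/* empty range: nothing to clear */
	/* start + (len - 1) must not exceed UINT64_MAX */
	return start <= UINT64_MAX - (uint64_t)(len - 1);
}
```

With such a check applied first, `clr_end = start + len - 1` can no longer
wrap, so the interval comparisons in the list walk stay well-ordered.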