Re: [PATCH 3/4] selftests: rds: add gitignore file for include.sh

2024-09-24 Thread Allison Henderson
On Tue, 2024-09-24 at 14:49 +0200, Javier Carrasco wrote:
> The generated include.sh should be ignored by git. Create a new
> gitignore and add the file to the list.
> 
> Signed-off-by: Javier Carrasco 

Thanks!
Reviewed-by: Allison Henderson 

> ---
>  tools/testing/selftests/net/rds/.gitignore | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/net/rds/.gitignore b/tools/testing/selftests/net/rds/.gitignore
> new file mode 100644
> index ..1c6f04e2aa11
> --- /dev/null
> +++ b/tools/testing/selftests/net/rds/.gitignore
> @@ -0,0 +1 @@
> +include.sh
> 



Re: [PATCH 2/4] selftests: rds: add include.sh to EXTRA_CLEAN

2024-09-24 Thread Allison Henderson
On Tue, 2024-09-24 at 14:49 +0200, Javier Carrasco wrote:
> The include.sh file is generated when building the net/rds selftests,
> but there is no rule to delete it with the clean target. Add the file to
> EXTRA_CLEAN in order to remove it when required.
> 
> Signed-off-by: Javier Carrasco 

Ok, looks good. Thanks for catching this
Reviewed-by: Allison Henderson 

> ---
>  tools/testing/selftests/net/rds/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/net/rds/Makefile b/tools/testing/selftests/net/rds/Makefile
> index da9714bc7aad..0b697669ea51 100644
> --- a/tools/testing/selftests/net/rds/Makefile
> +++ b/tools/testing/selftests/net/rds/Makefile
> @@ -7,6 +7,6 @@ TEST_PROGS := run.sh \
> include.sh \
> test.py
>  
> -EXTRA_CLEAN := /tmp/rds_logs
> +EXTRA_CLEAN := /tmp/rds_logs include.sh
>  
>  include ../../lib.mk
> 



Re: [PATCH v2.1 3/8] fsdax: zero the edges if source is HOLE or UNWRITTEN

2022-12-02 Thread Allison Henderson
On Fri, 2022-12-02 at 09:25 +, Shiyang Ruan wrote:
> If srcmap contains invalid data, such as HOLE and UNWRITTEN, the dest
> page should be zeroed.  Otherwise, since it's pmem, old data may
> remain on the dest page and the result of CoW will be incorrect.
> 
> The function name is also not easy to understand, so rename it to
> "dax_iomap_copy_around()", which means it copies data around the
> range.
> 
> Signed-off-by: Shiyang Ruan 
> Reviewed-by: Darrick J. Wong 
> 
I think the new changes look good
Reviewed-by: Allison Henderson 

> ---
>  fs/dax.c | 79 +++-----
>  1 file changed, 49 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index a77739f2abe7..f12645d6f3c8 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1092,7 +1092,8 @@ static int dax_iomap_direct_access(const struct
> iomap *iomap, loff_t pos,
>  }
>  
>  /**
> - * dax_iomap_cow_copy - Copy the data from source to destination
> before write
> + * dax_iomap_copy_around - Prepare for an unaligned write to a
> shared/cow page
> + * by copying the data before and after the range to be written.
>   * @pos:   address to do copy from.
>   * @length:size of copy operation.
>   * @align_size:aligned w.r.t align_size (either PMD_SIZE or
> PAGE_SIZE)
> @@ -1101,35 +1102,50 @@ static int dax_iomap_direct_access(const
> struct iomap *iomap, loff_t pos,
>   *
>   * This can be called from two places. Either during DAX write fault
> (page
>   * aligned), to copy the length size data to daddr. Or, while doing
> normal DAX
> - * write operation, dax_iomap_actor() might call this to do the copy
> of either
> + * write operation, dax_iomap_iter() might call this to do the copy
> of either
>   * start or end unaligned address. In the latter case the rest of
> the copy of
> - * aligned ranges is taken care by dax_iomap_actor() itself.
> + * aligned ranges is taken care by dax_iomap_iter() itself.
> + * If the srcmap contains invalid data, such as HOLE and UNWRITTEN,
> zero the
> + * area to make sure no old data remains.
>   */
> -static int dax_iomap_cow_copy(loff_t pos, uint64_t length, size_t
> align_size,
> +static int dax_iomap_copy_around(loff_t pos, uint64_t length, size_t
> align_size,
> const struct iomap *srcmap, void *daddr)
>  {
> loff_t head_off = pos & (align_size - 1);
> size_t size = ALIGN(head_off + length, align_size);
> loff_t end = pos + length;
> loff_t pg_end = round_up(end, align_size);
> +   /* copy_all is usually in page fault case */
> bool copy_all = head_off == 0 && end == pg_end;
> +   /* zero the edges if srcmap is a HOLE or IOMAP_UNWRITTEN */
> +   bool zero_edge = srcmap->flags & IOMAP_F_SHARED ||
> +    srcmap->type == IOMAP_UNWRITTEN;
> void *saddr = 0;
> int ret = 0;
>  
> -   ret = dax_iomap_direct_access(srcmap, pos, size, &saddr,
> NULL);
> -   if (ret)
> -   return ret;
> +   if (!zero_edge) {
> +   ret = dax_iomap_direct_access(srcmap, pos, size,
> &saddr, NULL);
> +   if (ret)
> +   return ret;
> +   }
>  
> if (copy_all) {
> -   ret = copy_mc_to_kernel(daddr, saddr, length);
> -   return ret ? -EIO : 0;
> +   if (zero_edge)
> +   memset(daddr, 0, size);
> +   else
> +   ret = copy_mc_to_kernel(daddr, saddr,
> length);
> +   goto out;
> }
>  
> /* Copy the head part of the range */
> if (head_off) {
> -   ret = copy_mc_to_kernel(daddr, saddr, head_off);
> -   if (ret)
> -   return -EIO;
> +   if (zero_edge)
> +   memset(daddr, 0, head_off);
> +   else {
> +   ret = copy_mc_to_kernel(daddr, saddr,
> head_off);
> +   if (ret)
> +   return -EIO;
> +   }
> }
>  
> /* Copy the tail part of the range */
> @@ -1137,12 +1153,19 @@ static int dax_iomap_cow_copy(loff_t pos,
> uint64_t length, size_t align_size,
> loff_t tail_off = head_off + length;
> loff_t tail_len = pg_end - end;
>  
> -   ret = copy_mc_to_kernel(daddr + tail_off, saddr +
> tail_off,
> -   tail_len);
> -   if (ret)
> -   return -EIO;
> +   if (zero_edge)

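A note for readers of the archive: the head/tail handling in this patch
can be modelled in user space as below. This is only a sketch under
stated assumptions, not the kernel code: copy_around(), its parameters,
and the src_valid flag (standing in for a HOLE or UNWRITTEN srcmap) are
made up for illustration, and the copy_all page-fault path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space model of the edge handling (hypothetical, not fs/dax.c).
 * dst and src both point at the start of an aligned block of block_size
 * bytes; [head_off, head_off + length) is the range the caller is about
 * to overwrite. If src_valid is false, the edges are zeroed instead of
 * copied, so no stale data survives in dst.
 */
static void copy_around(unsigned char *dst, const unsigned char *src,
			size_t block_size, size_t head_off, size_t length,
			bool src_valid)
{
	size_t tail_off = head_off + length;

	assert(tail_off <= block_size);

	/* Head: bytes before the range to be written. */
	if (head_off) {
		if (src_valid)
			memcpy(dst, src, head_off);
		else
			memset(dst, 0, head_off);
	}

	/* Tail: bytes after the range to be written. */
	if (tail_off < block_size) {
		size_t tail_len = block_size - tail_off;

		if (src_valid)
			memcpy(dst + tail_off, src + tail_off, tail_len);
		else
			memset(dst + tail_off, 0, tail_len);
	}
}
```

With an 8-byte block, head_off 2 and length 3, bytes 0-1 and 5-7 are
filled in (copied or zeroed) and bytes 2-4 are left for the caller's
own write.
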
Re: [PATCH v2.1 1/8] fsdax: introduce page->share for fsdax in reflink mode

2022-12-02 Thread Allison Henderson
On Fri, 2022-12-02 at 09:23 +, Shiyang Ruan wrote:
> fsdax page is used not only when CoW, but also mapread. To make it
> easily understood, use 'share' to indicate that the dax page is
> shared by more than one extent.  And add helper functions to use it.
> 
> Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
> 
The new changes look reasonable to me
Reviewed-by: Allison Henderson 

> Signed-off-by: Shiyang Ruan 
> ---
>  fs/dax.c                   | 38 ++----
>  include/linux/mm_types.h   |  5 -
>  include/linux/page-flags.h |  2 +-
>  3 files changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 1c6867810cbd..edbacb273ab5 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -334,35 +334,41 @@ static unsigned long dax_end_pfn(void *entry)
> for (pfn = dax_to_pfn(entry); \
> pfn < dax_end_pfn(entry); pfn++)
>  
> -static inline bool dax_mapping_is_cow(struct address_space *mapping)
> +static inline bool dax_page_is_shared(struct page *page)
>  {
> -   return (unsigned long)mapping == PAGE_MAPPING_DAX_COW;
> +   return (unsigned long)page->mapping ==
> PAGE_MAPPING_DAX_SHARED;
>  }
>  
>  /*
> - * Set the page->mapping with FS_DAX_MAPPING_COW flag, increase the
> refcount.
> + * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase
> the
> + * refcount.
>   */
> -static inline void dax_mapping_set_cow(struct page *page)
> +static inline void dax_page_bump_sharing(struct page *page)
>  {
> -   if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
> +   if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_SHARED) {
> /*
>  * Reset the index if the page was already mapped
>  * regularly before.
>  */
> if (page->mapping)
> -   page->index = 1;
> -   page->mapping = (void *)PAGE_MAPPING_DAX_COW;
> +   page->share = 1;
> +   page->mapping = (void *)PAGE_MAPPING_DAX_SHARED;
> }
> -   page->index++;
> +   page->share++;
> +}
> +
> +static inline unsigned long dax_page_drop_sharing(struct page *page)
> +{
> +   return --page->share;
>  }
>  
>  /*
> - * When it is called in dax_insert_entry(), the cow flag will
> indicate that
> + * When it is called in dax_insert_entry(), the shared flag will
> indicate that
>   * whether this entry is shared by multiple files.  If so, set the
> page->mapping
> - * FS_DAX_MAPPING_COW, and use page->index as refcount.
> + * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
>   */
>  static void dax_associate_entry(void *entry, struct address_space
> *mapping,
> -   struct vm_area_struct *vma, unsigned long address,
> bool cow)
> +   struct vm_area_struct *vma, unsigned long address,
> bool shared)
>  {
> unsigned long size = dax_entry_size(entry), pfn, index;
> int i = 0;
> @@ -374,8 +380,8 @@ static void dax_associate_entry(void *entry,
> struct address_space *mapping,
> for_each_mapped_pfn(entry, pfn) {
> struct page *page = pfn_to_page(pfn);
>  
> -   if (cow) {
> -   dax_mapping_set_cow(page);
> +   if (shared) {
> +   dax_page_bump_sharing(page);
> } else {
> WARN_ON_ONCE(page->mapping);
> page->mapping = mapping;
> @@ -396,9 +402,9 @@ static void dax_disassociate_entry(void *entry,
> struct address_space *mapping,
> struct page *page = pfn_to_page(pfn);
>  
> WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
> -   if (dax_mapping_is_cow(page->mapping)) {
> -   /* keep the CoW flag if this page is still
> shared */
> -   if (page->index-- > 0)
> +   if (dax_page_is_shared(page)) {
> +   /* keep the shared flag if this page is still
> shared */
> +   if (dax_page_drop_sharing(page) > 0)
> continue;
> } else
> WARN_ON_ONCE(page->mapping && page->mapping
> != mapping);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 500e536796ca..f46cac3657ad 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -103,7 +103,10 @@ struct page {
>

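For readers of the archive, the share counting introduced by this patch
can be modelled in user space roughly as follows. It is a sketch only:
struct model_page, MODEL_MAPPING_SHARED and the model_* helpers are
hypothetical stand-ins for struct page, PAGE_MAPPING_DAX_SHARED and the
dax_page_* helpers in the diff, and the page is assumed to start out
zero-initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker value standing in for PAGE_MAPPING_DAX_SHARED. */
#define MODEL_MAPPING_SHARED ((void *)0x1)

struct model_page {
	void *mapping;        /* owning mapping, or MODEL_MAPPING_SHARED */
	unsigned long share;  /* number of extents sharing the page */
};

static bool model_page_is_shared(const struct model_page *page)
{
	return page->mapping == MODEL_MAPPING_SHARED;
}

/* Follows dax_page_bump_sharing(): switch to shared mode, then count. */
static void model_page_bump_sharing(struct model_page *page)
{
	if (!model_page_is_shared(page)) {
		/* A page that was mapped regularly already has one user. */
		if (page->mapping)
			page->share = 1;
		page->mapping = MODEL_MAPPING_SHARED;
	}
	page->share++;
}

/* Follows dax_page_drop_sharing(): returns the remaining share count. */
static unsigned long model_page_drop_sharing(struct model_page *page)
{
	assert(page->share > 0);
	return --page->share;
}
```

A page that was already mapped regularly ends up with share == 2 after
its first bump, which is why the disassociate path keeps the shared
marker while the drop helper still returns a value above zero.
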

Re: [PATCH] cifs: fix strcat buffer overflow in smb21_set_oplock_level()

2019-05-06 Thread Jeremy Allison
On Mon, May 06, 2019 at 11:53:44AM -0500, Steve French via samba-technical wrote:
> I think strcpy is clearer - but I don't think it can overflow since if
> R, H or W were written to "message" then cinode->oplock would be
> non-zero so we would never strcat "None"

Ahem. In Samba we have :

lib/util/safe_string.h:#define strcpy(dest,src) __ERROR__XX__NEVER_USE_STRCPY___;

Maybe you should do likewise :-).

> On Mon, May 6, 2019 at 10:26 AM Christoph Probst  wrote:
> >
> > Change strcat to strcpy in the "None" case as it is never valid to append
> > "None" to any other message. It may also overflow char message[5], in a
> > race condition on cinode if cinode->oplock is unset by another thread
> > after "RHW" or "RH" had been written to message.
> >
> > Signed-off-by: Christoph Probst 
> > ---
> >  fs/cifs/smb2ops.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> > index c36ff0d..5fd5567 100644
> > --- a/fs/cifs/smb2ops.c
> > +++ b/fs/cifs/smb2ops.c
> > @@ -2936,7 +2936,7 @@ smb21_set_oplock_level(struct cifsInodeInfo *cinode, __u32 oplock,
> > strcat(message, "W");
> > }
> > if (!cinode->oplock)
> > -   strcat(message, "None");
> > +   strcpy(message, "None");
> > cifs_dbg(FYI, "%s Lease granted on inode %p\n", message,
> >  &cinode->vfs_inode);
> >  }
> > --
> > 2.1.4
> >
> 
> 
> -- 
> Thanks,
> 
> Steve
> 

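For readers of the archive: one way to sidestep both strcat and strcpy
in a case like this is to format the string once from a snapshot of the
flags. The sketch below is an illustration only, not what the patch
does: format_lease() and the MODEL_CACHE_* values are hypothetical and
are not the kernel's CIFS flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define MODEL_CACHE_READ   0x1u
#define MODEL_CACHE_HANDLE 0x2u
#define MODEL_CACHE_WRITE  0x4u

/*
 * Build the lease string ("R", "RH", "RHW", ... or "None") into buf.
 * oplock is passed by value, so the "None" decision cannot disagree
 * with the letters that were written, and snprintf() never writes more
 * than size bytes. Returns the length the full string would need.
 */
static int format_lease(char *buf, size_t size, unsigned int oplock)
{
	assert(buf != NULL && size > 0);

	if (!oplock)
		return snprintf(buf, size, "None");

	return snprintf(buf, size, "%s%s%s",
			(oplock & MODEL_CACHE_READ) ? "R" : "",
			(oplock & MODEL_CACHE_HANDLE) ? "H" : "",
			(oplock & MODEL_CACHE_WRITE) ? "W" : "");
}
```

With char message[5], "None" (4 characters plus the terminator) fits
exactly, while "RHW" followed by an appended "None" would need 8 bytes,
which is the overflow the patch description is worried about.
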

Re: [PATCH 2/2] xfs: clean up xfs_dir2_leaf_addname

2019-03-11 Thread Allison Henderson

Looks ok to me.  Thanks for the clean up.

Reviewed-by: Allison Henderson 

On 3/11/19 9:22 AM, Darrick J. Wong wrote:

From: Darrick J. Wong 

Remove typedefs and consolidate local variable initialization.

Signed-off-by: Darrick J. Wong 
---
  fs/xfs/libxfs/xfs_dir2_leaf.c |   33 +++--
  1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index 2abf945e5844..9c2a0a13ed61 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -563,43 +563,40 @@ xfs_dir3_leaf_find_entry(
   */
  int   /* error */
  xfs_dir2_leaf_addname(
-   xfs_da_args_t   *args)  /* operation arguments */
+   struct xfs_da_args  *args)  /* operation arguments */
  {
+   struct xfs_dir3_icleaf_hdr leafhdr;
+   struct xfs_trans*tp = args->trans;
__be16  *bestsp;/* freespace table in leaf */
-   int compact;/* need to compact leaves */
-   xfs_dir2_data_hdr_t *hdr;   /* data block header */
+   __be16  *tagp;  /* end of data entry */
struct xfs_buf  *dbp;   /* data block buffer */
-   xfs_dir2_data_entry_t   *dep;   /* data block entry */
-   xfs_inode_t *dp;/* incore directory inode */
-   xfs_dir2_data_unused_t  *dup;   /* data unused entry */
+   struct xfs_buf  *lbp;   /* leaf's buffer */
+   struct xfs_dir2_leaf*leaf;  /* leaf structure */
+   struct xfs_inode*dp = args->dp;  /* incore directory inode */
+   struct xfs_dir2_data_hdr *hdr;  /* data block header */
+   struct xfs_dir2_data_entry *dep;/* data block entry */
+   struct xfs_dir2_leaf_entry *lep;/* leaf entry table pointer */
+   struct xfs_dir2_leaf_entry *ents;
+   struct xfs_dir2_data_unused *dup;   /* data unused entry */
+   struct xfs_dir2_leaf_tail *ltp; /* leaf tail pointer */
+   struct xfs_dir2_data_free *bf;  /* bestfree table */
+   int compact;/* need to compact leaves */
int error;  /* error return value */
int grown;  /* allocated new data block */
int highstale = 0;  /* index of next stale leaf */
int i;  /* temporary, index */
int index;  /* leaf table position */
-   struct xfs_buf  *lbp;   /* leaf's buffer */
-   xfs_dir2_leaf_t *leaf;  /* leaf structure */
int length; /* length of new entry */
-   xfs_dir2_leaf_entry_t   *lep;   /* leaf entry table pointer */
int lfloglow;   /* low leaf logging index */
int lfloghigh;  /* high leaf logging index */
int lowstale = 0;   /* index of prev stale leaf */
-   xfs_dir2_leaf_tail_t*ltp;   /* leaf tail pointer */
int needbytes;  /* leaf block bytes needed */
int needlog;/* need to log data header */
int needscan;   /* need to rescan data free */
-   __be16  *tagp;  /* end of data entry */
-   xfs_trans_t *tp;/* transaction pointer */
xfs_dir2_db_t   use_block;  /* data block number */
-   struct xfs_dir2_data_free *bf;  /* bestfree table */
-   struct xfs_dir2_leaf_entry *ents;
-   struct xfs_dir3_icleaf_hdr leafhdr;
  
  	trace_xfs_dir2_leaf_addname(args);
  
-   dp = args->dp;
-   tp = args->trans;
-
error = xfs_dir3_leaf_read(tp, dp, args->geo->leafblk, -1, &lbp);
if (error)
return error;



Re: [PATCH 1/2] xfs: zero initialize highstale and lowstale in xfs_dir2_leaf_addname

2019-03-11 Thread Allison Henderson

Looks fine.  You can add my review.  Thx!

Reviewed-by: Allison Henderson 

On 3/11/19 9:19 AM, Darrick J. Wong wrote:

From: Darrick J. Wong 

Smatch complains about the following:

fs/xfs/libxfs/xfs_dir2_leaf.c:848 xfs_dir2_leaf_addname() error:
uninitialized symbol 'lowstale'.

fs/xfs/libxfs/xfs_dir2_leaf.c:849 xfs_dir2_leaf_addname() error:
uninitialized symbol 'highstale'.

I don't think there's any incorrect behavior associated with the
uninitialized variable, but as the author of the previous zero-init
patch points out, it's best not to be passing around pointers to
uninitialized stack areas.

Signed-off-by: Darrick J. Wong 
---
  fs/xfs/libxfs/xfs_dir2_leaf.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index 9a3767818c50..2abf945e5844 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -574,7 +574,7 @@ xfs_dir2_leaf_addname(
xfs_dir2_data_unused_t  *dup;   /* data unused entry */
int error;  /* error return value */
int grown;  /* allocated new data block */
-   int highstale;  /* index of next stale leaf */
+   int highstale = 0;  /* index of next stale leaf */
int i;  /* temporary, index */
int index;  /* leaf table position */
struct xfs_buf  *lbp;   /* leaf's buffer */
@@ -583,7 +583,7 @@ xfs_dir2_leaf_addname(
xfs_dir2_leaf_entry_t   *lep;   /* leaf entry table pointer */
int lfloglow;   /* low leaf logging index */
int lfloghigh;  /* high leaf logging index */
-   int lowstale;   /* index of prev stale leaf */
+   int lowstale = 0;   /* index of prev stale leaf */
xfs_dir2_leaf_tail_t*ltp;   /* leaf tail pointer */
int needbytes;  /* leaf block bytes needed */
int needlog;/* need to log data header */

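For readers of the archive, the general hazard behind this kind of
warning can be shown with a small user-space example. It is a sketch
under stated assumptions: find_stale() is a hypothetical helper written
for illustration and is not xfs_dir3_leaf_find_stale(); it only writes
an output when it finds a stale slot, so the caller has to initialize
both variables before passing their addresses.

```c
#include <assert.h>

/*
 * Hypothetical helper: report the nearest stale slot below index and
 * the nearest one at or above it. An output is written only when such
 * a slot exists; otherwise the caller's value is left untouched.
 */
static void find_stale(const int *stale, int count, int index,
		       int *lowstale, int *highstale)
{
	int i;

	assert(index >= 0 && index <= count);

	for (i = index - 1; i >= 0; i--) {
		if (stale[i]) {
			*lowstale = i;
			break;
		}
	}
	for (i = index; i < count; i++) {
		if (stale[i]) {
			*highstale = i;
			break;
		}
	}
}
```

A caller that declares "int lowstale = 0, highstale = 0;" always reads
a defined value afterwards, even on the path where nothing was found.
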


Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

2017-04-03 Thread Jeremy Allison
On Mon, Apr 03, 2017 at 11:36:48AM -0700, Jeremy Allison wrote:
> On Mon, Apr 03, 2017 at 02:18:44PM -0400, Jeff Layton wrote:
> > On Mon, 2017-04-03 at 11:09 -0700, Jeremy Allison wrote:
> > > 
> > > CIFS has a way to reserve space. Look into "allocation size" on create.
> > 
> > That won't help here as it's done on open().
> > 
> > The problem here is that we might create a file (and not preallocate
> > anything), then write a bunch of stuff to the cache under an oplock.
> > Then when we go to write back, we get the CIFS equivalent of -ENOSPC.
> > 
> > What local filesystems do (AIUI) is preallocate so that you can catch
> > an ENOSPC condition earlier, when you're dirtying new pages in the
> > cache. That's pretty much impossible to do on a network filesystem
> > though.
> 
> There's also SMB_SET_FILE_ALLOCATION_INFO which can be
> done over SMB1/2/3 on an open file handle.

There's *always* a way to do something in SMB1/2/3. :-).

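For readers of the archive: on a local file, the closest user-space
analogue to "reserve the space first so ENOSPC shows up early" is
posix_fallocate(). The sketch below assumes a POSIX system; the
reserve_space() wrapper is hypothetical, and nothing here is the SMB
call mentioned above or the in-kernel reservation the thread discusses.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for posix_fallocate() */
#endif

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Reserve len bytes for fd, starting at offset 0. posix_fallocate()
 * returns 0 on success or an error number (for example ENOSPC or
 * EBADF) directly; it does not report the failure through errno.
 */
static int reserve_space(int fd, off_t len)
{
	assert(len > 0);
	return posix_fallocate(fd, 0, len);
}
```

A caller would check the return value before dirtying any data, so an
out-of-space condition is seen at reservation time rather than later at
writeback.
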

Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

2017-04-03 Thread Jeremy Allison
On Mon, Apr 03, 2017 at 02:18:44PM -0400, Jeff Layton wrote:
> On Mon, 2017-04-03 at 11:09 -0700, Jeremy Allison wrote:
> > On Mon, Apr 03, 2017 at 01:47:37PM -0400, Jeff Layton wrote:
> > > On Mon, 2017-04-03 at 07:32 -0700, Matthew Wilcox wrote:
> > > > On Mon, Apr 03, 2017 at 06:28:38AM -0400, Jeff Layton wrote:
> > > > > On Mon, 2017-04-03 at 14:25 +1000, NeilBrown wrote:
> > > > > > Also I think that EIO should always over-ride ENOSPC as the possible
> > > > > > responses are different.  That probably means you need a separate seq
> > > > > > number for each, which isn't ideal.
> > > > > > 
> > > > > 
> > > > > I'm not quite convinced that it's really useful to do anything but
> > > > > report the latest error.
> > > > > 
> > > > > But...if we did need to prefer one over another, could we get away with
> > > > > always reporting -EIO once that error occurs? If so, then we'd still
> > > > > just need a single sequence counter.
> > > > 
> > > > I wonder whether it's even worth supporting both EIO and ENOSPC for a
> > > > writeback problem.  If I understand correctly, at the time of write(),
> > > > filesystems check to see if they have enough blocks to satisfy the
> > > > request, so ENOSPC only comes up in the writeback context for thinly
> > > > provisioned devices.
> > > > 
> > > > Programs have basically no use for the distinction.  In either case,
> > > > the situation is the same.  The written data is safely in RAM and cannot
> > > > be written to the storage.  If one were to make superhuman efforts,
> > > > one could mmap the file and write() it to a different device, but that
> > > > is incredibly rare.  For most programs, the response is to just die and
> > > > let the human deal with the corrupted file.
> > > > 
> > > > From a sysadmin point of view, of course the situation is different,
> > > > and the remedy is different, but they should be getting that information
> > > > through a different mechanism than monitoring the errno from every
> > > > system call.
> > > > 
> > > > If we do want to continue to support both EIO and ENOSPC from writeback,
> > > > then let's have EIO override ENOSPC as an error.  ie if an ENOSPC comes
> > > > in after an EIO is set, it only bumps the counter and applications will
> > > > see EIO, not ENOSPC on fresh calls to fsync().
> > > 
> > > 
> > > No, ENOSPC on writeback can certainly happen with network filesystems.
> > > NFS and CIFS have no way to reserve space. You wouldn't want to have to
> > > do an extra RPC on every buffered write. :)
> > 
> > CIFS has a way to reserve space. Look into "allocation size" on create.
> 
> That won't help here as it's done on open().
> 
> The problem here is that we might create a file (and not preallocate
> anything), then write a bunch of stuff to the cache under an oplock.
> Then when we go to write back, we get the CIFS equivalent of -ENOSPC.
> 
> What local filesystems do (AIUI) is preallocate so that you can catch
> an ENOSPC condition earlier, when you're dirtying new pages in the
> cache. That's pretty much impossible to do on a network filesystem
> though.

There's also SMB_SET_FILE_ALLOCATION_INFO which can be
done over SMB1/2/3 on an open file handle.

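For readers of the archive, the "EIO overrides ENOSPC with a single
sequence counter" idea quoted above can be modelled as below. This is a
sketch of the proposal as described in the thread, not the code from
the patch series: struct wb_err and both helpers are hypothetical, and
the model never clears a recorded error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-file writeback error state: one code, one counter. */
struct wb_err {
	int err;           /* sticky error code, 0 if none was recorded */
	unsigned int seq;  /* bumped every time an error is recorded */
};

/* Record a writeback failure; a stored EIO is never replaced. */
static void wb_err_record(struct wb_err *state, int err)
{
	assert(err == EIO || err == ENOSPC);

	if (state->err != EIO)
		state->err = err;
	state->seq++;
}

/*
 * What an fsync()-like caller would see: the stored error if anything
 * was recorded since *seen, else 0. *seen is advanced to the current
 * counter so the same event is reported once per caller.
 */
static int wb_err_check(const struct wb_err *state, unsigned int *seen)
{
	if (*seen == state->seq)
		return 0;
	*seen = state->seq;
	return state->err;
}
```

An ENOSPC that arrives after an EIO only bumps seq, so a fresh check
still reports EIO, which matches the behaviour Matthew describes.
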

Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

2017-04-03 Thread Jeremy Allison
On Mon, Apr 03, 2017 at 01:47:37PM -0400, Jeff Layton wrote:
> On Mon, 2017-04-03 at 07:32 -0700, Matthew Wilcox wrote:
> > On Mon, Apr 03, 2017 at 06:28:38AM -0400, Jeff Layton wrote:
> > > On Mon, 2017-04-03 at 14:25 +1000, NeilBrown wrote:
> > > > Also I think that EIO should always over-ride ENOSPC as the possible
> > > > responses are different.  That probably means you need a separate seq
> > > > number for each, which isn't ideal.
> > > > 
> > > 
> > > I'm not quite convinced that it's really useful to do anything but
> > > report the latest error.
> > > 
> > > But...if we did need to prefer one over another, could we get away with
> > > always reporting -EIO once that error occurs? If so, then we'd still
> > > just need a single sequence counter.
> > 
> > I wonder whether it's even worth supporting both EIO and ENOSPC for a
> > writeback problem.  If I understand correctly, at the time of write(),
> > filesystems check to see if they have enough blocks to satisfy the
> > request, so ENOSPC only comes up in the writeback context for thinly
> > provisioned devices.
> > 
> > Programs have basically no use for the distinction.  In either case,
> > the situation is the same.  The written data is safely in RAM and cannot
> > be written to the storage.  If one were to make superhuman efforts,
> > one could mmap the file and write() it to a different device, but that
> > is incredibly rare.  For most programs, the response is to just die and
> > let the human deal with the corrupted file.
> > 
> > From a sysadmin point of view, of course the situation is different,
> > and the remedy is different, but they should be getting that information
> > through a different mechanism than monitoring the errno from every
> > system call.
> > 
> > If we do want to continue to support both EIO and ENOSPC from writeback,
> > then let's have EIO override ENOSPC as an error.  ie if an ENOSPC comes
> > in after an EIO is set, it only bumps the counter and applications will
> > see EIO, not ENOSPC on fresh calls to fsync().
> 
> 
> No, ENOSPC on writeback can certainly happen with network filesystems.
> NFS and CIFS have no way to reserve space. You wouldn't want to have to
> do an extra RPC on every buffered write. :)

CIFS has a way to reserve space. Look into "allocation size" on create.


Re: [PATCH v27 03/21] vfs: Add MAY_DELETE_SELF and MAY_DELETE_CHILD permission flags

2016-12-06 Thread Jeremy Allison
On Tue, Dec 06, 2016 at 10:25:22PM +0100, Miklos Szeredi wrote:
> On Tue, Dec 6, 2016 at 10:13 PM, Jeremy Allison  wrote:
> > On Tue, Dec 06, 2016 at 03:15:29PM -0500, J. Bruce Fields wrote:
> >> On Fri, Dec 02, 2016 at 10:57:42AM +0100, Miklos Szeredi wrote:
> >> > On Tue, Oct 11, 2016 at 2:50 PM, Andreas Gruenbacher
> >> >  wrote:
> >> > > Normally, deleting a file requires MAY_WRITE access to the parent
> >> > > directory.  With richacls, a file may be deleted with MAY_DELETE_CHILD access
> >> > > to the parent directory or with MAY_DELETE_SELF access to the file.
> >> > >
> >> > > To support that, pass the MAY_DELETE_CHILD mask flag to inode_permission()
> >> > > when checking for delete access inside a directory, and MAY_DELETE_SELF
> >> > > when checking for delete access to a file itself.
> >> > >
> >> > > The MAY_DELETE_SELF permission overrides the sticky directory check.
> >> >
> >> > And MAY_DELETE_SELF seems totally inappropriate to any kind of rename,
> >> > since from the point of view of the inode we are not doing anything at
> >> > all.  The modifications are all in the parent(s), and that's where the
> >> > permission checks need to be.
> >>
> >> I'm having a hard time finding an authoritative reference here (Samba
> >> people might be able to help), but my understanding is that Windows
> >> gives this a meaning something like "may I delete a link to this file".
> >>
> >> (And not even "may I delete the *last* link to this file", which might
> >> also sound more logical.)
> >
> > I just did a recent patch here. In Samba we now check for
> > SEC_DIR_ADD_FILE/SEC_DIR_ADD_SUBDIR on the target directory
> > (depending on if the object being moved is a file or dir).
> 
> And MAY_DELETE_SELF as well, for rename?  That's really counterintuitive for me.

Yeah on the source handle we insist on DELETE_ACCESS|FILE_WRITE_ATTRIBUTES
permissions also.


Re: [PATCH v27 03/21] vfs: Add MAY_DELETE_SELF and MAY_DELETE_CHILD permission flags

2016-12-06 Thread Jeremy Allison
On Tue, Dec 06, 2016 at 03:15:29PM -0500, J. Bruce Fields wrote:
> On Fri, Dec 02, 2016 at 10:57:42AM +0100, Miklos Szeredi wrote:
> > On Tue, Oct 11, 2016 at 2:50 PM, Andreas Gruenbacher
> >  wrote:
> > > Normally, deleting a file requires MAY_WRITE access to the parent
> > > > directory.  With richacls, a file may be deleted with MAY_DELETE_CHILD access
> > > to the parent directory or with MAY_DELETE_SELF access to the file.
> > >
> > > To support that, pass the MAY_DELETE_CHILD mask flag to inode_permission()
> > > when checking for delete access inside a directory, and MAY_DELETE_SELF
> > > when checking for delete access to a file itself.
> > >
> > > The MAY_DELETE_SELF permission overrides the sticky directory check.
> > 
> > And MAY_DELETE_SELF seems totally inappropriate to any kind of rename,
> > since from the point of view of the inode we are not doing anything at
> > all.  The modifications are all in the parent(s), and that's where the
> > permission checks need to be.
> 
> I'm having a hard time finding an authoritative reference here (Samba
> people might be able to help), but my understanding is that Windows
> gives this a meaning something like "may I delete a link to this file".
> 
> (And not even "may I delete the *last* link to this file", which might
> also sound more logical.)

I just did a recent patch here. In Samba we now check for
SEC_DIR_ADD_FILE/SEC_DIR_ADD_SUBDIR on the target directory
(depending on if the object being moved is a file or dir).

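For readers of the archive, the delete rule from the quoted patch
description can be written out as a small decision function. This is a
simplified sketch, not the VFS code: the M_* bit values and
may_delete_model() are hypothetical, search permission on the parent is
ignored, and the sticky-directory test is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for this sketch; not the kernel's MAY_* bits. */
#define M_WRITE        0x1u
#define M_DELETE_CHILD 0x2u
#define M_DELETE_SELF  0x4u
#define M_ALL          (M_WRITE | M_DELETE_CHILD | M_DELETE_SELF)

/*
 * dir_perms:     what the caller is granted on the parent directory.
 * file_perms:    what the caller is granted on the file itself.
 * sticky_blocks: true if the sticky-directory rule would refuse the
 *                caller.
 */
static bool may_delete_model(unsigned int dir_perms, unsigned int file_perms,
			     bool sticky_blocks)
{
	assert(((dir_perms | file_perms) & ~M_ALL) == 0);

	/* Delete access on the file itself wins, even in a sticky dir. */
	if (file_perms & M_DELETE_SELF)
		return true;

	/* Otherwise the parent must grant write or delete-child access. */
	if (!(dir_perms & (M_WRITE | M_DELETE_CHILD)))
		return false;

	return !sticky_blocks;
}
```

The rename question raised in the thread is exactly whether the first
branch should apply there at all, since a rename changes the parent
directories and not the inode.
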

Re: [PATCH v21 00/22] Richacls

2016-05-10 Thread Jeremy Allison
On Tue, May 10, 2016 at 06:18:10AM +0200, Volker Lendecke wrote:
> On Tue, May 10, 2016 at 12:02:33AM +0200, Andreas Gruenbacher wrote:
> > What more can I do to finally get this merged?
> 
> While I am not the one to comment on kernel specifics, from a pure Samba
> user space perspective let me say: We need this. NOW.

+1 from me. This is something that many vendors need
and have needed for a very long time. Getting this
in will allow *large* amounts of existing storage to
be migrated to Linux.


Re: [PATCH v18 00/22] Richacls (Core and Ext4)

2016-03-15 Thread Jeremy Allison
On Tue, Mar 15, 2016 at 12:11:03AM -0700, Christoph Hellwig wrote:
> On Fri, Mar 11, 2016 at 05:11:51PM +0100, Andreas Gruenbacher wrote:
> > > while breaking a lot of assumptions,
> > 
> > The model is designed specifically to be compliant with the POSIX
> > permission model. What assumptions are you talking about?
> 
> People have long learned that we only have 'allow' permissions.  Any
> model that mixes allow and deny ACEs is a mistake.

People can also learn and change though :-). One of the
biggest complaints people deploying Samba on Linux have is the
incompatible ACL models.

Whilst I have sympathy with your intense dislike of the
Windows ACL model, this comes down to the core of "who
do we serve ?" IMHO we should serve the users (although
I must confess I'd look awful in a TRON suit :-).


Re: [PATCH v18 00/22] Richacls (Core and Ext4)

2016-03-13 Thread Jeremy Allison
On Mon, Mar 14, 2016 at 12:02:13AM +0100, Andreas Gruenbacher wrote:
> On Sat, Mar 12, 2016 at 12:02 AM, Jeremy Allison  wrote:
> > On Fri, Mar 11, 2016 at 02:05:16PM -0600, Steve French wrote:
> >> Sounds like I need to quickly rework the SMB3 ACL helper functions
> >> for cifs.ko
> >>
> >> Also do you know where is the current version of the corresponding
> >> vfs_richacl for
> >> Samba which works with the current RichACL format?
> >
> > I have a patch for a new vfs_richacl somewhere. I remember
> > sending it to Andreas for testing...
> 
> Ah, the patch was for testing, not resting ... how could I get that mixed up.

:-).

> I've applied your patch to the latest master branch, made it compile
> again, and fixed a few obvious problems. The results I get with
> smbcacls look reasonable now.
> 
> The code is here:
>   https://github.com/andreas-gruenbacher/samba richacl
> 
> I've used the following smb.conf:
>   [richacl]
>   comment = Richacl directory
>   path = /mnt/ext4
>   vfs objects = richacl
>   writeable = yes
>   browseable = yes

Great ! Once richacls gets into the kernel I'll submit
this into the Samba master branch.

> Is there a particular reason why you didn't make vfs_richacl a
> dynamically loadable module?

Probably sheer laziness :-).


Re: [PATCH v18 00/22] Richacls (Core and Ext4)

2016-03-11 Thread Jeremy Allison
On Fri, Mar 11, 2016 at 02:05:16PM -0600, Steve French wrote:
> Sounds like I need to quickly rework the SMB3 ACL helper functions
> for cifs.ko
> 
> Also do you know where is the current version of the corresponding
> vfs_richacl for
> Samba which works with the current RichACL format?

I have a patch for a new vfs_richacl somewhere. I remember
sending it to Andreas for testing...


Re: [RFC v3 00/45] Richacls

2015-05-23 Thread Jeremy Allison
On Fri, Apr 24, 2015 at 01:03:57PM +0200, Andreas Gruenbacher wrote:
> Hello,
> 
> here's another update of the richacl patch queue.  The changes since the last
> posting (https://lwn.net/Articles/638242/) include:
> 
>  * The nfs client now allocates pages for received acls on demand like the
>    server does.  It no longer caches the acl size between calls.
> 
>  * All possible acls consisting of only owner@, group@, and everyone@ entries
>    which are equivalent to the file mode permission bits are now recognized.
>    This is needed because by the NFSv4 specification, the nfs server must
>    translate the file mode permission bits into an acl if it supports acls at
>    all.
> 
>  * Support for the dacl attribute over NFSv4.1 for Automatic Inheritance, and
>    also for the write_retention and write_retention_hold permissions.
> 
>  * The richacl_compute_max_masks() documentation has been improved.
> 
>  * Various minor bug fixes.
> 
> The git version is available here:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/agruen/linux-richacl.git \
>   richacl-2015-04-24

FYI. I have a mostly (needs test suite adding) working module
for Samba for Andreas's richacls code.

Using it we map incoming Windows ACLs directly to richacls
using the same mapping as we use for existing ZFS ACLs.

Jeremy.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

2015-05-14 Thread Jeremy Allison
On Thu, May 14, 2015 at 04:24:13PM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 3:09 PM, Jeremy Allison  wrote:
> >
> > Of course we tell people to just set their filesystems
> > up using mkfs.xfs -n version=ci :-).
> 
> So ASCII-only case-insensitivity is sufficient for you guys?

No it's not enough really. But for specific Windows apps that
use restricted namespaces (and there are such) it works.

ZFS on *BSD does do full case-insensitive lookups (utf8) as part of
FreeNAS. I think if it's configured as an SMB-only share they turn
that on.

> Doing case-insensitive lookups at a vfs layer level wouldn't be
> impossible (add some new lookup flag, so it would *not* be
> per-filesystem, it would be per-operation!), but the *full*
> case-insensitivity space in utf-8 is too much to expect. Especially
> since different people have different opinions on what it even should
> be.

Yeah, Apple are the real sinners here. utf8-compose-characters - pah !

> What else is problematic? I think you want an error on symlinks in the
> middle, right? So that you can do those manually? I also assume you
> don't like to follow ".." due to containment issues?

We already strip . and .. out of incoming pathnames. Once we've
walked the path (following links) we then use realpath() to ensure
the full path is under the exported share "path =" directive.
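
A minimal sketch of the containment check described above, assuming POSIX realpath() and a hypothetical helper name, path_is_under_share(). It is not Samba's actual code. Both the share root and the candidate path are resolved, then compared on a path-component boundary so that /srv/share2 is not mistaken for a child of /srv/share.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not Samba code): return true if `path`, after
 * symlink resolution, is `share_root` itself or lies beneath it.
 */
static bool path_is_under_share(const char *share_root, const char *path)
{
    char root[PATH_MAX];
    char resolved[PATH_MAX];
    size_t len;

    assert(share_root != NULL && path != NULL);

    /* Resolve both sides so symlinks and ".." cannot hide an escape. */
    if (realpath(share_root, root) == NULL ||
        realpath(path, resolved) == NULL) {
        return false;   /* nonexistent or unreadable: refuse */
    }

    if (strcmp(root, "/") == 0) {
        return true;    /* every absolute path is under "/" */
    }

    len = strlen(root);
    if (strncmp(resolved, root, len) != 0) {
        return false;
    }

    /* Require a component boundary right after the root prefix. */
    return resolved[len] == '\0' || resolved[len] == '/';
}
```

The check is still racy against concurrent renames and symlink changes, which is one reason a per-lookup "no symlinks" flag in the kernel is attractive.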

> Adding (again per-lookup) flags for "no symlinks" and "no dotdot"
> would be trivial (much more so than the case insensitivity). Would you
> require "error out on non-ascii characters" too, to then handle the
> complex cases by hand?

Don't need the dot-dot stuff - "no symlinks" would be useful,
but you have to remember Samba is still portable to *BSD and
Solaris-clones (all that anyone really cares about these days)
so we'll have to keep the old code paths too.

> I dunno. But it *may* be worth it to really try to give samba what it
> wants. Of course, if samba is happy doing all the name caching in user
> space, then that's not worth worrying about.

Case insensitive pathname lookup is the one I remember
Windows sales guys beating us up on benchmarks (create 1,000,000 files
in a directory and then have a client that looks for a file
that *doesn't* exist :-).

Hopefully Volker is also on this thread and can chime in with
some requirements of his own :-).

> And the reason I don't use samba myself is that I'm not a fan of
> network filesystems. I want my filesystems low-latency and right there
> on the local ssd, thank you very much. But if you have some
> local-machine benchmarking thing you use, I guess I could use that to
> see what the profile looks like...

You don't know what you're missing mate !

I use it every day via my local NAS to play music (via SONOS
Linux clients) and movies/TV (SageTV Linux boxes) everywhere
in the house. I don't even have a Windows box (other than
VM's for testing) anywhere.

But my house is fully wired for gigabit, I wouldn't do that over
wireless :-).

Samba, it's not just for Windows anymore :-).


Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

2015-05-14 Thread Jeremy Allison
On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> On Wed, May 13, 2015 at 8:30 PM, Al Viro  wrote:
> >
> > Maybe...  I'd like to see the profiles, TBH - especially getxattr() and
> > access() frequency on various loads.  Sure, make(1) and cc(1) really care
> > about stat() very much, but I wouldn't be surprised if something like
> > httpd or samba would be hitting getxattr() a lot...
> 
> So I haven't seen samba profiles in ages, but iirc we have more
> serious problems than trying to speed up basic filename lookup.

Let me know what you need :-).

> 
> Also, I *think* samba ends up basically doing most of the pathname
> lookups from its own user-level cache, because of how it needs to
> avoid symlinks and do the whole crazy case insensitive pathname thing.
> I have this very dim memory that that's one reason samba ends up being
> so readdir-intensive. But I might be wrong, it's been many years since
> I talked to anybody about samba, and I don't run it myself.

You should, it's very good these days :-). We have to
walk the directory trees to do the case insensitive pathname
lookups, but we cache these so when we get an incoming
pathname we look it up in the cache, see if we can
stat the looked up name and if so we're done with pathname
resolution.
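
A rough sketch of the lookup order described above, with the cache left out: try the name exactly as the client sent it, and only scan the directory when that stat() fails. The helper name lookup_ci() is hypothetical, and strcasecmp() only folds case per the C locale, so this is nowhere near full UTF-8 case insensitivity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <strings.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: find the on-disk spelling of `wanted` inside `dir`.
 * On success the real name is copied into `out` and true is returned.
 */
static bool lookup_ci(const char *dir, const char *wanted,
                      char *out, size_t outlen)
{
    char full[4096];
    struct stat st;
    struct dirent *de;
    bool found = false;
    DIR *d;
    int n;

    assert(dir != NULL && wanted != NULL && out != NULL && outlen > 0);

    /* Fast path: the name exists exactly as given. */
    n = snprintf(full, sizeof(full), "%s/%s", dir, wanted);
    if (n < 0 || (size_t)n >= sizeof(full)) {
        return false;
    }
    if (stat(full, &st) == 0) {
        n = snprintf(out, outlen, "%s", wanted);
        return n >= 0 && (size_t)n < outlen;
    }

    /* Slow path: walk the directory and compare ignoring case. */
    d = opendir(dir);
    if (d == NULL) {
        return false;
    }
    while ((de = readdir(d)) != NULL) {
        if (strcasecmp(de->d_name, wanted) == 0) {
            n = snprintf(out, outlen, "%s", de->d_name);
            found = n >= 0 && (size_t)n < outlen;
            break;
        }
    }
    closedir(d);
    return found;
}
```

The slow path is linear in the directory size, which is why a lookup for a name that does not exist in a very large directory is the worst case.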

Of course we tell people to just set their filesystems
up using mkfs.xfs -n version=ci :-).


Re: [RFC v3 20/45] richacl: Automatic Inheritance

2015-05-13 Thread Jeremy Allison
On Wed, May 13, 2015 at 10:47:44PM +0200, Andreas Grünbacher wrote:
> 2015-05-13 22:28 GMT+02:00 Jeremy Allison :
> > On Wed, May 13, 2015 at 10:22:21PM +0200, Andreas Grünbacher wrote:
> >>
> >> That being said, a daemon like Samba can "fake" full Automatic
> >> Inheritance by creating files and then updating the inherited acls
> >> appropriately. This will inevitably be racy, but unless someone
> >> implements a way to create files without a mode, that's the closest
> >> Samba can get.
> >
> > On Windows systems the client fakes (no quotes :-) full Automatic
> > Inheritance by creating files and then updating the inherited acls
> > appropriately.
> 
> Hmm, interesting, are you *absolutely* sure about that? Is there
> anywhere I can look that up?

Hmm. Just realized we may be talking about different things :-).

In SMB/Samba the clients can create a file with no ACL, and
the directory ACL is auto inherited. *That* we fake in
Samba by creating then updating.

But in Windows there is the concept of "inherited" ACE
entries, which can come from parents of parents of parents
(etc.) objects. When a client modifies one of these on an
upper level directory, the server doesn't do the auto
updating that the vision of the file system might lead
you to expect - that updating is done by a tree walk
by the client.


Re: [RFC v3 20/45] richacl: Automatic Inheritance

2015-05-13 Thread Jeremy Allison
On Wed, May 13, 2015 at 10:22:21PM +0200, Andreas Grünbacher wrote:
> 
> That being said, a daemon like Samba can "fake" full Automatic
> Inheritance by creating files and then updating the inherited acls
> appropriately. This will inevitably be racy, but unless someone
> implements a way to create files without a mode, that's the closest
> Samba can get.

On Windows systems the client fakes (no quotes :-) full Automatic
Inheritance by creating files and then updating the inherited acls
appropriately.

Server doesn't do that logic :-).


Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)

2015-05-13 Thread Jeremy Allison
On Wed, May 13, 2015 at 12:37:41PM -0700, Daniel Phillips wrote:
> On 05/13/2015 12:09 PM, Martin Steigerwald wrote:
> 
> > "Assume good faith" can help here. No amount of accusing people of bad 
> > intention will change them. The only thing you have the power to change is 
> > your approach. You absolutely and ultimately do not have the power to
> > change other people. You can't force Tux3 in by sheer willpower or
> > attacking people.
> > 
> > On any account for anyone discussing here: I believe that any personal 
> > attacks, counter-attacks or "you are wrong" kind of speech will not help to 
> > move this discussion out of the circling it seems to be in at the moment.
> 
> Thanks for the sane commentary. I have the power to change my behavior.
> But if nobody else changes their behavior, the process remains just as
> unpleasant for us as it ever was (not just me!). Obviously, this is
> not the first time I have been through this, and it has never been
> pleasant. After a while, contributors just get tired of the grind and
> move on to something more fun. I know I did, and I am far from the
> only one.

Daniel, please listen to Martin. He speaks a fundamental truth
here.

As you know, I am also interested in Tux3, and would love to
see it as a filesystem option for NAS servers using Samba. But
please think about the way you're interacting with people on the
list, and whether that makes this outcome more or less likely.

Cheers,

Jeremy.


Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)

2015-03-30 Thread Jeremy Allison
On Mon, Mar 30, 2015 at 01:37:58PM -0700, Andrew Morton wrote:
> On Mon, 30 Mar 2015 13:32:27 -0700 Jeremy Allison  wrote:
> 
> > On Mon, Mar 30, 2015 at 01:26:25PM -0700, Andrew Morton wrote:
> > > 
> > > cons:
> > > 
> > > d) fincore() is more expensive
> > > 
> > > e) fincore() will very occasionally block
> > 
> > The above is the killer for Samba. If fincore
> > returns true but when we schedule the pread
> > we block, we're hosed.
> > 
> > Once we block, we're done serving clients on the main
> > thread until this returns. That can cause unpredictable
> > response times which can cause client timeouts.
> > 
> > A fincore+pread solution that blocks is simply unsafe
> > to use for us. We'll have to stay with the threadpool :-(.
> 
> Finally.  Thanks ;)
> 
> This implies that the samba main thread also has to avoid any memory
> allocations both direct and within syscall and pagefault - those will
> occasionally exhibit similar worst-case latency. Is this done now?

We don't do anything special around allocations in syscall.
For aio read we do talloc (internal memory allocator) the
return chunk before going into the pthread pread, so I
suppose this could block. Haven't seen this as a reported
problem though. I suppose you can say "well exactly the
same thing is true of fincore()" :-).


Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)

2015-03-30 Thread Jeremy Allison
On Mon, Mar 30, 2015 at 01:26:25PM -0700, Andrew Morton wrote:
> 
> cons:
> 
> d) fincore() is more expensive
> 
> e) fincore() will very occasionally block

The above is the killer for Samba. If fincore
returns true but when we schedule the pread
we block, we're hosed.

Once we block, we're done serving clients on the main
thread until this returns. That can cause unpredictable
response times which can cause client timeouts.

A fincore+pread solution that blocks is simply unsafe
to use for us. We'll have to stay with the threadpool :-(.

> And I don't believe that e) will be a problem in the real world.  It's
> a significant increase in worst-case latency and a negligible increase
> in average latency.  I've asked at least three times for someone to
> explain why this is unacceptable and no explanation has been provided.

See above.


Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)

2015-03-30 Thread Jeremy Allison
On Mon, Mar 30, 2015 at 12:36:04AM -0700, Christoph Hellwig wrote:
> On Fri, Mar 27, 2015 at 08:58:54AM -0700, Jeremy Allison wrote:
> > The problem with the above is that we can't tell the difference
> > between pread2() returning a short read because the pages are not
> > in cache, or because someone truncated the file. So we need some
> > way to differentiate this.
> 
> Is a race vs truncate really that time critical that you can't
> wait for the thread pool to do the second read to notice it?

Probably not, as this is the fallback path anyway.

> > My preference from userspace would be for pread2() to return
> > EAGAIN if *all* the data requested is not available (where
> > 'all' can be less than the size requested if the file has
> > been truncated in the meantime).
> 
> That is easily implementable, but I can see that for example web apps
> would be happy to get as much as possible.  So if Samba can be ok
> with short reads and only detecting the truncated case in the slow
> path that would make life simpler.  Otherwise we might indeed need two
> flags.

Simpler is better. I can live with the partial read+fallback.

Jeremy.


Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)

2015-03-27 Thread Jeremy Allison
On Fri, Mar 27, 2015 at 09:30:46AM -0700, Andrew Morton wrote:
> 
> But from an interface perspective the behaviour you're asking for is
> insane, frankly - if the kernel copied out 8k of data then pread2()
> should return 8k.  Otherwise there's no way for userspace to know that
> the 8k copy actually happened and we have just wasted a great pile of
> CPU doing a pointless memcpy.

Why would it do the copy in the first place if we asked (for example)
for 16k, but only 8k was available ? Just return EAGAIN and have
done with it.

> I expect that this situation (first part in cache, latter part not in
> cache) is rare - for reasonably small requests the common cases will be
> "all cached" and "nothing cached".  So perhaps the best approach here
> is for samba to add special handling for the short read, to work out
> the reason for its occurrence.

We can do that, but as Volker says this is a very hot code path.

> I take it from your comments that nobody has actually wired up pread2()
> into samba yet?  That's a bit disturbing, because if we later want to
> go and change something like this short-read behaviour, we're screwed -
> it's a non back-compat userspace-visible change.

It's been done as a test, so the code exists and has run (and improved
performance as I recall). Not much point committing it without kernel
support :-).

> And a note on cosmetics: why are we using EAGAIN here rather than
> EWOULDBLOCK?  They have the same numerical value, but EWOULDBLOCK is a
> better name - EAGAIN says "run it again", but that won't work.

Sounds good to me !


Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)

2015-03-27 Thread Jeremy Allison
On Fri, Mar 27, 2015 at 02:01:59AM -0700, Andrew Morton wrote:
> On Fri, 27 Mar 2015 01:48:33 -0700 Christoph Hellwig  wrote:
> 
> > On Fri, Mar 27, 2015 at 01:35:16AM -0700, Andrew Morton wrote:
> > > fincore() doesn't have to be ugly.  Please address the design issues I
> > > raised.  How is pread2() useful to the class of applications which
> > > cannot proceed until all data is available?
> > 
> > It actually makes them work correctly?  preadv2( ..., DONTWAIT) will
> > > return -EAGAIN, which causes them to bounce to the threadpool where
> > they call preadv(...).
> 
> (I assume you mean RWF_NONBLOCK)
> 
> That isn't how pread2() works.  If the leading one or more pages are
> uptodate, pread2() will return a partial read.  Now what?  Either the
> application reads the same data a second time via the worker thread
> (dumb, but it will usually be a rare case)

The problem with the above is that we can't tell the difference
between pread2() returning a short read because the pages are not
in cache, or because someone truncated the file. So we need some
way to differentiate this.

My preference from userspace would be for pread2() to return
EAGAIN if *all* the data requested is not available (where
'all' can be less than the size requested if the file has
been truncated in the meantime).

So:

ret = pread2(fd, buf, size_wanted, RWF_NONBLOCK)

if (ret == -1) {
    if (errno == EAGAIN) {
        goto threadpool...
    }
    .. real error..
}

if (ret == size_wanted) {
    .. normal read, file not truncated...
}

if (ret < size_wanted) {
    .. file was truncated..
}

The thing I want to avoid is the case where
ret < size_wanted means only part of the file
is in cache.


Re: [PATCH v12 00/20] DAX: Page cache bypass for filesystems on memory storage

2015-01-08 Thread Jeremy Allison
On Thu, Jan 08, 2015 at 11:28:40AM -0500, Milosz Tanski wrote:
> >
> 
> Andrew, I got busier with my other job-related things between
> Thanksgiving & Christmas than anticipated. However, I have updated and
> taken apart the patchset into two pieces (preadv2 and pwritev2). That
> should make evaluating the two separately easier. With the help of
> Volker I hacked up preadv2 support into samba and I hopefully have
> some numbers from it soon. Finally, I'm putting together a test case

I'd be very interested in seeing that patch code and those
numbers !

Cheers,

Jeremy.


Re: [RFC PATCH] fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

2014-11-03 Thread Jeremy Allison
On Mon, Nov 03, 2014 at 10:49:24AM -0800, Eric Rannaud wrote:
> On Mon, Nov 3, 2014 at 9:06 AM, Andy Lutomirski  wrote:
> >> That doesn't help because we explicitly reject O_RDONLY when combined
> >> with O_TMPFILE.
> >
> > I think I'm missing something.  How is an O_RDONLY temporary file
> > useful?  Wouldn't you want an O_RDWR tempfile with mode 0400 or
> > something like that?
> 
> Isn't it because they are essentially emulating an atomic open()
> capable of creating a file with inherited ACLs, according to
> relatively complex rules? open *can* be used with O_CREAT|O_RDONLY
> (touch(1) might do that), which would naively translate into:
> 
> fd = open(dir, O_TMPFILE|O_RDONLY, 0600)
> fsetxattr(fd, "...")
> fsetxattr(fd, "...")
> linkat(AT_FDCWD, "/proc/self/fd/...", ..., AT_SYMLINK_FOLLOW)
> return fd;
> 
> Now this would be happening on the server, and the only reason why it
> would be important to ensure that fd is O_RDONLY, is that smbd does
> not do its own bookkeeping of how each file handle was opened, and
> would rather have the kernel enforce O_RDONLY?
> 
> With O_TMPFILE as implemented now, smbd would have to do open(dir,
> O_TMPFILE|O_RDWR, 0600), but internally keep track that O_RDONLY was
> requested by the client on that fd, and block any writes to fd itself.

Which we already do, actually..

Although the atomic open emulation is
a very interesting idea for us. That's
something we currently don't do correctly
across different protocols (although we
do it between smbd's themselves).
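
A sketch of the atomic open emulation Eric outlines, under the current rule that O_TMPFILE must be opened O_RDWR or O_WRONLY. The helper name create_prepared_file() and the xattr name and value are placeholders, not a real ACL encoding. The sketch needs /proc mounted and a filesystem that supports O_TMPFILE. The file has no name until linkat() succeeds, so other processes never see it without its metadata.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `name` inside `dir` so that it only becomes
 * visible once its metadata is in place. Returns an open fd, or -1 with
 * errno set.
 */
static int create_prepared_file(const char *dir, const char *name)
{
    char proc_path[64];
    char target[4096];
    int saved, n, fd;

    assert(dir != NULL && name != NULL);

    /* Unnamed inode in `dir`; O_RDONLY is rejected with O_TMPFILE. */
    fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd < 0) {
        return -1;
    }

    /* Placeholder metadata; filesystems without user xattrs are tolerated. */
    if (fsetxattr(fd, "user.demo.acl", "placeholder", 11, 0) != 0 &&
        errno != ENOTSUP) {
        goto fail;
    }

    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
    n = snprintf(target, sizeof(target), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(target)) {
        errno = ENAMETOOLONG;
        goto fail;
    }

    /* Give the inode its name; fails with EEXIST if the name is taken. */
    if (linkat(AT_FDCWD, proc_path, AT_FDCWD, target,
               AT_SYMLINK_FOLLOW) != 0) {
        goto fail;
    }
    return fd;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

The returned descriptor is read-write, so a server that was asked for read-only access still has to enforce that itself.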

Jeremy.


Re: should we change the name/macros of file-private locks?

2014-04-16 Thread Jeremy Allison
On Wed, Apr 16, 2014 at 10:00:46PM +0200, Michael Kerrisk (man-pages) wrote:
> [CC += Jeremy Allison]
> 
> On Wed, Apr 16, 2014 at 8:57 PM, Jeff Layton  wrote:
> > Sorry to spam so many lists, but I think this needs widespread
> > distribution and consensus.
> >
> > File-private locks have been merged into Linux for v3.15, and *now*
> > people are commenting that the name and macro definitions for the new
> > file-private locks suck.
> >
> > ...and I can't even disagree. They do suck.
> >
> > We're going to have to live with these for a long time, so it's
> > important that we be happy with the names before we're stuck with them.
> 
> So, to add my perspective: The existing byte-range locking system has
> persisted (despite egregious faults) for well over two decades. One
> supposes that Jeff's new improved version might be around
> at least as long. With that in mind, and before setting in stone (and
> pushing into POSIX) a model of thinking that thousands of programmers
> will live with for a long time, it's worth thinking about names.
> 
> > Michael Kerrisk suggested several names but I think the only one that
> > doesn't have other issues is "file-associated locks", which can be
> > distinguished against "process-associated" locks (aka classic POSIX
> > locks).
> 
> The names I have suggested are:
> 
> file-associated locks
> 
> or
> 
> file-handle locks
> 
> or (using POSIX terminology)
> 
> file-description locks

Thanks for the CC: Michael, but to be honest
I don't really care what the name is, I just
want the functionality. I can change our build
system to cope with detecting it under any name
you guys choose :-).

Cheers,

Jeremy.


Re: Thoughts on credential switching

2014-03-31 Thread Jeremy Allison
On Mon, Mar 31, 2014 at 11:44:59AM +0100, One Thousand Gnomes wrote:
> On Wed, 26 Mar 2014 17:23:24 -0700
> Andy Lutomirski  wrote:
> 
> > Hi various people who care about user-space NFS servers and/or
> > security-relevant APIs.
> > 
> > I propose the following set of new syscalls:
> > 
> > int credfd_create(unsigned int flags): returns a new credfd that
> > corresponds to current's creds.
> > 
> > int credfd_activate(int fd, unsigned int flags): Change current's
> > creds to match the creds stored in fd.  To be clear, this changes both
> > the "subjective" and "objective" (aka real_cred and cred) because
> > there aren't any real semantics for what happens when userspace code
> > runs with real_cred != cred.
> 
> What are the semantics of a simultaneous ptrace racing a credfd_activate on
> another processor core ?
> 
> What are the rules for simultaneous threads doing I/O and credential
> changes ?
> 
> What is the rule for a faulting of an mmapped page in a multithreaded app
> one thread of which has changed credentials ?
> 
> Who owns a file created while you are changing credentials ?
> 
> >  - credfd_activate fails (-EINVAL) if dumpable.  This is because we
> > don't want a privileged daemon to be ptraced while impersonating
> > someone else.
> 
> That's one of the obvious problems but if you have that problem then
> you've got races against signals and ptrace etc to deal with.

FYI, Any process using pthreads and glibc already
has to cope with these races as setresuid on glibc
on Linux is not atomic.

That's why Samba eventually changed to using the
raw system calls on Linux due to an interesting
bug with glibc aio interacting with setresuid
races (receiving signal thread was uid 0, sending
aio thread was non-zero - signal couldn't be
delivered, glibc aio wakeup lost).
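
A minimal sketch of what "using the raw system calls" can look like, assuming Linux and glibc's syscall() wrapper. The name thread_setresuid() is hypothetical and this is not Samba's code. Called this way, the change applies to the calling thread only, because glibc's process-wide broadcast is bypassed.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Some 32-bit ABIs keep a legacy 16-bit call under the plain name. */
#if defined(SYS_setresuid32)
#define DEMO_SETRESUID SYS_setresuid32
#else
#define DEMO_SETRESUID SYS_setresuid
#endif

/*
 * Hypothetical wrapper: change the credentials of the calling thread only.
 * An argument of (uid_t)-1 leaves that ID unchanged.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int thread_setresuid(uid_t ruid, uid_t euid, uid_t suid)
{
    return (int)syscall(DEMO_SETRESUID, (long)ruid, (long)euid, (long)suid);
}
```

Code using this has to remember that other threads keep their old credentials, which is exactly the behaviour a per-thread impersonating server wants and POSIX does not specify.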

Jeremy.


Re: Thoughts on credential switching

2014-03-27 Thread Jeremy Allison
On Thu, Mar 27, 2014 at 11:46:39AM -0700, Andy Lutomirski wrote:
> On Thu, Mar 27, 2014 at 11:26 AM, Jeremy Allison  wrote:
> >
> > Amen to that :-).
> >
> > However, after talking with Jeff and Jim at CollabSummit,
> > I was 'encouraged' to make my opinions known on the list.
> >
> > To me, calling the creds handle a file descriptor just
> > feels wrong. It *isn't* an fd, you can't read/write/poll
> > on it, and it's only done as a convenience to get the
> > close-on-exec semantics and the fact that the creds are
> > already hung off the fd's in kernel space.
> 
> Windows calls these things "handles."  Linux has "file descriptors,"
> and there's plenty of precedent for things that aren't files.

Sure, but there's a set of expectations around
fd's that these things don't satisfy - IO-ops.

> > That way we can also make it clear this thing only has
> > meaning to a thread group, and SHOULD NOT (and indeed
> > preferably CAN NOT) be passed between processes.
> >
> 
> If you want those semantics, then stick a struct pid * in there for
> the tgid of the creator and make sure that current's tgid matches when
> you try to use it.
> 
> I think they'd be more useful without that check, though.

I'm more worried about leakage and unintended consequences
here.

> BTW, what do you want to have happen on fork?  I think they should keep 
> working.

Yeah, that's true. I want them to keep
working across fork, but not across exec
or any other method of fd-passing.

Jeremy.


Re: Thoughts on credential switching

2014-03-27 Thread Jeremy Allison
On Thu, Mar 27, 2014 at 07:01:26AM -0700, Jeff Layton wrote:
> On Thu, 27 Mar 2014 14:06:32 +0100
> Florian Weimer  wrote:
> 
> > On 03/27/2014 02:02 PM, Jeff Layton wrote:
> > 
> > >> This interface does not address the long-term lack of POSIX
> > >> compliance in setuid and friends, which are required to be
> > >> process-global and not thread-specific (as they are on the kernel
> > >> side).
> > >>
> > >> glibc works around this by reserving a signal and running set*id on
> > >> every thread in a special signal handler.  This is just crass, and
> > >> it is likely impossible to restore the original process state in
> > >> case of partial failure.  We really need kernel support to perform
> > >> the process-wide switch in an all-or-nothing manner.
> > >>
> > >
> > > I disagree. We're treading new ground here with this syscall. It's
> > > not defined by POSIX so we're under no obligation to follow its
> > > silly directives in this regard. Per-process cred switching doesn't
> > > really make much sense in this day and age, IMO. Wasn't part of the
> > > > spec written before threading existed?
> > 
> > Okay, then we need to add a separate set of system calls.
> > 
> > I really, really want to get rid of that signal handler mess in
> > glibc, with its partial failures.
> > 
> 
> I agree, it's a hack, but hardly anyone these days really wants to
> switch creds on a per-process basis. It's just that we're saddled with
> a spec for those calls that was written before threads really existed.
> 
> The kernel syscalls already do the right thing as far as I'm concerned.
> What would be nice however is a blessed glibc interface to them
> that didn't involve all of the signal handling stuff. Then samba et al.
> wouldn't need to call syscall() directly to get at them.

Amen to that :-).

However, after talking with Jeff and Jim at CollabSummit,
I was 'encouraged' to make my opinions known on the list.

To me, calling the creds handle a file descriptor just
feels wrong. It *isn't* an fd, you can't read/write/poll
on it, and it's only done as a convenience to get the
close-on-exec semantics and the fact that the creds are
already hung off the fd's in kernel space.

I'd rather any creds call use a different type, even if
it's a typedef of 'int -> creds_handle_t', just to make
it really clear it's *NOT* an fd.

That way we can also make it clear this thing only has
meaning to a thread group, and SHOULD NOT (and indeed
preferably CAN NOT) be passed between processes.

Cheers,

Jeremy.


Re: [PATCH v2] ceph: fix posix ACL hooks

2014-02-06 Thread Jeremy Allison
On Mon, Feb 03, 2014 at 10:31:27PM +, Al Viro wrote:
> 
> > And the fact is, filesystems with hardlinks and path-name-based
> > operations do exist. cifs with the unix extensions is one of them.
> 
> Pox on Tridge...

Actually you have to blame me for that. Tridge always
*HATED* the UNIX extensions :-).


Re: [RFC PATCH 0/5] locks: implement "filp-private" (aka UNPOSIX) locks

2013-10-11 Thread Jeremy Allison
On Fri, 11 Oct 2013 15:36:43 -0600 Andreas Dilger  wrote:
> > 
> > At this point, my main questions are:
> > 
> > 1) does this look useful, particularly for fileserver implementors?

Yes from the Samba perspective. We'll have to keep the old
code around for compatibility with non-Linux OSes, but this
will allow Linux Samba to short-circuit a bunch of logic
we have to get around the insane POSIX locking semantics
on close.
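
For anyone who has not run into the close semantics referred to here: under
POSIX, a process's fcntl() record locks on a file are dropped as soon as the
process closes any descriptor for that file, not only the descriptor the lock
was taken on. The following is a minimal sketch of that behaviour, not code
from the patch set. The helper names (locked_by_other,
posix_lock_lost_on_close) and the /tmp path are assumptions made for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask, from a child process, whether some other process holds a lock
 * that would conflict with a whole-file write lock on `path`.
 * Returns 1 if locked, 0 if not, -1 on error. */
static int locked_by_other(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd = open(path, O_RDWR);
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int rc = WEXITSTATUS(status);
    return rc == 2 ? -1 : rc;
}

/* Take a write lock through fd1, then open and close an unrelated second
 * descriptor on the same file. Returns 1 if the lock was visible before
 * and gone afterwards, 0 if it survived, -1 on error. */
int posix_lock_lost_on_close(void)
{
    char path[] = "/tmp/posix_lock_demo_XXXXXX";
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int result = -1;
    int fd1 = mkstemp(path);
    if (fd1 < 0)
        return -1;
    if (fcntl(fd1, F_SETLK, &fl) == 0) {   /* l_start = l_len = 0: whole file */
        int before = locked_by_other(path);
        int fd2 = open(path, O_RDONLY);
        if (fd2 >= 0) {
            close(fd2);                     /* fd1 is still open here */
            int after = locked_by_other(path);
            if (before == 1 && after == 0)
                result = 1;
            else if (before >= 0 && after >= 0)
                result = 0;
        }
    }
    close(fd1);
    unlink(path);
    return result;
}
```

On a system that follows the POSIX rules, posix_lock_lost_on_close() would be
expected to return 1. A file server that opens the same file on behalf of many
clients has to work around exactly this, which is presumably the logic
mentioned above.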

Jeremy.


Re: Recvfile patch used for Samba.

2013-07-23 Thread Jeremy Allison
On Tue, Jul 23, 2013 at 05:10:27PM +1000, Dave Chinner wrote:
> So, we are nesting up to 32 page locks here. That's bad. And we are
> nesting kmap() calls for all the pages individually - is that even
> safe to do?
> 
> So, what happens when we've got 16 pages in, and the filesystem has
> allocated space for those 16 blocks, and we get ENOSPC on the 17th?
> Sure, you undo the state here, but what about the 16 blocks that the
> filesystem has allocated to this file? There's no notification to
> the filesystem that they need to be truncated away because the write
> failed
> 
> > +
> > +   /* IOV is ready, receive the date from socket now */
> > +   msg.msg_name = NULL;
> > +   msg.msg_namelen = 0;
> > +   msg.msg_iov = (struct iovec *)&iov[0];
> > +   msg.msg_iovlen = cPagesAllocated ;
> > +   msg.msg_control = NULL;
> > +   msg.msg_controllen = 0;
> > +   msg.msg_flags = MSG_KERNSPACE;
> > +   rcvtimeo = sock->sk->sk_rcvtimeo;
> > +   sock->sk->sk_rcvtimeo = 8 * HZ;
> 
> We can hold the inode and the pages locked for 8 seconds?
> 
> I'll stop there. This is fundamentally broken. It's an attempt to do
> a multi-page write operation without any of the supporting
> structures needed to handle the failure cases properly.  The nested
> page locking has "deadlock" written all over it, and the lack of
> partial failure handling shouts "data corruption" and "stale data
> exposure" to me. The fact it can block for up to 8 seconds waiting
> for network shenanigans to be completed while holding lots of locks
> is going to cause all sorts of problems under memory pressure.
> 
> Not to mention it means that all memory allocations in the msgrcv
> path need to be done with GFP_NOFS, because GFP_KERNEL allocations
> are almost guaranteed to deadlock on the locked pages this path
> already holds
> 
> Need I say more?

No, that's great ! :-).

Thanks for the analysis. I'd heard it wasn't
near production quality, but not being a kernel
engineer myself I wasn't able to make that assessment.

Having said that, the OEMs that are using it do
find it improves write speeds by a large amount (10%
or more), so it's showing there is room for improvement
here if the correct code can be created for recvfile.

Cheers,

Jeremy.


Recvfile patch used for Samba.

2013-07-22 Thread Jeremy Allison
Hi Steve and Jeff (and others).

Here is a patch that Samba vendors have been using
to implement recvfile (copy directly from socket
to file). It can improve write performance on boxes
by a significant amount (10% or more).
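
For comparison, this is roughly what a server has to do without recvfile:
pull the data off the socket into a user-space buffer, then write that buffer
to the file, so every byte is copied across the user/kernel boundary twice.
The sketch below assumes blocking descriptors. recv_to_file and
recv_to_file_demo are hypothetical names for illustration, not Samba code and
not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy up to `count` bytes from socket `sock` into file `fd` starting at
 * `offset`. Returns the number of bytes stored (less than `count` only if
 * the peer closed early), or -1 on error. */
ssize_t recv_to_file(int sock, int fd, off_t offset, size_t count)
{
    char buf[16384];
    size_t done = 0;

    while (done < count) {
        size_t want = count - done;
        if (want > sizeof(buf))
            want = sizeof(buf);
        ssize_t n = recv(sock, buf, want, 0);      /* copy 1: socket -> buffer */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                                  /* peer closed */
        ssize_t off = 0;
        while (off < n) {                           /* copy 2: buffer -> file */
            ssize_t w = pwrite(fd, buf + off, (size_t)(n - off),
                               offset + (off_t)done + off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Self-check: push a short message through a socketpair into a temp file
 * and read it back. Returns 0 on success, -1 on failure. */
int recv_to_file_demo(void)
{
    static const char msg[] = "hello recvfile";
    char path[] = "/tmp/recvfile_demo_XXXXXX";
    char back[sizeof(msg)] = { 0 };
    int sv[2];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int fd = mkstemp(path);
    if (fd >= 0) {
        if (write(sv[0], msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
            recv_to_file(sv[1], fd, 0, sizeof(msg)) == (ssize_t)sizeof(msg) &&
            pread(fd, back, sizeof(back), 0) == (ssize_t)sizeof(back) &&
            memcmp(msg, back, sizeof(msg)) == 0)
            rc = 0;
        close(fd);
        unlink(path);
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

A recvfile-style call would collapse the two copies into one done inside the
kernel, which is presumably where the reported gain comes from.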

I'm not qualified to evaluate this code, can someone
who is (hi there Steve and Jeff :-) take a look at
this and see if it's worth shepherding into the kernel ?

Cheers,

Jeremy.
diff -urp linux-2.6.37-rc5.orig/fs/splice.c linux-2.6.37-rc5/fs/splice.c
--- linux-2.6.37-rc5.orig/fs/splice.c	2010-12-06 20:09:04.0 -0800
+++ linux-2.6.37-rc5/fs/splice.c	2010-12-07 16:16:48.0 -0800
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Attempt to steal a page from a pipe buffer. This should perhaps go into
@@ -1387,6 +1388,141 @@ static long do_splice(struct file *in, l
 	return -EINVAL;
 }
 
+static ssize_t do_splice_from_socket(struct file *file, struct socket *sock,
+ loff_t __user *ppos, size_t count)
+{		
+	struct address_space *mapping = file->f_mapping;
+	struct inode	*inode = mapping->host;
+	loff_t pos;
+	int count_tmp;
+	int err = 0;
+	int cPagePtr = 0;		
+	int cPagesAllocated = 0;
+	struct recvfile_ctl_blk rv_cb[MAX_PAGES_PER_RECVFILE];
+	struct kvec iov[MAX_PAGES_PER_RECVFILE];
+	struct msghdr msg;
+	long rcvtimeo;
+	int ret;
+
+	if(copy_from_user(&pos, ppos, sizeof(loff_t)))
+		return -EFAULT;
+
+	if(count > MAX_PAGES_PER_RECVFILE * PAGE_SIZE) {
+		printk("%s: count(%u) exceeds maxinum\n", __func__, count);
+		return -EINVAL;
+	}
+	mutex_lock(&inode->i_mutex);
+
+	vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
+
+	/* We can write back this queue in page reclaim */
+	current->backing_dev_info = mapping->backing_dev_info;
+
+	err = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
+	if (err != 0 || count == 0)
+		goto done;
+
+	file_remove_suid(file);
+	file_update_time(file);	
+
+	count_tmp = count;
+	do {
+		unsigned long bytes;	/* Bytes to write to page */
+		unsigned long offset;	/* Offset into pagecache page */
+		struct page *pageP;
+		void *fsdata;
+
+		offset = (pos & (PAGE_CACHE_SIZE - 1));
+		bytes = PAGE_CACHE_SIZE - offset;
+		if (bytes > count_tmp)
+			bytes = count_tmp;
+		ret = mapping->a_ops->write_begin(file, mapping, pos, bytes,
+		  AOP_FLAG_UNINTERRUPTIBLE,
+		  &pageP, &fsdata);
+
+		if (unlikely(ret)) {
+			err = ret;
+			for(cPagePtr = 0; cPagePtr < cPagesAllocated; cPagePtr++) {
+kunmap(rv_cb[cPagePtr].rv_page);
+ret = mapping->a_ops->write_end(file, mapping,
+rv_cb[cPagePtr].rv_pos,
+rv_cb[cPagePtr].rv_count,
+rv_cb[cPagePtr].rv_count,
+rv_cb[cPagePtr].rv_page,
+rv_cb[cPagePtr].rv_fsdata);
+			}
+			goto done;
+		}
+		rv_cb[cPagesAllocated].rv_page = pageP;
+		rv_cb[cPagesAllocated].rv_pos = pos;
+		rv_cb[cPagesAllocated].rv_count = bytes;
+		rv_cb[cPagesAllocated].rv_fsdata = fsdata;
+		iov[cPagesAllocated].iov_base = kmap(pageP) + offset;
+		iov[cPagesAllocated].iov_len = bytes;
+		cPagesAllocated++;
+		count_tmp -= bytes;
+		pos += bytes;
+	} while (count_tmp);
+
+	/* IOV is ready, receive the date from socket now */
+	msg.msg_name = NULL;
+	msg.msg_namelen = 0;
+	msg.msg_iov = (struct iovec *)&iov[0];
+	msg.msg_iovlen = cPagesAllocated ;
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	msg.msg_flags = MSG_KERNSPACE;
+	rcvtimeo = sock->sk->sk_rcvtimeo;
+	sock->sk->sk_rcvtimeo = 8 * HZ;
+
+	ret = kernel_recvmsg(sock, &msg, &iov[0], cPagesAllocated, count,
+			 MSG_WAITALL | MSG_NOCATCHSIG);
+
+	sock->sk->sk_rcvtimeo = rcvtimeo;
+	if(ret != count)
+		err = -EPIPE;
+	else
+		err = 0;
+
+	if (unlikely(err < 0)) {
+		for(cPagePtr = 0; cPagePtr < cPagesAllocated; cPagePtr++) {
+			kunmap(rv_cb[cPagePtr].rv_page);
+			ret = mapping->a_ops->write_end(file, mapping,
+			rv_cb[cPagePtr].rv_pos,
+			rv_cb[cPagePtr].rv_count,
+			rv_cb[cPagePtr].rv_count,
+			rv_cb[cPagePtr].rv_page,
+			rv_cb[cPagePtr].rv_fsdata);
+		}
+		goto done;
+	}
+
+	for(cPagePtr=0,count=0;cPagePtr < cPagesAllocated;cPagePtr++) {
+		//flush_dcache_page(pageP);
+		kunmap(rv_cb[cPagePtr].rv_page);
+		ret = mapping->a_ops->write_end(file, mapping,
+		rv_cb[cPagePtr].rv_pos,
+		rv_cb[cPagePtr].rv_count,
+		rv_cb[cPagePtr].rv_count,
+		rv_cb[cPagePtr].rv_page,
+		rv_cb[cPagePtr].rv_fsdata);
+		if (unlikely(ret < 0))
+			printk("%s: write_end fail,ret = %d\n", __func__, ret);
+		count += rv_cb[cPagePtr].rv_count;
+		//cond_resched();
+	}
+	balance_dirty_pages_ratelimited_nr(mapping, cPagesAllocated);
+	copy_to_user(ppos,&pos,sizeof(loff_t));
+
+done:
+	current->backing_dev_info = NULL;
+	mutex_unlock(&inode->i_mutex);
+	if(err)
+		return err;
+	else 
+		return count;
+}
+
 /*
  * Map an iov into an array of pages and offset/length tupples. With the
  * partial_page structure, we can map several non-contiguous ranges into
@@ -1698,11 +1834,33 @@ SYSCALL_DEFINE6(splice, int, fd_in, loff
 	long error;
 	struct file *

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Jeremy Allison
On Thu, Feb 21, 2013 at 01:51:53PM +, Myklebust, Trond wrote:
> On Thu, 2013-02-21 at 12:37 +0100, Ric Wheeler wrote:
> > We have debated the need to have a system call to allow for offloading copy 
> > operations, for example to an NFS server (part to the new NFS 4.2 
> > specification), SCSI target device (two different SCSI commands do this), 
> > local 
> > file systems (reflink, etc) and I suspect many other possible parts of the 
> > stack 
> > could implement this.
> 
> sendfile64() pretty much already has the right arguments for a
> "copyfile", however it would be nice to add a 'flags' parameter: the
> NFSv4.2 version would use that to specify whether or not to copy file
> metadata.

What would be really nice is if sendfile allowed zero-copy
from network socket to a file descriptor. That would help
a *lot* of my small system OEMs (and no splice() just doesn't
cut it :-).

Jeremy.


Re: [PATCH 0/3] Add O_DENY* flags to fcntl and cifs

2012-12-06 Thread Jeremy Allison
On Thu, Dec 06, 2012 at 04:37:27PM -0500, Theodore Ts'o wrote:
> On Thu, Dec 06, 2012 at 01:33:29PM -0800, Jeremy Allison wrote:
> > > I'm confused; why would a userspace application need to be able to
> > > request this behavior?
> > 
> > This isn't my proposal Ted, I'm just commenting on it :-).
> 
> Ah, sorry, I thought was coming from the Samba team.  :-)

Well it sort of is, as the people working on cifsfs are also
Samba Team members, but this isn't an official "Samba" thing,
as I can't see exactly what apps would want this either (other
than Samba smbd running on top of smbd, re-exporting a cifsfs
share, pointing to a Samba server, running on a... ERROR STACK
OVERFLOW :-).

> Hmm... I see wine-devel is cc'ed; is this coming from the Wine team,
> wanting to do SMB paravirtualization?  It would be useful if the
> commit description described how these flags are intended to be used

Indeed :-).

Jeremy.


Re: [PATCH 0/3] Add O_DENY* flags to fcntl and cifs

2012-12-06 Thread Jeremy Allison
On Thu, Dec 06, 2012 at 04:31:33PM -0500, Theodore Ts'o wrote:
> On Thu, Dec 06, 2012 at 11:57:52AM -0800, Jeremy Allison wrote:
> > 
> > And this is where things get really ugly of course :-).
> > 
> > For the CIFSFS client they're expecting to be able to
> > just ship them to a Windows server, where they'll
> > get the (insane) Windows semantics. These semantics
> > are not what would be wanted on a local filesystem.
> 
> I'm confused; why would a userspace application need to be able to
> request this behavior?

This isn't my proposal Ted, I'm just commenting on it :-).

Jeremy.


Re: [PATCH 0/3] Add O_DENY* flags to fcntl and cifs

2012-12-06 Thread Jeremy Allison
On Thu, Dec 06, 2012 at 11:57:52AM -0800, Jeremy Allison wrote:
> On Thu, Dec 06, 2012 at 07:49:49PM +, Alan Cox wrote:
> > On Thu,  6 Dec 2012 22:26:28 +0400
> > Pavel Shilovsky  wrote:
> > 
> > > Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags - this 
> > > change can benefit cifs and nfs modules. While this change is ok for 
> > > network filesystems, it isn't targeted for local filesystems due to 
> > > security problems (e.g. when a user process can deny root to delete a 
> > > file).
> > 
> > If I have my root fs on NFS then the same applies does it not.
> > 
> > Your patches fail to describe the security semantics and what file rights
> > I must have to apply each option. How do I track down a lock user, what
> > tools are provided ? How do the new options interact with the security
> > layer?
> > 
> > I don't have a problem with the idea, but it needs a lot more clear
> > description of how it works so the model can be checked and if need be
> > things tweaked (eg needing write to denywrite etc)
> 
> And this is where things get really ugly of course :-).
> 
> For the CIFSFS client they're expecting to be able to
> just ship them to a Windows server, where they'll
> get the (insane) Windows semantics. These semantics
> are not what would be wanted on a local filesystem.
> 
> So unless we just say "these things have Windows
> semantics" (where openers of files can lock out others
> under dubious circumstances) there'll be this horrible
> difference between (I'm assuming) the sane semantics that
> are defined for local filesystems and the insane ones
> that you get when you're connecting remotely.
> 
> I don't know a good way to fix that, but I'm pretty
> sure you don't want the Windows semantics defined
> locally :-).

You could just flag these as "ignored on local filesystems"
of course, exact semantics defined by the remote filesystem.

That's really what applications will get anyway. But it's not
conducive to writing documentation on what these things do :-).

Jeremy.


Re: [PATCH 0/3] Add O_DENY* flags to fcntl and cifs

2012-12-06 Thread Jeremy Allison
On Thu, Dec 06, 2012 at 07:49:49PM +, Alan Cox wrote:
> On Thu,  6 Dec 2012 22:26:28 +0400
> Pavel Shilovsky  wrote:
> 
> > Network filesystems CIFS, SMB2.0, SMB3.0 and NFSv4 have such flags - this 
> > change can benefit cifs and nfs modules. While this change is ok for 
> > network filesystems, it isn't targeted for local filesystems due to security 
> > problems (e.g. when a user process can deny root to delete a file).
> 
> If I have my root fs on NFS then the same applies does it not.
> 
> Your patches fail to describe the security semantics and what file rights
> I must have to apply each option. How do I track down a lock user, what
> tools are provided ? How do the new options interact with the security
> layer?
> 
> I don't have a problem with the idea, but it needs a lot more clear
> description of how it works so the model can be checked and if need be
> things tweaked (eg needing write to denywrite etc)

And this is where things get really ugly of course :-).

For the CIFSFS client they're expecting to be able to
just ship them to a Windows server, where they'll
get the (insane) Windows semantics. These semantics
are not what would be wanted on a local filesystem.

So unless we just say "these things have Windows
semantics" (where openers of files can lock out others
under dubious circumstances) there'll be this horrible
difference between (I'm assuming) the sane semantics that
are defined for local filesystems and the insane ones
that you get when you're connecting remotely.

I don't know a good way to fix that, but I'm pretty
sure you don't want the Windows semantics defined
locally :-).

Jeremy.


Re: Versioning file system

2007-06-19 Thread Jeremy Allison
On Tue, Jun 19, 2007 at 03:05:07AM -0400, Theodore Tso wrote:
> 
> There is a partial implementation lying around somewhere, but there
> were a number of problems we ran into that were discussed in the
> slidedeck.  Basically, if the only program accessing the files
> containing forks was the Samba program calling forkdepot library, it
> worked fine.  But if there were other programs (or NFS servers) that
> were potentially deleting files, moving files around, the things fell
> apart fairly quickly.

I'd be happy with a Samba-only implementation for Appliance
vendors.

> What, even with Winfs delaying Microsoft Longwait by years before
> finally being flushed?  :-)

I'm not talking WinFS, I'm talking streams. Streams are already
being used (mainly by malware writers of course - but hey, don't
you want full compatibility ? :-).

Jeremy.


Re: Versioning file system

2007-06-18 Thread Jeremy Allison
On Mon, Jun 18, 2007 at 06:10:21PM -0400, Theodore Tso wrote:
> On Mon, Jun 18, 2007 at 02:31:14PM -0700, H. Peter Anvin wrote:
> > And that makes them different from extended attributes, how?
> > 
> > Both of these really are nothing but ad hocky syntactic sugar for
> > directories, sometimes combined with in-filesystem support for small
> > data items.
> 
> There's a good discussion of the issues involved in my LCA 2006
> presentation  which doesn't seem to be on the LCA 2006 site.  Hrm.
> I'll have to ask that this be fixed.  In any case, here it is:
> 
>   http://thunk.org/tytso/forkdepot.odp

Did you ever code up forkdepot ? Just wondering ?

Just because I now agree with you that streams are
a bad idea doesn't mean the pressure to support them
in some way in Samba has gone away :-).

Jeremy.


Re: Versioning file system

2007-06-18 Thread Jeremy Allison
On Tue, Jun 19, 2007 at 12:26:57AM +0200, Jörn Engel wrote:
> 
> Pointless here means that _I_ don't see the point.  Maybe there are
> valid uses for extended attributes.  If there are, noone has explained
> them to me yet.

Samba uses them to store DOS-isms that you don't want in your
POSIX filesystem.

Jeremy.


Re: Versioning file system

2007-06-18 Thread Jeremy Allison
On Mon, Jun 18, 2007 at 02:31:14PM -0700, H. Peter Anvin wrote:

> And that makes them different from extended attributes, how?

Streams on systems that support them allow lseek and are
accessed by fd's. EA's are always a blob of data, read/written
in their entirety.

Jeremy.


Re: Versioning file system

2007-06-18 Thread Jeremy Allison
On Mon, Jun 18, 2007 at 01:29:56PM -0400, Theodore Tso wrote:
> On Mon, Jun 18, 2007 at 09:16:30AM -0700, alan wrote:
> > 
> > I just wish that people would learn from the mistakes of others.  The 
> > MacOS is a prime example of why you do not want to use a forked 
> > filesystem, yet some people still seem to think it is a good idea. 
> > (Forked filesystems tend to be fragile and do not play well with 
> > non-forked filesystems.)
> 
> Jeremy Allison used to be the one who was always pestering me to add
> Streams support into ext4, but recently he's admitted that I was right
> that it was a Very Bad Idea.

Yeah, ok - but do you have to rub my nose in it every chance you get ?

:-) :-).

Jeremy.


Re: [linux-cifs-client] Re: SMB2 file system - should it be a distinct module

2007-05-04 Thread Jeremy Allison
On Thu, May 03, 2007 at 09:46:05AM -0500, Gerald Carter wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Simo,
> 
> > I guess DFS referrals can work cross protocol, so if you are redirected
> > from a longhorn server to a windoes 2000 or a samba server you want to
> > be able to follow the DFS referral and not return an error.
> > To do that you need to have either 1 module that support both protocols
> > or a way from one module to call the other. Just separating the 2
> > without any glue will not work (or you will have to add some userspace
> > upcall hack to make it work).
> 
> Long term I agree that CIFS and SMB2 should be in the same .ko

Actually I disagree. I think Christoph is correct. These
are two independent protocols and should be in two different
modules.

> But NTLM 0.12 still works for Vista and DFS referrals.
> Breaking out SMB2 initially means that it will not clutter
> the working cifs.ko code.  Remember that an SMB2 client fs is
> mostly research at this point, and not engineering.

Long term the common functions should be factored out
and put into a lower-level module that both cifs and
SMB2 are dependent upon.

That's the cleaner solution IMHO.

Jeremy.


Re: Ext3 vs NTFS performance

2007-05-02 Thread Jeremy Allison
On Wed, May 02, 2007 at 12:16:38PM -0400, Theodore Tso wrote:
> On Tue, May 01, 2007 at 02:23:25PM -0700, Andrew Morton wrote:
> > On Tue, 1 May 2007 13:43:18 -0700
> > "Cabot, Mason B" <[EMAIL PROTECTED]> wrote:
> > 
> > > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > > video workloads. The Windows CIFS client will attempt a poor-man's
> > > pre-allocation of the file on the server by sending 1-byte writes at
> > > 128K-byte strides, breaking block allocation on ext3 and leading to
> > > fragmentation and poor performance. This will happen for many
> > > applications (including iTunes) as the CIFS client issues these
> > > pre-allocates under the application layer.
> > 
> > Oh my gawd, what a stupid hack.  Now we know what the MS interoperability
> > lab has been working on.
> 
> I wonder if they patented this technique as well, as well as one of
> their dozen or so patents they are filing every day?  "A Method of
> Screwing Over Samba's Performance So that Windows Longhorn Can Compete
> On Performance" coming soon, to a patent database near you!  :-)
> 
> > > I've posted a brief paper on Intel's OSS website
> > > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > > it a read and let me know what you think. In particular, I'd like to
> > > arrive at the right place to fix this problem: is it in the filesystem,
> > > VFS, or Samba?
> 
> The right place is clearly Samba.  I can't think of any other program
> or filesystem protocol where writing a 1 byte write at 128k strides
> would be used to signal a desire to do preallocation.  In fact, it's
> hard to think of a worse way of doing things.

In fact they don't need to do this - there's an explicit CIFS
set file allocation call to pre-allocate size they could use.

There's a specific Samba VFS module that has XFS specific calls
to do this - vfs_prealloc - but this won't work on ext3.
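
To make the access pattern under discussion concrete, here is a small sketch
that reproduces it from user space: one 1-byte write per 128K chunk, which
extends the file but leaves holes between the written bytes. The exact
offsets the Windows client uses are not stated in this thread, so writing the
last byte of each chunk is an assumption, as are the function names and the
/tmp path.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a single zero byte at the end of every `stride`-sized chunk of a
 * file that should end up `total` bytes long.
 * Returns the number of writes issued, or -1 on error. */
long strided_prealloc(int fd, off_t total, off_t stride)
{
    long writes = 0;

    if (total <= 0 || stride <= 0)
        return -1;
    for (off_t end = stride; ; end += stride) {
        if (end > total)
            end = total;                 /* final, possibly short, chunk */
        if (pwrite(fd, "", 1, end - 1) != 1)
            return -1;
        writes++;
        if (end == total)
            break;
    }
    return writes;
}

/* Self-check: a 1 MiB file at 128 KiB strides should take 8 writes and
 * report a size of 1 MiB. Returns 0 on success, -1 on failure. */
int strided_prealloc_demo(void)
{
    char path[] = "/tmp/stride_demo_XXXXXX";
    const off_t total = 1024 * 1024;
    const off_t stride = 128 * 1024;
    struct stat st;
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (strided_prealloc(fd, total, stride) == 8 &&
        fstat(fd, &st) == 0 && st.st_size == total)
        rc = 0;
    close(fd);
    unlink(path);
    return rc;
}
```

Each of those writes forces the filesystem to allocate a block far away from
the previous one before the real data arrives, which fits the fragmentation
described in the quoted report.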

Jeremy.


Re: [linux-cifs-client] Re: cifs and kthread_run / kernel_thread

2007-04-03 Thread Jeremy Allison
On Tue, Apr 03, 2007 at 02:17:59PM -0500, Steve French wrote:

> Now merged into cifs-2.6 git tree.  Thanks to Q and Wilhelm

Up to date SVN please ! :-).

Jeremy.


Re: [RFC] Heads up on sys_fallocate()

2007-03-01 Thread Jeremy Allison
On Thu, Mar 01, 2007 at 03:23:19PM -0500, Jeff Garzik wrote:
> I certainly agree that we want something like this.
> 
> posix_fallocate() is the glibc interface we want to be compatible with 
> (which your definition is, AFAICS).

This would be great for Samba. Windows clients do this a lot.
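
For reference, this is what the glibc-level call looks like from an
application: a single posix_fallocate() asks for the whole range up front. A
minimal sketch, assuming a writable /tmp; preallocate_demo is a made-up helper
name, not anything from Samba or from the proposed syscall.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file, reserve `len` bytes for it in one call, and
 * return the resulting file size, or -1 on failure. */
off_t preallocate_demo(off_t len)
{
    char path[] = "/tmp/fallocate_demo_XXXXXX";
    struct stat st;
    off_t size = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    /* posix_fallocate() returns an error number directly rather than
     * setting errno, so compare against 0. */
    if (posix_fallocate(fd, 0, len) == 0 && fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    unlink(path);
    return size;
}
```

After a successful call the file size covers the requested range, so
preallocate_demo(1 << 20) would be expected to return 1048576. Whether the
blocks are reserved natively or by a slower fallback depends on the
filesystem and the C library.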

Jeremy.


Re: Kernel page table and module text

2005-04-19 Thread Allison
I want to do the following.

I want to find where each module is loaded in memory by traversing the
module list. Once I have the address and the size of the module, I
want to read the bytes in memory of the module and hash it to check
its integrity.

How do I:
1. Traverse the module list and find each module's address
2. Read the kernel page table to find the physical address of the
module location

thanks,
Allison


Allison wrote:
> Hi,
> 
> Since module is loaded in non-contiguous memory, there has to be an
> entry in the kernel page table for all modules that are loaded on the
> system. I am trying to find entries corresponding to my module text in
> the page tables.
> 
> I am not clear about how the kernel page table is organized after the
> system switches to protected mode.
> 
> I printed out the page starting with swapper_pg_dir . But I do not
> find the addresses for all the modules loaded in the system.
> 
> Do I still need to read the pg0 and pg1 pages ?
> 
> If somebody can explain how to traverse the kernel page tables, that
> would be very helpful.
> 
> thanks,
> Allison


Kernel page table and module text

2005-04-17 Thread Allison
Hi,

Since a module is loaded in non-contiguous memory, there has to be an
entry in the kernel page table for all modules that are loaded on the
system. I am trying to find entries corresponding to my module text in
the page tables.

I am not clear about how the kernel page table is organized after the
system switches to protected mode.

I printed out the page starting with swapper_pg_dir. But I do not
find the addresses for all the modules loaded in the system.

Do I still need to read the pg0 and pg1 pages ?

If somebody can explain how to traverse the kernel page tables, that
would be very helpful.

thanks,
Allison


Re: Kernel Rootkits

2005-04-15 Thread Allison
Isn't the kernel code segment marked read-only ? How can the module
write into the function text in the kernel ? Shouldn't this cause some
kind of protection fault ?

thanks,
Allison

Lee Revell wrote:
> On Fri, 2005-04-15 at 18:15 +, Allison wrote:
> > Once these are loaded into the kernel, is there no way the kernel
> > functions can be protected ?
> 
> No.  If the attacker can load arbitrary code into the kernel, game over.
> Think about it.
> 
> Lee
> 


Re: Kernel Rootkits

2005-04-15 Thread Allison
hi,

I got the terminology mixed up. I guess what I really want to know is,
what are the different types of exploits by which rootkits
(specifically the ones that modify the kernel) can get installed on
your system (other than buffer overflow and somebody stealing the root
password).

I know that SucKIT is a rootkit that gets loaded as a kernel module
and adds new system calls. Some other rootkits change machine
instructions in several kernel functions.

Once these are loaded into the kernel, is there no way the kernel
functions can be protected ?

thanks,
Allison


Kernel Rootkits

2005-04-15 Thread Allison
Hi,

I was curious about how kernel rootkits become a part of the kernel ?
One way I guess is by inserting a kernel module.  And rootkits also
manage to hide themselves from rootkit detectors.

few questions:
1. Are there any other ways by which rootkits become part of the kernel ?

2. If modules can access only exported symbols, how is it that kernel
rootkits manage to get hold of other information from the kernel ? For
ex, the process table.

I am not familiar with the /dev/kmem interface. Does this interface
let any kernel module read any symbol (even non-exported) from the
kernel ?

3. If I want to hide a function which is part of the kernel from
kernel modules, is this possible ideally ?

thanks,
Allison


Re: Kernel module_list

2005-04-14 Thread Allison
Right now, I am just learning. I am trying a test module that lists out all the modules.

Allison

Arjan van de Ven wrote:
> On Thu, 2005-04-14 at 19:53 +0000, Allison wrote:
> > 
> > I am trying to access the module list kernel data structure from a
> > kernel module. If I gather correctly, module_list is the symbol that
> > is the head pointer of this list.
> 
> can you explain what you want to do with this symbol ?
> 
> 
> 


New kernel thread

2005-04-14 Thread Allison
Hi,

I need to create a new kernel thread that will stay alive as long as
the system is up. This should be created as soon as the system boots. 
I need this thread to perform a specific task.

I am not very familiar with the code. In which file should I put the
thread creation and my function code? Do I use the
kernel_thread function to create a new thread?

Do I need to clean up when the system shuts down?
What function should I call?

thanks,
Allison


[no subject]

2005-04-14 Thread Allison
I am simply trying to print out the module names and code sizes.
I am just learning how to traverse these data structures.

Also, on what basis is the decision made whether to export a symbol or not?

thanks,
Allison

Arjan van de Ven wrote:
> On Thu, 2005-04-14 at 19:53 +0000, Allison wrote:
> > 
> > I am trying to access the module list kernel data structure from a
> > kernel module. If I gather correctly, module_list is the symbol that
> > is the head pointer of this list.
> 
> can you explain what you want to do with this symbol ?


Kernel module_list

2005-04-14 Thread Allison
Hi,

I am trying to access the module list kernel data structure from a
kernel module. If I gather correctly, module_list is the symbol that
is the head pointer of this list.

This module compiles fine, but when I try to insmod it, it says
module_list is an unresolved symbol.

Does this symbol have to show up in /proc/ksyms?
It currently shows up in the System.map file.

What do I need to do to access this symbol?

Also, what do the three columns in the System.map file stand for?
The first column looks like the virtual address and the third looks like
the function/symbol name. How do I read the second?

thanks,
Allison
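
For reference on the System.map columns asked about above: the file has
the same layout as nm(1) output, so the second column is a one-letter
symbol type (for example T/t for code, D/d for initialized data, B/b for
uninitialized data; uppercase is global, lowercase is local). A minimal
Python sketch of reading such lines follows. The sample addresses and the
name hypothetical_local_buffer are invented for illustration and are not
taken from a real kernel build; the helper names are likewise made up.

```python
# Sketch: split System.map lines into (address, type, name).
# The sample lines below are invented; real addresses differ per build.

SYMBOL_TYPES = {
    "T": "text (code)",
    "D": "initialized data",
    "B": "uninitialized data (bss)",
    "R": "read-only data",
}


def parse_system_map_line(line):
    """Return (address, type_letter, name) for one System.map line."""
    addr_text, type_letter, name = line.split()
    return int(addr_text, 16), type_letter, name


def describe(type_letter):
    """Describe a type letter; lowercase means local, uppercase global."""
    scope = "global" if type_letter.isupper() else "local"
    kind = SYMBOL_TYPES.get(type_letter.upper(), "other")
    return "%s %s" % (scope, kind)


SAMPLE = """\
c0100000 T _stext
c02f4a60 D module_list
c0345678 b hypothetical_local_buffer
"""

for line in SAMPLE.splitlines():
    address, type_letter, name = parse_system_map_line(line)
    print("%08x %-32s %s" % (address, describe(type_letter), name))
```

Note that a symbol appearing in System.map is not thereby available to
modules: insmod resolves only exported symbols (the ones listed in
/proc/ksyms on a 2.4 kernel), which is consistent with the unresolved
symbol error described above.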


kernel compile

2005-04-07 Thread Allison
Hi,

Is it possible to compile a 2.4.20 kernel on a 2.6 system ?
And use the new image successfully ?

thanks,
Allison


linux: detect application crash

2005-03-17 Thread Allison
Hi,

Several times when working on Windows, I have been editing a file that I
last saved some time ago, and then the application crashed and I lost all
recent data.

Can the operating system detect all application crashes? If so, why
can't the OS save the user data to disk before the application quits?

How does this work in Linux? I am curious whether such functionality
already exists in Linux. If not, what are the issues involved in
implementing it?

thanks
Allison
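
As an illustration of the detection half of the question: on Linux the
kernel records how a process ended, and the parent can read that from the
wait status. Below is a minimal Python sketch, assuming a POSIX system.
The child command is a stand-in that kills itself with SIGSEGV to simulate
a crash, and run_and_report is a hypothetical helper name, not an existing
API.

```python
# Sketch: a parent process observing that its child died from a signal.
# Assumes a POSIX system; the "crashing" child is simulated.
import subprocess
import sys


def run_and_report(argv):
    """Run argv and describe how the child process ended."""
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        return "killed by signal %d" % -proc.returncode
    return "exited with status %d" % proc.returncode


# Stand-in for a crashing application: the child sends itself SIGSEGV.
CRASHING_CHILD = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]

print(run_and_report(CRASHING_CHILD))
```

This shows only that the crash itself is visible outside the process. The
kernel does not know which in-memory data the user considers unsaved, so
saving it would still be up to the application (for example by writing
periodic autosave files).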


Re: Linux 2.6 : physical memory address and pid

2005-03-12 Thread Allison
Thanks for the answer! 

Another related question :

I need to gather all application pages by reading the page tables.
The hard part is that I need to do this from a PCI device using DMA. As I
understand it, when a DMA is being performed, the pages are pinned in
memory. Since the PCI device has grabbed the bus, the processor is
not able to access memory to perform page replacement, right?
So this is a form of mutual exclusion.

However, if I have to fetch the page struct, the address space
of the process owning the page (I am ignoring shared pages to make
things simpler) and the page itself, will a scatter-gather DMA make
sure that the processor cannot modify any of these data structures
until the DMA is complete? I am using Linux 2.6 on the i386
architecture.

thanks,
Allison





On Sat, 12 Mar 2005 17:23:23 -0800, Matt Mackall <[EMAIL PROTECTED]> wrote:
> On Sat, Mar 12, 2005 at 08:05:11PM -0500, firefly blue wrote:
> > Hi,
> >
> > With the 2.6 Linux kernel, I want to find, from the physical page
> > frame, the virtual address of the page loaded in the frame and the
> > process id of the process owning it.
> 
> Follow struct page->mapping to struct address_space. A page can be
> mapped into any number of processes and multiple times per process so
> you'll need to walk the data structures there.
> 
> --
> Mathematics is the supreme nostalgia of our time.
>


Re: Status Of POSIX ACLs

2001-03-23 Thread Jeremy Allison

I don't read linux-kernel (too much traffic already on the Samba lists :-)
but I do read the kernel traffic summaries, and I noticed this item :


"Jochen Dolze [*] asked, "i found at http://acl.bestbits.at the
ACL-linux-project. Now i want to know, if there is a plan to integrate
posix-ACLs into the fs-part of the kernel, e.g. into the VFS-Layer? Is
there a general discussion about this anywhere? What are the biggest
problems? (i know that many userland-tools must be changed for this)."

"Albert D. Cahalan [*] retched into his hand, and said he
hoped POSIX ACLs never got into the kernel. He added, "POSIX ACLs are crap.
NFSv4 mostly follows NT. Compatibility with NFSv4 and SMB (Samba's protocol)
is important."

And Bernd Eckenfels [*] added: AFAIK there is no Support in User Land
Programs required. You just have additional tools for managing the ACLs.
The main problem with ACLs are the storage of the additional info in the
file system. This is a hard job if you want to have it for all/most file
systems. Remy had a working Version for ext2, but it never got very
public.. dunno why. NTs ACLs are somewhat messy cause they require too
much scanning.


Well as I like to say, they may be crap, but at least they're
slow and buggy :-) :-).

Actually, the next rev. of Samba (2.2 which will ship soon)
will *depend* upon the POSIX ACL patch at http://acl.bestbits.at
in order to support ACLs on Linux.

The reason for this is that the ACL code there is reasonably
common (i.e. enough for me to have a wrapper layer that hides
all the ugliness :-), enough to provide ACL support across
Solaris, HPUX, AIX, IRIX, SCO UnixWare (all of which have
POSIX ACLs or something similar) and Linux.

In order to support ACLs, Samba needs to have an underlying
implementation of ACLs in the kernel, as Samba doesn't make
policy decisions on allowing file access in user-space (that
way root race holes lie... :-).

I just spent 3 weeks coding up a (somewhat) reasonable
mapping between NT ACLs and POSIX ACLs (ie., it's as good
as I can get it - and it's a *hard* problem :-) and it is
also the number ONE Samba feature request from shops that
use NT servers who are looking at Linux+Samba to get around
the "client access license" 'problem' :-).

If we don't eventually get them in the kernel I'm sure Sun
will be happy to suggest they convert to Samba on Solaris
to get the functionality they need :-) :-).

I certainly hope POSIX ACLs (or some form of ACL support)
do get into the kernel at some point (no, not NT ACLs - they
*suck* and are ordering dependent brrr :-) otherwise
there will be a host of applications for which Linux servers
will be disqualified, and that would be a shame.

Please respond to [EMAIL PROTECTED] or to me personally
if you want more timely feedback, else I'll wait for the next
kernel-traffic summary and take my answer off line (in the
grand tradition of polite radio talk show call in listeners :-).

Cheers,

Jeremy Allison,
Samba Team.

-- 

Buying an operating system without source is like buying
a self-assembly Space Shuttle with no instructions.
