from:"Gu Zheng"

Re: elevator: Fix a race about elevator switching.

2013-04-16 Thread Gu Zheng

On 02/21/2013 04:42 PM, majianpeng wrote:

 There's a race between elevator switching and normal I/O operations.
 Because struct elevator_queue and struct elevator_data are not
 allocated in one atomic operation, there is a chance of using a NULL
 ->elevator_data.
 For example:
 Thread A:                        Thread B:
 blk_flush_plug_list              elevator_switch
   __elv_add_request
   blk_peek_request                 elevator_alloc
     noop_dispatch
                                    elevator_init_fn
 Because elevator_alloc() is called without holding queue_lock, and
 ->elevator_data is still NULL right after the allocation, thread A can
 call elv_merge() at the same time and try to use elevator_data, which
 leads to the crash:
 [  196.125709] BUG: unable to handle kernel
 [  196.126046] Modules linked in: netconsole configfs nfsd lockd auth_rpcgss sunrpc exportfs raid1 btrfs zlib_deflate libcrc32c
 [  196.126046] CPU 2
 [  196.126046] Pid: 6747, comm: dd Not tainted 3.8.0+ #107 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
 [  196.126046] RIP: 0010:[812a8c63]  [812a8c63] noop_dispatch+0x13/0x40
 [  196.126046] RSP: 0018:8800a6fef838  EFLAGS: 00010046
 [  196.126046] RAX:  RBX: 8800b53abf20 RCX:
 [  196.126046] RDX:  RSI:  RDI: 8800b53abf20
 [  196.126046] RBP: 8800a6fef838 R08:  R09:
 [  196.126046] R10: 0001 R11: 0001 R12: 8800b53abf20
 [  196.126046] R13: 8800a6feffd8 R14: 8800a6feffd8 R15: 8800b3c68090
 [  196.126046] FS:  7f6ef48d6700() GS:8800ba00() knlGS:
 [  196.126046] CS:  0010 DS:  ES:  CR0: 8005003b
 [  196.126046] CR2:  CR3: a83a1000 CR4: 000407e0
 [  196.126046] DR0:  DR1:  DR2:
 [  196.126046] DR3:  DR6: 0ff0 DR7: 0400
 [  196.126046] Process dd (pid: 6747, threadinfo 8800a6fee000, task 88009e2a2840)
 [  196.126046] Stack:
 [  196.126046]  8800a6fef888 81297a54 8800b50de7b0 8800b3c68090
 [  196.126046]  8800a6fef888 8800b3c68000 8800b53abf20
 [  196.126046]  8800b50de7b0 8800b3c68090 8800a6fef8f8 8140fe5a
 [  196.126046] Call Trace:
 [  196.126046]  [81297a54] blk_peek_request+0x194/0x250
 [  196.126046]  [8140fe5a] scsi_request_fn+0x4a/0x4f0
 [  196.126046]  [810995ef] ? __lock_is_held+0x5f/0x80
 [  196.126046]  [81290d37] __blk_run_queue+0x37/0x50
 [  196.126046]  [812904cd] __elv_add_request+0xad/0x2d0
 [  196.126046]  [81297ecc] blk_flush_plug_list+0x1bc/0x260
 [  196.126046]  [81297f88] blk_finish_plug+0x18/0x50
 [  196.126046]  [81195f1e] do_blockdev_direct_IO+0x18be/0x20e0
 [  196.126046]  [81077cd8] ? sched_clock_cpu+0xa8/0x120
 [  196.126046]  [811913d0] ? I_BDEV+0x10/0x10
 [  196.126046]  [810d0868] ? rcu_irq_exit+0x68/0xb0
 [  196.126046]  [81196795] __blockdev_direct_IO+0x55/0x60
 [  196.126046]  [811913d0] ? I_BDEV+0x10/0x10
 [  196.126046]  [81191c67] blkdev_direct_IO+0x57/0x60
 [  196.126046]  [811913d0] ? I_BDEV+0x10/0x10
 [  196.126046]  [81106633] generic_file_aio_read+0x703/0x770
 [  196.126046]  [811915c1] blkdev_aio_read+0x51/0x80
 [  196.126046]  [81095d45] ? lock_release_holdtime.part.23+0x15/0x1a0
 [  196.126046]  [81157bb3] do_sync_read+0xa3/0xe0
 [  196.126046]  [81158343] vfs_read+0xb3/0x180
 [  196.126046]  [81158465] sys_read+0x55/0xa0
 [  196.126046]  [816f4242] system_call_fastpath+0x16/0x1b
 [  196.126046] Code: 48 83 c4 08 5b 5d c3 90 b8 10 00 00 00 eb e0 b8 f4 ff ff ff eb ea 66 90 66 66 66 66 90 48 8b 47 18 55 48 89 e5 48 8b 50 08 31 c0 48 8b 32 48 39 f2 74 1f 48 8b 46 08 48 8b 16 48 89 42 08 48 89
 [  196.126046] RIP  [812a8c63] noop_dispatch+0x13/0x40
 [  196.126046]  RSP 8800a6fef838
 [  196.126046] CR2: 
 
 Move elevator_alloc() into elevator_init_fn() so that allocating the
 queue and its elevator_data becomes a single atomic operation.
 
 The bug can easily be reproduced as follows:
 1: dd if=/dev/sdb of=/dev/null
 2: while true; do echo noop > scheduler; echo deadline > scheduler; done
 
 The fix was tested the same way.
 
 Signed-off-by: Jianpeng Ma majianp...@gmail.com

  Reviewed-by: Gu Zheng guz.f...@cn.fujitsu.com
  Tested-by: Gu Zheng guz.f...@cn.fujitsu.com

Thanks,
Gu

 ---
  block/cfq-iosched.c  |   17 ++---
  block
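
The idea behind the fix can be illustrated with the general allocate-then-publish pattern: a scheduler must not become visible to the dispatch path until its private data exists. The following is a minimal userspace sketch under stated assumptions: the types and function names are hypothetical simplifications, a pthread mutex stands in for queue_lock, and this is not the kernel patch itself (which moves elevator_alloc() into elevator_init_fn()).

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct elevator_queue and the
 * scheduler's private data; these are not the real block-layer types. */
struct sketch_data  { int dispatched; };
struct sketch_queue { struct sketch_data *elevator_data; };

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_queue *cur_elevator;   /* read by the dispatch path */

/* Allocate the queue and its private data in one step, so no caller can
 * ever observe a queue whose elevator_data is still NULL. */
static struct sketch_queue *elv_sketch_init(void)
{
	struct sketch_queue *eq = malloc(sizeof(*eq));

	if (!eq)
		return NULL;
	eq->elevator_data = calloc(1, sizeof(*eq->elevator_data));
	if (!eq->elevator_data) {
		free(eq);
		return NULL;
	}
	return eq;
}

/* Switch schedulers: build the new one completely, then publish it under
 * the same lock the dispatch path takes. */
static int elv_sketch_switch(void)
{
	struct sketch_queue *new_eq = elv_sketch_init();
	struct sketch_queue *old;

	if (!new_eq)
		return -1;
	pthread_mutex_lock(&queue_lock);
	old = cur_elevator;
	cur_elevator = new_eq;
	pthread_mutex_unlock(&queue_lock);

	/* Dispatchers only touch cur_elevator under the lock, so the old
	 * queue is unreachable once the lock has been dropped. */
	if (old) {
		free(old->elevator_data);
		free(old);
	}
	return 0;
}

/* Dispatch path: returns the new dispatch count, or -1 if no scheduler. */
static int elv_sketch_dispatch(void)
{
	int ret = -1;

	pthread_mutex_lock(&queue_lock);
	if (cur_elevator)
		ret = ++cur_elevator->elevator_data->dispatched;
	pthread_mutex_unlock(&queue_lock);
	return ret;
}
```

Because dispatchers only dereference the pointer while holding the lock, swapping in a fully built queue under that lock means they see either the old scheduler or the new one, never a queue whose elevator_data is NULL.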

[PATCH]fs/block_dev.c: fix the inaccurate judgement in function blkdev_aio_read

2013-04-18 Thread Gu Zheng

In blkdev_aio_read(), if 'size' is equal to or greater than the count we
requested (iocb->ki_left), there is no need to call iov_shorten() to reduce
the number of segments and the iovec's length.
So the check should be 'if (size < iocb->ki_left)' instead.

Signed-off-by: Jianpeng Ma majianp...@gmail.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/block_dev.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index aae187a..f0328f1 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1559,7 +1559,7 @@ static ssize_t blkdev_aio_read(struct kiocb *iocb, const struct iovec *iov,
return 0;
 
size -= pos;
-   if (size < INT_MAX)
+   if (size < iocb->ki_left)
nr_segs = iov_shorten((struct iovec *)iov, nr_segs, size);
return generic_file_aio_read(iocb, iov, nr_segs, pos);
 }
-- 
1.7.7
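
To make the condition concrete, here is a userspace sketch that re-implements the trimming logic of the kernel's iov_shorten() (the `_sketch` suffix marks it as ours) and applies the patched check, with `requested` playing the role of iocb->ki_left. It is an illustration under those assumptions, not the fs/block_dev.c code.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Re-implementation of iov_shorten() for illustration only: trim the
 * iovec array so its total length is `to` bytes, and return the number
 * of segments still in use. */
static unsigned long iov_shorten_sketch(struct iovec *iov,
                                        unsigned long nr_segs, size_t to)
{
	unsigned long seg = 0;
	size_t len = 0;

	while (seg < nr_segs) {
		seg++;
		if (len + iov->iov_len >= to) {
			iov->iov_len = to - len;   /* cut the last segment short */
			break;
		}
		len += iov->iov_len;
		iov++;
	}
	return seg;
}

/* The patched check: only shorten when the bytes left on the device
 * (`size`) are fewer than the bytes requested (`requested`). */
static unsigned long clamp_read(struct iovec *iov, unsigned long nr_segs,
                                long long size, size_t requested)
{
	if (size < (long long)requested)
		nr_segs = iov_shorten_sketch(iov, nr_segs, (size_t)size);
	return nr_segs;
}
```

With three 512-byte segments (1536 bytes requested) and only 1000 bytes left on the device, the request is cut to two segments and the second one is trimmed to 488 bytes; when the device has at least as much data as requested, the iovec is left alone.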


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] resource: Add release_mem_region_adjustable()

2013-04-03 Thread Gu Zheng

On 04/03/2013 12:17 AM, Toshi Kani wrote:

 Added release_mem_region_adjustable(), which releases a requested
 region from a currently busy memory resource.  This interface
 adjusts the matched memory resource accordingly if the requested
 region does not match exactly but still fits into it.
 
 This new interface is intended for memory hot-delete.  During
 bootup, memory resources are inserted from the boot descriptor
 table, such as EFI Memory Table and e820.  Each memory resource
 entry usually covers the whole contiguous memory range.  A memory
 hot-delete request, on the other hand, may target a particular
 range of a memory resource, and its size can be much smaller than
 the whole contiguous memory.  Since the existing release interfaces
 like __release_region() require a requested region to be exactly
 matched to a resource entry, they do not allow a partial resource
 to be released.
 
 There is no change to the existing interfaces since their restriction
 is valid for I/O resources.
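
The four cases the patch distinguishes (free the whole entry, move the start, move the end, or split into two entries) can be sketched on a plain inclusive range. This is a simplified illustration assuming a hypothetical `struct range` in place of struct resource; it ignores children, locking, and the walk over the resource tree.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct resource: an inclusive [start, end]. */
struct range { uint64_t start, end; };

enum release_kind { REL_NO_FIT, REL_FREE_WHOLE, REL_ADJUST_START,
                    REL_ADJUST_END, REL_SPLIT };

/* Release [start, start + size - 1] from the busy range *res. On success
 * *res is updated in place; for a split, *upper gets the new entry that
 * covers the addresses above the released region. */
static enum release_kind release_range(struct range *res, uint64_t start,
                                       uint64_t size, struct range *upper)
{
	uint64_t end = start + size - 1;

	/* the requested region must fit entirely inside res */
	if (size == 0 || res->start > start || res->end < end)
		return REL_NO_FIT;

	if (res->start == start && res->end == end)
		return REL_FREE_WHOLE;          /* caller drops the entry */

	if (res->start == start) {              /* trim the low end */
		res->start = end + 1;
		return REL_ADJUST_START;
	}
	if (res->end == end) {                  /* trim the high end */
		res->end = start - 1;
		return REL_ADJUST_END;
	}

	/* hole in the middle: keep the lower part, create the upper part */
	upper->start = end + 1;
	upper->end = res->end;
	res->end = start - 1;
	return REL_SPLIT;
}
```

For example, releasing 0x1400-0x17ff from a busy 0x1000-0x1fff range leaves 0x1000-0x13ff in place and produces a new 0x1800-0x1fff entry, corresponding to the patch's "split into two entries" branch.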
 
 Signed-off-by: Toshi Kani toshi.k...@hp.com
 ---
  include/linux/ioport.h |2 +
  kernel/resource.c  |   87
  2 files changed, 89 insertions(+)
 
 diff --git a/include/linux/ioport.h b/include/linux/ioport.h
 index 85ac9b9b..0fe1a82 100644
 --- a/include/linux/ioport.h
 +++ b/include/linux/ioport.h
 @@ -192,6 +192,8 @@ extern struct resource * __request_region(struct resource *,
  extern int __check_region(struct resource *, resource_size_t, resource_size_t);
  extern void __release_region(struct resource *, resource_size_t,
   resource_size_t);
 +extern int release_mem_region_adjustable(struct resource *, resource_size_t,
 + resource_size_t);
  
  static inline int __deprecated check_region(resource_size_t s,
   resource_size_t n)
 diff --git a/kernel/resource.c b/kernel/resource.c
 index ae246f9..789f160 100644
 --- a/kernel/resource.c
 +++ b/kernel/resource.c
 @@ -1021,6 +1021,93 @@ void __release_region(struct resource *parent, resource_size_t start,
  }
  EXPORT_SYMBOL(__release_region);
  
 +/**
 + * release_mem_region_adjustable - release a previously reserved memory region
 + * @parent: parent resource descriptor
 + * @start: resource start address
 + * @size: resource region size
 + *
 + * The requested region is released from a currently busy memory resource.
 + * It adjusts the matched busy memory resource accordingly if the requested
 + * region does not match exactly but still fits into it.  Existing children of
 + * the busy memory resource must be immutable in this request.
 + *
 + * Note, when the busy memory resource gets split into two entries, the code
 + * assumes that all children remain in the lower address entry for simplicity.
 + * Enhance this logic when necessary.
 + */
 +int release_mem_region_adjustable(struct resource *parent,
 + resource_size_t start, resource_size_t size)
 +{
 + struct resource **p;
 + struct resource *res, *new;
 + resource_size_t end;
 + int ret = 0;
 +
 + p = &parent->child;
 + end = start + size - 1;
 +
 + write_lock(&resource_lock);
 +
 + while ((res = *p)) {
 + if (res->start > start || res->end < end) {
 + p = &res->sibling;
 + continue;
 + }
 +
 + if (!(res->flags & IORESOURCE_MEM)) {
 + ret = -EINVAL;
 + break;
 + }
 +
 + if (!(res->flags & IORESOURCE_BUSY)) {
 + p = &res->child;
 + continue;
 + }
 +
 + if (res->start == start && res->end == end) {
 + /* free the whole entry */
 + *p = res->sibling;
 + kfree(res);
 + } else if (res->start == start && res->end != end) {
 + /* adjust the start */
 + ret = __adjust_resource(res, end+1,
 + res->end - end);
 + } else if (res->start != start && res->end == end) {
 + /* adjust the end */
 + ret = __adjust_resource(res, res->start,
 + start - res->start);
 + } else {
 + /* split into two entries */
 + new = kzalloc(sizeof(struct resource), GFP_KERNEL);
 + if (!new) {
 + ret = -ENOMEM;
 + break;
 + }
 + new->name = res->name;
 + new->start = end + 1;
 + new->end = res->end;
 + new->flags = res->flags;
 + new->parent = res->parent;
 + new->sibling = res->sibling;
 + new->child = NULL;
 +
 + ret =

pci-sysfs: queue sysfs rescan routine into workqueue to avoid potential deadlock situation

2013-02-06 Thread Gu Zheng

] pci_stop_bus_device+0x94/0xa0
 [8127ad90] pci_stop_bus_device+0x40/0xa0
 [8127ad90] pci_stop_bus_device+0x40/0xa0
 [8127ad90] pci_stop_bus_device+0x40/0xa0
 [8127af66] pci_stop_and_remove_bus_device+0x16/0x30
 [81282359] remove_callback+0x29/0x40
 [811e4344] sysfs_schedule_callback_work+0x24/0x70
 [81070009] process_one_work+0x179/0x4b0
 [8107210e] worker_thread+0x12e/0x330
 [81071fe0] ? manage_workers+0x110/0x110
 [8107705e] kthread+0x9e/0xb0
 [81525bc4] kernel_thread_helper+0x4/0x10
 [81076fc0] ? kthread_freezable_should_stop+0x70/0x70
 [81525bc0] ? gs_change+0x13/0x13


Signed-off-by: Yinghai Lu ying...@kernel.org
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Lin Feng linf...@cn.fujitsu.com
---
 drivers/pci/pci-sysfs.c |   92 +--
 1 files changed, 65 insertions(+), 27 deletions(-)
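
The approach taken by this patch, doing the work from a deferred callback instead of inside the store method, can be sketched in userspace as follows. `schedule_callback()` and `run_pending()` are hypothetical stand-ins for device_schedule_callback() and the workqueue, and a counter stands in for the rescan. The point illustrated is that the store handler returns before the rescan runs, so the rescan no longer waits for pci_remove_rescan_mutex while the attribute's sysfs active reference is held.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for device_schedule_callback(): a small queue of
 * callbacks that a "worker" runs later, outside the store handler. */
typedef void (*deferred_fn)(void *dev);

#define MAX_PENDING 8
static struct { deferred_fn fn; void *dev; } pending[MAX_PENDING];
static size_t nr_pending;

static int schedule_callback(void *dev, deferred_fn fn)
{
	if (nr_pending == MAX_PENDING)
		return -ENOMEM;
	pending[nr_pending].fn = fn;
	pending[nr_pending].dev = dev;
	nr_pending++;
	return 0;
}

/* The "workqueue": run queued callbacks in order, then empty the queue. */
static void run_pending(void)
{
	for (size_t i = 0; i < nr_pending; i++)
		pending[i].fn(pending[i].dev);
	nr_pending = 0;
}

/* Example callback: "rescanning" just bumps a counter here. */
static void rescan_callback(void *dev)
{
	(*(int *)dev)++;
}

/* Store handler shaped like the patched dev_rescan_store(): parse the
 * value, return count for "0", otherwise queue the work and report any
 * queueing error instead of count. */
static ssize_t rescan_store(void *dev, const char *buf, size_t count)
{
	unsigned long val;
	char *end;
	int err;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf || (*end != '\0' && *end != '\n'))
		return -EINVAL;

	if (!val)
		return count;

	err = schedule_callback(dev, rescan_callback);
	if (err)
		return err;
	return count;
}
```

Writing "0" is accepted and does nothing, and a failure to queue the work reaches the writer as a negative error rather than being silently turned into a short or zero-length write.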

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c6e9bb..e66b498 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -285,21 +285,34 @@ msi_bus_store(struct device *dev, struct device_attribute *attr,
 }

 static DEFINE_MUTEX(pci_remove_rescan_mutex);
+
+static void bus_rescan_callback(struct device *dev)
+{
+   struct pci_bus *b = NULL;
+
+   mutex_lock(&pci_remove_rescan_mutex);
+   while ((b = pci_find_next_bus(b)) != NULL)
+   pci_rescan_bus(b);
+   mutex_unlock(&pci_remove_rescan_mutex);
+}
+
 static ssize_t bus_rescan_store(struct bus_type *bus, const char *buf,
size_t count)
 {
+   int err;
unsigned long val;
-   struct pci_bus *b = NULL;
+   struct device *dev = bus->dev_root;

if (strict_strtoul(buf, 0, &val) < 0)
return -EINVAL;

-   if (val) {
-   mutex_lock(&pci_remove_rescan_mutex);
-   while ((b = pci_find_next_bus(b)) != NULL)
-   pci_rescan_bus(b);
-   mutex_unlock(&pci_remove_rescan_mutex);
-   }
+   if (!val)
+   return count;
+
+   err = device_schedule_callback(dev, bus_rescan_callback);
+   if (err)
+   return err;
+
return count;
 }

@@ -308,21 +321,32 @@ struct bus_attribute pci_bus_attrs[] = {
__ATTR_NULL
 };

+static void dev_rescan_callback(struct device *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   if (pdev->is_added) {
+   mutex_lock(&pci_remove_rescan_mutex);
+   pci_rescan_bus(pdev->bus);
+   mutex_unlock(&pci_remove_rescan_mutex);
+   }
+}
+
 static ssize_t
 dev_rescan_store(struct device *dev, struct device_attribute *attr,
 const char *buf, size_t count)
 {
+   int err;
unsigned long val;
-   struct pci_dev *pdev = to_pci_dev(dev);

if (strict_strtoul(buf, 0, &val) < 0)
return -EINVAL;

-   if (val) {
-   mutex_lock(&pci_remove_rescan_mutex);
-   pci_rescan_bus(pdev->bus);
-   mutex_unlock(&pci_remove_rescan_mutex);
-   }
+   if (!val)
+   return count;
+   err = device_schedule_callback(dev, dev_rescan_callback);
+   if (err)
+   return err;
return count;
 }

@@ -339,7 +363,7 @@ static ssize_t
 remove_store(struct device *dev, struct device_attribute *dummy,
 const char *buf, size_t count)
 {
-   int ret = 0;
+   int err;
unsigned long val;

if (strict_strtoul(buf, 0, &val) < 0)
@@ -348,31 +372,45 @@ remove_store(struct device *dev, struct device_attribute *dummy,
/* An attribute cannot be unregistered by one of its own methods,
 * so we have to use this roundabout approach.
 */
-   if (val)
-   ret = device_schedule_callback(dev, remove_callback);
-   if (ret)
-   count = ret;
+   if (!val)
+   return count;
+
+   err = device_schedule_callback(dev, remove_callback);
+   if (err)
+   return err;
+
return count;
 }

+static void dev_bus_rescan_callback(struct device *dev)
+{
+   struct pci_bus *bus = to_pci_bus(dev);
+
+   mutex_lock(&pci_remove_rescan_mutex);
+   if (!pci_is_root_bus(bus) && list_empty(&bus->devices))
+   pci_rescan_bus_bridge_resize(bus->self);
+   else
+   pci_rescan_bus(bus);
+   mutex_unlock(&pci_remove_rescan_mutex);
+}
+
 static ssize_t
 dev_bus_rescan_store(struct device *dev, struct device_attribute *attr,
 const char *buf, size_t count)
 {
+   int err;
unsigned long val;
-   struct pci_bus *bus = to_pci_bus(dev);

if (strict_strtoul(buf, 0, &val) < 0)
return -EINVAL;

-   if (val) {
-   mutex_lock(&pci_remove_rescan_mutex);
-   if (!pci_is_root_bus(bus) && list_empty(&bus->devices

Re: [PATCH] pci-sysfs: replace mutex_lock with mutex_trylock to avoid potential deadlock situation

2013-01-25 Thread Gu Zheng

Hi Bjorn,
Thanks for your review and comments! Please see my inline comments
below.

On 01/25/2013 07:12 AM, Bjorn Helgaas wrote:

 On Thu, Dec 27, 2012 at 12:42 AM, Lin Feng linf...@cn.fujitsu.com wrote:
 There is a potential deadlock situation when we manipulate the pci-sysfs user
 interfaces from different bus hierarchies simultaneously, as follows:

 path1: sysfs remove device:                | path2: sysfs rescan device:
 sysfs_schedule_callback_work()             | sysfs_write_file()
   remove_callback()                        |   flush_write_buffer()
 *1* mutex_lock(&pci_remove_rescan_mutex)   |*2*  sysfs_get_active(attr_sd)
   ...                                      |     dev_attr_store()
   device_remove_file()                     |       dev_rescan_store()
   ...                                      |*4*      mutex_lock(&pci_remove_rescan_mutex)
 *3*   sysfs_deactivate(sd)                 |         ...
       wait_for_completion()                |*5*  sysfs_put_active(attr_sd)
 *6* mutex_unlock(&pci_remove_rescan_mutex)
...snip...
 Reported-by: Taku Izumi izumi.t...@jp.fujitsu.com
 Signed-off-by: Lin Feng linf...@cn.fujitsu.com
 Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
 ---
  drivers/pci/pci-sysfs.c |   42 ++
  1 files changed, 26 insertions(+), 16 deletions(-)

 diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
 index 05b78b1..d2efbb0 100644
 --- a/drivers/pci/pci-sysfs.c
 +++ b/drivers/pci/pci-sysfs.c
 @@ -295,10 +295,13 @@ static ssize_t bus_rescan_store(struct bus_type *bus, const char *buf,
  return -EINVAL;

  if (val) {
 -   mutex_lock(&pci_remove_rescan_mutex);
 -   while ((b = pci_find_next_bus(b)) != NULL)
 -   pci_rescan_bus(b);
 -   mutex_unlock(&pci_remove_rescan_mutex);
 +   if (mutex_trylock(&pci_remove_rescan_mutex)) {
 +   while ((b = pci_find_next_bus(b)) != NULL)
 +   pci_rescan_bus(b);
 +   mutex_unlock(&pci_remove_rescan_mutex);
 +   } else {
 +   return 0;
 What are the semantics of returning 0 from a sysfs store function?
 Does the user's write just get dropped?  I would think we'd return
 count for that case.

Oh, yes, returning count seems suitable here: although we did not achieve the
user's goal (rescanning the bus), the user's write has already been consumed.
But then the user still cannot tell from the return value alone whether
pci_rescan_bus() was actually done. Should we return a suitable error here to
tell the user that the write was accepted, but pci_rescan_bus() was not done?

 Is there some sort of automatic retry in libc
 or something if we return zero?

No, libc does not do anything extra if we return zero.

 Are you relying on the user code to
 notice that nothing was written and do its own retry?


Yes, but it seems impractical.
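
One way to give the writer an explicit signal, in line with the question above about returning a suitable error, is to fail the write with an error such as -EBUSY when the lock is contended, instead of returning 0. A minimal userspace sketch of that option follows. It assumes a C11 atomic_flag as a stand-in for pci_remove_rescan_mutex and a counter in place of pci_rescan_bus(); -EBUSY is our illustrative choice, not something the patch uses.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>

/* C11 atomic_flag used as a try-lock in place of the real mutex. */
static atomic_flag rescan_lock = ATOMIC_FLAG_INIT;
static int rescans;   /* counts completed "rescans" in this sketch */

/* Returns 1 if the lock was taken and 0 if it was already held, which is
 * the same convention as the kernel's mutex_trylock(). */
static int rescan_trylock(void)
{
	return !atomic_flag_test_and_set(&rescan_lock);
}

static void rescan_unlock(void)
{
	atomic_flag_clear(&rescan_lock);
}

/* On contention, fail loudly rather than returning 0 bytes written. */
static ssize_t rescan_store_trylock(size_t count)
{
	if (!rescan_trylock())
		return -EBUSY;        /* the writer learns nothing was done */
	rescans++;                    /* stand-in for pci_rescan_bus() */
	rescan_unlock();
	return count;
}
```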

 The last seems most likely, but that seems like it complicates the
 user's life unnecessarily.

 +   }
 }
 return count;
  }
 @@ -319,9 +322,12 @@ dev_rescan_store(struct device *dev, struct device_attribute *attr,
  return -EINVAL;

  if (val) {
 -   mutex_lock(&pci_remove_rescan_mutex);
 -   pci_rescan_bus(pdev->bus);
 -   mutex_unlock(&pci_remove_rescan_mutex);
 +   if (mutex_trylock(&pci_remove_rescan_mutex)) {
 +   pci_rescan_bus(pdev->bus);
 +   mutex_unlock(&pci_remove_rescan_mutex);
 +   } else {
 +   return 0;
 +   }
 }
 return count;
  }
 @@ -330,9 +336,10 @@ static void remove_callback(struct device *dev)
  {
 struct pci_dev *pdev = to_pci_dev(dev);

 -   mutex_lock(&pci_remove_rescan_mutex);
 -   pci_stop_and_remove_bus_device(pdev);
 -   mutex_unlock(&pci_remove_rescan_mutex);
 +   if (mutex_trylock(&pci_remove_rescan_mutex)) {
 +   pci_stop_and_remove_bus_device(pdev);
 +   mutex_unlock(&pci_remove_rescan_mutex);
 +   }
 In the other cases, I think the user will at least get some
 indication, e.g., a write() that returns zero, when we abort.  But
 here, we silently skip the pci_stop_and_remove_bus_device().  That
 sounds wrong to me.  What actually happens here, and why is it OK to
 skip it?

Yes, silently skipping it is not suitable. We should report something here
if we cannot do pci_stop_and_remove_bus_device().

 Can we avoid the deadlock by queuing these in a workqueue instead of
 using the mutex_trylock() approach?


No, I think queuing the rescan routine into a workqueue, as is done for
remove, is not suitable.
After we queue the bus-scan work into the workqueue, the rescan routine can
either return immediately (case 1) or wait until the work is completed (case 2).
case 1:
If we return immediately after we queue the bus-scan work

[PATCH RESEND] pci-sysfs: replace mutex_lock with mutex_trylock to avoid potential deadlock situation

2013-01-17 Thread Gu Zheng

] pci_stop_and_remove_bus_device+0x16/0x30
 [81282359] remove_callback+0x29/0x40
 [811e4344] sysfs_schedule_callback_work+0x24/0x70
 [81070009] process_one_work+0x179/0x4b0
 [8107210e] worker_thread+0x12e/0x330
 [81071fe0] ? manage_workers+0x110/0x110
 [8107705e] kthread+0x9e/0xb0
 [81525bc4] kernel_thread_helper+0x4/0x10
 [81076fc0] ? kthread_freezable_should_stop+0x70/0x70
 [81525bc0] ? gs_change+0x13/0x13

Reported-by: Taku Izumi izumi.t...@jp.fujitsu.com 
Signed-off-by: Lin Feng linf...@cn.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/pci/pci-sysfs.c |   42 ++
 1 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 05b78b1..d2efbb0 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -295,10 +295,13 @@ static ssize_t bus_rescan_store(struct bus_type *bus, const char *buf,
return -EINVAL;
 
if (val) {
-   mutex_lock(&pci_remove_rescan_mutex);
-   while ((b = pci_find_next_bus(b)) != NULL)
-   pci_rescan_bus(b);
-   mutex_unlock(&pci_remove_rescan_mutex);
+   if (mutex_trylock(&pci_remove_rescan_mutex)) {
+   while ((b = pci_find_next_bus(b)) != NULL)
+   pci_rescan_bus(b);
+   mutex_unlock(&pci_remove_rescan_mutex);
+   } else {
+   return 0;
+   }
}
return count;
 }
@@ -319,9 +322,12 @@ dev_rescan_store(struct device *dev, struct device_attribute *attr,
return -EINVAL;
 
if (val) {
-   mutex_lock(&pci_remove_rescan_mutex);
-   pci_rescan_bus(pdev->bus);
-   mutex_unlock(&pci_remove_rescan_mutex);
+   if (mutex_trylock(&pci_remove_rescan_mutex)) {
+   pci_rescan_bus(pdev->bus);
+   mutex_unlock(&pci_remove_rescan_mutex);
+   } else {
+   return 0;
+   }
}
return count;
 }
@@ -330,9 +336,10 @@ static void remove_callback(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
 
-   mutex_lock(&pci_remove_rescan_mutex);
-   pci_stop_and_remove_bus_device(pdev);
-   mutex_unlock(&pci_remove_rescan_mutex);
+   if (mutex_trylock(&pci_remove_rescan_mutex)) {
+   pci_stop_and_remove_bus_device(pdev);
+   mutex_unlock(&pci_remove_rescan_mutex);
+   }
 }
 
 static ssize_t
@@ -366,12 +373,15 @@ dev_bus_rescan_store(struct device *dev, struct device_attribute *attr,
return -EINVAL;
 
if (val) {
-   mutex_lock(&pci_remove_rescan_mutex);
-   if (!pci_is_root_bus(bus) && list_empty(&bus->devices))
-   pci_rescan_bus_bridge_resize(bus->self);
-   else
-   pci_rescan_bus(bus);
-   mutex_unlock(&pci_remove_rescan_mutex);
+   if (mutex_trylock(&pci_remove_rescan_mutex)) {
+   if (!pci_is_root_bus(bus) && list_empty(&bus->devices))
+   pci_rescan_bus_bridge_resize(bus->self);
+   else
+   pci_rescan_bus(bus);
+   mutex_unlock(&pci_remove_rescan_mutex);
+   } else {
+   return 0;
+   }
}
return count;
 }
-- 
1.7.1



Re: [PATCH 2/3] f2fs: add sysfs entries to select the gc policy

2013-08-04 Thread Gu Zheng

On 08/04/2013 10:10 PM, Namjae Jeon wrote:

 From: Namjae Jeon namjae.j...@samsung.com
 
 Add a sysfs entry, gc_idle, to control the GC policy:
 gc_idle = 1 selects the cost-benefit approach, while
 gc_idle = 2 selects the greedy approach to garbage
 collection. The selection is mutually exclusive; only one
 approach is in effect at any time. If gc_idle = 0, this
 option is disabled.
 
 Cc: Gu Zheng guz.f...@cn.fujitsu.com
 Signed-off-by: Namjae Jeon namjae.j...@samsung.com
 Signed-off-by: Pankaj Kumar pankaj...@samsung.com


Reviewed-by: Gu Zheng guz.f...@cn.fujitsu.com
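
For reference, the decision made by the patched select_gc_type() can be restated as a small standalone function. The enum values and `struct gc_kthread` below are simplified stand-ins for the f2fs definitions, and the `switch` is only an alternative way of writing the same logic without the goto.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the f2fs constants and GC thread struct. */
enum { BG_GC, FG_GC };
enum { GC_CB, GC_GREEDY };

struct gc_kthread { unsigned int gc_idle; };

/* gc_idle == 1 forces cost-benefit, gc_idle == 2 forces greedy; anything
 * else (including 0, or no GC thread at all) falls back to the old rule:
 * background GC uses cost-benefit, foreground GC uses greedy. */
static int select_gc_mode(const struct gc_kthread *gc_th, int gc_type)
{
	if (gc_th) {
		switch (gc_th->gc_idle) {
		case 1:
			return GC_CB;
		case 2:
			return GC_GREEDY;
		}
	}
	return (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
}
```

As in the patch, a value such as 3 is treated exactly like 0, so a sysfs store handler might want to reject values outside 0..2.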

 ---
  Documentation/ABI/testing/sysfs-fs-f2fs |6 +-
  Documentation/filesystems/f2fs.txt  |6 ++
  fs/f2fs/gc.c|   24 +---
  fs/f2fs/gc.h|3 +++
  fs/f2fs/super.c |2 ++
  5 files changed, 37 insertions(+), 4 deletions(-)
 
 diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
 index 5f44095..31942ef 100644
 --- a/Documentation/ABI/testing/sysfs-fs-f2fs
 +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
 @@ -19,4 +19,8 @@ Description:
Controls the default sleep time for gc_thread. Time
is in milliseconds.
  
 -
 +What:/sys/fs/f2fs/disk/gc_idle
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the victim selection policy for garbage collection.
 diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
 index 5daf3bb..3cd27be 100644
 --- a/Documentation/filesystems/f2fs.txt
 +++ b/Documentation/filesystems/f2fs.txt
 @@ -158,6 +158,12 @@ Files in /sys/fs/f2fs/devname
time for the garbage collection thread. Time is
in milliseconds.
  
 + gc_idle              This parameter controls the selection of victim
 +                      policy for garbage collection. Setting gc_idle = 0
 +                      (default) will disable this option. Setting
 +                      gc_idle = 1 will select the Cost Benefit approach,
 +                      and setting gc_idle = 2 will select the greedy approach.
 +
  
 
  USAGE
  
 
 diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
 index 60d4f67..2c0c8ad 100644
 --- a/fs/f2fs/gc.c
 +++ b/fs/f2fs/gc.c
 @@ -106,6 +106,8 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
   gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
   gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
  
 + gc_th->gc_idle = 0;
 +
   sbi->gc_thread = gc_th;
   init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
   sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
 @@ -130,9 +132,25 @@ void stop_gc_thread(struct f2fs_sb_info *sbi)
   sbi->gc_thread = NULL;
  }
  
 -static int select_gc_type(int gc_type)
 +static int select_gc_type(struct f2fs_gc_kthread *gc_th, int gc_type)
  {
 - return (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
 + int gc_mode;
 +
 + if (gc_th && gc_th->gc_idle) {
 + /* Cost Benefit Policy */
 + if (gc_th->gc_idle == 1) {
 + gc_mode = GC_CB;
 + goto out;
 + } else if (gc_th->gc_idle == 2) {
 + /* Greedy Policy */
 + gc_mode = GC_GREEDY;
 + goto out;
 + }
 + }
 +
 + gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
 +out:
 + return gc_mode;
  }
  
  static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
 @@ -145,7 +163,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
   p->dirty_segmap = dirty_i->dirty_segmap[type];
   p->ofs_unit = 1;
   } else {
 - p->gc_mode = select_gc_type(gc_type);
 + p->gc_mode = select_gc_type(sbi->gc_thread, gc_type);
   p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
   p->ofs_unit = sbi->segs_per_sec;
   }
 diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
 index f4bf44c..c22dee9 100644
 --- a/fs/f2fs/gc.h
 +++ b/fs/f2fs/gc.h
 @@ -30,6 +30,9 @@ struct f2fs_gc_kthread {
   unsigned int min_sleep_time;
   unsigned int max_sleep_time;
   unsigned int no_gc_sleep_time;
 +
 + /* for changing gc mode */
 + unsigned int gc_idle;
  };
  
  struct inode_entry {
 diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
 index 0a3e88f..f9c6c0b 100644
 --- a/fs/f2fs/super.c
 +++ b/fs/f2fs/super.c
 @@ -148,12 +148,14 @@ static struct f2fs_attr f2fs_attr_##_name = {   \
  F2FS_RW_ATTR(gc_min_sleep_time, min_sleep_time);
  F2FS_RW_ATTR(gc_max_sleep_time, max_sleep_time);
  F2FS_RW_ATTR

Re: [PATCH 1/3] f2fs: add sysfs support for controlling the gc_thread

2013-08-04 Thread Gu Zheng

On 08/04/2013 10:09 PM, Namjae Jeon wrote:

 From: Namjae Jeon namjae.j...@samsung.com
 
 Add sysfs entries to control the timing parameters for
 f2fs gc thread.
 
 Various Sysfs options introduced are:
 gc_min_sleep_time: Min Sleep time for GC in ms
 gc_max_sleep_time: Max Sleep time for GC in ms
 gc_no_gc_sleep_time: Default Sleep time for GC in ms
 
 Cc: Gu Zheng guz.f...@cn.fujitsu.com
 Signed-off-by: Namjae Jeon namjae.j...@samsung.com
 Signed-off-by: Pankaj Kumar pankaj...@samsung.com


Reviewed-by: Gu Zheng guz.f...@cn.fujitsu.com
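
Each of these files reads or writes one unsigned field of the GC thread's state. A userspace sketch of a table-driven way to wire file names to fields follows; the `RW_ATTR()` macro and `attr_store()` helper are hypothetical illustrations of the pattern, not the actual f2fs show/store code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified copy of the tunables kept in struct f2fs_gc_kthread. */
struct gc_tunables {
	unsigned int min_sleep_time;
	unsigned int max_sleep_time;
	unsigned int no_gc_sleep_time;
};

/* One entry per sysfs file: its name and where its value lives. */
struct rw_attr {
	const char *name;
	size_t offset;
};

/* Hypothetical macro binding a file name to a structure field. */
#define RW_ATTR(_name, _field) \
	{ #_name, offsetof(struct gc_tunables, _field) }

static const struct rw_attr gc_attrs[] = {
	RW_ATTR(gc_min_sleep_time, min_sleep_time),
	RW_ATTR(gc_max_sleep_time, max_sleep_time),
	RW_ATTR(gc_no_gc_sleep_time, no_gc_sleep_time),
};

/* Look up the field backing a file, or NULL if there is no such file. */
static unsigned int *attr_field(struct gc_tunables *t, const char *name)
{
	for (size_t i = 0; i < sizeof(gc_attrs) / sizeof(gc_attrs[0]); i++)
		if (strcmp(gc_attrs[i].name, name) == 0)
			return (unsigned int *)((char *)t + gc_attrs[i].offset);
	return NULL;
}

/* Generic store: parse a value in milliseconds and write it through the
 * field offset. Returns 0 or a negative errno (a real sysfs store would
 * return the number of bytes consumed instead of 0). */
static int attr_store(struct gc_tunables *t, const char *name, const char *buf)
{
	unsigned int *field = attr_field(t, name);
	unsigned long val;
	char *end;

	if (!field)
		return -ENOENT;
	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf)
		return -EINVAL;
	*field = (unsigned int)val;
	return 0;
}
```

Adding a new tunable then only requires a new field and one more table entry; the show/store logic stays generic.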

 ---
  Documentation/ABI/testing/sysfs-fs-f2fs |   22 ++
  Documentation/filesystems/f2fs.txt  |   26 +++
  fs/f2fs/f2fs.h  |4 +
  fs/f2fs/gc.c|   17 +++--
  fs/f2fs/gc.h|   33 +
  fs/f2fs/super.c |  122 +++
  6 files changed, 204 insertions(+), 20 deletions(-)
  create mode 100644 Documentation/ABI/testing/sysfs-fs-f2fs
 
 diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
 new file mode 100644
 index 000..5f44095
 --- /dev/null
 +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
 @@ -0,0 +1,22 @@
 +What:/sys/fs/f2fs/disk/gc_max_sleep_time
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the maximum sleep time for gc_thread. Time
 +  is in milliseconds.
 +
 +What:/sys/fs/f2fs/disk/gc_min_sleep_time
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the minimum sleep time for gc_thread. Time
 +  is in milliseconds.
 +
 +What:/sys/fs/f2fs/disk/gc_no_gc_sleep_time
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the default sleep time for gc_thread. Time
 +  is in milliseconds.
 +
 +
 diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
 index 0500c19..5daf3bb 100644
 --- a/Documentation/filesystems/f2fs.txt
 +++ b/Documentation/filesystems/f2fs.txt
 @@ -133,6 +133,32 @@ f2fs. Each file shows the whole f2fs information.
   - current memory footprint consumed by f2fs.
  
  
 
 +SYSFS ENTRIES
 +
 +
 +Information about mounted f2fs file systems can be found in
 +/sys/fs/f2fs.  Each mounted filesystem will have a directory in
 +/sys/fs/f2fs based on its device name (e.g., /sys/fs/f2fs/sda).
 +The files in each per-device directory are shown in the table below.
 +
 +Files in /sys/fs/f2fs/devname
 +(see also Documentation/ABI/testing/sysfs-fs-f2fs)
 +..
 + File Content
 +
 + gc_max_sleep_time    This tuning parameter controls the maximum sleep
 +                      time for the garbage collection thread. Time is
 +                      in milliseconds.
 +
 + gc_min_sleep_time    This tuning parameter controls the minimum sleep
 +                      time for the garbage collection thread. Time is
 +                      in milliseconds.
 +
 + gc_no_gc_sleep_time  This tuning parameter controls the default sleep
 +                      time for the garbage collection thread. Time is
 +                      in milliseconds.
 +  time for the garbage collection thread. Time is
 +  in milliseconds.
 +
 +
  USAGE
  
 
  
 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 78777cd..63813be 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -430,6 +430,10 @@ struct f2fs_sb_info {
  #endif
   unsigned int last_victim[2];/* last victim segment # */
   spinlock_t stat_lock;   /* lock for stat operations */
 +
 + /* For sysfs support */
 + struct kobject s_kobj;
 + struct completion s_kobj_unregister;
  };
  
  /*
 diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
 index 35f9b1a..60d4f67 100644
 --- a/fs/f2fs/gc.c
 +++ b/fs/f2fs/gc.c
 @@ -29,10 +29,11 @@ static struct kmem_cache *winode_slab;
  static int gc_thread_func(void *data)
  {
   struct f2fs_sb_info *sbi = data;
 + struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
   wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
   long wait_ms;
  
 - wait_ms = GC_THREAD_MIN_SLEEP_TIME;
 + wait_ms = gc_th->min_sleep_time;
  
   do {
   if (try_to_freeze())
 @@ -45,7 +46,7 @@ static int gc_thread_func(void *data)
   break;
  
   if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE

[PATCH] f2fs: move bio_private allocation out of f2fs_bio_alloc()

2013-07-24 Thread Gu Zheng

bio->bi_private is not always needed. In the data read path, read_end_io()
does not use bio_private at all, so move the bio_private allocation out of
f2fs_bio_alloc(): allocate it in submit_write_page(), and skip it in
f2fs_readpage().

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/data.c|1 -
 fs/f2fs/segment.c |   19 +++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c73c394..19cd7c6 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -365,7 +365,6 @@ static void read_end_io(struct bio *bio, int err)
}
unlock_page(page);
} while (bvec >= bio->bi_io_vec);
-   kfree(bio->bi_private);
bio_put(bio);
 }
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a86d125..9b74ae2 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -611,18 +611,12 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 struct bio *f2fs_bio_alloc(struct block_device *bdev, int npages)
 {
struct bio *bio;
-   struct bio_private *priv;
-retry:
-   priv = kmalloc(sizeof(struct bio_private), GFP_NOFS);
-   if (!priv) {
-   cond_resched();
-   goto retry;
-   }
 
/* No failure on bio allocation */
bio = bio_alloc(GFP_NOIO, npages);
bio->bi_bdev = bdev;
-   bio->bi_private = priv;
+   bio->bi_private = NULL;
+
return bio;
 }
 
@@ -681,8 +675,17 @@ static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
do_submit_bio(sbi, type, false);
 alloc_new:
if (sbi->bio[type] == NULL) {
+   struct bio_private *priv;
+retry:
+   priv = kmalloc(sizeof(struct bio_private), GFP_NOFS);
+   if (!priv) {
+   cond_resched();
+   goto retry;
+   }
+
sbi->bio[type] = f2fs_bio_alloc(bdev, max_hw_blocks(sbi));
sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
+   sbi->bio[type]->bi_private = priv;
/*
 * The end_io will be assigned at the sumbission phase.
 * Until then, let bio_add_page() merge consecutive IOs as much
-- 
1.7.7
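
A simplified userspace sketch of the resulting split, assuming hypothetical `struct fake_bio` and `struct bio_private` types and malloc() in place of bio_alloc()/kmalloc(): the common allocator leaves bi_private NULL, and only the write path attaches private data, retrying the allocation as the patch does.

```c
#include <sched.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct bio and struct bio_private. */
struct fake_bio    { void *bi_private; int npages; };
struct bio_private { int is_sync; };

/* Keep retrying until the allocation succeeds, yielding the CPU between
 * attempts, similar in spirit to the kmalloc()/cond_resched() loop. */
static void *alloc_retry(size_t size)
{
	void *p;

	while (!(p = malloc(size)))
		sched_yield();
	return p;
}

/* Common allocator: no private data any more. */
static struct fake_bio *bio_alloc_plain(int npages)
{
	struct fake_bio *bio = alloc_retry(sizeof(*bio));

	bio->npages = npages;
	bio->bi_private = NULL;
	return bio;
}

/* Write path: only here is the private data attached. */
static struct fake_bio *bio_alloc_write(int npages)
{
	struct fake_bio *bio = bio_alloc_plain(npages);

	bio->bi_private = alloc_retry(sizeof(struct bio_private));
	return bio;
}

/* Completion: for a read bio bi_private is NULL and free(NULL) is a no-op. */
static void bio_complete(struct fake_bio *bio)
{
	free(bio->bi_private);
	free(bio);
}
```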


[PATCH] driver/vga16fb.c: remove the unused variable dev of function vga16fb_destroy()

2013-07-24 Thread Gu Zheng

Commit e21d2170f36602ae2708 removed the unnecessary platform_set_drvdata(),
but left the variable dev unused, so delete it.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/video/vga16fb.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/video/vga16fb.c b/drivers/video/vga16fb.c
index 830ded4..2827333 100644
--- a/drivers/video/vga16fb.c
+++ b/drivers/video/vga16fb.c
@@ -1265,7 +1265,6 @@ static void vga16fb_imageblit(struct fb_info *info, const struct fb_image *image
 
 static void vga16fb_destroy(struct fb_info *info)
 {
-   struct platform_device *dev = container_of(info->device, struct platform_device, dev);
iounmap(info->screen_base);
fb_dealloc_cmap(&info->cmap);
/* XXX unshare VGA regions */
-- 
1.7.7


Re: [PATCH] driver/vga16fb.c: remove the unused variable dev of function vga16fb_destroy()

2013-07-25 Thread Gu Zheng

On 07/25/2013 05:58 PM, Geert Uytterhoeven wrote:

 On Thu, Jul 25, 2013 at 5:37 AM, Gu Zheng guz.f...@cn.fujitsu.com wrote:
 Commit  e21d2170f36602ae2708 removed the unnecessary platform_set_drvdata(),
 but left the variable dev unused, delete it.
 
 When referring to another commit, please also include the oneline summary of
 the commit, to make it easier for people to see what it's about.

Got it, thanks for the reminder. :)

 
 E.g. Commit  e21d2170f36602ae2708 (video: remove unnecessary
 platform_set_drvdata()) removed the unnecessary platform_set_drvdata(),
 but left the variable dev unused, delete it.

This reads more easily. I'll update it.

Regards,
Gu


 
 Thanks!
 
 Gr{oetje,eeting}s,
 
 Geert
 
 --
 Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
 ge...@linux-m68k.org
 
 In personal conversations with technical people, I call myself a hacker. But
 when I'm talking to journalists I just say programmer or something like 
 that.
 -- Linus Torvalds
 



[PATCH V2] driver/vga16fb.c: remove the unused variable dev of function vga16fb_destroy()

2013-07-25 Thread Gu Zheng

Commit e21d2170f36602ae2708 (video: remove unnecessary
platform_set_drvdata()) removed the unnecessary platform_set_drvdata(),
but left the variable dev unused; delete it.

v2:
   Follow Geert's suggestion to make the changelog easier to read.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/video/vga16fb.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/video/vga16fb.c b/drivers/video/vga16fb.c
index 830ded4..2827333 100644
--- a/drivers/video/vga16fb.c
+++ b/drivers/video/vga16fb.c
@@ -1265,7 +1265,6 @@ static void vga16fb_imageblit(struct fb_info *info, const struct fb_image *image
 
 static void vga16fb_destroy(struct fb_info *info)
 {
-   struct platform_device *dev = container_of(info->device, struct platform_device, dev);
iounmap(info->screen_base);
fb_dealloc_cmap(&info->cmap);
/* XXX unshare VGA regions */
-- 
1.7.7


Re: question about splice

2013-07-26 Thread Gu Zheng

Hi Jianpeng,

On 07/26/2013 03:08 PM, majianpeng wrote:

 Hi all,
   I used splice and found a problem (at least I think it is one).
 The demo is:
 A: splice(regularfileA -> pipe);
 B: splice(pipe -> regularfileB);
 Before doing B, we modify the data of regularfileA that is now in the
 pipe. The data written to regularfileB is then changed as well.
 If we use a buffer instead:
 A: read(regA, buff);
 B: write(buff, regB);
 after A, changes to the content of regA can't affect buff.
 Reviewing the splice code, I see that the pipe shares the page cache of regA.

Right. This is splice's original design intent: the pipe shares the page
cache pages of the source file instead of copying the data with
copy_to_user/copy_from_user, in order to achieve zero-copy.
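Jianpeng's two-step scenario can be reproduced from userspace with a minimal sketch like the following, assuming Linux and a filesystem whose splice_read hands page-cache pages to the pipe. The helper names make_tmp() and splice_demo() are hypothetical. Whether regularfileB ends up with the modified bytes depends on the page still being shared when the second splice() runs, so the code reports what it observed rather than promising one result.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an already-unlinked temporary file. */
static int make_tmp(void)
{
	char path[] = "/tmp/splice_demoXXXXXX";
	int fd = mkstemp(path);

	if (fd >= 0)
		unlink(path);
	return fd;
}

/*
 * Step A: splice(fileA -> pipe); then overwrite the start of fileA with
 * pwrite(); step B: splice(pipe -> fileB). fileB's content is copied into
 * out (NUL-terminated). Returns the bytes that reached fileB, or -1.
 */
static ssize_t splice_demo(char *out, size_t outlen)
{
	static const char data[] = "hello world";
	const size_t len = sizeof(data) - 1;
	int p[2] = { -1, -1 };
	int a = make_tmp(), b = make_tmp();
	loff_t off_a = 0, off_b = 0;
	ssize_t in, moved = -1;

	assert(outlen > 0);
	out[0] = '\0';
	if (a < 0 || b < 0 || pipe(p) < 0)
		goto done;
	if (pwrite(a, data, len, 0) != (ssize_t)len)
		goto done;

	in = splice(a, &off_a, p[1], NULL, len, 0);  /* step A: typically links page-cache pages into the pipe */
	if (in <= 0)
		goto done;
	if (pwrite(a, "HELLO", 5, 0) != 5)           /* modify fileA while its data sits in the pipe */
		goto done;
	moved = splice(p[0], NULL, b, &off_b, (size_t)in, 0);  /* step B */
	if (moved > 0) {
		ssize_t got = pread(b, out, outlen - 1, 0);
		out[got > 0 ? got : 0] = '\0';
	}
done:
	if (p[0] >= 0) {
		close(p[0]);
		close(p[1]);
	}
	if (a >= 0)
		close(a);
	if (b >= 0)
		close(b);
	return moved;
}
```

When the page is still shared, fileB typically reads "HELLO world", which is the zero-copy behaviour described above; the read()/write() variant always yields "hello world" because buff is a private copy.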

Thanks,
Gu

 Maybe this is not a problem, or am I missing something?

 
 Thanks!
 Jianpeng 



[PATCH RESEND] fs/bio-integrity: fix a potential mem leak

2013-07-28 Thread Gu Zheng

Check the bio_integrity_pool allocation before creating the bvec pool, and
free bio_integrity_pool on the biovec_create_pool() failure path in
bioset_integrity_create().

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/bio-integrity.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 8fb4291..6025084 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -716,13 +716,14 @@ int bioset_integrity_create(struct bio_set *bs, int pool_size)
return 0;
 
bs->bio_integrity_pool = mempool_create_slab_pool(pool_size, bip_slab);
-
-   bs->bvec_integrity_pool = biovec_create_pool(bs, pool_size);
-   if (!bs->bvec_integrity_pool)
+   if (!bs->bio_integrity_pool)
return -1;
 
-   if (!bs->bio_integrity_pool)
+   bs->bvec_integrity_pool = biovec_create_pool(bs, pool_size);
+   if (!bs->bvec_integrity_pool) {
+   mempool_destroy(bs->bio_integrity_pool);
return -1;
+   }
 
return 0;
 }
-- 
1.7.7
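The control flow the fix establishes (create the first pool and check it, then create the second and undo the first if that fails) can be modelled in plain userspace C. This is a sketch under assumptions: struct pool, pool_create(), pool_destroy() and integrity_create_model() are hypothetical stand-ins for mempool_t, mempool_create_slab_pool()/biovec_create_pool() and mempool_destroy(), and a counter of live pools makes the former leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for mempool_t and the two pool pointers in struct bio_set. */
struct pool { int unused; };
struct bio_set_model {
	struct pool *bio_integrity_pool;
	struct pool *bvec_integrity_pool;
};

static int live_pools;      /* pools currently allocated */
static int fail_bvec_pool;  /* set to 1 to simulate biovec_create_pool() failing */

static struct pool *pool_create(int is_bvec)
{
	struct pool *p;

	if (is_bvec && fail_bvec_pool)
		return NULL;
	p = calloc(1, sizeof(*p));
	if (p)
		live_pools++;
	return p;
}

static void pool_destroy(struct pool *p)
{
	if (p) {
		free(p);
		live_pools--;
	}
}

/* Same control flow as the patched bioset_integrity_create(). */
static int integrity_create_model(struct bio_set_model *bs)
{
	bs->bio_integrity_pool = pool_create(0);
	if (!bs->bio_integrity_pool)
		return -1;

	bs->bvec_integrity_pool = pool_create(1);
	if (!bs->bvec_integrity_pool) {
		pool_destroy(bs->bio_integrity_pool);  /* the pool that used to leak */
		return -1;
	}
	return 0;
}
```

With the old ordering, a failure of the second allocation returned -1 while the first pool stayed allocated; in this model that would leave live_pools at 1 instead of 0.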


[PATCH RESEND] f2fs: move bio_private allocation out of f2fs_bio_alloc()

2013-07-28 Thread Gu Zheng

bio->bi_private is not always needed. In the data read path, read_end_io()
does not use bio_private, so move the bio_private allocation out of
f2fs_bio_alloc(): allocate it in submit_write_page(), and skip it in
f2fs_readpage().

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/data.c|1 -
 fs/f2fs/segment.c |   19 +++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c73c394..19cd7c6 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -365,7 +365,6 @@ static void read_end_io(struct bio *bio, int err)
}
unlock_page(page);
} while (bvec >= bio->bi_io_vec);
-   kfree(bio->bi_private);
bio_put(bio);
 }
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a86d125..9b74ae2 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -611,18 +611,12 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 struct bio *f2fs_bio_alloc(struct block_device *bdev, int npages)
 {
struct bio *bio;
-   struct bio_private *priv;
-retry:
-   priv = kmalloc(sizeof(struct bio_private), GFP_NOFS);
-   if (!priv) {
-   cond_resched();
-   goto retry;
-   }
 
/* No failure on bio allocation */
bio = bio_alloc(GFP_NOIO, npages);
bio->bi_bdev = bdev;
-   bio->bi_private = priv;
+   bio->bi_private = NULL;
+
return bio;
 }
 
@@ -681,8 +675,17 @@ static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
do_submit_bio(sbi, type, false);
 alloc_new:
if (sbi->bio[type] == NULL) {
+   struct bio_private *priv;
+retry:
+   priv = kmalloc(sizeof(struct bio_private), GFP_NOFS);
+   if (!priv) {
+   cond_resched();
+   goto retry;
+   }
+
sbi->bio[type] = f2fs_bio_alloc(bdev, max_hw_blocks(sbi));
sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
+   sbi->bio[type]->bi_private = priv;
/*
 * The end_io will be assigned at the sumbission phase.
 * Until then, let bio_add_page() merge consecutive IOs as much
-- 
1.7.7
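The ownership rule after this patch (read bios carry no private data; write bios get one attached after allocation and drop it at completion) is sketched below in userspace C. All names are hypothetical models of f2fs_bio_alloc(), the alloc_new: branch of submit_write_page(), read_end_io() and the write completion. The retry loop uses sched_yield() where the kernel code uses cond_resched(), and freeing the private data on write completion is an assumption about f2fs_end_io_write(), not something shown in this diff.

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures. */
struct bio_private_model { int is_sync; };
struct bio_model { void *bi_private; };

/* Like f2fs_bio_alloc() after the patch: no private data attached. */
static struct bio_model *model_bio_alloc(void)
{
	struct bio_model *bio = malloc(sizeof(*bio));

	if (!bio)
		abort();  /* bio_alloc(GFP_NOIO, ...) does not fail; the model just aborts */
	bio->bi_private = NULL;
	return bio;
}

/* Like the alloc_new: branch: retry until the allocation succeeds, then attach. */
static struct bio_model *model_write_bio_alloc(void)
{
	struct bio_private_model *priv;
	struct bio_model *bio;

	while ((priv = calloc(1, sizeof(*priv))) == NULL)
		sched_yield();  /* stands in for cond_resched() */

	bio = model_bio_alloc();
	bio->bi_private = priv;
	return bio;
}

/* Like read_end_io() after the patch: nothing to kfree(). */
static void model_read_end_io(struct bio_model *bio)
{
	free(bio);
}

/* Assumed write completion: the private data is released here. */
static void model_write_end_io(struct bio_model *bio)
{
	free(bio->bi_private);
	free(bio);
}
```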


Re: [PATCH 0/9] Add namespace support for syslog v2

2013-07-29 Thread Gu Zheng

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

 This patchset introduces a system log namespace.
 
 This is the 2nd version. The 1st version is at
 http://lwn.net/Articles/525728/. In that version,
 syslog_namespace was added into nsproxy and created through
 a new clone flag CLONE_SYSLOG when cloning a process.
 
 There was some discussion of the 1st version last November.
 This version adopts that important advice and refers to
 Serge's patch (http://lwn.net/Articles/525629/).
 
 Unlike the 1st version, in this patchset the syslog namespace
 is tied to a user namespace. A new user ns must be created
 before creating a new syslog ns, because after cloning a new
 user ns, users have full capabilities in it. The syslog
 namespace can be created through a new command (11) of the
 __NR_syslog syscall, provided by a new syslog flag,
 SYSLOG_ACTION_NEW_NS.
 
 In syslog_namespace, the identifiers needed for handling
 the syslog buffer are containerized. When a container creates
 a new syslog ns, a separate buffer is allocated to store the
 logs owned by that container.
 
 A new interface, ns_printk, is added to print the logs that
 we want to see in the container. Through ns_printk, we can
 get more logs related to a specific net ns, for instance
 iptables. Here we use it to report iptables logs per
 container.
 
 The default printk, targeted at init_syslog_ns, will
 continue to print most kernel logs to the host.
 
 A task in a new syslog ns can affect only the current
 container through dmesg, dmesg -c and /dev/kmsg
 actions. The read/write interfaces such as /dev/kmsg,
 /proc/kmsg and the syslog syscall remain usable by
 container users.
 
 This patchset is based on Linus' linux tree.

A detailed changelog between V1 and V2 is really needed; the inline description
is hard for others to follow.

 
 Rui Xiang (9):
   syslog_ns: add syslog_namespace and put/get_syslog_ns
   syslog_ns: add syslog_ns into user_namespace
   syslog_ns: add init syslog_ns for global syslog
   syslog_ns: make syslog handling per namespace
   syslog_ns: make permisiion check per user namespace
   syslog_ns: use init syslog_ns for console action
   syslog_ns: implement function for creating syslog ns
   syslog_ns: implement ns_printk for specific syslog_ns
   netfilter: use ns_printk in iptable context
 
  fs/proc/kmsg.c |  17 +-
  include/linux/printk.h |   5 +-
  include/linux/syslog.h |  79 -
  include/linux/user_namespace.h |   2 +
  include/net/netfilter/xt_log.h |   6 +-
  kernel/printk.c| 642 
 -
  kernel/sysctl.c|   3 +-
  kernel/user.c  |   3 +
  kernel/user_namespace.c|   4 +
  net/netfilter/xt_LOG.c |   4 +-
  10 files changed, 493 insertions(+), 272 deletions(-)
 



Re: [PATCH 1/9] syslog_ns: add syslog_namespace and put/get_syslog_ns

2013-07-29 Thread Gu Zheng

Hi Rui,
See inline. :)

On 07/29/2013 10:31 AM, Rui Xiang wrote:

 Add a struct syslog_namespace which contains the necessary
 members for handling syslog, and implement the get_syslog_ns
 and put_syslog_ns APIs.
 
 Signed-off-by: Rui Xiang rui.xi...@huawei.com
 ---
  include/linux/syslog.h | 68 
 ++
  kernel/printk.c|  7 --
  2 files changed, 68 insertions(+), 7 deletions(-)
 
 diff --git a/include/linux/syslog.h b/include/linux/syslog.h
 index 98a3153..425fafe 100644
 --- a/include/linux/syslog.h
 +++ b/include/linux/syslog.h
 @@ -21,6 +21,9 @@
  #ifndef _LINUX_SYSLOG_H
  #define _LINUX_SYSLOG_H
  
 +#include <linux/slab.h>
 +#include <linux/kref.h>
 +
  /* Close the log.  Currently a NOP. */
  #define SYSLOG_ACTION_CLOSE  0
  /* Open the log. Currently a NOP. */
 @@ -47,6 +50,71 @@
  #define SYSLOG_FROM_READER   0
  #define SYSLOG_FROM_PROC 1
  
 +enum log_flags {
 + LOG_NOCONS  = 1,/* already flushed, do not print to console */
 + LOG_NEWLINE = 2,/* text ended with a newline */
 + LOG_PREFIX  = 4,/* text started with a prefix */
 + LOG_CONT= 8,/* text is a fragment of a continuation line */
 +};
 +
 +struct syslog_namespace {
 + struct kref kref;   /* syslog_ns reference count  control */
 +
 + raw_spinlock_t logbuf_lock; /* access conflict locker */
 + /* cpu currently holding logbuf_lock of ns */
 + unsigned int logbuf_cpu;
 +
 + /* index and sequence number of the first record stored in the buffer */
 + u64 log_first_seq;
 + u32 log_first_idx;
 +
 + /* index and sequence number of the next record stored in the buffer */
 + u64 log_next_seq;
 + u32 log_next_idx;
 +
 + /* the next printk record to read after the last 'clear' command */
 + u64 clear_seq;
 + u32 clear_idx;
 +
 + char *log_buf;
 + u32 log_buf_len;
 +
 + /* the next printk record to write to the console */
 + u64 console_seq;
 + u32 console_idx;
 +
 + /* the next printk record to read by syslog(READ) or /proc/kmsg */
 + u64 syslog_seq;
 + u32 syslog_idx;
 + enum log_flags syslog_prev;
 + size_t syslog_partial;
 +
 + int dmesg_restrict;
 +};
 +
 +static inline struct syslog_namespace *get_syslog_ns(
 + struct syslog_namespace *ns)
 +{
 + if (ns)
 + kref_get(&ns->kref);
 + return ns;
 +}
 +
 +static inline void free_syslog_ns(struct kref *kref)
 +{
 + struct syslog_namespace *ns;
 + ns = container_of(kref, struct syslog_namespace, kref);
 +
 + kfree(ns->log_buf);
 + kfree(ns);
 +}

This interface seems a bit ugly; why not use the same form as put_syslog_ns()?

static inline void free_syslog_ns(struct syslog_namespace *ns)

 +
 +static inline void put_syslog_ns(struct syslog_namespace *ns)
 +{
 + if (ns)
 + kref_put(&ns->kref, free_syslog_ns);
 +}
 +
  int do_syslog(int type, char __user *buf, int count, bool from_file);
  
  #endif /* _LINUX_SYSLOG_H */
 diff --git a/kernel/printk.c b/kernel/printk.c
 index d37d45c..7e544bf 100644
 --- a/kernel/printk.c
 +++ b/kernel/printk.c
 @@ -193,13 +193,6 @@ static int console_may_schedule;
   * separated by ',', and find the message after the ';' character.
   */
  
 -enum log_flags {
 - LOG_NOCONS  = 1,/* already flushed, do not print to console */
 - LOG_NEWLINE = 2,/* text ended with a newline */
 - LOG_PREFIX  = 4,/* text started with a prefix */
 - LOG_CONT= 8,/* text is a fragment of a continuation line */
 -};
 -
  struct log {
   u64 ts_nsec;/* timestamp in nanoseconds */
   u16 len;/* length of entire record */



Re: [PATCH 2/9] syslog_ns: add syslog_ns into user_namespace

2013-07-29 Thread Gu Zheng

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

 Add a syslog_ns pointer to user_namespace, and make
 syslog_ns per user_namespace, not global.
 
 Since syslog_ns is assigned to user_ns, we can have
 full capabilities in new user_ns to create a new syslog_ns.
 
 Signed-off-by: Rui Xiang rui.xi...@huawei.com
 ---
  include/linux/syslog.h | 5 +
  include/linux/user_namespace.h | 1 +
  2 files changed, 6 insertions(+)
 
 diff --git a/include/linux/syslog.h b/include/linux/syslog.h
 index 425fafe..62ce47f 100644
 --- a/include/linux/syslog.h
 +++ b/include/linux/syslog.h
 @@ -90,6 +90,11 @@ struct syslog_namespace {
   size_t syslog_partial;
  
   int dmesg_restrict;
 +
 + /*
 +  * user namespace which owns this syslog ns.
 +  */
 + struct user_namespace *owner;
  };
  
  static inline struct syslog_namespace *get_syslog_ns(
 diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
 index b6b215f..ce2de5b 100644
 --- a/include/linux/user_namespace.h
 +++ b/include/linux/user_namespace.h
 @@ -28,6 +28,7 @@ struct user_namespace {
   unsigned intproc_inum;
   boolmay_mount_sysfs;
   boolmay_mount_proc;
 + struct syslog_namespace *syslog_ns;

We add an owner pointer to syslog_namespace to make syslog_ns
per user_namespace and for the caps check.
But why also add a pointer to syslog_namespace in
user_namespace? Am I missing something? :)

Thanks,
Gu

  };
  
  extern struct user_namespace init_user_ns;



Re: [PATCH 4/9] syslog_ns: make syslog handling per namespace

2013-07-29 Thread Gu Zheng

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

 This patch makes the syslog buffer and other fields per
 namespace.
 
 Use the ns->log_buf (log_buf_len, logbuf_lock,
 log_first_seq, and so on) fields instead of the
 global ones to handle syslog.
 
 Syslog interfaces such as /dev/kmsg, /proc/kmsg,
 and the syslog syscall are all containerized for
 container users.
 
 Signed-off-by: Rui Xiang rui.xi...@huawei.com
 ---
  fs/proc/kmsg.c |  17 +-
  include/linux/printk.h |   1 -
  include/linux/syslog.h |   3 +-
  kernel/printk.c| 507 
 +
  kernel/sysctl.c|   3 +-
  5 files changed, 273 insertions(+), 258 deletions(-)
 
 diff --git a/fs/proc/kmsg.c b/fs/proc/kmsg.c
 index bdfabda..cb98431 100644
 --- a/fs/proc/kmsg.c
 +++ b/fs/proc/kmsg.c
 @@ -13,6 +13,8 @@
  #include <linux/proc_fs.h>
  #include <linux/fs.h>
  #include <linux/syslog.h>
 +#include <linux/cred.h>
 +#include <linux/user_namespace.h>
  
  #include <asm/uaccess.h>
  #include <asm/io.h>
 @@ -21,12 +23,14 @@ extern wait_queue_head_t log_wait;
  
  static int kmsg_open(struct inode * inode, struct file * file)
  {
 - return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC);
 + return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC,
 + file->f_cred->user_ns->syslog_ns);

How about adding a helper function to get the syslog_ns that a file belongs to?
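A minimal sketch of such a helper, modelled in userspace with stand-in structs that keep only the fields on the lookup path; the name file_syslog_ns() is hypothetical and not part of the posted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins: only the pointers used by the lookup are modelled. */
struct syslog_namespace { int id; };
struct user_namespace { struct syslog_namespace *syslog_ns; };
struct cred { struct user_namespace *user_ns; };
struct file { const struct cred *f_cred; };

/* Hypothetical helper: the syslog_ns of the user ns that opened the file. */
static inline struct syslog_namespace *file_syslog_ns(const struct file *file)
{
	assert(file != NULL && file->f_cred != NULL && file->f_cred->user_ns != NULL);
	return file->f_cred->user_ns->syslog_ns;
}
```

Each call site in fs/proc/kmsg.c would then shrink to, for example, do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC, file_syslog_ns(file)), and the pointer chain lives in one place.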
 

  }
  
  static int kmsg_release(struct inode * inode, struct file * file)
  {
 - (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC);
 + (void) do_syslog(SYSLOG_ACTION_CLOSE, NULL, 0, SYSLOG_FROM_PROC,
 + file->f_cred->user_ns->syslog_ns);
   return 0;
  }
  
 @@ -34,15 +38,18 @@ static ssize_t kmsg_read(struct file *file, char __user 
 *buf,
size_t count, loff_t *ppos)
  {
   if ((file->f_flags & O_NONBLOCK) &&
 - !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
 + !do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC,
 + file->f_cred->user_ns->syslog_ns))
   return -EAGAIN;
 - return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC);
 + return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_PROC,
 + file->f_cred->user_ns->syslog_ns);
  }
  
  static unsigned int kmsg_poll(struct file *file, poll_table *wait)
  {
   poll_wait(file, &log_wait, wait);
 - if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC))
 + if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC,
 + file->f_cred->user_ns->syslog_ns))
   return POLLIN | POLLRDNORM;
   return 0;
  }
 diff --git a/include/linux/printk.h b/include/linux/printk.h
 index 22c7052..29e3f85 100644
 --- a/include/linux/printk.h
 +++ b/include/linux/printk.h
 @@ -139,7 +139,6 @@ extern bool printk_timed_ratelimit(unsigned long 
 *caller_jiffies,
  unsigned int interval_msec);
  
  extern int printk_delay_msec;
 -extern int dmesg_restrict;
  extern int kptr_restrict;
  
  extern void wake_up_klogd(void);
 diff --git a/include/linux/syslog.h b/include/linux/syslog.h
 index 363bc56..fbf0cb6 100644
 --- a/include/linux/syslog.h
 +++ b/include/linux/syslog.h
 @@ -120,7 +120,8 @@ static inline void put_syslog_ns(struct syslog_namespace 
 *ns)
   kref_put(&ns->kref, free_syslog_ns);
  }
  
 -int do_syslog(int type, char __user *buf, int count, bool from_file);
 +int do_syslog(int type, char __user *buf, int count, bool from_file,
 + struct syslog_namespace *ns);
  
  extern struct syslog_namespace init_syslog_ns;
  #endif /* _LINUX_SYSLOG_H */
 diff --git a/kernel/printk.c b/kernel/printk.c
 index fd83ec1..846fef5 100644
 --- a/kernel/printk.c
 +++ b/kernel/printk.c
 @@ -213,29 +213,8 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
  
  #ifdef CONFIG_PRINTK
  DECLARE_WAIT_QUEUE_HEAD(log_wait);
 -/* the next printk record to read by syslog(READ) or /proc/kmsg */
 -static u64 syslog_seq;
 -static u32 syslog_idx;
 -static enum log_flags syslog_prev;
 -static size_t syslog_partial;
 -
 -/* index and sequence number of the first record stored in the buffer */
 -static u64 log_first_seq;
 -static u32 log_first_idx;
 -
 -/* index and sequence number of the next record to store in the buffer */
 -static u64 log_next_seq;
 -static u32 log_next_idx;
 -
 -/* the next printk record to write to the console */
 -static u64 console_seq;
 -static u32 console_idx;
  static enum log_flags console_prev;
  
 -/* the next printk record to read after the last 'clear' command */
 -static u64 clear_seq;
 -static u32 clear_idx;
 -
  #define PREFIX_MAX   32
  #define LOG_LINE_MAX 1024 - PREFIX_MAX
  
 @@ -246,12 +225,8 @@ static u32 clear_idx;
  #define LOG_ALIGN __alignof__(struct log)
  #endif
  #define

Re: [PATCH 2/9] syslog_ns: add syslog_ns into user_namespace

2013-07-29 Thread Gu Zheng

On 07/29/2013 05:54 PM, Gao feng wrote:

 On 07/29/2013 05:46 PM, Gu Zheng wrote:
 Hi Rui,

 On 07/29/2013 10:31 AM, Rui Xiang wrote:

 Add a syslog_ns pointer to user_namespace, and make
 syslog_ns per user_namespace, not global.

 Since syslog_ns is assigned to user_ns, we can have
 full capabilities in new user_ns to create a new syslog_ns.

 Signed-off-by: Rui Xiang rui.xi...@huawei.com
 ---
  include/linux/syslog.h | 5 +
  include/linux/user_namespace.h | 1 +
  2 files changed, 6 insertions(+)

 diff --git a/include/linux/syslog.h b/include/linux/syslog.h
 index 425fafe..62ce47f 100644
 --- a/include/linux/syslog.h
 +++ b/include/linux/syslog.h
 @@ -90,6 +90,11 @@ struct syslog_namespace {
 size_t syslog_partial;
  
 int dmesg_restrict;
 +
 +   /*
 +* user namespace which owns this syslog ns.
 +*/
 +   struct user_namespace *owner;
  };
  
  static inline struct syslog_namespace *get_syslog_ns(
 diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
 index b6b215f..ce2de5b 100644
 --- a/include/linux/user_namespace.h
 +++ b/include/linux/user_namespace.h
 @@ -28,6 +28,7 @@ struct user_namespace {
 unsigned intproc_inum;
 boolmay_mount_sysfs;
 boolmay_mount_proc;
 +   struct syslog_namespace *syslog_ns;

 We add an owner pointer to syslog_namespace to make syslog_ns
 per user_namespace and for the caps check.
 But why also add a pointer to syslog_namespace in
 user_namespace? Am I missing something? :)

 
 Yes. With this, we can make sure all the other namespace types, such as
 mount, net and pid, can access syslog_ns through the user namespace.

Got it.:)
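The access path Gao describes can be modelled in userspace. In the kernel, struct net, struct pid_namespace and struct mnt_namespace each already record their owning user namespace in a user_ns pointer, so a syslog_ns pointer in user_namespace makes the syslog namespace reachable from any of them, while owner points back for capability checks. The structs below are stripped-down stand-ins, and net_syslog_ns()/pid_ns_syslog_ns() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

struct user_namespace;

/* Stand-ins with only the pointers discussed in this thread. */
struct syslog_namespace { struct user_namespace *owner; };      /* back pointer for caps checks */
struct user_namespace { struct syslog_namespace *syslog_ns; };  /* pointer added by patch 2/9 */
struct net { struct user_namespace *user_ns; };
struct pid_namespace { struct user_namespace *user_ns; };

/* Hypothetical helpers: any namespace reaches its syslog_ns via its user_ns. */
static struct syslog_namespace *net_syslog_ns(const struct net *net)
{
	return net->user_ns->syslog_ns;
}

static struct syslog_namespace *pid_ns_syslog_ns(const struct pid_namespace *pidns)
{
	return pidns->user_ns->syslog_ns;
}
```

This is what lets something like the iptables logging in patch 9/9 find the right syslog_ns starting from a net namespace.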

Thanks,
Gu

 
 



Re: [PATCH 7/9] syslog_ns: implement function for creating syslog ns

2013-07-29 Thread Gu Zheng

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

 Add a create_syslog_ns function to create a new ns. A
 user_ns must be created before creating a new syslog ns.
 The new syslog_ns is then tied to the current user_ns
 instead of the original syslog_ns, which comes from the
 parent user_ns.
 
 Add a new syslog flag SYSLOG_ACTION_NEW_NS to implement
 a new command(11) of __NR_syslog system call. Through
 that command, we can create a new syslog ns in user
 space.
 
 Signed-off-by: Rui Xiang rui.xi...@huawei.com
 ---
  include/linux/syslog.h |  2 ++
  kernel/printk.c| 52 
 ++
  2 files changed, 54 insertions(+)
 
 diff --git a/include/linux/syslog.h b/include/linux/syslog.h
 index fbf0cb6..df57c21 100644
 --- a/include/linux/syslog.h
 +++ b/include/linux/syslog.h
 @@ -46,6 +46,8 @@
  #define SYSLOG_ACTION_SIZE_UNREAD9
  /* Return size of the log buffer */
  #define SYSLOG_ACTION_SIZE_BUFFER   10
 +/* Create a new syslog ns */
 +#define SYSLOG_ACTION_NEW_NS11
  
  #define SYSLOG_FROM_READER   0
  #define SYSLOG_FROM_PROC 1
 diff --git a/kernel/printk.c b/kernel/printk.c
 index fd2d600..6b561db 100644
 --- a/kernel/printk.c
 +++ b/kernel/printk.c
 @@ -384,6 +384,10 @@ static int check_syslog_permissions(int type, bool 
 from_file,
   || type == SYSLOG_ACTION_CONSOLE_LEVEL)
   ns = &init_syslog_ns;
  
 + /* create a new syslog ns */
 + if (type == SYSLOG_ACTION_NEW_NS)
 + return 0;
 +

Don't we need a further permission or capability check here? Returning success
directly seems sloppy.

Thanks,
Gu

   if (syslog_action_restricted(type, ns)) {
   if (ns_capable(ns->owner, CAP_SYSLOG))
   return 0;
 @@ -1131,6 +1135,51 @@ static int syslog_print_all(char __user *buf, int 
 size, bool clear,
   return len;
  }
  
 +static int create_syslog_ns(void)
 +{
 + struct user_namespace *userns = current_user_ns();
 + struct syslog_namespace *oldns, *newns;
 + int err;
 +
 + /*
 +  * syslog ns belongs to a user ns.  So you can only unshare your
 +  * user_ns if you share a user_ns with your parent userns
 +  */
 + if (userns == &init_user_ns ||
 + userns->syslog_ns != userns->parent->syslog_ns)
 + return -EINVAL;
 +
 + if (!ns_capable(userns, CAP_SYSLOG))
 + return -EPERM;
 +
 + err = -ENOMEM;
 + oldns = userns->syslog_ns;
 + newns = kzalloc(sizeof(*newns), GFP_ATOMIC);
 + if (!newns)
 + goto out;
 + newns->log_buf_len = __LOG_BUF_LEN;
 + newns->log_buf = kzalloc(newns->log_buf_len, GFP_ATOMIC);
 + if (!newns->log_buf)
 + goto out;
 +
 + newns->owner = get_user_ns(userns);
 + raw_spin_lock_init(&(newns->logbuf_lock));
 + newns->logbuf_cpu = UINT_MAX;
 + newns->dmesg_restrict = oldns->dmesg_restrict;
 + put_syslog_ns(oldns);
 + kref_init(&newns->kref);
 + userns->syslog_ns = newns;
 + newns = NULL;
 +
 + err = 0;
 +out:
 + if (newns) {
 + kfree(newns->log_buf);
 + kfree(newns);
 + }
 + return err;
 +}
 +
  int do_syslog(int type, char __user *buf, int len, bool from_file,
   struct syslog_namespace *ns)
  {
 @@ -1254,6 +1303,9 @@ int do_syslog(int type, char __user *buf, int len, bool 
 from_file,
   case SYSLOG_ACTION_SIZE_BUFFER:
   error = ns->log_buf_len;
   break;
 + case SYSLOG_ACTION_NEW_NS:
 + error = create_syslog_ns();
 + break;
   default:
   error = -EINVAL;
   break;



Re: [PATCH 8/9] syslog_ns: implement ns_printk for specific syslog_ns

2013-07-29 Thread Gu Zheng

Hi Rui,

On 07/29/2013 10:31 AM, Rui Xiang wrote:

 Add a new interface named ns_printk, with an extra
 parameter ns. Logs that belong to a container can
 be printed with ns_printk.

One question: with syslog_ns in use, does the log printed by *printk* on the
host also contain the logs of each syslog_ns (printed with ns_printk)?

Thanks,
Gu

 
 Signed-off-by: Rui Xiang rui.xi...@huawei.com
 ---
  include/linux/printk.h |  4 
  kernel/printk.c| 53 
 ++
  2 files changed, 53 insertions(+), 4 deletions(-)
 
 diff --git a/include/linux/printk.h b/include/linux/printk.h
 index 29e3f85..bf83ad9 100644
 --- a/include/linux/printk.h
 +++ b/include/linux/printk.h
 @@ -6,6 +6,7 @@
  #include linux/kern_levels.h
  #include linux/linkage.h
  
 +struct syslog_namespace;
  extern const char linux_banner[];
  extern const char linux_proc_banner[];
  
 @@ -123,6 +124,9 @@ asmlinkage int printk_emit(int facility, int level,
  asmlinkage __printf(1, 2) __cold
  int printk(const char *fmt, ...);
  
 +asmlinkage __printf(2, 3) __cold
 +int ns_printk(struct syslog_namespace *ns, const char *fmt, ...);
 +
  /*
   * Special printk facility for scheduler use only, _DO_NOT_USE_ !
   */
 diff --git a/kernel/printk.c b/kernel/printk.c
 index 6b561db..56a8b27 100644
 --- a/kernel/printk.c
 +++ b/kernel/printk.c
 @@ -1554,9 +1554,10 @@ static size_t cont_print_text(char *text, size_t size)
   return textlen;
  }
  
 -asmlinkage int vprintk_emit(int facility, int level,
 - const char *dict, size_t dictlen,
 - const char *fmt, va_list args)
 +static int ns_vprintk_emit(int facility, int level,
 + const char *dict, size_t dictlen,
 + const char *fmt, va_list args,
 + struct syslog_namespace *ns)
  {
   static int recursion_bug;
   static char textbuf[LOG_LINE_MAX];
 @@ -1566,7 +1567,6 @@ asmlinkage int vprintk_emit(int facility, int level,
   unsigned long flags;
   int this_cpu;
   int printed_len = 0;
 - struct syslog_namespace *ns = &init_syslog_ns;
  
   boot_delay_msec(level);
   printk_delay();
 @@ -1697,6 +1697,14 @@ out_restore_irqs:
  
   return printed_len;
  }
 +
 +asmlinkage int vprintk_emit(int facility, int level,
 + const char *dict, size_t dictlen,
 + const char *fmt, va_list args)
 +{
 + return ns_vprintk_emit(facility, level, dict, dictlen, fmt, args,
 + &init_syslog_ns);
 +}
  EXPORT_SYMBOL(vprintk_emit);
  
  asmlinkage int vprintk(const char *fmt, va_list args)
 @@ -1762,6 +1770,43 @@ asmlinkage int printk(const char *fmt, ...)
  }
  EXPORT_SYMBOL(printk);
  
 +/**
 + * ns_printk - print a kernel message in syslog_ns
 + * @ns: syslog namespace
 + * @fmt: format string
 + *
 + * This is ns_printk().
 + * It can be called from container context. We add a param
 + * ns to record current syslog namespace, because we need to
 + * print some log which are not generated by host, but contaner.
 + *
 + * See the vsnprintf() documentation for format string extensions over C99.
 + **/
 +asmlinkage int ns_printk(struct syslog_namespace *ns,
 + const char *fmt, ...)
 +{
 + va_list args;
 + int r;
 +
 + if (!ns)
 + ns = current_user_ns()->syslog_ns;
 +
 +#ifdef CONFIG_KGDB_KDB
 + if (unlikely(kdb_trap_printk)) {
 + va_start(args, fmt);
 + r = vkdb_printf(fmt, args);
 + va_end(args);
 + return r;
 + }
 +#endif
 + va_start(args, fmt);
 + r = ns_vprintk_emit(0, -1, NULL, 0, fmt, args, ns);
 + va_end(args);
 +
 + return r;
 +}
 +EXPORT_SYMBOL(ns_printk);
 +

Can we clean up printk() here by reusing the ns_printk() path?
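A variadic function cannot forward its "..." to another variadic function, so any sharing between printk() and ns_printk() has to happen at the va_list level, which the patch already has as ns_vprintk_emit(). The userspace sketch below shows that shape under assumptions: struct syslog_ns_model, init_ns_model and the *_model functions are hypothetical, vsnprintf() into a per-namespace buffer stands in for the real log-record handling, and the duplicated CONFIG_KGDB_KDB branch is left out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a syslog namespace's log buffer. */
struct syslog_ns_model {
	char buf[256];
	size_t len;
};

static struct syslog_ns_model init_ns_model;  /* plays the role of init_syslog_ns */

/* Shared va_list core, in the role of ns_vprintk_emit(). */
static int ns_vprintk_model(struct syslog_ns_model *ns, const char *fmt, va_list args)
{
	size_t room = sizeof(ns->buf) - ns->len;  /* always >= 1 */
	int n = vsnprintf(ns->buf + ns->len, room, fmt, args);

	if (n < 0)
		return n;
	ns->len += ((size_t)n < room) ? (size_t)n : room - 1;  /* clamp on truncation */
	return n;
}

/* ns_printk(): the patch falls back to current_user_ns()->syslog_ns for a
 * NULL ns; this model has no notion of "current", so it requires one. */
static int ns_printk_model(struct syslog_ns_model *ns, const char *fmt, ...)
{
	va_list args;
	int r;

	assert(ns != NULL);
	va_start(args, fmt);
	r = ns_vprintk_model(ns, fmt, args);
	va_end(args);
	return r;
}

/* printk() as a thin wrapper: same core, fixed to the init namespace. */
static int printk_model(const char *fmt, ...)
{
	va_list args;
	int r;

	va_start(args, fmt);
	r = ns_vprintk_model(&init_ns_model, fmt, args);
	va_end(args);
	return r;
}
```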

  #else /* CONFIG_PRINTK */
  
  #define LOG_LINE_MAX 0



Re: [PATCH 1/9] syslog_ns: add syslog_namespace and put/get_syslog_ns

2013-07-29 Thread Gu Zheng

On 07/29/2013 07:47 PM, Rui Xiang wrote:

 On 2013/7/29 17:40, Gu Zheng wrote:
 Hi Rui,
  Refer to inline:).

 Hi Gu,
 
 Thanks for your attention.
 
 On 07/29/2013 10:31 AM, Rui Xiang wrote:

 Add a struct syslog_namespace which contains the necessary
 members for handling syslog, and implement the get_syslog_ns
 and put_syslog_ns APIs.

 Signed-off-by: Rui Xiang rui.xi...@huawei.com
 ---
  include/linux/syslog.h | 68 
 ++
  kernel/printk.c|  7 --
  2 files changed, 68 insertions(+), 7 deletions(-)

 
 ...
 
 +
 +static inline void free_syslog_ns(struct kref *kref)
 +{
 +   struct syslog_namespace *ns;
 +   ns = container_of(kref, struct syslog_namespace, kref);
 +
 +   kfree(ns->log_buf);
 +   kfree(ns);
 +}

 This interface seems a bit ugly; why not use the same form as put_syslog_ns()?

 static inline void free_syslog_ns(struct syslog_namespace *ns)

 
 free_syslog_ns() is used in put_syslog_ns(), and kref_put() passes the kref
 as the parameter of its release function. You can see that from
 static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)).

Got it.

Regards,
Gu

 
 Thanks.
 
 +
 +static inline void put_syslog_ns(struct syslog_namespace *ns)
 +{
 +   if (ns)
 +   kref_put(&ns->kref, free_syslog_ns);
 +}
 +
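Rui's answer can be illustrated with a small userspace model of the kref pattern: kref_put() hands the release callback only the embedded `struct kref *`, and container_of() recovers the enclosing object. This is a sketch; `struct demo_kref`, `demo_kref_put()` and `struct demo_syslog_ns` are hypothetical, non-atomic stand-ins, whereas the kernel's kref uses atomic operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, non-atomic stand-in for struct kref. */
struct demo_kref {
	int refcount;
};

/* Same shape as kref_put(): the release callback only sees the kref. */
static int demo_kref_put(struct demo_kref *kref,
			 void (*release)(struct demo_kref *kref))
{
	assert(kref->refcount > 0);
	if (--kref->refcount == 0) {
		release(kref);
		return 1;
	}
	return 0;
}

/* Hypothetical analogue of struct syslog_namespace. */
struct demo_syslog_ns {
	char *log_buf;
	struct demo_kref kref;
};

static int demo_freed;	/* counts release calls, for illustration */

/* Must take struct demo_kref * to match the release signature. */
static void demo_free_syslog_ns(struct demo_kref *kref)
{
	struct demo_syslog_ns *ns =
		container_of(kref, struct demo_syslog_ns, kref);

	free(ns->log_buf);
	free(ns);
	demo_freed++;
}

static void demo_put_syslog_ns(struct demo_syslog_ns *ns)
{
	if (ns)
		demo_kref_put(&ns->kref, demo_free_syslog_ns);
}
```

A `free_syslog_ns(struct syslog_namespace *ns)` signature would not match the callback type kref_put() expects, which is why the release function takes the kref.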

 
 



Re: [PATCH 7/9] syslog_ns: implement function for creating syslog ns

2013-07-29 Thread Gu Zheng

On 07/30/2013 11:39 AM, Rui Xiang wrote:

 On 2013/7/29 18:25, Gu Zheng wrote:
 Hi Rui,

 On 07/29/2013 10:31 AM, Rui Xiang wrote:

 Add a create_syslog_ns function to create a new ns. We
 must create a user_ns before creating a new syslog ns,
 and then tie the new syslog_ns to the current user_ns
 instead of the original syslog_ns, which comes from the
 parent user_ns.
 
 ...
 
 diff --git a/kernel/printk.c b/kernel/printk.c
 index fd2d600..6b561db 100644
 --- a/kernel/printk.c
 +++ b/kernel/printk.c
 @@ -384,6 +384,10 @@ static int check_syslog_permissions(int type, bool from_file,
 || type == SYSLOG_ACTION_CONSOLE_LEVEL)
 ns = &init_syslog_ns;
  
 +   /* create a new syslog ns */
 +   if (type == SYSLOG_ACTION_NEW_NS)
 +   return 0;
 +

 Don't we need a further permission or capability check here? Returning success
 directly seems sloppy.

 CAP_SYSLOG is checked in create_syslog_ns, so I think we can return 0 for now.

If so, why not move the check here? IMO, the earlier the permission check,
the better. What's your opinion?

Regards,
Gu
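A userspace sketch of the ordering suggested above, where the privilege test runs before any namespace is created, so the failure path has nothing to undo. The boolean `capable_syslog` argument stands in for the CAP_SYSLOG check, and the action codes and `demo_*` functions are hypothetical; this is not the patch's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical action codes, not the real SYSLOG_ACTION_* values. */
enum demo_syslog_action {
	DEMO_ACTION_READ_ALL,
	DEMO_ACTION_NEW_NS,
};

static int demo_ns_created;	/* stands in for namespaces allocated */

/* Permission check done up front, as suggested in the review. */
static int demo_check_syslog_permissions(enum demo_syslog_action type,
					 bool capable_syslog)
{
	if (type == DEMO_ACTION_NEW_NS && !capable_syslog)
		return -EPERM;
	return 0;
}

static int demo_do_syslog(enum demo_syslog_action type, bool capable_syslog)
{
	int err = demo_check_syslog_permissions(type, capable_syslog);

	if (err)
		return err;	/* nothing allocated yet, nothing to unwind */
	if (type == DEMO_ACTION_NEW_NS)
		demo_ns_created++;	/* stands in for create_syslog_ns() */
	return 0;
}
```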

 
 
 
 



Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-25 Thread Gu Zheng

Hi Rafael,

On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:

 From: Rafael J. Wysocki rafael.j.wyso...@intel.com
 
 There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
 around the acpi_bus_trim() call in acpi_scan_hot_remove() which
 generally removes devices (it removes ACPI device objects at least,
 but it may also remove physical device objects through .detach()
 callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
 attributes are removed under these locks and to remove those
 attributes it is necessary to hold the s_active references of their
 directory entries for writing.
 
 On the other hand, the execution of a .show() or .store() callback
 from a sysfs attribute is carried out with that attribute's s_active
 reference held for reading.  Consequently, if any device sysfs
 attribute that may be removed from within acpi_scan_hot_remove()
 through acpi_bus_trim() has a .store() or .show() callback which
 acquires either acpi_scan_lock or device_hotplug_lock, the execution
 of that callback may deadlock with the removal of the attribute.
 [Unfortunately, the online device attribute of CPUs and memory
 blocks and the eject attribute of ACPI device objects are affected
 by this issue.]
 
 To avoid those deadlocks introduce a new protection mechanism that
 can be used by the device sysfs attributes in question.  Namely,
 if a device sysfs attribute's .store() or .show() callback routine
 is about to acquire device_hotplug_lock or acpi_scan_lock, it can
 first execute read_lock_device_remove() and return an error code if
 that function returns false.  If true is returned, the lock in
 question may be acquired and read_unlock_device_remove() must be
 called.  [This mechanism is implemented by means of an additional
 rwsem in drivers/base/core.c.]
 
 Make the affected sysfs attributes in the driver core and ACPI core
 use read_lock_device_remove() and read_unlock_device_remove() as
 described above.
 
 Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 Reported-by: Gu Zheng guz.f...@cn.fujitsu.com

Sorry, I forgot to mention that the original reporter is
Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com. I continued
the investigation and found more issues.

We tested this patch on kernel 3.11-rc6, but the issue seems
to still be there. Details follow.

Thanks,
Gu

==
[ INFO: possible circular locking dependency detected ]
3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF
---
kworker/0:2/754 is trying to acquire lock:
 (s_active#73){.+}, at: [8121062b] sysfs_addrm_finish+0x3b/0x70

but task is already holding lock:
 (mem_sysfs_mutex){+.+.+.}, at: [813b949d] remove_memory_block+0x1d/0xa0

which lock already depends on the new lock.


the existing dependency chain (in reverse order

[PATCH] drivers/base/memory.c: introduce help macro to_memory_block

2013-08-26 Thread Gu Zheng

Introduce the helper macro to_memory_block() to hide the
device -> memory_block conversion. This is just a cleanup.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/base/memory.c |   27 ---
 1 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 2b7813e..4a874c6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -30,6 +30,8 @@ static DEFINE_MUTEX(mem_sysfs_mutex);
 
 #define MEMORY_CLASS_NAME  "memory"
 
+#define to_memory_block(dev) container_of(dev, struct memory_block, dev)
+
 static int sections_per_block;
 
 static inline int base_memory_block_id(int section_nr)
@@ -77,7 +79,7 @@ EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 
 static void memory_block_release(struct device *dev)
 {
-   struct memory_block *mem = container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
 
kfree(mem);
 }
@@ -110,8 +112,7 @@ static unsigned long get_memory_block_size(void)
 static ssize_t show_mem_start_phys_index(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
 
phys_index = mem->start_section_nr / sections_per_block;
@@ -121,8 +122,7 @@ static ssize_t show_mem_start_phys_index(struct device *dev,
 static ssize_t show_mem_end_phys_index(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
 
phys_index = mem->end_section_nr / sections_per_block;
@@ -137,8 +137,7 @@ static ssize_t show_mem_removable(struct device *dev,
 {
unsigned long i, pfn;
int ret = 1;
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
 
for (i = 0; i < sections_per_block; i++) {
pfn = section_nr_to_pfn(mem->start_section_nr + i);
@@ -154,8 +153,7 @@ static ssize_t show_mem_removable(struct device *dev,
 static ssize_t show_mem_state(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
ssize_t len = 0;
 
/*
@@ -280,7 +278,7 @@ static int __memory_block_change_state(struct memory_block *mem,
 
 static int memory_subsys_online(struct device *dev)
 {
-   struct memory_block *mem = container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
int ret;
 
mutex_lock(&mem->state_mutex);
@@ -295,7 +293,7 @@ static int memory_subsys_online(struct device *dev)
 
 static int memory_subsys_offline(struct device *dev)
 {
-   struct memory_block *mem = container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
int ret;
 
mutex_lock(&mem->state_mutex);
@@ -349,7 +347,7 @@ store_mem_state(struct device *dev,
bool offline;
int ret = -EINVAL;
 
-   mem = container_of(dev, struct memory_block, dev);
+   mem = to_memory_block(dev);
 
lock_device_hotplug();
 
@@ -392,8 +390,7 @@ store_mem_state(struct device *dev,
 static ssize_t show_phys_device(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
return sprintf(buf, "%d\n", mem->phys_device);
 }
 
@@ -525,7 +522,7 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
put_device(&hint->dev);
if (!dev)
return NULL;
-   return container_of(dev, struct memory_block, dev);
+   return to_memory_block(dev);
 }
 
 /*
-- 
1.7.7
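To show what the helper does, here is a self-contained userspace sketch with a local offsetof()-based container_of() and trimmed-down, hypothetical `struct device`/`struct memory_block` definitions. It recovers the enclosing memory_block from a pointer to its embedded `dev` member.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed-down stand-ins for the kernel structures. */
struct device {
	const char *init_name;
};

struct memory_block {
	unsigned long start_section_nr;
	unsigned long end_section_nr;
	int phys_device;
	struct device dev;	/* embedded, not a pointer */
};

/* Parameter named d so it cannot collide with the member name dev. */
#define to_memory_block(d) container_of(d, struct memory_block, dev)

static unsigned long demo_start_phys_index(struct device *dev,
					   int sections_per_block)
{
	struct memory_block *mem = to_memory_block(dev);

	assert(sections_per_block > 0);
	return mem->start_section_nr / sections_per_block;
}
```

One design detail: the patch names the macro parameter `dev`, which is also the member name, so the preprocessor substitutes the argument into the member position as well. That works only because every caller passes a variable literally named `dev`; something like `to_memory_block(&mb.dev)` would not compile with that definition. The sketch names the parameter `d` to avoid the pitfall.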


Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Gu Zheng

Hi Rafael,

On 08/26/2013 10:43 PM, Rafael J. Wysocki wrote:

 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,

 Hi,

 On 08/26/2013 04:09 AM, Rafael J. Wysocki wrote:

 From: Rafael J. Wysocki rafael.j.wyso...@intel.com

 There are two mutexes, device_hotplug_lock and acpi_scan_lock, held
 around the acpi_bus_trim() call in acpi_scan_hot_remove() which
 generally removes devices (it removes ACPI device objects at least,
 but it may also remove physical device objects through .detach()
 callbacks of ACPI scan handlers).  Thus, potentially, device sysfs
 attributes are removed under these locks and to remove those
 attributes it is necessary to hold the s_active references of their
 directory entries for writing.

 On the other hand, the execution of a .show() or .store() callback
 from a sysfs attribute is carried out with that attribute's s_active
 reference held for reading.  Consequently, if any device sysfs
 attribute that may be removed from within acpi_scan_hot_remove()
 through acpi_bus_trim() has a .store() or .show() callback which
 acquires either acpi_scan_lock or device_hotplug_lock, the execution
 of that callback may deadlock with the removal of the attribute.
 [Unfortunately, the online device attribute of CPUs and memory
 blocks and the eject attribute of ACPI device objects are affected
 by this issue.]

 To avoid those deadlocks introduce a new protection mechanism that
 can be used by the device sysfs attributes in question.  Namely,
 if a device sysfs attribute's .store() or .show() callback routine
 is about to acquire device_hotplug_lock or acpi_scan_lock, it can
 first execute read_lock_device_remove() and return an error code if
 that function returns false.  If true is returned, the lock in
 question may be acquired and read_unlock_device_remove() must be
 called.  [This mechanism is implemented by means of an additional
 rwsem in drivers/base/core.c.]

 Make the affected sysfs attributes in the driver core and ACPI core
 use read_lock_device_remove() and read_unlock_device_remove() as
 described above.

 Signed-off-by: Rafael J. Wysocki rafael.j.wyso...@intel.com
 Reported-by: Gu Zheng guz.f...@cn.fujitsu.com

 Sorry, I forgot to mention that the original reporter is
 Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com. I continued
 the investigation and found more issues.

 We tested this patch on kernel 3.11-rc6, but the issue seems
 to still be there. Details follow.

 Well, taking pm_mutex under acpi_scan_lock (trace #2) is a bad idea anyway,
 because we'll need to take acpi_scan_lock during system suspend for PCI hot
 remove to work and that's under pm_mutex.  So I wonder if we can simply
 drop the system sleep locking from lock/unlock_memory_hotplug().  But that's
 a side note, because dropping it won't help here.

 Now -

 ==
 [ INFO: possible circular locking dependency detected ]
 3.11.0-rc6-lockdebug-refea+ #162 Tainted: GF
 ---
 kworker/0:2/754 is trying to acquire lock:
  (s_active#73){.+}, at: [8121062b] sysfs_addrm_finish+0x3b/0x70

 but task is already holding lock:
  (mem_sysfs_mutex){+.+.+.}, at: [813b949d] remove_memory_block+0x1d/0xa0

 which lock already depends on the new lock

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-26 Thread Gu Zheng

Hi Rafael,

On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:

 On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,
 
 [...]
 

 OK, so the patch below is quick and dirty and overkill, but it should make the
 splat go away at least.
 
 And if this patch does make the splat go away for you, please also test the
 appended one (Tejun, thanks for the hint!).

Yes, this one works too, and as expected, the ACPI part is still there.

Thanks,
Gu

==
[ INFO: possible circular locking dependency detected ]
3.11.0-rc6-fix-refeal-fix-01+ #171 Tainted: GF
---
kworker/0:1/96 is trying to acquire lock:
 (s_active#245){.+}, at: [8121062b] sysfs_addrm_finish+0x3b/0x70

but task is already holding lock:
 (device_hotplug_lock){+.+.+.}, at: [813a16b7] lock_device_hotplug+0x17/0x20

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (device_hotplug_lock){+.+.+.}:
       [810ba88c] validate_chain+0x70c/0x870
       [810bad5f] __lock_acquire+0x36f/0x5f0
       [810bb080] lock_acquire+0xa0/0x130
       [8159779b] mutex_lock_nested+0x7b/0x3b0
       [813a16b7] lock_device_hotplug+0x17/0x20
       [8131c131] acpi_scan_bus_device_check+0x33/0x10f
       [8131c220] acpi_scan_device_check+0x13/0x15
       [81315dac] acpi_os_execute_deferred+0x27/0x34
       [8106bec8] process_one_work+0x1e8/0x560
       [8106d0a0] worker_thread+0x120/0x3a0
       [81073b5e] kthread+0xee/0x100
       [815a5fdc] ret_from_fork+0x7c/0xb0

-> #1 (acpi_scan_lock){+.+.+.}:
       [810ba88c] validate_chain+0x70c/0x870
       [810bad5f] __lock_acquire+0x36f/0x5f0
       [810bb080] lock_acquire+0xa0/0x130
       [8159779b] mutex_lock_nested+0x7b/0x3b0
       [8131a58a] acpi_eject_store+0x88/0x170
       [813a0f40] dev_attr_store+0x20/0x30
       [8120ed96] sysfs_write_file+0xe6/0x170
       [81195bc8] vfs_write+0xc8/0x170

Re: [PATCH] f2fs: fix omitting to update inode page

2013-08-26 Thread Gu Zheng

On 08/26/2013 08:28 PM, Jaegeuk Kim wrote:

 The f2fs_set_link updates its parent inode number, so we should sync this to
 the inode block.
 Otherwise, the data can be lost after sudden-power-off.
 
 Signed-off-by: Jaegeuk Kim jaegeuk@samsung.com
 ---
  fs/f2fs/namei.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
 index 4e47518..9e90d31 100644
 --- a/fs/f2fs/namei.c
 +++ b/fs/f2fs/namei.c
 @@ -447,6 +447,7 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
   else
   release_orphan_inode(sbi);
  
 + update_inode_page(old_inode):

':' -> ';'

   update_inode_page(new_inode);
   } else {
   err = f2fs_add_link(new_dentry, old_inode);



Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Gu Zheng

Hi Rafael,

On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:

 On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,
 
 [...]
 

 OK, so the patch below is quick and dirty and overkill, but it should make the
 splat go away at least.
 
 And if this patch does make the splat go away for you, please also test the
 appended one (Tejun, thanks for the hint!).
 
 I'll address the ACPI part differently later.

What about changing device_hotplug_lock and acpi_scan_lock to rwsems, as in the
attached patch? (In a preliminary test, it also makes the splat go away.) :)

Regards,
Gu

 
[...]
 


From f1682ceaef4105f75f4d6a0bb8e77c8a5dde365b Mon Sep 17 00:00:00 2001
From: Gu Zheng guz.f...@cn.fujitsu.com
Date: Tue, 27 Aug 2013 17:59:55 +0900
Subject: [PATCH] acpi: fix removal lock dep


Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/acpi/scan.c    |   43 ++-
 drivers/acpi/sysfs.c   |    7 +--
 drivers/base/core.c    |   45 -
 drivers/base/memory.c  |    5 +++--
 include/linux/device.h |    8 ++--
 5 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 8a46c92..bb41760 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -36,7 +36,7 @@ bool acpi_force_hot_remove;
 static const char *dummy_hid = "device";
 
 static LIST_HEAD(acpi_bus_id_list);
-static DEFINE_MUTEX(acpi_scan_lock);
+static DECLARE_RWSEM(acpi_scan_rwsem);
 static LIST_HEAD(acpi_scan_handlers_list);
 DEFINE_MUTEX(acpi_device_lock);
 LIST_HEAD(acpi_wakeup_device_list);
@@ -49,13 +49,13 @@ struct acpi_device_bus_id{
 
 void acpi_scan_lock_acquire(void)
 {
-   mutex_lock(&acpi_scan_lock);
+   down_write(&acpi_scan_rwsem);
 }
 EXPORT_SYMBOL_GPL(acpi_scan_lock_acquire);
 
 void acpi_scan_lock_release(void)
 {
-   mutex_unlock(&acpi_scan_lock);
+   up_write(&acpi_scan_rwsem);
 }
 EXPORT_SYMBOL_GPL(acpi_scan_lock_release);
 
@@ -207,7 +207,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
return -EINVAL;
}
 
-   lock_device_hotplug();
+   device_hotplug_begin();
 
/*
 * Carry out two passes here and ignore errors in the first pass,
@@ -240,7 +240,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
acpi_bus_online_companions, NULL,
NULL, NULL);
 
-   unlock_device_hotplug();
+   device_hotplug_end();
 
put_device(&device->dev);
return -EBUSY;
@@ -252,7 +252,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
 
acpi_bus_trim(device);
 
-   unlock_device_hotplug();
+   device_hotplug_end();
 
/* Device node has been unregistered. */
put_device(&device->dev);
@@ -308,7 +308,7 @@ static void acpi_bus_device_eject(void *context)
struct acpi_scan_handler *handler;
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
 
-   mutex_lock(&acpi_scan_lock);
+   acpi_scan_lock_acquire();
 
acpi_bus_get_device(handle, &device);
if (!device)
@@ -334,7 +334,7 @@ static void acpi_bus_device_eject(void *context)
}
 
  out:
-   mutex_unlock(&acpi_scan_lock);
+   acpi_scan_lock_release();
return;
 
  err_out:
@@ -349,8 +349,8 @@ static void acpi_scan_bus_device_check(acpi_handle handle, u32 ost_source)
u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
int error;
 
-   mutex_lock(&acpi_scan_lock);
-   lock_device_hotplug();
+   acpi_scan_lock_acquire();
+   device_hotplug_begin();
 
if (ost_source != ACPI_NOTIFY_BUS_CHECK) {
acpi_bus_get_device(handle, &device);
@@ -376,9 +376,9 @@ static void acpi_scan_bus_device_check(acpi_handle handle, u32 ost_source)
kobject_uevent(&device->dev.kobj, KOBJ_ONLINE);
 
  out:
-   unlock_device_hotplug();
+   device_hotplug_end();
acpi_evaluate_hotplug_ost(handle, ost_source, ost_code, NULL);
-   mutex_unlock(&acpi_scan_lock);
+   acpi_scan_lock_release();
 }
 
 static void acpi_scan_bus_check(void *context)
@@ -469,15 +469,14 @@ void acpi_bus_hot_remove_device(void *context)
acpi_handle handle = device-handle;
int error;
 
-   mutex_lock(&acpi_scan_lock);
+   acpi_scan_lock_acquire();
 
error = acpi_scan_hot_remove(device);
if (error && handle)
acpi_evaluate_hotplug_ost(handle, ej_event->event

Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-27 Thread Gu Zheng

Hi Toshi,

On 08/28/2013 05:38 AM, Toshi Kani wrote:

 On Tue, 2013-08-27 at 17:21 +0800, Gu Zheng wrote:
 Hi Rafael,

 On 08/26/2013 11:02 PM, Rafael J. Wysocki wrote:

 On Monday, August 26, 2013 04:43:26 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 02:42:09 PM Rafael J. Wysocki wrote:
 On Monday, August 26, 2013 11:13:13 AM Gu Zheng wrote:
 Hi Rafael,

 [...]


 OK, so the patch below is quick and dirty and overkill, but it should make the
 splat go away at least.

 And if this patch does make the splat go away for you, please also test the
 appended one (Tejun, thanks for the hint!).

 I'll address the ACPI part differently later.

 What about changing device_hotplug_lock and acpi_scan_lock to rwsems, as in the
 attached patch? (In a preliminary test, it also makes the splat go away.) :)
 
 I am curious how msleep(10) and restart_syscall() work in the change
 below.  Doesn't the msleep() keep s_active held longer, which can
 make the thread holding device_hotplug_lock wait on it for deletion?

Yes, but it can avoid busy waiting. 

 Also, does restart_syscall() release s_active and reopen this file
 again?

Sure. It just sets the TIF_SIGPENDING flag and returns -ERESTARTNOINTR, and
s_active/the file will be released/closed on the failure path. When do_signal()
catches -ERESTARTNOINTR, it changes the registers to restart the syscall.

Thanks,
Gu
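In userspace terms, the msleep() plus restart_syscall() combination behaves like a retry loop in which each attempt starts from scratch: the callback drops what it holds, sleeps briefly instead of spinning, and the whole operation is re-issued. A sketch with pthreads, where `DEMO_ERESTART`, `demo_show_online()` and `demo_syscall()` are hypothetical and nanosleep() stands in for msleep(10). Signal delivery and register rewriting are not modelled, and the loop is capped only so the example terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define DEMO_ERESTART 1000	/* hypothetical "restart me" code */

static pthread_mutex_t demo_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_attempts;

/* One attempt: never blocks on the hotplug lock. */
static int demo_show_online(void)
{
	struct timespec ts = { 0, 10 * 1000 * 1000 };	/* ~10 ms */

	demo_attempts++;
	if (pthread_mutex_trylock(&demo_hotplug_lock) != 0) {
		nanosleep(&ts, NULL);	/* back off instead of busy waiting */
		return -DEMO_ERESTART;
	}
	/* ... read device state under the lock ... */
	pthread_mutex_unlock(&demo_hotplug_lock);
	return 0;
}

/* Plays the role of the signal path re-issuing the syscall. */
static int demo_syscall(int max_attempts)
{
	int ret;

	do {
		ret = demo_show_online();
	} while (ret == -DEMO_ERESTART && --max_attempts > 0);
	return ret;
}
```

Because each attempt returns before retrying, nothing acquired during the attempt (s_active, in the kernel case) stays held across the sleep-and-restart cycle except for the short sleep itself, which is the trade-off Toshi points out.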

 
 @@ -408,9 +408,13 @@ static ssize_t show_online(struct device *dev, struct device_attribute *attr,
  {
 bool val;
 
 -   lock_device_hotplug();
 +   if (!read_lock_device_hotplug()) {
 +   msleep(10);
 +   return restart_syscall();
 +   }
 +
 
 Thanks,
 -Toshi
 
 



[PATCH RESEND] drivers/base/memory.c: introduce help macro to_memory_block

2013-08-28 Thread Gu Zheng

Introduce the helper macro to_memory_block() to hide the
device -> memory_block conversion. This is just a cleanup.

Reviewed-by: Yasuaki Ishimatsu  isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/base/memory.c |   29 -
 1 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 2a38cd2..69e09a1 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -29,6 +29,8 @@ static DEFINE_MUTEX(mem_sysfs_mutex);
 
 #define MEMORY_CLASS_NAME  "memory"
 
+#define to_memory_block(dev) container_of(dev, struct memory_block, dev)
+
 static int sections_per_block;
 
 static inline int base_memory_block_id(int section_nr)
@@ -76,7 +78,7 @@ EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 
 static void memory_block_release(struct device *dev)
 {
-   struct memory_block *mem = container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
 
kfree(mem);
 }
@@ -109,8 +111,7 @@ static unsigned long get_memory_block_size(void)
 static ssize_t show_mem_start_phys_index(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
 
phys_index = mem->start_section_nr / sections_per_block;
@@ -120,8 +121,7 @@ static ssize_t show_mem_start_phys_index(struct device *dev,
 static ssize_t show_mem_end_phys_index(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
unsigned long phys_index;
 
phys_index = mem->end_section_nr / sections_per_block;
@@ -136,8 +136,7 @@ static ssize_t show_mem_removable(struct device *dev,
 {
unsigned long i, pfn;
int ret = 1;
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
 
for (i = 0; i < sections_per_block; i++) {
pfn = section_nr_to_pfn(mem->start_section_nr + i);
@@ -153,8 +152,7 @@ static ssize_t show_mem_removable(struct device *dev,
 static ssize_t show_mem_state(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
ssize_t len = 0;
 
/*
@@ -282,7 +280,7 @@ static int memory_block_change_state(struct memory_block *mem,
 /* The device lock serializes operations on memory_subsys_[online|offline] */
 static int memory_subsys_online(struct device *dev)
 {
-   struct memory_block *mem = container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
int ret;
 
if (mem->state == MEM_ONLINE)
@@ -306,7 +304,7 @@ static int memory_subsys_online(struct device *dev)
 
 static int memory_subsys_offline(struct device *dev)
 {
-   struct memory_block *mem = container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
 
if (mem->state == MEM_OFFLINE)
return 0;
@@ -318,11 +316,9 @@ static ssize_t
 store_mem_state(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
 {
-   struct memory_block *mem;
+   struct memory_block *mem = to_memory_block(dev);
int ret, online_type;
 
-   mem = container_of(dev, struct memory_block, dev);
-
lock_device_hotplug();
 
if (!strncmp(buf, "online_kernel", min_t(int, count, 13)))
@@ -376,8 +372,7 @@ store_mem_state(struct device *dev,
 static ssize_t show_phys_device(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   struct memory_block *mem =
-   container_of(dev, struct memory_block, dev);
+   struct memory_block *mem = to_memory_block(dev);
return sprintf(buf, "%d\n", mem->phys_device);
 }
 
@@ -509,7 +504,7 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
put_device(&hint->dev);
if (!dev)
return NULL;
-   return container_of(dev, struct memory_block, dev);
+   return to_memory_block(dev);
 }
 
 /*
-- 
1.7.7



Re: [PATCH] driver core / ACPI: Avoid device removal locking problems

2013-08-28 Thread Gu Zheng

Hi Rafael,

On 08/28/2013 05:45 AM, Rafael J. Wysocki wrote:

 On Tuesday, August 27, 2013 02:36:44 PM Tejun Heo wrote:
 Hello,

[...]
 
 I've thought about that a bit over the last several hours and I'm still
 thinking that that patch is a bit overkill, because it will trigger the
 restart_syscall() for all cases when device_hotplug_lock is locked, even
 if they can't lead to any deadlocks.  The only deadlockish situation is
 when device *removal* is in progress when store_online(), for example,
 is called.
 
 So to address that particular situation without adding too much overhead for
 other cases, I've come up with the appended patch (untested for now).
 
 This is how it is supposed to work.
 
 There are three lock levels for device hotplug: "normal", "remove"
 and "weak".  The difference is related to how __lock_device_hotplug()
 works.  Namely, if device hotplug is currently locked, that function
 will either block or return false, depending on the current lock
 level and its argument (the new lock level).  The rules here are
 that false is returned immediately if the current lock level is
 "remove" and the new lock level is "weak".  The function blocks
 for all other combinations of the two.
 
 There are two functions supposed to use device hotplug lock levels
 other than "normal": store_online() and acpi_scan_hot_remove().
 Everybody else is supposed to use "normal" (well, there are more
 potential users of "weak" in drivers/base/memory.c).
 
 acpi_scan_hot_remove() uses the "remove" lock level to indicate
 that it is going to remove devices while holding device hotplug
 locked.  In turn, store_online() uses the "weak" lock level so
 that it doesn't block when devices are being removed with device
 hotplug locked, because that may lead to a deadlock.
 
 show_online() actually doesn't need to lock device hotplug, but
 it is useful to serialize it with respect to device_offline()
 and device_online() (in case user space attempts to run them
 concurrently).
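The lock-level rule described above can be modelled in user space. A "weak" request fails immediately while a "remove" holder owns the lock, and every other combination blocks. This is a minimal sketch that assumes POSIX threads. The names hp_lock, hp_lock_acquire(), hp_lock_release() and the HP_* levels are hypothetical. Unlike the patch, it does not record which task holds the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum hp_level { HP_NONE, HP_NORMAL, HP_REMOVE, HP_WEAK };

struct hp_lock {
	pthread_mutex_t lock;  /* protects held and level */
	pthread_cond_t cond;   /* broadcast whenever the lock is released */
	bool held;
	enum hp_level level;   /* level of the current holder */
};

#define HP_LOCK_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, HP_NONE }

/* Returns true once acquired; returns false without acquiring when a
 * WEAK request finds a REMOVE holder. All other combinations wait. */
static bool hp_lock_acquire(struct hp_lock *l, enum hp_level level)
{
	bool ret = true;

	pthread_mutex_lock(&l->lock);
	while (l->held) {
		if (level == HP_WEAK && l->level == HP_REMOVE) {
			ret = false;
			break;
		}
		pthread_cond_wait(&l->cond, &l->lock);
	}
	if (ret) {
		l->held = true;
		l->level = level;
	}
	pthread_mutex_unlock(&l->lock);
	return ret;
}

static void hp_lock_release(struct hp_lock *l)
{
	pthread_mutex_lock(&l->lock);
	assert(l->held);   /* rough analogue of the BUG_ON() in the patch */
	l->held = false;
	l->level = HP_NONE;
	pthread_cond_broadcast(&l->cond);
	pthread_mutex_unlock(&l->lock);
}
```

Each release is broadcast, so every waiter re-evaluates the rule after it wakes. This mirrors the schedule()/recheck loop in __lock_device_hotplug(): a weak waiter that wakes to find a new "remove" holder also gives up.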

Yeah. I tested this one on the latest kernel tree, and it does make the splat go away.
Looking forward to the ACPI part. :)

Regards,
Gu

 
 ---
  drivers/acpi/scan.c|4 +-
  drivers/base/core.c|   72 ++---
  include/linux/device.h |   25 -
  3 files changed, 83 insertions(+), 18 deletions(-)
 
 Index: linux-pm/drivers/base/core.c
 ===
 --- linux-pm.orig/drivers/base/core.c
 +++ linux-pm/drivers/base/core.c
 @@ -49,6 +49,55 @@ static struct kobject *dev_kobj;
  struct kobject *sysfs_dev_char_kobj;
  struct kobject *sysfs_dev_block_kobj;
  
 +static struct {
 + struct task_struct *holder;
 + enum dev_hotplug_lock_type type;
 + struct mutex lock; /* Synchronizes accesses to holder and type */
 + wait_queue_head_t wait_queue;
 +} device_hotplug = {
 + .holder = NULL,
 + .type = DEV_HOTPLUG_LOCK_NONE,
 + .lock = __MUTEX_INITIALIZER(device_hotplug.lock),
 + .wait_queue = __WAIT_QUEUE_HEAD_INITIALIZER(device_hotplug.wait_queue),
 +};
 +
 +bool __lock_device_hotplug(enum dev_hotplug_lock_type type)
 +{
 + DEFINE_WAIT(wait);
 + bool ret = true;
 +
 + mutex_lock(&device_hotplug.lock);
 + for (;;) {
 + prepare_to_wait(&device_hotplug.wait_queue, &wait,
 + TASK_UNINTERRUPTIBLE);
 + if (!device_hotplug.holder) {
 + device_hotplug.holder = current;
 + device_hotplug.type = type;
 + break;
 + } else if (type == DEV_HOTPLUG_LOCK_WEAK &&
 +  device_hotplug.type == DEV_HOTPLUG_LOCK_REMOVE) {
 + ret = false;
 + break;
 + }
 + mutex_unlock(&device_hotplug.lock);
 + schedule();
 + mutex_lock(&device_hotplug.lock);
 + }
 + finish_wait(&device_hotplug.wait_queue, &wait);
 + mutex_unlock(&device_hotplug.lock);
 + return ret;
 +}
 +
 +void unlock_device_hotplug(void)
 +{
 + mutex_lock(&device_hotplug.lock);
 + BUG_ON(device_hotplug.holder != current);
 + device_hotplug.holder = NULL;
 + device_hotplug.type = DEV_HOTPLUG_LOCK_NONE;
 + wake_up(&device_hotplug.wait_queue);
 + mutex_unlock(&device_hotplug.lock);
 +}
 +
  #ifdef CONFIG_BLOCK
  static inline int device_is_not_partition(struct device *dev)
  {
 @@ -408,9 +457,10 @@ static ssize_t show_online(struct device
  {
   bool val;
  
 - lock_device_hotplug();
 + /* Serialize against device_online() and device_offline(). */
 + device_lock(dev);
  val = !dev->offline;
 - unlock_device_hotplug();
 + device_unlock(dev);
  return sprintf(buf, "%u\n", val);
  }
  
 @@ -424,7 +474,11 @@ static ssize_t store_online(struct devic
  if (ret < 0)
   return ret;
  
 - lock_device_hotplug();
 + if (!__lock_device_hotplug(DEV_HOTPLUG_LOCK_WEAK)) {
 + /* Avoid

[PATCH] f2fs: fix a compound statement label error

2013-08-18 Thread Gu Zheng

From 685b72b66cb8ce019429b1958c91f346b260bc65 Mon Sep 17 00:00:00 2001
From: Gu Zheng <guz.f...@cn.fujitsu.com>
Date: Mon, 19 Aug 2013 09:41:15 +0800
Subject: [PATCH] f2fs: fix a compound statement label error
A "label at end of compound statement" error occurs if CONFIG_F2FS_STAT_FS
is disabled:

fs/f2fs/segment.c:556:1: error: label at end of compound statement

So remove the 'out' label to fix it.

Reported-by: Fengguang Wu <fengguang...@intel.com>
Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
---
 fs/f2fs/segment.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9c45b8e..09af9c7 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -540,12 +540,9 @@ static void allocate_segment_by_default(struct f2fs_sb_info *sbi,
 {
struct curseg_info *curseg = CURSEG_I(sbi, type);
 
-   if (force) {
+   if (force)
new_curseg(sbi, type, true);
-   goto out;
-   }
-
-   if (type == CURSEG_WARM_NODE)
+   else if (type == CURSEG_WARM_NODE)
new_curseg(sbi, type, false);
else if (curseg->alloc_type == LFS && is_next_segment_free(sbi, type))
new_curseg(sbi, type, false);
@@ -553,7 +550,6 @@ static void allocate_segment_by_default(struct f2fs_sb_info *sbi,
change_curseg(sbi, type, true);
else
new_curseg(sbi, type, false);
-out:
 #ifdef CONFIG_F2FS_STAT_FS
sbi->segment_count[curseg->alloc_type]++;
 #endif
-- 
1.7.7
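When CONFIG_F2FS_STAT_FS is disabled, the preprocessor drops the #ifdef block. That leaves "out:" directly before the closing brace, and C before C23 requires a label to be followed by a statement. The sketch below mirrors the restructured control flow in user space. MODEL_STAT_FS, last_path and the numeric branch codes are hypothetical stand-ins, not f2fs code.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_F2FS_STAT_FS. Delete this line to
 * build the "stats disabled" variant; the function still compiles. */
#define MODEL_STAT_FS 1

static int last_path;      /* which branch ran, for illustration only */
static int segment_count;  /* only updated when MODEL_STAT_FS is defined */

/*
 * Shape of allocate_segment_by_default() after the patch: the forced
 * case is folded into the if / else if chain, so no "out:" label is
 * needed and nothing can end up as "out: }" when the option is off.
 */
static void allocate_by_default(int force, int warm_node)
{
	if (force)
		last_path = 0;
	else if (warm_node)
		last_path = 1;
	else
		last_path = 2;
#ifdef MODEL_STAT_FS
	segment_count++;
#endif
}
```

Another possible fix would have been an empty statement after the label ("out: ;"). Folding the forced case into the else-if chain removes the goto entirely instead.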


[PATCH] f2fs: use strncasecmp() simplify the string comparison

2013-08-22 Thread Gu Zheng

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
---
 fs/f2fs/namei.c |   12 +---
 1 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 4e47518..106c0b4 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -83,21 +83,11 @@ static int is_multimedia_file(const unsigned char *s, const char *sub)
 {
size_t slen = strlen(s);
size_t sublen = strlen(sub);
-   int ret;
 
if (sublen > slen)
return 0;
 
-   ret = memcmp(s + slen - sublen, sub, sublen);
-   if (ret) {  /* compare upper case */
-   int i;
-   char upper_sub[8];
-   for (i = 0; i < sublen && i < sizeof(upper_sub); i++)
-   upper_sub[i] = toupper(sub[i]);
-   return !memcmp(s + slen - sublen, upper_sub, sublen);
-   }
-
-   return !ret;
+   return !strncasecmp(s + slen - sublen, sub, sublen);
 }
 
 /*
-- 
1.7.7
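The patched check is a case-insensitive suffix match. Here is a user-space sketch. It assumes that POSIX strncasecmp() from <strings.h> behaves like the kernel's version for ASCII input, and it simplifies the argument types to const char *.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Returns 1 if s ends with sub, ignoring ASCII case; 0 otherwise. */
static int is_multimedia_file(const char *s, const char *sub)
{
	size_t slen = strlen(s);
	size_t sublen = strlen(sub);

	if (sublen > slen)
		return 0;

	return !strncasecmp(s + slen - sublen, sub, sublen);
}
```

A mixed-case name such as "clip.Mp4" matches with the new code. The removed lines only tried the exact suffix and an all-uppercase copy of it, so they would have missed that case. For a suffix longer than 8 characters, the old memcmp() would also have read past the end of upper_sub[8].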


Re: [PATCH] ocfs2/refcounttree: add the missing NULL check of the return value of find_or_create_page()

2013-07-09 Thread Gu Zheng

On 07/10/2013 06:11 AM, Joel Becker wrote:

 On Mon, Jul 08, 2013 at 03:52:53PM +0800, Gu Zheng wrote:
 Add the missing NULL check of the return value of find_or_create_page() in
 function ocfs2_duplicate_clusters_by_page().

 Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
 ---
  fs/ocfs2/refcounttree.c |6 +-
  1 files changed, 5 insertions(+), 1 deletions(-)

 diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
 index 998b17e..456d0e4 100644
 --- a/fs/ocfs2/refcounttree.c
 +++ b/fs/ocfs2/refcounttree.c
 @@ -2965,7 +2965,11 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle,
  to = map_end & (PAGE_CACHE_SIZE - 1);

  page = find_or_create_page(mapping, page_index, GFP_NOFS);
 -
 +if (!page) {
 +ret = -ENOMEM;
 +mlog_errno(ret);
 +break;
 +}
  /*
    * In case PAGE_CACHE_SIZE <= CLUSTER_SIZE, This page
   * can't be dirtied before we CoW it out.
 
 Put a blank line between the closing brace and the comment.  Otherwise,

Got it.:)

 Acked-by: Joel Becker <jl...@evilplan.org>

Thanks~

Regards,
Gu

 
 Joel
 -- 
 1.7.7
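The ocfs2 fix adds a common pattern: check the result of a page lookup that may fail inside a copy loop, record -ENOMEM, log it, and break instead of dereferencing NULL. Here is that pattern as a user-space sketch. model_get_page() is a hypothetical stand-in for find_or_create_page() that fails on demand, and fprintf() stands in for mlog_errno().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096  /* assumed page size */

/* Hypothetical stand-in for find_or_create_page(): succeeds for indexes
 * below fail_at, then returns NULL to simulate allocation failure. */
static void *model_get_page(int index, int fail_at)
{
	if (index >= fail_at)
		return NULL;
	return calloc(1, MODEL_PAGE_SIZE);
}

/* Processes nr pages; *done reports how many were handled. */
static int model_duplicate_pages(int nr, int fail_at, int *done)
{
	int ret = 0;

	*done = 0;
	for (int i = 0; i < nr; i++) {
		void *page = model_get_page(i, fail_at);
		if (!page) {
			ret = -ENOMEM;
			/* stands in for mlog_errno(ret) */
			fprintf(stderr, "error %d on page %d\n", ret, i);
			break;
		}

		/* ... the copy-on-write work on the page would go here ... */
		free(page);
		(*done)++;
	}
	return ret;
}
```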

 



Re: [PATCH 1/2] fs/anon_inode: Introduce a new lib function, anon_inode_getfile_private()

2013-07-11 Thread Gu Zheng

ping...


On 07/08/2013 06:38 PM, Gu Zheng wrote:

 Introduce a new library function, anon_inode_getfile_private(). Like
 anon_inode_getfile(), it creates a new file instance by hooking it up to an
 anonymous inode and a dentry that describes the class of the file, but each
 file gets its own inode instead of sharing a single one. Anyone who wants
 to create a private anon file can benefit from this change.
 
 Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
 Signed-off-by: Benjamin LaHaise <b...@kvack.org>
 ---
  fs/anon_inodes.c|   66 +++
  include/linux/anon_inodes.h |3 ++
  2 files changed, 69 insertions(+), 0 deletions(-)
 
 diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
 index 47a65df..85c9618 100644
 --- a/fs/anon_inodes.c
 +++ b/fs/anon_inodes.c
 @@ -109,6 +109,72 @@ static struct file_system_type anon_inode_fs_type = {
  };
 
  /**
 + * anon_inode_getfile_private - creates a new file instance by hooking it up to an
 + *  anonymous inode, and a dentry that describe the class
 + *  of the file
 + *
 + * @name:[in]name of the class of the new file
 + * @fops:[in]file operations for the new file
 + * @priv:[in]private data for the new file (will be file's private_data)
 + * @flags:   [in]flags
 + *
 + *
 + * Similar to anon_inode_getfile, but each file holds a single inode.
 + *
 + */
 +struct file *anon_inode_getfile_private(const char *name,
 + const struct file_operations *fops,
 + void *priv, int flags)
 +{
 + struct qstr this;
 + struct path path;
 + struct file *file;
 + struct inode *inode;
 +
 + if (fops->owner && !try_module_get(fops->owner))
 + return ERR_PTR(-ENOENT);
 +
 + inode = anon_inode_mkinode(anon_inode_mnt->mnt_sb);
 + if (IS_ERR(inode)) {
 + file = ERR_PTR(-ENOMEM);
 + goto err_module;
 + }
 +
 + /*
 +  * Link the inode to a directory entry by creating a unique name
 +  * using the inode sequence number.
 +  */
 + file = ERR_PTR(-ENOMEM);
 + this.name = name;
 + this.len = strlen(name);
 + this.hash = 0;
 + path.dentry = d_alloc_pseudo(anon_inode_mnt->mnt_sb, &this);
 + if (!path.dentry)
 + goto err_module;
 +
 + path.mnt = mntget(anon_inode_mnt);
 +
 + d_instantiate(path.dentry, inode);
 +
 + file = alloc_file(&path, OPEN_FMODE(flags), fops);
 + if (IS_ERR(file))
 + goto err_dput;
 +
 + file->f_mapping = inode->i_mapping;
 + file->f_flags = flags & (O_ACCMODE | O_NONBLOCK);
 + file->private_data = priv;
 +
 + return file;
 +
 +err_dput:
 + path_put(&path);
 +err_module:
 + module_put(fops->owner);
 + return file;
 +}
 +EXPORT_SYMBOL_GPL(anon_inode_getfile_private);
 +
 +/**
   * anon_inode_getfile - creates a new file instance by hooking it up to an
   *  anonymous inode, and a dentry that describe the class
   *  of the file
 diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
 index 8013a45..cf573c2 100644
 --- a/include/linux/anon_inodes.h
 +++ b/include/linux/anon_inodes.h
 @@ -13,6 +13,9 @@ struct file_operations;
  struct file *anon_inode_getfile(const char *name,
   const struct file_operations *fops,
   void *priv, int flags);
 +struct file *anon_inode_getfile_private(const char *name,
 + const struct file_operations *fops,
 + void *priv, int flags);
  int anon_inode_getfd(const char *name, const struct file_operations *fops,
void *priv, int flags);
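The difference between anon_inode_getfile() and the proposed anon_inode_getfile_private() is whether files share one backing inode. The toy user-space model below uses only hypothetical names. It shows why sharing matters for per-context ring data: two "files" from the shared constructor see each other's writes, while private ones do not.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: the "inode" owns the data (think: ring pages),
 * and a "file" only points at an inode. Not kernel APIs. */
struct model_inode {
	int ring_head;
};

struct model_file {
	struct model_inode *inode;
};

static struct model_inode shared_inode;  /* one inode for every caller */

/* Like anon_inode_getfile(): every file is backed by the same inode. */
static struct model_file getfile_shared(void)
{
	struct model_file f = { &shared_inode };
	return f;
}

/* Like anon_inode_getfile_private(): each file gets a fresh inode.
 * On allocation failure the returned file has inode == NULL. */
static struct model_file getfile_private(void)
{
	struct model_file f = { calloc(1, sizeof(struct model_inode)) };
	return f;
}
```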
 



Re: [PATCH 0/2] Add support to aio ring pages migration

2013-07-11 Thread Gu Zheng

ping...

On 07/08/2013 06:38 PM, Gu Zheng wrote:

 Currently aio ring pages are allocated from the movable zone via
 get_user_pages(). As discussed in thread https://lkml.org/lkml/2012/11/29/69,
 this can easily pin user pages for a long time, which is fatal for the memory
 hotplug/remove framework.
 
 As Mel Gorman suggested, implementing a migration callback that unpins the
 pages, blocks operations until migration completes, and then pins the new pfns
 can solve this issue. The best place to hold the callback is the address space
 operations, which can be found via page->mapping.
 
 But the current aio ring pages are anonymous pages, which don't have
 address_space_operations, so we use an anon inode file as the aio ring file to
 manage the aio ring pages. That way we can implement the callback and register
 it as page->mapping->a_ops->migratepage.
 
 But there's a problem: all files created by anon_inode_getfile() share the
 same inode, so multiple aio contexts would share the same aio ring pages,
 which would mix up their io events. To solve this issue, we introduce a new
 function, anon_inode_getfile_private(), which is similar to
 anon_inode_getfile() but gives each new file its own anon inode.
 
 This work is based on Benjamin's patch,
 http://www.spinics.net/lists/linux-fsdevel/msg66014.html
 
 Gu Zheng (2):
   fs/anon_inode: Introduce a new lib function anon_inode_getfile_private()
   fs/aio: Add support to aio ring pages migration
 
  fs/aio.c|  120 +++
  fs/anon_inodes.c|   66 +++
  include/linux/anon_inodes.h |3 +
  include/linux/migrate.h |3 +
  mm/migrate.c|2 +-
  5 files changed, 182 insertions(+), 12 deletions(-)
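The core of the migration callback in this series works in three steps. Allocate a replacement page. While holding the lock that completions also take, copy the old contents and repoint the context's ring_pages[] entry. Then drop the old page. Below is a user-space sketch of just that step, assuming POSIX threads. model_ctx and model_migratepage() are hypothetical. The sketch skips everything the real aio_migratepage() does with the page cache, such as migrate_page_move_mapping() and page reference counts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096   /* assumed page size */
#define MODEL_NR_PAGES  4

/* Hypothetical model of a kioctx: the ring page table plus the lock
 * that event completion would also take. */
struct model_ctx {
	pthread_mutex_t completion_lock;
	unsigned char *ring_pages[MODEL_NR_PAGES];
};

/* Move ring page idx to a new buffer; returns 0 on success, -1 on OOM.
 * Copy and repoint happen under completion_lock so that no completion
 * can write to the old page after it has been copied. */
static int model_migratepage(struct model_ctx *ctx, unsigned idx)
{
	unsigned char *new_page, *old_page;

	assert(idx < MODEL_NR_PAGES);
	new_page = malloc(MODEL_PAGE_SIZE);
	if (!new_page)
		return -1;

	pthread_mutex_lock(&ctx->completion_lock);
	old_page = ctx->ring_pages[idx];
	memcpy(new_page, old_page, MODEL_PAGE_SIZE);
	ctx->ring_pages[idx] = new_page;
	pthread_mutex_unlock(&ctx->completion_lock);

	free(old_page);   /* nothing references the old copy any more */
	return 0;
}
```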
 



Re: [PATCH 2/2] fs/aio: Add support to aio ring pages migration

2013-07-11 Thread Gu Zheng

ping...

On 07/08/2013 06:38 PM, Gu Zheng wrote:

 Because an aio job pins its ring pages, memory migration of those pages
 fails. To fix this problem, we use an anon inode to manage the aio ring
 pages and set up the migratepage callback in the anon inode's address
 space, so that during memory migration the aio ring pages can be moved
 to another memory node safely.
 
 Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
 Signed-off-by: Benjamin LaHaise <b...@kvack.org>
 ---
  fs/aio.c|  120 ++
  include/linux/migrate.h |3 +
  mm/migrate.c|2 +-
  3 files changed, 113 insertions(+), 12 deletions(-)
 
 diff --git a/fs/aio.c b/fs/aio.c
 index 9b5ca11..d10f956 100644
 --- a/fs/aio.c
 +++ b/fs/aio.c
 @@ -35,6 +35,9 @@
  #include <linux/eventfd.h>
  #include <linux/blkdev.h>
  #include <linux/compat.h>
 +#include <linux/anon_inodes.h>
 +#include <linux/migrate.h>
 +#include <linux/ramfs.h>
 
  #include <asm/kmap_types.h>
  #include <asm/uaccess.h>
 @@ -110,6 +113,7 @@ struct kioctx {
   } cacheline_aligned_in_smp;
 
   struct page *internal_pages[AIO_RING_PAGES];
 + struct file *aio_ring_file;
  };
 
  /*-- sysctl variables*/
 @@ -138,15 +142,78 @@ __initcall(aio_setup);
 
  static void aio_free_ring(struct kioctx *ctx)
  {
 - long i;
 + int i;
 + struct file *aio_ring_file = ctx->aio_ring_file;
 
 - for (i = 0; i < ctx->nr_pages; i++)
 + for (i = 0; i < ctx->nr_pages; i++) {
 + pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
 + page_count(ctx->ring_pages[i]));
   put_page(ctx->ring_pages[i]);
 + }
 
  if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages)
  kfree(ctx->ring_pages);
 +
 + if (aio_ring_file) {
 + truncate_setsize(aio_ring_file->f_inode, 0);
 + pr_debug("pid(%d) i_nlink=%u d_count=%d d_unhashed=%d i_count=%d\n",
 + current->pid, aio_ring_file->f_inode->i_nlink,
 + aio_ring_file->f_path.dentry->d_count,
 + d_unhashed(aio_ring_file->f_path.dentry),
 + atomic_read(&aio_ring_file->f_inode->i_count));
 + fput(aio_ring_file);
 + ctx->aio_ring_file = NULL;
 + }
 +}
 +
 +static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
 +{
 + vma->vm_ops = &generic_file_vm_ops;
 + return 0;
 +}
 +
 +static const struct file_operations aio_ring_fops = {
 + .mmap = aio_ring_mmap,
 +};
 +
 +static int aio_set_page_dirty(struct page *page)
 +{
 + return 0;
  }
 
 +static int aio_migratepage(struct address_space *mapping, struct page *new,
 + struct page *old, enum migrate_mode mode)
 +{
 + struct kioctx *ctx = mapping->private_data;
 + unsigned long flags;
 + unsigned idx = old->index;
 + int rc;
 +
 + /*Writeback must be complete*/
 + BUG_ON(PageWriteback(old));
 + put_page(old);
 +
 + rc = migrate_page_move_mapping(mapping, new, old, NULL, mode);
 + if (rc != MIGRATEPAGE_SUCCESS) {
 + get_page(old);
 + return rc;
 + }
 +
 + get_page(new);
 +
 + spin_lock_irqsave(&ctx->completion_lock, flags);
 + migrate_page_copy(new, old);
 + ctx->ring_pages[idx] = new;
 + spin_unlock_irqrestore(&ctx->completion_lock, flags);
 +
 + return rc;
 +}
 +
 +static const struct address_space_operations aio_ctx_aops = {
 + .set_page_dirty = aio_set_page_dirty,
 + .migratepage= aio_migratepage,
 +};
 +
  static int aio_setup_ring(struct kioctx *ctx)
  {
   struct aio_ring *ring;
 @@ -154,20 +221,45 @@ static int aio_setup_ring(struct kioctx *ctx)
  struct mm_struct *mm = current->mm;
   unsigned long size, populate;
   int nr_pages;
 + int i;
 + struct file *file;
 
   /* Compensate for the ring buffer's head/tail overlap entry */
   nr_events += 2; /* 1 is required, 2 for good luck */
 
   size = sizeof(struct aio_ring);
   size += sizeof(struct io_event) * nr_events;
 - nr_pages = (size + PAGE_SIZE-1) >> PAGE_SHIFT;
 
 + nr_pages = (size + PAGE_SIZE-1) >> PAGE_SHIFT;
  if (nr_pages < 0)
   return -EINVAL;
 
 - nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);
 + file = anon_inode_getfile_private("[aio]", &aio_ring_fops, ctx, O_RDWR);
 + if (IS_ERR(file)) {
 + ctx->aio_ring_file = NULL;
 + return -EAGAIN;
 + }
 +
 + file->f_inode->i_mapping->a_ops = &aio_ctx_aops;
 + file->f_inode->i_mapping->private_data = ctx;
 + file->f_inode->i_size = PAGE_SIZE * (loff_t)nr_pages;
 +
 + for (i = 0; i < nr_pages; i++) {
 + struct page *page;
 + page = find_or_create_page(file->f_inode->i_mapping,
 +i, GFP_HIGHUSER | __GFP_ZERO);
 + if (!page)
 + break

[PATCH] f2fs: introduce help function F2FS_NODE()

2013-07-15 Thread Gu Zheng

Introduce a helper function, F2FS_NODE(), to simplify the conversion of a
node page to a struct f2fs_node.


Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
---
 fs/f2fs/data.c |2 +-
 fs/f2fs/dir.c  |2 +-
 fs/f2fs/f2fs.h |9 +++--
 fs/f2fs/file.c |2 +-
 fs/f2fs/inode.c|4 ++--
 fs/f2fs/node.c |   10 +-
 fs/f2fs/node.h |   40 
 fs/f2fs/recovery.c |6 ++
 8 files changed, 35 insertions(+), 40 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 035f9a3..c73c394 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -39,7 +39,7 @@ static void __set_data_blkaddr(struct dnode_of_data *dn, block_t new_addr)

wait_on_page_writeback(node_page);

-   rn = (struct f2fs_node *)page_address(node_page);
+   rn = F2FS_NODE(node_page);

/* Get physical address of data block */
addr_array = blkaddr_in_node(rn);
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 62f0d59..89ecb37 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -270,7 +270,7 @@ static void init_dent_inode(const struct qstr *name, struct page *ipage)
struct f2fs_node *rn;

/* copy name info. to this inode page */
-   rn = (struct f2fs_node *)page_address(ipage);
+   rn = F2FS_NODE(ipage);
rn->i.i_namelen = cpu_to_le32(name->len);
memcpy(rn->i.i_name, name->name, name->len);
set_page_dirty(ipage);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c7620b9..ffa34f4 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -455,6 +455,11 @@ static inline struct f2fs_checkpoint *F2FS_CKPT(struct f2fs_sb_info *sbi)
return (struct f2fs_checkpoint *)(sbi->ckpt);
 }

+static inline struct f2fs_node *F2FS_NODE(struct page *page)
+{
+   return (struct f2fs_node *)page_address(page);
+}
+
 static inline struct f2fs_nm_info *NM_I(struct f2fs_sb_info *sbi)
 {
return (struct f2fs_nm_info *)(sbi->nm_info);
@@ -813,7 +818,7 @@ static inline struct kmem_cache *f2fs_kmem_cache_create(const char *name,

 static inline bool IS_INODE(struct page *page)
 {
-   struct f2fs_node *p = (struct f2fs_node *)page_address(page);
+   struct f2fs_node *p = F2FS_NODE(page);
return RAW_IS_INODE(p);
 }

@@ -827,7 +832,7 @@ static inline block_t datablock_addr(struct page *node_page,
 {
struct f2fs_node *raw_node;
__le32 *addr_array;
-   raw_node = (struct f2fs_node *)page_address(node_page);
+   raw_node = F2FS_NODE(node_page);
addr_array = blkaddr_in_node(raw_node);
return le32_to_cpu(addr_array[offset]);
 }
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 157a635..65ca3b3 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -206,7 +206,7 @@ int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
struct f2fs_node *raw_node;
__le32 *addr;

-   raw_node = page_address(dn->node_page);
+   raw_node = F2FS_NODE(dn->node_page);
addr = blkaddr_in_node(raw_node) + ofs;

for ( ; count > 0; count--, addr++, dn->ofs_in_node++) {
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 2b2d45d1..debf743 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -56,7 +56,7 @@ static int do_read_inode(struct inode *inode)
if (IS_ERR(node_page))
return PTR_ERR(node_page);

-   rn = page_address(node_page);
+   rn = F2FS_NODE(node_page);
ri = &(rn->i);

inode->i_mode = le16_to_cpu(ri->i_mode);
@@ -153,7 +153,7 @@ void update_inode(struct inode *inode, struct page *node_page)

wait_on_page_writeback(node_page);

-   rn = page_address(node_page);
+   rn = F2FS_NODE(node_page);
ri = &(rn->i);

ri->i_mode = cpu_to_le16(inode->i_mode);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b418aee..f5172e2 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -565,7 +565,7 @@ static int truncate_nodes(struct dnode_of_data *dn, unsigned int nofs,
return PTR_ERR(page);
}

-   rn = (struct f2fs_node *)page_address(page);
+   rn = F2FS_NODE(page);
if (depth < 3) {
for (i = ofs; i < NIDS_PER_BLOCK; i++, freed++) {
child_nid = le32_to_cpu(rn->in.nid[i]);
@@ -698,7 +698,7 @@ restart:
set_new_dnode(&dn, inode, page, NULL, 0);
unlock_page(page);

-   rn = page_address(page);
+   rn = F2FS_NODE(page);
switch (level) {
case 0:
case 1:
@@ -1484,8 +1484,8 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
SetPageUptodate(ipage);
fill_node_footer(ipage, ino, ino, 0, true);

-   src = (struct f2fs_node *)page_address(page);
-   dst = (struct f2fs_node *)page_address(ipage);
+   src = F2FS_NODE(page);
+   dst = F2FS_NODE(ipage);

memcpy(dst, src, (unsigned long)&src->i.i_ext - (unsigned long)&src->i);
dst->i.i_size = 0;
@@ -1535,7 +1535,7 @@ int restore_node_summary(struct f2fs_sb_info *sbi

[PATCH] fs/f2fs: Code cleanup and simplify in func {find/add}_gc_inode

2013-06-20 Thread Gu Zheng


Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
---
 fs/f2fs/gc.c |   17 +
 1 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 1496159..0b8b439 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -314,28 +314,21 @@ static const struct victim_selection default_v_ops = {

 static struct inode *find_gc_inode(nid_t ino, struct list_head *ilist)
 {
-   struct list_head *this;
struct inode_entry *ie;

-   list_for_each(this, ilist) {
-   ie = list_entry(this, struct inode_entry, list);
+   list_for_each_entry(ie, ilist, list)
if (ie->inode->i_ino == ino)
return ie->inode;
-   }
return NULL;
 }

 static void add_gc_inode(struct inode *inode, struct list_head *ilist)
 {
-   struct list_head *this;
-   struct inode_entry *new_ie, *ie;
+   struct inode_entry *new_ie;

-   list_for_each(this, ilist) {
-   ie = list_entry(this, struct inode_entry, list);
-   if (ie->inode == inode) {
-   iput(inode);
-   return;
-   }
+   if (inode == find_gc_inode(inode->i_ino, ilist)) {
+   iput(inode);
+   return;
}
 repeat:
new_ie = kmem_cache_alloc(winode_slab, GFP_NOFS);
-- 
1.7.7
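The gc.c cleanup replaces an open-coded list_for_each() plus list_entry() with list_for_each_entry(). It then reuses find_gc_inode() for the duplicate check in add_gc_inode(). Below is a user-space sketch of the same shape. It uses simplified copies of the list helpers and assumes GCC or Clang for __typeof__. Here, inode_entry is a hypothetical stand-in keyed only by inode number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified user-space copies of the kernel's intrusive list helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_for_each_entry(pos, head, member)                          \
	for (pos = list_entry((head)->next, __typeof__(*pos), member);  \
	     &pos->member != (head);                                    \
	     pos = list_entry(pos->member.next, __typeof__(*pos), member))

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Hypothetical stand-in for struct inode_entry. */
struct inode_entry {
	unsigned long ino;
	struct list_head list;
};

/* Same shape as the patched find_gc_inode(). */
static struct inode_entry *find_entry(unsigned long ino, struct list_head *ilist)
{
	struct inode_entry *ie;

	list_for_each_entry(ie, ilist, list)
		if (ie->ino == ino)
			return ie;
	return NULL;
}

/* Same shape as the patched add_gc_inode(): reuse the lookup to skip
 * duplicates. Returns 1 if added, 0 if already present or out of memory. */
static int add_entry(unsigned long ino, struct list_head *ilist)
{
	struct inode_entry *ie;

	if (find_entry(ino, ilist))
		return 0;
	ie = malloc(sizeof(*ie));
	if (!ie)
		return 0;
	ie->ino = ino;
	list_add_tail(&ie->list, ilist);
	return 1;
}
```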

[PATCH] fs/jffs2: remove the unused paramters of function jffs2_{compress,decompress}

2013-07-16 Thread Gu Zheng

Remove the unused parameters of the functions jffs2_{compress,decompress}.


Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
---
 fs/jffs2/compr.c |   12 ++--
 fs/jffs2/compr.h |   12 ++--
 fs/jffs2/gc.c|2 +-
 fs/jffs2/read.c  |2 +-
 fs/jffs2/write.c |2 +-
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/jffs2/compr.c b/fs/jffs2/compr.c
index 4849a4c..6fcb426 100644
--- a/fs/jffs2/compr.c
+++ b/fs/jffs2/compr.c
@@ -145,9 +145,9 @@ static int jffs2_selected_compress(u8 compr, unsigned char *data_in,
  * jffs2_compress should compress as much as will fit, and should set
  * *datalen accordingly to show the amount of data which were compressed.
  */
-uint16_t jffs2_compress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
-   unsigned char *data_in, unsigned char **cpage_out,
-   uint32_t *datalen, uint32_t *cdatalen)
+uint16_t jffs2_compress(struct jffs2_sb_info *c, unsigned char *data_in,
+   unsigned char **cpage_out, uint32_t *datalen,
+   uint32_t *cdatalen)
 {
int ret = JFFS2_COMPR_NONE;
int mode, compr_ret;
@@ -250,9 +250,9 @@ uint16_t jffs2_compress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
return ret;
 }

-int jffs2_decompress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
-uint16_t comprtype, unsigned char *cdata_in,
-unsigned char *data_out, uint32_t cdatalen, uint32_t datalen)
+int jffs2_decompress(uint16_t comprtype, unsigned char *cdata_in,
+unsigned char *data_out, uint32_t cdatalen,
+uint32_t datalen)
 {
struct jffs2_compressor *this;
int ret;
diff --git a/fs/jffs2/compr.h b/fs/jffs2/compr.h
index 5e91d57..092089a 100644
--- a/fs/jffs2/compr.h
+++ b/fs/jffs2/compr.h
@@ -70,13 +70,13 @@ int jffs2_unregister_compressor(struct jffs2_compressor *comp);
 int jffs2_compressors_init(void);
 int jffs2_compressors_exit(void);

-uint16_t jffs2_compress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
-   unsigned char *data_in, unsigned char **cpage_out,
-   uint32_t *datalen, uint32_t *cdatalen);
+uint16_t jffs2_compress(struct jffs2_sb_info *c, unsigned char *data_in,
+   unsigned char **cpage_out, uint32_t *datalen,
+   uint32_t *cdatalen);

-int jffs2_decompress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
-uint16_t comprtype, unsigned char *cdata_in,
-unsigned char *data_out, uint32_t cdatalen, uint32_t datalen);
+int jffs2_decompress(uint16_t comprtype, unsigned char *cdata_in,
+unsigned char *data_out, uint32_t cdatalen,
+uint32_t datalen);

 void jffs2_free_comprbuf(unsigned char *comprbuf, unsigned char *orig);

diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index 5a2dec2..8dc85aa 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -1330,7 +1330,7 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era

writebuf = pg_ptr + (offset & (PAGE_CACHE_SIZE -1));

-   comprtype = jffs2_compress(c, f, writebuf, &comprbuf, &datalen, &cdatalen);
+   comprtype = jffs2_compress(c, writebuf, &comprbuf, &datalen, &cdatalen);

ri.magic = cpu_to_je16(JFFS2_MAGIC_BITMASK);
ri.nodetype = cpu_to_je16(JFFS2_NODETYPE_INODE);
diff --git a/fs/jffs2/read.c b/fs/jffs2/read.c
index 0b042b1..6395f41 100644
--- a/fs/jffs2/read.c
+++ b/fs/jffs2/read.c
@@ -132,7 +132,7 @@ int jffs2_read_dnode(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
jffs2_dbg(2, "Decompress %d bytes from %p to %d bytes at %p\n",
  je32_to_cpu(ri->csize), readbuf,
  je32_to_cpu(ri->dsize), decomprbuf);
-   ret = jffs2_decompress(c, f, ri->compr | (ri->usercompr << 8), readbuf, decomprbuf, je32_to_cpu(ri->csize), je32_to_cpu(ri->dsize));
+   ret = jffs2_decompress(ri->compr | (ri->usercompr << 8), readbuf, decomprbuf, je32_to_cpu(ri->csize), je32_to_cpu(ri->dsize));
if (ret) {
pr_warn("Error: jffs2_decompress returned %d\n", ret);
goto out_decomprbuf;
diff --git a/fs/jffs2/write.c b/fs/jffs2/write.c
index b634de4..dbc26de 100644
--- a/fs/jffs2/write.c
+++ b/fs/jffs2/write.c
@@ -369,7 +369,7 @@ int jffs2_write_inode_range(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
datalen = min_t(uint32_t, writelen, PAGE_CACHE_SIZE - (offset & (PAGE_CACHE_SIZE-1)));
cdatalen = min_t(uint32_t, alloclen - sizeof(*ri), datalen);

-   comprtype = jffs2_compress(c, f, buf, &comprbuf, &datalen, &cdatalen);
+   comprtype = jffs2_compress(c, buf, &comprbuf, &datalen, &cdatalen);

ri-magic = cpu_to_je16(JFFS2_MAGIC_BITMASK);
ri

[PATCH RESEND 0/2] Add support to aio ring pages migration

2013-07-16 Thread Gu Zheng

Currently aio ring pages are allocated from the movable zone via
get_user_pages(). As discussed in thread https://lkml.org/lkml/2012/11/29/69,
this can easily pin user pages for a long time, which is fatal for the memory
hotplug/remove framework.

As Mel Gorman suggested, implementing a migration callback that unpins the
pages, blocks operations until migration completes, and then pins the new pfns
can solve this issue. The best place to hold the callback is the address space
operations, which can be found via page->mapping.

But the current aio ring pages are anonymous pages, which don't have
address_space_operations, so we use an anon inode file as the aio ring file to
manage the aio ring pages. That way we can implement the callback and register
it as page->mapping->a_ops->migratepage.

But there's a problem: all files created by anon_inode_getfile() share the
same inode, so multiple aio contexts would share the same aio ring pages,
which would mix up their io events. To solve this issue, we introduce a new
function, anon_inode_getfile_private(), which is similar to
anon_inode_getfile() but gives each new file its own anon inode.

This work is based on Benjamin's patch,
http://www.spinics.net/lists/linux-fsdevel/msg66014.html

Gu Zheng (2):
  fs/anon_inode: Introduce a new lib function anon_inode_getfile_private()
  fs/aio: Add support to aio ring pages migration

 fs/aio.c|  120 +++
 fs/anon_inodes.c|   66 +++
 include/linux/anon_inodes.h |3 +
 include/linux/migrate.h |3 +
 mm/migrate.c|2 +-
 5 files changed, 182 insertions(+), 12 deletions(-)

-- 
1.7.7



[PATCH RESEND 2/2] fs/aio: Add support to aio ring pages migration

2013-07-16 Thread Gu Zheng

Because an aio job pins its ring pages, memory migration of those pages
fails. To fix this problem, we use an anon inode to manage the aio ring
pages and set up the migratepage callback in the anon inode's address
space, so that during memory migration the aio ring pages can be moved
to another memory node safely.

Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <b...@kvack.org>
---
 fs/aio.c|  120 ++
 include/linux/migrate.h |3 +
 mm/migrate.c|2 +-
 3 files changed, 113 insertions(+), 12 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 9b5ca11..d10f956 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -35,6 +35,9 @@
 #include <linux/eventfd.h>
 #include <linux/blkdev.h>
 #include <linux/compat.h>
+#include <linux/anon_inodes.h>
+#include <linux/migrate.h>
+#include <linux/ramfs.h>
 
 #include <asm/kmap_types.h>
 #include <asm/uaccess.h>
@@ -110,6 +113,7 @@ struct kioctx {
} cacheline_aligned_in_smp;
 
struct page *internal_pages[AIO_RING_PAGES];
+   struct file *aio_ring_file;
 };
 
 /*-- sysctl variables*/
@@ -138,15 +142,78 @@ __initcall(aio_setup);
 
 static void aio_free_ring(struct kioctx *ctx)
 {
-   long i;
+   int i;
+   struct file *aio_ring_file = ctx->aio_ring_file;
 
-   for (i = 0; i < ctx->nr_pages; i++)
+   for (i = 0; i < ctx->nr_pages; i++) {
+   pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
+   page_count(ctx->ring_pages[i]));
put_page(ctx->ring_pages[i]);
+   }
 
if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages)
kfree(ctx->ring_pages);
+
+   if (aio_ring_file) {
+   truncate_setsize(aio_ring_file->f_inode, 0);
+   pr_debug("pid(%d) i_nlink=%u d_count=%d d_unhashed=%d i_count=%d\n",
+   current->pid, aio_ring_file->f_inode->i_nlink,
+   aio_ring_file->f_path.dentry->d_count,
+   d_unhashed(aio_ring_file->f_path.dentry),
+   atomic_read(&aio_ring_file->f_inode->i_count));
+   fput(aio_ring_file);
+   ctx->aio_ring_file = NULL;
+   }
+}
+
+static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   vma->vm_ops = &generic_file_vm_ops;
+   return 0;
+}
+
+static const struct file_operations aio_ring_fops = {
+   .mmap = aio_ring_mmap,
+};
+
+static int aio_set_page_dirty(struct page *page)
+{
+   return 0;
 }
 
+static int aio_migratepage(struct address_space *mapping, struct page *new,
+			struct page *old, enum migrate_mode mode)
+{
+	struct kioctx *ctx = mapping->private_data;
+	unsigned long flags;
+	unsigned idx = old->index;
+	int rc;
+
+	/*Writeback must be complete*/
+	BUG_ON(PageWriteback(old));
+	put_page(old);
+
+	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode);
+	if (rc != MIGRATEPAGE_SUCCESS) {
+		get_page(old);
+		return rc;
+	}
+
+	get_page(new);
+
+	spin_lock_irqsave(&ctx->completion_lock, flags);
+	migrate_page_copy(new, old);
+	ctx->ring_pages[idx] = new;
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
+	return rc;
+}
+
+static const struct address_space_operations aio_ctx_aops = {
+	.set_page_dirty = aio_set_page_dirty,
+	.migratepage	= aio_migratepage,
+};
+
 static int aio_setup_ring(struct kioctx *ctx)
 {
struct aio_ring *ring;
@@ -154,20 +221,45 @@ static int aio_setup_ring(struct kioctx *ctx)
 	struct mm_struct *mm = current->mm;
 	unsigned long size, populate;
 	int nr_pages;
+	int i;
+	struct file *file;
 
 	/* Compensate for the ring buffer's head/tail overlap entry */
 	nr_events += 2;	/* 1 is required, 2 for good luck */
 
 	size = sizeof(struct aio_ring);
 	size += sizeof(struct io_event) * nr_events;
-	nr_pages = (size + PAGE_SIZE-1) >> PAGE_SHIFT;
 
+	nr_pages = (size + PAGE_SIZE-1) >> PAGE_SHIFT;
 	if (nr_pages < 0)
 		return -EINVAL;
 
-	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);
+	file = anon_inode_getfile_private("[aio]", &aio_ring_fops, ctx, O_RDWR);
+	if (IS_ERR(file)) {
+		ctx->aio_ring_file = NULL;
+		return -EAGAIN;
+	}
+
+	file->f_inode->i_mapping->a_ops = &aio_ctx_aops;
+	file->f_inode->i_mapping->private_data = ctx;
+	file->f_inode->i_size = PAGE_SIZE * (loff_t)nr_pages;
+
+	for (i = 0; i < nr_pages; i++) {
+		struct page *page;
+		page = find_or_create_page(file->f_inode->i_mapping,
+					   i, GFP_HIGHUSER | __GFP_ZERO);
+		if (!page)
+			break;
+		pr_debug("pid(%d) page[%d

[PATCH RESEND 1/2] fs/anon_inode: Introduce a new lib function anon_inode_getfile_private()

2013-07-16 Thread Gu Zheng


Introduce a new lib function, anon_inode_getfile_private(). Like
anon_inode_getfile(), it creates a new file instance by hooking it up to an
anonymous inode and a dentry that describes the class of the file, but each
file gets its own inode instead of sharing a single one. Anyone who wants to
create a private anon file will benefit from this change.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
Signed-off-by: Benjamin LaHaise b...@kvack.org
---
 fs/anon_inodes.c|   66 +++
 include/linux/anon_inodes.h |3 ++
 2 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 47a65df..85c9618 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -109,6 +109,72 @@ static struct file_system_type anon_inode_fs_type = {
 };
 
 /**
+ * anon_inode_getfile_private - creates a new file instance by hooking it up to an
+ *                              anonymous inode, and a dentry that describe the class
+ *                              of the file
+ *
+ * @name:    [in]    name of the "class" of the new file
+ * @fops:    [in]    file operations for the new file
+ * @priv:    [in]    private data for the new file (will be file's private_data)
+ * @flags:   [in]    flags
+ *
+ *
+ * Similar to anon_inode_getfile, but each file holds a single inode.
+ *
+ */
+struct file *anon_inode_getfile_private(const char *name,
+					const struct file_operations *fops,
+					void *priv, int flags)
+{
+	struct qstr this;
+	struct path path;
+	struct file *file;
+	struct inode *inode;
+
+	if (fops->owner && !try_module_get(fops->owner))
+		return ERR_PTR(-ENOENT);
+
+	inode = anon_inode_mkinode(anon_inode_mnt->mnt_sb);
+	if (IS_ERR(inode)) {
+		file = ERR_PTR(-ENOMEM);
+		goto err_module;
+	}
+
+	/*
+	 * Link the inode to a directory entry by creating a unique name
+	 * using the inode sequence number.
+	 */
+	file = ERR_PTR(-ENOMEM);
+	this.name = name;
+	this.len = strlen(name);
+	this.hash = 0;
+	path.dentry = d_alloc_pseudo(anon_inode_mnt->mnt_sb, &this);
+	if (!path.dentry)
+		goto err_module;
+
+	path.mnt = mntget(anon_inode_mnt);
+
+	d_instantiate(path.dentry, inode);
+
+	file = alloc_file(&path, OPEN_FMODE(flags), fops);
+	if (IS_ERR(file))
+		goto err_dput;
+
+	file->f_mapping = inode->i_mapping;
+	file->f_flags = flags & (O_ACCMODE | O_NONBLOCK);
+	file->private_data = priv;
+
+	return file;
+
+err_dput:
+	path_put(&path);
+err_module:
+	module_put(fops->owner);
+	return file;
+}
+EXPORT_SYMBOL_GPL(anon_inode_getfile_private);
+
+/**
  * anon_inode_getfile - creates a new file instance by hooking it up to an
  *  anonymous inode, and a dentry that describe the class
  *  of the file
diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
index 8013a45..cf573c2 100644
--- a/include/linux/anon_inodes.h
+++ b/include/linux/anon_inodes.h
@@ -13,6 +13,9 @@ struct file_operations;
 struct file *anon_inode_getfile(const char *name,
const struct file_operations *fops,
void *priv, int flags);
+struct file *anon_inode_getfile_private(const char *name,
+   const struct file_operations *fops,
+   void *priv, int flags);
 int anon_inode_getfd(const char *name, const struct file_operations *fops,
 void *priv, int flags);
 
-- 
1.7.7
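Why a per-file inode matters for the aio user of this helper can be shown with a small userspace model. The aio patch stores its address_space_operations and its kioctx pointer in the file's inode mapping (a_ops and private_data). anon_inode_getfile() hands every file the same shared anon inode, so a second aio ring would overwrite the first ring's settings; a private inode per file avoids that. `struct model_inode`, `getfile_shared()`, `getfile_private()` and `attach_ring()` are hypothetical names for this sketch, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: only the mapping fields the aio patch touches. */
struct model_mapping {
	const void *a_ops;       /* like i_mapping->a_ops */
	void *private_data;      /* like i_mapping->private_data (the kioctx) */
};

struct model_inode {
	struct model_mapping mapping;
};

struct model_file {
	struct model_inode *inode;
};

/* One inode shared by every "shared" file, like the global anon inode. */
static struct model_inode shared_inode;

static struct model_file getfile_shared(void)
{
	struct model_file f = { .inode = &shared_inode };
	return f;
}

/* Each "private" file gets a freshly allocated inode (NULL on failure). */
static struct model_file getfile_private(void)
{
	struct model_file f = { .inode = calloc(1, sizeof(struct model_inode)) };
	return f;
}

/* What aio_setup_ring() does with the file it gets back. */
static void attach_ring(struct model_file f, const void *aops, void *ctx)
{
	assert(f.inode != NULL);
	f.inode->mapping.a_ops = aops;
	f.inode->mapping.private_data = ctx;
}
```

With the shared variant the second attach_ring() call clobbers the first context; with the private variant each file keeps its own.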

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/2] fs/anon_inode: Introduce a new lib function anon_inode_getfile_private()

2013-07-16 Thread Gu Zheng

Hi Ben,

On 07/16/2013 09:16 PM, Benjamin LaHaise wrote:

 On Tue, Jul 16, 2013 at 05:56:12PM +0800, Gu Zheng wrote:

 Introduce a new lib function anon_inode_getfile_private(), it creates a new file
 instance by hooking it up to an anonymous inode, and a dentry that describe the
 class of the file, similar to anon_inode_getfile(), but each file holds a
 single inode. Furthermore, anyone who wants to create a private anon file will
 benefit from this change.

 Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
 Signed-off-by: Benjamin LaHaise b...@kvack.org
 
 Please don't add my Signed-off-by when I have never even seen or reviewed 
 a patch -- that is completely unacceptable.  

Sorry for my careless mistake; I'll keep your reminder in mind. :)

 Second, I don't think this 
 patch is suitable for 3.11, as it has not seen much testing outside of one 
 test program I had written.  It's a long standing bug, so it isn't urgent 
 to get the fix into the tree.  That said, it did pass a few tests I ran 
 last night, so it is probably suitable for the -next tree.

Thanks for testing it. :)

Regards,
Gu

 
 As for patch 1, it looks okay to me, but will need Al Viro's signoff.
 
   -ben



Re: [PATCH RESEND 2/2] fs/aio: Add support to aio ring pages migration

2013-07-16 Thread Gu Zheng

Hi Ben,

On 07/16/2013 09:34 PM, Benjamin LaHaise wrote:

 On Tue, Jul 16, 2013 at 05:56:16PM +0800, Gu Zheng wrote:
 As the aio job will pin the ring pages, that will lead to mem migrated
 failed. In order to fix this problem we use an anon inode to manage the aio ring
 pages, and  setup the migratepage callback in the anon inode's address space, so
 that when mem migrating the aio ring pages will be moved to other mem node
 safely.
 
 There are a few minor issues that needed to be fixed -- see below.  I've 
 made these changes and added them to git://git.kvack.org/~bcrl/aio-next.git ,
 and will ask for that tree to be included in linux-next.

Thanks very much for your review.
Stephen sent out a build-failure message when merging this patch into
linux-next from your aio-next tree. This is because we use
migrate_page_move_mapping(), which is only built when CONFIG_MIGRATION is
enabled. I'll fix this issue in the next version.

Best regards,
Gu
  

 
 mm folks: can someone familiar with page migration / hot plug memory please 
 review the migration changes?
 

 Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
 Signed-off-by: Benjamin LaHaise b...@kvack.org
 
 Again, I had not provided my Signed-off-by on this patch previously, so 
 don't add it for me.

Sorry again.:)

 
 ---
  fs/aio.c|  120 ++
  include/linux/migrate.h |3 +
  mm/migrate.c|2 +-
  3 files changed, 113 insertions(+), 12 deletions(-)

 diff --git a/fs/aio.c b/fs/aio.c
 index 9b5ca11..d10f956 100644
 --- a/fs/aio.c
 +++ b/fs/aio.c
 @@ -35,6 +35,9 @@
  #include <linux/eventfd.h>
  #include <linux/blkdev.h>
  #include <linux/compat.h>
 +#include <linux/anon_inodes.h>
 +#include <linux/migrate.h>
 +#include <linux/ramfs.h>
  
  #include <asm/kmap_types.h>
  #include <asm/uaccess.h>
 @@ -110,6 +113,7 @@ struct kioctx {
  	} ____cacheline_aligned_in_smp;
  
  	struct page		*internal_pages[AIO_RING_PAGES];
 +	struct file		*aio_ring_file;
  };
  
  /*-- sysctl variables*/
 @@ -138,15 +142,78 @@ __initcall(aio_setup);
  
  static void aio_free_ring(struct kioctx *ctx)
  {
 -	long i;
 +	int i;
 +	struct file *aio_ring_file = ctx->aio_ring_file;
  
 -	for (i = 0; i < ctx->nr_pages; i++)
 +	for (i = 0; i < ctx->nr_pages; i++) {
 +		pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
 +				page_count(ctx->ring_pages[i]));
  		put_page(ctx->ring_pages[i]);
 +	}
  
  	if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages)
  		kfree(ctx->ring_pages);
 +
 +	if (aio_ring_file) {
 +		truncate_setsize(aio_ring_file->f_inode, 0);
 +		pr_debug("pid(%d) i_nlink=%u d_count=%d d_unhashed=%d i_count=%d\n",
 +			current->pid, aio_ring_file->f_inode->i_nlink,
 +			aio_ring_file->f_path.dentry->d_count,
 +			d_unhashed(aio_ring_file->f_path.dentry),
 +			atomic_read(&aio_ring_file->f_inode->i_count));
 +		fput(aio_ring_file);
 +		ctx->aio_ring_file = NULL;
 +	}
 +}
 +
 +static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
 +{
 +	vma->vm_ops = &generic_file_vm_ops;
 +	return 0;
 +}
 +
 +static const struct file_operations aio_ring_fops = {
 +	.mmap = aio_ring_mmap,
 +};
 +
 +static int aio_set_page_dirty(struct page *page)
 +{
 +	return 0;
  }
  
 +static int aio_migratepage(struct address_space *mapping, struct page *new,
 +			struct page *old, enum migrate_mode mode)
 +{
 +	struct kioctx *ctx = mapping->private_data;
 +	unsigned long flags;
 +	unsigned idx = old->index;
 +	int rc;
 +
 +	/*Writeback must be complete*/
 
 Missing spaces before/after beginning and end of comment.

 
 +	BUG_ON(PageWriteback(old));
 +	put_page(old);
 +
 +	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode);
 +	if (rc != MIGRATEPAGE_SUCCESS) {
 +		get_page(old);
 +		return rc;
 +	}
 +
 +	get_page(new);
 +
 +	spin_lock_irqsave(&ctx->completion_lock, flags);
 +	migrate_page_copy(new, old);
 +	ctx->ring_pages[idx] = new;
 +	spin_unlock_irqrestore(&ctx->completion_lock, flags);
 +
 +	return rc;
 +}
 +
 +static const struct address_space_operations aio_ctx_aops = {
 +	.set_page_dirty = aio_set_page_dirty,
 +	.migratepage	= aio_migratepage,
 +};
 +
  static int aio_setup_ring(struct kioctx *ctx)
  {
  struct aio_ring *ring;
 @@ -154,20 +221,45 @@ static int aio_setup_ring(struct kioctx *ctx)
  	struct mm_struct *mm = current->mm;
  	unsigned long size, populate;
  	int nr_pages;
 +	int i;
 +	struct file *file;
  
  	/* Compensate for the ring buffer's head/tail overlap entry */
  	nr_events += 2;	/* 1 is required, 2 for good luck */
  
  	size = sizeof(struct aio_ring);
  	size += sizeof(struct io_event) * nr_events;
 -	nr_pages = (size + PAGE_SIZE-1) >> PAGE_SHIFT;
  
 +	nr_pages

[PATCH V2 2/2] fs/aio: Add support to aio ring pages migration

2013-07-17 Thread Gu Zheng

Because aio jobs pin the ring pages, migrating the memory that holds those pages
fails. To fix this problem, we use an anon inode to manage the aio ring pages and
set up the migratepage callback in the anon inode's address space, so that during
memory migration the aio ring pages can be safely moved to another memory node.

v1->v2:
Fix the build failure when CONFIG_MIGRATION is disabled.
Fix some minor issues raised in Benjamin's review.
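The V2 diff below replaces the open-coded `(size + PAGE_SIZE-1) >> PAGE_SHIFT` with PFN_UP(size); both round a byte count up to whole pages. The worked example mirrors the sizing arithmetic in aio_setup_ring(), including the `nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event)` step that fills the last page. It assumes 4 KiB pages and 32-byte struct aio_ring headers and struct io_event entries; those sizes and the `EX_*` and `ring_geometry` names are assumptions for illustration, not values taken from the patch.

```c
#include <assert.h>

#define EX_PAGE_SHIFT    12                      /* assumed 4 KiB pages */
#define EX_PAGE_SIZE     (1UL << EX_PAGE_SHIFT)
#define EX_RING_HDR      32UL   /* assumed sizeof(struct aio_ring) */
#define EX_IO_EVENT      32UL   /* assumed sizeof(struct io_event) */

/* Same rounding as PFN_UP(): round a byte count up to whole pages. */
#define EX_PFN_UP(x)     (((x) + EX_PAGE_SIZE - 1) >> EX_PAGE_SHIFT)

struct ring_geometry {
	unsigned long nr_pages;
	unsigned long nr_events;   /* slots that fit in the allocated pages */
};

static struct ring_geometry ring_geometry(unsigned long nr_events)
{
	struct ring_geometry g;
	unsigned long size;

	nr_events += 2;            /* head/tail overlap entry, as in aio_setup_ring() */
	size = EX_RING_HDR + EX_IO_EVENT * nr_events;
	g.nr_pages = EX_PFN_UP(size);

	/* Recompute the capacity so the whole last page is used. */
	g.nr_events = (EX_PAGE_SIZE * g.nr_pages - EX_RING_HDR) / EX_IO_EVENT;
	assert(g.nr_events >= nr_events);
	return g;
}
```

Under these assumed sizes, asking for 125 events fills exactly one page (127 slots including the two spare entries), while 126 or 128 events spill into a second page and yield 255 usable slots.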

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/aio.c|  116 +++
 include/linux/migrate.h |9 
 mm/migrate.c|2 +-
 3 files changed, 116 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 2bbcacf..15e8a13 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -35,6 +35,9 @@
 #include <linux/eventfd.h>
 #include <linux/blkdev.h>
 #include <linux/compat.h>
+#include <linux/anon_inodes.h>
+#include <linux/migrate.h>
+#include <linux/ramfs.h>
 
 #include <asm/kmap_types.h>
 #include <asm/uaccess.h>
@@ -108,6 +111,7 @@ struct kioctx {
 	} ____cacheline_aligned_in_smp;
 
struct page *internal_pages[AIO_RING_PAGES];
+   struct file *aio_ring_file;
 };
 
 /*-- sysctl variables*/
@@ -136,15 +140,78 @@ __initcall(aio_setup);
 
 static void aio_free_ring(struct kioctx *ctx)
 {
-	long i;
-
-	for (i = 0; i < ctx->nr_pages; i++)
+	int i;
+	struct file *aio_ring_file = ctx->aio_ring_file;
+	for (i = 0; i < ctx->nr_pages; i++) {
+		pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
+				page_count(ctx->ring_pages[i]));
 		put_page(ctx->ring_pages[i]);
+	}
 
 	if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages)
 		kfree(ctx->ring_pages);
+
+	if (aio_ring_file) {
+		truncate_setsize(aio_ring_file->f_inode, 0);
+		pr_debug("pid(%d) i_nlink=%u d_count=%d d_unhashed=%d i_count=%d\n",
+			current->pid, aio_ring_file->f_inode->i_nlink,
+			aio_ring_file->f_path.dentry->d_count,
+			d_unhashed(aio_ring_file->f_path.dentry),
+			atomic_read(&aio_ring_file->f_inode->i_count));
+		fput(aio_ring_file);
+		ctx->aio_ring_file = NULL;
+	}
+}
+
+static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	vma->vm_ops = &generic_file_vm_ops;
+	return 0;
+}
+
+static const struct file_operations aio_ring_fops = {
+	.mmap = aio_ring_mmap,
+};
+
+static int aio_set_page_dirty(struct page *page)
+{
+	return 0;
 }
 
+static int aio_migratepage(struct address_space *mapping, struct page *new,
+			struct page *old, enum migrate_mode mode)
+{
+	struct kioctx *ctx = mapping->private_data;
+	unsigned long flags;
+	unsigned idx = old->index;
+	int rc;
+
+	/* Writeback must be complete */
+	BUG_ON(PageWriteback(old));
+
+	put_page(old);
+
+	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode);
+	if (rc != MIGRATEPAGE_SUCCESS) {
+		get_page(old);
+		return rc;
+	}
+
+	get_page(new);
+
+	spin_lock_irqsave(&ctx->completion_lock, flags);
+	migrate_page_copy(new, old);
+	ctx->ring_pages[idx] = new;
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
+	return rc;
+}
+
+static const struct address_space_operations aio_ctx_aops = {
+	.set_page_dirty = aio_set_page_dirty,
+	.migratepage	= aio_migratepage,
+};
+
 static int aio_setup_ring(struct kioctx *ctx)
 {
struct aio_ring *ring;
@@ -152,18 +219,42 @@ static int aio_setup_ring(struct kioctx *ctx)
 	struct mm_struct *mm = current->mm;
 	unsigned long size, populate;
 	int nr_pages;
+	int i;
+	struct file *file;
 
 	/* Compensate for the ring buffer's head/tail overlap entry */
 	nr_events += 2;	/* 1 is required, 2 for good luck */
 
 	size = sizeof(struct aio_ring);
 	size += sizeof(struct io_event) * nr_events;
-	nr_pages = (size + PAGE_SIZE-1) >> PAGE_SHIFT;
+	nr_pages = PFN_UP(size);
 
 	if (nr_pages < 0)
 		return -EINVAL;
+	file = anon_inode_getfile_private("[aio]", &aio_ring_fops, ctx, O_RDWR);
+	if (IS_ERR(file)) {
+		ctx->aio_ring_file = NULL;
+		return -EAGAIN;
+	}
+	file->f_inode->i_mapping->a_ops = &aio_ctx_aops;
+	file->f_inode->i_mapping->private_data = ctx;
+	file->f_inode->i_size = PAGE_SIZE * (loff_t)nr_pages;
 
-	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);
+	for (i = 0; i < nr_pages; i++) {
+		struct page *page;
+		page = find_or_create_page(file->f_inode->i_mapping,
+					   i, GFP_HIGHUSER | __GFP_ZERO);
+		if (!page

Re: [PATCH V2 2/2] fs/aio: Add support to aio ring pages migration

2013-07-17 Thread Gu Zheng

Hi Ben,

On 07/17/2013 09:44 PM, Benjamin LaHaise wrote:

 On Wed, Jul 17, 2013 at 05:22:30PM +0800, Gu Zheng wrote:
 As the aio job will pin the ring pages, that will lead to mem migrated
 failed. In order to fix this problem we use an anon inode to manage the aio ring
 pages, and  setup the migratepage callback in the anon inode's address space, so
 that when mem migrating the aio ring pages will be moved to other mem node
 safely.

 v1->v2:
  Fix build failed issue if CONFIG_MIGRATION disabled.
  Fix some minor issues under Benjamin's comments.
 
 I don't know what you did with this patch, but it doesn't apply to any of 
 the trees I can find, and interdiff isn't able to compare it against your 
 original patch.  Since the first version of the patch was already applied 
 it is generally more appropriate to provide an incremental fix.  I've 
 added the following to my tree (git://git.kvack.org/~bcrl/aio-next.git/) 
 to fix the build issue.  I've tested this with CONFIG_MIGRATION enabled 
 and disabled on x86.

My patch is based on the 3.10 release. Sorry, my department blocks all
git-protocol URLs, so I cannot base the patch on your aio-next tree.
Is aio-next also available over http/https?

Your fix looks good.
IMHO, because we now *extern* migrate_page_move_mapping(), we are responsible
for making sure it works everywhere. If someone later calls
migrate_page_move_mapping() without the protection of CONFIG_MIGRATION, the
build will fail when CONFIG_MIGRATION is disabled. So I think the following
change (a stub that returns -ENOSYS when CONFIG_MIGRATION is disabled) is still
needed.

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index c407d88..3d0a486 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -88,6 +88,13 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
return -ENOSYS;
 }
 
+static inline int migrate_page_move_mapping(struct address_space *mapping,
+   struct page *newpage, struct page *page,
+   struct buffer_head *head, enum migrate_mode mode)
+{
+   return -ENOSYS;
+}
+
 /* Possible settings for the migrate_page() method in address_operations */
 #define migrate_page NULL
 #define fail_migrate_page NULL
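The header pattern proposed above (a real declaration when CONFIG_MIGRATION is set, a static inline stub returning -ENOSYS otherwise) can be tried in userspace. In this sketch `move_mapping()` and `migrate_one()` are hypothetical stand-ins for migrate_page_move_mapping() and one of its callers, and CONFIG_MIGRATION is an ordinary preprocessor macro you would pass with `-DCONFIG_MIGRATION`. The point is that the caller compiles unchanged in both configurations and simply sees -ENOSYS when the feature is compiled out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifdef CONFIG_MIGRATION
/* Built only when the (hypothetical) feature is configured in. */
static inline int move_mapping(void *mapping, void *newpage, void *page)
{
	(void)mapping;
	(void)newpage;
	(void)page;
	return 0;                      /* pretend the move succeeded */
}
#else
/* Stub with the same signature: callers still compile and see -ENOSYS. */
static inline int move_mapping(void *mapping, void *newpage, void *page)
{
	(void)mapping;
	(void)newpage;
	(void)page;
	return -ENOSYS;
}
#endif

/* A caller written once, with no #ifdef of its own. */
static int migrate_one(void *mapping, void *newpage, void *page)
{
	int rc = move_mapping(mapping, newpage, page);

	if (rc != 0)
		return rc;             /* -ENOSYS when migration is compiled out */
	/* ... copy the page contents and update bookkeeping here ... */
	return 0;
}
```

Compiled without `-DCONFIG_MIGRATION`, both calls report -ENOSYS instead of failing to link.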



Best regards,
Gu

 
   -ben



[PATCH RESEND] fs/jffs2: remove the unused parameters of function jffs2_{compress,decompress}

2013-07-19 Thread Gu Zheng

Remove the unused parameters of functions jffs2_{compress,decompress}.


Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/jffs2/compr.c |   12 ++--
 fs/jffs2/compr.h |   12 ++--
 fs/jffs2/gc.c|2 +-
 fs/jffs2/read.c  |4 +++-
 fs/jffs2/write.c |2 +-
 5 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/fs/jffs2/compr.c b/fs/jffs2/compr.c
index 4849a4c..6fcb426 100644
--- a/fs/jffs2/compr.c
+++ b/fs/jffs2/compr.c
@@ -145,9 +145,9 @@ static int jffs2_selected_compress(u8 compr, unsigned char *data_in,
  * jffs2_compress should compress as much as will fit, and should set
  * *datalen accordingly to show the amount of data which were compressed.
  */
-uint16_t jffs2_compress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
-   unsigned char *data_in, unsigned char **cpage_out,
-   uint32_t *datalen, uint32_t *cdatalen)
+uint16_t jffs2_compress(struct jffs2_sb_info *c, unsigned char *data_in,
+   unsigned char **cpage_out, uint32_t *datalen,
+   uint32_t *cdatalen)
 {
int ret = JFFS2_COMPR_NONE;
int mode, compr_ret;
@@ -250,9 +250,9 @@ uint16_t jffs2_compress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
return ret;
 }
 
-int jffs2_decompress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
-uint16_t comprtype, unsigned char *cdata_in,
-		     unsigned char *data_out, uint32_t cdatalen, uint32_t datalen)
+int jffs2_decompress(uint16_t comprtype, unsigned char *cdata_in,
+unsigned char *data_out, uint32_t cdatalen,
+uint32_t datalen)
 {
struct jffs2_compressor *this;
int ret;
diff --git a/fs/jffs2/compr.h b/fs/jffs2/compr.h
index 5e91d57..092089a 100644
--- a/fs/jffs2/compr.h
+++ b/fs/jffs2/compr.h
@@ -70,13 +70,13 @@ int jffs2_unregister_compressor(struct jffs2_compressor *comp);
 int jffs2_compressors_init(void);
 int jffs2_compressors_exit(void);
 
-uint16_t jffs2_compress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
-   unsigned char *data_in, unsigned char **cpage_out,
-   uint32_t *datalen, uint32_t *cdatalen);
+uint16_t jffs2_compress(struct jffs2_sb_info *c, unsigned char *data_in,
+   unsigned char **cpage_out, uint32_t *datalen,
+   uint32_t *cdatalen);
 
-int jffs2_decompress(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
-uint16_t comprtype, unsigned char *cdata_in,
-		     unsigned char *data_out, uint32_t cdatalen, uint32_t datalen);
+int jffs2_decompress(uint16_t comprtype, unsigned char *cdata_in,
+unsigned char *data_out, uint32_t cdatalen,
+uint32_t datalen);
 
 void jffs2_free_comprbuf(unsigned char *comprbuf, unsigned char *orig);
 
diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c
index 5a2dec2..8dc85aa 100644
--- a/fs/jffs2/gc.c
+++ b/fs/jffs2/gc.c
@@ -1330,7 +1330,7 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
 
 	writebuf = pg_ptr + (offset & (PAGE_CACHE_SIZE -1));
 
-	comprtype = jffs2_compress(c, f, writebuf, &comprbuf, &datalen, &cdatalen);
+	comprtype = jffs2_compress(c, writebuf, &comprbuf, &datalen, &cdatalen);
 
ri.magic = cpu_to_je16(JFFS2_MAGIC_BITMASK);
ri.nodetype = cpu_to_je16(JFFS2_NODETYPE_INODE);
diff --git a/fs/jffs2/read.c b/fs/jffs2/read.c
index 0b042b1..aed9183 100644
--- a/fs/jffs2/read.c
+++ b/fs/jffs2/read.c
@@ -132,7 +132,9 @@ int jffs2_read_dnode(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
 		jffs2_dbg(2, "Decompress %d bytes from %p to %d bytes at %p\n",
 			  je32_to_cpu(ri->csize), readbuf,
 			  je32_to_cpu(ri->dsize), decomprbuf);
-		ret = jffs2_decompress(c, f, ri->compr | (ri->usercompr << 8), readbuf, decomprbuf, je32_to_cpu(ri->csize), je32_to_cpu(ri->dsize));
+		ret = jffs2_decompress(ri->compr | (ri->usercompr << 8),
+				readbuf, decomprbuf, je32_to_cpu(ri->csize),
+				je32_to_cpu(ri->dsize));
 		if (ret) {
 			pr_warn("Error: jffs2_decompress returned %d\n", ret);
goto out_decomprbuf;
diff --git a/fs/jffs2/write.c b/fs/jffs2/write.c
index b634de4..dbc26de 100644
--- a/fs/jffs2/write.c
+++ b/fs/jffs2/write.c
@@ -369,7 +369,7 @@ int jffs2_write_inode_range(struct jffs2_sb_info *c, struct jffs2_inode_info *f,
 		datalen = min_t(uint32_t, writelen, PAGE_CACHE_SIZE - (offset & (PAGE_CACHE_SIZE-1)));
 		cdatalen = min_t(uint32_t, alloclen - sizeof(*ri), datalen);
 
-		comprtype = jffs2_compress(c, f, buf, &comprbuf, &datalen, &cdatalen);
+		comprtype = jffs2_compress(c, buf, &comprbuf, &datalen, &cdatalen

[PATCH] lib/crc32: update the comments of crc32_{be,le}_generic()

2013-07-19 Thread Gu Zheng



Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 lib/crc32.c |   15 ++-
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/crc32.c b/lib/crc32.c
index 072fbd8..4722659 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -131,11 +131,14 @@ crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
 #endif
 
 /**
- * crc32_le() - Calculate bitwise little-endian Ethernet AUTODIN II CRC32
+ * crc32_le_generic() - Calculate bitwise little-endian Ethernet AUTODIN II
+ * CRC32/CRC32C
  * @crc: seed value for computation.  ~0 for Ethernet, sometimes 0 for
- * other uses, or the previous crc32 value if computing incrementally.
- * @p: pointer to buffer over which CRC is run
+ * other uses, or the previous crc32/crc32c value if computing incrementally.
+ * @p: pointer to buffer over which CRC32/CRC32C is run
  * @len: length of buffer @p
+ * @tab: little-endian Ethernet table
+ * @polynomial: CRC32/CRC32c LE polynomial
  */
 static inline u32 __pure crc32_le_generic(u32 crc, unsigned char const *p,
  size_t len, const u32 (*tab)[256],
@@ -201,11 +204,13 @@ EXPORT_SYMBOL(crc32_le);
 EXPORT_SYMBOL(__crc32c_le);
 
 /**
- * crc32_be() - Calculate bitwise big-endian Ethernet AUTODIN II CRC32
+ * crc32_be_generic() - Calculate bitwise big-endian Ethernet AUTODIN II CRC32
  * @crc: seed value for computation.  ~0 for Ethernet, sometimes 0 for
  * other uses, or the previous crc32 value if computing incrementally.
- * @p: pointer to buffer over which CRC is run
+ * @p: pointer to buffer over which CRC32 is run
  * @len: length of buffer @p
+ * @tab: big-endian Ethernet table
+ * @polynomial: CRC32 BE polynomial
  */
 static inline u32 __pure crc32_be_generic(u32 crc, unsigned char const *p,
  size_t len, const u32 (*tab)[256],
-- 
1.7.7


[PATCH] f2fs: add the missing deletion of orphan inode entry in write_orphan_inodes()

2013-07-19 Thread Gu Zheng

After writing each orphan inode entry into the journal block, we need to delete
the entry from the orphan entry list and release it.


Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/checkpoint.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 66a6b85..290db04 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -337,6 +337,10 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk)
memset(orphan_blk, 0, sizeof(*orphan_blk));
 page_exist:
 		orphan_blk->ino[nentries++] = cpu_to_le32(orphan->ino);
+
+		list_del(&orphan->list);
+		kmem_cache_free(orphan_entry_slab, orphan);
+		sbi->n_orphans--;
}
if (!page)
goto end;
-- 
1.7.7
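The pattern this patch adds, recording each entry and then unlinking and freeing it in the same pass, is only safe with a traversal that fetches the next pointer before the current node is freed (what list_for_each_safe() and list_for_each_entry_safe() provide in the kernel). The userspace sketch below models the idea with a hypothetical singly linked `struct orphan` list and a fixed-size block; `ORPHANS_PER_BLOCK`, `add_orphan()` and `write_and_release()` are illustrative names, and the multi-block case is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ORPHANS_PER_BLOCK 8            /* hypothetical block capacity */

struct orphan {
	uint32_t ino;
	struct orphan *next;
};

struct orphan_list {
	struct orphan *head;
	unsigned int n_orphans;
};

/* Push an inode number onto the list. Returns 0, or -1 if out of memory. */
static int add_orphan(struct orphan_list *l, uint32_t ino)
{
	struct orphan *o = malloc(sizeof(*o));

	if (!o)
		return -1;
	o->ino = ino;
	o->next = l->head;
	l->head = o;
	l->n_orphans++;
	return 0;
}

/* Copy up to ORPHANS_PER_BLOCK inode numbers into blk[], unlinking and
 * freeing each entry once recorded. Returns the number of entries written. */
static unsigned int write_and_release(struct orphan_list *l, uint32_t blk[])
{
	unsigned int nentries = 0;
	struct orphan *o = l->head, *next;

	while (o && nentries < ORPHANS_PER_BLOCK) {
		next = o->next;           /* fetch before o is freed ("safe" walk) */
		blk[nentries++] = o->ino;
		l->head = next;           /* unlink: o is always the current head */
		free(o);
		assert(l->n_orphans > 0);
		l->n_orphans--;
		o = next;
	}
	return nentries;
}
```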


[PATCH] f2fs: use list_for_each rather than list_for_each_safe in remove_orphan_inode()

2013-07-19 Thread Gu Zheng

Because we remove only the single target node, list_for_each is enough; to
clean up the code further, we use list_for_each_entry instead.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/checkpoint.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 290db04..87f7bc2 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -237,13 +237,12 @@ out:
 
 void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
-   struct list_head *this, *next, *head;
+   struct list_head *head;
struct orphan_inode_entry *orphan;
 
 	mutex_lock(&sbi->orphan_inode_mutex);
 	head = &sbi->orphan_inode_list;
-   list_for_each_safe(this, next, head) {
-   orphan = list_entry(this, struct orphan_inode_entry, list);
+   list_for_each_entry(orphan, head, list) {
 		if (orphan->ino == ino) {
 			list_del(&orphan->list);
kmem_cache_free(orphan_entry_slab, orphan);
-- 
1.7.7
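Plain (non-"safe") iteration is enough only if nothing reads the freed node's next pointer, which holds when the loop stops right after removing the single match. The hunk above shows only the first lines of the match branch, so this sketch assumes the loop exits immediately after the free. It uses a hypothetical singly linked `struct orphan` list and a pointer-to-link walk rather than the kernel's list.h; `remove_orphan()` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct orphan {
	uint32_t ino;
	struct orphan *next;
};

/* Remove the first entry whose inode number matches.
 * Returns 1 if an entry was removed, 0 otherwise. */
static int remove_orphan(struct orphan **head, uint32_t ino)
{
	struct orphan **link;

	assert(head != NULL);
	/* Plain walk: we never advance past a node we have freed. */
	for (link = head; *link; link = &(*link)->next) {
		struct orphan *o = *link;

		if (o->ino == ino) {
			*link = o->next;   /* unlink */
			free(o);
			return 1;          /* stop at once; o->next is never read again */
		}
	}
	return 0;
}
```

If the loop had to keep going after a removal (for example, to delete every match), the next pointer would have to be saved first, which is exactly what the _safe variants do.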


Re: [PATCH aio-next] aio: fix race in ring buffer page lookup introduced by page migration support

2013-09-09 Thread Gu Zheng

Hi Ben, Al,

On 09/10/2013 12:02 AM, Benjamin LaHaise wrote:

 Hi Al, Gu,
 
 I've added this patch to my tree at git://git.kvack.org/~bcrl/aio-next.git 
 to fix the get_user_pages() issue introduced by Gu's changes in the page 
 migration patch.  Thanks Al for spotting this.

Thanks very much for spotting and fixing this issue.

Best regards,
Gu

 
   -ben
 
 commit d6c355c7dabcd753a75bc77d150d36328a355267
 Author: Benjamin LaHaise b...@kvack.org
 Date:   Mon Sep 9 11:57:59 2013 -0400
 
 aio: fix race in ring buffer page lookup introduced by page migration support
 
 Prior to the introduction of page migration support in "fs/aio: Add support
 to aio ring pages migration" / 36bc08cc01709b4a9bb563b35aa530241ddc63e3,
 mapping of the ring buffer pages was done via get_user_pages() while
 retaining mmap_sem held for write.  This avoided possible races with userland
 racing an munmap() or mremap().  The page migration patch, however, switched
 to using mm_populate() to prime the page mapping.  mm_populate() cannot be
 called with mmap_sem held.
 
 Instead of dropping the mmap_sem, revert to the old behaviour and simply
 drop the use of mm_populate() since get_user_pages() will cause the pages to
 get mapped anyways.  Thanks to Al Viro for spotting this issue.
 
 Signed-off-by: Benjamin LaHaise b...@kvack.org
 
 diff --git a/fs/aio.c b/fs/aio.c
 index 6e26755..f4a27af 100644
 --- a/fs/aio.c
 +++ b/fs/aio.c
 @@ -307,16 +307,25 @@ static int aio_setup_ring(struct kioctx *ctx)
   aio_free_ring(ctx);
   return -EAGAIN;
   }
 -	up_write(&mm->mmap_sem);
 -
 -	mm_populate(ctx->mmap_base, populate);
  
  	pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base);
 +
 +	/* We must do this while still holding mmap_sem for write, as we
 +	 * need to be protected against userspace attempting to mremap()
 +	 * or munmap() the ring buffer.
 +	 */
  	ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages,
  				       1, 0, ctx->ring_pages, NULL);
 +
 +	/* Dropping the reference here is safe as the page cache will hold
 +	 * onto the pages for us.  It is also required so that page migration
 +	 * can unmap the pages and get the right reference count.
 +	 */
  	for (i = 0; i < ctx->nr_pages; i++)
  		put_page(ctx->ring_pages[i]);
  
 +	up_write(&mm->mmap_sem);
 +
  	if (unlikely(ctx->nr_pages != nr_pages)) {
   aio_free_ring(ctx);
   return -EAGAIN;



Re: [f2fs-dev] [PATCH] f2fs: optimize fs_lock for better performance

2013-09-10 Thread Gu Zheng

Hi Jaegeuk,
On 09/10/2013 08:59 AM, Jaegeuk Kim wrote:

 Hi,
 
 2013-09-07 (Sat), 08:00 +, Chao Yu:
 Hi Knize,

 Thanks for your reply. I agree that naming it after spin_lock is
 meaningless; it's better to rename this spinlock to round_robin_lock.

 This patch only resolves the unbalanced fs_lock usage; it cannot fix
 the deadlock issue. Can we fix the deadlock issue with this method:

 - vfs_create()
  - f2fs_create() - takes an fs_lock and saves the current thread info into thread_info[NR_GLOBAL_LOCKS]
   - f2fs_add_link()
    - __f2fs_add_link()
     - init_inode_metadata()
      - f2fs_init_security()
       - security_inode_init_security()
        - f2fs_initxattrs()
         - f2fs_setxattr() - gets an fs_lock only if the current thread's info is not already in thread_info
 
 This ensures that a thread can hold only one fs_lock at a time, which avoids the deadlock.
 Can we use this solution?
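The proposal above can be sketched in userspace. Instead of a thread_info[] table, this sketch swaps in a thread-local record of which lock, if any, the current thread already holds, so a nested caller (f2fs_setxattr() under f2fs_create() in the call chain above) skips taking a second lock. `fs_lock_op()`, `fs_unlock_op()`, `held_lock`, `hold_depth` and NR_LOCKS are hypothetical names; the atomic counter stands in for the round-robin next_lock_num; and the sketch does not address the mutex_lock_all() interaction described later in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_LOCKS 4        /* hypothetical stand-in for NR_GLOBAL_LOCKS */

static pthread_mutex_t fs_lock[NR_LOCKS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};
static atomic_uint next_lock_num;            /* round-robin cursor */
static _Thread_local int held_lock = -1;     /* index this thread holds, or -1 */
static _Thread_local int hold_depth;         /* nesting depth of fs_lock_op() */

/* Take an fs_lock unless this thread already holds one; returns its index. */
static int fs_lock_op(void)
{
	if (held_lock < 0) {
		int idx = (int)(atomic_fetch_add(&next_lock_num, 1) % NR_LOCKS);

		pthread_mutex_lock(&fs_lock[idx]);
		held_lock = idx;
	}
	hold_depth++;
	return held_lock;
}

/* Release the lock only when the outermost caller is done. */
static void fs_unlock_op(void)
{
	assert(held_lock >= 0 && hold_depth > 0);
	if (--hold_depth == 0) {
		pthread_mutex_unlock(&fs_lock[held_lock]);
		held_lock = -1;
	}
}
```

A nested fs_lock_op() returns the index the outer call already holds, and the mutex is released only by the matching outermost fs_unlock_op().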
 
 It could be.
 But I think we can avoid grabbing the fs_lock at the f2fs_initxattrs()

Agreed. The fs_lock here protects the xattr from parallel modification,
but in the initxattrs routine parallel modification cannot happen.
And in the normal setxattr routine, inode->i_mutex (VFS layer) already
prevents parallel modification. So I think this fs_lock is unnecessary.
Am I missing something?

Regards,
Gu

 level, since this case only happens when f2fs_initxattrs() is called.
 Let's think about it in more detail.
 Thanks,
 

  

 thanks again!

  

 --- Original Message ---

 Sender : Russ Knize russ.kn...@motorola.com

 Date : September 07, 2013 04:25 (GMT+09:00)

 Title : Re: [f2fs-dev] [PATCH] f2fs: optimize fs_lock for better
 performance

  

 I encountered this same issue recently and solved it in much the same
 way.  Can we rename spin_lock to something more meaningful? 


 This race actually exposed a potential deadlock between f2fs_create()
 and f2fs_initxattrs(): 


 - vfs_create()
  - f2fs_create() - takes an fs_lock
   - f2fs_add_link()
    - __f2fs_add_link()
     - init_inode_metadata()
      - f2fs_init_security()
       - security_inode_init_security()
        - f2fs_initxattrs()
         - f2fs_setxattr() - also takes an fs_lock


 If another CPU happens to hold the lock that f2fs_setxattr() is trying
 to take (because of the race around next_lock_num), the two threads can
 deadlock if they are also contending for another resource (such as the
 bdi).


 Another scenario is if the above happens while another thread is in
 the middle of grabbing all of the locks via mutex_lock_all().
  f2fs_create() is holding a lock that mutex_lock_all() is waiting for
 and mutex_lock_all() is holding a lock that f2fs_setxattr() is waiting
 for.
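The second scenario above is a cycle in a wait-for graph. The minimal C model below uses hypothetical thread and lock numbers chosen only for illustration. In it, f2fs_create() holds fs_lock[3] and its nested f2fs_setxattr() waits for fs_lock[1]. Meanwhile mutex_lock_all(), assumed here to take the locks in index order, already holds fs_lock[0..2] and waits for fs_lock[3]. The helper follows the chain thread → awaited lock → owner and reports a deadlock if the chain returns to the starting thread.

```c
#include <stdbool.h>

#define NR_GLOBAL_LOCKS 8
#define NO_THREAD (-1)

/* owner[l]: thread holding fs_lock[l], or NO_THREAD if it is free.
 * waits_for[t]: lock that thread t is blocked on, or -1 if t is running.
 * Returns true if following the wait chain from `start` leads back to it. */
static bool deadlocked(const int owner[], const int waits_for[],
                       int nthreads, int start)
{
    int t = start;

    /* A cycle through `start` needs at most nthreads hops. */
    for (int hops = 0; hops < nthreads; hops++) {
        int lock = waits_for[t];
        if (lock < 0)
            return false;           /* t is runnable, so the chain ends */
        t = owner[lock];
        if (t == NO_THREAD)
            return false;           /* lock is free, the waiter will get it */
        if (t == start)
            return true;            /* chain came back: deadlock */
    }
    return false;
}
```

Removing f2fs_create()'s second wait breaks the cycle. Both the thread_info proposal and dropping the fs_lock in f2fs_initxattrs() aim for that.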


 Russ


 On Fri, Sep 6, 2013 at 4:48 AM, Chao Yu chao2...@samsung.com wrote:
 Hi Kim:
 
 I think there is a performance problem: when all sbi->fs_lock are
 held, all other threads may get the same next_lock value from
 sbi->next_lock_num in mutex_lock_op() and wait for the same lock,
 fs_lock[next_lock]. This unbalances fs_lock usage and may cost
 performance in multithreaded tests.
 
 Here is the patch to fix this problem:
 
 Signed-off-by: Yu Chao chao2...@samsung.com
 
 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 old mode 100644
 new mode 100755
 index 467d42d..983bb45
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -371,6 +371,7 @@ struct f2fs_sb_info {
  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
  	struct mutex node_write;		/* locking node writes */
  	struct mutex writepages;		/* mutex for writepages() */
 +	spinlock_t spin_lock;			/* lock for next_lock_num */
  	unsigned char next_lock_num;		/* round-robin global locks */
  	int por_doing;				/* recovery is doing or not */
  	int on_build_free_nids;			/* build_free_nids is doing */
 @@ -533,15 +534,19 @@ static inline void mutex_unlock_all(struct f2fs_sb_info *sbi)
  
  static inline int mutex_lock_op(struct f2fs_sb_info *sbi)
  {
 -	unsigned char next_lock = sbi->next_lock_num % NR_GLOBAL_LOCKS;
 +	unsigned char next_lock;
  	int i = 0;
  
  	for (; i < NR_GLOBAL_LOCKS; i++)

Re: [f2fs-dev][PATCH] f2fs: optimize fs_lock for better performance

2013-09-10 Thread Gu Zheng

Hi Jaegeuk,

On 09/10/2013 08:52 AM, Jaegeuk Kim wrote:

 Hi,
 
 At first, thank you for the report and please follow the email writing
 rules. :)
 
 Anyway, I agree to the below issue.
 One thing that I can think of is that we don't need to use the
 spin_lock, since we don't care about the exact lock number, but just
 need to get any not-collided number.

Agreed, but if all the locks are held, IMO we still need to spread the
waiting threads across different locks, even though perfect balance is
unattainable.

 
 So, how about removing the spin_lock?

Yes, in this case the spin_lock is relatively costly.

 And how about using a random number?

NR_GLOBAL_LOCKS is currently 8, so a random number probably cannot give
us the balance we expect.
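A back-of-the-envelope check of why a random choice balances poorly with only 8 locks. Suppose k waiters each pick one of NR_GLOBAL_LOCKS locks uniformly and independently at random. The probability that no two pick the same lock is the birthday-problem product $\prod_{i=0}^{k-1} (N-i)/N$. This is an idealized model, not a measurement of f2fs.

```c
#define NR_GLOBAL_LOCKS 8

/* P(all k waiters pick distinct locks) = prod_{i=0}^{k-1} (N - i) / N,
 * assuming independent, uniformly random picks among N locks. */
static double p_all_distinct(int k)
{
    double p = 1.0;

    for (int i = 0; i < k; i++)
        p *= (double)(NR_GLOBAL_LOCKS - i) / NR_GLOBAL_LOCKS;
    return p;
}
```

For k = 2 this is 7/8, for k = 4 it is 1680/4096 ≈ 0.41, and for k = 8 it is 8!/8^8 ≈ 0.0024. A burst of 8 random waiters therefore almost always puts two or more on one lock. A round-robin counter that is bumped before blocking spreads 8 consecutive waiters over all 8 locks.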

Regards,
Gu 

 Thanks,
 
 2013-09-06 (Fri), 09:48 +, Chao Yu:
 Hi Kim:

 I think there is a performance problem: when all sbi->fs_lock are
 held, all other threads may get the same next_lock value from
 sbi->next_lock_num in mutex_lock_op() and wait for the same lock,
 fs_lock[next_lock]. This unbalances fs_lock usage and may cost
 performance in multithreaded tests.

 Here is the patch to fix this problem:

 Signed-off-by: Yu Chao chao2...@samsung.com

 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 old mode 100644
 new mode 100755
 index 467d42d..983bb45
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -371,6 +371,7 @@ struct f2fs_sb_info {
  	struct mutex fs_lock[NR_GLOBAL_LOCKS];	/* blocking FS operations */
  	struct mutex node_write;		/* locking node writes */
  	struct mutex writepages;		/* mutex for writepages() */
 +	spinlock_t spin_lock;			/* lock for next_lock_num */
  	unsigned char next_lock_num;		/* round-robin global locks */
  	int por_doing;				/* recovery is doing or not */
  	int on_build_free_nids;			/* build_free_nids is doing */
 @@ -533,15 +534,19 @@ static inline void mutex_unlock_all(struct f2fs_sb_info *sbi)
  
  static inline int mutex_lock_op(struct f2fs_sb_info *sbi)
  {
 -	unsigned char next_lock = sbi->next_lock_num % NR_GLOBAL_LOCKS;
 +	unsigned char next_lock;
  	int i = 0;
  
  	for (; i < NR_GLOBAL_LOCKS; i++)
  		if (mutex_trylock(&sbi->fs_lock[i]))
  			return i;
  
 -	mutex_lock(&sbi->fs_lock[next_lock]);
 +	spin_lock(&sbi->spin_lock);
 +	next_lock = sbi->next_lock_num % NR_GLOBAL_LOCKS;
  	sbi->next_lock_num++;
 +	spin_unlock(&sbi->spin_lock);
 +
 +	mutex_lock(&sbi->fs_lock[next_lock]);
  	return next_lock;
  }
  
 diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
 old mode 100644
 new mode 100755
 index 75c7dc3..4f27596
 --- a/fs/f2fs/super.c
 +++ b/fs/f2fs/super.c
 @@ -657,6 +657,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
  	mutex_init(&sbi->cp_mutex);
  	for (i = 0; i < NR_GLOBAL_LOCKS; i++)
  		mutex_init(&sbi->fs_lock[i]);
 +	spin_lock_init(&sbi->spin_lock);
  	mutex_init(&sbi->node_write);
  	sbi->por_doing = 0;
  	spin_lock_init(&sbi->stat_lock);

  




 



Re: [f2fs-dev][PATCH] f2fs: optimize fs_lock for better performance

2013-09-10 Thread Gu Zheng

Hi Jaegeuk, Chao,

On 09/10/2013 08:52 AM, Jaegeuk Kim wrote:

 Hi,
 
 At first, thank you for the report and please follow the email writing
 rules. :)
 
 Anyway, I agree to the below issue.
 One thing that I can think of is that we don't need to use the
 spin_lock, since we don't care about the exact lock number, but just
 need to get any not-collided number.

IMHO, simply moving sbi->next_lock_num++ before
mutex_lock(&sbi->fs_lock[next_lock]) mostly avoids the imbalance.
IMO, two or more threads incrementing sbi->next_lock_num at the same
time is very rare. If you think that is not rigorous enough, making
next_lock_num atomic would fix it.
What's your opinion?
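A userspace sketch of the atomic variant mentioned above, with these assumptions. Pthread mutexes stand in for kernel mutexes. A C11 `atomic_uint` stands in for an atomic next_lock_num; kernel code would use its own atomic_t API instead. The names `sb_model`, `claim_slot`, `lock_op` and `tally_concurrent_claims` are hypothetical. The slot is claimed with a fetch-and-add before blocking, so concurrent waiters can never read the same counter value.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NR_GLOBAL_LOCKS 8
#define MAX_DEMO_THREADS 64

struct sb_model {
    pthread_mutex_t fs_lock[NR_GLOBAL_LOCKS];
    atomic_uint next_lock_num;      /* round-robin counter, updated atomically */
};

static void sb_model_init(struct sb_model *sbi)
{
    for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
        pthread_mutex_init(&sbi->fs_lock[i], NULL);
    atomic_init(&sbi->next_lock_num, 0);
}

/* Every call gets a distinct counter value, even when called concurrently. */
static unsigned claim_slot(struct sb_model *sbi)
{
    return atomic_fetch_add(&sbi->next_lock_num, 1) % NR_GLOBAL_LOCKS;
}

static int lock_op(struct sb_model *sbi)
{
    for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
        if (pthread_mutex_trylock(&sbi->fs_lock[i]) == 0)
            return i;
    /* All locks busy: bump the counter BEFORE sleeping on the lock. */
    unsigned next = claim_slot(sbi);
    pthread_mutex_lock(&sbi->fs_lock[next]);
    return (int)next;
}

/* Demo helper: n (<= MAX_DEMO_THREADS) threads each claim one slot at the
 * same time; counts[] records how many threads chose each slot. */
struct claim_arg { struct sb_model *sbi; unsigned slot; };

static void *claim_thread(void *p)
{
    struct claim_arg *a = p;
    a->slot = claim_slot(a->sbi);
    return NULL;
}

static void tally_concurrent_claims(struct sb_model *sbi, int n,
                                    int counts[NR_GLOBAL_LOCKS])
{
    pthread_t tid[MAX_DEMO_THREADS];
    struct claim_arg args[MAX_DEMO_THREADS];

    for (int i = 0; i < NR_GLOBAL_LOCKS; i++)
        counts[i] = 0;
    for (int i = 0; i < n; i++) {
        args[i].sbi = sbi;
        pthread_create(&tid[i], NULL, claim_thread, &args[i]);
    }
    for (int i = 0; i < n; i++) {
        pthread_join(tid[i], NULL);
        counts[args[i].slot]++;
    }
}
```

With 16 concurrent claims starting from 0, each of the 8 slots is chosen exactly twice, whatever the scheduling order. Chao's later measurement in this thread found the plain `++` performs about the same in practice.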

Regards,
Gu

 
 So, how about removing the spin_lock?
 And how about using a random number?

 Thanks,
 



Re: [f2fs-dev][PATCH] f2fs: optimize fs_lock for better performance

2013-09-11 Thread Gu Zheng

Hi Chao,
On 09/12/2013 10:40 AM, 俞超 wrote:

 Hi Gu

 -Original Message-
 From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
 Sent: Wednesday, September 11, 2013 1:38 PM
 To: jaegeuk@samsung.com
 Cc: chao2...@samsung.com; shu@samsung.com;
 linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org;
 linux-f2fs-de...@lists.sourceforge.net
 Subject: Re: [f2fs-dev][PATCH] f2fs: optimize fs_lock for better performance

 Hi Jaegeuk, Chao,

 On 09/10/2013 08:52 AM, Jaegeuk Kim wrote:

 Hi,

 At first, thank you for the report and please follow the email writing
 rules. :)

 Anyway, I agree to the below issue.
 One thing that I can think of is that we don't need to use the
 spin_lock, since we don't care about the exact lock number, but just
 need to get any not-collided number.

 IMHO, just moving sbi->next_lock_num++ before
 mutex_lock(&sbi->fs_lock[next_lock])
 can avoid unbalance issue mostly.
 IMO, the case two or more threads increase sbi->next_lock_num in the same
 time is really very very little. If you think it is not rigorous, change
 next_lock_num to atomic one can fix it.
 What's your opinion?

 Regards,
 Gu

 I tested sbi->next_lock_num++ against the atomic version and found
 that their performance is almost the same with a small number of
 racing threads. So, as you and Kim suggested, sbi->next_lock_num++ is
 enough to fix this issue.

Good, but it seems that your reply patch is malformatted, which makes it
hard for Jaegeuk to merge. I'll reformat it; see the following thread.

Thanks,
Gu

 Thanks for the advice.

 So, how about removing the spin_lock?
 And how about using a random number?

 Thanks,



[f2fs-dev][PATCH V2] f2fs: optimize fs_lock for better performance

2013-09-11 Thread Gu Zheng

From: Yu Chao chao2...@samsung.com

There is a performance problem: when all sbi->fs_lock are held, all the
following threads may get the same next_lock value from sbi->next_lock_num
in mutex_lock_op() and wait for the same lock (fs_lock[next_lock]), which
may reduce performance.
So move sbi->next_lock_num++ before taking the lock; this spreads the
following threads evenly across the locks when all sbi->fs_lock are held.

v1->v2:
Drop the needless spin_lock as Jaegeuk suggested.

Suggested-by: Jaegeuk Kim jaegeuk@samsung.com
Signed-off-by: Yu Chao chao2...@samsung.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/f2fs.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 608f0df..7fd99d8 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -544,15 +544,15 @@ static inline void mutex_unlock_all(struct f2fs_sb_info *sbi)
 
 static inline int mutex_lock_op(struct f2fs_sb_info *sbi)
 {
-	unsigned char next_lock = sbi->next_lock_num % NR_GLOBAL_LOCKS;
+	unsigned char next_lock;
 	int i = 0;
 
 	for (; i < NR_GLOBAL_LOCKS; i++)
 		if (mutex_trylock(&sbi->fs_lock[i]))
 			return i;
 
+	next_lock = sbi->next_lock_num++ % NR_GLOBAL_LOCKS;
 	mutex_lock(&sbi->fs_lock[next_lock]);
-	sbi->next_lock_num++;
 	return next_lock;
 }
 
-- 
1.7.7
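A single-threaded model of the problem and of the V2 fix. It assumes the counter moves exactly where the two versions of mutex_lock_op() move it. In the old code, next_lock is read up front but the counter is incremented only after mutex_lock() returns, so every waiter that arrives while all locks are held sees the same value. V2 increments before blocking. The helper names are hypothetical, and the model ignores the rare concurrent-increment race discussed in the thread.

```c
#define NR_GLOBAL_LOCKS 8

/* Old order: next_lock is computed from the counter, and the counter is
 * bumped only after the waiter acquires its lock. While all locks stay
 * held, no waiter has woken yet, so the counter never moves. */
static void assign_old(unsigned char next_lock_num, int waiters, int slot[])
{
    for (int w = 0; w < waiters; w++)
        slot[w] = next_lock_num % NR_GLOBAL_LOCKS;
}

/* V2 order: next_lock = sbi->next_lock_num++ % NR_GLOBAL_LOCKS, before blocking. */
static void assign_v2(unsigned char *next_lock_num, int waiters, int slot[])
{
    for (int w = 0; w < waiters; w++)
        slot[w] = (*next_lock_num)++ % NR_GLOBAL_LOCKS;
}

/* Largest number of waiters queued on any single lock. */
static int max_queue(const int slot[], int waiters)
{
    int count[NR_GLOBAL_LOCKS] = {0};
    int max = 0;

    for (int w = 0; w < waiters; w++)
        if (++count[slot[w]] > max)
            max = count[slot[w]];
    return max;
}
```

With 16 waiters, the old order queues all 16 on one lock, while V2 queues exactly 2 on each. Because 256 is a multiple of 8, wraparound of the unsigned char counter does not disturb the round-robin order.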



[PATCH] f2fs: add a wait step when submit bio with {READ,WRITE}_SYNC

2013-07-30 Thread Gu Zheng

When we submit a bio with READ_SYNC or WRITE_SYNC, we need to wait for the
I/O to complete. Currently only find_data_page() follows this rule; the
other call sites miss this step, so add it.

Furthermore, move the PageUptodate check into f2fs_readpage() to clean up
the code.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/checkpoint.c |    1 -
 fs/f2fs/data.c       |   39 +--
 fs/f2fs/node.c       |    1 -
 fs/f2fs/recovery.c   |    2 --
 fs/f2fs/segment.c    |    2 +-
 5 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index fe91773..e376a42 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -64,7 +64,6 @@ repeat:
if (f2fs_readpage(sbi, page, index, READ_SYNC))
goto repeat;
 
-   lock_page(page);
if (page->mapping != mapping) {
f2fs_put_page(page, 1);
goto repeat;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 19cd7c6..b048936 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -216,13 +216,11 @@ struct page *find_data_page(struct inode *inode, pgoff_t index, bool sync)
 
err = f2fs_readpage(sbi, page, dn.data_blkaddr,
sync ? READ_SYNC : READA);
-   if (sync) {
-   wait_on_page_locked(page);
-   if (!PageUptodate(page)) {
-   f2fs_put_page(page, 0);
-   return ERR_PTR(-EIO);
-   }
-   }
+   if (err)
+   return ERR_PTR(err);
+
+   if (sync)
+   unlock_page(page);
return page;
 }
 
@@ -267,11 +265,6 @@ repeat:
if (err)
return ERR_PTR(err);
 
-   lock_page(page);
-   if (!PageUptodate(page)) {
-   f2fs_put_page(page, 1);
-   return ERR_PTR(-EIO);
-   }
if (page->mapping != mapping) {
f2fs_put_page(page, 1);
goto repeat;
@@ -325,11 +318,7 @@ repeat:
err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
if (err)
return ERR_PTR(err);
-   lock_page(page);
-   if (!PageUptodate(page)) {
-   f2fs_put_page(page, 1);
-   return ERR_PTR(-EIO);
-   }
+
if (page->mapping != mapping) {
f2fs_put_page(page, 1);
goto repeat;
@@ -399,6 +388,16 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
 
submit_bio(type, bio);
up_read(&sbi->bio_sem);
+
+   if (type == READ_SYNC) {
+   wait_on_page_locked(page);
+   lock_page(page);
+   if (!PageUptodate(page)) {
+   f2fs_put_page(page, 1);
+   return -EIO;
+   }
+   }
+
return 0;
 }
 
@@ -679,11 +678,7 @@ repeat:
err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
if (err)
return err;
-   lock_page(page);
-   if (!PageUptodate(page)) {
-   f2fs_put_page(page, 1);
-   return -EIO;
-   }
+
if (page->mapping != mapping) {
f2fs_put_page(page, 1);
goto repeat;
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index f5172e2..f061554 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1534,7 +1534,6 @@ int restore_node_summary(struct f2fs_sb_info *sbi,
if (f2fs_readpage(sbi, page, addr, READ_SYNC))
goto out;
 
-   lock_page(page);
rn = F2FS_NODE(page);
sum_entry->nid = rn->footer.nid;
sum_entry->version = 0;
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 639eb34..ec68183 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -140,8 +140,6 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
if (err)
goto out;
 
-   lock_page(page);
-
if (cp_ver != cpver_of_node(page))
break;
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9b74ae2..bcd19db 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -639,7 +639,7 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
 
trace_f2fs_do_submit_bio(sbi->sb, btype, sync, sbi->bio[btype]);
 
-   if (type == META_FLUSH) {
+   if ((type == META_FLUSH) || (rw & WRITE_SYNC)) {
DECLARE_COMPLETION_ONSTACK(wait);
p->is_sync = true;
p->wait = &wait;
-- 
1.7.7


Re: [PATCH] f2fs: add a wait step when submit bio with {READ,WRITE}_SYNC

2013-07-30 Thread Gu Zheng

Hi Kim,

On 07/30/2013 08:29 PM, Jaegeuk Kim wrote:

 Hi Gu,
 
 The original read flow was to avoid redundant lock/unlock_page() calls.

Right, that gives better read performance. But is the wait step after
submitting a bio with READ_SYNC unnecessary too?

 And we should not wait for WRITE_SYNC, since it is just for write
 priority, not for synchronization of the file system.

Got it, thanks for your explanation.:) 

Best regards,
Gu

 Thanks,
 

Re: [PATCH] fbdev: fix build warning in vga16fb.c

2013-07-30 Thread Gu Zheng

Hoho, Tomi has applied the patch from Lius to fix this warning.
And this is the sixth patch to fix the same issue since last week.

Thanks,
Gu


On 07/31/2013 11:21 AM, Xishi Qiu wrote:

 When building v3.11-rc3, I get the following warning:
 ...
 drivers/video/vga16fb.c: In function ‘vga16fb_destroy’:
 drivers/video/vga16fb.c:1268: warning: unused variable ‘dev’
 ...
 
 Signed-off-by: Xishi Qiu qiuxi...@huawei.com
 ---
  drivers/video/vga16fb.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/video/vga16fb.c b/drivers/video/vga16fb.c
 index 830ded4..2827333 100644
 --- a/drivers/video/vga16fb.c
 +++ b/drivers/video/vga16fb.c
 @@ -1265,7 +1265,6 @@ static void vga16fb_imageblit(struct fb_info *info, const struct fb_image *image
  
  static void vga16fb_destroy(struct fb_info *info)
  {
 -	struct platform_device *dev = container_of(info->device, struct platform_device, dev);
  	iounmap(info->screen_base);
  	fb_dealloc_cmap(&info->cmap);
   /* XXX unshare VGA regions */



Re: [PATCH RESEND] fs/bio-integrity: fix a potential mem leak

2013-07-31 Thread Gu Zheng

cc akpm

On 07/29/2013 09:49 AM, Gu Zheng wrote:

 Free the bio_integrity_pool in the fail path of biovec_create_pool
 in function bioset_integrity_create().
 
 Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
 ---
  fs/bio-integrity.c |9 +
  1 files changed, 5 insertions(+), 4 deletions(-)
 
 diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
 index 8fb4291..6025084 100644
 --- a/fs/bio-integrity.c
 +++ b/fs/bio-integrity.c
 @@ -716,13 +716,14 @@ int bioset_integrity_create(struct bio_set *bs, int pool_size)
   return 0;
  
  	bs->bio_integrity_pool = mempool_create_slab_pool(pool_size, bip_slab);
 -
 -	bs->bvec_integrity_pool = biovec_create_pool(bs, pool_size);
 -	if (!bs->bvec_integrity_pool)
 +	if (!bs->bio_integrity_pool)
  		return -1;
  
 -	if (!bs->bio_integrity_pool)
 +	bs->bvec_integrity_pool = biovec_create_pool(bs, pool_size);
 +	if (!bs->bvec_integrity_pool) {
 +		mempool_destroy(bs->bio_integrity_pool);
  		return -1;
 +	}
  
   return 0;
  }
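The fix follows the usual unwind-on-failure pattern: check each allocation immediately, and if a later one fails, release the earlier ones before returning. Below is a userspace sketch of that shape. `malloc`/`free` stand in for mempool_create_slab_pool()/biovec_create_pool() and mempool_destroy(). The `struct pools`, the `live_allocs` leak counter and the `fail_second` failure-injection flag are hypothetical and exist only for illustration.

```c
#include <stdlib.h>

struct pools {
    void *bio_pool;     /* stands in for bs->bio_integrity_pool */
    void *bvec_pool;    /* stands in for bs->bvec_integrity_pool */
};

static int live_allocs; /* outstanding allocations, to expose leaks */

static void *alloc_pool(int fail)
{
    void *p = fail ? NULL : malloc(64);

    if (p)
        live_allocs++;
    return p;
}

static void free_pool(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

static int pools_create(struct pools *ps, int fail_second)
{
    ps->bio_pool = alloc_pool(0);
    if (!ps->bio_pool)              /* check the first allocation right away */
        return -1;

    ps->bvec_pool = alloc_pool(fail_second);
    if (!ps->bvec_pool) {
        free_pool(ps->bio_pool);    /* undo the first allocation: no leak */
        ps->bio_pool = NULL;
        return -1;
    }
    return 0;
}
```

When the second allocation fails, the leak counter returns to zero. Before the fix, that path returned -1 while still holding the first pool.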



Re: [PATCH] f2fs: add a wait step when submit bio with {READ,WRITE}_SYNC

2013-07-31 Thread Gu Zheng

On 07/31/2013 06:06 PM, Jaegeuk Kim wrote:

 2013-07-31 (Wed), 09:59 +0800, Gu Zheng:
 Hi Kim,

 On 07/30/2013 08:29 PM, Jaegeuk Kim wrote:

 Hi Gu,

 The original read flow was to avoid redundant lock/unlock_page() calls.

 Right, this can gain better read performance. But is the wait step after 
 submitting bio with READ_SYNC needless too?
 
 Correct, READ_SYNC is also used for I/O priority.
 The basic read policy here is that the caller should lock the page only
 when it wants to manipulate the data in it.

 Otherwise, we don't need the unnecessary lock and unlock calls.
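The policy above relies on how page locking works during reads. A page stays locked while its read is in flight, and the I/O completion path unlocks it. A caller that later calls lock_page() therefore already sleeps until the read finishes. Below is a minimal pthread model of that behaviour. `struct page_model`, `end_read` and the `*_model` helpers are hypothetical stand-ins, not the kernel page API.

```c
#include <pthread.h>
#include <stdbool.h>

struct page_model {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool locked;    /* like PG_locked: set while the read is in flight */
    bool uptodate;  /* like PG_uptodate: set by completion on success */
};

static void page_init_locked(struct page_model *pg)
{
    pthread_mutex_init(&pg->mu, NULL);
    pthread_cond_init(&pg->cv, NULL);
    pg->locked = true;      /* the submitter locks the page before issuing I/O */
    pg->uptodate = false;
}

/* I/O completion: mark the data valid, then unlock and wake sleepers.
 * (A failed read would unlock without setting uptodate, which is why
 * callers still check it after locking.) */
static void *end_read(void *arg)
{
    struct page_model *pg = arg;

    pthread_mutex_lock(&pg->mu);
    pg->uptodate = true;
    pg->locked = false;
    pthread_cond_broadcast(&pg->cv);
    pthread_mutex_unlock(&pg->mu);
    return NULL;
}

/* lock_page(): sleep until unlocked, then take the lock. Since the read
 * holds the page lock, this also waits for the read to complete. */
static void lock_page_model(struct page_model *pg)
{
    pthread_mutex_lock(&pg->mu);
    while (pg->locked)
        pthread_cond_wait(&pg->cv, &pg->mu);
    pg->locked = true;
    pthread_mutex_unlock(&pg->mu);
}

static void unlock_page_model(struct page_model *pg)
{
    pthread_mutex_lock(&pg->mu);
    pg->locked = false;
    pthread_cond_broadcast(&pg->cv);
    pthread_mutex_unlock(&pg->mu);
}

static bool page_uptodate(struct page_model *pg)
{
    pthread_mutex_lock(&pg->mu);
    bool u = pg->uptodate;
    pthread_mutex_unlock(&pg->mu);
    return u;
}
```

In this model, a separate wait step before lock_page_model() adds nothing. A caller that does not touch the data never needs to wait at all.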

Got it. It seems I misread the code originally; it's clear now.
Thanks very much for your explanation. :)

Regards,
Gu

 Thanks,

 

 And we should not wait for WRITE_SYNC, since it is just for write
 priority, not for synchronization of the file system.

 Got it, thanks for your explanation.:) 

 Best regards,
 Gu

 Thanks,


Re: [PATCH 1/2] f2fs: add sysfs support for controlling the gc_thread

2013-07-31 Thread Gu Zheng

Hi Jeon,

On 07/31/2013 10:33 PM, Namjae Jeon wrote:

 From: Namjae Jeon namjae.j...@samsung.com
 
 Add sysfs entries to control the timing parameters for
 f2fs gc thread.
 
 Various Sysfs options introduced are:
 gc_min_sleep_time: Min Sleep time for GC in ms
 gc_max_sleep_time: Max Sleep time for GC in ms
 gc_no_gc_sleep_time: Default Sleep time for GC in ms
 
 Signed-off-by: Namjae Jeon namjae.j...@samsung.com
 Signed-off-by: Pankaj Kumar pankaj...@samsung.com
 ---
  Documentation/ABI/testing/sysfs-fs-f2fs |   22 ++
  Documentation/filesystems/f2fs.txt  |   26 +++
  fs/f2fs/f2fs.h  |4 +
  fs/f2fs/gc.c|   17 +++--
  fs/f2fs/gc.h|   33 
  fs/f2fs/super.c |  124 +++
  6 files changed, 206 insertions(+), 20 deletions(-)
  create mode 100644 Documentation/ABI/testing/sysfs-fs-f2fs
 
 diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
 new file mode 100644
 index 000..5f44095
 --- /dev/null
 +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
 @@ -0,0 +1,22 @@
 +What:/sys/fs/f2fs/disk/gc_max_sleep_time
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the maximum sleep time for gc_thread. Time
 +  is in milliseconds.
 +
 +What:/sys/fs/f2fs/disk/gc_min_sleep_time
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the minimum sleep time for gc_thread. Time
 +  is in milliseconds.
 +
 +What:/sys/fs/f2fs/disk/gc_no_gc_sleep_time
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the default sleep time for gc_thread. Time
 +  is in milliseconds.
 +
 +
 diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
 index 0500c19..2e9e873 100644
 --- a/Documentation/filesystems/f2fs.txt
 +++ b/Documentation/filesystems/f2fs.txt
 @@ -133,6 +133,32 @@ f2fs. Each file shows the whole f2fs information.
   - current memory footprint consumed by f2fs.
  
  
 
 +SYSFS ENTRIES
 +
 +
 +Information about mounted f2fs file systems can be found in
 +/sys/fs/f2fs.  Each mounted filesystem will have a directory in
 +/sys/fs/f2fs based on its device name (i.e., /sys/fs/f2fs/sda).
 +The files in each per-device directory are shown in table below.
 +
 +Files in /sys/fs/f2fs/devname
 +(see also Documentation/ABI/testing/sysfs-fs-f2fs)
 +..
 + File Content
 +
 + gc_max_sleep_time     This tuning parameter controls the maximum sleep
 +                       time for the garbage collection thread. Time is
 +                       in milliseconds.
 +
 + gc_min_sleep_time     This tuning parameter controls the minimum sleep
 +                       time for the garbage collection thread. Time is
 +                       in milliseconds.
 +
 + gc_no_gc_sleep_time   This tuning parameter controls the default sleep
 +                       time for the garbage collection thread. Time is
 +                       in milliseconds.
 +
 +
  USAGE
  
 
  
 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 78777cd..63813be 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -430,6 +430,10 @@ struct f2fs_sb_info {
  #endif
   unsigned int last_victim[2];/* last victim segment # */
   spinlock_t stat_lock;   /* lock for stat operations */
 +
 + /* For sysfs support */
 + struct kobject s_kobj;
 + struct completion s_kobj_unregister;

What is this completion used for? Or is it a design for future use? I could not find any
synchronization routines that use it. Am I missing something?


  };
  
  /*
 diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
 index 35f9b1a..60d4f67 100644
 --- a/fs/f2fs/gc.c
 +++ b/fs/f2fs/gc.c
 @@ -29,10 +29,11 @@ static struct kmem_cache *winode_slab;
  static int gc_thread_func(void *data)
  {
   struct f2fs_sb_info *sbi = data;
 + struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
   wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
   long wait_ms;
  
 - wait_ms = GC_THREAD_MIN_SLEEP_TIME;
 + wait_ms = gc_th->min_sleep_time;
  
   do {
   if (try_to_freeze())
 @@ -45,7 +46,7 @@ static int gc_thread_func(void *data)
   break;

Re: [PATCH 2/2] f2fs: add sysfs entries to select the gc policy

2013-07-31 Thread Gu Zheng

Hi Jeon,

On 07/31/2013 10:33 PM, Namjae Jeon wrote:

 From: Namjae Jeon namjae.j...@samsung.com
 
 Add sysfs entries namely gc_long_idle and gc_short_idle to control the
 gc policy, where long idle corresponds to selecting a cost-benefit approach,
 while short idle corresponds to selecting a greedy approach to garbage
 collection. The selection is mutually exclusive: only one approach is in
 effect at any point.
 
 Signed-off-by: Namjae Jeon namjae.j...@samsung.com
 Signed-off-by: Pankaj Kumar pankaj...@samsung.com
 ---
  Documentation/ABI/testing/sysfs-fs-f2fs |   12 +++
  Documentation/filesystems/f2fs.txt  |8 +
  fs/f2fs/gc.c|   22 ++--
  fs/f2fs/gc.h|4 +++
  fs/f2fs/super.c |   59 
 +--
  5 files changed, 99 insertions(+), 6 deletions(-)
 
 diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
 index 5f44095..96b62ea 100644
 --- a/Documentation/ABI/testing/sysfs-fs-f2fs
 +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
 @@ -19,4 +19,16 @@ Description:
Controls the default sleep time for gc_thread. Time
is in milliseconds.
  
 +What:/sys/fs/f2fs/disk/gc_long_idle
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the selection of gc policy. long_idle is used
 +  to select the cost benefit approach for garbage collection.
  
 +What:/sys/fs/f2fs/disk/gc_short_idle
 +Date:July 2013
 +Contact: Namjae Jeon namjae.j...@samsung.com
 +Description:
 +  Controls the selection of gc policy. short_idle is used
 +  to select the greedy approach for garbage collection.
 diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
 index 2e9e873..06dd5d7 100644
 --- a/Documentation/filesystems/f2fs.txt
 +++ b/Documentation/filesystems/f2fs.txt
 @@ -158,6 +158,14 @@ Files in /sys/fs/f2fs/devname
time for the garbage collection thread. Time is
in milliseconds.
  
 + gc_long_idle          This parameter controls the selection of cost
 +                       benefit approach for garbage collection. Writing
 +                       1 to this file will select the cost benefit policy.
 +
 + gc_short_idle         This parameter controls the selection of greedy
 +                       approach for the garbage collection. Writing 1
 +                       to this file will select the greedy policy.

Why introduce two opposing attributes? It will lead to a confusing state if we
enable or disable both of them.

 +
  
 
  USAGE
  
 
 diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
 index 60d4f67..af2d9d7 100644
 --- a/fs/f2fs/gc.c
 +++ b/fs/f2fs/gc.c
 @@ -106,6 +106,8 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
   gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
   gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
  
 + gc_th->long_idle = gc_th->short_idle = 0;
 +
   sbi->gc_thread = gc_th;
   init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
   sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
 @@ -130,9 +132,23 @@ void stop_gc_thread(struct f2fs_sb_info *sbi)
   sbi->gc_thread = NULL;
  }
  
 -static int select_gc_type(int gc_type)
 +static int select_gc_type(struct f2fs_gc_kthread *gc_th, int gc_type)
  {
 - return (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
 + int gc_mode;
 +
 + if (gc_th) {
 + if (gc_th->long_idle) {
 + gc_mode = GC_CB;
 + goto out;
 + } else if (gc_th->short_idle) {
 + gc_mode = GC_GREEDY;
 + goto out;
 + }
 + }
 +
 + gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
 +out:
 + return gc_mode;
  }
  
  static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
 @@ -145,7 +161,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
   p->dirty_segmap = dirty_i->dirty_segmap[type];
   p->ofs_unit = 1;
   } else {
 - p->gc_mode = select_gc_type(gc_type);
 + p->gc_mode = select_gc_type(sbi->gc_thread, gc_type);
   p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
   p->ofs_unit = sbi->segs_per_sec;
   }
 diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
 index f4bf44c..b2faae5 100644
 --- a/fs/f2fs/gc.h
 +++ b/fs/f2fs/gc.h
 @@ -30,6 +30,10 @@ struct f2fs_gc_kthread {
   unsigned int min_sleep_time;
   unsigned int max_sleep_time;
   unsigned int no_gc_sleep_time;
 +
 + /* for changing gc
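
The concern about two opposing attributes can be made concrete with a small user-space model of the patched select_gc_type() quoted above. The enums and struct gc_kthread_model are local stand-ins, not the f2fs definitions. With gc_long_idle and gc_short_idle both set to 1, long_idle is tested first, so the cost-benefit policy silently wins and the short_idle setting is ignored. The second function is a hypothetical alternative, a single attribute holding one policy value, which cannot express that contradictory state.

```c
#include <stddef.h>

/* Local stand-ins for the f2fs constants; the values are illustrative only. */
enum gc_mode { GC_CB, GC_GREEDY };
enum gc_type { FG_GC, BG_GC };

/* Only the two flags added by the patch are modeled here. */
struct gc_kthread_model {
	int long_idle;
	int short_idle;
};

/* Mirrors the patched select_gc_type(): long_idle is tested first. */
static enum gc_mode select_gc_two_flags(const struct gc_kthread_model *gc_th,
					enum gc_type type)
{
	if (gc_th) {
		if (gc_th->long_idle)
			return GC_CB;	/* wins even if short_idle is also 1 */
		if (gc_th->short_idle)
			return GC_GREEDY;
	}
	return (type == BG_GC) ? GC_CB : GC_GREEDY;
}

/* Hypothetical alternative: one attribute that holds exactly one policy. */
enum gc_policy { GC_POLICY_DEFAULT, GC_POLICY_CB, GC_POLICY_GREEDY };

static enum gc_mode select_gc_one_field(enum gc_policy policy, enum gc_type type)
{
	switch (policy) {
	case GC_POLICY_CB:
		return GC_CB;
	case GC_POLICY_GREEDY:
		return GC_GREEDY;
	default:
		/* Fall back to the existing rule: BG_GC uses cost-benefit. */
		return (type == BG_GC) ? GC_CB : GC_GREEDY;
	}
}
```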

Re: [PATCH] f2fs: fix handling orphan inodes

2013-08-01 Thread Gu Zheng

On 08/01/2013 03:58 PM, Jaegeuk Kim wrote:

 This patch fixes mishandling of the sbi->n_orphans variable.
 
 If users request lots of f2fs_unlink(), check_orphan_space() could be contended.
 In such a case, sbi->n_orphans can be read incorrectly so that f2fs_unlink()
 would fall into the wrong state which results in the failure of
 add_orphan_inode().
 
 So, let's increment sbi->n_orphans virtually prior to the actual orphan inode
 stuffs. After that, let's release sbi->n_orphans by calling release_orphan_inode
 or remove_orphan_inode.

Hi Kim,
The key point is that we did not decrement sbi->n_orphans when we release/remove
an orphan inode, so just adding the decrement step can fix this issue.
But why move the increment of sbi->n_orphans before we add the orphan inode? It
seems that we gain no benefit from it, and it makes the procedure a bit more
complex, because we must decrement sbi->n_orphans in some failure paths before
we actually add the orphan inode.

Thanks,
Gu
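
The reservation scheme in this patch can be modeled in user space as below. This is a sketch, not the f2fs code: a pthread mutex stands in for sbi->orphan_inode_mutex, and struct orphan_state, unlink_model() and the fixed max_orphans value are hypothetical. It illustrates the point raised above: once the count is incremented in the acquire step, every failure path before the inode is actually added has to call the release step, or the slot leaks.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the orphan fields of struct f2fs_sb_info. */
struct orphan_state {
	pthread_mutex_t lock;
	unsigned int n_orphans;
	unsigned int max_orphans;	/* assumed fixed capacity */
};

/* Reserve one orphan slot up front, as acquire_orphan_inode() does in the patch. */
static int acquire_orphan_slot(struct orphan_state *s)
{
	int err = 0;

	pthread_mutex_lock(&s->lock);
	if (s->n_orphans >= s->max_orphans)
		err = -ENOSPC;
	else
		s->n_orphans++;
	pthread_mutex_unlock(&s->lock);
	return err;
}

/* Give the slot back on any path that will not add the orphan inode. */
static void release_orphan_slot(struct orphan_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->n_orphans--;
	pthread_mutex_unlock(&s->lock);
}

/* Model of an unlink: reserve, then either keep the slot or undo it on failure. */
static int unlink_model(struct orphan_state *s, int later_step_fails)
{
	int err = acquire_orphan_slot(s);

	if (err)
		return err;
	if (later_step_fails) {
		/* Every failure path after the reservation must release it. */
		release_orphan_slot(s);
		return -EIO;
	}
	/* Success: the slot now belongs to an orphan inode. */
	return 0;
}
```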

 
 Signed-off-by: Jaegeuk Kim jaegeuk@samsung.com
 ---
  fs/f2fs/checkpoint.c | 13 ++---
  fs/f2fs/dir.c|  2 ++
  fs/f2fs/f2fs.h   |  3 ++-
  fs/f2fs/namei.c  | 19 ++-
  4 files changed, 28 insertions(+), 9 deletions(-)
 
 diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
 index fe91773..c5a5c39 100644
 --- a/fs/f2fs/checkpoint.c
 +++ b/fs/f2fs/checkpoint.c
 @@ -182,7 +182,7 @@ const struct address_space_operations f2fs_meta_aops = {
   .set_page_dirty = f2fs_set_meta_page_dirty,
  };
  
 -int check_orphan_space(struct f2fs_sb_info *sbi)
 +int acquire_orphan_inode(struct f2fs_sb_info *sbi)
  {
   unsigned int max_orphans;
   int err = 0;
 @@ -197,10 +197,19 @@ int check_orphan_space(struct f2fs_sb_info *sbi)
   mutex_lock(&sbi->orphan_inode_mutex);
   if (sbi->n_orphans >= max_orphans)
   err = -ENOSPC;
 + else
 + sbi->n_orphans++;
   mutex_unlock(&sbi->orphan_inode_mutex);
   return err;
  }
  
 +void release_orphan_inode(struct f2fs_sb_info *sbi)
 +{
 + mutex_lock(&sbi->orphan_inode_mutex);
 + sbi->n_orphans--;
 + mutex_unlock(&sbi->orphan_inode_mutex);
 +}
 +
  void add_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
  {
   struct list_head *head, *this;
 @@ -229,8 +238,6 @@ retry:
   list_add(&new->list, this->prev);
   else
   list_add_tail(&new->list, head);
 -
 - sbi->n_orphans++;
  out:
   mutex_unlock(&sbi->orphan_inode_mutex);
  }
 diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
 index d1bb260..384c6da 100644
 --- a/fs/f2fs/dir.c
 +++ b/fs/f2fs/dir.c
 @@ -572,6 +572,8 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
  
   if (inode->i_nlink == 0)
   add_orphan_inode(sbi, inode->i_ino);
 + else
 + release_orphan_inode(sbi);
   }
  
   if (bit_pos == NR_DENTRY_IN_BLOCK) {
 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index a6858c7..78777cd 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -1044,7 +1044,8 @@ void destroy_segment_manager(struct f2fs_sb_info *);
  struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
  struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
  long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
 -int check_orphan_space(struct f2fs_sb_info *);
 +int acquire_orphan_inode(struct f2fs_sb_info *);
 +void release_orphan_inode(struct f2fs_sb_info *);
  void add_orphan_inode(struct f2fs_sb_info *, nid_t);
  void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
  int recover_orphan_inodes(struct f2fs_sb_info *);
 diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
 index 3297278..4e47518 100644
 --- a/fs/f2fs/namei.c
 +++ b/fs/f2fs/namei.c
 @@ -239,7 +239,7 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
   if (!de)
   goto fail;
  
 - err = check_orphan_space(sbi);
 + err = acquire_orphan_inode(sbi);
   if (err) {
   kunmap(page);
   f2fs_put_page(page, 0);
 @@ -393,7 +393,7 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
   struct inode *old_inode = old_dentry->d_inode;
   struct inode *new_inode = new_dentry->d_inode;
   struct page *old_dir_page;
 - struct page *old_page;
 + struct page *old_page, *new_page;
   struct f2fs_dir_entry *old_dir_entry = NULL;
   struct f2fs_dir_entry *old_entry;
   struct f2fs_dir_entry *new_entry;
 @@ -415,7 +415,6 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
   ilock = mutex_lock_op(sbi);
  
   if (new_inode) {
 - struct page *new_page;
  
   err = -ENOTEMPTY;
   if (old_dir_entry && !f2fs_empty_dir(new_inode))
 @@ -427,9 +426,13 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
   if (!new_entry)
   goto out_dir;
  
 + err = acquire_orphan_inode(sbi);

Re: [PATCH 1/2] f2fs: add sysfs support for controlling the gc_thread

2013-08-01 Thread Gu Zheng

On 08/02/2013 09:19 AM, Namjae Jeon wrote:


 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 78777cd..63813be 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -430,6 +430,10 @@ struct f2fs_sb_info {
  #endif
 unsigned int last_victim[2];/* last victim segment # */
 spinlock_t stat_lock;   /* lock for stat operations */
 +
 +   /* For sysfs support */
 +   struct kobject s_kobj;
 +   struct completion s_kobj_unregister;

 Hi, Gu.
 What is this completion used for? Or it's an ahead design? I do not find
 synchronization
 routines use it. Am I missing something?
 You're right. It is my mistake. I will update it in the next version of the patch.
 


  };

  /*
 diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
 index 35f9b1a..60d4f67 100644
 --- a/fs/f2fs/gc.c
 +++ b/fs/f2fs/gc.c
 @@ -29,10 +29,11 @@ static struct kmem_cache *winode_slab;
  static int gc_thread_func(void *data)
  {
 struct f2fs_sb_info *sbi = data;
 +   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
 wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
 long wait_ms;

 -   wait_ms = GC_THREAD_MIN_SLEEP_TIME;
 +   wait_ms = gc_th->min_sleep_time;

 do {
 if (try_to_freeze())
 @@ -45,7 +46,7 @@ static int gc_thread_func(void *data)
 break;

 if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE) {
 -   wait_ms = GC_THREAD_MAX_SLEEP_TIME;
 +   wait_ms = increase_sleep_time(gc_th, wait_ms);
 continue;
 }

 @@ -66,15 +67,15 @@ static int gc_thread_func(void *data)
 continue;

 if (!is_idle(sbi)) {
 -   wait_ms = increase_sleep_time(wait_ms);
 +   wait_ms = increase_sleep_time(gc_th, wait_ms);
 mutex_unlock(&sbi->gc_mutex);
 continue;
 }

 if (has_enough_invalid_blocks(sbi))
 -   wait_ms = decrease_sleep_time(wait_ms);
 +   wait_ms = decrease_sleep_time(gc_th, wait_ms);
 else
 -   wait_ms = increase_sleep_time(wait_ms);
 +   wait_ms = increase_sleep_time(gc_th, wait_ms);

  #ifdef CONFIG_F2FS_STAT_FS
 sbi->bg_gc++;
 @@ -82,7 +83,7 @@ static int gc_thread_func(void *data)

 /* if return value is not zero, no victim was selected */
 if (f2fs_gc(sbi))
 -   wait_ms = GC_THREAD_NOGC_SLEEP_TIME;
 +   wait_ms = gc_th->no_gc_sleep_time;
 } while (!kthread_should_stop());
 return 0;
  }
 @@ -101,6 +102,10 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
 goto out;
 }

 +   gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
 +   gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
 +   gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
 +
 sbi->gc_thread = gc_th;
 init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
 sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
 diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
 index 2c6a6bd..f4bf44c 100644
 --- a/fs/f2fs/gc.h
 +++ b/fs/f2fs/gc.h
 @@ -13,9 +13,9 @@
  * whether IO subsystem is idle
  * or not
  */
 -#define GC_THREAD_MIN_SLEEP_TIME   30000   /* milliseconds */
 -#define GC_THREAD_MAX_SLEEP_TIME   60000
 -#define GC_THREAD_NOGC_SLEEP_TIME  300000  /* wait 5 min */
 +#define DEF_GC_THREAD_MIN_SLEEP_TIME   30000   /* milliseconds */
 +#define DEF_GC_THREAD_MAX_SLEEP_TIME   60000
 +#define DEF_GC_THREAD_NOGC_SLEEP_TIME  300000  /* wait 5 min */
  #define LIMIT_INVALID_BLOCK    40 /* percentage over total user space */
  #define LIMIT_FREE_BLOCK   40 /* percentage over invalid + free space */

 @@ -25,6 +25,11 @@
  struct f2fs_gc_kthread {
 struct task_struct *f2fs_gc_task;
 wait_queue_head_t gc_wait_queue_head;
 +
 +   /* for gc sleep time */
 +   unsigned int min_sleep_time;
 +   unsigned int max_sleep_time;
 +   unsigned int no_gc_sleep_time;

 Though these attributes are used by the gc thread, and in the current design
 gc_thread is always a singleton per f2fs_sb, they are in fact f2fs sb info.
 So I think it's better to attach these to f2fs_sb_info. What's your opinion?
 It does not matter where they are, but I think these gc times belong to the
 gc thread, so I put them in the gc thread.

Yeah, in fact it's also OK. :)

Regards,
Gu
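
For reference, here is a user-space sketch of how a GC wait time could adapt using the per-thread tunables discussed here (min_sleep_time, max_sleep_time, no_gc_sleep_time). The bodies of increase_sleep_time() and decrease_sleep_time() are not shown in this thread, so the policy below (step by min_sleep_time, clamp to the range [min_sleep_time, max_sleep_time]) is an assumption, and struct gc_sleep_tunables is a hypothetical stand-in for struct f2fs_gc_kthread.

```c
/* Hypothetical stand-in for the three tunables added to struct f2fs_gc_kthread. */
struct gc_sleep_tunables {
	unsigned int min_sleep_time;	/* ms */
	unsigned int max_sleep_time;	/* ms */
	unsigned int no_gc_sleep_time;	/* ms, used when no victim was found */
};

/* Assumed policy: back off by min_sleep_time per step, capped at max_sleep_time. */
static long increase_sleep_model(const struct gc_sleep_tunables *t, long wait_ms)
{
	wait_ms += t->min_sleep_time;
	if (wait_ms > (long)t->max_sleep_time)
		wait_ms = t->max_sleep_time;
	return wait_ms;
}

/* Assumed policy: speed up by min_sleep_time per step, never below min_sleep_time. */
static long decrease_sleep_model(const struct gc_sleep_tunables *t, long wait_ms)
{
	wait_ms -= t->min_sleep_time;
	if (wait_ms < (long)t->min_sleep_time)
		wait_ms = t->min_sleep_time;
	return wait_ms;
}
```

Because the helpers read the tunables on every call, a value written through sysfs would take effect at the next adjustment in this model.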

 
 Thanks for review :)


 Thanks,
 Gu

  };
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 



[RESEND PATCH 2/2] staging/olpc_docn: reorder the lock sequence to avoid potential dead lock

2013-11-05 Thread Gu Zheng

The lock sequence of dcon_blank_fb (fb_info->lock ---> console_lock) conflicts
with the one in console_callback (console_lock ---> fb_info->lock), which can
lead to a potential deadlock, so reorder the lock sequence of dcon_blank_fb
to avoid the potential deadlock.
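
The fix follows the usual rule for avoiding ABBA deadlocks: every path takes the two locks in the same order (console_lock before fb_info->lock, as console_callback() does), releases them in reverse order, and drops the outer lock on the early-failure path. Below is a user-space sketch in which pthread mutexes stand in for both locks; console_mutex, fb_info_mutex, lock_fb_info_model() and blank_fb_model() are hypothetical names, and the model's lock_fb_info_model() simply fails when the framebuffer is absent, roughly as lock_fb_info() can fail.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for console_lock and fb_info->lock. */
static pthread_mutex_t console_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fb_info_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for lock_fb_info(): fails without taking the lock if there is no fb. */
static bool lock_fb_info_model(bool fb_present)
{
	if (!fb_present)
		return false;
	pthread_mutex_lock(&fb_info_mutex);
	return true;
}

/*
 * Same order as console_callback(): console first, then fb_info.
 * Locks are released in reverse order, and the failure path drops
 * the console lock it already holds.
 */
static bool blank_fb_model(bool fb_present, int *blank_calls)
{
	pthread_mutex_lock(&console_mutex);
	if (!lock_fb_info_model(fb_present)) {
		pthread_mutex_unlock(&console_mutex);
		return false;
	}
	(*blank_calls)++;		/* fb_blank() would run here */
	pthread_mutex_unlock(&fb_info_mutex);
	pthread_mutex_unlock(&console_mutex);
	return true;
}
```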

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/staging/olpc_dcon/olpc_dcon.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/olpc_dcon/olpc_dcon.c 
b/drivers/staging/olpc_dcon/olpc_dcon.c
index 198595e..9db88d9 100644
--- a/drivers/staging/olpc_dcon/olpc_dcon.c
+++ b/drivers/staging/olpc_dcon/olpc_dcon.c
@@ -255,17 +255,19 @@ static bool dcon_blank_fb(struct dcon_priv *dcon, bool blank)
 {
int err;
 
+   console_lock();
if (!lock_fb_info(dcon->fbinfo)) {
+   console_unlock();
dev_err(&dcon->client->dev, "unable to lock framebuffer\n");
return false;
}
-   console_lock();
+
dcon->ignore_fb_events = true;
err = fb_blank(dcon->fbinfo,
blank ? FB_BLANK_POWERDOWN : FB_BLANK_UNBLANK);
dcon->ignore_fb_events = false;
-   console_unlock();
unlock_fb_info(dcon->fbinfo);
+   console_unlock();
 
if (err) {
dev_err(&dcon->client->dev, "couldn't %sblank framebuffer\n",
-- 
1.7.7


[RESEND PATCH 1/2] fb: reorder the lock sequence to fix potential dead lock

2013-11-05 Thread Gu Zheng

Following commits:
50e244cc79 fb: rework locking to fix lock ordering on takeover
e93a9a8687 fb: Yet another band-aid for fixing lockdep mess
054430e773 fbcon: fix locking harder
reworked locking to fix related lock ordering on takeover, and introduced console_lock
into fbmem, but it seems that the new lock sequence (fb_info->lock ---> console_lock)
conflicts with the one in console_callback (console_lock ---> fb_info->lock), and leads to
a potential deadlock as follows:
[  601.079000] ======================================================
[  601.079000] [ INFO: possible circular locking dependency detected ]
[  601.079000] 3.11.0 #189 Not tainted
[  601.079000] -------------------------------------------------------
[  601.079000] kworker/0:3/619 is trying to acquire lock:
[  601.079000]  (&fb_info->lock){+.+.+.}, at: [81397566] lock_fb_info+0x26/0x60
[  601.079000]
but task is already holding lock:
[  601.079000]  (console_lock){+.+.+.}, at: [8141aae3] console_callback+0x13/0x160
[  601.079000]
which lock already depends on the new lock.

[  601.079000]
the existing dependency chain (in reverse order) is:
[  601.079000]
-> #1 (console_lock){+.+.+.}:
[  601.079000][810dc971] lock_acquire+0xa1/0x140
[  601.079000][810c6267] console_lock+0x77/0x80
[  601.079000][81399448] register_framebuffer+0x1d8/0x320
[  601.079000][81cfb4c8] efifb_probe+0x408/0x48f
[  601.079000][8144a963] platform_drv_probe+0x43/0x80
[  601.079000][8144853b] driver_probe_device+0x8b/0x390
[  601.079000][814488eb] __driver_attach+0xab/0xb0
[  601.079000][814463bd] bus_for_each_dev+0x5d/0xa0
[  601.079000][81447e6e] driver_attach+0x1e/0x20
[  601.079000][81447a07] bus_add_driver+0x117/0x290
[  601.079000][81448fea] driver_register+0x7a/0x170
[  601.079000][8144a10a] __platform_driver_register+0x4a/0x50
[  601.079000][8144a12d] platform_driver_probe+0x1d/0xb0
[  601.079000][81cfb0a1] efifb_init+0x273/0x292
[  601.079000][81002132] do_one_initcall+0x102/0x1c0
[  601.079000][81cb80a6] kernel_init_freeable+0x15d/0x1ef
[  601.079000][8166d2de] kernel_init+0xe/0xf0
[  601.079000][816914ec] ret_from_fork+0x7c/0xb0
[  601.079000]
-> #0 (&fb_info->lock){+.+.+.}:
[  601.079000][810dc1d8] __lock_acquire+0x1e18/0x1f10
[  601.079000][810dc971] lock_acquire+0xa1/0x140
[  601.079000][816835ca] mutex_lock_nested+0x7a/0x3b0
[  601.079000][81397566] lock_fb_info+0x26/0x60
[  601.079000][813a4aeb] fbcon_blank+0x29b/0x2e0
[  601.079000][81418658] do_blank_screen+0x1d8/0x280
[  601.079000][8141ab34] console_callback+0x64/0x160
[  601.079000][8108d855] process_one_work+0x1f5/0x540
[  601.079000][8108e04c] worker_thread+0x11c/0x370
[  601.079000][81095fbd] kthread+0xed/0x100
[  601.079000][816914ec] ret_from_fork+0x7c/0xb0
[  601.079000]
other info that might help us debug this:

[  601.079000]  Possible unsafe locking scenario:

[  601.079000]        CPU0                    CPU1
[  601.079000]        ----                    ----
[  601.079000]   lock(console_lock);
[  601.079000]                                lock(&fb_info->lock);
[  601.079000]                                lock(console_lock);
[  601.079000]   lock(&fb_info->lock);
[  601.079000]
 *** DEADLOCK ***

So we reorder the lock sequence to match the one in console_callback() to
avoid this issue. Following Tomi's suggestion, we also fix all similar
issues in the fb subsystem.
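
lockdep reported this without an actual hang because it records which locks are acquired while others are held, and flags an acquisition order that contradicts one it has already seen. The toy model below captures only the direct two-lock case; real lockdep tracks lock classes and searches longer dependency chains. MAX_LOCKS, order_seen and record_acquire() are illustrative names, not kernel APIs.

```c
#include <stdbool.h>

#define MAX_LOCKS 8	/* toy limit; real lockdep tracks lock classes, not fixed indices */

/* order_seen[a][b] is true once lock b was acquired while lock a was held. */
static bool order_seen[MAX_LOCKS][MAX_LOCKS];

/*
 * Record that 'next' is being acquired while 'held' is held.
 * Returns false (a possible ABBA deadlock) if the opposite order
 * next -> held has already been recorded.
 */
static bool record_acquire(int held, int next)
{
	if (order_seen[next][held])
		return false;
	order_seen[held][next] = true;
	return true;
}
```

With console_lock as index 0 and fb_info->lock as index 1, record_acquire(0, 1) models console_callback() and succeeds, and a later record_acquire(1, 0) models the old fbmem order and is flagged.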

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/video/fbmem.c|   50 -
 drivers/video/fbsysfs.c  |   19 ++
 drivers/video/sh_mobile_lcdcfb.c |   10 ---
 3 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
index dacaf74..010d191 100644
--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -1108,14 +1108,16 @@ static long do_fb_ioctl(struct fb_info *info, unsigned int cmd,
case FBIOPUT_VSCREENINFO:
if (copy_from_user(&var, argp, sizeof(var)))
return -EFAULT;
-   if (!lock_fb_info(info))
-   return -ENODEV;
console_lock();
+   if (!lock_fb_info(info)) {
+   console_unlock();
+   return -ENODEV;
+   }
info->flags |= FBINFO_MISC_USEREVENT;
ret = fb_set_var(info, &var);
info->flags &= ~FBINFO_MISC_USEREVENT;
-   console_unlock();
unlock_fb_info(info);
+   console_unlock

[RESEND PATCH] fs/buffer.c: exit if already confirmed page has dirty and writeback buffers

2013-11-05 Thread Gu Zheng

Stop iterating over the buffer heads once we have confirmed that the page
has both dirty and writeback buffers.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/buffer.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 6024877..519cc5c 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -112,7 +112,7 @@ void buffer_check_dirty_writeback(struct page *page,
*dirty = true;
 
bh = bh->b_this_page;
-   } while (bh != head);
+   } while ((bh != head) && !(*writeback && *dirty));
 }
 EXPORT_SYMBOL(buffer_check_dirty_writeback);
 
-- 
1.7.7


Re: [RESEND PATCH 2/2] staging/olpc_docn: reorder the lock sequence to avoid potential dead lock

2013-11-05 Thread Gu Zheng

Hi Dan,
On 11/05/2013 07:02 PM, Dan Carpenter wrote:

 On Tue, Nov 05, 2013 at 06:01:00PM +0800, Gu Zheng wrote:
 The lock sequence of dcon_blank_fb(fb_info-lock --- console_lock) is 
 against
 with the one of console_callback(console_lock --- fb_info-lock), it'll
 lead to a potential dead lock, so reorder the lock sequence of dcon_blank_fb
 to avoid the potential dead lock.

 Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
 
 Relax, Greg isn't taking new patches for another three weeks because the
 merge window is open.

Got it. I just wanted to get some comments on this patch.

 
 Also what happened to [PATCH 1/2]?

It fixes the similar issue of fb subsystem.
https://patchwork.kernel.org/patch/3140121/

Regards,
Gu

 
 regards,
 dan carpenter
 
 



Re: [f2fs-dev] [PATCH] f2fs: avoid to use a NULL point in destroy_segment_manager

2013-11-05 Thread Gu Zheng

On 11/06/2013 09:12 AM, Chao Yu wrote:

 A NULL pointer should not be used in destroy_segment_manager after the memory
 allocation for f2fs_sm_info fails.

Even without this patch it can still work, because if allocating f2fs_sm_info
failed, sit_info, free_info... are all NULL, and the destroy path
(e.g. destroy_dirty_segmap) can deal with them.
IMO, this patch is still a good catch. 

Regards,
Gu

 
 Signed-off-by: Chao Yu chao2...@samsung.com
 ---
  fs/f2fs/segment.c |2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index 3d4d5fc..ff363e6
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -1744,6 +1744,8 @@ static void destroy_sit_info(struct f2fs_sb_info *sbi)
  void destroy_segment_manager(struct f2fs_sb_info *sbi)
  {
   struct f2fs_sm_info *sm_info = SM_I(sbi);
 + if (!sm_info)
 + return;
   destroy_dirty_segmap(sbi);
   destroy_curseg(sbi);
   destroy_free_segmap(sbi);



Re: [f2fs-dev] [PATCH] f2fs: avoid to use a NULL point in destroy_segment_manager

2013-11-05 Thread Gu Zheng

On 11/06/2013 01:10 PM, Chao Yu wrote:

 Hi Gu,

 -Original Message-
 From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
 Sent: Wednesday, November 06, 2013 11:41 AM
 To: Chao Yu
 Cc: ???; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
 linux-f2fs-de...@lists.sourceforge.net; 谭姝
 Subject: Re: [f2fs-dev] [PATCH] f2fs: avoid to use a NULL point in 
 destroy_segment_manager

 On 11/06/2013 09:12 AM, Chao Yu wrote:

 A NULL point should avoid to be used in destroy_segment_manager after 
 allocating memory fail for f2fs_sm_info.

 Though without this patch it still can work well, because if it failed
 to allocate f2fs_sm_info, the sit_info, free_info... all were NULL, and
 the destroy path (e.g. destroy_dirty_segmap) can deal with them well.

 I don't think it would work. Without this patch we may get a segmentation fault
 in DIRTY_I(sbi) in the following code if allocating the f2fs_sm_info memory
 (sbi->sm_info) fails. Right?

Yes, you're right. SIT_I generates sit_info from f2fs_sm_info.
Sorry for my mistake.:(

Regards,
Gu
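
Chao's point can be seen in a stripped-down model. Assuming, as in the f2fs headers, that DIRTY_I() is built on SM_I(), which returns sbi->sm_info, evaluating DIRTY_I() after the f2fs_sm_info allocation failed dereferences NULL before any per-structure NULL check can run. The structs and macros below are hypothetical user-space stand-ins, not the f2fs definitions.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the f2fs structures. */
struct dirty_seglist_model { int nr_dirty; };
struct sm_info_model { struct dirty_seglist_model *dirty_info; };
struct sb_info_model { struct sm_info_model *sm_info; };

/* Like SM_I()/DIRTY_I(): the second accessor dereferences sm_info unconditionally. */
#define SM_I_MODEL(sbi)    ((sbi)->sm_info)
#define DIRTY_I_MODEL(sbi) (SM_I_MODEL(sbi)->dirty_info)

/* Returns 1 if teardown ran, 0 if it bailed out early. */
static int destroy_segment_manager_model(struct sb_info_model *sbi)
{
	struct sm_info_model *sm_info = SM_I_MODEL(sbi);

	/* Without this check, DIRTY_I_MODEL(sbi) below would dereference NULL. */
	if (!sm_info)
		return 0;
	if (DIRTY_I_MODEL(sbi))
		DIRTY_I_MODEL(sbi)->nr_dirty = 0;	/* stand-in for the real teardown */
	return 1;
}
```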

 static void destroy_dirty_segmap(struct f2fs_sb_info *sbi)
 {
   struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);

 IMO, this patch is still a good catch.

 Regards,
 Gu

 Signed-off-by: Chao Yu chao2...@samsung.com
 ---
  fs/f2fs/segment.c |2 ++
  1 file changed, 2 insertions(+)

 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index 3d4d5fc..ff363e6
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -1744,6 +1744,8 @@ static void destroy_sit_info(struct f2fs_sb_info *sbi)
  void destroy_segment_manager(struct f2fs_sb_info *sbi)
  {
 struct f2fs_sm_info *sm_info = SM_I(sbi);
 +   if (!sm_info)
 +   return;
 destroy_dirty_segmap(sbi);
 destroy_curseg(sbi);
 destroy_free_segmap(sbi);


Re: [RESEND PATCH] fs/buffer.c: exit if already confirmed page has dirty and writeback buffers

2013-11-08 Thread Gu Zheng

Hi Jan,

On 11/07/2013 07:44 PM, Jan Kara wrote:

 On Tue 05-11-13 18:02:03, Gu Zheng wrote:
 Stop the loop of iterating bh if we have confirmed page
 has dirty and writeback buffers.
   Thanks for the patch. What I'm somewhat missing here is a motivation of
 the patch. For the common case where blocksize == pagesize this is a noop
 (only adds some code). 

Yes, you're right.

 For the case where blocksize < pagesize we can
 possibly save checking some buffers but how common is that going to be?

It's really hard to say.:( But many file systems support small blocksize.

 Does that minimal speedup outweigh the cost of the additional check / code
 complication?

To be honest, I have not tested it thoroughly. But I think the speedup can outweigh
the cost if the blocksize is small enough. For example, with a 1k blocksize and a 4k
pagesize, we can skip 6 bh checks (3 dirty, 3 writeback) in the best case.

Best regards,
Gu
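
The best-case arithmetic above can be checked with a small user-space model of the loop. struct bh_model is a hypothetical stand-in for struct buffer_head, with two flags standing in for the per-buffer tests that set *writeback and *dirty. With 1k blocks in a 4k page and the first buffer both dirty and under I/O, the early-exit loop visits 1 buffer instead of 4, skipping 3 buffers and therefore 6 flag tests.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head: the ring link and two flags. */
struct bh_model {
	struct bh_model *b_this_page;	/* next buffer on the same page (circular) */
	bool under_io;			/* stands in for the test that sets *writeback */
	bool dirty;			/* stands in for the test that sets *dirty */
};

/*
 * Walk the page's buffers like buffer_check_dirty_writeback(), optionally
 * stopping once both answers are known.  Returns the number of buffers visited.
 */
static int check_dirty_writeback_model(struct bh_model *head, bool early_exit,
				       bool *dirty, bool *writeback)
{
	struct bh_model *bh = head;
	int visited = 0;

	*dirty = false;
	*writeback = false;
	do {
		visited++;
		if (bh->under_io)
			*writeback = true;
		if (bh->dirty)
			*dirty = true;
		bh = bh->b_this_page;
	} while (bh != head && !(early_exit && *writeback && *dirty));
	return visited;
}
```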

 
   Honza
 

 Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
 ---
  fs/buffer.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/fs/buffer.c b/fs/buffer.c
 index 6024877..519cc5c 100644
 --- a/fs/buffer.c
 +++ b/fs/buffer.c
 @@ -112,7 +112,7 @@ void buffer_check_dirty_writeback(struct page *page,
  *dirty = true;
  
  bh = bh->b_this_page;
 -} while (bh != head);
 +} while ((bh != head) && !(*writeback && *dirty));
  }
  EXPORT_SYMBOL(buffer_check_dirty_writeback);
  
 -- 
 1.7.7

 --
 To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure in bio_alloc

2013-09-15 Thread Gu Zheng

Hi Chao,

On 09/13/2013 09:27 PM, Chao Yu wrote:

 This patch adds the macro MAX_BIO_BLOCKS to limit the value of npages in
 f2fs_bio_alloc; it avoids allocation failures in bio_alloc caused by npages
 being larger than UIO_MAXIOV.

As far as I know, bio_alloc is based on the *fs_bio_set* pool, which is not limited
by UIO_MAXIOV. Am I missing something?

Thanks,
Gu

 
 Signed-off-by: Yu Chao chao2...@samsung.com
  ---
  fs/f2fs/segment.c |4 +++-
  fs/f2fs/segment.h |3 +++
  2 files changed, 6 insertions(+), 1 deletion(-)
 
 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index 09af9c7..bd79bbe 100644
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -657,6 +657,7 @@ static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
 block_t blk_addr, enum page_type type)
  {
 struct block_device *bdev = sbi->sb->s_bdev;
 +   int bio_blocks;
  
 verify_block_addr(sbi, blk_addr);
  
 @@ -676,7 +677,8 @@ retry:
 goto retry;
 }
  
 -   sbi->bio[type] = f2fs_bio_alloc(bdev, max_hw_blocks(sbi));
 +   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
 +   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
 sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi,
 blk_addr);
 sbi->bio[type]->bi_private = priv;
 /*
 diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
 index bdd10ea..6352af1 100644
 --- a/fs/f2fs/segment.h
 +++ b/fs/f2fs/segment.h
 @@ -9,6 +9,7 @@
   * published by the Free Software Foundation.
   */
  #include <linux/blkdev.h>
 +#include <linux/uio.h>
  
  /* constant macro */
  #define NULL_SEGNO ((unsigned int)(~0))
 @@ -90,6 +91,8 @@
 (blk_addr << ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
  #define SECTOR_TO_BLOCK(sbi, sectors)  \
 (sectors >> ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
 +#define MAX_BIO_BLOCKS(max_hw_blocks)  \
 +   (min((int)max_hw_blocks, UIO_MAXIOV))
  
  /* during checkpoint, bio_private is used to synchronize the last bio */
  struct bio_private {
 ---
 
 



Re: [f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure in bio_alloc

2013-09-15 Thread Gu Zheng

Hi Chao,

On 09/16/2013 11:26 AM, Chao Yu wrote:

 Hi Gu
 
 -Original Message-
 From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
 Sent: Monday, September 16, 2013 10:09 AM
 To: Chao Yu
 Cc: Kim Jaegeuk; linux-f2fs-de...@lists.sourceforge.net;
 linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 谭姝
 Subject: Re: [f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure in
 bio_alloc

 Hi Chao,

 On 09/13/2013 09:27 PM, Chao Yu wrote:

 This patch adds the macro MAX_BIO_BLOCKS to limit the value of npages in
 f2fs_bio_alloc. It avoids an allocation failure in bio_alloc caused by
 npages being larger than UIO_MAXIOV.
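The clamp itself is simple arithmetic. The userspace sketch below assumes 512-byte sectors, 4 KiB f2fs blocks and UIO_MAXIOV = 1024; the helper names are hypothetical stand-ins for SECTOR_TO_BLOCK() and MAX_BIO_BLOCKS(), not the f2fs code itself.

```c
#include <assert.h>

/* Assumed value: UIO_MAXIOV is 1024 in the Linux <linux/uio.h> UAPI header. */
#define UIO_MAXIOV 1024

/* Assumed geometry: 512-byte sectors and 4 KiB f2fs blocks. */
#define LOG_SECTOR_SIZE 9
#define LOG_BLOCKSIZE   12

/* Hypothetical stand-in for SECTOR_TO_BLOCK(): sectors >> (12 - 9). */
static int sectors_to_blocks(int sectors)
{
	return sectors >> (LOG_BLOCKSIZE - LOG_SECTOR_SIZE);
}

/* Same idea as the proposed MAX_BIO_BLOCKS(): clamp to UIO_MAXIOV. */
static int max_bio_blocks(int max_hw_blocks)
{
	return max_hw_blocks < UIO_MAXIOV ? max_hw_blocks : UIO_MAXIOV;
}
```

With max_sectors_kb set to 32767 the queue allows 65534 sectors, i.e. 8191 blocks under these assumptions, so the clamp caps the request at 1024; a queue limited to 512 KiB (1024 sectors, 128 blocks) is left unchanged.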

 As far as I know, bio_alloc is based on the *fs_bio_set* pool, without the
 UIO_MAXIOV limitation. Am I missing something?
 
 Here is the code in bio.c: fs_bio_set is passed as the actual parameter bs
 without being initialized.

fs_bio_set is initialized early, during bio subsystem initialization.

 So this function may return NULL.

It may, but not through the path you mentioned below.

 ---
 bio.c
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
 ..
   if (!bs) {
   if (nr_iovecs > UIO_MAXIOV)
   return NULL;
 ---
 I ran an abnormal test: I set max_sectors_kb in /sys/block/sdx/queue
 to 32767 for an f2fs-formatted disk,
 and got a segfault in f2fs_bio_alloc after the image was mounted.
 Is there anything I missed?

Hmm, this change would also make bvec_alloc fail. Did you add any traces
to debug this?

Regards,
Gu

 

 Thanks,
 Gu


 Signed-off-by: Yu Chao chao2...@samsung.com
  ---
  fs/f2fs/segment.c |4 +++-
  fs/f2fs/segment.h |3 +++
  2 files changed, 6 insertions(+), 1 deletion(-)

 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index 09af9c7..bd79bbe 100644
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -657,6 +657,7 @@ static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
 block_t blk_addr, enum page_type type)
 {
 struct block_device *bdev = sbi->sb->s_bdev;
 +   int bio_blocks;

 verify_block_addr(sbi, blk_addr);

 @@ -676,7 +677,8 @@ retry:
 goto retry;
 }

 -   sbi->bio[type] = f2fs_bio_alloc(bdev, max_hw_blocks(sbi));
 +   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
 +   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
 sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi,
 blk_addr);
 sbi->bio[type]->bi_private = priv;
 /*
 diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
 index bdd10ea..6352af1 100644
 --- a/fs/f2fs/segment.h
 +++ b/fs/f2fs/segment.h
 @@ -9,6 +9,7 @@
   * published by the Free Software Foundation.
   */
  #include <linux/blkdev.h>
 +#include <linux/uio.h>

  /* constant macro */
  #define NULL_SEGNO ((unsigned int)(~0))
 @@ -90,6 +91,8 @@
 (blk_addr << ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
  #define SECTOR_TO_BLOCK(sbi, sectors)  \
 (sectors >> ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
 +#define MAX_BIO_BLOCKS(max_hw_blocks)  \
 +   (min((int)max_hw_blocks, UIO_MAXIOV))

  /* during checkpoint, bio_private is used to synchronize the last bio */
  struct bio_private {
 ---


[RFC PATCH] fb: reorder the lock sequence to fix a potential lockdep

2013-10-20 Thread Gu Zheng

The following commits:
50e244cc79 fb: rework locking to fix lock ordering on takeover
e93a9a8687 fb: Yet another band-aid for fixing lockdep mess
054430e773 fbcon: fix locking harder
reworked the locking to fix lock ordering on takeover and introduced
console_lock into fbmem. However, the new lock sequence
(fb_info->lock ---> console_lock) conflicts with the one in
console_callback() (console_lock ---> fb_info->lock), which can lead to
a potential deadlock, as follows:
[  601.079000] ==
[  601.079000] [ INFO: possible circular locking dependency detected ]
[  601.079000] 3.11.0 #189 Not tainted
[  601.079000] ---
[  601.079000] kworker/0:3/619 is trying to acquire lock:
[  601.079000]  (&fb_info->lock){+.+.+.}, at: [81397566] lock_fb_info+0x26/0x60
[  601.079000]
but task is already holding lock:
[  601.079000]  (console_lock){+.+.+.}, at: [8141aae3] console_callback+0x13/0x160
[  601.079000]
which lock already depends on the new lock.

[  601.079000]
the existing dependency chain (in reverse order) is:
[  601.079000]
-> #1 (console_lock){+.+.+.}:
[  601.079000][810dc971] lock_acquire+0xa1/0x140
[  601.079000][810c6267] console_lock+0x77/0x80
[  601.079000][81399448] register_framebuffer+0x1d8/0x320
[  601.079000][81cfb4c8] efifb_probe+0x408/0x48f
[  601.079000][8144a963] platform_drv_probe+0x43/0x80
[  601.079000][8144853b] driver_probe_device+0x8b/0x390
[  601.079000][814488eb] __driver_attach+0xab/0xb0
[  601.079000][814463bd] bus_for_each_dev+0x5d/0xa0
[  601.079000][81447e6e] driver_attach+0x1e/0x20
[  601.079000][81447a07] bus_add_driver+0x117/0x290
[  601.079000][81448fea] driver_register+0x7a/0x170
[  601.079000][8144a10a] __platform_driver_register+0x4a/0x50
[  601.079000][8144a12d] platform_driver_probe+0x1d/0xb0
[  601.079000][81cfb0a1] efifb_init+0x273/0x292
[  601.079000][81002132] do_one_initcall+0x102/0x1c0
[  601.079000][81cb80a6] kernel_init_freeable+0x15d/0x1ef
[  601.079000][8166d2de] kernel_init+0xe/0xf0
[  601.079000][816914ec] ret_from_fork+0x7c/0xb0
[  601.079000]
-> #0 (&fb_info->lock){+.+.+.}:
[  601.079000][810dc1d8] __lock_acquire+0x1e18/0x1f10
[  601.079000][810dc971] lock_acquire+0xa1/0x140
[  601.079000][816835ca] mutex_lock_nested+0x7a/0x3b0
[  601.079000][81397566] lock_fb_info+0x26/0x60
[  601.079000][813a4aeb] fbcon_blank+0x29b/0x2e0
[  601.079000][81418658] do_blank_screen+0x1d8/0x280
[  601.079000][8141ab34] console_callback+0x64/0x160
[  601.079000][8108d855] process_one_work+0x1f5/0x540
[  601.079000][8108e04c] worker_thread+0x11c/0x370
[  601.079000][81095fbd] kthread+0xed/0x100
[  601.079000][816914ec] ret_from_fork+0x7c/0xb0
[  601.079000]
other info that might help us debug this:

[  601.079000]  Possible unsafe locking scenario:

[  601.079000]        CPU0                    CPU1
[  601.079000]        ----                    ----
[  601.079000]   lock(console_lock);
[  601.079000]                                lock(&fb_info->lock);
[  601.079000]                                lock(console_lock);
[  601.079000]   lock(&fb_info->lock);
[  601.079000]
 *** DEADLOCK ***

So we reorder the lock sequence to match the one in console_callback() to
avoid this issue.
I'm not sure this change is appropriate; any comments are welcome.
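Lockdep complains because the two paths nest the same pair of locks in opposite orders. The toy checker below is a hypothetical sketch: it only tracks direct pairs of locks, whereas real lockdep follows whole dependency chains. It records each "outer, then inner" nesting and reports when the reverse nesting has already been seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The two locks from the report; the indices are arbitrary for this sketch. */
enum { CONSOLE_LOCK, FB_INFO_LOCK, NR_LOCKS };

/* before[a][b] becomes true once some path took b while holding a. */
static bool before[NR_LOCKS][NR_LOCKS];

static void reset_order(void)
{
	memset(before, 0, sizeof(before));
}

/*
 * Record that `inner` is acquired while `outer` is held. Returns false
 * if the opposite nesting was recorded earlier: two paths doing this
 * concurrently could deadlock (an ABBA inversion).
 */
static bool record_nested(int outer, int inner)
{
	if (before[inner][outer])
		return false;
	before[outer][inner] = true;
	return true;
}
```

Feeding it console_callback() (console_lock, then fb_info->lock) followed by the old FBIOPUT_VSCREENINFO path (fb_info->lock, then console_lock) reports an inversion; with the reordered ioctl path both nestings agree and nothing is reported.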

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 drivers/video/fbmem.c |   50 +++-
 1 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
index dacaf74..010d191 100644
--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -1108,14 +1108,16 @@ static long do_fb_ioctl(struct fb_info *info, unsigned int cmd,
case FBIOPUT_VSCREENINFO:
if (copy_from_user(&var, argp, sizeof(var)))
return -EFAULT;
-   if (!lock_fb_info(info))
-   return -ENODEV;
console_lock();
+   if (!lock_fb_info(info)) {
+   console_unlock();
+   return -ENODEV;
+   }
info->flags |= FBINFO_MISC_USEREVENT;
ret = fb_set_var(info, &var);
info->flags &= ~FBINFO_MISC_USEREVENT;
-   console_unlock();
unlock_fb_info(info);
+   console_unlock();
if (!ret && copy_to_user(argp, &var, sizeof(var)))
ret = -EFAULT;
break;

[PATCH] f2fs: introduce f2fs_kmem_cache_alloc to hide the unfailed kmem cache allocation

2013-10-21 Thread Gu Zheng

Introduce a non-failing version of kmem_cache_alloc, named f2fs_kmem_cache_alloc,
to hide the retry routine and make the code a bit cleaner.
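A userspace sketch of the same retry-until-success wrapper, assuming a hypothetical flaky_alloc() that fails a configurable number of times and using sched_yield() in place of cond_resched(). The retry label sits before the allocation call, so every pass makes a fresh attempt.

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical allocator that fails the first `fail_count` calls. */
static int fail_count;
static int attempts;

static void *flaky_alloc(size_t size)
{
	attempts++;
	if (fail_count > 0) {
		fail_count--;
		return NULL;
	}
	return malloc(size);
}

/* Non-failing wrapper in the style of f2fs_kmem_cache_alloc(). */
static void *alloc_nofail(size_t size)
{
	void *entry;
retry:
	entry = flaky_alloc(size);	/* a new attempt on every pass */
	if (!entry) {
		sched_yield();		/* userspace stand-in for cond_resched() */
		goto retry;
	}
	return entry;
}
```

With three injected failures the wrapper returns a valid pointer on the fourth attempt; callers never see NULL, which is the property the patch relies on.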

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/checkpoint.c |   26 +++---
 fs/f2fs/f2fs.h   |   13 +
 fs/f2fs/gc.c |8 ++--
 fs/f2fs/node.c   |6 +-
 4 files changed, 23 insertions(+), 30 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 8d16071..6fb484c 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -226,12 +226,8 @@ void add_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
break;
orphan = NULL;
}
-retry:
-   new = kmem_cache_alloc(orphan_entry_slab, GFP_ATOMIC);
-   if (!new) {
-   cond_resched();
-   goto retry;
-   }
+
+   new = f2fs_kmem_cache_alloc(orphan_entry_slab, GFP_ATOMIC);
new->ino = ino;
 
/* add new_oentry into list which is sorted by inode number */
@@ -484,12 +480,8 @@ void set_dirty_dir_page(struct inode *inode, struct page *page)
 
if (!S_ISDIR(inode->i_mode))
return;
-retry:
-   new = kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
-   if (!new) {
-   cond_resched();
-   goto retry;
-   }
+
+   new = f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
new->inode = inode;
INIT_LIST_HEAD(&new->list);
 
@@ -506,13 +498,9 @@ retry:
 void add_dirty_dir_inode(struct inode *inode)
 {
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
-   struct dir_inode_entry *new;
-retry:
-   new = kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
-   if (!new) {
-   cond_resched();
-   goto retry;
-   }
+   struct dir_inode_entry *new =
+   f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
+
new->inode = inode;
INIT_LIST_HEAD(&new->list);
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 171c52f..fa9ad03 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -787,6 +787,19 @@ static inline struct kmem_cache *f2fs_kmem_cache_create(const char *name,
return kmem_cache_create(name, size, 0, SLAB_RECLAIM_ACCOUNT, ctor);
 }
 
+static inline void *f2fs_kmem_cache_alloc(struct kmem_cache *cachep,
+   gfp_t flags)
+{
+   void *entry;
+retry:
+   entry = kmem_cache_alloc(cachep, flags);
+   if (!entry) {
+   cond_resched();
+   goto retry;
+   }
+   return entry;
+}
+
 #define RAW_IS_INODE(p) ((p)->footer.nid == (p)->footer.ino)
 
 static inline bool IS_INODE(struct page *page)
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index fbad968..7914b92 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -361,12 +361,8 @@ static void add_gc_inode(struct inode *inode, struct list_head *ilist)
iput(inode);
return;
}
-repeat:
-   new_ie = kmem_cache_alloc(winode_slab, GFP_NOFS);
-   if (!new_ie) {
-   cond_resched();
-   goto repeat;
-   }
+
+   new_ie = f2fs_kmem_cache_alloc(winode_slab, GFP_NOFS);
new_ie->inode = inode;
list_add_tail(&new_ie->list, ilist);
 }
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index ef80f79..fe3cf8e 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1308,11 +1308,7 @@ static int add_free_nid(struct f2fs_nm_info *nm_i, nid_t nid, bool build)
if (allocated)
return 0;
 retry:
-   i = kmem_cache_alloc(free_nid_slab, GFP_NOFS);
-   if (!i) {
-   cond_resched();
-   goto retry;
-   }
+   i = f2fs_kmem_cache_alloc(free_nid_slab, GFP_NOFS);
i->nid = nid;
i->state = NID_NEW;
 
-- 
1.7.7


[PATCH] f2fs: delete and free dirty dir freeing inode entry when sync dirty dir inodes

2013-10-21 Thread Gu Zheng

In sync_dirty_dir_inodes(), remove_dirty_dir_inode() is called from the
filemap_flush() callback to delete and free the dirty dir inode entry.
For an inode entry that is being freed, however, this step is missed after
submitting the data bio, which can lead to an endless loop if such an entry
is left in dir_inode_list. So add the delete and free step to fix it.
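Why a leftover entry causes an endless loop can be seen in a small model, assuming (as the description says) that the loop always restarts from the list head. The list, the freeing flag and the pass limit are hypothetical; the limit exists only so the unfixed case terminates in a test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dir_inode_entry on dir_inode_list. */
struct entry {
	struct entry *next;
	bool freeing;	/* inode is being freed: the flush callback never removes it */
};

static struct entry *push(struct entry *head, bool freeing)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		abort();
	e->next = head;
	e->freeing = freeing;
	return e;
}

static void remove_head(struct entry **head)
{
	struct entry *e = *head;

	*head = e->next;
	free(e);
}

/*
 * Always restart from the head. An ordinary entry is removed (in f2fs, by
 * the flush callback); a freeing entry is removed only when `fixed` is
 * true, otherwise it is picked again on every pass. Returns the number of
 * passes, capped at max_passes.
 */
static int drain(struct entry **head, bool fixed, int max_passes)
{
	int passes = 0;

	while (*head != NULL && passes < max_passes) {
		passes++;
		if (!(*head)->freeing || fixed)
			remove_head(head);
	}
	return passes;
}
```

Without the extra delete-and-free step the loop hits the cap with the freeing entry still at the head; with it, every pass shrinks the list and the loop ends.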

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/checkpoint.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 8d16071..f61838f 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -600,7 +600,16 @@ retry:
 * wribacking dentry pages in the freeing inode.
 */
f2fs_submit_bio(sbi, DATA, true);
+
+   spin_lock(&sbi->dir_inode_lock);
+   list_del(&entry->list);
+#ifdef CONFIG_F2FS_STAT_FS
+   sbi->n_dirty_dirs--;
+#endif
+   spin_unlock(&sbi->dir_inode_lock);
+   kmem_cache_free(inode_entry_slab, entry);
}
+
goto retry;
 }
 
-- 
1.7.7


[f2fs-dev][PATCH] f2fs: fix a potential out of range issue

2013-11-26 Thread Gu Zheng

Fix a potential out-of-range issue introduced by commit
22fb72225a "f2fs: simplify write_orphan_inodes for better readable".
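The off-by-one is easiest to see by simulating the counter. The sketch below uses a hypothetical block size of 4 instead of F2FS_ORPHANS_PER_BLOCK (1020) and only tracks the largest ino[] index each version would write; it assumes, as in write_orphan_inodes(), that nentries is reset to 0 when a full block is flushed.

```c
#include <assert.h>

/* Small stand-in for F2FS_ORPHANS_PER_BLOCK (1020 in f2fs). */
#define ORPHANS_PER_BLOCK 4

/* Old order: store at ino[nentries], then test nentries++ == PER_BLOCK. */
static int max_index_old(int n)
{
	int nentries = 0, max = -1;

	for (int i = 0; i < n; i++) {
		if (nentries > max)
			max = nentries;			/* ino[nentries] = ... */
		if (nentries++ == ORPHANS_PER_BLOCK)
			nentries = 0;			/* flush, start a new block */
	}
	return max;
}

/* Fixed order: store at ino[nentries++], then test nentries == PER_BLOCK. */
static int max_index_new(int n)
{
	int nentries = 0, max = -1;

	for (int i = 0; i < n; i++) {
		if (nentries > max)
			max = nentries;			/* ino[nentries++] = ... */
		nentries++;
		if (nentries == ORPHANS_PER_BLOCK)
			nentries = 0;
	}
	return max;
}
```

In the old order the comparison sees the pre-increment value, so the block is not considered full until one entry too late and the fifth orphan lands in ino[4], one past the end; the fixed order never writes beyond ino[3].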



Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/checkpoint.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 7fe69ff..3e62987 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -323,9 +323,9 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk)
memset(orphan_blk, 0, sizeof(*orphan_blk));
}
 
-   orphan_blk->ino[nentries] = cpu_to_le32(orphan->ino);
+   orphan_blk->ino[nentries++] = cpu_to_le32(orphan->ino);
 
-   if (nentries++ == F2FS_ORPHANS_PER_BLOCK) {
+   if (nentries == F2FS_ORPHANS_PER_BLOCK) {
/*
 * an orphan block is full of 1020 entries,
 * then we need to flush current orphan blocks
-- 
1.7.7


Re: [PATCH] f2fs: remove the own bi_private allocation

2013-12-01 Thread Gu Zheng

On 11/30/2013 09:48 AM, Jaegeuk Kim wrote:

 Previously, f2fs allocated its own bi_private data structure all the time, even
 though we don't use it. Can we remove this bi_private allocation?
 
 This patch removes that additional bi_private allocation.
 
 1. Retrieve f2fs_sb_info from page->mapping->host->i_sb.
  - This removes the use cases of bi_private in end_io.
 
 2. Use bi_private only when we really need it.
  - bi_private is used only while the checkpoint procedure is conducted.
  - When conducting the checkpoint, f2fs submits a META_FLUSH bio and waits
    for its completion.
  - Since nothing else depends on bi_private now, let's just use the
    bi_private pointer as the completion pointer.
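The "bi_private is either NULL or a completion" convention can be sketched in userspace. The structs and complete_wait() below are hypothetical stand-ins for struct bio, struct completion and complete(); the point is only that asynchronous bios carry NULL, and the synchronous META_FLUSH path stores the address of its on-stack completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct completion and struct bio. */
struct completion {
	bool done;
};

struct fake_bio {
	void *bi_private;	/* NULL for async writes, &completion for META_FLUSH */
};

static void complete_wait(struct completion *c)
{
	c->done = true;
}

/* End-of-IO handler: only synchronous bios carry a completion to signal. */
static void end_io_write(struct fake_bio *bio)
{
	if (bio->bi_private)
		complete_wait(bio->bi_private);
}
```

Because the pointer doubles as the "is synchronous" flag, no separate is_sync field or per-bio allocation is needed.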

Cool, looks good to me.:)

 
 Signed-off-by: Jaegeuk Kim jaegeuk@samsung.com

 Reviewed-by: Gu Zheng guz.f...@cn.fujitsu.com

 ---
  fs/f2fs/segment.c | 43 ---
  fs/f2fs/segment.h |  7 ---
  2 files changed, 16 insertions(+), 34 deletions(-)
 
 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index 0387863..0db4027 100644
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -791,7 +791,7 @@ static void f2fs_end_io_write(struct bio *bio, int err)
  {
   const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
   struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
 - struct bio_private *p = bio->bi_private;
 + struct f2fs_sb_info *sbi = F2FS_SB(bvec->bv_page->mapping->host->i_sb);
  
   do {
   struct page *page = bvec->bv_page;
 @@ -802,21 +802,21 @@ static void f2fs_end_io_write(struct bio *bio, int err)
   SetPageError(page);
   if (page->mapping)
   set_bit(AS_EIO, &page->mapping->flags);
 - set_ckpt_flags(p->sbi->ckpt, CP_ERROR_FLAG);
 - p->sbi->sb->s_flags |= MS_RDONLY;
 +
 + set_ckpt_flags(sbi->ckpt, CP_ERROR_FLAG);
 + sbi->sb->s_flags |= MS_RDONLY;
   }
   end_page_writeback(page);
 - dec_page_count(p->sbi, F2FS_WRITEBACK);
 + dec_page_count(sbi, F2FS_WRITEBACK);
   } while (bvec >= bio->bi_io_vec);
  
 - if (p->is_sync)
 - complete(p->wait);
 + if (bio->bi_private)
 + complete(bio->bi_private);
  
 - if (!get_pages(p->sbi, F2FS_WRITEBACK) &&
 - !list_empty(&p->sbi->cp_wait.task_list))
 - wake_up(&p->sbi->cp_wait);
 + if (!get_pages(sbi, F2FS_WRITEBACK) &&
 + !list_empty(&sbi->cp_wait.task_list))
 + wake_up(&sbi->cp_wait);
  
 - kfree(p);
   bio_put(bio);
  }
  
 @@ -838,7 +838,6 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
   int rw = sync ? WRITE_SYNC : WRITE;
   enum page_type btype = PAGE_TYPE_OF_BIO(type);
   struct f2fs_bio_info *io = &sbi->write_io[btype];
 - struct bio_private *p;
  
   if (!io->bio)
   return;
 @@ -851,18 +850,16 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
  
   trace_f2fs_submit_write_bio(sbi->sb, rw, btype, io->bio);
  
 - p = io->bio->bi_private;
 - p->sbi = sbi;
 - io->bio->bi_end_io = f2fs_end_io_write;
 -
 + /*
 +  * META_FLUSH is only from the checkpoint procedure, and we should wait
 +  * this metadata bio for FS consistency.
 +  */
   if (type == META_FLUSH) {
   DECLARE_COMPLETION_ONSTACK(wait);
 - p->is_sync = true;
 - p->wait = &wait;
 + io->bio->bi_private = &wait;
   submit_bio(rw, io->bio);
   wait_for_completion(&wait);
   } else {
 - p->is_sync = false;
   submit_bio(rw, io->bio);
   }
   io->bio = NULL;
 @@ -897,18 +894,10 @@ static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
   do_submit_bio(sbi, type, false);
  alloc_new:
   if (io->bio == NULL) {
 - struct bio_private *priv;
 -retry:
 - priv = kmalloc(sizeof(struct bio_private), GFP_NOFS);
 - if (!priv) {
 - cond_resched();
 - goto retry;
 - }
 -
   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
   io->bio = f2fs_bio_alloc(bdev, bio_blocks);
   io->bio->bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
 - io->bio->bi_private = priv;
 + io->bio->bi_end_io = f2fs_end_io_write;
   /*
* The end_io will be assigned at the sumbission phase.
* Until then, let bio_add_page() merge consecutive IOs as much
 diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
 index 7fea2ee..26812fc 100644
 --- a/fs/f2fs/segment.h
 +++ b/fs/f2fs/segment.h
 @@ -92,13 +92,6 @@
  #define MAX_BIO_BLOCKS(max_hw_blocks)  \
   (min((int)max_hw_blocks, BIO_MAX_PAGES))
  
 -/* during checkpoint, bio_private is used to synchronize the last

Re: [PATCH] f2fs: refactor bio-related operations

2013-12-01 Thread Gu Zheng

On 11/30/2013 02:25 PM, Jaegeuk Kim wrote:

 This patch integrates redundant bio operations on read and write IOs.
 
 1. Move bio-related codes to the top of data.c.
 2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which handles read
bios additionally.
 3. Introduce __submit_merged_bio to submit the merged bio.
 4. Change f2fs_readpage to f2fs_submit_page_bio.
 5. Introduce f2fs_submit_page_mbio to integrate previous submit_read_page and
submit_write_page.
 
 Signed-off-by: Jaegeuk Kim jaegeuk@samsung.com

 Reviewed-by: Gu Zheng guz.f...@cn.fujitsu.com

 ---
  fs/f2fs/checkpoint.c|  14 +-
  fs/f2fs/data.c  | 317 +---
  fs/f2fs/f2fs.h  |  13 +-
  fs/f2fs/gc.c|   2 +-
  fs/f2fs/node.c  |  14 +-
  fs/f2fs/recovery.c  |   4 +-
  fs/f2fs/segment.c   | 164 +++
  include/trace/events/f2fs.h |  30 ++---
  8 files changed, 259 insertions(+), 299 deletions(-)
 
 diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
 index 40eea42..38f4a224 100644
 --- a/fs/f2fs/checkpoint.c
 +++ b/fs/f2fs/checkpoint.c
 @@ -61,7 +61,8 @@ repeat:
   if (PageUptodate(page))
   goto out;
  
 - if (f2fs_readpage(sbi, page, index, READ_SYNC | REQ_META | REQ_PRIO))
 + if (f2fs_submit_page_bio(sbi, page, index,
 + READ_SYNC | REQ_META | REQ_PRIO))
   goto repeat;
  
   lock_page(page);
 @@ -157,7 +158,8 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
   }
  
   if (nwritten)
 - f2fs_submit_bio(sbi, type, nr_to_write == LONG_MAX);
 + f2fs_submit_merged_bio(sbi, type, nr_to_write == LONG_MAX,
 + WRITE);
  
   return nwritten;
  }
 @@ -590,7 +592,7 @@ retry:
* We should submit bio, since it exists several
* wribacking dentry pages in the freeing inode.
*/
 - f2fs_submit_bio(sbi, DATA, true);
 + f2fs_submit_merged_bio(sbi, DATA, true, WRITE);
   }
   goto retry;
  }
 @@ -796,9 +798,9 @@ void write_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
  
   trace_f2fs_write_checkpoint(sbi->sb, is_umount, "finish block_ops");
  
 - f2fs_submit_bio(sbi, DATA, true);
 - f2fs_submit_bio(sbi, NODE, true);
 - f2fs_submit_bio(sbi, META, true);
 + f2fs_submit_merged_bio(sbi, DATA, true, WRITE);
 + f2fs_submit_merged_bio(sbi, NODE, true, WRITE);
 + f2fs_submit_merged_bio(sbi, META, true, WRITE);
  
   /*
* update checkpoint pack index
 diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
 index c9a76f8..53e3bbb 100644
 --- a/fs/f2fs/data.c
 +++ b/fs/f2fs/data.c
 @@ -25,6 +25,205 @@
  #include <trace/events/f2fs.h>
  
  /*
 + * Low-level block read/write IO operations.
 + */
 +static struct bio *__bio_alloc(struct block_device *bdev, int npages)
 +{
 + struct bio *bio;
 +
 + /* No failure on bio allocation */
 + bio = bio_alloc(GFP_NOIO, npages);
 + bio->bi_bdev = bdev;
 + bio->bi_private = NULL;
 + return bio;
 +}
 +
 +static void f2fs_read_end_io(struct bio *bio, int err)
 +{
 + const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 + struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
 +
 + do {
 + struct page *page = bvec->bv_page;
 +
 + if (--bvec >= bio->bi_io_vec)
 + prefetchw(&bvec->bv_page->flags);
 +
 + if (uptodate) {
 + SetPageUptodate(page);
 + } else {
 + ClearPageUptodate(page);
 + SetPageError(page);
 + }
 + unlock_page(page);
 + } while (bvec >= bio->bi_io_vec);
 +
 + bio_put(bio);
 +}
 +
 +static void f2fs_write_end_io(struct bio *bio, int err)
 +{
 + const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 + struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
 + struct f2fs_sb_info *sbi = F2FS_SB(bvec->bv_page->mapping->host->i_sb);
 +
 + do {
 + struct page *page = bvec->bv_page;
 +
 + if (--bvec >= bio->bi_io_vec)
 + prefetchw(&bvec->bv_page->flags);
 +
 + if (!uptodate) {
 + SetPageError(page);
 + set_bit(AS_EIO, &page->mapping->flags);
 + set_ckpt_flags(sbi->ckpt, CP_ERROR_FLAG);
 + sbi->sb->s_flags |= MS_RDONLY;
 + }
 + end_page_writeback(page);
 + dec_page_count(sbi, F2FS_WRITEBACK);
 + } while (bvec >= bio->bi_io_vec);
 +
 + if (bio->bi_private)
 + complete(bio->bi_private);
 +
 + if (!get_pages(sbi, F2FS_WRITEBACK) &&
 + !list_empty(&sbi->cp_wait.task_list))
 + wake_up(&sbi->cp_wait);
 +
 + bio_put(bio);
 +}
 +
 +static void __submit_merged_bio(struct

Re: GPF in aio_migratepage

2013-12-02 Thread Gu Zheng

Hi Kristian, Dave,

Could you please check whether the following patch fixes this issue?


Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/aio.c |   28 ++--
 1 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 08159ed..fc1fd0a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -223,33 +223,25 @@ static int __init aio_setup(void)
 }
 __initcall(aio_setup);
 
-static void put_aio_ring_file(struct kioctx *ctx)
-{
-   struct file *aio_ring_file = ctx->aio_ring_file;
-   if (aio_ring_file) {
-   truncate_setsize(aio_ring_file->f_inode, 0);
-
-   /* Prevent further access to the kioctx from migratepages */
-   spin_lock(&aio_ring_file->f_inode->i_mapping->private_lock);
-   aio_ring_file->f_inode->i_mapping->private_data = NULL;
-   ctx->aio_ring_file = NULL;
-   spin_unlock(&aio_ring_file->f_inode->i_mapping->private_lock);
-
-   fput(aio_ring_file);
-   }
-}
-
 static void aio_free_ring(struct kioctx *ctx)
 {
+   struct file *aio_ring_file = ctx->aio_ring_file;
int i;
 
+   BUG_ON(!aio_ring_file);
+
+   spin_lock(&aio_ring_file->f_inode->i_mapping->private_lock);
for (i = 0; i < ctx->nr_pages; i++) {
pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
page_count(ctx->ring_pages[i]));
put_page(ctx->ring_pages[i]);
}
-
-   put_aio_ring_file(ctx);
+   truncate_setsize(aio_ring_file->f_inode, 0);
+   /* Prevent further access to the kioctx from migratepages */
+   aio_ring_file->f_inode->i_mapping->private_data = NULL;
+   ctx->aio_ring_file = NULL;
+   spin_unlock(&aio_ring_file->f_inode->i_mapping->private_lock);
+   fput(aio_ring_file);
 
if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages) {
kfree(ctx->ring_pages);
-- 
1.7.7



On 11/30/2013 11:28 PM, Kristian Nielsen wrote:

 Benjamin LaHaise b...@kvack.org writes:
 
 For Dave: what line is this bug on?  Is it the dereference of ctx when
 doing spin_lock_irqsave(&ctx->completion_lock, flags); or is the
 ctx->ring_pages[idx] = new; ?  From the 64 bit splat, I'm thinking the
 former, which is quite strange given that the clearing of
 mapping->private_data is protected by mapping->private_lock.  If it's
 the latter, we might well need to check if ctx->ring_pages is NULL during
 setup.
 
 I think I got the same BUG (at least it looks very similar, full details
 below).
 
 The bug is on this line:
 
 ctx->ring_pages[idx] = new;
 
 Disassembly:
 
 af7:   48 89 2c d1             mov    %rbp,(%rcx,%rdx,8)
 
 ctx->ring_pages is 0xffffffffffffffff (this is x86_64). idx is 13.
 
   RCX:   RDX: 000d
   BUG: unable to handle kernel NULL pointer dereference at 0067
 
 So we are de-referencing a pointer that is (page **)-1, causing the crash.
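This reading can be checked with plain arithmetic: on x86_64 each slot of ring_pages is 8 bytes, so indexing slot 13 from (struct page **)-1 wraps around to 0x67, the faulting address in the oops. The helper below is hypothetical and only performs the address computation; nothing is dereferenced.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address that ring_pages[idx] refers to when ring_pages holds `base`,
 * computed with unsigned (wrapping) arithmetic instead of dereferencing.
 */
static uintptr_t slot_address(uintptr_t base, unsigned int idx)
{
	return base + (uintptr_t)idx * sizeof(void *);
}
```

On a 64-bit build, UINTPTR_MAX + 13 * 8 wraps to 0x67; on a 32-bit build the same expression gives 0x33.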
 
 If you look closer at the 32-bit dump that Dave gave, you can see that it is
 similar:
 
  7a2:   89 34 82                mov    %esi,(%edx,%eax,4)
 
   RAX: 6b6b6b6b6b6b6b6b  RDX: 
 
 Though in this case ctx->ring_pages seems to be NULL and idx=old->index seems
 to be 6b6b6b6b6b6b6b6b, so not completely the same (or maybe I read his dump
 incorrectly).
 
 This is 3.13-rc1. Unfortunately, I do not have a way to reproduce (so far I
 only saw it this once). But I can see if it turns up again, or should I
 install -rc2 and see if it goes away?
 
 I was not doing anything special at the time, normal desktop load (I was using
 the evince pdf viewer).
 
 Let me know if there is anything else I can do to help track this down?
 
  - Kristian.
 
 Full details:
 
 I put my .config here:
 
 http://knielsen-hq.org/config-3.13-rc1-gpf-in-aio-migratepage.txt
 
 BUG output:
 
 BUG: unable to handle kernel NULL pointer dereference at 0067
 IP: [8113d73f] aio_migratepage+0xb3/0xe4
 PGD 0 
 Oops: 0002 [#1] SMP 
 Modules linked in: tun parport_pc ppdev lp parport bnep rfcomm bluetooth 
 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative 
 binfmt_misc uinput fuse nfsd auth_rpcgss oid_registry nfs_acl nfs lockd 
 fscache sunrpc ext3 jbd loop snd_hda_codec_hdmi hid_generic usbhid hid joydev 
 ums_realtek usb_storage snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support 
 arc4 brcmsmac cordic brcmutil b43 mac80211 cfg80211 ssb mmc_core rfkill 
 rng_core pcmcia pcmcia_core nouveau mxm_wmi wmi x86_pkg_temp_thermal coretemp 
 snd_hda_intel kvm_intel snd_hda_codec snd_hwdep snd_pcm_oss kvm snd_mixer_oss 
 snd_seq_midi snd_seq_midi_event snd_pcm crc32c_intel snd_rawmidi 
 snd_page_alloc snd_seq ghash_clmulni_intel snd_timer snd_seq_device lpc_ich 
 aesni_intel mfd_core ttm battery aes_x86_64 ablk_helper drm_kms_helper cryptd 
 lrw gf128mul drm glue_helper psmouse snd pcspkr serio_raw i2c_i801 evdev 
 ehci_pci soundcore ehci_hcd bcma ac acpi_cpufreq video button processor

[PATCH] f2fs: avoid wait if IO end up when do_checkpoint for better performance

2013-10-14 Thread Gu Zheng

Previously, do_checkpoint() called congestion_wait() to wait for the
previously submitted node/meta/data pages to be written back.
congestion_wait() sleeps for a fixed period (e.g. HZ / 50), and there was
no mechanism to wake the waiter early if the IO completed before that
period elapsed. Yuan Zhong found a case where, after the pages had been
written back, the checkpoint thread was still waiting in congestion_wait().

So here we store the checkpoint task in f2fs_sb_info while doing a
checkpoint. The task waits for IO completion if there is IO in flight,
and the end-IO path wakes it up once the IO ends.

Thanks to Yuan Zhong for the earlier work on this problem.
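The wait/wake handshake can be modelled in userspace. This is a sketch under assumptions: a pthread condition variable stands in for set_current_state()/io_schedule() and wake_up_process(), a plain counter stands in for get_pages(sbi, F2FS_WRITEBACK), and all names are hypothetical. As in the patch, the waiter re-checks the count before sleeping, so a completion that lands in between is not missed.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of the checkpoint wait/wake handshake. */
static pthread_mutex_t wb_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cp_wait = PTHREAD_COND_INITIALIZER;
static int nr_writeback;	/* pages still under writeback */

/* End-IO path: drop the count and wake the waiter on the last page. */
static void end_io_one_page(void)
{
	pthread_mutex_lock(&wb_lock);
	if (--nr_writeback == 0)
		pthread_cond_signal(&cp_wait);
	pthread_mutex_unlock(&wb_lock);
}

static void *io_worker(void *arg)
{
	int pages = *(const int *)arg;

	for (int i = 0; i < pages; i++)
		end_io_one_page();
	return NULL;
}

/* Checkpoint side: re-check the count under the lock before each sleep. */
static void wait_for_writeback(void)
{
	pthread_mutex_lock(&wb_lock);
	while (nr_writeback > 0)
		pthread_cond_wait(&cp_wait, &wb_lock);
	pthread_mutex_unlock(&wb_lock);
}

/* Start `pages` in-flight writebacks, complete them on another thread, wait. */
static int run_checkpoint(int pages)
{
	pthread_t worker;

	nr_writeback = pages;
	if (pthread_create(&worker, NULL, io_worker, &pages) != 0)
		return -1;
	wait_for_writeback();
	pthread_join(worker, NULL);
	return nr_writeback;
}
```

Unlike a fixed-period sleep, the waiter returns as soon as the last page completes, which is the latency the patch is after.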


Reported-by: Yuan Zhong yuan.mark.zh...@samsung.com
Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/checkpoint.c |   11 +--
 fs/f2fs/f2fs.h   |1 +
 fs/f2fs/segment.c|4 
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d808827..2a5999d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
f2fs_put_page(cp_page, 1);
 
/* wait for previous submitted node/meta pages writeback */
-   while (get_pages(sbi, F2FS_WRITEBACK))
-   congestion_wait(BLK_RW_ASYNC, HZ / 50);
+   sbi->cp_task = current;
+   while (get_pages(sbi, F2FS_WRITEBACK)) {
+   set_current_state(TASK_UNINTERRUPTIBLE);
+   if (!get_pages(sbi, F2FS_WRITEBACK))
+   break;
+   io_schedule();
+   }
+   __set_current_state(TASK_RUNNING);
+   sbi->cp_task = NULL;
 
filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 308967b..171c52f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -362,6 +362,7 @@ struct f2fs_sb_info {
struct mutex writepages;/* mutex for writepages() */
int por_doing;  /* recovery is doing or not */
int on_build_free_nids; /* build_free_nids is doing */
+   struct task_struct *cp_task;/* checkpoint task */
 
/* for orphan inode management */
struct list_head orphan_inode_list; /* orphan inode list */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index bd79bbe..3b20359 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 
if (p->is_sync)
complete(p->wait);
+
+   if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
+   wake_up_process(p->sbi->cp_task);
+
kfree(p);
bio_put(bio);
 }
-- 
1.7.7



[PATCH RESEND] f2fs: introduce function read_raw_super_block()

2013-10-14 Thread Gu Zheng

Introduce the function read_raw_super_block() to hide reading the raw super
block and the retry routine used when the first superblock is invalid.
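The fallback order can be sketched as a plain function. This is a hypothetical model, not the f2fs code: read_ok[i] says whether reading copy i succeeds and valid[i] whether it passes the sanity check. As in the patch, copy 0 is tried first, copy 1 second, and the error returned is the one from the last failed attempt.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of read_raw_super_block(): try copy 0, then copy 1.
 * Returns the index of the copy used, or a negative errno.
 */
static int pick_super_block(const bool read_ok[2], const bool valid[2])
{
	int err = -EIO;

	for (int block = 0; block < 2; block++) {
		if (!read_ok[block]) {
			err = -EIO;	/* sb_bread() failed for this copy */
			continue;
		}
		if (!valid[block]) {
			err = -EINVAL;	/* sanity check rejected this copy */
			continue;
		}
		return block;
	}
	return err;
}
```

A healthy primary is always preferred; the secondary is used only when the primary cannot be read or fails the sanity check.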

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/super.c |   54 +-
 1 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 3b786c8..5e913de 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -746,30 +746,46 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
atomic_set(&sbi->nr_pages[i], 0);
 }
 
-static int validate_superblock(struct super_block *sb,
-   struct f2fs_super_block **raw_super,
-   struct buffer_head **raw_super_buf, sector_t block)
+/* Read f2fs raw super block.
+ * Because we have two copies of super block, so read the first one at first,
+ * if the first one is invalid, move to read the second one.
+ */
+static int read_raw_super_block(struct super_block *sb,
+   struct f2fs_super_block **raw_super,
+   struct buffer_head **raw_super_buf)
 {
-   const char *super = (block == 0 ? "first" : "second");
+   int block = 0;
 
-   /* read f2fs raw super block */
+retry:
*raw_super_buf = sb_bread(sb, block);
if (!*raw_super_buf) {
-   f2fs_msg(sb, KERN_ERR, "unable to read %s superblock",
-   super);
-   return -EIO;
+   f2fs_msg(sb, KERN_ERR, "Unable to read %dth superblock",
+   block + 1);
+   if (block == 0) {
+   block++;
+   goto retry;
+   } else {
+   return -EIO;
+   }
}
 
*raw_super = (struct f2fs_super_block *)
((char *)(*raw_super_buf)->b_data + F2FS_SUPER_OFFSET);
 
/* sanity checking of raw super */
-   if (!sanity_check_raw_super(sb, *raw_super))
-   return 0;
+   if (sanity_check_raw_super(sb, *raw_super)) {
+   brelse(*raw_super_buf);
+   f2fs_msg(sb, KERN_ERR, "Can't find a valid F2FS filesystem "
+   "in %dth superblock", block + 1);
+   if(block == 0) {
+   block++;
+   goto retry;
+   } else {
+   return -EINVAL;
+   }
+   }
 
-   f2fs_msg(sb, KERN_ERR, "Can't find a valid F2FS filesystem "
-   "in %s superblock", super);
-   return -EINVAL;
+   return 0;
 }
 
 static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
@@ -791,14 +807,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
goto free_sbi;
}
 
-   err = validate_superblock(sb, &raw_super, &raw_super_buf, 0);
-   if (err) {
-   brelse(raw_super_buf);
-   /* check secondary superblock when primary failed */
-   err = validate_superblock(sb, &raw_super, &raw_super_buf, 1);
-   if (err)
-   goto free_sb_buf;
-   }
+   err = read_raw_super_block(sb, &raw_super, &raw_super_buf);
+   if (err)
+   goto free_sbi;
+
sb->s_fs_info = sbi;
/* init some FS parameters */
sbi->active_logs = NR_CURSEG_TYPE;
-- 
1.7.7
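The control flow of read_raw_super_block() amounts to this: try the first superblock copy, and fall back to the second one if the read fails or the sanity check fails. Below is a minimal user-space sketch of that logic. It uses a loop instead of `goto retry`. struct fake_sb and pick_super_copy() are hypothetical stand-ins for sb_bread()/sanity_check_raw_super(), not f2fs APIs, and buffer handling (brelse()) is left out.

```c
#include <assert.h>
#include <errno.h>

#define NR_SB_COPIES 2 /* f2fs keeps two superblock copies */

/* Hypothetical model of one on-disk superblock copy. */
struct fake_sb {
	int readable; /* 0 models sb_bread() returning NULL */
	int valid;    /* 0 models sanity_check_raw_super() failing */
};

/*
 * Return the index of the first copy that can be read and passes the
 * sanity check. If none does, return the error of the last copy tried,
 * as the patch does: -EIO for a failed read, -EINVAL for a bad copy.
 */
static int pick_super_copy(const struct fake_sb copies[NR_SB_COPIES])
{
	int err = -EINVAL;

	assert(copies != 0);
	for (int block = 0; block < NR_SB_COPIES; block++) {
		if (!copies[block].readable) {
			err = -EIO;    /* "Unable to read %dth superblock" */
			continue;
		}
		if (!copies[block].valid) {
			err = -EINVAL; /* "Can't find a valid F2FS filesystem" */
			continue;
		}
		return block;
	}
	return err;
}
```

A non-negative return value is the index of the copy to use. A negative value is the errno reported for the last copy tried.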

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RESEND PATCH 1/2] fb: reorder the lock sequence to fix potential dead lock

2013-11-11 Thread Gu Zheng

Hi Tomi,
On 11/11/2013 09:59 PM, Tomi Valkeinen wrote:

 On 2013-11-05 12:00, Gu Zheng wrote:
 Following commits:
 50e244cc79 fb: rework locking to fix lock ordering on takeover
 e93a9a8687 fb: Yet another band-aid for fixing lockdep mess
 054430e773 fbcon: fix locking harder
 reworked locking to fix related lock ordering on takeover, and introduced
 console_lock into fbmem, but it seems that the new lock sequence
 (fb_info->lock ---> console_lock) is against with the one in
 console_callback (console_lock ---> fb_info->lock), and leads to
 a potential dead lock as following:
 
 snip
 
 so we reorder the lock sequence the same as it in console_callback() to
 avoid this issue. And following Tomi's suggestion, fix these similar
 issues all in fb subsystem.

 Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
 ---
  drivers/video/fbmem.c|   50 
 -
  drivers/video/fbsysfs.c  |   19 ++
  drivers/video/sh_mobile_lcdcfb.c |   10 ---
  3 files changed, 51 insertions(+), 28 deletions(-)
 
 I'll apply this for 3.13. It's a bit difficult to verify if the locking
 is now correct, but looks fine to me. And we can revert this easily if
 things break badly.

Thanks very much. :)

Regards,
Gu

 
  Tomi
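The fix follows the usual rule for avoiding an ABBA deadlock. Every path that needs both locks takes console_lock first and fb_info->lock second, as console_callback() does. A toy lock-order checker in the spirit of lockdep illustrates the rule. The ranks and helpers are hypothetical; nothing here is fbdev or lockdep code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lock may only be taken while every held lock
 * has a strictly lower rank. console_lock is the outer lock. */
enum lock_rank {
	RANK_CONSOLE = 1, /* stands in for console_lock */
	RANK_FB_INFO = 2, /* stands in for fb_info->lock */
};

#define MAX_HELD 8

struct held_locks {
	enum lock_rank stack[MAX_HELD];
	int depth;
};

/* Record an acquisition. Return false if it breaks the order, i.e. the
 * new lock does not rank above the innermost (highest) held lock. */
static bool acquire(struct held_locks *h, enum lock_rank r)
{
	assert(h->depth < MAX_HELD);
	if (h->depth > 0 && h->stack[h->depth - 1] >= r)
		return false;
	h->stack[h->depth++] = r;
	return true;
}

/* Locks must be released in reverse order of acquisition. */
static void release(struct held_locks *h, enum lock_rank r)
{
	assert(h->depth > 0 && h->stack[h->depth - 1] == r);
	h->depth--;
}
```

The checker rejects exactly one acquisition: taking fb_info->lock first and then console_lock. Once every path uses the same order, no thread can hold the inner lock while waiting for the outer one.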
 
 



Re: [f2fs-dev] [PATCH 2/2] f2fs: read contiguous sit entry pages by merging for mount performance

2013-11-12 Thread Gu Zheng

Hi Yu,
On 11/12/2013 01:18 PM, Chao Yu wrote:

 Previously we read sit entries page one by one, this method lost the chance 
 of reading contiguous page together.
 So we read pages as contiguous as possible for better mount performance.
 
 Signed-off-by: Chao Yu chao2...@samsung.com
 ---
  fs/f2fs/f2fs.h|2 ++
  fs/f2fs/segment.c |   65 
 ++---
  fs/f2fs/segment.h |2 ++
  3 files changed, 66 insertions(+), 3 deletions(-)
 
 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 0afdcec..bfe9d87 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -1113,6 +1113,8 @@ struct page *find_data_page(struct inode *, pgoff_t, bool);
  struct page *get_lock_data_page(struct inode *, pgoff_t);
  struct page *get_new_data_page(struct inode *, struct page *, pgoff_t, bool);
  int f2fs_readpage(struct f2fs_sb_info *, struct page *, block_t, int);
 +void f2fs_submit_read_bio(struct f2fs_sb_info *, int);
 +void submit_read_page(struct f2fs_sb_info *, struct page *, block_t, int);

Better to move these declarations into PATCH 1/2.

  int do_write_data_page(struct page *);
  
  /*
 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index 86dc289..414c351 100644
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -1474,19 +1474,72 @@ static int build_curseg(struct f2fs_sb_info *sbi)
   return restore_curseg_summaries(sbi);
  }
  
 +static int ra_sit_pages(struct f2fs_sb_info *sbi, int start,
 + int nrpages, bool *is_order)
 +{
 + struct address_space *mapping = sbi->meta_inode->i_mapping;
 + struct sit_info *sit_i = SIT_I(sbi);
 + struct page *page;
 + block_t blk_addr;
 + int blkno, readcnt = 0;
 + int sit_blk_cnt = SIT_BLK_CNT(sbi);
 +
 + for (blkno = start; blkno < start + nrpages; blkno++) {
 +
 + if (blkno >= sit_blk_cnt)

Merge these two conditions:
for (blkno = start; blkno < start + nrpages && blkno < sit_blk_cnt; blkno++)

 + goto out;

 + if ((!f2fs_test_bit(blkno, sit_i->sit_bitmap) ^ !*is_order)) {
 + *is_order = !*is_order;
 + goto out;

'Break' seems more suitable.

 + }
 +
 + blk_addr = sit_i->sit_base_addr + blkno;
 + if (*is_order)
 + blk_addr += sit_i->sit_blocks;
 +repeat:
 + page = grab_cache_page(mapping, blk_addr);
 + if (!page) {
 + cond_resched();
 + goto repeat;
 + }
 + if (PageUptodate(page)) {
 + f2fs_put_page(page, 1);
 + readcnt++;
 + goto out;

'continue' may be better here.

 + }
 +
 + submit_read_page(sbi, page, blk_addr, READ_SYNC);
 +
 + page_cache_release(page);

Putting the page here does not seem like a good idea: if it is reclaimed before we use it, all your work may be in vain.

 + readcnt++;
 + }
 +out:
 + f2fs_submit_read_bio(sbi, READ_SYNC);
 + return readcnt;
 +}
 +
  static void build_sit_entries(struct f2fs_sb_info *sbi)
  {
   struct sit_info *sit_i = SIT_I(sbi);
   struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
   struct f2fs_summary_block *sum = curseg->sum_blk;
 - unsigned int start;
 + bool is_order = f2fs_test_bit(0, sit_i->sit_bitmap) ? true : false;
 + int sit_blk_cnt = SIT_BLK_CNT(sbi);
 + int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
 + unsigned int i, start, end;
 + unsigned int readed, start_blk = 0;
  
 - for (start = 0; start < TOTAL_SEGS(sbi); start++) {
 +next:
 + readed = ra_sit_pages(sbi, start_blk, bio_blocks, &is_order);

In fact, you know how many blocks you want to read (SIT_BLK_CNT(sbi)),
so sit_blk_cnt is more suitable here than a MAX one, and it can also make
the logic of ra_sit_pages() simpler.

 +
 + start = start_blk * sit_i->sents_per_block;
 + end = (start_blk + readed) * sit_i->sents_per_block;
 +
 + for (; start < end && start < TOTAL_SEGS(sbi); start++) {
   struct seg_entry *se = &sit_i->sentries[start];
   struct f2fs_sit_block *sit_blk;
   struct f2fs_sit_entry sit;
   struct page *page;
 - int i;
  
   mutex_lock(&curseg->curseg_mutex);
   for (i = 0; i < sits_in_cursum(sum); i++) {
 @@ -1497,6 +1550,7 @@ static void build_sit_entries(struct f2fs_sb_info *sbi)
   }
   }
   mutex_unlock(&curseg->curseg_mutex);
 +
   page = get_current_sit_page(sbi, start);
   sit_blk = (struct f2fs_sit_block *)page_address(page);
   sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
 @@ -1509,6 +1563,11 @@ got_it:
   e->valid_blocks += se->valid_blocks;
   }
   }
 +
 + start_blk += readed;
 + if (start_blk >= sit_blk_cnt)
 + return;
 + goto next;

Using do {...}

Re: [f2fs-dev] [PATCH 1/2] f2fs: add a new function to support for merging contiguous read

2013-11-12 Thread Gu Zheng

On 11/12/2013 01:15 PM, Chao Yu wrote:

 For better read performance, we add a new function to support for merging 
 contiguous read as the one for write.

Nice shot!
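For context: submit_read_page() in the quoted patch keeps one pending read bio. It appends a page only when the page's block directly follows the previous one (last_read_block == blk_addr - 1) and the bio still has room. Otherwise it submits the pending bio and starts a new one. The user-space sketch below shows that merge decision and counts how many bios a sequence of block addresses would produce. struct read_merger and its fields are hypothetical. Bio allocation, bio_add_page() and the bio_sem locking are modeled only by a capacity counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of sbi->read_bio / sbi->last_read_block. */
struct read_merger {
	bool pending;       /* a bio is open (read_bio != NULL) */
	unsigned long last; /* last block added (last_read_block) */
	size_t nr;          /* pages in the open bio */
	size_t max_blocks;  /* bio capacity, cf. MAX_BIO_BLOCKS() */
	size_t submitted;   /* bios "submitted" so far */
};

/* Like f2fs_submit_read_bio(): flush the open bio, if any. */
static void submit_pending(struct read_merger *m)
{
	if (m->pending) {
		m->submitted++;
		m->pending = false;
		m->nr = 0;
	}
}

/* Like submit_read_page(): merge the block into the open bio when it is
 * contiguous and the bio has room; otherwise submit and start anew. */
static void add_read_block(struct read_merger *m, unsigned long blk)
{
	assert(m->max_blocks > 0);
	if (m->pending && (m->last != blk - 1 || m->nr == m->max_blocks))
		submit_pending(m);
	if (!m->pending) {
		m->pending = true; /* "allocate" a new bio */
		m->nr = 0;
	}
	m->nr++;
	m->last = blk;
}
```

Take blocks 10, 11, 12, 20..25 with a capacity of 4. They produce three bios: {10-12}, {20-23} and {24-25}, provided the final one is flushed explicitly, as ra_sit_pages() does with f2fs_submit_read_bio().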

 
 Signed-off-by: Chao Yu chao2...@samsung.com

Acked-by: Gu Zheng guz.f...@cn.fujitsu.com

 ---
  fs/f2fs/data.c |   45 +
  fs/f2fs/f2fs.h |2 ++
  2 files changed, 47 insertions(+)
 
 diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
 index aa3438c..f30060b 100644
 --- a/fs/f2fs/data.c
 +++ b/fs/f2fs/data.c
 @@ -404,6 +404,51 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
   return 0;
  }
  
 +void f2fs_submit_read_bio(struct f2fs_sb_info *sbi, int rw)
 +{
 + down_read(&sbi->bio_sem);
 + if (sbi->read_bio) {
 + submit_bio(rw, sbi->read_bio);
 + sbi->read_bio = NULL;
 + }
 + up_read(&sbi->bio_sem);
 +}
 +
 +void submit_read_page(struct f2fs_sb_info *sbi, struct page *page,
 + block_t blk_addr, int rw)
 +{
 + struct block_device *bdev = sbi->sb->s_bdev;
 + int bio_blocks;
 +
 + verify_block_addr(sbi, blk_addr);
 +
 + down_read(&sbi->bio_sem);
 +
 + if (sbi->read_bio && sbi->last_read_block != blk_addr - 1) {
 + submit_bio(rw, sbi->read_bio);
 + sbi->read_bio = NULL;
 + }
 +
 +alloc_new:
 + if (sbi->read_bio == NULL) {
 + bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
 + sbi->read_bio = f2fs_bio_alloc(bdev, bio_blocks);
 + sbi->read_bio->bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
 + sbi->read_bio->bi_end_io = read_end_io;
 + }
 +
 + if (bio_add_page(sbi->read_bio, page, PAGE_CACHE_SIZE, 0) <
 + PAGE_CACHE_SIZE) {
 + submit_bio(rw, sbi->read_bio);
 + sbi->read_bio = NULL;
 + goto alloc_new;
 + }
 +
 + sbi->last_read_block = blk_addr;
 +
 + up_read(&sbi->bio_sem);
 +}
 +
  /*
   * This function should be used by the data read flow only where it
   * does not check the create flag that indicates block allocation.
 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 89dc750..0afdcec 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -359,6 +359,8 @@ struct f2fs_sb_info {
  
   /* for segment-related operations */
   struct f2fs_sm_info *sm_info;   /* segment manager */
 + struct bio *read_bio;   /* read bios to merge */
 + sector_t last_read_block;   /* last read block number */
   struct bio *bio[NR_PAGE_TYPE];  /* bios to merge */
   sector_t last_block_in_bio[NR_PAGE_TYPE];   /* last block number */
   struct rw_semaphore bio_sem;/* IO semaphore */



Re: [f2fs-dev] [PATCH 2/2] f2fs: read contiguous sit entry pages by merging for mount performance

2013-11-14 Thread Gu Zheng

Hi Yu,
On 11/13/2013 04:10 PM, Chao Yu wrote:

 Hi Gu,
 
 -Original Message-
 From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
 Sent: Wednesday, November 13, 2013 11:39 AM
 To: Chao Yu
 Cc: ???; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
 linux-f2fs-de...@lists.sourceforge.net; 谭姝
 Subject: Re: [f2fs-dev] [PATCH 2/2] f2fs: read contiguous sit entry pages by 
 merging for mount performance

 Hi Yu,
 On 11/12/2013 01:18 PM, Chao Yu wrote:

 Previously we read sit entries page one by one, this method lost the chance 
 of reading contiguous page together.
 So we read pages as contiguous as possible for better mount performance.

 Signed-off-by: Chao Yu chao2...@samsung.com
 ---
  fs/f2fs/f2fs.h|2 ++
  fs/f2fs/segment.c |   65 
 ++---
  fs/f2fs/segment.h |2 ++
  3 files changed, 66 insertions(+), 3 deletions(-)

 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 0afdcec..bfe9d87 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -1113,6 +1113,8 @@ struct page *find_data_page(struct inode *, pgoff_t, bool);
  struct page *get_lock_data_page(struct inode *, pgoff_t);
  struct page *get_new_data_page(struct inode *, struct page *, pgoff_t, bool);
  int f2fs_readpage(struct f2fs_sb_info *, struct page *, block_t, int);
 +void f2fs_submit_read_bio(struct f2fs_sb_info *, int);
 +void submit_read_page(struct f2fs_sb_info *, struct page *, block_t, int);

 Better to move these declarations into PATCH 1/2.
 
 Okay, I will move it to the right place.
 

  int do_write_data_page(struct page *);

  /*
 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index 86dc289..414c351 100644
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -1474,19 +1474,72 @@ static int build_curseg(struct f2fs_sb_info *sbi)
 return restore_curseg_summaries(sbi);
  }

 +static int ra_sit_pages(struct f2fs_sb_info *sbi, int start,
 +   int nrpages, bool *is_order)
 +{
 +   struct address_space *mapping = sbi->meta_inode->i_mapping;
 +   struct sit_info *sit_i = SIT_I(sbi);
 +   struct page *page;
 +   block_t blk_addr;
 +   int blkno, readcnt = 0;
 +   int sit_blk_cnt = SIT_BLK_CNT(sbi);
 +
 +   for (blkno = start; blkno < start + nrpages; blkno++) {
 +
 +   if (blkno >= sit_blk_cnt)

 Merge these two judgements:
 for (blkno = start; blkno < start + nrpages && blkno < sit_blk_cnt; blkno++)
 
 Right, but the line may over 80 characters, if we split this line, it seems 
 not suitable.
 So how about this?
   int blkno = start, readcnt = 0;
   int sit_blk_cnt = SIT_BLK_CNT(sbi);
 
   for (; blkno < start + nrpages && blkno < sit_blk_cnt; blkno++) {

Much neater!

 

 +   goto out;

 +   if ((!f2fs_test_bit(blkno, sit_i->sit_bitmap) ^ !*is_order)) {
 +   *is_order = !*is_order;
 +   goto out;

 'Break' seems more suitable.
 
 Yes, you are right.
 

 +   }
 +
 +   blk_addr = sit_i->sit_base_addr + blkno;
 +   if (*is_order)
 +   blk_addr += sit_i->sit_blocks;
 +repeat:
 +   page = grab_cache_page(mapping, blk_addr);
 +   if (!page) {
 +   cond_resched();
 +   goto repeat;
 +   }
 +   if (PageUptodate(page)) {
 +   f2fs_put_page(page, 1);
 +   readcnt++;
 +   goto out;

 Here may be 'Continue'.
 
 'Out' label could be removed after this modification.
 It seems more neat.

Right.

 

 +   }
 +
 +   submit_read_page(sbi, page, blk_addr, READ_SYNC);
 +
 +   page_cache_release(page);

 Put page here seems not a good idea, otherwise all your work may be in vain.
 
 You mean that pages could be reclaimed by VM when out of memory?
 IMO, it is designed more like VM read ahead because we should concern 
 memory state of system, and still we have second chance to read these pages.
 

Yes, but we should avoid reading the same page a second time; that is a serious
waste. If we still re-read the page in get_current_sit_page(), all the
improvement will disappear.

 Could we use mark_page_accessed () to delay VM reclaimed them?

IMO, this is the right way.

 

 +   readcnt++;
 +   }
 +out:
 +   f2fs_submit_read_bio(sbi, READ_SYNC);
 +   return readcnt;
 +}
 +
  static void build_sit_entries(struct f2fs_sb_info *sbi)
  {
 struct sit_info *sit_i = SIT_I(sbi);
 struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
 struct f2fs_summary_block *sum = curseg->sum_blk;
 -   unsigned int start;
 +   bool is_order = f2fs_test_bit(0, sit_i->sit_bitmap) ? true : false;
 +   int sit_blk_cnt = SIT_BLK_CNT(sbi);
 +   int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
 +   unsigned int i, start, end;
 +   unsigned int readed, start_blk = 0;

 -   for (start = 0; start < TOTAL_SEGS(sbi); start++) {
 +next:
 +   readed = ra_sit_pages(sbi, start_blk, bio_blocks, &is_order);

 In fact, you know how many

[PATCH] f2fs: use mutex rather than the rw_sem

2013-11-18 Thread Gu Zheng

Use a mutex rather than the rw_semaphore to protect the bio-related fields,
because the read path does not need to take the semaphore at all.


Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/data.c|4 
 fs/f2fs/f2fs.h|2 +-
 fs/f2fs/segment.c |8 
 fs/f2fs/super.c   |2 +-
 4 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index aa3438c..b4e4c7e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -383,8 +383,6 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
 
trace_f2fs_readpage(page, blk_addr, type);
 
-   down_read(&sbi->bio_sem);
-
/* Allocate a new bio */
bio = f2fs_bio_alloc(bdev, 1);
 
@@ -394,13 +392,11 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
 
if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
bio_put(bio);
-   up_read(&sbi->bio_sem);
f2fs_put_page(page, 1);
return -EFAULT;
}
 
submit_bio(type, bio);
-   up_read(&sbi->bio_sem);
return 0;
 }
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 89dc750..78a0054 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -361,7 +361,7 @@ struct f2fs_sb_info {
struct f2fs_sm_info *sm_info;   /* segment manager */
struct bio *bio[NR_PAGE_TYPE];  /* bios to merge */
sector_t last_block_in_bio[NR_PAGE_TYPE];   /* last block number */
-   struct rw_semaphore bio_sem;/* IO semaphore */
+   struct mutex bio_mutex; /* IO write mutex */
 
/* for checkpoint */
struct f2fs_checkpoint *ckpt;   /* raw checkpoint pointer */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index fa284d3..e91f65c 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -653,9 +653,9 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
 
 void f2fs_submit_bio(struct f2fs_sb_info *sbi, enum page_type type, bool sync)
 {
-   down_write(&sbi->bio_sem);
+   mutex_lock(&sbi->bio_mutex);
do_submit_bio(sbi, type, sync);
-   up_write(&sbi->bio_sem);
+   mutex_unlock(&sbi->bio_mutex);
 }
 
 static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
@@ -666,7 +666,7 @@ static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
 
verify_block_addr(sbi, blk_addr);
 
-   down_write(&sbi->bio_sem);
+   mutex_lock(&sbi->bio_mutex);
 
inc_page_count(sbi, F2FS_WRITEBACK);
 
@@ -701,7 +701,7 @@ retry:
 
sbi->last_block_in_bio[type] = blk_addr;
 
-   up_write(&sbi->bio_sem);
+   mutex_unlock(&sbi->bio_mutex);
trace_f2fs_submit_write_page(page, blk_addr, type);
 }
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index bafff72..fab3550 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -874,7 +874,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
mutex_init(&sbi->node_write);
sbi->por_doing = false;
spin_lock_init(&sbi->stat_lock);
-   init_rwsem(&sbi->bio_sem);
+   mutex_init(&sbi->bio_mutex);
init_rwsem(&sbi->cp_rwsem);
init_waitqueue_head(&sbi->cp_wait);
init_sb_info(sbi);
-- 
1.7.7


Re: [PATCH 2/2] f2fs: use sbi->wr_mutex for write bios

2013-11-18 Thread Gu Zheng

Hi Kim,
On 11/18/2013 05:12 PM, Jaegeuk Kim wrote:

 This patch removes an unnecessary semaphore (i.e., sbi->bio_sem).
 There is no reason to use the semaphore when f2fs submits read and write IOs.
 Instead, let's use a write mutex and cover the sbi->bio[] by the lock.

My god, I just sent out almost the same patch. Do we have telepathy? :)

Regards,
Gu 

 
 Signed-off-by: Jaegeuk Kim jaegeuk@samsung.com
 ---
  fs/f2fs/data.c|  4 
  fs/f2fs/f2fs.h|  2 +-
  fs/f2fs/segment.c | 13 +
  fs/f2fs/super.c   |  2 +-
  4 files changed, 11 insertions(+), 10 deletions(-)
 
 diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
 index 84867dc..7550026 100644
 --- a/fs/f2fs/data.c
 +++ b/fs/f2fs/data.c
 @@ -390,8 +390,6 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
  
   trace_f2fs_readpage(page, blk_addr, type);
  
 - down_read(&sbi->bio_sem);
 -
   /* Allocate a new bio */
   bio = f2fs_bio_alloc(bdev, 1);
  
 @@ -401,13 +399,11 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
  
   if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
   bio_put(bio);
 - up_read(&sbi->bio_sem);
   f2fs_put_page(page, 1);
   return -EFAULT;
   }
  
   submit_bio(type, bio);
 - up_read(&sbi->bio_sem);
   return 0;
  }
  
 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 1c783fd..76f5586 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -375,7 +375,7 @@ struct f2fs_sb_info {
   struct f2fs_sm_info *sm_info;   /* segment manager */
   struct bio *bio[NR_PAGE_TYPE];  /* bios to merge */
   sector_t last_block_in_bio[NR_PAGE_TYPE];   /* last block number */
 - struct rw_semaphore bio_sem;/* IO semaphore */
 + struct mutex write_mutex;   /* mutex for writing IOs */
  
   /* for checkpoint */
   struct f2fs_checkpoint *ckpt;   /* raw checkpoint pointer */
 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index dad5f1a..893d489 100644
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -871,9 +871,14 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
  
  void f2fs_submit_bio(struct f2fs_sb_info *sbi, enum page_type type, bool sync)
  {
 - down_write(&sbi->bio_sem);
 + enum page_type btype = PAGE_TYPE_OF_BIO(type);
 +
 + if (!sbi->bio[btype])
 + return;
 +
 + mutex_lock(&sbi->write_mutex);
   do_submit_bio(sbi, type, sync);
 - up_write(&sbi->bio_sem);
 + mutex_unlock(&sbi->write_mutex);
  }
  
  static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
 @@ -884,7 +889,7 @@ static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
  
   verify_block_addr(sbi, blk_addr);
  
 - down_write(&sbi->bio_sem);
 + mutex_lock(&sbi->write_mutex);
  
   inc_page_count(sbi, F2FS_WRITEBACK);
  
 @@ -919,7 +924,7 @@ retry:
  
   sbi->last_block_in_bio[type] = blk_addr;
  
 - up_write(&sbi->bio_sem);
 + mutex_unlock(&sbi->write_mutex);
   trace_f2fs_submit_write_page(page, blk_addr, type);
  }
  
 diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
 index 2c52527..c7b6300 100644
 --- a/fs/f2fs/super.c
 +++ b/fs/f2fs/super.c
 @@ -882,7 +882,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
   mutex_init(&sbi->node_write);
   sbi->por_doing = false;
   spin_lock_init(&sbi->stat_lock);
 - init_rwsem(&sbi->bio_sem);
 + mutex_init(&sbi->write_mutex);
   init_rwsem(&sbi->cp_rwsem);
   init_waitqueue_head(&sbi->cp_wait);
   init_sb_info(sbi);



Re: [f2fs-dev] [PATCH V2 1/2] f2fs: add a new function to support for merging contiguous read

2013-11-18 Thread Gu Zheng

On 11/18/2013 05:11 PM, Jaegeuk Kim wrote:

 Hi,
 
 2013-11-18 (Mon), 09:37 +0800, Chao Yu:
 Hi Kim,

 -Original Message-
 From: Jaegeuk Kim [mailto:jaegeuk@samsung.com]
 Sent: Monday, November 18, 2013 8:29 AM
 To: Chao Yu
 Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
 linux-f2fs-de...@lists.sourceforge.net; 谭姝
 Subject: Re: [f2fs-dev] [PATCH V2 1/2] f2fs: add a new function to support 
 for merging contiguous read

 Hi Chao,

 2013-11-16 (Sat), 14:14 +0800, Chao Yu:
 For better read performance, we add a new function to support for merging 
 contiguous read as the one for write.

 Please consider 80 columns for the description.
 I cannot fix this at every time though. :(

 Got it, sorry about my carelessness in previous patch.



 v1-->v2:
  o add declarations here as Gu Zheng suggested.

 Signed-off-by: Chao Yu chao2...@samsung.com
 Acked-by: Gu Zheng guz.f...@cn.fujitsu.com
 ---
  fs/f2fs/data.c |   45 +
  fs/f2fs/f2fs.h |4 
  2 files changed, 49 insertions(+)

 diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
 index aa3438c..18107cb 100644
 --- a/fs/f2fs/data.c
 +++ b/fs/f2fs/data.c
 @@ -404,6 +404,51 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
return 0;
  }

 +void f2fs_submit_read_bio(struct f2fs_sb_info *sbi, int rw)
 +{
 +  down_read(&sbi->bio_sem);

 Is there any reason to use down_read()?

 Isn't that we use bio_sem to let w/r or w/w submitting be mutex?
 
 As I examined the bio_sem, I think we don't need to use a semaphore for
 read and write IOs.
 Just it is enough to use a mutex for writes only.

Agreed. A mutex is more suitable here: we just want to protect the
write-bio-related fields in the write path; reads are not involved.

 

 It seems that we need to declare sbi->bio_read and sbi->bio_write
 instead of sbi->bio_sem.
 In addition to that, we need to use down_write(&sbi->bio_read) here.

 If so, it looks similar between (struct rw_semaphore) sbi->bio_read
 and (struct bio *) sbi->read_bio.
 How about using read_bio_sem/rbio_sem to differentiate
 from sbi->read_bio?
 
 I think sbi->write_mutex and sbi->read_mutex are much better.

It's more reasonable and readable.

Thanks,
Gu

 
 Could you refer the following patches?
 Thanks,
 



Re: [f2fs-dev] [PATCH V2 2/2] f2fs: read contiguous sit entry pages by merging for mount performance

2013-11-18 Thread Gu Zheng

Hi Yu,
One more comment, please refer to inline.
On 11/16/2013 02:15 PM, Chao Yu wrote:

 Previously we read sit entries page one by one, this method lost the chance 
 of reading contiguous page together.
 So we read pages as contiguous as possible for better mount performance.
 
 v1-->v2:
  o merge judgements/use 'Continue' or 'Break' instead of 'Goto' as Gu Zheng 
 suggested.
  o add mark_page_accessed () before release page to delay VM reclaiming them.
 
 Signed-off-by: Chao Yu chao2...@samsung.com
 ---
  fs/f2fs/segment.c |  108 
 -
  fs/f2fs/segment.h |2 +
  2 files changed, 84 insertions(+), 26 deletions(-)
 
 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index fa284d3..656fe40 100644
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -14,6 +14,7 @@
  #include <linux/blkdev.h>
  #include <linux/prefetch.h>
  #include <linux/vmalloc.h>
 +#include <linux/swap.h>
  
  #include "f2fs.h"
  #include "segment.h"
 @@ -1480,41 +1481,96 @@ static int build_curseg(struct f2fs_sb_info *sbi)
   return restore_curseg_summaries(sbi);
  }
  
 +static int ra_sit_pages(struct f2fs_sb_info *sbi, int start,
 + int nrpages, bool *is_order)
 +{
 + struct address_space *mapping = sbi->meta_inode->i_mapping;
 + struct sit_info *sit_i = SIT_I(sbi);
 + struct page *page;
 + block_t blk_addr;
 + int blkno = start, readcnt = 0;
 + int sit_blk_cnt = SIT_BLK_CNT(sbi);
 +
 + for (; blkno < start + nrpages && blkno < sit_blk_cnt; blkno++) {
 +
 + if ((!f2fs_test_bit(blkno, sit_i->sit_bitmap) ^ !*is_order)) {
 + *is_order = !*is_order;
 + break;
 + }
 +
 + blk_addr = sit_i->sit_base_addr + blkno;
 + if (*is_order)
 + blk_addr += sit_i->sit_blocks;
 +repeat:
 + page = grab_cache_page(mapping, blk_addr);
 + if (!page) {
 + cond_resched();
 + goto repeat;
 + }
 + if (PageUptodate(page)) {
 + mark_page_accessed(page);
 + f2fs_put_page(page, 1);
 + readcnt++;
 + continue;
 + }
 +
 + submit_read_page(sbi, page, blk_addr, READ_SYNC);
 +
 + mark_page_accessed(page);
 + f2fs_put_page(page, 0);
 + readcnt++;
 + }
 +
 + f2fs_submit_read_bio(sbi, READ_SYNC);
 + return readcnt;
 +}
 +
  static void build_sit_entries(struct f2fs_sb_info *sbi)
  {
   struct sit_info *sit_i = SIT_I(sbi);
   struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
   struct f2fs_summary_block *sum = curseg->sum_blk;
 - unsigned int start;
 -
 - for (start = 0; start < TOTAL_SEGS(sbi); start++) {
 - struct seg_entry *se = &sit_i->sentries[start];
 - struct f2fs_sit_block *sit_blk;
 - struct f2fs_sit_entry sit;
 - struct page *page;
 - int i;
 + bool is_order = f2fs_test_bit(0, sit_i->sit_bitmap) ? true : false;
 + int sit_blk_cnt = SIT_BLK_CNT(sbi);
 + unsigned int i, start, end;
 + unsigned int readed, start_blk = 0;
  
 - mutex_lock(&curseg->curseg_mutex);
 - for (i = 0; i < sits_in_cursum(sum); i++) {
 - if (le32_to_cpu(segno_in_journal(sum, i)) == start) {
 - sit = sit_in_journal(sum, i);
 - mutex_unlock(&curseg->curseg_mutex);
 - goto got_it;
 + do {

How about using find_next_bit() here to get a suitable start_blk when the next
block is not in order? It could also simplify the logic of ra_sit_pages().

Thanks,
Gu
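Here is one way to read the find_next_bit() suggestion. Instead of having ra_sit_pages() stop when the bitmap bit flips, the caller could first compute where the current run of equal bits ends and then read exactly that range. The sketch below assumes the MSB-first bit numbering that f2fs_test_bit() uses. run_end() and sit_blk_addr() are hypothetical helpers. The kernel's generic find_next_bit() numbers bits LSB-first within a word, so using it directly on this bitmap would need care.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* MSB-first bit test, in the style of f2fs_test_bit(). */
static bool test_bit_msb(size_t nr, const unsigned char *map)
{
	return map[nr >> 3] & (1u << (7 - (nr & 7)));
}

/* Return the first index in [start, nbits) whose bit differs from the
 * bit at start, or nbits if the run reaches the end. All blocks in
 * [start, result) come from the same SIT copy and can be read as one
 * contiguous batch. (Hypothetical helper, not an f2fs function.) */
static size_t run_end(const unsigned char *map, size_t start, size_t nbits)
{
	assert(start < nbits);
	bool first = test_bit_msb(start, map);
	size_t i = start + 1;

	while (i < nbits && test_bit_msb(i, map) == first)
		i++;
	return i;
}

/* Address of SIT block blkno: a set bit selects the second copy,
 * mirroring the blk_addr computation in ra_sit_pages(). */
static unsigned long sit_blk_addr(const unsigned char *map, size_t blkno,
				  unsigned long base, unsigned long sit_blocks)
{
	unsigned long addr = base + blkno;

	if (test_bit_msb(blkno, map))
		addr += sit_blocks;
	return addr;
}
```

With this helper, the outer loop would call run_end(), read [start_blk, end) and continue from end. No is_order state would need to be passed back and forth.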

 + readed = ra_sit_pages(sbi, start_blk, sit_blk_cnt, &is_order);
 +
 + start = start_blk * sit_i->sents_per_block;
 + end = (start_blk + readed) * sit_i->sents_per_block;
 +
 + for (; start < end && start < TOTAL_SEGS(sbi); start++) {
 + struct seg_entry *se = &sit_i->sentries[start];
 + struct f2fs_sit_block *sit_blk;
 + struct f2fs_sit_entry sit;
 + struct page *page;
 +
 + mutex_lock(&curseg->curseg_mutex);
 + for (i = 0; i < sits_in_cursum(sum); i++) {
 + if (le32_to_cpu(segno_in_journal(sum, i)) == start) {
 + sit = sit_in_journal(sum, i);
 + mutex_unlock(&curseg->curseg_mutex);
 + goto got_it;
 + }
   }
 - }
 - mutex_unlock(&curseg->curseg_mutex);
 - page = get_current_sit_page(sbi, start);
 - sit_blk = (struct f2fs_sit_block *)page_address(page);
 - sit = sit_blk->entries

[PATCH 0/5] f2fs: some minor cleanups and logic fixes

2013-11-19 Thread Gu Zheng


Gu Zheng (5):
  f2fs: convert remove_inode_page to void
  f2fs: convert dev_valid_block_count to void
  f2fs: convert inc/dec_valid_node_count to inc/dec one count
  f2fs: simplify write_orphan_inodes for better readable
  f2fs: move the list_head initialization into the lock protection region

 fs/f2fs/checkpoint.c |   53 ++---
 fs/f2fs/f2fs.h   |   37 --
 fs/f2fs/node.c   |   18 ++--
 3 files changed, 52 insertions(+), 56 deletions(-)

-- 
1.7.7


[PATCH 4/5] f2fs: simplify write_orphan_inodes for better readable

2013-11-19 Thread Gu Zheng

Simplify write_orphan_inodes() for better readability. Because we hold
orphan_inode_mutex, it is safe to use list_for_each_entry instead of
list_for_each_safe.


Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/checkpoint.c |   38 ++
 1 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 5716e5e..f884589 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -300,12 +300,13 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
 
 static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk)
 {
-   struct list_head *head, *this, *next;
+   struct list_head *head;
struct f2fs_orphan_block *orphan_blk = NULL;
struct page *page = NULL;
unsigned int nentries = 0;
unsigned short index = 1;
unsigned short orphan_blocks;
+   struct orphan_inode_entry *orphan = NULL;
 
orphan_blocks = (unsigned short)((sbi->n_orphans +
(F2FS_ORPHANS_PER_BLOCK - 1)) / F2FS_ORPHANS_PER_BLOCK);
@@ -314,12 +315,17 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk)
head = &sbi->orphan_inode_list;
 
/* loop for each orphan inode entry and write them in Jornal block */
-   list_for_each_safe(this, next, head) {
-   struct orphan_inode_entry *orphan;
+   list_for_each_entry(orphan, head, list) {
+   if (!page) {
+   page = grab_meta_page(sbi, start_blk);
+   orphan_blk =
+   (struct f2fs_orphan_block *)page_address(page);
+   memset(orphan_blk, 0, sizeof(*orphan_blk));
+   }
 
-   orphan = list_entry(this, struct orphan_inode_entry, list);
+   orphan_blk->ino[nentries] = cpu_to_le32(orphan->ino);
 
-   if (nentries == F2FS_ORPHANS_PER_BLOCK) {
+   if (nentries++ == F2FS_ORPHANS_PER_BLOCK) {
/*
 * an orphan block is full of 1020 entries,
 * then we need to flush current orphan blocks
@@ -335,24 +341,16 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk)
nentries = 0;
page = NULL;
}
-   if (page)
-   goto page_exist;
+   }
 
-   page = grab_meta_page(sbi, start_blk);
-   orphan_blk = (struct f2fs_orphan_block *)page_address(page);
-   memset(orphan_blk, 0, sizeof(*orphan_blk));
-page_exist:
-   orphan_blk->ino[nentries++] = cpu_to_le32(orphan->ino);
+   if (page) {
+   orphan_blk->blk_addr = cpu_to_le16(index);
+   orphan_blk->blk_count = cpu_to_le16(orphan_blocks);
+   orphan_blk->entry_count = cpu_to_le32(nentries);
+   set_page_dirty(page);
+   f2fs_put_page(page, 1);
}
-   if (!page)
-   goto end;
 
-   orphan_blk->blk_addr = cpu_to_le16(index);
-   orphan_blk->blk_count = cpu_to_le16(orphan_blocks);
-   orphan_blk->entry_count = cpu_to_le32(nentries);
-   set_page_dirty(page);
-   f2fs_put_page(page, 1);
-end:
mutex_unlock(&sbi->orphan_inode_mutex);
 }
 
-- 
1.7.7
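The loop above packs inode numbers into orphan blocks of F2FS_ORPHANS_PER_BLOCK entries. It grabs a block on demand and flushes it when full; the last, partly filled block is flushed after the loop. Below is a user-space sketch with a tiny hypothetical block size, in which struct orphan_block mirrors only the fields used in the diff. The ordering matters. Storing at ino[nentries++] and then comparing nentries with the capacity flushes a block as soon as it holds exactly PER_BLOCK entries. By contrast, storing at ino[nentries] and then testing nentries++ == PER_BLOCK, as the diff above does, writes one entry past the end of ino[] before the flush happens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ORPHANS_PER_BLOCK 4 /* hypothetical; f2fs uses 1020 */
#define MAX_BLOCKS 8

struct orphan_block {
	unsigned int ino[ORPHANS_PER_BLOCK];
	unsigned short blk_addr;  /* 1-based index of this block */
	unsigned short blk_count; /* total number of orphan blocks */
	unsigned int entry_count; /* valid entries in ino[] */
};

/* Pack n inode numbers into out[]; return the number of blocks used. */
static size_t pack_orphans(const unsigned int *inos, size_t n,
			   struct orphan_block out[MAX_BLOCKS])
{
	unsigned short blk_count =
		(unsigned short)((n + ORPHANS_PER_BLOCK - 1) / ORPHANS_PER_BLOCK);
	struct orphan_block *blk = NULL;
	unsigned short index = 1;
	unsigned int nentries = 0;

	assert(blk_count <= MAX_BLOCKS);
	for (size_t i = 0; i < n; i++) {
		if (!blk) { /* grab a fresh block on demand */
			blk = &out[index - 1];
			memset(blk, 0, sizeof(*blk));
		}
		blk->ino[nentries++] = inos[i]; /* store, then count */
		if (nentries == ORPHANS_PER_BLOCK) { /* full: flush it */
			blk->blk_addr = index;
			blk->blk_count = blk_count;
			blk->entry_count = nentries;
			index++;
			nentries = 0;
			blk = NULL;
		}
	}
	if (blk) { /* flush the last, partly filled block */
		blk->blk_addr = index;
		blk->blk_count = blk_count;
		blk->entry_count = nentries;
		index++;
	}
	return (size_t)(index - 1);
}
```

For example, six inode numbers with a capacity of 4 yield two blocks, holding 4 and 2 entries.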


[PATCH 2/5] f2fs: convert dev_valid_block_count to void

2013-11-19 Thread Gu Zheng


Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/f2fs.h |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 94fbec3..d0c6738 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -585,7 +585,7 @@ static inline bool inc_valid_block_count(struct f2fs_sb_info *sbi,
return true;
 }
 
-static inline int dec_valid_block_count(struct f2fs_sb_info *sbi,
+static inline void dec_valid_block_count(struct f2fs_sb_info *sbi,
struct inode *inode,
blkcnt_t count)
 {
@@ -595,7 +595,6 @@ static inline int dec_valid_block_count(struct f2fs_sb_info *sbi,
inode->i_blocks -= count;
sbi->total_valid_block_count -= (block_t)count;
spin_unlock(&sbi->stat_lock);
-   return 0;
 }
 
 static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type)
-- 
1.7.7

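In the lines this diff shows, dec_valid_block_count() only adjusts two counters under `stat_lock` and then returns 0 unconditionally, so its return value carries no information. A userspace sketch of the void version follows, assuming a C11 `atomic_flag` spinlock as a stand-in for `spinlock_t`; the `*_demo` types and functions are hypothetical and model only the fields the hunk touches.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace spinlock standing in for spinlock_t. */
struct demo_spinlock {
	atomic_flag locked;
};

static void demo_spin_init(struct demo_spinlock *l)
{
	atomic_flag_clear(&l->locked);
}

static void demo_spin_lock(struct demo_spinlock *l)
{
	/* busy-wait until the flag was previously clear */
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical stand-ins for the f2fs_sb_info and inode fields in the hunk. */
struct sb_info_demo {
	struct demo_spinlock stat_lock;
	uint64_t total_valid_block_count;
};

struct inode_demo {
	uint64_t i_blocks;
};

/* Shaped like dec_valid_block_count() after the patch: it only subtracts
 * under the lock and has no failure path, so it returns nothing. */
static inline void dec_valid_block_count_demo(struct sb_info_demo *sbi,
					      struct inode_demo *inode,
					      uint64_t count)
{
	demo_spin_lock(&sbi->stat_lock);
	inode->i_blocks -= count;
	sbi->total_valid_block_count -= count;
	demo_spin_unlock(&sbi->stat_lock);
}
```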

[PATCH 5/5] f2fs: move the list_head initialization into the lock protection region

2013-11-19 Thread Gu Zheng


Signed-off-by: Gu Zheng <guz.f...@cn.fujitsu.com>
---
 fs/f2fs/checkpoint.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index f884589..1de70cc 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -511,8 +511,8 @@ void add_dirty_dir_inode(struct inode *inode)
 void remove_dirty_dir_inode(struct inode *inode)
 {
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
-   struct list_head *head = &sbi->dir_inode_list;
-   struct list_head *this;
+
+   struct list_head *this, *head;
 
if (!S_ISDIR(inode->i_mode))
return;
@@ -523,6 +523,7 @@ void remove_dirty_dir_inode(struct inode *inode)
return;
}
 
+   head = &sbi->dir_inode_list;
list_for_each(this, head) {
struct dir_inode_entry *entry;
entry = list_entry(this, struct dir_inode_entry, list);
@@ -544,11 +545,13 @@ void remove_dirty_dir_inode(struct inode *inode)
 
 struct inode *check_dirty_dir_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
-   struct list_head *head = &sbi->dir_inode_list;
-   struct list_head *this;
+
+   struct list_head *this, *head;
struct inode *inode = NULL;
 
spin_lock(&sbi->dir_inode_lock);
+
+   head = &sbi->dir_inode_list;
list_for_each(this, head) {
struct dir_inode_entry *entry;
entry = list_entry(this, struct dir_inode_entry, list);
@@ -563,11 +566,13 @@ struct inode *check_dirty_dir_inode(struct f2fs_sb_info *sbi, nid_t ino)
 
 void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
 {
-   struct list_head *head = &sbi->dir_inode_list;
+   struct list_head *head;
struct dir_inode_entry *entry;
struct inode *inode;
 retry:
spin_lock(&sbi->dir_inode_lock);
+
+   head = &sbi->dir_inode_list;
if (list_empty(head)) {
spin_unlock(&sbi->dir_inode_lock);
return;
-- 
1.7.7

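Patch 5/5 moves each `head = &sbi->dir_inode_list` assignment so that it happens after `dir_inode_lock` is taken, right next to the list walk it feeds. A userspace sketch of the resulting lookup order (lock, set up head, walk, unlock) follows. The list helpers, the `atomic_flag` spinlock and the `*_demo` types are hypothetical stand-ins for `<linux/list.h>`, `spinlock_t` and the f2fs structures; the real check_dirty_dir_inode() returns a `struct inode *` rather than the entry.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal circular doubly linked list: hypothetical stand-in for <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void demo_list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void demo_list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical userspace spinlock standing in for spinlock_t. */
struct demo_spinlock {
	atomic_flag locked;
};

static void demo_spin_init(struct demo_spinlock *l)
{
	atomic_flag_clear(&l->locked);
}

static void demo_spin_lock(struct demo_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical analogues of struct dir_inode_entry and the sbi fields. */
struct dir_entry_demo {
	unsigned long ino;
	struct list_head list;
};

struct sb_dirs_demo {
	struct demo_spinlock dir_inode_lock;
	struct list_head dir_inode_list;
};

/* Shaped like check_dirty_dir_inode() after the patch. */
static struct dir_entry_demo *find_dirty_dir_demo(struct sb_dirs_demo *sbi,
						  unsigned long ino)
{
	struct list_head *this, *head;
	struct dir_entry_demo *found = NULL;

	demo_spin_lock(&sbi->dir_inode_lock);
	head = &sbi->dir_inode_list;		/* set up inside the locked region */
	for (this = head->next; this != head; this = this->next) {
		struct dir_entry_demo *entry =
			demo_container_of(this, struct dir_entry_demo, list);
		if (entry->ino == ino) {
			found = entry;
			break;
		}
	}
	demo_spin_unlock(&sbi->dir_inode_lock);
	return found;
}
```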

Re: [f2fs-dev] [PATCH V2 2/2] f2fs: read contiguous sit entry pages by merging for mount performance

2013-11-20 Thread Gu Zheng

Hi Yu,
On 11/20/2013 01:37 PM, Chao Yu wrote:

 Hi Gu,
 
 -----Original Message-----
 From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
 Sent: Monday, November 18, 2013 7:16 PM
 To: Chao Yu
 Cc: '???'; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
 linux-f2fs-de...@lists.sourceforge.net; 谭姝
 Subject: Re: [f2fs-dev] [PATCH V2 2/2] f2fs: read contiguous sit entry pages by merging for mount performance

 Hi Yu,
 One more comment, please refer to inline.
 On 11/16/2013 02:15 PM, Chao Yu wrote:

 Previously we read sit entry pages one by one, which lost the chance of
 reading contiguous pages together.
 So we read pages as contiguously as possible for better mount performance.

 v1->v2:
  o merge judgements/use 'Continue' or 'Break' instead of 'Goto' as Gu Zheng suggested.
  o add mark_page_accessed() before releasing pages to delay VM reclaiming them.
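The changelog's idea, reading SIT blocks in runs that are contiguous on disk, can be sketched in userspace C. Assumptions: the bitmap uses the MSB-first bit order of f2fs_test_bit(), and a set bit means the block's current copy lives `sit_blocks` past the base address, matching the `if (*is_order) blk_addr += sit_i->sit_blocks;` step in this patch's ra_sit_pages() hunk. `struct sit_run_demo`, `sit_bit_demo()` and `split_sit_runs_demo()` are hypothetical names; the sketch only computes the runs and issues no I/O.

```c
#include <stdbool.h>

/* Assumed to match f2fs_test_bit(): bit 0 is the MSB of byte 0. */
static bool sit_bit_demo(unsigned int nr, const unsigned char *bitmap)
{
	return (bitmap[nr >> 3] >> (7 - (nr & 0x07))) & 1;
}

/* Hypothetical description of one contiguous read. */
struct sit_run_demo {
	unsigned int first_blk;	/* first SIT block index in the run */
	unsigned int nr_blks;	/* number of SIT blocks in the run */
	unsigned int disk_addr;	/* on-disk address of first_blk */
};

/* Split SIT blocks [0, sit_blk_cnt) into maximal runs whose bitmap bits are
 * equal. Every block in a run lives in the same SIT copy, so the run is
 * contiguous on disk and could be read with one merged request.
 * Returns the number of runs stored in out[]. */
static unsigned int split_sit_runs_demo(const unsigned char *bitmap,
					unsigned int sit_blk_cnt,
					unsigned int base_addr,
					unsigned int sit_blocks,
					struct sit_run_demo *out)
{
	unsigned int nruns = 0, blkno = 0;

	while (blkno < sit_blk_cnt) {
		bool second_copy = sit_bit_demo(blkno, bitmap);
		unsigned int start = blkno;

		/* extend the run until the bit flips, where ra_sit_pages() breaks */
		while (blkno < sit_blk_cnt && sit_bit_demo(blkno, bitmap) == second_copy)
			blkno++;

		out[nruns].first_blk = start;
		out[nruns].nr_blks = blkno - start;
		out[nruns].disk_addr = base_addr + start + (second_copy ? sit_blocks : 0);
		nruns++;
	}
	return nruns;
}
```

For example, a bitmap byte of 0x30 over six blocks gives three runs: blocks 0-1 from the first copy, 2-3 from the second, and 4-5 from the first again.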

 Signed-off-by: Chao Yu <chao2...@samsung.com>
 ---
  fs/f2fs/segment.c |  108 
 -
  fs/f2fs/segment.h |2 +
  2 files changed, 84 insertions(+), 26 deletions(-)

 diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
 index fa284d3..656fe40 100644
 --- a/fs/f2fs/segment.c
 +++ b/fs/f2fs/segment.c
 @@ -14,6 +14,7 @@
  #include <linux/blkdev.h>
  #include <linux/prefetch.h>
  #include <linux/vmalloc.h>
 +#include <linux/swap.h>

  #include "f2fs.h"
  #include "segment.h"
 @@ -1480,41 +1481,96 @@ static int build_curseg(struct f2fs_sb_info *sbi)
 return restore_curseg_summaries(sbi);
  }

 +static int ra_sit_pages(struct f2fs_sb_info *sbi, int start,
 +   int nrpages, bool *is_order)
 +{
 +   struct address_space *mapping = sbi->meta_inode->i_mapping;
 +   struct sit_info *sit_i = SIT_I(sbi);
 +   struct page *page;
 +   block_t blk_addr;
 +   int blkno = start, readcnt = 0;
 +   int sit_blk_cnt = SIT_BLK_CNT(sbi);
 +
 +   for (; blkno < start + nrpages && blkno < sit_blk_cnt; blkno++) {
 +
 +   if ((!f2fs_test_bit(blkno, sit_i->sit_bitmap) ^ !*is_order)) {
 +   *is_order = !*is_order;
 +   break;
 +   }
 +
 +   blk_addr = sit_i->sit_base_addr + blkno;
 +   if (*is_order)
 +   blk_addr += sit_i->sit_blocks;
 +repeat:
 +   page = grab_cache_page(mapping, blk_addr);
 +   if (!page) {
 +   cond_resched();
 +   goto repeat;
 +   }
 +   if (PageUptodate(page)) {
 +   mark_page_accessed(page);
 +   f2fs_put_page(page, 1);
 +   readcnt++;
 +   continue;
 +   }
 +
 +   submit_read_page(sbi, page, blk_addr, READ_SYNC);
 +
 +   mark_page_accessed(page);
 +   f2fs_put_page(page, 0);
 +   readcnt++;
 +   }
 +
 +   f2fs_submit_read_bio(sbi, READ_SYNC);
 +   return readcnt;
 +}
 +
  static void build_sit_entries(struct f2fs_sb_info *sbi)
  {
 struct sit_info *sit_i = SIT_I(sbi);
 struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
  struct f2fs_summary_block *sum = curseg->sum_blk;
 -   unsigned int start;
 -
 -   for (start = 0; start < TOTAL_SEGS(sbi); start++) {
 -   struct seg_entry *se = &sit_i->sentries[start];
 -   struct f2fs_sit_block *sit_blk;
 -   struct f2fs_sit_entry sit;
 -   struct page *page;
 -   int i;
 +   bool is_order = f2fs_test_bit(0, sit_i->sit_bitmap) ? true : false;
 +   int sit_blk_cnt = SIT_BLK_CNT(sbi);
 +   unsigned int i, start, end;
 +   unsigned int readed, start_blk = 0;

 -   mutex_lock(&curseg->curseg_mutex);
 -   for (i = 0; i < sits_in_cursum(sum); i++) {
 -   if (le32_to_cpu(segno_in_journal(sum, i)) == start) {
 -   sit = sit_in_journal(sum, i);
 -   mutex_unlock(&curseg->curseg_mutex);
 -   goto got_it;
 +   do {

 How about using find_next_bit to get the suitable start_blk if the next blk
 is not ordered here? And it also can simplify the logic of ra_sit_pages().
 
 That's a good idea.
 But I think there may be an endianness problem between test_bit and
 f2fs_test_bit, so find_next_bit may give a wrong result. Am I right?

IMO, find_next_bit should handle the endianness issue internally; if it
doesn't, that may be a weakness.
On the other hand, why not introduce an 'f2fs_find_next_bit' if it's
really needed? :)

Regards,
Gu
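To make the endianness question concrete: assuming f2fs_test_bit() computes its mask as `1 << (7 - (nr & 0x07))`, f2fs numbers bits MSB-first within each byte, while the generic bitops number bits within an unsigned long, which on a little-endian machine means LSB-first within each byte. The two schemes therefore disagree on bit order, not only on byte order. The userspace sketch below shows the mismatch; `f2fs_test_bit_demo()` and `le_test_bit_demo()` are hypothetical models of the two schemes, and `f2fs_find_next_bit()` is the helper suggested above, written as a plain linear scan rather than a word-at-a-time search.

```c
/* Assumed to match f2fs_test_bit(): bit 0 is the MSB of byte 0. */
static int f2fs_test_bit_demo(unsigned int nr, const unsigned char *addr)
{
	return (addr[nr >> 3] >> (7 - (nr & 0x07))) & 1;
}

/* Generic bitops on a little-endian machine: bit 0 is the LSB of byte 0. */
static int le_test_bit_demo(unsigned int nr, const unsigned char *addr)
{
	return (addr[nr >> 3] >> (nr & 0x07)) & 1;
}

/* Hypothetical f2fs_find_next_bit(): index of the first set bit at or after
 * 'offset' in f2fs bit order, or 'size' if there is none. */
static unsigned int f2fs_find_next_bit(const unsigned char *addr,
				       unsigned int size, unsigned int offset)
{
	for (; offset < size; offset++)
		if (f2fs_test_bit_demo(offset, addr))
			return offset;
	return size;
}
```

With the single byte 0x01, the f2fs numbering reports bit 7 as set while the LSB-first numbering reports bit 0, which is the kind of wrong result Chao Yu is worried about.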

 
 Thanks,
 Yu

 Thanks,
 Gu

 +   readed = ra_sit_pages(sbi, start_blk, sit_blk_cnt, is_order);
 +
 +   start = start_blk * sit_i->sents_per_block;
 +   end = (start_blk + readed) * sit_i->sents_per_block;
 +
 +   for (; start < end && start < TOTAL_SEGS(sbi); start++) {
 +   struct seg_entry *se = &sit_i->sentries[start];
 +   struct f2fs_sit_block *sit_blk;
 +   struct f2fs_sit_entry sit;
 +   struct page *page;
 +
 +   mutex_lock
