Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2021-02-08 Thread Guoqing Jiang

Hi Donald,

On 2/8/21 19:41, Donald Buczek wrote:

Dear Guoqing,

On 08.02.21 15:53, Guoqing Jiang wrote:



On 2/8/21 12:38, Donald Buczek wrote:
5. maybe don't hold reconfig_mutex when trying to unregister sync_thread, like this.


 /* resync has finished, collect result */
 mddev_unlock(mddev);
 md_unregister_thread(&mddev->sync_thread);
 mddev_lock(mddev);


As above: While we wait for the sync thread to terminate, wouldn't it be a problem if another user-space operation takes the mutex?


I don't think other places can block while holding the mutex; otherwise those places could cause a potential deadlock. Please try the above two-line change. And perhaps others have a better idea.
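For context, a minimal sketch of where this two-line change would land (md_reap_sync_thread() in drivers/md/md.c, simplified; the surrounding code and locking details vary by kernel version, so treat this as an illustration only):

```
/* drivers/md/md.c (sketch, not the final patch) */
void md_reap_sync_thread(struct mddev *mddev)
{
	/* resync has finished, collect result */
	mddev_unlock(mddev);       /* drop reconfig_mutex while we wait ... */
	md_unregister_thread(&mddev->sync_thread);
	mddev_lock_nointr(mddev);  /* ... so raid5d can take it and write the
	                            * superblock; plain mddev_lock() is
	                            * interruptible and could fail here */
	/* ... rest of the reap logic unchanged ... */
}
```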


Yes, this works. No deadlock after >11000 seconds,

(Time till deadlock from previous runs/seconds: 1723, 37, 434, 1265, 
3500, 1136, 109, 1892, 1060, 664, 84, 315, 12, 820 )


Great. I will send a formal patch with your reported-by and tested-by.

Thanks,
Guoqing


Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2021-02-08 Thread Guoqing Jiang




On 2/8/21 12:38, Donald Buczek wrote:
5. maybe don't hold reconfig_mutex when trying to unregister sync_thread, like this.


 /* resync has finished, collect result */
 mddev_unlock(mddev);
 md_unregister_thread(&mddev->sync_thread);
 mddev_lock(mddev);


As above: While we wait for the sync thread to terminate, wouldn't it be a problem if another user-space operation takes the mutex?


I don't think other places can block while holding the mutex; otherwise those places could cause a potential deadlock. Please try the above two-line change. And perhaps others have a better idea.


Thanks,
Guoqing


Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2021-02-02 Thread Guoqing Jiang

Hi Donald,

On 1/26/21 17:05, Donald Buczek wrote:

Dear Guoqing,

On 26.01.21 15:06, Guoqing Jiang wrote:



On 1/26/21 13:58, Donald Buczek wrote:




Hmm, how about waking the waiter up in the while loop of raid5d?

@@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
 			md_check_recovery(mddev);
 			spin_lock_irq(&conf->device_lock);
 		}
+
+		if ((atomic_read(&conf->active_stripes)
+		     < (conf->max_nr_stripes * 3 / 4)) ||
+		    test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			wake_up(&conf->wait_for_stripe);
 	}
 	pr_debug("%d stripes handled\n", handled);


Hmm... With this patch on top of your other one, we still have the 
basic symptoms (md3_raid6 busy looping), but the sync thread is now 
hanging at


 root@sloth:~# cat /proc/$(pgrep md3_resync)/stack
 [<0>] md_do_sync.cold+0x8ec/0x97c
 [<0>] md_thread+0xab/0x160
 [<0>] kthread+0x11b/0x140
 [<0>] ret_from_fork+0x22/0x30

instead, which is 
https://elixir.bootlin.com/linux/latest/source/drivers/md/md.c#L8963


Not sure why recovery_active is not zero, because it is set to 0 before blk_start_plug, and raid5_sync_request returns 0 and skipped is also set to 1. Perhaps handle_stripe calls md_done_sync.

Could you double-check the value of recovery_active? Or just don't wait if the resync thread is interrupted.

wait_event(mddev->recovery_wait,
	   test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
	   !atomic_read(&mddev->recovery_active));


With that added, md3_resync goes into a loop, too. Not 100% busy, though.

root@sloth:~# cat /proc/$(pgrep md3_resync)/stack

[<0>] raid5_get_active_stripe+0x1e7/0x6b0  # https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/md/raid5.c#L735
[<0>] raid5_sync_request+0x2a7/0x3d0   # https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/md/raid5.c#L6274
[<0>] md_do_sync.cold+0x3ee/0x97c  # https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/md/md.c#L8883
[<0>] md_thread+0xab/0x160
[<0>] kthread+0x11b/0x140
[<0>] ret_from_fork+0x22/0x30

Sometimes the top of the stack is raid5_get_active_stripe+0x1ef/0x6b0 instead of raid5_get_active_stripe+0x1e7/0x6b0, so I guess it sleeps, it's woken, but the conditions don't match, so it sleeps again.


I don't know why the condition was not true after the change, since RECOVERY_INTR is set and the caller is raid5_sync_request.

wait_event_lock_irq(conf->wait_for_stripe,
		    (test_bit(MD_RECOVERY_INTR, &mddev->recovery) && sync_req) ||
		    /* the previous condition */,
		    *(conf->hash_locks + hash));


BTW, I think there are some possible ways:

1. let "echo idle" give up the reconfig_mutex if there is a limited number of active stripes, but I feel it is ugly to check the sh number from action_store (kind of a layer violation).


2. make raid5_sync_request -> raid5_get_active_stripe able to quit from the current situation (this is what we tried, though it doesn't work so far).


3. set MD_ALLOW_SB_UPDATE as you said, though I am not sure about the safety (but maybe I am wrong).


4. given that write IO keeps coming from the upper layer, which decreases the available stripes, maybe we need to call grow_one_stripe at the beginning of raid5_make_request for this case, then call drop_one_stripe at the end of the request (see the sketch after this list).

5. maybe don't hold reconfig_mutex when trying to unregister sync_thread, like this.

/* resync has finished, collect result */
mddev_unlock(mddev);
md_unregister_thread(&mddev->sync_thread);
mddev_lock(mddev);



My suggestion would be to try 2 + 4 together, since the reproducer triggers both sync IO and write IO. Or try 5. Perhaps there is a better way which I just can't find.
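For illustration, a rough sketch of what option 4 could look like; `__raid5_make_request` is a hypothetical name standing in for the existing body of raid5_make_request, and error handling is omitted:

```
/* drivers/md/raid5.c (illustrative sketch of option 4 only) */
static bool raid5_make_request(struct mddev *mddev, struct bio *bi)
{
	struct r5conf *conf = mddev->private;
	bool ret;

	/* best effort: add one stripe head so incoming writes are less
	 * likely to starve other stripe users; may fail under memory
	 * pressure, which is tolerable here */
	grow_one_stripe(conf, GFP_NOIO);
	ret = __raid5_make_request(mddev, bi);	/* hypothetical: original body */
	drop_one_stripe(conf);
	return ret;
}
```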


Thanks,
Guoqing


Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2021-01-26 Thread Guoqing Jiang




On 1/26/21 13:58, Donald Buczek wrote:




Hmm, how about waking the waiter up in the while loop of raid5d?

@@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
 			md_check_recovery(mddev);
 			spin_lock_irq(&conf->device_lock);
 		}
+
+		if ((atomic_read(&conf->active_stripes)
+		     < (conf->max_nr_stripes * 3 / 4)) ||
+		    test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			wake_up(&conf->wait_for_stripe);
 	}
 	pr_debug("%d stripes handled\n", handled);


Hmm... With this patch on top of your other one, we still have the basic 
symptoms (md3_raid6 busy looping), but the sync thread is now hanging at


     root@sloth:~# cat /proc/$(pgrep md3_resync)/stack
     [<0>] md_do_sync.cold+0x8ec/0x97c
     [<0>] md_thread+0xab/0x160
     [<0>] kthread+0x11b/0x140
     [<0>] ret_from_fork+0x22/0x30

instead, which is 
https://elixir.bootlin.com/linux/latest/source/drivers/md/md.c#L8963


Not sure why recovery_active is not zero, because it is set to 0 before blk_start_plug, and raid5_sync_request returns 0 and skipped is also set to 1. Perhaps handle_stripe calls md_done_sync.

Could you double-check the value of recovery_active? Or just don't wait if the resync thread is interrupted.

wait_event(mddev->recovery_wait,
	   test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
	   !atomic_read(&mddev->recovery_active));

And, unlike before, "md: md3: data-check interrupted." from the pr_info 
two lines above appears in dmesg.


Yes, that is intentional, since MD_RECOVERY_INTR is set by the "idle" write.

Anyway, I will try the script and investigate the issue more.

Thanks,
Guoqing


Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2021-01-26 Thread Guoqing Jiang

Hi Donald,

On 1/26/21 10:50, Donald Buczek wrote:
[...]



diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2d21c298ffa7..f40429843906 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4687,11 +4687,13 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
 		    mddev_lock(mddev) == 0) {
+			set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
 			flush_workqueue(md_misc_wq);
 			if (mddev->sync_thread) {
 				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 				md_reap_sync_thread(mddev);
 			}
+			clear_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
 			mddev_unlock(mddev);
 		}
 	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))


Yes, it could break the deadlock issue, but I am not sure if it is the right way, given that we only set ALLOW_SB_UPDATE in suspend, which makes sense since the IO will be quiesced, but the "idle" write action can't guarantee a similar thing.


Thinking (and documenting) MD_ALLOW_SB_UPDATE as "the holder of 
reconfig_mutex promises not to make any changes which would exclude 
superblocks from being written" might make it easier to accept the usage.


I am not sure it is safe to set the flag here, since writing "idle" can't prevent IO from the fs, while mddev_suspend can guarantee that.





I prefer to make the resync thread not wait forever here.



[...]



-    sh = raid5_get_active_stripe(conf, new_sector, previous,
+    sh = raid5_get_active_stripe(conf, new_sector, previous, 0,



Mistake here (fourth argument added instead of third)


Thanks for checking.

[...]


Unfortunately, this patch did not fix the issue.

     root@sloth:~/linux# cat /proc/$(pgrep md3_resync)/stack
     [<0>] raid5_get_active_stripe+0x1e7/0x6b0
     [<0>] raid5_sync_request+0x2a7/0x3d0
     [<0>] md_do_sync.cold+0x3ee/0x97c
     [<0>] md_thread+0xab/0x160
     [<0>] kthread+0x11b/0x140
     [<0>] ret_from_fork+0x22/0x30

which is ( according to objdump -d -Sl drivers/md/raid5.o ) at 
https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/md/raid5.c#L735


Isn't it still the case that the superblocks are not written, therefore stripes are not processed, and therefore the number of active stripes is not decreasing? Who is expected to wake up conf->wait_for_stripe waiters?


Hmm, how about waking the waiter up in the while loop of raid5d?

@@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
 			md_check_recovery(mddev);
 			spin_lock_irq(&conf->device_lock);
 		}
+
+		if ((atomic_read(&conf->active_stripes)
+		     < (conf->max_nr_stripes * 3 / 4)) ||
+		    test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			wake_up(&conf->wait_for_stripe);
 	}
 	pr_debug("%d stripes handled\n", handled);

If the issue still appears, then I will change the waiter to break out just if MD_RECOVERY_INTR is set, something like:

wait_event_lock_irq(conf->wait_for_stripe,
		    (test_bit(MD_RECOVERY_INTR, &mddev->recovery) && sync_req) ||
		    /* the previous condition */,
		    *(conf->hash_locks + hash));

Thanks,
Guoqing


Re: [PATCH v2 08/10] md: Implement ->corrupted_range()

2021-01-26 Thread Guoqing Jiang




On 1/25/21 23:55, Shiyang Ruan wrote:

With the support of ->rmap(), it is possible to obtain the superblock on
a mapped device.

If a pmem device is used as one target of a mapped device, we cannot obtain its superblock directly. With the help of sysfs, the mapped device can be found on the target devices. So, we iterate the bdev->bd_holder_disks to obtain its mapped device.

Signed-off-by: Shiyang Ruan 
---
  drivers/md/dm.c   | 61 +++
  drivers/nvdimm/pmem.c | 11 +++-
  fs/block_dev.c| 42 -
  include/linux/genhd.h |  2 ++
  4 files changed, 107 insertions(+), 9 deletions(-)


I can't see that md raid is involved here; perhaps dm-devel needs to be cc'ed instead of the raid list. And the subject needs to be changed as well.


Thanks,
Guoqing


Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2021-01-25 Thread Guoqing Jiang

Hi Donald,

On 1/25/21 22:32, Donald Buczek wrote:



On 25.01.21 09:54, Donald Buczek wrote:

Dear Guoqing,

a colleague of mine was able to reproduce the issue inside a vm and was able to find a procedure to run the vm into the issue within minutes (not unreliably after hours on a physical system as before). This of course helped to pinpoint the problem.


My current theory of what is happening is:

- MD_SB_CHANGE_CLEAN + MD_SB_CHANGE_PENDING are set by 
md_write_start() when file-system I/O wants to do a write and the 
array transitions from "clean" to "active". 
(https://elixir.bootlin.com/linux/v5.4.57/source/drivers/md/md.c#L8308)


- Before raid5d gets to write the superblock (it's busy processing active stripes because of the sync activity), userspace wants to pause the check by `echo idle > /sys/block/mdX/md/sync_action`


- action_store() takes the reconfig_mutex before trying to stop the sync thread. (https://elixir.bootlin.com/linux/v5.4.57/source/drivers/md/md.c#L4689) The dump of struct mddev in the email of 1/19/21 confirms reconfig_mutex is non-zero.


- raid5d is running in its main loop. raid5d()->handle_active_stripes() returns a positive batch size (https://elixir.bootlin.com/linux/v5.4.57/source/drivers/md/raid5.c#L6329) although raid5d()->handle_active_stripes()->handle_stripe() doesn't process any stripe because of MD_SB_CHANGE_PENDING (https://elixir.bootlin.com/linux/v5.4.57/source/drivers/md/raid5.c#L4729). This is the reason raid5d is busy looping.


- raid5d()->md_check_recovery() is called by the raid5d main loop. One of its duties is to write the superblock if a change is pending. However, to do so it needs either MD_ALLOW_SB_UPDATE or must be able to take the reconfig_mutex. (https://elixir.bootlin.com/linux/v5.4.57/source/drivers/md/md.c#L8967, https://elixir.bootlin.com/linux/v5.4.57/source/drivers/md/md.c#L9006) Neither is true, so the superblock is not written and MD_SB_CHANGE_PENDING is not cleared.


- (as discussed previously) the sync thread is waiting for the number 
of active stripes to go down and doesn't terminate. The userspace 
thread is waiting for the sync thread to terminate.


Does this make sense?


Yes, exactly! That was my thought too, the scenario is twisted.

Then the resync thread is blocked because there are too many active stripes; raid5d is in a busy loop since SB_CHANGE_PENDING is set, which means active stripes can't be decreased; and the echo idle cmd can't make progress given that the resync thread is blocked, while the cmd still holds reconfig_mutex, which keeps raid5d in the busy loop and unable to clear the SB_CHANGE_PENDING flag.

And raid5 could suffer from the same issue I think.



Just for reference, I add the procedure which triggers the issue on 
the vm (with /dev/md3 mounted on /mnt/raid_md3) and some debug output:


```
#! /bin/bash

(
 while true; do
 echo "start check"
 echo check > /sys/block/md3/md/sync_action
 sleep 10
 echo "stop check"
 echo idle > /sys/block/md3/md/sync_action
 sleep 2
 done
) &

(
 while true; do
 dd bs=1k count=$((5*1024*1024)) if=/dev/zero of=/mnt/raid_md3/bigfile status=none

 sync /mnt/raid_md3/bigfile
 rm /mnt/raid_md3/bigfile
 sleep .1
 done
) &

start="$(date +%s)"
cd /sys/block/md3/md
wp_count=0
while true; do
 array_state=$(cat array_state)
 if [ "$array_state" = write-pending ]; then
 wp_count=$(($wp_count+1))
 else
 wp_count=0
 fi
 echo $(($(date +%s)-$start)) $(cat sync_action) $(cat sync_completed) $array_state $(cat stripe_cache_active)

 if [ $wp_count -ge 3 ]; then
 kill -- -$$
 exit
 fi
 sleep 1
done
```

The time this needs to trigger the issue varies from under a minute to one hour, with 5 minutes being typical. The output ends like this:


 309 check 6283872 / 8378368 active-idle 4144
 310 check 6283872 / 8378368 active 1702
 311 check 6807528 / 8378368 active 4152
 312 check 7331184 / 8378368 clean 3021
 stop check
 313 check 7331184 / 8378368 write-pending 3905
 314 check 7331184 / 8378368 write-pending 3905
 315 check 7331184 / 8378368 write-pending 3905
 Terminated

If I add

 pr_debug("XXX batch_size %d release %d mdddev->sb_flags %lx\n", 
batch_size, released, mddev->sb_flags);


in raid5d after the call to handle_active_stripes and enable the debug 
location after the deadlock occurred, I get


 [ 3123.939143] [1223] raid5d:6332: XXX batch_size 8 release 0 mdddev->sb_flags 6
 [ 3123.939156] [1223] raid5d:6332: XXX batch_size 8 release 0 mdddev->sb_flags 6
 [ 3123.939170] [1223] raid5d:6332: XXX batch_size 8 release 0 mdddev->sb_flags 6
 [ 3123.939184] [1223] raid5d:6332: XXX batch_size 8 release 0 

Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2021-01-20 Thread Guoqing Jiang

Hi Donald,

On 1/19/21 12:30, Donald Buczek wrote:

Dear md-raid people,

I've reported a problem in this thread in December:

"We are using raid6 on several servers. Occasionally we had failures, 
where a mdX_raid6 process seems to go into a busy loop and all I/O to 
the md device blocks. We've seen this on various kernel versions." It 
was clear, that this is related to "mdcheck" running, because we found, 
that the command, which should stop the scrubbing in the morning (`echo 
idle > /sys/devices/virtual/block/md1/md/sync_action`) is also blocked.


On 12/21/20, I reported that the problem might be caused by a failure of the underlying block device driver, because I had found "inflight" counters of the block devices not being zero. However, this is not the case. We were able to run into the mdX_raid6 looping condition a few times again, but the non-zero inflight counters have not been observed again.


I was able to collect a lot of additional information from a blocked 
system.


- The `echo idle > /sys/devices/virtual/block/md1/md/sync_action` command is waiting at kthread_stop to stop the sync thread. [https://elixir.bootlin.com/linux/latest/source/drivers/md/md.c#L7989]


- The sync thread ("md1_resync") does not finish, because its blocked at

[<0>] raid5_get_active_stripe+0x4c4/0x660 # [1]
[<0>] raid5_sync_request+0x364/0x390
[<0>] md_do_sync+0xb41/0x1030
[<0>] md_thread+0x122/0x160
[<0>] kthread+0x118/0x130
[<0>] ret_from_fork+0x22/0x30

[1] https://elixir.bootlin.com/linux/latest/source/drivers/md/raid5.c#L735

- yes, gdb confirms that `conf->cache_state` is 0x03 ( 
R5_INACTIVE_BLOCKED + R5_ALLOC_MORE )


The resync thread is blocked since it can't get an sh from the inactive list, so R5_ALLOC_MORE and R5_INACTIVE_BLOCKED are set; that is why `echo idle > /sys/devices/virtual/block/md1/md/sync_action` can't stop the sync thread.
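(For reference, the wait in question, condensed from raid5_get_active_stripe() in drivers/md/raid5.c; quoted from memory, so details may differ slightly:)

```
/* condensed from raid5_get_active_stripe(), drivers/md/raid5.c */
set_bit(R5_INACTIVE_BLOCKED, &conf->cache_state);
wait_event_lock_irq(conf->wait_for_stripe,
		    !list_empty(conf->inactive_list + hash) &&
		    (atomic_read(&conf->active_stripes)
		     < (conf->max_nr_stripes * 3 / 4) ||
		     !test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)),
		    *(conf->hash_locks + hash));
clear_bit(R5_INACTIVE_BLOCKED, &conf->cache_state);
```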




- We have lots of active stripes:

root@deadbird:~/linux_problems/mdX_raid6_looping# cat /sys/block/md1/md/stripe_cache_active

27534


There are too many active stripes, so this is false:

atomic_read(&conf->active_stripes) < (conf->max_nr_stripes * 3 / 4)

so raid5_get_active_stripe has to wait till it becomes true; either increase max_nr_stripes or decrease active_stripes.


1. Increase max_nr_stripes
Since the "mdX_raid6 process seems to go into a busy loop" and R5_ALLOC_MORE is set, if there is enough memory, I suppose mdX_raid6 (raid5d) could allocate a new stripe in grow_one_stripe and increase max_nr_stripes. So please check the memory usage of your system.

Another thing is you can try to increase the number of sh manually by writing a new number to stripe_cache_size if there is enough memory, for example:
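(Sketch only: the right value depends on how much memory the box can spare, since each stripe head costs memory proportional to the number of member devices.)

```
# read the current size, then grow the stripe cache on the affected array
cat /sys/block/md1/md/stripe_cache_size
echo 16384 > /sys/block/md1/md/stripe_cache_size
```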


2. Or decrease active_stripes
This is supposed to be done by do_release_stripe, but STRIPE_HANDLE is set


- handle_stripe() doesn't make progress:

echo "func handle_stripe =pflt" > /sys/kernel/debug/dynamic_debug/control

In dmesg, we see the debug output from 
https://elixir.bootlin.com/linux/latest/source/drivers/md/raid5.c#L4925 
but never from 
https://elixir.bootlin.com/linux/latest/source/drivers/md/raid5.c#L4958:


[171908.896651] [1388] handle_stripe:4929: handling stripe 4947089856, state=0x2029 cnt=1, pd_idx=9, qd_idx=10, check:4, reconstruct:0
[171908.896657] [1388] handle_stripe:4929: handling stripe 4947089872, state=0x2029 cnt=1, pd_idx=9, qd_idx=10, check:4, reconstruct:0
[171908.896669] [1388] handle_stripe:4929: handling stripe 4947089912, state=0x2029 cnt=1, pd_idx=9, qd_idx=10, check:4, reconstruct:0

The sector numbers repeat after some time. We have only the following 
variants of stripe state and "check":


state=0x2031 cnt=1, check:0 # ACTIVE    +INSYNC+REPLACED+IO_STARTED, 
check_state_idle
state=0x2029 cnt=1, check:4 # ACTIVE+SYNCING   +REPLACED+IO_STARTED, 
check_state_check_result
state=0x2009 cnt=1, check:0 # ACTIVE+SYNCING    +IO_STARTED, 
check_state_idle


- We have MD_SB_CHANGE_PENDING set:


because the MD_SB_CHANGE_PENDING flag is set. So do_release_stripe can't call atomic_dec(&conf->active_stripes).


Normally, SB_CHANGE_PENDING is cleared from md_update_sb, which could be called by md_reap_sync_thread. But md_reap_sync_thread is stuck unregistering the sync_thread (which was blocked in raid5_get_active_stripe).



Still, I don't understand well why mdX_raid6 is in a busy loop; maybe raid5d can't break from the while(1) loop because "if (!batch_size && !released)" is false. I would assume released is '0' since

>  released_stripes = {
>  first = 0x0
>  }

And __get_priority_stripe fetches an sh from conf->handle_list and deletes it from handle_list, handle_stripe marks the state of the sh with STRIPE_HANDLE, then do_release_stripe puts the sh back on handle_list.

So batch_size can't be '0', and handle_active_stripes repeats the process in the busy loop. This is my best guess to explain the busy loop situation.
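(For reference, the loop being described, condensed from raid5d() in drivers/md/raid5.c; non-essential steps elided:)

```
/* condensed from raid5d(), drivers/md/raid5.c */
while (1) {
	int batch_size, released;

	released = release_stripe_list(conf, conf->temp_inactive_list);
	/* ... bio handling, md_check_recovery() ... */
	batch_size = handle_active_stripes(conf, ANY_GROUP, NULL,
					   conf->temp_inactive_list);
	if (!batch_size && !released)
		break;	/* never taken in the deadlock: each sh cycles
			 * handle_list -> handle_stripe() ->
			 * do_release_stripe() -> handle_list again */
	handled += batch_size;
}
```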





Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2020-12-02 Thread Guoqing Jiang

Hi Donald,

On 12/2/20 18:28, Donald Buczek wrote:

Dear Guoqing,

unfortunately the patch didn't fix the problem (unless I messed it up 
with my logging). This is what I used:


     --- a/drivers/md/md.c
     +++ b/drivers/md/md.c
     @@ -9305,6 +9305,14 @@ void md_check_recovery(struct mddev *mddev)
     			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
     			goto unlock;
     		}


I think you can add the check of RECOVERY_CHECK in the above part instead of adding a new part.



     +	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
     +	    (!test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
     +	     test_bit(MD_RECOVERY_CHECK, &mddev->recovery))) {
     +		/* resync/recovery still happening */
     +		pr_info("md: XXX BUGFIX applied\n");
     +		clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
     +		goto unlock;
     +	}
     	if (mddev->sync_thread) {
     		md_reap_sync_thread(mddev);
     		goto unlock;





With pausing and continuing the check four times an hour, I could trigger the problem after about 48 hours. This time, the other device (md0) has locked up on `echo idle > /sys/devices/virtual/block/md0/md/sync_action`, while the check of md1 is still ongoing:


Without the patch, md0 was good while md1 was locked. So the patch 
switches the status of the two arrays, a little weird ...


What is the stack of the process? I guess it is the same as the stack of 2 in your previous mail, but just to confirm.




     Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
     md1 : active raid6 sdk[0] sdj[15] sdi[14] sdh[13] sdg[12] sdf[11] sde[10] sdd[9] sdc[8] sdr[7] sdq[6] sdp[5] sdo[4] sdn[3] sdm[2] sdl[1]
           109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
           [=>...]  check =  8.5% (666852112/7813894144) finish=1271.2min speed=93701K/sec
           bitmap: 0/59 pages [0KB], 65536KB chunk
     md0 : active raid6 sds[0] sdah[15] sdag[16] sdaf[13] sdae[12] sdad[11] sdac[10] sdab[9] sdaa[8] sdz[7] sdy[6] sdx[17] sdw[4] sdv[3] sdu[2] sdt[1]
           109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
           [>]  check =  0.2% (19510348/7813894144) finish=253779.6min speed=511K/sec
           bitmap: 0/59 pages [0KB], 65536KB chunk

after 1 minute:

     Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
     md1 : active raid6 sdk[0] sdj[15] sdi[14] sdh[13] sdg[12] sdf[11] sde[10] sdd[9] sdc[8] sdr[7] sdq[6] sdp[5] sdo[4] sdn[3] sdm[2] sdl[1]
           109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
           [=>...]  check =  8.6% (674914560/7813894144) finish=941.1min speed=126418K/sec
           bitmap: 0/59 pages [0KB], 65536KB chunk
     md0 : active raid6 sds[0] sdah[15] sdag[16] sdaf[13] sdae[12] sdad[11] sdac[10] sdab[9] sdaa[8] sdz[7] sdy[6] sdx[17] sdw[4] sdv[3] sdu[2] sdt[1]
           109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
           [>]  check =  0.2% (19510348/7813894144) finish=256805.0min speed=505K/sec
           bitmap: 0/59 pages [0KB], 65536KB chunk

A data point I didn't mention in my previous mail is that the mdX_resync thread is not gone when the problem occurs:

     buczek@done:/scratch/local/linux (v5.10-rc6-mpi)$ ps -Af|fgrep [md
     root   134 2  0 Nov30 ?    00:00:00 [md]
     root  1321 2 27 Nov30 ?    12:57:14 [md0_raid6]
     root  1454 2 26 Nov30 ?    12:37:23 [md1_raid6]
     root  5845 2  0 16:20 ?    00:00:30 [md0_resync]
     root  5855 2 13 16:20 ?    00:14:11 [md1_resync]
     buczek    9880  9072  0 18:05 pts/0    00:00:00 grep -F [md
     buczek@done:/scratch/local/linux (v5.10-rc6-mpi)$ sudo cat /proc/5845/stack

     [<0>] md_bitmap_cond_end_sync+0x12d/0x170
     [<0>] raid5_sync_request+0x24b/0x390
     [<0>] md_do_sync+0xb41/0x1030
     [<0>] md_thread+0x122/0x160
     [<0>] kthread+0x118/0x130
     [<0>] ret_from_fork+0x1f/0x30

I guess md_bitmap_cond_end_sync+0x12d is the `wait_event(bitmap->mddev->recovery_wait, atomic_read(&bitmap->mddev->recovery_active) == 0);` in md-bitmap.c.
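(For reference, that wait, condensed from md_bitmap_cond_end_sync() in drivers/md/md-bitmap.c; quoted from memory:)

```
void md_bitmap_cond_end_sync(struct bitmap *bitmap, sector_t sector, bool force)
{
	/* ... */
	wait_event(bitmap->mddev->recovery_wait,
		   atomic_read(&bitmap->mddev->recovery_active) == 0);
	/* ... */
}
```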




Could be. If so, then I think md_done_sync was not triggered by the path md0_raid6 -> ... -> handle_stripe.


I'd suggest comparing the stacks of md0 and md1 to find the difference.


Thanks,
Guoqing


Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

2020-11-29 Thread Guoqing Jiang




On 11/28/20 13:25, Donald Buczek wrote:

Dear Linux mdraid people,

we are using raid6 on several servers. Occasionally we had failures, 
where a mdX_raid6 process seems to go into a busy loop and all I/O to 
the md device blocks. We've seen this on various kernel versions.


The last time this happened (in this case with Linux 5.10.0-rc4), I took 
some data.


The triggering event seems to be the md_check cron job trying to pause 
the ongoing check operation in the morning with


     echo idle > /sys/devices/virtual/block/md1/md/sync_action

This doesn't complete. Here's /proc/<pid>/stack of this process:

     root@done:~/linux_problems/mdX_raid6_looping/2020-11-27# ps -fp 2
     UID    PID  PPID  C STIME TTY  TIME CMD
     root     2 23331  0 02:00 ?    00:00:00 /bin/bash /usr/bin/mdcheck --continue --duration 06:00
     root@done:~/linux_problems/mdX_raid6_looping/2020-11-27# cat /proc/2/stack

     [<0>] kthread_stop+0x6e/0x150
     [<0>] md_unregister_thread+0x3e/0x70
     [<0>] md_reap_sync_thread+0x1f/0x1e0
     [<0>] action_store+0x141/0x2b0
     [<0>] md_attr_store+0x71/0xb0
     [<0>] kernfs_fop_write+0x113/0x1a0
     [<0>] vfs_write+0xbc/0x250
     [<0>] ksys_write+0xa1/0xe0
     [<0>] do_syscall_64+0x33/0x40
     [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Note, that md0 has been paused successfully just before.


What is the personality of md0? Is it also raid6?



     2020-11-27T02:00:01+01:00 done CROND[2]: (root) CMD (/usr/bin/mdcheck --continue --duration "06:00")
     2020-11-27T02:00:01+01:00 done root: mdcheck continue checking /dev/md0 from 10623180920
     2020-11-27T02:00:01.382994+01:00 done kernel: [378596.606381] md: data-check of RAID array md0
     2020-11-27T02:00:01+01:00 done root: mdcheck continue checking /dev/md1 from 11582849320
     2020-11-27T02:00:01.437999+01:00 done kernel: [378596.661559] md: data-check of RAID array md1
     2020-11-27T06:00:01.842003+01:00 done kernel: [392996.625147] md: md0: data-check interrupted.
     2020-11-27T06:00:02+01:00 done root: pause checking /dev/md0 at 13351127680
     2020-11-27T06:00:02.338989+01:00 done kernel: [392997.122520] md: md1: data-check interrupted.

     [ nothing related following ]

After that, we see md1_raid6 in a busy loop:

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    2376 root      20   0       0      0      0 R 100.0  0.0   1387:38 md1_raid6


It seems the reap sync thread was trying to stop md1_raid6 while md1_raid6 was triggered again and again.




Also, all processes doing I/O to the md device block.

This is /proc/mdstat:

     Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
     md1 : active raid6 sdk[0] sdj[15] sdi[14] sdh[13] sdg[12] sdf[11] sde[10] sdd[9] sdc[8] sdr[7] sdq[6] sdp[5] sdo[4] sdn[3] sdm[2] sdl[1]
           109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
           [==>..]  check = 94.0% (7350290348/7813894144) finish=57189.3min speed=135K/sec
           bitmap: 0/59 pages [0KB], 65536KB chunk
     md0 : active raid6 sds[0] sdah[15] sdag[16] sdaf[13] sdae[12] sdad[11] sdac[10] sdab[9] sdaa[8] sdz[7] sdy[6] sdx[17] sdw[4] sdv[3] sdu[2] sdt[1]
           109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
           bitmap: 0/59 pages [0KB], 65536KB chunk



So the RECOVERY_CHECK flag should be set; not sure if the simple change below helps, but you may give it a try.

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 98bac4f..e2697d0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9300,7 +9300,8 @@ void md_check_recovery(struct mddev *mddev)
 			md_update_sb(mddev, 0);
 
 		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
-		    !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
+		    (!test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
+		     test_bit(MD_RECOVERY_CHECK, &mddev->recovery))) {
 			/* resync/recovery still happening */
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			goto unlock;

Thanks,
Guoqing


Re: decruft the early init / initrd / initramfs code v2

2020-07-16 Thread Guoqing Jiang

On 7/15/20 8:51 AM, Christoph Hellwig wrote:

On Tue, Jul 14, 2020 at 12:34:45PM -0700, Linus Torvalds wrote:

On Tue, Jul 14, 2020 at 12:06 PM Christoph Hellwig  wrote:

this series starts to move the early init code away from requiring
KERNEL_DS to be implicitly set during early startup.  It does so by
first removing legacy unused cruft, and the switches away the code
from struct file based APIs to our more usual in-kernel APIs.

Looks good to me, with the added note on the utimes cruft too as a
further cleanup (separate patch).

So you can add my acked-by.

I _would_ like the md parts to get a few more acks. I see the one from
Song Liu, anybody else in md land willing to go through those patches?
They were the bulk of it, and the least obvious to me because I don't
know that code at all?

Song is the maintainer.   Neil is the only person I could think of
that also knows the old md code pretty well.  Guoqing has contributed
a lot lately, but the code touched here is rather historic (and not
used very much at all these days as people use modular md and initramfs-based detection).


Hi Christoph,

I just cloned the tree; it seems there is a compile issue that you need to resolve.


hch-misc$ make -j8
  DESCEND  objtool
  CALL    scripts/atomic/check-atomics.sh
  CALL    scripts/checksyscalls.sh
  CHK include/generated/compile.h
  CC  drivers/md/md.o
  CC  drivers/md/md-bitmap.o
  CC  drivers/md/md-autodetect.o
  AR  drivers/perf/built-in.a
  CC  drivers/md/dm.o
  AR  drivers/hwtracing/intel_th/built-in.a
  CC  drivers/nvmem/core.o
drivers/md/md.c:7809:45: error: static declaration of ‘md_fops’ follows non-static declaration

 static const struct block_device_operations md_fops =
 ^~~
drivers/md/md.c:329:38: note: previous declaration of ‘md_fops’ was here
 const struct block_device_operations md_fops;
  ^~~
scripts/Makefile.build:280: recipe for target 'drivers/md/md.o' failed
make[2]: *** [drivers/md/md.o] Error 1
make[2]: *** Waiting for unfinished jobs
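(A guess at the fix, untested: the forward declaration near md.c:329 lacks `static`, so it needs to match the later static definition, e.g.:)

```
/* drivers/md/md.c, around line 329: make the forward declaration match */
static const struct block_device_operations md_fops;
```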

And for the changes of md, feel free to add my Acked-by if it could help.

Thanks,
Guoqing


Re: [PATCH 1/6] md: switch to ->check_events for media change notifications

2020-07-08 Thread Guoqing Jiang

On 7/8/20 3:23 PM, Matthew Wilcox wrote:

On Wed, Jul 08, 2020 at 03:17:31PM +0200, Guoqing Jiang wrote:

On 7/8/20 2:25 PM, Christoph Hellwig wrote:

-static int md_media_changed(struct gendisk *disk)
-{
-   struct mddev *mddev = disk->private_data;
-
-   return mddev->changed;
-}

Maybe we can remove "changed" from struct mddev since no one reads it
after the change.

You missed this hunk:

+static unsigned int md_check_events(struct gendisk *disk, unsigned int clearing)
  {
 struct mddev *mddev = disk->private_data;
+   unsigned int ret = 0;

+   if (mddev->changed)
+   ret = DISK_EVENT_MEDIA_CHANGE;
 mddev->changed = 0;
-   return 0;
+   return ret;
  }



Yes, sorry for the noise.

Thanks,
Guoqing



Re: [PATCH 1/6] md: switch to ->check_events for media change notifications

2020-07-08 Thread Guoqing Jiang

On 7/8/20 2:25 PM, Christoph Hellwig wrote:

-static int md_media_changed(struct gendisk *disk)
-{
-   struct mddev *mddev = disk->private_data;
-
-   return mddev->changed;
-}


Maybe we can remove "changed" from struct mddev since no one reads it
after the change.

Thanks,
Guoqing


Re: [PATCH 08/10] orangefs: use attach/detach_page_private

2020-05-28 Thread Guoqing Jiang

On 5/26/20 11:54 PM, Mike Marshall wrote:

I apologize for not mentioning that I ran this patch set
through orangefs xfstests at 5.7 rc5 with no problems
or regressions.


Glad to hear that, thanks for your effort.

Thanks,
Guoqing


[PATCH] mm_types.h: change set_page_private to inline function

2020-05-25 Thread Guoqing Jiang
Change it to an inline function to make callers use the proper argument type. And there is no need for it to be a macro, per Andrew's comment [1].

[1] 
https://lore.kernel.org/lkml/20200518221235.1fa32c38e5766113f78e3...@linux-foundation.org/

Signed-off-by: Guoqing Jiang 
---
 include/linux/mm_types.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e88dc5c65c01..64ede5f150dc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -240,7 +240,11 @@ static inline atomic_t *compound_pincount_ptr(struct page *page)
 #define PAGE_FRAG_CACHE_MAX_ORDER  get_order(PAGE_FRAG_CACHE_MAX_SIZE)
 
 #define page_private(page) ((page)->private)
-#define set_page_private(page, v)  ((page)->private = (v))
+
+static inline void set_page_private(struct page *page, unsigned long private)
+{
+   page->private = private;
+}
 
 struct page_frag_cache {
void * va;
-- 
2.17.1



Re: [PATCH 10/10] mm/migrate.c: call detach_page_private to cleanup code

2020-05-24 Thread Guoqing Jiang

On 5/23/20 1:53 AM, Andrew Morton wrote:

On Fri, 22 May 2020 09:18:25 +0200 Guoqing Jiang 
 wrote:


-   ClearPagePrivate(page);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, detach_page_private(page));

attach_page_private(newpage, detach_page_private(page));

Matthew had suggested it as follows, but I am not sure if we can reorder the setup of the bh and the setting of PagePrivate, so I didn't want to break the original logic.

@@ -797,11 +797,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
  if (rc != MIGRATEPAGE_SUCCESS)
  goto unlock_buffers;
   
-   ClearPagePrivate(page);

-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
-   get_page(newpage);
+   attach_page_private(newpage, detach_page_private(page));
   
  bh = head;

  do {
@@ -810,8 +806,6 @@ static int __buffer_migrate_page(struct address_space *mapping,
   
  } while (bh != head);
   
-   SetPagePrivate(newpage);

-
  if (mode != MIGRATE_SYNC_NO_COPY)

This is OK - coherency between PG_private and the page's buffer
ring is maintained by holding lock_page().


I appreciate your explanation.

Thanks,
Guoqing


I have (effectively) applied the above change.


Re: [PATCH 10/10] mm/migrate.c: call detach_page_private to cleanup code

2020-05-22 Thread Guoqing Jiang

Hi Dave,

On 5/22/20 12:52 AM, Dave Chinner wrote:

On Sun, May 17, 2020 at 11:47:18PM +0200, Guoqing Jiang wrote:

We can clean up the code a little by calling detach_page_private here.

Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

  mm/migrate.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 5fed0305d2ec..f99502bc113c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -804,10 +804,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
if (rc != MIGRATEPAGE_SUCCESS)
goto unlock_buffers;
  
-	ClearPagePrivate(page);

-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, detach_page_private(page));

attach_page_private(newpage, detach_page_private(page));


Matthew had suggested it as follows, but I am not sure if we can reorder the setup of the bh and the setting of PagePrivate, so I didn't want to break the original logic.


@@ -797,11 +797,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
if (rc != MIGRATEPAGE_SUCCESS)
goto unlock_buffers;
 
-   ClearPagePrivate(page);

-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
-   get_page(newpage);
+   attach_page_private(newpage, detach_page_private(page));
 
bh = head;

do {
@@ -810,8 +806,6 @@ static int __buffer_migrate_page(struct address_space *mapping,
 
} while (bh != head);
 
-   SetPagePrivate(newpage);

-
if (mode != MIGRATE_SYNC_NO_COPY)



[1]. 
https://lore.kernel.org/lkml/20200502004158.gd29...@bombadil.infradead.org/
[2]. 
https://lore.kernel.org/lkml/e4d5ddc0-877f-6499-f697-2b7c0ddbf...@cloud.ionos.com/


Thanks,
Guoqing



[UPDATE PATCH 10/10] mm/migrate.c: call detach_page_private to cleanup code

2020-05-19 Thread Guoqing Jiang
We can clean up the code a little by calling detach_page_private here.

Signed-off-by: Guoqing Jiang 
---
Add the cast to fix the type mismatch warning; sorry for the mistake.

 mm/migrate.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 7160c1556f79..44546d407e40 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -797,10 +797,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
if (rc != MIGRATEPAGE_SUCCESS)
goto unlock_buffers;
 
-   ClearPagePrivate(page);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, (unsigned long)detach_page_private(page));
get_page(newpage);
 
bh = head;
-- 
2.17.1



Re: [PATCH 10/10] mm/migrate.c: call detach_page_private to cleanup code

2020-05-19 Thread Guoqing Jiang

Hi Andrew,

On 5/19/20 9:35 AM, Guoqing Jiang wrote:

On 5/19/20 7:12 AM, Andrew Morton wrote:
On Sun, 17 May 2020 23:47:18 +0200 Guoqing Jiang 
 wrote:



We can clean up the code a little by calling detach_page_private here.

...

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -804,10 +804,7 @@ static int __buffer_migrate_page(struct address_space *mapping,

  if (rc != MIGRATEPAGE_SUCCESS)
  goto unlock_buffers;
  -    ClearPagePrivate(page);
-    set_page_private(newpage, page_private(page));
-    set_page_private(page, 0);
-    put_page(page);
+    set_page_private(newpage, detach_page_private(page));
  get_page(newpage);
    bh = head;

mm/migrate.c: In function '__buffer_migrate_page':
./include/linux/mm_types.h:243:52: warning: assignment makes integer from pointer without a cast [-Wint-conversion]

  #define set_page_private(page, v) ((page)->private = (v))
 ^
mm/migrate.c:800:2: note: in expansion of macro 'set_page_private'
   set_page_private(newpage, detach_page_private(page));
   ^~~~

The fact that set_page_private(detach_page_private()) generates a type
mismatch warning seems deeply wrong, surely.

Please let's get the types sorted out - either unsigned long or void *,
not half-one and half-the other.  Whatever needs the least typecasting
at callsites, I suggest.


Sorry about that, I should have noticed the warning before. I will double-check whether other places need the typecast or not, then send a new version.


Only this patch missed the typecast. I guess I just need to send an updated patch to replace this one (I am fine with sending a new patch set if you want); sorry again for the trouble.




And can we please implement set_page_private() and page_private() with
inlined C code?  There is no need for these to be macros.


Just did a quick change.

-#define page_private(page) ((page)->private)
-#define set_page_private(page, v)  ((page)->private = (v))
+static inline unsigned long page_private(struct page *page)
+{
+   return page->private;
+}
+
+static inline void set_page_private(struct page *page, unsigned long priv_data)
+{
+   page->private = priv_data;
+}

Then I get an error like:

fs/erofs/zdata.h: In function ‘z_erofs_onlinepage_index’:
fs/erofs/zdata.h:126:8: error: lvalue required as unary ‘&’ operand
  u.v = &page_private(page);
        ^

I guess it is better to keep page_private as a macro; please correct me in case I missed something.


Lots of problems would need to be fixed if page_private were changed to an inline function, so I think it is better to keep it and only convert set_page_private.

mm/compaction.c: In function ‘isolate_migratepages_block’:
./include/linux/compiler.h:287:20: error: lvalue required as unary ‘&’ operand
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                    ^
./include/linux/compiler.h:293:22: note: in expansion of macro ‘__READ_ONCE’
 #define READ_ONCE(x) __READ_ONCE(x, 1)
                      ^~~
mm/internal.h:293:34: note: in expansion of macro ‘READ_ONCE’
 #define page_order_unsafe(page)  READ_ONCE(page_private(page))
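(The underlying reason, for what it's worth: READ_ONCE() and friends take the address of their argument, so page_private(page) must remain an lvalue, which a function's return value is not. A minimal illustration, kernel types assumed:)

```
#define page_private(page)	((page)->private)	/* macro: expands to an lvalue */

static inline unsigned long page_private_fn(struct page *page)
{
	return page->private;			/* function: returns an rvalue */
}

/* &page_private(page)    -> fine: address of page->private                 */
/* &page_private_fn(page) -> error: "lvalue required as unary '&' operand"  */
```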


Thanks,
Guoqing


Re: [PATCH 10/10] mm/migrate.c: call detach_page_private to cleanup code

2020-05-19 Thread Guoqing Jiang

On 5/19/20 12:06 PM, Gao Xiang wrote:

On Tue, May 19, 2020 at 09:35:59AM +0200, Guoqing Jiang wrote:

On 5/19/20 7:12 AM, Andrew Morton wrote:

On Sun, 17 May 2020 23:47:18 +0200 Guoqing Jiang 
 wrote:


We can clean up the code a little by calling detach_page_private here.

...

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -804,10 +804,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
if (rc != MIGRATEPAGE_SUCCESS)
goto unlock_buffers;
-   ClearPagePrivate(page);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, detach_page_private(page));
get_page(newpage);
bh = head;

mm/migrate.c: In function '__buffer_migrate_page':
./include/linux/mm_types.h:243:52: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
   #define set_page_private(page, v) ((page)->private = (v))
  ^
mm/migrate.c:800:2: note: in expansion of macro 'set_page_private'
set_page_private(newpage, detach_page_private(page));
^~~~

The fact that set_page_private(detach_page_private()) generates a type
mismatch warning seems deeply wrong, surely.

Please let's get the types sorted out - either unsigned long or void *,
not half-one and half-the other.  Whatever needs the least typecasting
at callsites, I suggest.

Sorry about that, I should have noticed the warning before. I will double-check whether other places need the typecast or not, then send a new version.


And can we please implement set_page_private() and page_private() with
inlined C code?  There is no need for these to be macros.

Just did a quick change.

-#define page_private(page)             ((page)->private)
-#define set_page_private(page, v)      ((page)->private = (v))
+static inline unsigned long page_private(struct page *page)
+{
+       return page->private;
+}
+
+static inline void set_page_private(struct page *page, unsigned long priv_data)
+{
+       page->private = priv_data;
+}

Then I get an error like:

fs/erofs/zdata.h: In function ‘z_erofs_onlinepage_index’:
fs/erofs/zdata.h:126:8: error: lvalue required as unary ‘&’ operand
  u.v = &page_private(page);
        ^

I guess it is better to keep page_private as a macro; please correct me in case I missed something.

I guess that you could Cc me in the reply.


Sorry for not doing that ...


In that case, EROFS uses page->private as an atomic integer to trace 2 partial subpages in one page.

I think that you could also use &page->private directly to replace &page_private(page) here, since I didn't find a hint for picking &page_private(page) over &page->private.


Thanks for the input; I just did a quick test, so I need to investigate more. And I think it is better to have a separate thread to change those macros to inline functions, then fix the related issues due to the change.


In addition, I found some limitations of the new {attach,detach}_page_private helpers (that is why I was interested in this series at that time [1] [2], but I finally gave up) since many patterns (not all) in EROFS are

io_submit (origin, page locked):
attach_page_private(page);
...
put_page(page);

end_io (page locked):
SetPageUptodate(page);
unlock_page(page);

since the page is always locked, io_submit could be simplified as
set_page_private(page, ...);
SetPagePrivate(page);
, which can save both one temporary get_page(page) and one put_page(page), since it could be regarded as safe with the page locked.
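For reference, the helper pair under discussion was introduced in include/linux/pagemap.h roughly as follows (reproduced from memory as a sketch, so treat the exact details as approximate):

```
/* include/linux/pagemap.h (sketch of the new pair) */
static inline void attach_page_private(struct page *page, void *data)
{
	get_page(page);				/* one ref held by ->private */
	set_page_private(page, (unsigned long)data);
	SetPagePrivate(page);
}

static inline void *detach_page_private(struct page *page)
{
	void *data = (void *)page_private(page);

	if (!PagePrivate(page))
		return NULL;
	ClearPagePrivate(page);
	set_page_private(page, 0);
	put_page(page);				/* drop the ref taken above */

	return data;
}
```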


SetPageUptodate is not called inside {attach,detach}_page_private. I probably misunderstood your point; maybe you want the new pair to be able to handle the locked-page case? Care to elaborate more?


btw, I noticed the patchset versions are PATCH [3], RFC PATCH [4], RFC PATCH v2 [5], RFC PATCH v3 [6], PATCH [7]. Although I also noticed the patchset title was changed once, it could be somewhat harder to trace the whole discussion history.

[1] 
https://lore.kernel.org/linux-fsdevel/20200419051404.GA30986@hsiangkao-HP-ZHAN-66-Pro-G1/
[2] 
https://lore.kernel.org/linux-fsdevel/20200427025752.GA3979@hsiangkao-HP-ZHAN-66-Pro-G1/
[3] 
https://lore.kernel.org/linux-fsdevel/20200418225123.31850-1-guoqing.ji...@cloud.ionos.com/
[4] 
https://lore.kernel.org/linux-fsdevel/20200426214925.10970-1-guoqing.ji...@cloud.ionos.com/
[5] 
https://lore.kernel.org/linux-fsdevel/20200430214450.10662-1-guoqing.ji...@cloud.ionos.com/
[6] 
https://lore.kernel.org/linux-fsdevel/20200507214400.15785-1-guoqing.ji...@cloud.ionos.com/
[7] 
https://lore.kernel.org/linux-fsdevel/20200517214718.468-1-guoqing.ji...@cloud.ionos.com/


All the cover letters of those series are here.

RFC 
V3:https://lore.kernel.org/lkml/20200507214400.15785-1-guoqing.ji...@cloud.ionos.com/
RFC 
V2:https://lore.kernel.org/lkml/20200430214450.10662-1-guoqing.ji...@cloud.ionos.com/
RFC:https://lore.kernel.org/lkml/20200

Re: [PATCH 10/10] mm/migrate.c: call detach_page_private to cleanup code

2020-05-19 Thread Guoqing Jiang

On 5/19/20 7:12 AM, Andrew Morton wrote:

On Sun, 17 May 2020 23:47:18 +0200 Guoqing Jiang 
 wrote:


We can clean up the code a little by calling detach_page_private here.

...

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -804,10 +804,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
if (rc != MIGRATEPAGE_SUCCESS)
goto unlock_buffers;
  
-	ClearPagePrivate(page);

-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, detach_page_private(page));
get_page(newpage);
  
  	bh = head;

mm/migrate.c: In function '__buffer_migrate_page':
./include/linux/mm_types.h:243:52: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
  #define set_page_private(page, v) ((page)->private = (v))
 ^
mm/migrate.c:800:2: note: in expansion of macro 'set_page_private'
   set_page_private(newpage, detach_page_private(page));
   ^~~~

The fact that set_page_private(detach_page_private()) generates a type
mismatch warning seems deeply wrong, surely.

Please let's get the types sorted out - either unsigned long or void *,
not half-one and half-the other.  Whatever needs the least typecasting
at callsites, I suggest.


Sorry about that, I should have noticed the warning before. I will double-check whether other places need the typecast or not, then send a new version.


And can we please implement set_page_private() and page_private() with
inlined C code?  There is no need for these to be macros.


Just did a quick change.

-#define page_private(page) ((page)->private)
-#define set_page_private(page, v)  ((page)->private = (v))
+static inline unsigned long page_private(struct page *page)
+{
+   return page->private;
+}
+
+static inline void set_page_private(struct page *page, unsigned long priv_data)
+{
+   page->private = priv_data;
+}

Then I get an error like:

fs/erofs/zdata.h: In function ‘z_erofs_onlinepage_index’:
fs/erofs/zdata.h:126:8: error: lvalue required as unary ‘&’ operand
  u.v = &page_private(page);
        ^

I guess it is better to keep page_private as a macro; please correct me in case I missed something.

Thanks,
Guoqing



[PATCH 03/10] btrfs: use attach/detach_page_private

2020-05-17 Thread Guoqing Jiang
Since the new pair of functions is introduced, we can call them to clean up the code in btrfs.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. call attach_page_private(newpage, clear_page_private(page)) to
   cleanup code further as suggested by Dave Chinner.

 fs/btrfs/disk-io.c   |  4 +---
 fs/btrfs/extent_io.c | 21 ++---
 fs/btrfs/inode.c | 23 +--
 3 files changed, 12 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 714b57553ed6..44745d5e05cf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -976,9 +976,7 @@ static void btree_invalidatepage(struct page *page, unsigned int offset,
btrfs_warn(BTRFS_I(page->mapping->host)->root->fs_info,
   "page private not zero on page %llu",
   (unsigned long long)page_offset(page));
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
+   detach_page_private(page);
}
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 451d17bfafd8..704239546093 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3099,22 +3099,16 @@ static int submit_extent_page(unsigned int opf,
 static void attach_extent_buffer_page(struct extent_buffer *eb,
  struct page *page)
 {
-   if (!PagePrivate(page)) {
-   SetPagePrivate(page);
-   get_page(page);
-   set_page_private(page, (unsigned long)eb);
-   } else {
+   if (!PagePrivate(page))
+   attach_page_private(page, eb);
+   else
WARN_ON(page->private != (unsigned long)eb);
-   }
 }
 
 void set_page_extent_mapped(struct page *page)
 {
-   if (!PagePrivate(page)) {
-   SetPagePrivate(page);
-   get_page(page);
-   set_page_private(page, EXTENT_PAGE_PRIVATE);
-   }
+   if (!PagePrivate(page))
+   attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
 }
 
 static struct extent_map *
@@ -4935,10 +4929,7 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 * We need to make sure we haven't be attached
 * to a new eb.
 */
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   /* One for the page private */
-   put_page(page);
+   detach_page_private(page);
}
 
if (mapped)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 037efc25d993..6c2d6ecedd6a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7927,11 +7927,8 @@ static void btrfs_readahead(struct readahead_control *rac)
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
int ret = try_release_extent_mapping(page, gfp_flags);
-   if (ret == 1) {
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-   }
+   if (ret == 1)
+   detach_page_private(page);
return ret;
 }
 
@@ -7953,14 +7950,8 @@ static int btrfs_migratepage(struct address_space *mapping,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page)) {
-   ClearPagePrivate(page);
-   get_page(newpage);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
-   SetPagePrivate(newpage);
-   }
+   if (page_has_private(page))
+   attach_page_private(newpage, detach_page_private(page));
 
if (PagePrivate2(page)) {
ClearPagePrivate2(page);
@@ -8082,11 +8073,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
}
 
ClearPageChecked(page);
-   if (PagePrivate(page)) {
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-   }
+   detach_page_private(page);
 }
 
 /*
-- 
2.17.1



[PATCH 09/10] buffer_head.h: remove attach_page_buffers

2020-05-17 Thread Guoqing Jiang
All the callers have replaced attach_page_buffers with the new function
attach_page_private, so remove it.

Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Roman Gushchin 
Cc: Andreas Dilger 
Signed-off-by: Guoqing Jiang 
---
No change since RFC.

 include/linux/buffer_head.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 15b765a181b8..22fb11e2d2e0 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -272,14 +272,6 @@ void buffer_init(void);
  * inline definitions
  */
 
-static inline void attach_page_buffers(struct page *page,
-   struct buffer_head *head)
-{
-   get_page(page);
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)head);
-}
-
 static inline void get_bh(struct buffer_head *bh)
 {
 atomic_inc(&bh->b_count);
-- 
2.17.1



[PATCH 10/10] mm/migrate.c: call detach_page_private to cleanup code

2020-05-17 Thread Guoqing Jiang
We can clean up the code a little by calling detach_page_private here.

Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

 mm/migrate.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 5fed0305d2ec..f99502bc113c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -804,10 +804,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
if (rc != MIGRATEPAGE_SUCCESS)
goto unlock_buffers;
 
-   ClearPagePrivate(page);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, detach_page_private(page));
get_page(newpage);
 
bh = head;
-- 
2.17.1



[PATCH 08/10] orangefs: use attach/detach_page_private

2020-05-17 Thread Guoqing Jiang
Since the new pair of functions is introduced, we can call them to clean up the code in orangefs.

Cc: Mike Marshall 
Cc: Martin Brandenburg 
Cc: de...@lists.orangefs.org
Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. avoid potential use-after-free as suggested by Dave Chinner.

 fs/orangefs/inode.c | 32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 12ae630fbed7..48f0547d4850 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -62,12 +62,7 @@ static int orangefs_writepage_locked(struct page *page,
} else {
ret = 0;
}
-   if (wr) {
-   kfree(wr);
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
-   }
+   kfree(detach_page_private(page));
return ret;
 }
 
@@ -409,9 +404,7 @@ static int orangefs_write_begin(struct file *file,
wr->len = len;
wr->uid = current_fsuid();
wr->gid = current_fsgid();
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)wr);
-   get_page(page);
+   attach_page_private(page, wr);
 okay:
return 0;
 }
@@ -459,18 +452,12 @@ static void orangefs_invalidatepage(struct page *page,
wr = (struct orangefs_write_range *)page_private(page);
 
if (offset == 0 && length == PAGE_SIZE) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
+   kfree(detach_page_private(page));
return;
/* write range entirely within invalidate range (or equal) */
} else if (page_offset(page) + offset <= wr->pos &&
wr->pos + wr->len <= page_offset(page) + offset + length) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
+   kfree(detach_page_private(page));
/* XXX is this right? only caller in fs */
cancel_dirty_page(page);
return;
@@ -535,12 +522,7 @@ static int orangefs_releasepage(struct page *page, gfp_t foo)
 
 static void orangefs_freepage(struct page *page)
 {
-   if (PagePrivate(page)) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
-   }
+   kfree(detach_page_private(page));
 }
 
 static int orangefs_launder_page(struct page *page)
@@ -740,9 +722,7 @@ vm_fault_t orangefs_page_mkwrite(struct vm_fault *vmf)
wr->len = PAGE_SIZE;
wr->uid = current_fsuid();
wr->gid = current_fsgid();
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)wr);
-   get_page(page);
+   attach_page_private(page, wr);
 okay:
 
file_update_time(vmf->vma->vm_file);
-- 
2.17.1



[PATCH 02/10] md: remove __clear_page_buffers and use attach/detach_page_private

2020-05-17 Thread Guoqing Jiang
After introducing attach/detach_page_private in pagemap.h, we can remove the duplicated code and call the new functions.

Cc: Song Liu 
Cc: linux-r...@vger.kernel.org
Acked-by: Song Liu 
Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 drivers/md/md-bitmap.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index b952bd45bd6a..95a5f3757fa3 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -324,14 +324,6 @@ static void end_bitmap_write(struct buffer_head *bh, int uptodate)
	wake_up(&bitmap->write_wait);
 }
 
-/* copied from buffer.c */
-static void
-__clear_page_buffers(struct page *page)
-{
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-}
 static void free_buffers(struct page *page)
 {
struct buffer_head *bh;
@@ -345,7 +337,7 @@ static void free_buffers(struct page *page)
free_buffer_head(bh);
bh = next;
}
-   __clear_page_buffers(page);
+   detach_page_private(page);
put_page(page);
 }
 
@@ -374,7 +366,7 @@ static int read_page(struct file *file, unsigned long index,
ret = -ENOMEM;
goto out;
}
-   attach_page_buffers(page, bh);
+   attach_page_private(page, bh);
blk_cur = index << (PAGE_SHIFT - inode->i_blkbits);
while (bh) {
block = blk_cur;
-- 
2.17.1



[PATCH 05/10] f2fs: use attach/detach_page_private

2020-05-17 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in f2fs.h.

Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: linux-f2fs-de...@lists.sourceforge.net
Acked-by: Chao Yu 
Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 fs/f2fs/f2fs.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 3574629b75ba..a4d4a947f603 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3128,19 +3128,12 @@ static inline void f2fs_set_page_private(struct page *page,
if (PagePrivate(page))
return;
 
-   get_page(page);
-   SetPagePrivate(page);
-   set_page_private(page, data);
+   attach_page_private(page, (void *)data);
 }
 
 static inline void f2fs_clear_page_private(struct page *page)
 {
-   if (!PagePrivate(page))
-   return;
-
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   f2fs_put_page(page, 0);
+   detach_page_private(page);
 }
 
 /*
-- 
2.17.1
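
Note that f2fs stores an unsigned long value rather than a pointer in
page->private, so the conversion leans on the cast round-tripping through
void *. A minimal sketch of the round trip, assuming the patch-01 definitions
and a locked page with no private data attached yet:

	unsigned long data = 0x42;

	attach_page_private(page, (void *)data);
	/* set_page_private() stores (unsigned long)(void *)data, so ... */
	unsigned long back = (unsigned long)detach_page_private(page);
	/* ... back == 0x42: the void * <-> unsigned long conversion is
	 * lossless for values that fit in a pointer, as here. */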



[PATCH 07/10] ntfs: replace attach_page_buffers with attach_page_private

2020-05-17 Thread Guoqing Jiang
Call the new function since attach_page_buffers will be removed.

Cc: Anton Altaparmakov 
Cc: linux-ntfs-...@lists.sourceforge.net
Signed-off-by: Guoqing Jiang 
---
No change since RFC V2.

RFC -> RFC V2
1. change the name of new function to attach_page_private.

 fs/ntfs/aops.c | 2 +-
 fs/ntfs/mft.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index 554b744f41bf..bb0a43860ad2 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -1732,7 +1732,7 @@ void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs) {
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
} else
buffers_to_free = bh;
}
diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
index 3aac5c917afe..fbb9f1bc623d 100644
--- a/fs/ntfs/mft.c
+++ b/fs/ntfs/mft.c
@@ -504,7 +504,7 @@ int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
}
bh = head = page_buffers(page);
BUG_ON(!bh);
-- 
2.17.1
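
The ntfs change is behavior-preserving: attach_page_buffers() (removed in
patch 09) and attach_page_private() (added in patch 01) have essentially the
same body, differing only in the argument type and in where PG_private is set
relative to page->private. Side by side, taken from the two patches:

	/* old: attach_page_buffers() in buffer_head.h */
	get_page(page);
	SetPagePrivate(page);
	set_page_private(page, (unsigned long)head);

	/* new: attach_page_private() in pagemap.h */
	get_page(page);
	set_page_private(page, (unsigned long)data);
	SetPagePrivate(page);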



[PATCH 06/10] iomap: use attach/detach_page_private

2020-05-17 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in iomap.

Cc: Christoph Hellwig 
Cc: "Darrick J. Wong" 
Cc: linux-...@vger.kernel.org
Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. call attach_page_private(newpage, clear_page_private(page)) to
   cleanup code further as suggested by Matthew Wilcox.
3. don't return attach_page_private in iomap_page_create per the
   comment from Christoph Hellwig.

 fs/iomap/buffered-io.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 890c8fcda4f3..a1ed7620fbac 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -59,24 +59,19 @@ iomap_page_create(struct inode *inode, struct page *page)
 * migrate_page_move_mapping() assumes that pages with private data have
 * their count elevated by 1.
 */
-   get_page(page);
-   set_page_private(page, (unsigned long)iop);
-   SetPagePrivate(page);
+   attach_page_private(page, iop);
return iop;
 }
 
 static void
 iomap_page_release(struct page *page)
 {
-   struct iomap_page *iop = to_iomap_page(page);
+   struct iomap_page *iop = detach_page_private(page);
 
if (!iop)
return;
	WARN_ON_ONCE(atomic_read(&iop->read_count));
	WARN_ON_ONCE(atomic_read(&iop->write_count));
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
kfree(iop);
 }
 
@@ -526,14 +521,8 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page)) {
-   ClearPagePrivate(page);
-   get_page(newpage);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
-   SetPagePrivate(newpage);
-   }
+   if (page_has_private(page))
+   attach_page_private(newpage, detach_page_private(page));
 
if (mode != MIGRATE_SYNC_NO_COPY)
migrate_page_copy(newpage, page);
-- 
2.17.1
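
For the migration hunk, the composed call keeps the refcount invariant that
iomap_page_create()'s comment relies on: detach_page_private() drops the
private-data reference on the old page and attach_page_private() takes one on
the new page. Expanding attach_page_private(newpage, detach_page_private(page))
by hand from the patch-01 definitions (the !PagePrivate() early return is
omitted since page_has_private() was just checked):

	void *data = (void *)page_private(page);

	ClearPagePrivate(page);
	set_page_private(page, 0);
	put_page(page);			/* old page drops its reference */

	get_page(newpage);		/* new page gains one */
	set_page_private(newpage, (unsigned long)data);
	SetPagePrivate(newpage);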



[PATCH 00/10] Introduce attach/detach_page_private to cleanup code

2020-05-17 Thread Guoqing Jiang
Hello Andrew and Al,

Since there has been no more feedback since RFC V3, I suppose the series
could be ready for merge. Song has already acked the md patch, and all
the remaining patches are for fs and mm.

So, if there are no further comments on the patchset, could you consider
queueing it in either of your trees to avoid potential build issues? Thank you!

RFC V3: 
https://lore.kernel.org/lkml/20200507214400.15785-1-guoqing.ji...@cloud.ionos.com/
RFC V2: 
https://lore.kernel.org/lkml/20200430214450.10662-1-guoqing.ji...@cloud.ionos.com/
RFC: 
https://lore.kernel.org/lkml/20200426214925.10970-1-guoqing.ji...@cloud.ionos.com/

Thanks,
Guoqing

No change since RFC V3.

RFC V2 -> RFC V3:
1. rename clear_page_private to detach_page_private.
2. Update the comments for attach/detach_page_private per Matthew.
3. add one patch to call the new function in mm/migrate.c as suggested by
   Matthew, but use the conservative way to keep the original semantics [2].


RFC -> RFC V2:
1. rename the new functions and add comments for them.
2. change the return type of attach_page_private.
3. call attach_page_private(newpage, clear_page_private(page)) to cleanup code 
further.
4. avoid potential use-after-free in orangefs.

[1]. 
https://lore.kernel.org/linux-fsdevel/20200420221424.gh5...@bombadil.infradead.org/
[2]. 
https://lore.kernel.org/lkml/e4d5ddc0-877f-6499-f697-2b7c0ddbf...@cloud.ionos.com/

Guoqing Jiang (10):
  include/linux/pagemap.h: introduce attach/detach_page_private
  md: remove __clear_page_buffers and use attach/detach_page_private
  btrfs: use attach/detach_page_private
  fs/buffer.c: use attach/detach_page_private
  f2fs: use attach/detach_page_private
  iomap: use attach/detach_page_private
  ntfs: replace attach_page_buffers with attach_page_private
  orangefs: use attach/detach_page_private
  buffer_head.h: remove attach_page_buffers
  mm/migrate.c: call detach_page_private to cleanup code

 drivers/md/md-bitmap.c  | 12 ++--
 fs/btrfs/disk-io.c  |  4 +---
 fs/btrfs/extent_io.c| 21 ++---
 fs/btrfs/inode.c| 23 +--
 fs/buffer.c | 16 
 fs/f2fs/f2fs.h  | 11 ++-
 fs/iomap/buffered-io.c  | 19 ---
 fs/ntfs/aops.c  |  2 +-
 fs/ntfs/mft.c   |  2 +-
 fs/orangefs/inode.c | 32 ++--
 include/linux/buffer_head.h |  8 
 include/linux/pagemap.h | 37 +
 mm/migrate.c|  5 +
 13 files changed, 70 insertions(+), 122 deletions(-)

-- 
2.17.1



[PATCH 01/10] include/linux/pagemap.h: introduce attach/detach_page_private

2020-05-17 Thread Guoqing Jiang
The logic in attach_page_buffers and __clear_page_buffers is closely
paired, but

1. they are located in different files.

2. attach_page_buffers is implemented in buffer_head.h, so it can be
   used by other files. But __clear_page_buffers is a static function in
   buffer.c, so other potential users can't call it; md-bitmap even
   copied the function.

So, introduce the new attach/detach_page_private to replace them. With
the new pair of functions, we will remove the usage of attach_page_buffers
and __clear_page_buffers in the next patches. Thanks for the suggestions
about the function names from Alexander Viro, Andreas Grünbacher, Christoph
Hellwig and Matthew Wilcox.

Suggested-by: Matthew Wilcox 
Cc: Andrew Morton 
Cc: "Darrick J. Wong" 
Cc: William Kucharski 
Cc: "Kirill A. Shutemov" 
Cc: Andreas Gruenbacher 
Cc: Yang Shi 
Cc: Yafang Shao 
Cc: Song Liu 
Cc: linux-r...@vger.kernel.org
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Alexander Viro 
Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: Christoph Hellwig 
Cc: linux-...@vger.kernel.org
Cc: Anton Altaparmakov 
Cc: linux-ntfs-...@lists.sourceforge.net
Cc: Mike Marshall 
Cc: Martin Brandenburg 
Cc: de...@lists.orangefs.org
Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Roman Gushchin 
Cc: Andreas Dilger 
Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

RFC V2 -> RFC V3:
1. rename clear_page_private to detach_page_private.
2. updated the comments for the two functions.

RFC -> RFC V2:  Address the comments from Christoph Hellwig
1. change function names to attach/clear_page_private and add comments.
2. change the return type of attach_page_private.

 include/linux/pagemap.h | 37 +
 1 file changed, 37 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c6348c50136f..8e085713150c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -208,6 +208,43 @@ static inline int page_cache_add_speculative(struct page *page, int count)
return __page_cache_add_speculative(page, count);
 }
 
+/**
+ * attach_page_private - Attach private data to a page.
+ * @page: Page to attach data to.
+ * @data: Data to attach to page.
+ *
+ * Attaching private data to a page increments the page's reference count.
+ * The data must be detached before the page will be freed.
+ */
+static inline void attach_page_private(struct page *page, void *data)
+{
+   get_page(page);
+   set_page_private(page, (unsigned long)data);
+   SetPagePrivate(page);
+}
+
+/**
+ * detach_page_private - Detach private data from a page.
+ * @page: Page to detach data from.
+ *
+ * Removes the data that was previously attached to the page and decrements
+ * the refcount on the page.
+ *
+ * Return: Data that was attached to the page.
+ */
+static inline void *detach_page_private(struct page *page)
+{
+   void *data = (void *)page_private(page);
+
+   if (!PagePrivate(page))
+   return NULL;
+   ClearPagePrivate(page);
+   set_page_private(page, 0);
+   put_page(page);
+
+   return data;
+}
+
 #ifdef CONFIG_NUMA
 extern struct page *__page_cache_alloc(gfp_t gfp);
 #else
-- 
2.17.1
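
As a quick illustration of the intended pairing (a hypothetical example, not
part of the series), a filesystem that hangs its own per-page bookkeeping off
page->private would attach it on first use and detach it in its freepage hook:

	struct foo_private {			/* hypothetical per-page state */
		loff_t pos;
		size_t len;
	};

	static int foo_write_begin(struct page *page)
	{
		struct foo_private *fp = kmalloc(sizeof(*fp), GFP_NOFS);

		if (!fp)
			return -ENOMEM;
		attach_page_private(page, fp);	/* takes a page ref, sets PG_private */
		return 0;
	}

	static void foo_freepage(struct page *page)
	{
		kfree(detach_page_private(page));	/* NULL-safe, drops the ref */
	}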



[PATCH 04/10] fs/buffer.c: use attach/detach_page_private

2020-05-17 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in buffer.c.

Cc: Alexander Viro 
Signed-off-by: Guoqing Jiang 
---
No change since RFC V3.

RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 fs/buffer.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 85b4be1939ce..fc8831c392d7 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -123,14 +123,6 @@ void __wait_on_buffer(struct buffer_head * bh)
 }
 EXPORT_SYMBOL(__wait_on_buffer);
 
-static void
-__clear_page_buffers(struct page *page)
-{
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-}
-
 static void buffer_io_error(struct buffer_head *bh, char *msg)
 {
	if (!test_bit(BH_Quiet, &bh->b_state))
@@ -906,7 +898,7 @@ link_dev_buffers(struct page *page, struct buffer_head *head)
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
 }
 
 static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
@@ -1624,7 +1616,7 @@ void create_empty_buffers(struct page *page,
bh = bh->b_this_page;
} while (bh != head);
}
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
	spin_unlock(&page->mapping->private_lock);
 }
 EXPORT_SYMBOL(create_empty_buffers);
@@ -2611,7 +2603,7 @@ static void attach_nobh_buffers(struct page *page, struct buffer_head *head)
bh->b_this_page = head;
bh = bh->b_this_page;
} while (bh != head);
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
	spin_unlock(&page->mapping->private_lock);
 }
 
@@ -3276,7 +3268,7 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
bh = next;
} while (bh != head);
*buffers_to_free = head;
-   __clear_page_buffers(page);
+   detach_page_private(page);
return 1;
 failed:
return 0;
-- 
2.17.1



[RFC PATCH V3 07/10] ntfs: replace attach_page_buffers with attach_page_private

2020-05-07 Thread Guoqing Jiang
Call the new function since attach_page_buffers will be removed.

Cc: Anton Altaparmakov 
Cc: linux-ntfs-...@lists.sourceforge.net
Signed-off-by: Guoqing Jiang 
---
RFC V2 -> RFC V3: no change

RFC -> RFC V2
1. change the name of new function to attach_page_private.

 fs/ntfs/aops.c | 2 +-
 fs/ntfs/mft.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index 554b744f41bf..bb0a43860ad2 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -1732,7 +1732,7 @@ void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs) {
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
} else
buffers_to_free = bh;
}
diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
index 3aac5c917afe..fbb9f1bc623d 100644
--- a/fs/ntfs/mft.c
+++ b/fs/ntfs/mft.c
@@ -504,7 +504,7 @@ int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
}
bh = head = page_buffers(page);
BUG_ON(!bh);
-- 
2.17.1



[RFC PATCH V3 02/10] md: remove __clear_page_buffers and use attach/detach_page_private

2020-05-07 Thread Guoqing Jiang
After introducing attach/detach_page_private in pagemap.h, we can remove
the duplicated code and call the new functions.

Cc: Song Liu 
Cc: linux-r...@vger.kernel.org
Signed-off-by: Guoqing Jiang 
---
RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 drivers/md/md-bitmap.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index b952bd45bd6a..95a5f3757fa3 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -324,14 +324,6 @@ static void end_bitmap_write(struct buffer_head *bh, int uptodate)
	wake_up(&bitmap->write_wait);
 }
 
-/* copied from buffer.c */
-static void
-__clear_page_buffers(struct page *page)
-{
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-}
 static void free_buffers(struct page *page)
 {
struct buffer_head *bh;
@@ -345,7 +337,7 @@ static void free_buffers(struct page *page)
free_buffer_head(bh);
bh = next;
}
-   __clear_page_buffers(page);
+   detach_page_private(page);
put_page(page);
 }
 
@@ -374,7 +366,7 @@ static int read_page(struct file *file, unsigned long index,
ret = -ENOMEM;
goto out;
}
-   attach_page_buffers(page, bh);
+   attach_page_private(page, bh);
blk_cur = index << (PAGE_SHIFT - inode->i_blkbits);
while (bh) {
block = blk_cur;
-- 
2.17.1



[RFC PATCH V3 03/10] btrfs: use attach/detach_page_private

2020-05-07 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in btrfs.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Guoqing Jiang 
---
RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. call attach_page_private(newpage, clear_page_private(page)) to
   cleanup code further as suggested by Dave Chinner.

 fs/btrfs/disk-io.c   |  4 +---
 fs/btrfs/extent_io.c | 21 ++---
 fs/btrfs/inode.c | 23 +--
 3 files changed, 12 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d10c7be10f3b..7278789ff8a7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -980,9 +980,7 @@ static void btree_invalidatepage(struct page *page, unsigned int offset,
btrfs_warn(BTRFS_I(page->mapping->host)->root->fs_info,
   "page private not zero on page %llu",
   (unsigned long long)page_offset(page));
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
+   detach_page_private(page);
}
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 39e45b8a5031..bf816387715b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3076,22 +3076,16 @@ static int submit_extent_page(unsigned int opf,
 static void attach_extent_buffer_page(struct extent_buffer *eb,
  struct page *page)
 {
-   if (!PagePrivate(page)) {
-   SetPagePrivate(page);
-   get_page(page);
-   set_page_private(page, (unsigned long)eb);
-   } else {
+   if (!PagePrivate(page))
+   attach_page_private(page, eb);
+   else
WARN_ON(page->private != (unsigned long)eb);
-   }
 }
 
 void set_page_extent_mapped(struct page *page)
 {
-   if (!PagePrivate(page)) {
-   SetPagePrivate(page);
-   get_page(page);
-   set_page_private(page, EXTENT_PAGE_PRIVATE);
-   }
+   if (!PagePrivate(page))
+   attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
 }
 
 static struct extent_map *
@@ -4929,10 +4923,7 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 * We need to make sure we haven't be attached
 * to a new eb.
 */
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   /* One for the page private */
-   put_page(page);
+   detach_page_private(page);
}
 
if (mapped)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 320d1062068d..a7f7ff0a8b7c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8303,11 +8303,8 @@ btrfs_readpages(struct file *file, struct address_space *mapping,
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
int ret = try_release_extent_mapping(page, gfp_flags);
-   if (ret == 1) {
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-   }
+   if (ret == 1)
+   detach_page_private(page);
return ret;
 }
 
@@ -8329,14 +8326,8 @@ static int btrfs_migratepage(struct address_space *mapping,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page)) {
-   ClearPagePrivate(page);
-   get_page(newpage);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
-   SetPagePrivate(newpage);
-   }
+   if (page_has_private(page))
+   attach_page_private(newpage, detach_page_private(page));
 
if (PagePrivate2(page)) {
ClearPagePrivate2(page);
@@ -8458,11 +8449,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
}
 
ClearPageChecked(page);
-   if (PagePrivate(page)) {
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-   }
+   detach_page_private(page);
 }
 
 /*
-- 
2.17.1



[RFC PATCH V3 05/10] f2fs: use attach/detach_page_private

2020-05-07 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in f2fs.h.

Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: linux-f2fs-de...@lists.sourceforge.net
Acked-by: Chao Yu 
Signed-off-by: Guoqing Jiang 
---
RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 fs/f2fs/f2fs.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ba470d5687fe..6920d1a88289 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3051,19 +3051,12 @@ static inline void f2fs_set_page_private(struct page *page,
if (PagePrivate(page))
return;
 
-   get_page(page);
-   SetPagePrivate(page);
-   set_page_private(page, data);
+   attach_page_private(page, (void *)data);
 }
 
 static inline void f2fs_clear_page_private(struct page *page)
 {
-   if (!PagePrivate(page))
-   return;
-
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   f2fs_put_page(page, 0);
+   detach_page_private(page);
 }
 
 /*
-- 
2.17.1



[RFC PATCH V3 08/10] orangefs: use attach/detach_page_private

2020-05-07 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in orangefs.

Cc: Mike Marshall 
Cc: Martin Brandenburg 
Cc: de...@lists.orangefs.org
Signed-off-by: Guoqing Jiang 
---
RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. avoid potential use-after-free as suggested by Dave Chinner.

 fs/orangefs/inode.c | 32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 12ae630fbed7..48f0547d4850 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -62,12 +62,7 @@ static int orangefs_writepage_locked(struct page *page,
} else {
ret = 0;
}
-   if (wr) {
-   kfree(wr);
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
-   }
+   kfree(detach_page_private(page));
return ret;
 }
 
@@ -409,9 +404,7 @@ static int orangefs_write_begin(struct file *file,
wr->len = len;
wr->uid = current_fsuid();
wr->gid = current_fsgid();
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)wr);
-   get_page(page);
+   attach_page_private(page, wr);
 okay:
return 0;
 }
@@ -459,18 +452,12 @@ static void orangefs_invalidatepage(struct page *page,
wr = (struct orangefs_write_range *)page_private(page);
 
if (offset == 0 && length == PAGE_SIZE) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
+   kfree(detach_page_private(page));
return;
/* write range entirely within invalidate range (or equal) */
} else if (page_offset(page) + offset <= wr->pos &&
wr->pos + wr->len <= page_offset(page) + offset + length) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
+   kfree(detach_page_private(page));
/* XXX is this right? only caller in fs */
cancel_dirty_page(page);
return;
@@ -535,12 +522,7 @@ static int orangefs_releasepage(struct page *page, gfp_t foo)
 
 static void orangefs_freepage(struct page *page)
 {
-   if (PagePrivate(page)) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
-   }
+   kfree(detach_page_private(page));
 }
 
 static int orangefs_launder_page(struct page *page)
@@ -740,9 +722,7 @@ vm_fault_t orangefs_page_mkwrite(struct vm_fault *vmf)
wr->len = PAGE_SIZE;
wr->uid = current_fsuid();
wr->gid = current_fsgid();
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)wr);
-   get_page(page);
+   attach_page_private(page, wr);
 okay:
 
file_update_time(vmf->vma->vm_file);
-- 
2.17.1



[RFC PATCH V3 01/10] include/linux/pagemap.h: introduce attach/detach_page_private

2020-05-07 Thread Guoqing Jiang
The logic in attach_page_buffers and __clear_page_buffers is closely
paired, but

1. they are located in different files.

2. attach_page_buffers is implemented in buffer_head.h, so it can be
   used by other files. But __clear_page_buffers is a static function in
   buffer.c, so other potential users can't call it; md-bitmap even
   copied the function.

So, introduce the new attach/detach_page_private to replace them. With
the new pair of functions, we will remove the usage of attach_page_buffers
and __clear_page_buffers in the next patches. Thanks for the suggestions
about the function names from Alexander Viro, Andreas Grünbacher, Christoph
Hellwig and Matthew Wilcox.

Suggested-by: Matthew Wilcox 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: "Darrick J. Wong" 
Cc: William Kucharski 
Cc: "Kirill A. Shutemov" 
Cc: Andreas Gruenbacher 
Cc: Yang Shi 
Cc: Yafang Shao 
Cc: Song Liu 
Cc: linux-r...@vger.kernel.org
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Alexander Viro 
Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: Christoph Hellwig 
Cc: linux-...@vger.kernel.org
Cc: Anton Altaparmakov 
Cc: linux-ntfs-...@lists.sourceforge.net
Cc: Mike Marshall 
Cc: Martin Brandenburg 
Cc: de...@lists.orangefs.org
Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Roman Gushchin 
Cc: Andreas Dilger 
Signed-off-by: Guoqing Jiang 
---
RFC V2 -> RFC V3:
1. rename clear_page_private to detach_page_private.
2. updated the comments for the two functions.

RFC -> RFC V2:  Address the comments from Christoph Hellwig
1. change function names to attach/clear_page_private and add comments.
2. change the return type of attach_page_private.

 include/linux/pagemap.h | 37 +
 1 file changed, 37 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a8f7bd8ea1c6..99dd93188a5e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -205,6 +205,43 @@ static inline int page_cache_add_speculative(struct page *page, int count)
return __page_cache_add_speculative(page, count);
 }
 
+/**
+ * attach_page_private - Attach private data to a page.
+ * @page: Page to attach data to.
+ * @data: Data to attach to page.
+ *
+ * Attaching private data to a page increments the page's reference count.
+ * The data must be detached before the page will be freed.
+ */
+static inline void attach_page_private(struct page *page, void *data)
+{
+   get_page(page);
+   set_page_private(page, (unsigned long)data);
+   SetPagePrivate(page);
+}
+
+/**
+ * detach_page_private - Detach private data from a page.
+ * @page: Page to detach data from.
+ *
+ * Removes the data that was previously attached to the page and decrements
+ * the refcount on the page.
+ *
+ * Return: Data that was attached to the page.
+ */
+static inline void *detach_page_private(struct page *page)
+{
+   void *data = (void *)page_private(page);
+
+   if (!PagePrivate(page))
+   return NULL;
+   ClearPagePrivate(page);
+   set_page_private(page, 0);
+   put_page(page);
+
+   return data;
+}
+
 #ifdef CONFIG_NUMA
 extern struct page *__page_cache_alloc(gfp_t gfp);
 #else
-- 
2.17.1



[RFC PATCH V3 04/10] fs/buffer.c: use attach/detach_page_private

2020-05-07 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in buffer.c.

Cc: Alexander Viro 
Signed-off-by: Guoqing Jiang 
---
RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 fs/buffer.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index a60f60396cfa..059404658d5d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -123,14 +123,6 @@ void __wait_on_buffer(struct buffer_head * bh)
 }
 EXPORT_SYMBOL(__wait_on_buffer);
 
-static void
-__clear_page_buffers(struct page *page)
-{
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-}
-
 static void buffer_io_error(struct buffer_head *bh, char *msg)
 {
	if (!test_bit(BH_Quiet, &bh->b_state))
@@ -906,7 +898,7 @@ link_dev_buffers(struct page *page, struct buffer_head *head)
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
 }
 
 static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
@@ -1580,7 +1572,7 @@ void create_empty_buffers(struct page *page,
bh = bh->b_this_page;
} while (bh != head);
}
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
	spin_unlock(&page->mapping->private_lock);
 }
 EXPORT_SYMBOL(create_empty_buffers);
@@ -2567,7 +2559,7 @@ static void attach_nobh_buffers(struct page *page, struct buffer_head *head)
bh->b_this_page = head;
bh = bh->b_this_page;
} while (bh != head);
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
	spin_unlock(&page->mapping->private_lock);
 }
 
@@ -3227,7 +3219,7 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
bh = next;
} while (bh != head);
*buffers_to_free = head;
-   __clear_page_buffers(page);
+   detach_page_private(page);
return 1;
 failed:
return 0;
-- 
2.17.1



[RFC PATCH V3 10/10] mm/migrate.c: call detach_page_private to cleanup code

2020-05-07 Thread Guoqing Jiang
We can clean up the code a little by calling detach_page_private here.

Cc: Andrew Morton 
Cc: linux...@kvack.org
Signed-off-by: Guoqing Jiang 
---
Added since RFC V3.

 mm/migrate.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 7160c1556f79..f214adfb3fa4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -797,10 +797,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
if (rc != MIGRATEPAGE_SUCCESS)
goto unlock_buffers;
 
-   ClearPagePrivate(page);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, detach_page_private(page));
get_page(newpage);
 
bh = head;
-- 
2.17.1



[RFC PATCH V3 06/10] iomap: use attach/detach_page_private

2020-05-07 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in iomap.

Cc: Christoph Hellwig 
Cc: "Darrick J. Wong" 
Cc: linux-...@vger.kernel.org
Signed-off-by: Guoqing Jiang 
---
RFC V2 -> RFC V3
1. rename clear_page_private to detach_page_private.

RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. call attach_page_private(newpage, clear_page_private(page)) to
   cleanup code further as suggested by Matthew Wilcox.
3. don't return attach_page_private in iomap_page_create per the
   comment from Christoph Hellwig.

 fs/iomap/buffered-io.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 89e21961d1ad..e3031007b4ae 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -59,24 +59,19 @@ iomap_page_create(struct inode *inode, struct page *page)
 * migrate_page_move_mapping() assumes that pages with private data have
 * their count elevated by 1.
 */
-   get_page(page);
-   set_page_private(page, (unsigned long)iop);
-   SetPagePrivate(page);
+   attach_page_private(page, iop);
return iop;
 }
 
 static void
 iomap_page_release(struct page *page)
 {
-   struct iomap_page *iop = to_iomap_page(page);
+   struct iomap_page *iop = detach_page_private(page);
 
if (!iop)
return;
	WARN_ON_ONCE(atomic_read(&iop->read_count));
	WARN_ON_ONCE(atomic_read(&iop->write_count));
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
kfree(iop);
 }
 
@@ -554,14 +549,8 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page)) {
-   ClearPagePrivate(page);
-   get_page(newpage);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
-   SetPagePrivate(newpage);
-   }
+   if (page_has_private(page))
+   attach_page_private(newpage, detach_page_private(page));
 
if (mode != MIGRATE_SYNC_NO_COPY)
migrate_page_copy(newpage, page);
-- 
2.17.1



[RFC PATCH V3 09/10] buffer_head.h: remove attach_page_buffers

2020-05-07 Thread Guoqing Jiang
All the callers have replaced attach_page_buffers with the new function
attach_page_private, so remove it.

Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Roman Gushchin 
Cc: Andreas Dilger 
Signed-off-by: Guoqing Jiang 
---
No change since RFC.

 include/linux/buffer_head.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 15b765a181b8..22fb11e2d2e0 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -272,14 +272,6 @@ void buffer_init(void);
  * inline definitions
  */
 
-static inline void attach_page_buffers(struct page *page,
-   struct buffer_head *head)
-{
-   get_page(page);
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)head);
-}
-
 static inline void get_bh(struct buffer_head *bh)
 {
	atomic_inc(&bh->b_count);
-- 
2.17.1
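
Since this removal only builds once every caller has been converted, one quick
way to confirm nothing was missed before applying it (a suggested check, not
from the thread):

	$ git grep -n -e attach_page_buffers -e __clear_page_buffers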



[RFC PATCH V3 0/9] Introduce attach/detach_page_private to cleanup code

2020-05-07 Thread Guoqing Jiang
Hi,

Based on the previous thread [1], this patchset introduces attach_page_private
and detach_page_private to replace attach_page_buffers and __clear_page_buffers.
Thanks a lot for the constructive suggestions and comments from Christoph,
Matthew and Dave.

And sorry for cross-posting to different lists, since the series modifies
different subsystems.

RFC V2 -> RFC V3:
1. rename clear_page_private to detach_page_private.
2. Update the comments for attach/detach_page_private per Matthew.
3. add one patch to call the new function in mm/migrate.c as suggested by
   Matthew, but use the conservative way to keep the original semantics [2].


RFC -> RFC V2:
1. rename the new functions and add comments for them.
2. change the return type of attach_page_private.
3. call attach_page_private(newpage, clear_page_private(page)) to cleanup code 
further.
4. avoid potential use-after-free in orangefs.

[1]. 
https://lore.kernel.org/linux-fsdevel/20200420221424.gh5...@bombadil.infradead.org/
[2]. 
https://lore.kernel.org/lkml/e4d5ddc0-877f-6499-f697-2b7c0ddbf...@cloud.ionos.com/

Thanks,
Guoqing

Guoqing Jiang (10):
  include/linux/pagemap.h: introduce attach/detach_page_private
  md: remove __clear_page_buffers and use attach/detach_page_private
  btrfs: use attach/detach_page_private
  fs/buffer.c: use attach/detach_page_private
  f2fs: use attach/detach_page_private
  iomap: use attach/detach_page_private
  ntfs: replace attach_page_buffers with attach_page_private
  orangefs: use attach/detach_page_private
  buffer_head.h: remove attach_page_buffers
  mm/migrate.c: call detach_page_private to cleanup code

 drivers/md/md-bitmap.c  | 12 ++--
 fs/btrfs/disk-io.c  |  4 +---
 fs/btrfs/extent_io.c| 21 ++---
 fs/btrfs/inode.c| 23 +--
 fs/buffer.c | 16 
 fs/f2fs/f2fs.h  | 11 ++-
 fs/iomap/buffered-io.c  | 19 ---
 fs/ntfs/aops.c  |  2 +-
 fs/ntfs/mft.c   |  2 +-
 fs/orangefs/inode.c | 32 ++--
 include/linux/buffer_head.h |  8 
 include/linux/pagemap.h | 37 +
 mm/migrate.c|  5 +
 13 files changed, 70 insertions(+), 122 deletions(-)

-- 
2.17.1



Re: [RFC PATCH V2 0/9] Introduce attach/clear_page_private to cleanup code

2020-05-02 Thread Guoqing Jiang

On 5/2/20 2:41 AM, Matthew Wilcox wrote:

On Sat, May 02, 2020 at 12:42:15AM +0200, Guoqing Jiang wrote:

On 5/2/20 12:16 AM, Matthew Wilcox wrote:

On Thu, Apr 30, 2020 at 11:44:41PM +0200, Guoqing Jiang wrote:

include/linux/pagemap.h: introduce attach/clear_page_private
md: remove __clear_page_buffers and use attach/clear_page_private
btrfs: use attach/clear_page_private
fs/buffer.c: use attach/clear_page_private
f2fs: use attach/clear_page_private
iomap: use attach/clear_page_private
ntfs: replace attach_page_buffers with attach_page_private
orangefs: use attach/clear_page_private
buffer_head.h: remove attach_page_buffers

I think mm/migrate.c could also use this:

  ClearPagePrivate(page);
  set_page_private(newpage, page_private(page));
  set_page_private(page, 0);
  put_page(page);
  get_page(newpage);


Thanks for checking! I assume the change below is appropriate.

diff --git a/mm/migrate.c b/mm/migrate.c
index 7160c1556f79..f214adfb3fa4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -797,10 +797,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
     if (rc != MIGRATEPAGE_SUCCESS)
     goto unlock_buffers;

-   ClearPagePrivate(page);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, detach_page_private(page));
     get_page(newpage);

I think you can do:

@@ -797,11 +797,7 @@ static int __buffer_migrate_page(struct address_space *mapping,
 	if (rc != MIGRATEPAGE_SUCCESS)
 		goto unlock_buffers;
 
-	ClearPagePrivate(page);
-	set_page_private(newpage, page_private(page));
-	set_page_private(page, 0);
-	put_page(page);
-	get_page(newpage);
+	attach_page_private(newpage, detach_page_private(page));
 
 	bh = head;
 	do {
@@ -810,8 +806,6 @@ static int __buffer_migrate_page(struct address_space *mapping,
 
 	} while (bh != head);
 
-	SetPagePrivate(newpage);
-
 	if (mode != MIGRATE_SYNC_NO_COPY)

... but maybe there's a subtlety to the ordering of the setup of the bh
and setting PagePrivate that means what you have there is a better patch.


Yes, it is better, but I am not sure whether the order can be changed here.
It seems the original commit is this one.

commit e965f9630c651fa4249039fd4b80c9392d07a856
Author: Christoph Lameter 
Date:   Wed Feb 1 03:05:41 2006 -0800

    [PATCH] Direct Migration V9: Avoid writeback / page_migrate() method

    Migrate a page with buffers without requiring writeback

    This introduces a new address space operation migratepage() that may be
    used by a filesystem to implement its own version of page migration.

    A version is provided that migrates buffers attached to pages. Some
    filesystems (ext2, ext3, xfs) are modified to utilize this feature.

    The swapper address space operation are modified so that a regular
    migrate_page() will occur for anonymous pages without writeback
    (migrate_pages forces every anonymous page to have a swap entry).

I hope mm experts can take a look, so I am CCing more people and the mm
list. The question is whether we can set PagePrivate before setting up the
bh in __buffer_migrate_page. Thanks for any further input.

Thanks,
Guoqing


Re: [RFC PATCH V2 0/9] Introduce attach/clear_page_private to cleanup code

2020-05-01 Thread Guoqing Jiang

On 5/2/20 12:16 AM, Matthew Wilcox wrote:

On Thu, Apr 30, 2020 at 11:44:41PM +0200, Guoqing Jiang wrote:

   include/linux/pagemap.h: introduce attach/clear_page_private
   md: remove __clear_page_buffers and use attach/clear_page_private
   btrfs: use attach/clear_page_private
   fs/buffer.c: use attach/clear_page_private
   f2fs: use attach/clear_page_private
   iomap: use attach/clear_page_private
   ntfs: replace attach_page_buffers with attach_page_private
   orangefs: use attach/clear_page_private
   buffer_head.h: remove attach_page_buffers

I think mm/migrate.c could also use this:

 ClearPagePrivate(page);
 set_page_private(newpage, page_private(page));
 set_page_private(page, 0);
 put_page(page);
 get_page(newpage);



Thanks for checking! I assume the change below is appropriate.

diff --git a/mm/migrate.c b/mm/migrate.c
index 7160c1556f79..f214adfb3fa4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -797,10 +797,7 @@ static int __buffer_migrate_page(struct address_space *mapping,

    if (rc != MIGRATEPAGE_SUCCESS)
    goto unlock_buffers;

-   ClearPagePrivate(page);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
+   set_page_private(newpage, detach_page_private(page));
    get_page(newpage);

    bh = head;


Cheers,
Guoqing



Re: [RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private

2020-05-01 Thread Guoqing Jiang

On 5/1/20 3:49 AM, Al Viro wrote:

On Fri, May 01, 2020 at 02:42:29AM +0100, Al Viro wrote:

On Thu, Apr 30, 2020 at 03:13:38PM -0700, Matthew Wilcox wrote:


+/**
+ * clear_page_private - clear page's private field and PG_private.
+ * @page: page to be cleared.
+ *
+ * The counterpart function of attach_page_private.
+ * Return: private data of page or NULL if page doesn't have private data.
+ */

Seems to me that the opposite of "attach" is "detach", not "clear".

Or "remove", perhaps...

Actually, "detach" is better - neither "clear" nor "remove" imply "... and give
me what used to be attached there", as this thing is doing.


OK, it seems we have reached agreement on the new name ;-), I will
follow the suggestion.


Thanks & Regards,
Guoqing


Re: [RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private

2020-05-01 Thread Guoqing Jiang

On 5/1/20 12:13 AM, Matthew Wilcox wrote:

On Thu, Apr 30, 2020 at 11:44:42PM +0200, Guoqing Jiang wrote:

+/**
+ * attach_page_private - attach data to page's private field and set PG_private.
+ * @page: page to be attached and set flag.
+ * @data: data to attach to page's private field.
+ *
+ * Need to take reference as mm.h said "Setting PG_private should also
+ * increment the refcount".
+ */

I don't think this will read well when added to the API documentation.
Try this:

/**
  * attach_page_private - Attach private data to a page.
  * @page: Page to attach data to.
  * @data: Data to attach to page.
  *
  * Attaching private data to a page increments the page's reference count.
  * The data must be detached before the page will be freed.
  */


+/**
+ * clear_page_private - clear page's private field and PG_private.
+ * @page: page to be cleared.
+ *
+ * The counterpart function of attach_page_private.
+ * Return: private data of page or NULL if page doesn't have private data.
+ */

Seems to me that the opposite of "attach" is "detach", not "clear".

/**
  * detach_page_private - Detach private data from a page.
  * @page: Page to detach data from.
  *
  * Removes the data that was previously attached to the page and decrements
  * the refcount on the page.
  *
  * Return: Data that was attached to the page.
  */


Thank you very much, Matthew! I will change them in the next version.

Best Regards,
Guoqing


Re: [RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private

2020-05-01 Thread Guoqing Jiang

On 5/1/20 12:10 AM, Andreas Grünbacher wrote:

Hi,

On Thu, 30 Apr 2020 at 23:56, Guoqing Jiang wrote:

The logic in attach_page_buffers and __clear_page_buffers is closely
paired, but

1. they are located in different files.

2. attach_page_buffers is implemented in buffer_head.h, so it can be
used by other files. But __clear_page_buffers is a static function in
buffer.c, so other potential users can't call it; md-bitmap
even copied the function.

So, introduce the new attach/clear_page_private to replace them. With
the new pair of functions, we will remove the usage of attach_page_buffers
and __clear_page_buffers in the next patches. Thanks for the new names from
Christoph Hellwig.

Suggested-by: Matthew Wilcox 
Cc: Andrew Morton 
Cc: "Darrick J. Wong" 
Cc: William Kucharski 
Cc: "Kirill A. Shutemov" 
Cc: Andreas Gruenbacher 
Cc: Yang Shi 
Cc: Yafang Shao 
Cc: Song Liu 
Cc: linux-r...@vger.kernel.org
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Alexander Viro 
Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: Christoph Hellwig 
Cc: linux-...@vger.kernel.org
Cc: Anton Altaparmakov 
Cc: linux-ntfs-...@lists.sourceforge.net
Cc: Mike Marshall 
Cc: Martin Brandenburg 
Cc: de...@lists.orangefs.org
Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Roman Gushchin 
Cc: Andreas Dilger 
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2:  Address the comments from Christoph Hellwig
1. change function names to attach/clear_page_private and add comments.
2. change the return type of attach_page_private.

  include/linux/pagemap.h | 35 +++
  1 file changed, 35 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a8f7bd8ea1c6..2e515f210b18 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -205,6 +205,41 @@ static inline int page_cache_add_speculative(struct page *page, int count)
 return __page_cache_add_speculative(page, count);
  }

+/**
+ * attach_page_private - attach data to page's private field and set PG_private.
+ * @page: page to be attached and set flag.
+ * @data: data to attach to page's private field.
+ *
+ * Need to take reference as mm.h said "Setting PG_private should also
+ * increment the refcount".
+ */
+static inline void attach_page_private(struct page *page, void *data)
+{
+   get_page(page);
+   set_page_private(page, (unsigned long)data);
+   SetPagePrivate(page);
+}
+
+/**
+ * clear_page_private - clear page's private field and PG_private.
+ * @page: page to be cleared.
+ *
+ * The counterpart function of attach_page_private.
+ * Return: private data of page or NULL if page doesn't have private data.
+ */
+static inline void *clear_page_private(struct page *page)
+{
+   void *data = (void *)page_private(page);
+
+   if (!PagePrivate(page))
+   return NULL;
+   ClearPagePrivate(page);
+   set_page_private(page, 0);
+   put_page(page);
+
+   return data;
+}
+

I like this in general, but the name clear_page_private suggests that
this might be the inverse operation of set_page_private, which it is
not. So maybe this can be renamed to detach_page_private to more
clearly indicate that it pairs with attach_page_private?


Yes, the new name is better, thank you!

Cheers,
Guoqing


[RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private

2020-04-30 Thread Guoqing Jiang
The logic in attach_page_buffers and __clear_page_buffers is closely
paired, but

1. they are located in different files.

2. attach_page_buffers is implemented in buffer_head.h, so it can be
   used by other files. But __clear_page_buffers is a static function in
   buffer.c, so other potential users can't call it; md-bitmap even
   copied the function.

So, introduce the new attach/clear_page_private to replace them. With
the new pair of functions, we will remove the usage of attach_page_buffers
and __clear_page_buffers in the next patches. Thanks for the new names from
Christoph Hellwig.

Suggested-by: Matthew Wilcox 
Cc: Andrew Morton 
Cc: "Darrick J. Wong" 
Cc: William Kucharski 
Cc: "Kirill A. Shutemov" 
Cc: Andreas Gruenbacher 
Cc: Yang Shi 
Cc: Yafang Shao 
Cc: Song Liu 
Cc: linux-r...@vger.kernel.org
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Alexander Viro 
Cc: Jaegeuk Kim 
Cc: Chao Yu 
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: Christoph Hellwig 
Cc: linux-...@vger.kernel.org
Cc: Anton Altaparmakov 
Cc: linux-ntfs-...@lists.sourceforge.net
Cc: Mike Marshall 
Cc: Martin Brandenburg 
Cc: de...@lists.orangefs.org
Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Roman Gushchin 
Cc: Andreas Dilger 
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2:  Address the comments from Christoph Hellwig
1. change function names to attach/clear_page_private and add comments.
2. change the return type of attach_page_private.

 include/linux/pagemap.h | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a8f7bd8ea1c6..2e515f210b18 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -205,6 +205,41 @@ static inline int page_cache_add_speculative(struct page *page, int count)
return __page_cache_add_speculative(page, count);
 }
 
+/**
+ * attach_page_private - attach data to page's private field and set PG_private.
+ * @page: page to be attached and set flag.
+ * @data: data to attach to page's private field.
+ *
+ * Need to take reference as mm.h said "Setting PG_private should also
+ * increment the refcount".
+ */
+static inline void attach_page_private(struct page *page, void *data)
+{
+   get_page(page);
+   set_page_private(page, (unsigned long)data);
+   SetPagePrivate(page);
+}
+
+/**
+ * clear_page_private - clear page's private field and PG_private.
+ * @page: page to be cleared.
+ *
+ * The counterpart function of attach_page_private.
+ * Return: private data of page or NULL if page doesn't have private data.
+ */
+static inline void *clear_page_private(struct page *page)
+{
+   void *data = (void *)page_private(page);
+
+   if (!PagePrivate(page))
+   return NULL;
+   ClearPagePrivate(page);
+   set_page_private(page, 0);
+   put_page(page);
+
+   return data;
+}
+
 #ifdef CONFIG_NUMA
 extern struct page *__page_cache_alloc(gfp_t gfp);
 #else
-- 
2.17.1



[RFC PATCH V2 3/9] btrfs: use attach/clear_page_private

2020-04-30 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in btrfs.

Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. call attach_page_private(newpage, clear_page_private(page)) to
   cleanup code further as suggested by Dave Chinner.

 fs/btrfs/disk-io.c   |  4 +---
 fs/btrfs/extent_io.c | 21 ++---
 fs/btrfs/inode.c | 23 +--
 3 files changed, 12 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a6cb5cbbdb9f..fe4acf821110 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -980,9 +980,7 @@ static void btree_invalidatepage(struct page *page, unsigned int offset,
btrfs_warn(BTRFS_I(page->mapping->host)->root->fs_info,
   "page private not zero on page %llu",
   (unsigned long long)page_offset(page));
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
+   clear_page_private(page);
}
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 39e45b8a5031..095a5e83e660 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3076,22 +3076,16 @@ static int submit_extent_page(unsigned int opf,
 static void attach_extent_buffer_page(struct extent_buffer *eb,
  struct page *page)
 {
-   if (!PagePrivate(page)) {
-   SetPagePrivate(page);
-   get_page(page);
-   set_page_private(page, (unsigned long)eb);
-   } else {
+   if (!PagePrivate(page))
+   attach_page_private(page, eb);
+   else
WARN_ON(page->private != (unsigned long)eb);
-   }
 }
 
 void set_page_extent_mapped(struct page *page)
 {
-   if (!PagePrivate(page)) {
-   SetPagePrivate(page);
-   get_page(page);
-   set_page_private(page, EXTENT_PAGE_PRIVATE);
-   }
+   if (!PagePrivate(page))
+   attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
 }
 
 static struct extent_map *
@@ -4929,10 +4923,7 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 * We need to make sure we haven't be attached
 * to a new eb.
 */
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   /* One for the page private */
-   put_page(page);
+   clear_page_private(page);
}
 
if (mapped)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 320d1062068d..34b09ab2c32a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8303,11 +8303,8 @@ btrfs_readpages(struct file *file, struct address_space *mapping,
 static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
int ret = try_release_extent_mapping(page, gfp_flags);
-   if (ret == 1) {
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-   }
+   if (ret == 1)
+   clear_page_private(page);
return ret;
 }
 
@@ -8329,14 +8326,8 @@ static int btrfs_migratepage(struct address_space *mapping,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page)) {
-   ClearPagePrivate(page);
-   get_page(newpage);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
-   SetPagePrivate(newpage);
-   }
+   if (page_has_private(page))
+   attach_page_private(newpage, clear_page_private(page));
 
if (PagePrivate2(page)) {
ClearPagePrivate2(page);
@@ -8458,11 +8449,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
}
 
ClearPageChecked(page);
-   if (PagePrivate(page)) {
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-   }
+   clear_page_private(page);
 }
 
 /*
-- 
2.17.1



[RFC PATCH V2 8/9] orangefs: use attach/clear_page_private

2020-04-30 Thread Guoqing Jiang
Since the new pair of functions has been introduced, we can call them to
clean up the code in orangefs.

Cc: Mike Marshall 
Cc: Martin Brandenburg 
Cc: de...@lists.orangefs.org
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. avoid potential use-after-free as suggested by Dave Chinner.

 fs/orangefs/inode.c | 32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 12ae630fbed7..139c450aca68 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -62,12 +62,7 @@ static int orangefs_writepage_locked(struct page *page,
} else {
ret = 0;
}
-   if (wr) {
-   kfree(wr);
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
-   }
+   kfree(clear_page_private(page));
return ret;
 }
 
@@ -409,9 +404,7 @@ static int orangefs_write_begin(struct file *file,
wr->len = len;
wr->uid = current_fsuid();
wr->gid = current_fsgid();
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)wr);
-   get_page(page);
+   attach_page_private(page, wr);
 okay:
return 0;
 }
@@ -459,18 +452,12 @@ static void orangefs_invalidatepage(struct page *page,
wr = (struct orangefs_write_range *)page_private(page);
 
if (offset == 0 && length == PAGE_SIZE) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
+   kfree(clear_page_private(page));
return;
/* write range entirely within invalidate range (or equal) */
} else if (page_offset(page) + offset <= wr->pos &&
wr->pos + wr->len <= page_offset(page) + offset + length) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
+   kfree(clear_page_private(page));
/* XXX is this right? only caller in fs */
cancel_dirty_page(page);
return;
@@ -535,12 +522,7 @@ static int orangefs_releasepage(struct page *page, gfp_t foo)
 
 static void orangefs_freepage(struct page *page)
 {
-   if (PagePrivate(page)) {
-   kfree((struct orangefs_write_range *)page_private(page));
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   put_page(page);
-   }
+   kfree(clear_page_private(page));
 }
 
 static int orangefs_launder_page(struct page *page)
@@ -740,9 +722,7 @@ vm_fault_t orangefs_page_mkwrite(struct vm_fault *vmf)
wr->len = PAGE_SIZE;
wr->uid = current_fsuid();
wr->gid = current_fsgid();
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)wr);
-   get_page(page);
+   attach_page_private(page, wr);
 okay:
 
file_update_time(vmf->vma->vm_file);
-- 
2.17.1



[RFC PATCH V2 0/9] Introduce attach/clear_page_private to cleanup code

2020-04-30 Thread Guoqing Jiang
Hi,

Based on the previous thread [1], this patchset introduces attach_page_private
and clear_page_private to replace attach_page_buffers and __clear_page_buffers.
Thanks a lot for the constructive suggestions and comments from Christoph,
Matthew and Dave.

And sorry for cross-posting to different lists, since the series modifies
different subsystems.

RFC -> RFC V2:

1. rename the new functions and add comments for them.
2. change the return type of attach_page_private.
3. call attach_page_private(newpage, clear_page_private(page)) to cleanup code 
further.
4. avoid potential use-after-free in orangefs.

[1]. 
https://lore.kernel.org/linux-fsdevel/20200420221424.gh5...@bombadil.infradead.org/

Thanks,
Guoqing

Guoqing Jiang (9):
  include/linux/pagemap.h: introduce attach/clear_page_private
  md: remove __clear_page_buffers and use attach/clear_page_private
  btrfs: use attach/clear_page_private
  fs/buffer.c: use attach/clear_page_private
  f2fs: use attach/clear_page_private
  iomap: use attach/clear_page_private
  ntfs: replace attach_page_buffers with attach_page_private
  orangefs: use attach/clear_page_private
  buffer_head.h: remove attach_page_buffers

 drivers/md/md-bitmap.c  | 12 ++--
 fs/btrfs/disk-io.c  |  4 +---
 fs/btrfs/extent_io.c| 21 ++---
 fs/btrfs/inode.c| 23 +--
 fs/buffer.c | 16 
 fs/f2fs/f2fs.h  | 11 ++-
 fs/iomap/buffered-io.c  | 19 ---
 fs/ntfs/aops.c  |  2 +-
 fs/ntfs/mft.c   |  2 +-
 fs/orangefs/inode.c | 32 ++--
 include/linux/buffer_head.h |  8 
 include/linux/pagemap.h | 35 +++
 12 files changed, 67 insertions(+), 118 deletions(-)
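
For reference, a sketch of the two helpers this cover letter refers to,
reconstructed from how the callers in the patches below use them (the exact
implementation in patch 1/9 may differ in its details):

	static inline void attach_page_private(struct page *page, void *data)
	{
		/* take a page reference on behalf of the private data,
		 * as the open-coded sites being replaced did
		 */
		get_page(page);
		set_page_private(page, (unsigned long)data);
		SetPagePrivate(page);
	}

	static inline void *clear_page_private(struct page *page)
	{
		void *data;

		if (!PagePrivate(page))
			return NULL;
		data = (void *)page_private(page);
		/* detach the data and drop the reference taken at attach time */
		ClearPagePrivate(page);
		set_page_private(page, 0);
		put_page(page);

		return data;
	}

Returning the old private data from clear_page_private is what enables the
kfree(clear_page_private(page)) and
attach_page_private(newpage, clear_page_private(page)) patterns used in the
patches below.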

-- 
2.17.1



[RFC PATCH V2 2/9] md: remove __clear_page_buffers and use attach/clear_page_private

2020-04-30 Thread Guoqing Jiang
After introducing attach/clear_page_private in pagemap.h, we can remove
the duplicated code and call the new functions.

Cc: Song Liu 
Cc: linux-r...@vger.kernel.org
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 drivers/md/md-bitmap.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index b952bd45bd6a..033d12063600 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -324,14 +324,6 @@ static void end_bitmap_write(struct buffer_head *bh, int uptodate)
	wake_up(&bitmap->write_wait);
 }
 
-/* copied from buffer.c */
-static void
-__clear_page_buffers(struct page *page)
-{
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-}
 static void free_buffers(struct page *page)
 {
struct buffer_head *bh;
@@ -345,7 +337,7 @@ static void free_buffers(struct page *page)
free_buffer_head(bh);
bh = next;
}
-   __clear_page_buffers(page);
+   clear_page_private(page);
put_page(page);
 }
 
@@ -374,7 +366,7 @@ static int read_page(struct file *file, unsigned long index,
ret = -ENOMEM;
goto out;
}
-   attach_page_buffers(page, bh);
+   attach_page_private(page, bh);
blk_cur = index << (PAGE_SHIFT - inode->i_blkbits);
while (bh) {
block = blk_cur;
-- 
2.17.1



[RFC PATCH V2 4/9] fs/buffer.c: use attach/clear_page_private

2020-04-30 Thread Guoqing Jiang
Since the new pair of functions is introduced, we can call them to clean up
the code in buffer.c.

Cc: Alexander Viro 
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 fs/buffer.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index a60f60396cfa..60dd61384b13 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -123,14 +123,6 @@ void __wait_on_buffer(struct buffer_head * bh)
 }
 EXPORT_SYMBOL(__wait_on_buffer);
 
-static void
-__clear_page_buffers(struct page *page)
-{
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
-}
-
 static void buffer_io_error(struct buffer_head *bh, char *msg)
 {
	if (!test_bit(BH_Quiet, &bh->b_state))
@@ -906,7 +898,7 @@ link_dev_buffers(struct page *page, struct buffer_head *head)
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
 }
 
 static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
@@ -1580,7 +1572,7 @@ void create_empty_buffers(struct page *page,
bh = bh->b_this_page;
} while (bh != head);
}
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
	spin_unlock(&page->mapping->private_lock);
 }
 EXPORT_SYMBOL(create_empty_buffers);
@@ -2567,7 +2559,7 @@ static void attach_nobh_buffers(struct page *page, struct buffer_head *head)
bh->b_this_page = head;
bh = bh->b_this_page;
} while (bh != head);
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
	spin_unlock(&page->mapping->private_lock);
 }
 
@@ -3227,7 +3219,7 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
bh = next;
} while (bh != head);
*buffers_to_free = head;
-   __clear_page_buffers(page);
+   clear_page_private(page);
return 1;
 failed:
return 0;
-- 
2.17.1



[RFC PATCH V2 6/9] iomap: use attach/clear_page_private

2020-04-30 Thread Guoqing Jiang
Since the new pair of functions is introduced, we can call them to clean up
the code in iomap.

Cc: Christoph Hellwig 
Cc: "Darrick J. Wong" 
Cc: linux-...@vger.kernel.org
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.
2. call attach_page_private(newpage, clear_page_private(page)) to
   cleanup code further as suggested by Matthew Wilcox.
3. don't return attach_page_private in iomap_page_create per the
   comment from Christoph Hellwig.

 fs/iomap/buffered-io.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 89e21961d1ad..cf4c1b02a9d8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -59,24 +59,19 @@ iomap_page_create(struct inode *inode, struct page *page)
 * migrate_page_move_mapping() assumes that pages with private data have
 * their count elevated by 1.
 */
-   get_page(page);
-   set_page_private(page, (unsigned long)iop);
-   SetPagePrivate(page);
+   attach_page_private(page, iop);
return iop;
 }
 
 static void
 iomap_page_release(struct page *page)
 {
-   struct iomap_page *iop = to_iomap_page(page);
+   struct iomap_page *iop = clear_page_private(page);
 
if (!iop)
return;
	WARN_ON_ONCE(atomic_read(&iop->read_count));
	WARN_ON_ONCE(atomic_read(&iop->write_count));
-   ClearPagePrivate(page);
-   set_page_private(page, 0);
-   put_page(page);
kfree(iop);
 }
 
@@ -554,14 +549,8 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page)) {
-   ClearPagePrivate(page);
-   get_page(newpage);
-   set_page_private(newpage, page_private(page));
-   set_page_private(page, 0);
-   put_page(page);
-   SetPagePrivate(newpage);
-   }
+   if (page_has_private(page))
+   attach_page_private(newpage, clear_page_private(page));
 
if (mode != MIGRATE_SYNC_NO_COPY)
migrate_page_copy(newpage, page);
-- 
2.17.1



[RFC PATCH V2 7/9] ntfs: replace attach_page_buffers with attach_page_private

2020-04-30 Thread Guoqing Jiang
Call the new function since attach_page_buffers will be removed.

Cc: Anton Altaparmakov 
Cc: linux-ntfs-...@lists.sourceforge.net
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2
1. change the name of new function to attach_page_private.

 fs/ntfs/aops.c | 2 +-
 fs/ntfs/mft.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index 554b744f41bf..bb0a43860ad2 100644
--- a/fs/ntfs/aops.c
+++ b/fs/ntfs/aops.c
@@ -1732,7 +1732,7 @@ void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs) {
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
} else
buffers_to_free = bh;
}
diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
index 3aac5c917afe..fbb9f1bc623d 100644
--- a/fs/ntfs/mft.c
+++ b/fs/ntfs/mft.c
@@ -504,7 +504,7 @@ int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
bh = bh->b_this_page;
} while (bh);
tail->b_this_page = head;
-   attach_page_buffers(page, head);
+   attach_page_private(page, head);
}
bh = head = page_buffers(page);
BUG_ON(!bh);
-- 
2.17.1



[RFC PATCH V2 5/9] f2fs: use attach/clear_page_private

2020-04-30 Thread Guoqing Jiang
Since the new pair of functions is introduced, we can call them to clean up
the code in f2fs.h.

Cc: Jaegeuk Kim 
Cc: linux-f2fs-de...@lists.sourceforge.net
Acked-by: Chao Yu 
Signed-off-by: Guoqing Jiang 
---
RFC -> RFC V2
1. change the name of new functions to attach/clear_page_private.

 fs/f2fs/f2fs.h | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ba470d5687fe..24d22bd7352d 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3051,19 +3051,12 @@ static inline void f2fs_set_page_private(struct page *page,
if (PagePrivate(page))
return;
 
-   get_page(page);
-   SetPagePrivate(page);
-   set_page_private(page, data);
+   attach_page_private(page, (void *)data);
 }
 
 static inline void f2fs_clear_page_private(struct page *page)
 {
-   if (!PagePrivate(page))
-   return;
-
-   set_page_private(page, 0);
-   ClearPagePrivate(page);
-   f2fs_put_page(page, 0);
+   clear_page_private(page);
 }
 
 /*
-- 
2.17.1



[RFC PATCH V2 9/9] buffer_head.h: remove attach_page_buffers

2020-04-30 Thread Guoqing Jiang
All the callers have replaced attach_page_buffers with the new function
attach_page_private, so remove it.

Cc: Thomas Gleixner 
Cc: Sebastian Andrzej Siewior 
Cc: Roman Gushchin 
Cc: Andreas Dilger 
Signed-off-by: Guoqing Jiang 
---
 include/linux/buffer_head.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 15b765a181b8..22fb11e2d2e0 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -272,14 +272,6 @@ void buffer_init(void);
  * inline definitions
  */
 
-static inline void attach_page_buffers(struct page *page,
-   struct buffer_head *head)
-{
-   get_page(page);
-   SetPagePrivate(page);
-   set_page_private(page, (unsigned long)head);
-}
-
 static inline void get_bh(struct buffer_head *bh)
 {
	atomic_inc(&bh->b_count);
-- 
2.17.1



Re: [PATCH] md: Fix failed allocation of md_register_thread

2019-03-07 Thread Guoqing Jiang




On 3/5/19 6:48 AM, Aditya Pakki wrote:

mddev->sync_thread can be set to NULL on kzalloc failure downstream.
The patch checks for such a scenario and frees allocated resources.

Signed-off-by: Aditya Pakki 
---
  drivers/md/raid10.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index abb5d382f64d..f52b4d9bcd24 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3939,6 +3939,8 @@ static int raid10_run(struct mddev *mddev)
	set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
mddev->sync_thread = md_register_thread(md_do_sync, mddev,
"reshape");
+   if (!mddev->sync_thread)
+   goto out_free_conf;
}
  
  	return 0;


Could you make the change for raid5 as well? It also doesn't check the failure
in raid5_run.
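
For reference, a minimal sketch of the corresponding raid5_run change,
assuming raid5_run's existing cleanup path is reachable via its abort label
(mirroring the raid10 hunk above; the actual fix may differ):

	if (mddev->reshape_position != MaxSector) {
		set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
		mddev->sync_thread = md_register_thread(md_do_sync, mddev,
							"reshape");
		if (!mddev->sync_thread)
			goto abort;	/* reuse the existing cleanup path */
	}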

Thanks,
Guoqing


Re: linux 4.19.19: md0_raid:1317 blocked for more than 120 seconds.

2019-02-14 Thread Guoqing Jiang




On 2/14/19 11:27 PM, Wolfgang Walter wrote:

On Thursday, 14 February 2019, 10:09:56, Guoqing Jiang wrote:

On 2/12/19 7:20 PM, Wolfgang Walter wrote:

On Tuesday, 12 February 2019, 16:20:11, Guoqing Jiang wrote:

On 2/11/19 11:12 PM, Wolfgang Walter wrote:

With 4.19.19 we see sometimes the following issue (practically only with
blk_mq, though):

Feb  4 20:04:46 tettnang kernel: [252300.060165] INFO: task
md0_raid1:317
blocked for more than 120 seconds. Feb  4 20:04:46 tettnang kernel:
[252300.060188]   Not tainted 4.19.19-debian64.all+1.1 #1 Feb  4
20:04:46 tettnang kernel: [252300.060197] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb  4
20:04:46 tettnang kernel: [252300.060207] md0_raid1   D0   317

2 0x8000 Feb  4 20:04:46 tettnang kernel: [252300.060211] Call

Trace:
Feb  4 20:04:46 tettnang kernel: [252300.060222]  ?
__schedule+0x2a2/0x8c0
Feb  4 20:04:46 tettnang kernel: [252300.060226]  ?
_raw_spin_unlock_irqrestore+0x20/0x40 Feb  4 20:04:46 tettnang kernel:
[252300.060229]  schedule+0x32/0x90 Feb  4 20:04:46 tettnang kernel:
[252300.060241]  md_super_wait+0x69/0xa0 [md_mod] Feb  4 20:04:46
tettnang kernel: [252300.060247]  ? finish_wait+0x80/0x80 Feb  4
20:04:46
tettnang kernel: [252300.060255]  md_bitmap_wait_writes+0x8e/0xa0
[md_mod] Feb  4 20:04:46 tettnang kernel: [252300.060263]  ?
md_bitmap_get_counter+0x42/0xd0 [md_mod] Feb  4 20:04:46 tettnang
kernel:
[252300.060271]  md_bitmap_daemon_work+0x1e8/0x380 [md_mod] Feb  4
20:04:46 tettnang kernel: [252300.060278]  ? md_rdev_init+0xb0/0xb0
[md_mod] Feb  4 20:04:46 tettnang kernel: [252300.060285]
md_check_recovery+0x26/0x540 [md_mod] Feb  4 20:04:46 tettnang kernel:
[252300.060290]  raid1d+0x5c/0xf00 [raid1] Feb  4 20:04:46 tettnang
kernel: [252300.060294]  ? preempt_count_add+0x79/0xb0 Feb  4 20:04:46
tettnang kernel: [252300.060298]  ? lock_timer_base+0x67/0x80 Feb  4
20:04:46 tettnang kernel: [252300.060302]  ?
_raw_spin_unlock_irqrestore+0x20/0x40 Feb  4 20:04:46 tettnang kernel:
[252300.060304]  ? try_to_del_timer_sync+0x4d/0x80 Feb  4 20:04:46
tettnang kernel: [252300.060306]  ? del_timer_sync+0x35/0x40 Feb  4
20:04:46 tettnang kernel: [252300.060309]  ?
schedule_timeout+0x17a/0x3b0
Feb  4 20:04:46 tettnang kernel: [252300.060312]  ?
preempt_count_add+0x79/0xb0 Feb  4 20:04:46 tettnang kernel:
[252300.060315]  ? _raw_spin_lock_irqsave+0x25/0x50 Feb  4 20:04:46
tettnang kernel: [252300.060321]  ? md_rdev_init+0xb0/0xb0 [md_mod] Feb
4 20:04:46 tettnang kernel: [252300.060327]  ? md_thread+0xf9/0x160
[md_mod] Feb  4 20:04:46 tettnang kernel: [252300.060330]  ?
r1bio_pool_alloc+0x20/0x20 [raid1] Feb  4 20:04:46 tettnang kernel:
[252300.060336]  md_thread+0xf9/0x160 [md_mod] Feb  4 20:04:46 tettnang
kernel: [252300.060340]  ? finish_wait+0x80/0x80 Feb  4 20:04:46
tettnang
kernel: [252300.060344]  kthread+0x112/0x130 Feb  4 20:04:46 tettnang
kernel: [252300.060346]  ? kthread_create_worker_on_cpu+0x70/0x70 Feb  4
20:04:46 tettnang kernel: [252300.060350]  ret_from_fork+0x35/0x40

I saw that there was a similar problem with raid10 and an upstream patch

e820d55cb99dd93ac2dc949cf486bb187e5cd70d
md: fix raid10 hang issue caused by barrier
by  Guoqing Jiang

I wonder if there is a similar fix needed for raid1?

Seems not; the calltrace tells us that the previous write-superblock IO did
not finish as expected. There is a report for raid5 which has a similar
problem with md_super_wait in the link [1]. Maybe
you can disable blk-mq to narrow down the issue as well.

I already did for 4 weeks. I didn't see this with blk-mq disabled (for scsi
and md), though this may be by luck.

Then I guess it may be related to blk-mq. Which scheduler are you using
with blk-mq?
Maybe you can switch it to see whether the issue is caused by a specific
scheduler or not.

mq-deadline for SCSI and none for md and dm.


Can you try the patch at [1], in case the blockage was caused by flush?

[1]: https://patchwork.kernel.org/patch/10787903/


Thanks,
Guoqing


Re: linux 4.19.19: md0_raid:1317 blocked for more than 120 seconds.

2019-02-13 Thread Guoqing Jiang




On 2/12/19 7:20 PM, Wolfgang Walter wrote:

On Tuesday, 12 February 2019, 16:20:11, Guoqing Jiang wrote:

On 2/11/19 11:12 PM, Wolfgang Walter wrote:

With 4.19.19 we see sometimes the following issue (practically only with
blk_mq, though):

Feb  4 20:04:46 tettnang kernel: [252300.060165] INFO: task md0_raid1:317
blocked for more than 120 seconds. Feb  4 20:04:46 tettnang kernel:
[252300.060188]   Not tainted 4.19.19-debian64.all+1.1 #1 Feb  4
20:04:46 tettnang kernel: [252300.060197] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message. Feb  4
20:04:46 tettnang kernel: [252300.060207] md0_raid1   D0   317
   2 0x8000 Feb  4 20:04:46 tettnang kernel: [252300.060211] Call
Trace:
Feb  4 20:04:46 tettnang kernel: [252300.060222]  ? __schedule+0x2a2/0x8c0
Feb  4 20:04:46 tettnang kernel: [252300.060226]  ?
_raw_spin_unlock_irqrestore+0x20/0x40 Feb  4 20:04:46 tettnang kernel:
[252300.060229]  schedule+0x32/0x90 Feb  4 20:04:46 tettnang kernel:
[252300.060241]  md_super_wait+0x69/0xa0 [md_mod] Feb  4 20:04:46
tettnang kernel: [252300.060247]  ? finish_wait+0x80/0x80 Feb  4 20:04:46
tettnang kernel: [252300.060255]  md_bitmap_wait_writes+0x8e/0xa0
[md_mod] Feb  4 20:04:46 tettnang kernel: [252300.060263]  ?
md_bitmap_get_counter+0x42/0xd0 [md_mod] Feb  4 20:04:46 tettnang kernel:
[252300.060271]  md_bitmap_daemon_work+0x1e8/0x380 [md_mod] Feb  4
20:04:46 tettnang kernel: [252300.060278]  ? md_rdev_init+0xb0/0xb0
[md_mod] Feb  4 20:04:46 tettnang kernel: [252300.060285]
md_check_recovery+0x26/0x540 [md_mod] Feb  4 20:04:46 tettnang kernel:
[252300.060290]  raid1d+0x5c/0xf00 [raid1] Feb  4 20:04:46 tettnang
kernel: [252300.060294]  ? preempt_count_add+0x79/0xb0 Feb  4 20:04:46
tettnang kernel: [252300.060298]  ? lock_timer_base+0x67/0x80 Feb  4
20:04:46 tettnang kernel: [252300.060302]  ?
_raw_spin_unlock_irqrestore+0x20/0x40 Feb  4 20:04:46 tettnang kernel:
[252300.060304]  ? try_to_del_timer_sync+0x4d/0x80 Feb  4 20:04:46
tettnang kernel: [252300.060306]  ? del_timer_sync+0x35/0x40 Feb  4
20:04:46 tettnang kernel: [252300.060309]  ? schedule_timeout+0x17a/0x3b0
Feb  4 20:04:46 tettnang kernel: [252300.060312]  ?
preempt_count_add+0x79/0xb0 Feb  4 20:04:46 tettnang kernel:
[252300.060315]  ? _raw_spin_lock_irqsave+0x25/0x50 Feb  4 20:04:46
tettnang kernel: [252300.060321]  ? md_rdev_init+0xb0/0xb0 [md_mod] Feb
4 20:04:46 tettnang kernel: [252300.060327]  ? md_thread+0xf9/0x160
[md_mod] Feb  4 20:04:46 tettnang kernel: [252300.060330]  ?
r1bio_pool_alloc+0x20/0x20 [raid1] Feb  4 20:04:46 tettnang kernel:
[252300.060336]  md_thread+0xf9/0x160 [md_mod] Feb  4 20:04:46 tettnang
kernel: [252300.060340]  ? finish_wait+0x80/0x80 Feb  4 20:04:46 tettnang
kernel: [252300.060344]  kthread+0x112/0x130 Feb  4 20:04:46 tettnang
kernel: [252300.060346]  ? kthread_create_worker_on_cpu+0x70/0x70 Feb  4
20:04:46 tettnang kernel: [252300.060350]  ret_from_fork+0x35/0x40

I saw that there was a similar problem with raid10 and an upstream patch

e820d55cb99dd93ac2dc949cf486bb187e5cd70d
md: fix raid10 hang issue caused by barrier
by  Guoqing Jiang

I wonder if there is a similar fix needed for raid1?

Seems not; the calltrace tells us that the previous write-superblock IO did
not finish as expected. There is a report for raid5 which has a similar
problem with md_super_wait in the link [1]. Maybe
you can disable blk-mq to narrow down the issue as well.

I already did for 4 weeks. I didn't see this with blk-mq disabled (for scsi
and md), though this may be by luck.


Then I guess it may be related to blk-mq. Which scheduler are you using
with blk-mq?
Maybe you can switch it to see whether the issue is caused by a specific
scheduler or not.



[1] https://bbs.archlinux.org/viewtopic.php?id=243520

I found this bug report in debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904822


Thanks, the bug report also said it didn't happen after disabling blk-mq.

Regards,
Guoqing


Re: [PATCH] drivers/md.c: Make bio_alloc_mddev return bio_alloc_bioset

2019-01-11 Thread Guoqing Jiang




On 1/11/19 10:17 AM, Marcos Paulo de Souza wrote:

ping?

On Sat, Dec 22, 2018 at 08:08:45AM -0200, Marcos Paulo de Souza wrote:

bio_alloc_bioset returns a bio pointer or NULL, so we can avoid storing
the returned value in a new variable.

Signed-off-by: Marcos Paulo de Souza 


Add Jens to cc list, anyway, Acked-by: Guoqing Jiang 

Thanks,
Guoqing


---
  drivers/md/md.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index fc488cb30a94..42e018f014cb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -207,15 +207,10 @@ static bool create_on_open = true;
  struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
struct mddev *mddev)
  {
-   struct bio *b;
-
	if (!mddev || !bioset_initialized(&mddev->bio_set))
return bio_alloc(gfp_mask, nr_iovecs);
  
-	b = bio_alloc_bioset(gfp_mask, nr_iovecs, &mddev->bio_set);

-   if (!b)
-   return NULL;
-   return b;
+	return bio_alloc_bioset(gfp_mask, nr_iovecs, &mddev->bio_set);
  }
  EXPORT_SYMBOL_GPL(bio_alloc_mddev);
  
--

2.16.4





Re: Shaohua Li

2019-01-02 Thread Guoqing Jiang




On 1/3/19 1:13 AM, Jens Axboe wrote:

Hi,

I've got some very sad news to share with you - over Christmas, Shaohua
Li passed away after battling cancer for most of last year.


It is really sad news and a big loss for the community, considering Shaohua's
great contributions!


As you know, Shaohua acted as the maintainer for md. He remained
dedicated to that through the difficult times. With his passing, we
obviously have a void that needs to be filled.


It would be better to have a small team maintain md in the future, since nobody
has knowledge as broad and deep as Neil's or Shaohua's; people in the team
could also cover for and back up each other, see [1].


For now, I'll act as interim maintainer for md until we can find a good
permanent solution. I'll queue up reviewed patches in a separate branch,
in the block tree, and help review fixes. I'm subscribed to linux-raid,
but please do CC me as well.


Thanks. I think there are some patches in shli/md/for-next which are supposed
to be merged for 4.21; could you please help merge them into your branch?
And I suppose you will send a patch adding Shaohua to CREDITS.

[1]. https://lkml.org/lkml/2015/12/21/9

Regards,
Guoqing


Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too

2018-01-09 Thread Guoqing Jiang



On 01/10/2018 02:13 PM, Paolo Valente wrote:



On 10 Jan 2018, at 02:41, Guoqing Jiang <gqji...@suse.com> wrote:



On 01/09/2018 05:27 PM, Paolo Valente wrote:

For each pair [device for which bfq is selected as I/O scheduler,
group in blkio/io], bfq maintains a corresponding bfq group. Each such
bfq group contains a set of async queues, with each async queue
created on demand, i.e., when some I/O request arrives for it.  On
creation, an async queue gets an extra reference, to make sure that
the queue is not freed as long as its bfq group exists.  Accordingly,
to allow the queue to be freed after the group exited, this extra
reference must be released on group exit.

The above holds also for a bfq root group, i.e., for the bfq group
corresponding to the root blkio/io root for a given device. Yet, by
mistake, the references to the existing async queues of a root group
are not released when the latter exits. This causes a memory leak when
the instance of bfq for a given device exits. In a similar vein,
bfqg_stats_xfer_dead is not executed for a root group.

This commit fixes bfq_pd_offline so that the latter executes the above
missing operations for a root group too.

Reported-by: Holger Hoffstätte <hol...@applied-asynchrony.com>
Reported-by: Guoqing Jiang <gqji...@suse.com>
Signed-off-by: Davide Ferrari <davideferra...@gmail.com>
Signed-off-by: Paolo Valente <paolo.vale...@linaro.org>
---
  block/bfq-cgroup.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..d819dc77fe65 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -775,10 +775,11 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
unsigned long flags;
int i;
+	spin_lock_irqsave(&bfqd->lock, flags);
+
if (!entity) /* root group */
-   return;
+   goto put_async_queues;
-	spin_lock_irqsave(&bfqd->lock, flags);
/*
 * Empty all service_trees belonging to this group before
 * deactivating the group itself.
@@ -809,6 +810,8 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
}
__bfq_deactivate_entity(entity, false);
+
+put_async_queues:
bfq_put_async_queues(bfqd, bfqg);
	spin_unlock_irqrestore(&bfqd->lock, flags);

With this change, bfqg_stats_xfer_dead will be called even when the entity
does not exist,

Actually, the entity associated with the bfq group being offlined
exists even if the local variable entity is NULL here.  Simply, that
variable is NULL if the bfq group is the bfq root group for a
device.


is it necessary?

No, I opted for this solution so as not to shake the code too much, and
considering that:
- bfqg_stats_xfer_dead simply does nothing if applied to a root group;
- who knows, in the future that function may need to be invoked for a
root group too.

Yet, if you guys think that it would be cleaner to not invoke
bfqg_stats_xfer_dead at all for a root group, I'll change the code
accordingly (this would introduce a little asymmetry with
cfq_pd_offline, which invokes cfqg_stats_xfer_dead unconditionally).
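
For reference, the local variable in question is NULL for a root group
because of how bfq_pd_offline() starts, paraphrased below (the exact code may
differ): bfq never sets my_entity for the root group, even though the group
and its async queues exist.

	struct bfq_group *bfqg = pd_to_bfqg(pd);
	struct bfq_data *bfqd = bfqg->bfqd;
	struct bfq_entity *entity = bfqg->my_entity;	/* NULL for the root group */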


Thanks a lot for the explanation, I am fine with it.

Acked-by: Guoqing Jiang <gqji...@suse.com>

Regards,
Guoqing


Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too

2018-01-09 Thread Guoqing Jiang



On 01/09/2018 05:27 PM, Paolo Valente wrote:

For each pair [device for which bfq is selected as I/O scheduler,
group in blkio/io], bfq maintains a corresponding bfq group. Each such
bfq group contains a set of async queues, with each async queue
created on demand, i.e., when some I/O request arrives for it.  On
creation, an async queue gets an extra reference, to make sure that
the queue is not freed as long as its bfq group exists.  Accordingly,
to allow the queue to be freed after the group exited, this extra
reference must be released on group exit.

The above holds also for a bfq root group, i.e., for the bfq group
corresponding to the root blkio/io root for a given device. Yet, by
mistake, the references to the existing async queues of a root group
are not released when the latter exits. This causes a memory leak when
the instance of bfq for a given device exits. In a similar vein,
bfqg_stats_xfer_dead is not executed for a root group.

This commit fixes bfq_pd_offline so that the latter executes the above
missing operations for a root group too.

Reported-by: Holger Hoffstätte <hol...@applied-asynchrony.com>
Reported-by: Guoqing Jiang <gqji...@suse.com>
Signed-off-by: Davide Ferrari <davideferra...@gmail.com>
Signed-off-by: Paolo Valente <paolo.vale...@linaro.org>
---
  block/bfq-cgroup.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..d819dc77fe65 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -775,10 +775,11 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
unsigned long flags;
int i;
  
+	spin_lock_irqsave(&bfqd->lock, flags);

+
if (!entity) /* root group */
-   return;
+   goto put_async_queues;
  
-	spin_lock_irqsave(&bfqd->lock, flags);

/*
 * Empty all service_trees belonging to this group before
 * deactivating the group itself.
@@ -809,6 +810,8 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
}
  
  	__bfq_deactivate_entity(entity, false);

+
+put_async_queues:
bfq_put_async_queues(bfqd, bfqg);
  
	spin_unlock_irqrestore(&bfqd->lock, flags);


With this change, bfqg_stats_xfer_dead will be called even when the entity
does not exist,

is it necessary? Thanks.

Regards,
Guoqing


[PATCH] ppdev: remove unused ROUND_UP macro

2017-07-14 Thread Guoqing Jiang
This macro is not used after commit 3b9ab374a1e6
("ppdev: convert to y2038 safe"), so let's remove
it.

Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 drivers/char/ppdev.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/char/ppdev.c b/drivers/char/ppdev.c
index 3e73bcd..d256110 100644
--- a/drivers/char/ppdev.c
+++ b/drivers/char/ppdev.c
@@ -101,9 +101,6 @@ static DEFINE_IDA(ida_index);
 #define PP_BUFFER_SIZE 1024
 #define PARDEVICE_MAX 8
 
-/* ROUND_UP macro from fs/select.c */
-#define ROUND_UP(x,y) (((x)+(y)-1)/(y))
-
 static DEFINE_MUTEX(pp_do_mutex);
 
 /* define fixed sized ioctl cmd for y2038 migration */
-- 
2.6.6



Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

2017-06-28 Thread Guoqing Jiang



On 06/28/2017 12:22 AM, Ming Lei wrote:



Seems the above section is similar to reset_bvec_table introduced in the next
patch; why is there a difference between raid1 and raid10? Maybe adding
reset_bvec_table to md.c, then calling it in both raid1 and raid10, would be
better; just my 2 cents.

Hi Guoqing,

I think it is a good idea, and even the two patches can be put into
one, so how about the following patch?


Looks good.

Acked-by: Guoqing Jiang <gqji...@suse.com>

Thanks,
Guoqing



Shaohua, what do you think of this one?

---
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3d957ac1e109..7ffc622dd7fa 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9126,6 +9126,24 @@ void md_reload_sb(struct mddev *mddev, int nr)
  }
  EXPORT_SYMBOL(md_reload_sb);
  
+/* generally called after bio_reset() for resetting bvec */

+void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size)
+{
+   /* initialize bvec table again */
+   rp->idx = 0;
+   do {
+   struct page *page = resync_fetch_page(rp, rp->idx++);
+   int len = min_t(int, size, PAGE_SIZE);
+
+   /*
+* won't fail because the vec table is big
+* enough to hold all these pages
+*/
+   bio_add_page(bio, page, len, 0);
+   size -= len;
+   } while (rp->idx < RESYNC_PAGES && size > 0);
+}
+
  #ifndef MODULE
  
  /*

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 991f0fe2dcc6..f569831b1de9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -783,4 +783,6 @@ static inline struct page *resync_fetch_page(struct resync_pages *rp,
return NULL;
return rp->pages[idx];
  }
+
+void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size);
  #endif /* _MD_MD_H */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3febfc8391fb..6ae2665ecd58 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2086,10 +2086,7 @@ static void process_checks(struct r1bio *r1_bio)
/* Fix variable parts of all bios */
vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
for (i = 0; i < conf->raid_disks * 2; i++) {
-   int j;
-   int size;
blk_status_t status;
-   struct bio_vec *bi;
struct bio *b = r1_bio->bios[i];
struct resync_pages *rp = get_resync_pages(b);
if (b->bi_end_io != end_sync_read)
@@ -2098,8 +2095,6 @@ static void process_checks(struct r1bio *r1_bio)
status = b->bi_status;
bio_reset(b);
b->bi_status = status;
-   b->bi_vcnt = vcnt;
-   b->bi_iter.bi_size = r1_bio->sectors << 9;
b->bi_iter.bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
b->bi_bdev = conf->mirrors[i].rdev->bdev;
@@ -2107,15 +2102,8 @@ static void process_checks(struct r1bio *r1_bio)
rp->raid_bio = r1_bio;
b->bi_private = rp;
  
-		size = b->bi_iter.bi_size;

-   bio_for_each_segment_all(bi, b, j) {
-   bi->bv_offset = 0;
-   if (size > PAGE_SIZE)
-   bi->bv_len = PAGE_SIZE;
-   else
-   bi->bv_len = size;
-   size -= PAGE_SIZE;
-   }
+   /* initialize bvec table again */
+   reset_bvec_table(b, rp, r1_bio->sectors << 9);
}
for (primary = 0; primary < conf->raid_disks * 2; primary++)
if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5026e7ad51d3..72f4fa07449b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2087,8 +2087,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
rp = get_resync_pages(tbio);
bio_reset(tbio);
  
-		tbio->bi_vcnt = vcnt;

-   tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
+   reset_bvec_table(tbio, rp, fbio->bi_iter.bi_size);
+
rp->raid_bio = r10_bio;
tbio->bi_private = rp;
tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;

Thanks,
Ming





Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

2017-06-27 Thread Guoqing Jiang



On 06/26/2017 08:09 PM, Ming Lei wrote:

We will support multipage bvec soon, so initialize the bvec
table using the standard way instead of writing the
table directly. Otherwise it won't work any more once
multipage bvec is enabled.

Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Signed-off-by: Ming Lei 
---
  drivers/md/raid1.c | 27 ++-
  1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3febfc8391fb..835c42396861 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2086,10 +2086,8 @@ static void process_checks(struct r1bio *r1_bio)
/* Fix variable parts of all bios */
vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
for (i = 0; i < conf->raid_disks * 2; i++) {
-   int j;
int size;
blk_status_t status;
-   struct bio_vec *bi;
struct bio *b = r1_bio->bios[i];
struct resync_pages *rp = get_resync_pages(b);
if (b->bi_end_io != end_sync_read)
@@ -2098,8 +2096,6 @@ static void process_checks(struct r1bio *r1_bio)
status = b->bi_status;
bio_reset(b);
b->bi_status = status;
-   b->bi_vcnt = vcnt;
-   b->bi_iter.bi_size = r1_bio->sectors << 9;
b->bi_iter.bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
b->bi_bdev = conf->mirrors[i].rdev->bdev;
@@ -2107,15 +2103,20 @@ static void process_checks(struct r1bio *r1_bio)
rp->raid_bio = r1_bio;
b->bi_private = rp;
  
-		size = b->bi_iter.bi_size;

-   bio_for_each_segment_all(bi, b, j) {
-   bi->bv_offset = 0;
-   if (size > PAGE_SIZE)
-   bi->bv_len = PAGE_SIZE;
-   else
-   bi->bv_len = size;
-   size -= PAGE_SIZE;
-   }
+   /* initialize bvec table again */
+   rp->idx = 0;
+   size = r1_bio->sectors << 9;
+   do {
+   struct page *page = resync_fetch_page(rp, rp->idx++);
+   int len = min_t(int, size, PAGE_SIZE);
+
+   /*
+* won't fail because the vec table is big
+* enough to hold all these pages
+*/
+   bio_add_page(b, page, len, 0);
+   size -= len;
+   } while (rp->idx < RESYNC_PAGES && size > 0);
}


Seems the above section is similar to reset_bvec_table introduced in the next
patch; why is there a difference between raid1 and raid10? Maybe adding
reset_bvec_table to md.c, then calling it in both raid1 and raid10, would be
better; just my 2 cents.

Thanks,
Guoqing


Re: [PATCH] md-cluster: Fix a memleak in an error handling path

2017-04-14 Thread Guoqing Jiang



On 04/14/2017 02:08 PM, Christophe JAILLET wrote:

We know that 'bm_lockres' is NULL here, so 'lockres_free(bm_lockres)' is a
no-op. According to resource handling in case of error a few lines below,
it is likely that 'bitmap_free(bitmap)' was expected instead.

Fixes: b98938d16a10 ("md-cluster: introduce cluster_check_sync_size")


Reviewed-by: Guoqing Jiang <gqji...@suse.com>

Thanks,
Guoqing



Signed-off-by: Christophe JAILLET <christophe.jail...@wanadoo.fr>
---
  drivers/md/md-cluster.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index b21ef58819f6..7299ce2f08a8 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -1127,7 +1127,7 @@ int cluster_check_sync_size(struct mddev *mddev)
bm_lockres = lockres_init(mddev, str, NULL, 1);
if (!bm_lockres) {
pr_err("md-cluster: Cannot initialize %s\n", str);
-   lockres_free(bm_lockres);
+   bitmap_free(bitmap);
return -1;
}
bm_lockres->flags |= DLM_LKF_NOQUEUE;




Re: [LSF/MM TOPIC] [LSF/MM ATTEND] md raid general discussion

2017-01-15 Thread Guoqing Jiang



On 01/10/2017 12:38 AM, Coly Li wrote:

Hi Folks,

I'd like to propose a general md raid discussion; it is quite necessary
for most of the active md raid developers to sit together to discuss the
current challenges of Linux software raid and development trends.

In the last few years, we have had many development activities in md raid, e.g.
raid5 cache, raid1 clustering, partial parity log, fast fail
upstreaming, and some effort on raid1 & raid0 performance improvement.

I see there is some functionality overlap between r5cache
(raid5 cache) and PPL (partial parity log); currently I have no idea
where we will go with these two development activities.
Also I have received reports from users that better raid1 performance is
desired when it is built on NVMe SSDs as a cache (maybe bcache or dm-cache).
I am working on some raid1 performance improvements (e.g. a new raid1 I/O
barrier and lockless raid1 I/O submit), and have some more ideas to discuss.

Therefore, if md raid developers have a chance to sit together and discuss
how to collaborate efficiently next year, it will be much more productive
than communicating on the mailing list.


I would like to attend the raid discussion. Besides the above topics, I think
we can talk about improving the test suite of mdadm to make it more robust
(I can share a related test suite which is used for clustered raid).

And I could share the status of clustered raid: what we have done
and what we can do in the future. Finally, I'd like to know/discuss
the roadmap of RAID.

Thanks a lot!
Guoqing


Re: ANNOUNCE: mdadm 4.0 - A tool for managing md Soft RAID under Linux

2017-01-11 Thread Guoqing Jiang



On 01/12/2017 12:59 AM, Jes Sorensen wrote:

On 01/11/17 11:52, Shaohua Li wrote:

On Tue, Jan 10, 2017 at 11:49:04AM -0600, Bruce Dubbs wrote:

Jes Sorensen wrote:

I am pleased to announce the availability of
 mdadm version 4.0

It is available at the usual places:
 http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
 git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
 http://git.kernel.org/cgit/utils/mdadm/

The update in major version number primarily indicates this is a
release by its new maintainer. In addition it contains a large number
of fixes in particular for IMSM RAID and clustered RAID support.  In
addition this release includes support for IMSM 4k sector drives,
failfast and better documentation for journaled RAID.

Thank you for the new release.  Unfortunately I get 9 failures running the
test suite:

tests/00raid1...  FAILED
tests/07autoassemble...   FAILED
tests/07changelevels...   FAILED
tests/07revert-grow...FAILED
tests/07revert-inplace... FAILED
tests/07testreshape5...   FAILED
tests/10ddf-fail-twice... FAILED
tests/20raid5journal...   FAILED
tests/10ddf-incremental-wrong-order...  FAILED

Yep, several tests usually fail. It appears some checks aren't always good.  At
least the 'check' function for reshape/resync isn't reliable in my tests; I saw
07changelevelintr fail frequently.

That is my experience as well - some of them are affected by the kernel
version too. We probably need to look into making them more reliable.


If possible, it could be a potential topic for the LSF/MM raid discussion, as
Coly suggested in a previous mail.

Can the current test suite run the tests for a given raid level? Say, "./test
--raidtype=raid1" could execute all the *r1* tests; does it make sense to add
that if we don't support it now?


Thanks,
Guoqing


Re: [PATCH 08/16] md/bitmap: Rename a jump label in location_store()

2016-09-28 Thread Guoqing Jiang



On 09/28/2016 03:55 PM, Jes Sorensen wrote:

SF Markus Elfring  writes:

From: Markus Elfring 
Date: Tue, 27 Sep 2016 15:46:22 +0200

Adjust jump labels according to the current Linux coding style convention.

Signed-off-by: Markus Elfring 
---
  drivers/md/bitmap.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

Sorry but this patch is just plain ridiculous. It does not improve the
code in any shape or form.

'out' as a label is perfectly legitimate and just as good as 'unlock'.


Agreed. I am also curious which document records this coding style convention.

Thanks,
Guoqing



Re: [PATCH 08/16] md/bitmap: Rename a jump label in location_store()

2016-09-28 Thread Guoqing Jiang



On 09/28/2016 03:55 PM, Jes Sorensen wrote:

SF Markus Elfring  writes:

From: Markus Elfring 
Date: Tue, 27 Sep 2016 15:46:22 +0200

Adjust jump labels according to the current Linux coding style convention.

Signed-off-by: Markus Elfring 
---
  drivers/md/bitmap.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

Sorry but this patch is just plain ridiculous. It does not improve the
code in any shape or form.

'out' as a label is perfectly legitimate and just as good as 'unlock'.


Agree, I also curious which document recorded the coding style convention.

Thanks,
Guoqing



Re: Question about commit f9a67b1182e5 ("md/bitmap: clear bitmap if bitmap_create failed").

2016-09-18 Thread Guoqing Jiang



On 09/14/2016 04:39 PM, Marion & Christophe JAILLET wrote:


I don't share your feeling.
bitmap_create() can return ERR_PTR(-ENOMEM) or ERR_PTR(-EINVAL).

In such cases 'if (!bitmap)' will not be helpful.

Maybe it should be turned into 'if (IS_ERR_OR_NULL(bitmap))' to handle 
errors returned by bitmap_create.

Maybe just removing the call to 'bitmap_free(bitmap)' is enough.
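
For reference, a minimal illustration of the point above, using the standard
helpers from include/linux/err.h: an error pointer is non-NULL, so the
"if (!bitmap) return;" guard at the top of bitmap_free() cannot stop a later
dereference of ERR_PTR(-ENOMEM).

	#include <linux/err.h>

	/* illustration only: why a NULL check alone is not enough */
	static bool bitmap_ptr_is_invalid(struct bitmap *bitmap)
	{
		/* ERR_PTR(-ENOMEM) is non-NULL, so !bitmap lets it through;
		 * IS_ERR_OR_NULL() rejects both NULL and error pointers.
		 */
		return IS_ERR_OR_NULL(bitmap);
	}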



I agree we can remove it; if so, it seems we are not consistent with the
previous comment of bitmap_create.


/*
 * initialize the bitmap structure
 * if this returns an error, bitmap_destroy must be called to do clean up
 */

What about reverting it and re-using the v1 patch? See
http://www.spinics.net/lists/raid/msg51819.html.


Thanks,
Guoqing


Re: Question about commit f9a67b1182e5 ("md/bitmap: clear bitmap if bitmap_create failed").

2016-09-14 Thread Guoqing Jiang



On 09/13/2016 01:24 PM, Shaohua Li wrote:

On Mon, Sep 12, 2016 at 09:09:48PM +0200, Christophe JAILLET wrote:

Hi,

I'm puzzled by commit f9a67b1182e5 ("md/bitmap: clear bitmap if
bitmap_create failed").

Hi Christophe,
Thank you very much to help check this!


Part of the commit is:

@@ -1865,8 +1866,10 @@ int bitmap_copy_from_slot(struct mddev *mddev, int
slot,
  struct bitmap_counts *counts;
  struct bitmap *bitmap = bitmap_create(mddev, slot);

-if (IS_ERR(bitmap))
+if (IS_ERR(bitmap)) {
+bitmap_free(bitmap);
  return PTR_ERR(bitmap);
+}

but if 'bitmap' is an error, I think that bad things will happen in
'bitmap_free()' when, at the beginning of the function, we will execute:

 if (bitmap->sysfs_can_clear) <-
 sysfs_put(bitmap->sysfs_can_clear);


I guess it is safe, since below part is at the beginning of bitmap_free.

if (!bitmap) /* there was no bitmap */
return;


Add Guoqing.

Yeah, you are right, this bitmap_free isn't required. This must be something
that slipped in in the v2 patch. I'll delete that line.


However, the commit log message is really explicit and adding this call to
'bitmap_free' has really been done one purpose. ("If bitmap_create returns
an error, we need to call either bitmap_destroy or bitmap_free to do clean
up, ...")

this log is a little confusing; I thought it really meant the bitmap_free
called in bitmap_create. The V1 patch calls bitmap_destroy in bitmap_create.


I double checked the v1 patch: it called bitmap_destroy once bitmap_create
returned an error inside bitmap_copy_from_slot; bitmap_destroy is also not
called in location_store once it fails to create the bitmap.

But since bitmap_free within bitmap_create is used to handle the related
failure, it seems we don't need the patch, and maybe we also don't need the
second line of the comment below (the patch was motivated by that comment,
IIRC).

/*
 * initialize the bitmap structure
 * if this returns an error, bitmap_destroy must be called to do clean up
 */

Thanks,
Guoqing


[PATCH 1/3] md: set MD_CHANGE_PENDING in a atomic region

2016-05-03 Thread Guoqing Jiang
Some code waits for a metadata update by:

1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
2. setting MD_CHANGE_PENDING and waking the management thread
3. waiting for MD_CHANGE_PENDING to be cleared

If the first two are done without locking, the code in md_update_sb()
which checks if it needs to repeat might test if an update is needed
before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
in the wait returning early.

So make sure all places that set MD_CHANGE_PENDING are atomic, and
bit_clear_unless (suggested by Neil) is introduced for the purpose.
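
To make the repeat condition concrete, below is a minimal userspace model
of the bit_clear_unless() semantics (written with C11 atomics; the real
kernel version lives in include/linux/bitops.h and is added by this patch,
so this is only a sketch of its contract, not the macro itself):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define BIT(n)			(1UL << (n))
#define MD_CHANGE_DEVS		0
#define MD_CHANGE_CLEAN		1
#define MD_CHANGE_PENDING	2

/* Atomically clear the 'clear' bits unless any 'test' bit is set;
 * return true if the bits were cleared. */
static bool bit_clear_unless(atomic_ulong *flags, unsigned long clear,
			     unsigned long test)
{
	unsigned long old = atomic_load(flags);

	do {
		if (old & test)
			return false;	/* a new update request raced in */
	} while (!atomic_compare_exchange_weak(flags, &old, old & ~clear));

	return true;
}

int main(void)
{
	atomic_ulong flags = BIT(MD_CHANGE_PENDING) | BIT(MD_CHANGE_DEVS);

	/* DEVS is still set, so PENDING must stay set and md_update_sb()
	 * has to go around its 'repeat' loop once more. */
	printf("cleared: %d\n",
	       (int)bit_clear_unless(&flags, BIT(MD_CHANGE_PENDING),
				     BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_CLEAN)));
	return 0;
}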

Cc: Martin Kepplinger <mart...@posteo.de>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: Sasha Levin <sasha.le...@oracle.com>
Cc: <linux-kernel@vger.kernel.org>
Reviewed-by: NeilBrown <ne...@suse.com>
Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 drivers/md/md.c  | 27 ++-
 drivers/md/raid1.c   |  4 ++--
 drivers/md/raid10.c  |  8 
 drivers/md/raid5-cache.c |  4 ++--
 drivers/md/raid5.c   |  4 ++--
 include/linux/bitops.h   | 16 
 6 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 06f6e81..892d639 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2295,12 +2295,16 @@ repeat:
 	if (mddev_is_clustered(mddev)) {
 		if (test_and_clear_bit(MD_CHANGE_DEVS, &mddev->flags))
 			force_change = 1;
+		if (test_and_clear_bit(MD_CHANGE_CLEAN, &mddev->flags))
+			nospares = 1;
 		ret = md_cluster_ops->metadata_update_start(mddev);
 		/* Has someone else has updated the sb */
 		if (!does_sb_need_changing(mddev)) {
 			if (ret == 0)
 				md_cluster_ops->metadata_update_cancel(mddev);
-			clear_bit(MD_CHANGE_PENDING, &mddev->flags);
+			bit_clear_unless(&mddev->flags, BIT(MD_CHANGE_PENDING),
+					 BIT(MD_CHANGE_DEVS) |
+					 BIT(MD_CHANGE_CLEAN));
 			return;
 		}
 	}
@@ -2434,15 +2438,11 @@ repeat:
 	if (mddev_is_clustered(mddev) && ret == 0)
 		md_cluster_ops->metadata_update_finish(mddev);
 
-	spin_lock(&mddev->lock);
 	if (mddev->in_sync != sync_req ||
-	    test_bit(MD_CHANGE_DEVS, &mddev->flags)) {
+	    !bit_clear_unless(&mddev->flags, BIT(MD_CHANGE_PENDING),
+			      BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_CLEAN)))
 		/* have to write it out again */
-		spin_unlock(&mddev->lock);
 		goto repeat;
-	}
-	clear_bit(MD_CHANGE_PENDING, &mddev->flags);
-	spin_unlock(&mddev->lock);
 	wake_up(&mddev->sb_wait);
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		sysfs_notify(&mddev->kobj, NULL, "sync_completed");
@@ -8147,18 +8147,18 @@ void md_do_sync(struct md_thread *thread)
 		}
 	}
  skip:
-	set_bit(MD_CHANGE_DEVS, &mddev->flags);
-
 	if (mddev_is_clustered(mddev) &&
 	    ret == 0) {
 		/* set CHANGE_PENDING here since maybe another
 		 * update is needed, so other nodes are informed */
-		set_bit(MD_CHANGE_PENDING, &mddev->flags);
+		set_mask_bits(&mddev->flags, 0,
+			      BIT(MD_CHANGE_PENDING) | BIT(MD_CHANGE_DEVS));
 		md_wakeup_thread(mddev->thread);
 		wait_event(mddev->sb_wait,
 			   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
 		md_cluster_ops->resync_finish(mddev);
-	}
+	} else
+		set_bit(MD_CHANGE_DEVS, &mddev->flags);
 
 	spin_lock(&mddev->lock);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
@@ -8550,6 +8550,7 @@ EXPORT_SYMBOL(md_finish_reshape);
 int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 		       int is_new)
 {
+	struct mddev *mddev = rdev->mddev;
 	int rv;
 	if (is_new)
 		s += rdev->new_data_offset;
@@ -8559,8 +8560,8 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 	if (rv == 0) {
 		/* Make sure they get written out promptly */
 		sysfs_notify_dirent_safe(rdev->sysfs_state);
-		set_bit(MD_CHANGE_CLEAN, &rdev->mddev->flags);
-		set_bit(MD_CHANGE_PENDING, &rdev->mddev->flags);
+		set_mask_bits(&mddev->flags, 0,
+			      BIT(MD_CHANGE_CLEAN) | BIT(MD_CHANGE_PENDING));
 		md_wakeup_thread(rdev->mddev->thread);
 		return 1;
 	} else
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a7f2b9c..c7c8cde 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1474,8 +1474,8 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
 	 * if recovery is running, make sure it aborts.


Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-16 Thread Guoqing Jiang
Hi David,

David Teigland wrote:
> On Thu, Jun 11, 2015 at 05:47:28PM +0800, Guoqing Jiang wrote:
>   
>> Do you consider take the following clean up? If yes, I will send a
>> formal patch, otherwise pls ignore it.
>> 
>
> On first glance, the old and new code do not appear to do the same thing,
> so let's leave it as it is.
>
>   
>> -   to_nodeid = dlm_dir_nodeid(r);
>> 
Sorry, it seems that is the only difference; if the previous change is
combined with the modification below, then the behavior is the same.

@@ -3644,7 +3644,10 @@ static int send_common(struct dlm_rsb *r, struct
dlm_lkb *lkb, int mstype)
struct dlm_mhandle *mh;
int to_nodeid, error;
 
-   to_nodeid = r->res_nodeid;
+   if (mstype == DLM_MSG_LOOKUP)
+   to_nodeid = dlm_dir_nodeid(r);
+   else
+   to_nodeid = r->res_nodeid;

And for create_message, the second parameter (lkb) has no effect when
creating these three message types (REQUEST/LOOKUP/REMOVE).

Thanks,
Guoqing


Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-11 Thread Guoqing Jiang
Hi David,

David Teigland wrote:
> On Wed, Jun 10, 2015 at 11:10:44AM +0800, Guoqing Jiang wrote:
>   
>> The remove_from_waiters could  only be invoked after failed to
>> create_message, right?
>> Since send_message always returns 0, this patch doesn't touch anything
>> about the failure
>> path, and it also doesn't change the original semantic.
>> 
>
> I'm not inclined to take any patches unless there's a problem identified.
>
> .  
>   
Would you consider taking the following cleanup? If yes, I will send a
formal patch; otherwise please ignore it.

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 35502d4..7c822f7 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3747,30 +3747,7 @@ static int send_bast(struct dlm_rsb *r, struct
dlm_lkb *lkb, int mode)
 
 static int send_lookup(struct dlm_rsb *r, struct dlm_lkb *lkb)
 {
-   struct dlm_message *ms;
-   struct dlm_mhandle *mh;
-   int to_nodeid, error;
-
-   to_nodeid = dlm_dir_nodeid(r);
-
-   error = add_to_waiters(lkb, DLM_MSG_LOOKUP, to_nodeid);
-   if (error)
-   return error;
-
-   error = create_message(r, NULL, to_nodeid, DLM_MSG_LOOKUP, &ms, &mh);
-   if (error)
-   goto fail;
-
-   send_args(r, lkb, ms);
-
-   error = send_message(mh, ms);
-   if (error)
-   goto fail;
-   return 0;
-
- fail:
-   remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
-   return error;
+   return send_common(r, lkb, DLM_MSG_LOOKUP);
 }

Thanks,
Guoqing


Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-10 Thread Guoqing Jiang
Bob Peterson wrote:
>
>>>> 
>>>> 
>>>>> - Original Message -
>>>>>   
>>>>>   
>>>>>   
>>>>>> We don't need the redundant logic since send_message always returns 0.
>>>>>>
>>>>>> Signed-off-by: Guoqing Jiang 
>>>>>> ---
>>>>>>  fs/dlm/lock.c | 10 ++
>>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
>>>>>> index 35502d4..6fc3de9 100644
>>>>>> --- a/fs/dlm/lock.c
>>>>>> +++ b/fs/dlm/lock.c
>>>>>> @@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct
>>>>>> dlm_lkb *lkb, int mstype)
>>>>>>  
>>>>>>  send_args(r, lkb, ms);
>>>>>>  
>>>>>> -error = send_message(mh, ms);
>>>>>> -if (error)
>>>>>> -goto fail;
>>>>>> -return 0;
>>>>>> +return send_message(mh, ms);
>>>>>> 
>
> Hi Guoqing,
>
> Sorry, I was momentarily confused. I think you misunderstood what I was 
> saying.
> What I meant was: Instead of doing:
>
> + return send_message(mh, ms);
> ...where send_message returns 0, it might be better to have:
>
> static void send_message(struct dlm_mhandle *mh, struct dlm_message *ms)
> {
>   dlm_message_out(ms);
>   dlm_lowcomms_commit_buffer(mh);
> }
>
> ...And in send_common, do (in both places):
> + send_message(mh, ms);
> + return 0;
>
> Since it's so short, it might even be better to code send_message as a macro,
> or at least an "inline" function.
>
>   
Hi Bob,

Got it, thanks. It is a better solution, but it is not a bug fix or
anything similar, so maybe just leave it as it is.

Regards,
Guoqing


Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-09 Thread Guoqing Jiang
Bob Peterson wrote:
> - Original Message -
>   
>> Hi Bob,
>>
>> Bob Peterson wrote:
>> 
>>> - Original Message -
>>>   
>>>   
>>>> We don't need the redundant logic since send_message always returns 0.
>>>>
>>>> Signed-off-by: Guoqing Jiang 
>>>> ---
>>>>  fs/dlm/lock.c | 10 ++
>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
>>>> index 35502d4..6fc3de9 100644
>>>> --- a/fs/dlm/lock.c
>>>> +++ b/fs/dlm/lock.c
>>>> @@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct
>>>> dlm_lkb *lkb, int mstype)
>>>>  
>>>>send_args(r, lkb, ms);
>>>>  
>>>> -  error = send_message(mh, ms);
>>>> -  if (error)
>>>> -  goto fail;
>>>> -  return 0;
>>>> +  return send_message(mh, ms);
>>>>  
>>>>   fail:
>>>>remove_from_waiters(lkb, msg_reply_type(mstype));
>>>> @@ -3763,10 +3760,7 @@ static int send_lookup(struct dlm_rsb *r, struct
>>>> dlm_lkb *lkb)
>>>>  
>>>>send_args(r, lkb, ms);
>>>>  
>>>> -  error = send_message(mh, ms);
>>>> -  if (error)
>>>> -  goto fail;
>>>> -  return 0;
>>>> +  return send_message(mh, ms);
>>>>  
>>>>   fail:
>>>>remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
>>>> --
>>>> 1.7.12.4
>>>> 
>>>> 
>>> Hi,
>>>
>>> The patch looks okay, but if remove_from_waiters() always returns 0,
>>> wouldn't it be better to change the function from int to void and
>>> return 0 here? The advantage is that code spelunkers wouldn't need
>>> to back-track one more level (not to mention the instruction or two
>>> it might save).
>>>
>>>   
>>>   
>> Seems remove_from_waiters is not always returns 0, the return value
>> could  be -1 or 0 which depends on _remove_from_waiters.
>>
>> BTW, I found that there are no big difference between send_common
>> and send_lookup, since the send_common can also be use to send
>> lookup message, I guess send_lookup can be removed as well.
>>
>> Thanks,
>> Guoqing
>> 
>
> Hi Guoqing,
>
> If remove_from_waiters can return -1, then the patch would prevent the
> code from calling remove_from_waiters. So the patch still doesn't look
> right to me.
>
>   
Hi Bob,

The remove_from_waiters can only be invoked after create_message fails,
right? Since send_message always returns 0, this patch doesn't touch
anything in the failure path, and it also doesn't change the original
semantics.

Thanks,
Guoqing


Re: [Cluster-devel] [PATCH] dlm: remove unnecessary error check

2015-06-09 Thread Guoqing Jiang
Hi Bob,

Bob Peterson wrote:
> - Original Message -
>   
>> We don't need the redundant logic since send_message always returns 0.
>>
>> Signed-off-by: Guoqing Jiang 
>> ---
>>  fs/dlm/lock.c | 10 ++
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
>> index 35502d4..6fc3de9 100644
>> --- a/fs/dlm/lock.c
>> +++ b/fs/dlm/lock.c
>> @@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct
>> dlm_lkb *lkb, int mstype)
>>  
>>  send_args(r, lkb, ms);
>>  
>> -error = send_message(mh, ms);
>> -if (error)
>> -goto fail;
>> -return 0;
>> +return send_message(mh, ms);
>>  
>>   fail:
>>  remove_from_waiters(lkb, msg_reply_type(mstype));
>> @@ -3763,10 +3760,7 @@ static int send_lookup(struct dlm_rsb *r, struct
>> dlm_lkb *lkb)
>>  
>>  send_args(r, lkb, ms);
>>  
>> -error = send_message(mh, ms);
>> -if (error)
>> -goto fail;
>> -return 0;
>> +return send_message(mh, ms);
>>  
>>   fail:
>>  remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
>> --
>> 1.7.12.4
>> 
>
> Hi,
>
> The patch looks okay, but if remove_from_waiters() always returns 0,
> wouldn't it be better to change the function from int to void and
> return 0 here? The advantage is that code spelunkers wouldn't need
> to back-track one more level (not to mention the instruction or two
> it might save).
>
>   
It seems remove_from_waiters does not always return 0; the return value
could be -1 or 0, depending on _remove_from_waiters.

BTW, I found that there is no big difference between send_common and
send_lookup; since send_common can also be used to send the lookup message,
I guess send_lookup could be removed as well.

Thanks,
Guoqing


[PATCH] dlm: remove unnecessary error check

2015-06-09 Thread Guoqing Jiang
We don't need the redundant logic since send_message always returns 0.

Signed-off-by: Guoqing Jiang 
---
 fs/dlm/lock.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 35502d4..6fc3de9 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -3656,10 +3656,7 @@ static int send_common(struct dlm_rsb *r, struct dlm_lkb 
*lkb, int mstype)
 
send_args(r, lkb, ms);
 
-   error = send_message(mh, ms);
-   if (error)
-   goto fail;
-   return 0;
+   return send_message(mh, ms);
 
  fail:
remove_from_waiters(lkb, msg_reply_type(mstype));
@@ -3763,10 +3760,7 @@ static int send_lookup(struct dlm_rsb *r, struct dlm_lkb 
*lkb)
 
send_args(r, lkb, ms);
 
-   error = send_message(mh, ms);
-   if (error)
-   goto fail;
-   return 0;
+   return send_message(mh, ms);
 
  fail:
remove_from_waiters(lkb, DLM_MSG_LOOKUP_REPLY);
-- 
1.7.12.4



Re: linux-next: build failure after merge of the md tree

2015-03-13 Thread Guoqing Jiang
Stephen Rothwell wrote:
> Hi Neil,
>
> On Mon, 2 Mar 2015 17:11:49 +1100 Stephen Rothwell  
> wrote:
>   
>> On Mon, 2 Mar 2015 17:03:45 +1100 NeilBrown  wrote:
>> 
>>> I think
>>> +   bm_blocks = DIV_ROUND_UP(bm_blocks, 4096);
>>>
>>> needs DIV_ROUND_UP_SECTOR_T()
>>>   
>> I tried that and it was not sufficient.
>>
>> 
>>> The first patch you identified adds that line.  The second relocates it.
>>>   
>> The second also changes this:
>>
>> bm_blocks = sector_div(bitmap->mddev->resync_max_sectors, (chunksize >> 9));
>>
>> (added by the first) to this:
>>
>> bm_blocks = bitmap->mddev->resync_max_sectors / 
>> (bitmap->mddev->bitmap_info.chunksize >> 9);
>>
>> where bitmap->mddev->resync_max_sectors is a sector_t ...
>> 
>
> So I applied this patch for today:
>
> From: Stephen Rothwell 
> Date: Tue, 3 Mar 2015 13:30:26 +1100
> Subject: [PATCH] md/bitmap: use sector_div for sector_t divisions
>
> Signed-off-by: Stephen Rothwell 
> ---
>  drivers/md/bitmap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 23f575f0cd92..d40398404ab6 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -573,7 +573,8 @@ re_read:
>   if (bitmap->cluster_slot >= 0) {
>   long long bm_blocks;
>  
> -	bm_blocks = bitmap->mddev->resync_max_sectors / (bitmap->mddev->bitmap_info.chunksize >> 9);
> +	bm_blocks = sector_div(bitmap->mddev->resync_max_sectors,
> +			       bitmap->mddev->bitmap_info.chunksize >> 9);
>   
One simple question: isn't sector_div used for the '%' operator?
But the original line is:

bm_blocks = bitmap->mddev->resync_max_sectors / (bitmap->mddev->bitmap_info.chunksize >> 9);

while the modified line is:

bm_blocks = sector_div(bitmap->mddev->resync_max_sectors,
		       bitmap->mddev->bitmap_info.chunksize >> 9);

sector_div returns:

bitmap->mddev->resync_max_sectors % (bitmap->mddev->bitmap_info.chunksize >> 9)

So the modified line basically means:

bm_blocks = bitmap->mddev->resync_max_sectors % (bitmap->mddev->bitmap_info.chunksize >> 9);
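
To make the distinction concrete, here is a minimal userspace model of
sector_div() with illustrative values (the real kernel macro is
architecture-specific; this only mirrors its divide-in-place,
return-the-remainder contract):

#include <stdint.h>
#include <stdio.h>

/* Model of sector_div(sectors, divisor): divides the 64-bit sector
 * count in place and returns the remainder. */
static uint32_t sector_div_model(uint64_t *sectors, uint32_t divisor)
{
	uint32_t rem = (uint32_t)(*sectors % divisor);

	*sectors /= divisor;
	return rem;
}

int main(void)
{
	uint32_t chunk_sectors = 512;	/* stands in for chunksize >> 9 */

	/* Wrong: assigns the remainder and throws the quotient away. */
	uint64_t tmp = 1000000;
	uint64_t bm_blocks = sector_div_model(&tmp, chunk_sectors);
	printf("wrong (remainder): %llu\n", (unsigned long long)bm_blocks);

	/* Right: divide in place, then use the quotient. */
	bm_blocks = 1000000;
	sector_div_model(&bm_blocks, chunk_sectors);
	printf("right (quotient):  %llu\n", (unsigned long long)bm_blocks);
	return 0;
}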


And maybe the current -next tree should add the following change to keep
the original semantics.

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 501f83f..ea9c685 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -571,11 +571,10 @@ static int bitmap_read_sb(struct bitmap *bitmap)
 re_read:
/* If cluster_slot is set, the cluster is setup */
if (bitmap->cluster_slot >= 0) {
-   sector_t bm_blocks;
-   sector_t resync_sectors = bitmap->mddev->resync_max_sectors;
+   sector_t bm_blocks = bitmap->mddev->resync_max_sectors;
 
-		bm_blocks = sector_div(resync_sectors,
-				       bitmap->mddev->bitmap_info.chunksize >> 9);
+   sector_div(bm_blocks,
+   bitmap->mddev->bitmap_info.chunksize >> 9);
bm_blocks = bm_blocks << 3;
bm_blocks = DIV_ROUND_UP_SECTOR_T(bm_blocks, 4096);
bitmap->mddev->bitmap_info.offset +=
bitmap->cluster_slot * (bm_blocks << 3);


Thanks,
Guoqing
>   bm_blocks = bm_blocks << 3;
>   bm_blocks = DIV_ROUND_UP_SECTOR_T(bm_blocks, 4096);
>   bitmap->mddev->bitmap_info.offset += bitmap->cluster_slot * (bm_blocks << 3);
>   



  1   2   >