Re: [PATCH 1/5] sched, rt: move .switched_from out of the scope of CONFIG_SMP

2013-12-28 Thread Zhi Yong Wu
On Sat, Dec 28, 2013 at 5:48 PM, Kirill Tkhai  wrote:
> On Sat, Dec 28, 2013 at 05:37:32 +0800, Zhi Yong Wu wrote:
>> On Sat, Dec 28, 2013 at 5:19 PM, Kirill Tkhai  wrote:
>> > On Fri, Dec 27, 2013 at 07:41:00 +0800, Zhi Yong Wu wrote:
>> >> From: Zhi Yong Wu 
>> >>
>> >> .switched_from shouldn't be initialized in the scope of CONFIG_SMP,
>> >> so this patch is trying to move it out.
>> >>
>> >> Signed-off-by: Zhi Yong Wu 
>> >> ---
>> >>  kernel/sched/rt.c |2 +-
>> >>  1 files changed, 1 insertions(+), 1 deletions(-)
>> >>
>> >> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> >> index 1c40655..f34d41b 100644
>> >> --- a/kernel/sched/rt.c
>> >> +++ b/kernel/sched/rt.c
>> >> @@ -2002,9 +2002,9 @@ const struct sched_class rt_sched_class = {
>> >>   .pre_schedule   = pre_schedule_rt,
>> >>   .post_schedule  = post_schedule_rt,
>> >>   .task_woken = task_woken_rt,
>> >> - .switched_from  = switched_from_rt,
>> >>  #endif
>> >>
>> >> + .switched_from  = switched_from_rt,
>> >>   .set_curr_task  = set_curr_task_rt,
>> >>   .task_tick  = task_tick_rt,
>> >
>> > This will not compile in !SMP mode because the body of
>> > switched_from_rt() is still under the CONFIG_SMP #ifdef.
>> How about also removing its body out?
>
> switched_from_rt() is necessary only in SMP mode, so I think we should
> not change anything connected with it here. It's already initialized
> properly.
Please ignore this patch, thanks.
>
>> >
>> > Kirill
>>
>>
>>
>> --
>> Regards,
>>
>> Zhi Yong Wu



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
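
The build problem raised in this thread can be illustrated with a minimal
sketch. It assumes a simplified stand-in for the scheduler class table: the
names demo_class, demo_rt_class, demo_switched_from and the DEMO_SMP macro
are hypothetical and are not kernel identifiers. The point is only that a
designated initializer must sit under the same preprocessor guard as the
function it names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops table; not the kernel's struct sched_class. */
struct demo_class {
	void (*task_tick)(void);
	void (*switched_from)(void);	/* the member exists in every configuration */
};

static void demo_task_tick(void) { }

#ifdef DEMO_SMP
/* The handler body is compiled only when DEMO_SMP is defined. */
static void demo_switched_from(void) { }
#endif

static const struct demo_class demo_rt_class = {
	.task_tick	= demo_task_tick,
#ifdef DEMO_SMP
	/*
	 * This line has to stay under the same guard as the body above.
	 * Placing it after the #endif would name an undeclared function
	 * whenever DEMO_SMP is not defined, and the build would fail.
	 */
	.switched_from	= demo_switched_from,
#endif
};

/* Returns 1 when the hook is populated, 0 when it was left NULL. */
static int demo_has_switched_from(void)
{
	return demo_rt_class.switched_from != NULL;
}

/* Small self-check that doubles as a usage example. */
static void demo_selfcheck(void)
{
	assert(demo_rt_class.task_tick == demo_task_tick);
#ifdef DEMO_SMP
	assert(demo_has_switched_from() == 1);
#else
	/* Members without an initializer are implicitly NULL. */
	assert(demo_has_switched_from() == 0);
#endif
}
```

With the guard in place, the non-SMP build simply leaves the hook NULL, which
is why callers of such a table are expected to test the pointer before
calling through it.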


Re: [PATCH 1/5] sched, rt: move .switched_from out of the scope of CONFIG_SMP

2013-12-28 Thread Zhi Yong Wu
On Sat, Dec 28, 2013 at 5:19 PM, Kirill Tkhai  wrote:
> On Fri, Dec 27, 2013 at 07:41:00 +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu 
>>
>> .switched_from shouldn't be initialized in the scope of CONFIG_SMP,
>> so this patch is trying to move it out.
>>
>> Signed-off-by: Zhi Yong Wu 
>> ---
>>  kernel/sched/rt.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> index 1c40655..f34d41b 100644
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
>> @@ -2002,9 +2002,9 @@ const struct sched_class rt_sched_class = {
>>   .pre_schedule   = pre_schedule_rt,
>>   .post_schedule  = post_schedule_rt,
>>   .task_woken = task_woken_rt,
>> - .switched_from  = switched_from_rt,
>>  #endif
>>
>> + .switched_from  = switched_from_rt,
>>   .set_curr_task  = set_curr_task_rt,
>>   .task_tick  = task_tick_rt,
>
> This will not compile in !SMP mode because the body of
> switched_from_rt() is still under the CONFIG_SMP #ifdef.
How about also moving its body out of the CONFIG_SMP block?
>
> Kirill



-- 
Regards,

Zhi Yong Wu


[PATCH 0/5] Sched: Some trivial typo fixes

2013-12-27 Thread Zhi Yong Wu
From: Zhi Yong Wu 

These were found while I was reviewing the sched-related source code.

Zhi Yong Wu (5):
  sched, rt: move .switched_from out of the scope of CONFIG_SMP
  sched, fair: fix the comment of move_tasks()
  sched, fair: fix the typo in select_idle_sibling()
  sched, fair: fix the comment of select_task_rq_fair()
  Documentation, sched-arch.txt: fix the incorrect syntax

 Documentation/scheduler/sched-arch.txt |2 +-
 kernel/sched/fair.c|6 +++---
 kernel/sched/rt.c  |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

-- 
1.7.6.5



[PATCH 3/5] sched, fair: fix the typo in select_idle_sibling()

2013-12-27 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 kernel/sched/fair.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a82ae0a..db23d71 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4191,7 +4191,7 @@ static int select_idle_sibling(struct task_struct *p, int target)
 		return i;
 
 	/*
-	 * Otherwise, iterate the domains and find an elegible idle cpu.
+	 * Otherwise, iterate the domains and find an eligible idle cpu.
 	 */
 	sd = rcu_dereference(per_cpu(sd_llc, target));
 	for_each_lower_domain(sd) {
-- 
1.7.6.5



[PATCH 4/5] sched, fair: fix the comment of select_task_rq_fair()

2013-12-27 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 kernel/sched/fair.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index db23d71..eaa1e91 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4218,7 +4218,7 @@ done:
 }
 
 /*
- * sched_balance_self: balance the current task (running on cpu) in domains
+ * select_task_rq_fair: balance the current task (running on cpu) in domains
  * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
  * SD_BALANCE_EXEC.
  *
-- 
1.7.6.5



[PATCH 1/5] sched, rt: move .switched_from out of the scope of CONFIG_SMP

2013-12-27 Thread Zhi Yong Wu
From: Zhi Yong Wu 

.switched_from shouldn't be initialized in the scope of CONFIG_SMP,
so this patch is trying to move it out.

Signed-off-by: Zhi Yong Wu 
---
 kernel/sched/rt.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 1c40655..f34d41b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2002,9 +2002,9 @@ const struct sched_class rt_sched_class = {
 	.pre_schedule		= pre_schedule_rt,
 	.post_schedule		= post_schedule_rt,
 	.task_woken		= task_woken_rt,
-	.switched_from		= switched_from_rt,
 #endif
 
+	.switched_from		= switched_from_rt,
 	.set_curr_task		= set_curr_task_rt,
 	.task_tick		= task_tick_rt,
 
-- 
1.7.6.5



[PATCH 5/5] Documentation, sched-arch.txt: fix the incorrect syntax

2013-12-27 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 Documentation/scheduler/sched-arch.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/scheduler/sched-arch.txt b/Documentation/scheduler/sched-arch.txt
index 9290de7..0a7d252 100644
--- a/Documentation/scheduler/sched-arch.txt
+++ b/Documentation/scheduler/sched-arch.txt
@@ -21,7 +21,7 @@ CPU idle
 
 Your cpu_idle routines need to obey the following rules:
 
-1. Preempt should now disabled over idle routines. Should only
+1. Preempt should be now disabled over idle routines. Should only
be enabled to call schedule() then disabled again.
 
 2. need_resched/TIF_NEED_RESCHED is only ever set, and will never
-- 
1.7.6.5



[PATCH 2/5] sched, fair: fix the comment of move_tasks()

2013-12-27 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 kernel/sched/fair.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..a82ae0a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4982,7 +4982,7 @@ static const unsigned int sched_nr_migrate_break = 32;
 /*
  * move_tasks tries to move up to imbalance weighted load from busiest to
  * this_rq, as part of a balancing operation within domain "sd".
- * Returns 1 if successful and 0 otherwise.
+ * Returns the number of moved tasks if successful and 0 otherwise.
  *
  * Called with both runqueues locked.
  */
-- 
1.7.6.5
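
The comment fix above matters to callers that use the return value as more
than a truth value. The following is a minimal sketch of that contract,
assuming a toy model in which run queues are plain counters; the helper
demo_move_items and its parameters are hypothetical and do not mirror the
real move_tasks(), which balances weighted load under runqueue locks.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: moves up to `limit` items from *src to *dst and
 * returns the number of items moved, or 0 if nothing could be moved.
 */
static int demo_move_items(int *src, int *dst, int limit)
{
	int moved = 0;

	while (*src > 0 && moved < limit) {
		(*src)--;
		(*dst)++;
		moved++;
	}
	return moved;
}

/* Small self-check that doubles as a usage example. */
static void demo_move_items_check(void)
{
	int busiest = 5, this_rq = 0;
	int moved;

	/* The result still works as a success flag ... */
	moved = demo_move_items(&busiest, &this_rq, 3);
	assert(moved != 0);
	/* ... but it carries the actual count, not just 1. */
	assert(moved == 3);
	assert(busiest == 2 && this_rq == 3);

	/* A zero limit moves nothing, so the "otherwise" case returns 0. */
	assert(demo_move_items(&busiest, &this_rq, 0) == 0);
}
```

A comment that promises "1 if successful" would mislead a caller that
accumulates the result, which is the inaccuracy the patch corrects.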



[PATCH 0/5] Sched: Some trivial typo fixes

2013-12-27 Thread Zhi Yong Wu
From: Zhi Yong Wu 

*** BLURB HERE ***

Zhi Yong Wu (5):
  sched, rt: move .switched_from out of the scope of CONFIG_SMP
  sched, fair: fix the comment of move_tasks()
  sched, fair: fix the typo in select_idle_sibling()
  sched, fair: fix the comment of select_task_rq_fair()
  Documentation, sched-arch.txt: fix the incorrect syntax

 Documentation/scheduler/sched-arch.txt |2 +-
 kernel/sched/fair.c|6 +++---
 kernel/sched/rt.c  |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

-- 
1.7.6.5



Re: [PATCH stable 2/2] virtio-net: make all RX paths handle erors consistently

2013-12-25 Thread Zhi Yong Wu
typo in the subject

s/erors/errors/

On Wed, Dec 25, 2013 at 10:56 PM, Michael S. Tsirkin  wrote:
> receive mergeable now handles errors internally.
> Do same for big and small packet paths, otherwise
> the logic is too hard to follow.
>
> Cc: Jason Wang 
> Cc: David S. Miller 
> Signed-off-by: Michael S. Tsirkin 
>
> (cherry picked from commit f121159d72091f25afb22007c833e60a6845e912)
> ---
>  drivers/net/virtio_net.c | 56 +++-
>  1 file changed, 36 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 435076f..c0ed6d5 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -297,6 +297,34 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq,
> return skb;
>  }
>
> +static struct sk_buff *receive_small(void *buf, unsigned int len)
> +{
> +   struct sk_buff * skb = buf;
> +
> +   len -= sizeof(struct virtio_net_hdr);
> +   skb_trim(skb, len);
> +
> +   return skb;
> +}
> +
> +static struct sk_buff *receive_big(struct net_device *dev,
> +  struct receive_queue *rq,
> +  void *buf)
> +{
> +   struct page *page = buf;
> +   struct sk_buff *skb = page_to_skb(rq, page, 0);
> +
> +   if (unlikely(!skb))
> +   goto err;
> +
> +   return skb;
> +
> +err:
> +   dev->stats.rx_dropped++;
> +   give_pages(rq, page);
> +   return NULL;
> +}
> +
>  static struct sk_buff *receive_mergeable(struct net_device *dev,
>  struct receive_queue *rq,
>  void *buf,
> @@ -360,7 +388,6 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
> struct net_device *dev = vi->dev;
> struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> struct sk_buff *skb;
> -   struct page *page;
> struct skb_vnet_hdr *hdr;
>
> if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
> @@ -372,26 +399,15 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
> dev_kfree_skb(buf);
> return;
> }
> +   if (vi->mergeable_rx_bufs)
> +   skb = receive_mergeable(dev, rq, buf, len);
> +   else if (vi->big_packets)
> +   skb = receive_big(dev, rq, buf);
> +   else
> +   skb = receive_small(buf, len);
>
> -   if (!vi->mergeable_rx_bufs && !vi->big_packets) {
> -   skb = buf;
> -   len -= sizeof(struct virtio_net_hdr);
> -   skb_trim(skb, len);
> -   } else {
> -   page = buf;
> -   if (vi->mergeable_rx_bufs) {
> -   skb = receive_mergeable(dev, rq, page, len);
> -   if (unlikely(!skb))
> -   return;
> -   } else {
> -   skb = page_to_skb(rq, page, len);
> -   if (unlikely(!skb)) {
> -   dev->stats.rx_dropped++;
> -   give_pages(rq, page);
> -   return;
> -   }
> -   }
> -   }
> +   if (unlikely(!skb))
> +   return;
>
> hdr = skb_vnet_hdr(skb);
>
> --
> MST
>
> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization



-- 
Regards,

Zhi Yong Wu
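
The restructuring in the quoted patch follows a common pattern: each
receive path does its own error accounting and returns NULL on failure, so
the dispatcher needs only one NULL check. Below is a minimal sketch of that
pattern under stated assumptions. All names (demo_pkt, demo_stats,
demo_receive_small, demo_receive_big, demo_receive_buf, DEMO_HDR_LEN) are
hypothetical stand-ins; the real driver works on struct sk_buff and pages,
and the convert_ok flag merely simulates a failing conversion.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HDR_LEN 10u	/* assumed header size, for illustration only */

struct demo_stats {
	unsigned long rx_dropped;
	unsigned long rx_length_errors;
};

struct demo_pkt {
	unsigned int len;
};

enum demo_mode { DEMO_SMALL, DEMO_BIG };

/* Small path: strip the header length; this path cannot fail. */
static struct demo_pkt *demo_receive_small(struct demo_pkt *buf, unsigned int len)
{
	buf->len = len - DEMO_HDR_LEN;
	return buf;
}

/* Big path: on failure, account for the drop here and return NULL. */
static struct demo_pkt *demo_receive_big(struct demo_stats *stats,
					 struct demo_pkt *buf, int convert_ok)
{
	if (!convert_ok) {
		stats->rx_dropped++;
		return NULL;
	}
	return buf;
}

/* Dispatcher: pick a path, then perform a single NULL check. */
static struct demo_pkt *demo_receive_buf(enum demo_mode mode,
					 struct demo_stats *stats,
					 struct demo_pkt *buf,
					 unsigned int len, int convert_ok)
{
	struct demo_pkt *pkt;

	if (len < DEMO_HDR_LEN) {
		stats->rx_length_errors++;
		return NULL;
	}

	if (mode == DEMO_BIG)
		pkt = demo_receive_big(stats, buf, convert_ok);
	else
		pkt = demo_receive_small(buf, len);

	if (!pkt)
		return NULL;	/* the helper already counted the error */

	/* Common post-processing would continue here. */
	return pkt;
}

/* Small self-check that doubles as a usage example. */
static void demo_rx_selfcheck(void)
{
	struct demo_stats stats = { 0, 0 };
	struct demo_pkt pkt = { 0 };

	assert(demo_receive_buf(DEMO_SMALL, &stats, &pkt, 64, 1) == &pkt);
	assert(pkt.len == 54);

	assert(demo_receive_buf(DEMO_BIG, &stats, &pkt, 64, 0) == NULL);
	assert(stats.rx_dropped == 1);

	assert(demo_receive_buf(DEMO_BIG, &stats, &pkt, 4, 1) == NULL);
	assert(stats.rx_length_errors == 1);
}
```

Keeping the cleanup next to the code that can fail is what lets the
dispatcher shrink to a flat if/else chain, as in the patch.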


Re: [RFC PATCH] net, tun: remove the flow cache

2013-12-17 Thread Zhi Yong Wu
On Wed, Dec 18, 2013 at 12:58 PM, Tom Herbert  wrote:
>>> Yes, in its current state it's broken. But maybe we can try to fix
>>> it instead of arbitrarily removing it. Please see my patches on
>>> plumbing RFS into tuntap which may start to make it useful.
>> Do you mean you patch [5/5] tun: Added support for RFS on tun flows?
>> Sorry, can you say with more details?
>
> Correct. It was RFC since I didn't have a good way to test, if you do
> please try it and see if there's any effect. We should also be able to
Interesting, I will try to dig into it. Sorry, I don't understand why you
can't test it. Does it require some special hardware support or other
facilities?
> do something similar for KVM guests, either doing the flow lookup on
> each packet from the guest, or use aRFS interface from the guest
> driver for end to end RFS (more exciting prospect). We are finding
Which two ends do you mean?
> that guest to driver accelerations like this (and tso, lro) are quite
Sorry, I am a bit confused: does "the driver" here mean virtio_net or the tuntap driver?
> important in getting virtual networking performance up.
>
>>
>>>
>>> Tom
>>>
>>>> Signed-off-by: Zhi Yong Wu 
>>>> ---
>>>>  drivers/net/tun.c |  208 +++--
>>>>  1 files changed, 10 insertions(+), 198 deletions(-)
>>>>
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index 7c8343a..7c27fdc 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -32,12 +32,15 @@
>>>>   *
>>>>   *  Daniel Podlejski 
>>>>   *Modifications for 2.3.99-pre5 kernel.
>>>> + *
>>>> + *  Zhi Yong Wu 
>>>> + *Remove the flow cache.
>>>>   */
>>>>
>>>>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>>>
>>>>  #define DRV_NAME   "tun"
>>>> -#define DRV_VERSION"1.6"
>>>> +#define DRV_VERSION"1.7"
>>>>  #define DRV_DESCRIPTION"Universal TUN/TAP device driver"
>>>>  #define DRV_COPYRIGHT  "(C) 1999-2004 Max Krasnyansky "
>>>>
>>>> @@ -146,18 +149,6 @@ struct tun_file {
>>>> struct tun_struct *detached;
>>>>  };
>>>>
>>>> -struct tun_flow_entry {
>>>> -   struct hlist_node hash_link;
>>>> -   struct rcu_head rcu;
>>>> -   struct tun_struct *tun;
>>>> -
>>>> -   u32 rxhash;
>>>> -   int queue_index;
>>>> -   unsigned long updated;
>>>> -};
>>>> -
>>>> -#define TUN_NUM_FLOW_ENTRIES 1024
>>>> -
>>>>  /* Since the socket were moved to tun_file, to preserve the behavior of persist
>>>>   * device, socket filter, sndbuf and vnet header size were restore when the
>>>>   * file were attached to a persist device.
>>>> @@ -184,163 +175,11 @@ struct tun_struct {
>>>> int debug;
>>>>  #endif
>>>> spinlock_t lock;
>>>> -   struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
>>>> -   struct timer_list flow_gc_timer;
>>>> -   unsigned long ageing_time;
>>>> unsigned int numdisabled;
>>>> struct list_head disabled;
>>>> void *security;
>>>> -   u32 flow_count;
>>>>  };
>>>>
>>>> -static inline u32 tun_hashfn(u32 rxhash)
>>>> -{
>>>> -   return rxhash & 0x3ff;
>>>> -}
>>>> -
>>>> -static struct tun_flow_entry *tun_flow_find(struct hlist_head *head, u32 rxhash)
>>>> -{
>>>> -   struct tun_flow_entry *e;
>>>> -
>>>> -   hlist_for_each_entry_rcu(e, head, hash_link) {
>>>> -   if (e->rxhash == rxhash)
>>>> -   return e;
>>>> -   }
>>>> -   return NULL;
>>>> -}
>>>> -
>>>> -static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
>>>> - struct hlist_head *head,
>>>> - u32 rxhash, u16 queue_index)
>>>> -{
>>>> -   struct tun_flow_entry *e = kmalloc(sizeof(*e), GFP_ATOMIC);
>>>>

Re: [RFC PATCH] net, tun: remove the flow cache

2013-12-17 Thread Zhi Yong Wu
Hi, Tom,

On Wed, Dec 18, 2013 at 12:06 PM, Tom Herbert  wrote:
> On Mon, Dec 16, 2013 at 11:26 PM, Zhi Yong Wu  wrote:
>> From: Zhi Yong Wu 
>>
>> The flow cache is an extremely broken concept, and it usually brings up
>> growth issues and DoS attacks, so this patch is trying to remove it from
>> the tuntap driver, and instead use a simpler way for its flow control.
>>
> Yes, in its current state it's broken. But maybe we can try to fix
> it instead of arbitrarily removing it. Please see my patches on
> plumbing RFS into tuntap which may start to make it useful.
Do you mean your patch [5/5] tun: Added support for RFS on tun flows?
Sorry, could you explain in more detail?

>
> Tom
>
>> Signed-off-by: Zhi Yong Wu 
>> ---
>>  drivers/net/tun.c |  208 +++--
>>  1 files changed, 10 insertions(+), 198 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 7c8343a..7c27fdc 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -32,12 +32,15 @@
>>   *
>>   *  Daniel Podlejski 
>>   *Modifications for 2.3.99-pre5 kernel.
>> + *
>> + *  Zhi Yong Wu 
>> + *Remove the flow cache.
>>   */
>>
>>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>
>>  #define DRV_NAME   "tun"
>> -#define DRV_VERSION"1.6"
>> +#define DRV_VERSION"1.7"
>>  #define DRV_DESCRIPTION"Universal TUN/TAP device driver"
>>  #define DRV_COPYRIGHT  "(C) 1999-2004 Max Krasnyansky "
>>
>> @@ -146,18 +149,6 @@ struct tun_file {
>> struct tun_struct *detached;
>>  };
>>
>> -struct tun_flow_entry {
>> -   struct hlist_node hash_link;
>> -   struct rcu_head rcu;
>> -   struct tun_struct *tun;
>> -
>> -   u32 rxhash;
>> -   int queue_index;
>> -   unsigned long updated;
>> -};
>> -
>> -#define TUN_NUM_FLOW_ENTRIES 1024
>> -
>>  /* Since the socket were moved to tun_file, to preserve the behavior of persist
>>   * device, socket filter, sndbuf and vnet header size were restore when the
>>   * file were attached to a persist device.
>> @@ -184,163 +175,11 @@ struct tun_struct {
>> int debug;
>>  #endif
>> spinlock_t lock;
>> -   struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
>> -   struct timer_list flow_gc_timer;
>> -   unsigned long ageing_time;
>> unsigned int numdisabled;
>> struct list_head disabled;
>> void *security;
>> -   u32 flow_count;
>>  };
>>
>> -static inline u32 tun_hashfn(u32 rxhash)
>> -{
>> -   return rxhash & 0x3ff;
>> -}
>> -
>> -static struct tun_flow_entry *tun_flow_find(struct hlist_head *head, u32 rxhash)
>> -{
>> -   struct tun_flow_entry *e;
>> -
>> -   hlist_for_each_entry_rcu(e, head, hash_link) {
>> -   if (e->rxhash == rxhash)
>> -   return e;
>> -   }
>> -   return NULL;
>> -}
>> -
>> -static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
>> - struct hlist_head *head,
>> - u32 rxhash, u16 queue_index)
>> -{
>> -   struct tun_flow_entry *e = kmalloc(sizeof(*e), GFP_ATOMIC);
>> -
>> -   if (e) {
>> -   tun_debug(KERN_INFO, tun, "create flow: hash %u index %u\n",
>> - rxhash, queue_index);
>> -   e->updated = jiffies;
>> -   e->rxhash = rxhash;
>> -   e->queue_index = queue_index;
>> -   e->tun = tun;
>> -   hlist_add_head_rcu(&e->hash_link, head);
>> -   ++tun->flow_count;
>> -   }
>> -   return e;
>> -}
>> -
>> -static void tun_flow_delete(struct tun_struct *tun, struct tun_flow_entry *e)
>> -{
>> -   tun_debug(KERN_INFO, tun, "delete flow: hash %u index %u\n",
>> - e->rxhash, e->queue_index);
>> -   hlist_del_rcu(&e->hash_link);
>> -   kfree_rcu(e, rcu);
>> -   --tun->flow_count;
>> -}
>> -
>> -static void tun_flow_flush(struct tun_struct *tun)
>> -{
>> -   int i;
>> -
>> -   spin_lock_bh(&tun->lock);
>> -   for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
>>

Re: [RFC PATCH] net, tun: remove the flow cache

2013-12-17 Thread Zhi Yong Wu
On Tue, Dec 17, 2013 at 6:05 PM, Jason Wang  wrote:
> On 12/17/2013 05:13 PM, Zhi Yong Wu wrote:
>> On Tue, Dec 17, 2013 at 4:49 PM, Jason Wang  wrote:
>>> > On 12/17/2013 03:26 PM, Zhi Yong Wu wrote:
>>>> >> From: Zhi Yong Wu 
>>>> >>
>>>> >> The flow cache is an extremely broken concept, and it usually brings up
>>>> >> growth issues and DoS attacks, so this patch tries to remove it from
>>>> >> the tuntap driver and instead use a simpler method for its flow control.
>>> >
>>> > NACK.
>>> >
>>> > This single function revert does not make sense to me. Since:
>> IIRC, the tuntap flow cache is only used to save the mapping of skb
>> packet <-> queue index. My idea only saves the queue index in the sk_buff
>> early, when the skb is filled, not in a flow cache as the current
>> code does. This method is actually simpler and doesn't need
>> any flow cache at all.
>
> Nope. Flow caches record the flow-to-queue mapping, like most
> multiqueue NICs do. The only difference is that tun records it silently, while
> most NICs need the driver to tell them the mapping.
I just checked the virtio spec; I seem to have missed the fact that the flow
cache enables packet steering in mq mode. Thanks for your comments. But I
have some concerns about some of them.
>
> What your patch does is:
> - set the queue mapping of the skb during tun_get_user(). But most drivers
> use XPS or the processor id to select the real tx queue. So the real txq
> depends on the cpu that vhost or qemu is running on. This setting does not
Don't those drivers invoke netdev_pick_tx() or its counterpart, e.g.
tun_select_queue(), to select the real tx queue? Or can you give
an example?
Moreover, how do those drivers know which cpu vhost or qemu is running on?
> have any effect in fact.
> - the queue mapping of the skb is fetched during tun_select_queue(). This
> value is usually set by a multiqueue NIC to record which hardware rxq
> the packet came from.
Oh? Can you let me know where a mq NIC controller sets it?
>
> Can you explain how your patch works exactly?
You have understood it.
>>> >
>>> > - You in fact remove the flow steering function in tun. We definitely
>>> > need something like this to unbreak the TCP performance in a multiqueue
>> I don't think it will degrade TCP perf even in an mq guest; my
>> idea may even have better TCP perf, because it doesn't need any cache
>> table lookup, etc.
>
> Did you test and compare the performance numbers? Did you run a profiler
> to see how much the lookup costs?
No. As I just said above, I missed that the flow cache enables packet
steering. But did you do any related perf testing? To be honest, I am
wondering how much packet steering can improve perf. It
also adds a lot of cache lookup cost.
>>> > guest. Please have a look at the virtio-net driver / virtio spec for
>>> > more information.
>>> > - The total number of flow cache entries is limited to 4096, so there's no
>>> > DoS or growth issue.
>> Can you check why the ipv4 routing cache was removed? Maybe I missed
>> something; if so, please correct me. :)
>
> The main difference is that the flow cache is best effort. Tun cannot
> store every flow-to-queue mapping, and even a hardware NIC cannot do
> this. If a packet misses the flow cache, it's safe to distribute it
> randomly or through another method. So the limit just works.
Exactly, we can see this from tun_select_queue().
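To make the best-effort point concrete, here is a minimal user-space
sketch in C. It is not the tun.c code: the names flow_table, flow_record
and flow_select_queue are hypothetical, the per-bucket cap FLOW_SLOTS is a
simplification standing in for a global entry limit, and the
multiply-and-shift fallback is an assumption about how a miss could be
spread over the queues. Only the bucket mask (rxhash & 0x3ff, i.e. 1024
buckets) is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOW_BUCKETS 1024u  /* same bucket count as TUN_NUM_FLOW_ENTRIES */
#define FLOW_SLOTS   4u     /* hypothetical cap per bucket: 1024 * 4 = 4096 */

struct flow_slot {
    uint32_t rxhash;
    uint16_t queue;
    uint8_t  used;
};

struct flow_table {
    struct flow_slot slot[FLOW_BUCKETS][FLOW_SLOTS];
};

/* Same mask as tun_hashfn() in the patch: the low 10 bits pick the bucket. */
static uint32_t flow_bucket(uint32_t rxhash)
{
    return rxhash & (FLOW_BUCKETS - 1u);
}

static void flow_table_init(struct flow_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Record flow -> queue. Returns 1 on success, 0 if the bucket is full.
 * A full bucket is not an error: the table is best effort, so memory
 * use stays bounded no matter how many distinct flows show up. */
static int flow_record(struct flow_table *t, uint32_t rxhash, uint16_t queue)
{
    struct flow_slot *b = t->slot[flow_bucket(rxhash)];
    unsigned int i;

    for (i = 0; i < FLOW_SLOTS; i++) {
        if (b[i].used && b[i].rxhash == rxhash) {
            b[i].queue = queue;     /* known flow: refresh its queue */
            return 1;
        }
    }
    for (i = 0; i < FLOW_SLOTS; i++) {
        if (!b[i].used) {
            b[i].rxhash = rxhash;
            b[i].queue = queue;
            b[i].used = 1;
            return 1;
        }
    }
    return 0;
}

/* Pick a queue: use the recorded one on a hit, otherwise fall back to
 * scaling the hash into [0, numqueues) without a divide. */
static uint16_t flow_select_queue(const struct flow_table *t,
                                  uint32_t rxhash, uint16_t numqueues)
{
    const struct flow_slot *b = t->slot[flow_bucket(rxhash)];
    unsigned int i;

    assert(numqueues > 0);
    for (i = 0; i < FLOW_SLOTS; i++) {
        if (b[i].used && b[i].rxhash == rxhash && b[i].queue < numqueues)
            return b[i].queue;
    }
    return (uint16_t)(((uint64_t)rxhash * numqueues) >> 32);
}
```

A flow that cannot be recorded still gets a valid queue from the fallback
path, which is why a hard cap on the table size does not break delivery;
it only costs that flow its steering.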
>
> Could you please explain the DoS or growth issue you meet here?
>>> > - The only issue is scalability, but fixing this is not easy. We can
>>> > just use arrays/indirection table like RSS instead of hash buckets; it
>>> > saves some time in linear search but has other issues like collisions.
>>> > - I've also had an RFC of using aRFS in the past; it also has several
>>> > drawbacks, such as busy looping in the networking hotspot.
>>> >
>>> > So in conclusion, we need flow steering in tun; just removing the current
>>> > method does not help. The proper way is to expose several different
>>> > methods to the user and let the user choose the preferable mechanism, like
>>> > packet.
>> By the way, let us see what other networking guys think of this,
>> such as MST, Dave, etc. :)
>>
>
> Of course.



-- 
Regards,

Zhi Yong Wu


Re: [RFC PATCH] net, tun: remove the flow cache

2013-12-17 Thread Zhi Yong Wu
On Tue, Dec 17, 2013 at 4:49 PM, Jason Wang  wrote:
> On 12/17/2013 03:26 PM, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu 
>>
>> The flow cache is an extremely broken concept, and it usually brings up
>> growth issues and DoS attacks, so this patch tries to remove it from
>> the tuntap driver and instead use a simpler method for its flow control.
>
> NACK.
>
> This single function revert does not make sense to me. Since:
IIRC, the tuntap flow cache is only used to save the mapping of skb
packet <-> queue index. My idea only saves the queue index in the sk_buff
early, when the skb is filled, not in a flow cache as the current
code does. This method is actually simpler and doesn't need
any flow cache at all.

>
> - You in fact remove the flow steering function in tun. We definitely
> need something like this to unbreak the TCP performance in a multiqueue
I don't think it will degrade TCP perf even in an mq guest; my
idea may even have better TCP perf, because it doesn't need any cache
table lookup, etc.
> guest. Please have a look at the virtio-net driver / virtio spec for
> more information.
> - The total number of flow cache entries is limited to 4096, so there's no
> DoS or growth issue.
Can you check why the ipv4 routing cache was removed? Maybe I missed
something; if so, please correct me. :)
> - The only issue is scalability, but fixing this is not easy. We can
> just use arrays/indirection table like RSS instead of hash buckets; it
> saves some time in linear search but has other issues like collisions.
> - I've also had an RFC of using aRFS in the past; it also has several
> drawbacks, such as busy looping in the networking hotspot.
>
> So in conclusion, we need flow steering in tun; just removing the current
> method does not help. The proper way is to expose several different
> methods to the user and let the user choose the preferable mechanism, like
> packet.
By the way, let us see what other networking guys think of this,
such as MST, Dave, etc. :)
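
For reference, the array/indirection-table alternative mentioned in the
quoted review can be sketched as below. This is only an illustration in
plain C: the table size INDIR_SIZE and the names indir_init, indir_update
and indir_lookup are hypothetical and are not part of tun.c or of any
driver API. It shows the trade-off named above: the lookup is a single
array index with no list walk, but two flows whose hashes share the same
low bits overwrite each other's entry.

```c
#include <assert.h>
#include <stdint.h>

#define INDIR_SIZE 128u  /* hypothetical table size, must be a power of two */

struct indir_table {
    uint16_t queue[INDIR_SIZE];
};

/* Start with a round-robin spread of table slots over the queues. */
static void indir_init(struct indir_table *t, uint16_t numqueues)
{
    uint32_t i;

    assert(numqueues > 0);
    for (i = 0; i < INDIR_SIZE; i++)
        t->queue[i] = (uint16_t)(i % numqueues);
}

/* Steer the slot this hash maps to; no allocation, no list insertion. */
static void indir_update(struct indir_table *t, uint32_t rxhash, uint16_t queue)
{
    t->queue[rxhash & (INDIR_SIZE - 1u)] = queue;
}

/* O(1) lookup: one mask and one array read. */
static uint16_t indir_lookup(const struct indir_table *t, uint32_t rxhash)
{
    return t->queue[rxhash & (INDIR_SIZE - 1u)];
}
```

With 4 queues, steering hash 0x105 to queue 3 also moves hash 0x005 to
queue 3, since both map to slot 5. That is the collision cost of dropping
the per-flow hash comparison.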

>
>>
>> Signed-off-by: Zhi Yong Wu 
>> ---
>>  drivers/net/tun.c |  208 +++--
>>  1 files changed, 10 insertions(+), 198 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 7c8343a..7c27fdc 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -32,12 +32,15 @@
>>   *
>>   *  Daniel Podlejski 
>>   *Modifications for 2.3.99-pre5 kernel.
>> + *
>> + *  Zhi Yong Wu 
>> + *Remove the flow cache.
>>   */
>>
>>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>
>>  #define DRV_NAME "tun"
>> -#define DRV_VERSION  "1.6"
>> +#define DRV_VERSION  "1.7"
>>  #define DRV_DESCRIPTION  "Universal TUN/TAP device driver"
>>  #define DRV_COPYRIGHT  "(C) 1999-2004 Max Krasnyansky "
>>
>> @@ -146,18 +149,6 @@ struct tun_file {
>>   struct tun_struct *detached;
>>  };
>>
>> -struct tun_flow_entry {
>> - struct hlist_node hash_link;
>> - struct rcu_head rcu;
>> - struct tun_struct *tun;
>> -
>> - u32 rxhash;
>> - int queue_index;
>> - unsigned long updated;
>> -};
>> -
>> -#define TUN_NUM_FLOW_ENTRIES 1024
>> -
>>  /* Since the socket were moved to tun_file, to preserve the behavior of persist
>>   * device, socket filter, sndbuf and vnet header size were restore when the
>>   * file were attached to a persist device.
>> @@ -184,163 +175,11 @@ struct tun_struct {
>>   int debug;
>>  #endif
>>   spinlock_t lock;
>> - struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
>> - struct timer_list flow_gc_timer;
>> - unsigned long ageing_time;
>>   unsigned int numdisabled;
>>   struct list_head disabled;
>>   void *security;
>> - u32 flow_count;
>>  };
>>
>> -static inline u32 tun_hashfn(u32 rxhash)
>> -{
>> - return rxhash & 0x3ff;
>> -}
>> -
>> -static struct tun_flow_entry *tun_flow_find(struct hlist_head *head, u32 rxhash)
>> -{
>> - struct tun_flow_entry *e;
>> -
>> - hlist_for_each_entry_rcu(e, head, hash_link) {
>> - if (e->rxhash == rxhash)
>> - return e;
>> - }
>> - return NULL;
>> -}
>> -
>> -static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
>> -   struct hlist_head *head,
>> -   u32 rxhash, u16 queue_in

Re: [RFC PATCH] net, tun: remove the flow cache

2013-12-17 Thread Zhi Yong Wu
On Mon, 2013-12-16 at 23:47 -0800, Stephen Hemminger wrote:
> On Tue, 17 Dec 2013 15:26:22 +0800
> Zhi Yong Wu  wrote:
> 
> > From: Zhi Yong Wu 
> > 
> > The flow cache is an extremely broken concept, and it usually brings up
> > growth issues and DoS attacks, so this patch tries to remove it from
> > the tuntap driver and instead use a simpler method for its flow control.
> > 
> > Signed-off-by: Zhi Yong Wu 
> > ---
> >  drivers/net/tun.c |  208 +++--
> >  1 files changed, 10 insertions(+), 198 deletions(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 7c8343a..7c27fdc 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -32,12 +32,15 @@
> >   *
> >   *  Daniel Podlejski 
> >   *Modifications for 2.3.99-pre5 kernel.
> > + *
> > + *  Zhi Yong Wu 
> > + *Remove the flow cache.
> >   */
> 
> I agree with your patch, but please don't add to the comment changelog.
> These are all historical. The kernel development process has not used
> them for 5+ years.
>
> Can we get kernel janitors to just remove them, or would that step
> on too many early developers' toes by removing credit?
I thought that since this is a big code change, it needed a changelog
entry, but you seem to have a strong argument. :) I don't object to
removing my entry from the changelog comment if other people also agree with you.


> 

-- 
Regards,

Zhi Yong Wu



Re: [RFC PATCH] net, tun: remove the flow cache

2013-12-17 Thread Zhi Yong Wu
On Wed, Dec 18, 2013 at 12:58 PM, Tom Herbert therb...@google.com wrote:
>>> Yes, in its current state it's broken. But maybe we can try to fix
>>> it instead of arbitrarily removing it. Please see my patches on
>>> plumbing RFS into tuntap which may start to make it useful.
>> Do you mean your patch "[5/5] tun: Added support for RFS on tun flows"?
>> Sorry, could you explain in more detail?
>
> Correct. It was RFC since I didn't have a good way to test; if you do,
> please try it and see if there's any effect. We should also be able to
Interesting, I will try to dig into it. Sorry, I don't understand why you
can't test it. Does it require some special hardware support or other
facilities?
> do something similar for KVM guests, either doing the flow lookup on
> each packet from the guest, or use aRFS interface from the guest
> driver for end to end RFS (more exciting prospect). We are finding
Which two ends do you mean?
> that guest to driver accelerations like this (and tso, lro) are quite
Sorry, I got a bit confused: does "the driver" here mean the virtio_net
or the tuntap driver?
> important in getting virtual networking performance up.
>
>>>
>>> Tom
>>>
>>>> Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>>>> ---
>>>>  drivers/net/tun.c |  208 +++--
>>>>  1 files changed, 10 insertions(+), 198 deletions(-)
>>>>
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index 7c8343a..7c27fdc 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -32,12 +32,15 @@
>>>>   *
>>>>   *  Daniel Podlejski <under...@underley.eu.org>
>>>>   *Modifications for 2.3.99-pre5 kernel.
>>>> + *
>>>> + *  Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>>>> + *Remove the flow cache.
>>>>   */
>>>>
>>>>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>>>
>>>>  #define DRV_NAME	"tun"
>>>> -#define DRV_VERSION	"1.6"
>>>> +#define DRV_VERSION	"1.7"
>>>>  #define DRV_DESCRIPTION	"Universal TUN/TAP device driver"
>>>>  #define DRV_COPYRIGHT	"(C) 1999-2004 Max Krasnyansky <m...@qualcomm.com>"
>>>>
>>>> @@ -146,18 +149,6 @@ struct tun_file {
>>>>  	struct tun_struct *detached;
>>>>  };
>>>>
>>>> -struct tun_flow_entry {
>>>> -	struct hlist_node hash_link;
>>>> -	struct rcu_head rcu;
>>>> -	struct tun_struct *tun;
>>>> -
>>>> -	u32 rxhash;
>>>> -	int queue_index;
>>>> -	unsigned long updated;
>>>> -};
>>>> -
>>>> -#define TUN_NUM_FLOW_ENTRIES 1024
>>>> -
>>>>  /* Since the socket were moved to tun_file, to preserve the behavior of persist
>>>>   * device, socket filter, sndbuf and vnet header size were restore when the
>>>>   * file were attached to a persist device.
>>>> @@ -184,163 +175,11 @@ struct tun_struct {
>>>>  	int debug;
>>>>  #endif
>>>>  	spinlock_t lock;
>>>> -	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
>>>> -	struct timer_list flow_gc_timer;
>>>> -	unsigned long ageing_time;
>>>>  	unsigned int numdisabled;
>>>>  	struct list_head disabled;
>>>>  	void *security;
>>>> -	u32 flow_count;
>>>>  };
>>>>
>>>> -static inline u32 tun_hashfn(u32 rxhash)
>>>> -{
>>>> -	return rxhash & 0x3ff;
>>>> -}
>>>> -
>>>> -static struct tun_flow_entry *tun_flow_find(struct hlist_head *head, u32 rxhash)
>>>> -{
>>>> -	struct tun_flow_entry *e;
>>>> -
>>>> -	hlist_for_each_entry_rcu(e, head, hash_link) {
>>>> -		if (e->rxhash == rxhash)
>>>> -			return e;
>>>> -	}
>>>> -	return NULL;
>>>> -}
>>>> -
>>>> -static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
>>>> -					      struct hlist_head *head,
>>>> -					      u32 rxhash, u16 queue_index)
>>>> -{
>>>> -	struct tun_flow_entry *e = kmalloc(sizeof(*e), GFP_ATOMIC);
>>>> -
>>>> -	if (e) {
>>>> -		tun_debug(KERN_INFO, tun, "create flow: hash %u index %u\n",
>>>> -			  rxhash, queue_index);
>>>> -		e->updated = jiffies;
>>>> -		e->rxhash = rxhash;
>>>> -		e->queue_index = queue_index;
>>>> -		e->tun = tun;
>>>> -		hlist_add_head_rcu(&e->hash_link, head);
>>>> -		++tun->flow_count;
>>>> -	}
>>>> -	return e;
>>>> -}
>>>> -
>>>> -static void tun_flow_delete(struct tun_struct *tun, struct tun_flow_entry *e)
>>>> -{
>>>> -	tun_debug(KERN_INFO, tun, "delete flow: hash %u index %u\n",
>>>> -		  e->rxhash, e->queue_index);
>>>> -	hlist_del_rcu(&e->hash_link);
>>>> -	kfree_rcu(e, rcu);
>>>> -	--tun->flow_count;
>>>> -}
>>>> -
>>>> -static void tun_flow_flush(struct tun_struct *tun)
>>>> -{
>>>> -	int i;
>>>> -
>>>> -	spin_lock_bh(&tun->lock);
>>>> -	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
>>>> -		struct tun_flow_entry *e;
>>>> -		struct hlist_node *n;
>>>> -
>>>> -		hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link)
>>>> -			tun_flow_delete(tun, e);
>>>> -	}
>>>> -	spin_unlock_bh(&tun->lock);
>>>> -}
>>>> -
>>>> -static void tun_flow_delete_by_queue(struct tun_struct *tun, u16 queue_index)
>>>> -{
>>>> -	int i;
>>>> -
>>>> -	spin_lock_bh(&tun->lock);
>>>> -	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
>>>> -		struct tun_flow_entry *e;
>>>> -		struct hlist_node *n;
>>>> -
>>>> -		hlist_for_each_entry_safe(e, n, &tun->flows[i

[RFC PATCH] net, tun: remove the flow cache

2013-12-16 Thread Zhi Yong Wu
From: Zhi Yong Wu 

The flow cache is an extremely broken concept: it tends to cause growth
issues and opens the door to DoS attacks. This patch removes it from the
tuntap driver and uses a simpler approach to flow control instead.

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/tun.c |  208 +++--
 1 files changed, 10 insertions(+), 198 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7c8343a..7c27fdc 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -32,12 +32,15 @@
  *
  *  Daniel Podlejski 
  *Modifications for 2.3.99-pre5 kernel.
+ *
+ *  Zhi Yong Wu 
+ *Remove the flow cache.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #define DRV_NAME   "tun"
-#define DRV_VERSION    "1.6"
+#define DRV_VERSION    "1.7"
 #define DRV_DESCRIPTION    "Universal TUN/TAP device driver"
 #define DRV_COPYRIGHT  "(C) 1999-2004 Max Krasnyansky "
 
@@ -146,18 +149,6 @@ struct tun_file {
struct tun_struct *detached;
 };
 
-struct tun_flow_entry {
-   struct hlist_node hash_link;
-   struct rcu_head rcu;
-   struct tun_struct *tun;
-
-   u32 rxhash;
-   int queue_index;
-   unsigned long updated;
-};
-
-#define TUN_NUM_FLOW_ENTRIES 1024
-
 /* Since the socket were moved to tun_file, to preserve the behavior of persist
  * device, socket filter, sndbuf and vnet header size were restore when the
  * file were attached to a persist device.
@@ -184,163 +175,11 @@ struct tun_struct {
int debug;
 #endif
spinlock_t lock;
-   struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
-   struct timer_list flow_gc_timer;
-   unsigned long ageing_time;
unsigned int numdisabled;
struct list_head disabled;
void *security;
-   u32 flow_count;
 };
 
-static inline u32 tun_hashfn(u32 rxhash)
-{
-   return rxhash & 0x3ff;
-}
-
-static struct tun_flow_entry *tun_flow_find(struct hlist_head *head, u32 
rxhash)
-{
-   struct tun_flow_entry *e;
-
-   hlist_for_each_entry_rcu(e, head, hash_link) {
-   if (e->rxhash == rxhash)
-   return e;
-   }
-   return NULL;
-}
-
-static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
- struct hlist_head *head,
- u32 rxhash, u16 queue_index)
-{
-   struct tun_flow_entry *e = kmalloc(sizeof(*e), GFP_ATOMIC);
-
-   if (e) {
-   tun_debug(KERN_INFO, tun, "create flow: hash %u index %u\n",
- rxhash, queue_index);
-   e->updated = jiffies;
-   e->rxhash = rxhash;
-   e->queue_index = queue_index;
-   e->tun = tun;
-   hlist_add_head_rcu(&e->hash_link, head);
-   ++tun->flow_count;
-   }
-   return e;
-}
-
-static void tun_flow_delete(struct tun_struct *tun, struct tun_flow_entry *e)
-{
-   tun_debug(KERN_INFO, tun, "delete flow: hash %u index %u\n",
- e->rxhash, e->queue_index);
-   hlist_del_rcu(&e->hash_link);
-   kfree_rcu(e, rcu);
-   --tun->flow_count;
-}
-
-static void tun_flow_flush(struct tun_struct *tun)
-{
-   int i;
-
-   spin_lock_bh(&tun->lock);
-   for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
-   struct tun_flow_entry *e;
-   struct hlist_node *n;
-
-   hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link)
-   tun_flow_delete(tun, e);
-   }
-   spin_unlock_bh(&tun->lock);
-}
-
-static void tun_flow_delete_by_queue(struct tun_struct *tun, u16 queue_index)
-{
-   int i;
-
-   spin_lock_bh(&tun->lock);
-   for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
-   struct tun_flow_entry *e;
-   struct hlist_node *n;
-
-   hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link) {
-   if (e->queue_index == queue_index)
-   tun_flow_delete(tun, e);
-   }
-   }
-   spin_unlock_bh(&tun->lock);
-}
-
-static void tun_flow_cleanup(unsigned long data)
-{
-   struct tun_struct *tun = (struct tun_struct *)data;
-   unsigned long delay = tun->ageing_time;
-   unsigned long next_timer = jiffies + delay;
-   unsigned long count = 0;
-   int i;
-
-   tun_debug(KERN_INFO, tun, "tun_flow_cleanup\n");
-
-   spin_lock_bh(&tun->lock);
-   for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
-   struct tun_flow_entry *e;
-   struct hlist_node *n;
-
-   hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link) {
-   unsigned long this_timer;
-   count++;
-   this_timer = e->updated + delay;
-   

Re: [PATCH 1/5] xfs: factor prid related codes into xfs_get_initial_prid()

2013-12-14 Thread Zhi Yong Wu
On Sat, Dec 14, 2013 at 7:20 PM, Jeff Liu  wrote:
> On 12/14 2013 00:32 AM, Christoph Hellwig wrote:
>>> +static inline prid_t xfs_get_initial_prid(struct xfs_inode *dp)
>>> +{
>>> +if (dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
>>> +return xfs_get_projid(dp);
>>> +else
>>> +return XFS_PROJID_DEFAULT;
>>> +}
>>
>> You could skip the else here.
> Except that, I'd suggest we move this helper to proper header file with
> further refactoring in xfs_symlink(), and it could be a separate patch.
Good point, will apply it, thanks.

>
> Thanks,
> -Jeff
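
For illustration, Christoph's suggestion to skip the else could look roughly like the sketch below. This is standalone userspace C, not the kernel helper: struct demo_inode, DEMO_DIFLAG_PROJINHERIT and DEMO_PROJID_DEFAULT are hypothetical stand-ins for the XFS inode, the inherit flag and the default project ID, and the flag value is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the XFS types and constants (values assumed). */
typedef uint32_t demo_prid_t;
#define DEMO_DIFLAG_PROJINHERIT 0x0200u
#define DEMO_PROJID_DEFAULT     0u

struct demo_inode {
    uint16_t    di_flags;   /* on-disk inode flags */
    demo_prid_t di_projid;  /* project ID stored in the inode */
};

/*
 * Pick the project ID for a new child of directory dp. The early return
 * makes the else branch unnecessary: falling through means "no inherit".
 */
static inline demo_prid_t demo_get_initial_prid(const struct demo_inode *dp)
{
    if (dp->di_flags & DEMO_DIFLAG_PROJINHERIT)
        return dp->di_projid;
    return DEMO_PROJID_DEFAULT;
}
```

The behavior is the same as the if/else form; only the control flow is flatter.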



-- 
Regards,

Zhi Yong Wu


Re: [PATCH 5/5] xfs: allow linkat() on O_TMPFILE files

2013-12-14 Thread Zhi Yong Wu
On Sat, Dec 14, 2013 at 4:19 PM, Dave Chinner  wrote:
> On Sat, Dec 14, 2013 at 01:36:47AM +0800, Zhi Yong Wu wrote:
>> On Sat, Dec 14, 2013 at 12:41 AM, Christoph Hellwig  
>> wrote:
>> > On Fri, Dec 13, 2013 at 10:27:53PM +0800, Zhi Yong Wu wrote:
>> >> From: Zhi Yong Wu 
>> >>
>> >> Enable O_TMPFILE support in linkat().
>> >> For more info, please refer to:
>> >>   http://oss.sgi.com/archives/xfs/2013-08/msg00341.html
>> >
>> > Generally you should provide all reasonable information in the changelog
>> > instead of linking to it.
>> will apply this, thanks.
>> >
>> >> + if (sip->i_d.di_nlink == 0)
>> >> + tres = &M_RES(mp)->tr_link_tmpfile;
>> >> + else
>> >> + tres = &M_RES(mp)->tr_link;
>> >
>> > As mentioned before I think Dave wanted you to always use the same
>> > reservation, but I'll leave that discussion to him.
>> If so, when tons of regular files are created, won't that waste disk
>> space? E.g. some files want to reserve space but get NOSPACE because
>> other files have reserved additional space?
>
> This is a log space reservation, not a disk space reservation. And
> either way, what is unused by the transaction is returned to the
> free space pool at the end of the transaction. So for simplicity,
> we should just use the one reservation for the link transaction -
> take whichever is larger at calculation time.
Good explanation, thanks Dave and Christoph. By the way, could you help
check whether the log reservation for adding/removing one inode to/from
the unlinked list is correct? Or will you check after I post the next
version?

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
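
A minimal sketch of what Dave describes, assuming the two candidate sizes have already been computed: a single link reservation is calculated once and simply takes the larger of the regular and O_TMPFILE cases. The struct and function names below are hypothetical and only model the idea in userspace C; they are not the XFS API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-transaction log reservation. */
struct demo_trans_res {
    uint32_t logres;    /* bytes of log space reserved */
};

static uint32_t demo_max_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/*
 * Compute one link reservation at calculation time that is large enough
 * for both a regular link and a link of an O_TMPFILE inode, which also
 * has to be removed from the unlinked list.
 */
static struct demo_trans_res demo_calc_link_res(uint32_t regular_bytes,
                                                uint32_t tmpfile_bytes)
{
    struct demo_trans_res res;

    res.logres = demo_max_u32(regular_bytes, tmpfile_bytes);
    return res;
}
```

With one reservation the link path no longer needs to branch on di_nlink to pick a reservation; whatever the transaction does not use is returned when it ends.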



-- 
Regards,

Zhi Yong Wu


Re: [PATCH 5/5] xfs: allow linkat() on O_TMPFILE files

2013-12-13 Thread Zhi Yong Wu
On Sat, Dec 14, 2013 at 12:41 AM, Christoph Hellwig  wrote:
> On Fri, Dec 13, 2013 at 10:27:53PM +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu 
>>
>> Enable O_TMPFILE support in linkat().
>> For more info, please refer to:
>>   http://oss.sgi.com/archives/xfs/2013-08/msg00341.html
>
> Generally you should provide all reasonable information in the changelog
> instead of linking to it.
will apply this, thanks.
>
>> + if (sip->i_d.di_nlink == 0)
>> + tres = &M_RES(mp)->tr_link_tmpfile;
>> + else
>> + tres = &M_RES(mp)->tr_link;
>
> As mentioned before I think Dave wanted you to always use the same
> reservation, but I'll leave that discussion to him.
If so, when tons of regular files are created, won't that waste disk
space? E.g. some files want to reserve space but get NOSPACE because
other files have reserved additional space?

>
>> +/* For creating a link to an O_TMPFILE inode, except modifying
>> + * those metadata for regular inode, we still need to remove an inode
>> + * from unlinked list at first. That is,  we can modify:
>> + *the agi hash list and counters: sector size
>> + *the on disk inode before ours in the agi hash list: inode cluster size
>> + */
>
> We always have an empty content
>
> /*
>
> line at the beginning of comments in XFS and the Linux kernel in
> general.
Done, thanks.
>



-- 
Regards,

Zhi Yong Wu


Re: [PATCH 4/5] xfs: add a new method xfs_vn_tmpfile()

2013-12-13 Thread Zhi Yong Wu
On Sat, Dec 14, 2013 at 12:39 AM, Christoph Hellwig  wrote:
> On Fri, Dec 13, 2013 at 10:27:52PM +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu 
>>
>> Add a new O_TMPFILE method to the VFS interface.
>> For more info, please refer to:
>>   http://oss.sgi.com/archives/xfs/2013-08/msg00336.html
>>
>> Signed-off-by: Zhi Yong Wu 
>> ---
>>  fs/xfs/xfs_iops.c |   22 ++
>>  1 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index eb55be5..b57cd89 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -39,6 +39,7 @@
>>  #include "xfs_da_btree.h"
>>  #include "xfs_dir2_priv.h"
>>  #include "xfs_dinode.h"
>> +#include "xfs_trans_space.h"
>>
>>  #include 
>>  #include 
>> @@ -1051,6 +1052,25 @@ xfs_vn_fiemap(
>>   return 0;
>>  }
>>
>> +STATIC int
>> +xfs_vn_tmpfile(
>> + struct inode    *dir,
>> + struct dentry   *dentry,
>> + umode_t         mode)
>> +{
>> + struct xfs_inode *ip = NULL;
>> + int error;
>> +
>> + error = xfs_create_tmpfile(XFS_I(dir), XFS_I(dir)->i_mount,
>
> No need to pass in the mount point here, the client can get it easily.
>
>> + mode, 0, &ip);
>
> Also no need for an always-zero argument.
Fixed, thanks.
>
>> + if (error)
>> + return -error;
>> +
>> + d_instantiate(dentry, VFS_I(ip));
>
> Shouldn't this be a call to d_tmpfile() instead?
Yes, and then it needs to be called in xfs_create_tmpfile() just before
xfs_iunlink() is called.
>
> Also I'd suggest merging this into the previous patch, so that we have
> one that actually adds O_TMPFILE support, and one place to write a nice
> good changelog.
Merged them, thanks.



-- 
Regards,

Zhi Yong Wu


Re: [PATCH 3/5] xfs: add xfs_create_tmpfile() for O_TMPFILE support

2013-12-13 Thread Zhi Yong Wu
Fixed them, thanks.

On Sat, Dec 14, 2013 at 12:37 AM, Christoph Hellwig  wrote:
>> + error = xfs_dir_ialloc(&tp, NULL, mode, 0, rdev,
>
> please pass the parent inode pointer here.
>
>> + XFS_PROJID_DEFAULT, resblks > 0,
>
> and pass the project id that you inherited from the parent here.
>



-- 
Regards,

Zhi Yong Wu


Re: [PATCH 2/5] xfs: adjust the interface of xfs_qm_vop_dqalloc()

2013-12-13 Thread Zhi Yong Wu
On Sat, Dec 14, 2013 at 12:32 AM, Christoph Hellwig  wrote:
> On Fri, Dec 13, 2013 at 10:27:50PM +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu 
>>
>> For O_TMPFILE support there may be no parent inode or name, but a
>> struct xfs_mount will be passed to xfs_qm_vop_dqalloc(). Its interface
>> therefore needs to be adjusted so that the O_TMPFILE creation function
>> can use it as well.
>>
>> Signed-off-by: Zhi Yong Wu 
>
> This patch is not actually needed, as we do get passed a parent.
Discarded, thanks.

>



-- 
Regards,

Zhi Yong Wu


[PATCH 2/5] xfs: adjust the interface of xfs_qm_vop_dqalloc()

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu 

For O_TMPFILE support there may be no parent inode or name, but a
struct xfs_mount will be passed to xfs_qm_vop_dqalloc(). Its interface
therefore needs to be adjusted so that the O_TMPFILE creation function
can use it as well.

Signed-off-by: Zhi Yong Wu 
---
 fs/xfs/xfs_inode.c   |2 +-
 fs/xfs/xfs_ioctl.c   |2 +-
 fs/xfs/xfs_iops.c|3 ++-
 fs/xfs/xfs_qm.c  |   50 +++---
 fs/xfs/xfs_quota.h   |6 --
 fs/xfs/xfs_symlink.c |2 +-
 6 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e8b9a68..71a8186 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1182,7 +1182,7 @@ xfs_create(
/*
 * Make sure that we have allocated dquot(s) on disk.
 */
-   error = xfs_qm_vop_dqalloc(dp, xfs_kuid_to_uid(current_fsuid()),
+   error = xfs_qm_vop_dqalloc(dp, mp, xfs_kuid_to_uid(current_fsuid()),
xfs_kgid_to_gid(current_fsgid()), prid,
XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
&udqp, &gdqp, &pdqp);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 33ad9a7..eac84bd 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1090,7 +1090,7 @@ xfs_ioctl_setattr(
 * because the i_*dquot fields will get updated anyway.
 */
if (XFS_IS_QUOTA_ON(mp) && (mask & FSX_PROJID)) {
-   code = xfs_qm_vop_dqalloc(ip, ip->i_d.di_uid,
+   code = xfs_qm_vop_dqalloc(ip, ip->i_mount, ip->i_d.di_uid,
 ip->i_d.di_gid, fa->fsx_projid,
 XFS_QMOPT_PQUOTA, &udqp, NULL, &pdqp);
if (code)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 27e0e54..eb55be5 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -540,7 +540,8 @@ xfs_setattr_nonsize(
 */
ASSERT(udqp == NULL);
ASSERT(gdqp == NULL);
-   error = xfs_qm_vop_dqalloc(ip, xfs_kuid_to_uid(uid),
+   error = xfs_qm_vop_dqalloc(ip, ip->i_mount,
+  xfs_kuid_to_uid(uid),
   xfs_kgid_to_gid(gid),
   xfs_get_projid(ip),
   qflags, &udqp, &gdqp, NULL);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 14a4996..1f13e82 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1765,6 +1765,7 @@ xfs_qm_write_sb_changes(
 int
 xfs_qm_vop_dqalloc(
struct xfs_inode    *ip,
+   struct xfs_mount    *mp,
xfs_dqid_t  uid,
xfs_dqid_t  gid,
prid_t  prid,
@@ -1773,7 +1774,6 @@ xfs_qm_vop_dqalloc(
struct xfs_dquot    **O_gdqpp,
struct xfs_dquot    **O_pdqpp)
 {
-   struct xfs_mount    *mp = ip->i_mount;
struct xfs_dquot    *uq = NULL;
struct xfs_dquot    *gq = NULL;
struct xfs_dquot    *pq = NULL;
@@ -1783,17 +1783,19 @@ xfs_qm_vop_dqalloc(
if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
return 0;
 
-   lockflags = XFS_ILOCK_EXCL;
-   xfs_ilock(ip, lockflags);
+   if (ip) {
+   lockflags = XFS_ILOCK_EXCL;
+   xfs_ilock(ip, lockflags);
 
-   if ((flags & XFS_QMOPT_INHERIT) && XFS_INHERIT_GID(ip))
-   gid = ip->i_d.di_gid;
+   if ((flags & XFS_QMOPT_INHERIT) && XFS_INHERIT_GID(ip))
+   gid = ip->i_d.di_gid;
+   }
 
/*
 * Attach the dquot(s) to this inode, doing a dquot allocation
 * if necessary. The dquot(s) will not be locked.
 */
-   if (XFS_NOT_DQATTACHED(mp, ip)) {
+   if (ip && XFS_NOT_DQATTACHED(mp, ip)) {
error = xfs_qm_dqattach_locked(ip, XFS_QMOPT_DQALLOC);
if (error) {
xfs_iunlock(ip, lockflags);
@@ -1802,7 +1804,7 @@ xfs_qm_vop_dqalloc(
}
 
if ((flags & XFS_QMOPT_UQUOTA) && XFS_IS_UQUOTA_ON(mp)) {
-   if (ip->i_d.di_uid != uid) {
+   if (!ip || (ip->i_d.di_uid != uid)) {
/*
 * What we need is the dquot that has this uid, and
 * if we send the inode to dqget, the uid of the inode
@@ -1812,7 +1814,8 @@ xfs_qm_vop_dqalloc(
 * we'll deadlock by doing trans_reserve while
 * holding ilock.
 */
-   xfs_iunlock(ip, lockflags);
+   if (ip)
+   xfs_iunlock(ip, lockflags);
error = xfs_qm_dqget(mp, NULL, uid,
   

[PATCH 0/5] xfs: add O_TMPFILE support

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Hi, folks

  It's time to post the first formal version; any constructive comments
are welcome, thanks.

  If anyone is interested in playing with it, you can get this patchset from my 
dev git on github:
  git://github.com/wuzhy/kernel.git xfs_tmpfile

  The patchset was tested against the code snippet from Andy Lutomirski, plus
other test cases:
  http://lwn.net/Articles/562296/
  If you have any better test cases, please let me know, thanks.

#include <stdio.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define __O_TMPFILE 020000000
#define O_DIRECTORY 0200000
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
#define AT_EMPTY_PATH 0x1000

int main(int argc, char **argv)
{
   char buf[128];

   if (argc != 3)
 errx(1, "Usage: flinktest PATH linkat|proc");

   int fd = open(".", O_TMPFILE | O_RDWR, 0600);
   if (fd == -1)
 err(1, "O_TMPFILE");
   else
 printf("fd #: %d\n", fd);

   write(fd, "test", 4);

   if (!strcmp(argv[2], "linkat")) {
 if (linkat(fd, "", AT_FDCWD, argv[1], AT_EMPTY_PATH) != 0)
   err(1, "linkat");
   } else if (!strcmp(argv[2], "proc")) {
 sprintf(buf, "/proc/self/fd/%d", fd);
 if (linkat(AT_FDCWD, buf, AT_FDCWD, argv[1], AT_SYMLINK_FOLLOW) != 0)
   err(1, "linkat");
   } else {
 errx(1, "invalid mode");
   }

   return 0;
}


Changelog from rfc:
 - Addressed the comments from Dave Chinner and Christoph Hellwig.

Zhi Yong Wu (5):
  xfs: factor prid related codes into xfs_get_initial_prid()
  xfs: adjust the interface of xfs_qm_vop_dqalloc()
  xfs: add xfs_create_tmpfile() for O_TMPFILE support
  xfs: add a new method xfs_vn_tmpfile()
  xfs: allow linkat() on O_TMPFILE files

 fs/xfs/xfs_inode.c  |  142 ---
 fs/xfs/xfs_inode.h  |2 +
 fs/xfs/xfs_ioctl.c  |2 +-
 fs/xfs/xfs_iops.c   |   25 -
 fs/xfs/xfs_qm.c |   50 ++--
 fs/xfs/xfs_quota.h  |6 +-
 fs/xfs/xfs_shared.h |4 +-
 fs/xfs/xfs_symlink.c|2 +-
 fs/xfs/xfs_trans_resv.c |   51 +
 fs/xfs/xfs_trans_resv.h |4 +
 10 files changed, 255 insertions(+), 33 deletions(-)

-- 
1.7.6.5



[PATCH 1/5] xfs: factor prid related codes into xfs_get_initial_prid()

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu 

It will be reused by the O_TMPFILE creation function.

Signed-off-by: Zhi Yong Wu 
---
 fs/xfs/xfs_inode.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 001aa89..e8b9a68 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1139,6 +1139,14 @@ xfs_bumplink(
return 0;
 }
 
+static inline prid_t xfs_get_initial_prid(struct xfs_inode *dp)
+{
+   if (dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
+   return xfs_get_projid(dp);
+   else
+   return XFS_PROJID_DEFAULT;
+}
+
 int
 xfs_create(
xfs_inode_t *dp,
@@ -1169,10 +1177,7 @@ xfs_create(
if (XFS_FORCED_SHUTDOWN(mp))
return XFS_ERROR(EIO);
 
-   if (dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
-   prid = xfs_get_projid(dp);
-   else
-   prid = XFS_PROJID_DEFAULT;
+   prid = xfs_get_initial_prid(dp);
 
/*
 * Make sure that we have allocated dquot(s) on disk.
-- 
1.7.6.5



[PATCH 4/5] xfs: add a new method xfs_vn_tmpfile()

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Add a new O_TMPFILE method to the VFS interface.
For more info, please refer to:
  http://oss.sgi.com/archives/xfs/2013-08/msg00336.html

Signed-off-by: Zhi Yong Wu 
---
 fs/xfs/xfs_iops.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index eb55be5..b57cd89 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -39,6 +39,7 @@
 #include "xfs_da_btree.h"
 #include "xfs_dir2_priv.h"
 #include "xfs_dinode.h"
+#include "xfs_trans_space.h"
 
 #include 
 #include 
@@ -1051,6 +1052,25 @@ xfs_vn_fiemap(
return 0;
 }
 
+STATIC int
+xfs_vn_tmpfile(
+   struct inode    *dir,
+   struct dentry   *dentry,
+   umode_t         mode)
+{
+   struct xfs_inode *ip = NULL;
+   int error;
+
+   error = xfs_create_tmpfile(XFS_I(dir), XFS_I(dir)->i_mount,
+   mode, 0, &ip);
+   if (error)
+   return -error;
+
+   d_instantiate(dentry, VFS_I(ip));
+
+   return -error;
+}
+
 static const struct inode_operations xfs_inode_operations = {
.get_acl= xfs_get_acl,
.getattr= xfs_vn_getattr,
@@ -1087,6 +1107,7 @@ static const struct inode_operations 
xfs_dir_inode_operations = {
.removexattr= generic_removexattr,
.listxattr  = xfs_vn_listxattr,
.update_time= xfs_vn_update_time,
+   .tmpfile= xfs_vn_tmpfile,
 };
 
 static const struct inode_operations xfs_dir_ci_inode_operations = {
@@ -1113,6 +1134,7 @@ static const struct inode_operations 
xfs_dir_ci_inode_operations = {
.removexattr= generic_removexattr,
.listxattr  = xfs_vn_listxattr,
.update_time= xfs_vn_update_time,
+   .tmpfile= xfs_vn_tmpfile,
 };
 
 static const struct inode_operations xfs_symlink_inode_operations = {
-- 
1.7.6.5



[PATCH 5/5] xfs: allow linkat() on O_TMPFILE files

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Enable O_TMPFILE support in linkat().
For more info, please refer to:
  http://oss.sgi.com/archives/xfs/2013-08/msg00341.html

Signed-off-by: Zhi Yong Wu 
---
 fs/xfs/xfs_inode.c  |   21 ++---
 fs/xfs/xfs_trans_resv.c |   20 
 fs/xfs/xfs_trans_resv.h |2 ++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 48e09c5..2e1fd96 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -62,6 +62,8 @@ kmem_zone_t *xfs_inode_zone;
 
 STATIC int xfs_iflush_int(xfs_inode_t *, xfs_buf_t *);
 
+STATIC int xfs_iunlink_remove(xfs_trans_t *, xfs_inode_t *);
+
 /*
  * helper function to extract extent size hint from inode
  */
@@ -1119,7 +1121,7 @@ xfs_bumplink(
 {
xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 
-   ASSERT(ip->i_d.di_nlink > 0);
+   ASSERT(ip->i_d.di_nlink > 0 || (VFS_I(ip)->i_state & I_LINKABLE));
ip->i_d.di_nlink++;
inc_nlink(VFS_I(ip));
if ((ip->i_d.di_version == 1) &&
@@ -1455,6 +1457,7 @@ xfs_link(
 {
xfs_mount_t *mp = tdp->i_mount;
xfs_trans_t *tp;
+   struct xfs_trans_res    *tres;
int error;
xfs_bmap_free_t free_list;
xfs_fsblock_t   first_block;
@@ -1480,10 +1483,16 @@ xfs_link(
tp = xfs_trans_alloc(mp, XFS_TRANS_LINK);
cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
-   error = xfs_trans_reserve(tp, &M_RES(mp)->tr_link, resblks, 0);
+
+   if (sip->i_d.di_nlink == 0)
+   tres = &M_RES(mp)->tr_link_tmpfile;
+   else
+   tres = &M_RES(mp)->tr_link;
+
+   error = xfs_trans_reserve(tp, tres, resblks, 0);
if (error == ENOSPC) {
resblks = 0;
-   error = xfs_trans_reserve(tp, &M_RES(mp)->tr_link, 0, 0);
+   error = xfs_trans_reserve(tp, tres, 0, 0);
}
if (error) {
cancel_flags = 0;
@@ -1512,6 +1521,12 @@ xfs_link(
 
xfs_bmap_init(&free_list, &first_block);
 
+   if (sip->i_d.di_nlink == 0) {
+   error = xfs_iunlink_remove(tp, sip);
+   if (error)
+   goto abort_return;
+   }
+
error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
&first_block, &free_list, resblks);
if (error)
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
index 04519a9..f2da7f4 100644
--- a/fs/xfs/xfs_trans_resv.c
+++ b/fs/xfs/xfs_trans_resv.c
@@ -228,6 +228,22 @@ xfs_calc_link_reservation(
  XFS_FSB_TO_B(mp, 1));
 }
 
+/* For creating a link to an O_TMPFILE inode, except modifying
+ * those metadata for regular inode, we still need to remove an inode
+ * from unlinked list at first. That is,  we can modify:
+ *the agi hash list and counters: sector size
+ *the on disk inode before ours in the agi hash list: inode cluster size
+ */
+STATIC uint
+xfs_calc_link_tmpfile_reservation(
+   struct xfs_mount    *mp)
+{
+   return xfs_calc_link_reservation(mp) +
+   xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+   MAX((__uint16_t)XFS_FSB_TO_B(mp, 1),
+   (__uint16_t)XFS_INODE_CLUSTER_SIZE(mp));
+}
+
 /*
  * For removing a directory entry we can modify:
  *the parent directory inode: inode size
@@ -743,6 +759,10 @@ xfs_trans_resv_calc(
resp->tr_link.tr_logcount = XFS_LINK_LOG_COUNT;
resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
+   resp->tr_link_tmpfile.tr_logres = xfs_calc_link_tmpfile_reservation(mp);
+   resp->tr_link_tmpfile.tr_logcount = XFS_LINK_TMPFILE_LOG_COUNT;
+   resp->tr_link_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
resp->tr_remove.tr_logcount = XFS_REMOVE_LOG_COUNT;
resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
diff --git a/fs/xfs/xfs_trans_resv.h b/fs/xfs/xfs_trans_resv.h
index 285621d..86a0daf 100644
--- a/fs/xfs/xfs_trans_resv.h
+++ b/fs/xfs/xfs_trans_resv.h
@@ -35,6 +35,7 @@ struct xfs_trans_resv {
    struct xfs_trans_res tr_itruncate;    /* truncate trans */
    struct xfs_trans_res tr_rename;       /* rename trans */
    struct xfs_trans_res tr_link;         /* link trans */
+   struct xfs_trans_res tr_link_tmpfile; /* link O_TMPFILE trans */
    struct xfs_trans_res tr_remove;       /* unlink trans */
    struct xfs_trans_res tr_symlink;      /* symlink trans */
    struct xfs_trans_res tr_create;       /* create trans */
@@ -106,6 +107,7 @@ struct xfs_trans_resv {
 #define XFS_SYMLINK_LOG_COUNT   3
 #define XFS_REMOVE_LOG_COUNT    2
 #defineXFS_L

[PATCH 3/5] xfs: add xfs_create_tmpfile() for O_TMPFILE support

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu 

The function is used to create one O_TMPFILE file.
For more info, please refer to:
  http://oss.sgi.com/archives/xfs/2013-08/msg00339.html

Signed-off-by: Zhi Yong Wu 
---
 fs/xfs/xfs_inode.c  |  106 +++
 fs/xfs/xfs_inode.h  |2 +
 fs/xfs/xfs_shared.h |4 +-
 fs/xfs/xfs_trans_resv.c |   31 ++
 fs/xfs/xfs_trans_resv.h |2 +
 5 files changed, 144 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 71a8186..48e09c5 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1342,6 +1342,112 @@ xfs_create(
 }
 
 int
+xfs_create_tmpfile(
+   struct xfs_inode     *dp,
+   struct xfs_mount     *mp,
+   umode_t              mode,
+   dev_t                rdev,
+   struct xfs_inode     **ipp)
+{
+   struct xfs_inode     *ip = NULL;
+   struct xfs_trans     *tp = NULL;
+   int                  error;
+   uint                 cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
+   struct xfs_dquot     *udqp = NULL;
+   struct xfs_dquot     *gdqp = NULL;
+   struct xfs_dquot     *pdqp = NULL;
+   struct xfs_trans_res *tres;
+   uint                 resblks;
+
+   if (XFS_FORCED_SHUTDOWN(mp))
+   return XFS_ERROR(EIO);
+
+   /*
+* Make sure that we have allocated dquot(s) on disk.
+*/
+   error = xfs_qm_vop_dqalloc(dp, mp, xfs_kuid_to_uid(current_fsuid()),
+   xfs_kgid_to_gid(current_fsgid()),
+   xfs_get_initial_prid(dp),
+   XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
+   &udqp, &gdqp, &pdqp);
+   if (error)
+   return error;
+
+   resblks = XFS_IALLOC_SPACE_RES(mp);
+   tp = xfs_trans_alloc(mp, XFS_TRANS_CREATE_TMPFILE);
+
+   tres = &M_RES(mp)->tr_create_tmpfile;
+   error = xfs_trans_reserve(tp, tres, resblks, 0);
+   if (error == ENOSPC) {
+   /* No space at all so try a "no-allocation" reservation */
+   resblks = 0;
+   error = xfs_trans_reserve(tp, tres, 0, 0);
+   }
+   if (error) {
+   cancel_flags = 0;
+   goto out_trans_cancel;
+   }
+
+   error = xfs_trans_reserve_quota(tp, mp, udqp, gdqp,
+   pdqp, resblks, 1, 0);
+   if (error)
+   goto out_trans_cancel;
+
+   error = xfs_dir_ialloc(&tp, NULL, mode, 0, rdev,
+   XFS_PROJID_DEFAULT, resblks > 0,
+   &ip, NULL);
+   if (error) {
+   if (error == ENOSPC)
+   goto out_trans_cancel;
+   goto out_trans_abort;
+   }
+
+   if (mp->m_flags & XFS_MOUNT_WSYNC)
+   xfs_trans_set_sync(tp);
+
+   /*
+* Attach the dquot(s) to the inodes and modify them incore.
+* These ids of the inode couldn't have changed since the new
+* inode has been locked ever since it was created.
+*/
+   xfs_qm_vop_create_dqattach(tp, ip, udqp, gdqp, pdqp);
+
+   error = xfs_iunlink(tp, ip);
+   if (error)
+   goto out_trans_abort;
+
+   error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+   if (error)
+   goto out_release_inode;
+
+   xfs_qm_dqrele(udqp);
+   xfs_qm_dqrele(gdqp);
+   xfs_qm_dqrele(pdqp);
+
+   *ipp = ip;
+   return 0;
+
+ out_trans_abort:
+   cancel_flags |= XFS_TRANS_ABORT;
+ out_trans_cancel:
+   xfs_trans_cancel(tp, cancel_flags);
+ out_release_inode:
+   /*
+* Wait until after the current transaction is aborted to
+* release the inode.  This prevents recursive transactions
+* and deadlocks from xfs_inactive.
+*/
+   if (ip)
+   IRELE(ip);
+
+   xfs_qm_dqrele(udqp);
+   xfs_qm_dqrele(gdqp);
+   xfs_qm_dqrele(pdqp);
+
+   return error;
+}
+
+int
 xfs_link(
xfs_inode_t *tdp,
xfs_inode_t *sip,
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 9e6efccb..5699cc6 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -323,6 +323,8 @@ int xfs_lookup(struct xfs_inode *dp, struct xfs_name *name,
                struct xfs_inode **ipp, struct xfs_name *ci_name);
 int    xfs_create(struct xfs_inode *dp, struct xfs_name *name,
                umode_t mode, xfs_dev_t rdev, struct xfs_inode **ipp);
+int    xfs_create_tmpfile(struct xfs_inode *dp, struct xfs_mount *mp,
+               umode_t mode, xfs_dev_t rdev, struct xfs_inode **ipp);
 int    xfs_remove(struct xfs_inode *dp, struct xfs_name *name,
                struct xfs_inode *ip);
 intxfs_link(struct xfs_inode *tdp

[PATCH 5/5] xfs: allow linkat() on O_TMPFILE files

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Enable O_TMPFILE support in linkat().
For more info, please refer to:
  http://oss.sgi.com/archives/xfs/2013-08/msg00341.html

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/xfs/xfs_inode.c  |   21 ++---
 fs/xfs/xfs_trans_resv.c |   20 
 fs/xfs/xfs_trans_resv.h |2 ++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 48e09c5..2e1fd96 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -62,6 +62,8 @@ kmem_zone_t *xfs_inode_zone;
 
 STATIC int xfs_iflush_int(xfs_inode_t *, xfs_buf_t *);
 
+STATIC int xfs_iunlink_remove(xfs_trans_t *, xfs_inode_t *);
+
 /*
  * helper function to extract extent size hint from inode
  */
@@ -1119,7 +1121,7 @@ xfs_bumplink(
 {
xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 
-   ASSERT(ip->i_d.di_nlink > 0);
+   ASSERT(ip->i_d.di_nlink > 0 || (VFS_I(ip)->i_state & I_LINKABLE));
    ip->i_d.di_nlink++;
    inc_nlink(VFS_I(ip));
    if ((ip->i_d.di_version == 1) &&
@@ -1455,6 +1457,7 @@ xfs_link(
 {
    xfs_mount_t          *mp = tdp->i_mount;
    xfs_trans_t          *tp;
+   struct xfs_trans_res *tres;
    int                  error;
    xfs_bmap_free_t      free_list;
    xfs_fsblock_t        first_block;
@@ -1480,10 +1483,16 @@ xfs_link(
tp = xfs_trans_alloc(mp, XFS_TRANS_LINK);
cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
    resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
-   error = xfs_trans_reserve(tp, &M_RES(mp)->tr_link, resblks, 0);
+
+   if (sip->i_d.di_nlink == 0)
+   tres = &M_RES(mp)->tr_link_tmpfile;
+   else
+   tres = &M_RES(mp)->tr_link;
+
+   error = xfs_trans_reserve(tp, tres, resblks, 0);
    if (error == ENOSPC) {
    resblks = 0;
-   error = xfs_trans_reserve(tp, &M_RES(mp)->tr_link, 0, 0);
+   error = xfs_trans_reserve(tp, tres, 0, 0);
}
if (error) {
cancel_flags = 0;
@@ -1512,6 +1521,12 @@ xfs_link(
 
    xfs_bmap_init(&free_list, &first_block);
 
+   if (sip->i_d.di_nlink == 0) {
+   error = xfs_iunlink_remove(tp, sip);
+   if (error)
+   goto abort_return;
+   }
+
    error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
    &first_block, &free_list, resblks);
if (error)
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
index 04519a9..f2da7f4 100644
--- a/fs/xfs/xfs_trans_resv.c
+++ b/fs/xfs/xfs_trans_resv.c
@@ -228,6 +228,22 @@ xfs_calc_link_reservation(
  XFS_FSB_TO_B(mp, 1;
 }
 
+/* For creating a link to an O_TMPFILE inode, except modifying
+ * those metadata for regular inode, we still need to remove an inode
+ * from unlinked list at first. That is,  we can modify:
+ *the agi hash list and counters: sector size
+ *the on disk inode before ours in the agi hash list: inode cluster size
+ */
+STATIC uint
+xfs_calc_link_tmpfile_reservation(
+   struct xfs_mount *mp)
+{
+   return xfs_calc_link_reservation(mp) +
+   xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+   MAX((__uint16_t)XFS_FSB_TO_B(mp, 1),
+   (__uint16_t)XFS_INODE_CLUSTER_SIZE(mp));
+}
+
 /*
  * For removing a directory entry we can modify:
  *the parent directory inode: inode size
@@ -743,6 +759,10 @@ xfs_trans_resv_calc(
    resp->tr_link.tr_logcount = XFS_LINK_LOG_COUNT;
    resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
+   resp->tr_link_tmpfile.tr_logres = xfs_calc_link_tmpfile_reservation(mp);
+   resp->tr_link_tmpfile.tr_logcount = XFS_LINK_TMPFILE_LOG_COUNT;
+   resp->tr_link_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
    resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
    resp->tr_remove.tr_logcount = XFS_REMOVE_LOG_COUNT;
    resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
diff --git a/fs/xfs/xfs_trans_resv.h b/fs/xfs/xfs_trans_resv.h
index 285621d..86a0daf 100644
--- a/fs/xfs/xfs_trans_resv.h
+++ b/fs/xfs/xfs_trans_resv.h
@@ -35,6 +35,7 @@ struct xfs_trans_resv {
    struct xfs_trans_res tr_itruncate;    /* truncate trans */
    struct xfs_trans_res tr_rename;       /* rename trans */
    struct xfs_trans_res tr_link;         /* link trans */
+   struct xfs_trans_res tr_link_tmpfile; /* link O_TMPFILE trans */
    struct xfs_trans_res tr_remove;       /* unlink trans */
    struct xfs_trans_res tr_symlink;      /* symlink trans */
    struct xfs_trans_res tr_create;       /* create trans */
@@ -106,6 +107,7 @@ struct xfs_trans_resv {
 #define XFS_SYMLINK_LOG_COUNT   3
 #define XFS_REMOVE_LOG_COUNT    2
 #define XFS_LINK_LOG_COUNT      2
+#define

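The patch above picks a larger log reservation when the link source has a
zero link count, because the inode must also be taken off the unlinked list.
A minimal userspace sketch of that selection and of the size arithmetic
follows. The mock_* types and both helper names are hypothetical, and the
sketch leaves out the per-buffer log overhead that xfs_calc_buf_res() would
add in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified reservation records (bytes of log space). */
struct mock_res {
	unsigned int logres;
};

struct mock_resv {
	struct mock_res tr_link;		/* ordinary link */
	struct mock_res tr_link_tmpfile;	/* link of an O_TMPFILE inode */
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Link reservation plus the two extra items named in the patch comment:
 * the AGI (sector size) and the previous inode on the unlinked list
 * (the larger of block size and inode cluster size).
 */
static unsigned int calc_link_tmpfile_res(unsigned int link_res,
					  unsigned int sect_size,
					  unsigned int block_size,
					  unsigned int cluster_size)
{
	return link_res + sect_size + max_u(block_size, cluster_size);
}

/* Choose the reservation from the source inode's link count. */
static const struct mock_res *pick_link_res(const struct mock_resv *resv,
					    unsigned int nlink)
{
	assert(resv != NULL);
	return nlink == 0 ? &resv->tr_link_tmpfile : &resv->tr_link;
}
```
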
[PATCH 4/5] xfs: add a new method xfs_vn_tmpfile()

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Add a new O_TMPFILE method to the VFS interface.
For more info, please refer to:
  http://oss.sgi.com/archives/xfs/2013-08/msg00336.html

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/xfs/xfs_iops.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index eb55be5..b57cd89 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -39,6 +39,7 @@
 #include "xfs_da_btree.h"
 #include "xfs_dir2_priv.h"
 #include "xfs_dinode.h"
+#include "xfs_trans_space.h"
 
 #include <linux/capability.h>
 #include <linux/xattr.h>
@@ -1051,6 +1052,25 @@ xfs_vn_fiemap(
return 0;
 }
 
+STATIC int
+xfs_vn_tmpfile(
+   struct inode     *dir,
+   struct dentry    *dentry,
+   umode_t          mode)
+{
+   struct xfs_inode *ip = NULL;
+   int error;
+
+   error = xfs_create_tmpfile(XFS_I(dir), XFS_I(dir)->i_mount,
+   mode, 0, &ip);
+   if (error)
+   return -error;
+
+   d_instantiate(dentry, VFS_I(ip));
+
+   return -error;
+}
+
 static const struct inode_operations xfs_inode_operations = {
    .get_acl        = xfs_get_acl,
    .getattr        = xfs_vn_getattr,
@@ -1087,6 +1107,7 @@ static const struct inode_operations xfs_dir_inode_operations = {
    .removexattr    = generic_removexattr,
    .listxattr      = xfs_vn_listxattr,
    .update_time    = xfs_vn_update_time,
+   .tmpfile        = xfs_vn_tmpfile,
 };
 
 static const struct inode_operations xfs_dir_ci_inode_operations = {
@@ -1113,6 +1134,7 @@ static const struct inode_operations xfs_dir_ci_inode_operations = {
    .removexattr    = generic_removexattr,
    .listxattr      = xfs_vn_listxattr,
    .update_time    = xfs_vn_update_time,
+   .tmpfile        = xfs_vn_tmpfile,
 };
 
 static const struct inode_operations xfs_symlink_inode_operations = {
-- 
1.7.6.5



[PATCH 0/5] xfs: add O_TMPFILE support

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Hi, folks

  It's time to post the first formal version; any constructive comments
are welcome, thanks.

  If anyone is interested in playing with it, you can get this patchset from
my dev git tree on github:
  git://github.com/wuzhy/kernel.git xfs_tmpfile

  The patchset was tested against the code snippet from Andy Lutomirski and
other test cases:
  http://lwn.net/Articles/562296/
  If you have any better test cases, please let me know, thanks.

#include <stdio.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define __O_TMPFILE 020000000
#define O_DIRECTORY 0200000
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
#define AT_EMPTY_PATH 0x1000

int main(int argc, char **argv)
{
    char buf[128];

    if (argc != 3)
        errx(1, "Usage: flinktest PATH linkat|proc");

    int fd = open(".", O_TMPFILE | O_RDWR, 0600);
    if (fd == -1)
        err(1, "O_TMPFILE");
    else
        printf("fd #: %d\n", fd);

    write(fd, "test", 4);

    if (!strcmp(argv[2], "linkat")) {
        if (linkat(fd, "", AT_FDCWD, argv[1], AT_EMPTY_PATH) != 0)
            err(1, "linkat");
    } else if (!strcmp(argv[2], "proc")) {
        sprintf(buf, "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, buf, AT_FDCWD, argv[1], AT_SYMLINK_FOLLOW) != 0)
            err(1, "linkat");
    } else {
        errx(1, "invalid mode");
    }

    return 0;
}

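The "proc" mode of the test program above builds a /proc/self/fd/N path and
links through it. A minimal sketch of that path-building step with a bounds
check follows; proc_fd_path() is a hypothetical helper, not part of the
original snippet.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "/proc/self/fd/<fd>" into buf.
 * Returns 0 on success, -1 if the path does not fit in len bytes.
 */
static int proc_fd_path(char *buf, size_t len, int fd)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/proc/self/fd/%d", fd);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

The resulting path would then be passed to linkat() with AT_SYMLINK_FOLLOW,
as the test program does.
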

Changelog from rfc:
 - Addressed the comments from Dave Chinner and Christoph Hellwig.

Zhi Yong Wu (5):
  xfs: factor prid related codes into xfs_get_initial_prid()
  xfs: adjust the interface of xfs_qm_vop_dqalloc()
  xfs: add xfs_create_tmpfile() for O_TMPFILE support
  xfs: add a new method xfs_vn_tmpfile()
  xfs: allow linkat() on O_TMPFILE files

 fs/xfs/xfs_inode.c  |  142 ---
 fs/xfs/xfs_inode.h  |2 +
 fs/xfs/xfs_ioctl.c  |2 +-
 fs/xfs/xfs_iops.c   |   25 -
 fs/xfs/xfs_qm.c |   50 ++--
 fs/xfs/xfs_quota.h  |6 +-
 fs/xfs/xfs_shared.h |4 +-
 fs/xfs/xfs_symlink.c|2 +-
 fs/xfs/xfs_trans_resv.c |   51 +
 fs/xfs/xfs_trans_resv.h |4 +
 10 files changed, 255 insertions(+), 33 deletions(-)

-- 
1.7.6.5



[PATCH 1/5] xfs: factor prid related codes into xfs_get_initial_prid()

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

It will be reused by the O_TMPFILE creation function.

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/xfs/xfs_inode.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 001aa89..e8b9a68 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1139,6 +1139,14 @@ xfs_bumplink(
return 0;
 }
 
+static inline prid_t xfs_get_initial_prid(struct xfs_inode *dp)
+{
+   if (dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
+   return xfs_get_projid(dp);
+   else
+   return XFS_PROJID_DEFAULT;
+}
+
 int
 xfs_create(
xfs_inode_t *dp,
@@ -1169,10 +1177,7 @@ xfs_create(
if (XFS_FORCED_SHUTDOWN(mp))
return XFS_ERROR(EIO);
 
-   if (dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)
-   prid = xfs_get_projid(dp);
-   else
-   prid = XFS_PROJID_DEFAULT;
+   prid = xfs_get_initial_prid(dp);
 
/*
 * Make sure that we have allocated dquot(s) on disk.
-- 
1.7.6.5



[PATCH 2/5] xfs: adjust the interface of xfs_qm_vop_dqalloc()

2013-12-13 Thread Zhi Yong Wu
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

O_TMPFILE creation may have neither a parent inode nor a name, but it can
always pass a struct xfs_mount to xfs_qm_vop_dqalloc(). Adjust the interface
so that the O_TMPFILE creation function can also use it.

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 fs/xfs/xfs_inode.c   |2 +-
 fs/xfs/xfs_ioctl.c   |2 +-
 fs/xfs/xfs_iops.c|3 ++-
 fs/xfs/xfs_qm.c  |   50 +++---
 fs/xfs/xfs_quota.h   |6 --
 fs/xfs/xfs_symlink.c |2 +-
 6 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e8b9a68..71a8186 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1182,7 +1182,7 @@ xfs_create(
/*
 * Make sure that we have allocated dquot(s) on disk.
 */
-   error = xfs_qm_vop_dqalloc(dp, xfs_kuid_to_uid(current_fsuid()),
+   error = xfs_qm_vop_dqalloc(dp, mp, xfs_kuid_to_uid(current_fsuid()),
xfs_kgid_to_gid(current_fsgid()), prid,
XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
    &udqp, &gdqp, &pdqp);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 33ad9a7..eac84bd 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1090,7 +1090,7 @@ xfs_ioctl_setattr(
 * because the i_*dquot fields will get updated anyway.
 */
    if (XFS_IS_QUOTA_ON(mp) && (mask & FSX_PROJID)) {
-   code = xfs_qm_vop_dqalloc(ip, ip->i_d.di_uid,
+   code = xfs_qm_vop_dqalloc(ip, ip->i_mount, ip->i_d.di_uid,
     ip->i_d.di_gid, fa->fsx_projid,
     XFS_QMOPT_PQUOTA, &udqp, NULL, &pdqp);
if (code)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 27e0e54..eb55be5 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -540,7 +540,8 @@ xfs_setattr_nonsize(
 */
ASSERT(udqp == NULL);
ASSERT(gdqp == NULL);
-   error = xfs_qm_vop_dqalloc(ip, xfs_kuid_to_uid(uid),
+   error = xfs_qm_vop_dqalloc(ip, ip->i_mount,
+      xfs_kuid_to_uid(uid),
       xfs_kgid_to_gid(gid),
       xfs_get_projid(ip),
       qflags, &udqp, &gdqp, NULL);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 14a4996..1f13e82 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1765,6 +1765,7 @@ xfs_qm_write_sb_changes(
 int
 xfs_qm_vop_dqalloc(
    struct xfs_inode    *ip,
+   struct xfs_mount    *mp,
    xfs_dqid_t          uid,
    xfs_dqid_t          gid,
    prid_t              prid,
@@ -1773,7 +1774,6 @@ xfs_qm_vop_dqalloc(
    struct xfs_dquot    **O_gdqpp,
    struct xfs_dquot    **O_pdqpp)
 {
-   struct xfs_mount    *mp = ip->i_mount;
    struct xfs_dquot    *uq = NULL;
    struct xfs_dquot    *gq = NULL;
    struct xfs_dquot    *pq = NULL;
@@ -1783,17 +1783,19 @@ xfs_qm_vop_dqalloc(
if (!XFS_IS_QUOTA_RUNNING(mp) || !XFS_IS_QUOTA_ON(mp))
return 0;
 
-   lockflags = XFS_ILOCK_EXCL;
-   xfs_ilock(ip, lockflags);
+   if (ip) {
+   lockflags = XFS_ILOCK_EXCL;
+   xfs_ilock(ip, lockflags);
 
-   if ((flags & XFS_QMOPT_INHERIT) && XFS_INHERIT_GID(ip))
-   gid = ip->i_d.di_gid;
+   if ((flags & XFS_QMOPT_INHERIT) && XFS_INHERIT_GID(ip))
+   gid = ip->i_d.di_gid;
+   }
 
/*
 * Attach the dquot(s) to this inode, doing a dquot allocation
 * if necessary. The dquot(s) will not be locked.
 */
-   if (XFS_NOT_DQATTACHED(mp, ip)) {
+   if (ip && XFS_NOT_DQATTACHED(mp, ip)) {
error = xfs_qm_dqattach_locked(ip, XFS_QMOPT_DQALLOC);
if (error) {
xfs_iunlock(ip, lockflags);
@@ -1802,7 +1804,7 @@ xfs_qm_vop_dqalloc(
}
 
    if ((flags & XFS_QMOPT_UQUOTA) && XFS_IS_UQUOTA_ON(mp)) {
-   if (ip->i_d.di_uid != uid) {
+   if (ip || (ip->i_d.di_uid != uid)) {
/*
 * What we need is the dquot that has this uid, and
 * if we send the inode to dqget, the uid of the inode
@@ -1812,7 +1814,8 @@ xfs_qm_vop_dqalloc(
 * we'll deadlock by doing trans_reserve while
 * holding ilock.
 */
-   xfs_iunlock(ip, lockflags);
+   if (ip)
+   xfs_iunlock(ip, lockflags);
error = xfs_qm_dqget(mp, NULL, uid,
 XFS_DQ_USER

Re: [PATCH 2/5] xfs: adjust the interface of xfs_qm_vop_dqalloc()

2013-12-13 Thread Zhi Yong Wu
On Sat, Dec 14, 2013 at 12:32 AM, Christoph Hellwig <h...@infradead.org> wrote:
> On Fri, Dec 13, 2013 at 10:27:50PM +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>>
>> There may be not a parent inode or a name for O_TMPFILE support, but will
>> pass a struct xfs_mount to xfs_qm_vop_dqalloc(). So its interface need to be
>> adjusted in order that O_TMPFILE creation function can also use it.
>>
>> Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>
> This patch is not actually needed, as we do get passed a parent.
Discarded, thanks.





-- 
Regards,

Zhi Yong Wu


Re: [PATCH 3/5] xfs: add xfs_create_tmpfile() for O_TMPFILE support

2013-12-13 Thread Zhi Yong Wu
Fixed them, thanks.

On Sat, Dec 14, 2013 at 12:37 AM, Christoph Hellwig <h...@infradead.org> wrote:
>> + error = xfs_dir_ialloc(&tp, NULL, mode, 0, rdev,
>
> please pass the parent inode pointer here.
>
>> + XFS_PROJID_DEFAULT, resblks > 0,
>
> and pass the project id that you inherited from the parent here.




-- 
Regards,

Zhi Yong Wu


Re: [PATCH 4/5] xfs: add a new method xfs_vn_tmpfile()

2013-12-13 Thread Zhi Yong Wu
On Sat, Dec 14, 2013 at 12:39 AM, Christoph Hellwig <h...@infradead.org> wrote:
> On Fri, Dec 13, 2013 at 10:27:52PM +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>>
>> Add a new O_TMPFILE method to VFS interface.
>> For more info, please refer to:
>>   http://oss.sgi.com/archives/xfs/2013-08/msg00336.html
>>
>> Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>> ---
>>  fs/xfs/xfs_iops.c |   22 ++
>>  1 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index eb55be5..b57cd89 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -39,6 +39,7 @@
>>  #include "xfs_da_btree.h"
>>  #include "xfs_dir2_priv.h"
>>  #include "xfs_dinode.h"
>> +#include "xfs_trans_space.h"
>>
>>  #include <linux/capability.h>
>>  #include <linux/xattr.h>
>> @@ -1051,6 +1052,25 @@ xfs_vn_fiemap(
>>   return 0;
>>  }
>>
>> +STATIC int
>> +xfs_vn_tmpfile(
>> + struct inode    *dir,
>> + struct dentry   *dentry,
>> + umode_t         mode)
>> +{
>> + struct xfs_inode *ip = NULL;
>> + int error;
>> +
>> + error = xfs_create_tmpfile(XFS_I(dir), XFS_I(dir)->i_mount,
>
> No need to pass in the mount point here, the client can get it easily.
>
>> + mode, 0, &ip);
>
> Also no need for an always-zero argument.
Fixed, thanks.

>> + if (error)
>> + return -error;
>> +
>> + d_instantiate(dentry, VFS_I(ip));
>
> Shouldn't this be a call to d_tmpfile() instead?
Yes, then it needs to be called in xfs_create_tmpfile() just before
xfs_iunlink() is called.

> Also I'd suggest merging this into the previous patch, so that we have
> one that actually adds O_TMPFILE support, and one place to write a nice
Merged them, thanks.
> good changelog.



-- 
Regards,

Zhi Yong Wu


Re: [PATCH 5/5] xfs: allow linkat() on O_TMPFILE files

2013-12-13 Thread Zhi Yong Wu
On Sat, Dec 14, 2013 at 12:41 AM, Christoph Hellwig <h...@infradead.org> wrote:
> On Fri, Dec 13, 2013 at 10:27:53PM +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>>
>> Enable O_TMPFILE support in linkat().
>> For more info, please refer to:
>>   http://oss.sgi.com/archives/xfs/2013-08/msg00341.html
>
> Generally you should provide all reasonable information in the changelog
> instead of linking to it.
Will apply this, thanks.

>> + if (sip->i_d.di_nlink == 0)
>> + tres = &M_RES(mp)->tr_link_tmpfile;
>> + else
>> + tres = &M_RES(mp)->tr_link;
>
> As mentioned before I think Dave wanted you to always use the same
> reservation, but I'll leave that discussion to him.
If we do as you said, won't it waste disk space when tons of regular
files are created? E.g. some files want to reserve space but get
ENOSPC because other files have reserved additional space.


>> +/* For creating a link to an O_TMPFILE inode, except modifying
>> + * those metadata for regular inode, we still need to remove an inode
>> + * from unlinked list at first. That is,  we can modify:
>> + *the agi hash list and counters: sector size
>> + *the on disk inode before ours in the agi hash list: inode cluster size
>> + */
>
> We always have an empty content
Done, thanks.
>
> /*
>
> line at the beginning of comments in XFS and the Linux kernel in
> general.




-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 00/11] VFS hot tracking

2013-12-11 Thread Zhi Yong Wu
Ping ^ 7

On Wed, Nov 6, 2013 at 9:45 PM, Zhi Yong Wu  wrote:
> From: Zhi Yong Wu 
>
>   The patchset introduces a hot tracking function in the
> VFS layer, which keeps track of real disk I/O in memory.
> With it, you can easily learn more details about disk I/O and
> detect where the disk I/O hot spots are. Specific filesystems
> can also make use of it for accurate defragmentation, hot
> relocation support, etc.
>
>   Now it's time to send out V6 for external review;
> any comments or ideas are appreciated, thanks.
>
> NOTE:
>
>   The patchset can be obtained via my kernel dev git on github:
> git://github.com/wuzhy/kernel.git hot_tracking
>   If you're interested, you can also review them via
> https://github.com/wuzhy/kernel/commits/hot_tracking
>
>   For usage, further information and the performance report,
> please check hot_tracking.txt in Documentation and the following
> links:
>   1.) http://lwn.net/Articles/525651/
>   2.) https://lkml.org/lkml/2012/12/20/199
>
>   Scalability and performance tests were run on this patchset
> with fs_mark, ffsb and compilebench.
>
>   The perf tests were done on Linux 3.12.0-rc7 on an IBM,8231-E2C
> big-endian PPC64 machine with 64 CPUs, 2 NUMA nodes, 250G RAM and a
> 1.50 TiB test hard disk, where each test file size is 20G or 100G.
> Architecture:          ppc64
> Byte Order:            Big Endian
> CPU(s):                64
> On-line CPU(s) list:   0-63
> Thread(s) per core:    4
> Core(s) per socket:    1
> Socket(s):             16
> NUMA node(s):          2
> Model:                 IBM,8231-E2C
> Hypervisor vendor:     pHyp
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              4096K
> NUMA node0 CPU(s):     0-31
> NUMA node1 CPU(s):     32-63
>
>   Below is the perf testing report:
>
>   Please focus on the two key points:
>   - The overall overhead which is injected by the patchset
>   - The stability of the perf results
>
> 1. fio tests
>
>                                 w/o hot tracking   w/ hot tracking
>
> RAM size                        32G                32G         16G         8G          4G          2G          250G
>
> sequential-8k-1jobs-read        61260KB/s          60918KB/s   60901KB/s   62610KB/s   60992KB/s   60213KB/s   60948KB/s
>
> sequential-8k-1jobs-write       1329KB/s           1329KB/s    1328KB/s    1329KB/s    1328KB/s    1329KB/s    1329KB/s
>
> sequential-8k-8jobs-read        91139KB/s          92614KB/s   90907KB/s   89895KB/s   92022KB/s   90851KB/s   91877KB/s
>
> sequential-8k-8jobs-write       2523KB/s           2522KB/s    2516KB/s    2521KB/s    2516KB/s    2518KB/s    2521KB/s
>
> sequential-256k-1jobs-read      151432KB/s         151403KB/s  151406KB/s  151422KB/s  151344KB/s  151446KB/s  151372KB/s
>
> sequential-256k-1jobs-write     33451KB/s          33470KB/s   33481KB/s   33470KB/s   33459KB/s   33472KB/s   33477KB/s
>
> sequential-256k-8jobs-read      235291KB/s         234555KB/s  234251KB/s  233656KB/s  234927KB/s  236380KB/s  235535KB/s
>
> sequential-256k-8jobs-write     62419KB/s          62402KB/s   62191KB/s   62859KB/s   62629KB/s   62720KB/s   62523KB/s
>
> random-io-mix-8k-1jobs  [READ]  2929KB/s           2942KB/s    2946KB/s    2929KB/s    2934KB/s    2947KB/s    2946KB/s
>                         [WRITE] 1262KB/s           1266KB/s    1257KB/s    1262KB/s    1257KB/s    1257KB/s    1265KB/s
>
> random-io-mix-8k-8jobs  [READ]  2444KB/s           2442KB/s    2436KB/s    2416KB/s    2353KB/s    2441KB/s    2442KB/s
>                         [WRITE] 1047KB/s           1044KB/s    1047KB/s    1028KB/s    1017KB/s    1034KB/s    1049KB/s
>
> random-io-mix-8k-16jobs [READ]  2182KB/s           2184KB/s    2169KB/s    2178KB/s    2190KB/s    2184KB/s    2180KB/s
>                         [WRITE] 932KB/s            930KB/s     943KB/s     936KB/s     937KB/s     929KB/s     931KB/s
>
> The perf parameter above is the aggregate bandwidth of the threads in the
> group. If you would like to see other perf parameters or the raw fio
> results, please let me know, thanks.
>
> 2. Locking stat - Contention & Cacheline Bouncing
>
> RAM size  class name  con-bounces  contentions  acq-bounces  acquisitions  cacheline bouncing ratio  locking contention ratio
>
>   &(>t_lock)->rlock:  15081592 157834  
> 374639292

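The v6 cover letter quoted above asks readers to look at the overhead the
patchset adds. A minimal sketch of how a relative overhead could be computed
from two bandwidth figures in the fio table follows. It assumes that the
first value column is the run without hot tracking and the second is the run
with hot tracking at the same RAM size; overhead_pct() is a hypothetical
helper.

```c
#include <assert.h>

/*
 * Relative change of the tracked run against the baseline, in percent.
 * Negative values mean the tracked run was slower.
 */
static double overhead_pct(double baseline_kbs, double tracked_kbs)
{
	assert(baseline_kbs > 0.0);
	return (tracked_kbs - baseline_kbs) / baseline_kbs * 100.0;
}
```

For the sequential-8k-1jobs-read row, overhead_pct(61260.0, 60918.0) gives
roughly -0.56, that is, about half a percent lower bandwidth under this
reading of the table.
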
Re: [PATCH v6 07/11] VFS hot tracking: Add a /proc interface to control memory usage

2013-12-11 Thread Zhi Yong Wu
Ping ^ 7

On Wed, Nov 6, 2013 at 9:45 PM, Zhi Yong Wu  wrote:
> From: Zhi Yong Wu 
>
> Introduce a /proc interface, hot-mem-high-thresh, to cap the
> memory consumed by hot_inode_item and hot_range_item.
> The threshold is given in units of 1M bytes.
>
> Signed-off-by: Chandra Seetharaman 
> Signed-off-by: Zhi Yong Wu 
> ---
>  fs/hot_tracking.c| 29 +
>  fs/hot_tracking.h| 23 +++
>  include/linux/hot_tracking.h |  3 +++
>  kernel/sysctl.c  |  7 +++
>  4 files changed, 62 insertions(+)
>
> diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
> index 7a9bd4f..2c5a7fd 100644
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> @@ -15,6 +15,7 @@
>  #include <linux/sched.h>
>  #include "hot_tracking.h"
>
> +int sysctl_hot_mem_high_thresh __read_mostly = 0;
>  int sysctl_hot_update_interval __read_mostly = 150;
>
>  /* kmem_cache pointers for slab caches */
> @@ -32,6 +33,7 @@ static void hot_range_item_init(struct hot_range_item *hr,
> hr->len = 1 << RANGE_BITS;
> hr->hot_inode = he;
> atomic_long_inc(&he->hot_root->hot_cnt);
> +   hot_mem_limit_add(he->hot_root, sizeof(struct hot_range_item));
>  }
>
>  static void hot_range_item_free_cb(struct rcu_head *head)
> @@ -55,6 +57,7 @@ static void hot_range_item_free(struct kref *kref)
> spin_unlock(&root->m_lock);
>
> atomic_long_dec(&root->hot_cnt);
> +   hot_mem_limit_sub(root, sizeof(struct hot_range_item));
> call_rcu(&hr->rcu, hot_range_item_free_cb);
>  }
>
> @@ -103,6 +106,8 @@ redo:
>  * newly allocated item.
>  */
> atomic_long_dec(&he->hot_root->hot_cnt);
> +   hot_mem_limit_sub(he->hot_root,
> +   sizeof(struct hot_range_item));
> kmem_cache_free(hot_range_item_cachep, hr_new);
> }
> spin_unlock(&he->i_lock);
> @@ -205,6 +210,7 @@ static void hot_inode_item_init(struct hot_inode_item *he,
> he->hot_root = root;
> spin_lock_init(&he->i_lock);
> atomic_long_inc(&root->hot_cnt);
> +   hot_mem_limit_add(root, sizeof(struct hot_inode_item));
>  }
>
>  static void hot_inode_item_free_cb(struct rcu_head *head)
> @@ -226,6 +232,7 @@ static void hot_inode_item_free(struct kref *kref)
> hot_range_tree_free(he);
>
> atomic_long_dec(&he->hot_root->hot_cnt);
> +   hot_mem_limit_sub(he->hot_root, sizeof(struct hot_inode_item));
> call_rcu(&he->rcu, hot_inode_item_free_cb);
>  }
>
> @@ -272,6 +279,8 @@ redo:
>  * newly allocated item.
>  */
> atomic_long_dec(&root->hot_cnt);
> +   hot_mem_limit_sub(root,
> +   sizeof(struct hot_inode_item));
> kmem_cache_free(hot_inode_item_cachep, he_new);
> }
> spin_unlock(&root->t_lock);
> @@ -534,6 +543,23 @@ static unsigned long hot_item_evict(struct hot_info *root, unsigned long work,
> return freed;
>  }
>
> +static void hot_mem_evict(struct hot_info *root)
> +{
> +   unsigned long sum, thresh;
> +
> +   if (sysctl_hot_mem_high_thresh == 0)
> +   return;
> +
> +   sum = hot_mem_limit_sum(root);
> +   /* Note: sysctl_** is in the unit of 1M bytes */
> +   thresh = sysctl_hot_mem_high_thresh;
> +   thresh *= 1024 * 1024;
> +   if (sum <= thresh)
> +   return;
> +
> +   hot_item_evict(root, sum - thresh, hot_mem_limit_sum);
> +}
> +
>  /*
>   * Every sync period we update temperatures for
>   * each hot inode item and hot range item for aging
> @@ -546,6 +572,8 @@ static void hot_update_worker(struct work_struct *work)
> struct hot_inode_item *he;
> struct rb_node *node;
>
> +   hot_mem_evict(root);
> +
> rcu_read_lock();
> node = root->hot_inode_tree.rb_node;
> while (node) {
> @@ -753,6 +781,7 @@ int hot_track_init(struct super_block *sb)
> goto err;
> }
>
> +   hot_mem_limit_init(root);
> sb->s_hot_root = root;
> sb->s_flags |= MS_HOTTRACK;
>
> diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
> index 6a6971e..4ee0b90 100644
> --- a/fs/hot_tracking.h
> +++ b/fs/hot_track
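
The threshold check that hot_mem_evict() performs in the quoted patch can be
modelled in user space. The following is only a minimal sketch under stated
assumptions, not the patch's code: struct hot_mem_model, model_mem_add(),
model_mem_sub() and model_mem_excess() are hypothetical names, and C11 atomics
stand in for the kernel's atomic_long_t counter.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the memory counter kept per hot_info. */
struct hot_mem_model {
    atomic_ulong mem;   /* bytes currently charged to tracking items */
};

/* Charge the size of a newly initialised item. */
void model_mem_add(struct hot_mem_model *m, unsigned long bytes)
{
    atomic_fetch_add(&m->mem, bytes);
}

/* Uncharge the size of an item that is being freed. */
void model_mem_sub(struct hot_mem_model *m, unsigned long bytes)
{
    atomic_fetch_sub(&m->mem, bytes);
}

/*
 * Return how many bytes lie above the cap, or 0 when nothing needs evicting.
 * thresh_mb is in units of 1M bytes; 0 means the cap is disabled.
 */
unsigned long model_mem_excess(struct hot_mem_model *m, unsigned long thresh_mb)
{
    unsigned long sum, thresh;

    if (thresh_mb == 0)
        return 0;

    sum = atomic_load(&m->mem);
    /* Assumption: thresh_mb is small enough that this does not wrap. */
    thresh = thresh_mb * 1024UL * 1024UL;
    if (sum <= thresh)
        return 0;

    return sum - thresh;
}
```

In the patch the excess (sum - thresh) is what gets passed to hot_item_evict()
as the amount of work; the sketch only reproduces that arithmetic.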

[PATCH] vfs, eventfd: fix the typo

2013-12-11 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 fs/eventfd.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 35470d9..710bf80 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -45,7 +45,7 @@ struct eventfd_ctx {
  *
  * This function is supposed to be called by the kernel in paths that do not
  * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
- * value, and we signal this as overflow condition by returining a POLLERR
+ * value, and we signal this as overflow condition by returning a POLLERR
  * to poll(2).
  *
  * Returns the amount by which the counter was incrememnted.  This will be less
-- 
1.7.6.5



Re: [PATCH v6 07/11] VFS hot tracking: Add a /proc interface to control memory usage

2013-12-11 Thread Zhi Yong Wu
Ping ^ 7

On Wed, Nov 6, 2013 at 9:45 PM, Zhi Yong Wu <zwu.ker...@gmail.com> wrote:
> From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>
> Introduce a /proc interface, hot-mem-high-thresh, to cap the
> memory consumed by hot_inode_item and hot_range_item; the
> threshold is given in units of 1M bytes.
>
> Signed-off-by: Chandra Seetharaman <sekha...@us.ibm.com>
> Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
> ---
>  fs/hot_tracking.c            | 29 +
>  fs/hot_tracking.h            | 23 +++
>  include/linux/hot_tracking.h |  3 +++
>  kernel/sysctl.c              |  7 +++
>  4 files changed, 62 insertions(+)
>
> diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
> index 7a9bd4f..2c5a7fd 100644
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> @@ -15,6 +15,7 @@
>  #include <linux/sched.h>
>  #include "hot_tracking.h"
>
> +int sysctl_hot_mem_high_thresh __read_mostly = 0;
>  int sysctl_hot_update_interval __read_mostly = 150;
>
>  /* kmem_cache pointers for slab caches */
> @@ -32,6 +33,7 @@ static void hot_range_item_init(struct hot_range_item *hr,
> hr->len = 1 << RANGE_BITS;
> hr->hot_inode = he;
> atomic_long_inc(&he->hot_root->hot_cnt);
> +   hot_mem_limit_add(he->hot_root, sizeof(struct hot_range_item));
>  }
>
>  static void hot_range_item_free_cb(struct rcu_head *head)
> @@ -55,6 +57,7 @@ static void hot_range_item_free(struct kref *kref)
> spin_unlock(&root->m_lock);
>
> atomic_long_dec(&root->hot_cnt);
> +   hot_mem_limit_sub(root, sizeof(struct hot_range_item));
> call_rcu(&hr->rcu, hot_range_item_free_cb);
>  }
>
> @@ -103,6 +106,8 @@ redo:
>  * newly allocated item.
>  */
> atomic_long_dec(&he->hot_root->hot_cnt);
> +   hot_mem_limit_sub(he->hot_root,
> +   sizeof(struct hot_range_item));
> kmem_cache_free(hot_range_item_cachep, hr_new);
> }
> spin_unlock(&he->i_lock);
> @@ -205,6 +210,7 @@ static void hot_inode_item_init(struct hot_inode_item *he,
> he->hot_root = root;
> spin_lock_init(&he->i_lock);
> atomic_long_inc(&root->hot_cnt);
> +   hot_mem_limit_add(root, sizeof(struct hot_inode_item));
>  }
>
>  static void hot_inode_item_free_cb(struct rcu_head *head)
> @@ -226,6 +232,7 @@ static void hot_inode_item_free(struct kref *kref)
> hot_range_tree_free(he);
>
> atomic_long_dec(&he->hot_root->hot_cnt);
> +   hot_mem_limit_sub(he->hot_root, sizeof(struct hot_inode_item));
> call_rcu(&he->rcu, hot_inode_item_free_cb);
>  }
>
> @@ -272,6 +279,8 @@ redo:
>  * newly allocated item.
>  */
> atomic_long_dec(&root->hot_cnt);
> +   hot_mem_limit_sub(root,
> +   sizeof(struct hot_inode_item));
> kmem_cache_free(hot_inode_item_cachep, he_new);
> }
> spin_unlock(&root->t_lock);
> @@ -534,6 +543,23 @@ static unsigned long hot_item_evict(struct hot_info *root, unsigned long work,
> return freed;
>  }
>
> +static void hot_mem_evict(struct hot_info *root)
> +{
> +   unsigned long sum, thresh;
> +
> +   if (sysctl_hot_mem_high_thresh == 0)
> +   return;
> +
> +   sum = hot_mem_limit_sum(root);
> +   /* Note: sysctl_** is in the unit of 1M bytes */
> +   thresh = sysctl_hot_mem_high_thresh;
> +   thresh *= 1024 * 1024;
> +   if (sum <= thresh)
> +   return;
> +
> +   hot_item_evict(root, sum - thresh, hot_mem_limit_sum);
> +}
> +
>  /*
>   * Every sync period we update temperatures for
>   * each hot inode item and hot range item for aging
> @@ -546,6 +572,8 @@ static void hot_update_worker(struct work_struct *work)
> struct hot_inode_item *he;
> struct rb_node *node;
>
> +   hot_mem_evict(root);
> +
> rcu_read_lock();
> node = root->hot_inode_tree.rb_node;
> while (node) {
> @@ -753,6 +781,7 @@ int hot_track_init(struct super_block *sb)
> goto err;
> }
>
> +   hot_mem_limit_init(root);
> sb->s_hot_root = root;
> sb->s_flags |= MS_HOTTRACK;
>
> diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
> index 6a6971e..4ee0b90 100644
> --- a/fs/hot_tracking.h
> +++ b/fs/hot_tracking.h
> @@ -46,4 +46,27 @@ struct hot_inode_item *hot_inode_item_lookup(struct hot_info *root, u64 ino);
>  void hot_inode_item_unlink(struct inode *inode);
>  u32 hot_temp_calc(struct hot_freq *freq);
>
> +/* Memory Tracking Functions. */
> +static inline unsigned long hot_mem_limit_sum(struct hot_info *root)
> +{
> +   return atomic_long_read(&root->mem);
> +}
> +
> +static inline void hot_mem_limit_sub(struct hot_info *root,
> +   unsigned long count
Re: [PATCH v6 00/11] VFS hot tracking

2013-12-11 Thread Zhi Yong Wu
Ping ^ 7

On Wed, Nov 6, 2013 at 9:45 PM, Zhi Yong Wu <zwu.ker...@gmail.com> wrote:
> From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>
>   The patchset is trying to introduce hot tracking function in
> VFS layer, which will keep track of real disk I/O in memory.
> By it, you will easily know more details about disk I/O, and
> then detect where disk I/O hot spots are. Also, specific FS
> can take use of it to do accurate defragment, and hot relocation
> support, etc.
>
>   Now it's time to send out its V6 for external review, and
> any comments or ideas are appreciated, thanks.
>
> NOTE:
>
>   The patchset can be obtained via my kernel dev git on github:
> git://github.com/wuzhy/kernel.git hot_tracking
>   If you're interested, you can also review them via
> https://github.com/wuzhy/kernel/commits/hot_tracking
>
>   For how to use and more other info and performance report,
> please check hot_tracking.txt in Documentation and following
> links:
>   1.) http://lwn.net/Articles/525651/
>   2.) https://lkml.org/lkml/2012/12/20/199
>
>   This patchset has been done scalability or performance tests
> by fs_mark, ffsb and compilebench.
>
>   The perf testings were done on Linux 3.12.0-rc7 with Model IBM,8231-E2C
> Big Endian PPC64 with 64 CPUs and 2 NUMA nodes, 250G RAM and 1.50 TiB
> test hard disk where each test file size is 20G or 100G.
> Architecture:          ppc64
> Byte Order:            Big Endian
> CPU(s):                64
> On-line CPU(s) list:   0-63
> Thread(s) per core:    4
> Core(s) per socket:    1
> Socket(s):             16
> NUMA node(s):          2
> Model:                 IBM,8231-E2C
> Hypervisor vendor:     pHyp
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              4096K
> NUMA node0 CPU(s):     0-31
> NUMA node1 CPU(s):     32-63
>
>   Below is the perf testing report:
>
>   Please focus on the two key points:
>   - The overall overhead which is injected by the patchset
>   - The stability of the perf results
>
> 1. fio tests
>
>                             w/o hot tracking  w/ hot tracking
>
> RAM size                                 32G         32G         16G          8G          4G          2G        250G
>
> sequential-8k-1jobs-read           61260KB/s   60918KB/s   60901KB/s   62610KB/s   60992KB/s   60213KB/s   60948KB/s
>
> sequential-8k-1jobs-write           1329KB/s    1329KB/s    1328KB/s    1329KB/s    1328KB/s    1329KB/s    1329KB/s
>
> sequential-8k-8jobs-read           91139KB/s   92614KB/s   90907KB/s   89895KB/s   92022KB/s   90851KB/s   91877KB/s
>
> sequential-8k-8jobs-write           2523KB/s    2522KB/s    2516KB/s    2521KB/s    2516KB/s    2518KB/s    2521KB/s
>
> sequential-256k-1jobs-read        151432KB/s  151403KB/s  151406KB/s  151422KB/s  151344KB/s  151446KB/s  151372KB/s
>
> sequential-256k-1jobs-write        33451KB/s   33470KB/s   33481KB/s   33470KB/s   33459KB/s   33472KB/s   33477KB/s
>
> sequential-256k-8jobs-read        235291KB/s  234555KB/s  234251KB/s  233656KB/s  234927KB/s  236380KB/s  235535KB/s
>
> sequential-256k-8jobs-write        62419KB/s   62402KB/s   62191KB/s   62859KB/s   62629KB/s   62720KB/s   62523KB/s
>
> random-io-mix-8k-1jobs  [READ]      2929KB/s    2942KB/s    2946KB/s    2929KB/s    2934KB/s    2947KB/s    2946KB/s
>                         [WRITE]     1262KB/s    1266KB/s    1257KB/s    1262KB/s    1257KB/s    1257KB/s    1265KB/s
>
> random-io-mix-8k-8jobs  [READ]      2444KB/s    2442KB/s    2436KB/s    2416KB/s    2353KB/s    2441KB/s    2442KB/s
>                         [WRITE]     1047KB/s    1044KB/s    1047KB/s    1028KB/s    1017KB/s    1034KB/s    1049KB/s
>
> random-io-mix-8k-16jobs [READ]      2182KB/s    2184KB/s    2169KB/s    2178KB/s    2190KB/s    2184KB/s    2180KB/s
>                         [WRITE]      932KB/s     930KB/s     943KB/s     936KB/s     937KB/s     929KB/s     931KB/s
>
> The above perf parameter is the aggregate bandwidth of threads in the group;
> If you hope to know how about other perf parameters, or fio raw results,
> please let me know, thanks.
>
> 2. Locking stat - Contention & Cacheline Bouncing
>
> RAM size  class name                con-bounces  contentions  acq-bounces  acquisitions  cacheline bouncing ratio  locking contention ratio
>
>           &(&root->t_lock)->rlock:         1508         1592       157834     374639292                     0.96%                     0.00%
> 250G      &(&root->m_lock)->rlock:         1469         1484       119221      43077842                     1.23%                     0.00%
>           &(&he->i_lock)->rlock:              0            0       101879     376755218                     0.00%                     0.00%
>
>           &(&root->t_lock)->rlock:         2912         2985       342575     374691186                     0.85%                     0.00%
> 32G       &(&root->m_lock)->rlock:   188
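
The report does not define its two ratio columns. The figures shown are
consistent with con-bounces / acq-bounces for "cacheline bouncing ratio" and
contentions / acquisitions for "locking contention ratio"; that reading is an
assumption, and ratio_percent() below is a hypothetical helper written only to
check it.

```c
/* Percentage of `part` relative to `whole`; returns 0.0 when `whole` is 0. */
double ratio_percent(unsigned long part, unsigned long whole)
{
    if (whole == 0)
        return 0.0;
    return 100.0 * (double)part / (double)whole;
}
```

Under that assumption, ratio_percent(1508, 157834) comes to roughly 0.96 and
ratio_percent(1469, 119221) to roughly 1.23, which agrees with the 250G rows of
the table.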

Re: [PATCH 2/2] tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg()

2013-12-10 Thread Zhi Yong Wu
On Wed, Dec 11, 2013 at 11:19 AM, David Miller  wrote:
> From: Zhi Yong Wu 
> Date: Wed, 11 Dec 2013 11:14:04 +0800
>
>> Only one reminder, since David has committed the two patches, you
>> maybe need to take their impact on your patches into account.
>
> I reverted these changes from net-next.
So rapid:), thanks for your reminder.



-- 
Regards,

Zhi Yong Wu


Re: [PATCH 2/2] tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg()

2013-12-10 Thread Zhi Yong Wu
Just one reminder: since David has committed the two patches, you
may need to take their impact on your patches into account.

On Wed, Dec 11, 2013 at 3:00 AM, David Miller  wrote:
> From: Vlad Yasevich 
> Date: Tue, 10 Dec 2013 12:18:09 -0500
>
>> On 12/09/2013 08:36 PM, David Miller wrote:
>>> From: Zhi Yong Wu 
>>> Date: Sat,  7 Dec 2013 04:55:00 +0800
>>>
>>>> From: Zhi Yong Wu 
>>>>
>>>> By checking related codes, it is impossible that ret > len or total_len,
>>>> so we should remove some useless codes in both above functions.
>>>>
>>>> Signed-off-by: Zhi Yong Wu 
>>>
>>> Applied.
>>
>> Wait a sec.  We want to be able to return a value bigger than len
>> to trigger a MSG_TRUNC.  Jason has patches to fix this.  If you
>> apply this, we'll have to re-introduce this code back in.
>>
>> Same goes for patch 1/2.
>
> That's fine, right now the code makes no sense as the condition can
> never be triggered so there is no harm removing the illogical code
> meanwhile.



-- 
Regards,

Zhi Yong Wu
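
For reference, the truncation handling discussed in the quoted exchange (the
"if (ret > total_len)" branch that the patch removes) can be modelled in user
space. This is a minimal sketch, not the driver code: struct recv_outcome and
model_recv_outcome() are hypothetical names, and pkt_len stands for whatever
length the read helper reports.

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */
#include <sys/socket.h>  /* MSG_TRUNC */

/* Hypothetical result pair: the value returned to the caller plus msg_flags. */
struct recv_outcome {
    ssize_t ret;
    int msg_flags;
};

/*
 * If the reported packet length exceeds the buffer, flag the message as
 * truncated. The full length is returned only when the caller passed
 * MSG_TRUNC in flags; otherwise the result is clamped to the buffer size.
 * Negative values (errors) pass through unchanged.
 */
struct recv_outcome model_recv_outcome(ssize_t pkt_len, size_t total_len, int flags)
{
    struct recv_outcome out = { pkt_len, 0 };

    if (pkt_len > 0 && (size_t)pkt_len > total_len) {
        out.msg_flags |= MSG_TRUNC;
        out.ret = (flags & MSG_TRUNC) ? pkt_len : (ssize_t)total_len;
    }
    return out;
}
```

The branch only matters if the read helper can report a length larger than the
buffer, which is exactly the point of disagreement in this thread.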


Re: [ANNOUNCE] iproute2 3.12.0 release

2013-12-08 Thread Zhi Yong Wu
Hi,

The tc man page has no information about "tc action". Is there any
plan to add it soon, or am I missing something?

On Sat, Nov 23, 2013 at 9:20 AM, Stephen Hemminger wrote:
> A little late but ready and toasty warm here is iproute2 to go with
> 3.12.0 (aka One Giant Leap for Frogkind).
>
> In addition to the usual build and documentation fixes, this
> version includes support for ipv6 on vxlan and GRE.
> As well as fair queue packet scheduler.
>
> If you have been sitting on changes to iproute2 that are in
> net-next for 3.12 merge window, please submit them now.
>
> Iproute2 package is available at:
>   http://kernel.org/pub/linux/utils/net/iproute2/iproute2-3.12.0.tar.gz
>
> You can download the source from:
>   git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
>
> Stay Warm!
>
> ---
> Andreas Henriksson (1):
>   ss: avoid passing negative numbers to malloc
>
> Christophe Gouault (1):
>   xfrm: enable to set non-wildcard mark 0 on SAs and SPs
>
> Eric Dumazet (3):
>   pkt_sched: fq: Fair Queue packet scheduler
>   tc: support TCA_STATS_RATE_EST64
>   htb: add support for direct_qlen attribute
>
> Fan Du (1):
>   xfrm: use memcpy to suppress gcc phony buffer overflow warning.
>
> Hangbin Liu (1):
>   ipaddrlabel: use uint32_t instead of int32_t
>
> Jamal Hadi Salim (2):
>   tc: introduce simple action
>   action: typo nat fix
>
> Nicolas Dichtel (1):
>   iplink: update available type list
>
> Nigel Kukard (1):
>   Fix tc stats when using -batch mode
>
> Petr Písař (2):
>   iproute2: bridge: document mdb
>   iproute2: bridge: Close file with bridge monitor file
>
> Sami Kerola (1):
>   ip: make -resolve addr to print names rather than addresses
>
> Stefan Tomanek (1):
>   ip rule: add route suppression options
>
> Stephen Hemminger (14):
>   Update kernel headers to net-next for 3.12
>   Update to 3.11 net-next kernel headers
>   nstat: add json output format
>   Update to 3.12-rc1 headers
>   nstat: revise json output
>   ifstat: add json output format
>   lnstat: add json output format
>   lnstat, nstat, ifstat: update man pages
>   tc: add default action to kernel headers
>   ipv6 gre: add entry to ether types
>   Fix handling of qdis without options
>   htb: remove old unused duplicate qdisc name
>   update kernel headers
>   v3.12.0
>
> WANG Cong (1):
>   vxlan: add ipv6 support
>
> x...@mail.ru (2):
>   iproute2: GRE over IPv6 tunnel support.
>   iproute2: ip6gre: update man pages
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



-- 
Regards,

Zhi Yong Wu


[PATCH 2/2] tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg()

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Having checked the related code, ret can never be greater than len or
total_len, so remove the dead code in both of the above functions.

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/tun.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f9c935a..d61719c 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1354,7 +1354,6 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 
ret = tun_do_read(tun, tfile, iv, len,
  file->f_flags & O_NONBLOCK);
-   ret = min_t(ssize_t, ret, len);
 out:
tun_put(tun);
return ret;
@@ -1453,10 +1452,6 @@ static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
}
ret = tun_do_read(tun, tfile, m->msg_iov, total_len,
  flags & MSG_DONTWAIT);
-   if (ret > total_len) {
-   m->msg_flags |= MSG_TRUNC;
-   ret = flags & MSG_TRUNC ? ret : total_len;
-   }
 out:
tun_put(tun);
return ret;
-- 
1.7.6.5



[PATCH 1/2] macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg()

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Having checked the related code, ret can never be greater than len or
total_len, so remove the dead code in both of the above functions.

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 4c6f84c..7f4ccdd 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -871,7 +871,6 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
}
 
ret = macvtap_do_read(q, iv, len, file->f_flags & O_NONBLOCK);
-   ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */
 out:
return ret;
 }
@@ -1104,10 +1103,6 @@ static int macvtap_recvmsg(struct kiocb *iocb, struct socket *sock,
return -EINVAL;
ret = macvtap_do_read(q, m->msg_iov, total_len,
  flags & MSG_DONTWAIT);
-   if (ret > total_len) {
-   m->msg_flags |= MSG_TRUNC;
-   ret = flags & MSG_TRUNC ? ret : total_len;
-   }
return ret;
 }
 
-- 
1.7.6.5



Re: [PATCH v2 2/2] tun: update file current position

2013-12-06 Thread Zhi Yong Wu
On Sat, Dec 7, 2013 at 1:45 AM, David Miller  wrote:
> From: Zhi Yong Wu 
> Date: Fri,  6 Dec 2013 17:08:50 +0800
>
>> From: Zhi Yong Wu 
>>
>> Signed-off-by: Zhi Yong Wu 
>
> Also applied and queued up for -stable, thanks.
>
> I noticed in these two cases that that min_t() adjustment of 'ret'
> seems strange.  I can't understand why it's needed.
>
> If, for example, tun_do_read() really did read more than 'len'
> bytes:
>
> 1) That would write past the end of the buffer.
>
> 2) Writing a different value to the ->ki_pos would mean
>that ->ki_pos is now inaccurate.
>
> Unless someone can explain why the min_t() is needed, we should remove
> it.
Yes, as far as I can tell, ret can never be bigger than len or total_len,
so we should also remove the branch "if (ret > total_len) {...}" in
xxx_recvmsg(). If you would like a patch for this, please let me know, thanks.


-- 
Regards,

Zhi Yong Wu


[PATCH v4 net-next 2/4] macvtap: remove the dead branch

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 9093004..f599c47 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -588,7 +588,7 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
return 0;
 }
 
-static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
+static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
   struct virtio_net_hdr *vnet_hdr)
 {
memset(vnet_hdr, 0, sizeof(*vnet_hdr));
@@ -619,8 +619,6 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
vnet_hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
} /* else everything is zero */
-
-   return 0;
 }
 
 /* Get packet from user space buffer */
@@ -778,9 +776,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if ((len -= vnet_hdr_len) < 0)
return -EINVAL;
 
-   ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
-   if (ret)
-   return ret;
+   macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
 
if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
return -EFAULT;
-- 
1.7.6.5



[PATCH v4 net-next 4/4] tun: remove unused parameter in tun_do_read()

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/tun.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 782e38b..f9c935a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1289,8 +1289,7 @@ done:
 }
 
 static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
-  struct kiocb *iocb, const struct iovec *iv,
-  ssize_t len, int noblock)
+  const struct iovec *iv, ssize_t len, int noblock)
 {
DECLARE_WAITQUEUE(wait, current);
struct sk_buff *skb;
@@ -1353,7 +1352,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
goto out;
}
 
-   ret = tun_do_read(tun, tfile, iocb, iv, len,
+   ret = tun_do_read(tun, tfile, iv, len,
  file->f_flags & O_NONBLOCK);
ret = min_t(ssize_t, ret, len);
 out:
@@ -1452,7 +1451,7 @@ static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
 SOL_PACKET, TUN_TX_TIMESTAMP);
goto out;
}
-   ret = tun_do_read(tun, tfile, iocb, m->msg_iov, total_len,
+   ret = tun_do_read(tun, tfile, m->msg_iov, total_len,
  flags & MSG_DONTWAIT);
if (ret > total_len) {
m->msg_flags |= MSG_TRUNC;
-- 
1.7.6.5



[PATCH v4 net-next 3/4] macvtap: remove unused parameter in macvtap_do_read()

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index f599c47..4c6f84c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -819,7 +819,7 @@ done:
return ret ? ret : copied;
 }
 
-static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
+static ssize_t macvtap_do_read(struct macvtap_queue *q,
   const struct iovec *iv, unsigned long len,
   int noblock)
 {
@@ -870,7 +870,7 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
goto out;
}
 
-   ret = macvtap_do_read(q, iocb, iv, len, file->f_flags & O_NONBLOCK);
+   ret = macvtap_do_read(q, iv, len, file->f_flags & O_NONBLOCK);
ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */
 out:
return ret;
@@ -1102,7 +1102,7 @@ static int macvtap_recvmsg(struct kiocb *iocb, struct socket *sock,
int ret;
if (flags & ~(MSG_DONTWAIT|MSG_TRUNC))
return -EINVAL;
-   ret = macvtap_do_read(q, iocb, m->msg_iov, total_len,
+   ret = macvtap_do_read(q, m->msg_iov, total_len,
  flags & MSG_DONTWAIT);
if (ret > total_len) {
m->msg_flags |= MSG_TRUNC;
-- 
1.7.6.5



[PATCH v4 net-next 0/4] net: some cleanups

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Since net-next is open now, it's time to post these patches again.

Changelog from v3:
 -combine the change that removes the return value check with
the change which adjusts the function return type to "void". [David Miller]

Zhi Yong Wu (4):
  vhost: remove the dead branch
  macvtap: remove the dead branch
  macvtap: remove unused parameter in macvtap_do_read()
  tun: remove unused parameter in tun_do_read()

 drivers/net/macvtap.c |   14 +-
 drivers/net/tun.c |7 +++
 drivers/vhost/net.c   |9 ++---
 drivers/vhost/scsi.c  |7 +--
 drivers/vhost/test.c  |8 +---
 drivers/vhost/vhost.c |4 +---
 drivers/vhost/vhost.h |2 +-
 7 files changed, 14 insertions(+), 37 deletions(-)

-- 
1.7.6.5



[PATCH v4 net-next 1/4] vhost: remove the dead branch

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Since vhost_dev_init() always returns 0, the error-handling branches in its
callers are never run and can be removed.

Signed-off-by: Zhi Yong Wu 
Acked-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c   |9 ++---
 drivers/vhost/scsi.c  |7 +--
 drivers/vhost/test.c  |8 +---
 drivers/vhost/vhost.c |4 +---
 drivers/vhost/vhost.h |2 +-
 5 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 831eb4f..9a68409 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -683,7 +683,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
struct vhost_dev *dev;
struct vhost_virtqueue **vqs;
-   int r, i;
+   int i;
 
if (!n)
return -ENOMEM;
@@ -706,12 +706,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
n->vqs[i].vhost_hlen = 0;
n->vqs[i].sock_hlen = 0;
}
-   r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
-   if (r < 0) {
-   kfree(n);
-   kfree(vqs);
-   return r;
-   }
+   vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index f175629..1e4c75c 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1417,18 +1417,13 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
vqs[i] = &vs->vqs[i].vq;
vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
}
-   r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ);
+   vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ);
 
tcm_vhost_init_inflight(vs, NULL);
 
-   if (r < 0)
-   goto err_init;
-
f->private_data = vs;
return 0;
 
-err_init:
-   kfree(vqs);
 err_vqs:
vhost_scsi_free(vs);
 err_vs:
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 339eae8..c2a54fb 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -104,7 +104,6 @@ static int vhost_test_open(struct inode *inode, struct file *f)
struct vhost_test *n = kmalloc(sizeof *n, GFP_KERNEL);
struct vhost_dev *dev;
struct vhost_virtqueue **vqs;
-   int r;
 
if (!n)
return -ENOMEM;
@@ -117,12 +116,7 @@ static int vhost_test_open(struct inode *inode, struct file *f)
dev = &n->dev;
vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-   r = vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX);
-   if (r < 0) {
-   kfree(vqs);
-   kfree(n);
-   return r;
-   }
+   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX);
 
f->private_data = n;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 69068e0..78987e4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -290,7 +290,7 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev)
vhost_vq_free_iovecs(dev->vqs[i]);
 }
 
-long vhost_dev_init(struct vhost_dev *dev,
+void vhost_dev_init(struct vhost_dev *dev,
struct vhost_virtqueue **vqs, int nvqs)
 {
struct vhost_virtqueue *vq;
@@ -319,8 +319,6 @@ long vhost_dev_init(struct vhost_dev *dev,
vhost_poll_init(&vq->poll, vq->handle_kick,
POLLIN, dev);
}
-
-   return 0;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 4465ed5..35eeb2a 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -127,7 +127,7 @@ struct vhost_dev {
struct task_struct *worker;
 };
 
-long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
+void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
 long vhost_dev_set_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
-- 
1.7.6.5
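
As a rough illustration of the cleanup in the patch above, the sketch below
shows the same pattern on a made-up device type: an init function that can
only return 0 is changed to return void, and the caller's error branch, which
could never run, is dropped. Every name here (demo_dev, demo_dev_init_old,
demo_dev_init, demo_open_old, demo_open) is hypothetical and is not part of
the vhost code; this is only a user-space sketch of the idea.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device; not the real struct vhost_dev. */
struct demo_dev {
	int nvqs;
	int initialized;
};

/* Before: the function has no failure path, yet it returns a status. */
static long demo_dev_init_old(struct demo_dev *dev, int nvqs)
{
	dev->nvqs = nvqs;
	dev->initialized = 1;
	return 0;
}

/* After: nothing can fail, so the return type becomes void. */
static void demo_dev_init(struct demo_dev *dev, int nvqs)
{
	dev->nvqs = nvqs;
	dev->initialized = 1;
}

/* Caller before the cleanup: the r < 0 branch is dead code. */
static int demo_open_old(struct demo_dev **out)
{
	struct demo_dev *dev = malloc(sizeof(*dev));
	long r;

	if (!dev)
		return -1;
	r = demo_dev_init_old(dev, 2);
	if (r < 0) {		/* never taken: r is always 0 */
		free(dev);
		return (int)r;
	}
	*out = dev;
	return 0;
}

/* Caller after the cleanup: same behaviour, no unused variable or branch. */
static int demo_open(struct demo_dev **out)
{
	struct demo_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return -1;
	demo_dev_init(dev, 2);
	*out = dev;
	return 0;
}
```

Both callers leave the device in the same state; the second simply has less
code to read and no status value that a future caller might be tempted to
check.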



[PATCH v2 1/2] macvtap: update file current position

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 9093004..e6e2dd6 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -876,6 +876,8 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
 
ret = macvtap_do_read(q, iocb, iv, len, file->f_flags & O_NONBLOCK);
ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */
+   if (ret > 0)
+   iocb->ki_pos += ret;
 out:
return ret;
 }
-- 
1.7.6.5



[PATCH v2 2/2] tun: update file current position

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/tun.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 782e38b..c8ddbd0 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1356,6 +1356,8 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
ret = tun_do_read(tun, tfile, iocb, iv, len,
  file->f_flags & O_NONBLOCK);
ret = min_t(ssize_t, ret, len);
+   if (ret > 0)
+   iocb->ki_pos += ret;
 out:
tun_put(tun);
return ret;
-- 
1.7.6.5



[PATCH v4 net-next 2/4] macvtap: remove the dead branch

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 drivers/net/macvtap.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 9093004..f599c47 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -588,7 +588,7 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
return 0;
 }
 
-static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
+static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
   struct virtio_net_hdr *vnet_hdr)
 {
memset(vnet_hdr, 0, sizeof(*vnet_hdr));
@@ -619,8 +619,6 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
vnet_hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
} /* else everything is zero */
-
-   return 0;
 }
 
 /* Get packet from user space buffer */
@@ -778,9 +776,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if ((len -= vnet_hdr_len) < 0)
return -EINVAL;
 
-   ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
-   if (ret)
-   return ret;
+   macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
 
if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
return -EFAULT;
-- 
1.7.6.5



Re: [PATCH v2 2/2] tun: update file current position

2013-12-06 Thread Zhi Yong Wu
On Sat, Dec 7, 2013 at 1:45 AM, David Miller <da...@davemloft.net> wrote:
> From: Zhi Yong Wu <zwu.ker...@gmail.com>
> Date: Fri,  6 Dec 2013 17:08:50 +0800
>
>> From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>>
>> Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
>
> Also applied and queued up for -stable, thanks.
>
> I noticed in these two cases that the min_t() adjustment of 'ret'
> seems strange.  I can't understand why it's needed.
>
> If, for example, tun_do_read() really did read more than 'len'
> bytes:
>
> 1) That would write past the end of the buffer.
>
> 2) Writing a different value to the ->ki_pos would mean
>    that ->ki_pos is now inaccurate.
>
> Unless someone can explain why the min_t() is needed, we should remove
> it.
Yes, from my side, it seems to be impossible for ret to be bigger than
len or total_len.
So we can also remove the branch if (ret > total_len) {...} in xxx_recvmsg().
If you would like to submit the patch for this, please let me know, thanks.
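
To make the argument above concrete, here is a small user-space sketch. It
assumes, as this thread argues but does not prove, that the read helper never
copies more than len bytes. Under that assumption the clamp is a no-op and the
file position advances identically with or without it. The names demo_iocb,
demo_do_read, demo_read_with_clamp and demo_read_no_clamp are made up for this
example and are not the tun or macvtap code.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical position holder; stands in for iocb->ki_pos only. */
struct demo_iocb {
	long long pos;
};

/*
 * Hypothetical read helper. ASSUMPTION: it never copies more than len
 * bytes into buf, so its return value is at most len.
 */
static ssize_t demo_do_read(const char *pkt, size_t pkt_len,
			    char *buf, size_t len)
{
	size_t n = pkt_len < len ? pkt_len : len;

	memcpy(buf, pkt, n);
	return (ssize_t)n;
}

/* Caller with a clamp, in the style of ret = min_t(ssize_t, ret, len). */
static ssize_t demo_read_with_clamp(struct demo_iocb *iocb, const char *pkt,
				    size_t pkt_len, char *buf, size_t len)
{
	ssize_t ret = demo_do_read(pkt, pkt_len, buf, len);

	if (ret > (ssize_t)len)	/* never true under the assumption */
		ret = (ssize_t)len;
	if (ret > 0)
		iocb->pos += ret;
	return ret;
}

/* Caller without the clamp: same result whenever the helper is bounded. */
static ssize_t demo_read_no_clamp(struct demo_iocb *iocb, const char *pkt,
				  size_t pkt_len, char *buf, size_t len)
{
	ssize_t ret = demo_do_read(pkt, pkt_len, buf, len);

	if (ret > 0)
		iocb->pos += ret;
	return ret;
}
```

If the helper could return more than len, the clamp would hide a buffer
overrun instead of preventing it, which is the first of the two problems
listed in the quoted mail.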


-- 
Regards,

Zhi Yong Wu


[PATCH 1/2] macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg()

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

After checking the related code, ret can never be greater than len or
total_len, so remove the useless code in both of the above functions.

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 drivers/net/macvtap.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 4c6f84c..7f4ccdd 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -871,7 +871,6 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
}
 
ret = macvtap_do_read(q, iv, len, file->f_flags & O_NONBLOCK);
-   ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */
 out:
return ret;
 }
@@ -1104,10 +1103,6 @@ static int macvtap_recvmsg(struct kiocb *iocb, struct socket *sock,
return -EINVAL;
ret = macvtap_do_read(q, m->msg_iov, total_len,
  flags & MSG_DONTWAIT);
-   if (ret > total_len) {
-   m->msg_flags |= MSG_TRUNC;
-   ret = flags & MSG_TRUNC ? ret : total_len;
-   }
return ret;
 }
 
-- 
1.7.6.5



[PATCH 2/2] tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg()

2013-12-06 Thread Zhi Yong Wu
From: Zhi Yong Wu <wu...@linux.vnet.ibm.com>

After checking the related code, ret can never be greater than len or
total_len, so remove the useless code in both of the above functions.

Signed-off-by: Zhi Yong Wu <wu...@linux.vnet.ibm.com>
---
 drivers/net/tun.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f9c935a..d61719c 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1354,7 +1354,6 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 
ret = tun_do_read(tun, tfile, iv, len,
  file->f_flags & O_NONBLOCK);
-   ret = min_t(ssize_t, ret, len);
 out:
tun_put(tun);
return ret;
@@ -1453,10 +1452,6 @@ static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
}
ret = tun_do_read(tun, tfile, m->msg_iov, total_len,
  flags & MSG_DONTWAIT);
-   if (ret > total_len) {
-   m->msg_flags |= MSG_TRUNC;
-   ret = flags & MSG_TRUNC ? ret : total_len;
-   }
 out:
tun_put(tun);
return ret;
-- 
1.7.6.5



[net-next PATCH v3 3/6] macvtap: remove the dead branch

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 9093004..d271fb4 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -779,8 +779,6 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
return -EINVAL;
 
ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
-   if (ret)
-   return ret;
 
if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
return -EFAULT;
-- 
1.7.6.5



[net-next PATCH v3 2/6] vhost: adjust vhost_dev_init() to be void

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
Acked-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c   |4 ++--
 drivers/vhost/scsi.c  |2 +-
 drivers/vhost/test.c  |3 +--
 drivers/vhost/vhost.c |4 +---
 drivers/vhost/vhost.h |2 +-
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0554785..9a68409 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -683,7 +683,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
struct vhost_dev *dev;
struct vhost_virtqueue **vqs;
-   int r, i;
+   int i;
 
if (!n)
return -ENOMEM;
@@ -706,7 +706,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
n->vqs[i].vhost_hlen = 0;
n->vqs[i].sock_hlen = 0;
}
-   r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
+   vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 3164680..1e4c75c 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1417,7 +1417,7 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
vqs[i] = &vs->vqs[i].vq;
vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick;
}
-   r = vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ);
+   vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ);
 
tcm_vhost_init_inflight(vs, NULL);
 
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 99cb960..c2a54fb 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -104,7 +104,6 @@ static int vhost_test_open(struct inode *inode, struct file *f)
struct vhost_test *n = kmalloc(sizeof *n, GFP_KERNEL);
struct vhost_dev *dev;
struct vhost_virtqueue **vqs;
-   int r;
 
if (!n)
return -ENOMEM;
@@ -117,7 +116,7 @@ static int vhost_test_open(struct inode *inode, struct file *f)
dev = &n->dev;
vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-   r = vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX);
+   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX);
 
f->private_data = n;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 69068e0..78987e4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -290,7 +290,7 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev)
vhost_vq_free_iovecs(dev->vqs[i]);
 }
 
-long vhost_dev_init(struct vhost_dev *dev,
+void vhost_dev_init(struct vhost_dev *dev,
struct vhost_virtqueue **vqs, int nvqs)
 {
struct vhost_virtqueue *vq;
@@ -319,8 +319,6 @@ long vhost_dev_init(struct vhost_dev *dev,
vhost_poll_init(&vq->poll, vq->handle_kick,
POLLIN, dev);
}
-
-   return 0;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 4465ed5..35eeb2a 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -127,7 +127,7 @@ struct vhost_dev {
struct task_struct *worker;
 };
 
-long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
+void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, int nvqs);
 long vhost_dev_set_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
-- 
1.7.6.5



[net-next PATCH v3 5/6] macvtap: remove unused parameter in macvtap_do_read()

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index f599c47..4c6f84c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -819,7 +819,7 @@ done:
return ret ? ret : copied;
 }
 
-static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
+static ssize_t macvtap_do_read(struct macvtap_queue *q,
   const struct iovec *iv, unsigned long len,
   int noblock)
 {
@@ -870,7 +870,7 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
goto out;
}
 
-   ret = macvtap_do_read(q, iocb, iv, len, file->f_flags & O_NONBLOCK);
+   ret = macvtap_do_read(q, iv, len, file->f_flags & O_NONBLOCK);
ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */
 out:
return ret;
@@ -1102,7 +1102,7 @@ static int macvtap_recvmsg(struct kiocb *iocb, struct socket *sock,
int ret;
if (flags & ~(MSG_DONTWAIT|MSG_TRUNC))
return -EINVAL;
-   ret = macvtap_do_read(q, iocb, m->msg_iov, total_len,
+   ret = macvtap_do_read(q, m->msg_iov, total_len,
  flags & MSG_DONTWAIT);
if (ret > total_len) {
m->msg_flags |= MSG_TRUNC;
-- 
1.7.6.5



[net-next PATCH v3 4/6] macvtap: adjust macvtap_skb_to_vnet_hdr() to be void

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index d271fb4..f599c47 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -588,7 +588,7 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
return 0;
 }
 
-static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
+static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
   struct virtio_net_hdr *vnet_hdr)
 {
memset(vnet_hdr, 0, sizeof(*vnet_hdr));
@@ -619,8 +619,6 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
vnet_hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
} /* else everything is zero */
-
-   return 0;
 }
 
 /* Get packet from user space buffer */
@@ -778,7 +776,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if ((len -= vnet_hdr_len) < 0)
return -EINVAL;
 
-   ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
+   macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
 
if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
return -EFAULT;
-- 
1.7.6.5



[net-next PATCH v3 6/6] tun: remove unused parameter in tun_do_read()

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/tun.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 782e38b..f9c935a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1289,8 +1289,7 @@ done:
 }
 
 static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
-  struct kiocb *iocb, const struct iovec *iv,
-  ssize_t len, int noblock)
+  const struct iovec *iv, ssize_t len, int noblock)
 {
DECLARE_WAITQUEUE(wait, current);
struct sk_buff *skb;
@@ -1353,7 +1352,7 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
goto out;
}
 
-   ret = tun_do_read(tun, tfile, iocb, iv, len,
+   ret = tun_do_read(tun, tfile, iv, len,
  file->f_flags & O_NONBLOCK);
ret = min_t(ssize_t, ret, len);
 out:
@@ -1452,7 +1451,7 @@ static int tun_recvmsg(struct kiocb *iocb, struct socket *sock,
 SOL_PACKET, TUN_TX_TIMESTAMP);
goto out;
}
-   ret = tun_do_read(tun, tfile, iocb, m->msg_iov, total_len,
+   ret = tun_do_read(tun, tfile, m->msg_iov, total_len,
  flags & MSG_DONTWAIT);
if (ret > total_len) {
m->msg_flags |= MSG_TRUNC;
-- 
1.7.6.5



[net-next PATCH v3 0/6] net: some cleanups

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Since net-next is open now, it's time to post this series again.

Zhi Yong Wu (6):
  vhost: remove the dead branch
  vhost: adjust vhost_dev_init() to be void
  macvtap: remove the dead branch
  macvtap: adjust macvtap_skb_to_vnet_hdr() to be void
  macvtap: remove unused parameter in macvtap_do_read()
  tun: remove unused parameter in tun_do_read()

 drivers/net/macvtap.c |   14 +++++---------
 drivers/net/tun.c     |    7 +++----
 drivers/vhost/net.c   |    9 ++-------
 drivers/vhost/scsi.c  |    7 +------
 drivers/vhost/test.c  |    8 +-------
 drivers/vhost/vhost.c |    4 +---
 drivers/vhost/vhost.h |    2 +-
 7 files changed, 14 insertions(+), 37 deletions(-)

-- 
1.7.6.5



[net-next PATCH v3 1/6] vhost: remove the dead branch

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Since vhost_dev_init() always returns 0, the branches that handle a
failure from it are never run and can therefore be removed.

Signed-off-by: Zhi Yong Wu 
Acked-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c  |    5 -----
 drivers/vhost/scsi.c |    5 -----
 drivers/vhost/test.c |    5 -----
 3 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 831eb4f..0554785 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -707,11 +707,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
n->vqs[i].sock_hlen = 0;
}
r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
-   if (r < 0) {
-   kfree(n);
-   kfree(vqs);
-   return r;
-   }
 
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index f175629..3164680 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1421,14 +1421,9 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 
tcm_vhost_init_inflight(vs, NULL);
 
-   if (r < 0)
-   goto err_init;
-
f->private_data = vs;
return 0;
 
-err_init:
-   kfree(vqs);
 err_vqs:
vhost_scsi_free(vs);
 err_vs:
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 339eae8..99cb960 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -118,11 +118,6 @@ static int vhost_test_open(struct inode *inode, struct file *f)
vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
r = vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX);
-   if (r < 0) {
-   kfree(vqs);
-   kfree(n);
-   return r;
-   }
 
f->private_data = n;
 
-- 
1.7.6.5
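
The reasoning in the commit message above can be illustrated with a minimal
user-space sketch (this is not the kernel code). It assumes an initializer
with no failure path, so the caller's "if (r < 0)" cleanup can never execute
and dropping it does not change behavior. The names demo_dev, demo_dev_init,
demo_open_before and demo_open_after are hypothetical, and malloc()/free()
merely stand in for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a device structure; not struct vhost_dev. */
struct demo_dev {
    int nvqs;
    int initialized;
};

/* Assumed shape of the problem: no failure path, so this always returns 0. */
static int demo_dev_init(struct demo_dev *dev, int nvqs)
{
    dev->nvqs = nvqs;
    dev->initialized = 1;
    return 0;
}

/* Before: the error branch is dead because r can only be 0. */
static struct demo_dev *demo_open_before(int nvqs)
{
    struct demo_dev *dev = malloc(sizeof(*dev));
    int r;

    if (!dev)
        return NULL;
    r = demo_dev_init(dev, nvqs);
    if (r < 0) {            /* never taken */
        free(dev);
        return NULL;
    }
    return dev;
}

/* After: same observable behavior with the dead branch removed. */
static struct demo_dev *demo_open_after(int nvqs)
{
    struct demo_dev *dev = malloc(sizeof(*dev));

    if (!dev)
        return NULL;
    demo_dev_init(dev, nvqs);
    return dev;
}
```

Both variants return an identically initialized object; only the allocation
failure check, which can still trigger, remains in the simplified caller.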



[PATCH 1/2] macvtap: update file current position

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 9093004..957cc5c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -876,6 +876,8 @@ static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
 
ret = macvtap_do_read(q, iocb, iv, len, file->f_flags & O_NONBLOCK);
ret = min_t(ssize_t, ret, len); /* XXX copied from tun.c. Why? */
+   if (ret > 0)
+   iocb->ki_pos = ret;
 out:
return ret;
 }
-- 
1.7.6.5
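
The two added lines in the hunk above can be sketched in user-space C as
follows. This only mirrors the logic visible in the diff (clamp the result to
the requested length, then record it as the position when it is positive).
The struct demo_iocb and the function demo_finish_read are hypothetical
stand-ins and not part of any kernel API.

```c
#include <sys/types.h>

/* Hypothetical stand-in for the position field of struct kiocb. */
struct demo_iocb {
    long long pos;
};

/*
 * Clamp ret to len (the role min_t(ssize_t, ret, len) plays in the diff),
 * then store it as the position only for a successful, non-empty read.
 * Error codes (negative values) and zero leave the position untouched.
 */
static ssize_t demo_finish_read(struct demo_iocb *iocb, ssize_t ret, ssize_t len)
{
    if (ret > len)
        ret = len;
    if (ret > 0)
        iocb->pos = ret;
    return ret;
}
```

Note that, as in the patch, the position is assigned rather than advanced.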



[PATCH 2/2] tun: update file current position

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/tun.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 782e38b..e26cbea 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1356,6 +1356,8 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
ret = tun_do_read(tun, tfile, iocb, iv, len,
  file->f_flags & O_NONBLOCK);
ret = min_t(ssize_t, ret, len);
+   if (ret > 0)
+   iocb->ki_pos = ret;
 out:
tun_put(tun);
return ret;
-- 
1.7.6.5



Re: [net-next PATCH 3/6] macvtap: remove the dead branch

2013-12-05 Thread Zhi Yong Wu
On Fri, Dec 6, 2013 at 2:08 PM, Guenter Roeck  wrote:
> On 12/05/2013 02:28 PM, Zhi Yong Wu wrote:
>>
>> From: Zhi Yong Wu 
>>
>> Signed-off-by: Zhi Yong Wu 
>> ---
>>   drivers/net/macvtap.c |    2 --
>>   1 files changed, 0 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 9093004..d271fb4 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -779,8 +779,6 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>> return -EINVAL;
>>
>> ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
>> -   if (ret)
>> -   return ret;
>>
> Assigning the function's return value to ret just to ignore it seems odd.
>
> Might make sense to change the function type to void.
Yes, this is done in the next patch of this series.

>
> Guenter
>



-- 
Regards,

Zhi Yong Wu
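
The suggestion discussed above, changing a helper that can only return 0 to
void so callers no longer store a value they ignore, might look like the
following user-space sketch. All names (demo_hdr, demo_fill_hdr_int,
demo_fill_hdr, demo_put_user) are hypothetical and the header layout is
invented for illustration; it is not the macvtap code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header; the real virtio net header has more fields. */
struct demo_hdr {
    uint8_t  flags;
    uint16_t len;
};

/* Before: returns int, but 0 is the only possible value. */
static int demo_fill_hdr_int(uint16_t len, struct demo_hdr *hdr)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->len = len;
    return 0;
}

/* After: void makes it explicit that there is nothing to check. */
static void demo_fill_hdr(uint16_t len, struct demo_hdr *hdr)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->len = len;
}

/* Caller no longer needs a "ret = ..." assignment that is never read. */
static uint16_t demo_put_user(uint16_t len)
{
    struct demo_hdr hdr;

    demo_fill_hdr(len, &hdr);
    return hdr.len;
}
```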


[net-next PATCHv2 3/8] macvtap: remove the dead branch

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Signed-off-by: Zhi Yong Wu 
---
 drivers/net/macvtap.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 9093004..d271fb4 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -779,8 +779,6 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
return -EINVAL;
 
ret = macvtap_skb_to_vnet_hdr(skb, &vnet_hdr);
-   if (ret)
-   return ret;
 
if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
return -EFAULT;
-- 
1.7.6.5



[net-next PATCHv2 1/8] vhost: remove the dead branch

2013-12-05 Thread Zhi Yong Wu
From: Zhi Yong Wu 

Since vhost_dev_init() always returns 0, the branches that handle a
failure from it are never run and can therefore be removed.

Signed-off-by: Zhi Yong Wu 
Acked-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c  |    5 -----
 drivers/vhost/scsi.c |    5 -----
 drivers/vhost/test.c |    5 -----
 3 files changed, 0 insertions(+), 15 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 831eb4f..0554785 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -707,11 +707,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
n->vqs[i].sock_hlen = 0;
}
r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
-   if (r < 0) {
-   kfree(n);
-   kfree(vqs);
-   return r;
-   }
 
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index f175629..3164680 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1421,14 +1421,9 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 
tcm_vhost_init_inflight(vs, NULL);
 
-   if (r < 0)
-   goto err_init;
-
f->private_data = vs;
return 0;
 
-err_init:
-   kfree(vqs);
 err_vqs:
vhost_scsi_free(vs);
 err_vs:
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 339eae8..99cb960 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -118,11 +118,6 @@ static int vhost_test_open(struct inode *inode, struct file *f)
vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
r = vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX);
-   if (r < 0) {
-   kfree(vqs);
-   kfree(n);
-   return r;
-   }
 
f->private_data = n;
 
-- 
1.7.6.5


