Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-26 Thread Tejun Heo
Hello, again.

On Wed, Sep 26, 2012 at 09:49:58AM -0700, Tejun Heo wrote:
> > The simple version of flush_workqueue() which I sent yesterday is "chained",
> > because it forces overflow flushers to wait for a free color and forces
> > only one flusher per color.
> >
> > Since "not chaining" is important/essential, I sent a new draft
> > implementation today.  It uses multiple queues, one for each color (like
> > SRCU).  This version is also simple; it removes 90 LOC.
> 
> I'll review your patch but the current implementation is enough in
> that regard.  I was trying to advise against going for a two-color
> scheme.

So, I like it.  Two things.

* I think we still want to optimize the noop case.  IOW, please keep the
  early exit path for cases where there's no work item to flush.

* Can you please redo the patch series so that it doesn't go too much
  roundabout?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-26 Thread Tejun Heo
Hello, Lai.

On Wed, Sep 26, 2012 at 12:48:59PM +0800, Lai Jiangshan wrote:
> > Hmmm... so, that's a lot simpler.  flush_workqueue() isn't a super-hot
> > code path and I don't think grabbing mutex twice is too big a deal.  I
> > haven't actually reviewed the code but if it can be much simpler and
> > thus easier to understand and verify, I might go for that.
> 
> I updated it; it is attached.  It forces flush_workqueue() to grab the
> mutex twice (no other forcing), and the overflow queue is implemented
> in a different way.  This new algorithm may well become our choice, so
> please review this one.

Will do shortly.

> I did not know this history, thank you.
> 
> But the number of colors is not essential.
> Whether the algorithm chains flushers is essential.
> 
> If we can have multiple flushers for each color, it is not chained.
> If we have only one flusher per color, it is chained.  Even with
> multiple colors, it is still partially chained (imagine very frequent
> flush_workqueue() calls).

If you have very few colors, you can end up merging flushes of a lot
of work items which in turn delays the next flush and thus merging
more of them.  This was what Linus was worried about.

> The initial implementation of flush_workqueue() is "chained" algorithm.

I don't know what you mean by "chained" here.  The current mainline
implementation has enough colors for most use cases and doesn't assign
a color to a single work item.  It's specifically designed to avoid
chained latencies.

> The initial implementation of SRCU is also "chained" algorithm.
> but the current SRCU which was implemented by me is not "chained"
> (I don't propose to use SRCU for flush_workqueue(), I just discuss it)

So, you lost me.  The current implementation doesn't have a problem on
that front.

> The simple version of flush_workqueue() which I sent yesterday is "chained",
> because it forces overflow flushers to wait for a free color and forces
> only one flusher per color.
>
> Since "not chaining" is important/essential, I sent a new draft
> implementation today.  It uses multiple queues, one for each color (like
> SRCU).  This version is also simple; it removes 90 LOC.

I'll review your patch but the current implementation is enough in
that regard.  I was trying to advise against going for a two-color
scheme.

Thanks.

-- 
tejun


Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Lai Jiangshan
On 09/26/2012 04:24 AM, Tejun Heo wrote:
> Hello, Lai.
> 
> On Tue, Sep 25, 2012 at 05:02:43PM +0800, Lai Jiangshan wrote:
>> It is not possible to remove cascading.  If the cascading code is
>> not in flush_workqueue(), it must be somewhere else.
> 
> Yeah, sure, I liked that it didn't have to be done explicitly as a
> separate step.
> 
>> If you force overflow flushers to wait for a freed color before flushing
>> (which also forces only one flusher per color), and force every
>> flush_workqueue() to grab ->flush_mutex twice, we can simplify
>> flush_workqueue().  (See the attached patch; it removes 100 LOC, and the
>> cascading code becomes only 3 LOC.)  But these two forcings slow down
>> the caller a little.
> 
> Hmmm... so, that's a lot simpler.  flush_workqueue() isn't a super-hot
> code path and I don't think grabbing mutex twice is too big a deal.  I
> haven't actually reviewed the code but if it can be much simpler and
> thus easier to understand and verify, I might go for that.

I updated it; it is attached.  It forces flush_workqueue() to grab the
mutex twice (no other forcing), and the overflow queue is implemented in
a different way.  This new algorithm may well become our choice, so
please review this one.

> 
>> (And if you allow using SRCU (which has only TWO colors), you can remove
>> another 150 LOC; flush_workqueue() would become a single line.  But it
>> would add some more overhead in flush_workqueue() because SRCU's read
>> side is lockless.)
> 
> I'm not really following how SRCU would factor into this but
> supporting multiple colors was something explicitly requested by
> Linus.  The initial implementation was a lot simpler which supported
> only two colors.  Linus was worried that the high possibility of
> flusher clustering could lead to chaining of latencies.
> 

I did not know this history, thank you.

But the number of colors is not essential.
Whether the algorithm chains flushers is essential.

If we can have multiple flushers for each color, it is not chained.
If we have only one flusher per color, it is chained.  Even with
multiple colors, it is still partially chained (imagine very frequent
flush_workqueue() calls).

The initial implementation of flush_workqueue() is "chained" algorithm.
The initial implementation of SRCU is also "chained" algorithm.
but the current SRCU which was implemented by me is not "chained"
(I don't propose to use SRCU for flush_workqueue(), I just discuss it)

The simple version of flush_workqueue() which I sent yesterday is "chained",
because it forces overflow flushers to wait for a free color and forces
only one flusher per color.

Since "not chaining" is important/essential, I sent a new draft
implementation today.  It uses multiple queues, one for each color (like
SRCU).  This version is also simple; it removes 90 LOC.

Thanks,
Lai

This patch is still applied on top of patch7. it replaces patch8~10

 workqueue.c |  152 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index be407e1..00f02ba 100644

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -208,7 +208,6 @@ struct cpu_workqueue_struct {
  */
 struct wq_flusher {
struct list_headlist;   /* F: list of flushers */
-   int flush_color;/* F: flush color waiting for */
struct completion   done;   /* flush completion */
 };
 
@@ -250,9 +249,7 @@ struct workqueue_struct {
int work_color; /* F: current work color */
int flush_color;/* F: current flush color */
atomic_tnr_cwqs_to_flush[WORK_NR_COLORS];
-   struct wq_flusher   *first_flusher; /* F: first flusher */
-   struct list_headflusher_queue;  /* F: flush waiters */
-   struct list_headflusher_overflow; /* F: flush overflow list */
+   struct list_headflusher[WORK_NR_COLORS]; /* F: flushers */
 
mayday_mask_t   mayday_mask;/* cpus requesting rescue */
struct worker   *rescuer;   /* I: rescue worker */
@@ -1000,8 +997,11 @@ static void wq_dec_flusher_ref(struct workqueue_struct *wq, int color)
 * If this was the last reference, wake up the first flusher.
 * It will handle the rest.
 */
-   if (atomic_dec_and_test(&wq->nr_cwqs_to_flush[color]))
-   complete(&wq->first_flusher->done);
+   if (atomic_dec_and_test(&wq->nr_cwqs_to_flush[color])) {
+   BUG_ON(color != wq->flush_color);
+   complete(&list_first_entry(&wq->flusher[color],
+  struct wq_flusher, list)->done);
+   }
 }
 
 /**
@@ -2540,27 +2540,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
  * becomes new flush_color and work_color is advanced by one.
  * All cwq's work_color are set to new work_color(advanced by one).
  *
- * The caller should have initialized @wq->first_flusher prior to
- * calling this function.
- *
 

Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Tejun Heo
Hello, Lai.

On Tue, Sep 25, 2012 at 05:02:43PM +0800, Lai Jiangshan wrote:
> It is not possible to remove cascading.  If the cascading code is
> not in flush_workqueue(), it must be somewhere else.

Yeah, sure, I liked that it didn't have to be done explicitly as a
separate step.

> If you force overflow flushers to wait for a freed color before flushing
> (which also forces only one flusher per color), and force every
> flush_workqueue() to grab ->flush_mutex twice, we can simplify
> flush_workqueue().  (See the attached patch; it removes 100 LOC, and the
> cascading code becomes only 3 LOC.)  But these two forcings slow down
> the caller a little.

Hmmm... so, that's a lot simpler.  flush_workqueue() isn't a super-hot
code path and I don't think grabbing mutex twice is too big a deal.  I
haven't actually reviewed the code but if it can be much simpler and
thus easier to understand and verify, I might go for that.

> (And if you allow using SRCU (which has only TWO colors), you can remove
> another 150 LOC; flush_workqueue() would become a single line.  But it
> would add some more overhead in flush_workqueue() because SRCU's read
> side is lockless.)

I'm not really following how SRCU would factor into this but
supporting multiple colors was something explicitly requested by
Linus.  The initial implementation was a lot simpler which supported
only two colors.  Linus was worried that the high possibility of
flusher clustering could lead to chaining of latencies.

Thanks.

-- 
tejun


Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Tejun Heo
Hello, Lai.

On Tue, Sep 25, 2012 at 05:02:31PM +0800, Lai Jiangshan wrote:
> I find flush_workqueue() unnatural for me, especially

I don't think it's natural for anybody.  I'm not a big fan of that
code either.

> the usage of the colors and flush_workqueue_prep_cwqs(),
> so I tried to improve it without changing too many things/behaviors.
> 
> (This patchset delays other simple patches; I think I should
> send the simple patches first.)

Yes, please do so.

> I always check the code by careful review.  I imagine many threads
> running out of order in my brain and write down all possible
> transitions on paper.  This process is the most important part; no
> test can replace it.
> 
> The human brain can be wrong, though, so the attached patch is my
> testing code.  It verifies flush_workqueue() by cookie number.

Sure, nothing beats careful reviews but then again it's difficult to
have any level of confidence without actually exercising and
verifying each possible code path.  For tricky changes, it helps a lot
if you describe how the code was verified and why and how much you
feel confident about the change.

> I also need your testing code for workqueue. ^_^

Heh, you asked for it.  Attached.  It's a scenario based thing and I
use different scenarios usually in combination with some debug printks
to verify things are behaving as I think they should.

Thanks.

-- 
tejun
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define MAX_WQ_NAME		64
#define MAX_WQS			64
#define MAX_WORKS		64

struct wq_spec {
	int			id;	/* -1 terminates */
	unsigned int		max_active;
	unsigned int		flags;
};

enum action {
	ACT_TERM,			/* end */
	ACT_LOG,			/* const char * */
	ACT_BURN,			/* ulong duration_msecs */
	ACT_SLEEP,			/* ulong duration_msecs */
	ACT_WAKEUP,			/* ulong work_id */
	ACT_REQUEUE,			/* ulong delay_msecs, ulong cpu */
	ACT_FLUSH,			/* ulong work_id */
	ACT_FLUSH_WQ,			/* ulong workqueue_id */
	ACT_CANCEL,			/* ulong work_id */
};

struct work_action {
	enum action		action;	/* ACT_TERM terminates */
	union {
		unsigned long	v;
		const char	*s;
	};
	unsigned long		v1;
};

struct work_spec {
	int			id;		/* -1 terminates */
	int			wq_id;
	int			requeue_cnt;
	unsigned int		cpu;
	unsigned long		initial_delay;	/* msecs */

	const struct work_action *actions;
};

struct test_scenario {
	const struct wq_spec	*wq_spec;
	const struct work_spec	**work_spec;	/* NULL terminated */
};

static const struct wq_spec dfl_wq_spec[] = {
	{
		.id		= 0,
		.max_active	= 32,
		.flags		= 0,
	},
	{
		.id		= 1,
		.max_active	= 32,
		.flags		= 0,
	},
	{
		.id		= 2,
		.max_active	= 32,
		.flags		= WQ_RESCUER,
	},
	{
		.id		= 3,
		.max_active	= 32,
		.flags		= WQ_FREEZABLE,
	},
	{
		.id		= 4,
		.max_active	= 1,
		.flags		= WQ_UNBOUND | WQ_FREEZABLE/* | WQ_DBG*/,
	},
	{
		.id		= 5,
		.max_active	= 32,
		.flags		= WQ_NON_REENTRANT,
	},
	{
		.id		= 6,
		.max_active	= 4,
		.flags		= WQ_HIGHPRI,
	},
	{
		.id		= 7,
		.max_active	= 4,
		.flags		= WQ_CPU_INTENSIVE,
	},
	{
		.id		= 8,
		.max_active	= 4,
		.flags		= WQ_HIGHPRI | WQ_CPU_INTENSIVE,
	},
	{ .id = -1 },
};

/*
 * Scenario 0.  All are on cpu0.  work16 and 17 burn cpus for 10 and
 * 5msecs respectively and requeue themselves.  18 sleeps 2 secs and
 * cancel both.
 */
static const struct work_spec work_spec0[] = {
	{
		.id		= 16,
		.requeue_cnt	= 1024,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 10 }},
			{ ACT_REQUEUE,	{ 0 }, NR_CPUS },
			{ ACT_TERM },
		},
	},
	{
		.id		= 17,
		.requeue_cnt	= 1024,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 5 }},
			{ ACT_REQUEUE,	{ 0 }, NR_CPUS },
			{ ACT_TERM },
		},
	},
	{
		.id		= 18,
		.actions	= (const struct work_action[]) {
			{ ACT_LOG,	{ .s = "will sleep 2s and cancel both" }},
			{ ACT_SLEEP,	{ 2000 }},
			{ ACT_CANCEL,	{ 16 }},
			{ ACT_CANCEL,	{ 17 }},
			{ ACT_TERM },
		},
	},
	{ .id = -1 },
};

static const struct test_scenario scenario0 = {
	.wq_spec		= dfl_wq_spec,
	.work_spec		=
	(const struct work_spec *[]) { work_spec0, NULL },
};

/*
 * Scenario 1.  All are on cpu0.  Work 0, 1 and 2 sleep for different
 * intervals but all three will terminate at around 30secs.  3 starts
 * at @28 and 4 at @33 and both sleep for five secs and then
 * terminate.  5 waits for 0, 1, 2 and then flush wq which by the time
 * should have 3 on it.  After 3 completes @32, 5 terminates too.
 * After 4 secs, 4 terminates and all test sequence is done.
 */
static const struct work_spec work_spec1[] = {
	{
		.id		= 0,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 3 }},	/* to cause sched activation */
			{ ACT_LOG,	{ .s = "will sleep 30s" }},
			{ ACT_SLEEP,	{ 3 }},
			{ ACT_TERM },
		},
	},
	{
		.id		= 1,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 5 }},
			{ ACT_LOG,	{ .s = "will sleep 10s and burn 5msec and repeat 3 times" }},
			{ ACT_SLEEP,	{ 1 }},
			{ ACT_BURN,	{ 5 }},
			{ ACT_LOG,	{ .s = "@10s" }},
			{ ACT_SLEEP,	{ 1 }},
			{ ACT_BURN,	{ 5 }},
		

Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Lai Jiangshan
On 09/25/2012 04:39 AM, Tejun Heo wrote:
> 
> I do like the removal of explicit cascading and would have gone that
> direction if this code is just being implemented but I'm quite
> skeptical whether changing over to that now is justifiable.  Flush
> bugs tend to be nasty and often difficult to track down.
> 

Hi, Tejun

I know your attitude, it is OK if you reject it.

It is not possible to remove cascading.  If the cascading code is
not in flush_workqueue(), it must be somewhere else.

If you force overflow flushers to wait for a freed color before flushing
(which also forces only one flusher per color), and force every
flush_workqueue() to grab ->flush_mutex twice, we can simplify
flush_workqueue().  (See the attached patch; it removes 100 LOC, and the
cascading code becomes only 3 LOC.)  But these two forcings slow down
the caller a little.

(And if you allow using SRCU (which has only TWO colors), you can remove
another 150 LOC; flush_workqueue() would become a single line.  But it
would add some more overhead in flush_workqueue() because SRCU's read
side is lockless.)

Thanks,
Lai

This patch is applied on top of patch7. it replaces patch8~10

 workqueue.c |  168 ++--
 1 file changed, 30 insertions(+), 138 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index be407e1..bff0ae0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -204,15 +204,6 @@ struct cpu_workqueue_struct {
 };
 
 /*
- * Structure used to wait for workqueue flush.
- */
-struct wq_flusher {
-   struct list_headlist;   /* F: list of flushers */
-   int flush_color;/* F: flush color waiting for */
-   struct completion   done;   /* flush completion */
-};
-
-/*
  * All cpumasks are assumed to be always set on UP and thus can't be
  * used to determine whether there's something to be done.
  */
@@ -250,9 +241,8 @@ struct workqueue_struct {
int work_color; /* F: current work color */
int flush_color;/* F: current flush color */
atomic_tnr_cwqs_to_flush[WORK_NR_COLORS];
-   struct wq_flusher   *first_flusher; /* F: first flusher */
-   struct list_headflusher_queue;  /* F: flush waiters */
-   struct list_headflusher_overflow; /* F: flush overflow list */
+   struct completion   *flusher[WORK_NR_COLORS]; /* F: flusers */
+   wait_queue_head_t   flusher_overflow; /* flush overflow queue */
 
mayday_mask_t   mayday_mask;/* cpus requesting rescue */
struct worker   *rescuer;   /* I: rescue worker */
@@ -1001,7 +991,7 @@ static void wq_dec_flusher_ref(struct workqueue_struct *wq, int color)
 * It will handle the rest.
 */
if (atomic_dec_and_test(&wq->nr_cwqs_to_flush[color]))
-   complete(&wq->first_flusher->done);
+   complete(wq->flusher[color]);
 }
 
 /**
@@ -2540,27 +2530,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
  * becomes new flush_color and work_color is advanced by one.
  * All cwq's work_color are set to new work_color(advanced by one).
  *
- * The caller should have initialized @wq->first_flusher prior to
- * calling this function.
- *
  * CONTEXT:
  * mutex_lock(wq->flush_mutex).
- *
- * RETURNS:
- * %true if there's some cwqs to flush.  %false otherwise.
  */
-static bool workqueue_start_flush(struct workqueue_struct *wq)
+static void workqueue_start_flush(struct workqueue_struct *wq)
 {
int flush_color = wq->work_color;
int next_color = work_next_color(wq->work_color);
-   bool wait = false;
unsigned int cpu;
 
BUG_ON(next_color == wq->flush_color);
wq->work_color = next_color;
 
BUG_ON(atomic_read(&wq->nr_cwqs_to_flush[flush_color]));
-   /* this ref is held by first flusher */
+   /* this ref is held by previous flusher */
atomic_set(&wq->nr_cwqs_to_flush[flush_color], 1);
 
for_each_cwq_cpu(cpu, wq) {
@@ -2569,18 +2552,14 @@ static bool workqueue_start_flush(struct workqueue_struct *wq)
 
spin_lock_irq(&gcwq->lock);
 
-   if (cwq->nr_in_flight[flush_color]) {
+   if (cwq->nr_in_flight[flush_color])
atomic_inc(&wq->nr_cwqs_to_flush[flush_color]);
-   wait = true;
-   }
 
BUG_ON(next_color != work_next_color(cwq->work_color));
cwq->work_color = next_color;
 
spin_unlock_irq(&gcwq->lock);
}
-
-   return wait;
 }
 
 /**
@@ -2595,127 +2574,41 @@ static bool workqueue_start_flush(struct workqueue_struct *wq)
  */
 void flush_workqueue(struct workqueue_struct *wq)
 {
-   struct wq_flusher this_flusher = {
-   .list = LIST_HEAD_INIT(this_flusher.list),
-   .flush_color = -1,
-   .done = 

Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Lai Jiangshan
On 09/25/2012 04:39 AM, Tejun Heo wrote:
> Hello, Lai.
> 
> On Mon, Sep 24, 2012 at 06:07:02PM +0800, Lai Jiangshan wrote:
>> The core patch is patch 6; it makes all flushers able to start at the
>> same time and allows us to do more cleanup.
>>
>> Only patch 1 and patch 6 change the behavior of the code.
>> All the other patches do not change any behavior.
> 
> It would have been nice if you described what this patchset tries to
> achieve how in the head message.
> 
> I don't see anything wrong with the patchset but flush_workqueue() is
> quite hairy before this patchset and I'm not sure the situation
> improves a lot afterwards.  The current code is known / verified to
> work for quite some time and I'd *much* prefer to keep it stable
> unless it can be vastly simpler.

Hi, Tejun

I know your attitude, it is OK if you reject it.

I find flush_workqueue() unnatural for me, especially
the usage of the colors and flush_workqueue_prep_cwqs(),
so I tried to improve it without changing too many things/behaviors.

(This patchset delays other simple patches; I think I should
send the simple patches first.)

> 
> I do like the removal of explicit cascading and would have gone that
> direction if this code is just being implemented but I'm quite
> skeptical whether changing over to that now is justifiable.  Flush
> bugs tend to be nasty and often difficult to track down.
> 
> I'll think more about it.  How confident are you about the change?
> How did you test them?  For changes like this, it usually helps a lot
> to describe how things were tested as part of head and/or commit
> messages.
> 

I always check the code by careful review.  I imagine many threads
running out of order in my brain and write down all possible
transitions on paper.  This process is the most important part; no test
can replace it.

The human brain can be wrong, though, so the attached patch is my
testing code.  It verifies flush_workqueue() by cookie number.

Type "make test" to start the test.
Press Ctrl-C to stop the test.

I also need your testing code for workqueue. ^_^

Thanks,
Lai

diff --git a/workqueue_flush_test/Makefile b/workqueue_flush_test/Makefile
new file mode 100644
index 000..3ecc7aa
--- /dev/null
+++ b/workqueue_flush_test/Makefile
@@ -0,0 +1,10 @@
+obj-m += wflush.o
+
+all:
+   make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
+
+clean:
+   make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
+
+test: all
+   bash ./test.sh
diff --git a/workqueue_flush_test/test.sh b/workqueue_flush_test/test.sh
new file mode 100644
index 000..a9d2d6a
--- /dev/null
+++ b/workqueue_flush_test/test.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+
+make
+sync
+sync
+echo testing...
+trap 'echo interrupt test' INT
+sudo insmod wflush.ko
+sleep 6
+sudo rmmod wflush.ko
+echo test done
+
diff --git a/workqueue_flush_test/wflush.c b/workqueue_flush_test/wflush.c
new file mode 100644
index 000..971e73d
--- /dev/null
+++ b/workqueue_flush_test/wflush.c
@@ -0,0 +1,129 @@
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/workqueue.h>
+#include <linux/delay.h>
+#include <linux/mutex.h>
+#include <linux/list.h>
+#include <linux/random.h>
+
+struct cookie_struct {
+   struct list_head head;
+   unsigned long cookie;
+};
+
+static DEFINE_MUTEX(cookie_lock);
+static unsigned long test_cookie;
+static LIST_HEAD(cookie_head);
+
+static unsigned long last_cookie(void)
+{
+   unsigned long cookie;
+
+   mutex_lock(&cookie_lock);
+   cookie = test_cookie - 1; /* c->cookie = test_cookie++; */
+   mutex_unlock(&cookie_lock);
+
+   return cookie;
+}
+
+static unsigned long tip_cookie(void)
+{
+   unsigned long cookie;
+
+   mutex_lock(&cookie_lock);
+   if (list_empty(&cookie_head))
+   cookie = test_cookie;
+   else
+   cookie = list_first_entry(&cookie_head, struct cookie_struct, head)->cookie;
+   mutex_unlock(&cookie_lock);
+
+   return cookie;
+}
+
+struct test_work {
+   struct work_struct w;
+   struct cookie_struct c;
+};
+
+static void add_test_work(struct test_work *t)
+{
+   struct cookie_struct *c = &t->c;
+
+   mutex_lock(&cookie_lock);
+   c->cookie = test_cookie++;
+   BUG_ON(!list_empty(&c->head));
+   list_add_tail(&c->head, &cookie_head);
+   schedule_work(&t->w);
+   mutex_unlock(&cookie_lock);
+
+   udelay(1+random32()%50);
+}
+
+static void test_work_fn(struct work_struct *w)
+{
+   struct test_work *t = container_of(w, struct test_work, w);
+
+   mutex_lock(&cookie_lock);
+   list_del_init(&t->c.head);
+   mutex_unlock(&cookie_lock);
+
+   udelay(1+random32()%50);
+}
+
+static int test_thread(void *arg)
+{
+   unsigned long lcookie, tcookie;
+   struct test_work t[10];
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(t); i++) {
+   INIT_WORK_ONSTACK(&t[i].w, test_work_fn);
+   INIT_LIST_HEAD(&t[i].c.head);
+   }
+
+   do {
+   for (i = 0; i < ARRAY_SIZE(t); i++) {
+   add_test_work(t+i);
+   }
+
+   lcookie = last_cookie();
+   flush_scheduled_work();
+   

Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Lai Jiangshan
On 09/25/2012 04:39 AM, Tejun Heo wrote:
 Hello, Lai.
 
 On Mon, Sep 24, 2012 at 06:07:02PM +0800, Lai Jiangshan wrote:
 The core patch is patch6, it makes all flusher can start and the same time
 and allow us do more cleanup.

 Only patch1 and patch6 change the behavior of the code.
 All other patches do not change any behavior.
 
 It would have been nice if you described what this patchset tries to
 achieve how in the head message.
 
 I don't see anything wrong with the patchset but flush_workqueue() is
 quite hairy before this patchset and I'm not sure the situation
 improves a lot afterwards.  The current code is known / verified to
 work for quite some time and I'd *much* prefer to keep it stable
 unless it can be vastly simpler.

Hi, Tejun

I know your attitude, it is OK if you reject it.

I found the flush_workqueue() is not nature for me, especially
the usage of the colors and flush_workqueue_prep_cwqs().
so I try to improve it without change too much things/behavior.

(These patchset delay other simple patches, I think I should
send simple patches at first.)

 
 I do like the removal of explicit cascading and would have gone that
 direction if this code is just being implemented but I'm quite
 skeptical whether changing over to that now is justifiable.  Flush
 bugs tend to be nasty and often difficult to track down.
 
 I'll think more about it.  How confident are you about the change?
 How did you test them?  For changes like this, it usually helps a lot
 to describe how things were tested as part of head and/or commit
 messages.
 

I always check the code by hard reviewing the code. I always try to image
there are many thread run in my brain orderless and I write all possible
transitions in paper. This progress is the most important, no test can
replace it.

Human brain can wrong, the attached patch is my testing code.
It verify flush_workqueue() by cookie number.

type make test to start the test.
type Ctrl and c to stop the test.

I also need your testing code for workqueue. ^_^

Thanks,
Lai

diff --git a/workqueue_flush_test/Makefile b/workqueue_flush_test/Makefile
new file mode 100644
index 000..3ecc7aa
--- /dev/null
+++ b/workqueue_flush_test/Makefile
@@ -0,0 +1,10 @@
+obj-m += wflush.o
+
+all:
+   make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
+
+clean:
+   make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
+
+test: all
+   bash ./test.sh
diff --git a/workqueue_flush_test/test.sh b/workqueue_flush_test/test.sh
new file mode 100644
index 000..a9d2d6a
--- /dev/null
+++ b/workqueue_flush_test/test.sh
@@ -0,0 +1,12 @@
+#/bin/bash
+
+make
+sync
+sync
+echo testing...
+trap 'echo interrupt test' INT
+sudo insmod wflush.ko
+sleep 6
+sudo rmmod wflush.ko
+echo test done
+
diff --git a/workqueue_flush_test/wflush.c b/workqueue_flush_test/wflush.c
new file mode 100644
index 000..971e73d
--- /dev/null
+++ b/workqueue_flush_test/wflush.c
@@ -0,0 +1,129 @@
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/workqueue.h>
+#include <linux/delay.h>
+#include <linux/mutex.h>
+#include <linux/list.h>
+#include <linux/random.h>
+
+struct cookie_struct {
+   struct list_head head;
+   unsigned long cookie;
+};
+
+static DEFINE_MUTEX(cookie_lock);
+static unsigned long test_cookie;
+static LIST_HEAD(cookie_head);
+
+static unsigned long last_cookie(void)
+{
+   unsigned long cookie;
+
+   mutex_lock(&cookie_lock);
+   cookie = test_cookie - 1; /* c->cookie = test_cookie++; */
+   mutex_unlock(&cookie_lock);
+
+   return cookie;
+}
+
+static unsigned long tip_cookie(void)
+{
+   unsigned long cookie;
+
+   mutex_lock(&cookie_lock);
+   if (list_empty(&cookie_head))
+   cookie = test_cookie;
+   else
+   cookie = list_first_entry(&cookie_head, struct cookie_struct,
+  head)->cookie;
+   mutex_unlock(&cookie_lock);
+
+   return cookie;
+}
+
+struct test_work {
+   struct work_struct w;
+   struct cookie_struct c;
+};
+
+static void add_test_work(struct test_work *t)
+{
+   struct cookie_struct *c = &t->c;
+
+   mutex_lock(&cookie_lock);
+   c->cookie = test_cookie++;
+   BUG_ON(!list_empty(&c->head));
+   list_add_tail(&c->head, &cookie_head);
+   schedule_work(&t->w);
+   mutex_unlock(&cookie_lock);
+
+   udelay(1+random32()%50);
+}
+
+static void test_work_fn(struct work_struct *w)
+{
+   struct test_work *t = container_of(w, struct test_work, w);
+
+   mutex_lock(&cookie_lock);
+   list_del_init(&t->c.head);
+   mutex_unlock(&cookie_lock);
+
+   udelay(1+random32()%50);
+}
+
+static int test_thread(void *arg)
+{
+   unsigned long lcookie, tcookie;
+   struct test_work t[10];
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(t); i++) {
+   INIT_WORK_ONSTACK(&t[i].w, test_work_fn);
+   INIT_LIST_HEAD(&t[i].c.head);
+   }
+
+   do {
+   for (i = 0; i < ARRAY_SIZE(t); i++) {
+   

Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Lai Jiangshan
On 09/25/2012 04:39 AM, Tejun Heo wrote:
 
> I do like the removal of explicit cascading and would have gone that
> direction if this code is just being implemented but I'm quite
> skeptical whether changing over to that now is justifiable.  Flush
> bugs tend to be nasty and often difficult to track down.
 

Hi, Tejun

I know your attitude, it is OK if you reject it.

It is not possible to remove cascading. If the cascading code is
not in flush_workqueue(), it must be somewhere else.

If you force overflow flushers to wait for a freed color before flushing (which
also forces only one flusher per color), and force the sole flush_workqueue()
to grab ->flush_mutex twice, we can simplify flush_workqueue().
(See the attached patch; it removes 100 LOC, and the cascading code becomes
only 3 LOC.) But these two constraints slow down the caller a little.

(And if you allow using SRCU (which has only TWO colors), you can remove another
150 LOC; flush_workqueue() would become a single line. But it would add some more
overhead in flush_workqueue(), because SRCU's read side is lockless.)

Thanks,
Lai

This patch is applied on top of patch7. it replaces patch8~10

 workqueue.c |  168 ++--
 1 file changed, 30 insertions(+), 138 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index be407e1..bff0ae0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -204,15 +204,6 @@ struct cpu_workqueue_struct {
 };
 
 /*
- * Structure used to wait for workqueue flush.
- */
-struct wq_flusher {
-   struct list_head    list;   /* F: list of flushers */
-   int flush_color;    /* F: flush color waiting for */
-   struct completion   done;   /* flush completion */
-};
-
-/*
  * All cpumasks are assumed to be always set on UP and thus can't be
  * used to determine whether there's something to be done.
  */
@@ -250,9 +241,8 @@ struct workqueue_struct {
int work_color; /* F: current work color */
	int flush_color;    /* F: current flush color */
	atomic_t            nr_cwqs_to_flush[WORK_NR_COLORS];
-   struct wq_flusher   *first_flusher; /* F: first flusher */
-   struct list_head    flusher_queue;  /* F: flush waiters */
-   struct list_head    flusher_overflow; /* F: flush overflow list */
+   struct completion   *flusher[WORK_NR_COLORS]; /* F: flushers */
+   wait_queue_head_t   flusher_overflow; /* flush overflow queue */
 
mayday_mask_t   mayday_mask;/* cpus requesting rescue */
struct worker   *rescuer;   /* I: rescue worker */
@@ -1001,7 +991,7 @@ static void wq_dec_flusher_ref(struct workqueue_struct *wq, int color)
 	 * It will handle the rest.
 	 */
 	if (atomic_dec_and_test(&wq->nr_cwqs_to_flush[color]))
-		complete(&wq->first_flusher->done);
+		complete(wq->flusher[color]);
 }
 
 /**
@@ -2540,27 +2530,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
  * becomes new flush_color and work_color is advanced by one.
  * All cwq's work_color are set to new work_color(advanced by one).
  *
- * The caller should have initialized @wq->first_flusher prior to
- * calling this function.
- *
  * CONTEXT:
 * mutex_lock(wq->flush_mutex).
- *
- * RETURNS:
- * %true if there's some cwqs to flush.  %false otherwise.
  */
-static bool workqueue_start_flush(struct workqueue_struct *wq)
+static void workqueue_start_flush(struct workqueue_struct *wq)
 {
	int flush_color = wq->work_color;
	int next_color = work_next_color(wq->work_color);
-   bool wait = false;
unsigned int cpu;
 
	BUG_ON(next_color == wq->flush_color);
	wq->work_color = next_color;
 
	BUG_ON(atomic_read(&wq->nr_cwqs_to_flush[flush_color]));
-   /* this ref is held by first flusher */
+   /* this ref is held by previous flusher */
	atomic_set(&wq->nr_cwqs_to_flush[flush_color], 1);
 
for_each_cwq_cpu(cpu, wq) {
@@ -2569,18 +2552,14 @@ static bool workqueue_start_flush(struct workqueue_struct *wq)
 
		spin_lock_irq(&gcwq->lock);
 
-   if (cwq->nr_in_flight[flush_color]) {
+   if (cwq->nr_in_flight[flush_color])
		atomic_inc(&wq->nr_cwqs_to_flush[flush_color]);
-   wait = true;
-   }
 
	BUG_ON(next_color != work_next_color(cwq->work_color));
	cwq->work_color = next_color;
 
		spin_unlock_irq(&gcwq->lock);
}
-
-   return wait;
 }
 
 /**
@@ -2595,127 +2574,41 @@ static bool workqueue_start_flush(struct workqueue_struct *wq)
  */
 void flush_workqueue(struct workqueue_struct *wq)
 {
-   struct wq_flusher this_flusher = {
-   .list = LIST_HEAD_INIT(this_flusher.list),
-   .flush_color = -1,
-   .done = 

Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Tejun Heo
Hello, Lai.

On Tue, Sep 25, 2012 at 05:02:31PM +0800, Lai Jiangshan wrote:
> I found the flush_workqueue() is not natural for me, especially

I don't think it's natural for anybody.  I'm not a big fan of that
code either.

> the usage of the colors and flush_workqueue_prep_cwqs(),
> so I tried to improve it without changing too many things or behavior.
> 
> (This patchset delays other simple patches; I think I should
> send the simple patches first.)

Yes, please do so.

> I always check the code by reviewing it very carefully. I try to imagine
> many threads running in my head in arbitrary order, and I write down all
> possible transitions on paper. This process is the most important part; no
> test can replace it.
> 
> The human brain can be wrong, though, so the attached patch is my testing
> code. It verifies flush_workqueue() by cookie numbers.

Sure, nothing beats careful reviews but then again it's difficult to
have any level of confidence without actually exercising and
verifying each possible code path.  For tricky changes, it helps a lot
if you describe how the code was verified and why and how much you
feel confident about the change.

> I also need your testing code for workqueue. ^_^

Heh, you asked for it.  Attached.  It's a scenario based thing and I
use different scenarios usually in combination with some debug printks
to verify things are behaving as I think they should.

Thanks.

-- 
tejun
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/cpu.h>
#include <linux/kthread.h>

#define MAX_WQ_NAME		64
#define MAX_WQS			64
#define MAX_WORKS		64

struct wq_spec {
	int			id;	/* -1 terminates */
	unsigned int		max_active;
	unsigned int		flags;
};

enum action {
	ACT_TERM,			/* end */
	ACT_LOG,			/* const char * */
	ACT_BURN,			/* ulong duration_msecs */
	ACT_SLEEP,			/* ulong duration_msecs */
	ACT_WAKEUP,			/* ulong work_id */
	ACT_REQUEUE,			/* ulong delay_msecs, ulong cpu */
	ACT_FLUSH,			/* ulong work_id */
	ACT_FLUSH_WQ,			/* ulong workqueue_id */
	ACT_CANCEL,			/* ulong work_id */
};

struct work_action {
	enum action		action;	/* ACT_TERM terminates */
	union {
		unsigned long	v;
		const char	*s;
	};
	unsigned long		v1;
};

struct work_spec {
	int			id;		/* -1 terminates */
	int			wq_id;
	int			requeue_cnt;
	unsigned int		cpu;
	unsigned long		initial_delay;	/* msecs */

	const struct work_action *actions;
};

struct test_scenario {
	const struct wq_spec	*wq_spec;
	const struct work_spec	**work_spec;	/* NULL terminated */
};

static const struct wq_spec dfl_wq_spec[] = {
	{
		.id		= 0,
		.max_active	= 32,
		.flags		= 0,
	},
	{
		.id		= 1,
		.max_active	= 32,
		.flags		= 0,
	},
	{
		.id		= 2,
		.max_active	= 32,
		.flags		= WQ_RESCUER,
	},
	{
		.id		= 3,
		.max_active	= 32,
		.flags		= WQ_FREEZABLE,
	},
	{
		.id		= 4,
		.max_active	= 1,
		.flags		= WQ_UNBOUND | WQ_FREEZABLE/* | WQ_DBG*/,
	},
	{
		.id		= 5,
		.max_active	= 32,
		.flags		= WQ_NON_REENTRANT,
	},
	{
		.id		= 6,
		.max_active	= 4,
		.flags		= WQ_HIGHPRI,
	},
	{
		.id		= 7,
		.max_active	= 4,
		.flags		= WQ_CPU_INTENSIVE,
	},
	{
		.id		= 8,
		.max_active	= 4,
		.flags		= WQ_HIGHPRI | WQ_CPU_INTENSIVE,
	},
	{ .id = -1 },
};

/*
 * Scenario 0.  All are on cpu0.  work16 and 17 burn cpus for 10 and
 * 5msecs respectively and requeue themselves.  18 sleeps 2 secs and
 * cancel both.
 */
static const struct work_spec work_spec0[] = {
	{
		.id		= 16,
		.requeue_cnt	= 1024,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 10 }},
			{ ACT_REQUEUE,	{ 0 }, NR_CPUS },
			{ ACT_TERM },
		},
	},
	{
		.id		= 17,
		.requeue_cnt	= 1024,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 5 }},
			{ ACT_REQUEUE,	{ 0 }, NR_CPUS },
			{ ACT_TERM },
		},
	},
	{
		.id		= 18,
		.actions	= (const struct work_action[]) {
			{ ACT_LOG,	{ .s = "will sleep 2s and cancel both" }},
			{ ACT_SLEEP,	{ 2000 }},
			{ ACT_CANCEL,	{ 16 }},
			{ ACT_CANCEL,	{ 17 }},
			{ ACT_TERM },
		},
	},
	{ .id = -1 },
};

static const struct test_scenario scenario0 = {
	.wq_spec		= dfl_wq_spec,
	.work_spec		=
	(const struct work_spec *[]) { work_spec0, NULL },
};

/*
 * Scenario 1.  All are on cpu0.  Work 0, 1 and 2 sleep for different
 * intervals but all three will terminate at around 30secs.  3 starts
 * at @28 and 4 at @33 and both sleep for five secs and then
 * terminate.  5 waits for 0, 1, 2 and then flush wq which by the time
 * should have 3 on it.  After 3 completes @32, 5 terminates too.
 * After 4 secs, 4 terminates and all test sequence is done.
 */
static const struct work_spec work_spec1[] = {
	{
		.id		= 0,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 3 }},	/* to cause sched activation */
			{ ACT_LOG,	{ .s = "will sleep 30s" }},
			{ ACT_SLEEP,	{ 30000 }},
			{ ACT_TERM },
		},
	},
	{
		.id		= 1,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 5 }},
			{ ACT_LOG,	{ .s = "will sleep 10s and burn 5msec and repeat 3 times" }},
			{ ACT_SLEEP,	{ 10000 }},
			{ ACT_BURN,	{ 

Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Tejun Heo
Hello, Lai.

On Tue, Sep 25, 2012 at 05:02:43PM +0800, Lai Jiangshan wrote:
> It is not possible to remove cascading. If the cascading code is
> not in flush_workqueue(), it must be somewhere else.

Yeah, sure, I liked that it didn't have to be done explicitly as a
separate step.

> If you force overflow flushers to wait for a freed color before flushing (which
> also forces only one flusher per color), and force the sole flush_workqueue()
> to grab ->flush_mutex twice, we can simplify flush_workqueue().
> (See the attached patch; it removes 100 LOC, and the cascading code becomes
> only 3 LOC.) But these two constraints slow down the caller a little.

Hmmm... so, that's a lot simpler.  flush_workqueue() isn't a super-hot
code path and I don't think grabbing mutex twice is too big a deal.  I
haven't actually reviewed the code but if it can be much simpler and
thus easier to understand and verify, I might go for that.

> (And if you allow using SRCU (which has only TWO colors), you can remove another
> 150 LOC; flush_workqueue() would become a single line. But it would add some more
> overhead in flush_workqueue(), because SRCU's read side is lockless.)

I'm not really following how SRCU would factor into this but
supporting multiple colors was something explicitly requested by
Linus.  The initial implementation was a lot simpler which supported
only two colors.  Linus was worried that the high possibility of
flusher clustering could lead to chaining of latencies.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Lai Jiangshan
On 09/26/2012 04:24 AM, Tejun Heo wrote:
> Hello, Lai.
> 
> On Tue, Sep 25, 2012 at 05:02:43PM +0800, Lai Jiangshan wrote:
> > It is not possible to remove cascading. If the cascading code is
> > not in flush_workqueue(), it must be somewhere else.
> 
> Yeah, sure, I liked that it didn't have to be done explicitly as a
> separate step.
> 
> > If you force overflow flushers to wait for a freed color before flushing (which
> > also forces only one flusher per color), and force the sole flush_workqueue()
> > to grab ->flush_mutex twice, we can simplify flush_workqueue().
> > (See the attached patch; it removes 100 LOC, and the cascading code becomes
> > only 3 LOC.) But these two constraints slow down the caller a little.
> 
> Hmmm... so, that's a lot simpler.  flush_workqueue() isn't a super-hot
> code path and I don't think grabbing mutex twice is too big a deal.  I
> haven't actually reviewed the code but if it can be much simpler and
> thus easier to understand and verify, I might go for that.

I updated it; it is attached. It forces flush_workqueue() to grab the mutex
twice (no other forcing). The overflow queue is implemented in a different way.
This new algorithm may well become our choice, so please review this one.

 
> > (And if you allow using SRCU (which has only TWO colors), you can remove another
> > 150 LOC; flush_workqueue() would become a single line. But it would add some more
> > overhead in flush_workqueue(), because SRCU's read side is lockless.)
> 
> I'm not really following how SRCU would factor into this but
> supporting multiple colors was something explicitly requested by
> Linus.  The initial implementation was a lot simpler which supported
> only two colors.  Linus was worried that the high possibility of
> flusher clustering could lead to chaining of latencies.
 

I did not know this history, thank you.

But the number of colors is not the essential point.
Whether the algorithm chains flushers is what is essential.

If we can have multiple flushers for each color, it is not chained.
If we have only one flusher per color, it is chained. Even if we have multiple
colors, it is still partially chained (imagine very frequent
flush_workqueue() calls).

The initial implementation of flush_workqueue() was a chained algorithm.
The initial implementation of SRCU was also a chained algorithm,
but the current SRCU, which I implemented, is not chained.
(I don't propose to use SRCU for flush_workqueue(); I just discuss it.)

The simple version of flush_workqueue() which I sent yesterday is "chained",
because it forces overflow flushers to wait for a free color and forces only one
flusher per color.

Since "not chaining" is important/essential, I sent a new draft implementation today.
It uses multiple queues, one for each color (like SRCU).
This version is also simple; it removes 90 LOC.

Thanks,
Lai

This patch is still applied on top of patch7. it replaces patch8~10

 workqueue.c |  152 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index be407e1..00f02ba 100644

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -208,7 +208,6 @@ struct cpu_workqueue_struct {
  */
 struct wq_flusher {
	struct list_head    list;   /* F: list of flushers */
-   int flush_color;    /* F: flush color waiting for */
struct completion   done;   /* flush completion */
 };
 
@@ -250,9 +249,7 @@ struct workqueue_struct {
int work_color; /* F: current work color */
	int flush_color;    /* F: current flush color */
	atomic_t            nr_cwqs_to_flush[WORK_NR_COLORS];
-   struct wq_flusher   *first_flusher; /* F: first flusher */
-   struct list_head    flusher_queue;  /* F: flush waiters */
-   struct list_head    flusher_overflow; /* F: flush overflow list */
+   struct list_head    flusher[WORK_NR_COLORS]; /* F: flushers */
 
mayday_mask_t   mayday_mask;/* cpus requesting rescue */
struct worker   *rescuer;   /* I: rescue worker */
@@ -1000,8 +997,11 @@ static void wq_dec_flusher_ref(struct workqueue_struct *wq, int color)
 	 * If this was the last reference, wake up the first flusher.
 	 * It will handle the rest.
 	 */
-	if (atomic_dec_and_test(&wq->nr_cwqs_to_flush[color]))
-		complete(&wq->first_flusher->done);
+	if (atomic_dec_and_test(&wq->nr_cwqs_to_flush[color])) {
+		BUG_ON(color != wq->flush_color);
+		complete(&list_first_entry(&wq->flusher[color],
+					   struct wq_flusher, list)->done);
+	}
 }
 
 /**
@@ -2540,27 +2540,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
  * becomes new flush_color and work_color is advanced by one.
  * All cwq's work_color are set to new work_color(advanced by one).
  *
- * The caller should have initialized @wq->first_flusher prior to
- * calling this function.
- *
  * CONTEXT:
 * mutex_lock(wq->flush_mutex).
- *

Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-24 Thread Tejun Heo
Hello, Lai.

On Mon, Sep 24, 2012 at 06:07:02PM +0800, Lai Jiangshan wrote:
> The core patch is patch6, it makes all flusher can start and the same time
> and allow us do more cleanup.
> 
> Only patch1 and patch6 change the behavior of the code.
> All other patches do not change any behavior.

It would have been nice if you described what this patchset tries to
achieve how in the head message.

I don't see anything wrong with the patchset but flush_workqueue() is
quite hairy before this patchset and I'm not sure the situation
improves a lot afterwards.  The current code is known / verified to
work for quite some time and I'd *much* prefer to keep it stable
unless it can be vastly simpler.

I do like the removal of explicit cascading and would have gone that
direction if this code is just being implemented but I'm quite
skeptical whether changing over to that now is justifiable.  Flush
bugs tend to be nasty and often difficult to track down.

I'll think more about it.  How confident are you about the change?
How did you test them?  For changes like this, it usually helps a lot
to describe how things were tested as part of head and/or commit
messages.

Thanks.

-- 
tejun


[PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-24 Thread Lai Jiangshan
The core patch is patch 6; it makes all flushers able to start at the same time
and allows us to do more cleanup.

Only patches 1 and 6 change the behavior of the code.
All other patches do not change any behavior.

Lai Jiangshan (10):
  workqueue: always pass flush responsibility to next
  workqueue: remove unneeded check
  workqueue: remove while(true)
  workqueue: use nr_cwqs_to_flush array
  workqueue: add wq_dec_flusher_ref()
  workqueue: start all flusher at the same time
  workqueue: simplify flush_workqueue_prep_cwqs()
  workqueue: assign overflowed flushers's flush color when queued
  workqueue: remove flusher_overflow
  workqueue: remove wq->flush_color

 kernel/workqueue.c |  243 +++
 1 files changed, 91 insertions(+), 152 deletions(-)

-- 
1.7.4.4




