Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

2014-09-03 Thread Andreas Rohner
On 2014-09-03 02:35, Ryusuke Konishi wrote:
 On Mon, 01 Sep 2014 21:18:30 +0200, Andreas Rohner wrote:
 On 2014-09-01 20:43, Andreas Rohner wrote:
 Hi Ryusuke,
 On 2014-09-01 19:59, Ryusuke Konishi wrote:
 On Sun, 31 Aug 2014 17:47:13 +0200, Andreas Rohner wrote:
 Under normal circumstances nilfs_sync_fs() writes out the super block,
 which causes a flush of the underlying block device. But this depends on
 the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
 last segment crosses a segment boundary. So if only a small amount of
 data is written before the call to nilfs_sync_fs(), no flush of the
 block device occurs.

 In the above case an additional call to blkdev_issue_flush() is needed.
 To prevent unnecessary overhead, the new flag THE_NILFS_FLUSHED is
 introduced, which is cleared whenever new logs are written and set
 whenever the block device is flushed.

 Signed-off-by: Andreas Rohner andreas.roh...@gmx.net

 The patch looks good to me, except that I find the use of atomic
 test-and-set bitwise operations somewhat unfavorable (though it's
 logically correct).  I will try to send this upstream as is unless
 a comment comes to mind.

 I originally thought that it was necessary to do it atomically to avoid
 a race condition, but I am not so sure about that any more. I think the
 only case we have to avoid is calling set_nilfs_flushed() after
 blkdev_issue_flush(), because this could race with the
 clear_nilfs_flushed() from the segment construction. So this should also
 work:

  +  if (wait && !err && nilfs_test_opt(nilfs, BARRIER) &&
  +      !nilfs_flushed(nilfs)) {
  +          set_nilfs_flushed(nilfs);
  +          err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
  +          if (err != -EIO)
  +                  err = 0;
  +  }
  +

 On the other hand, the comments on set_bit() say that it can be
 reordered on architectures other than x86. test_and_set_bit() implies a
 memory barrier on all architectures. But I don't think the processor
 would reorder set_nilfs_flushed() after the external function call to
 blkdev_issue_flush(), would it?
 
 I believe the compiler doesn't reorder the set_bit() operation after an
 external function call unless it knows the content of the function and
 the function can be optimized.  But, yes, set_bit() doesn't imply a
 memory barrier, unlike test_and_set_bit().  As for
 blkdev_issue_flush(), it would imply a memory barrier through some lock
 functions or other primitives used inside it.  (I haven't actually
 confirmed that this premise is true.)

Yes, blkdev_issue_flush() probably implies a memory barrier.

 On the other hand, we need an explicit barrier operation like
 smp_mb__after_atomic() if a certain operation is performed after
 set_bit() and the changed bit should be visible to other processors
 before that operation.

Great suggestion. I didn't know about those functions. Do we also need a
call to smp_mb__before_atomic() before clear_nilfs_flushed(nilfs) in
segment.c?

I would be happy to provide another version of the patch with
set_nilfs_flushed(nilfs) and smp_mb__after_atomic() if you prefer that
version over the test_and_set_bit approach...

br,
Andreas Rohner


Re: [PATCH RFC] nilfs2: add a tracepoint for tracking stage transition of segment construction

2014-09-03 Thread Ryusuke Konishi
Hi Mitake-san,
On Tue,  2 Sep 2014 21:19:39 +0900, Mitake Hitoshi wrote:
 From: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
 
 This patch adds a tracepoint for tracking stage transitions of block
 collection in segment construction. With the tracepoint, we can
 analyze the behavior of segment construction in depth. It would be
 useful for bottleneck detection, debugging, etc.
 
 The tracepoint is created with the standard trace API of Linux (like
 ext3, ext4, f2fs and btrfs), so we can easily analyze it with existing
 tools. Of course, more detailed analysis will be possible if we
 create nilfs-specific analysis tools.
 
 Below is an example of event dump with Brendan Gregg's perf-tools
 (https://github.com/brendangregg/perf-tools). Time consumption between
 each stage can be obtained.
 
 $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
 Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
 segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_INIT
 segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_GC
 segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_FILE
 segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_IFILE
 segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_CPFILE
 segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SUFILE
 segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DAT
 segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_SR
 segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = 8800ce6de000, stage = ST_DONE
 
 To capture transitions correctly, this patch renames the member scnt
 of nilfs_cstage and adds wrappers for the member. With this change,
 every transition of the stage produces a trace event correctly.
 
 Of course, the tracepoint added by this patch is very limited, so we
 need to add more points for detailed analysis. This patch is something
 like a demonstration. If this concept is acceptable to the nilfs
 community, I'd like to add more tracepoints and prepare analysis
 tools.

Great!

This tracepoint support looks like what I wanted to introduce to
nilfs2 to help debugging and performance analysis. After trying this
patch with the perf-tools, I found it really nice, though I am not
familiar with how tracepoints are used.

Could you proceed with this work, starting from what you think is
useful?  I will help send this work upstream step by step, and would
like to extend it while learning various tracepoint features.

By the way, your mail addresses differ between the author line (From
line) and the sob line.  Can you include a From line so that the
mail addresses match between them?

Thanks,
Ryusuke Konishi

 Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
 ---
  fs/nilfs2/segment.c   | 70 ++-
  fs/nilfs2/segment.h   |  5 ++--
  include/trace/events/nilfs2.h | 50 +++
  3 files changed, 103 insertions(+), 22 deletions(-)
  create mode 100644 include/trace/events/nilfs2.h
 
 diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
 index a1a1916..e841e22 100644
 --- a/fs/nilfs2/segment.c
 +++ b/fs/nilfs2/segment.c
 @@ -76,6 +76,35 @@ enum {
   NILFS_ST_DONE,
  };
  
 +#define CREATE_TRACE_POINTS
 +#include <trace/events/nilfs2.h>
 +
 +/*
 + * nilfs_sc_cstage_inc(), nilfs_sc_cstage_set(), nilfs_sc_cstage_get() are
 + * wrapper functions of stage count (nilfs_sc_info->sc_stage.__scnt). Users of
 + * the variable must use them because transition of stage count must involve
 + * trace events (trace_nilfs2_collection_stage_transition).
 + *
 + * nilfs_sc_cstage_get() isn't required for the above purpose because it doesn't
 + * produce events. It is provided just for making the intention clear.
 + */
 +static inline void nilfs_sc_cstage_inc(struct nilfs_sc_info *sci)
 +{
 +	sci->sc_stage.__scnt++;
 +	trace_nilfs2_collection_stage_transition(sci);
 +}
 +
 +static inline void nilfs_sc_cstage_set(struct nilfs_sc_info *sci, int next_scnt)
 +{
 +	sci->sc_stage.__scnt = next_scnt;
 +	trace_nilfs2_collection_stage_transition(sci);
 +}
 +
 +static inline int nilfs_sc_cstage_get(struct nilfs_sc_info *sci)
 +{
 +	return sci->sc_stage.__scnt;
 +}
 +
  /* State flags of collection */
  #define NILFS_CF_NODE            0x0001  /* Collecting node blocks */
  #define NILFS_CF_IFILE_STARTED   0x0002  /* IFILE stage has started */
 @@ -1055,7 +1084,7 @@ static