Adar Dembo has posted comments on this change.

Change subject: Allow for reserving disk space for non-Kudu processes

Patch Set 5:

File src/kudu/consensus/

Line 111: DEFINE_int64(log_dir_reserved_bytes, 0,
Nit: I think this would be more clear as fs_wal_dir_reserved_bytes, as it would 
closely parallel fs_wal_dir.

Not sure if that implies it ought to be defined in the fs module and declared 
here, I don't have a strong opinion on that.

Line 113: TAG_FLAG(log_dir_reserved_bytes, runtime);
Hmm, I'm not sure if we should allow changing these values at runtime. If 
someone increases the reservation, we'll stop allowing new containers (good), 
but we have no way of reducing Kudu's disk space usage to meet the reservation 
(bad). Meaning, we can't fully satisfy the operator's desired semantics for a 
runtime change of these flags.
File src/kudu/fs/

PS5, Line 951: We use a COARSE clock so we need to let a
             :   // little bit of time pass so we get at least one unit of time 
greater than
             :   // before when we call MonoTime::Now() again.
Let's switch to FINE clock so we won't have this limitation.
File src/kudu/fs/

Line 64: DEFINE_int32(full_disk_cache_seconds, 30,
Nit: please prefix with log_block_manager. It's super long, yes, but it's the 
only way we have to "namespace" the flags.

Line 70: DEFINE_int64(data_dirs_reserved_bytes, 0,
Please rename to fs_data_dirs_reserved_bytes (see my comment on 
log_dir_reserved_bytes for the rationale). Also indicate in the help that it 
only works with the log block manager.

Line 96:                       "Number of unavailable log block containers",
Nit: Number of Unavailable Block Containers (to be consistent with the existing 
container metrics).

Line 98:                       "Number of non-full Block Containers that are 
under root paths that "
Nit: "Number of non-full log block containers that are under root paths whose 
disks are full" (TBCWTECM).

Line 265:   RWFile* data_file() const { return data_file_.get(); }
We're only using this for the filename. Could you make narrow accessor that 
just returns the data file's name? You could call it DataFilePath() to parallel 

As for the implementation, since it'll parallel MetadataFilePath(), could you 
use the same implementation for both? That is, either file.filename(), or 
append the right suffix to path_.

Line 267:   int64_t preallocated_offset() const { return preallocated_offset_; }
Where is this used?

Line 273:   Env* env() const { return block_manager_->env(); }
Is this ever used?

Line 312:   int64_t preallocated_offset_ = 0;
Huh, didn't know C++11 allowed this. Could you make the same change for 
total_bytes_written_? Would be nice for both primitives to be initialized 

Line 1230:   // Indexes of root paths that are below their reserved space 
Can we defer this work to the case where GetAvailableContainer() has returned 
null? That would speed up CreateBlock() for the common case.

Line 1250:         return Status::IOError("Unable to allocate block: All data 
directories are full. "
We should add ENOSPC to this status, right?

Line 1292:                                                      
Preallocate() may not actually preallocate 
FLAGS_log_container_preallocate_bytes bytes. When the container is first 
created it will, but after that, the preallocate base and bounds are 
total_bytes_written_ and FLAGS_log_container_preallocate_bytes respectively. 
Which means, each successive Preallocate() will only preallocate by the size of 
the last block to be written to the container.

To be honest, I'm not even sure if the above produces desirable results. More 
specifically, suppose we have a container backed by a single ext4 extent that 
was preallocated when the container was created. Will subsequent preallocations 
"extend" that extent? Or will they add new extents, each the size of the last 
block written? If it's the latter, then the preallocation logic is hot garbage 
and should be replaced.

It's not fair of me to ask you to fix preallocation (though you're certainly 
welcome to if you prefer to do that), but I do think we need to pass the right 
value to VerifySufficientDiskSpace(). In the context of preallocation here, I'm 
not really sure what it should be. As an alternative, what do you think of 
moving disk space enforcement into Container::WriteData()? It's not ideal 
because that's not where  disk space is actually consumed, and we'd need to 
verify that a failure there doesn't orphan blocks (I think the failure should 
result in WritableBlock::Abort()), but it has the added advantage of doing the 
right thing when preallocation is disabled. And I don't think the overhead is 
so bad now that you're caching statvfs() calls with FullDiskCache.

Line 1413:     MonoTime now = MonoTime::Now(MonoTime::COARSE);
FINE here too. And this can be done outside the lock.

Line 1432:   } // Release lock_.
Nit: FWIW, I think this is implied by the scope (i.e. don't really see the need 
for the comment).

Line 1758:   if (expires->ComesBefore(MonoTime::Now(MonoTime::COARSE))) return 
false; // Expired.
Use FINE here and below; that's what we default to when introducing timeouts 
File src/kudu/fs/log_block_manager.h:

Line 156:   // Returns true if the given 'root_path' has been marked full and 
'now' is
Nit: 'now' (with single quotes) suggests a function argument, perhaps "the 
current time" (without double quotes) instead?

Oh, did you mean 'expires' instead of 'now'? If so, could you update the 

PS5, Line 158: Note that this cache uses an "exact match" on the root path 
name, so
             :   // marking /a/b full will not cause /a/b/c to be considered 
This is true, but how is it relevant? The LBM uses only exact matches on root 
paths everywhere else, right?

Line 160:   bool IsRootFull(const std::string& root_path, MonoTime* expires = 
nullptr) const;
Nit: expires should actually be expires_out, to be consistent with the 

Line 366:   FullDiskCache full_disk_cache_;
Nit: add a comment.
File src/kudu/util/

PS5, Line 34: class TestEnvUtil: public KuduTest {
            : };
Nit: you can just remove this and use TEST_F(KuduTest, ...) below.
File src/kudu/util/

Line 23: #include <sys/statvfs.h>
Don't need this anymore.

Line 46: TAG_FLAG(disk_reserved_prefixes_with_bytes_free_for_testing, runtime);
Why does this and the above flag need to be changeable at runtime? AFAICT you 
never change them via RPC (that's what 'runtime' allows you to do).

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd0451d4dbddc1783019a53302de0263080939c7
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <>
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to