[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-02-04 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r374684866
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.cpp
 ##
 @@ -97,23 +91,33 @@ void FlowFileRepository::flush() {
   }
 }
 
-void FlowFileRepository::run() {
-  // threshold for purge
+void FlowFileRepository::printStats() {
+  std::string key_count;
+  db_->GetProperty("rocksdb.estimate-num-keys", _count);
+
+  std::string table_readers;
+  db_->GetProperty("rocksdb.estimate-table-readers-mem", _readers);
+
+  std::string all_memtables;
+  db_->GetProperty("rocksdb.cur-size-all-mem-tables", _memtables);
+
+  logger_->log_info("Repository stats: key count: %zu, table readers size: 
%zu, all memory tables size: %zu",
+  key_count, table_readers, all_memtables);
+}
 
+void FlowFileRepository::run() {
+  auto last = std::chrono::system_clock::now();
 
 Review comment:
   Done. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-01-29 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r372505686
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.cpp
 ##
 @@ -89,23 +82,28 @@ void FlowFileRepository::flush() {
   }
 }
 
-void FlowFileRepository::run() {
-  // threshold for purge
+void FlowFileRepository::printStats() {
 
 Review comment:
   Now it's single line. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-01-29 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r372506095
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.h
 ##
 @@ -103,6 +105,9 @@ class FlowFileRepository : public core::Repository, public 
std::enable_shared_fr
 options.create_if_missing = true;
 options.use_direct_io_for_flush_and_compaction = true;
 options.use_direct_reads = true;
+options.write_buffer_size = 8 << 20;
 
 Review comment:
   Added a long comment to explain the decision behind. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-01-29 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r372505686
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.cpp
 ##
 @@ -89,23 +82,28 @@ void FlowFileRepository::flush() {
   }
 }
 
-void FlowFileRepository::run() {
-  // threshold for purge
+void FlowFileRepository::printStats() {
 
 Review comment:
   Now it's single line and printed less frequently.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-01-28 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r371749842
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.cpp
 ##
 @@ -69,14 +68,8 @@ void FlowFileRepository::flush() {
 batch.Delete(keys[i]);
   }
 
-
-  if (db_->Write(rocksdb::WriteOptions(), ).ok()) {
-logger_->log_trace("Decrementing %u from a repo size of %u", 
decrement_total, repo_size_.load());
-if (decrement_total > repo_size_.load()) {
-  repo_size_ = 0;
-} else {
-  repo_size_ -= decrement_total;
-}
+  if (!db_->Write(rocksdb::WriteOptions(), ).ok()) {
+logger_->log_warn("Failed to execute batch operation when flushing 
FlowFileRepository");
 
 Review comment:
   https://issues.apache.org/jira/browse/MINIFICPP-1129


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-01-28 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r371745162
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.cpp
 ##
 @@ -89,23 +82,28 @@ void FlowFileRepository::flush() {
   }
 }
 
-void FlowFileRepository::run() {
-  // threshold for purge
+void FlowFileRepository::printStats() {
 
 Review comment:
   Fair point.
   
   I would keep it at info level, but call less frequently. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-01-27 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r371310031
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.cpp
 ##
 @@ -69,14 +68,8 @@ void FlowFileRepository::flush() {
 batch.Delete(keys[i]);
   }
 
-
-  if (db_->Write(rocksdb::WriteOptions(), ).ok()) {
-logger_->log_trace("Decrementing %u from a repo size of %u", 
decrement_total, repo_size_.load());
-if (decrement_total > repo_size_.load()) {
-  repo_size_ = 0;
-} else {
-  repo_size_ -= decrement_total;
-}
+  if (!db_->Write(rocksdb::WriteOptions(), ).ok()) {
+logger_->log_warn("Failed to execute batch operation when flushing 
FlowFileRepository");
 
 Review comment:
   I removed size handling as:
   - It was done in a inconsistent way.
   - Even if the repo_full flag was set, nothing happened, new elements were 
inserted. 
   - Handling repo size out of the repo (without taking the size of keys into 
account) is most probably a bad design.
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-01-27 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r371310031
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.cpp
 ##
 @@ -69,14 +68,8 @@ void FlowFileRepository::flush() {
 batch.Delete(keys[i]);
   }
 
-
-  if (db_->Write(rocksdb::WriteOptions(), ).ok()) {
-logger_->log_trace("Decrementing %u from a repo size of %u", 
decrement_total, repo_size_.load());
-if (decrement_total > repo_size_.load()) {
-  repo_size_ = 0;
-} else {
-  repo_size_ -= decrement_total;
-}
+  if (!db_->Write(rocksdb::WriteOptions(), ).ok()) {
+logger_->log_warn("Failed to execute batch operation when flushing 
FlowFileRepository");
 
 Review comment:
   I removed size handling as:
   - It was done in an inconsistent way.
   - Even if the repo_full flag was set, nothing happened, new elements were 
inserted. 
   - Handling repo size out of the repo (without taking the size of keys into 
account) is most probably a bad design.
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [nifi-minifi-cpp] arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce sawtooth in memory usage of rocksdb flowfile …

2020-01-27 Thread GitBox
arpadboda commented on a change in pull request #715: MINIFICPP-1126 - Reduce 
sawtooth in memory usage of rocksdb flowfile …
URL: https://github.com/apache/nifi-minifi-cpp/pull/715#discussion_r371309009
 
 

 ##
 File path: extensions/rocksdb-repos/FlowFileRepository.h
 ##
 @@ -103,6 +105,9 @@ class FlowFileRepository : public core::Repository, public 
std::enable_shared_fr
 options.create_if_missing = true;
 options.use_direct_io_for_flush_and_compaction = true;
 options.use_direct_reads = true;
+options.write_buffer_size = 8 << 20;
 
 Review comment:
   This is the important part of this PR.
   
   When operations are done in rocksdb, it stores them in an unsorted list of 
events. The buffer getting full means that these events should be merged and 
serialized, so records are written to the disk in the regular structure of 
rocksdb. 
   
   In our case it means two things:
   -During our regular usecase the content of the buffer in continuously 
growing as creation and later deletion of the same element results in two 
events added to the buffer (log). 
   -When the buffer is full and events are merged, there is a CPU spike and the 
result (nothing or nearly nothing in our case) is written to the underlying 
storage. 
   
   As this buffer only contains flowfile metadata (attributes and content 
location), it's filled quite slowly. It takes hours in case 10 flowfiles are 
generated / sec. 
   
   To avoid memory usage going that high (FF repo can consume more than the 
rest of whole MiNiFI when the buffer is close to full) and have smaller CPU 
peaks the buffer is now emptied more frequently. This still means minutes with 
a decent load. 
   
   For more information check this: 
https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#memtable
   (Note: write amplification doesn't matter in our case as we usually persist 
negligible amount of data as flowfiles fade out of the system in a very short 
time)
   
   The logging added in this PR logs the current write buffer usage as well. It 
helps monitoring the peaks. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services