Review Request 60568: Log Feeder input config attribute "tail" should be clearer

Miklos Gergely Fri, 30 Jun 2017 08:14:24 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60568/
-----------------------------------------------------------


Review request for Ambari, Oliver Szabo and Robert Nettleton.


Bugs: AMBARI-21387
    https://issues.apache.org/jira/browse/AMBARI-21387


Repository: ambari


Description
-------

tail = true should mean that only the latest log file is loaded.
tail = false should mean that the previously rolled-over log files are loaded
too, but not followed.
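
As a rough illustration of the semantics described above (this is not the code
in the patch), the choice of files could be sketched as follows. The class and
method names are hypothetical, and the candidate list is assumed to be ordered
with the active log file first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Log Feeder code base.
public class TailSelection {
    /**
     * @param candidates log files matching the input path, active file first (assumption)
     * @param tail       value of the "tail" input config attribute
     * @return the files whose content should be loaded
     */
    public static List<String> filesToLoad(List<String> candidates, boolean tail) {
        if (candidates.isEmpty()) {
            return Collections.emptyList();
        }
        if (tail) {
            // tail = true: only the latest log file is loaded
            return Collections.singletonList(candidates.get(0));
        }
        // tail = false: the rolled-over files are loaded too, but not followed
        return new ArrayList<>(candidates);
    }
}
```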

AbstractInputFile must be prepared for previously rolled-over files to check in
even while the next part is being actively handled, because the check-in only
happens once the data has been moved to Solr. All file-related fields were
therefore moved to maps, so that the check-in can identify which file's status
is being checked in.
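
To make the idea of per-file state concrete, here is a minimal sketch. It does
not reproduce the fields of AbstractInputFile; the class, field and method
names are hypothetical, and the state is assumed to be keyed by a string form
of the file key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; names are not taken from AbstractInputFile.
public class PerFileCheckpoints {
    // Last line number confirmed as delivered, keyed by the file's key
    private final Map<String, Long> lastCheckedInLine = new ConcurrentHashMap<>();

    /** Called once the data read from the given file has been delivered. */
    public void checkIn(String fileKey, long lineNumber) {
        // Keep the highest line number seen; check-ins for different files may interleave
        lastCheckedInLine.merge(fileKey, lineNumber, Long::max);
    }

    /** Last checked-in line for the given file, or 0 if nothing was checked in yet. */
    public long lastCheckedIn(String fileKey) {
        return lastCheckedInLine.getOrDefault(fileKey, 0L);
    }
}
```

With a single set of fields, a late check-in from a rolled-over file would
overwrite the state of the active one; keying by file avoids that.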

Log files are sorted alphabetically after they are identified, thus ensuring
that the first one is the active one. In practice they were already sorted, but
according to its documentation File.listFiles does not guarantee any particular
order, so it is better to sort explicitly.
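
A minimal sketch of sorting the result explicitly is shown below. The class
name and the prefix-based filter are assumptions made for the example; the
patch itself may match files differently.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Illustrative helper; the name and the prefix filter are assumptions.
public class SortedLogFiles {
    /** Lists the entries of a directory whose names start with the prefix, sorted by name. */
    public static File[] list(File directory, String namePrefix) {
        File[] files = directory.listFiles((dir, name) -> name.startsWith(namePrefix));
        if (files == null) {
            // listFiles returns null if the path is not a directory or an I/O error occurs
            return new File[0];
        }
        // listFiles makes no ordering guarantee, so sort by file name explicitly
        Arrays.sort(files, Comparator.comparing(File::getName));
        return files;
    }
}
```

With names such as app.log, app.log.1 and app.log.2, the active file app.log
sorts first because it is a prefix of the others.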

The thread that is responsible for deleting unused checkpoint files should do
one more check before deleting a checkpoint file because the file it belongs to
is no longer there, or because its file key doesn't match the one in the
checkpoint file: first it should look for any other file in the same folder
which has the same file key, in which case it is assumed that the file has been
renamed since. As the code that looks for a checkpoint file to determine the
next line to load does so by the name of the checkpoint file, which is
generated from the file key, it will successfully identify it irrespective of
the name of the file.
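
The extra check can be sketched roughly as follows. This is an illustration
under assumptions, not the patch itself: it assumes the file key is the one
exposed by java.nio (BasicFileAttributes.fileKey(), which may be null on
platforms that do not support it), and the class and method names are
hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Optional;

// Hypothetical helper for illustration; not the Log Feeder implementation.
public class FileKeyLookup {
    /** File key as a string, or null if the platform does not provide one. */
    public static String fileKeyOf(Path file) throws IOException {
        Object key = Files.readAttributes(file, BasicFileAttributes.class).fileKey();
        return key == null ? null : key.toString();
    }

    /** Looks for a regular file in the folder that has the given file key. */
    public static Optional<Path> findByFileKey(Path folder, String fileKey) throws IOException {
        if (fileKey == null) {
            return Optional.empty();
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(folder)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && fileKey.equals(fileKeyOf(entry))) {
                    return Optional.of(entry); // same key: assume the file was renamed
                }
            }
        }
        return Optional.empty();
    }

    /** A checkpoint may be deleted only if no file in the folder carries its file key. */
    public static boolean mayDeleteCheckpoint(Path folder, String fileKey) throws IOException {
        return !findByFileKey(folder, fileKey).isPresent();
    }
}
```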

I know, it's a bit complicated :) But I couldn't put it into words less 
complicated.


Also fixed a Log Search Config ZK bug: events that are not related to node data
changes are no longer handled.


Diffs
-----

  
ambari-logsearch/ambari-logsearch-config-zookeeper/src/main/java/org/apache/ambari/logsearch/config/zookeeper/LogSearchConfigZK.java
 26375e1 
  
ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/AbstractInputFile.java
 ab50eb7 
  
ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/Input.java
 c36f96b 
  
ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/InputFile.java
 fc40ca4 
  
ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/InputManager.java
 19894ae 
  
ambari-logsearch/ambari-logsearch-logfeeder/src/main/java/org/apache/ambari/logfeeder/input/InputS3File.java
 2b19503 
  
ambari-logsearch/ambari-logsearch-logfeeder/src/test/java/org/apache/ambari/logfeeder/input/InputManagerTest.java
 e9bbe7e 


Diff: https://reviews.apache.org/r/60568/diff/1/


Testing
-------

Tested on a local Vagrant cluster by loading previous files again and again,
and by renaming rolled-over files. In both cases no duplicate data was loaded
and checkpoints were handled correctly.


Thanks,

Miklos Gergely
