[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-19 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r326462840
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -68,15 +97,18 @@ public FileIOEngine(long capacity, String... filePaths) throws IOException {
   // The next setting length will throw exception,logging this message
   // is just used for the detail reason of exception,
   String msg = "Only " + StringUtils.byteDesc(totalSpace)
-  + " total space under " + filePath + ", not enough for requested "
-  + StringUtils.byteDesc(sizePerFile);
++ " total space under " + filePath + ", not enough for requested "
++ StringUtils.byteDesc(sizePerFile);
   LOG.warn(msg);
 }
-rafs[i].setLength(sizePerFile);
+File file = new File(filePath);
+if (file.length() != sizePerFile) {
+  rafs[i].setLength(sizePerFile);
+}
 
 Review comment:
   Yes, that was my question. Oh great, I had not read the whole code; then it's perfect. Please add enough comments around this to explain why this check-and-setLength is so important. Great.
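To illustrate what such a comment could explain, here is a minimal, hypothetical sketch (the class and method names are illustrative, not the actual FileIOEngine code): `RandomAccessFile.setLength()` touches the file and bumps its last-modified time, so skipping the call when the length already matches avoids disturbing the metadata the integrity checksum is computed over.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class LengthCheckSketch {
    // Hypothetical sketch of the guarded setLength discussed in the review.
    // setLength() rewrites file metadata and bumps the last-modified time;
    // since the integrity checksum covers (path, size, mtime), resizing an
    // already-correct file would invalidate a previously persisted checksum.
    static void ensureLength(RandomAccessFile raf, String filePath, long sizePerFile)
            throws IOException {
        File file = new File(filePath);
        if (file.length() != sizePerFile) {
            // Only resize when the on-disk length differs from the expected size.
            raf.setLength(sizePerFile);
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("cache", ".data");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            ensureLength(raf, tmp.getPath(), 1024);   // resizes: 0 -> 1024
            long mtime = tmp.lastModified();
            ensureLength(raf, tmp.getPath(), 1024);   // no-op: length matches
            if (tmp.length() != 1024) throw new AssertionError("unexpected length");
            if (tmp.lastModified() != mtime) throw new AssertionError("mtime changed");
        }
        System.out.println("ok");
    }
}
```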


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-19 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r326120074
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -68,15 +97,18 @@ public FileIOEngine(long capacity, String... filePaths) throws IOException {
   // The next setting length will throw exception,logging this message
   // is just used for the detail reason of exception,
   String msg = "Only " + StringUtils.byteDesc(totalSpace)
-  + " total space under " + filePath + ", not enough for requested "
-  + StringUtils.byteDesc(sizePerFile);
++ " total space under " + filePath + ", not enough for requested "
++ StringUtils.byteDesc(sizePerFile);
   LOG.warn(msg);
 }
-rafs[i].setLength(sizePerFile);
+File file = new File(filePath);
+if (file.length() != sizePerFile) {
+  rafs[i].setLength(sizePerFile);
+}
 
 Review comment:
   Yes, some detailed comments here would be nice. I got why you have this check now. I can think of a case, though. Say we have a file-based cache with one file, sized 10 GB. Now a restart of the RS happens, and the cache is persisted too. Before the restart, the size is increased to 20 GB. There is no truncate, and ideally the cache gets rebuilt; the only change is that after the restart the cache capacity is larger. But as the code stands, the length is changed here, and so is the last-modified time, which will fail the verify phase. Is this something to be considered? I don't want much complex handling for this, and it might not be a common case for a persisted cache. The worst that happens is that we are not able to retrieve the persisted cache. But I welcome thoughts/suggestions.
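The scenario above hinges on the checksum covering file metadata: the quoted calculateChecksum digests the path, size, and last-modified time of each cache file, so any resize changes the digest. A self-contained sketch of that metadata digest, assuming MD5 and using illustrative names:

```java
import java.io.File;
import java.security.MessageDigest;
import java.util.Arrays;

public class MetadataChecksumSketch {
    // Digest over file metadata only (path + size + mtime), mirroring the
    // approach in the quoted calculateChecksum; names are illustrative.
    static byte[] checksum(File file, String algorithm) throws Exception {
        StringBuilder sb = new StringBuilder();
        sb.append(file.getPath());
        sb.append(file.length());
        sb.append(file.lastModified());
        MessageDigest md = MessageDigest.getInstance(algorithm);
        md.update(sb.toString().getBytes("UTF-8"));
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("bucket", ".cache");
        tmp.deleteOnExit();
        // An unchanged file yields a stable digest across calls; any change
        // to size or mtime (e.g. a resize at startup) would alter it.
        byte[] first = checksum(tmp, "MD5");
        byte[] second = checksum(tmp, "MD5");
        if (!Arrays.equals(first, second)) throw new AssertionError("digest not stable");
        System.out.println("ok");
    }
}
```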




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325766107
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -59,6 +65,30 @@ public FileIOEngine(long capacity, String... filePaths) throws IOException {
 this.fileChannels = new FileChannel[filePaths.length];
 this.rafs = new RandomAccessFile[filePaths.length];
 this.channelLocks = new ReentrantLock[filePaths.length];
+init();
+  }
+
+  @Override
+  public boolean verifyFileIntegrity(byte[] persistentChecksum, String algorithm)
+throws IOException {
+try {
+  byte[] calculateChecksum = calculateChecksum(algorithm);
+  if (!Bytes.equals(persistentChecksum, calculateChecksum)) {
+throw new IOException("The persistent checksum is " + Bytes.toString(persistentChecksum) +
+  ", but the calculate checksum is " + Bytes.toString(calculateChecksum));
+  }
+  return true;
+} catch (IOException ioex) {
+  LOG.error("File verification failed because of ", ioex);
+  reinit();
 
 Review comment:
   Why this reinit only in this case? In the case of a checksum mismatch, no such thing happens, no?




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325765169
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -59,6 +65,30 @@ public FileIOEngine(long capacity, String... filePaths) throws IOException {
 this.fileChannels = new FileChannel[filePaths.length];
 this.rafs = new RandomAccessFile[filePaths.length];
 this.channelLocks = new ReentrantLock[filePaths.length];
+init();
+  }
+
+  @Override
+  public boolean verifyFileIntegrity(byte[] persistentChecksum, String algorithm)
+throws IOException {
+try {
+  byte[] calculateChecksum = calculateChecksum(algorithm);
+  if (!Bytes.equals(persistentChecksum, calculateChecksum)) {
+throw new IOException("The persistent checksum is " + Bytes.toString(persistentChecksum) +
+  ", but the calculate checksum is " + Bytes.toString(calculateChecksum));
+  }
+  return true;
+} catch (IOException ioex) {
+  LOG.error("File verification failed because of ", ioex);
+  reinit();
+  return false;
+} catch (NoSuchAlgorithmException nsae) {
+  LOG.error("No such algorithm " + algorithm, nsae);
+  throw new RuntimeException(nsae);
 
 Review comment:
   What will happen if we throw an RTE here? Will the FileIOE creation itself fail, or even more? A failure to retrieve the persisted cache need not be a reason for the FileIOE itself to fail. But here we have configured this to be a persistent FileIOE, so later, when we try to persist during RS shutdown, we will get a failure anyway(?). Are you considering that here for this RTE? I am not really sure throwing an RTE is correct here. Thoughts?




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325763798
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -59,6 +65,30 @@ public FileIOEngine(long capacity, String... filePaths) throws IOException {
 this.fileChannels = new FileChannel[filePaths.length];
 this.rafs = new RandomAccessFile[filePaths.length];
 this.channelLocks = new ReentrantLock[filePaths.length];
+init();
+  }
+
+  @Override
+  public boolean verifyFileIntegrity(byte[] persistentChecksum, String algorithm)
+throws IOException {
+try {
+  byte[] calculateChecksum = calculateChecksum(algorithm);
+  if (!Bytes.equals(persistentChecksum, calculateChecksum)) {
+throw new IOException("The persistent checksum is " + Bytes.toString(persistentChecksum) +
 
 Review comment:
   This method responds in two different ways, for an IOE versus a mismatch of the checksum, while at the caller we basically check the boolean return value. Can we just log the issue here and return a boolean in either case?
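A hypothetical shape of that suggestion (illustrative names, not the final HBase API): log both failure modes and report them uniformly through the boolean return value, so the caller has only one condition to handle.

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class VerifySketch {
    // Sketch: a checksum mismatch is reported as `false` rather than as an
    // exception, matching the reviewer's suggestion of a single boolean result.
    static boolean verifyFileIntegrity(byte[] persistentChecksum, byte[] calculated) {
        if (!Arrays.equals(persistentChecksum, calculated)) {
            // In the real engine this would be a LOG.warn, not stderr.
            System.err.println("Persistent checksum does not match calculated checksum");
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        byte[] a = MessageDigest.getInstance("MD5").digest("cache".getBytes("UTF-8"));
        byte[] b = MessageDigest.getInstance("MD5").digest("cache".getBytes("UTF-8"));
        byte[] c = MessageDigest.getInstance("MD5").digest("other".getBytes("UTF-8"));
        if (!verifyFileIntegrity(a, b)) throw new AssertionError("expected match");
        if (verifyFileIntegrity(a, c)) throw new AssertionError("expected mismatch");
        System.out.println("ok");
    }
}
```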




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325762374
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
 ##
 @@ -288,14 +302,17 @@ public BucketCache(String ioEngineName, long capacity, int blockSize, int[] buck
 this.ramCache = new ConcurrentHashMap();
 
 this.backingMap = new ConcurrentHashMap((int) blockNumCapacity);
-
-if (ioEngine.isPersistent() && persistencePath != null) {
+if (ioEngine.isPersistent()) {
   try {
-retrieveFromFile(bucketSizes);
+if (persistencePath != null) {
+  retrieveFromFile(bucketSizes);
+} else {
+  ((PersistentIOEngine) ioEngine).deleteCacheDataFile();
 
 Review comment:
   If we do not delete the data file here, there won't be any functional difference, I believe. We were not doing it before; any reason to do so now? Or do we already do this delete in the master branch? I don't remember exactly.




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325760729
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
 ##
 @@ -1021,41 +1038,60 @@ void doDrain(final List entries) throws InterruptedException {
 
   private void persistToFile() throws IOException {
 assert !cacheEnabled;
-FileOutputStream fos = null;
-ObjectOutputStream oos = null;
-try {
+try (ObjectOutputStream oos = new ObjectOutputStream(
+  new FileOutputStream(persistencePath, false))){
   if (!ioEngine.isPersistent()) {
 throw new IOException("Attempt to persist non-persistent cache mappings!");
   }
-  fos = new FileOutputStream(persistencePath, false);
-  oos = new ObjectOutputStream(fos);
+  oos.write(ProtobufUtil.PB_MAGIC);
+  byte[] checksum = ((PersistentIOEngine) ioEngine).calculateChecksum(algorithm);
+  oos.writeInt(checksum.length);
+  oos.write(checksum);
   oos.writeLong(cacheCapacity);
   oos.writeUTF(ioEngine.getClass().getName());
   oos.writeUTF(backingMap.getClass().getName());
   oos.writeObject(deserialiserMap);
   oos.writeObject(backingMap);
-} finally {
-  if (oos != null) oos.close();
-  if (fos != null) fos.close();
+} catch (NoSuchAlgorithmException e) {
+  LOG.error("No such algorithm : " + algorithm + "! Failed to persist data on exit",e);
 }
   }
 
   @SuppressWarnings("unchecked")
-  private void retrieveFromFile(int[] bucketSizes) throws IOException, BucketAllocatorException,
+  private void retrieveFromFile(int[] bucketSizes) throws IOException,
   ClassNotFoundException {
 File persistenceFile = new File(persistencePath);
 if (!persistenceFile.exists()) {
   return;
 }
 assert !cacheEnabled;
-FileInputStream fis = null;
 ObjectInputStream ois = null;
 try {
   if (!ioEngine.isPersistent())
 throw new IOException(
 "Attempt to restore non-persistent cache mappings!");
-  fis = new FileInputStream(persistencePath);
-  ois = new ObjectInputStream(fis);
+  ois = new ObjectInputStream(new FileInputStream(persistencePath));
+  int pblen = ProtobufUtil.lengthOfPBMagic();
+  byte[] pbuf = new byte[pblen];
+  int read = ois.read(pbuf);
+  if (read != pblen) {
+throw new IOException("Incorrect number of bytes read while checking for protobuf magic "
+  + "number. Requested=" + pblen + ", Received= " + read + ", File=" + persistencePath);
+  }
+  if (Bytes.equals(ProtobufUtil.PB_MAGIC, pbuf)) {
+int length = ois.readInt();
+byte[] persistentChecksum = new byte[length];
+int readLen = ois.read(persistentChecksum);
+if (readLen != length || !((PersistentIOEngine) ioEngine).verifyFileIntegrity(
+persistentChecksum, algorithm)) {
+  return;
 
 Review comment:
   Can we add an INFO log here? Or do we already have a log reporting the verify failure in FileIOE?




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325501278
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -267,6 +339,75 @@ void refreshFileConnection(int accessFileNum, IOException ioe) throws IOExceptio
 }
   }
 
+  /**
+   * Delete bucketcache files
+   */
+  private void deleteCacheDataFile() {
+if (filePaths == null) {
+  return;
+}
+for (String file : filePaths) {
+  new File(file).delete();
+}
+  }
+
+  @Override
+  public byte[] calculateChecksum()
+throws IOException, NoSuchAlgorithmException {
+if (filePaths == null) {
+  return null;
+}
+StringBuilder sb = new StringBuilder();
+for (String filePath : filePaths){
+  File file = new File(filePath);
+  if (file.exists()){
+sb.append(filePath);
+sb.append(getFileSize(filePath));
+sb.append(file.lastModified());
+  } else {
+throw new IOException("Cache file: " + filePath + " is not exists.");
+  }
+}
+MessageDigest messageDigest = MessageDigest.getInstance(algorithmName);
+messageDigest.update(Bytes.toBytes(sb.toString()));
+return messageDigest.digest();
+  }
+
+  @Override
+  public boolean isOldVersion() {
 
 Review comment:
   I see now that this is being used. So my point was: read the checksum in BucketCache only, and let IT decide whether to do the checksum verification or not based on the PB magic.
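A sketch of that split, with BucketCache-side parsing of the header deciding between old and new formats. PB_MAGIC is assumed here to be the 4-byte "PBUF" marker from HBase's ProtobufUtil, and the method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class HeaderSketch {
    // Assumed magic bytes; HBase's ProtobufUtil.PB_MAGIC is believed to be "PBUF".
    private static final byte[] PB_MAGIC = {'P', 'B', 'U', 'F'};

    // Returns the embedded checksum when the stream starts with the magic,
    // or null for an old-format file that carries no checksum header.
    static byte[] readChecksumIfPresent(DataInputStream in) throws IOException {
        byte[] magic = new byte[PB_MAGIC.length];
        int read = in.read(magic);
        if (read != PB_MAGIC.length || !Arrays.equals(PB_MAGIC, magic)) {
            return null; // old format: the caller can skip verification entirely
        }
        int len = in.readInt();
        byte[] checksum = new byte[len];
        in.readFully(checksum); // readFully avoids short reads
        return checksum;
    }

    public static void main(String[] args) throws IOException {
        // New-format stream: magic, length prefix, checksum bytes.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.write(PB_MAGIC);
        byte[] sum = {1, 2, 3, 4};
        out.writeInt(sum.length);
        out.write(sum);
        byte[] got = readChecksumIfPresent(
            new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        if (!Arrays.equals(sum, got)) throw new AssertionError("checksum not recovered");
        // Old-format stream: no magic prefix, so no checksum is expected.
        byte[] old = readChecksumIfPresent(
            new DataInputStream(new ByteArrayInputStream(new byte[] {9, 9, 9, 9, 0})));
        if (old != null) throw new AssertionError("old format should yield null");
        System.out.println("ok");
    }
}
```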




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325500913
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -20,45 +20,107 @@
 
 import java.io.File;
 import java.io.IOException;
+import java.io.ObjectInputStream;
 import java.io.RandomAccessFile;
 import java.nio.ByteBuffer;
 import java.nio.channels.ClosedByInterruptException;
 import java.nio.channels.ClosedChannelException;
 import java.nio.channels.FileChannel;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
 import java.util.Arrays;
 import java.util.concurrent.locks.ReentrantLock;
 
 import com.google.common.annotations.VisibleForTesting;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.hbase.classification.InterfaceAudience;
+import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.util.Shell;
 import org.apache.hadoop.util.StringUtils;
 
 /**
  * IO engine that stores data to a file on the local file system.
  */
 @InterfaceAudience.Private
-public class FileIOEngine implements IOEngine {
+public class FileIOEngine implements PersistentIOEngine {
   private static final Log LOG = LogFactory.getLog(FileIOEngine.class);
   public static final String FILE_DELIMITER = ",";
+  private static final DuFileCommand DU = new DuFileCommand(new String[] {"du", ""});
+
   private final String[] filePaths;
   private final FileChannel[] fileChannels;
   private final RandomAccessFile[] rafs;
   private final ReentrantLock[] channelLocks;
 
   private final long sizePerFile;
   private final long capacity;
+  private final String algorithmName;
+  /**
+   * Whether the persistent file support verify file integrity, old version file
+   * does not support verification,
+   */
+  private boolean oldVersion;
 
   private FileReadAccessor readAccessor = new FileReadAccessor();
   private FileWriteAccessor writeAccessor = new FileWriteAccessor();
 
-  public FileIOEngine(long capacity, String... filePaths) throws IOException {
+  public FileIOEngine(String algorithmName, String persistentPath,
+long capacity, String... filePaths) throws IOException {
 this.sizePerFile = capacity / filePaths.length;
 this.capacity = this.sizePerFile * filePaths.length;
 this.filePaths = filePaths;
 this.fileChannels = new FileChannel[filePaths.length];
 this.rafs = new RandomAccessFile[filePaths.length];
 this.channelLocks = new ReentrantLock[filePaths.length];
+this.algorithmName = algorithmName;
+// not configure persistent path
+if (persistentPath == null) {
+  deleteCacheDataFile();
+}
+init();
+  }
+
+  /**
+   * Verify cache files's integrity
+   * @param persistentPath the backingMap persistent path
+   * @param ois the ObjectInputStream to read persistent file
+   * @return true if verify successfully
+   */
+  @Override
+  public boolean verifyFileIntegrity(String persistentPath, ObjectInputStream ois)
+throws IOException {
+try {
+  byte[] PBMagic = new byte[ProtobufUtil.PB_MAGIC.length];
+  ois.read(PBMagic);
+  if (Bytes.equals(ProtobufUtil.PB_MAGIC, PBMagic)) {
+int length = ois.readInt();
+byte[] persistentChecksum = new byte[length];
+ois.read(persistentChecksum);
+byte[] calculateChecksum = calculateChecksum();
+if (!Bytes.equals(persistentChecksum, calculateChecksum)) {
+  LOG.warn("The persistent checksum is " + Bytes.toString(persistentChecksum) +
+", but the calculate checksum is " + Bytes.toString(calculateChecksum));
+  throw new IOException();
+}
+return true;
+  } else {
+// if the persistent file is not start with PB_MAGIC, it's an old version file
+oldVersion = true;
 
 Review comment:
   Do we still need this oldVersion state here in FileIOE?




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325500780
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -20,45 +20,107 @@
 
 import java.io.File;
 import java.io.IOException;
+import java.io.ObjectInputStream;
 import java.io.RandomAccessFile;
 import java.nio.ByteBuffer;
 import java.nio.channels.ClosedByInterruptException;
 import java.nio.channels.ClosedChannelException;
 import java.nio.channels.FileChannel;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
 import java.util.Arrays;
 import java.util.concurrent.locks.ReentrantLock;
 
 import com.google.common.annotations.VisibleForTesting;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.hbase.classification.InterfaceAudience;
+import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.util.Shell;
 import org.apache.hadoop.util.StringUtils;
 
 /**
  * IO engine that stores data to a file on the local file system.
  */
 @InterfaceAudience.Private
-public class FileIOEngine implements IOEngine {
+public class FileIOEngine implements PersistentIOEngine {
   private static final Log LOG = LogFactory.getLog(FileIOEngine.class);
   public static final String FILE_DELIMITER = ",";
+  private static final DuFileCommand DU = new DuFileCommand(new String[] {"du", ""});
+
   private final String[] filePaths;
   private final FileChannel[] fileChannels;
   private final RandomAccessFile[] rafs;
   private final ReentrantLock[] channelLocks;
 
   private final long sizePerFile;
   private final long capacity;
+  private final String algorithmName;
+  /**
+   * Whether the persistent file support verify file integrity, old version file
+   * does not support verification,
+   */
+  private boolean oldVersion;
 
   private FileReadAccessor readAccessor = new FileReadAccessor();
   private FileWriteAccessor writeAccessor = new FileWriteAccessor();
 
-  public FileIOEngine(long capacity, String... filePaths) throws IOException {
+  public FileIOEngine(String algorithmName, String persistentPath,
+long capacity, String... filePaths) throws IOException {
 this.sizePerFile = capacity / filePaths.length;
 this.capacity = this.sizePerFile * filePaths.length;
 this.filePaths = filePaths;
 this.fileChannels = new FileChannel[filePaths.length];
 this.rafs = new RandomAccessFile[filePaths.length];
 this.channelLocks = new ReentrantLock[filePaths.length];
+this.algorithmName = algorithmName;
+// not configure persistent path
+if (persistentPath == null) {
+  deleteCacheDataFile();
+}
+init();
+  }
+
+  /**
+   * Verify cache files's integrity
+   * @param persistentPath the backingMap persistent path
+   * @param ois the ObjectInputStream to read persistent file
+   * @return true if verify successfully
+   */
+  @Override
+  public boolean verifyFileIntegrity(String persistentPath, ObjectInputStream ois)
 
 Review comment:
   Ah, can we pass only the stored checksum here? The write of the checksum was done by BucketCache, right? So expecting the FileIOE to know that layout looks a bit ugly; I mean, it is the PB magic, then an int for the checksum length, and so on.
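What that narrowed responsibility could look like, as a sketch under the assumption that the engine only ever sees the raw checksum (all names here are illustrative): BucketCache owns the on-disk layout (magic, length prefix), while the engine merely compares digests.

```java
import java.util.Arrays;

interface PersistentIOEngineSketch {
    // The engine computes its own digest over the cache files.
    byte[] calculateChecksum(String algorithm) throws Exception;

    // The engine receives only the raw stored checksum; it knows nothing
    // about PB magic bytes or length prefixes in the persistence file.
    default boolean verifyFileIntegrity(byte[] storedChecksum, String algorithm) {
        try {
            return Arrays.equals(storedChecksum, calculateChecksum(algorithm));
        } catch (Exception e) {
            return false; // any failure means the cache cannot be restored
        }
    }
}

public class InterfaceSketch implements PersistentIOEngineSketch {
    public byte[] calculateChecksum(String algorithm) {
        return new byte[] {42}; // stand-in for the real metadata digest
    }

    public static void main(String[] args) {
        InterfaceSketch engine = new InterfaceSketch();
        if (!engine.verifyFileIntegrity(new byte[] {42}, "MD5")) throw new AssertionError();
        if (engine.verifyFileIntegrity(new byte[] {7}, "MD5")) throw new AssertionError();
        System.out.println("ok");
    }
}
```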




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325500197
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
 ##
 @@ -359,24 +373,22 @@ public String getIoEngine() {
 
   /**
* Get the IOEngine from the IO engine name
-   * @param ioEngineName
-   * @param capacity
* @return the IOEngine
* @throws IOException
*/
-  private IOEngine getIOEngineFromName(String ioEngineName, long capacity)
+  private IOEngine getIOEngineFromName()
 
 Review comment:
   See the below comment also. Actually there is no need to have any change here now: we do not need to pass any extra args to the FileIOE constructor, and no need to even change the order of the call to getIOEngineFromName() above, right? Better to keep only the required change, IMHO.




[GitHub] [hbase] anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine

2019-09-18 Thread GitBox
anoopsjohn commented on a change in pull request #633: HBASE-22890 Verify the file integrity in persistent IOEngine
URL: https://github.com/apache/hbase/pull/633#discussion_r325499796
 
 

 ##
 File path: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/FileIOEngine.java
 ##
 @@ -20,45 +20,107 @@
 
 import java.io.File;
 import java.io.IOException;
+import java.io.ObjectInputStream;
 import java.io.RandomAccessFile;
 import java.nio.ByteBuffer;
 import java.nio.channels.ClosedByInterruptException;
 import java.nio.channels.ClosedChannelException;
 import java.nio.channels.FileChannel;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
 import java.util.Arrays;
 import java.util.concurrent.locks.ReentrantLock;
 
 import com.google.common.annotations.VisibleForTesting;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.hbase.classification.InterfaceAudience;
+import org.apache.hadoop.hbase.protobuf.ProtobufUtil;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.util.Shell;
 import org.apache.hadoop.util.StringUtils;
 
 /**
  * IO engine that stores data to a file on the local file system.
  */
 @InterfaceAudience.Private
-public class FileIOEngine implements IOEngine {
+public class FileIOEngine implements PersistentIOEngine {
   private static final Log LOG = LogFactory.getLog(FileIOEngine.class);
   public static final String FILE_DELIMITER = ",";
+  private static final DuFileCommand DU = new DuFileCommand(new String[] {"du", ""});
+
   private final String[] filePaths;
   private final FileChannel[] fileChannels;
   private final RandomAccessFile[] rafs;
   private final ReentrantLock[] channelLocks;
 
   private final long sizePerFile;
   private final long capacity;
+  private final String algorithmName;
+  /**
+   * Whether the persistent file support verify file integrity, old version file
+   * does not support verification,
+   */
+  private boolean oldVersion;
 
   private FileReadAccessor readAccessor = new FileReadAccessor();
   private FileWriteAccessor writeAccessor = new FileWriteAccessor();
 
-  public FileIOEngine(long capacity, String... filePaths) throws IOException {
+  public FileIOEngine(String algorithmName, String persistentPath,
+long capacity, String... filePaths) throws IOException {
 this.sizePerFile = capacity / filePaths.length;
 this.capacity = this.sizePerFile * filePaths.length;
 this.filePaths = filePaths;
 this.fileChannels = new FileChannel[filePaths.length];
 this.rafs = new RandomAccessFile[filePaths.length];
 this.channelLocks = new ReentrantLock[filePaths.length];
+this.algorithmName = algorithmName;
+// not configure persistent path
+if (persistentPath == null) {
 
 Review comment:
   This is something new we are doing now. Previously this delete step was not there when the files exist and no persistentPath has been configured.
   In fact, I am wondering why we need to pass the persistentPath and the algorithm here now. We call the verify method, and there we pass the persistentPath and could pass the algorithm as well.

