[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-09 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312596657
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.OmMetadataManagerImpl;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.TypedTable;
+import org.junit.Test;
+
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+
+import static org.apache.hadoop.ozone.recon.tasks.
+OMDBUpdateEvent.OMDBUpdateAction.PUT;
+import static org.junit.Assert.assertEquals;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.BDDMockito.given;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.times;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+
+/**
+ * Unit test for Container Key mapper task.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(OmKeyInfo.class)
+
+public class TestFileSizeCountTask {
+  @Test
+  public void testCalculateBinIndex() {
+FileSizeCountTask fileSizeCountTask = mock(FileSizeCountTask.class);
+
+when(fileSizeCountTask.getMaxFileSizeUpperBound()).
+thenReturn(1125899906842624L);// 1 PB
+when(fileSizeCountTask.getOneKB()).thenReturn(1024L);
+when(fileSizeCountTask.getMaxBinSize()).thenReturn(42);
+when(fileSizeCountTask.calculateBinIndex(anyLong())).thenCallRealMethod();
+when(fileSizeCountTask.nextClosestPowerIndexOfTwo(
+anyLong())).thenCallRealMethod();
+
+long fileSize = 1024L;// 1 KB
+int binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(1, binIndex);
+
+fileSize = 1023L;// 1KB - 1B
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(0, binIndex);
+
+fileSize = 562949953421312L;  // 512 TB
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(40, binIndex);
+
+fileSize = 562949953421313L;  // (512 TB + 1B)
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(40, binIndex);
+
+fileSize = 562949953421311L;  // (512 TB - 1B)
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(39, binIndex);
+
+fileSize = 1125899906842624L;  // 1 PB - last (extra) bin
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(41, binIndex);
+
+fileSize = 10L;
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(7, binIndex);
+
+fileSize = 1125899906842623L;  // (1 PB - 1B)
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(40, binIndex);
+
+fileSize = 1125899906842624L * 4;  // 4 PB - last extra bin
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(41, binIndex);
+
+fileSize = Long.MAX_VALUE;// extra bin
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(41, binIndex);
+  }
+
+  @Test
+  public void testFileCountBySizeReprocess() throws IOException {
 
 Review comment:
   We will need another test to cover process method and `DELETE` event. But, I 
am ok with adding it in another JIRA. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-09 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r312595826
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.OmMetadataManagerImpl;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.TypedTable;
+import org.junit.Test;
+
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+
+import static org.apache.hadoop.ozone.recon.tasks.
+OMDBUpdateEvent.OMDBUpdateAction.PUT;
+import static org.junit.Assert.assertEquals;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.BDDMockito.given;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.times;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+
+/**
+ * Unit test for Container Key mapper task.
 
 Review comment:
   nit: change this to File Size Count task


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-07 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311801586
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -70,39 +77,51 @@ public void testGetFileCounts() throws IOException {
 verify(utilizationService, times(1)).getFileCounts();
 verify(fileCountBySizeDao, times(1)).findAll();
 
-assertEquals(41, resultList.size());
-long fileSize = 4096L;
+assertEquals(maxBinSize, resultList.size());
+long fileSize = 4096L;  // 4KB
 int index =  findIndex(fileSize);
 long count = resultList.get(index).getCount();
 assertEquals(index, count);
 
-fileSize = 1125899906842624L;
+fileSize = 1125899906842624L;   // 1PB
 index = findIndex(fileSize);
-if (index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size");
-}
+count = resultList.get(index).getCount();
+assertEquals(maxBinSize - 1, index);
+assertEquals(index, count);
 
-fileSize = 1025L;
+fileSize = 1025L;   // 1 KB + 1B
 index = findIndex(fileSize);
-count = resultList.get(index).getCount();
+count = resultList.get(index).getCount(); //last extra bin for files >= 1PB
 assertEquals(index, count);
 
 fileSize = 25L;
 index = findIndex(fileSize);
 count = resultList.get(index).getCount();
 assertEquals(index, count);
+
+fileSize = 1125899906842623L;   // 1PB - 1B
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 1125899906842624L * 4;   // 4 PB
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(maxBinSize - 1, index);
+assertEquals(index, count);
   }
 
   public int findIndex(long dataSize) {
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if (logValue < 10) {
-  return 0;
-} else {
-  int index = logValue - 10;
-  if (index > maxBinSize) {
-return Integer.MIN_VALUE;
-  }
-  return (dataSize % oneKb == 0) ? index + 1 : index;
+if (dataSize > Math.pow(2, (maxBinSize + 10 - 2))) {  // 1 PB = 2 ^ 50
+  return maxBinSize - 1;
+}
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 
 Review comment:
   This makes the unit test void. If we have the same logic used in the actual 
methods here, then the unit tests are always going to assert to true. We should 
use constant values to test against the actual methods.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-07 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311800232
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -98,7 +99,9 @@ protected int getMaxBinSize() {
 keyIter = omKeyInfoTable.iterator()) {
   while (keyIter.hasNext()) {
 Table.KeyValue kv = keyIter.next();
-countFileSize(kv.getValue());
+
+// reprocess() is a PUT operation on the DB.
+updateUpperBoundCount(kv.getValue(), "PUT");
 
 Review comment:
   nit: update this with enum `Operation.PUT` in the future. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-07 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311799843
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -78,7 +78,8 @@ protected long getMaxFileSizeUpperBound() {
   protected int getMaxBinSize() {
 if (maxBinSize == -1) {
   // extra bin to add files > 1PB.
-  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+  // 1 KB (2 ^ 10) is the smallest tracked file.
+  maxBinSize = nextClosetPowerIndexOfTwo(maxFileSizeUpperBound) - 10 + 1;
 
 Review comment:
   nit: typo in `nextClosestPowerIndexOfTwo`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-07 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311801756
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -116,13 +126,13 @@ public void testFileCountBySizeReprocess() throws 
IOException {
 when(fileSizeCountTask.getMaxFileSizeUpperBound()).
 thenReturn(4096L);
 when(fileSizeCountTask.getOneKB()).thenReturn(1024L);
-when(fileSizeCountTask.getMaxBinSize()).thenReturn(3);
+//when(fileSizeCountTask.getMaxBinSize()).thenReturn(3);
 
 Review comment:
   This line should be removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311372538
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Mock;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+
+/**
+ * Test for File size count service.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService {
+  private UtilizationService utilizationService;
+  @Mock private FileCountBySizeDao fileCountBySizeDao;
+  private List resultList = new ArrayList<>();
+  private int oneKb = 1024;
+  private int maxBinSize = 41;
+
+  public void setUpResultList() {
+for(int i = 0; i < 41; i++){
+  resultList.add(new FileCountBySize((long) Math.pow(2, (10+i)), (long) 
i));
+}
+  }
+
+  @Test
+  public void testGetFileCounts() throws IOException {
+setUpResultList();
+
+utilizationService = mock(UtilizationService.class);
+when(utilizationService.getFileCounts()).thenCallRealMethod();
+when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
+when(fileCountBySizeDao.findAll()).thenReturn(resultList);
+
+utilizationService.getFileCounts();
+verify(utilizationService, times(1)).getFileCounts();
+verify(fileCountBySizeDao, times(1)).findAll();
+
+assertEquals(41, resultList.size());
+long fileSize = 4096L;
+int index =  findIndex(fileSize);
+long count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 1125899906842624L;
+index = findIndex(fileSize);
+if (index == Integer.MIN_VALUE) {
+  throw new IOException("File Size larger than permissible file size");
+}
+
+fileSize = 1025L;
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 25L;
+index = findIndex(fileSize);
+count = resultList.get(index).getCount();
+assertEquals(index, count);
+  }
+
+  public int findIndex(long dataSize) {
+int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
+if (logValue < 10) {
+  return 0;
+} else {
+  int index = logValue - 10;
+  if (index > maxBinSize) {
+return Integer.MIN_VALUE;
 
 Review comment:
   This needs to be updated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311368897
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 2Kb..,4MB,.., 1TB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = -1;
+  private long maxFileSizeUpperBound = 1125899906842624L; // 1 PB
+  private long[] upperBoundCount;
+  private long oneKb = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return oneKb;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
+if (maxBinSize == -1) {
+  // extra bin to add files > 1PB.
+  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+}
+return maxBinSize;
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while (keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+}
+populateFileCountBySizeDB();
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  void updateCountFromDB() {
+// Read - Write operations to DB are in ascending order
+// of file size upper bounds.
+List resultSet = fileCountBySizeDao.findAll();
+int index = 0;
+if (resultSet != null) {
+  for (FileCountBySize row : resultSet) {
+upperBoundCount[index] = row.getCount();
+index++;
+  }
+}
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+ 

[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311371119
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 2Kb..,4MB,.., 1TB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = -1;
+  private long maxFileSizeUpperBound = 1125899906842624L; // 1 PB
+  private long[] upperBoundCount;
+  private long oneKb = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return oneKb;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
+if (maxBinSize == -1) {
+  // extra bin to add files > 1PB.
+  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+}
+return maxBinSize;
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while (keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+}
+populateFileCountBySizeDB();
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  void updateCountFromDB() {
+// Read - Write operations to DB are in ascending order
+// of file size upper bounds.
+List resultSet = fileCountBySizeDao.findAll();
+int index = 0;
+if (resultSet != null) {
+  for (FileCountBySize row : resultSet) {
+upperBoundCount[index] = row.getCount();
+index++;
+  }
+}
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+ 

[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311368484
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 2Kb..,4MB,.., 1TB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = -1;
+  private long maxFileSizeUpperBound = 1125899906842624L; // 1 PB
+  private long[] upperBoundCount;
+  private long oneKb = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return oneKb;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
+if (maxBinSize == -1) {
+  // extra bin to add files > 1PB.
+  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+}
+return maxBinSize;
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while (keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+}
+populateFileCountBySizeDB();
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  void updateCountFromDB() {
+// Read - Write operations to DB are in ascending order
+// of file size upper bounds.
+List resultSet = fileCountBySizeDao.findAll();
+int index = 0;
+if (resultSet != null) {
+  for (FileCountBySize row : resultSet) {
+upperBoundCount[index] = row.getCount();
+index++;
+  }
+}
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+ 

[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311366032
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon-codegen/src/main/java/org/hadoop/ozone/recon/schema/UtilizationSchemaDefinition.java
 ##
 @@ -65,5 +69,12 @@ void createClusterGrowthTable(Connection conn) {
 .execute();
   }
 
-
+  void createFileSizeCount(Connection conn) {
+DSL.using(conn).createTableIfNotExists(FILE_COUNT_BY_SIZE_TABLE_NAME)
+.column("file_size_kb", SQLDataType.BIGINT)
 
 Review comment:
   Aren't we storing file size in bytes? Can we change this to just file_size?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311369498
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 2Kb..,4MB,.., 1TB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = -1;
+  private long maxFileSizeUpperBound = 1125899906842624L; // 1 PB
+  private long[] upperBoundCount;
+  private long oneKb = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return oneKb;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
+if (maxBinSize == -1) {
+  // extra bin to add files > 1PB.
+  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+}
+return maxBinSize;
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while (keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+}
+populateFileCountBySizeDB();
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  void updateCountFromDB() {
+// Read - Write operations to DB are in ascending order
+// of file size upper bounds.
+List resultSet = fileCountBySizeDao.findAll();
+int index = 0;
+if (resultSet != null) {
+  for (FileCountBySize row : resultSet) {
+upperBoundCount[index] = row.getCount();
+index++;
+  }
+}
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+ 

[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311371872
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Mock;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+
+/**
+ * Test for File size count service.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService {
+  private UtilizationService utilizationService;
+  @Mock private FileCountBySizeDao fileCountBySizeDao;
+  private List resultList = new ArrayList<>();
+  private int oneKb = 1024;
+  private int maxBinSize = 41;
+
+  public void setUpResultList() {
+for(int i = 0; i < 41; i++){
+  resultList.add(new FileCountBySize((long) Math.pow(2, (10+i)), (long) 
i));
+}
+  }
+
+  @Test
+  public void testGetFileCounts() throws IOException {
+setUpResultList();
+
+utilizationService = mock(UtilizationService.class);
+when(utilizationService.getFileCounts()).thenCallRealMethod();
+when(utilizationService.getDao()).thenReturn(fileCountBySizeDao);
+when(fileCountBySizeDao.findAll()).thenReturn(resultList);
+
+utilizationService.getFileCounts();
+verify(utilizationService, times(1)).getFileCounts();
+verify(fileCountBySizeDao, times(1)).findAll();
+
+assertEquals(41, resultList.size());
+long fileSize = 4096L;
+int index =  findIndex(fileSize);
+long count = resultList.get(index).getCount();
+assertEquals(index, count);
+
+fileSize = 1125899906842624L;
+index = findIndex(fileSize);
+if (index == Integer.MIN_VALUE) {
 
 Review comment:
   This is not required


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311374276
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestFileSizeCountTask.java
 ##
 @@ -0,0 +1,129 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.OmMetadataManagerImpl;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.TypedTable;
+import org.junit.Test;
+
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.io.IOException;
+
+import static org.junit.Assert.assertEquals;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.BDDMockito.given;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.times;
+import static org.powermock.api.mockito.PowerMockito.mock;
+import static org.powermock.api.mockito.PowerMockito.when;
+
+/**
+ * Unit test for Container Key mapper task.
+ */
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(OmKeyInfo.class)
+
+public class TestFileSizeCountTask {
+  @Test
+  public void testCalculateBinIndex() {
+FileSizeCountTask fileSizeCountTask = mock(FileSizeCountTask.class);
+
+when(fileSizeCountTask.getMaxFileSizeUpperBound()).
+thenReturn(1125899906842624L);// 1 PB
+when(fileSizeCountTask.getOneKB()).thenReturn(1024L);
+when(fileSizeCountTask.getMaxBinSize()).thenReturn(42);
+when(fileSizeCountTask.calculateBinIndex(anyLong())).thenCallRealMethod();
+
+long fileSize = 1024L;// 1 KB
+int binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(1, binIndex);
+
+fileSize = 1023L;
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(0, binIndex);
+
+fileSize = 562949953421312L;  // 512 TB
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(40, binIndex);
+
+fileSize = 562949953421313L;  // (512 TB + 1B)
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(40, binIndex);
+
+fileSize = 562949953421311L;  // (512 TB - 1B)
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(39, binIndex);
+
+fileSize = 1125899906842624L;  // 1 PB - last (extra) bin
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(41, binIndex);
+
+fileSize = 10L;
+binIndex = fileSizeCountTask.calculateBinIndex(fileSize);
+assertEquals(7, binIndex);
+
+fileSize = 1125899906842623L;
 
 Review comment:
   I suppose this is 1 PB - 1B. Can you add a comment for this one and the 
previous one as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311369007
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,241 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 2Kb..,4MB,.., 1TB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = -1;
+  private long maxFileSizeUpperBound = 1125899906842624L; // 1 PB
+  private long[] upperBoundCount;
+  private long oneKb = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return oneKb;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
+if (maxBinSize == -1) {
+  // extra bin to add files > 1PB.
+  maxBinSize = calculateBinIndex(maxFileSizeUpperBound) + 1;
+}
+return maxBinSize;
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while (keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+}
+populateFileCountBySizeDB();
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  void updateCountFromDB() {
+// Read - Write operations to DB are in ascending order
+// of file size upper bounds.
+List resultSet = fileCountBySizeDao.findAll();
+int index = 0;
+if (resultSet != null) {
+  for (FileCountBySize row : resultSet) {
+upperBoundCount[index] = row.getCount();
+index++;
+  }
+}
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+ 

[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311309493
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both 
would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 
 Review comment:
   Yes, it should be `LONG.MAX_VALUE`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311293167
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -65,6 +64,23 @@ public FileSizeCountTask(OMMetadataManager 
omMetadataManager,
 } catch (Exception e) {
   LOG.error("Unable to fetch Key Table updates ", e);
 }
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return ONE_KB;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
 
 Review comment:
   Can we change this method to take `fileSize` as an argument? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311304029
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both 
would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 
 Review comment:
   From the logic to calculate bins, the last bin covers the total count of all 
the files > maxFileSizeUpperBound. But, while writing to DB, the last bin's key 
is written as `maxFileSizeUpperBound^2`. In this case, the last bin upper bound 
will be written as 2PB which is wrong. Can we change this logic to have last 
bin as `Integer.MAX_VALUE`? And can you verify this in the unit test for API 
response as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311301854
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both 
would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
 
 Review comment:
   Can you add a comment as to why we need to subtract this value by 10?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311301668
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -155,70 +164,70 @@ private void fetchUpperBoundCount(String type) {
 LOG.error("Unexpected exception while updating key data : {} {}",
 updatedKey, e.getMessage());
 return new ImmutablePair<>(getTaskName(), false);
-  } finally {
-populateFileCountBySizeDB();
   }
+  populateFileCountBySizeDB();
 }
 LOG.info("Completed a 'process' run of FileSizeCountTask.");
 return new ImmutablePair<>(getTaskName(), true);
   }
 
   /**
* Calculate the bin index based on size of the Key.
+   * index is calculated as the number of right shifts
+   * needed until dataSize becomes zero.
*
* @param dataSize Size of the key.
* @return int bin index in upperBoundCount
*/
-  private int calcBinIndex(long dataSize) {
-if(dataSize >= maxFileSizeUpperBound) {
-  return Integer.MIN_VALUE;
-} else if (dataSize > SIZE_512_TB) {
-  //given the small difference in 512TB and 512TB + 1B, index for both 
would
-  //return same, to differentiate specific condition added.
-  return maxBinSize - 1;
-}
-int logValue = (int) Math.ceil(Math.log(dataSize)/Math.log(2));
-if(logValue < 10){
-  return 0;
-} else{
-  return (dataSize % ONE_KB == 0) ? logValue - 10 + 1: logValue - 10;
+  int calculateBinIndex(long dataSize) {
+int index = 0;
+while(dataSize != 0) {
+  dataSize >>= 1;
+  index += 1;
 }
+return index < 10 ? 0 : index - 10;
   }
 
-  private void countFileSize(OmKeyInfo omKeyInfo) throws IOException{
-int index = calcBinIndex(omKeyInfo.getDataSize());
-if(index == Integer.MIN_VALUE) {
-  throw new IOException("File Size larger than permissible file size "
-  + maxFileSizeUpperBound +" bytes");
+  void countFileSize(OmKeyInfo omKeyInfo) {
+int index;
+if (omKeyInfo.getDataSize() >= maxFileSizeUpperBound) {
+  index = maxBinSize - 1;
+} else {
+  index = calculateBinIndex(omKeyInfo.getDataSize());
 }
 upperBoundCount[index]++;
   }
 
-  private void populateFileCountBySizeDB() {
+  /**
+   * Populate DB with the counts of file sizes calculated
+   * using the dao.
+   *
+   */
+  void populateFileCountBySizeDB() {
 for (int i = 0; i < upperBoundCount.length; i++) {
   long fileSizeUpperBound = (long) Math.pow(2, (10 + i));
   FileCountBySize fileCountRecord =
   fileCountBySizeDao.findById(fileSizeUpperBound);
   FileCountBySize newRecord = new
   FileCountBySize(fileSizeUpperBound, upperBoundCount[i]);
-  if(fileCountRecord == null){
+  if (fileCountRecord == null) {
 fileCountBySizeDao.insert(newRecord);
-  } else{
+  } else {
 fileCountBySizeDao.update(newRecord);
   }
 }
   }
 
   private void updateUpperBoundCount(OmKeyInfo value, String operation)
   throws IOException {
-int binIndex = calcBinIndex(value.getDataSize());
-if(binIndex == Integer.MIN_VALUE) {
+int binIndex = calculateBinIndex(value.getDataSize());
+if (binIndex == Integer.MIN_VALUE) {
 
 Review comment:
   This is not required as `calculateBinIndex` will never return 
`Integer.MIN_VALUE`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-08-06 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r311293167
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -65,6 +64,23 @@ public FileSizeCountTask(OMMetadataManager 
omMetadataManager,
 } catch (Exception e) {
   LOG.error("Unable to fetch Key Table updates ", e);
 }
+upperBoundCount = new long[getMaxBinSize()];
+  }
+
+  protected long getOneKB() {
+return ONE_KB;
+  }
+
+  protected long getMaxFileSizeUpperBound() {
+return maxFileSizeUpperBound;
+  }
+
+  protected int getMaxBinSize() {
 
 Review comment:
   Can we make this method to take `fileSize` as an argument? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-07-31 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r309410908
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,231 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
 
 Review comment:
   Can we change these ranges to (2KB, 4KB, 8KB, 16KB, ...1PB) ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-07-31 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r309408238
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,231 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private int maxBinSize = 41;
+  private long maxFileSizeUpperBound = 1125899906842624L;
+  private long SIZE_512_TB = 562949953421312L;
+  private long[] upperBoundCount = new long[maxBinSize];
+  private long ONE_KB = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager,
+  Configuration sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+fetchUpperBoundCount("reprocess");
+
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+
+} catch (IOException ioEx) {
+  LOG.error("Unable to populate File Size Count in Recon DB. ", ioEx);
+  return new ImmutablePair<>(getTaskName(), false);
+} finally {
+  populateFileCountBySizeDB();
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  private void fetchUpperBoundCount(String type) {
+if(type.equals("process")) {
+  List resultSet = fileCountBySizeDao.findAll();
+  int index = 0;
+  if(resultSet != null) {
+for (FileCountBySize row : resultSet) {
+  upperBoundCount[index] = row.getCount();
+  index++;
+}
+  }
+} else {
+  upperBoundCount = new long[maxBinSize];
+}
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+LOG.info("Starting a 'process' run of FileSizeCountTask.");
+Iterator eventIterator = events.getIterator();
+
+fetchUpperBoundCount("process");
+
+while (eventIterator.hasNext()) {
+  OMDBUpdateEvent omdbUpdateEvent = 

[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-07-23 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306565085
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,177 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.ozone.recon.tasks.FileSizeCountTask;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.ws.rs.core.Response;
+import java.util.*;
+
+import static org.junit.Assert.assertEquals;
+
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService extends AbstractOMMetadataManagerTest {
+
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  private Injector injector;
+  private OzoneManagerServiceProviderImpl ozoneManagerServiceProvider;  
//should we use interface?
+  private OMMetadataManager omMetadataManager;
+  private GuiceInjectorUtilsForTestsImpl guiceInjectorTest =
+  new GuiceInjectorUtilsForTestsImpl();
+  private boolean isSetupDone = false;
+  private UtilizationService utilizationService;
+
+  private Injector getInjector() {
+return injector;
+  }
+  private Configuration sqlConfiguration;
+
+  private void initializeInjector() throws Exception {
+omMetadataManager = initializeNewOmMetadataManager();
+OzoneConfiguration configuration =
+guiceInjectorTest.getTestOzoneConfiguration(temporaryFolder);
+
+ozoneManagerServiceProvider = new OzoneManagerServiceProviderImpl(
+configuration);
+ReconOMMetadataManager reconOMMetadataManager =
+getTestMetadataManager(omMetadataManager);
+
+Injector parentInjector = guiceInjectorTest.getInjector(
+ozoneManagerServiceProvider, reconOMMetadataManager, temporaryFolder);
+
+injector = parentInjector.createChildInjector(new AbstractModule() {
+  @Override
+  protected void configure() {
+utilizationService = new UtilizationService();
+bind(UtilizationService.class).toInstance(utilizationService);
+  }
+});
+  }
+
+  @Before
+  public void setUp() throws Exception {
+
+if (!isSetupDone) {
 
 Review comment:
   This is because power mock runner does not apply JUnit class rules. 
https://github.com/apache/hadoop/pull/1055#discussion_r301358174 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-07-23 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306564640
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/api/ContainerKeyService.java
 ##
 @@ -38,6 +37,7 @@
 import javax.ws.rs.core.MediaType;
 import javax.ws.rs.core.Response;
 
+import javax.inject.Inject;
 
 Review comment:
   We use javax inject for servlets and bridge them with guice bindings here - 
https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/ReconRestServletModule.java#L110


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-07-23 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306563171
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestUtilizationService.java
 ##
 @@ -0,0 +1,177 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import com.google.inject.AbstractModule;
+import com.google.inject.Injector;
+import org.apache.hadoop.hdds.client.BlockID;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup;
+import org.apache.hadoop.ozone.recon.AbstractOMMetadataManagerTest;
+import org.apache.hadoop.ozone.recon.GuiceInjectorUtilsForTestsImpl;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.apache.hadoop.ozone.recon.tasks.FileSizeCountTask;
+import org.hadoop.ozone.recon.schema.UtilizationSchemaDefinition;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import javax.ws.rs.core.Response;
+import java.util.*;
+
+import static org.junit.Assert.assertEquals;
+
+@RunWith(PowerMockRunner.class)
+@PowerMockIgnore({"javax.management.*", "javax.net.ssl.*"})
+@PrepareForTest(ReconUtils.class)
+public class TestUtilizationService extends AbstractOMMetadataManagerTest {
+
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  private Injector injector;
+  private OzoneManagerServiceProviderImpl ozoneManagerServiceProvider;  
//should we use interface?
 
 Review comment:
   Do we need this comment?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-07-23 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306562822
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager, Configuration 
sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+  populateFileCountBySizeDB();
+
+} catch (IOException ioEx) {
+LOG.error("Unable to populate Container Key Prefix data in Recon DB. ", 
ioEx);
+return new ImmutablePair<>(getTaskName(), false);
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+LOG.info("Starting a 'process' run of FileSizeCountTask.");
+Iterator eventIterator = events.getIterator();
+while (eventIterator.hasNext()) {
+  OMDBUpdateEvent omdbUpdateEvent = 
eventIterator.next();
+  String updatedKey = omdbUpdateEvent.getKey();
+  OmKeyInfo updatedValue = omdbUpdateEvent.getValue();
+
+  try{
+switch (omdbUpdateEvent.getAction()) {
+  case PUT:
+updateCountForKey(updatedValue, "PUT");
+break;
+
+  case DELETE:
+updateCountForKey(updatedValue, "DELETE");
+break;
+
+  default: LOG.debug("Skipping DB update event : " + omdbUpdateEvent
+  .getAction());
+}
+
+  } catch (IOException e) {
+LOG.error("Unexpected exception while updating key data : {} ",
+updatedKey, e);
+return 

[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-07-23 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306562245
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager, Configuration 
sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+  populateFileCountBySizeDB();
+
+} catch (IOException ioEx) {
+LOG.error("Unable to populate Container Key Prefix data in Recon DB. ", 
ioEx);
+return new ImmutablePair<>(getTaskName(), false);
+}
+
+LOG.info("Completed a 'reprocess' run of FileSizeCountTask.");
+return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  protected Collection getTaskTables() {
+return tables;
+  }
+
+  /**
+   * Read the Keys from update events and update the count of files
+   * pertaining to a certain upper bound.
+   *
+   * @param events Update events - PUT/DELETE.
+   * @return Pair
+   */
+  @Override
+  Pair process(OMUpdateEventBatch events) {
+LOG.info("Starting a 'process' run of FileSizeCountTask.");
+Iterator eventIterator = events.getIterator();
+while (eventIterator.hasNext()) {
+  OMDBUpdateEvent omdbUpdateEvent = 
eventIterator.next();
+  String updatedKey = omdbUpdateEvent.getKey();
+  OmKeyInfo updatedValue = omdbUpdateEvent.getValue();
+
+  try{
+switch (omdbUpdateEvent.getAction()) {
+  case PUT:
+updateCountForKey(updatedValue, "PUT");
 
 Review comment:
   Let's not call the `updateCountForKey` method within the iteration. Let's 
keep a local counter array similar to upperBoundCount and increment or 
decrement it based on `PUT` or `DELETE`. After iteration, we can then call 
`updateCountForKey` with the local counter array. This will limit the number of 
database `SELECT` and `UPDATE` operations to just one time.


[GitHub] [hadoop] vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add ability in Recon to track the number of small files in an Ozone Cluster

2019-07-23 Thread GitBox
vivekratnavel commented on a change in pull request #1146: HDDS-1366. Add 
ability in Recon to track the number of small files in an Ozone Cluster
URL: https://github.com/apache/hadoop/pull/1146#discussion_r306501528
 
 

 ##
 File path: 
hadoop-ozone/ozone-recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/FileSizeCountTask.java
 ##
 @@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import com.google.inject.Inject;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.utils.db.Table;
+import org.apache.hadoop.utils.db.TableIterator;
+import org.hadoop.ozone.recon.schema.tables.daos.FileCountBySizeDao;
+import org.hadoop.ozone.recon.schema.tables.pojos.FileCountBySize;
+import org.jooq.Configuration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Iterator;
+
+/**
+ * Class to iterate over the OM DB and store the counts of existing/new
+ * files binned into ranges (1KB, 10Kb..,10MB,..1PB) to the Recon
+ * fileSize DB.
+ */
+public class FileSizeCountTask extends ReconDBUpdateTask {
+  private static final Logger LOG =
+  LoggerFactory.getLogger(FileSizeCountTask.class);
+
+  private long[] upperBoundCount = new long[16];
+  private long minFileSize = 1024L;
+  private Collection tables = new ArrayList<>();
+  private FileCountBySizeDao fileCountBySizeDao;
+
+  @Inject
+  public FileSizeCountTask(OMMetadataManager omMetadataManager, Configuration 
sqlConfiguration) {
+super("FileSizeCountTask");
+try {
+  tables.add(omMetadataManager.getKeyTable().getName());
+  fileCountBySizeDao = new FileCountBySizeDao(sqlConfiguration);
+} catch (Exception e) {
+  LOG.error("Unable to fetch Key Table updates ", e);
+}
+  }
+
+  /**
+   * Read the Keys from OM snapshot DB and calculate the upper bound of
+   * File Size it belongs to.
+   *
+   * @param omMetadataManager OM Metadata instance.
+   * @return Pair
+   */
+  @Override
+  public Pair reprocess(OMMetadataManager omMetadataManager) {
+LOG.info("Starting a 'reprocess' run of FileSizeCountTask.");
+
+Table omKeyInfoTable = omMetadataManager.getKeyTable();
+try (TableIterator>
+keyIter = omKeyInfoTable.iterator()) {
+  while(keyIter.hasNext()) {
+Table.KeyValue kv = keyIter.next();
+countFileSize(kv.getValue());
+  }
+  populateFileCountBySizeDB();
+
+} catch (IOException ioEx) {
+LOG.error("Unable to populate Container Key Prefix data in Recon DB. ", 
ioEx);
 
 Review comment:
   Can we log a better error message here? We are only trying to populate file 
counts by size and not container key prefix here. Also, the indentation seems 
to be off here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org