[jira] [Updated] (HIVE-16014) HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of hive.mv.files.thread for pool size

2017-03-01 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16014:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~vihangk1]. I committed to master.

> HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of 
> hive.mv.files.thread for pool size
> --
>
> Key: HIVE-16014
> URL: https://issues.apache.org/jira/browse/HIVE-16014
> Project: Hive
> Issue Type: Improvement
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-16014.01.patch, HIVE-16014.02.patch, 
> HIVE-16014.03.patch
>
>
> HiveMetastoreChecker uses the hive.mv.files.thread configuration value to 
> determine the pool size, as shown below:
> {noformat}
> private void checkPartitionDirs(Path basePath, Set<Path> allDirs, int maxDepth)
>     throws IOException, HiveException {
>   ConcurrentLinkedQueue<Path> basePaths = new ConcurrentLinkedQueue<>();
>   basePaths.add(basePath);
>   Set<Path> dirSet = Collections.newSetFromMap(new ConcurrentHashMap<Path, Boolean>());
>   // Here we just reuse the THREAD_COUNT configuration for
>   // HIVE_MOVE_FILES_THREAD_COUNT
>   int poolSize = conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 15);
>   // Check if too low config is provided for move files. 2x CPU is reasonable max count.
>   poolSize = poolSize == 0 ? poolSize : Math.max(poolSize,
>       Runtime.getRuntime().availableProcessors() * 2);
> {noformat}
> msck is commonly used to add partitions that exist on the filesystem but are 
> missing from the table. In that case, different pool sizes for HMSHandler and 
> HiveMetastoreChecker can affect performance. E.g., if 
> {{hive.metastore.fshandler.threads}} is set to a low value like 15 and 
> {{hive.mv.files.thread}} is much higher, like 100, or vice versa, the smaller 
> pool becomes the bottleneck. It would be good to use 
> {{hive.metastore.fshandler.threads}} to size the pool for 
> HiveMetastoreChecker, since the number of missing partitions and the number of 
> partitions to be added will most likely be the same. In that case, query 
> performance is best when both pool sizes are the same.
> Since both configs can be tuned individually, it is quite likely that they 
> will differ. But since there is a strong correlation between the amount of 
> work done by HiveMetastoreChecker and the HiveMetastore.add_partitions call, 
> it might be a good idea to use {{hive.metastore.fshandler.threads}} for the 
> pool size instead of {{hive.mv.files.thread}}.
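
A minimal sketch of the proposal, not the committed patch: the pool size is read from hive.metastore.fshandler.threads, and hive.mv.files.thread is ignored. To stay self-contained it uses a plain Map as a stand-in for HiveConf; the class name PoolSizeSketch and the method resolvePoolSize are hypothetical, and the default of 15 is simply carried over from the snippet quoted in the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolSizeSketch {
    // Property name taken from the issue text.
    static final String FS_HANDLER_THREADS = "hive.metastore.fshandler.threads";
    // Assumption: same fallback value as in the quoted snippet.
    static final int DEFAULT_THREADS = 15;

    /** Reads the pool size from a plain map that stands in for HiveConf. */
    static int resolvePoolSize(Map<String, String> conf) {
        String raw = conf.get(FS_HANDLER_THREADS);
        int poolSize = (raw == null) ? DEFAULT_THREADS : Integer.parseInt(raw.trim());
        if (poolSize < 0) {
            throw new IllegalArgumentException(FS_HANDLER_THREADS + " must not be negative");
        }
        return poolSize;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FS_HANDLER_THREADS, "15");
        conf.put("hive.mv.files.thread", "100"); // no longer consulted for this pool

        int poolSize = resolvePoolSize(conf);
        System.out.println("checker pool size = " + poolSize);

        // A fixed pool needs at least one thread, so only create it for a positive size.
        if (poolSize > 0) {
            ExecutorService pool = Executors.newFixedThreadPool(poolSize);
            pool.shutdown();
        }
    }
}
```

With both stages sized from the same property, neither pool is smaller than the other, which is the bottleneck the description is concerned about.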



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16014) HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of hive.mv.files.thread for pool size

2017-02-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16014:
---
Attachment: HIVE-16014.03.patch

Attaching the patch after rebasing onto the latest code in the master branch 
and resolving conflicts.



[jira] [Updated] (HIVE-16014) HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of hive.mv.files.thread for pool size

2017-02-23 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16014:
---
Attachment: HIVE-16014.02.patch



[jira] [Updated] (HIVE-16014) HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of hive.mv.files.thread for pool size

2017-02-23 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16014:
---
Status: Patch Available  (was: Open)

Hi [~spena], can you please review? It's a simple patch to use a different 
config for sizing the thread pool.



[jira] [Updated] (HIVE-16014) HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of hive.mv.files.thread for pool size

2017-02-23 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16014:
---
Attachment: HIVE-16014.01.patch
