[jira] Updated: (HIVE-2026) Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2026:
-

Attachment: HIVE-2026.patch

running tests and will update a review board. 

 Parallelize UpdateInputAccessTimeHook
 -

 Key: HIVE-2026
 URL: https://issues.apache.org/jira/browse/HIVE-2026
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2026.patch


 UpdateInputAccessTimeHook is usually used as a pre-execution hook to update 
 the metastore's lastAccessTime field of the input partition/table. If a query 
 touches a large number of partitions, this hook takes a long time to 
 execute. One approach is to make the hook itself run in a separate thread, 
 but it is hard to guarantee backward-compatible semantics when exceptions 
 are encountered during hook execution. This task takes another 
 approach: parallelize the work inside the hook (update multiple partitions 
 concurrently), but still execute the pre-hooks in sequential order. 
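
The ordering described above can be sketched roughly as follows. This is only an illustration, not the code in HIVE-2026.patch: the class name, the String partition names and the Consumer callback are stand-ins invented for the example. Each call to runHook fans its updates out to a pool and returns only after all of them have finished, so a caller that invokes hooks one after another keeps the sequential semantics, and a failed update still surfaces in the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: names and types are assumptions, not Hive APIs.
final class SequentialHooksSketch {

  /** Runs one hook: fans out per-partition updates, then joins before returning. */
  static void runHook(List<String> partitions, int degree, Consumer<String> update) {
    if (partitions.isEmpty()) {
      return;
    }
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, degree));
    try {
      List<Callable<Void>> tasks = new ArrayList<>();
      for (String p : partitions) {
        tasks.add(() -> {
          update.accept(p);
          return null;
        });
      }
      // invokeAll blocks until every task has completed.
      for (Future<Void> f : pool.invokeAll(tasks)) {
        f.get(); // a failed update is reported here as ExecutionException
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("hook interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("partition update failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Calling runHook twice in a row runs the second hook only after every update of the first has completed.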

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Review Request: HIVE-2026. Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/460/
---

Review request for hive.


Summary
---

Define hive.hooks.parallel.degree to control the max # of threads used to 
update the metastore in parallel. 
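
For illustration, the diff posted later in this thread bounds the configured degree before using it: values below 1 become 1, values above the metastore server's max threads are capped, and the result never exceeds the number of inputs. A minimal standalone sketch of that clamping, with a hypothetical class and method name, might look like this:

```java
// Hypothetical helper; mirrors the clamping order visible in the posted diff.
final class ParallelDegreeSketch {

  /** Returns the number of worker threads to use for inputCount inputs. */
  static int effectiveThreads(int configuredDegree, int metastoreMaxThreads, int inputCount) {
    int n = configuredDegree;
    if (n < 1) {
      n = 1;                      // never fewer than one thread
    } else if (n > metastoreMaxThreads) {
      n = metastoreMaxThreads;    // do not exceed what the metastore can serve
    }
    if (n > inputCount) {
      n = inputCount;             // no point in having idle threads
    }
    return n;
  }
}
```

The sketch assumes inputCount is at least 1; the posted diff returns early when there are no inputs.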


Diffs
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1076459 
  trunk/conf/hive-default.xml 1076459 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
1076459 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateInputAccessTimeHook.java
 1076459 

Diff: https://reviews.apache.org/r/460/diff


Testing
---


Thanks,

Ning



[jira] Commented: (HIVE-2026) Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001923#comment-13001923
 ] 

Ning Zhang commented on HIVE-2026:
--

review board: https://reviews.apache.org/r/460/





[jira] Updated: (HIVE-2026) Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2026:
-

Status: Patch Available  (was: Open)





RE: Regarding HIVE-1737

2011-03-03 Thread Siying Dong
Hi Mohit,



Can you be more precise about how the fixed and variable row sizes are 
evaluated wrongly? I don't quite understand what you mean. Did I miss any context?



I guess you are running a previous version and trying to figure out whether you 
need to port this patch? In that case, I think OOM is the worst possible case. 
We also care about whether one task uses more resources than it really needs and 
competes for resources with other tasks. I don't think there can be any other 
impact. If you want to try to reproduce an OOM, you should produce a condition 
where the sum of distinct string key sizes > maximum heap size, and fixed size + 
aggregate parameter size is much smaller than the average key size. You can try 
very long distinct string keys as input and group by them. My feeling is that it 
is not such a common case, since we never hit OOM for this.



For current trunk or version 0.7, the code is really not the same as when 
we did HIVE-1737, since we now have HIVE-1830, which adds a memory usage check 
and forces a flush to disk when memory use exceeds a threshold, so that even 
without HIVE-1737, there won't be an OOM anyway.
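
To make the threshold idea concrete, here is a toy sketch of a hash aggregation map that flushes once its estimated size passes a fraction of a memory budget. It is not the HIVE-1830 or HIVE-1737 code: the class name, the 64-byte fixed entry cost and the two-bytes-per-character key cost are assumptions chosen for the example, and a real operator would forward the flushed rows rather than discard them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "flush when estimated memory exceeds a threshold".
final class FlushingAggregatorSketch {
  // Assumed fixed per-entry overhead in bytes; real JVM object sizes differ.
  private static final long FIXED_ENTRY_BYTES = 64;

  private final Map<String, Long> counts = new HashMap<>();
  private final long budgetBytes;
  private final double threshold;
  private long estimatedBytes;
  private int flushes;

  FlushingAggregatorSketch(long budgetBytes, double threshold) {
    this.budgetBytes = budgetBytes;
    this.threshold = threshold;
  }

  /** Counts one occurrence of key and flushes if the estimate passes the threshold. */
  void add(String key) {
    Long old = counts.get(key);
    if (old == null) {
      counts.put(key, 1L);
      // Variable part grows with key length (assumed two bytes per character).
      estimatedBytes += FIXED_ENTRY_BYTES + 2L * key.length();
    } else {
      counts.put(key, old + 1);
    }
    if (estimatedBytes > threshold * budgetBytes) {
      flush();
    }
  }

  /** Empties the map; a real operator would emit the partial aggregates first. */
  void flush() {
    counts.clear();
    estimatedBytes = 0;
    flushes++;
  }

  int size() {
    return counts.size();
  }

  int flushes() {
    return flushes;
  }
}
```

With long distinct keys the variable part dominates the estimate, which is the situation described above for provoking an OOM when the variable size is underestimated.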



Thanks,



Siying




From: Mohit [mohitsi...@huawei.com]
Sent: Tuesday, March 01, 2011 7:08 AM
To: Siying Dong
Cc: Namit Jain; chinna...@huawei.com; hive-...@hadoop.apache.org
Subject: FW: Regarding HIVE-1737

Hi Namit/Siying,

Ok, I agree with your analysis. Both the fixed and variable row sizes are 
evaluated wrongly here.

But what I was more interested in is how critical the change is: what if the hash 
aggregation map is not flushed even when the number of existing entries overshoots 
the incorrect entry statistics calculated on the basis of the configured property 
hive.map.aggr.map.percentmemory (whereas with the code changes you made, it 
will trigger a flush)? Are there any issues apart from out of memory in the 
child JVM, or can something else bad happen?

If you can give me pointers to reproduce its side effects, that would be 
great.

-Mohit

***
This e-mail and attachments contain confidential information from HUAWEI, which 
is intended only for the person or entity whose address is listed above. Any 
use of the information contained herein in any way (including, but not limited 
to, total or partial disclosure, reproduction, or dissemination) by persons 
other than the intended recipient's) is prohibited. If you receive this e-mail 
in error, please notify the sender by phone or email immediately and delete it!

From: Mohit [mailto:mohitsi...@huawei.com]
Sent: Tuesday, March 01, 2011 12:39 PM
To: 'siyin...@fb.com'
Subject: Regarding HIVE-1737

Hi Siying,

Hope you are doing great.
I have one request regarding this defect: I'm not able to understand, and 
hence reproduce, this issue.
Maybe you can help with that; I need to know what queries you ran.

-Mohit




Re: Review Request: HIVE-2026. Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread M IS

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/460/#review283
---



trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateInputAccessTimeHook.java
https://reviews.apache.org/r/460/#comment534

How about using a centralized thread pool and submitting the tasks to 
that pool?
This has advantages: we need not create threads ourselves, and we 
can learn the status of each submitted task through its Future 
object and use that Future to wait until the task is finished. We can 
refactor the code to make UpdateWorker implement Runnable instead of 
extending Thread. 
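
A rough sketch of that suggestion follows. The worker class, the String inputs and the Map standing in for the metastore are all made up for the example; only ExecutorService, Future and Runnable are real java.util.concurrent and java.lang types. The pool is passed in by the caller, which is one way a shared, centrally owned pool could be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical worker: implements Runnable instead of extending Thread.
final class UpdateWorkerSketch implements Runnable {
  private final List<String> inputs;
  private final int lastAccessTime;
  private final Map<String, Integer> store; // stand-in for the metastore

  UpdateWorkerSketch(List<String> inputs, int lastAccessTime, Map<String, Integer> store) {
    this.inputs = inputs;
    this.lastAccessTime = lastAccessTime;
    this.store = store;
  }

  @Override
  public void run() {
    for (String input : inputs) {
      store.put(input, lastAccessTime);
    }
  }

  /** Submits one worker per slice to a shared pool and waits on the Futures. */
  static void updateAll(ExecutorService pool, List<List<String>> slices,
      int lastAccessTime, Map<String, Integer> store) {
    List<Future<?>> futures = new ArrayList<>();
    for (List<String> slice : slices) {
      futures.add(pool.submit(new UpdateWorkerSketch(slice, lastAccessTime, store)));
    }
    for (Future<?> f : futures) {
      try {
        f.get(); // blocks until this worker is done
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      } catch (ExecutionException e) {
        throw new IllegalStateException("worker failed", e.getCause());
      }
    }
  }
}
```

The map passed as store must be safe for concurrent writes, for example a ConcurrentHashMap.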


- M


 




[jira] Updated: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-03 Thread Prajakta Kalmegh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prajakta Kalmegh updated HIVE-1694:
---

Attachment: HIVE-1694.2.patch.txt

Patch version 2 - includes changes for review comments from John.

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in optimizer & query 
 execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain class of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for above 
 mentioned cases. 





[jira] Commented: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-03 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002017#comment-13002017
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi John

We have made all the changes as suggested by you except for making the code 
pluggable (so that the rewrite expression changes depending on which index 
handler is used). We will submit this change along with the patch for new index 
type. 

We have started working on the new index type creation as per your suggestion 
and will let you know once that is complete. 





[jira] Created: (HIVE-2027) Asynchronous Hooks

2011-03-03 Thread Ning Zhang (JIRA)
Asynchronous Hooks
--

 Key: HIVE-2027
 URL: https://issues.apache.org/jira/browse/HIVE-2027
 Project: Hive
  Issue Type: New Feature
Reporter: Ning Zhang


PreHook and PostHook are executed sequentially in the order in which they are 
defined in hive.exec.pre.hooks and hive.exec.post.hooks. In some cases the 
sequential semantics are mandatory, but not in all cases. It would be desirable 
to define an AsyncHook that extends Hook (similarly for AsyncPreHook and 
AsyncPostHook) to asynchronously execute the hooks in a thread pool.  
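
One possible shape for this, purely as a sketch: the Hook and AsyncHook interfaces below are simplified stand-ins with a no-argument run() and are not the existing Hive hook interfaces. The dispatcher hands AsyncHook instances to a pool and runs everything else inline, in order.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; interface shapes are assumptions for illustration only.
final class AsyncHookSketch {

  interface Hook {
    void run();
  }

  /** Marker sub-interface: hooks of this type may run on a pool thread. */
  interface AsyncHook extends Hook {
  }

  /** Runs synchronous hooks inline in order; hands asynchronous ones to the pool. */
  static void runHooks(List<Hook> hooks, ExecutorService pool) {
    for (Hook h : hooks) {
      if (h instanceof AsyncHook) {
        pool.execute(h::run);
      } else {
        h.run();
      }
    }
  }

  /** Shuts the pool down and waits briefly for queued hooks to finish. */
  static boolean drain(ExecutorService pool) {
    pool.shutdown();
    try {
      return pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Synchronous hooks keep their relative order; an asynchronous hook may complete before or after the hooks listed after it, which is why this only suits hooks whose ordering is not mandatory.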





[jira] Updated: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-03 Thread Prajakta Kalmegh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prajakta Kalmegh updated HIVE-1694:
---

Attachment: (was: HIVE-1694.2.patch.txt)





Re: Review Request: HIVE-2026. Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread MIS
Hi, Ning

Just to be clear on what I was suggesting, I have created a patch only for
this file.
Please have a look.

Thanks,
MIS.


On Thu, Mar 3, 2011 at 5:50 PM, M IS misapa...@gmail.com wrote:

This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/460/

 trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateInputAccessTimeHook.java
 https://reviews.apache.org/r/460/diff/1/?file=13550#file13550line82
  (Diff revision 1)

 public void run(SessionState sess, Set<ReadEntity> inputs,

77

   Thread[] threads = new Thread[nThreads];


### Eclipse Workspace Patch 1.0
#P hive
Index: ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateInputAccessTimeHook.java
===
--- ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateInputAccessTimeHook.java	(revision 1076702)
+++ ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateInputAccessTimeHook.java	(working copy)
@@ -17,18 +17,26 @@
  */
 package org.apache.hadoop.hive.ql.hooks;
 
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
 import java.util.Set;
-import java.util.LinkedHashMap;
-import java.util.Map;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
 
 import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.ql.session.SessionState;
-import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hadoop.hive.ql.metadata.Hive;
 import org.apache.hadoop.hive.ql.metadata.HiveException;
 import org.apache.hadoop.hive.ql.metadata.Partition;
 import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.session.SessionState;
+import org.apache.hadoop.security.UserGroupInformation;
 
+
 /**
  * Implementation of a pre execute hook that updates the access
  * times for all the inputs.
@@ -39,7 +47,6 @@
 
   public static class PreExec implements PreExecute {
 Hive db;
-
 public void run(SessionState sess, Set<ReadEntity> inputs,
 Set<WriteEntity> outputs, UserGroupInformation ugi)
   throws Exception {
@@ -54,35 +61,122 @@
 }
   }
 
+  if (inputs.size() == 0) {
+return;
+  }
+
   int lastAccessTime = (int) (System.currentTimeMillis()/1000);
 
-  for(ReadEntity re: inputs) {
-// Set the last query time
+  int nThreads = HiveConf.getIntVar(sess.getConf(), HiveConf.ConfVars.HOOKS_PARALLEL_DEGREE);
+  int maxThreads = HiveConf.getIntVar(sess.getConf(), HiveConf.ConfVars.METASTORESERVERMAXTHREADS);
+
+  if (nThreads < 1) {
+nThreads = 1;
+  } else if (nThreads > maxThreads) {
+nThreads = maxThreads;
+  }
+  if (nThreads > inputs.size()) {
+nThreads = inputs.size();
+  }
+
+  // This can be a rather common/centrally used thread pool.
+  ExecutorService exeService = new ThreadPoolExecutor(nThreads, nThreads, 5000, TimeUnit.MILLISECONDS,
+  new LinkedBlockingQueue<Runnable>());
+  List<Future<?>> futures = new ArrayList<Future<?>>(nThreads);
+
+  List<ReadEntity>[] threadInputs = new List[nThreads];
+
+  // assign ReadEntities to threads
+  int i = 0;
+  for (i = 0; i < nThreads; ++i) {
+threadInputs[i] = new ArrayList<ReadEntity>();
+  }
+
+  i = 0;
+  for (ReadEntity re: inputs) {
+threadInputs[i % nThreads].add(re);
+++i;
+  }
+
+  try {
+// launch all threads
+Runnable updateWorker;
+Future<?> futureTask;
+for (i = 0; i < nThreads; ++i) {
+  updateWorker = new UpdateWorker(sess.getConf(), threadInputs[i], lastAccessTime);
+  futureTask =  exeService.submit(updateWorker);
+  futures.add(futureTask);
+}
+
+// wait 

[jira] Updated: (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-03-03 Thread Prajakta Kalmegh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prajakta Kalmegh updated HIVE-1694:
---

Attachment: HIVE-1694.2.patch.txt

Patch version 2 - includes changes for review comments from John. Re-attaching 
the appropriate file.





Re: Review Request: HIVE-2026. Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread Ning Zhang
Hi MIS,

Thanks for the contribution! To allow a broader audience to review, can you 
upload your patch to the JIRA and the review board? (I can help you with the 
review board if it doesn't allow you to change the request.)

A couple of comments before uploading your patch:

1) The 5 sec keepAliveTime seems low. If the # of threads is more than the # of 
cores, does it mean a thread will be terminated if it waits more than 5 secs 
to get scheduled?

2) Do you need to call exeService.shutdown() in case a Throwable is caught?
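
On both points, a small sketch may help. As documented for java.util.concurrent.ThreadPoolExecutor, keepAliveTime limits how long an idle thread in excess of corePoolSize waits for new tasks; it does not time out a thread that is runnable and waiting for a CPU, and with corePoolSize equal to maximumPoolSize it has no effect unless allowCoreThreadTimeOut(true) is set. For the second point, putting shutdown() in a finally block releases the pool whether or not the task fails. The class and method names below are hypothetical, and swallowing the failure is only done to keep the example short.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of pool lifecycle handling; not the patch's code.
final class PoolLifecycleSketch {

  /** Runs task on a fixed-size pool and always shuts the pool down afterwards. */
  static ThreadPoolExecutor runThenShutdown(int nThreads, Runnable task) {
    // Same constructor arguments as in the posted diff: core == max, 5000 ms keep-alive.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(nThreads, nThreads, 5000,
        TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    try {
      pool.submit(task).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ExecutionException e) {
      // The task threw; a real hook would log or rethrow the cause here.
    } finally {
      pool.shutdown(); // runs on success and on failure
    }
    return pool;
  }
}
```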



HIVE-2026_1.patch



[jira] Commented: (HIVE-1833) Task-cleanup task should be disabled

2011-03-03 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002165#comment-13002165
 ] 

Joydeep Sen Sarma commented on HIVE-1833:
-

committed - thanks Scott!

 Task-cleanup task should be disabled
 

 Key: HIVE-1833
 URL: https://issues.apache.org/jira/browse/HIVE-1833
 Project: Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: HIVE-1833.1.txt, HIVE-1833.txt


 Currently when a task fails, a cleanup attempt is scheduled right after 
 that.
 This is unnecessary and increases latency. MapReduce will allow disabling 
 this (see MAPREDUCE-2206).
 After that patch is committed, we should set the JobConf in Hive to disable 
 the cleanup task.





[jira] Resolved: (HIVE-1833) Task-cleanup task should be disabled

2011-03-03 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma resolved HIVE-1833.
-

   Resolution: Fixed
Fix Version/s: 0.8.0





[jira] Commented: (HIVE-2022) Making JDO thread-safe by default

2011-03-03 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002184#comment-13002184
 ] 

Paul Yang commented on HIVE-2022:
-

Apologies for the build break - Ning and I are looking into fixing some issues 
with my build environment.

 Making JDO thread-safe by default
 -

 Key: HIVE-2022
 URL: https://issues.apache.org/jira/browse/HIVE-2022
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.8.0

 Attachments: HIVE-2022.patch


 If there are multiple threads accessing the metastore concurrently, there are 
 cases where JDO throws exceptions because of concurrent access to a HashMap 
 inside JDO. Setting javax.jdo.option.Multithreaded to true solves this issue. 





[jira] Commented: (HIVE-2025) Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022

2011-03-03 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002259#comment-13002259
 ] 

Carl Steinbach commented on HIVE-2025:
--

Review request: https://reviews.apache.org/r/464/


 Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022
 -

 Key: HIVE-2025
 URL: https://issues.apache.org/jira/browse/HIVE-2025
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Ning Zhang
Priority: Critical
 Attachments: HIVE-2025.patch


 The patch for HIVE-2022 broke TestEmbeddedHiveMetaStore and 
 TestRemoteHiveMetaStore
 https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/590/
 @Paul: Assigning this to you.





Re: Review Request: HIVE-2025: Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022

2011-03-03 Thread Paul Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/464/#review297
---

Ship it!


Looks good to me - will test and commit.

- Paul


On 2011-03-03 13:46:55, Carl Steinbach wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/464/
 ---
 
 (Updated 2011-03-03 13:46:55)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Review request for HIVE-2025.
 
 
 This addresses bugs HIVE-2022 and HIVE-2025.
 https://issues.apache.org/jira/browse/HIVE-2022
 https://issues.apache.org/jira/browse/HIVE-2025
 
 
 Diffs
 -
 
   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1076530 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 1076530 
 
 Diff: https://reviews.apache.org/r/464/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Carl
 




[jira] Commented: (HIVE-1833) Task-cleanup task should be disabled

2011-03-03 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002293#comment-13002293
 ] 

Scott Chen commented on HIVE-1833:
--

Thanks for the help, Joy :)





[jira] Commented: (HIVE-2025) Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022

2011-03-03 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002296#comment-13002296
 ] 

Carl Steinbach commented on HIVE-2025:
--

+1. Will commit if tests pass.

 Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022
 -

 Key: HIVE-2025
 URL: https://issues.apache.org/jira/browse/HIVE-2025
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Ning Zhang
Priority: Critical
 Attachments: HIVE-2025.patch


 The patch for HIVE-2022 broke TestEmbeddedHiveMetaStore and 
 TestRemoteHiveMetaStore
 https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/590/
 @Paul: Assigning this to you.





Build failed in Jenkins: Hive-trunk-h0.20 #592

2011-03-03 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/592/changes

Changes:

[jssarma] HIVE-1833: Task Cleanup  task should be disabled (Scott Chen via 
jssarma)

--
[...truncated 27596 lines...]
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-03_15-28-14_662_975814063074649674/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] 2011-03-03 15:28:17,738 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-03_15-28-14_662_975814063074649674/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103031528_1528160580.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-03_15-28-19_161_1275930092982649220/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-03_15-28-19_161_1275930092982649220/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE

Re: Review Request: HIVE-2026. Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread ಕರಿಯ
Hi, Ning

1)  You are right about this. However, keepAliveTime matters little here,
because both the core pool size and the maximum pool size are nThreads. Even
when threads are idle, the nThreads core threads stay in the pool that was
created to process the tasks. And since the thread pool is created for this
one specific purpose, the configuration is fine.

This can be written more simply as:
ExecutorService exeService = Executors.newFixedThreadPool(nThreads);

I'll use this in the new patch.
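
To make point 1 concrete, here is a minimal sketch, not the code from the
patch. The class and method names are hypothetical. It assumes the explicit
pool was built with equal core and maximum sizes, as described above, and
shows that the keep-alive policy does not apply to core threads by default.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: names here are hypothetical, not taken from the patch.
class FixedPoolSketch {

    // Explicit form: core size == max size, so the pool never grows past
    // nThreads, and keepAliveTime is not applied to core threads by default.
    static ThreadPoolExecutor explicitPool(int nThreads, long keepAliveSecs) {
        return new ThreadPoolExecutor(
            nThreads, nThreads,
            keepAliveSecs, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>());
    }

    // Shorthand form proposed in the message above.
    static ExecutorService fixedPool(int nThreads) {
        return Executors.newFixedThreadPool(nThreads);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = explicitPool(4, 5);
        System.out.println("core=" + pool.getCorePoolSize()
            + " max=" + pool.getMaximumPoolSize()
            + " coreTimeout=" + pool.allowsCoreThreadTimeOut());
        pool.shutdown();
    }
}
```

Core threads would only time out if allowCoreThreadTimeOut(true) were called
on the pool, which neither form does.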

Further, if we move to a common, centralized thread pool (which we will need
to do in the future), its configuration must be chosen carefully, taking into
account the number of cores available on the machine and the results of
profiling. That is for later.

2) In the current scenario we need to call execService.shutdown() in every
case, whether or not an exception is thrown, because the pool is local and
will not be used again. If the thread pool were a common, centralized one, we
would not need to call shutdown().

Please let me know if this is fine; I will then upload the patch attached to
this message to JIRA.

Thanks,
ಕರಿಯ

On Fri, Mar 4, 2011 at 12:44 AM, Ning Zhang nzh...@fb.com wrote:

  Hi MIS,

  Thanks for the contribution! To allow a broader audience to review, can you
 upload your patch to the JIRA and the review board? (I can help you with the
 review board if it doesn't allow you to change the request.)

  A couple of comments before uploading your patch:

  1) The 5 sec keepAliveTime seems low. If the # of threads is more than
 the # of cores, does it mean a thread will be terminated after waiting
 5 secs to get scheduled?

  2) Do you need to call execService.shutdown() when a Throwable is
 caught?

  On Mar 3, 2011, at 10:09 AM, MIS wrote:

 Hi, Ning

 Just to be clear on what I was suggesting, I have created a patch only for
 this file.
 Please have a look.

 Thanks,
 MIS.


 On Thu, Mar 3, 2011 at 5:50 PM, M IS misapa...@gmail.com wrote:

This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/460/

 trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateInputAccessTimeHook.java
 https://reviews.apache.org/r/460/diff/1/?file=13550#file13550line82
  (Diff revision 1)

 public void run(SessionState sess, Set<ReadEntity> inputs,

   77

   Thread[] threads = new Thread[nThreads];

   How about using a centralized thread pool and submitting the tasks to 
 that pool?
 The advantages are that we do not have to create threads ourselves, and we 
 can learn the status of each submitted task through its Future object 
 and use that Future to wait until the task has finished. We can 
 refactor the code so that UpdateWorker implements Runnable instead of 
 extending Thread.
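
The suggestion above could look roughly like the following sketch. It is
not the code from the patch: UpdateWorker here is a hypothetical stand-in
that only records partition names, where the real worker would update
lastAccessTime in the metastore. The pool is passed in by the caller, as a
shared pool would be, and updateAll is an assumed helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Illustrative only: names here are hypothetical, not taken from the patch.
class UpdateWorkerSketch {

    // A Runnable task instead of a Thread subclass, so a pool can run it.
    static class UpdateWorker implements Runnable {
        private final String partitionName;
        private final List<String> touched;

        UpdateWorker(String partitionName, List<String> touched) {
            this.partitionName = partitionName;
            this.touched = touched;
        }

        @Override
        public void run() {
            // Placeholder for the metastore update.
            touched.add(partitionName);
        }
    }

    // Submits one worker per partition and waits on each Future.
    // The caller owns the pool and decides when to shut it down.
    static List<String> updateAll(List<String> partitionNames, ExecutorService pool) {
        List<String> touched = Collections.synchronizedList(new ArrayList<String>());
        List<Future<?>> futures = new ArrayList<Future<?>>();
        for (String name : partitionNames) {
            futures.add(pool.submit(new UpdateWorker(name, touched)));
        }
        for (Future<?> f : futures) {
            try {
                f.get(); // blocks until this worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("worker failed", e.getCause());
            }
        }
        return touched;
    }
}
```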


 - M

 On March 3rd, 2011, 12:53 a.m., Ning Zhang wrote:
   Review request for hive.
 By Ning Zhang.

 *Updated 2011-03-03 00:53:49*
 Description

 define hive.hooks.parallel.degree to control max # of thread to update 
 metastore in parallel.

   Diffs

 - trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
(1076459)
- trunk/conf/hive-default.xml (1076459)
- 
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
(1076459)
- 
 trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/UpdateInputAccessTimeHook.java
(1076459)

 View Diff https://reviews.apache.org/r/460/diff/


 HIVE-2026_1.patch



### Eclipse Workspace Patch 1.0
#P hive
Index: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
===
--- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java	(revision 1076715)
+++ common/src/java/org/apache/hadoop/hive/conf/HiveConf.java	(working copy)
@@ -111,6 +111,7 @@
 DEFAULT_ZOOKEEPER_PARTITION_NAME("hive.lockmgr.zookeeper.default.partition.name", "__HIVE_DEFAULT_ZOOKEEPER_PARTITION__"),
 // Whether to show a link to the most failed task + debugging tips
 SHOW_JOB_FAIL_DEBUG_INFO("hive.exec.show.job.failure.debug.info", true),
+HOOKS_PARALLEL_DEGREE("hive.hooks.parallel.degree", 1),
 
 // should hive determine whether to run in local mode automatically ?
 LOCALMODEAUTO("hive.exec.mode.local.auto", false),
Index: metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
===
--- metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java	(revision 1076715)
+++ metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java	(working copy)
@@ -26,9 +26,9 @@
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
-import java.util.Set;
 import java.util.Map.Entry;
 import java.util.Properties;
+import java.util.Set;
 import java.util.concurrent.locks.Lock;
 

Re: Review Request: HIVE-2026. Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread Ning Zhang
It looks good to me. Thanks!

HIVE-2026_1.patch


HIVE-2026_2.patch



[jira] Updated: (HIVE-2025) Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022

2011-03-03 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2025:
-

   Resolution: Fixed
Fix Version/s: 0.8.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Ning!

 Fix TestEmbeddedHiveMetaStore and TestRemoteHiveMetaStore broken by HIVE-2022
 -

 Key: HIVE-2025
 URL: https://issues.apache.org/jira/browse/HIVE-2025
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Ning Zhang
Priority: Critical
 Fix For: 0.8.0

 Attachments: HIVE-2025.patch


 The patch for HIVE-2022 broke TestEmbeddedHiveMetaStore and 
 TestRemoteHiveMetaStore
 https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/590/
 @Paul: Assigning this to you.





[jira] Resolved: (HIVE-2023) Add javax.jdo.option.Multithreaded configuration property to HiveConf

2011-03-03 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-2023.
--

Resolution: Fixed

fixed as part of HIVE-2025.

 Add javax.jdo.option.Multithreaded configuration property to HiveConf
 -

 Key: HIVE-2023
 URL: https://issues.apache.org/jira/browse/HIVE-2023
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Reporter: Carl Steinbach
Assignee: Ning Zhang

 The configuration property javax.jdo.option.Multithreaded was added to 
 hive-default.xml in HIVE-2022. This property also needs to be added to 
 HiveConf.java.





[jira] Updated: (HIVE-2026) Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ಕರಿಯ updated HIVE-2026:
---

Attachment: HIVE-2026_2.patch

Patch incorporating the review comments.

 Parallelize UpdateInputAccessTimeHook
 -

 Key: HIVE-2026
 URL: https://issues.apache.org/jira/browse/HIVE-2026
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2026.patch, HIVE-2026_2.patch


 UpdateInputAccessTimeHook is usually used as a pre-execution hook to update 
 the metastore's lastAccessTime field of the input partition/table. If a query 
 touches a large number of partitions, this hook takes a long time to 
 execute. One approach is to run the hook itself in a separate thread, 
 but it is hard to guarantee backward-compatible semantics when exceptions 
 are encountered during hook execution. This task takes another 
 approach: parallelize the work inside the hook (update multiple partitions 
 concurrently), but still execute the pre-hooks in sequential order. 





[jira] Commented: (HIVE-2026) Parallelize UpdateInputAccessTimeHook

2011-03-03 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002488#comment-13002488
 ] 

Ning Zhang commented on HIVE-2026:
--

The review board has also been updated with the new HIVE-2026_2.patch

 Parallelize UpdateInputAccessTimeHook
 -

 Key: HIVE-2026
 URL: https://issues.apache.org/jira/browse/HIVE-2026
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2026.patch, HIVE-2026_2.patch


 UpdateInputAccessTimeHook is usually used as a pre-execution hook to update 
 the metastore's lastAccessTime field of the input partition/table. If a query 
 touches a large number of partitions, this hook takes a long time to 
 execute. One approach is to run the hook itself in a separate thread, 
 but it is hard to guarantee backward-compatible semantics when exceptions 
 are encountered during hook execution. This task takes another 
 approach: parallelize the work inside the hook (update multiple partitions 
 concurrently), but still execute the pre-hooks in sequential order. 
