[jira] [Commented] (HIVE-1451) Creating a table stores the full address of namenode in the metadata. This leads to problems when the namenode address changes.

2011-08-31 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094706#comment-13094706
 ] 

MIS commented on HIVE-1451:
---

+1 for the issue. 
This is one of those features that many assume exists by default, but doesn't. 

I too have run into this and resolved it by updating the DB_LOCATION_URI 
column in the DBS table and the LOCATION column in the SDS table to point to 
the latest namenode URI. {My metastore was on MySQL.}

Fixing this issue will save us from manually changing the namenode URI in the 
DB should the address of the namenode change.
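As a rough Java sketch of the manual fix (the class name, the REPLACE-based SQL, and the example URIs are all illustrative; only the DBS/SDS table and column names come from the comment above):

```java
// Hypothetical sketch of the manual metastore fix: rewrite the namenode
// authority stored in DBS.DB_LOCATION_URI and SDS.LOCATION.
public class NamenodeUriFixer {

    // Build the UPDATE statements that replace the old namenode URI prefix
    // with the new one in both metastore tables.
    static String[] buildUpdates(String oldUri, String newUri) {
        String dbs = "UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, '"
                + oldUri + "', '" + newUri + "')";
        String sds = "UPDATE SDS SET LOCATION = REPLACE(LOCATION, '"
                + oldUri + "', '" + newUri + "')";
        return new String[] { dbs, sds };
    }

    public static void main(String[] args) {
        // Example: namenode moved from port 9000 to 8020, as in the issue report.
        for (String sql : buildUpdates("hdfs://localhost:9000", "hdfs://localhost:8020")) {
            System.out.println(sql);
            // In practice these would be executed against the MySQL metastore,
            // e.g. via java.sql.DriverManager and Statement.executeUpdate(sql).
        }
    }
}
```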


 Creating a table stores the full address of namenode in the metadata. This 
 leads to problems when the namenode address changes.
 ---

 Key: HIVE-1451
 URL: https://issues.apache.org/jira/browse/HIVE-1451
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Affects Versions: 0.5.0
 Environment: Any
Reporter: Arvind Prabhakar

 Here is an excerpt from table metadata for an arbitrary table {{table1}}:
 {noformat}
 hive> describe extended table1;
 OK
 ...
 Detailed Table Information...
 location:hdfs://localhost:9000/user/arvind/hive/warehouse/table1, 
 ...
 {noformat}
 As can be seen, the full address of the namenode is captured in the location 
 information for the table. This information is later used to run any queries 
 on the table, thus making it impossible to change the namenode location once 
 the table has been created. For example, for the above table, a query will 
 fail if the namenode is migrated from port 9000 to 8020:
 {noformat}
 hive> select * from table1;
 OK
 Failed with exception java.io.IOException:java.net.ConnectException: Call to 
 localhost/127.0.0.1:9000
 failed on connection exception: java.net.ConnectException: Connection refused
 Time taken: 10.78 seconds
 hive> 
 {noformat}
 It should be possible to change the namenode location regardless of when the 
 tables are created. Also, any query execution should work with the configured 
 namenode at that point in time rather than requiring the configuration to be 
 exactly the same at the time when the tables were created.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-08-11 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083306#comment-13083306
 ] 

MIS commented on HIVE-2181:
---

-1 for the issue.
What if I'm running multiple Hive servers on different ports on the same 
machine {with my metastore DB on a MySQL server}? If one of the server 
instances restarts, it would end up deleting the scratch dir, which would 
affect the other running instances as well. Even if we specify a different 
scratch dir for each instance, I doubt the value this property adds.

  Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. 
 

 Key: HIVE-2181
 URL: https://issues.apache.org/jira/browse/HIVE-2181
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
  Labels: patch
 Fix For: 0.8.0

 Attachments: HIVE-2181.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, queries leave their map outputs under scratch.dir after execution. 
 If the Hive server is stopped, we need not keep the stopped server's map 
 outputs. So while starting the server we can clear scratch.dir, which can 
 improve disk usage.





[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-17 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008328#comment-13008328
 ] 

MIS commented on HIVE-2051:
---

Yes, it is necessary to terminate the executor once the jobs have been 
submitted to it, even though the submitted jobs may already have completed. 

However, what we need not do here is await termination after the executor is 
shut down, since that is redundant: all jobs submitted to the executor will 
already have completed by the time we shut it down, which is exactly what 
result.get() ensures. In other words, the following piece of code is not 
required:
+  do {
+    try {
+      executor.awaitTermination(Integer.MAX_VALUE, TimeUnit.SECONDS);
+      executorDone = true;
+    } catch (InterruptedException e) {
+    }
+  } while (!executorDone);
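To illustrate the point, here is a standalone sketch (the names and tasks are hypothetical, not the actual getInputSummary() patch): once every Future.get() has returned, all submitted work is done, so shutdown() alone is enough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShutdownSketch {
    // Submit all tasks, block on each Future, and sum the results.
    public static long runAll(List<Callable<Long>> tasks) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Callable<Long> t : tasks) {
                results.add(executor.submit(t));
            }
            long total = 0;
            for (Future<Long> r : results) {
                total += r.get();   // blocks until this task has completed
            }
            return total;           // here, every submitted task has finished
        } finally {
            executor.shutdown();    // no awaitTermination loop needed afterwards
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            final long v = i;
            tasks.add(() -> v);
        }
        System.out.println(runAll(tasks)); // prints 15
    }
}
```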

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch


 getInputSummary() currently calls FileSystem.getContentSummary() one path at 
 a time, which can be extremely slow when the number of input paths is huge. 
 By making those calls in parallel, we can cut latency in most cases.



[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-17 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008331#comment-13008331
 ] 

MIS commented on HIVE-2051:
---

The solution to this issue resembles that of HIVE-2026, so we can follow a 
similar approach.

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch


 getInputSummary() currently calls FileSystem.getContentSummary() one path at 
 a time, which can be extremely slow when the number of input paths is huge. 
 By making those calls in parallel, we can cut latency in most cases.



[jira] Commented: (HIVE-1959) Potential memory leak when same connection used for long time. TaskInfo and QueryInfo objects are getting accumulated on executing more queries on the same connection.

2011-03-01 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13000970#comment-13000970
 ] 

MIS commented on HIVE-1959:
---

How about using a WeakHashMap in place of a HashMap, instead of explicitly 
removing entries from the map? A WeakHashMap could be used for both fields, 
queryInfoMap and taskInfoMap, of the HiveHistory class.
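A minimal standalone sketch of the idea (the key/value types here are placeholders, not the actual HiveHistory fields): with a WeakHashMap, an entry becomes collectable as soon as its key is no longer strongly referenced, so per-query info would not accumulate for the lifetime of the connection.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakMapSketch {
    public static void main(String[] args) throws Exception {
        Map<Object, String> queryInfoMap = new WeakHashMap<>();
        Object queryId = new Object();
        queryInfoMap.put(queryId, "info for one query");
        System.out.println(queryInfoMap.size()); // prints 1 while queryId is reachable

        queryId = null;            // drop the only strong reference to the key
        for (int i = 0; i < 50 && !queryInfoMap.isEmpty(); i++) {
            System.gc();           // a hint; collection timing is not guaranteed
            Thread.sleep(10);
        }
        // After the key is collected, the WeakHashMap drops the entry on its own.
        System.out.println(queryInfoMap.isEmpty());
    }
}
```

One caveat worth checking before adopting this: if the map keys are interned Strings (e.g. query IDs), they may never become weakly reachable, so the choice of key type matters.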

 Potential memory leak when same connection used for long time. TaskInfo and 
 QueryInfo objects are getting accumulated on executing more queries on the 
 same connection.
 ---

 Key: HIVE-1959
 URL: https://issues.apache.org/jira/browse/HIVE-1959
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Hadoop 0.20.1, Hive 0.5.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1959.patch


 *org.apache.hadoop.hive.ql.history.HiveHistory$TaskInfo* and 
 *org.apache.hadoop.hive.ql.history.HiveHistory$QueryInfo* these two objects 
 are getting accumulated on executing more number of queries on the same 
 connection. These objects are getting released only when the connection is 
 closed.





[jira] Commented: (HIVE-1883) Periodic cleanup of Hive History log files.

2011-02-17 Thread MIS (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995934#comment-12995934
 ] 

MIS commented on HIVE-1883:
---

Carl is right on this. There is no need for a 'scheduled' timer task to take 
care of the log files. There are enough handles already available in the log4j 
library used by Hive to manage them.
As far as the current issue is concerned, RollingFileAppender can be used and 
a maximum size limit can be set.
If no data should be lost, then DailyRollingFileAppender can be used instead, 
with a cron job to handle a week's [or whatever time frame is chosen] log 
files.

Further, running the 'scheduled' timer task to handle log files has one more 
disadvantage: it creates more problems than it solves. 
ScheduledThreadPoolExecutor could be an answer, but it's just not worth the 
effort.

 Periodic cleanup of Hive History log files.
 ---

 Key: HIVE-1883
 URL: https://issues.apache.org/jira/browse/HIVE-1883
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
 Environment: Hive 0.6.0,  Hadoop 0.20.1
 SUSE Linux Enterprise Server 11 (i586)
 VERSION = 11
 PATCHLEVEL = 0
Reporter: Mohit Sikri

 After starting Hive and running queries, transaction history files get 
 created in the /tmp/root folder.
 We should periodically remove those files (not all of them, only the ones) 
 that are too old to hold any significant information.
 Solution :-
 A scheduled timer task which cleans up log files older than the configured 
 time.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira