[jira] Commented: (HIVE-1675) SAXParseException on plan.xml during local mode.

2010-12-07 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968759#action_12968759
 ] 

Bennie Schut commented on HIVE-1675:


I managed to cause this with parallel=false, so perhaps not such an interesting 
angle ;-)

I've added some more logging to better understand the cause of this:

2010-12-07 15:49:44,697 INFO  exec.Utilities 
(Utilities.java:getMapRedWork(154)) - Getting 
jobid:9c2eeba4-a602-4d4b-ba0b-60ce815c4ea7 from cache.
2010-12-07 15:49:44,703 INFO  lzo.GPLNativeCodeLoader 
(GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library
2010-12-07 15:49:44,705 INFO  lzo.LzoCodec (LzoCodec.java:<clinit>(72)) - 
Successfully loaded & initialized native-lzo library [hadoop-lzo rev 
c7acdaa96a7ce04538c0716fe699ffaf11836c70]
2010-12-07 15:49:44,712 INFO  mapred.FileInputFormat 
(FileInputFormat.java:listStatus(192)) - Total input paths to process : 1
2010-12-07 15:49:44,880 INFO  exec.Utilities 
(Utilities.java:getMapRedWork(154)) - Getting 
jobid:e8b2dab2-986a-4bb1-947f-00aec5b46a06 from cache.
2010-12-07 15:49:44,882 INFO  exec.ExecDriver 
(SessionState.java:printInfo(268)) - Job running in-process (local Hadoop)
2010-12-07 15:49:44,882 WARN  mapred.LocalJobRunner 
(LocalJobRunner.java:run(256)) - job_local_0001
java.lang.RuntimeException: java.io.FileNotFoundException: 
HIVE_PLANe8b2dab2-986a-4bb1-947f-00aec5b46a06 (No such file or directory)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:166)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:238)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:244)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:139)
Caused by: java.io.FileNotFoundException: 
HIVE_PLANe8b2dab2-986a-4bb1-947f-00aec5b46a06 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at java.io.FileInputStream.<init>(FileInputStream.java:66)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:157)
... 3 more

The first thing I noticed that differs from a successful job is that it's trying 
to get a different jobid from the cache: "Getting 
jobid:e8b2dab2-986a-4bb1-947f-00aec5b46a06 from cache".
I'm still confused.
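For context, here is a minimal sketch of the lookup pattern the log and stack trace 
above suggest (cache first, then the serialized HIVE_PLAN file); the class, the map 
and the log line below are illustrative assumptions, not the actual Utilities code:

{code}
// Illustrative sketch only -- assumed shape of the plan lookup, not Hive's code.
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PlanCacheSketch {
  // hypothetical cache keyed by the plan/job id taken from the job conf
  private static final Map<String, Object> planCache =
      new ConcurrentHashMap<String, Object>();

  public static Object getMapRedWork(String planId) throws Exception {
    System.out.println("Getting jobid:" + planId + " from cache.");
    Object plan = planCache.get(planId);
    if (plan == null) {
      // fall back to the serialized plan file; this is where the
      // FileNotFoundException in the stack trace above originates
      InputStream in = new FileInputStream("HIVE_PLAN" + planId);
      try {
        plan = deserialize(in); // placeholder for the XML decoding step
      } finally {
        in.close();
      }
      planCache.put(planId, plan);
    }
    return plan;
  }

  private static Object deserialize(InputStream in) {
    return new Object(); // stand-in; the real code decodes the plan XML
  }
}
{code}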

 SAXParseException on plan.xml during local mode.
 

 Key: HIVE-1675
 URL: https://issues.apache.org/jira/browse/HIVE-1675
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Bennie Schut
Assignee: Bennie Schut
 Fix For: 0.7.0

 Attachments: HIVE-1675.patch, local_10005_plan.xml, 
 local_10006_plan.xml


 When hive switches to local mode (hive.exec.mode.local.auto=true) I receive a 
 SAXParseException on the plan.xml.
 If I set hive.exec.mode.local.auto=false I get the correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1759) Many important broken links on Hive web page

2010-12-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-1759.
---

Resolution: Fixed

The issue was that the content of the site MUST all be checked into svn and copied 
into /www/hive.apache.org. I built and committed the API docs for all older 
releases. The site should have no broken links; the 3.0 docs will propagate in the 
next hour or so.

 Many important broken links on Hive web page
 

 Key: HIVE-1759
 URL: https://issues.apache.org/jira/browse/HIVE-1759
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Reporter: Jeff Hammerbacher
Assignee: Edward Capriolo

 The change log links are broken, perhaps because of the move to a TLP, and 
 the Jira issue log links all point to the 0.5 issue log. Also, all of the 
 documentation links are broken.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1823) upgrade the database thrift interface to allow parameters key-value pairs

2010-12-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1823.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Ning

 upgrade the database thrift interface to allow parameters key-value pairs
 -

 Key: HIVE-1823
 URL: https://issues.apache.org/jira/browse/HIVE-1823
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1823.2.patch, HIVE-1823.patch


 In order to store data-center-specific parameters in a Hive database, it is 
 desirable to extend the Hive database thrift interface with a parameters map, 
 similar to Table and Partitions. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1727) Not able to download hive from apache site.

2010-12-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-1727.
---

Resolution: Won't Fix

That was the old location. The new location is:
svn co http://svn.apache.org/repos/asf/hive/trunk hive

The documentation in the wiki looks correct:
http://wiki.apache.org/hadoop/Hive/GettingStarted#Installation_and_Configuration

If you find the documentation wrong somewhere, feel free to re-open.



 Not able to download hive from apache site.
 ---

 Key: HIVE-1727
 URL: https://issues.apache.org/jira/browse/HIVE-1727
 Project: Hive
  Issue Type: Bug
 Environment: Centos 5.4
Reporter: Sangeetha Sundar
Priority: Critical
   Original Estimate: 3h
  Remaining Estimate: 3h

 Hi,
 I am trying to download Hive as specified on the Apache site and am getting the 
 following error:
 [had...@system9 ~]$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk
 svn: PROPFIND request failed on '/repos/asf/hadoop/hive/trunk'
 svn: PROPFIND of '/repos/asf/hadoop/hive/trunk': Could not resolve hostname 
 `svn.apache.org': Temporary failure in name resolution (http://svn.apache.org)
 but I am able to ping that IP address from a web browser.
 Please help me resolve this issue, or else please suggest another way to 
 download Hive.
 Thanks in advance.
 -Sangita

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1728) Problem while downloading Hive from Apche site

2010-12-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-1728.
---

Resolution: Duplicate

Duplicate of HIVE-1727

 Problem while downloading Hive from Apche site
 --

 Key: HIVE-1728
 URL: https://issues.apache.org/jira/browse/HIVE-1728
 Project: Hive
  Issue Type: Bug
 Environment: CentOS 5.4
Reporter: Sangeetha Sundar
Priority: Critical
   Original Estimate: 3h
  Remaining Estimate: 3h

 Hi,
 I am trying to download Hive as specified on the Apache site and am getting the 
 following error:
 [had...@system9 ~]$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk
 svn: PROPFIND request failed on '/repos/asf/hadoop/hive/trunk'
 svn: PROPFIND of '/repos/asf/hadoop/hive/trunk': Could not resolve hostname 
 `svn.apache.org': Temporary failure in name resolution (http://svn.apache.org)
 but I am able to ping that IP address from a web browser.
 Please help me resolve this issue, or else please suggest another way to 
 download Hive.
 Thanks in advance.
 -Sangita

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1778) simultaneously launched queries collide on hive intermediate directories

2010-12-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned HIVE-1778:
-

Assignee: Edward Capriolo

 simultaneously launched queries collide on hive intermediate directories
 

 Key: HIVE-1778
 URL: https://issues.apache.org/jira/browse/HIVE-1778
 Project: Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: Edward Capriolo

 we saw one instance of multiple queries for the same user launched in 
 parallel (from a workflow engine) use the same intermediate directories, 
 which is obviously super bad but not surprising considering how we allocate 
 them:
    Random rand = new Random();
    String executionId = "hive_" + format.format(new Date()) + "_" + 
 Math.abs(rand.nextLong());
 Java documentation says: "Two Random objects created within the same 
 millisecond will have the same sequence of random numbers."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist

2010-12-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1763:
-

Status: Open  (was: Patch Available)

 drop table (or view) should issue warning if table doesn't exist
 

 Key: HIVE-1763
 URL: https://issues.apache.org/jira/browse/HIVE-1763
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: dan f
Assignee: Paul Butler
Priority: Minor
 Attachments: HIVE-1763.patch


 drop table reports OK even if the table doesn't exist.  Better to report 
 something like mysql's "Unknown table 'foo'" so that, e.g., unwanted tables 
 (especially ones with names prone to typos) don't persist.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968835#action_12968835
 ] 

Namit Jain commented on HIVE-1763:
--

However, it will require a lot of test result files to be updated.
Most of the tests will break.

 drop table (or view) should issue warning if table doesn't exist
 

 Key: HIVE-1763
 URL: https://issues.apache.org/jira/browse/HIVE-1763
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: dan f
Assignee: Paul Butler
Priority: Minor
 Attachments: HIVE-1763.patch


 drop table reports OK even if the table doesn't exist.  Better to report 
 something like mysql's "Unknown table 'foo'" so that, e.g., unwanted tables 
 (especially ones with names prone to typos) don't persist.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1303) Adding/selecting many external partitions tables in one session eventually fails

2010-12-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-1303.
---

Resolution: Won't Fix

This was solved by doing connection pooling at the JPOX/DataNucleus level.

 Adding/selecting many external partitions tables in one session eventually 
 fails
 

 Key: HIVE-1303
 URL: https://issues.apache.org/jira/browse/HIVE-1303
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Priority: Critical

 echo "create external table if not exists edtest ( dat string ) partitioned 
 by (dummy string) location '/tmp/a';" > test.q
 for i in {1..3000} ; do echo "alter table ed_test add partition 
 (dummy='${i}') location '/tmp/duh';" ; done >> test.q
 hive -f test.q
 Also, there are problems working with this type of table as well. :(
 $ hive -e "explain select * from X_action" 
 Hive history file=/tmp/XX/hive_job_log_media6_201004121029_170696698.txt
 FAILED: Error in semantic analysis: javax.jdo.JDODataStoreException: Access 
 denied for user 'hivadm'@'XX' (using password: YES)
 NestedThrowables:
 java.sql.SQLException: Access denied for user 'hivadm'@'XX' (using 
 password: YES)
 Interestingly enough, if we specify some partitions we can dodge this error. I 
 get the feeling that the select * is trying to select too many partitions and 
 is causing this error.
 2010-04-12 10:33:02,789 ERROR metadata.Hive (Hive.java:getPartition(629)) - 
 javax.jdo.JDODataStoreException: Access denied for user 'hivadm'@'rs01
 .sd.pl.pvt' (using password: YES)
 at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:289)
 at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:274)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:551)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getMPartition(ObjectStore.java:716)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:704)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:593)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:418)
 at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:620)
 at 
 org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:215)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:4883)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5224)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
 at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:44)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
 at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:251)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 NestedThrowablesStackTrace:
 java.sql.SQLException: Access denied for user 
 'hivadm'@'X.domain.whatetever' (using password: YES)
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:946)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2985)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:885)
 at com.mysql.jdbc.MysqlIO.secureAuth411(MysqlIO.java:3436)
 at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1247)
 at com.mysql.jdbc.Connection.createNewIO(Connection.java:2775)
 at com.mysql.jdbc.Connection.init(Connection.java:1555)
 at 
 com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:285)
 at 
 org.datanucleus.store.rdbms.datasource.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:142)
 at 
 org.datanucleus.store.rdbms.datasource.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:118)
 at 
 org.datanucleus.store.rdbms.ConnectionProviderPriorityList.getConnection(ConnectionProviderPriorityList.java:59)
 at 
 

[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968841#action_12968841
 ] 

Namit Jain commented on HIVE-1648:
--

@Yongqiang, you have missed the test changes in the patch - can you add them 
also ?

 Automatically gathering stats when reading a table/partition
 

 Key: HIVE-1648
 URL: https://issues.apache.org/jira/browse/HIVE-1648
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Paul Butler
 Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.4.patch, 
 HIVE-1648.patch, hive-1648.svn.patch


 HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' for 
 gathering stats. This requires an additional scan of the data. Stats gathering 
 can be piggy-backed on the TableScanOperator whenever a table/partition is 
 scanned (given no LIMIT operator). 
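As a rough illustration of the piggy-back idea (hypothetical names, not the 
HIVE-1648 patch itself): the scan operator counts rows as they stream through and 
publishes the count when the operator closes.

{code}
// Hypothetical sketch of piggy-backed stats gathering; not the actual patch.
public class ScanWithStatsSketch {
  /** Hypothetical destination for the gathered statistics. */
  public interface StatsSink {
    void publish(String partitionId, String statName, long value);
  }

  private final boolean gatherStats; // would be false when e.g. a LIMIT is present
  private long rowCount = 0;

  public ScanWithStatsSketch(boolean gatherStats) {
    this.gatherStats = gatherStats;
  }

  // called once per row forwarded by the scan
  public void process(Object row) {
    if (gatherStats) {
      rowCount++;
    }
    // ... forward the row to child operators as usual ...
  }

  // called when the operator closes: publish whatever was counted
  public void close(StatsSink sink, String partitionId) {
    if (gatherStats) {
      sink.publish(partitionId, "numRows", rowCount);
    }
  }
}
{code}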

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1508) Add cleanup method to HiveHistory class

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968845#action_12968845
 ] 

Namit Jain commented on HIVE-1508:
--

+1

 Add cleanup method to HiveHistory class
 ---

 Key: HIVE-1508
 URL: https://issues.apache.org/jira/browse/HIVE-1508
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Anurag Phadke
Assignee: Edward Capriolo
Priority: Blocker
 Fix For: 0.7.0

 Attachments: hive-1508-1-patch.txt


 Running the hive server for a long time (> 90 minutes) results in too many open 
 file handles, eventually causing the server to crash as it runs out 
 of file handles.
 Actual bug as described by Carl Steinbach:
 the hive_job_log_* files are created by the HiveHistory class. This class 
 creates a PrintWriter for writing to the file, but never closes the writer. 
 It looks like we need to add a cleanup method to HiveHistory that closes the 
 PrintWriter and does any other necessary cleanup. 
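A minimal sketch of the kind of cleanup being described (assumed field and method 
names, not the attached patch): close the PrintWriter when the session ends so the 
file handle is released.

{code}
import java.io.PrintWriter;

// Hypothetical sketch of a history writer with an explicit cleanup method.
public class HistoryWriterSketch {
  private PrintWriter histStream; // opened on the hive_job_log_* file

  public void log(String line) {
    if (histStream != null) {
      histStream.println(line);
      histStream.flush();
    }
  }

  // the missing piece described above: release the file handle
  public void closeStream() {
    if (histStream != null) {
      histStream.close();
      histStream = null;
    }
  }
}
{code}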

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist

2010-12-07 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968855#action_12968855
 ] 

John Sichi commented on HIVE-1763:
--

See HIVE-1542 for my suggested approach.


 drop table (or view) should issue warning if table doesn't exist
 

 Key: HIVE-1763
 URL: https://issues.apache.org/jira/browse/HIVE-1763
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: dan f
Assignee: Paul Butler
Priority: Minor
 Attachments: HIVE-1763.patch


 drop table reports OK even if the table doesn't exist.  Better to report 
 something like mysql's "Unknown table 'foo'" so that, e.g., unwanted tables 
 (especially ones with names prone to typos) don't persist.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1778) simultaneously launched queries collide on hive intermediate directories

2010-12-07 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968859#action_12968859
 ] 

Joydeep Sen Sarma commented on HIVE-1778:
-

whatever works - we could hash the query string and the time (perhaps from a 
nanosecond timer) to come up with a better seed for the random generator, for 
example.
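One hedged way to read that suggestion (illustrative only, not the eventual fix): 
fold the query text and a nanosecond timestamp into the seed, or sidestep seeding 
entirely with a random UUID.

{code}
import java.util.Date;
import java.util.Random;
import java.util.UUID;

public class ExecutionIdSketch {
  // Option 1: seed the Random with more entropy than the default millisecond clock.
  public static String seededId(String queryString) {
    long seed = System.nanoTime() ^ queryString.hashCode();
    Random rand = new Random(seed);
    // the real code formats a timestamp here; a raw millisecond value is used
    // only to keep the sketch short
    return "hive_" + new Date().getTime() + "_" + Math.abs(rand.nextLong());
  }

  // Option 2: avoid seeding issues altogether with a random UUID.
  public static String uuidId() {
    return "hive_" + UUID.randomUUID().toString();
  }
}
{code}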

 simultaneously launched queries collide on hive intermediate directories
 

 Key: HIVE-1778
 URL: https://issues.apache.org/jira/browse/HIVE-1778
 Project: Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: Edward Capriolo

 we saw one instance of multiple queries for the same user launched in 
 parallel (from a workflow engine) use the same intermediate directories, 
 which is obviously super bad but not surprising considering how we allocate 
 them:
    Random rand = new Random();
    String executionId = "hive_" + format.format(new Date()) + "_" + 
 Math.abs(rand.nextLong());
 Java documentation says: "Two Random objects created within the same 
 millisecond will have the same sequence of random numbers."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1834) more debugging for locking

2010-12-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1834:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed! Thanks Namit!

 more debugging for locking
 --

 Key: HIVE-1834
 URL: https://issues.apache.org/jira/browse/HIVE-1834
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1834.1.patch


 Along with the time and the queryid, it might be a good idea to log if the 
 lock was acquired explicitly (by a lock command)
 or implicitly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-97) tab completion for hive cli

2010-12-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-97.
-

Resolution: Duplicate

This is solved and is being enhanced by HIVE-1835.

 tab completion for hive cli
 ---

 Key: HIVE-97
 URL: https://issues.apache.org/jira/browse/HIVE-97
 Project: Hive
  Issue Type: Improvement
  Components: Clients, Documentation
Reporter: Pete Wyckoff

 jline provides a framework for implementing tab completion. If one can 
 somehow enumerate the grammar in a way that jline understands, this would 
 improve usability a lot.
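For example, with a jline 0.9-style API (ConsoleReader/SimpleCompletor; the keyword 
list below is just illustrative, a real completer would enumerate the grammar, 
built-in functions and table names), a completer can be registered roughly like this:

{code}
import jline.ConsoleReader;
import jline.SimpleCompletor;

public class CliCompletionSketch {
  public static void main(String[] args) throws Exception {
    ConsoleReader reader = new ConsoleReader();
    // illustrative keyword list only
    String[] keywords = {"SELECT", "FROM", "WHERE", "GROUP", "BY", "ORDER", "LIMIT"};
    reader.addCompletor(new SimpleCompletor(keywords));
    String line;
    while ((line = reader.readLine("hive> ")) != null) {
      System.out.println("got: " + line);
    }
  }
}
{code}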

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1838) Add quickLZ compression codec for Hive.

2010-12-07 Thread He Yongqiang (JIRA)
Add quickLZ compression codec for Hive.
---

 Key: HIVE-1838
 URL: https://issues.apache.org/jira/browse/HIVE-1838
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1835) Better auto-complete for Hive

2010-12-07 Thread Paul Butler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Butler updated HIVE-1835:
--

Attachment: HIVE-1835.2.patch

Fixed missing file

 Better auto-complete for Hive
 -

 Key: HIVE-1835
 URL: https://issues.apache.org/jira/browse/HIVE-1835
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Reporter: Paul Butler
Assignee: Paul Butler
Priority: Minor
 Attachments: HIVE-1835.2.patch, HIVE-1835.patch


 - Add functions and keywords to auto-complete list
 - Make Hive auto-complete aware of Hive delimiters (eg. whitespace, 
 parentheses)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1837) optional timeout for hive clients

2010-12-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1837:
---

Attachment: hive-1837.1.patch

An initial patch; will do more tests in our env.

 optional timeout for hive clients
 -

 Key: HIVE-1837
 URL: https://issues.apache.org/jira/browse/HIVE-1837
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive-1837.1.patch


 It would be a good idea to have an optional timeout for hive clients.
 We encountered a query today, which seemed to have run by mistake, and it was 
 running for about a month.
 This was holding zookeeper locks, and making the whole debugging more complex 
 than it should be.
 It would be a good idea to have a timeout for a hive client.
 @Ning, I remember there was some issue with the Hive client having a timeout 
 of 1 day with HiPal.
 Do you remember the details ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1839) Error message for Both Left and Right Aliases Encountered in Join time cites wrong row/col

2010-12-07 Thread Adam Kramer (JIRA)
Error message for Both Left and Right Aliases Encountered in Join time cites 
wrong row/col


 Key: HIVE-1839
 URL: https://issues.apache.org/jira/browse/HIVE-1839
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Adam Kramer


In all cases of the above error, the error message looks like this:

FAILED: Error in semantic analysis: line 0:-1 Both Left and Right Aliases 
Encountered in Join time

...the 0:-1 is incorrect. This should provide the row and the column number.

Ideally, it would also provide the textual left and right aliases so that the 
user could identify which aliases are encountered where, since this is rarely 
obvious.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-12-07 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968907#action_12968907
 ] 

Ning Zhang commented on HIVE-1526:
--

Thanks Ashutosh and Carl! The changes look good and all unit tests have passed. 
However, there are conflicts after another JIRA was committed. Carl, can you 
please regenerate the patch yet another time? I'll try my best to test and 
commit ASAP to avoid conflicts again. 

 Hive should depend on a release version of Thrift
 -

 Key: HIVE-1526
 URL: https://issues.apache.org/jira/browse/HIVE-1526
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure, Clients
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0

 Attachments: compile.err, HIVE-1526-complete.4.patch.txt, 
 HIVE-1526-complete.5.patch.txt, HIVE-1526-complete.6.patch.txt, 
 HIVE-1526-complete.7.patch.txt, HIVE-1526-no-codegen.3.patch.txt, 
 HIVE-1526-no-codegen.4.patch.txt, HIVE-1526-no-codegen.5.patch.txt, 
 HIVE-1526-no-codegen.6.patch.txt, HIVE-1526-no-codegen.7.patch.txt, 
 HIVE-1526.2.patch.txt, HIVE-1526.3.patch.txt, hive-1526.txt, libfb303.jar, 
 libthrift.jar, serde2_test.patch, svn_rm.sh, test.log, thrift-0.5.0.jar, 
 thrift-fb303-0.5.0.jar


 Hive should depend on a release version of Thrift, and ideally it should use 
 Ivy to resolve this dependency.
 The Thrift folks are working on adding Thrift artifacts to a maven repository 
 here: https://issues.apache.org/jira/browse/THRIFT-363

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES

2010-12-07 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1836:
-

Attachment: HIVE-1836.patch

Attaching HIVE-1836.patch that addresses both HIVE-1821 (DESC DATABASE) and 
1836 (CREATE DATABASE WITH DBPROPERTIES). 

 Extend the CREATE DATABASE command with DBPROPERTIES
 

 Key: HIVE-1836
 URL: https://issues.apache.org/jira/browse/HIVE-1836
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1836.patch


 We should be able to assign key-value pairs of properties to Hive databases. 
 The proposed syntax is similar to the CREATE TABLE and CREATE INDEX commands:
 {code}
 CREATE DATABASE DB_NAME WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 
 'value2');
 {code}
 The 
 {code}
 DESC DATABASE EXTENDED DB_NAME;
 {code}
 should be able to display the properties. (requires HIVE-1821)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES

2010-12-07 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1836:
-

Status: Patch Available  (was: Open)

 Extend the CREATE DATABASE command with DBPROPERTIES
 

 Key: HIVE-1836
 URL: https://issues.apache.org/jira/browse/HIVE-1836
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1836.patch


 We should be able to assign key-value pairs of properties to Hive databases. 
 The proposed syntax is similar to the CREATE TABLE and CREATE INDEX commands:
 {code}
 CREATE DATABASE DB_NAME WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 
 'value2');
 {code}
 The 
 {code}
 DESC DATABASE EXTENDED DB_NAME;
 {code}
 should be able to display the properties. (requires HIVE-1821)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1821) describe database command

2010-12-07 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968915#action_12968915
 ] 

Ning Zhang commented on HIVE-1821:
--

A patch is uploaded to HIVE-1836 that addresses this issue.

 describe database command
 -

 Key: HIVE-1821
 URL: https://issues.apache.org/jira/browse/HIVE-1821
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang

 a describe (extended) database command would be helpful if we introduce 
 parameters associated with databases. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1837) optional timeout for hive clients

2010-12-07 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1837:
---

Attachment: hive-1837.2.patch

A new patch after some tests in the cluster.

 optional timeout for hive clients
 -

 Key: HIVE-1837
 URL: https://issues.apache.org/jira/browse/HIVE-1837
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive-1837.1.patch, hive-1837.2.patch


 It would be a good idea to have an optional timeout for hive clients.
 We encountered a query today, which seemed to have run by mistake, and it was 
 running for about a month.
 This was holding zookeeper locks, and making the whole debugging more complex 
 than it should be.
 It would be a good idea to have a timeout for a hive client.
 @Ning, I remember there was some issue with the Hive client having a timeout 
 of 1 day with HiPal.
 Do you remember the details ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1821) describe database command

2010-12-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1821.
--

Resolution: Duplicate

Duplicate of HIVE-1836

 describe database command
 -

 Key: HIVE-1821
 URL: https://issues.apache.org/jira/browse/HIVE-1821
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang

 a describe (extended) database command would be helpful if we introduce 
 parameters associated with databases. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1821) describe database command

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968939#action_12968939
 ] 

Namit Jain commented on HIVE-1821:
--

If you are doing this, do you want to add an 'alter database' command as well?

 describe database command
 -

 Key: HIVE-1821
 URL: https://issues.apache.org/jira/browse/HIVE-1821
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang

 a describe (extended) database command would be helpful if we introduce 
 parameters associated with databases. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES

2010-12-07 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968961#action_12968961
 ] 

Ning Zhang commented on HIVE-1836:
--

Yes, I will also add 'alter database' in a follow-up JIRA. One question is whether, 
if we alter the current database, the HiveConf parameters should be changed 
accordingly. Since 'alter database' is not a blocking issue yet, I'm working on 
HIVE-1820 first and will then come back to that. 

 Extend the CREATE DATABASE command with DBPROPERTIES
 

 Key: HIVE-1836
 URL: https://issues.apache.org/jira/browse/HIVE-1836
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1836.patch


 We should be able to assign key-value pairs of properties to Hive databases. 
 The proposed syntax is similar to the CREATE TABLE and CREATE INDEX commands:
 {code}
 CREATE DATABASE DB_NAME WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 
 'value2');
 {code}
 The 
 {code}
 DESC DATABASE EXTENDED DB_NAME;
 {code}
 should be able to display the properties. (requires HIVE-1821)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1837) optional timeout for hive clients

2010-12-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969000#action_12969000
 ] 

Ashutosh Chauhan commented on HIVE-1837:


You get this feature for free when you move to secure Hadoop. A mapreduce job 
by default gets a token which expires in 24 hrs, so an MR job spawned by a Hive 
query will usually fail after that time. A job may request renewal for up to 7 
days; beyond that, special provisions are required. So a timeout is inherently 
built into secure Hadoop.

 optional timeout for hive clients
 -

 Key: HIVE-1837
 URL: https://issues.apache.org/jira/browse/HIVE-1837
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive-1837.1.patch, hive-1837.2.patch


 It would be a good idea to have an optional timeout for hive clients.
 We encountered a query today, which seemed to have run by mistake, and it was 
 running for about a month.
 This was holding zookeeper locks, and making the whole debugging more complex 
 than it should be.
 It would be a good idea to have a timeout for a hive client.
 @Ning, I remember there was some issue with the Hive client having a timeout 
 of 1 day with HiPal.
 Do you remember the details ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969037#action_12969037
 ] 

Namit Jain commented on HIVE-1836:
--

+1

 Extend the CREATE DATABASE command with DBPROPERTIES
 

 Key: HIVE-1836
 URL: https://issues.apache.org/jira/browse/HIVE-1836
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1836.patch


 We should be able to assign key-value pairs of properties to Hive databases. 
 The proposed syntax is similar to the CREATE TABLE and CREATE INDEX commands:
 {code}
 CREATE DATABASE DB_NAME WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 
 'value2');
 {code}
 The 
 {code}
 DESC DATABASE EXTENDED DB_NAME;
 {code}
 should be able to display the properties. (requires HIVE-1821)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1096) Hive Variables

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969089#action_12969089
 ] 

Namit Jain commented on HIVE-1096:
--

Sure, that would be very useful.

Let me know if you run into any issues.

 Hive Variables
 --

 Key: HIVE-1096
 URL: https://issues.apache.org/jira/browse/HIVE-1096
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
 hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-15.patch.txt, 
 hive-1096-15.patch.txt, hive-1096-2.diff, hive-1096-20.patch.txt, 
 hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff


 From the mailing list:
 "The Amazon Elastic MapReduce version of Hive seems to have a nice feature 
 called Variables. Basically you can define a variable via the command line 
 while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
 ${DT} within the hive queries. This could be extremely useful. I can't seem 
 to find this feature even on trunk. Is this feature currently anywhere in the 
 roadmap?"
 This could be implemented in many places.
 A simple place to put this is in Driver.compile or Driver.run: we can do string 
 substitutions at that level, and further downstream need not be affected. 
 There could be some benefits to doing this further downstream (parser, plan), 
 but based on the simple needs we may not need to overthink this.
 I will get started on implementing this in compile unless someone wants to 
 discuss it more.
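A hedged sketch of the string-substitution idea (names are illustrative; this is 
not the committed HIVE-1096 implementation): expand ${VAR} references from a map 
before handing the query string to the compiler.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableSubstitutionSketch {
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  // replace ${NAME} with vars.get(NAME), leaving unknown variables untouched
  public static String substitute(String query, Map<String, String> vars) {
    Matcher m = VAR.matcher(query);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = vars.get(m.group(1));
      m.appendReplacement(out,
          Matcher.quoteReplacement(value != null ? value : m.group(0)));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> vars = new HashMap<String, String>();
    vars.put("DT", "2009-12-09");
    // prints: SELECT * FROM logs WHERE dt = '2009-12-09'
    System.out.println(substitute("SELECT * FROM logs WHERE dt = '${DT}'", vars));
  }
}
{code}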

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1837) optional timeout for hive clients

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969093#action_12969093
 ] 

Namit Jain commented on HIVE-1837:
--

@Ashutosh, we can't wait for this feature until secure Hadoop is available.
Once Hive is migrated to that, we can change the implementation of this feature.

@Yongqiang, can you add the new parameter definition in hive-default.xml?
Also, can you make the thread sleep time (10 min.) configurable?
Can you also add a new test for this - I mean, one with a very small timeout and 
thread sleep time, and a custom script which sleeps indefinitely?
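For what it's worth, a hedged sketch of the kind of client-side watchdog being 
discussed (the parameter values and the exit path are assumptions, not the attached 
patch): a daemon thread wakes up periodically and aborts the client once the 
configured timeout has passed.

{code}
// Hypothetical sketch of a configurable client-timeout watchdog.
public class ClientTimeoutSketch {
  public static Thread startWatchdog(final long timeoutMs, final long sleepMs) {
    final long deadline = System.currentTimeMillis() + timeoutMs;
    Thread t = new Thread(new Runnable() {
      public void run() {
        while (true) {
          try {
            Thread.sleep(sleepMs); // e.g. the 10-minute check interval
          } catch (InterruptedException e) {
            return; // watchdog cancelled
          }
          if (System.currentTimeMillis() > deadline) {
            System.err.println("Client timed out; exiting.");
            System.exit(-1); // mirrors the System.exit(-1) mentioned later in this thread
          }
        }
      }
    });
    t.setDaemon(true);
    t.start();
    return t;
  }
}
{code}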



 optional timeout for hive clients
 -

 Key: HIVE-1837
 URL: https://issues.apache.org/jira/browse/HIVE-1837
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive-1837.1.patch, hive-1837.2.patch


 It would be a good idea to have an optional timeout for hive clients.
 We encountered a query today, which seemed to have run by mistake, and it was 
 running for about a month.
 This was holding zookeeper locks, and making the whole debugging more complex 
 than it should be.
 It would be a good idea to have a timeout for a hive client.
 @Ning, I remember there was some issue with the Hive client having a timeout 
 of 1 day with HiPal.
 Do you remember the details ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1415) add CLI command for executing a SQL script

2010-12-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned HIVE-1415:
-

Assignee: Edward Capriolo

 add CLI command for executing a SQL script
 --

 Key: HIVE-1415
 URL: https://issues.apache.org/jira/browse/HIVE-1415
 Project: Hive
  Issue Type: Improvement
  Components: Clients
Affects Versions: 0.5.0
Reporter: John Sichi
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: hive-1415-1-patch.txt


 Suggestion in HIVE-1405 was source, e.g.
 source somescript.sql;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1837) optional timeout for hive clients

2010-12-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969141#action_12969141
 ] 

He Yongqiang commented on HIVE-1837:


It is still very difficult to add a testcase, because there is a 
System.exit(-1) in the monitor thread: the test process will exit.

 optional timeout for hive clients
 -

 Key: HIVE-1837
 URL: https://issues.apache.org/jira/browse/HIVE-1837
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive-1837.1.patch, hive-1837.2.patch


 It would be a good idea to have an optional timeout for hive clients.
 We encountered a query today, which seemed to have run by mistake, and it was 
 running for about a month.
 This was holding zookeeper locks, and making the whole debugging more complex 
 than it should be.
 It would be a good idea to have a timeout for a hive client.
 @Ning, I remember there was some issue with the Hive client having a timeout 
 of 1 day with HiPal.
 Do you remember the details ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1838) Add quickLZ compression codec for Hive.

2010-12-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969144#action_12969144
 ] 

He Yongqiang commented on HIVE-1838:


No, I mean a compression codec for Hive. It could be used to compress 
intermediate data.

Here are some results:

5. Hadoop compression with native library (COMPRESSLEVEL=BEST_SPEED)
time java 
-Djava.library.path=/data/users/heyongqiang/hadoop-0.20/build/native/Linux-amd64-64/lib/
 CompressFile

real    0m34.179s
user    0m29.031s
sys     0m1.607s

compressed size: 275M

6. LZF
[heyongqi...@dev782 compress_test]$ time lzf -c 00_0 

real    0m39.031s
user    0m8.727s
sys     0m2.231s
compressed size: 393M

7. FastLZ
time fastlz/6pack -1 00_0 00_0.fastlz
real    0m19.020s
user    0m18.083s
sys     0m0.935s

compressed size: 391M

8.QuickLZ
time ./compress_file ../00_0 ../00_0.quicklz

real    0m15.652s
user    0m14.047s
sys     0m1.603s

compressed size: 334M

I modified QuickLZ's compress_file code to use a buffer for fairness. It turns 
out the result is very close to FastLZ. The modified version of QuickLZ is just 
one second better.


 Add quickLZ compression codec for Hive.
 ---

 Key: HIVE-1838
 URL: https://issues.apache.org/jira/browse/HIVE-1838
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1837) optional timeout for hive clients

2010-12-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969146#action_12969146
 ] 

Ashutosh Chauhan commented on HIVE-1837:


One way to get around the System.exit() problem in a testcase is to create your 
own SecurityManager and use that. In your SecurityManager, override checkExit() 
and throw an exception. This way, whenever System.exit() is encountered, an 
exception will be thrown. In your testcase you can catch the exception and then 
do the asserts that you want. I did very similar things while writing junit 
tests for Howl. 
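A minimal self-contained sketch of that technique (the exception type and names are 
arbitrary): install a SecurityManager whose checkExit() throws, run the code under 
test, catch the exception, then restore the original manager.

{code}
// Sketch: trap System.exit() in a test by overriding SecurityManager.checkExit().
public class NoExitSketch {

  /** Thrown instead of letting the JVM exit. */
  public static class ExitTrappedException extends SecurityException {
    public final int status;
    public ExitTrappedException(int status) {
      this.status = status;
    }
  }

  public static void main(String[] args) {
    SecurityManager original = System.getSecurityManager();
    System.setSecurityManager(new SecurityManager() {
      @Override
      public void checkExit(int status) {
        throw new ExitTrappedException(status);
      }
      @Override
      public void checkPermission(java.security.Permission perm) {
        // allow everything else, including restoring the original manager
      }
    });
    try {
      System.exit(-1); // stands in for the code under test
    } catch (ExitTrappedException e) {
      // in a JUnit test, assert on e.status here
      System.out.println("trapped exit with status " + e.status);
    } finally {
      System.setSecurityManager(original);
    }
  }
}
{code}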

 optional timeout for hive clients
 -

 Key: HIVE-1837
 URL: https://issues.apache.org/jira/browse/HIVE-1837
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive-1837.1.patch, hive-1837.2.patch


 It would be a good idea to have an optional timeout for hive clients.
 We encountered a query today, which seemed to have run by mistake, and it was 
 running for about a month.
 This was holding zookeeper locks, and making the whole debugging more complex 
 than it should be.
 It would be a good idea to have a timeout for a hive client.
 @Ning, I remember there was some issue with the Hive client having a timeout 
 of 1 day with HiPal.
 Do you remember the details ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1837) optional timeout for hive clients

2010-12-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969151#action_12969151
 ] 

He Yongqiang commented on HIVE-1837:


Cool, Thanks Ashutosh! I will try that.

 optional timeout for hive clients
 -

 Key: HIVE-1837
 URL: https://issues.apache.org/jira/browse/HIVE-1837
 Project: Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive-1837.1.patch, hive-1837.2.patch


 It would be a good idea to have an optional timeout for hive clients.
 We encountered a query today, which seemed to have run by mistake, and it was 
 running for about a month.
 This was holding zookeeper locks, and making the whole debugging more complex 
 than it should be.
 It would be a good idea to have a timeout for a hive client.
 @Ning, I remember there was some issue with the Hive client having a timeout 
 of 1 day with HiPal.
 Do you remember the details ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1694) Accelerate query execution using indexes

2010-12-07 Thread Prajakta Kalmegh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969186#action_12969186
 ] 

Prajakta Kalmegh commented on HIVE-1694:


Hi,

I am Prajakta from Persistent Systems Ltd. and am working on the changes that 
John and Namit have suggested above along with Nikhil and Prafulla.
This is a design note about implementation of above review comments.

We're implementing the following changes as a single transformation in 
optimizer:
(1) Table replacement: involves modification of some internal members of 
TableScanOperator.
(2) Group by removal: involves removal of some operators (GBY-RS-GBY) where the 
GBY is done on both the mapper and reducer side, and re-setting the correct parent 
and child operators within the DAG.
(3) Sub-query insertion: involves creation of new DAG for sub-query and 
attaching it to the original DAG at an appropriate place.
(4) Projection modification: involves steps similar to (3).

We have implemented the above changes as a proof of concept. In this 
implementation, we have invoked this rule as the first transformation in the 
optimizer code so that lineage information is computed later as part of the 
Generator transformation. Another reason that we have applied this as the first 
transformation is that, as of now, the implementation uses the query block (QB) 
information to decide if the transformation can be applied for the input query 
(similar to the canApplyThisRule() method in the original rewrite code). 
Finally, to make the changes (3) and (4), we are modifying the operator DAG. 
However, we are not modifying the original query block (QB). Hence, this leaves 
the QB and the operator DAG out of sync.

The major issues in this implementation approach are:
1. The changes listed above require either modification of the operator DAG (in 
case of 2) or creation of a new operator DAG (in case of 3 and 4). The 
implementation requires some hacks in the SemanticAnalyzer code if we create a 
new DAG (as in the case of the replaceViewReferenceWithDefinition() method, which 
uses ParseDriver() to do the same). However, the methods are private (like 
genBodyPlan(...), genSelectPlan(...) etc.), making it all the more difficult to 
implement changes (3) and (4) without access to these methods.
2. The creation of a new DAG will require creating all associated data structures 
like QB, ASTNode, etc., as this information is necessary to generate the operator 
DAG plan for the sub-queries. This approach would be very similar to what we are 
already doing in the rewrite, i.e. creating a new QB and ASTNode. 
3. Since we are creating a new DAG and appending it to the enclosing query DAG, 
we also need to modify the row schema and row resolvers for the operators.

One of the underlying questions before finalizing the above approach is whether 
the cost-based optimizer, which is to be implemented in the future, will work on 
the query block or on the operator DAG. In case it works on the operator DAG, 
the implementation changes we listed here are bound to be needed. However, if the 
cost-based optimizer is to work on the query block, then we feel that the initial 
query rewrite engine code, which worked after semantic analysis but before plan 
generation, can be made to work with the cost-based optimizer. It would be 
valuable input from your side if you could comment on the cost-based optimizer.


 Accelerate query execution using indexes
 

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Nikhil Deshpande
 Attachments: demo_q1.hql, demo_q2.hql, HIVE-1694_2010-10-28.diff


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in the optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to 

[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM

2010-12-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969203#action_12969203
 ] 

Namit Jain commented on HIVE-1830:
--

  if (groupByOp.getConf() == null) {
    System.out.println("Group by desc is null");
    return null;
  }


This should never happen.


GroupByOperator:
  memoryThreshold = HiveConf.getFloatVar(hconf,
      HiveConf.ConfVars.HIVEMAPAGGRMEMORYTHRESHOLD);


This should also be in groupByDesc.
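In other words (a hypothetical sketch, not Hive's actual classes): the compiler 
would set the threshold on the descriptor once, and the operator would read it 
from there instead of going back to HiveConf.

{code}
// Hypothetical sketch: carry the map-aggregation memory threshold in the
// group-by descriptor rather than re-reading HiveConf inside the operator.
class GroupByDescSketch {
  private float memoryThreshold;

  float getMemoryThreshold() {
    return memoryThreshold;
  }

  void setMemoryThreshold(float t) {
    memoryThreshold = t;
  }
}

class GroupByOperatorSketch {
  private final GroupByDescSketch conf;

  GroupByOperatorSketch(GroupByDescSketch conf) {
    this.conf = conf;
  }

  void initialize() {
    // value populated by the compiler from HIVEMAPAGGRMEMORYTHRESHOLD
    float memoryThreshold = conf.getMemoryThreshold();
    System.out.println("map-aggr memory threshold = " + memoryThreshold);
  }
}
{code}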



 mappers in group followed by joins may die OOM
 --

 Key: HIVE-1830
 URL: https://issues.apache.org/jira/browse/HIVE-1830
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Liyin Tang
 Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, 
 hive-1830-4.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1820) Make Hive database data center aware

2010-12-07 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1820:
-

Attachment: HIVE-1820.patch

Attaching HIVE-1820.patch for review. 

 Make Hive database data center aware
 

 Key: HIVE-1820
 URL: https://issues.apache.org/jira/browse/HIVE-1820
 Project: Hive
  Issue Type: New Feature
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1820.patch


 In order to support multiple data centers (different DFS, MR clusters) for 
 hive, it is desirable to extend Hive database to be data center aware. 
 Currently a Hive database is a logical concept and has no DFS or MR cluster 
 info associated with it. A database has a location property indicating the 
 default warehouse directory, but the user cannot specify or change it. In order 
 to make it data center aware, the following info needs to be maintained:
 1) data warehouse root location which is the default HDFS location for newly 
 created tables (default=hive.metadata.warehouse.dir).
 2) scratch dir which is the HDFS location where MR intermediate files are 
 created (default=hive.exec.scratch.dir)
 3) MR job tracker URI that jobs should be submitted to 
 (default=mapred.job.tracker)
 4) hadoop (bin) dir ($HADOOP_HOME/bin/hadoop)
 These parameters should be saved in database.parameters as (key, value) pairs, and 
 they overwrite the jobconf parameters (so if the default database has no 
 parameters it will get them from hive-default.xml or hive-site.xml as it does 
 now). 
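A hedged sketch of the overlay behavior described above (hypothetical names): 
database-level parameters, when present, take precedence over the corresponding 
jobconf/hive-site values.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of overlaying per-database parameters on the session conf.
public class DbParamOverlaySketch {
  public static Map<String, String> effectiveConf(Map<String, String> jobConf,
                                                  Map<String, String> dbParams) {
    Map<String, String> merged = new HashMap<String, String>(jobConf);
    merged.putAll(dbParams); // database-level values win when both are set
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> jobConf = new HashMap<String, String>();
    jobConf.put("mapred.job.tracker", "default-jt:8021");
    Map<String, String> dbParams = new HashMap<String, String>();
    dbParams.put("mapred.job.tracker", "dc2-jt:8021");
    // prints {mapred.job.tracker=dc2-jt:8021}
    System.out.println(effectiveConf(jobConf, dbParams));
  }
}
{code}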

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.