[jira] Commented: (HIVE-1675) SAXParseException on plan.xml during local mode.
[ https://issues.apache.org/jira/browse/HIVE-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968759#action_12968759 ]

Bennie Schut commented on HIVE-1675:
------------------------------------

I managed to cause this with parallel=false, so perhaps not such an interesting angle ;-) I've added some more logging to better understand the cause of this:

2010-12-07 15:49:44,697 INFO exec.Utilities (Utilities.java:getMapRedWork(154)) - Getting jobid:9c2eeba4-a602-4d4b-ba0b-60ce815c4ea7 from cache.
2010-12-07 15:49:44,703 INFO lzo.GPLNativeCodeLoader (GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library
2010-12-07 15:49:44,705 INFO lzo.LzoCodec (LzoCodec.java:<clinit>(72)) - Successfully loaded & initialized native-lzo library [hadoop-lzo rev c7acdaa96a7ce04538c0716fe699ffaf11836c70]
2010-12-07 15:49:44,712 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(192)) - Total input paths to process : 1
2010-12-07 15:49:44,880 INFO exec.Utilities (Utilities.java:getMapRedWork(154)) - Getting jobid:e8b2dab2-986a-4bb1-947f-00aec5b46a06 from cache.
2010-12-07 15:49:44,882 INFO exec.ExecDriver (SessionState.java:printInfo(268)) - Job running in-process (local Hadoop)
2010-12-07 15:49:44,882 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001
java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLANe8b2dab2-986a-4bb1-947f-00aec5b46a06 (No such file or directory)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:166)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:238)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:244)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:139)
Caused by: java.io.FileNotFoundException: HIVE_PLANe8b2dab2-986a-4bb1-947f-00aec5b46a06 (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at java.io.FileInputStream.<init>(FileInputStream.java:66)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:157)
    ... 3 more

The first thing I noticed that differs from a successful job is that it's trying to get a different jobid from the cache:

Getting jobid:e8b2dab2-986a-4bb1-947f-00aec5b46a06 from cache

I'm still confused.

SAXParseException on plan.xml during local mode.
------------------------------------------------
Key: HIVE-1675
URL: https://issues.apache.org/jira/browse/HIVE-1675
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.7.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Fix For: 0.7.0
Attachments: HIVE-1675.patch, local_10005_plan.xml, local_10006_plan.xml

When Hive switches to local mode (hive.exec.mode.local.auto=true) I receive a SAX parser exception on the plan.xml. If I set hive.exec.mode.local.auto=false I get the correct results.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1759) Many important broken links on Hive web page
[ https://issues.apache.org/jira/browse/HIVE-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo resolved HIVE-1759.
-----------------------------------
Resolution: Fixed

The issue was that the content of the site MUST all be checked into svn and copied into /www/hive.apache.org. I built and committed the API docs for all older releases. The site should have no broken links; the 3.0 docs will propagate in the next hour or so.

Many important broken links on Hive web page
--------------------------------------------
Key: HIVE-1759
URL: https://issues.apache.org/jira/browse/HIVE-1759
Project: Hive
Issue Type: Bug
Components: Documentation
Reporter: Jeff Hammerbacher
Assignee: Edward Capriolo

The change log links are broken, perhaps because of the move to a TLP, and the Jira issue log links all point to the 0.5 issue log. Also, all of the documentation links are broken.
[jira] Resolved: (HIVE-1823) upgrade the database thrift interface to allow parameters key-value pairs
[ https://issues.apache.org/jira/browse/HIVE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-1823.
------------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed. Thanks Ning.

upgrade the database thrift interface to allow parameters key-value pairs
-------------------------------------------------------------------------
Key: HIVE-1823
URL: https://issues.apache.org/jira/browse/HIVE-1823
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-1823.2.patch, HIVE-1823.patch

In order to store data-center-specific parameters in a Hive database, it is desirable to extend the Hive database thrift interface with a parameters map similar to Table and Partition.
[jira] Resolved: (HIVE-1727) Not able to download hive from apache site.
[ https://issues.apache.org/jira/browse/HIVE-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo resolved HIVE-1727.
-----------------------------------
Resolution: Won't Fix

That was the old location. The new location is:

svn co http://svn.apache.org/repos/asf/hive/trunk hive

The documentation in the wiki looks correct:
http://wiki.apache.org/hadoop/Hive/GettingStarted#Installation_and_Configuration

If you find the documentation wrong somewhere, feel free to re-open.

Not able to download hive from apache site.
-------------------------------------------
Key: HIVE-1727
URL: https://issues.apache.org/jira/browse/HIVE-1727
Project: Hive
Issue Type: Bug
Environment: CentOS 5.4
Reporter: Sangeetha Sundar
Priority: Critical
Original Estimate: 3h
Remaining Estimate: 3h

Hi, I am trying to download Hive as specified on the Apache site and getting the following error:

[had...@system9 ~]$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk
svn: PROPFIND request failed on '/repos/asf/hadoop/hive/trunk'
svn: PROPFIND of '/repos/asf/hadoop/hive/trunk': Could not resolve hostname `svn.apache.org': Temporary failure in name resolution (http://svn.apache.org)

but I am able to ping that IP address from a web browser. Please help me resolve this issue, or else please suggest another way to download Hive. Thanks in advance.
-Sangita
[jira] Resolved: (HIVE-1728) Problem while downloading Hive from Apche site
[ https://issues.apache.org/jira/browse/HIVE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo resolved HIVE-1728.
-----------------------------------
Resolution: Duplicate

Duplicate of HIVE-1727.

Problem while downloading Hive from Apche site
----------------------------------------------
Key: HIVE-1728
URL: https://issues.apache.org/jira/browse/HIVE-1728
Project: Hive
Issue Type: Bug
Environment: CentOS 5.4
Reporter: Sangeetha Sundar
Priority: Critical
Original Estimate: 3h
Remaining Estimate: 3h

Hi, I am trying to download Hive as specified on the Apache site and getting the following error:

[had...@system9 ~]$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk
svn: PROPFIND request failed on '/repos/asf/hadoop/hive/trunk'
svn: PROPFIND of '/repos/asf/hadoop/hive/trunk': Could not resolve hostname `svn.apache.org': Temporary failure in name resolution (http://svn.apache.org)

but I am able to ping that IP address from a web browser. Please help me resolve this issue, or else please suggest another way to download Hive. Thanks in advance.
-Sangita
[jira] Assigned: (HIVE-1778) simultaneously launched queries collide on hive intermediate directories
[ https://issues.apache.org/jira/browse/HIVE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo reassigned HIVE-1778:
-------------------------------------
Assignee: Edward Capriolo

simultaneously launched queries collide on hive intermediate directories
------------------------------------------------------------------------
Key: HIVE-1778
URL: https://issues.apache.org/jira/browse/HIVE-1778
Project: Hive
Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: Edward Capriolo

We saw one instance of multiple queries for the same user launched in parallel (from a workflow engine) use the same intermediate directories, which is obviously super bad but not surprising considering how we allocate them:

Random rand = new Random();
String executionId = "hive_" + format.format(new Date()) + "_" + Math.abs(rand.nextLong());

The Java documentation says: "Two Random objects created within the same millisecond will have the same sequence of random numbers."
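The failure mode quoted above is easy to reproduce: two java.util.Random instances constructed with the same seed emit identical sequences, so two queries starting in the same millisecond get the same executionId suffix. A minimal sketch (class and method names are illustrative, not Hive's code; the UUID variant is just one collision-resistant alternative, not the committed fix):

```java
import java.util.Random;
import java.util.UUID;

public class ExecutionIdDemo {
    // Mimics the suffix computation quoted in the issue, but with an
    // explicit seed so the same-millisecond collision is reproducible.
    public static long suffixFromSeed(long seed) {
        return Math.abs(new Random(seed).nextLong());
    }

    // One collision-resistant alternative: derive the id from a random
    // UUID rather than a time-seeded Random.
    public static String uuidExecutionId() {
        return "hive_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        long millis = System.currentTimeMillis();
        // Identical seeds yield identical "random" suffixes, i.e. two
        // queries in the same millisecond would share a directory.
        System.out.println(suffixFromSeed(millis) == suffixFromSeed(millis));
        // UUID-based ids do not collide this way.
        System.out.println(uuidExecutionId().equals(uuidExecutionId()));
    }
}
```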
[jira] Updated: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1763:
-----------------------------
Status: Open (was: Patch Available)

drop table (or view) should issue warning if table doesn't exist
----------------------------------------------------------------
Key: HIVE-1763
URL: https://issues.apache.org/jira/browse/HIVE-1763
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: dan f
Assignee: Paul Butler
Priority: Minor
Attachments: HIVE-1763.patch

drop table reports OK even if the table doesn't exist. Better to report something like mysql's "Unknown table 'foo'" so that, e.g., unwanted tables (especially ones with names prone to typos) don't persist.
[jira] Commented: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968835#action_12968835 ]

Namit Jain commented on HIVE-1763:
----------------------------------

However, it will need a lot of test result files to be updated. Most of the tests will break.

drop table (or view) should issue warning if table doesn't exist
----------------------------------------------------------------
Key: HIVE-1763
URL: https://issues.apache.org/jira/browse/HIVE-1763
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: dan f
Assignee: Paul Butler
Priority: Minor
Attachments: HIVE-1763.patch

drop table reports OK even if the table doesn't exist. Better to report something like mysql's "Unknown table 'foo'" so that, e.g., unwanted tables (especially ones with names prone to typos) don't persist.
[jira] Resolved: (HIVE-1303) Adding/selecting many external partitions tables in one session eventually fails
[ https://issues.apache.org/jira/browse/HIVE-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo resolved HIVE-1303.
-----------------------------------
Resolution: Won't Fix

This was solved by doing pooling at the JPOX/DataNucleus level.

Adding/selecting many external partitions tables in one session eventually fails
--------------------------------------------------------------------------------
Key: HIVE-1303
URL: https://issues.apache.org/jira/browse/HIVE-1303
Project: Hive
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Priority: Critical

echo "create external table if not exists ed_test ( dat string ) partitioned by (dummy string) location '/tmp/a';" > test.q
for i in {1..3000} ; do echo "alter table ed_test add partition (dummy='${i}') location '/tmp/duh';" >> test.q ; done
hive -f test.q

Also, there are problems working with this type of table as well. :(

$ hive -e "explain select * from X_action"
Hive history file=/tmp/XX/hive_job_log_media6_201004121029_170696698.txt
FAILED: Error in semantic analysis: javax.jdo.JDODataStoreException: Access denied for user 'hivadm'@'XX' (using password: YES)
NestedThrowables:
java.sql.SQLException: Access denied for user 'hivadm'@'XX' (using password: YES)

Interestingly enough, if we specify some partitions we can dodge this error. I get the feeling that the select * is trying to select too many partitions and causing this error.

2010-04-12 10:33:02,789 ERROR metadata.Hive (Hive.java:getPartition(629)) - javax.jdo.JDODataStoreException: Access denied for user 'hivadm'@'rs01.sd.pl.pvt' (using password: YES)
    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:289)
    at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:274)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:551)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMPartition(ObjectStore.java:716)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:704)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:593)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:418)
    at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:620)
    at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:215)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:4883)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5224)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:44)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
    at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:251)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
NestedThrowablesStackTrace:
java.sql.SQLException: Access denied for user 'hivadm'@'X.domain.whatetever' (using password: YES)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:946)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2985)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:885)
    at com.mysql.jdbc.MysqlIO.secureAuth411(MysqlIO.java:3436)
    at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1247)
    at com.mysql.jdbc.Connection.createNewIO(Connection.java:2775)
    at com.mysql.jdbc.Connection.<init>(Connection.java:1555)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:285)
    at org.datanucleus.store.rdbms.datasource.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:142)
    at org.datanucleus.store.rdbms.datasource.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:118)
    at org.datanucleus.store.rdbms.ConnectionProviderPriorityList.getConnection(ConnectionProviderPriorityList.java:59)
    at
[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
[ https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968841#action_12968841 ]

Namit Jain commented on HIVE-1648:
----------------------------------

@Yongqiang, you have missed the test changes in the patch - can you add them also?

Automatically gathering stats when reading a table/partition
------------------------------------------------------------
Key: HIVE-1648
URL: https://issues.apache.org/jira/browse/HIVE-1648
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Paul Butler
Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.4.patch, HIVE-1648.patch, hive-1648.svn.patch

HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator whenever a table/partition is scanned (given no LIMIT operator).
[jira] Commented: (HIVE-1508) Add cleanup method to HiveHistory class
[ https://issues.apache.org/jira/browse/HIVE-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968845#action_12968845 ]

Namit Jain commented on HIVE-1508:
----------------------------------

+1

Add cleanup method to HiveHistory class
---------------------------------------
Key: HIVE-1508
URL: https://issues.apache.org/jira/browse/HIVE-1508
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Anurag Phadke
Assignee: Edward Capriolo
Priority: Blocker
Fix For: 0.7.0
Attachments: hive-1508-1-patch.txt

Running the Hive server for a long time (> 90 minutes) results in too many open file handles, eventually causing the server to crash as it runs out of file handles. The actual bug, as described by Carl Steinbach: the hive_job_log_* files are created by the HiveHistory class. This class creates a PrintWriter for writing to the file, but never closes the writer. It looks like we need to add a cleanup method to HiveHistory that closes the PrintWriter and does any other necessary cleanup.
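The fix described above amounts to giving the history writer an explicit lifecycle: whoever opens the PrintWriter must also close it. A hedged sketch of the pattern (SessionHistory and its methods are hypothetical names for illustration, not Hive's actual HiveHistory API):

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the cleanup pattern: the class that opens the
// PrintWriter also owns closing it, so a long-running server does not
// leak one file handle per session.
public class SessionHistory implements Closeable {
    private final PrintWriter writer;

    public SessionHistory(Path logFile) throws IOException {
        this.writer = new PrintWriter(Files.newBufferedWriter(logFile));
    }

    public void log(String line) {
        writer.println(line);
        writer.flush();
    }

    @Override
    public void close() {
        writer.close(); // releases the underlying file handle
    }
}
```

A server would call close() when the session ends; scoped callers can use try-with-resources so the handle is released even on exceptions.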
[jira] Commented: (HIVE-1763) drop table (or view) should issue warning if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968855#action_12968855 ]

John Sichi commented on HIVE-1763:
----------------------------------

See HIVE-1542 for my suggested approach.

drop table (or view) should issue warning if table doesn't exist
----------------------------------------------------------------
Key: HIVE-1763
URL: https://issues.apache.org/jira/browse/HIVE-1763
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: dan f
Assignee: Paul Butler
Priority: Minor
Attachments: HIVE-1763.patch

drop table reports OK even if the table doesn't exist. Better to report something like mysql's "Unknown table 'foo'" so that, e.g., unwanted tables (especially ones with names prone to typos) don't persist.
[jira] Commented: (HIVE-1778) simultaneously launched queries collide on hive intermediate directories
[ https://issues.apache.org/jira/browse/HIVE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968859#action_12968859 ]

Joydeep Sen Sarma commented on HIVE-1778:
-----------------------------------------

Whatever works - we could hash the query string and the time (perhaps a nanosecond timer) to come up with a better seed for the random generator, for example.

simultaneously launched queries collide on hive intermediate directories
------------------------------------------------------------------------
Key: HIVE-1778
URL: https://issues.apache.org/jira/browse/HIVE-1778
Project: Hive
Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: Edward Capriolo

We saw one instance of multiple queries for the same user launched in parallel (from a workflow engine) use the same intermediate directories, which is obviously super bad but not surprising considering how we allocate them:

Random rand = new Random();
String executionId = "hive_" + format.format(new Date()) + "_" + Math.abs(rand.nextLong());

The Java documentation says: "Two Random objects created within the same millisecond will have the same sequence of random numbers."
[jira] Updated: (HIVE-1834) more debugging for locking
[ https://issues.apache.org/jira/browse/HIVE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1834:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed! Thanks Namit!

more debugging for locking
--------------------------
Key: HIVE-1834
URL: https://issues.apache.org/jira/browse/HIVE-1834
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.1834.1.patch

Along with the time and the queryid, it might be a good idea to log whether the lock was acquired explicitly (by a lock command) or implicitly.
[jira] Resolved: (HIVE-97) tab completion for hive cli
[ https://issues.apache.org/jira/browse/HIVE-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo resolved HIVE-97.
---------------------------------
Resolution: Duplicate

This is solved, and is being enhanced by HIVE-1835.

tab completion for hive cli
---------------------------
Key: HIVE-97
URL: https://issues.apache.org/jira/browse/HIVE-97
Project: Hive
Issue Type: Improvement
Components: Clients, Documentation
Reporter: Pete Wyckoff

jline provides a framework for implementing tab completion. If one can somehow enumerate the grammar in a way that jline understands, this would improve usability a lot.
[jira] Created: (HIVE-1838) Add quickLZ compression codec for Hive.
Add quickLZ compression codec for Hive.
---------------------------------------
Key: HIVE-1838
URL: https://issues.apache.org/jira/browse/HIVE-1838
Project: Hive
Issue Type: New Feature
Reporter: He Yongqiang
[jira] Updated: (HIVE-1835) Better auto-complete for Hive
[ https://issues.apache.org/jira/browse/HIVE-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Butler updated HIVE-1835:
------------------------------
Attachment: HIVE-1835.2.patch

Fixed missing file.

Better auto-complete for Hive
-----------------------------
Key: HIVE-1835
URL: https://issues.apache.org/jira/browse/HIVE-1835
Project: Hive
Issue Type: New Feature
Components: CLI
Reporter: Paul Butler
Assignee: Paul Butler
Priority: Minor
Attachments: HIVE-1835.2.patch, HIVE-1835.patch

- Add functions and keywords to the auto-complete list
- Make Hive auto-complete aware of Hive delimiters (e.g. whitespace, parentheses)
[jira] Updated: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1837:
-------------------------------
Attachment: hive-1837.1.patch

An initial patch. Will do more tests in our env.

optional timeout for hive clients
---------------------------------
Key: HIVE-1837
URL: https://issues.apache.org/jira/browse/HIVE-1837
Project: Hive
Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
Attachments: hive-1837.1.patch

It would be a good idea to have an optional timeout for Hive clients. We encountered a query today which seemed to have been run by mistake, and it had been running for about a month. This was holding zookeeper locks and making the whole debugging more complex than it should be. It would be a good idea to have a timeout for a Hive client. @Ning, I remember there was some issue with the Hive client having a timeout of 1 day with HiPal. Do you remember the details?
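Client-side, an optional timeout can be layered on by running the operation on a worker thread and bounding the wait; on timeout, interrupting the worker lets it stop and release any locks it holds. This is only an illustrative sketch of the general technique, not what the attached patch does:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: run a client operation with an optional wall-clock
// timeout, so a query left running by mistake is eventually cut off.
public class ClientTimeout {
    public static <T> T runWithTimeout(Callable<T> work, long timeoutMillis)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = pool.submit(work);
            try {
                // Bound the wait; TimeoutException fires if work overruns.
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the runaway query thread
                throw e;
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A caller would wrap the blocking query execution in the Callable and pick the timeout from configuration, with no timeout applied when the option is unset.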
[jira] Created: (HIVE-1839) Error message for Both Left and Right Aliases Encountered in Join time cites wrong row/col
Error message for Both Left and Right Aliases Encountered in Join time cites wrong row/col
------------------------------------------------------------------------------------------
Key: HIVE-1839
URL: https://issues.apache.org/jira/browse/HIVE-1839
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Adam Kramer

In all cases of the above error, the error message looks like this:

FAILED: Error in semantic analysis: line 0:-1 Both Left and Right Aliases Encountered in Join time

...the 0:-1 is incorrect. This should provide the row and the column number. Ideally, it would also provide the textual left and right aliases so that the user could identify which aliases are encountered where, since this is rarely obvious.
[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968907#action_12968907 ]

Ning Zhang commented on HIVE-1526:
----------------------------------

Thanks Ashutosh and Carl! The changes look good and all unit tests have passed. However, there are conflicts after another JIRA was committed. Carl, can you please regenerate the patch yet another time? I'll try my best to test and commit ASAP to avoid conflicts again.

Hive should depend on a release version of Thrift
-------------------------------------------------
Key: HIVE-1526
URL: https://issues.apache.org/jira/browse/HIVE-1526
Project: Hive
Issue Type: Task
Components: Build Infrastructure, Clients
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Fix For: 0.7.0
Attachments: compile.err, HIVE-1526-complete.4.patch.txt, HIVE-1526-complete.5.patch.txt, HIVE-1526-complete.6.patch.txt, HIVE-1526-complete.7.patch.txt, HIVE-1526-no-codegen.3.patch.txt, HIVE-1526-no-codegen.4.patch.txt, HIVE-1526-no-codegen.5.patch.txt, HIVE-1526-no-codegen.6.patch.txt, HIVE-1526-no-codegen.7.patch.txt, HIVE-1526.2.patch.txt, HIVE-1526.3.patch.txt, hive-1526.txt, libfb303.jar, libthrift.jar, serde2_test.patch, svn_rm.sh, test.log, thrift-0.5.0.jar, thrift-fb303-0.5.0.jar

Hive should depend on a release version of Thrift, and ideally it should use Ivy to resolve this dependency. The Thrift folks are working on adding Thrift artifacts to a Maven repository here: https://issues.apache.org/jira/browse/THRIFT-363
[jira] Updated: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1836:
-----------------------------
Attachment: HIVE-1836.patch

Attaching HIVE-1836.patch, which addresses both HIVE-1821 (DESC DATABASE) and HIVE-1836 (CREATE DATABASE WITH DBPROPERTIES).

Extend the CREATE DATABASE command with DBPROPERTIES
----------------------------------------------------
Key: HIVE-1836
URL: https://issues.apache.org/jira/browse/HIVE-1836
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-1836.patch

We should be able to assign key-value pairs of properties to Hive databases. The proposed syntax is similar to the CREATE TABLE and CREATE INDEX commands:

{code}
CREATE DATABASE DB_NAME WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 'value2');
{code}

and

{code}
DESC DATABASE EXTENDED DB_NAME;
{code}

should be able to display the properties. (requires HIVE-1821)
[jira] Updated: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1836:
-----------------------------
Status: Patch Available (was: Open)

Extend the CREATE DATABASE command with DBPROPERTIES
----------------------------------------------------
Key: HIVE-1836
URL: https://issues.apache.org/jira/browse/HIVE-1836
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-1836.patch

We should be able to assign key-value pairs of properties to Hive databases. The proposed syntax is similar to the CREATE TABLE and CREATE INDEX commands:

{code}
CREATE DATABASE DB_NAME WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 'value2');
{code}

and

{code}
DESC DATABASE EXTENDED DB_NAME;
{code}

should be able to display the properties. (requires HIVE-1821)
[jira] Commented: (HIVE-1821) describe database command
[ https://issues.apache.org/jira/browse/HIVE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968915#action_12968915 ]

Ning Zhang commented on HIVE-1821:
----------------------------------

A patch is uploaded to HIVE-1836 that addresses this issue.

describe database command
-------------------------
Key: HIVE-1821
URL: https://issues.apache.org/jira/browse/HIVE-1821
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang

A describe (extended) database command would be helpful if we introduce parameters associated with databases.
[jira] Updated: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1837:
-------------------------------
Attachment: hive-1837.2.patch

A new patch after some tests in the cluster.

optional timeout for hive clients
---------------------------------
Key: HIVE-1837
URL: https://issues.apache.org/jira/browse/HIVE-1837
Project: Hive
Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
Attachments: hive-1837.1.patch, hive-1837.2.patch

It would be a good idea to have an optional timeout for Hive clients. We encountered a query today which seemed to have been run by mistake, and it had been running for about a month. This was holding zookeeper locks and making the whole debugging more complex than it should be. It would be a good idea to have a timeout for a Hive client. @Ning, I remember there was some issue with the Hive client having a timeout of 1 day with HiPal. Do you remember the details?
[jira] Resolved: (HIVE-1821) describe database command
[ https://issues.apache.org/jira/browse/HIVE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-1821.
------------------------------
Resolution: Duplicate

Duplicate of HIVE-1836.

describe database command
-------------------------
Key: HIVE-1821
URL: https://issues.apache.org/jira/browse/HIVE-1821
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang

A describe (extended) database command would be helpful if we introduce parameters associated with databases.
[jira] Commented: (HIVE-1821) describe database command
[ https://issues.apache.org/jira/browse/HIVE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968939#action_12968939 ]

Namit Jain commented on HIVE-1821:
----------------------------------

If you are doing this, do you want to add an 'alter database' also?

describe database command
-------------------------
Key: HIVE-1821
URL: https://issues.apache.org/jira/browse/HIVE-1821
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang

A describe (extended) database command would be helpful if we introduce parameters associated with databases.
[jira] Commented: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968961#action_12968961 ]

Ning Zhang commented on HIVE-1836:
----------------------------------

Yes, will add 'alter database' also in a follow-up JIRA. One question is whether, if we alter the current database, the HiveConf parameters should change accordingly. Since 'alter database' is not a blocking issue yet, I'm working on HIVE-1820 first and will then come back to that.

Extend the CREATE DATABASE command with DBPROPERTIES
----------------------------------------------------
Key: HIVE-1836
URL: https://issues.apache.org/jira/browse/HIVE-1836
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-1836.patch

We should be able to assign key-value pairs of properties to Hive databases. The proposed syntax is similar to the CREATE TABLE and CREATE INDEX commands:

{code}
CREATE DATABASE DB_NAME WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 'value2');
{code}

and

{code}
DESC DATABASE EXTENDED DB_NAME;
{code}

should be able to display the properties. (requires HIVE-1821)
[jira] Commented: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969000#action_12969000 ] Ashutosh Chauhan commented on HIVE-1837:
You get this feature for free when you move to secure Hadoop. A MapReduce job by default gets a token which expires in 24 hours, so an MR job spawned by a Hive query will usually fail after that time. The job may request renewal for up to 7 days; beyond that, special provisions are required. So a timeout is inherently built into secure Hadoop.

optional timeout for hive clients
Key: HIVE-1837
URL: https://issues.apache.org/jira/browse/HIVE-1837
Project: Hive
Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
Attachments: hive-1837.1.patch, hive-1837.2.patch
It would be a good idea to have an optional timeout for hive clients. We encountered a query today which seemed to have been run by mistake, and it had been running for about a month. This was holding ZooKeeper locks and making the whole debugging more complex than it should be. It would be a good idea to have a timeout for a hive client.
@Ning, I remember there was some issue with the Hive client having a timeout of 1 day with HiPal. Do you remember the details?
[jira] Commented: (HIVE-1836) Extend the CREATE DATABASE command with DBPROPERTIES
[ https://issues.apache.org/jira/browse/HIVE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969037#action_12969037 ] Namit Jain commented on HIVE-1836:
+1

Extend the CREATE DATABASE command with DBPROPERTIES
Key: HIVE-1836
URL: https://issues.apache.org/jira/browse/HIVE-1836
[jira] Commented: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969089#action_12969089 ] Namit Jain commented on HIVE-1096:
Sure, that would be very useful. Let me know if you run into any issues.

Hive Variables
Key: HIVE-1096
URL: https://issues.apache.org/jira/browse/HIVE-1096
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Fix For: 0.7.0
Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-15.patch.txt, hive-1096-15.patch.txt, hive-1096-2.diff, hive-1096-20.patch.txt, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
From the mailing list:
"The Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via the command line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?"
This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run: we can do string substitutions at that level, and further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing it in compile unless someone wants to discuss this more.
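The string substitution proposed for Driver.compile can be sketched as a small standalone helper. This is an illustrative sketch only (the class and method names are hypothetical, not Hive's actual implementation): it scans a query for ${name} placeholders and replaces each with the value supplied via -d on the command line, leaving unknown placeholders untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableSubstitution {
    // Matches ${name} placeholders in a query string.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with its value from the variable map,
    // leaving placeholders with no defined value untouched.
    public static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            m.appendReplacement(sb,
                Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulates: hive -d DT=2009-12-09
        Map<String, String> vars = new HashMap<>();
        vars.put("DT", "2009-12-09");
        System.out.println(substitute(
            "SELECT * FROM logs WHERE dt = '${DT}'", vars));
        // -> SELECT * FROM logs WHERE dt = '2009-12-09'
    }
}
```

Doing the replacement before parsing keeps the parser and plan untouched, which is exactly the trade-off the comment above describes.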
[jira] Commented: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969093#action_12969093 ] Namit Jain commented on HIVE-1837:
@Ashutosh, we can't wait for this feature until secure Hadoop is available. Once Hive is migrated to that, we can change the implementation of this feature.
@Yongqiang, can you add the new parameter definition in hive-default.xml? Also, can you make the thread sleep time (10 min.) configurable? Can you add a new test for the same - I mean, use a very small timeout and thread sleep time, and a custom script which sleeps indefinitely?

optional timeout for hive clients
Key: HIVE-1837
URL: https://issues.apache.org/jira/browse/HIVE-1837
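The monitor thread under discussion can be sketched as follows. This is a hypothetical illustration (the class name, the configurable check interval, and the abort action are assumptions, not the actual patch): a daemon thread wakes up every checkIntervalMs and marks the session expired once the elapsed time exceeds timeoutMs. The real patch calls System.exit(-1) at that point; here a flag is set instead so the behavior is observable.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ClientWatchdog {
    private final long timeoutMs;
    private final long checkIntervalMs;  // the "thread sleep time" Namit asks to make configurable
    private final AtomicBoolean expired = new AtomicBoolean(false);
    private final long startMs = System.currentTimeMillis();

    public ClientWatchdog(long timeoutMs, long checkIntervalMs) {
        this.timeoutMs = timeoutMs;
        this.checkIntervalMs = checkIntervalMs;
    }

    public void start() {
        Thread t = new Thread(() -> {
            while (!expired.get()) {
                if (System.currentTimeMillis() - startMs > timeoutMs) {
                    // The real monitor thread would abort the client here
                    // (the patch uses System.exit(-1)); we only set a flag.
                    expired.set(true);
                    break;
                }
                try {
                    Thread.sleep(checkIntervalMs);
                } catch (InterruptedException e) {
                    return;  // watchdog cancelled
                }
            }
        });
        t.setDaemon(true);  // must not keep the client JVM alive on its own
        t.start();
    }

    public boolean isExpired() {
        return expired.get();
    }
}
```

With both the timeout and the check interval configurable, the test Namit describes becomes feasible: set both to a few milliseconds and assert that a long-running query trips the watchdog.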
[jira] Assigned: (HIVE-1415) add CLI command for executing a SQL script
[ https://issues.apache.org/jira/browse/HIVE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo reassigned HIVE-1415:
Assignee: Edward Capriolo

add CLI command for executing a SQL script
Key: HIVE-1415
URL: https://issues.apache.org/jira/browse/HIVE-1415
Project: Hive
Issue Type: Improvement
Components: Clients
Affects Versions: 0.5.0
Reporter: John Sichi
Assignee: Edward Capriolo
Fix For: 0.7.0
Attachments: hive-1415-1-patch.txt
The suggestion in HIVE-1405 was "source", e.g. source somescript.sql;
[jira] Commented: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969141#action_12969141 ] He Yongqiang commented on HIVE-1837:
It is still very difficult to add a testcase, because there is a System.exit(-1) in the monitor thread: the test process itself will exit.

optional timeout for hive clients
Key: HIVE-1837
URL: https://issues.apache.org/jira/browse/HIVE-1837
[jira] Commented: (HIVE-1838) Add quickLZ compression codec for Hive.
[ https://issues.apache.org/jira/browse/HIVE-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969144#action_12969144 ] He Yongqiang commented on HIVE-1838:
No, I mean a compression codec for Hive. It could be used to compress intermediate data. Here are some results:

5. Hadoop compression with native library (COMPRESSLEVEL=BEST_SPEED)
time java -Djava.library.path=/data/users/heyongqiang/hadoop-0.20/build/native/Linux-amd64-64/lib/ CompressFile
real 0m34.179s user 0m29.031s sys 0m1.607s
compressed size: 275M

6. LZF
[heyongqi...@dev782 compress_test]$ time lzf -c 00_0
real 0m39.031s user 0m8.727s sys 0m2.231s
compressed size: 393M

7. FastLZ
time fastlz/6pack -1 00_0 00_0.fastlz
real 0m19.020s user 0m18.083s sys 0m0.935s
compressed size: 391M

8. QuickLZ
time ./compress_file ../00_0 ../00_0.quicklz
real 0m15.652s user 0m14.047s sys 0m1.603s
compressed size: 334M

I modified QuickLZ's compress_file code to use a buffer, for fairness. It turns out the result is very close to FastLZ; the modified version of QuickLZ is just one second better.

Add quickLZ compression codec for Hive.
Key: HIVE-1838
URL: https://issues.apache.org/jira/browse/HIVE-1838
Project: Hive
Issue Type: New Feature
Reporter: He Yongqiang
[jira] Commented: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969146#action_12969146 ] Ashutosh Chauhan commented on HIVE-1837:
One way to get around the System.exit() problem in a testcase is to create your own SecurityManager and use that. In your SecurityManager, override checkExit() and throw an exception. This way, whenever System.exit() is encountered, an exception will be thrown; in your testcase you can catch the exception and then do the asserts that you want. I did very similar things while writing JUnit tests for Howl.

optional timeout for hive clients
Key: HIVE-1837
URL: https://issues.apache.org/jira/browse/HIVE-1837
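The trick Ashutosh describes can be sketched as follows. This is a generic illustration of the pattern, not the Howl test code (the class names here are hypothetical): install a SecurityManager whose checkExit throws, so a code path calling System.exit() raises a catchable exception instead of killing the JVM. (Note: SecurityManager was standard practice at the time; on JDK 17+ it is deprecated and installing one may require -Djava.security.manager=allow.)

```java
public class ExitTrap {
    // Thrown instead of terminating the JVM; carries the exit status.
    public static class ExitException extends SecurityException {
        public final int status;
        public ExitException(int status) {
            super("System.exit(" + status + ") intercepted");
            this.status = status;
        }
    }

    public static class NoExitSecurityManager extends SecurityManager {
        @Override
        public void checkPermission(java.security.Permission perm) {
            // Permit everything else so the rest of the test runs normally.
        }
        @Override
        public void checkExit(int status) {
            throw new ExitException(status);
        }
    }

    public static void main(String[] args) {
        SecurityManager original = System.getSecurityManager();
        System.setSecurityManager(new NoExitSecurityManager());
        try {
            System.exit(-1);  // would normally kill the JVM
        } catch (ExitException e) {
            System.out.println("caught exit with status " + e.status);
        } finally {
            System.setSecurityManager(original);  // restore for other tests
        }
    }
}
```

In a JUnit test, the install/restore pair would typically live in setUp()/tearDown(), with the assertion inside the catch block.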
[jira] Commented: (HIVE-1837) optional timeout for hive clients
[ https://issues.apache.org/jira/browse/HIVE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969151#action_12969151 ] He Yongqiang commented on HIVE-1837:
Cool, thanks Ashutosh! I will try that.

optional timeout for hive clients
Key: HIVE-1837
URL: https://issues.apache.org/jira/browse/HIVE-1837
[jira] Commented: (HIVE-1694) Accelerate query execution using indexes
[ https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969186#action_12969186 ] Prajakta Kalmegh commented on HIVE-1694:
Hi, I am Prajakta from Persistent Systems Ltd., and I am working on the changes that John and Namit suggested above, along with Nikhil and Prafulla. This is a design note about the implementation of the above review comments.

We're implementing the following changes as a single transformation in the optimizer:
(1) Table replacement: involves modification of some internal members of TableScanOperator.
(2) Group-by removal: involves removal of some operators (GBY-RS-GBY), where GBY is done on both the mapper and reducer side, and re-setting the correct parent and child operators within the DAG.
(3) Sub-query insertion: involves creation of a new DAG for the sub-query and attaching it to the original DAG at an appropriate place.
(4) Projection modification: involves steps similar to (3).

We have implemented the above changes as a proof of concept. In this implementation, we have invoked this rule as the first transformation in the optimizer code so that lineage information is computed later as part of the Generator transformation. Another reason we have applied this as the first transformation is that, as of now, the implementation uses the query block (QB) information to decide whether the transformation can be applied to the input query (similar to the canApplyThisRule() method in the original rewrite code). Finally, to make changes (3) and (4), we are modifying the operator DAG. However, we are not modifying the original query block (QB), which leaves the QB and the operator DAG out of sync.

The major issues in this implementation approach are:
1. The changes listed above require either modification of the operator DAG (in the case of 2) or creation of a new operator DAG (in the cases of 3 and 4). The implementation requires some hacks in the SemanticAnalyzer code if we create a new DAG (as in the case of the replaceViewReferenceWithDefinition() method, which uses ParseDriver() to do the same). However, the methods are private (like genBodyPlan(...), genSelectPlan(...) etc.), making it all the more difficult to implement changes (3) and (4) without access to these methods.
2. The creation of a new DAG will require creating all associated data structures like QB, ASTNode etc., as this information is necessary to generate the DAG operator plan for the sub-queries. This approach would be very similar to what we are already doing in the rewrite, i.e. creating a new QB and ASTNode.
3. Since we are creating a new DAG and appending it to the enclosing query DAG, we also need to modify the row schema and row resolvers for the operators.

One question that must be answered before finalizing the above approach is whether the cost-based optimizer, which is to be implemented in the future, will work on the query block or on the operator DAG. If it works on the operator DAG, then the implementation changes listed here are bound to be done. However, if the cost-based optimizer is to work on the query block, then we feel the initial query rewrite engine code, which worked after semantic analysis but before plan generation, can be made to work with the cost-based optimizer. It would be valuable input if you could comment on the cost-based optimizer.

Accelerate query execution using indexes
Key: HIVE-1694
URL: https://issues.apache.org/jira/browse/HIVE-1694
Project: Hive
Issue Type: New Feature
Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Nikhil Deshpande
Attachments: demo_q1.hql, demo_q2.hql, HIVE-1694_2010-10-28.diff
The index building patch (HIVE-417) is checked into trunk; this JIRA issue tracks supporting indexes in the Hive compiler and execution engine for SELECT queries. This is in reference to John's comment at https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869 on creating a separate JIRA issue for tracking index usage in optimizer query execution.
The aim of this effort is to use indexes to accelerate query execution (for certain classes of queries). E.g.
- Filters and range scans (already being worked on by He Yongqiang as part of HIVE-417?)
- Joins (index based joins)
- Group By, Order By and other misc cases
The proposal is multi-step:
1. Building index based operators, compiler and execution engine changes
2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose between index scans, full table scans etc.)
This JIRA initially focuses on the first step. This JIRA is expected to
[jira] Commented: (HIVE-1830) mappers in group followed by joins may die OOM
[ https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969203#action_12969203 ] Namit Jain commented on HIVE-1830:
{code}
if (groupByOp.getConf() == null) {
  System.out.println("Group by desc is null");
  return null;
}
{code}
This should never happen.
GroupByOperator:
{code}
memoryThreshold = HiveConf.getFloatVar(hconf, HiveConf.ConfVars.HIVEMAPAGGRMEMORYTHRESHOLD);
{code}
This should also be in groupByDesc.

mappers in group followed by joins may die OOM
Key: HIVE-1830
URL: https://issues.apache.org/jira/browse/HIVE-1830
Project: Hive
Issue Type: Bug
Reporter: Namit Jain
Assignee: Liyin Tang
Attachments: hive-1830-1.patch, hive-1830-2.patch, hive-1830-3.patch, hive-1830-4.patch
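The review comment above (move memoryThreshold into groupByDesc) can be illustrated with a minimal sketch. The class names below are hypothetical stand-ins, not Hive's actual GroupByDesc: the point is that a value captured in the descriptor at plan-compile time travels with the serialized plan, so the operator never re-reads the configuration at runtime.

```java
public class GroupByDescSketch {
    // Descriptor that travels with the serialized query plan.
    static class GroupByDesc {
        private float memoryThreshold;
        void setMemoryThreshold(float t) { memoryThreshold = t; }
        float getMemoryThreshold() { return memoryThreshold; }
    }

    // Stand-in for the HiveConf value at compile time.
    static float confThreshold = 0.9f;

    // Compile time: capture the conf value into the descriptor once.
    static GroupByDesc compilePlan() {
        GroupByDesc desc = new GroupByDesc();
        desc.setMemoryThreshold(confThreshold);
        return desc;
    }

    // Runtime: the operator reads its desc, never the conf.
    static float operatorThreshold(GroupByDesc desc) {
        return desc.getMemoryThreshold();
    }

    public static void main(String[] args) {
        GroupByDesc desc = compilePlan();
        confThreshold = 0.5f;  // later conf changes no longer affect the plan
        System.out.println(operatorThreshold(desc));  // prints 0.9
    }
}
```

This mirrors the pattern of the first snippet too: once the value lives in the desc, a null desc really is a plan-construction bug, which is why "this should never happen."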
[jira] Updated: (HIVE-1820) Make Hive database data center aware
[ https://issues.apache.org/jira/browse/HIVE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1820:
Attachment: HIVE-1820.patch
Attaching HIVE-1820.patch for review.

Make Hive database data center aware
Key: HIVE-1820
URL: https://issues.apache.org/jira/browse/HIVE-1820
Project: Hive
Issue Type: New Feature
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-1820.patch
In order to support multiple data centers (different DFS and MR clusters) for Hive, it is desirable to extend the Hive database to be data center aware. Currently a Hive database is a logical concept and has no DFS or MR cluster info associated with it. A database has the location property indicating the default warehouse directory, but the user cannot specify or change it. In order to make it data center aware, the following info needs to be maintained:
1) data warehouse root location, which is the default HDFS location for newly created tables (default=hive.metadata.warehouse.dir)
2) scratch dir, which is the HDFS location where MR intermediate files are created (default=hive.exec.scratch.dir)
3) MR job tracker URI that jobs should be submitted to (default=mapred.job.tracker)
4) hadoop (bin) dir ($HADOOP_HOME/bin/hadoop)
These parameters should be saved in database.parameters (key, value) pairs, and they overwrite the jobconf parameters (so if the default database has no parameter, it will get it from hive-default.xml or hive-site.xml as it is now).
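The lookup order described in the issue (database parameters overriding jobconf defaults) can be sketched as a two-layer resolver. This is a hypothetical illustration, not the HIVE-1820 patch itself: a parameter is taken from the database's (key, value) parameters if present, otherwise from the session defaults loaded from hive-site.xml / hive-default.xml.

```java
import java.util.HashMap;
import java.util.Map;

public class DbParamResolver {
    private final Map<String, String> dbParams;     // database.parameters
    private final Map<String, String> confDefaults; // hive-site.xml / hive-default.xml

    public DbParamResolver(Map<String, String> dbParams,
                           Map<String, String> confDefaults) {
        this.dbParams = dbParams;
        this.confDefaults = confDefaults;
    }

    // Database parameters win; fall back to the conf defaults otherwise.
    public String get(String key) {
        String v = dbParams.get(key);
        return v != null ? v : confDefaults.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.job.tracker", "jt-default:8021");
        conf.put("hive.exec.scratch.dir", "/tmp/hive");

        Map<String, String> db = new HashMap<>();
        db.put("mapred.job.tracker", "jt-dc2:8021"); // per-database override

        DbParamResolver r = new DbParamResolver(db, conf);
        System.out.println(r.get("mapred.job.tracker"));    // jt-dc2:8021
        System.out.println(r.get("hive.exec.scratch.dir")); // /tmp/hive
    }
}
```

A database with no parameters behaves exactly as today, since every lookup falls through to the conf layer.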