Re: Review Request 33806: Add Tree traversal tools to ParseUtil class that allow for checking node structures with general predicate
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33806/#review84156
---

Ship it!

Ship It!

- Sergio Pena

On May 11, 2015, 6:38 p.m., Reuben Kuhnert wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33806/
---
(Updated May 11, 2015, 6:38 p.m.)

Review request for hive, Gopal V, John Pullokkaran, and Sergio Pena.

Bugs: HIVE-10190
    https://issues.apache.org/jira/browse/HIVE-10190

Repository: hive-git

Description
---
HIVE-10190: CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

Diffs
---
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java 372c93d9af01608538b2e2e5a50c45188acb04f9
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 373429cbf666f1b19828c532aea3c07f08f95e1a

Diff: https://reviews.apache.org/r/33806/diff/

Testing
---
Tested locally

Thanks,
Reuben Kuhnert
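The idea behind the patch is to replace the fragile string check AST.toString().contains("TOK_TABLESPLITSAMPLE") with a proper tree traversal driven by a predicate. Below is a minimal, self-contained sketch of that technique; it is not the actual ParseUtils patch, and the Node class and TOK_TABLESPLITSAMPLE token id here are stand-ins for Hive's ASTNode and token constants:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Minimal stand-in for an AST node; the real patch operates on
// org.apache.hadoop.hive.ql.parse.ASTNode.
class Node {
    final int tokenType;
    final List<Node> children = new ArrayList<>();

    Node(int tokenType) { this.tokenType = tokenType; }

    Node add(Node child) { children.add(child); return this; }
}

public class TreeSearch {
    // Depth-first check whether any node in the tree satisfies the
    // predicate. This replaces matching on the stringified tree, which
    // can produce false positives (e.g. a literal containing the token
    // name) and couples callers to the toString() format.
    public static boolean containsNode(Node root, Predicate<Node> pred) {
        if (pred.test(root)) {
            return true;
        }
        for (Node child : root.children) {
            if (containsNode(child, pred)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        final int TOK_TABLESPLITSAMPLE = 42; // hypothetical token id
        Node root = new Node(0)
            .add(new Node(1))
            .add(new Node(2).add(new Node(TOK_TABLESPLITSAMPLE)));
        System.out.println(containsNode(root, n -> n.tokenType == TOK_TABLESPLITSAMPLE)); // true
        System.out.println(containsNode(root, n -> n.tokenType == 99)); // false
    }
}
```

Because the check takes an arbitrary Predicate, the same helper can answer any structural question about the tree, not just the TABLESAMPLE case.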
[jira] [Created] (HIVE-10738) Beeline does not respect hive.cli.print.current.db
Reuben Kuhnert created HIVE-10738:
Summary: Beeline does not respect hive.cli.print.current.db
Key: HIVE-10738
URL: https://issues.apache.org/jira/browse/HIVE-10738
Project: Hive
Issue Type: Bug
Reporter: Reuben Kuhnert
Assignee: Reuben Kuhnert
Priority: Minor

Hive CLI (shows the default database in the prompt):
{code}
hive> set hive.cli.print.current.db=true;
hive (default)>
{code}

Beeline (no change to the prompt):
{code}
0: jdbc:hive2://localhost:1> set hive.cli.print.current.db=true;
No rows affected (3.016 seconds)
0: jdbc:hive2://localhost:1>
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
Up until recently, Hive supported numerous versions of the Hadoop code base with a simple shim layer. I would rather we stick to the shim layer. Easily the best part about Hive was that a single release worked well regardless of your Hadoop version; it was also a key element of Hive's success. I do not want to see us have multiple branches.

On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang xzh...@cloudera.com wrote:

Thanks for the explanation, Alan! While I now understand more of the proposal, I actually see more problems than the confusion of two lines of releases. Essentially, this proposal forces a user to make a hard choice between a more stable, legacy-aware release line and an adventurous, pioneering release line. And once the choice is made, there is no easy way back or forward.

Here is my interpretation. Let's say we have two main branches as proposed. I develop a new feature which I think is useful for both branches, so I commit it to both. My feature requires additional schema support, so I provide upgrade scripts for both branches. The scripts are different because the two branches have already diverged in schema. Now the two branches evolve in a diverging fashion like this. This is all good as long as a user stays in his line. The moment the user considers a switch, most likely from branch-1 to branch-2, he is stuck. Why? Because there is no upgrade path from a release in branch-1 to a release in branch-2! If we want to provide an upgrade path, then there will be MxN paths, where M and N are the number of releases in the two branches, respectively. This is going to be next to a nightmare, not only for users but also for us.

Also, the proposal will require two sets of everything that Hive provides: double documentation, double feature tracking, double build/test infrastructure, etc. This approach can also potentially cause the problem we saw in Hadoop releases, where the 0.23 release was greater than the 1.0 release.
To me, the problem we are trying to solve is deprecating old things such as hadoop-1, the Hive CLI, etc. This is a valid problem to be solved. As I see it, however, we have approached the problem in less favorable ways. First, it seemed we wanted to deprecate something just for the sake of deprecation, not based on a rationale that supports the desire. Developers might write code that accidentally breaks the hadoop-1 build. However, this is more a build infrastructure problem than a burden of supporting hadoop-1. If our build could catch it at precommit test, the accident can be well avoided. Most of the time, fixing the build is trivial. And we have already addressed the build infrastructure problem. Secondly, if we do have a strong reason to deprecate something, we should have a deprecation plan rather than declaring on the spot that the current release is the last one supporting X. I think Microsoft did a better job in terms of product deprecation. For instance, they announced long before the last day of support for Windows XP. In my opinion, we should have a similar vision, giving users and distributions enough time to adjust rather than shocking them with breaking news.

In summary, I do see the need for deprecation in Hive, but I am afraid the approach we take, including the proposal here, isn't going to solve the problem nicely. On the contrary, I foresee a spectrum of confusion, frustration, and burden for users as well as for developers.

Thanks,
Xuefu

On Fri, May 15, 2015 at 8:19 PM, Alan Gates alanfga...@gmail.com wrote:

Xuefu Zhang xzh...@cloudera.com May 15, 2015 at 17:31: Just to make sure that I understand the proposal correctly: we are going to have two main branches, one for hadoop-1 and one for hadoop-2.

We shouldn't tie this to hadoop-1 and 2. It's about Hive, not Hadoop. It will be some time before Hive's branch-2 is stable, while Hadoop-2 is already well established.

New features are only merged to branch-2.
That essentially says we stop development for hadoop-1, right?

If developers want to keep contributing patches to branch-1 then there's no need for it to stop. We would want to avoid putting new features only on branch-1, unless they only made sense in that context. But I assume we'll see people contributing to branch-1 for some time.

Are we also making two lines of releases: one for branch-1 and one for branch-2? Won't that be confusing and also burdensome if we release, say, 1.3, 2.0, 2.1, 1.4...

I'm asserting that it will be less confusing than the alternatives. We need some way to make early releases of many of the new features. I believe this proposal is less confusing than if we start putting the new features in 1.x branches. This is particularly true because it would help us start being able to drop older functionality like Hadoop-1 and MapReduce, which is very hard to do in the 1.x line without stranding users. Please note that we will
[jira] [Created] (HIVE-10740) RpcServer should be restarted if related configuration is changed [Spark Branch]
Jimmy Xiang created HIVE-10740:
Summary: RpcServer should be restarted if related configuration is changed [Spark Branch]
Key: HIVE-10740
URL: https://issues.apache.org/jira/browse/HIVE-10740
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Jimmy Xiang

In reviewing the patch for HIVE-10721, Chengxiang pointed out an existing issue with HoS: the RpcServer is never restarted even when related configurations are changed, as is done for SparkSession. We should monitor the related configurations and restart the RpcServer if any of them changes. The restart should happen while there is no active SparkSession.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
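One way to implement the monitoring described above is to snapshot the watched configuration values at server start and compare them against the live configuration before deciding to restart. The sketch below is purely illustrative; the class name, the string-map configuration, and the watched keys are assumptions, not Hive's actual RpcServer or HiveConf API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the idea in HIVE-10740: remember the values of
// RPC-related configuration keys at server start, and flag a restart when
// any of them changes later.
public class RpcServerMonitor {
    private final Map<String, String> startupConf = new HashMap<>();

    public RpcServerMonitor(Map<String, String> conf, String... watchedKeys) {
        // Record the value each watched key had when the server started.
        for (String key : watchedKeys) {
            startupConf.put(key, conf.get(key));
        }
    }

    // True when any watched key now differs from its value at server start.
    public boolean needsRestart(Map<String, String> currentConf) {
        for (Map.Entry<String, String> e : startupConf.entrySet()) {
            String current = currentConf.get(e.getKey());
            if (current == null ? e.getValue() != null
                                : !current.equals(e.getValue())) {
                return true;
            }
        }
        return false;
    }
}
```

As the JIRA notes, the restart itself should be deferred until no SparkSession is active, so the check would typically run at a session boundary rather than immediately on configuration change.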
[jira] [Created] (HIVE-10739) Hiveserver2 Memory leak in ObjectInspectorFactory cache
Binglin Chang created HIVE-10739:
Summary: Hiveserver2 Memory leak in ObjectInspectorFactory cache
Key: HIVE-10739
URL: https://issues.apache.org/jira/browse/HIVE-10739
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Binglin Chang

A user issues multiple "add jar" commands to put Thrift classes on the classpath, then creates tables or runs queries using those Thrift serdes. After the session is closed, the class and ObjectInspector instances are still live in the cache, so the classloader for the class, along with all the other referenced classes and static fields, cannot be freed. We may need to provide an option to create an inspector without putting it in the cache.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
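The leak pattern here is a process-wide cache keyed by Class objects: the cache holds a strong reference to the class, which pins its classloader and everything that classloader loaded. The option the report suggests can be sketched as a cache-bypass flag. This is an illustrative stand-in, not Hive's ObjectInspectorFactory API; the Inspector type and method names are assumptions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the option suggested in HIVE-10739: allow callers
// to create an inspector without registering it in the process-wide cache,
// so inspectors for classes from a session's "add jar" classloader do not
// pin that classloader after the session closes.
public class InspectorFactory {
    public static class Inspector {
        final Class<?> type;
        Inspector(Class<?> type) { this.type = type; }
    }

    // Strong references to Class keys are what keep session classloaders
    // alive; entries here outlive the sessions that created them.
    private static final Map<Class<?>, Inspector> CACHE = new ConcurrentHashMap<>();

    // With useCache=false the factory holds no reference to the class
    // beyond the returned inspector itself.
    public static Inspector get(Class<?> type, boolean useCache) {
        if (!useCache) {
            return new Inspector(type);
        }
        return CACHE.computeIfAbsent(type, Inspector::new);
    }
}
```

An alternative design, also common for this problem, is to keep the cache but make its keys or values weak references, so unused classloaders remain collectible.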
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
This concept of experimental features basically translates to "I do not have the time to care about people not using my version." I do not see it as good. We have seen what happened to upstream Hadoop: there was a gap between 0.21 and ??.??. No one was clear what the API was (mapred, or the new mapreduce); no one knew what to link against: CDH? vanilla? the Yahoo distribution? IMHO, this is just going to increase fragmentation.

On Mon, May 18, 2015 at 1:04 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Up until recently Hive supported numerous versions of the Hadoop code base with a simple shim layer. [...]
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
Xuefu Zhang mailto:xzh...@cloudera.com May 15, 2015 at 22:29: Thanks for the explanation, Alan! [...] If we want to provide an upgrade path, then there will be MxN paths, where M and N are the number of releases in the two branches, respectively. This is going to be next to a nightmare, not only for users but also for us.

MxN would indeed be bad, but there is no reason to approach it that way. It's highly unlikely that users will want to migrate from 2.x to 1.y. And for a given 1.x release, we can assume that users will want to be able to migrate to the current head of branch-2. So this means we would need two upgrade scripts for each 1.x release. This is extra effort, but it is not that bad.

Also, the proposal will require two sets of things that Hive provides: double documentation, double feature tracking, double build/test infrastructures, etc.

Our documentation already handles the fact that certain features are only supported in certain releases. Our test and build infrastructure can already be made to work on multiple branches. I'm not sure what you mean by double feature tracking.

This approach can also potentially cause the problem we saw in Hadoop releases, where the 0.23 release was greater than the 1.0 release.

I'm sorry, I don't follow what you're saying here. You mean the numbers are just bigger (like 23 > 1)? We already have that problem; this doesn't make it worse.

To me, the problem we are trying to solve is deprecating old things such as hadoop-1, the Hive CLI, etc. This is a valid problem to be solved. As I see it, however, we approached the problem in less favorable ways.

That is only one of the two problems. The other is to provide a mechanism for experimental features. [...]

Thanks,
Xuefu

Xuefu Zhang mailto:xzh...@cloudera.com May 15, 2015 at 17:31: Just to make sure that I understand the proposal correctly: we are going to have two main branches, one for hadoop-1 and one for hadoop-2. New features are only merged to branch-2. That essentially says we stop development for hadoop-1, right? Are we also making two lines of releases: one for branch-1 and one for branch-2? Won't that be confusing and also burdensome if we release, say, 1.3, 2.0, 2.1, 1.4... Please note that we will have hadoop 3 soon. What's the story there? Thanks, Xuefu

On Fri, May 15, 2015 at 4:43 PM, Vaibhav Gumashta vgumas...@hortonworks.com wrote: +1 on the new branch. I think it’ll help in faster dev time for these important changes. —Vaibhav

From: Alan Gates alanfga...@gmail.com
Reply-To: dev@hive.apache.org
Date: Friday, May 15, 2015 at 4:11 PM
To: dev@hive.apache.org
Subject: Re: [DISCUSS]
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
Edward Capriolo mailto:edlinuxg...@gmail.com May 18, 2015 at 10:14: This concept of experimental features basically translates to "I do not have the time to care about people not using my version."

No, it does not. Continuing to support old features is a cost/benefit trade-off, both for developers and users. The cost for developers is continuing to work around older code; the cost for users is that they get fewer new features, fewer performance improvements, and fewer stability improvements, because developers are spending time working around the old code. At some point in the cost/benefit analysis the costs are high enough that it makes sense to stop supporting it. I am asserting that we are at that point.

Caring about people not on the latest version is an important part of what I am proposing. There are still many users using Hive either on Hadoop 1 or for more traditional Hive workloads (batch, ETL). It is important to give these users a good path forward. My assertion is that a branch-1 is the best way to do this.

So to continue in the cost/benefit paradigm, what I have proposed does have an additional cost for developers. As I have said in my responses to Xuefu, I don't think these costs are too bad, and I assert that they are less than continuing to carry forward older functionality ad infinitum. My intent is that for users who are not interested in new features or workloads the cost is at or near zero. Customers interested in newer functionality will continue to have to pay the cost of upgrades, but that is true anyway.

Alan.

I do not see it as good. We have seen what happened to upstream Hadoop: there was a gap between 0.21 and ??.??. No one was clear what the API was (mapred, or the new mapreduce); no one knew what to link against: CDH? vanilla? the Yahoo distribution? IMHO, this is just going to increase fragmentation. [...]
[jira] [Created] (HIVE-10741) count distinct rewrite is not firing
Ashutosh Chauhan created HIVE-10741:
Summary: count distinct rewrite is not firing
Key: HIVE-10741
URL: https://issues.apache.org/jira/browse/HIVE-10741
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Ashutosh Chauhan

The rewrite introduced in HIVE-10568 is not effective outside of the test environment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
I think it is past time for Hive to have a stable branch and a next branch. Every release from Hive 0.11 to Hive 1.2 has been a major release in terms of changes and functionality. Part of what we've been missing is a way of making stable releases that don't move as fast and that support customers with minor new features, but no big sweeping changes. That will be a win for users. I'm +1 on Alan's plan of making a new release branch.

.. Owen
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
I think we need some path for deprecating old Hadoop versions, the same way we deprecate old Java version support or old RDBMS version support. At some point the cost of supporting Hadoop 1 exceeds the benefit. The same goes for stuff like MR: supporting it, especially for perf work, becomes a burden, and it’s outdated, with two alternatives, one of which has been around for two releases. The branches are a graceful way to get rid of the legacy burden.

Alternatively, when sweeping changes are made, we can do what HBase did (which is not pretty, imho), where the 0.94 version had ~30 dot releases because people cannot upgrade to the 0.96 “singularity” release.

I posit that people who run Hadoop 1 and MR at this day and age (and more so as time passes) are people who don’t care about perf and new features, only stability; so a stability-focused branch would be perfect to support them.

On 15/5/18, 10:04, Edward Capriolo edlinuxg...@gmail.com wrote: Up until recently Hive supported numerous versions of the Hadoop code base with a simple shim layer. [...]
Re: [DISCUSS] Supporting Hadoop-1 and experimental features
Note: by “cannot” I mean “are unwilling to”; upgrade paths exist, but some people are set in their ways or have practical considerations and don’t care for new shiny stuff. On 15/5/18, 11:46, Sergey Shelukhin ser...@hortonworks.com wrote: I think we need some path for deprecating old Hadoop versions, the same way we deprecate old Java version support or old RDBMS version support. At some point the cost of supporting Hadoop 1 exceeds the benefit. Same goes for stuff like MR; supporting it, esp. for perf work, becomes a burden, and it’s outdated with 2 alternatives, one of which has been around for 2 releases. The branches are a graceful way to get rid of the legacy burden. Alternatively, when sweeping changes are made, we can do what Hbase did (which is not pretty imho), where 0.94 version had ~30 dot releases because people cannot upgrade to 0.96 “singularity” release. I posit that people who run Hadoop 1 and MR at this day and age (and more so as time passes) are people who either don’t care about perf and new features, only stability; so, stability-focused branch would be perfect to support them. On 15/5/18, 10:04, Edward Capriolo edlinuxg...@gmail.com wrote: Up until recently Hive supported numerous versions of Hadoop code base with a simple shim layer. I would rather we stick to the shim layer. I think this was easily the best part about hive was that a single release worked well regardless of your hadoop version. It was also a key element to hive's success. I do not want to see us have multiple branches. On Sat, May 16, 2015 at 1:29 AM, Xuefu Zhang xzh...@cloudera.com wrote: Thanks for the explanation, Alan! While I have understood more on the proposal, I actually see more problems than the confusion of two lines of releases. Essentially, this proposal forces a user to make a hard choice between a stabler, legacy-aware release line and an adventurous, pioneering release line. And once the choice is made, there is no easy way back or forward. 
Here is my interpretation. Let's say we have two main branches as proposed. I develop a new feature which I think useful for both branches. So, I commit it to both branches. My feature requires additional schema support, so I provide upgrade scripts for both branches. The scripts are different because the two branches have already diverged in schema. Now the two branches evolve in a diverging fashion like this. This is all good as long as a user stays in his line. The moment the user considers a switch, mostly likely, from branch-1 to branch-2, he is stuck. Why? Because there is no upgrade path from a release in branch-1 to a release in branch-2! If we want to provide an upgrade path, then there will be MxN paths, where M and N are the number of releases in the two branches, respectively. This is going to be next to a nightmare, not only for users, but also for us. Also, the proposal will require two sets of things that Hive provides: double documentation, double feature tracking, double build/test infrastructures, etc. This approach can also potentially cause the problem we saw in hadoop releases, where 0.23 release was greater than 1.0 release. To me, the problem we are trying to solve is deprecating old things such hadoop-1, Hive CLI, etc. This a valid problem to be solved. As I see, however, we approached the problem in less favorable ways. First, it seemed we wanted to deprecate something just for the sake of deprecation, and it's not based on the rationale that supports the desire. Dev might write code that accidentally break hadoop-1 build. However, this is more a build infrastructure problem rather than the burden of supporting hadoop-1. If our build could catch it at precommit test, then I would think the accident can be well avoided. Most of the times, fixing the build is trivial. And we have already addressed the build infrastructure problem. 
Secondly, if we do have a strong reason to deprecate something, we should have a deprecation plan rather than declaring on the spot that the current release is the last one supporting X. I think Microsoft did a better job in terms of product deprecation. For instance, they announced the last day of support for Windows XP long before it arrived. In my opinion, we should have a similar vision, giving users and distributions enough time to adjust rather than shocking them with breaking news. In summary, I do see the need for deprecation in Hive, but I am afraid the approach we are taking, including the proposal here, isn't going to solve the problem nicely. On the contrary, I foresee a spectrum of confusion, frustration, and burden for users as well as for developers. Thanks, Xuefu

On Fri, May 15, 2015 at 8:19 PM, Alan Gates alanfga...@gmail.com wrote: Xuefu Zhang xzh...@cloudera.com May 15, 2015 at 17:31 Just make sure that I understand the proposal correctly: we are going to have two main branches, one for hadoop-1 and one for hadoop-2. We shouldn't tie
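[Editor's aside] The MxN upgrade-path argument in the thread above is simple combinatorics; a throwaway sketch (release counts are hypothetical, chosen only for illustration) makes the asymmetry concrete:

```python
# Back-of-the-envelope sketch of the upgrade-path argument from the thread.
# Release counts below are hypothetical, not from any actual Hive branch.

def sequential_scripts(releases: int) -> int:
    """Upgrade scripts needed within one branch: one per consecutive pair."""
    return releases - 1

def cross_branch_paths(m: int, n: int) -> int:
    """Direct paths needed if any branch-1 release may jump to any branch-2 release."""
    return m * n

m, n = 10, 10  # hypothetical release counts for branch-1 and branch-2
print(sequential_scripts(m))     # 9 scripts cover a single branch end to end
print(cross_branch_paths(m, n))  # 100 cross-branch combinations would need covering
```

The point being that in-branch maintenance cost grows linearly with releases, while cross-branch upgrade coverage grows with the product.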
Review Request 34368: HIVE-10550: Dynamic RDD caching optimization for HoS.[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34368/ --- Review request for hive and chengxiang li. Bugs: HIVE-10550 https://issues.apache.org/jira/browse/HIVE-10550 Repository: hive-git Description --- See jira description. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java d5ea96a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 19d3fee ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 26cfebd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 8b15099 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/LocalSparkJobStatus.java 5d62596 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 8e56263 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinProcFactory.java 5990d17 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java fb20080 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java bb5dd79 ql/src/test/results/clientpositive/spark/ppd_outer_join3.q.out 6a0654a spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java af6332e 
spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java beed8a3 spark-client/src/main/java/org/apache/hive/spark/client/MonitorCallback.java e1e899e spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java b77c9e8 spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java d33ad7e Diff: https://reviews.apache.org/r/34368/diff/ Testing --- Thanks, Xuefu Zhang
GenericUDF.getConstantLongValue
Hello everyone, There is a bug in GenericUDF.getConstantLongValue. There are 2 patches available: 1. fix the bug https://issues.apache.org/jira/browse/HIVE-10580 2. delete the method because it's not used https://issues.apache.org/jira/browse/HIVE-10710 Could any committer +1 one or the other solution? I'm fine with either. Thank you Alex
[ANNOUNCE] Apache Hive 1.2.0 Released
The Apache Hive team is proud to announce the release of Apache Hive version 1.2.0. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides: * Tools to enable easy data extract/transform/load (ETL) * A mechanism to impose structure on a variety of data formats * Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM) * Query execution via Apache Hadoop MapReduce, Apache Tez or Apache Spark frameworks. For Hive release details and downloads, please visit: https://hive.apache.org/downloads.html Hive 1.2.0 Release Notes are available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843 We would like to thank the many contributors who made this release possible. Regards, The Apache Hive Team
[jira] [Created] (HIVE-10742) rename_table_location.q test fails
Vikram Dixit K created HIVE-10742: - Summary: rename_table_location.q test fails Key: HIVE-10742 URL: https://issues.apache.org/jira/browse/HIVE-10742 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0, 1.3.0 Reporter: Vikram Dixit K Assignee: Sushanth Sowmyan The test rename_table_location.q fails all the time but is not being caught by the HiveQA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 33968: HIVE-10644 create SHA2 UDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33968/#review84217 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java https://reviews.apache.org/r/33968/#comment135357 In retrospect, I wish these parameter utility methods had been put into a utility class rather than in GenericUDF - I feel like we are adding a lot of clutter to a class that users subclass. Not sure if it's too late to do something about this - I see on HIVE-10580 there is some discussion about whether this can be removed. Can you either create a new UDF params utility class for these, or just add these methods directly to GenericUDFSha2 for now? - Jason Dere On May 13, 2015, 5:48 a.m., Alexander Pivovarov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33968/ --- (Updated May 13, 2015, 5:48 a.m.) Review request for hive and Jason Dere. Bugs: HIVE-10644 https://issues.apache.org/jira/browse/HIVE-10644 Repository: hive-git Description --- HIVE-10644 create SHA2 UDF Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 02a604ff0a4ed92dfd94b199e8b539f636b66f77 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java b043bdc882af7c0b83787526a5a55c9dc29c6681 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSha2.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSha2.java PRE-CREATION ql/src/test/queries/clientpositive/udf_sha2.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out a422760400c62d026324dd667e4a632bfbe01b82 ql/src/test/results/clientpositive/udf_sha2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/33968/diff/ Testing --- Thanks, Alexander Pivovarov
Re: Review Request 33968: HIVE-10644 create SHA2 UDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33968/ --- (Updated May 18, 2015, 10:24 p.m.) Review request for hive and Jason Dere. Changes --- added GenericUDFParamUtils Bugs: HIVE-10644 https://issues.apache.org/jira/browse/HIVE-10644 Repository: hive-git Description --- HIVE-10644 create SHA2 UDF Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 02a604ff0a4ed92dfd94b199e8b539f636b66f77 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFParamUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSha2.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSha2.java PRE-CREATION ql/src/test/queries/clientpositive/udf_sha2.q PRE-CREATION ql/src/test/results/clientpositive/show_functions.q.out a422760400c62d026324dd667e4a632bfbe01b82 ql/src/test/results/clientpositive/udf_sha2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/33968/diff/ Testing --- Thanks, Alexander Pivovarov
Re: [ANNOUNCE] Apache Hive 1.2.0 Released
Thanks for driving this Sushanth! On Mon, May 18, 2015 at 2:25 PM, Sushanth Sowmyan khorg...@apache.org wrote: The Apache Hive team is proud to announce the release of Apache Hive version 1.2.0. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides: * Tools to enable easy data extract/transform/load (ETL) * A mechanism to impose structure on a variety of data formats * Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM) * Query execution via Apache Hadoop MapReduce, Apache Tez or Apache Spark frameworks. For Hive release details and downloads, please visit: https://hive.apache.org/downloads.html Hive 1.2.0 Release Notes are available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843 We would like to thank the many contributors who made this release possible. Regards, The Apache Hive Team
[jira] [Created] (HIVE-10747) enable the cleanup side effect for Encryption related qfile test
Ferdinand Xu created HIVE-10747: --- Summary: enable the cleanup side effect for Encryption related qfile test Key: HIVE-10747 URL: https://issues.apache.org/jira/browse/HIVE-10747 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Ferdinand Xu Assignee: Ferdinand Xu The hive conf is not reset in the clearTestSideEffects method, which was introduced in HIVE-8900. This will pollute other qfiles' settings when they are run by TestEncryptedHDFSCliDriver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct and map types as well. It turned out that all I needed to do was loosen the type checking. Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/test/queries/clientpositive/udaf_collect_list_set_nested.q PRE-CREATION ql/src/test/results/clientpositive/udaf_collect_list_set_nested.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_nested.q Thanks, Chao Sun
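[Editor's aside] As a hedged illustration of what this patch enables (the table and column names below are hypothetical, not taken from the patch's test files), a struct-valued aggregate would look like:

```sql
-- Hypothetical orders table; before HIVE-10427, collect_list()
-- rejected any non-primitive argument during type checking.
SELECT customer_id,
       collect_list(named_struct('order_id', order_id, 'amount', amount)) AS order_history
FROM orders
GROUP BY customer_id;
```

The actual supported cases are exercised in the udaf_collect_list_set_nested.q test added by the patch.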
Re: Review Request 34393: HIVE-10427 - collect_list() and collect_set() should accept struct types as argument
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/#review84260 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java https://reviews.apache.org/r/34393/#comment135437 should we also support arrays and unions? ql/src/test/queries/clientpositive/udaf_collect_list_set_nested.q https://reviews.apache.org/r/34393/#comment135438 add a negative test to validate unsupported types? - Lenni Kuff On May 19, 2015, 4:47 a.m., Chao Sun wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34393/ --- (Updated May 19, 2015, 4:47 a.m.) Review request for hive. Bugs: HIVE-10427 https://issues.apache.org/jira/browse/HIVE-10427 Repository: hive-git Description --- Currently for collect_list() and collect_set(), only primitive types are supported. This patch adds support for struct and map types as well. It turned out that all I needed to do was loosen the type checking. Diffs - data/files/customers.txt PRE-CREATION data/files/nested_orders.txt PRE-CREATION data/files/orders.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java 536c4a7 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java 6dc424a ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java efcc8f5 ql/src/test/queries/clientpositive/udaf_collect_list_set_nested.q PRE-CREATION ql/src/test/results/clientpositive/udaf_collect_list_set_nested.q.out PRE-CREATION Diff: https://reviews.apache.org/r/34393/diff/ Testing --- All but one test (which seems unrelated) are passing. I also added a test: udaf_collect_list_set_nested.q Thanks, Chao Sun
[jira] [Created] (HIVE-10745) Better null handling by Vectorizer
Ashutosh Chauhan created HIVE-10745: --- Summary: Better null handling by Vectorizer Key: HIVE-10745 URL: https://issues.apache.org/jira/browse/HIVE-10745 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Minor refactoring around null handling in Vectorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 34385: Better null handling by Vectorizer
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34385/ --- Review request for hive and Gopal V. Bugs: HIVE-10745 https://issues.apache.org/jira/browse/HIVE-10745 Repository: hive-git Description --- Better null handling by Vectorizer Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeEvaluatorFactory.java f08321c ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 48f34a9 Diff: https://reviews.apache.org/r/34385/diff/ Testing --- Thanks, Ashutosh Chauhan
[jira] [Created] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
Greg Senia created HIVE-10746: - Summary: Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by Key: HIVE-10746 URL: https://issues.apache.org/jira/browse/HIVE-10746 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 1.2.0, 0.14.0, 0.14.1, 1.1.0, 1.1.1 Reporter: Greg Senia Priority: Critical The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id, arsn_cd ORDER BY appl_user_id; runs consistently fast in Spark and MapReduce on Hive 1.2.0. When attempting to run this same query with Tez as the execution engine, it consistently runs for 300-500 seconds, which seems extremely long. This is a basic external table delimited by tabs, stored as a single file in a folder. In Hive 0.13 this query with Tez runs fast; I tested with Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0, and there clearly is something going awry with Hive w/ Tez as an execution engine on single- or small-file tables. I can attach further logs if someone needs them for deeper analysis.
HDFS Output:

hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers         0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers   3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0

Hive Table Describe:

hive> describe formatted crc_arsn;
OK
# col_name              data_type       comment
arsn_cd                 string
clmlvl_cd               string
arclss_cd               string
arclssg_cd              string
arsn_prcsr_rmk_ind      string
arsn_mbr_rspns_ind      string
savtyp_cd               string
arsn_eff_dt             string
arsn_exp_dt             string
arsn_pstd_dts           string
arsn_lstupd_dts         string
arsn_updrsn_txt         string
appl_user_id            string
arsntyp_cd              string
pre_d_indicator         string
arsn_display_txt        string
arstat_cd               string
arsn_tracking_no        string
arsn_cstspcfc_ind       string
arsn_mstr_rcrd_ind      string
state_specific_ind      string
region_specific_in      string
arsn_dpndnt_cd          string
unit_adjustment_in      string
arsn_mbr_only_ind       string
arsn_qrmb_ind           string

# Detailed Table Information
Database:           adw
Owner:              loadu...@exa.example.com
CreateTime:         Mon Apr 28 13:28:05 EDT 2014
LastAccessTime:     UNKNOWN
Protect Mode:       None
Retention:          0
Location:           hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
Table Type:         EXTERNAL_TABLE
Table Parameters:
    EXTERNAL                TRUE
    transient_lastDdlTime   1398706085

# Storage Information
SerDe Library:      org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:        org.apache.hadoop.mapred.TextInputFormat
OutputFormat:       org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:         No
Num Buckets:        -1
Bucket Columns:     []
Sort Columns:       []
Storage Desc Params:
    field.delim             \t
    line.delim              \n
    serialization.format    \t
Time taken: 1.245 seconds, Fetched: 54 row(s)

Explain Hive 1.2.0 w/Tez:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
[jira] [Created] (HIVE-10743) LLAP: rare NPE in IO
Sergey Shelukhin created HIVE-10743: --- Summary: LLAP: rare NPE in IO Key: HIVE-10743 URL: https://issues.apache.org/jira/browse/HIVE-10743 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin {noformat} 2015-05-18 15:37:33,702 [TezTaskRunner_attempt_1431919257083_0116_1_00_09_0(container_1_0116_01_10_sershe_20150518153700_b3649675-c035-4d9a-8dfb-2818b0173022:1_Map 1_9_0)] INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://cn041-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpch_orc_snappy_1000.db/lineitem/93_0 2015-05-18 15:37:33,743 [IO-Elevator-Thread-9(container_1_0116_01_10_sershe_20150518153700_b3649675-c035-4d9a-8dfb-2818b0173022:1_Map 1_9_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Resulting disk ranges to read (file 7895017): [{range start: 28153685 end: 70814209}] 2015-05-18 15:37:33,743 [IO-Elevator-Thread-9(container_1_0116_01_10_sershe_20150518153700_b3649675-c035-4d9a-8dfb-2818b0173022:1_Map 1_9_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Disk ranges after cache (file 7895017, base offset 3): [{range start: 28153685 end: 70814209}] 2015-05-18 15:37:33,791 [IO-Elevator-Thread-9(container_1_0116_01_10_sershe_20150518153700_b3649675-c035-4d9a-8dfb-2818b0173022:1_Map 1_9_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Disk ranges after disk read (file 7895017, base offset 3): [{data range [28153685, 70814209), size: 42660524 type: direct}] 2015-05-18 15:37:33,804 [IO-Elevator-Thread-9(container_1_0116_01_10_sershe_20150518153700_b3649675-c035-4d9a-8dfb-2818b0173022:1_Map 1_9_0)] INFO org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: setError called; closed false, done false, err null, pending 0 ... 
Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.orc.InStream.readEncodedStream(InStream.java:763) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:445) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:294) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:56) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {noformat} Not sure yet how this happened. May add some logging or look more if I see it again -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10744) LLAP: dags get stuck in yet another way
Sergey Shelukhin created HIVE-10744: --- Summary: LLAP: dags get stuck in yet another way Key: HIVE-10744 URL: https://issues.apache.org/jira/browse/HIVE-10744 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Siddharth Seth The DAG gets stuck when a number of tasks that is a multiple of the number of containers on the machine (6, 12, ... in my case) fails to finish at the end of the stage (I am running a job with 500-1000 maps). It happened twice on the 3rd DAG with a 1000-map job (TPCH Q1); then, when I reduced to 500 maps, it happened on the 7th DAG so far. [~sseth] has the details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)