[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233993#comment-14233993 ] Rui Li commented on HIVE-8991: -- I looked a little more into this. It seems hive-exec is properly added to the class path (as the user application jar in {{SparkSubmit}}) and the class loader can load {{HiveIgnoreKeyTextOutputFormat}}: {noformat} 2014-12-04 08:35:33,383 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(384)) - [Loaded org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat from file:/home/hive/packaging/target/apache-hive-0.15.0-SNAPSHOT-bin/apache-hive-0.15.0-SNAPSHOT-bin/lib/hive-exec-0.15.0-SNAPSHOT.jar] {noformat} Nevertheless, I still get the following error: {noformat} 2014-12-04 08:32:26,681 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(384)) - java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat {noformat} Besides, the exception is thrown when we try to deserialize SparkWork in the job, which means {{org.apache.hadoop.hive.ql.exec.spark.KryoSerializer}} has been loaded properly. I'll do more debugging; I wonder if the error message is simply inaccurate. As for the {{SparkSubmitDriverBootstrapper}} hanging issue, it's because it calls System.exit in a shutdown hook, which causes a deadlock. It's been fixed in the latest branch. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of the missing hive-it-util in the remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
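To make the deserialization detail above concrete: Kryo resolves class names through whatever class loader it is handed, so a minimal sketch like the following (illustrative only, not Hive's actual wiring) shows where the lookup can diverge from the loader that actually holds hive-exec:
{code:java}
import com.esotericsoftware.kryo.Kryo;

public class KryoLoaderSketch {
  public static void main(String[] args) {
    Kryo kryo = new Kryo();
    // Class names read back while deserializing (e.g. SparkWork's fields)
    // are resolved through the loader given here. If this loader cannot
    // see hive-exec, deserialization fails even though another loader in
    // the same JVM has already loaded the class.
    kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
  }
}
{code}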
[jira] [Created] (HIVE-9019) Avoid using SPARK_JAVA_OPTS [Spark Branch]
Rui Li created HIVE-9019: Summary: Avoid using SPARK_JAVA_OPTS [Spark Branch] Key: HIVE-9019 URL: https://issues.apache.org/jira/browse/HIVE-9019 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li SPARK_JAVA_OPTS has been deprecated, see {{SparkConf.validateSettings}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
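For reference, the settings Spark's deprecation warning points to can be supplied through SparkConf instead of the environment variable. A minimal sketch, with "-Dexample.flag=true" as a placeholder JVM option:
{code:java}
import org.apache.spark.SparkConf;

public class NoSparkJavaOpts {
  public static void main(String[] args) {
    // Instead of exporting the deprecated SPARK_JAVA_OPTS, set the
    // per-role options that SparkConf.validateSettings recommends.
    SparkConf conf = new SparkConf()
        .setAppName("hive-on-spark-example")
        .set("spark.driver.extraJavaOptions", "-Dexample.flag=true")
        .set("spark.executor.extraJavaOptions", "-Dexample.flag=true");
    System.out.println(conf.toDebugString());
  }
}
{code}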
[jira] [Updated] (HIVE-9019) Avoid using SPARK_JAVA_OPTS [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9019: - Issue Type: Sub-task (was: Test) Parent: HIVE-7292 Avoid using SPARK_JAVA_OPTS [Spark Branch] -- Key: HIVE-9019 URL: https://issues.apache.org/jira/browse/HIVE-9019 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li SPARK_JAVA_OPTS has been deprecated, see {{SparkConf.validateSettings}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9018) SHOW GRANT ROLE in Hive should return grant_time in human readable format
[ https://issues.apache.org/jira/browse/HIVE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-9018: - Attachment: HIVE-9018.003.patch SHOW GRANT ROLE in Hive should return grant_time in human readable format --- Key: HIVE-9018 URL: https://issues.apache.org/jira/browse/HIVE-9018 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Priority: Minor Attachments: HIVE-9018.003.patch Currently, SHOW GRANT ROLE will return the 'grant_time' in microseconds since epoch. It would be nice if this were in human readable format. Current output: 1411801585902000 Desired output: Sat, Sep 27 2014 00:06:25.902 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
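A minimal sketch of the requested conversion, using the value from the description; the format pattern is chosen to match the desired output, and the printed result depends on the JVM's default timezone:
{code:java}
import java.text.SimpleDateFormat;
import java.util.Date;

public class GrantTimeFormat {
  public static void main(String[] args) {
    long grantTimeMicros = 1411801585902000L; // value from the description
    // grant_time is microseconds since epoch; java.util.Date wants millis.
    Date grantTime = new Date(grantTimeMicros / 1000);
    SimpleDateFormat fmt = new SimpleDateFormat("EEE, MMM dd yyyy HH:mm:ss.SSS");
    // Prints e.g. "Sat, Sep 27 2014 00:06:25.902" in the original timezone.
    System.out.println(fmt.format(grantTime));
  }
}
{code}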
[jira] [Updated] (HIVE-9018) SHOW GRANT ROLE in Hive should return grant_time in human readable format
[ https://issues.apache.org/jira/browse/HIVE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-9018: - Attachment: (was: HIVE-9018.002.patch) SHOW GRANT ROLE in Hive should return grant_time in human readable format --- Key: HIVE-9018 URL: https://issues.apache.org/jira/browse/HIVE-9018 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Priority: Minor Attachments: HIVE-9018.003.patch Currently, SHOW GRANT ROLE will return the 'grant_time' in microseconds since epoch. It would be nice if this were in human readable format. Current output: 1411801585902000 Desired output: Sat, Sep 27 2014 00:06:25.902 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9018) SHOW GRANT ROLE in Hive should return grant_time in human readable format
[ https://issues.apache.org/jira/browse/HIVE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-9018: - Attachment: (was: HIVE-9018.patch) SHOW GRANT ROLE in Hive should return grant_time in human readable format --- Key: HIVE-9018 URL: https://issues.apache.org/jira/browse/HIVE-9018 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Priority: Minor Attachments: HIVE-9018.003.patch Currently, SHOW GRANT ROLE will return the 'grant_time' in microseconds since epoch. It would be nice if this were in human readable format. Current output: 1411801585902000 Desired output: Sat, Sep 27 2014 00:06:25.902 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9020) When dropping external tables, Hive should not verify whether user has access to the data.
Anant Nag created HIVE-9020: --- Summary: When dropping external tables, Hive should not verify whether user has access to the data. Key: HIVE-9020 URL: https://issues.apache.org/jira/browse/HIVE-9020 Project: Hive Issue Type: Bug Reporter: Anant Nag When dropping tables, Hive verifies whether the user has access to the data on HDFS and fails if the user doesn't have access. This makes sense for internal tables, since their data has to be deleted when they are dropped, but for external tables Hive should not check for data access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9021) Hive should not allow any user to create tables in other hive DB's that user doesn't own
Anant Nag created HIVE-9021: --- Summary: Hive should not allow any user to create tables in other hive DB's that user doesn't own Key: HIVE-9021 URL: https://issues.apache.org/jira/browse/HIVE-9021 Project: Hive Issue Type: Bug Reporter: Anant Nag Hive allows users to create tables in other users' databases. This should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9022) When creating external tables, Hive needs to verify whether the user has read permissions to the data
Anant Nag created HIVE-9022: --- Summary: When creating external tables, Hive needs to verify whether the user has read permissions to the data Key: HIVE-9022 URL: https://issues.apache.org/jira/browse/HIVE-9022 Project: Hive Issue Type: Bug Reporter: Anant Nag Hive doesn't verify whether the user has read permissions on the data before creating an external table referring to it. This needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9016) SparkCounter display name is not set correctly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9016: Attachment: HIVE-9016.1-spark.patch It's odd that the unit tests haven't been triggered after 6 hours; uploading the patch again. SparkCounter display name is not set correctly [Spark Branch] Key: HIVE-9016 URL: https://issues.apache.org/jira/browse/HIVE-9016 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9016.1-spark.patch, HIVE-9016.1-spark.patch SparkCounter's displayName is set to the SparkCounterGroup displayName; we should not do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8783: Attachment: HIVE-8783.1-spark.patch Actually, Hive already has stats_counter.q and stats_counter_partitioned.q as unit tests for counter-based table statistics collection. stats_counter.q is already enabled; I enable stats_counter_partitioned.q in this patch. Create some tests that use Spark counter for stats collection [Spark Branch] Key: HIVE-8783 URL: https://issues.apache.org/jira/browse/HIVE-8783 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-8783.1-spark.patch Currently when .q tests are run with Spark, the default stats collection is fs. We need to have some tests that use Spark counter for stats collection to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
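For readers following along, counter-based stats collection is toggled through configuration; a minimal sketch using the string property keys (the .q-file equivalent would be set hive.stats.dbclass=counter;):
{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class CounterStatsConfig {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Default in the .q tests is "fs"; switch to counter-based collection.
    conf.set("hive.stats.dbclass", "counter");
    conf.set("hive.stats.autogather", "true");
    System.out.println(conf.get("hive.stats.dbclass"));
  }
}
{code}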
[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8783: Status: Patch Available (was: Open) Create some tests that use Spark counter for stats collection [Spark Branch] Key: HIVE-8783 URL: https://issues.apache.org/jira/browse/HIVE-8783 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-8783.1-spark.patch Currently when .q tests are run with Spark, the default stats collection is fs. We need to have some tests that use Spark counter for stats collection to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9020) When dropping external tables, Hive should not verify whether user has access to the data.
[ https://issues.apache.org/jira/browse/HIVE-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9020: Affects Version/s: 0.13.1 Status: Patch Available (was: Open) With this patch, Hive no longer verifies whether the user has access to the data when dropping an external table. It also now checks whether the user is the owner of the table before dropping it. When dropping external tables, Hive should not verify whether user has access to the data. --- Key: HIVE-9020 URL: https://issues.apache.org/jira/browse/HIVE-9020 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Anant Nag When dropping tables, Hive verifies whether the user has access to the data on HDFS and fails if the user doesn't have access. This makes sense for internal tables, since their data has to be deleted when they are dropped, but for external tables Hive should not check for data access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9020) When dropping external tables, Hive should not verify whether user has access to the data.
[ https://issues.apache.org/jira/browse/HIVE-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9020: Attachment: dropExternal.patch With this patch, Hive no longer verifies whether the user has access to the data when dropping an external table. It also now checks whether the user is the owner of the table before dropping it. When dropping external tables, Hive should not verify whether user has access to the data. --- Key: HIVE-9020 URL: https://issues.apache.org/jira/browse/HIVE-9020 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Anant Nag Attachments: dropExternal.patch When dropping tables, Hive verifies whether the user has access to the data on HDFS and fails if the user doesn't have access. This makes sense for internal tables, since their data has to be deleted when they are dropped, but for external tables Hive should not check for data access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
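A minimal sketch of the distinction the patch draws, with a hypothetical helper name (the actual patch lives in Hive's drop-table/authorization path and also covers the ownership check):
{code:java}
import org.apache.hadoop.hive.metastore.TableType;
import org.apache.hadoop.hive.ql.metadata.Table;

public class DropAccessCheckSketch {
  // Hypothetical helper: only managed (internal) tables need the
  // HDFS data-access check on drop, because only their data is deleted.
  static boolean needsDataAccessCheck(Table table) {
    return !TableType.EXTERNAL_TABLE.name()
        .equals(table.getTTable().getTableType());
  }
}
{code}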
[jira] [Updated] (HIVE-9022) When creating external tables, Hive needs to verify whether the user has read permissions to the data
[ https://issues.apache.org/jira/browse/HIVE-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9022: Attachment: createExternal.patch When creating external tables, Hive needs to verify whether the user has read permissions to the data - Key: HIVE-9022 URL: https://issues.apache.org/jira/browse/HIVE-9022 Project: Hive Issue Type: Bug Reporter: Anant Nag Attachments: createExternal.patch Hive doesn't verify whether the user has read permissions on the data before creating an external table referring to it. This needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9022) When creating external tables, Hive needs to verify whether the user has read permissions to the data
[ https://issues.apache.org/jira/browse/HIVE-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9022: Labels: patch (was: ) Affects Version/s: 0.13.1 Status: Patch Available (was: Open) The user should have read and execute permissions on the parent folder of the data location as well as on the location itself. Hive now checks that both the parent and the data location have read and execute permissions before creating the table. When creating external tables, Hive needs to verify whether the user has read permissions to the data - Key: HIVE-9022 URL: https://issues.apache.org/jira/browse/HIVE-9022 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Anant Nag Labels: patch Attachments: createExternal.patch Hive doesn't verify whether the user has read permissions on the data before creating an external table referring to it. This needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
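A minimal sketch of the kind of check described, using Hadoop's FileSystem API; it inspects only the "other" permission bits, whereas a complete check would also account for user and group ownership:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class ExternalLocationCheckSketch {
  // Verify read+execute on the data location and on its parent folder.
  static void checkReadable(FileSystem fs, Path location) throws IOException {
    for (Path p : new Path[] { location, location.getParent() }) {
      if (p == null) {
        continue; // location is the filesystem root; no parent to check
      }
      FileStatus status = fs.getFileStatus(p);
      FsAction other = status.getPermission().getOtherAction();
      if (!other.implies(FsAction.READ_EXECUTE)) {
        throw new IOException("Missing read/execute permission on " + p);
      }
    }
  }
}
{code}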
[jira] [Updated] (HIVE-9021) Hive should not allow any user to create tables in other hive DB's that user doesn't own
[ https://issues.apache.org/jira/browse/HIVE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9021: Attachment: db.patch Hive should not allow any user to create tables in other hive DB's that user doesn't own Key: HIVE-9021 URL: https://issues.apache.org/jira/browse/HIVE-9021 Project: Hive Issue Type: Bug Reporter: Anant Nag Attachments: db.patch Hive allows users to create tables in other users' databases. This should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9021) Hive should not allow any user to create tables in other hive DB's that user doesn't own
[ https://issues.apache.org/jira/browse/HIVE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9021: Labels: patch (was: ) Affects Version/s: 0.13.1 Status: Patch Available (was: Open) Hive now checks whether the user is the owner of the database before creating a table in it. Hive should not allow any user to create tables in other hive DB's that user doesn't own Key: HIVE-9021 URL: https://issues.apache.org/jira/browse/HIVE-9021 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Anant Nag Labels: patch Attachments: db.patch Hive allows users to create tables in other users' databases. This should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234082#comment-14234082 ] Rui Li commented on HIVE-8991: -- Not sure if it's because of how we add hive-exec: if added dynamically (as the application jar), Spark loads it with {{ExecutorURLClassLoader}} and sets it as the thread's ContextClassLoader, and then we hit the NoClassDefFoundError. If added to {{spark.driver.extraClassPath}}, it's loaded with the system class loader and the error is gone. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of the missing hive-it-util in the remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
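A minimal standalone probe (not Hive code) that makes the distinction visible: the same class name can be resolvable through the thread's context class loader while being invisible to the system class loader, which is exactly the split between a dynamically added application jar and spark.driver.extraClassPath:
{code:java}
public class ClassLoaderProbe {
  static boolean visibleTo(ClassLoader loader, String name) {
    try {
      Class.forName(name, false, loader);
      return true;
    } catch (ClassNotFoundException | NoClassDefFoundError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String cls = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
    // Sees spark.driver.extraClassPath entries (JVM startup classpath).
    System.out.println("system loader: "
        + visibleTo(ClassLoader.getSystemClassLoader(), cls));
    // Sees a dynamically added application jar once Spark has installed
    // its ExecutorURLClassLoader on the thread.
    System.out.println("context loader: "
        + visibleTo(Thread.currentThread().getContextClassLoader(), cls));
  }
}
{code}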
[jira] [Created] (HIVE-9023) HiveHistoryImpl relies on removed counters to print num rows
Slava Markeyev created HIVE-9023: Summary: HiveHistoryImpl relies on removed counters to print num rows Key: HIVE-9023 URL: https://issues.apache.org/jira/browse/HIVE-9023 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.14.0, 0.14.1 Reporter: Slava Markeyev Priority: Minor HiveHistoryImpl still relies on the counters that were removed in HIVE-5982 to determine the number of rows loaded. This results in a regression of functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: (was: HIVE-8974.03.patch) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: HIVE-8974.03.patch Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.03.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Apache Hive 1.0 ?
On Wed, Dec 3, 2014 at 2:27 PM, Enis Söztutar e...@apache.org wrote: Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). Agreed :) The important thing for calling something 1.0 I think is the focus on user level API and compatibility issues. I think we need to remember that not all our users write SQL statements. For example Drill, Spark SQL, PrestoDB, Impala, Kylin, Ranger, Sentry, Carl's view project at LI, and more are all users of Hive as well. At the user meetup tonight Carl suggested we get our act together in regards to documenting which APIs are public or not, for these users. I think that makes a lot of sense. But still, you should think about future releases and for example when you can do a 1.x release versus 2.x release. We have started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E ) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of (eventual) 1.0 release is to become a stable base for future 1.x series of releases. 1.0 release will aim to achieve at least the same level of stability of 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end what you want to deliver and market as 1.0 should be relatively stable in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases; e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in Hive. In the case of Hive, every 0.x release has been adding new features, and releases have been sequential. 0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. We could check in the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for Hive. 
I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of Hive. On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to do a 1.0 release off a maintenance release, since that is more stable. Trunk is moving fast. HBase uses odd release numbers for this purpose, where 0.95, 97, 99 etc. are dev releases and 0.96, 0.98, 1.0 etc. are public; that works well for baking, but since we don't have that, it seems like 0.14 would be a good place to bake. 0.15, with a bunch of new bugs that we are busy introducing, may not be as good for 1.0 IMHO... On Tue, Dec 2, 2014 at 7:21 PM, Brock Noland br...@cloudera.com wrote: Hi
Re: Review Request 28283: HIVE-8900:Create encryption testing framework
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28283/#review63681 --- Ship it! Ship It! - Sergio Pena On Dec. 3, 2014, 1:02 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28283/ --- (Updated Dec. 3, 2014, 1:02 a.m.) Review request for hive. Repository: hive-git Description --- The patch includes: 1. Enable security properties for the Hive security cluster Diffs - .gitignore c5decaf data/scripts/q_test_cleanup_for_encryption.sql PRE-CREATION data/scripts/q_test_init_for_encryption.sql PRE-CREATION itests/qtest/pom.xml 376f4a9 itests/src/test/resources/testconfiguration.properties 3ae001d itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 31d5c29 ql/src/test/queries/clientpositive/create_encrypted_table.q PRE-CREATION shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 2e00d93 shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 8161fc1 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java fa66a4a Diff: https://reviews.apache.org/r/28283/diff/ Testing --- Thanks, cheng xu
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/#review63710 --- Ship it! Ship It! - John Pullokkaran On Dec. 2, 2014, 11:18 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 2, 2014, 11:18 p.m.) Review request for hive and John Pullokkaran. Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we try to make it use the groupby index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs - itests/src/test/resources/testconfiguration.properties fc1f345 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/#review63711 --- Ship it! - John Pullokkaran On Dec. 2, 2014, 11:18 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 2, 2014, 11:18 p.m.) Review request for hive and John Pullokkaran. Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we try to make it use the groupby index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs - itests/src/test/resources/testconfiguration.properties fc1f345 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
Review Request 28699: HIVE-8783 Create some tests that use Spark counter for stats collection [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28699/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8783 https://issues.apache.org/jira/browse/HIVE-8783 Repository: hive-git Description --- Hive already has stats_counter.q and stats_counter_partitioned.q as unit tests for counter-based table statistics collection. stats_counter.q is already enabled; I enable stats_counter_partitioned.q in this patch. Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 30b7632 ql/src/test/results/clientpositive/spark/stats_counter_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28699/diff/ Testing --- Thanks, chengxiang li
Re: SVN server hanging
The Apache infra team is looking into it. -- Forwarded message -- From: Geoffrey Corey cor...@apache.org Date: Wed, Dec 3, 2014 at 9:56 AM Subject: Notice: Subversion master undergoing emergency maintenance To: committ...@apache.org Eris is currently undergoing some emergency maintenance due to disk errors. We do not currently have an ETA on when this will be fixed. In the meantime, there will be no access to commit to SVN. The read-only mirror at svn.eu.apache.org is still working. The blog post can be found here. [1] [1] - https://blogs.apache.org/infra/entry/subversion_master_undergoing_emergency_maintenance -- Geoff On behalf of Infra. On Wed, Dec 3, 2014 at 7:04 AM, Xuefu Zhang xzh...@cloudera.com wrote: It seems the Hive svn server is hanging. Does anyone have means to restart it? Thanks, Xuefu
Re: SVN server hanging
https://blogs.apache.org/infra/entry/subversion_master_undergoing_emergency_maintenance On Dec 3, 2014 7:04 AM, Xuefu Zhang xzh...@cloudera.com wrote: It seems Hive svn server is hanging. Does anyone have means to restart it? Thanks, Xuefu
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 3, 2014, 7:40 p.m.) Review request for hive and John Pullokkaran. Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we try to make it use the groupby index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs - itests/src/test/resources/testconfiguration.properties fc1f345 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 3, 2014, 7:40 p.m.) Review request for hive and John Pullokkaran. Changes --- Removed whitespace Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we try to make it use the groupby index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs (updated) - itests/src/test/resources/testconfiguration.properties fc1f345 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
Remove lock on compilation stage
I mentioned this at the meetup tonight. With all the new work being done in the compilation phase, I think this global compiler lock might be having more of an impact. Of course it does not affect the Hive CLI, but most of the users I know use HS2. https://issues.apache.org/jira/browse/HIVE-4239 Does anyone have interest in doing some parallel testing with the lock removed? Brock
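For anyone who hasn't read HIVE-4239: the pattern under discussion is, roughly, a single lock serializing all compilation in HiveServer2. A simplified sketch with hypothetical names, not the actual Driver code:
{code:java}
public class CompileLockPattern {
  private static final Object GLOBAL_COMPILE_LOCK = new Object();

  // Every session funnels through one lock, so a slow compile stalls
  // all concurrent HS2 sessions; removing it is what needs testing.
  Object compile(String query) {
    synchronized (GLOBAL_COMPILE_LOCK) {
      return parseAndPlan(query);
    }
  }

  Object parseAndPlan(String query) {
    return query; // placeholder for the real (expensive) compilation work
  }
}
{code}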
RE: Apache Hive 1.0 ?
Hi, From more of an end user perspective, if we were to move to a 1.0 release then it should be a complete offering. Have we defined what this would include? What is our definition of complete documentation? In general, I would expect a 1.0 release to include: 1. A stable code base that is reasonably current (e.g., implemented on YARN). 2. A complete set of functionality that would enable a company to use Hive as an analytical/BI database. This would include a rather complete implementation of SQL (minus transaction processing). 3. A reliable install program/kit. 4. Documentation including: a. API specification b. User guide - to include deviations from ANSI standard SQL and any extensions c. Administration guidance including how to install, configure and administer Hive. d. Release Notes clearly detailing release inclusions and known issues (open Jiras), and compatibility with other Apache projects. Thank You, Follow me on @BigData73 - Bill Busch | SSA | Enterprise Information Solutions CWP m: 704.806.2485 | NASDAQ: PRFT | Perficient.com BI/DW | Advanced Analytics | Big Data | ECI| EPM | MDM -Original Message- From: Enis Söztutar [mailto:e...@apache.org] Sent: Wednesday, December 03, 2014 5:27 PM To: dev@hive.apache.org Subject: Re: Apache Hive 1.0 ? Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). The important thing for calling something 1.0 I think is the focus on user level API and compatibility issues. But still, you should think about future releases and for example when you can do a 1.x release versus 2.x release. We have started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of (eventual) 1.0 release is to become a stable base for future 1.x series of releases. 1.0 release will aim to achieve at least the same level of stability of 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end what you want to deliver and market as 1.0 should be relatively stable in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases; e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in Hive. In the case of Hive, every 0.x release has been adding new features, and releases have been sequential. 
0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. We could check in the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for Hive. I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of Hive. On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones.
Re: Apache Hive 1.0 ?
I think the 1.0 release in particular should be a relatively stable release, since we go from the beta(?) stage of 0.x to 1.0. Otherwise, what prevents us from promoting 0.14 to 1.0? 0.14.1 is not done yet, so the timing would be great: we would have no 0.x.y releases, 1.1 would become the first fix release, and there would be no confusion. On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to do a 1.0 release off a maintenance release, since that is more stable. Trunk is moving fast. HBase uses odd release numbers for this purpose, where 0.95, 97, 99 etc. are dev releases and 0.96, 0.98, 1.0 etc. are public; that works well for baking, but since we don't have that, it seems like 0.14 would be a good place to bake. 0.15, with a bunch of new bugs that we are busy introducing, may not be as good for 1.0 IMHO... On Tue, Dec 2, 2014 at 7:21 PM, Brock Noland br...@cloudera.com wrote: Hi Thejas, Thank you very much for your proposal! Hadoop did something similar, renaming branches to branch-1 and branch-2. At the time, although I was very much in favor of the new release numbers, I thought it could have been handled better. Renaming release branches ended up being very confusing for users and I had a ton of conversations with users about how releases were related. In this case, I feel the situation is similar: we'll release 1.0, which is really just the second maintenance release of the 0.14 branch. Thus it's 1.0, but really it's just 0.14 + some fixes. I feel this will again be confusing for users. For this important change, I think we should use a new release vehicle. Thus, I'd suggest we do the rename in trunk, soon, and then the next release of Hive will be 1.0. Cheers, Brock On Tue, Dec 2, 2014 at 10:07 AM, Thejas Nair the...@hortonworks.com wrote: Apache Hive is the de facto SQL query engine in the Hadoop ecosystem. I believe it is also the most widely used one. Hive is used in production in a large number of enterprises. However, the 0.x.y versioning that we have been using for Hive obscures this status. I propose creating a 1.0 release out of the 0.14 branch of Hive. We already have some bug fixes for the 0.14 release that have been added to the branch, and a maintenance release is due. Having it out of this maintenance branch would create a better first 1.0 version, and we would be able to do it soon. What would have been the 0.15 version would then become 1.1. Thoughts? Thanks, Thejas
Re: Apache Hive 1.0 ?
I'd like to see HiveCLI, HiveServer, and the original JDBC driver deprecated and purged from the codebase before the 1.0 release. This topic probably needs its own thread, but I thought I should mention it here. Thanks. - Carl On Wed, Dec 3, 2014 at 2:27 PM, Enis Söztutar e...@apache.org wrote: Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). The important thing for calling something 1.0 I think is the focus on user level API and compatibility issues. But still, you should think about future releases and for example when you can do a 1.x release versus 2.x release. We have started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E ) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of (eventual) 1.0 release is to become a stable base for future 1.x series of releases. 1.0 release will aim to achieve at least the same level of stability of 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end what you want to deliver and market as 1.0 should be relatively stable in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases; e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in Hive. In the case of Hive, every 0.x release has been adding new features, and releases have been sequential. 0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. We could check in the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for Hive. I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of Hive. 
On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to do a 1.0 release off a maintenance release, since that is more stable. Trunk is moving fast. HBase uses odd release numbers for this purpose, where 0.95, 97, 99 etc. are dev releases and 0.96, 0.98, 1.0 etc. are public; that works well for baking, but since we don't have that, it seems like 0.14 would be a good place to bake. 0.15, with a bunch of new bugs that we are busy introducing, may not be as good for 1.0 IMHO... On Tue, Dec 2, 2014 at 7:21 PM, Brock Noland br...@cloudera.com wrote: Hi Thejas, Thank you very much for your proposal! Hadoop did something similar, renaming branches to branch-1 and branch-2. At the time, although I was
Review Request 28632: Turn CBO on
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28632/ --- Review request for hive and Sergey Shelukhin. Bugs: HIVE-8395 https://issues.apache.org/jira/browse/HIVE-8395 Repository: hive-git Description --- Turn CBO on Diffs - accumulo-handler/src/test/results/positive/accumulo_predicate_pushdown.q.out 309f2f7 accumulo-handler/src/test/results/positive/accumulo_queries.q.out 8d7f19c common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2e2bf5a contrib/src/test/results/clientpositive/dboutput.q.out 554ca02 contrib/src/test/results/clientpositive/udaf_example_avg.q.out d300b0f contrib/src/test/results/clientpositive/udaf_example_group_concat.q.out 762461b contrib/src/test/results/clientpositive/udaf_example_max.q.out 82aeca7 contrib/src/test/results/clientpositive/udaf_example_max_n.q.out db95fcb contrib/src/test/results/clientpositive/udaf_example_min.q.out b62ff39 contrib/src/test/results/clientpositive/udaf_example_min_n.q.out 1344186 hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out 4e4364e hbase-handler/src/test/results/positive/hbase_queries.q.out 0b4ed37 hbase-handler/src/test/results/positive/hbase_timestamp.q.out f70d371 itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 23a1b97 ql/src/test/queries/clientnegative/join_nonexistent_part.q b4a4757 ql/src/test/queries/clientpositive/ambiguous_col.q 5ccd2c8 ql/src/test/queries/clientpositive/annotate_stats_groupby2.q 6e65577 ql/src/test/queries/clientpositive/constantPropagateForSubQuery.q 149a290 ql/src/test/queries/clientpositive/filter_join_breaktask2.q 7f4258f ql/src/test/queries/clientpositive/join_vc.q bbf3e85 ql/src/test/queries/clientpositive/mrr.q 9f068cc ql/src/test/queries/clientpositive/optimize_nullscan.q f3b896b ql/src/test/queries/clientpositive/ppd_gby_join.q 82f358b ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q fbfcbe1 ql/src/test/queries/clientpositive/subquery_exists_explain_rewrite.q 60dfdaf ql/src/test/queries/clientpositive/subquery_in_explain_rewrite.q 1d1639d ql/src/test/results/clientnegative/join_nonexistent_part.q.out a924895 ql/src/test/results/clientnegative/ptf_negative_InvalidValueBoundary.q.out 6ad9905 ql/src/test/results/clientpositive/allcolref_in_udf.q.out 969f64b ql/src/test/results/clientpositive/alter_partition_coltype.q.out f71fa05 ql/src/test/results/clientpositive/ambiguous_col.q.out d583162 ql/src/test/results/clientpositive/annotate_stats_filter.q.out 70df189 ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 2640ff7 ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out 2f85c92 ql/src/test/results/clientpositive/annotate_stats_join.q.out ee46003 ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out 70c9e1d ql/src/test/results/clientpositive/annotate_stats_limit.q.out b61a597 ql/src/test/results/clientpositive/annotate_stats_part.q.out fb3c17b ql/src/test/results/clientpositive/annotate_stats_select.q.out 8fbb208 ql/src/test/results/clientpositive/annotate_stats_table.q.out a74d85c ql/src/test/results/clientpositive/annotate_stats_union.q.out 919015a ql/src/test/results/clientpositive/ansi_sql_arithmetic.q.out 4917ac0 ql/src/test/results/clientpositive/authorization_explain.q.out 3d97227 ql/src/test/results/clientpositive/auto_join1.q.out 8096a94 ql/src/test/results/clientpositive/auto_join10.q.out 7fb3070 ql/src/test/results/clientpositive/auto_join11.q.out 98c8285 ql/src/test/results/clientpositive/auto_join12.q.out f116e23 
ql/src/test/results/clientpositive/auto_join13.q.out 3396a0c ql/src/test/results/clientpositive/auto_join14.q.out 55c9b5d ql/src/test/results/clientpositive/auto_join16.q.out bd5b378 ql/src/test/results/clientpositive/auto_join17.q.out 0fa7aa9 ql/src/test/results/clientpositive/auto_join18.q.out 2303f18 ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out ee5a32c ql/src/test/results/clientpositive/auto_join19.q.out 4c2e26e ql/src/test/results/clientpositive/auto_join2.q.out 11d57e9 ql/src/test/results/clientpositive/auto_join22.q.out c4a0084 ql/src/test/results/clientpositive/auto_join25.q.out 08cbe42 ql/src/test/results/clientpositive/auto_join26.q.out a40615f ql/src/test/results/clientpositive/auto_join27.q.out db348b7 ql/src/test/results/clientpositive/auto_join3.q.out 0bfb27a ql/src/test/results/clientpositive/auto_join33.q.out e5a7c52 ql/src/test/results/clientpositive/auto_join4.q.out 6492a64 ql/src/test/results/clientpositive/auto_join5.q.out 1073302 ql/src/test/results/clientpositive/auto_join6.q.out 88b7770 ql/src/test/results/clientpositive/auto_join7.q.out 5de5640
Hive-0.14 - Build # 760 - Failure
Changes for Build #760 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #760) Status: Failure Check console output at https://builds.apache.org/job/Hive-0.14/760/ to view the results.
dev-ow...@hive.apache.org.
Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
Re: Apache Hive 1.0 ?
Enis, What you said about backward compatibility makes sense. Since we are planning to remove HiveServer1 support, it makes sense to do that in 1.0. Ending Java 6 support is also something we have been discussing on the mailing list. We can document Java 7 as the minimum requirement for 1.0. On Wed, Dec 3, 2014 at 2:27 PM, Enis Söztutar e...@apache.org wrote: Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). The important thing for calling something 1.0, I think, is the focus on user-level API and compatibility issues. But still, you should think about future releases and, for example, when you can do a 1.x release versus a 2.x release. We started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc. for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of the (eventual) 1.0 release is to become a stable base for a future 1.x series of releases. The 1.0 release will aim to achieve at least the same level of stability as the 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end, what you want to deliver and market as 1.0 should be relatively stable, in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases, e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in hive. In the case of hive, every 0.x release has been adding new features, and releases have been sequential. 0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. We could check on the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for hive. I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of hive. 
On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, a major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to do a 1.0 release off a maintenance release, since that is more stable. Trunk is moving fast. HBase uses odd release numbers for this purpose, where 0.95, 97, 99 etc. are dev releases and 0.96, 0.98, 1.0 etc. are public; that works well for baking, but since we don't have that, it seems like 14.0 would be a good place to bake. 15.0, with a bunch of new bugs that we are busy introducing, may not be as good for 1.0 IMHO... On Tue, Dec 2, 2014 at 7:21 PM, Brock Noland br...@cloudera.com wrote: Hi Thejas, Thank you very much for your proposal! Hadoop did something similar renaming branches to branch-1 and branch-2. At the time, although I was very much in favor of the
Re: SVN server hanging
FYI: I logged an INFRA JIRA: https://issues.apache.org/jira/browse/INFRA-8782 Thanks, Xuefu On Wed, Dec 3, 2014 at 7:04 AM, Xuefu Zhang xzh...@cloudera.com wrote: It seems Hive svn server is hanging. Does anyone have means to restart it? Thanks, Xuefu
Re: SVN server hanging
Note that this means all pre-commit builds will fail... On Wed, Dec 3, 2014 at 10:13 AM, Brock Noland br...@cloudera.com wrote: https://blogs.apache.org/infra/entry/subversion_master_undergoing_emergency_maintenance On Dec 3, 2014 7:04 AM, Xuefu Zhang xzh...@cloudera.com wrote: It seems Hive svn server is hanging. Does anyone have means to restart it? Thanks, Xuefu
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234264#comment-14234264 ] Xuefu Zhang commented on HIVE-7292: --- [~libing], I assume you assigned this JIRA to yourself by mistake. However, let me know if you plan to work on this. Thanks. Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Bing Li Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark, as an open-source data analytics cluster computing framework, has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that they can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption, as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
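To make the proposal above concrete, here is a minimal sketch of how a user would route a query to the Spark backend, assuming the hive.execution.engine property this initiative adds; the table and query are illustrative, not from the JIRA.

{code:sql}
-- Hypothetical session: run the same HiveQL on Spark instead of
-- MapReduce or Tez (table name is made up).
set hive.execution.engine=spark;

SELECT dept, count(*)
FROM employees
GROUP BY dept;
{code}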
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234275#comment-14234275 ] Xuefu Zhang commented on HIVE-8991: --- Hi [~lirui], many thanks for the new findings. I think the patch here is good to be checked in to fix the test. However, if you and [~vanzin] find additional improvements needed for the library loading mechanism, please create a new JIRA and link it with this one. Thanks. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of missing hive-it-util in the remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234280#comment-14234280 ] Xuefu Zhang commented on HIVE-8783: --- Patch looks good. I had a couple of minor comments on review board. Create some tests that use Spark counter for stats collection [Spark Branch] Key: HIVE-8783 URL: https://issues.apache.org/jira/browse/HIVE-8783 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-8783.1-spark.patch Currently when .q tests are run with Spark, the default stats collection is fs. We need to have some tests that use Spark counter for stats collection to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
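For context, a minimal sketch of what such a .q test would toggle, assuming the existing hive.stats.dbclass setting mentioned above (default fs); the table name is illustrative.

{code:sql}
-- Collect statistics through Spark counters instead of the default 'fs'
set hive.stats.dbclass=counter;

-- Any stats-gathering statement now exercises the counter code path
ANALYZE TABLE src_copy COMPUTE STATISTICS;
{code}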
[jira] [Commented] (HIVE-9016) SparkCounter display name is not set correctly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234285#comment-14234285 ] Xuefu Zhang commented on HIVE-9016: --- [~chengxiang li], the precommit test needs to get source from svn. Currently the svn server is down. Thus, there is no precommit test until it's fixed. SparkCounter display name is not set correctly [Spark Branch] Key: HIVE-9016 URL: https://issues.apache.org/jira/browse/HIVE-9016 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9016.1-spark.patch, HIVE-9016.1-spark.patch The SparkCounter displayName is set with the SparkCounterGroup displayName; we should not do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28699: HIVE-8783 Create some tests that use Spark counter for stats collection [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28699/#review63852 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java https://reviews.apache.org/r/28699/#comment106158 Naming this variable partitions might be a little confusing; same for partition below. Maybe we can call them partitionSpecs and partitionSpec respectively. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java https://reviews.apache.org/r/28699/#comment106159 list seems too general to be meaningful. - Xuefu Zhang On Dec. 4, 2014, 9:22 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28699/ --- (Updated Dec. 4, 2014, 9:22 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8783 https://issues.apache.org/jira/browse/HIVE-8783 Repository: hive-git Description --- Hive already has stats_counter.q and stats_counter_partitioned.q for unit tests of table statistics collection on Counter. stats_counter.q is already enabled; I enable stats_counter_partitioned.q in this patch. Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 30b7632 ql/src/test/results/clientpositive/spark/stats_counter_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28699/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9016) SparkCounter display name is not set correctly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234332#comment-14234332 ] Xuefu Zhang commented on HIVE-9016: --- [~chengxiang li], could you also remove ShimLoader.getHadoopShims().getCounterGroupName() and related methods, since they are not used any more? SparkCounter display name is not set correctly [Spark Branch] Key: HIVE-9016 URL: https://issues.apache.org/jira/browse/HIVE-9016 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9016.1-spark.patch, HIVE-9016.1-spark.patch The SparkCounter displayName is set with the SparkCounterGroup displayName; we should not do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Hive-0.14 - Build # 761 - Still Failing
Changes for Build #760 Changes for Build #761 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #761) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-0.14/761/ to view the results.
Re: dev-ow...@hive.apache.org.
Define unstructured. Hive can handle data such as Avro or JSON, which I would call self-structured. I believe the SerDes for these types can even set the schema for the table or partition you are reading based on the data in the file. Alan. Mohan Krishna mailto:mohan.25fe...@gmail.com December 3, 2014 at 17:01 Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
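To illustrate Alan's point about self-structured data, a sketch of an Avro-backed table whose schema comes from the data side rather than the DDL; the SerDe and I/O format classes are the stock Hive ones, while the table name, location, and schema URL are made up.

{code:sql}
-- No column list: the Avro SerDe derives the table schema from the
-- referenced Avro schema (paths are illustrative).
CREATE EXTERNAL TABLE events
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/data/events'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/event.avsc');
{code}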
Re: dev-ow...@hive.apache.org.
Thanks Alan for the answer. So, can I conclude that Hive handles unstructured data? On Thu, Dec 4, 2014 at 10:06 PM, Alan Gates ga...@hortonworks.com wrote: Define unstructured. Hive can handle data such as Avro or JSON, which I would call self-structured. I believe the SerDes for these types can even set the schema for the table or partition you are reading based on the data in the file. Alan. Mohan Krishna mohan.25fe...@gmail.com December 3, 2014 at 17:01 Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: HIVE-8974.04.patch Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.03.patch, HIVE-8974.04.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Status: Open (was: Patch Available) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.11.patch, HIVE-8774.12.patch, HIVE-8774.13.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Status: Patch Available (was: Open) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.11.patch, HIVE-8774.12.patch, HIVE-8774.13.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: dev-ow...@hive.apache.org.
Mohan, It will handle it, but it is probably (depending on your use case) not optimal. Hive's sweet spot is structured data. Bill Thank You, Follow me on @BigData73 - Bill Busch | SSA | Enterprise Information Solutions CWP m: 704.806.2485 | NASDAQ: PRFT | Perficient.com BI/DW | Advanced Analytics | Big Data | ECI | EPM | MDM -Original Message- From: Mohan Krishna [mailto:mohan.25fe...@gmail.com] Sent: Thursday, December 04, 2014 1:09 PM To: dev@hive.apache.org Subject: Re: dev-ow...@hive.apache.org. Thanks Alan for the answer. So, can I conclude that Hive handles unstructured data? On Thu, Dec 4, 2014 at 10:06 PM, Alan Gates ga...@hortonworks.com wrote: Define unstructured. Hive can handle data such as Avro or JSON, which I would call self-structured. I believe the SerDes for these types can even set the schema for the table or partition you are reading based on the data in the file. Alan. Mohan Krishna mohan.25fe...@gmail.com December 3, 2014 at 17:01 Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234532#comment-14234532 ] Nick Dimiduk commented on HIVE-8809: Using activeByDefault causes issues -- if you specify some other unrelated profiles (thrift generation, for instance), you end up disabling your default profile. Better to use a property flag. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every Maven command, the profile needs to be specified explicitly. It would be better to activate the hadoop-2 profile by default, as Hive QA uses the hadoop-2 profile. With this change, both of the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: dev-ow...@hive.apache.org.
Thank you Bill. Now it is clear for me. Thanks On Fri, Dec 5, 2014 at 12:54 AM, Bill Busch bill.bu...@perficient.com wrote: Mohan, It will handle it, but it is probably (depending on your use case) not optimal. Hive's sweet spot is structured data. Bill Thank You, Follow me on @BigData73 - Bill Busch | SSA | Enterprise Information Solutions CWP m: 704.806.2485 | NASDAQ: PRFT | Perficient.com BI/DW | Advanced Analytics | Big Data | ECI | EPM | MDM -Original Message- From: Mohan Krishna [mailto:mohan.25fe...@gmail.com] Sent: Thursday, December 04, 2014 1:09 PM To: dev@hive.apache.org Subject: Re: dev-ow...@hive.apache.org. Thanks Alan for the answer. So, can I conclude that Hive handles unstructured data? On Thu, Dec 4, 2014 at 10:06 PM, Alan Gates ga...@hortonworks.com wrote: Define unstructured. Hive can handle data such as Avro or JSON, which I would call self-structured. I believe the SerDes for these types can even set the schema for the table or partition you are reading based on the data in the file. Alan. Mohan Krishna mohan.25fe...@gmail.com December 3, 2014 at 17:01 Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-8809: --- Attachment: HIVE-8809.01.patch Over on HBase, we have the property hadoop.profile and check its value. See also http://java.dzone.com/articles/maven-profile-best-practices Give this patch a spin. For a hadoop-1 build, add {{-Dhadoop.profile=1}}. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every Maven command, the profile needs to be specified explicitly. It would be better to activate the hadoop-2 profile by default, as Hive QA uses the hadoop-2 profile. With this change, both of the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234593#comment-14234593 ] Sergey Shelukhin commented on HIVE-9013: Can you add some message if it's restricted, for the single-property case? Otherwise +1 Hive set command exposes metastore db password -- Key: HIVE-9013 URL: https://issues.apache.org/jira/browse/HIVE-9013 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Binglin Chang Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch When auth is enabled, we still need the set command to set some variables (e.g. mapreduce.job.queuename), but the set command alone also lists all information (including vars in the restricted list); this exposes values like javax.jdo.option.ConnectionPassword. I think conf vars in the restricted list should also be excluded from the dump-vars command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
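A sketch of the behavior the patch aims for, under the assumption that restricted variables are simply filtered from the full dump; the queue name is made up.

{code:sql}
-- Setting an individual, non-restricted variable should still work:
set mapreduce.job.queuename=etl;

-- ...but a bare 'set' (dump all variables) should omit restricted
-- entries such as javax.jdo.option.ConnectionPassword.
set;
{code}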
[jira] [Commented] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234642#comment-14234642 ] Vikram Dixit K commented on HIVE-8870: -- +1 for 0.14 errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña Attachments: HIVE-8870.3.patch When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with the default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-ORC table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234694#comment-14234694 ] Brock Noland commented on HIVE-8809: I had some pretty ugly issues with the system property approach (in HBase's pom) when we used ivy and ant. However, now that we are on maven, perhaps it won't be an issue. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every Maven command, the profile needs to be specified explicitly. It would be better to activate the hadoop-2 profile by default, as Hive QA uses the hadoop-2 profile. With this change, both of the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Attachment: (was: HIVE-9001.1.patch) Ship with log4j.properties file that has a reliable time based rolling policy - Key: HIVE-9001 URL: https://issues.apache.org/jira/browse/HIVE-9001 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan The hive log gets locked by the hive process and cannot be rolled on Windows OS. Install Hive on Windows, start Hive, and try to rename the hive log while Hive is running. When log4j tries to rename it, it will throw the same error, as the file is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated into Hive for a reliable rollover. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Attachment: HIVE-9001.1.patch [~sushanth] Made the space adjustments and overwrote the previous patch, since this is a very minor change. Thanks Hari Ship with log4j.properties file that has a reliable time based rolling policy - Key: HIVE-9001 URL: https://issues.apache.org/jira/browse/HIVE-9001 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9001.1.patch The hive log gets locked by the hive process and cannot be rolled on Windows OS. Install Hive on Windows, start Hive, and try to rename the hive log while Hive is running. When log4j tries to rename it, it will throw the same error, as the file is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated into Hive for a reliable rollover. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234767#comment-14234767 ] Szehon Ho commented on HIVE-9007: - I'll leave this JIRA for now. One observation to note here is that it is revealed in the ppd_join4.q test if you add set hive.auto.convert.join=true for the test. The plan has too many HashTableSinks.
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Spark
      A masked pattern was here
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: test_tbl
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Filter Operator
                    predicate: ((id is not null and (name = 'c')) and (id = 'a')) (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Select Operator
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      Spark HashTable Sink Operator
                        condition expressions:
                          0
                          1
                        keys:
                          0 'a' (type: string)
                          1 'a' (type: string)
            Local Work:
              Map Reduce Local Work
        Map 2
            Map Operator Tree:
                TableScan
                  alias: t3
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Filter Operator
                    predicate: (id = 'a') (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Spark HashTable Sink Operator
                      condition expressions:
                        0
                        1
                      keys:
                        0 'a' (type: string)
                        1 'a' (type: string)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}
It could be related to this issue. I'll come back to this JIRA at a later point, or others who are free can take it. Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] -- Key: HIVE-9007 URL: https://issues.apache.org/jira/browse/HIVE-9007 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, which may cause map join in the spark branch to generate a wrong plan. Currently, the map join conversion in the spark branch first goes through a method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, removes the RS associated with the big table, and keeps the RSs for all small tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs to a small table or the big table). The issue arises when IdentityProjectRemover comes into play, which may result in a situation where an operator tree has two consecutive RSs. Imagine the following example:
{noformat}
        Join                     MapJoin
       /    \                   /       \
     RS      RS      --->     RS         RS
    /          \             /             \
  TS            RS         TS               TS
(big table)      \       (small table)
                  TS
{noformat}
In this case, all parents of the mapjoin op will be RS, even the branch for the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong Liu updated HIVE-8966: - Status: Open (was: Patch Available) Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8966.patch hive hcatalog streaming also creates a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after removing the bucket_n_flush_length file, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
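For reference, a sketch of the compaction request described above; the table and partition names are made up.

{code:sql}
-- Request a compaction on a partition fed by hcatalog streaming
-- (names are illustrative). With this bug, CompactorMR trips over the
-- bucket_N_flush_length side files and nothing gets compacted.
ALTER TABLE web_events PARTITION (dt='2014-12-04') COMPACT 'major';

-- The stalled request should be visible here:
SHOW COMPACTIONS;
{code}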
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234769#comment-14234769 ] Jihong Liu commented on HIVE-8966: -- I think we may have to withdraw this patch for now. It looks like hive currently cannot support doing compaction and loading at the same time for a partition. Without this patch, if loading for a partition is not completely finished, compaction will always fail, so nothing happens. After applying this patch, compaction will go through and finish. However, we may lose data! I did a test: data could be lost if we do compaction while the loading is not finished yet. But if we keep the current version, it remains a limitation for hive: if we stream into a partition for a long period, performance will be affected because we cannot do compaction on it. To completely solve this issue, my initial thinking is that the delta files with open transactions should not be compacted. Currently they must be included, and that is probably the reason for the data loss. But other, closed delta files should be compactable, so we can do compaction and loading at the same time. Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8966.patch hive hcatalog streaming also creates a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after removing the bucket_n_flush_length file, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: HIVE-8974.04.patch Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.03.patch, HIVE-8974.04.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: (was: HIVE-8974.04.patch) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.03.patch, HIVE-8974.04.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8866) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same #of columns
[ https://issues.apache.org/jira/browse/HIVE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8866: --- Status: In Progress (was: Patch Available) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same # of columns Key: HIVE-8866 URL: https://issues.apache.org/jira/browse/HIVE-8866 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8866.01.patch, HIVE-8866.02.patch Vectorization assumes all partitions have the same number of columns, and takes the # of columns from the first read. A subsequent addPartitionColsToBatch throws ArrayIndexOutOfBoundsException if the # of columns is bigger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
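A sketch of a possible repro, under the assumption that the column counts diverge after an ALTER TABLE; all names (including the helper src table) are illustrative.

{code:sql}
-- Partition p=1 is written while the table has one data column...
CREATE TABLE t (a int) PARTITIONED BY (p int) STORED AS ORC;
INSERT INTO TABLE t PARTITION (p=1) SELECT 1 FROM src LIMIT 1;

-- ...then the table grows a column, so partition p=2 has one more.
ALTER TABLE t ADD COLUMNS (b int);
INSERT INTO TABLE t PARTITION (p=2) SELECT 2, 3 FROM src LIMIT 1;

-- A vectorized scan that reads p=1 first sizes its batch for the
-- smaller column count and can overrun when it reaches p=2.
set hive.vectorized.execution.enabled=true;
SELECT a, b FROM t;
{code}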
Re: Apache Hive 1.0 ?
HiveServer and the original JDBC driver have already been purged from trunk. The HiveServer1 docs have been asking users to use HiveServer2 for a long time. The case with Hive CLI is different. We never marked that as deprecated or asked users to use beeline instead. Beeline had been lacking in some features until recently. We just added some capabilities to beeline such as progress/log information support. We need to discuss deprecating it, deprecate it, and wait for some time (at least a year or so, considering how widely it is used) before we can remove it. I think that is more like a candidate for 2.0. Thanks, Thejas On Wed, Dec 3, 2014 at 3:43 PM, Carl Steinbach cwsteinb...@gmail.com wrote: I'd like to see HiveCLI, HiveServer, and the original JDBC driver deprecated and purged from the codebase before the 1.0 release. This topic probably needs its own thread, but I thought I should mention it here. Thanks. - Carl On Wed, Dec 3, 2014 at 2:27 PM, Enis Söztutar e...@apache.org wrote: Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). The important thing for calling something 1.0, I think, is the focus on user-level API and compatibility issues. But still, you should think about future releases and, for example, when you can do a 1.x release versus a 2.x release. We started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E ) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc. for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of the (eventual) 1.0 release is to become a stable base for a future 1.x series of releases. The 1.0 release will aim to achieve at least the same level of stability as the 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end, what you want to deliver and market as 1.0 should be relatively stable, in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases, e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in hive. In the case of hive, every 0.x release has been adding new features, and releases have been sequential. 0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. 
We could check on the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for hive. I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of hive. On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, a major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to
[jira] [Updated] (HIVE-8866) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same #of columns
[ https://issues.apache.org/jira/browse/HIVE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8866: --- Attachment: HIVE-8866.03.patch Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same # of columns Key: HIVE-8866 URL: https://issues.apache.org/jira/browse/HIVE-8866 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8866.01.patch, HIVE-8866.02.patch, HIVE-8866.03.patch Vectorization assumes all partitions have the same number of columns, and takes the # of columns from the first read. A subsequent addPartitionColsToBatch throws ArrayIndexOutOfBoundsException if the # of columns is bigger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8866) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same #of columns
[ https://issues.apache.org/jira/browse/HIVE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8866: --- Status: Patch Available (was: In Progress) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same # of columns Key: HIVE-8866 URL: https://issues.apache.org/jira/browse/HIVE-8866 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8866.01.patch, HIVE-8866.02.patch, HIVE-8866.03.patch Vectorization assumes all partitions have the same number of columns, and takes the # of columns from the first read. A subsequent addPartitionColsToBatch throws ArrayIndexOutOfBoundsException if the # of columns is bigger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8638) Implement bucket map join optimization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8638: -- Attachment: HIVE-8638.3-spark.patch Attached v3, which works only when the bucket numbers match. Implement bucket map join optimization [Spark Branch] - Key: HIVE-8638 URL: https://issues.apache.org/jira/browse/HIVE-8638 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8638.1-spark.patch, HIVE-8638.2-spark.patch, HIVE-8638.3-spark.patch In the hive-on-mr implementation, bucket map join optimization has to depend on the map join hint. In the hive-on-tez implementation, however, a join can be automatically converted to a bucket map join if certain conditions are met, such as: 1. the optimization flag hive.convert.join.bucket.mapjoin.tez is ON 2. all join tables are bucketed and each small table's bucket number is divisible by the big table's bucket number 3. bucket columns == join columns In the hive-on-spark implementation, it is ideal to have the bucket map join auto-conversion support: when all the required criteria are met, a join can be automatically converted to a bucket map join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
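To make the three conditions above concrete, a sketch of a pair of tables that satisfies them; the names are illustrative, and the auto-conversion flag is engine-specific (hive.convert.join.bucket.mapjoin.tez on Tez, per condition 1).

{code:sql}
-- Bucketed on the join column; the small table's bucket count (8) is
-- divisible by the big table's (4); bucket columns == join columns.
CREATE TABLE big_t (key int, value string)
CLUSTERED BY (key) INTO 4 BUCKETS;
CREATE TABLE small_t (key int, value string)
CLUSTERED BY (key) INTO 8 BUCKETS;

-- With the engine's bucket-map-join flag enabled, this join becomes a
-- candidate for automatic conversion to a bucket map join.
SELECT b.key, s.value
FROM big_t b JOIN small_t s ON b.key = s.key;
{code}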
[jira] [Commented] (HIVE-6998) Select query can only support maximum 128 distinct expressions
[ https://issues.apache.org/jira/browse/HIVE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234794#comment-14234794 ] Pengcheng Xiong commented on HIVE-6998: --- run with 129 and also 200 distinct expressions, no problem hive select count(distinct c0),count(distinct c1),count(distinct c2),count(distinct c3),count(distinct c4),count(distinct c5),count(distinct c6),count(distinct c7),count(distinct c8),count(distinct c9),count(distinct c10),count(distinct c11),count(distinct c12),count(distinct c13),count(distinct c14),count(distinct c15),count(distinct c16),count(distinct c17),count(distinct c18),count(distinct c19),count(distinct c20),count(distinct c21),count(distinct c22),count(distinct c23),count(distinct c24),count(distinct c25),count(distinct c26),count(distinct c27),count(distinct c28),count(distinct c29),count(distinct c30),count(distinct c31),count(distinct c32),count(distinct c33),count(distinct c34),count(distinct c35),count(distinct c36),count(distinct c37),count(distinct c38),count(distinct c39),count(distinct c40),count(distinct c41),count(distinct c42),count(distinct c43),count(distinct c44),count(distinct c45),count(distinct c46),count(distinct c47),count(distinct c48),count(distinct c49),count(distinct c50),count(distinct c51),count(distinct c52),count(distinct c53),count(distinct c54),count(distinct c55),count(distinct c56),count(distinct c57),count(distinct c58),count(distinct c59),count(distinct c60),count(distinct c61),count(distinct c62),count(distinct c63),count(distinct c64),count(distinct c65),count(distinct c66),count(distinct c67),count(distinct c68),count(distinct c69),count(distinct c70),count(distinct c71),count(distinct c72),count(distinct c73),count(distinct c74),count(distinct c75),count(distinct c76),count(distinct c77),count(distinct c78),count(distinct c79),count(distinct c80),count(distinct c81),count(distinct c82),count(distinct c83),count(distinct c84),count(distinct c85),count(distinct c86),count(distinct c87),count(distinct c88),count(distinct c89),count(distinct c90),count(distinct c91),count(distinct c92),count(distinct c93),count(distinct c94),count(distinct c95),count(distinct c96),count(distinct c97),count(distinct c98),count(distinct c99),count(distinct c100),count(distinct c101),count(distinct c102),count(distinct c103),count(distinct c104),count(distinct c105),count(distinct c106),count(distinct c107),count(distinct c108),count(distinct c109),count(distinct c110),count(distinct c111),count(distinct c112),count(distinct c113),count(distinct c114),count(distinct c115),count(distinct c116),count(distinct c117),count(distinct c118),count(distinct c119),count(distinct c120),count(distinct c121),count(distinct c122),count(distinct c123),count(distinct c124),count(distinct c125),count(distinct c126),count(distinct c127),count(distinct c128),count(distinct c129),count(distinct c130),count(distinct c131),count(distinct c132),count(distinct c133),count(distinct c134),count(distinct c135),count(distinct c136),count(distinct c137),count(distinct c138),count(distinct c139),count(distinct c140),count(distinct c141),count(distinct c142),count(distinct c143),count(distinct c144),count(distinct c145),count(distinct c146),count(distinct c147),count(distinct c148),count(distinct c149),count(distinct c150),count(distinct c151),count(distinct c152),count(distinct c153),count(distinct c154),count(distinct c155),count(distinct c156),count(distinct c157),count(distinct c158),count(distinct c159),count(distinct 
c160),count(distinct c161),count(distinct c162),count(distinct c163),count(distinct c164),count(distinct c165),count(distinct c166),count(distinct c167),count(distinct c168),count(distinct c169),count(distinct c170),count(distinct c171),count(distinct c172),count(distinct c173),count(distinct c174),count(distinct c175),count(distinct c176),count(distinct c177),count(distinct c178),count(distinct c179),count(distinct c180),count(distinct c181),count(distinct c182),count(distinct c183),count(distinct c184),count(distinct c185),count(distinct c186),count(distinct c187),count(distinct c188),count(distinct c189),count(distinct c190),count(distinct c191),count(distinct c192),count(distinct c193),count(distinct c194),count(distinct c195),count(distinct c196),count(distinct c197),count(distinct c198),count(distinct c199)from tbl_200columns; OK 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[jira] [Created] (HIVE-9024) NullPointerException when starting webhcat server if templeton.hive.properties is not set
Na Yang created HIVE-9024: - Summary: NullPointerException when starting webhcat server if templeton.hive.properties is not set Key: HIVE-9024 URL: https://issues.apache.org/jira/browse/HIVE-9024 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang If templeton.hive.properties is not set, the following NullPointerException is thrown when starting the webhcat server, and the webhcat server cannot start:
{noformat}
Exception in thread "main" java.lang.NullPointerException
	at org.apache.hive.hcatalog.templeton.AppConfig.hiveProps(AppConfig.java:318)
	at org.apache.hive.hcatalog.templeton.AppConfig.handleHiveProperties(AppConfig.java:194)
	at org.apache.hive.hcatalog.templeton.AppConfig.<init>(AppConfig.java:175)
	at org.apache.hive.hcatalog.templeton.AppConfig.<init>(AppConfig.java:155)
	at org.apache.hive.hcatalog.templeton.Main.loadConfig(Main.java:96)
	at org.apache.hive.hcatalog.templeton.Main.<init>(Main.java:80)
	at org.apache.hive.hcatalog.templeton.Main.<init>(Main.java:75)
	at org.apache.hive.hcatalog.templeton.Main.main(Main.java:267)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 28727: HIVE-8638 Implement bucket map join optimization [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28727/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8638 https://issues.apache.org/jira/browse/HIVE-8638 Repository: hive-git Description --- Patch v3 that works when bucket number matches Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java cfc1501 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 2f9e55a ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 4054173 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkBucketJoinProcCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 8b78123 ql/src/test/queries/clientpositive/bucket_map_join_spark1.q PRE-CREATION ql/src/test/queries/clientpositive/bucket_map_join_spark2.q PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28727/diff/ Testing --- Thanks, Jimmy Xiang
[jira] [Resolved] (HIVE-6998) Select query can only support maximum 128 distinct expressions
[ https://issues.apache.org/jira/browse/HIVE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-6998. --- Resolution: Fixed The reported issue cannot be reproduced. Select query can only support maximum 128 distinct expressions -- Key: HIVE-6998 URL: https://issues.apache.org/jira/browse/HIVE-6998 Project: Hive Issue Type: Bug Components: Query Processor, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Chaoyu Tang A select query can only support a maximum of 128 distinct expressions; otherwise, an ArrayIndexOutOfBoundsException is thrown. For a query like: select count(distinct c1), count(distinct c2), count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), ..., count(distinct c128), count(distinct c129) from tbl_129columns; you will get an error like: {code} java.lang.Exception: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 10 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1099) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1138) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1064) at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1082) ... 16 more Caused by: java.lang.ArrayIndexOutOfBoundsException: -128 at java.util.ArrayList.get(ArrayList.java:324) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:838) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:600) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:401) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:320) ... 19 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
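The -128 index in the innermost frame is consistent with a column tag being stored in a signed Java byte: indexes 0 through 127 fit, and the 129th distinct expression wraps around to -128. A minimal demonstration of that wraparound (this illustrates only the arithmetic; it is not the BinarySortableSerDe code):
{code}
public class ByteWrapDemo {
    public static void main(String[] args) {
        for (int i = 126; i <= 130; i++) {
            byte b = (byte) i; // narrowing conversion wraps past 127
            System.out.println(i + " stored in a byte reads back as " + b);
        }
        // 128 reads back as -128, matching the ArrayList index in the trace:
        // the 129th distinct expression would index with -128 and fail.
    }
}
{code}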
[jira] [Commented] (HIVE-8638) Implement bucket map join optimization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234797#comment-14234797 ] Jimmy Xiang commented on HIVE-8638: --- Patch v3 is on RB: https://reviews.apache.org/r/28727/ Implement bucket map join optimization [Spark Branch] - Key: HIVE-8638 URL: https://issues.apache.org/jira/browse/HIVE-8638 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8638.1-spark.patch, HIVE-8638.2-spark.patch, HIVE-8638.3-spark.patch In the hive-on-mr implementation, bucket map join optimization has to depend on the map join hint, while in the hive-on-tez implementation a join can be automatically converted to a bucket map join if certain conditions are met, such as: 1. the optimization flag hive.convert.join.bucket.mapjoin.tez is ON; 2. all join tables are bucketed and each small table's bucket number is divisible by the big table's bucket number; 3. bucket columns == join columns. In the hive-on-spark implementation, it is ideal to have bucket map join auto-conversion support: when all the required criteria are met, a join can be automatically converted to a bucket map join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
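Condition 2 above reduces to a divisibility test. A minimal sketch of that check, under the assumption that bucket counts are already known per table; the class and method names are illustrative, not Hive's actual optimizer API:
{code}
public class BucketMapJoinCheck {
    // Returns true only if every small table's bucket count is a multiple of
    // the big table's bucket count (condition 2 above).
    public static boolean bucketsCompatible(int bigTableBuckets, int[] smallTableBuckets) {
        if (bigTableBuckets <= 0) {
            return false;
        }
        for (int buckets : smallTableBuckets) {
            if (buckets <= 0 || buckets % bigTableBuckets != 0) {
                return false; // bucket counts don't line up; keep a plain map join
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(bucketsCompatible(4, new int[]{8, 4})); // true
        System.out.println(bucketsCompatible(4, new int[]{6}));    // false
    }
}
{code}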
[jira] [Commented] (HIVE-6998) Select query can only support maximum 128 distinct expressions
[ https://issues.apache.org/jira/browse/HIVE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234798#comment-14234798 ] Pengcheng Xiong commented on HIVE-6998: --- Tried version: hive 0.15, commit 8d0d1de18b2439b88 Select query can only support maximum 128 distinct expressions -- Key: HIVE-6998 URL: https://issues.apache.org/jira/browse/HIVE-6998 Project: Hive Issue Type: Bug Components: Query Processor, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Chaoyu Tang A select query can only support a maximum of 128 distinct expressions; otherwise, an ArrayIndexOutOfBoundsException is thrown. For a query like: select count(distinct c1), count(distinct c2), count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), ..., count(distinct c128), count(distinct c129) from tbl_129columns; you will get an error like: {code} java.lang.Exception: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 10 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1099) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1138) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1064) at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1082) ... 16 more Caused by: java.lang.ArrayIndexOutOfBoundsException: -128 at java.util.ArrayList.get(ArrayList.java:324) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:838) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:600) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:401) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:320) ... 19 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Status: In Progress (was: Patch Available) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Attachment: HIVE-8886.03.patch Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch, HIVE-8886.03.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Status: Patch Available (was: In Progress) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch, HIVE-8886.03.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9024) NullPointerException when starting webhcat server if templeton.hive.properties is not set
[ https://issues.apache.org/jira/browse/HIVE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-9024: -- Status: Patch Available (was: Open) NullPointerException when starting webhcat server if templeton.hive.properties is not set - Key: HIVE-9024 URL: https://issues.apache.org/jira/browse/HIVE-9024 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-9024.patch If templeton.hive.properties is not set, the following NullPointerException is thrown when starting the webhcat server, and the server cannot start: {noformat} Exception in thread "main" java.lang.NullPointerException at org.apache.hive.hcatalog.templeton.AppConfig.hiveProps(AppConfig.java:318) at org.apache.hive.hcatalog.templeton.AppConfig.handleHiveProperties(AppConfig.java:194) at org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:175) at org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:155) at org.apache.hive.hcatalog.templeton.Main.loadConfig(Main.java:96) at org.apache.hive.hcatalog.templeton.Main.init(Main.java:80) at org.apache.hive.hcatalog.templeton.Main.init(Main.java:75) at org.apache.hive.hcatalog.templeton.Main.main(Main.java:267) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:197) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-6998) Select query can only support maximum 128 distinct expressions
[ https://issues.apache.org/jira/browse/HIVE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reopened HIVE-6998: --- Sorry, the problem still remains at run time. Select query can only support maximum 128 distinct expressions -- Key: HIVE-6998 URL: https://issues.apache.org/jira/browse/HIVE-6998 Project: Hive Issue Type: Bug Components: Query Processor, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Chaoyu Tang A select query can only support a maximum of 128 distinct expressions; otherwise, an ArrayIndexOutOfBoundsException is thrown. For a query like: select count(distinct c1), count(distinct c2), count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), ..., count(distinct c128), count(distinct c129) from tbl_129columns; you will get an error like: {code} java.lang.Exception: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 10 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1099) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1138) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1064) at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1082) ... 16 more Caused by: java.lang.ArrayIndexOutOfBoundsException: -128 at java.util.ArrayList.get(ArrayList.java:324) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:838) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:600) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:401) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:320) ... 19 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234842#comment-14234842 ] Jason Dere commented on HIVE-8886: -- I think it looks ok, let's see how the tests go. Might have to resubmit patch once SVN is back up to get the tests to run. Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch, HIVE-8886.03.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers
Chao created HIVE-9025: -- Summary: join38.q (without map join) produces incorrect result when testing with multiple reducers Key: HIVE-9025 URL: https://issues.apache.org/jira/browse/HIVE-9025 Project: Hive Issue Type: Bug Reporter: Chao I have this query from a modified version of {{join38.q}}, which does NOT use map join: {code} FROM src a JOIN tmp b ON (a.key = b.col11) SELECT a.value, b.col5, count(1) as count where b.col11 = 111 group by a.value, b.col5; {code} If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set it to a larger number (3, for instance), the result will be {noformat} val_111 105 1 {noformat} which is wrong. I think the issue is that, for this case, ConstantPropagationProcFactory will overwrite the partition cols for the reduce sink desc with an empty list. Then, later on in ReduceSinkOperator#computeHashCode, since partitionEval has length 0, it will use a random number as the hash code for each row. As a result, rows with the same key will be distributed to different reducers, which leads to incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
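A minimal sketch of the failure mode described above: when the partition-column list is empty, the reducer is chosen from a random hash, so rows with the same key scatter across reducers and per-key aggregates come out wrong. The names here are assumptions for illustration; this is not the actual ReduceSinkOperator code:
{code}
import java.util.Random;

public class EmptyPartitionColsDemo {
    private static final Random RANDOM = new Random();

    // With no partition columns the hash is random; otherwise it is derived
    // from the key, so identical keys always land on the same reducer.
    static int reducerFor(Object[] partitionEval, Object key, int numReducers) {
        int hash = (partitionEval.length == 0) ? RANDOM.nextInt() : key.hashCode();
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // The same key "111" can land on a different reducer on every call:
        for (int i = 0; i < 3; i++) {
            System.out.println(reducerFor(new Object[0], "111", 3));
        }
    }
}
{code}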
[jira] [Created] (HIVE-9026) Re-enable remaining tests after HIVE-8970 [Spark Branch]
Chao created HIVE-9026: -- Summary: Re-enable remaining tests after HIVE-8970 [Spark Branch] Key: HIVE-9026 URL: https://issues.apache.org/jira/browse/HIVE-9026 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: spark-branch Reporter: Chao In HIVE-8970, we disabled several tests which seem to be related to a bug upstream. I filed HIVE-9025 to track it. {noformat} join38.q join_literals.q join_nullsafe.q subquery_in.q ppd_multi_insert.q {noformat} We need to re-enable these tests after HIVE-9025 is resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9026) Re-enable remaining tests after HIVE-8970 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9026: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 Re-enable remaining tests after HIVE-8970 [Spark Branch] Key: HIVE-9026 URL: https://issues.apache.org/jira/browse/HIVE-9026 Project: Hive Issue Type: Sub-task Components: spark-branch Affects Versions: spark-branch Reporter: Chao In HIVE-8970, we disabled several tests which seem to be related to a bug upstream. I filed HIVE-9025 to track it. {noformat} join38.q join_literals.q join_nullsafe.q subquery_in.q ppd_multi_insert.q {noformat} We need to re-enable these tests after HIVE-9025 is resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-8911: -- Assignee: Chao Enable mapjoin hints [Spark Branch] --- Key: HIVE-8911 URL: https://issues.apache.org/jira/browse/HIVE-8911 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Chao Currently the big table selection in a mapjoin is based on stats. We should also enable the big-table selection based on hints. See class MapJoinProcessor. This is a logical-optimizer class, so we should be able to re-use this without too many changes to hook up with SparkMapJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234870#comment-14234870 ] Chao commented on HIVE-9007: We can re-enable {{ppd_join4.q}} after this is resolved, although we can also enable it right now since it uses a reduce-side join. Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] -- Key: HIVE-9007 URL: https://issues.apache.org/jira/browse/HIVE-9007 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, which may cause map join in the spark branch to generate a wrong plan. Currently, the map join conversion in the spark branch first goes through a method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, removes the RS associated with the big table, and keeps the RSs for all small tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs to a small table or the big table). The issue arises when IdentityProjectRemover comes into play, which may result in a situation where an operator tree has two consecutive RSs. Imagine the following example:
{noformat}
   Join                  MapJoin
  /    \                /       \
RS      RS    --->    RS         RS
       /  \                     /  \
     TS    RS                 TS    TS (big table)
            \          (small table)
             TS
{noformat}
In this case, all parents of the mapjoin op will be RS, even the branch for the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
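A sketch of the check the description says is missing: only the parent RSs feeding small tables should become hash table sinks (HTS), while the big-table branch keeps its RS. The operator labels and the big-table position argument are assumptions for illustration, not the actual SparkReduceSinkMapJoinProc code:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmallTableOnlyDemo {
    // Skip the big-table branch when turning parent RSs into HTSs.
    static void replaceSmallTableParents(List<String> parents, int bigTablePos) {
        for (int pos = 0; pos < parents.size(); pos++) {
            if (pos == bigTablePos) {
                continue; // big table streams through; its RS must survive
            }
            if ("RS".equals(parents.get(pos))) {
                parents.set(pos, "HTS"); // small table is built into a hash table
            }
        }
    }

    public static void main(String[] args) {
        List<String> parents = new ArrayList<>(Arrays.asList("RS", "RS"));
        replaceSmallTableParents(parents, 0);
        System.out.println(parents); // [RS, HTS]: big-table branch untouched
    }
}
{code}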
[jira] [Commented] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234874#comment-14234874 ] Szehon Ho commented on HIVE-9007: - I'm OK with enabling it anytime. This JIRA will have to add another version of this test that uses mapjoin. Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] -- Key: HIVE-9007 URL: https://issues.apache.org/jira/browse/HIVE-9007 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, which may cause map join in the spark branch to generate a wrong plan. Currently, the map join conversion in the spark branch first goes through a method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, removes the RS associated with the big table, and keeps the RSs for all small tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs to a small table or the big table). The issue arises when IdentityProjectRemover comes into play, which may result in a situation where an operator tree has two consecutive RSs. Imagine the following example:
{noformat}
   Join                  MapJoin
  /    \                /       \
RS      RS    --->    RS         RS
       /  \                     /  \
     TS    RS                 TS    TS (big table)
            \          (small table)
             TS
{noformat}
In this case, all parents of the mapjoin op will be RS, even the branch for the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234875#comment-14234875 ] Chao commented on HIVE-9007: OK, I'll create another JIRA to enable it now. Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] -- Key: HIVE-9007 URL: https://issues.apache.org/jira/browse/HIVE-9007 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, which may cause map join in the spark branch to generate a wrong plan. Currently, the map join conversion in the spark branch first goes through a method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, removes the RS associated with the big table, and keeps the RSs for all small tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs to a small table or the big table). The issue arises when IdentityProjectRemover comes into play, which may result in a situation where an operator tree has two consecutive RSs. Imagine the following example:
{noformat}
   Join                  MapJoin
  /    \                /       \
RS      RS    --->    RS         RS
       /  \                     /  \
     TS    RS                 TS    TS (big table)
            \          (small table)
             TS
{noformat}
In this case, all parents of the mapjoin op will be RS, even the branch for the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-8131: -- Assignee: Ferdinand Xu Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9027) Enable ppd_join4 [Spark Branch]
Chao created HIVE-9027: -- Summary: Enable ppd_join4 [Spark Branch] Key: HIVE-9027 URL: https://issues.apache.org/jira/browse/HIVE-9027 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Priority: Trivial We disabled {{ppd_join4}} in HIVE-8970 after seeing an issue when running it with map join. However, since this test uses a reduce-side join, we should have no problem enabling it. The issue with map join is tracked by HIVE-9007, and we will create a separate test for the map join case in that JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7817) distinct/group by don't work on partition columns
[ https://issues.apache.org/jira/browse/HIVE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234880#comment-14234880 ] Pengcheng Xiong commented on HIVE-7817: --- By the way, I do not think HIVE-3108 was ever solved. distinct/group by don't work on partition columns - Key: HIVE-7817 URL: https://issues.apache.org/jira/browse/HIVE-7817 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Eugene Koifman Suppose you have a table like this: {code:sql} CREATE TABLE page_view( viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User') COMMENT 'This is the page view table' PARTITIONED BY(dt STRING, country STRING) CLUSTERED BY(userid) INTO 4 BUCKETS {code} Then {code:sql} select distinct dt from page_view; select distinct dt, country from page_view; select dt, country from page_view group by dt, country; {code} all fail with {noformat} Query ID = ekoifman_20140820172626_b03ba819-c111-433f-a3fc-453c7d5a3e86 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapreduce.job.reduces=number Job running in-process (local Hadoop) Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2014-08-20 17:26:13,018 Stage-1 map = 0%, reduce = 0% Ended Job = job_local165359429_0013 with errors Error during job, obtaining debugging information... FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec {noformat} but {code:sql} select dt, country, count(*) from page_view group by dt, country; {code} works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28699: HIVE-8783 Create some tests that use Spark counter for stats collection [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28699/ --- (Updated Dec. 5, 2014, 2:05 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8783 https://issues.apache.org/jira/browse/HIVE-8783 Repository: hive-git Description --- Hive already has stats_counter.q and stats_counter_partitioned.q for unit tests of table statistics collection via counters. stats_counter.q is already enabled; I enable stats_counter_partitioned.q in this patch. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 30b7632 ql/src/test/results/clientpositive/spark/stats_counter_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28699/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-2573) Create per-session function registry
[ https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234923#comment-14234923 ] Jason Dere commented on HIVE-2573: -- Is this one still being worked on? I think it's basically ready, with the exception of 2 questions/comments from the RB of patch v13: - HiveParser.g has an error message that should be changed from the drop function statement to the reload function statement - SessionConf.java: What about the idea of moving the static call to resolveFunctions() to SessionState? I thought that would remove the need for SessionConf, because then the Hive class would once again be usable during query runtime. Unless you think it's cleaner to use SessionConf to get HiveConf rather than the Hive object. Create per-session function registry - Key: HIVE-2573 URL: https://issues.apache.org/jira/browse/HIVE-2573 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, HIVE-2573.1.patch.txt, HIVE-2573.10.patch.txt, HIVE-2573.11.patch.txt, HIVE-2573.12.patch.txt, HIVE-2573.13.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, HIVE-2573.4.patch.txt, HIVE-2573.5.patch, HIVE-2573.6.patch, HIVE-2573.7.patch, HIVE-2573.8.patch.txt, HIVE-2573.9.patch.txt Currently the function registry is a shared resource and could be overridden by other users when using HiveServer. If a per-session function registry is provided, this situation could be prevented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
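A minimal sketch of the per-session idea in the description, assuming a session-local registry that falls back to the shared one; all names are illustrative, not Hive's actual FunctionRegistry API:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionRegistryDemo {
    // A single registry visible to every session, as HiveServer has today.
    static final Map<String, String> SHARED = new ConcurrentHashMap<>();

    // One registry per session; lookups fall back to the shared one, so one
    // user's re-registration no longer clobbers another user's functions.
    static class SessionRegistry {
        private final Map<String, String> local = new ConcurrentHashMap<>();

        void register(String name, String udfClass) { local.put(name, udfClass); }

        String lookup(String name) {
            String udf = local.get(name);
            return (udf != null) ? udf : SHARED.get(name);
        }
    }

    public static void main(String[] args) {
        SHARED.put("upper", "GenericUDFUpper");
        SessionRegistry a = new SessionRegistry();
        SessionRegistry b = new SessionRegistry();
        a.register("upper", "MyCustomUpper");  // session-local override
        System.out.println(a.lookup("upper")); // MyCustomUpper
        System.out.println(b.lookup("upper")); // GenericUDFUpper, unaffected
    }
}
{code}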
[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8783: Attachment: HIVE-8783.2-spark.patch Create some tests that use Spark counter for stats collection [Spark Branch] Key: HIVE-8783 URL: https://issues.apache.org/jira/browse/HIVE-8783 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-8783.1-spark.patch, HIVE-8783.2-spark.patch Currently when .q tests are run with Spark, the default stats collection is fs. We need to have some tests that use Spark counter for stats collection to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9016) SparkCounter display name is not set correctly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9016: Attachment: HIVE-9016.2-spark.patch Thanks, [~xuefuz]. SparkCounter display name is not set correctly [Spark Branch] Key: HIVE-9016 URL: https://issues.apache.org/jira/browse/HIVE-9016 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9016.1-spark.patch, HIVE-9016.1-spark.patch, HIVE-9016.2-spark.patch SparkCounter's displayName is set to the SparkCounterGroup's displayName; we should not do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
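A minimal sketch of the distinction at issue, assuming each counter should carry its own display name rather than reusing its group's; all class and field names here are illustrative, not the actual SparkCounter code:
{code}
public class CounterNameDemo {
    static class SparkCounterGroup {
        final String displayName;
        SparkCounterGroup(String displayName) { this.displayName = displayName; }
    }

    static class SparkCounter {
        final String name;
        final String displayName;
        SparkCounter(String name, String displayName) {
            this.name = name;
            this.displayName = displayName; // the counter's own label
        }
    }

    public static void main(String[] args) {
        SparkCounterGroup group = new SparkCounterGroup("HIVE");
        // Bug shape: constructing with group.displayName would make every
        // counter in the group print as "HIVE" instead of its own label.
        SparkCounter counter = new SparkCounter("CREATED_FILES", "Created files");
        System.out.println(group.displayName + " / " + counter.displayName);
    }
}
{code}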