[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233993#comment-14233993 ] Rui Li commented on HIVE-8991: -- I looked a little more into this. It seems hive-exec is properly added to the class path (as the user application jar in {{SparkSubmit}}) and the class loader can load {{HiveIgnoreKeyTextOutputFormat}}: {noformat} 2014-12-04 08:35:33,383 INFO [stdout-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(384)) - [Loaded org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat from file:/home/hive/packaging/target/apache-hive-0.15.0-SNAPSHOT-bin/apache-hive-0.15.0-SNAPSHOT-bin/lib/hive-exec-0.15.0-SNAPSHOT.jar] {noformat} Nevertheless, I still get the following error: {noformat} 2014-12-04 08:32:26,681 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(384)) - java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat {noformat} Besides, the exception is thrown when we try to deserialize SparkWork in the job, which means {{org.apache.hadoop.hive.ql.exec.spark.KryoSerializer}} has been loaded properly. I'll do more debugging; I wonder if the error message is simply inaccurate. As for the {{SparkSubmitDriverBootstrapper}} hanging issue, it's because it calls System.exit in a shutdown hook, which causes a deadlock. It's been fixed in the latest branch. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of the missing hive-it-util in the remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
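To make the deserialization detail above concrete: Kryo resolves class names through whatever class loader it is handed, so a minimal sketch like the following (illustrative only, not Hive's actual wiring) shows where the lookup can diverge from the loader that actually holds hive-exec:
{code:java}
import com.esotericsoftware.kryo.Kryo;

public class KryoLoaderSketch {
  public static void main(String[] args) {
    Kryo kryo = new Kryo();
    // Class names read back while deserializing (e.g. SparkWork's fields)
    // are resolved through the loader given here. If this loader cannot
    // see hive-exec, deserialization fails even though another loader in
    // the same JVM has already loaded the class.
    kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
  }
}
{code}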
[jira] [Created] (HIVE-9019) Avoid using SPARK_JAVA_OPTS [Spark Branch]
Rui Li created HIVE-9019: Summary: Avoid using SPARK_JAVA_OPTS [Spark Branch] Key: HIVE-9019 URL: https://issues.apache.org/jira/browse/HIVE-9019 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li SPARK_JAVA_OPTS has been deprecated, see {{SparkConf.validateSettings}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
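For reference, the settings Spark's deprecation warning points to can be supplied through SparkConf instead of the environment variable. A minimal sketch, with "-Dexample.flag=true" as a placeholder JVM option:
{code:java}
import org.apache.spark.SparkConf;

public class NoSparkJavaOpts {
  public static void main(String[] args) {
    // Instead of exporting the deprecated SPARK_JAVA_OPTS, set the
    // per-role options that SparkConf.validateSettings recommends.
    SparkConf conf = new SparkConf()
        .setAppName("hive-on-spark-example")
        .set("spark.driver.extraJavaOptions", "-Dexample.flag=true")
        .set("spark.executor.extraJavaOptions", "-Dexample.flag=true");
    System.out.println(conf.toDebugString());
  }
}
{code}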
[jira] [Updated] (HIVE-9019) Avoid using SPARK_JAVA_OPTS [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9019: - Issue Type: Sub-task (was: Test) Parent: HIVE-7292 Avoid using SPARK_JAVA_OPTS [Spark Branch] -- Key: HIVE-9019 URL: https://issues.apache.org/jira/browse/HIVE-9019 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li SPARK_JAVA_OPTS has been deprecated, see {{SparkConf.validateSettings}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9018) SHOW GRANT ROLE in Hive should return grant_time in human readable format
[ https://issues.apache.org/jira/browse/HIVE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-9018: - Attachment: HIVE-9018.003.patch SHOW GRANT ROLE in Hive should return grant_time in human readable format --- Key: HIVE-9018 URL: https://issues.apache.org/jira/browse/HIVE-9018 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Priority: Minor Attachments: HIVE-9018.003.patch Currently, SHOW GRANT ROLE will return the 'grant_time' in microseconds since epoch. It would be nice if this were in human readable format. Current output: 1411801585902000 Desired output: Sat, Sep 27 2014 00:06:25.902 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
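A minimal sketch of the requested conversion, using the value from the description; the format pattern is chosen to match the desired output, and the printed result depends on the JVM's default timezone:
{code:java}
import java.text.SimpleDateFormat;
import java.util.Date;

public class GrantTimeFormat {
  public static void main(String[] args) {
    long grantTimeMicros = 1411801585902000L; // value from the description
    // grant_time is microseconds since epoch; java.util.Date wants millis.
    Date grantTime = new Date(grantTimeMicros / 1000);
    SimpleDateFormat fmt = new SimpleDateFormat("EEE, MMM dd yyyy HH:mm:ss.SSS");
    // Prints e.g. "Sat, Sep 27 2014 00:06:25.902" in the original timezone.
    System.out.println(fmt.format(grantTime));
  }
}
{code}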
[jira] [Updated] (HIVE-9018) SHOW GRANT ROLE in Hive should return grant_time in human readable format
[ https://issues.apache.org/jira/browse/HIVE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-9018: - Attachment: (was: HIVE-9018.002.patch) SHOW GRANT ROLE in Hive should return grant_time in human readable format --- Key: HIVE-9018 URL: https://issues.apache.org/jira/browse/HIVE-9018 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Priority: Minor Attachments: HIVE-9018.003.patch Currently, SHOW GRANT ROLE will return the 'grant_time' in microseconds since epoch. It would be nice if this were in human readable format. Current output: 1411801585902000 Desired output: Sat, Sep 27 2014 00:06:25.902 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9018) SHOW GRANT ROLE in Hive should return grant_time in human readable format
[ https://issues.apache.org/jira/browse/HIVE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-9018: - Attachment: (was: HIVE-9018.patch) SHOW GRANT ROLE in Hive should return grant_time in human readable format --- Key: HIVE-9018 URL: https://issues.apache.org/jira/browse/HIVE-9018 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Priority: Minor Attachments: HIVE-9018.003.patch Currently, SHOW GRANT ROLE will return the 'grant_time' in microseconds since epoch. It would be nice if this were in human readable format. Current output: 1411801585902000 Desired output: Sat, Sep 27 2014 00:06:25.902 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9020) When dropping external tables, Hive should not verify whether user has access to the data.
Anant Nag created HIVE-9020: --- Summary: When dropping external tables, Hive should not verify whether user has access to the data. Key: HIVE-9020 URL: https://issues.apache.org/jira/browse/HIVE-9020 Project: Hive Issue Type: Bug Reporter: Anant Nag When dropping tables, Hive verifies whether the user has access to the data on HDFS and fails if the user doesn't have access. This makes sense for internal tables, since their data has to be deleted when they are dropped, but for external tables Hive should not check for data access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9021) Hive should not allow any user to create tables in other hive DB's that user doesn't own
Anant Nag created HIVE-9021: --- Summary: Hive should not allow any user to create tables in other hive DB's that user doesn't own Key: HIVE-9021 URL: https://issues.apache.org/jira/browse/HIVE-9021 Project: Hive Issue Type: Bug Reporter: Anant Nag Hive allows users to create tables in other users' databases. This should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9022) When creating external tables, Hive needs to verify whether the user has read permissions to the data
Anant Nag created HIVE-9022: --- Summary: When creating external tables, Hive needs to verify whether the user has read permissions to the data Key: HIVE-9022 URL: https://issues.apache.org/jira/browse/HIVE-9022 Project: Hive Issue Type: Bug Reporter: Anant Nag Hive doesn't verify whether the user has read permissions on the data before creating an external table referring to it. This needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9016) SparkCounter display name is not set correctly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9016: Attachment: HIVE-9016.1-spark.patch It's odd that the unit tests haven't been triggered after 6 hours; uploading the patch again. SparkCounter display name is not set correctly [Spark Branch] Key: HIVE-9016 URL: https://issues.apache.org/jira/browse/HIVE-9016 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9016.1-spark.patch, HIVE-9016.1-spark.patch SparkCounter's displayName is set to the SparkCounterGroup displayName; we should not do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8783: Attachment: HIVE-8783.1-spark.patch Actually, Hive already has stats_counter.q and stats_counter_partitioned.q as unit tests for counter-based table statistics collection. stats_counter.q is already enabled; I enable stats_counter_partitioned.q in this patch. Create some tests that use Spark counter for stats collection [Spark Branch] Key: HIVE-8783 URL: https://issues.apache.org/jira/browse/HIVE-8783 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-8783.1-spark.patch Currently when .q tests are run with Spark, the default stats collection is fs. We need to have some tests that use Spark counter for stats collection to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
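For readers following along, counter-based stats collection is toggled through configuration; a minimal sketch using the string property keys (the .q-file equivalent would be set hive.stats.dbclass=counter;):
{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class CounterStatsConfig {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Default in the .q tests is "fs"; switch to counter-based collection.
    conf.set("hive.stats.dbclass", "counter");
    conf.set("hive.stats.autogather", "true");
    System.out.println(conf.get("hive.stats.dbclass"));
  }
}
{code}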
[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8783: Status: Patch Available (was: Open) Create some tests that use Spark counter for stats collection [Spark Branch] Key: HIVE-8783 URL: https://issues.apache.org/jira/browse/HIVE-8783 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-8783.1-spark.patch Currently when .q tests are run with Spark, the default stats collection is fs. We need to have some tests that use Spark counter for stats collection to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9020) When dropping external tables, Hive should not verify whether user has access to the data.
[ https://issues.apache.org/jira/browse/HIVE-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9020: Affects Version/s: 0.13.1 Status: Patch Available (was: Open) With this patch, Hive no longer verifies whether the user has access to the data when dropping an external table. It also now checks whether the user is the owner of the table before dropping it. When dropping external tables, Hive should not verify whether user has access to the data. --- Key: HIVE-9020 URL: https://issues.apache.org/jira/browse/HIVE-9020 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Anant Nag When dropping tables, Hive verifies whether the user has access to the data on HDFS and fails if the user doesn't have access. This makes sense for internal tables, since their data has to be deleted when they are dropped, but for external tables Hive should not check for data access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9020) When dropping external tables, Hive should not verify whether user has access to the data.
[ https://issues.apache.org/jira/browse/HIVE-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9020: Attachment: dropExternal.patch With this patch, Hive no longer verifies whether the user has access to the data when dropping an external table. It also now checks whether the user is the owner of the table before dropping it. When dropping external tables, Hive should not verify whether user has access to the data. --- Key: HIVE-9020 URL: https://issues.apache.org/jira/browse/HIVE-9020 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Anant Nag Attachments: dropExternal.patch When dropping tables, Hive verifies whether the user has access to the data on HDFS and fails if the user doesn't have access. This makes sense for internal tables, since their data has to be deleted when they are dropped, but for external tables Hive should not check for data access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
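A minimal sketch of the distinction the patch draws, with a hypothetical helper name (the actual patch lives in Hive's drop-table/authorization path and also covers the ownership check):
{code:java}
import org.apache.hadoop.hive.metastore.TableType;
import org.apache.hadoop.hive.ql.metadata.Table;

public class DropAccessCheckSketch {
  // Hypothetical helper: only managed (internal) tables need the
  // HDFS data-access check on drop, because only their data is deleted.
  static boolean needsDataAccessCheck(Table table) {
    return !TableType.EXTERNAL_TABLE.name()
        .equals(table.getTTable().getTableType());
  }
}
{code}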
[jira] [Updated] (HIVE-9022) When creating external tables, Hive needs to verify whether the user has read permissions to the data
[ https://issues.apache.org/jira/browse/HIVE-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9022: Attachment: createExternal.patch When creating external tables, Hive needs to verify whether the user has read permissions to the data - Key: HIVE-9022 URL: https://issues.apache.org/jira/browse/HIVE-9022 Project: Hive Issue Type: Bug Reporter: Anant Nag Attachments: createExternal.patch Hive doesn't verify whether the user has read permissions on the data before creating an external table referring to it. This needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9022) When creating external tables, Hive needs to verify whether the user has read permissions to the data
[ https://issues.apache.org/jira/browse/HIVE-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9022: Labels: patch (was: ) Affects Version/s: 0.13.1 Status: Patch Available (was: Open) The user should have read and execute permissions on the parent folder of the data location as well as on the location itself. Hive now checks that both the parent and the data location have read and execute permissions before creating the table. When creating external tables, Hive needs to verify whether the user has read permissions to the data - Key: HIVE-9022 URL: https://issues.apache.org/jira/browse/HIVE-9022 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Anant Nag Labels: patch Attachments: createExternal.patch Hive doesn't verify whether the user has read permissions on the data before creating an external table referring to it. This needs to be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
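A minimal sketch of the kind of check described, using Hadoop's FileSystem API; it inspects only the "other" permission bits, whereas a complete check would also account for user and group ownership:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class ExternalLocationCheckSketch {
  // Verify read+execute on the data location and on its parent folder.
  static void checkReadable(FileSystem fs, Path location) throws IOException {
    for (Path p : new Path[] { location, location.getParent() }) {
      if (p == null) {
        continue; // location is the filesystem root; no parent to check
      }
      FileStatus status = fs.getFileStatus(p);
      FsAction other = status.getPermission().getOtherAction();
      if (!other.implies(FsAction.READ_EXECUTE)) {
        throw new IOException("Missing read/execute permission on " + p);
      }
    }
  }
}
{code}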
[jira] [Updated] (HIVE-9021) Hive should not allow any user to create tables in other hive DB's that user doesn't own
[ https://issues.apache.org/jira/browse/HIVE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9021: Attachment: db.patch Hive should not allow any user to create tables in other hive DB's that user doesn't own Key: HIVE-9021 URL: https://issues.apache.org/jira/browse/HIVE-9021 Project: Hive Issue Type: Bug Reporter: Anant Nag Attachments: db.patch Hive allows users to create tables in other users' databases. This should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9021) Hive should not allow any user to create tables in other hive DB's that user doesn't own
[ https://issues.apache.org/jira/browse/HIVE-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anant Nag updated HIVE-9021: Labels: patch (was: ) Affects Version/s: 0.13.1 Status: Patch Available (was: Open) Hive now checks whether the user is the owner of the database before creating a table in it. Hive should not allow any user to create tables in other hive DB's that user doesn't own Key: HIVE-9021 URL: https://issues.apache.org/jira/browse/HIVE-9021 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Anant Nag Labels: patch Attachments: db.patch Hive allows users to create tables in other users' databases. This should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234082#comment-14234082 ] Rui Li commented on HIVE-8991: -- Not sure if it's because of how we add hive-exec: if added dynamically (as the application jar), Spark loads it with {{ExecutorURLClassLoader}} and sets it as the thread's ContextClassLoader, and then we hit the NoClassDefFoundError. If added to {{spark.driver.extraClassPath}}, it's loaded with the system class loader and the error is gone. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of the missing hive-it-util in the remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
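A minimal standalone probe (not Hive code) that makes the distinction visible: the same class name can be resolvable through the thread's context class loader while being invisible to the system class loader, which is exactly the split between a dynamically added application jar and spark.driver.extraClassPath:
{code:java}
public class ClassLoaderProbe {
  static boolean visibleTo(ClassLoader loader, String name) {
    try {
      Class.forName(name, false, loader);
      return true;
    } catch (ClassNotFoundException | NoClassDefFoundError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String cls = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
    // Sees spark.driver.extraClassPath entries (JVM startup classpath).
    System.out.println("system loader: "
        + visibleTo(ClassLoader.getSystemClassLoader(), cls));
    // Sees a dynamically added application jar once Spark has installed
    // its ExecutorURLClassLoader on the thread.
    System.out.println("context loader: "
        + visibleTo(Thread.currentThread().getContextClassLoader(), cls));
  }
}
{code}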
[jira] [Created] (HIVE-9023) HiveHistoryImpl relies on removed counters to print num rows
Slava Markeyev created HIVE-9023: Summary: HiveHistoryImpl relies on removed counters to print num rows Key: HIVE-9023 URL: https://issues.apache.org/jira/browse/HIVE-9023 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.14.0, 0.14.1 Reporter: Slava Markeyev Priority: Minor HiveHistoryImpl still relies on the counters that were removed in HIVE-5982 to determine the number of rows loaded. This results in a regression of functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: (was: HIVE-8974.03.patch) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: HIVE-8974.03.patch Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.03.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Apache Hive 1.0 ?
On Wed, Dec 3, 2014 at 2:27 PM, Enis Söztutar e...@apache.org wrote: Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). Agreed :) The important thing for calling something 1.0 I think is the focus on user level API and compatibility issues. I think we need to remember that not all our users write SQL statements. For example Drill, Spark SQL, PrestoDB, Impala, Kylin, Ranger, Sentry, Carl's view project at LI, and more are all users of Hive as well. At the user meetup tonight Carl suggested we get our act together in regards to documenting which APIs are public or not, for these users. I think that makes a lot of sense. But still, you should think about future releases and for example when you can do a 1.x release versus 2.x release. We have started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E ) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of (eventual) 1.0 release is to become a stable base for future 1.x series of releases. 1.0 release will aim to achieve at least the same level of stability of 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end what you want to deliver and market as 1.0 should be relatively stable in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases; e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in Hive. In the case of Hive, every 0.x release has been adding new features, and releases have been sequential. 0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. We could check in the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for Hive. 
I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of Hive. On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to do a 1.0 release off a maintenance release, since that is more stable. Trunk is moving fast. HBase uses odd release numbers for this purpose, where 0.95, 97, 99 etc. are dev releases and 0.96, 0.98, 1.0 etc. are public; that works well for baking, but since we don't have that, it seems like 0.14 would be a good place to bake. 0.15, with a bunch of new bugs that we are busy introducing, may not be as good for 1.0 IMHO... On Tue, Dec 2, 2014 at 7:21 PM, Brock Noland br...@cloudera.com wrote: Hi
Re: Review Request 28283: HIVE-8900:Create encryption testing framework
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28283/#review63681 --- Ship it! Ship It! - Sergio Pena On Dec. 3, 2014, 1:02 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28283/ --- (Updated Dec. 3, 2014, 1:02 a.m.) Review request for hive. Repository: hive-git Description --- The patch includes: 1. Enable security properties for the Hive security cluster Diffs - .gitignore c5decaf data/scripts/q_test_cleanup_for_encryption.sql PRE-CREATION data/scripts/q_test_init_for_encryption.sql PRE-CREATION itests/qtest/pom.xml 376f4a9 itests/src/test/resources/testconfiguration.properties 3ae001d itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 31d5c29 ql/src/test/queries/clientpositive/create_encrypted_table.q PRE-CREATION shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 2e00d93 shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 8161fc1 shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java fa66a4a Diff: https://reviews.apache.org/r/28283/diff/ Testing --- Thanks, cheng xu
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/#review63710 --- Ship it! Ship It! - John Pullokkaran On Dec. 2, 2014, 11:18 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 2, 2014, 11:18 p.m.) Review request for hive and John Pullokkaran. Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we try to make it use the groupby index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs - itests/src/test/resources/testconfiguration.properties fc1f345 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/#review63711 --- Ship it! - John Pullokkaran On Dec. 2, 2014, 11:18 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 2, 2014, 11:18 p.m.) Review request for hive and John Pullokkaran. Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we try to make it use the groupby index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs - itests/src/test/resources/testconfiguration.properties fc1f345 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
Review Request 28699: HIVE-8783 Create some tests that use Spark counter for stats collection [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28699/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8783 https://issues.apache.org/jira/browse/HIVE-8783 Repository: hive-git Description --- Hive already has stats_counter.q and stats_counter_partitioned.q as unit tests for counter-based table statistics collection. stats_counter.q is already enabled; I enable stats_counter_partitioned.q in this patch. Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 30b7632 ql/src/test/results/clientpositive/spark/stats_counter_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28699/diff/ Testing --- Thanks, chengxiang li
Re: SVN server hanging
The Apache infra team is looking into it. -- Forwarded message -- From: Geoffrey Corey cor...@apache.org Date: Wed, Dec 3, 2014 at 9:56 AM Subject: Notice: Subversion master undergoing emergency maintenance To: committ...@apache.org Eris is currently undergoing some emergency maintenance due to disk errors. We do not currently have an ETA on when this will be fixed. In the meantime, there will be no access to commit to SVN. The read-only mirror at svn.eu.apache.org is still working. The blog post can be found here. [1] [1] - https://blogs.apache.org/infra/entry/subversion_master_undergoing_emergency_maintenance -- Geoff On behalf of Infra. On Wed, Dec 3, 2014 at 7:04 AM, Xuefu Zhang xzh...@cloudera.com wrote: It seems the Hive svn server is hanging. Does anyone have means to restart it? Thanks, Xuefu
Re: SVN server hanging
https://blogs.apache.org/infra/entry/subversion_master_undergoing_emergency_maintenance On Dec 3, 2014 7:04 AM, Xuefu Zhang xzh...@cloudera.com wrote: It seems Hive svn server is hanging. Does anyone have means to restart it? Thanks, Xuefu
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 3, 2014, 7:40 p.m.) Review request for hive and John Pullokkaran. Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we try to make it use the groupby index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs - itests/src/test/resources/testconfiguration.properties fc1f345 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
Re: Review Request 27713: CBO: enable groupBy index
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27713/ --- (Updated Dec. 3, 2014, 7:40 p.m.) Review request for hive and John Pullokkaran. Changes --- Removed whitespace Repository: hive-git Description --- Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we try to make it use the groupby index that we build. The basic problem is that for SEL1-SEL2-GRY-...-SEL3, the previous version only modified SEL2, which immediately precedes GRY. Now, with CBO, we have lots of SELs, e.g., SEL1. So the solution is to modify all of them. Diffs (updated) - itests/src/test/resources/testconfiguration.properties fc1f345 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java 9ffa708 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java 02216de ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 0f06ec9 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java 74614f3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java d699308 ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_1.q PRE-CREATION ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx_cbo_2.q PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out fdc1dc6 ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27713/diff/ Testing --- Thanks, pengcheng xiong
Remove lock on compilation stage
I mentioned this at the meetup tonight. With all the new work being done in the compilation phase, I think this global compiler lock might be having more of an impact. Of course it does not affect the Hive CLI, but most of the users I know use HS2. https://issues.apache.org/jira/browse/HIVE-4239 Does anyone have interest in doing some parallel testing with the lock removed? Brock
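For anyone who hasn't read HIVE-4239: the pattern under discussion is, roughly, a single lock serializing all compilation in HiveServer2. A simplified sketch with hypothetical names, not the actual Driver code:
{code:java}
public class CompileLockPattern {
  private static final Object GLOBAL_COMPILE_LOCK = new Object();

  // Every session funnels through one lock, so a slow compile stalls
  // all concurrent HS2 sessions; removing it is what needs testing.
  Object compile(String query) {
    synchronized (GLOBAL_COMPILE_LOCK) {
      return parseAndPlan(query);
    }
  }

  Object parseAndPlan(String query) {
    return query; // placeholder for the real (expensive) compilation work
  }
}
{code}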
RE: Apache Hive 1.0 ?
Hi, From more of an end user perspective, if we were to move to a 1.0 release then it should be a complete offering. Have we defined what this would include? What is our definition of complete documentation? In general, I would expect a 1.0 release to include: 1. A stable code base that is reasonably current (e.g., implemented on YARN). 2. A complete set of functionality that would enable a company to use Hive as an analytical/BI database. This would include a rather complete implementation of SQL (minus transaction processing). 3. A reliable install program/kit. 4. Documentation including: a. API specification b. User guide - to include deviations from ANSI standard SQL and any extensions c. Administration guidance including how to install, configure and administer Hive. d. Release Notes clearly detailing release inclusions and known issues (open Jiras), and compatibility with other Apache projects. Thank You, Follow me on @BigData73 - Bill Busch | SSA | Enterprise Information Solutions CWP m: 704.806.2485 | NASDAQ: PRFT | Perficient.com BI/DW | Advanced Analytics | Big Data | ECI| EPM | MDM -Original Message- From: Enis Söztutar [mailto:e...@apache.org] Sent: Wednesday, December 03, 2014 5:27 PM To: dev@hive.apache.org Subject: Re: Apache Hive 1.0 ? Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). The important thing for calling something 1.0 I think is the focus on user level API and compatibility issues. But still, you should think about future releases and for example when you can do a 1.x release versus 2.x release. We have started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of (eventual) 1.0 release is to become a stable base for future 1.x series of releases. 1.0 release will aim to achieve at least the same level of stability of 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end what you want to deliver and market as 1.0 should be relatively stable in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases; e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in Hive. In the case of Hive, every 0.x release has been adding new features, and releases have been sequential. 
0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. We could check in the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for Hive. I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of Hive. On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones.
Re: Apache Hive 1.0 ?
I think the 1.0 release in particular should be a relatively stable release, since we go from the beta(?) stage of 0.x to 1.0. Otherwise, what prevents us from promoting 0.14 to 1.0? 0.14.1 is not done yet, so the timing would be great: we would have no 0.x.y releases, 1.1 would become the first fix release, and there would be no confusion. On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to do a 1.0 release off a maintenance release, since that is more stable. Trunk is moving fast. HBase uses odd release numbers for this purpose, where 0.95, 97, 99 etc. are dev releases and 0.96, 0.98, 1.0 etc. are public; that works well for baking, but since we don't have that, it seems like 0.14 would be a good place to bake. 0.15, with a bunch of new bugs that we are busy introducing, may not be as good for 1.0 IMHO... On Tue, Dec 2, 2014 at 7:21 PM, Brock Noland br...@cloudera.com wrote: Hi Thejas, Thank you very much for your proposal! Hadoop did something similar, renaming branches to branch-1 and branch-2. At the time, although I was very much in favor of the new release numbers, I thought it could have been handled better. Renaming release branches ended up being very confusing for users and I had a ton of conversations with users about how releases were related. In this case, I feel the situation is similar: we'll release 1.0, which is really just the second maintenance release of the 0.14 branch. Thus it's 1.0, but really it's just 0.14 + some fixes. I feel this will again be confusing for users. For this important change, I think we should use a new release vehicle. Thus, I'd suggest we do the rename in trunk, soon, and then the next release of Hive will be 1.0. Cheers, Brock On Tue, Dec 2, 2014 at 10:07 AM, Thejas Nair the...@hortonworks.com wrote: Apache Hive is the de facto SQL query engine in the Hadoop ecosystem. I believe it is also the most widely used one. Hive is used in production in a large number of enterprises. However, the 0.x.y versioning that we have been using for Hive obscures this status. I propose creating a 1.0 release out of the 0.14 branch of Hive. We already have some bug fixes for the 0.14 release that have been added to the branch, and a maintenance release is due. Having it out of this maintenance branch would create a better first 1.0 version, and we would be able to do it soon. What would have been the 0.15 version would then become 1.1. Thoughts? Thanks, Thejas
Re: Apache Hive 1.0 ?
I'd like to see HiveCLI, HiveServer, and the original JDBC driver deprecated and purged from the codebase before the 1.0 release. This topic probably needs its own thread, but I thought I should mention it here. Thanks. - Carl On Wed, Dec 3, 2014 at 2:27 PM, Enis Söztutar e...@apache.org wrote: Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). The important thing for calling something 1.0 I think is the focus on user level API and compatibility issues. But still, you should think about future releases and for example when you can do a 1.x release versus 2.x release. We have started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E ) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of (eventual) 1.0 release is to become a stable base for future 1.x series of releases. 1.0 release will aim to achieve at least the same level of stability of 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end what you want to deliver and market as 1.0 should be relatively stable in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases; e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in Hive. In the case of Hive, every 0.x release has been adding new features, and releases have been sequential. 0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. We could check in the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for Hive. I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of Hive. 
On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to do a 1.0 release off a maintenance release, since that is more stable. Trunk is moving fast. HBase uses odd release numbers for this purpose, where 0.95, 97, 99 etc. are dev releases and 0.96, 0.98, 1.0 etc. are public; that works well for baking, but since we don't have that, it seems like 0.14 would be a good place to bake. 0.15, with a bunch of new bugs that we are busy introducing, may not be as good for 1.0 IMHO... On Tue, Dec 2, 2014 at 7:21 PM, Brock Noland br...@cloudera.com wrote: Hi Thejas, Thank you very much for your proposal! Hadoop did something similar, renaming branches to branch-1 and branch-2. At the time, although I was
Review Request 28632: Turn CBO on
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28632/ --- Review request for hive and Sergey Shelukhin. Bugs: HIVE-8395 https://issues.apache.org/jira/browse/HIVE-8395 Repository: hive-git Description --- Turn CBO on Diffs - accumulo-handler/src/test/results/positive/accumulo_predicate_pushdown.q.out 309f2f7 accumulo-handler/src/test/results/positive/accumulo_queries.q.out 8d7f19c common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2e2bf5a contrib/src/test/results/clientpositive/dboutput.q.out 554ca02 contrib/src/test/results/clientpositive/udaf_example_avg.q.out d300b0f contrib/src/test/results/clientpositive/udaf_example_group_concat.q.out 762461b contrib/src/test/results/clientpositive/udaf_example_max.q.out 82aeca7 contrib/src/test/results/clientpositive/udaf_example_max_n.q.out db95fcb contrib/src/test/results/clientpositive/udaf_example_min.q.out b62ff39 contrib/src/test/results/clientpositive/udaf_example_min_n.q.out 1344186 hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out 4e4364e hbase-handler/src/test/results/positive/hbase_queries.q.out 0b4ed37 hbase-handler/src/test/results/positive/hbase_timestamp.q.out f70d371 itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 23a1b97 ql/src/test/queries/clientnegative/join_nonexistent_part.q b4a4757 ql/src/test/queries/clientpositive/ambiguous_col.q 5ccd2c8 ql/src/test/queries/clientpositive/annotate_stats_groupby2.q 6e65577 ql/src/test/queries/clientpositive/constantPropagateForSubQuery.q 149a290 ql/src/test/queries/clientpositive/filter_join_breaktask2.q 7f4258f ql/src/test/queries/clientpositive/join_vc.q bbf3e85 ql/src/test/queries/clientpositive/mrr.q 9f068cc ql/src/test/queries/clientpositive/optimize_nullscan.q f3b896b ql/src/test/queries/clientpositive/ppd_gby_join.q 82f358b ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q fbfcbe1 ql/src/test/queries/clientpositive/subquery_exists_explain_rewrite.q 60dfdaf ql/src/test/queries/clientpositive/subquery_in_explain_rewrite.q 1d1639d ql/src/test/results/clientnegative/join_nonexistent_part.q.out a924895 ql/src/test/results/clientnegative/ptf_negative_InvalidValueBoundary.q.out 6ad9905 ql/src/test/results/clientpositive/allcolref_in_udf.q.out 969f64b ql/src/test/results/clientpositive/alter_partition_coltype.q.out f71fa05 ql/src/test/results/clientpositive/ambiguous_col.q.out d583162 ql/src/test/results/clientpositive/annotate_stats_filter.q.out 70df189 ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 2640ff7 ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out 2f85c92 ql/src/test/results/clientpositive/annotate_stats_join.q.out ee46003 ql/src/test/results/clientpositive/annotate_stats_join_pkfk.q.out 70c9e1d ql/src/test/results/clientpositive/annotate_stats_limit.q.out b61a597 ql/src/test/results/clientpositive/annotate_stats_part.q.out fb3c17b ql/src/test/results/clientpositive/annotate_stats_select.q.out 8fbb208 ql/src/test/results/clientpositive/annotate_stats_table.q.out a74d85c ql/src/test/results/clientpositive/annotate_stats_union.q.out 919015a ql/src/test/results/clientpositive/ansi_sql_arithmetic.q.out 4917ac0 ql/src/test/results/clientpositive/authorization_explain.q.out 3d97227 ql/src/test/results/clientpositive/auto_join1.q.out 8096a94 ql/src/test/results/clientpositive/auto_join10.q.out 7fb3070 ql/src/test/results/clientpositive/auto_join11.q.out 98c8285 ql/src/test/results/clientpositive/auto_join12.q.out f116e23 
ql/src/test/results/clientpositive/auto_join13.q.out 3396a0c ql/src/test/results/clientpositive/auto_join14.q.out 55c9b5d ql/src/test/results/clientpositive/auto_join16.q.out bd5b378 ql/src/test/results/clientpositive/auto_join17.q.out 0fa7aa9 ql/src/test/results/clientpositive/auto_join18.q.out 2303f18 ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out ee5a32c ql/src/test/results/clientpositive/auto_join19.q.out 4c2e26e ql/src/test/results/clientpositive/auto_join2.q.out 11d57e9 ql/src/test/results/clientpositive/auto_join22.q.out c4a0084 ql/src/test/results/clientpositive/auto_join25.q.out 08cbe42 ql/src/test/results/clientpositive/auto_join26.q.out a40615f ql/src/test/results/clientpositive/auto_join27.q.out db348b7 ql/src/test/results/clientpositive/auto_join3.q.out 0bfb27a ql/src/test/results/clientpositive/auto_join33.q.out e5a7c52 ql/src/test/results/clientpositive/auto_join4.q.out 6492a64 ql/src/test/results/clientpositive/auto_join5.q.out 1073302 ql/src/test/results/clientpositive/auto_join6.q.out 88b7770 ql/src/test/results/clientpositive/auto_join7.q.out 5de5640
Hive-0.14 - Build # 760 - Failure
Changes for Build #760 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #760) Status: Failure Check console output at https://builds.apache.org/job/Hive-0.14/760/ to view the results.
dev-ow...@hive.apache.org.
Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
Re: Apache Hive 1.0 ?
Enis, What you said about backward compatibility makes sense. Since we are planning to remove HiveServer1 support, it makes sense to do that in 1.0. Ending Java 6 support is also something we have been discussing on the mailing list. We can document Java 7 as the minimum requirement for 1.0. On Wed, Dec 3, 2014 at 2:27 PM, Enis Söztutar e...@apache.org wrote: Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). The important thing for calling something 1.0, I think, is the focus on user-level API and compatibility issues. But still, you should think about future releases and, for example, when you can do a 1.x release versus a 2.x release. We started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc. for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of the (eventual) 1.0 release is to become a stable base for a future 1.x series of releases. The 1.0 release will aim to achieve at least the same level of stability as the 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end, what you want to deliver and market as 1.0 should be relatively stable, in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases, e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in hive. In the case of hive, every 0.x release has been adding new features, and releases have been sequential. 0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. We could check on the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for hive. I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of hive. 
On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, a major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to do a 1.0 release off a maintenance release, since that is more stable. Trunk is moving fast. HBase uses odd release numbers for this purpose, where 0.95, 97, 99 etc. are dev releases and 0.96, 0.98, 1.0 etc. are public; that works well for baking, but since we don't have that, it seems like 14.0 would be a good place to bake. 15.0, with a bunch of new bugs that we are busy introducing, may not be as good for 1.0 IMHO... On Tue, Dec 2, 2014 at 7:21 PM, Brock Noland br...@cloudera.com wrote: Hi Thejas, Thank you very much for your proposal! Hadoop did something similar renaming branches to branch-1 and branch-2. At the time, although I was very much in favor of the
Re: SVN server hanging
FYI: I logged an INFRA JIRA: https://issues.apache.org/jira/browse/INFRA-8782 Thanks, Xuefu On Wed, Dec 3, 2014 at 7:04 AM, Xuefu Zhang xzh...@cloudera.com wrote: It seems Hive svn server is hanging. Does anyone have means to restart it? Thanks, Xuefu
Re: SVN server hanging
Note that this means all pre-commit builds will fail... On Wed, Dec 3, 2014 at 10:13 AM, Brock Noland br...@cloudera.com wrote: https://blogs.apache.org/infra/entry/subversion_master_undergoing_emergency_maintenance On Dec 3, 2014 7:04 AM, Xuefu Zhang xzh...@cloudera.com wrote: It seems Hive svn server is hanging. Does anyone have means to restart it? Thanks, Xuefu
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234264#comment-14234264 ] Xuefu Zhang commented on HIVE-7292: --- [~libing], I assume you assigned this JIRA to yourself by mistake. However, let me know if you plan to work on this. Thanks. Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Bing Li Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark, as an open-source data analytics cluster computing framework, has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that they can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption, as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. The design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
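To make the proposal above concrete, here is a minimal sketch of how a user would route a query to the Spark backend, assuming the hive.execution.engine property this initiative adds; the table and query are illustrative, not from the JIRA.

{code:sql}
-- Hypothetical session: run the same HiveQL on Spark instead of
-- MapReduce or Tez (table name is made up).
set hive.execution.engine=spark;

SELECT dept, count(*)
FROM employees
GROUP BY dept;
{code}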
[jira] [Commented] (HIVE-8991) Fix custom_input_output_format [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234275#comment-14234275 ] Xuefu Zhang commented on HIVE-8991: --- Hi [~lirui], many thanks for the new findings. I think the patch here is good to be checked in to fix the test. However, if you and [~vanzin] find additional improvements needed for the library loading mechanism, please create a new JIRA and link it with this one. Thanks. Fix custom_input_output_format [Spark Branch] - Key: HIVE-8991 URL: https://issues.apache.org/jira/browse/HIVE-8991 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8991.1-spark.patch After HIVE-8836, {{custom_input_output_format}} fails because of missing hive-it-util in the remote driver's class path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234280#comment-14234280 ] Xuefu Zhang commented on HIVE-8783: --- Patch looks good. I had a couple of minor comments on review board. Create some tests that use Spark counter for stats collection [Spark Branch] Key: HIVE-8783 URL: https://issues.apache.org/jira/browse/HIVE-8783 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-8783.1-spark.patch Currently when .q tests are run with Spark, the default stats collection is fs. We need to have some tests that use Spark counter for stats collection to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
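For context, a minimal sketch of what such a .q test would toggle, assuming the existing hive.stats.dbclass setting mentioned above (default fs); the table name is illustrative.

{code:sql}
-- Collect statistics through Spark counters instead of the default 'fs'
set hive.stats.dbclass=counter;

-- Any stats-gathering statement now exercises the counter code path
ANALYZE TABLE src_copy COMPUTE STATISTICS;
{code}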
[jira] [Commented] (HIVE-9016) SparkCounter display name is not set correctly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234285#comment-14234285 ] Xuefu Zhang commented on HIVE-9016: --- [~chengxiang li], the precommit test needs to get source from svn. Currently the svn server is down. Thus, there is no precommit test until it's fixed. SparkCounter display name is not set correctly [Spark Branch] Key: HIVE-9016 URL: https://issues.apache.org/jira/browse/HIVE-9016 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9016.1-spark.patch, HIVE-9016.1-spark.patch The SparkCounter displayName is set with the SparkCounterGroup displayName; we should not do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28699: HIVE-8783 Create some tests that use Spark counter for stats collection [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28699/#review63852 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java https://reviews.apache.org/r/28699/#comment106158 Naming this variable partitions might be a little confusing; same for partition below. Maybe we can call them partitionSpecs and partitionSpec respectively. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java https://reviews.apache.org/r/28699/#comment106159 list seems too general to be meaningful. - Xuefu Zhang On Dec. 4, 2014, 9:22 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28699/ --- (Updated Dec. 4, 2014, 9:22 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8783 https://issues.apache.org/jira/browse/HIVE-8783 Repository: hive-git Description --- Hive already has stats_counter.q and stats_counter_partitioned.q for unit tests of table statistics collection on Counter. stats_counter.q is already enabled; I enable stats_counter_partitioned.q in this patch. Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 30b7632 ql/src/test/results/clientpositive/spark/stats_counter_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28699/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-9016) SparkCounter display name is not set correctly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234332#comment-14234332 ] Xuefu Zhang commented on HIVE-9016: --- [~chengxiang li], could you also remove ShimLoader.getHadoopShims().getCounterGroupName() and related methods, since they are not used any more? SparkCounter display name is not set correctly [Spark Branch] Key: HIVE-9016 URL: https://issues.apache.org/jira/browse/HIVE-9016 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9016.1-spark.patch, HIVE-9016.1-spark.patch The SparkCounter displayName is set with the SparkCounterGroup displayName; we should not do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Hive-0.14 - Build # 761 - Still Failing
Changes for Build #760 Changes for Build #761 No tests ran. The Apache Jenkins build system has built Hive-0.14 (build #761) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-0.14/761/ to view the results.
Re: dev-ow...@hive.apache.org.
Define unstructured. Hive can handle data such as Avro or JSON, which I would call self-structured. I believe the SerDes for these types can even set the schema for the table or partition you are reading based on the data in the file. Alan. Mohan Krishna mailto:mohan.25fe...@gmail.com December 3, 2014 at 17:01 Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
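To illustrate Alan's point about self-structured data, a sketch of an Avro-backed table whose schema comes from the data side rather than the DDL; the SerDe and I/O format classes are the stock Hive ones, while the table name, location, and schema URL are made up.

{code:sql}
-- No column list: the Avro SerDe derives the table schema from the
-- referenced Avro schema (paths are illustrative).
CREATE EXTERNAL TABLE events
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/data/events'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/event.avsc');
{code}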
Re: dev-ow...@hive.apache.org.
Thanks Alan for the answer. So, can I conclude that Hive handles unstructured data? On Thu, Dec 4, 2014 at 10:06 PM, Alan Gates ga...@hortonworks.com wrote: Define unstructured. Hive can handle data such as Avro or JSON, which I would call self-structured. I believe the SerDes for these types can even set the schema for the table or partition you are reading based on the data in the file. Alan. Mohan Krishna mohan.25fe...@gmail.com December 3, 2014 at 17:01 Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: HIVE-8974.04.patch Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.03.patch, HIVE-8974.04.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Status: Open (was: Patch Available) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.11.patch, HIVE-8774.12.patch, HIVE-8774.13.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8774) CBO: enable groupBy index
[ https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-8774: -- Status: Patch Available (was: Open) CBO: enable groupBy index - Key: HIVE-8774 URL: https://issues.apache.org/jira/browse/HIVE-8774 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-8774.1.patch, HIVE-8774.10.patch, HIVE-8774.11.patch, HIVE-8774.12.patch, HIVE-8774.13.patch, HIVE-8774.2.patch, HIVE-8774.3.patch, HIVE-8774.4.patch, HIVE-8774.5.patch, HIVE-8774.6.patch, HIVE-8774.7.patch, HIVE-8774.8.patch, HIVE-8774.9.patch Right now, even when a groupby index is built, CBO is not able to use it. In this patch, we are trying to make it use the groupby index that we build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: dev-ow...@hive.apache.org.
Mohan, It will handle it, but it is probably (depending on your use case) not optimal. Hive's sweet spot is structured data. Bill Thank You, Follow me on @BigData73 - Bill Busch | SSA | Enterprise Information Solutions CWP m: 704.806.2485 | NASDAQ: PRFT | Perficient.com BI/DW | Advanced Analytics | Big Data | ECI | EPM | MDM -Original Message- From: Mohan Krishna [mailto:mohan.25fe...@gmail.com] Sent: Thursday, December 04, 2014 1:09 PM To: dev@hive.apache.org Subject: Re: dev-ow...@hive.apache.org. Thanks Alan for the answer. So, can I conclude that Hive handles unstructured data? On Thu, Dec 4, 2014 at 10:06 PM, Alan Gates ga...@hortonworks.com wrote: Define unstructured. Hive can handle data such as Avro or JSON, which I would call self-structured. I believe the SerDes for these types can even set the schema for the table or partition you are reading based on the data in the file. Alan. Mohan Krishna mohan.25fe...@gmail.com December 3, 2014 at 17:01 Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234532#comment-14234532 ] Nick Dimiduk commented on HIVE-8809: Using activeByDefault causes issues -- if you specify some other unrelated profiles (thrift generation, for instance), you end up disabling your default profile. Better to use a property flag. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every Maven command, the profile needs to be specified explicitly. It would be better to activate the hadoop-2 profile by default, as Hive QA uses the hadoop-2 profile. With this change, both of the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: dev-ow...@hive.apache.org.
Thank you Bill. Now it is clear for me. Thanks On Fri, Dec 5, 2014 at 12:54 AM, Bill Busch bill.bu...@perficient.com wrote: Mohan, It will handle it, but it is probably (depending on your use case) not optimal. Hive's sweet spot is structured data. Bill Thank You, Follow me on @BigData73 - Bill Busch | SSA | Enterprise Information Solutions CWP m: 704.806.2485 | NASDAQ: PRFT | Perficient.com BI/DW | Advanced Analytics | Big Data | ECI | EPM | MDM -Original Message- From: Mohan Krishna [mailto:mohan.25fe...@gmail.com] Sent: Thursday, December 04, 2014 1:09 PM To: dev@hive.apache.org Subject: Re: dev-ow...@hive.apache.org. Thanks Alan for the answer. So, can I conclude that Hive handles unstructured data? On Thu, Dec 4, 2014 at 10:06 PM, Alan Gates ga...@hortonworks.com wrote: Define unstructured. Hive can handle data such as Avro or JSON, which I would call self-structured. I believe the SerDes for these types can even set the schema for the table or partition you are reading based on the data in the file. Alan. Mohan Krishna mohan.25fe...@gmail.com December 3, 2014 at 17:01 Can Hive handle unstructured data, or does it handle only structured data? Please confirm Thanks Mohan
[jira] [Updated] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-8809: --- Attachment: HIVE-8809.01.patch Over on HBase, we have the property hadoop.profile and check its value. See also http://java.dzone.com/articles/maven-profile-best-practices Give this patch a spin. For a hadoop-1 build, add {{-Dhadoop.profile=1}}. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every Maven command, the profile needs to be specified explicitly. It would be better to activate the hadoop-2 profile by default, as Hive QA uses the hadoop-2 profile. With this change, both of the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234593#comment-14234593 ] Sergey Shelukhin commented on HIVE-9013: Can you add some message if it's restricted, for the single-property case? Otherwise +1 Hive set command exposes metastore db password -- Key: HIVE-9013 URL: https://issues.apache.org/jira/browse/HIVE-9013 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Binglin Chang Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch When auth is enabled, we still need the set command to set some variables (e.g. mapreduce.job.queuename), but the set command alone also lists all information (including vars in the restricted list); this exposes values like javax.jdo.option.ConnectionPassword. I think conf vars in the restricted list should also be excluded from the dump-vars command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
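A sketch of the behavior the patch aims for, under the assumption that restricted variables are simply filtered from the full dump; the queue name is made up.

{code:sql}
-- Setting an individual, non-restricted variable should still work:
set mapreduce.job.queuename=etl;

-- ...but a bare 'set' (dump all variables) should omit restricted
-- entries such as javax.jdo.option.ConnectionPassword.
set;
{code}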
[jira] [Commented] (HIVE-8870) errors when selecting a struct field within an array from ORC based tables
[ https://issues.apache.org/jira/browse/HIVE-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234642#comment-14234642 ] Vikram Dixit K commented on HIVE-8870: -- +1 for 0.14 errors when selecting a struct field within an array from ORC based tables -- Key: HIVE-8870 URL: https://issues.apache.org/jira/browse/HIVE-8870 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0 Environment: HDP 2.1 / HDP 2.2 (YARN, but no Tez) Reporter: Michael Haeusler Assignee: Sergio Peña Attachments: HIVE-8870.3.patch When using ORC as storage for a table, we get errors on selecting a struct field within an array. These errors do not appear with the default format. {code:sql} CREATE TABLE `foobar_orc`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>) STORED AS ORC; {code} When selecting from this _empty_ table, we get a direct NPE within the Hive CLI: {code:sql} SELECT elements.elementId FROM foobar_orc; -- FAILED: RuntimeException java.lang.NullPointerException {code} A more real-world query produces a RuntimeException / NullPointerException in the mapper: {code:sql} SELECT uid, element.elementId FROM foobar_orc LATERAL VIEW EXPLODE(elements) e AS element; Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) [...] Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) [...] FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} Both queries run fine on a non-ORC table: {code:sql} CREATE TABLE `foobar`( `uid` bigint, `elements` array<struct<elementid:bigint,foo:struct<bar:string>>>); SELECT elements.elementId FROM foobar; -- OK -- Time taken: 0.225 seconds SELECT uid, element.elementId FROM foobar LATERAL VIEW EXPLODE(elements) e AS element; -- Total MapReduce CPU Time Spent: 1 seconds 920 msec -- OK -- Time taken: 25.905 seconds {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8809) Activate maven profile hadoop-2 by default
[ https://issues.apache.org/jira/browse/HIVE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234694#comment-14234694 ] Brock Noland commented on HIVE-8809: I had some pretty ugly issues with the system property approach (in HBase's pom) when we used ivy and ant. However, now that we are on maven, perhaps it won't be an issue. Activate maven profile hadoop-2 by default -- Key: HIVE-8809 URL: https://issues.apache.org/jira/browse/HIVE-8809 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Minor Attachments: HIVE-8809.01.patch, HIVE-8809.1.patch, dep_itests_with_hadoop_2.txt, dep_itests_without_hadoop_2.txt, dep_with_hadoop_2.txt, dep_without_hadoop_2.txt For every Maven command, the profile needs to be specified explicitly. It would be better to activate the hadoop-2 profile by default, as Hive QA uses the hadoop-2 profile. With this change, both of the following commands will be equivalent {code} mvn clean install -DskipTests mvn clean install -DskipTests -Phadoop-2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Attachment: (was: HIVE-9001.1.patch) Ship with log4j.properties file that has a reliable time based rolling policy - Key: HIVE-9001 URL: https://issues.apache.org/jira/browse/HIVE-9001 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan The hive log gets locked by the hive process and cannot be rolled on Windows OS. Install Hive on Windows, start Hive, and try to rename the hive log while Hive is running. When log4j tries to rename it, it will throw the same error, as the file is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated into Hive for a reliable rollover. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9001) Ship with log4j.properties file that has a reliable time based rolling policy
[ https://issues.apache.org/jira/browse/HIVE-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9001: Attachment: HIVE-9001.1.patch [~sushanth] Made the space adjustments and overwrote the previous patch, since this is a very minor change. Thanks Hari Ship with log4j.properties file that has a reliable time based rolling policy - Key: HIVE-9001 URL: https://issues.apache.org/jira/browse/HIVE-9001 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9001.1.patch The hive log gets locked by the hive process and cannot be rolled on Windows OS. Install Hive on Windows, start Hive, and try to rename the hive log while Hive is running. When log4j tries to rename it, it will throw the same error, as the file is locked by the process. The changes in https://issues.apache.org/bugzilla/show_bug.cgi?id=29726 should be integrated into Hive for a reliable rollover. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234767#comment-14234767 ] Szehon Ho commented on HIVE-9007: - I'll leave this JIRA for now. One observation to note here is that it is revealed in the ppd_join4.q test if you add set hive.auto.convert.join=true for the test. The plan has too many HashTableSinks.
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Spark
      A masked pattern was here
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: test_tbl
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Filter Operator
                    predicate: ((id is not null and (name = 'c')) and (id = 'a')) (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Select Operator
                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                      Spark HashTable Sink Operator
                        condition expressions:
                          0
                          1
                        keys:
                          0 'a' (type: string)
                          1 'a' (type: string)
            Local Work:
              Map Reduce Local Work
        Map 2
            Map Operator Tree:
                TableScan
                  alias: t3
                  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                  Filter Operator
                    predicate: (id = 'a') (type: boolean)
                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
                    Spark HashTable Sink Operator
                      condition expressions:
                        0
                        1
                      keys:
                        0 'a' (type: string)
                        1 'a' (type: string)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}
It could be related to this issue. I'll come back to this JIRA at a later point, or others who are free can take it. Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] -- Key: HIVE-9007 URL: https://issues.apache.org/jira/browse/HIVE-9007 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, which may cause map join in the spark branch to generate a wrong plan. Currently, the map join conversion in the spark branch first goes through a method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, removes the RS associated with the big table, and keeps the RSs for all small tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs to a small table or the big table). The issue arises when IdentityProjectRemover comes into play, which may result in a situation where an operator tree has two consecutive RSs. Imagine the following example:
{noformat}
        Join                     MapJoin
       /    \                   /       \
     RS      RS      --->     RS         RS
    /          \             /             \
  TS            RS         TS               TS
(big table)      \       (small table)
                  TS
{noformat}
In this case, all parents of the mapjoin op will be RS, even the branch for the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong Liu updated HIVE-8966: - Status: Open (was: Patch Available) Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8966.patch hive hcatalog streaming also creates a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after removing the bucket_n_flush_length file, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
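For reference, a sketch of the compaction request described above; the table and partition names are made up.

{code:sql}
-- Request a compaction on a partition fed by hcatalog streaming
-- (names are illustrative). With this bug, CompactorMR trips over the
-- bucket_N_flush_length side files and nothing gets compacted.
ALTER TABLE web_events PARTITION (dt='2014-12-04') COMPACT 'major';

-- The stalled request should be visible here:
SHOW COMPACTIONS;
{code}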
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234769#comment-14234769 ] Jihong Liu commented on HIVE-8966: -- I think we may have to withdraw this patch for now. It looks like hive currently cannot support doing compaction and loading at the same time for a partition. Without this patch, if loading for a partition is not completely finished, compaction will always fail, so nothing happens. After applying this patch, compaction will go through and finish. However, we may lose data! I did a test: data could be lost if we do compaction while the loading is not finished yet. But if we keep the current version, it remains a limitation for hive: if we stream into a partition for a long period, performance will be affected because we cannot do compaction on it. To completely solve this issue, my initial thinking is that the delta files with open transactions should not be compacted. Currently they must be included, and that is probably the reason for the data loss. But other, closed delta files should be compactable, so we can do compaction and loading at the same time. Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8966.patch hive hcatalog streaming also creates a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after removing the bucket_n_flush_length file, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: HIVE-8974.04.patch Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.03.patch, HIVE-8974.04.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-8974: -- Attachment: (was: HIVE-8974.04.patch) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Affects Versions: 0.15.0 Reporter: Julian Hyde Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-8974.01.patch, HIVE-8974.02.patch, HIVE-8974.03.patch, HIVE-8974.04.patch, HIVE-8974.patch CLEAR LIBRARY CACHE Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8866) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same #of columns
[ https://issues.apache.org/jira/browse/HIVE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8866: --- Status: In Progress (was: Patch Available) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same # of columns Key: HIVE-8866 URL: https://issues.apache.org/jira/browse/HIVE-8866 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8866.01.patch, HIVE-8866.02.patch Vectorization assumes all partitions have the same number of columns, and takes the # of columns from the first read. A subsequent addPartitionColsToBatch throws ArrayIndexOutOfBoundsException if the # of columns is bigger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
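A sketch of a possible repro, under the assumption that the column counts diverge after an ALTER TABLE; all names (including the helper src table) are illustrative.

{code:sql}
-- Partition p=1 is written while the table has one data column...
CREATE TABLE t (a int) PARTITIONED BY (p int) STORED AS ORC;
INSERT INTO TABLE t PARTITION (p=1) SELECT 1 FROM src LIMIT 1;

-- ...then the table grows a column, so partition p=2 has one more.
ALTER TABLE t ADD COLUMNS (b int);
INSERT INTO TABLE t PARTITION (p=2) SELECT 2, 3 FROM src LIMIT 1;

-- A vectorized scan that reads p=1 first sizes its batch for the
-- smaller column count and can overrun when it reaches p=2.
set hive.vectorized.execution.enabled=true;
SELECT a, b FROM t;
{code}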
Re: Apache Hive 1.0 ?
HiveServer and the original JDBC driver have already been purged from trunk. The HiveServer1 docs have been asking users to use HiveServer2 for a long time. The case with Hive CLI is different. We never marked that as deprecated or asked users to use beeline instead. Beeline had been lacking in some features until recently. We just added some capabilities to beeline such as progress/log information support. We need to discuss deprecating it, deprecate it, and wait for some time (at least a year or so, considering how widely it is used) before we can remove it. I think that is more like a candidate for 2.0. Thanks, Thejas On Wed, Dec 3, 2014 at 3:43 PM, Carl Steinbach cwsteinb...@gmail.com wrote: I'd like to see HiveCLI, HiveServer, and the original JDBC driver deprecated and purged from the codebase before the 1.0 release. This topic probably needs its own thread, but I thought I should mention it here. Thanks. - Carl On Wed, Dec 3, 2014 at 2:27 PM, Enis Söztutar e...@apache.org wrote: Hi, I am the RM for HBase-1.0 coming in a couple of weeks (hopefully). I think both HBase and Hive are past due for doing 1.0 releases. So I am a major +1 for Hive-1.0 (non-binding of course). The important thing for calling something 1.0, I think, is the focus on user-level API and compatibility issues. But still, you should think about future releases and, for example, when you can do a 1.x release versus a 2.x release. We started thinking about that some time ago, and we are adopting a semantic versioning proposal ( https://mail-archives.apache.org/mod_mbox/hbase-dev/201411.mbox/%3c53115341.900549.1416100552603.javamail.ya...@jws106116.mail.bf1.yahoo.com%3E ) for this exact same reason. In Hive, things may be a bit different than HBase or Hadoop (since the major interface is SQL) but still I think you should consider the implications for all the APIs that Hive surfaces and for deployment, etc. for a 1.0 discussion. For HBase, the official theme of the 1.0 release is (from my RC mail): The theme of the (eventual) 1.0 release is to become a stable base for a future 1.x series of releases. The 1.0 release will aim to achieve at least the same level of stability as the 0.98 releases without introducing too many new features. What I am getting at is that, in HBase, we opted for not introducing a lot of major features and branched relatively early to give more time to stabilize the branch. In the end, what you want to deliver and market as 1.0 should be relatively stable, in my opinion. Just my 2 cents from an outsider perspective. Enis On Tue, Dec 2, 2014 at 11:07 PM, Lefty Leverenz leftylever...@gmail.com wrote: Would everyone just laugh if I suggested that a 1.0 release ought to include complete documentation? -- Lefty On Tue, Dec 2, 2014 at 9:32 PM, Thejas Nair the...@hortonworks.com wrote: The reasons for confusion in the Hadoop case were different. There were many branches, and new features were added in minor version releases, e.g. Kerberos security was not there in 0.20.2, but it was added in 0.20.20x. Then you had other versions like 0.21, but the older 0.20.20x version was the one that was converted to 1.x. This confusion isn't there in hive. In the case of hive, every 0.x release has been adding new features, and releases have been sequential. 0.x.y releases have been maintenance releases. 1.0 is a sequential release after 0.14, and it is a newer release than 0.14. I agree that the versioning in Hadoop created a lot of confusion, but I don't see this as being the same. 
We could check on the user mailing list to see if they are going to be HUGELY confused by this. If it makes things better, we can also include the change to delete HiveServer1 in the new release. That is a safer change, which was mainly just deleting that old code. That would be a major difference from 0.14. (The docs have already been updated to say that 0.14 does not support 0.20, so I don't think we need that in 1.0). Looks like we have agreement that the 1.0 versioning scheme is a great thing for hive. I don't think there is a strong reason to delay a 1.0 release by several months to the detriment of hive. On Tue, Dec 2, 2014 at 8:05 PM, Xuefu Zhang xzh...@cloudera.com wrote: A major release means more functionality, while minor releases provide stability. Therefore, I'd think 1.0, as a major release, should bring in something new to the user. If it's desirable to provide a more stable release, then 0.14.1, 0.14.2, and so on are the right ones. In my opinion, we should avoid the anti-pattern of introducing a major release like a maintenance release and creating confusion among users. In one word, a major release is NOT equal to major confusion. --Xuefu On Tue, Dec 2, 2014 at 7:29 PM, Sergey Shelukhin ser...@hortonworks.com wrote: I think it's better to
[jira] [Updated] (HIVE-8866) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same #of columns
[ https://issues.apache.org/jira/browse/HIVE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8866: --- Attachment: HIVE-8866.03.patch Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same # of columns Key: HIVE-8866 URL: https://issues.apache.org/jira/browse/HIVE-8866 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8866.01.patch, HIVE-8866.02.patch, HIVE-8866.03.patch Vectorization assumes all partitions have the same number of columns, and takes the # of columns from the first read. A subsequent addPartitionColsToBatch throws ArrayIndexOutOfBoundsException if the # of columns is bigger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8866) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same #of columns
[ https://issues.apache.org/jira/browse/HIVE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8866: --- Status: Patch Available (was: In Progress) Vectorization on partitioned table throws ArrayIndexOutOfBoundsException when partitions are not of same # of columns Key: HIVE-8866 URL: https://issues.apache.org/jira/browse/HIVE-8866 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8866.01.patch, HIVE-8866.02.patch, HIVE-8866.03.patch Vectorization assumes all partitions have the same number of columns, and takes the # of columns from the first read. A subsequent addPartitionColsToBatch throws ArrayIndexOutOfBoundsException if the # of columns is bigger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8638) Implement bucket map join optimization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8638: -- Attachment: HIVE-8638.3-spark.patch Attached v3, which works only when the bucket numbers match. Implement bucket map join optimization [Spark Branch] - Key: HIVE-8638 URL: https://issues.apache.org/jira/browse/HIVE-8638 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8638.1-spark.patch, HIVE-8638.2-spark.patch, HIVE-8638.3-spark.patch In the hive-on-mr implementation, bucket map join optimization has to depend on the map join hint. In the hive-on-tez implementation, however, a join can be automatically converted to a bucket map join if certain conditions are met, such as: 1. the optimization flag hive.convert.join.bucket.mapjoin.tez is ON 2. all join tables are bucketed and each small table's bucket number is divisible by the big table's bucket number 3. bucket columns == join columns In the hive-on-spark implementation, it is ideal to have the bucket map join auto-conversion support: when all the required criteria are met, a join can be automatically converted to a bucket map join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
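To make the three conditions above concrete, a sketch of a pair of tables that satisfies them; the names are illustrative, and the auto-conversion flag is engine-specific (hive.convert.join.bucket.mapjoin.tez on Tez, per condition 1).

{code:sql}
-- Bucketed on the join column; the small table's bucket count (8) is
-- divisible by the big table's (4); bucket columns == join columns.
CREATE TABLE big_t (key int, value string)
CLUSTERED BY (key) INTO 4 BUCKETS;
CREATE TABLE small_t (key int, value string)
CLUSTERED BY (key) INTO 8 BUCKETS;

-- With the engine's bucket-map-join flag enabled, this join becomes a
-- candidate for automatic conversion to a bucket map join.
SELECT b.key, s.value
FROM big_t b JOIN small_t s ON b.key = s.key;
{code}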
[jira] [Commented] (HIVE-6998) Select query can only support maximum 128 distinct expressions
[ https://issues.apache.org/jira/browse/HIVE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234794#comment-14234794 ] Pengcheng Xiong commented on HIVE-6998: --- run with 129 and also 200 distinct expressions, no problem hive select count(distinct c0),count(distinct c1),count(distinct c2),count(distinct c3),count(distinct c4),count(distinct c5),count(distinct c6),count(distinct c7),count(distinct c8),count(distinct c9),count(distinct c10),count(distinct c11),count(distinct c12),count(distinct c13),count(distinct c14),count(distinct c15),count(distinct c16),count(distinct c17),count(distinct c18),count(distinct c19),count(distinct c20),count(distinct c21),count(distinct c22),count(distinct c23),count(distinct c24),count(distinct c25),count(distinct c26),count(distinct c27),count(distinct c28),count(distinct c29),count(distinct c30),count(distinct c31),count(distinct c32),count(distinct c33),count(distinct c34),count(distinct c35),count(distinct c36),count(distinct c37),count(distinct c38),count(distinct c39),count(distinct c40),count(distinct c41),count(distinct c42),count(distinct c43),count(distinct c44),count(distinct c45),count(distinct c46),count(distinct c47),count(distinct c48),count(distinct c49),count(distinct c50),count(distinct c51),count(distinct c52),count(distinct c53),count(distinct c54),count(distinct c55),count(distinct c56),count(distinct c57),count(distinct c58),count(distinct c59),count(distinct c60),count(distinct c61),count(distinct c62),count(distinct c63),count(distinct c64),count(distinct c65),count(distinct c66),count(distinct c67),count(distinct c68),count(distinct c69),count(distinct c70),count(distinct c71),count(distinct c72),count(distinct c73),count(distinct c74),count(distinct c75),count(distinct c76),count(distinct c77),count(distinct c78),count(distinct c79),count(distinct c80),count(distinct c81),count(distinct c82),count(distinct c83),count(distinct c84),count(distinct c85),count(distinct c86),count(distinct c87),count(distinct c88),count(distinct c89),count(distinct c90),count(distinct c91),count(distinct c92),count(distinct c93),count(distinct c94),count(distinct c95),count(distinct c96),count(distinct c97),count(distinct c98),count(distinct c99),count(distinct c100),count(distinct c101),count(distinct c102),count(distinct c103),count(distinct c104),count(distinct c105),count(distinct c106),count(distinct c107),count(distinct c108),count(distinct c109),count(distinct c110),count(distinct c111),count(distinct c112),count(distinct c113),count(distinct c114),count(distinct c115),count(distinct c116),count(distinct c117),count(distinct c118),count(distinct c119),count(distinct c120),count(distinct c121),count(distinct c122),count(distinct c123),count(distinct c124),count(distinct c125),count(distinct c126),count(distinct c127),count(distinct c128),count(distinct c129),count(distinct c130),count(distinct c131),count(distinct c132),count(distinct c133),count(distinct c134),count(distinct c135),count(distinct c136),count(distinct c137),count(distinct c138),count(distinct c139),count(distinct c140),count(distinct c141),count(distinct c142),count(distinct c143),count(distinct c144),count(distinct c145),count(distinct c146),count(distinct c147),count(distinct c148),count(distinct c149),count(distinct c150),count(distinct c151),count(distinct c152),count(distinct c153),count(distinct c154),count(distinct c155),count(distinct c156),count(distinct c157),count(distinct c158),count(distinct c159),count(distinct 
c160),count(distinct c161),count(distinct c162),count(distinct c163),count(distinct c164),count(distinct c165),count(distinct c166),count(distinct c167),count(distinct c168),count(distinct c169),count(distinct c170),count(distinct c171),count(distinct c172),count(distinct c173),count(distinct c174),count(distinct c175),count(distinct c176),count(distinct c177),count(distinct c178),count(distinct c179),count(distinct c180),count(distinct c181),count(distinct c182),count(distinct c183),count(distinct c184),count(distinct c185),count(distinct c186),count(distinct c187),count(distinct c188),count(distinct c189),count(distinct c190),count(distinct c191),count(distinct c192),count(distinct c193),count(distinct c194),count(distinct c195),count(distinct c196),count(distinct c197),count(distinct c198),count(distinct c199)from tbl_200columns; OK 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[jira] [Created] (HIVE-9024) NullPointerException when starting webhcat server if templeton.hive.properties is not set
Na Yang created HIVE-9024: - Summary: NullPointerException when starting webhcat server if templeton.hive.properties is not set Key: HIVE-9024 URL: https://issues.apache.org/jira/browse/HIVE-9024 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang If templeton.hive.properties is not set, the following NullPointerException is thrown when starting the webhcat server, and the webhcat server cannot start:
{noformat}
Exception in thread "main" java.lang.NullPointerException
	at org.apache.hive.hcatalog.templeton.AppConfig.hiveProps(AppConfig.java:318)
	at org.apache.hive.hcatalog.templeton.AppConfig.handleHiveProperties(AppConfig.java:194)
	at org.apache.hive.hcatalog.templeton.AppConfig.<init>(AppConfig.java:175)
	at org.apache.hive.hcatalog.templeton.AppConfig.<init>(AppConfig.java:155)
	at org.apache.hive.hcatalog.templeton.Main.loadConfig(Main.java:96)
	at org.apache.hive.hcatalog.templeton.Main.<init>(Main.java:80)
	at org.apache.hive.hcatalog.templeton.Main.<init>(Main.java:75)
	at org.apache.hive.hcatalog.templeton.Main.main(Main.java:267)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 28727: HIVE-8638 Implement bucket map join optimization [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28727/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8638 https://issues.apache.org/jira/browse/HIVE-8638 Repository: hive-git Description --- Patch v3 that works when bucket number matches Diffs - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java cfc1501 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 2f9e55a ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java 4054173 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkBucketJoinProcCtx.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 8b78123 ql/src/test/queries/clientpositive/bucket_map_join_spark1.q PRE-CREATION ql/src/test/queries/clientpositive/bucket_map_join_spark2.q PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/bucket_map_join_spark2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket_map_join_spark2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28727/diff/ Testing --- Thanks, Jimmy Xiang
[jira] [Resolved] (HIVE-6998) Select query can only support maximum 128 distinct expressions
[ https://issues.apache.org/jira/browse/HIVE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong resolved HIVE-6998. --- Resolution: Fixed The reported issue cannot be reproduced. Select query can only support maximum 128 distinct expressions -- Key: HIVE-6998 URL: https://issues.apache.org/jira/browse/HIVE-6998 Project: Hive Issue Type: Bug Components: Query Processor, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Chaoyu Tang A select query can only support a maximum of 128 distinct expressions; otherwise, an ArrayIndexOutOfBoundsException is thrown. For a query like: select count(distinct c1), count(distinct c2), count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), ..., count(distinct c128), count(distinct c129) from tbl_129columns; you will get an error like: {code} java.lang.Exception: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 10 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1099) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1138) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1064) at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1082) ... 16 more Caused by: java.lang.ArrayIndexOutOfBoundsException: -128 at java.util.ArrayList.get(ArrayList.java:324) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:838) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:600) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:401) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:320) ... 19 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
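The -128 index in the innermost frame is consistent with a column tag being stored in a signed Java byte: indexes 0 through 127 fit, and the 129th distinct expression wraps around to -128. A minimal demonstration of that wraparound (this illustrates only the arithmetic; it is not the BinarySortableSerDe code):
{code}
public class ByteWrapDemo {
    public static void main(String[] args) {
        for (int i = 126; i <= 130; i++) {
            byte b = (byte) i; // narrowing conversion wraps past 127
            System.out.println(i + " stored in a byte reads back as " + b);
        }
        // 128 reads back as -128, matching the ArrayList index in the trace:
        // the 129th distinct expression would index with -128 and fail.
    }
}
{code}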
[jira] [Commented] (HIVE-8638) Implement bucket map join optimization [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234797#comment-14234797 ] Jimmy Xiang commented on HIVE-8638: --- Patch v3 is on RB: https://reviews.apache.org/r/28727/ Implement bucket map join optimization [Spark Branch] - Key: HIVE-8638 URL: https://issues.apache.org/jira/browse/HIVE-8638 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8638.1-spark.patch, HIVE-8638.2-spark.patch, HIVE-8638.3-spark.patch In the hive-on-mr implementation, bucket map join optimization has to depend on the map join hint, while in the hive-on-tez implementation a join can be automatically converted to a bucket map join if certain conditions are met, such as: 1. the optimization flag hive.convert.join.bucket.mapjoin.tez is ON; 2. all join tables are bucketed and each small table's bucket number is divisible by the big table's bucket number; 3. bucket columns == join columns. In the hive-on-spark implementation, it is ideal to have bucket map join auto-conversion support: when all the required criteria are met, a join can be automatically converted to a bucket map join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
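Condition 2 above reduces to a divisibility test. A minimal sketch of that check, under the assumption that bucket counts are already known per table; the class and method names are illustrative, not Hive's actual optimizer API:
{code}
public class BucketMapJoinCheck {
    // Returns true only if every small table's bucket count is a multiple of
    // the big table's bucket count (condition 2 above).
    public static boolean bucketsCompatible(int bigTableBuckets, int[] smallTableBuckets) {
        if (bigTableBuckets <= 0) {
            return false;
        }
        for (int buckets : smallTableBuckets) {
            if (buckets <= 0 || buckets % bigTableBuckets != 0) {
                return false; // bucket counts don't line up; keep a plain map join
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(bucketsCompatible(4, new int[]{8, 4})); // true
        System.out.println(bucketsCompatible(4, new int[]{6}));    // false
    }
}
{code}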
[jira] [Commented] (HIVE-6998) Select query can only support maximum 128 distinct expressions
[ https://issues.apache.org/jira/browse/HIVE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234798#comment-14234798 ] Pengcheng Xiong commented on HIVE-6998: --- Tried version: hive 0.15, commit 8d0d1de18b2439b88 Select query can only support maximum 128 distinct expressions -- Key: HIVE-6998 URL: https://issues.apache.org/jira/browse/HIVE-6998 Project: Hive Issue Type: Bug Components: Query Processor, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Chaoyu Tang A select query can only support a maximum of 128 distinct expressions; otherwise, an ArrayIndexOutOfBoundsException is thrown. For a query like: select count(distinct c1), count(distinct c2), count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), ..., count(distinct c128), count(distinct c129) from tbl_129columns; you will get an error like: {code} java.lang.Exception: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 10 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1099) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1138) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1064) at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1082) ... 16 more Caused by: java.lang.ArrayIndexOutOfBoundsException: -128 at java.util.ArrayList.get(ArrayList.java:324) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:838) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:600) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:401) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:320) ... 19 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Status: In Progress (was: Patch Available) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Attachment: HIVE-8886.03.patch Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch, HIVE-8886.03.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8886: --- Status: Patch Available (was: In Progress) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch, HIVE-8886.03.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9024) NullPointerException when starting webhcat server if templeton.hive.properties is not set
[ https://issues.apache.org/jira/browse/HIVE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-9024: -- Status: Patch Available (was: Open) NullPointerException when starting webhcat server if templeton.hive.properties is not set - Key: HIVE-9024 URL: https://issues.apache.org/jira/browse/HIVE-9024 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-9024.patch If templeton.hive.properties is not set, the following NullPointerException is thrown when starting the webhcat server, and the server cannot start: {noformat} Exception in thread "main" java.lang.NullPointerException at org.apache.hive.hcatalog.templeton.AppConfig.hiveProps(AppConfig.java:318) at org.apache.hive.hcatalog.templeton.AppConfig.handleHiveProperties(AppConfig.java:194) at org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:175) at org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:155) at org.apache.hive.hcatalog.templeton.Main.loadConfig(Main.java:96) at org.apache.hive.hcatalog.templeton.Main.init(Main.java:80) at org.apache.hive.hcatalog.templeton.Main.init(Main.java:75) at org.apache.hive.hcatalog.templeton.Main.main(Main.java:267) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:197) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-6998) Select query can only support maximum 128 distinct expressions
[ https://issues.apache.org/jira/browse/HIVE-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reopened HIVE-6998: --- Sorry, the problem still remains at run time. Select query can only support maximum 128 distinct expressions -- Key: HIVE-6998 URL: https://issues.apache.org/jira/browse/HIVE-6998 Project: Hive Issue Type: Bug Components: Query Processor, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Chaoyu Tang A select query can only support a maximum of 128 distinct expressions; otherwise, an ArrayIndexOutOfBoundsException is thrown. For a query like: select count(distinct c1), count(distinct c2), count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), ..., count(distinct c128), count(distinct c129) from tbl_129columns; you will get an error like: {code} java.lang.Exception: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1141) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 10 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1099) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1138) ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -128 at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1064) at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1082) ... 16 more Caused by: java.lang.ArrayIndexOutOfBoundsException: -128 at java.util.ArrayList.get(ArrayList.java:324) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:838) at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.serialize(BinarySortableSerDe.java:600) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.toHiveKey(ReduceSinkOperator.java:401) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:320) ... 19 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8886) Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup
[ https://issues.apache.org/jira/browse/HIVE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234842#comment-14234842 ] Jason Dere commented on HIVE-8886: -- I think it looks ok, let's see how the tests go. Might have to resubmit patch once SVN is back up to get the tests to run. Some Vectorized String CONCAT expressions result in runtime error Vectorization: Unsuported vector output type: StringGroup --- Key: HIVE-8886 URL: https://issues.apache.org/jira/browse/HIVE-8886 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.1 Attachments: HIVE-8886.01.patch, HIVE-8886.02.patch, HIVE-8886.03.patch {noformat} SELECT CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) AS `field` FROM vectortab2korc GROUP BY CONCAT(CONCAT(CONCAT('Quarter ',CAST(CAST((MONTH(dt) - 1) / 3 + 1 AS INT) AS STRING)),'-'),CAST(YEAR(dt) AS STRING)) LIMIT 50; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers
Chao created HIVE-9025: -- Summary: join38.q (without map join) produces incorrect result when testing with multiple reducers Key: HIVE-9025 URL: https://issues.apache.org/jira/browse/HIVE-9025 Project: Hive Issue Type: Bug Reporter: Chao I have this query from a modified version of {{join38.q}}, which does NOT use map join: {code} FROM src a JOIN tmp b ON (a.key = b.col11) SELECT a.value, b.col5, count(1) as count where b.col11 = 111 group by a.value, b.col5; {code} If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set it to a larger number (3, for instance), the result will be {noformat} val_111 105 1 {noformat} which is wrong. I think the issue is that, for this case, ConstantPropagationProcFactory will overwrite the partition cols for the reduce sink desc with an empty list. Then, later on in ReduceSinkOperator#computeHashCode, since partitionEval has length 0, it will use a random number as the hash code for each row. As a result, rows with the same key will be distributed to different reducers, which leads to incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
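A minimal sketch of the failure mode described above: when the partition-column list is empty, the reducer is chosen from a random hash, so rows with the same key scatter across reducers and per-key aggregates come out wrong. The names here are assumptions for illustration; this is not the actual ReduceSinkOperator code:
{code}
import java.util.Random;

public class EmptyPartitionColsDemo {
    private static final Random RANDOM = new Random();

    // With no partition columns the hash is random; otherwise it is derived
    // from the key, so identical keys always land on the same reducer.
    static int reducerFor(Object[] partitionEval, Object key, int numReducers) {
        int hash = (partitionEval.length == 0) ? RANDOM.nextInt() : key.hashCode();
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // The same key "111" can land on a different reducer on every call:
        for (int i = 0; i < 3; i++) {
            System.out.println(reducerFor(new Object[0], "111", 3));
        }
    }
}
{code}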
[jira] [Created] (HIVE-9026) Re-enable remaining tests after HIVE-8970 [Spark Branch]
Chao created HIVE-9026: -- Summary: Re-enable remaining tests after HIVE-8970 [Spark Branch] Key: HIVE-9026 URL: https://issues.apache.org/jira/browse/HIVE-9026 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: spark-branch Reporter: Chao In HIVE-8970, we disabled several tests which seem to be related to a bug upstream. I filed HIVE-9025 to track it. {noformat} join38.q join_literals.q join_nullsafe.q subquery_in.q ppd_multi_insert.q {noformat} We need to re-enable these tests after HIVE-9025 is resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9026) Re-enable remaining tests after HIVE-8970 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-9026: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 Re-enable remaining tests after HIVE-8970 [Spark Branch] Key: HIVE-9026 URL: https://issues.apache.org/jira/browse/HIVE-9026 Project: Hive Issue Type: Sub-task Components: spark-branch Affects Versions: spark-branch Reporter: Chao In HIVE-8970, we disabled several tests which seem to be related to a bug upstream. I filed HIVE-9025 to track it. {noformat} join38.q join_literals.q join_nullsafe.q subquery_in.q ppd_multi_insert.q {noformat} We need to re-enable these tests after HIVE-9025 is resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8911) Enable mapjoin hints [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-8911: -- Assignee: Chao Enable mapjoin hints [Spark Branch] --- Key: HIVE-8911 URL: https://issues.apache.org/jira/browse/HIVE-8911 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Chao Currently the big table selection in a mapjoin is based on stats. We should also enable the big-table selection based on hints. See class MapJoinProcessor. This is a logical-optimizer class, so we should be able to re-use this without too many changes to hook up with SparkMapJoinResolver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234870#comment-14234870 ] Chao commented on HIVE-9007: We can re-enable {{ppd_join4.q}} after this is resolved, although we can also enable it right now since it uses a reduce-side join. Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] -- Key: HIVE-9007 URL: https://issues.apache.org/jira/browse/HIVE-9007 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, which may cause map join in the spark branch to generate a wrong plan. Currently, the map join conversion in the spark branch first goes through a method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, removes the RS associated with the big table, and keeps the RSs for all small tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs to a small table or the big table). The issue arises when IdentityProjectRemover comes into play, which may result in a situation where an operator tree has two consecutive RSs. Imagine the following example:
{noformat}
   Join                  MapJoin
  /    \                /       \
RS      RS    --->    RS         RS
       /  \                     /  \
     TS    RS                 TS    TS (big table)
            \          (small table)
             TS
{noformat}
In this case, all parents of the mapjoin op will be RS, even the branch for the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
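A sketch of the check the description says is missing: only the parent RSs feeding small tables should become hash table sinks (HTS), while the big-table branch keeps its RS. The operator labels and the big-table position argument are assumptions for illustration, not the actual SparkReduceSinkMapJoinProc code:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmallTableOnlyDemo {
    // Skip the big-table branch when turning parent RSs into HTSs.
    static void replaceSmallTableParents(List<String> parents, int bigTablePos) {
        for (int pos = 0; pos < parents.size(); pos++) {
            if (pos == bigTablePos) {
                continue; // big table streams through; its RS must survive
            }
            if ("RS".equals(parents.get(pos))) {
                parents.set(pos, "HTS"); // small table is built into a hash table
            }
        }
    }

    public static void main(String[] args) {
        List<String> parents = new ArrayList<>(Arrays.asList("RS", "RS"));
        replaceSmallTableParents(parents, 0);
        System.out.println(parents); // [RS, HTS]: big-table branch untouched
    }
}
{code}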
[jira] [Commented] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234874#comment-14234874 ] Szehon Ho commented on HIVE-9007: - I'm OK with enabling it anytime. This JIRA will have to add another version of this test that uses mapjoin. Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] -- Key: HIVE-9007 URL: https://issues.apache.org/jira/browse/HIVE-9007 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, which may cause map join in the spark branch to generate a wrong plan. Currently, the map join conversion in the spark branch first goes through a method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, removes the RS associated with the big table, and keeps the RSs for all small tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs to a small table or the big table). The issue arises when IdentityProjectRemover comes into play, which may result in a situation where an operator tree has two consecutive RSs. Imagine the following example:
{noformat}
   Join                  MapJoin
  /    \                /       \
RS      RS    --->    RS         RS
       /  \                     /  \
     TS    RS                 TS    TS (big table)
            \          (small table)
             TS
{noformat}
In this case, all parents of the mapjoin op will be RS, even the branch for the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9007) Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234875#comment-14234875 ] Chao commented on HIVE-9007: OK, I'll create another JIRA to enable it now. Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch] -- Key: HIVE-9007 URL: https://issues.apache.org/jira/browse/HIVE-9007 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho HIVE-8435 introduces a new logical optimizer called IdentityProjectRemover, which may cause map join in the spark branch to generate a wrong plan. Currently, the map join conversion in the spark branch first goes through a method {{convertJoinMapJoin}}, which replaces a join op with a mapjoin op, removes the RS associated with the big table, and keeps the RSs for all small tables. Afterwards, in {{SparkReduceSinkMapJoinProc}} it replaces all parent RSs of the mapjoin op with HTS (note it doesn't check whether the RS belongs to a small table or the big table). The issue arises when IdentityProjectRemover comes into play, which may result in a situation where an operator tree has two consecutive RSs. Imagine the following example:
{noformat}
   Join                  MapJoin
  /    \                /       \
RS      RS    --->    RS         RS
       /  \                     /  \
     TS    RS                 TS    TS (big table)
            \          (small table)
             TS
{noformat}
In this case, all parents of the mapjoin op will be RS, even the branch for the big table! In {{SparkReduceSinkMapJoinProc}}, they will be replaced with HTS, which is obviously incorrect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8131) Support timestamp in Avro
[ https://issues.apache.org/jira/browse/HIVE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-8131: -- Assignee: Ferdinand Xu Support timestamp in Avro - Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9027) Enable ppd_join4 [Spark Branch]
Chao created HIVE-9027: -- Summary: Enable ppd_join4 [Spark Branch] Key: HIVE-9027 URL: https://issues.apache.org/jira/browse/HIVE-9027 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Priority: Trivial We disabled {{ppd_join4}} in HIVE-8970 after seeing an issue when running it with map join. However, since this test uses a reduce-side join, we should have no problem enabling it. The issue with map join is tracked by HIVE-9007, and we will create a separate test for the map join case in that JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7817) distinct/group by don't work on partition columns
[ https://issues.apache.org/jira/browse/HIVE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234880#comment-14234880 ] Pengcheng Xiong commented on HIVE-7817: --- By the way, I do not think HIVE-3108 was ever solved. distinct/group by don't work on partition columns - Key: HIVE-7817 URL: https://issues.apache.org/jira/browse/HIVE-7817 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Eugene Koifman Suppose you have a table like this: {code:sql} CREATE TABLE page_view( viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User') COMMENT 'This is the page view table' PARTITIONED BY(dt STRING, country STRING) CLUSTERED BY(userid) INTO 4 BUCKETS {code} Then {code:sql} select distinct dt from page_view; select distinct dt, country from page_view; select dt, country from page_view group by dt, country; {code} all fail with {noformat} Query ID = ekoifman_20140820172626_b03ba819-c111-433f-a3fc-453c7d5a3e86 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapreduce.job.reduces=number Job running in-process (local Hadoop) Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2014-08-20 17:26:13,018 Stage-1 map = 0%, reduce = 0% Ended Job = job_local165359429_0013 with errors Error during job, obtaining debugging information... FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec {noformat} but {code:sql} select dt, country, count(*) from page_view group by dt, country; {code} works fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 28699: HIVE-8783 Create some tests that use Spark counter for stats collection [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28699/ --- (Updated Dec. 5, 2014, 2:05 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8783 https://issues.apache.org/jira/browse/HIVE-8783 Repository: hive-git Description --- Hive already has stats_counter.q and stats_counter_partitioned.q for unit tests of table statistics collection via counters. stats_counter.q is already enabled; I enable stats_counter_partitioned.q in this patch. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 09c667e ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 30b7632 ql/src/test/results/clientpositive/spark/stats_counter_partitioned.q.out PRE-CREATION Diff: https://reviews.apache.org/r/28699/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-2573) Create per-session function registry
[ https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234923#comment-14234923 ] Jason Dere commented on HIVE-2573: -- Is this one still being worked on? I think it's basically ready, with the exception of 2 questions/comments from the RB of patch v13: - HiveParser.g has an error message that should be changed from the drop function statement to the reload function statement - SessionConf.java: What about the idea of moving the static call to resolveFunctions() to SessionState? I thought that would remove the need for SessionConf, because then the Hive class would once again be usable during query runtime. Unless you think it's cleaner to use SessionConf to get HiveConf rather than the Hive object. Create per-session function registry - Key: HIVE-2573 URL: https://issues.apache.org/jira/browse/HIVE-2573 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, HIVE-2573.1.patch.txt, HIVE-2573.10.patch.txt, HIVE-2573.11.patch.txt, HIVE-2573.12.patch.txt, HIVE-2573.13.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, HIVE-2573.4.patch.txt, HIVE-2573.5.patch, HIVE-2573.6.patch, HIVE-2573.7.patch, HIVE-2573.8.patch.txt, HIVE-2573.9.patch.txt Currently the function registry is a shared resource and could be overridden by other users when using HiveServer. If a per-session function registry is provided, this situation could be prevented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
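A minimal sketch of the per-session idea in the description, assuming a session-local registry that falls back to the shared one; all names are illustrative, not Hive's actual FunctionRegistry API:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionRegistryDemo {
    // A single registry visible to every session, as HiveServer has today.
    static final Map<String, String> SHARED = new ConcurrentHashMap<>();

    // One registry per session; lookups fall back to the shared one, so one
    // user's re-registration no longer clobbers another user's functions.
    static class SessionRegistry {
        private final Map<String, String> local = new ConcurrentHashMap<>();

        void register(String name, String udfClass) { local.put(name, udfClass); }

        String lookup(String name) {
            String udf = local.get(name);
            return (udf != null) ? udf : SHARED.get(name);
        }
    }

    public static void main(String[] args) {
        SHARED.put("upper", "GenericUDFUpper");
        SessionRegistry a = new SessionRegistry();
        SessionRegistry b = new SessionRegistry();
        a.register("upper", "MyCustomUpper");  // session-local override
        System.out.println(a.lookup("upper")); // MyCustomUpper
        System.out.println(b.lookup("upper")); // GenericUDFUpper, unaffected
    }
}
{code}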
[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8783: Attachment: HIVE-8783.2-spark.patch Create some tests that use Spark counter for stats collection [Spark Branch] Key: HIVE-8783 URL: https://issues.apache.org/jira/browse/HIVE-8783 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Attachments: HIVE-8783.1-spark.patch, HIVE-8783.2-spark.patch Currently when .q tests are run with Spark, the default stats collection is fs. We need to have some tests that use Spark counter for stats collection to enhance coverage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9016) SparkCounter display name is not set correctly [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9016: Attachment: HIVE-9016.2-spark.patch Thanks, [~xuefuz]. SparkCounter display name is not set correctly [Spark Branch] Key: HIVE-9016 URL: https://issues.apache.org/jira/browse/HIVE-9016 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-9016.1-spark.patch, HIVE-9016.1-spark.patch, HIVE-9016.2-spark.patch SparkCounter's displayName is set to the SparkCounterGroup's displayName; we should not do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
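A minimal sketch of the distinction at issue, assuming each counter should carry its own display name rather than reusing its group's; all class and field names here are illustrative, not the actual SparkCounter code:
{code}
public class CounterNameDemo {
    static class SparkCounterGroup {
        final String displayName;
        SparkCounterGroup(String displayName) { this.displayName = displayName; }
    }

    static class SparkCounter {
        final String name;
        final String displayName;
        SparkCounter(String name, String displayName) {
            this.name = name;
            this.displayName = displayName; // the counter's own label
        }
    }

    public static void main(String[] args) {
        SparkCounterGroup group = new SparkCounterGroup("HIVE");
        // Bug shape: constructing with group.displayName would make every
        // counter in the group print as "HIVE" instead of its own label.
        SparkCounter counter = new SparkCounter("CREATED_FILES", "Created files");
        System.out.println(group.displayName + " / " + counter.displayName);
    }
}
{code}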