Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Sushanth Sowmyan
+1

On Jun 30, 2017 17:05, "Owen O'Malley"  wrote:

> On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
>
> > and maybe a different project name?
> >
>
> Yes, it certainly needs a new name. I'd like to suggest Riven.
>
> .. Owen
>


[jira] [Created] (HIVE-17009) ifnull() compatibility with explain or use of constants

2017-06-30 Thread Timothy Miron (JIRA)
Timothy Miron created HIVE-17009:


 Summary: ifnull() compatibility with explain or use of constants
 Key: HIVE-17009
 URL: https://issues.apache.org/jira/browse/HIVE-17009
 Project: Hive
  Issue Type: Bug
  Components: Hive, SQL
Affects Versions: 1.2.1
 Environment: Running hive queries from Toad Data Point 4.2 via ODBC 
connection.
Reporter: Timothy Miron


Error "Invalid function 'nullif'" thrown if nullif() used in conjunction with 
certain other commands, but equivalently behaving  CASE WHEN ... END block 
behaves fine.

Example:
(note the use of the `dual` table)

This throws an "Invalid function 'nullif'" error:
{code:sql}
select
 coalesce(nullif('a','a'),'b') result_value 
from
workspace_t886880.dual
where current_date > date_sub(current_date,1); 
{code}

As well as simply attaching 'EXPLAIN' like this:
{code:sql}
EXPLAIN select
 coalesce(nullif('a','a'),'b') result_value 
from
workspace_t886880.dual
{code}

This similarly behaving CASE..WHEN block does not throw an error:
{code:sql}
select
 coalesce(case when 'a' = 'a' then null else 'a' end,'b') result_value
from
workspace_t886880.dual
where current_date > date_sub(current_date,1);
{code}

Similarly, omitting the WHERE clause entirely restores normal behavior, as does 
removing any operations from the WHERE clause:
{code:sql}
/* this works */
select
 coalesce(nullif('a','a'),'b') result_value 
from
workspace_t886880.dual;
/* no where clause! */

/* and this works too */
select
 coalesce(nullif('a','a'),'b') result_value 
from
workspace_t886880.dual
where DATE '2016-01-02' > DATE '2016-01-01'
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17008) HiveMetastore.drop_database can return NPE if database does not exist

2017-06-30 Thread Dan Burkert (JIRA)
Dan Burkert created HIVE-17008:
--

 Summary: HiveMetastore.drop_database can return NPE if database 
does not exist
 Key: HIVE-17008
 URL: https://issues.apache.org/jira/browse/HIVE-17008
 Project: Hive
  Issue Type: Bug
  Components: HBase Metastore
Reporter: Dan Burkert


When dropping a non-existent database, the HMS will still fire registered 
{{DROP_DATABASE}} event listeners. This results in an NPE when the listeners 
attempt to dereference the {{null}} database parameter.
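A guard before the listener dispatch would turn the NPE into the expected not-found error. Below is a minimal, self-contained sketch of that ordering; the names here (HMSHandlerSketch, DropDatabaseListener, etc.) are simplified stand-ins for the real HMS types, not the actual Hive API, and the real exception is a checked Thrift type rather than the unchecked one used for brevity:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these simplified types stand in for the real
// HMS classes; they are not the actual Hive metastore API.
class Database {
    final String name;
    Database(String name) { this.name = name; }
}

// The real NoSuchObjectException is a checked Thrift type; unchecked here.
class NoSuchObjectException extends RuntimeException {
    NoSuchObjectException(String msg) { super(msg); }
}

interface DropDatabaseListener {
    // Real listeners dereference the Database argument, which is where a
    // null database causes the NPE described above.
    void onDropDatabase(Database db);
}

class HMSHandlerSketch {
    private final Map<String, Database> store = new HashMap<>();
    private final List<DropDatabaseListener> listeners = new ArrayList<>();

    void addListener(DropDatabaseListener l) { listeners.add(l); }
    void createDatabase(String name) { store.put(name, new Database(name)); }

    void dropDatabase(String name) {
        Database db = store.remove(name);
        if (db == null) {
            // Fail fast before notifying listeners, so they never see null.
            throw new NoSuchObjectException("database " + name + " does not exist");
        }
        for (DropDatabaseListener l : listeners) {
            l.onDropDatabase(db);
        }
    }
}
```

The point is only the ordering: resolve the database and fail before any listener runs, so listeners can rely on a non-null argument.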





Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Owen O'Malley
On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:

> and maybe a different project name?
>

Yes, it certainly needs a new name. I'd like to suggest Riven.

.. Owen


[jira] [Created] (HIVE-17007) NPE introduced by HIVE-16871

2017-06-30 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-17007:
-

 Summary: NPE introduced by HIVE-16871
 Key: HIVE-17007
 URL: https://issues.apache.org/jira/browse/HIVE-17007
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Daniel Dai
Assignee: Daniel Dai


Stack:
{code}
2017-06-30T02:39:43,739 ERROR [HiveServer2-Background-Pool: Thread-2873]: 
metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(200)) - 
MetaException(message:java.lang.NullPointerException)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6066)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3993)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3944)
at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy32.alter_table_with_environment_context(Unknown 
Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:397)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:325)
at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy33.alter_table_with_environmentContext(Unknown 
Source)
at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2306)
at com.sun.proxy.$Proxy33.alter_table_with_environmentContext(Unknown 
Source)
at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:624)
at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3490)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:383)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1905)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1607)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1354)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1123)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:242)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:334)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:348)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.metastore.cache.SharedCache.getCachedTableColStats(SharedCache.java:140)
at 
org.apache.hadoop.hive.metastore.cache.CachedStore.getTableColumnStatistics(CachedStore.java:1409)
at sun.reflect.GeneratedMethodAccessor165.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
at 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Jimmy Xiang
Yeah, this is a good idea. +1

On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
> HMS has become the shared catalog service for multiple projects outside
> Hive,
> so +1 on this move (and maybe a different project name?).
>
> On Fri, Jun 30, 2017 at 2:10 PM, Owen O'Malley 
> wrote:
>
>> I'm +1 on separating out the metastore. It recognizes the reality that a
>> lot of different projects use the Hive Metastore and opening up the
>> community is a great move.
>>
>> ..Owen
>>
>> On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang  wrote:
>>
>> > +1, sounds like a good idea!
>> >
>> > On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:
>> >
>> > > Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
>> > > This is a great opportunity for building a Metastore to not only
>> address
>> > > schemas for the data at rest but also for the data in motion. We have a
>> > > SchemaRegistry (http://github.com/hortonworks/registry)  project that
>> > > allows users to register schemas for data in motion and integrates with
>> > > Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
>> > > us with opportunity to integrate our apis with Hive Metastore and
>> > > provide with one project that is truly a single metastore that can hold
>> > > all schemas.
>> > >
>> > > Thanks,
>> > > Harsha
>> > >
>> > > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
>> > > > Great, thanks Alan for putting all this in the email.
>> > > > +1
>> > > >
>> > > > Allowing other components to continue to use the Metastore without
>> the
>> > > > need
>> > > > to use Hive dependencies is a big plus for them. I agree with
>> > everything
>> > > > you mention on the email.
>> > > >
>> > > > - Sergio
>> > > >
>> > > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde 
>> wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > As a Calcite PMC member, I am very pleased to see this change.
>> > Calcite
>> > > > > reads metadata from a variety of sources (including JDBC databases,
>> > > NoSQL
>> > > > > databases such as Cassandra and Druid, and streaming systems), and
>> if
>> > > more
>> > > > > of those sources choose to store their metadata in the metastore it
>> > > will
>> > > > > make our lives easier.
>> > > > >
>> > > > > Hive’s metastore has established a position as the place to go for
>> > > > > metadata in the Hadoop ecosystem. Not all metadata is relational,
>> or
>> > > > > processed by Hive, so there are other parties using the metastore
>> who
>> > > > > justifiably would like to influence its direction. Opening up the
>> > > metastore
>> > > > > will help retain and extend this position.
>> > > > >
>> > > > > Julian
>> > > > >
>> > > > >
>> > > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
>> > > > > >
>> > > > > >
>> > > > > > On 2017-06-30 07:56 (-0700), Alan Gates 
>> wrote: >
>> > > > > > > A few of us have been talking and come to the conclusion that
>> it
>> > > > > would be>
>> > > > > > > a good thing to split out the Hive metastore into its own
>> Apache
>> > > > > project.>
>> > > > > > > Below and in the linked wiki page we explain what we see as the
>> > > > > advantages>
>> > > > > > > to this and how we would go about it.>
>> > > > > > > >
>> > > > > > > Hive’s metastore has long been used by other projects in the
>> > > Hadoop>
>> > > > > > > ecosystem to store and access metadata.  Apache Impala, Apache
>> > > Spark,>
>> > > > > > > Apache Drill, Presto, and other systems all use Hive’s
>> metastore.
>> > > > > Some,>
>> > > > > > > like Impala and Presto can use it as their own metadata system
>> > with
>> > > > > the>
>> > > > > > > rest of Hive not present.>
>> > > > > > > >
>> > > > > > > This sharing is excellent for the ecosystem.  Together with
>> HDFS
>> > it
>> > > > > allows>
>> > > > > > > users to use the tool of their choice while still accessing the
>> > > same
>> > > > > shared>
>> > > > > > > data.  But having this shared metadata inside the Hive project
>> > > limits
>> > > > > the>
>> > > > > > > ability of other projects to contribute to the metastore.  It
>> > also
>> > > > > makes it>
>> > > > > > > harder for new systems that have similar but not identical
>> > > metadata>
>> > > > > > > requirements (for example, stream processing systems on top of
>> > > Apache>
>> > > > > > > Kafka) to use Hive’s metastore.  This difficulty for other
>> > systems
>> > > > > comes>
>> > > > > > > out in two ways.  One, it is hard for non-Hive community
>> members
>> > > to>
>> > > > > > > participate in the project.  Second, it adds operational cost
>> > since
>> > > > > users>
>> > > > > > > are forced to deploy all of the Hive jars just to get the
>> > > metastore to
>> > > > > work.>
>> > > > > > > >
>> > > > > > > Therefore we propose to split Hive’s metastore out into a
>> > separate
>> > > > > Apache>
>> > > > > > > project.  This new project will 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Chao Sun
HMS has become the shared catalog service for multiple projects outside
Hive,
so +1 on this move (and maybe a different project name?).

On Fri, Jun 30, 2017 at 2:10 PM, Owen O'Malley 
wrote:

> I'm +1 on separating out the metastore. It recognizes the reality that a
> lot of different projects use the Hive Metastore and opening up the
> community is a great move.
>
> ..Owen
>
> On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang  wrote:
>
> > +1, sounds like a good idea!
> >
> > On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:
> >
> > > Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
> > > This is a great opportunity for building a Metastore to not only
> address
> > > schemas for the data at rest but also for the data in motion. We have a
> > > SchemaRegistry (http://github.com/hortonworks/registry)  project that
> > > allows users to register schemas for data in motion and integrates with
> > > Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
> > > us with opportunity to integrate our apis with Hive Metastore and
> > > provide with one project that is truly a single metastore that can hold
> > > all schemas.
> > >
> > > Thanks,
> > > Harsha
> > >
> > > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> > > > Great, thanks Alan for putting all this in the email.
> > > > +1
> > > >
> > > > Allowing other components to continue to use the Metastore without
> the
> > > > need
> > > > to use Hive dependencies is a big plus for them. I agree with
> > everything
> > > > you mention on the email.
> > > >
> > > > - Sergio
> > > >
> > > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde 
> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > As a Calcite PMC member, I am very pleased to see this change.
> > Calcite
> > > > > reads metadata from a variety of sources (including JDBC databases,
> > > NoSQL
> > > > > databases such as Cassandra and Druid, and streaming systems), and
> if
> > > more
> > > > > of those sources choose to store their metadata in the metastore it
> > > will
> > > > > make our lives easier.
> > > > >
> > > > > Hive’s metastore has established a position as the place to go for
> > > > > metadata in the Hadoop ecosystem. Not all metadata is relational,
> or
> > > > > processed by Hive, so there are other parties using the metastore
> who
> > > > > justifiably would like to influence its direction. Opening up the
> > > metastore
> > > > > will help retain and extend this position.
> > > > >
> > > > > Julian
> > > > >
> > > > >
> > > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > > > > >
> > > > > >
> > > > > > On 2017-06-30 07:56 (-0700), Alan Gates 
> wrote: >
> > > > > > > A few of us have been talking and come to the conclusion that
> it
> > > > > would be>
> > > > > > > a good thing to split out the Hive metastore into its own
> Apache
> > > > > project.>
> > > > > > > Below and in the linked wiki page we explain what we see as the
> > > > > advantages>
> > > > > > > to this and how we would go about it.>
> > > > > > > >
> > > > > > > Hive’s metastore has long been used by other projects in the
> > > Hadoop>
> > > > > > > ecosystem to store and access metadata.  Apache Impala, Apache
> > > Spark,>
> > > > > > > Apache Drill, Presto, and other systems all use Hive’s
> metastore.
> > > > > Some,>
> > > > > > > like Impala and Presto can use it as their own metadata system
> > with
> > > > > the>
> > > > > > > rest of Hive not present.>
> > > > > > > >
> > > > > > > This sharing is excellent for the ecosystem.  Together with
> HDFS
> > it
> > > > > allows>
> > > > > > > users to use the tool of their choice while still accessing the
> > > same
> > > > > shared>
> > > > > > > data.  But having this shared metadata inside the Hive project
> > > limits
> > > > > the>
> > > > > > > ability of other projects to contribute to the metastore.  It
> > also
> > > > > makes it>
> > > > > > > harder for new systems that have similar but not identical
> > > metadata>
> > > > > > > requirements (for example, stream processing systems on top of
> > > Apache>
> > > > > > > Kafka) to use Hive’s metastore.  This difficulty for other
> > systems
> > > > > comes>
> > > > > > > out in two ways.  One, it is hard for non-Hive community
> members
> > > to>
> > > > > > > participate in the project.  Second, it adds operational cost
> > since
> > > > > users>
> > > > > > > are forced to deploy all of the Hive jars just to get the
> > > metastore to
> > > > > work.>
> > > > > > > >
> > > > > > > Therefore we propose to split Hive’s metastore out into a
> > separate
> > > > > Apache>
> > > > > > > project.  This new project will continue to support the same
> > Thrift
> > > > > API as>
> > > > > > > the current metastore.  It will continue to focus on being a
> > high>
> > > > > > > performance, fault tolerant, large scale, operational metastore
> > for
> > > > > SQL>
> 

[jira] [Created] (HIVE-17006) LLAP: Parquet caching

2017-06-30 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-17006:
---

 Summary: LLAP: Parquet caching
 Key: HIVE-17006
 URL: https://issues.apache.org/jira/browse/HIVE-17006
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


There are multiple options to do Parquet caching in LLAP:
1) Full elevator (too intrusive for now).
2) Page based cache like ORC (requires some changes to Parquet or copy-pasted).
3) Cache disk data on column chunk level as is.

Given that Parquet reads at column chunk granularity, (2) is not as useful as it is 
for ORC, but it is still a good idea. I messaged the dev list about it but didn't get 
a response; we may follow up later.

For now, do (3). 
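Option (3) amounts to a byte-level cache keyed by the chunk's location in the file. A hypothetical sketch of that idea, with invented names (ChunkKey, ColumnChunkCache) that do not correspond to actual LLAP cache classes:

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration of option (3): cache raw column-chunk bytes as
// read from disk, keyed by (file, offset, length). These names are invented
// for the sketch and are not actual LLAP classes.
final class ChunkKey {
    final String filePath;
    final long offset;
    final long length;

    ChunkKey(String filePath, long offset, long length) {
        this.filePath = filePath;
        this.offset = offset;
        this.length = length;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey k = (ChunkKey) o;
        return offset == k.offset && length == k.length && filePath.equals(k.filePath);
    }

    @Override public int hashCode() { return Objects.hash(filePath, offset, length); }
}

class ColumnChunkCache {
    private final ConcurrentHashMap<ChunkKey, byte[]> cache = new ConcurrentHashMap<>();

    // Return cached bytes for a chunk, invoking the disk reader only on a miss.
    byte[] getOrLoad(ChunkKey key, Function<ChunkKey, byte[]> diskReader) {
        return cache.computeIfAbsent(key, diskReader);
    }
}
```

Because the bytes are cached "as is", no Parquet-side changes are needed, which is what makes (3) the low-effort starting point relative to (2).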







Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Owen O'Malley
I'm +1 on separating out the metastore. It recognizes the reality that a
lot of different projects use the Hive Metastore and opening up the
community is a great move.

..Owen

On Fri, Jun 30, 2017 at 1:30 PM, Xuefu Zhang  wrote:

> +1, sounds like a good idea!
>
> On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:
>
> > Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
> > This is a great opportunity for building a Metastore to not only address
> > schemas for the data at rest but also for the data in motion. We have a
> > SchemaRegistry (http://github.com/hortonworks/registry)  project that
> > allows users to register schemas for data in motion and integrates with
> > Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
> > us with opportunity to integrate our apis with Hive Metastore and
> > provide with one project that is truly a single metastore that can hold
> > all schemas.
> >
> > Thanks,
> > Harsha
> >
> > On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> > > Great, thanks Alan for putting all this in the email.
> > > +1
> > >
> > > Allowing other components to continue to use the Metastore without the
> > > need
> > > to use Hive dependencies is a big plus for them. I agree with
> everything
> > > you mention on the email.
> > >
> > > - Sergio
> > >
> > > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:
> > >
> > > > +1
> > > >
> > > > As a Calcite PMC member, I am very pleased to see this change.
> Calcite
> > > > reads metadata from a variety of sources (including JDBC databases,
> > NoSQL
> > > > databases such as Cassandra and Druid, and streaming systems), and if
> > more
> > > > of those sources choose to store their metadata in the metastore it
> > will
> > > > make our lives easier.
> > > >
> > > > Hive’s metastore has established a position as the place to go for
> > > > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > > > processed by Hive, so there are other parties using the metastore who
> > > > justifiably would like to influence its direction. Opening up the
> > metastore
> > > > will help retain and extend this position.
> > > >
> > > > Julian
> > > >
> > > >
> > > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > > > >
> > > > >
> > > > > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > > > > A few of us have been talking and come to the conclussion that it
> > > > would be>
> > > > > > a good thing to split out the Hive metastore into its own Apache
> > > > project.>
> > > > > > Below and in the linked wiki page we explain what we see as the
> > > > advantages>
> > > > > > to this and how we would go about it.>
> > > > > > >
> > > > > > Hive’s metastore has long been used by other projects in the
> > Hadoop>
> > > > > > ecosystem to store and access metadata.  Apache Impala, Apache
> > Spark,>
> > > > > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > > > Some,>
> > > > > > like Impala and Presto can use it as their own metadata system
> with
> > > > the>
> > > > > > rest of Hive not present.>
> > > > > > >
> > > > > > This sharing is excellent for the ecosystem.  Together with HDFS
> it
> > > > allows>
> > > > > > users to use the tool of their choice while still accessing the
> > same
> > > > shared>
> > > > > > data.  But having this shared metadata inside the Hive project
> > limits
> > > > the>
> > > > > > ability of other projects to contribute to the metastore.  It
> also
> > > > makes it>
> > > > > > harder for new systems that have similar but not identical
> > metadata>
> > > > > > requirements (for example, stream processing systems on top of
> > Apache>
> > > > > > Kafka) to use Hive’s metastore.  This difficulty for other
> systems
> > > > comes>
> > > > > > out in two ways.  One, it is hard for non-Hive community members
> > to>
> > > > > > participate in the project.  Second, it adds operational cost
> since
> > > > users>
> > > > > > are forced to deploy all of the Hive jars just to get the
> > metastore to
> > > > work.>
> > > > > > >
> > > > > > Therefore we propose to split Hive’s metastore out into a
> separate
> > > > Apache>
> > > > > > project.  This new project will continue to support the same
> Thrift
> > > > API as>
> > > > > > the current metastore.  It will continue to focus on being a
> high>
> > > > > > performance, fault tolerant, large scale, operational metastore
> for
> > > > SQL>
> > > > > > engines and other systems that want to store schema information
> > about
> > > > their>
> > > > > > data.>
> > > > > > >
> > > > > > By making it a separate project we will enable other projects to
> > join
> > > > us in>
> > > > > > innovating on the metastore.  It will simplify operations for
> > non-Hive>
> > > > > > users that want to use the metastore as they will no longer need
> to
> > > > install>
> > > > > > Hive just to get the metastore.  And it will 

[jira] [Created] (HIVE-17005) Ensure REPL DUMP and REPL LOAD are authorized properly

2017-06-30 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-17005:
---

 Summary: Ensure REPL DUMP and REPL LOAD are authorized properly
 Key: HIVE-17005
 URL: https://issues.apache.org/jira/browse/HIVE-17005
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Currently, we piggyback REPL DUMP and REPL LOAD on EXPORT and IMPORT auth 
privileges. However, work is underway to stop populating all the relevant objects in 
inputObjs and outputObjs, which then requires that REPL DUMP and REPL LOAD be 
authorized at a higher level and simply require ADMIN_PRIV to run.
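The coarse-grained check described above could look roughly like the following; all names here (Privilege, ReplAuthorizer, AuthorizationException) are invented for illustration and do not match Hive's actual authorization interfaces:

```java
import java.util.Set;

// Hypothetical sketch of gating a whole REPL command on the admin privilege
// instead of authorizing every object in inputObjs/outputObjs. Names are
// invented; this is not the real Hive authorization API.
enum Privilege { ADMIN_PRIV, SELECT, INSERT }

class AuthorizationException extends RuntimeException {
    AuthorizationException(String msg) { super(msg); }
}

class ReplAuthorizer {
    // One check at command level replaces per-object authorization, which is
    // exactly why it keeps working when inputObjs/outputObjs are not populated.
    static void checkReplCommand(Set<Privilege> userPrivs, String command) {
        if (!userPrivs.contains(Privilege.ADMIN_PRIV)) {
            throw new AuthorizationException(command + " requires ADMIN_PRIV");
        }
    }
}
```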





[jira] [Created] (HIVE-17004) Calculating Number Of Reducers Looks At All Files

2017-06-30 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-17004:
--

 Summary: Calculating Number Of Reducers Looks At All Files
 Key: HIVE-17004
 URL: https://issues.apache.org/jira/browse/HIVE-17004
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 2.1.1
Reporter: BELUGA BEHR


When calculating the number of Mappers and Reducers, the two algorithms look at 
different data sets.  The number of Mappers is calculated based on the number of 
splits, while the number of Reducers is based on the number of files within the 
HDFS directory.  What you see is that if I add files to a sub-directory of the HDFS 
directory, the number of splits remains the same, since I did not tell Hive to search 
recursively, but the number of Reducers increases.  Please improve this so that 
Reducers look at the same files that are considered for splits and not at files within 
sub-directories (unless configured to do so).
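For reference, the 11 reducers in the log output below are consistent with a simple size-based estimate: 6 files x 122,607,137 bytes is about 735 MB, which divided by a 64 MB hive.exec.reducers.bytes.per.reducer (an assumed setting here; check your distribution's default) rounds up to 11. A simplified sketch of that calculation, not the actual planner code:

```java
// Simplified sketch of Hive's size-based reducer estimate: roughly
// ceil(totalInputBytes / hive.exec.reducers.bytes.per.reducer), clamped to
// [1, hive.exec.reducers.max]. The real planner has additional inputs.
class ReducerEstimate {
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        int reducers = (int) Math.ceil((double) totalInputBytes / bytesPerReducer);
        return Math.max(1, Math.min(reducers, maxReducers));
    }
}
```

Because totalInputBytes sums every file under the table location (including sub-directories), while split generation here did not recurse, the two counts diverge exactly as described above.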

{code}
CREATE EXTERNAL TABLE Complaints (
  a string,
  b string,
  c string,
  d string,
  e string,
  f string,
  g string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/admin/complaints';
{code}
{code}
[root@host ~]# sudo -u hdfs hdfs dfs -ls -R /user/admin/complaints
-rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
/user/admin/complaints/Consumer_Complaints.1.csv
-rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
/user/admin/complaints/Consumer_Complaints.2.csv
-rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
/user/admin/complaints/Consumer_Complaints.3.csv
-rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
/user/admin/complaints/Consumer_Complaints.4.csv
-rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
/user/admin/complaints/Consumer_Complaints.5.csv
-rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
/user/admin/complaints/Consumer_Complaints.csv
{code}

{code}
INFO  : Compiling 
command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae): 
select a, count(1) from complaints group by a limit 10
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:a, 
type:string, comment:null), FieldSchema(name:_c1, type:bigint, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae); Time 
taken: 0.077 seconds
INFO  : Executing 
command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae): 
select a, count(1) from complaints group by a limit 10
INFO  : Query ID = hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 11
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=
INFO  : number of splits:2
INFO  : Submitting tokens for job: job_1493729203063_0003
INFO  : The url to track the job: 
http://host:8088/proxy/application_1493729203063_0003/
INFO  : Starting Job = job_1493729203063_0003, Tracking URL = 
http://host:8088/proxy/application_1493729203063_0003/
INFO  : Kill Command = 
/opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/lib/hadoop/bin/hadoop job  
-kill job_1493729203063_0003
INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of 
reducers: 11
INFO  : 2017-05-02 14:20:14,206 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-05-02 14:20:22,520 Stage-1 map = 100%,  reduce = 0%, Cumulative 
CPU 4.48 sec
INFO  : 2017-05-02 14:20:34,029 Stage-1 map = 100%,  reduce = 27%, Cumulative 
CPU 15.72 sec
INFO  : 2017-05-02 14:20:35,069 Stage-1 map = 100%,  reduce = 55%, Cumulative 
CPU 21.94 sec
INFO  : 2017-05-02 14:20:36,110 Stage-1 map = 100%,  reduce = 64%, Cumulative 
CPU 23.97 sec
INFO  : 2017-05-02 14:20:39,233 Stage-1 map = 100%,  reduce = 73%, Cumulative 
CPU 25.26 sec
INFO  : 2017-05-02 14:20:43,392 Stage-1 map = 100%,  reduce = 100%, Cumulative 
CPU 30.9 sec
INFO  : MapReduce Total cumulative CPU time: 30 seconds 900 msec
INFO  : Ended Job = job_1493729203063_0003
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 2  Reduce: 11   Cumulative CPU: 30.9 sec   HDFS 
Read: 735691149 HDFS Write: 153 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 30 seconds 900 msec
INFO  : Completed executing 
command(queryId=hive_20170502142020_dfcf77ef-56b7-4544-ab90-6e9726ea86ae); Time 
taken: 36.035 seconds
INFO  : OK
{code}

{code}
[root@host ~]# sudo -u hdfs hdfs dfs -ls -R /user/admin/complaints
-rwxr-xr-x   2 admin admin  122607137 2017-05-02 14:12 
/user/admin/complaints/Consumer_Complaints.1.csv
-rwxr-xr-x   2 admin admin  

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Xuefu Zhang
+1, sounds like a good idea!

On Fri, Jun 30, 2017 at 1:24 PM, Harsha  wrote:

> Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
> This is a great opportunity for building a Metastore to not only address
> schemas for the data at rest but also for the data in motion. We have a
> SchemaRegistry (http://github.com/hortonworks/registry)  project that
> allows users to register schemas for data in motion and integrates with
> Kafka, Kinesis, Event Hubs and other messaging queues. This will provide
> us with opportunity to integrate our apis with Hive Metastore and
> provide with one project that is truly a single metastore that can hold
> all schemas.
>
> Thanks,
> Harsha
>
> On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> > Great, thanks Alan for putting all this in the email.
> > +1
> >
> > Allowing other components to continue to use the Metastore without the
> > need
> > to use Hive dependencies is a big plus for them. I agree with everything
> > you mention on the email.
> >
> > - Sergio
> >
> > On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:
> >
> > > +1
> > >
> > > As a Calcite PMC member, I am very pleased to see this change. Calcite
> > > reads metadata from a variety of sources (including JDBC databases,
> NoSQL
> > > databases such as Cassandra and Druid, and streaming systems), and if
> more
> > > of those sources choose to store their metadata in the metastore it
> will
> > > make our lives easier.
> > >
> > > Hive’s metastore has established a position as the place to go for
> > > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > > processed by Hive, so there are other parties using the metastore who
> > > justifiably would like to influence its direction. Opening up the
> metastore
> > > will help retain and extend this position.
> > >
> > > Julian
> > >
> > >
> > > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > > >
> > > >
> > > > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > > > A few of us have been talking and come to the conclusion that it
> > > would be>
> > > > > a good thing to split out the Hive metastore into its own Apache
> > > project.>
> > > > > Below and in the linked wiki page we explain what we see as the
> > > advantages>
> > > > > to this and how we would go about it.>
> > > > > >
> > > > > Hive’s metastore has long been used by other projects in the
> Hadoop>
> > > > > ecosystem to store and access metadata.  Apache Impala, Apache
> Spark,>
> > > > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > > Some,>
> > > > > like Impala and Presto can use it as their own metadata system with
> > > the>
> > > > > rest of Hive not present.>
> > > > > >
> > > > > This sharing is excellent for the ecosystem.  Together with HDFS it
> > > allows>
> > > > > users to use the tool of their choice while still accessing the
> same
> > > shared>
> > > > > data.  But having this shared metadata inside the Hive project
> limits
> > > the>
> > > > > ability of other projects to contribute to the metastore.  It also
> > > makes it>
> > > > > harder for new systems that have similar but not identical
> metadata>
> > > > > requirements (for example, stream processing systems on top of
> Apache>
> > > > > Kafka) to use Hive’s metastore.  This difficulty for other systems
> > > comes>
> > > > > out in two ways.  One, it is hard for non-Hive community members
> to>
> > > > > participate in the project.  Second, it adds operational cost since
> > > users>
> > > > > are forced to deploy all of the Hive jars just to get the
> metastore to
> > > work.>
> > > > > >
> > > > > Therefore we propose to split Hive’s metastore out into a separate
> > > Apache>
> > > > > project.  This new project will continue to support the same Thrift
> > > API as>
> > > > > the current metastore.  It will continue to focus on being a high>
> > > > > performance, fault tolerant, large scale, operational metastore for
> > > SQL>
> > > > > engines and other systems that want to store schema information
> about
> > > their>
> > > > > data.>
> > > > > >
> > > > > By making it a separate project we will enable other projects to
> join
> > > us in>
> > > > > innovating on the metastore.  It will simplify operations for
> non-Hive>
> > > > > users that want to use the metastore as they will no longer need to
> > > install>
> > > > > Hive just to get the metastore.  And it will attract new projects
> that>
> > > > > might otherwise feel the need to solve their metadata problems on
> > > their own.>
> > > > > >
> > > > > Any Hive PMC member or committer will be welcome to join the new
> > > project at>
> > > > > the same level.  We propose this project go straight to a top
> level>
> > > > > project.  Given that the initial PMC will be formed from
> experienced
> > > Hive>
> > > > > PMC members we do not believe incubation will be necessary.  (Note
> > > that the>
> > > > > 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Harsha
Thanks for the proposal Alan. I am +1 on separating the Hive Metastore.
This is a great opportunity to build a Metastore that addresses schemas not
only for data at rest but also for data in motion. We have a
SchemaRegistry (http://github.com/hortonworks/registry) project that
allows users to register schemas for data in motion and integrates with
Kafka, Kinesis, Event Hubs, and other messaging queues. This will give
us the opportunity to integrate our APIs with the Hive Metastore and
provide one project that is truly a single metastore that can hold
all schemas.

Thanks,
Harsha

On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> Great, thanks Alan for putting all this in the email.
> +1
> 
> Allowing other components to continue to use the Metastore without the
> need
> to use Hive dependencies is a big plus for them. I agree with everything
> you mention on the email.
> 
> - Sergio
> 
> On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:
> 
> > +1
> >
> > As a Calcite PMC member, I am very pleased to see this change. Calcite
> > reads metadata from a variety of sources (including JDBC databases, NoSQL
> > databases such as Cassandra and Druid, and streaming systems), and if more
> > of those sources choose to store their metadata in the metastore it will
> > make our lives easier.
> >
> > Hive’s metastore has established a position as the place to go for
> > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > processed by Hive, so there are other parties using the metastore who
> > justifiably would like to influence its direction. Opening up the metastore
> > will help retain and extend this position.
> >
> > Julian
> >
> >
> > On 2017-06-30 10:00 (-0700), "Dimitris" <ts...@apache.org> wrote:
> > >
> > >
> > > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > > A few of us have been talking and come to the conclusion that it
> > would be>
> > > > a good thing to split out the Hive metastore into its own Apache
> > project.>
> > > > Below and in the linked wiki page we explain what we see as the
> > advantages>
> > > > to this and how we would go about it.>
> > > > >
> > > > Hive’s metastore has long been used by other projects in the Hadoop>
> > > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,>
> > > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > Some,>
> > > > like Impala and Presto can use it as their own metadata system with
> > the>
> > > > rest of Hive not present.>
> > > > >
> > > > This sharing is excellent for the ecosystem.  Together with HDFS it
> > allows>
> > > > users to use the tool of their choice while still accessing the same
> > shared>
> > > > data.  But having this shared metadata inside the Hive project limits
> > the>
> > > > ability of other projects to contribute to the metastore.  It also
> > makes it>
> > > > harder for new systems that have similar but not identical metadata>
> > > > requirements (for example, stream processing systems on top of Apache>
> > > > Kafka) to use Hive’s metastore.  This difficulty for other systems
> > comes>
> > > > out in two ways.  One, it is hard for non-Hive community members to>
> > > > participate in the project.  Second, it adds operational cost since
> > users>
> > > > are forced to deploy all of the Hive jars just to get the metastore to
> > work.>
> > > > >
> > > > Therefore we propose to split Hive’s metastore out into a separate
> > Apache>
> > > > project.  This new project will continue to support the same Thrift
> > API as>
> > > > the current metastore.  It will continue to focus on being a high>
> > > > performance, fault tolerant, large scale, operational metastore for
> > SQL>
> > > > engines and other systems that want to store schema information about
> > their>
> > > > data.>
> > > > >
> > > > By making it a separate project we will enable other projects to join
> > us in>
> > > > innovating on the metastore.  It will simplify operations for non-Hive>
> > > > users that want to use the metastore as they will no longer need to
> > install>
> > > > Hive just to get the metastore.  And it will attract new projects that>
> > > > might otherwise feel the need to solve their metadata problems on
> > their own.>
> > > > >
> > > > Any Hive PMC member or committer will be welcome to join the new
> > project at>
> > > > the same level.  We propose this project go straight to a top level>
> > > > project.  Given that the initial PMC will be formed from experienced
> > Hive>
> > > > PMC members we do not believe incubation will be necessary.  (Note
> > that the>
> > > > Apache board will need to approve this.)>
> > > > >
> > > > Obviously there are many details involved in a proposal like this.
> > Rather>
> > > > than make this a ten page email we have filled out many of the details
> > in a>
> > > > wiki page:>
> > > > https://cwiki.apache.org/confluence/display/Hive/
> > Metastore+TLP+Proposal>
> > > > >
> > 

Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Sergio Pena
Great, thanks Alan for putting all this in the email.
+1

Allowing other components to continue to use the Metastore without the need
to use Hive dependencies is a big plus for them. I agree with everything
you mention in the email.

- Sergio

On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde  wrote:

> +1
>
> As a Calcite PMC member, I am very pleased to see this change. Calcite
> reads metadata from a variety of sources (including JDBC databases, NoSQL
> databases such as Cassandra and Druid, and streaming systems), and if more
> of those sources choose to store their metadata in the metastore it will
> make our lives easier.
>
> Hive’s metastore has established a position as the place to go for
> metadata in the Hadoop ecosystem. Not all metadata is relational, or
> processed by Hive, so there are other parties using the metastore who
> justifiably would like to influence its direction. Opening up the metastore
> will help retain and extend this position.
>
> Julian
>
>
> > On 2017-06-30 10:00 (-0700), "Dimitris" <ts...@apache.org> wrote:
> >
> >
> > On 2017-06-30 07:56 (-0700), Alan Gates  wrote: >
> > > A few of us have been talking and come to the conclusion that it
> would be>
> > > a good thing to split out the Hive metastore into its own Apache
> project.>
> > > Below and in the linked wiki page we explain what we see as the
> advantages>
> > > to this and how we would go about it.>
> > > >
> > > Hive’s metastore has long been used by other projects in the Hadoop>
> > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,>
> > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> Some,>
> > > like Impala and Presto can use it as their own metadata system with
> the>
> > > rest of Hive not present.>
> > > >
> > > This sharing is excellent for the ecosystem.  Together with HDFS it
> allows>
> > > users to use the tool of their choice while still accessing the same
> shared>
> > > data.  But having this shared metadata inside the Hive project limits
> the>
> > > ability of other projects to contribute to the metastore.  It also
> makes it>
> > > harder for new systems that have similar but not identical metadata>
> > > requirements (for example, stream processing systems on top of Apache>
> > > Kafka) to use Hive’s metastore.  This difficulty for other systems
> comes>
> > > out in two ways.  One, it is hard for non-Hive community members to>
> > > participate in the project.  Second, it adds operational cost since
> users>
> > > are forced to deploy all of the Hive jars just to get the metastore to
> work.>
> > > >
> > > Therefore we propose to split Hive’s metastore out into a separate
> Apache>
> > > project.  This new project will continue to support the same Thrift
> API as>
> > > the current metastore.  It will continue to focus on being a high>
> > > performance, fault tolerant, large scale, operational metastore for
> SQL>
> > > engines and other systems that want to store schema information about
> their>
> > > data.>
> > > >
> > > By making it a separate project we will enable other projects to join
> us in>
> > > innovating on the metastore.  It will simplify operations for non-Hive>
> > > users that want to use the metastore as they will no longer need to
> install>
> > > Hive just to get the metastore.  And it will attract new projects that>
> > > might otherwise feel the need to solve their metadata problems on
> their own.>
> > > >
> > > Any Hive PMC member or committer will be welcome to join the new
> project at>
> > > the same level.  We propose this project go straight to a top level>
> > > project.  Given that the initial PMC will be formed from experienced
> Hive>
> > > PMC members we do not believe incubation will be necessary.  (Note
> that the>
> > > Apache board will need to approve this.)>
> > > >
> > > Obviously there are many details involved in a proposal like this.
> Rather>
> > > than make this a ten page email we have filled out many of the details
> in a>
> > > wiki page:>
> > > https://cwiki.apache.org/confluence/display/Hive/
> Metastore+TLP+Proposal>
> > > >
> > > Yongzhi Chen>
> > > Vihang Karajgaonkar>
> > > Sergio Pena>
> > > Sahil Takiar>
> > > Aihua Xu>
> > > Gunther Hagleitner>
> > > Thejas Nair>
> > > Alan Gates>
> > > >
> >
> > +1 (from Apache Impala's (incubating) perspective)>
> >
> > Dimitris>
> >
>


Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Julian Hyde
+1

As a Calcite PMC member, I am very pleased to see this change. Calcite reads 
metadata from a variety of sources (including JDBC databases, NoSQL databases 
such as Cassandra and Druid, and streaming systems), and if more of those 
sources choose to store their metadata in the metastore it will make our lives 
easier.

Hive’s metastore has established a position as the place to go for metadata in 
the Hadoop ecosystem. Not all metadata is relational, or processed by Hive, so 
there are other parties using the metastore who justifiably would like to 
influence its direction. Opening up the metastore will help retain and extend 
this position.

Julian


On 2017-06-30 10:00 (-0700), "Dimitris" <ts...@apache.org> wrote: 
> 
> 
> On 2017-06-30 07:56 (-0700), Alan Gates  wrote: > 
> > A few of us have been talking and come to the conclusion that it would be> 
> > a good thing to split out the Hive metastore into its own Apache project.> 
> > Below and in the linked wiki page we explain what we see as the advantages> 
> > to this and how we would go about it.> 
> > > 
> > Hive’s metastore has long been used by other projects in the Hadoop> 
> > ecosystem to store and access metadata.  Apache Impala, Apache Spark,> 
> > Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,> 
> > like Impala and Presto can use it as their own metadata system with the> 
> > rest of Hive not present.> 
> > > 
> > This sharing is excellent for the ecosystem.  Together with HDFS it allows> 
> > users to use the tool of their choice while still accessing the same 
> > shared> 
> > data.  But having this shared metadata inside the Hive project limits the> 
> > ability of other projects to contribute to the metastore.  It also makes 
> > it> 
> > harder for new systems that have similar but not identical metadata> 
> > requirements (for example, stream processing systems on top of Apache> 
> > Kafka) to use Hive’s metastore.  This difficulty for other systems comes> 
> > out in two ways.  One, it is hard for non-Hive community members to> 
> > participate in the project.  Second, it adds operational cost since users> 
> > are forced to deploy all of the Hive jars just to get the metastore to 
> > work.> 
> > > 
> > Therefore we propose to split Hive’s metastore out into a separate Apache> 
> > project.  This new project will continue to support the same Thrift API as> 
> > the current metastore.  It will continue to focus on being a high> 
> > performance, fault tolerant, large scale, operational metastore for SQL> 
> > engines and other systems that want to store schema information about 
> > their> 
> > data.> 
> > > 
> > By making it a separate project we will enable other projects to join us 
> > in> 
> > innovating on the metastore.  It will simplify operations for non-Hive> 
> > users that want to use the metastore as they will no longer need to 
> > install> 
> > Hive just to get the metastore.  And it will attract new projects that> 
> > might otherwise feel the need to solve their metadata problems on their 
> > own.> 
> > > 
> > Any Hive PMC member or committer will be welcome to join the new project 
> > at> 
> > the same level.  We propose this project go straight to a top level> 
> > project.  Given that the initial PMC will be formed from experienced Hive> 
> > PMC members we do not believe incubation will be necessary.  (Note that 
> > the> 
> > Apache board will need to approve this.)> 
> > > 
> > Obviously there are many details involved in a proposal like this.  Rather> 
> > than make this a ten page email we have filled out many of the details in 
> > a> 
> > wiki page:> 
> > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal> 
> > > 
> > Yongzhi Chen> 
> > Vihang Karajgaonkar> 
> > Sergio Pena> 
> > Sahil Takiar> 
> > Aihua Xu> 
> > Gunther Hagleitner> 
> > Thejas Nair> 
> > Alan Gates> 
> > > 
> 
> +1 (from Apache Impala's (incubating) perspective)> 
> 
> Dimitris> 
> 

Re: Review Request 60259: HIVE-16926 LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-30 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60259/
---

(Updated June 30, 2017, 6:07 p.m.)


Review request for hive, Sergey Shelukhin and Siddharth Seth.


Changes
---

Fix class not found errors occurring in some tests due to GenericUDTFGetSplits 
referencing Tez token classes.


Bugs: HIVE-16926
https://issues.apache.org/jira/browse/HIVE-16926


Repository: hive-git


Description
---

Initial patch, restructured the LlapTaskUmbilicalExternalClient code a bit.
- Uses shared LLAP umbilical server rather than a new server per external client
- Retries rejected submissions (WorkSubmitter helper class)
- No more deferred cleanup (from HIVE-16652). One thing about this is that once
a client is closed/unregistered, communicator.stop() is called and the client is
removed from the registered list of clients. So we might get a few warning
messages about untracked taskAttemptIds coming in during heartbeat(). If this is
undesirable, we might be able to leave them in the registeredClients list (but
ignore heartbeats to them as they are tagged as closed), and remove them using 
the HeartbeatCheckTask once they get too old.
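The shared-server pattern described above can be sketched generically — note that the class and method names below are illustrative stand-ins, not the actual LLAP APIs:

```python
class SharedUmbilicalServer:
    """A single server shared by all external clients, instead of one
    server per fragment request (illustrative sketch only, not the
    actual LlapTaskUmbilicalServer API)."""

    def __init__(self):
        self.registered_clients = {}  # taskAttemptId -> client object

    def register(self, task_attempt_id, client):
        self.registered_clients[task_attempt_id] = client

    def unregister(self, task_attempt_id):
        # With no deferred cleanup, the client is dropped immediately,
        # so later heartbeats for this id arrive "untracked".
        self.registered_clients.pop(task_attempt_id, None)

    def heartbeat(self, task_attempt_id):
        if task_attempt_id not in self.registered_clients:
            return "WARN: untracked taskAttemptId %s" % task_attempt_id
        return "ok"
```

This mirrors the trade-off in the description: immediate cleanup keeps the registry small, at the cost of warnings for heartbeats that race with unregistration.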


Diffs (updated)
-

  llap-client/src/java/org/apache/hadoop/hive/llap/LlapBaseRecordReader.java 
7fff147 
  llap-client/src/java/org/apache/hadoop/hive/llap/SubmitWorkInfo.java 95b0ffc 
  
llap-client/src/java/org/apache/hadoop/hive/llap/ext/LlapTaskUmbilicalExternalClient.java
 406bdda 
  
llap-client/src/java/org/apache/hadoop/hive/llap/tezplugins/helpers/LlapTaskUmbilicalServer.java
 403381d 
  llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java 
eb93241 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 
d4ec44e 


Diff: https://reviews.apache.org/r/60259/diff/3/

Changes: https://reviews.apache.org/r/60259/diff/2-3/


Testing
---


Thanks,

Jason Dere



Re: [DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Dimitris Tsirogiannis


On 2017-06-30 07:56 (-0700), Alan Gates  wrote: 
> A few of us have been talking and come to the conclusion that it would be
> a good thing to split out the Hive metastore into its own Apache project.
> Below and in the linked wiki page we explain what we see as the advantages
> to this and how we would go about it.
> 
> Hive’s metastore has long been used by other projects in the Hadoop
> ecosystem to store and access metadata.  Apache Impala, Apache Spark,
> Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
> like Impala and Presto can use it as their own metadata system with the
> rest of Hive not present.
> 
> This sharing is excellent for the ecosystem.  Together with HDFS it allows
> users to use the tool of their choice while still accessing the same shared
> data.  But having this shared metadata inside the Hive project limits the
> ability of other projects to contribute to the metastore.  It also makes it
> harder for new systems that have similar but not identical metadata
> requirements (for example, stream processing systems on top of Apache
> Kafka) to use Hive’s metastore.  This difficulty for other systems comes
> out in two ways.  One, it is hard for non-Hive community members to
> participate in the project.  Second, it adds operational cost since users
> are forced to deploy all of the Hive jars just to get the metastore to work.
> 
> Therefore we propose to split Hive’s metastore out into a separate Apache
> project.  This new project will continue to support the same Thrift API as
> the current metastore.  It will continue to focus on being a high
> performance, fault tolerant, large scale, operational metastore for SQL
> engines and other systems that want to store schema information about their
> data.
> 
> By making it a separate project we will enable other projects to join us in
> innovating on the metastore.  It will simplify operations for non-Hive
> users that want to use the metastore as they will no longer need to install
> Hive just to get the metastore.  And it will attract new projects that
> might otherwise feel the need to solve their metadata problems on their own.
> 
> Any Hive PMC member or committer will be welcome to join the new project at
> the same level.  We propose this project go straight to a top level
> project.  Given that the initial PMC will be formed from experienced Hive
> PMC members we do not believe incubation will be necessary.  (Note that the
> Apache board will need to approve this.)
> 
> Obviously there are many details involved in a proposal like this.  Rather
> than make this a ten page email we have filled out many of the details in a
> wiki page:
> https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> 
> Yongzhi Chen
> Vihang Karajgaonkar
> Sergio Pena
> Sahil Takiar
> Aihua Xu
> Gunther Hagleitner
> Thejas Nair
> Alan Gates
> 

+1 (from Apache Impala's (incubating) perspective)

Dimitris


[jira] [Created] (HIVE-17003) WebHCat Hive jobs lost progress info since HIVE_CLI_SERVICE_PROTOCOL_V10

2017-06-30 Thread JIRA
Pau Tallada Crespí created HIVE-17003:
-

 Summary: WebHCat Hive jobs lost progress info since 
HIVE_CLI_SERVICE_PROTOCOL_V10
 Key: HIVE-17003
 URL: https://issues.apache.org/jira/browse/HIVE-17003
 Project: Hive
  Issue Type: Bug
  Components: Thrift API, WebHCat
Affects Versions: 1.2.1
 Environment: HDP 2.6.0 on Centos7
Hive 1.2.1.2.6 and probably some version of Hive 2.x (I think 2.1.x)
Reporter: Pau Tallada Crespí
Priority: Minor


Hi,

We rely on the WebHCat REST interface to retrieve Hive job progress for
long-running queries, like this:

GET /templeton/v1/jobs/job_1498806400392_0014?user.name=xxx HTTP/1.1

Before the update (using HDP 2.5 with an older Hive version), we used the
progress info in the "percentComplete" JSON response field.

After the update, this field is always "null". I suppose this may be due to changes in 
the progress reporting API in https://issues.apache.org/jira/browse/HIVE-15473

See also: 
https://github.com/apache/hive/commit/3e01ef3268ffbcb69c5c18c2c9f8810512c91bf8#diff-bd8c8f77e81bc08903e14233b3be1687
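The regression can be shown with a tiny sketch. The response bodies below are hypothetical; only the "percentComplete" field name comes from the actual WebHCat response described above:

```python
import json

# Hypothetical WebHCat job-status bodies; only "percentComplete" is the
# real field name from the report above, the rest is illustrative.
response_before_upgrade = '{"status": {"state": "RUNNING"}, "percentComplete": "map 100% reduce 50%"}'
response_after_upgrade = '{"status": {"state": "RUNNING"}, "percentComplete": null}'

def progress(body):
    # What a polling client would show to the user.
    return json.loads(body).get("percentComplete")

print(progress(response_before_upgrade))  # map 100% reduce 50%
print(progress(response_after_upgrade))   # None
```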

Thank you very much for your work!

Let me know if I can provide more info or help in any way :)

Pau.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Review Request 60445: HIVE-16935: Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-30 Thread Peter Vary

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60445/#review179380
---


Ship it!




Ship It!

- Peter Vary


On June 30, 2017, 4:18 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60445/
> ---
> 
> (Updated June 30, 2017, 4:18 p.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Bugs: HIVE-16935
> https://issues.apache.org/jira/browse/HIVE-16935
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> We strip sql comments from a command string. The stripped command is used to 
> determine which
> CommandProcessor will execute the command. If the CommandProcessorFactory 
> does not select a special
> CommandProcessor then we execute the original unstripped command so that the 
> sql parser can remove comments.
> Move BeeLine's comment stripping code to HiveStringUtils and change BeeLine 
> to call it from there
> Add a better test with separate tokens for "set role" in 
> TestCommandProcessorFactory.
> Add a test case for comment removal in set_processor_namespaces.q using an 
> indented comment as
> unindented comments are removed by the test driver.
> 
> Change-Id: I166dc1e7588ec9802ba373d88e69e716aecd33c2
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/Commands.java 
> 3b2d72ed79771e6198e62c47060a7f80665dbcb2 
>   beeline/src/test/org/apache/hive/beeline/TestCommands.java 
> 04c939a04c7a56768286743c2bb9c9797507e3aa 
>   cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 
> 27fd66d35ea89b0de0d17763625fbf564584fcca 
>   common/src/java/org/apache/hive/common/util/HiveStringUtils.java 
> 4a6413a7c376ffb4de6d20d24707ac5bf89ebc0c 
>   common/src/test/org/apache/hive/common/util/TestHiveStringUtils.java 
> 6bd7037152c6f809daec8af42708693c05fe00cf 
>   
> ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
>  21bdcf44436a02b11f878fa439e916d4b55ac63d 
>   ql/src/test/queries/clientpositive/set_processor_namespaces.q 
> 612807f0c871b1881446d088e1c2c399d1afe970 
>   ql/src/test/results/clientpositive/set_processor_namespaces.q.out 
> c05ce4d61d00a9ee6671d97f2fd178f18d44cc8c 
>   
> service/src/java/org/apache/hive/service/cli/operation/ExecuteStatementOperation.java
>  2dd90b69b3bf789b1a3928129cf801b17884033f 
> 
> 
> Diff: https://reviews.apache.org/r/60445/diff/3/
> 
> 
> Testing
> ---
> 
> Added new test case.
> Hand tested with Hue and Jdbc.
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



Re: Review Request 60445: HIVE-16935: Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-30 Thread Andrew Sherman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60445/
---

(Updated June 30, 2017, 4:18 p.m.)


Review request for hive and Sahil Takiar.


Bugs: HIVE-16935
https://issues.apache.org/jira/browse/HIVE-16935


Repository: hive-git


Description (updated)
---

We strip SQL comments from a command string. The stripped command is used to
determine which CommandProcessor will execute the command. If the
CommandProcessorFactory does not select a special CommandProcessor, then we
execute the original unstripped command so that the SQL parser can remove
comments.
Move BeeLine's comment-stripping code to HiveStringUtils and change BeeLine to
call it from there.
Add a better test with separate tokens for "set role" in 
TestCommandProcessorFactory.
Add a test case for comment removal in set_processor_namespaces.q using an 
indented comment as
unindented comments are removed by the test driver.

Change-Id: I166dc1e7588ec9802ba373d88e69e716aecd33c2
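The selection logic described above can be illustrated with a simplified sketch of "--" comment stripping outside quoted strings. This is illustrative only; the real code lives in HiveStringUtils and handles more cases:

```python
def strip_sql_comments(command: str) -> str:
    """Remove "--" line comments that appear outside quoted strings.

    A simplified sketch of the idea; the actual implementation is in
    Hive's HiveStringUtils and handles more edge cases.
    """
    out = []
    in_single = in_double = False
    i = 0
    while i < len(command):
        ch = command[i]
        if ch == "'" and not in_double:
            in_single = not in_single
        elif ch == '"' and not in_single:
            in_double = not in_double
        elif ch == '-' and not in_single and not in_double and command[i:i + 2] == '--':
            # Drop everything up to (but not including) the end of the line.
            nl = command.find('\n', i)
            i = len(command) if nl == -1 else nl
            continue
        out.append(ch)
        i += 1
    return ''.join(out)

# The stripped text is what the CommandProcessorFactory would inspect;
# a "--" inside a string literal must survive.
print(strip_sql_comments("set role admin -- comment"))
print(strip_sql_comments("select '--literal'"))
```

The point of HIVE-16935 is that this stripping happens only for choosing the processor; the original unstripped command is still what gets executed.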


Diffs (updated)
-

  beeline/src/java/org/apache/hive/beeline/Commands.java 
3b2d72ed79771e6198e62c47060a7f80665dbcb2 
  beeline/src/test/org/apache/hive/beeline/TestCommands.java 
04c939a04c7a56768286743c2bb9c9797507e3aa 
  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 
27fd66d35ea89b0de0d17763625fbf564584fcca 
  common/src/java/org/apache/hive/common/util/HiveStringUtils.java 
4a6413a7c376ffb4de6d20d24707ac5bf89ebc0c 
  common/src/test/org/apache/hive/common/util/TestHiveStringUtils.java 
6bd7037152c6f809daec8af42708693c05fe00cf 
  
ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
 21bdcf44436a02b11f878fa439e916d4b55ac63d 
  ql/src/test/queries/clientpositive/set_processor_namespaces.q 
612807f0c871b1881446d088e1c2c399d1afe970 
  ql/src/test/results/clientpositive/set_processor_namespaces.q.out 
c05ce4d61d00a9ee6671d97f2fd178f18d44cc8c 
  
service/src/java/org/apache/hive/service/cli/operation/ExecuteStatementOperation.java
 2dd90b69b3bf789b1a3928129cf801b17884033f 


Diff: https://reviews.apache.org/r/60445/diff/3/

Changes: https://reviews.apache.org/r/60445/diff/2-3/


Testing
---

Added new test case.
Hand tested with Hue and Jdbc.


Thanks,

Andrew Sherman



Re: Review Request 60445: HIVE-16935: Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-30 Thread Peter Vary

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60445/#review179369
---



Thanks for the fast update! LGTM, just one little nit :)

Peter


common/src/java/org/apache/hive/common/util/HiveStringUtils.java
Lines 1101-1103 (original), 1101-1103 (patched)


nit: I think your new javadoc is good, so we can remove this. What do you 
think?


- Peter Vary


On June 30, 2017, 12:30 a.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60445/
> ---
> 
> (Updated June 30, 2017, 12:30 a.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Bugs: HIVE-16935
> https://issues.apache.org/jira/browse/HIVE-16935
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> We strip sql comments from a command string. The stripped command is used to 
> determine which
> CommandProcessor will execute the command. If the CommandProcessorFactory 
> does not select a special
> CommandProcessor then we execute the original unstripped command so that the 
> sql parser can remove comments.
> Move BeeLine's comment stripping code to HiveStringUtils and change BeeLine 
> to call it from there
> Add a better test with separate tokens for "set role" in 
> TestCommandProcessorFactory.
> Add a test case for comment removal in set_processor_namespaces.q  using an 
> indented comment as
> unindented comments are removed by the test driver.
> 
> Change-Id: I166dc1e7588ec9802ba373d88e69e716aecd33c2
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/Commands.java 
> 3b2d72ed79771e6198e62c47060a7f80665dbcb2 
>   beeline/src/test/org/apache/hive/beeline/TestCommands.java 
> 04c939a04c7a56768286743c2bb9c9797507e3aa 
>   cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 
> 27fd66d35ea89b0de0d17763625fbf564584fcca 
>   common/src/java/org/apache/hive/common/util/HiveStringUtils.java 
> 4a6413a7c376ffb4de6d20d24707ac5bf89ebc0c 
>   common/src/test/org/apache/hive/common/util/TestHiveStringUtils.java 
> 6bd7037152c6f809daec8af42708693c05fe00cf 
>   
> ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
>  21bdcf44436a02b11f878fa439e916d4b55ac63d 
>   ql/src/test/queries/clientpositive/set_processor_namespaces.q 
> 612807f0c871b1881446d088e1c2c399d1afe970 
>   ql/src/test/results/clientpositive/set_processor_namespaces.q.out 
> c05ce4d61d00a9ee6671d97f2fd178f18d44cc8c 
>   
> service/src/java/org/apache/hive/service/cli/operation/ExecuteStatementOperation.java
>  2dd90b69b3bf789b1a3928129cf801b17884033f 
> 
> 
> Diff: https://reviews.apache.org/r/60445/diff/2/
> 
> 
> Testing
> ---
> 
> Added new test case.
> Hand tested with Hue and Jdbc.
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



[DISCUSS] Separating out the metastore as its own TLP

2017-06-30 Thread Alan Gates
A few of us have been talking and have come to the conclusion that it would be
a good thing to split out the Hive metastore into its own Apache project.
Below and in the linked wiki page we explain what we see as the advantages
to this and how we would go about it.

Hive’s metastore has long been used by other projects in the Hadoop
ecosystem to store and access metadata.  Apache Impala, Apache Spark,
Apache Drill, Presto, and other systems all use Hive’s metastore.  Some,
like Impala and Presto, can use it as their own metadata system without the
rest of Hive present.

This sharing is excellent for the ecosystem.  Together with HDFS it allows
users to use the tool of their choice while still accessing the same shared
data.  But having this shared metadata inside the Hive project limits the
ability of other projects to contribute to the metastore.  It also makes it
harder for new systems that have similar but not identical metadata
requirements (for example, stream processing systems on top of Apache
Kafka) to use Hive’s metastore.  This difficulty for other systems comes
out in two ways.  First, it is hard for non-Hive community members to
participate in the project.  Second, it adds operational cost since users
are forced to deploy all of the Hive jars just to get the metastore to work.

Therefore we propose to split Hive’s metastore out into a separate Apache
project.  This new project will continue to support the same Thrift API as
the current metastore.  It will continue to focus on being a high
performance, fault tolerant, large scale, operational metastore for SQL
engines and other systems that want to store schema information about their
data.

By making it a separate project we will enable other projects to join us in
innovating on the metastore.  It will simplify operations for non-Hive
users that want to use the metastore as they will no longer need to install
Hive just to get the metastore.  And it will attract new projects that
might otherwise feel the need to solve their metadata problems on their own.

Any Hive PMC member or committer will be welcome to join the new project at
the same level.  We propose this project go straight to a top level
project.  Given that the initial PMC will be formed from experienced Hive
PMC members we do not believe incubation will be necessary.  (Note that the
Apache board will need to approve this.)

Obviously there are many details involved in a proposal like this.  Rather
than make this a ten page email we have filled out many of the details in a
wiki page:
https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal

Yongzhi Chen
Vihang Karajgaonkar
Sergio Pena
Sahil Takiar
Aihua Xu
Gunther Hagleitner
Thejas Nair
Alan Gates


[jira] [Created] (HIVE-17002) decimal (binary) is not working when creating external table for hbase

2017-06-30 Thread Artur Tamazian (JIRA)
Artur Tamazian created HIVE-17002:
-

 Summary: decimal (binary) is not working when creating external 
table for hbase
 Key: HIVE-17002
 URL: https://issues.apache.org/jira/browse/HIVE-17002
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.1.1
 Environment: HBase 1.2.0, Hive 2.1.1
Reporter: Artur Tamazian


I have a table in HBase which has a column stored using
Bytes.toBytes((BigDecimal) value).  HBase version is 1.2.0.

I'm creating an external table in Hive to access it like this:

create external table `Users`(key int, ..., `example_column` decimal) 
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
with serdeproperties ("hbase.columns.mapping" = ":key, db:example_column") 
tblproperties("hbase.table.name" = 
"Users","hbase.table.default.storage.type" = "binary");

Table is created without errors. After that I try running "select * from 
users;" and see this error:
{noformat}
org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.lang.RuntimeException: java.lang.RuntimeException: Hive Internal Error: no LazyObject for org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyHiveDecimalObjectInspector@1f18cebb:25:24
   org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:484
   org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:308
   org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:847
   sun.reflect.GeneratedMethodAccessor11:invoke::-1
   sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43
   java.lang.reflect.Method:invoke:Method.java:498
   org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78
   org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36
   org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63
   java.security.AccessController:doPrivileged:AccessController.java:-2
   javax.security.auth.Subject:doAs:Subject.java:422
   org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1698
   org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59
   com.sun.proxy.$Proxy33:fetchResults::-1
   org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:504
   org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:698
   org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717
   org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702
   org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39
   org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39
   org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56
   org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286
   java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142
   java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617
   java.lang.Thread:run:Thread.java:748
   *java.io.IOException:java.lang.RuntimeException: java.lang.RuntimeException: Hive Internal Error: no LazyObject for org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyHiveDecimalObjectInspector@1f18cebb:27:2
   org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:164
   org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2098
   org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:479
   *java.lang.RuntimeException:java.lang.RuntimeException: Hive Internal Error: no LazyObject for org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyHiveDecimalObjectInspector@1f18cebb:43:16
   org.apache.hadoop.hive.serde2.lazy.LazyStruct:initLazyFields:LazyStruct.java:172
   org.apache.hadoop.hive.hbase.LazyHBaseRow:initFields:LazyHBaseRow.java:122
   org.apache.hadoop.hive.hbase.LazyHBaseRow:getField:LazyHBaseRow.java:116
   org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector:getStructFieldData:LazySimpleStructObjectInspector.java:128
   org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator:_evaluate:ExprNodeColumnEvaluator.java:94
   org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator:evaluate:ExprNodeEvaluator.java:77
   org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject:get:ExprNodeGenericFuncEvaluator.java:87
   org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual:evaluate:GenericUDFOPEqual.java:103
   org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator:_evaluate:ExprNodeGenericFuncEvaluator.java:186

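For reference, the Hive HBase storage handler also accepts a per-column storage type qualifier in the mapping string, which may be worth testing in isolation. This is only a sketch, not a confirmed fix: the table and column names come from the report above, and whether the explicit qualifier avoids the LazyObject error here is unverified.

```sql
-- Sketch: declare the decimal column's storage type inline with the
-- "#b" (binary) qualifier instead of relying on the table-wide default
-- set via hbase.table.default.storage.type.
create external table `Users_binary_test`(key int, `example_column` decimal)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,db:example_column#b")
tblproperties ("hbase.table.name" = "Users");
```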

[jira] [Created] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-06-30 Thread Barna Zsombor Klara (JIRA)
Barna Zsombor Klara created HIVE-17001:
--

 Summary: Insert overwrite table doesn't clean partition directory 
on HDFS if partition is missing from HMS
 Key: HIVE-17001
 URL: https://issues.apache.org/jira/browse/HIVE-17001
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Metastore
Reporter: Barna Zsombor Klara
Assignee: Barna Zsombor Klara


Insert overwrite table should clear existing data before creating the new data 
files.
For a partitioned table we will clean any folder of existing partitions on 
HDFS, however if the partition folder exists only on HDFS and the partition 
definition is missing in HMS, the folder is not cleared.
Reproduction steps:
1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
2. INSERT INTO test PARTITION(ds='p1') values ('a');
3. Copy the data to a different folder with different name.
4. ALTER TABLE test DROP PARTITION (ds='p1');
5. Recreate the partition directory, copy and rename the data file back
6. INSERT INTO test PARTITION(ds='p1') values ('b');
7. SELECT * from test;
will result in 2 records being returned instead of 1.
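The steps above can be condensed into a sketch like the following. The HDFS paths are illustrative assumptions only; the actual warehouse location depends on the installation.

```sql
-- Steps 1-2: create the partitioned table and one partition.
CREATE TABLE test (col1 string) PARTITIONED BY (ds string);
INSERT INTO test PARTITION (ds='p1') VALUES ('a');
-- Step 3 (outside Hive): copy the partition data aside, e.g.
--   hdfs dfs -cp /user/hive/warehouse/test/ds=p1 /tmp/ds=p1_backup
-- Step 4: drop the partition from HMS (this also removes its directory).
ALTER TABLE test DROP PARTITION (ds='p1');
-- Step 5 (outside Hive): restore the directory and data file, e.g.
--   hdfs dfs -cp /tmp/ds=p1_backup /user/hive/warehouse/test/ds=p1
-- Step 6: insert again; the manually restored file is not cleared.
INSERT INTO test PARTITION (ds='p1') VALUES ('b');
-- Step 7: two rows ('a' and 'b') come back instead of the expected one.
SELECT * FROM test;
```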




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Backporting HIVE-10790 to 1.0

2017-06-30 Thread Shrif Nada
Hi,

HIVE-10790 (
https://issues.apache.org/jira/secure/attachment/12739023/HIVE-10790.0.patch.txt)
fixed an issue where files could not be written on a federated cluster in
Hive 1.0 (and subsequent versions). Libraries that depend on Hive 1.0,
therefore, cannot write to a federated cluster because of this bug.  Corc,
the Cascading integration for the ORC file format (
https://github.com/HotelsDotCom/corc), is an example of such a library.

Would it be possible to backport HIVE-10790 to 1.0 in order to allow such
libraries to write to federated clusters?

Shrif