[jira] [Updated] (HIVE-3828) insert overwrite fails with stored-as-dir in cluster

2012-12-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3828:
---

Attachment: HIVE-3828.patch.1

> insert overwrite fails with stored-as-dir in cluster
> 
>
> Key: HIVE-3828
> URL: https://issues.apache.org/jira/browse/HIVE-3828
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
> Attachments: HIVE-3828.patch.1
>
>
> The following query works fine in the Hive TestCliDriver test suite but not
> in minimr, because a different Hadoop file system is used.
> The error is
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: .../_task_tmp.-ext-10002/key=103/_tmp.00_0 to: 
> .../_tmp.-ext-10002/key=103/00_0
> {code}
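The triggering query itself is not quoted here. "Stored-as-dir" refers to list-bucketed tables (SKEWED BY ... STORED AS DIRECTORIES); a minimal, hypothetical sketch of the kind of statement that exercises this path (table, partition, and source names are illustrative only; the skewed value 103 matches the key=103 directory in the error above):

{code}
-- Hypothetical list-bucketed (stored-as-directories) table.
CREATE TABLE list_bucket_test (key STRING, value STRING)
PARTITIONED BY (ds STRING)
SKEWED BY (key) ON ('103')
STORED AS DIRECTORIES;

-- The INSERT OVERWRITE writes per-skew-value subdirectories such as key=103;
-- it is the final rename of these nested temporary directories that fails
-- under the minimr file system.
INSERT OVERWRITE TABLE list_bucket_test PARTITION (ds = '1')
SELECT key, value FROM src;
{code}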

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3828) insert overwrite fails with stored-as-dir in cluster

2012-12-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3828:
---

Status: Patch Available  (was: In Progress)

Patch is available.

> insert overwrite fails with stored-as-dir in cluster
> 
>
> Key: HIVE-3828
> URL: https://issues.apache.org/jira/browse/HIVE-3828
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
> Attachments: HIVE-3828.patch.1
>
>
> The following query works fine in the Hive TestCliDriver test suite but not
> in minimr, because a different Hadoop file system is used.
> The error is
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: .../_task_tmp.-ext-10002/key=103/_tmp.00_0 to: 
> .../_tmp.-ext-10002/key=103/00_0
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3828) insert overwrite fails with stored-as-dir in cluster

2012-12-20 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537739#comment-13537739
 ] 

Gang Tim Liu commented on HIVE-3828:


https://reviews.facebook.net/D7581

> insert overwrite fails with stored-as-dir in cluster
> 
>
> Key: HIVE-3828
> URL: https://issues.apache.org/jira/browse/HIVE-3828
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>
> The following query works fine in the Hive TestCliDriver test suite but not
> in minimr, because a different Hadoop file system is used.
> The error is
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: .../_task_tmp.-ext-10002/key=103/_tmp.00_0 to: 
> .../_tmp.-ext-10002/key=103/00_0
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work started] (HIVE-3828) insert overwrite fails with stored-as-dir in cluster

2012-12-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3828 started by Gang Tim Liu.

> insert overwrite fails with stored-as-dir in cluster
> 
>
> Key: HIVE-3828
> URL: https://issues.apache.org/jira/browse/HIVE-3828
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>
> The following query works fine in the Hive TestCliDriver test suite but not
> in minimr, because a different Hadoop file system is used.
> The error is
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: .../_task_tmp.-ext-10002/key=103/_tmp.00_0 to: 
> .../_tmp.-ext-10002/key=103/00_0
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3828) insert overwrite fails with stored-as-dir in cluster

2012-12-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3828:
---

Description: 
The following query works fine in the Hive TestCliDriver test suite but not in
minimr, because a different Hadoop file system is used.

The error is
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
output from: .../_task_tmp.-ext-10002/key=103/_tmp.00_0 to: 
.../_tmp.-ext-10002/key=103/00_0
{code}

  was:
The following query works fine in the Hive TestCliDriver test suite but not in
minimr, because a different Hadoop file system is used.

The error is
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
output from: 
.../_task_tmp.-ext-10002/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/_tmp.00_0 to: 
.../_tmp.-ext-10002/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/00_0
{code}


> insert overwrite fails with stored-as-dir in cluster
> 
>
> Key: HIVE-3828
> URL: https://issues.apache.org/jira/browse/HIVE-3828
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>
> The following query works fine in the Hive TestCliDriver test suite but not
> in minimr, because a different Hadoop file system is used.
> The error is
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: .../_task_tmp.-ext-10002/key=103/_tmp.00_0 to: 
> .../_tmp.-ext-10002/key=103/00_0
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-3828) insert overwrite fails with stored-as-dir in cluster

2012-12-20 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3828:
--

Assignee: Gang Tim Liu

> insert overwrite fails with stored-as-dir in cluster
> 
>
> Key: HIVE-3828
> URL: https://issues.apache.org/jira/browse/HIVE-3828
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>
> The following query works fine in the Hive TestCliDriver test suite but not
> in minimr, because a different Hadoop file system is used.
> The error is
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
> output from: 
> .../_task_tmp.-ext-10002/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/_tmp.00_0 
> to: .../_tmp.-ext-10002/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/00_0
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3828) insert overwrite fails with stored-as-dir in cluster

2012-12-20 Thread Gang Tim Liu (JIRA)
Gang Tim Liu created HIVE-3828:
--

 Summary: insert overwrite fails with stored-as-dir in cluster
 Key: HIVE-3828
 URL: https://issues.apache.org/jira/browse/HIVE-3828
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Reporter: Gang Tim Liu


The following query works fine in the Hive TestCliDriver test suite but not in
minimr, because a different Hadoop file system is used.

The error is
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
output from: 
.../_task_tmp.-ext-10002/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/_tmp.00_0 to: 
.../_tmp.-ext-10002/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/00_0
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3827) LATERAL VIEW doesn't work with union all statement

2012-12-20 Thread cyril liao (JIRA)
cyril liao created HIVE-3827:


 Summary: LATERAL VIEW doesn't work with union all statement
 Key: HIVE-3827
 URL: https://issues.apache.org/jira/browse/HIVE-3827
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0
 Environment: hive0.9.0 hadoop 0.20.205
Reporter: cyril liao


LATERAL VIEW loses data when combined with UNION ALL.


query NO.1:
SELECT
1 as from_pid,
1 as to_pid,
cid as from_path,
(CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
0 as status
FROM
(SELECT union_map(c_map) AS c_map
FROM
(SELECT collect_map(id, parent_id) AS c_map
FROM
wl_channels
GROUP BY id, parent_id
) tmp
) tmp2
LATERAL VIEW recursion_concat(c_map) a AS cid, pid
This query returns about 1 row, and its status is 0.

query NO.2:
select
a.from_pid as from_pid,
a.to_pid as to_pid, 
a.from_path as from_path,
a.to_path as to_path,
a.status as status
from wl_dc_channels a
where a.status <> 0
This query returns about 100 rows, and their status is 1 or 2.

query NO.3:
select
from_pid,
to_pid,
from_path,
to_path,
status
from
(
SELECT
1 as from_pid,
1 as to_pid,
cid as from_path,
(CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
0 as status
FROM
(SELECT union_map(c_map) AS c_map
FROM
(SELECT collect_map(id, parent_id) AS c_map
FROM
wl_channels
GROUP BY id, parent_id
) tmp
) tmp2
LATERAL VIEW recursion_concat(c_map) a AS cid, pid
union all
select
a.from_pid as from_pid,
a.to_pid as to_pid, 
a.from_path as from_path,
a.to_path as to_path,
a.status as status
from wl_dc_channels a
where a.status <> 0
) unin_tbl
This query has the same result as query NO.2 - the rows produced by query NO.1's LATERAL VIEW branch are lost.
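collect_map, union_map, and recursion_concat above are custom UDFs; the shape of the report can be sketched with the built-in explode (tables and columns hypothetical):

{code}
-- Branch 1 uses LATERAL VIEW, branch 2 is a plain scan. Per the report, the
-- rows produced by the LATERAL VIEW branch go missing from the UNION ALL.
SELECT id, status
FROM (
  SELECT exp_id AS id, 0 AS status
  FROM some_table LATERAL VIEW explode(id_array) t AS exp_id
  UNION ALL
  SELECT id, status FROM other_table WHERE status <> 0
) union_tbl;
{code}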

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW

2012-12-20 Thread cyril liao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537714#comment-13537714
 ] 

cyril liao commented on HIVE-3104:
--

ok

> Predicate pushdown doesn't work with multi-insert statements using LATERAL 
> VIEW
> ---
>
> Key: HIVE-3104
> URL: https://issues.apache.org/jira/browse/HIVE-3104
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0
> Environment: Apache Hive 0.9.0, Apache Hadoop 0.20.205.0
>Reporter: Mark Grover
>
> Predicate pushdown seems to work for single-insert queries using LATERAL 
> VIEW. It also seems to work for multi-insert queries *not* using LATERAL 
> VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW.
> Here are some examples. The examples below make use of the fact that a
> query with no partition filtering fails when run under
> "hive.mapred.mode=strict".
> --Table creation and population
> DROP TABLE IF EXISTS partition_test;
> CREATE TABLE partition_test (col1 array<int>, col2 int) PARTITIONED BY (part_col int);
> INSERT OVERWRITE TABLE partition_test PARTITION (part_col=1) SELECT array(1,2),
> count(*) FROM partition_test;
> INSERT OVERWRITE TABLE partition_test PARTITION (part_col=2) SELECT array(2,4,6),
> count(*) FROM partition_test;
> -- Query 1
> -- This succeeds (using LATERAL VIEW with single insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> LATERAL VIEW explode(col1) tmp AS exp_col1
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT exp_col1
> WHERE (part_col=2);
> -- Query 2
> -- This succeeds (NOT using LATERAL VIEW with multi-insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT col1
> WHERE (part_col=2)
> INSERT OVERWRITE DIRECTORY '/test/2'
> SELECT col1
> WHERE (part_col=2);
> -- Query 3
> -- This fails (using LATERAL VIEW with multi-insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> LATERAL VIEW explode(col1) tmp AS exp_col1
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT exp_col1
> WHERE (part_col=2)
> INSERT OVERWRITE DIRECTORY '/test/2'
> SELECT exp_col1
> WHERE (part_col=2);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [DISCUSS] HCatalog becoming a subproject of Hive

2012-12-20 Thread Alan Gates
Namit,

I was not proposing that promotion to full committership would be automatic.  I 
assume it would still be done via a vote by the PMC.  I agree that we cannot 
_guarantee_ committership for HCat committers in 6-9 months.  But I am trying 
to lay out a clear path they can follow.  If they don't follow the path then 
they won't be committers.  I am also trying to make it non-preferential in that 
I am setting the criteria to be what I believe the Hive PMC would expect any 
prospective Hive committer to do.  The only intended preferential part of the 
proposal is the Hive shepherds, which we have all agreed is a good idea.

Alan.

On Dec 19, 2012, at 8:23 PM, Namit Jain wrote:

> I don’t agree with the proposal. It is impractical to have a Hcat committer
> with commit access to Hcat only portions of Hive. We cannot guarantee that
> a Hcat
> committer will become a Hive committer in 6-9 months, that depends on what
> they do
> in the next 6-9 months.
> 
> The current Hcat committers should spend more time in reviewing patches,
> work on non-Hcat areas in Hive, and then gradually become a hive
> committer. They should not be given any preferential treatment, and the
> process should be the same as it would be for any other hive contributor
> currently. Given the expertise of the Hcat committers, they should
> be in line to become a hive committer if they continue to work in hive,
> but that cannot be guaranteed. I agree that some Hive committers should try
> and help the existing Hcat patches, and again that is voluntary and
> different
> committers cannot be assigned to different parts of the code.
> 
> Thanks,
> -namit
> 
> 
> 
> 
> 
> 
> 
> On 12/20/12 1:03 AM, "Carl Steinbach"  wrote:
> 
>> Alan's proposal sounds like a good idea to me.
>> 
>> +1
>> 
>> On Dec 18, 2012 5:36 PM, "Travis Crawford" 
>> wrote:
>> 
>>> Alan, I think your proposal sounds great.
>>> 
>>> --travis
>>> 
>>> On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates 
>>> wrote:
 Carl, speaking just for myself and not as a representative of the HCat
>>> PPMC at this point, I am coming to agree with you that HCat integrating
>>> with Hive fully makes more sense.
 
 However, this makes the committer question even thornier.  Travis and
>>> Namit, I think the shepherd proposal needs to lay out a clear and time
>>> bounded path to committership for HCat committers.  Having HCat
>>> committers
>>> as second class Hive citizens for the long run will not be healthy.  I
>>> propose the following as a starting point for discussion:
 
 All active HCat committers (those who have contributed or committed a
>>> patch in the last 6 months) will be made committers in the HCat portion
>>> only of Hive.  In addition those committers will be assigned a
>>> particular
>>> shepherd who is a current Hive committer and who will be responsible for
>>> mentoring them towards full Hive committership.  As a part of this
>>> mentorship the HCat committer will review patches of other contributors,
>>> contribute patches to Hive (both inside and outside of HCatalog),
>>> respond
>>> to user issues on the mailing lists, etc.  It is intended that as a
>>> result
>>> of this mentorship program HCat committers can become full Hive
>>> committers
>>> in 6-9 months.  No new HCat only committers will be elected in Hive
>>> after
>>> this.  All Hive committers will automatically also have commit rights on
>>> HCatalog.
 
 Alan.
 
 On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote:
 
> On a functional level I don't think there is going to be much of a
> difference between the subproject option proposed by Travis and the
>>> other
> option where HCatalog becomes a TLP. In both cases HCatalog and Hive
>>> will
> have separate committers, separate code repositories, separate
>>> release
> cycles, and separate project roadmaps. Aside from ASF bureaucracy, I
>>> think
> the only major difference between the two options is that the
>>> subproject
> route will give the rest of the community the false impression that
>>> the
>>> two
> projects have coordinated roadmaps and a process to prevent
>>> overlapping
> functionality from appearing in both projects. Consequently, If these
>>> are
> the only two options then I would prefer that HCatalog become a TLP.
> 
> On the other hand, I also agree with many of the sentiments that have
> already been expressed in this thread, namely that the two projects
>>> are
> closely related and that it would benefit the community at large if
>>> the
>>> two
> projects could be brought closer together. Up to this point the major
> source of pain for the HCatalog team has been the frequent necessity
>>> of
> making changes on both the Hive and HCatalog sides when implementing
>>> new
> features in HCatalog. This situation is compounded by the ASF
>>> requirement
> that release artifacts may not depend on snapshot artifacts from
>>> other

[jira] [Commented] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-20 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537534#comment-13537534
 ] 

Kevin Wilfong commented on HIVE-3826:
-

The tests pass.

> Rollbacks and retries of drops cause 
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)
> -
>
> Key: HIVE-3826
> URL: https://issues.apache.org/jira/browse/HIVE-3826
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.11
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3826.1.patch.txt
>
>
> I'm not sure if this is the only cause of the exception 
> "org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)" from the metastore, but one cause seems to be related to a drop command 
> failing, and being retried by the client.
> Based on focusing on a single thread in the metastore with DEBUG level 
> logging, I was seeing the objects that were intended to be dropped remaining 
> in the PersistenceManager cache even after a rollback.  The steps seemed to 
> be as follows:
> 1) First attempt to drop the table, the table is pulled into the 
> PersistenceManager cache for the purposes of dropping
> 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
> causes a rollback of the transaction
> 3) The drop is retried using a different thread on the metastore Thrift 
> server or a different server and succeeds
> 4) Back on the original thread of the original Thrift server someone tries to 
> perform some write operation which produces a commit.  This causes those 
> detached objects related to the dropped table to attempt to reattach, causing 
> JDO to query the SQL backend for those objects which it can't find.  This 
> causes the exception.
> I was able to reproduce this regularly using the following sequence of 
> commands:
> Hive client 1 (Hive1): connected to a metastore Thrift server running a 
> single thread, I hard coded a RuntimeException into the code to drop a table 
> in the ObjectStore, specifically right before the commit in 
> preDropStorageDescriptor, to induce a rollback.  I also turned off all 
> retries at all layers of the metastore.
> Hive client 2 (Hive2): connected to a separate metastore Thrift server 
> running with standard configs and code
> 1: On Hive1, CREATE TABLE t1 (c STRING);
> 2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
> 3: On Hive2, DROP TABLE t1; // Succeeds
> 4: On Hive1, CREATE DATABASE d1; // This database already existed; I'm not 
> sure why this was necessary, but it didn't work without it. It seemed to have 
> an effect on the order in which objects were committed in the next step
> 5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
> with the NucleusObjectNotFoundException
> The object that would cause the exception varied, I saw the MTable, the 
> MSerDeInfo, and MTablePrivilege from the table that attempted to be dropped.
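Consolidated, the repro steps quoted above amount to this interleaved script (comments mark which client issues each statement):

{code}
-- Hive1: single-threaded metastore, drop hard-coded to fail, retries disabled
CREATE TABLE t1 (c STRING);
DROP TABLE t1;       -- fails via the injected RuntimeException; txn rolls back
-- Hive2: separate metastore Thrift server with stock configs and code
DROP TABLE t1;       -- succeeds
-- Hive1 again:
CREATE DATABASE d1;  -- pre-existing; affects the commit order in the next step
CREATE DATABASE d2;  -- fails with NucleusObjectNotFoundException
{code}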

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3826:


Status: Patch Available  (was: Open)

> Rollbacks and retries of drops cause 
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)
> -
>
> Key: HIVE-3826
> URL: https://issues.apache.org/jira/browse/HIVE-3826
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.11
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3826.1.patch.txt
>
>
> I'm not sure if this is the only cause of the exception 
> "org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)" from the metastore, but one cause seems to be related to a drop command 
> failing, and being retried by the client.
> Based on focusing on a single thread in the metastore with DEBUG level 
> logging, I was seeing the objects that were intended to be dropped remaining 
> in the PersistenceManager cache even after a rollback.  The steps seemed to 
> be as follows:
> 1) First attempt to drop the table, the table is pulled into the 
> PersistenceManager cache for the purposes of dropping
> 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
> causes a rollback of the transaction
> 3) The drop is retried using a different thread on the metastore Thrift 
> server or a different server and succeeds
> 4) Back on the original thread of the original Thrift server someone tries to 
> perform some write operation which produces a commit.  This causes those 
> detached objects related to the dropped table to attempt to reattach, causing 
> JDO to query the SQL backend for those objects which it can't find.  This 
> causes the exception.
> I was able to reproduce this regularly using the following sequence of 
> commands:
> Hive client 1 (Hive1): connected to a metastore Thrift server running a 
> single thread, I hard coded a RuntimeException into the code to drop a table 
> in the ObjectStore, specifically right before the commit in 
> preDropStorageDescriptor, to induce a rollback.  I also turned off all 
> retries at all layers of the metastore.
> Hive client 2 (Hive2): connected to a separate metastore Thrift server 
> running with standard configs and code
> 1: On Hive1, CREATE TABLE t1 (c STRING);
> 2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
> 3: On Hive2, DROP TABLE t1; // Succeeds
> 4: On Hive1, CREATE DATABASE d1; // This database already existed; I'm not 
> sure why this was necessary, but it didn't work without it. It seemed to have 
> an effect on the order in which objects were committed in the next step
> 5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
> with the NucleusObjectNotFoundException
> The object that would cause the exception varied, I saw the MTable, the 
> MSerDeInfo, and MTablePrivilege from the table that attempted to be dropped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [DISCUSS] HCatalog becoming a subproject of Hive

2012-12-20 Thread Ashish Thusoo
We are certainly not marketeers and no one as far as I know is teeing up
for such a campaign. The intent here is certainly not to claim how great
HCatalog is or how great Hive is. The intent here is to see what is best
for the project and how great both are together.

Ashish


On Thu, Dec 20, 2012 at 2:02 PM, Carl Steinbach  wrote:

> > Would the project not benefit in the long run if Hcat is
> > brought in and some day becomes the default metastore for Hive.
>
>
> Folks, can we please try to keep this straight? HCatalog on its own does
> not provide any support for metadata. It is a set of wrapper APIs that make
> Hive's metastore and serdes accessible to Pig and MR. I think these
> wrappers provide a lot of value, and I'm eager to see them merged into
> Hive, but I'm dreading the marketing campaign that I suspect will follow:
> "Hive now supports metadata thanks to HCatalog!".
>


[jira] [Updated] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3826:


Description: 
I'm not sure if this is the only cause of the exception 
"org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
row)" from the metastore, but one cause seems to be related to a drop command 
failing, and being retried by the client.

Based on focusing on a single thread in the metastore with DEBUG level logging, 
I was seeing the objects that were intended to be dropped remaining in the 
PersistenceManager cache even after a rollback.  The steps seemed to be as 
follows:

1) First attempt to drop the table, the table is pulled into the 
PersistenceManager cache for the purposes of dropping
2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
causes a rollback of the transaction
3) The drop is retried using a different thread on the metastore Thrift server 
or a different server and succeeds
4) Back on the original thread of the original Thrift server someone tries to 
perform some write operation which produces a commit.  This causes those 
detached objects related to the dropped table to attempt to reattach, causing 
JDO to query the SQL backend for those objects which it can't find.  This 
causes the exception.

I was able to reproduce this regularly using the following sequence of commands:
Hive client 1 (Hive1): connected to a metastore Thrift server running a single 
thread, I hard coded a RuntimeException into the code to drop a table in the 
ObjectStore, specifically right before the commit in preDropStorageDescriptor, 
to induce a rollback.  I also turned off all retries at all layers of the 
metastore.
Hive client 2 (Hive2): connected to a separate metastore Thrift server running 
with standard configs and code

1: On Hive1, CREATE TABLE t1 (c STRING);
2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
3: On Hive2, DROP TABLE t1; // Succeeds
4: On Hive1, CREATE DATABASE d1; // This database already existed; I'm not sure 
why this was necessary, but it didn't work without it. It seemed to have an 
effect on the order in which objects were committed in the next step
5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
with the NucleusObjectNotFoundException

The object that would cause the exception varied, I saw the MTable, the 
MSerDeInfo, and MTablePrivilege from the table that attempted to be dropped.

  was:
I'm not sure if this is the only cause of the exception 
"org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
row)" from the metastore, but one cause seems to be related to a drop command 
failing, and being retried by the client.

Based on focusing on a single thread in the metastore with DEBUG level logging, 
I was seeing the objects that were intended to be dropped remaining in the 
PersistenceManager cache even after a rollback.  The steps seemed to be as 
follows:

1) First attempt to drop the table, the table is pulled into the 
PersistenceManager cache for the purposes of dropping
2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
causes a rollback of the transaction
3) The drop is retried using a different thread on the metastore Thrift server 
or a different server and succeeds
4) Back on the original thread of the original Thrift server someone tries to 
perform some write operation which produces a commit.  This causes those 
detached objects related to the dropped table to attempt to reattach, causing 
JDO to query the SQL backend for those objects which it can't find.  This 
causes the exception.

I was able to reproduce this regularly using the following sequence of commands:
Hive client 1 (Hive1): connected to a metastore Thrift server running a single 
thread, I hard coded a RuntimeException into the code to drop a table in the 
ObjectStore, specifically right before the commit in preDropStorageDescriptor, 
to induce a rollback
Hive client 2 (Hive2): connected to a separate metastore Thrift server running 
with standard configs and code

1: On Hive1, CREATE TABLE t1 (c STRING);
2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
3: On Hive2, DROP TABLE t1; // Succeeds
4: On Hive1, CREATE DATABASE d1; // This database already existed; I'm not sure 
why this was necessary, but it didn't work without it. It seemed to have an 
effect on the order in which objects were committed in the next step
5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
with the NucleusObjectNotFoundException

The object that would cause the exception varied, I saw the MTable, the 
MSerDeInfo, and MTablePrivilege from the table that attempted to be dropped.


> Rollbacks and retries of drops cause 
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)
> --

Re: [DISCUSS] HCatalog becoming a subproject of Hive

2012-12-20 Thread Ashish Thusoo
hmm... why is this considered "preferential treatment"?

All the work for HCat is in the public domain so we can really evaluate
whether they have been following apache practices - the fact that they are
graduating from the incubator would seem to indicate that they have been
doing so. If this code base is contributed back to Hive, is that not
counted as a significant contribution to Hive? I am failing to understand
on what count they don't qualify to be committers.

Plus if it is too onerous to enforce committer privileges on selective
parts (is there a way?) of the project, then what do terms like Hive
committer, HCat committer mean? Also should Hive committers have privileges
to commit into HCat part of the code once it becomes a subproject. I think
we are just creating walls and the problem with walls is that they just
impede cross pollination and community expansion.

Ashish


On Thu, Dec 20, 2012 at 1:59 PM, Carl Steinbach  wrote:

> I agree with Namit on this issue. I don't think it's fair to the
> existing group of Hive contributors to give preferential
> treatment to HCat committers, or to automatically promote them to
> full committer status on the Hive project.
>
> On Thu, Dec 20, 2012 at 1:10 PM, Bhandarkar, Milind <
> milind.bhandar...@emc.com> wrote:
>
> > I agree with Ashish.
> >
> > When Hcat becomes a subproject of Hive, all Hcat committers should
> > immediately become Hive committers.
> >
> > After all, that worked well for Hadoop, where all Hadoop committers can
> > commit to all Hadoop code (common/HDFS/MapReduce), but not all do,
> instead
> > focusing only on their area of expertise, and familiarity with portions
> of
> > codebase.
> >
> > - milind
> >
> > ---
> > Milind Bhandarkar
> > Chief Scientist,
> > Machine Learning Platforms,
> > Greenplum, A Division of EMC
> > +1-650-523-3858 (W)
> > +1-408-666-8483 (C)
> >
> >
> >
> >
> >
> > On 12/20/12 5:58 AM, "Ashish Thusoo"  wrote:
> >
> > >Actually I don't understand why getting Hcat folks as committers on Hive
> > >is
> > >a problem. Hive itself became a subproject of Hadoop when it started
> with
> > >all the Hive committers becoming Hadoop committers. And of course
> everyone
> > >maintained the discipline that they commit in parts of the code that
> they
> > >understand and that they have worked on. Some of the committers from
> Hive
> > >ended up becoming Hadoop committers - others who worked only on Hive
> ended
> > >up leaving the Hadoop committers list once Hive became a TLP. So why put
> > >in
> > >these arguments about process when the end result would be beneficial to
> > >the community and to the project. Would Hive not benefit if some folks
> > >from
> > >Hcat start working on Hive proper as well - of course under the guidance
> > >of
> > >Hive mentors etc. Would the project not benefit in the long run if Hcat
> is
> > >brought in and some day becomes the default metastore for Hive. I mean
> if
> > >there are so many long term benefits from this then why focus on control
> > >and code safety which I think any responsible committer knows how to
> > >navigate and there are well understood best practices for that. And why
> > >can't a committer be booted out if he/she is breaking the discipline and
> > >really nosing in places which he/she does not understand.
> > >
> > >I mean if we agree that directionally Hcat being a part of Hive makes
> > >sense
> > >then why don't we try to get rid of the procedural elements that would
> > >only
> > >slow down that transition? If there is angst about specific people on
> Hcat
> > >committers list on the Hive committers side (are there any?), then I
> think
> > >that should be addressed on a case by case basis but why enforce a
> general
> > >rule. In the same vein why have a rule saying in 6-9 months a Hcat
> > >committer becomes a Hive committer - how is that helpful? If they are
> > >changing the Hcat subproject in Hive are they not already Hive
> committers?
> > >And if they gain the expertise to review and commit code in the
> > >SemanticAnalyzer in a few months should they not be able to do that
> before
> > >9 months are over? And if they don't get that expertise in 9 months
> would
> > >they really review and commit anything in the SemanticAnalyzer - I mean
> > >there are Hive committers who don't touch that piece of code today. no?
> > >
> > >Ashish
> > >
> > >
> > >On Wed, Dec 19, 2012 at 8:23 PM, Namit Jain  wrote:
> > >
> > >> I don’t agree with the proposal. It is impractical to have a Hcat
> > >>committer
> > >> with commit access to Hcat only portions of Hive. We cannot guarantee
> > >>that
> > >> a Hcat
> > >> committer will become a Hive committer in 6-9 months, that depends on
> > >>what
> > >> they do
> > >> in the next 6-9 months.
> > >>
> > >> The current Hcat committers should spend more time in reviewing
> patches,
> > >> work on non-Hcat areas in Hive, and then gradually become a hive
> > >> committer. They should not be given any preferential treatment, and
> t

[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3803:


Status: Open  (was: Patch Available)

> explain dependency should show the dependencies hierarchically in presence of 
> views
> ---
>
> Key: HIVE-3803
> URL: https://issues.apache.org/jira/browse/HIVE-3803
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
> hive.3803.4.patch, hive.3803.5.patch
>
>
> It should also include tables whose partitions are being accessed
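For illustration, a hypothetical case of the hierarchy being asked for (sales and sales_view are made-up names):

{code}
-- sales is partitioned by ds; the view hides that from the query.
CREATE VIEW sales_view AS
SELECT * FROM sales WHERE ds = '2012-12-20';

-- EXPLAIN DEPENDENCY should report sales_view and, under it, the sales table
-- together with the partitions actually being accessed.
EXPLAIN DEPENDENCY SELECT * FROM sales_view;
{code}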

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-20 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537494#comment-13537494
 ] 

Kevin Wilfong commented on HIVE-3803:
-

Couple requests for additional comments on Phabricator.  Otherwise looks good.

> explain dependency should show the dependencies hierarchically in presence of 
> views
> ---
>
> Key: HIVE-3803
> URL: https://issues.apache.org/jira/browse/HIVE-3803
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
> hive.3803.4.patch, hive.3803.5.patch
>
>
> It should also include tables whose partitions are being accessed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

2012-12-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537479#comment-13537479
 ] 

Vinod Kumar Vavilapalli commented on HIVE-3784:
---

I was trying to play with the patch, and my earlier concern resurfaced.
bq. With different join keys, it needs some work to merge into a single MR
anyway - that work is independent of this change.
That isn't true. Even today, I am able to get Hive to automatically merge a
multi-way map join with different join keys into a single map-only job. With
this patch, we lose that functionality. For example, the following runs as a
single map-only job:
{noformat}
select /*+MAPJOIN(smallTableTwo)*/ idOne, idTwo, value FROM
( select /*+MAPJOIN(smallTableOne)*/ idOne, idTwo, value FROM
  bigTable
  JOIN
  smallTableOne on (bigTable.idOne = smallTableOne.idOne)
) firstjoin
JOIN
smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo)
{noformat}


> de-emphasize mapjoin hint
> -
>
> Key: HIVE-3784
> URL: https://issues.apache.org/jira/browse/HIVE-3784
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
> hive.3784.4.patch, hive.3784.5.patch
>
>
> hive.auto.convert.join has been around for a long time, and is pretty stable.
> When mapjoin hint was created, the above parameter did not exist.
> The only reason for the user to specify a mapjoin currently is if they want
> it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
> Eventually, that should also go away, but that may take some time to 
> stabilize.
> There are many rules in SemanticAnalyzer to handle the following trees:
> ReduceSink -> MapJoin
> Union  -> MapJoin
> MapJoin-> MapJoin
> This should not be supported anymore. In any of the above scenarios, the
> user can get the mapjoin behavior by setting hive.auto.convert.join to true
> and not specifying the hint. This will simplify the code a lot.
> What does everyone think?
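A sketch of the hint-free behavior described above (big_t and small_t are hypothetical names):

{code}
-- No /*+ MAPJOIN(small_t) */ hint needed: with auto-conversion enabled, the
-- optimizer itself turns this into a map join when small_t is small enough.
SET hive.auto.convert.join=true;

SELECT b.key, s.value
FROM big_t b JOIN small_t s ON (b.key = s.key);
{code}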

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-20 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537462#comment-13537462
 ] 

Kevin Wilfong commented on HIVE-3552:
-

A few more comments on Phabricator.

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.1.patch, 
> hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, 
> hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow-up to HIVE-3433.
> Had an offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say 8 columns. Each row would then generate 256 (2^8)
> rows for the hash table, which may overwhelm the current group-by
> implementation.
> A better implementation would be to add an additional MR job: in the first MR
> job, perform the group by as if there were no cube. In a second MR job,
> perform the cube. The assumption is that the group by will have decreased the
> output data significantly, and the rows will arrive ordered by the grouping
> keys, which gives a higher probability of hitting the hash table.
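The blow-up can be seen with the WITH CUBE syntax from HIVE-3433 (t and c1..c8 are hypothetical): every input row contributes one row per grouping set, i.e. 2^8 = 256 rows into the map-side hash table.

{code}
-- Each input row expands to 256 grouping-set rows before aggregation.
SELECT c1, c2, c3, c4, c5, c6, c7, c8, count(*)
FROM t
GROUP BY c1, c2, c3, c4, c5, c6, c7, c8 WITH CUBE;
{code}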

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3552:


Status: Open  (was: Patch Available)

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.1.patch, 
> hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, 
> hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow-up to HIVE-3433.
> Had an offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say 8 columns. Each row would then generate 256 (2^8)
> rows for the hash table, which may overwhelm the current group-by
> implementation.
> A better implementation would be to add an additional MR job: in the first MR
> job, perform the group by as if there were no cube. In a second MR job,
> perform the cube. The assumption is that the group by will have decreased the
> output data significantly, and the rows will arrive ordered by the grouping
> keys, which gives a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3825) Add Operator level Hooks

2012-12-20 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3825:


Attachment: HIVE-3825.txt

> Add Operator level Hooks
> 
>
> Key: HIVE-3825
> URL: https://issues.apache.org/jira/browse/HIVE-3825
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3825.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3825) Add Operator level Hooks

2012-12-20 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3825:


Status: Patch Available  (was: Open)

Initial implementation.

> Add Operator level Hooks
> 
>
> Key: HIVE-3825
> URL: https://issues.apache.org/jira/browse/HIVE-3825
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3825.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [DISCUSS] HCatalog becoming a subproject of Hive

2012-12-20 Thread Carl Steinbach
> Would the project not benefit in the long run if Hcat is
> brought in and some day becomes the default metastore for Hive.


Folks, can we please try to keep this straight? HCatalog on its own does
not provide any support for metadata. It is a set of wrapper APIs that make
Hive's metastore and serdes accessible to Pig and MR. I think these
wrappers provide a lot of value, and I'm eager to see them merged into
Hive, but I'm dreading the marketing campaign that I suspect will follow:
"Hive now supports metadata thanks to HCatalog!".


Re: [DISCUSS] HCatalog becoming a subproject of Hive

2012-12-20 Thread Carl Steinbach
I agree with Namit on this issue. I don't think it's fair to the
existing group of Hive contributors to give preferential
treatment to HCat committers, or to automatically promote them to
full committer status on the Hive project.

On Thu, Dec 20, 2012 at 1:10 PM, Bhandarkar, Milind <
milind.bhandar...@emc.com> wrote:

> I agree with Ashish.
>
> When Hcat becomes a subproject of Hive, all Hcat committers should
> immediately become Hive committers.
>
> After all, that worked well for Hadoop, where all Hadoop committers can
> commit to all Hadoop code (common/HDFS/MapReduce), but not all do, instead
> focusing only on their area of expertise, and familiarity with portions of
> codebase.
>
> - milind
>
> ---
> Milind Bhandarkar
> Chief Scientist,
> Machine Learning Platforms,
> Greenplum, A Division of EMC
> +1-650-523-3858 (W)
> +1-408-666-8483 (C)
>
>
>
>
>
> On 12/20/12 5:58 AM, "Ashish Thusoo"  wrote:
>
> >Actually I don't understand why getting Hcat folks as committers on Hive
> >is
> >a problem. Hive itself became a subproject of Hadoop when it started with
> >all the Hive committers becoming Hadoop committers. And of course everyone
> >maintained the discipline that they commit in parts of the code that they
> >understand and that they have worked on. Some of the committers from Hive
> >ended up becoming Hadoop committers - others who worked only on Hive ended
> >up leaving the Hadoop committers list once Hive became a TLP. So why put
> >in
> >these arguments about process when the end result would be beneficial to
> >the community and to the project. Would Hive not benefit if some folks
> >from
> >Hcat start working on Hive proper as well - of course under the guidance
> >of
> >Hive mentors etc. Would the project not benefit in the long run if Hcat is
> >brought in and some day becomes the default metastore for Hive. I mean if
> >there are so many long term benefits from this then why focus on control
> >and code safety which I think any responsible committer knows how to
> >navigate and there are well understood best practices for that. And why
> >can't a committer be booted out if he/she is breaking the discipline and
> >really nosing in places which he/she does not understand.
> >
> >I mean if we agree that directionally Hcat being a part of Hive makes
> >sense
> >then why don't we try to get rid of the procedural elements that would
> >only
> >slow down that transition? If there is angst about specific people on Hcat
> >committers list on the Hive committers side (are there any?), then I think
> >that should be addressed on a case by case basis but why enforce a general
> >rule. In the same vein why have a rule saying in 6-9 months a Hcat
> >committer becomes a Hive committer - how is that helpful? If they are
> >changing the Hcat subproject in Hive are they not already Hive committers?
> >And if they gain the expertise to review and commit code in the
> >SemanticAnalyzer in a few months should they not be able to do that before
> >9 months are over? And if they don't get that expertise in 9 months would
> >they really review and commit anything in the SemanticAnalyzer - I mean
> >there are Hive committers who don't touch that piece of code today. no?
> >
> >Ashish
> >
> >
> >On Wed, Dec 19, 2012 at 8:23 PM, Namit Jain  wrote:
> >
> >> I don’t agree with the proposal. It is impractical to have a Hcat
> >>committer
> >> with commit access to Hcat only portions of Hive. We cannot guarantee
> >>that
> >> a Hcat
> >> committer will become a Hive committer in 6-9 months, that depends on
> >>what
> >> they do
> >> in the next 6-9 months.
> >>
> >> The current Hcat committers should spend more time in reviewing patches,
> >> work on non-Hcat areas in Hive, and then gradually become a hive
> >> committer. They should not be given any preferential treatment, and the
> >> process should be the same as it would be for any other hive contributor
> >> currently. Given the expertise of the Hcat committers, they should
> >> be in line to become a hive committer if they continue to work in
> >>hive,
> >> but that cannot be guaranteed. I agree that some Hive committers should
> >>try
> >> and help the existing Hcat patches, and again that is voluntary and
> >> different
> >> committers cannot be assigned to different parts of the code.
> >>
> >> Thanks,
> >> -namit
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 12/20/12 1:03 AM, "Carl Steinbach"  wrote:
> >>
> >> >Alan's proposal sounds like a good idea to me.
> >> >
> >> >+1
> >> >
> >> >On Dec 18, 2012 5:36 PM, "Travis Crawford" 
> >> >wrote:
> >> >
> >> >> Alan, I think your proposal sounds great.
> >> >>
> >> >> --travis
> >> >>
> >> >> On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates 
> >> >>wrote:
> >> >> > Carl, speaking just for myself and not as a representative of the
> >>HCat
> >> >> PPMC at this point, I am coming to agree with you that HCat
> >>integrating
> >> >> with Hive fully makes more sense.
> >> >> >
> >> >> > However, this makes the committer quest

[jira] [Commented] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-20 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537407#comment-13537407
 ] 

Kevin Wilfong commented on HIVE-3826:
-

https://reviews.facebook.net/D7539

> Rollbacks and retries of drops cause 
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)
> -
>
> Key: HIVE-3826
> URL: https://issues.apache.org/jira/browse/HIVE-3826
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.11
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3826.1.patch.txt
>
>
> I'm not sure if this is the only cause of the exception 
> "org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)" from the metastore, but one cause seems to be related to a drop command 
> failing, and being retried by the client.
> Based on focusing on a single thread in the metastore with DEBUG level 
> logging, I was seeing the objects that were intended to be dropped remaining 
> in the PersistenceManager cache even after a rollback.  The steps seemed to 
> be as follows:
> 1) First attempt to drop the table, the table is pulled into the 
> PersistenceManager cache for the purposes of dropping
> 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
> causes a rollback of the transaction
> 3) The drop is retried using a different thread on the metastore Thrift 
> server or a different server and succeeds
> 4) Back on the original thread of the original Thrift server someone tries to 
> perform some write operation which produces a commit.  This causes those 
> detached objects related to the dropped table to attempt to reattach, causing 
> JDO to query the SQL backend for those objects which it can't find.  This 
> causes the exception.
> I was able to reproduce this regularly using the following sequence of 
> commands:
> Hive client 1 (Hive1): connected to a metastore Thrift server running a 
> single thread, I hard coded a RuntimeException into the code to drop a table 
> in the ObjectStore, specifically right before the commit in 
> preDropStorageDescriptor, to induce a rollback
> Hive client 2 (Hive2): connected to a separate metastore Thrift server 
> running with standard configs and code
> 1: On Hive1, CREATE TABLE t1 (c STRING);
> 2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
> 3: On Hive2, DROP TABLE t1; // Succeeds
> 4: On Hive1, CREATE DATABASE d1; // This database already existed; I'm not 
> sure why this was necessary, but it didn't work without it. It seemed to have 
> an effect on the order in which objects were committed in the next step
> 5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
> with the NucleusObjectNotFoundException
> The object that would cause the exception varied, I saw the MTable, the 
> MSerDeInfo, and MTablePrivilege from the table that attempted to be dropped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3826:


Attachment: HIVE-3826.1.patch.txt

> Rollbacks and retries of drops cause 
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)
> -
>
> Key: HIVE-3826
> URL: https://issues.apache.org/jira/browse/HIVE-3826
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.11
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3826.1.patch.txt
>
>
> I'm not sure if this is the only cause of the exception 
> "org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)" from the metastore, but one cause seems to be related to a drop command 
> failing, and being retried by the client.
> Based on focusing on a single thread in the metastore with DEBUG level 
> logging, I was seeing the objects that were intended to be dropped remaining 
> in the PersistenceManager cache even after a rollback.  The steps seemed to 
> be as follows:
> 1) First attempt to drop the table, the table is pulled into the 
> PersistenceManager cache for the purposes of dropping
> 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
> causes a rollback of the transaction
> 3) The drop is retried using a different thread on the metastore Thrift 
> server or a different server and succeeds
> 4) Back on the original thread of the original Thrift server someone tries to 
> perform some write operation which produces a commit.  This causes those 
> detached objects related to the dropped table to attempt to reattach, causing 
> JDO to query the SQL backend for those objects which it can't find.  This 
> causes the exception.
> I was able to reproduce this regularly using the following sequence of 
> commands:
> Hive client 1 (Hive1): connected to a metastore Thrift server running a 
> single thread; I hard-coded a RuntimeException into the code that drops a 
> table in the ObjectStore, specifically right before the commit in 
> preDropStorageDescriptor, to induce a rollback
> Hive client 2 (Hive2): connected to a separate metastore Thrift server 
> running with standard configs and code
> 1: On Hive1, CREATE TABLE t1 (c STRING);
> 2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
> 3: On Hive2, DROP TABLE t1; // Succeeds
> 4: On Hive1, CREATE DATABASE d1; // This database already existed; I'm not 
> sure why this was necessary, but it didn't work without it. It seemed to have 
> an effect on the order objects were committed in the next step
> 5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
> with the NucleusObjectNotFoundException
> The object that caused the exception varied; I saw the MTable, the 
> MSerDeInfo, and the MTablePrivilege from the table whose drop was attempted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-20 Thread Kevin Wilfong (JIRA)
Kevin Wilfong created HIVE-3826:
---

 Summary: Rollbacks and retries of drops cause 
org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)
 Key: HIVE-3826
 URL: https://issues.apache.org/jira/browse/HIVE-3826
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong


I'm not sure if this is the only cause of the exception 
"org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
row)" from the metastore, but one cause seems to be a drop command failing and 
being retried by the client.

Watching a single thread in the metastore with DEBUG-level logging, I saw that 
the objects intended to be dropped remained in the PersistenceManager cache 
even after a rollback.  The steps seemed to be as follows:

1) First attempt to drop the table, the table is pulled into the 
PersistenceManager cache for the purposes of dropping
2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
causes a rollback of the transaction
3) The drop is retried using a different thread on the metastore Thrift server 
or a different server and succeeds
4) Back on the original thread of the original Thrift server someone tries to 
perform some write operation which produces a commit.  This causes those 
detached objects related to the dropped table to attempt to reattach, causing 
JDO to query the SQL backend for those objects which it can't find.  This 
causes the exception.

I was able to reproduce this regularly using the following sequence of commands:
Hive client 1 (Hive1): connected to a metastore Thrift server running a single 
thread; I hard-coded a RuntimeException into the code that drops a table in the 
ObjectStore, specifically right before the commit in preDropStorageDescriptor, 
to induce a rollback
Hive client 2 (Hive2): connected to a separate metastore Thrift server running 
with standard configs and code

1: On Hive1, CREATE TABLE t1 (c STRING);
2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
3: On Hive2, DROP TABLE t1; // Succeeds
4: On Hive1, CREATE DATABASE d1; // This database already existed; I'm not sure 
why this was necessary, but it didn't work without it. It seemed to have an 
effect on the order objects were committed in the next step
5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
with the NucleusObjectNotFoundException

The object that caused the exception varied; I saw the MTable, the 
MSerDeInfo, and the MTablePrivilege from the table whose drop was attempted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW

2012-12-20 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537398#comment-13537398
 ] 

Mark Grover commented on HIVE-3104:
---

[~cyril.liao] Can you create a separate JIRA for the above, please?

> Predicate pushdown doesn't work with multi-insert statements using LATERAL 
> VIEW
> ---
>
> Key: HIVE-3104
> URL: https://issues.apache.org/jira/browse/HIVE-3104
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0
> Environment: Apache Hive 0.9.0, Apache Hadoop 0.20.205.0
>Reporter: Mark Grover
>
> Predicate pushdown seems to work for single-insert queries using LATERAL 
> VIEW. It also seems to work for multi-insert queries *not* using LATERAL 
> VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW.
> Here are some examples. In the examples below, I use the fact that a query 
> with no partition filtering fails when run under "hive.mapred.mode=strict".
> --Table creation and population
> DROP TABLE IF EXISTS partition_test;
> CREATE TABLE partition_test (col1 array<int>, col2 int) PARTITIONED BY 
> (part_col int);
> INSERT OVERWRITE TABLE partition_test PARTITION (part_col=1) SELECT 
> array(1,2), count(*) FROM partition_test;
> INSERT OVERWRITE TABLE partition_test PARTITION (part_col=2) SELECT 
> array(2,4,6), count(*) FROM partition_test;
> -- Query 1
> -- This succeeds (using LATERAL VIEW with single insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> LATERAL VIEW explode(col1) tmp AS exp_col1
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT exp_col1
> WHERE (part_col=2);
> -- Query 2
> -- This succeeds (NOT using LATERAL VIEW with multi-insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT col1
> WHERE (part_col=2)
> INSERT OVERWRITE DIRECTORY '/test/2'
> SELECT col1
> WHERE (part_col=2);
> -- Query 3
> -- This fails (using LATERAL VIEW with multi-insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> LATERAL VIEW explode(col1) tmp AS exp_col1
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT exp_col1
> WHERE (part_col=2)
> INSERT OVERWRITE DIRECTORY '/test/2'
> SELECT exp_col1
> WHERE (part_col=2);
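
A possible rewrite that restores the pruning (a sketch, not from this JIRA; it 
assumes the predicate only references the partition column, so filtering before 
the LATERAL VIEW is equivalent) pushes the filter into a subquery, letting the 
multi-insert read an already-pruned input:

{code}
-- Hypothetical workaround for Query 3: prune partitions first, then explode
set hive.mapred.mode=strict;
FROM (
  SELECT col1 FROM partition_test WHERE part_col = 2
) pruned
LATERAL VIEW explode(col1) tmp AS exp_col1
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT exp_col1;
{code}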

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3825) Add Operator level Hooks

2012-12-20 Thread Pamela Vagata (JIRA)
Pamela Vagata created HIVE-3825:
---

 Summary: Add Operator level Hooks
 Key: HIVE-3825
 URL: https://issues.apache.org/jira/browse/HIVE-3825
 Project: Hive
  Issue Type: New Feature
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [DISCUSS] HCatalog becoming a subproject of Hive

2012-12-20 Thread Bhandarkar, Milind
I agree with Ashish.

When Hcat becomes a subproject of Hive, all Hcat committers should
immediately become Hive committers.

After all, that worked well for Hadoop, where all Hadoop committers can
commit to all Hadoop code (common/HDFS/MapReduce), but not all do, instead
focusing only on their areas of expertise and familiarity with portions of
the codebase.

- milind

---
Milind Bhandarkar
Chief Scientist,
Machine Learning Platforms,
Greenplum, A Division of EMC
+1-650-523-3858 (W)
+1-408-666-8483 (C)





On 12/20/12 5:58 AM, "Ashish Thusoo"  wrote:

>Actually I don't understand why getting Hcat folks as committers on Hive is
>a problem. Hive itself became a subproject of Hadoop when it started, with
>all the Hive committers becoming Hadoop committers. And of course everyone
>maintained the discipline that they commit in parts of the code that they
>understand and that they have worked on. Some of the committers from Hive
>ended up becoming Hadoop committers - others who worked only on Hive ended
>up leaving the Hadoop committers list once Hive became a TLP. So why put in
>these arguments about process when the end result would be beneficial to
>the community and to the project? Would Hive not benefit if some folks from
>Hcat started working on Hive proper as well - of course under the guidance
>of Hive mentors, etc.? Would the project not benefit in the long run if
>Hcat is brought in and some day becomes the default metastore for Hive? I
>mean, if there are so many long-term benefits from this, then why focus on
>control and code safety, which I think any responsible committer knows how
>to navigate and for which there are well-understood best practices? And why
>can't a committer be booted out if he/she is breaking the discipline and
>really nosing in places which he/she does not understand?
>
>I mean, if we agree that directionally Hcat being a part of Hive makes
>sense, then why don't we try to get rid of the procedural elements that
>would only slow down that transition? If there is angst about specific
>people on the Hcat committers list on the Hive committers side (are there
>any?), then I think that should be addressed on a case-by-case basis, but
>why enforce a general rule? In the same vein, why have a rule saying that
>in 6-9 months a Hcat committer becomes a Hive committer - how is that
>helpful? If they are changing the Hcat subproject in Hive, are they not
>already Hive committers? And if they gain the expertise to review and
>commit code in the SemanticAnalyzer in a few months, should they not be
>able to do that before 9 months are over? And if they don't get that
>expertise in 9 months, would they really review and commit anything in the
>SemanticAnalyzer? I mean, there are Hive committers who don't touch that
>piece of code today, no?
>
>Ashish
>
>
>On Wed, Dec 19, 2012 at 8:23 PM, Namit Jain  wrote:
>
>> I don’t agree with the proposal. It is impractical to have a Hcat
>> committer with commit access to Hcat-only portions of Hive. We cannot
>> guarantee that a Hcat committer will become a Hive committer in 6-9
>> months; that depends on what they do in the next 6-9 months.
>>
>> The current Hcat committers should spend more time reviewing patches,
>> work on non-Hcat areas in Hive, and then gradually become Hive
>> committers. They should not be given any preferential treatment, and the
>> process should be the same as it would be for any other Hive contributor
>> currently. Given the expertise of the Hcat committers, they should be in
>> line to become Hive committers if they continue to work in Hive, but that
>> cannot be guaranteed. I agree that some Hive committers should try to
>> help with the existing Hcat patches, and again, that is voluntary;
>> different committers cannot be assigned to different parts of the code.
>>
>> Thanks,
>> -namit
>>
>>
>>
>>
>>
>>
>>
>> On 12/20/12 1:03 AM, "Carl Steinbach"  wrote:
>>
>> >Alan's proposal sounds like a good idea to me.
>> >
>> >+1
>> >
>> >On Dec 18, 2012 5:36 PM, "Travis Crawford" 
>> >wrote:
>> >
>> >> Alan, I think your proposal sounds great.
>> >>
>> >> --travis
>> >>
>> >> On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates 
>> >>wrote:
>> >> > Carl, speaking just for myself and not as a representative of the
>>HCat
>> >> PPMC at this point, I am coming to agree with you that HCat
>>integrating
>> >> with Hive fully makes more sense.
>> >> >
>> >> > However, this makes the committer question even thornier.  Travis
>>and
>> >> Namit, I think the shepherd proposal needs to lay out a clear and
>>time
>> >> bounded path to committership for HCat committers.  Having HCat
>> >>committers
>> >> as second class Hive citizens for the long run will not be healthy.
>>I
>> >> propose the following as a starting point for discussion:
>> >> >
>> >> > All active HCat committers (those who have contributed or
>>committed a
>> >> patch in the last 6 months) will be made committers in the HCat
>>portion
>> >> only of Hive.  In addition those committers

[jira] [Work started] (HIVE-3752) Add a non-sql API in hive to access data.

2012-12-20 Thread Nitay Joffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3752 started by Nitay Joffe.

> Add a non-sql API in hive to access data.
> -
>
> Key: HIVE-3752
> URL: https://issues.apache.org/jira/browse/HIVE-3752
> Project: Hive
>  Issue Type: Improvement
>Reporter: Nitay Joffe
>Assignee: Nitay Joffe
>
> We would like to add an input/output format for accessing Hive data in Hadoop 
> directly, without having to use e.g. a transform. Using a transform means 
> having to do a whole map-reduce step, with its own disk accesses and its 
> imposed structure. It also means having Hive be the base infrastructure for 
> the entire system being developed, which is not the right fit, as we only 
> need a small part of it (access to the data).
> So we propose adding an API-level InputFormat and OutputFormat to Hive that 
> will make it trivially easy to select a table with a partition spec and read 
> from / write to it. We chose this design to make it compatible with Hadoop, 
> so that existing systems that work with Hadoop's IO API will just work out of 
> the box.
> We need this system for the Giraph graph processing system 
> (http://giraph.apache.org/) as running graph jobs which read/write from Hive 
> is a common use case.
> [~namitjain] [~aching] [~kevinwilfong] [~apresta]
> Input-side (HiveApiInputFormat) review: https://reviews.facebook.net/D7401
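
For contrast, the transform-based access path the description wants to avoid 
looks roughly like this (a sketch; my_reader.py stands in for a hypothetical 
user script and is not part of the proposal):

{code}
-- Streaming Hive rows through an external script forces a full
-- map-reduce step whose structure Hive imposes (illustrative only)
ADD FILE my_reader.py;
SELECT TRANSFORM (key, value)
  USING 'python my_reader.py'
  AS (key, value)
FROM src;
{code}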

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-20 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Status: Patch Available  (was: Open)

Patch 3 is updated to use the 30011 error message.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work stopped] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-20 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3718 stopped by Pamela Vagata.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work started] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-20 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3718 started by Pamela Vagata.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work stopped] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-20 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3718 stopped by Pamela Vagata.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work started] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-20 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3718 started by Pamela Vagata.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #235

2012-12-20 Thread Apache Jenkins Server
See 

--
[...truncated 36456 lines...]
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-12-20_12-03-59_400_9031965450249765184/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201212201204_1677301696.txt
[junit] Copying file: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/jenkins/hive_2012-12-20_12-04-02_935_6520683087398376556/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-12-20_12-04-02_935_6520683087398376556/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201212201204_27099639.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201212201204_1584040872.txt
[junit] Copying file: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201212201204_1472898889.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: 

Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #235

2012-12-20 Thread Apache Jenkins Server
See 


--
[...truncated 9916 lines...]

compile-test:
 [echo] Project: serde
[javac] Compiling 26 source files to 

[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

create-dirs:
 [echo] Project: service
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: service

ivy-init-settings:
 [echo] Project: service

ivy-resolve:
 [echo] Project: service
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: service

compile:
 [echo] Project: service

ivy-resolve-test:
 [echo] Project: service

ivy-retrieve-test:
 [echo] Project: service

compile-test:
 [echo] Project: service
[javac] Compiling 2 source files to 


test:
 [echo] Project: hive

test-shims:
 [echo] Project: hive

test-conditions:
 [echo] Project: shims

gen-test:
 [echo] Project: shims

create-dirs:
 [echo] Project: shims
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 0.20.2 
(

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 


ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build_shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 1.0.0 
(

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 


ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.23

build_shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 0.23.3 
(

Hive-trunk-h0.21 - Build # 1868 - Failure

2012-12-20 Thread Apache Jenkins Server
Changes for Build #1868
[kevinwilfong] HIVE-3728. make optimizing multi-group by configurable. (njain 
via kevinwilfong)

[kevinwilfong] HIVE-3757. union_remove_9.q fails in trunk (hadoop 23) (njain 
via kevinwilfong)




No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1868)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1868/ to 
view the results.

Re: [DISCUSS] HCatalog becoming a subproject of Hive

2012-12-20 Thread Ashish Thusoo
Actually I don't understand why getting Hcat folks as committers on Hive is
a problem. Hive itself became a subproject of Hadoop when it started, with
all the Hive committers becoming Hadoop committers. And of course everyone
maintained the discipline that they commit in parts of the code that they
understand and that they have worked on. Some of the committers from Hive
ended up becoming Hadoop committers - others who worked only on Hive ended
up leaving the Hadoop committers list once Hive became a TLP. So why put in
these arguments about process when the end result would be beneficial to
the community and to the project? Would Hive not benefit if some folks from
Hcat started working on Hive proper as well - of course under the guidance
of Hive mentors, etc.? Would the project not benefit in the long run if
Hcat is brought in and some day becomes the default metastore for Hive? I
mean, if there are so many long-term benefits from this, then why focus on
control and code safety, which I think any responsible committer knows how
to navigate and for which there are well-understood best practices? And why
can't a committer be booted out if he/she is breaking the discipline and
really nosing in places which he/she does not understand?

I mean, if we agree that directionally Hcat being a part of Hive makes
sense, then why don't we try to get rid of the procedural elements that
would only slow down that transition? If there is angst about specific
people on the Hcat committers list on the Hive committers side (are there
any?), then I think that should be addressed on a case-by-case basis, but
why enforce a general rule? In the same vein, why have a rule saying that
in 6-9 months a Hcat committer becomes a Hive committer - how is that
helpful? If they are changing the Hcat subproject in Hive, are they not
already Hive committers? And if they gain the expertise to review and
commit code in the SemanticAnalyzer in a few months, should they not be
able to do that before 9 months are over? And if they don't get that
expertise in 9 months, would they really review and commit anything in the
SemanticAnalyzer? I mean, there are Hive committers who don't touch that
piece of code today, no?

Ashish


On Wed, Dec 19, 2012 at 8:23 PM, Namit Jain  wrote:

> I don’t agree with the proposal. It is impractical to have a Hcat
> committer with commit access to Hcat-only portions of Hive. We cannot
> guarantee that a Hcat committer will become a Hive committer in 6-9
> months; that depends on what they do in the next 6-9 months.
>
> The current Hcat committers should spend more time reviewing patches,
> work on non-Hcat areas in Hive, and then gradually become Hive
> committers. They should not be given any preferential treatment, and the
> process should be the same as it would be for any other Hive contributor
> currently. Given the expertise of the Hcat committers, they should be in
> line to become Hive committers if they continue to work in Hive, but that
> cannot be guaranteed. I agree that some Hive committers should try to
> help with the existing Hcat patches, and again, that is voluntary;
> different committers cannot be assigned to different parts of the code.
>
> Thanks,
> -namit
>
>
>
>
>
>
>
> On 12/20/12 1:03 AM, "Carl Steinbach"  wrote:
>
> >Alan's proposal sounds like a good idea to me.
> >
> >+1
> >
> >On Dec 18, 2012 5:36 PM, "Travis Crawford" 
> >wrote:
> >
> >> Alan, I think your proposal sounds great.
> >>
> >> --travis
> >>
> >> On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates 
> >>wrote:
> >> > Carl, speaking just for myself and not as a representative of the HCat
> >> PPMC at this point, I am coming to agree with you that HCat integrating
> >> with Hive fully makes more sense.
> >> >
> >> > However, this makes the committer question even thornier.  Travis and
> >> Namit, I think the shepherd proposal needs to lay out a clear and time
> >> bounded path to committership for HCat committers.  Having HCat
> >>committers
> >> as second-class Hive citizens for the long run will not be healthy.  I
> >> propose the following as a starting point for discussion:
> >> >
> >> > All active HCat committers (those who have contributed or committed a
> >> patch in the last 6 months) will be made committers in the HCat portion
> >> only of Hive.  In addition those committers will be assigned a
> >>particular
> >> shepherd who is a current Hive committer and who will be responsible for
> >> mentoring them towards full Hive committership.  As a part of this
> >> mentorship the HCat committer will review patches of other contributors,
> >> contribute patches to Hive (both inside and outside of HCatalog),
> >>respond
> >> to user issues on the mailing lists, etc.  It is intended that as a
> >>result
> >> of this mentorship program HCat committers can become full Hive
> >>committers
> >> in 6-9 months.  No new HCat only committers will be elected in Hive
> >>after
> >> this.  All Hive committers will automatically also have commit rights on
> >

[jira] [Commented] (HIVE-3728) make optimizing multi-group by configurable

2012-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537023#comment-13537023
 ] 

Hudson commented on HIVE-3728:
--

Integrated in Hive-trunk-h0.21 #1868 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1868/])
HIVE-3728. make optimizing multi-group by configurable. (njain via 
kevinwilfong) (Revision 1424292)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424292
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* 
/hive/trunk/ql/src/test/queries/clientpositive/groupby_mutli_insert_common_distinct.q
* 
/hive/trunk/ql/src/test/results/clientpositive/groupby_mutli_insert_common_distinct.q.out


> make optimizing multi-group by configurable
> ---
>
> Key: HIVE-3728
> URL: https://issues.apache.org/jira/browse/HIVE-3728
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.11
>
> Attachments: hive.3728.2.patch, hive.3728.3.patch
>
>
> This was done as part of https://issues.apache.org/jira/browse/HIVE-609.
> This should be configurable.
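
For context, a sketch of the query shape this toggles - a multi-group-by with a 
common DISTINCT, which HIVE-609 collapses into a single map-reduce job. The 
flag name follows the HiveConf.java change listed above, and dest1/dest2 are 
assumed tables; treat both as assumptions:

{code}
-- Assumed flag added by this patch; false disables the HIVE-609 rewrite
set hive.optimize.multigroupby.common.distincts=false;

-- Two group-bys sharing count(DISTINCT value), the pattern being optimized
FROM src
INSERT OVERWRITE TABLE dest1
  SELECT key, count(DISTINCT value) GROUP BY key
INSERT OVERWRITE TABLE dest2
  SELECT substr(key, 1, 1), count(DISTINCT value) GROUP BY substr(key, 1, 1);
{code}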

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3757) union_remove_9.q fails in trunk (hadoop 23)

2012-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537024#comment-13537024
 ] 

Hudson commented on HIVE-3757:
--

Integrated in Hive-trunk-h0.21 #1868 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1868/])
HIVE-3757. union_remove_9.q fails in trunk (hadoop 23) (njain via 
kevinwilfong) (Revision 1424290)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424290
Files : 
* /hive/trunk/ql/src/test/results/clientpositive/union_remove_9.q.out


> union_remove_9.q fails in trunk (hadoop 23)
> ---
>
> Key: HIVE-3757
> URL: https://issues.apache.org/jira/browse/HIVE-3757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: Gang Tim Liu
>Assignee: Namit Jain
> Fix For: 0.11
>
> Attachments: hive.3757.1.patch
>
>
> check out the latest code from trunk
> {code}
> svn info
> {code}
> {quote}
> Path: .
> URL: http://svn.apache.org/repos/asf/hive/trunk
> Repository Root: http://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 1415321
> Node Kind: directory
> Schedule: normal
> Last Changed Author: hashutosh
> Last Changed Rev: 1415278
> Last Changed Date: 2012-11-29 09:11:53 -0800 (Thu, 29 Nov 2012)
> {quote}
> {code}
> ant -Dhadoop.version=0.23.3 -Dhadoop-0.23.version=0.23.3 -Dhadoop.mr.rev=23 
> test -Dtestcase=TestCliDriver -Dqfile=union_remove_9.q
> {code}
> {quote}
> [junit] diff -a 
> /Users/gang/hive-trunk-11-29/build/ql/test/logs/clientpositive/union_remove_9.q.out
>  
> /Users/gang/hive-trunk-11-29/ql/src/test/results/clientpositive/union_remove_9.q.out
> [junit] 106c106
> [junit] <   expr: UDFToLong(_col1)
> [junit] ---
> [junit] >   expr: _col1
> [junit] 109,123c109,116
> [junit] < Select Operator
> [junit] <   expressions:
> [junit] < expr: _col0
> [junit] < type: string
> [junit] < expr: _col1
> [junit] < type: bigint
> [junit] <   outputColumnNames: _col0, _col1
> [junit] <   File Output Operator
> [junit] < compressed: false
> [junit] < GlobalTableId: 1
> [junit] < table:
> [junit] < input format: 
> [junit] < output format: 
> [junit] < serde: 
> [junit] < name: default.outputtbl1
> [junit] ---
> [junit] > File Output Operator
> [junit] >   compressed: false
> [junit] >   GlobalTableId: 1
> [junit] >   table:
> [junit] >   input format: 
> [junit] >   output format: 
> [junit] >   serde: 
> [junit] >   name: default.outputtbl1
> [junit] <   expr: UDFToLong(_col1)
> [junit] ---
> [junit] >   expr: _col1
> [junit] 149,163c142,149
> [junit] < Select Operator
> [junit] <   expressions:
> [junit] < expr: _col0
> [junit] < type: string
> [junit] < expr: _col1
> [junit] < type: bigint
> [junit] <   outputColumnNames: _col0, _col1
> [junit] <   File Output Operator
> [junit] < compressed: false
> [junit] < GlobalTableId: 1
> [junit] < table:
> [junit] < input format: 
> [junit] < output format: 
> [junit] < serde: 
> [junit] < name: default.outputtbl1
> [junit] ---
> [junit] > File Output Operator
> [junit] >   compressed: false
> [junit] >   GlobalTableId: 1
> [junit] >   table:
> [junit] >   input format: 
> [junit] >   output format: 
> [junit] >   serde: 
> [junit] >   name: default.outputtbl1
> [junit] Failed query: union_remove_9.q
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2439) Upgrade antlr version to 3.4

2012-12-20 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-2439:
---

Fix Version/s: 0.11
   0.9.1
   0.10.0
 Assignee: Thiruvel Thirumoolan
   Status: Patch Available  (was: Open)

> Upgrade antlr version to 3.4
> 
>
> Key: HIVE-2439
> URL: https://issues.apache.org/jira/browse/HIVE-2439
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Ashutosh Chauhan
>Assignee: Thiruvel Thirumoolan
> Fix For: 0.10.0, 0.9.1, 0.11
>
> Attachments: HIVE-2439_branch9_2.patch, HIVE-2439_branch9_3.patch, 
> HIVE-2439_branch9.patch, hive-2439_incomplete.patch, HIVE-2439_trunk.patch
>
>
> Upgrade antlr version to 3.4

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2439) Upgrade antlr version to 3.4

2012-12-20 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536973#comment-13536973
 ] 

Thiruvel Thirumoolan commented on HIVE-2439:


Patch uploaded to Phabricator - https://reviews.facebook.net/D7527

> Upgrade antlr version to 3.4
> 
>
> Key: HIVE-2439
> URL: https://issues.apache.org/jira/browse/HIVE-2439
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-2439_branch9_2.patch, HIVE-2439_branch9_3.patch, 
> HIVE-2439_branch9.patch, hive-2439_incomplete.patch, HIVE-2439_trunk.patch
>
>
> Upgrade antlr version to 3.4

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3821) RCFile does not work with lazyBinarySerDe

2012-12-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3821:
-

Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

> RCFile does not work with lazyBinarySerDe
> -
>
> Key: HIVE-3821
> URL: https://issues.apache.org/jira/browse/HIVE-3821
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> create table tst(key string, value string) row format serde 
> 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe' stored as rcfile;  
> insert overwrite table tst select * from src;
> gets an error:
> Caused by: java.lang.UnsupportedOperationException: Currently the writer can 
> only accept BytesRefArrayWritable
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:882)
> at 
> org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
> ... 5 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3821) RCFile does not work with lazyBinarySerDe

2012-12-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3821:
-

Status: Patch Available  (was: Open)

> RCFile does not work with lazyBinarySerDe
> -
>
> Key: HIVE-3821
> URL: https://issues.apache.org/jira/browse/HIVE-3821
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> create table tst(key string, value string) row format serde 
> 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe' stored as rcfile;  
> insert overwrite table tst select * from src;
> gets an error:
> Caused by: java.lang.UnsupportedOperationException: Currently the writer can 
> only accept BytesRefArrayWritable
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:882)
> at 
> org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
> ... 5 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3821) RCFile does not work with lazyBinarySerDe

2012-12-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536967#comment-13536967
 ] 

Namit Jain commented on HIVE-3821:
--

LazyBinaryColumnarSerDe already exists for this purpose.
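
For reference, a minimal sketch of the working combination (the table name 
tst_bin is illustrative, not from the issue):

{code}
-- LazyBinaryColumnarSerDe is the columnar binary serde intended for RCFile
create table tst_bin (key string, value string)
row format serde 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
stored as rcfile;
insert overwrite table tst_bin select * from src;
{code}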

> RCFile does not work with lazyBinarySerDe
> -
>
> Key: HIVE-3821
> URL: https://issues.apache.org/jira/browse/HIVE-3821
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> create table tst(key string, value string) row format serde 
> 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe' stored as rcfile;  
> insert overwrite table tst select * from src;
> gets an error:
> Caused by: java.lang.UnsupportedOperationException: Currently the writer can 
> only accept BytesRefArrayWritable
> at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:882)
> at 
> org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
> ... 5 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3818) Identify VOID datatype in generated table and notify the user accordingly

2012-12-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3818:
-

Status: Open  (was: Patch Available)

Comments on Phabricator.

> Identify VOID datatype in generated table and notify the user accordingly
> -
>
> Key: HIVE-3818
> URL: https://issues.apache.org/jira/browse/HIVE-3818
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Gary Colman
>Assignee: Gary Colman
>Priority: Trivial
> Attachments: 
> Patch_to_throw_useful_error_msg_for_VOID_datatype_in_created_table.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When using rcfile as a datastore, generating a table from a select statement 
> with a null field results in a VOID data type and throws an exception 
> (Internal error: no LazyObject for VOID).
> e.g.:
>   set hive.default.fileformat=RCFILE;
>   CREATE TABLE test_table AS SELECT NULL, key FROM src;
> Make the message to the user a little more intuitive.
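
Until the friendlier message lands, a common workaround (a sketch, not part of 
the attached patch) is to give the NULL an explicit type so that no VOID column 
reaches the serde:

{code}
set hive.default.fileformat=RCFILE;
-- CAST supplies a concrete type for the otherwise-VOID column
CREATE TABLE test_table AS SELECT CAST(NULL AS STRING) AS col1, key FROM src;
{code}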

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3818) Identify VOID datatype in generated table and notify the user accordingly

2012-12-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536946#comment-13536946
 ] 

Namit Jain commented on HIVE-3818:
--

Can you create a Phabricator entry?
https://cwiki.apache.org/Hive/phabricatorcodereview.html


https://reviews.facebook.net/D7521

> Identify VOID datatype in generated table and notify the user accordingly
> -
>
> Key: HIVE-3818
> URL: https://issues.apache.org/jira/browse/HIVE-3818
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Gary Colman
>Assignee: Gary Colman
>Priority: Trivial
> Attachments: 
> Patch_to_throw_useful_error_msg_for_VOID_datatype_in_created_table.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When using rcfile as a datastore, generating a table from a select statement 
> with a null field results in a VOID data type and throws an exception 
> (Internal error: no LazyObject for VOID).
> e.g.:
>   set hive.default.fileformat=RCFILE;
>   CREATE TABLE test_table AS SELECT NULL, key FROM src;
> Make the message to the user a little more intuitive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3824) bug if different serdes are used for different partitions

2012-12-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3824:
-

Attachment: hive.3824.1.patch

> bug if different serdes are used for different partitions
> -
>
> Key: HIVE-3824
> URL: https://issues.apache.org/jira/browse/HIVE-3824
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
> Attachments: hive.3824.1.patch
>
>
> Consider the following testcase:
> create table tst5 (key string, value string) partitioned by (ds string) 
> stored as rcfile;
> insert overwrite table tst5 partition (ds='1') select * from src;
> insert overwrite table tst5 partition (ds='2') select * from src;
> insert overwrite table tst5 partition (ds='3') select * from src;
> alter table tst5 stored as sequencefile; 
> insert overwrite table tst5 partition (ds='4') select * from src;
> insert overwrite table tst5 partition (ds='5') select * from src;
> insert overwrite table tst5 partition (ds='6') select * from src;  
> alter table tst5 set serde 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
> insert overwrite table tst5 partition (ds='7') select * from src;
> insert overwrite table tst5 partition (ds='8') select * from src;
> insert overwrite table tst5 partition (ds='9') select * from src;  
> The following query works fine:
>  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
> since both the partitions use ColumnarSerDe
> But the following query fails:
> select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
> (ds='7'));
> since different serdes are used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3104) Predicate pushdown doesn't work with multi-insert statements using LATERAL VIEW

2012-12-20 Thread cyril liao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536934#comment-13536934
 ] 

cyril liao commented on HIVE-3104:
--

LATERAL VIEW doesn't work with UNION ALL either.
query NO.1:
 SELECT
 1 as from_pid,
 1 as to_pid,
 cid as from_path,
 (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
 0 as status
FROM
(SELECT union_map(c_map) AS c_map
 FROM
 (SELECT collect_map(id,parent_id)AS c_map
  FROM
  wl_channels
  GROUP BY id,parent_id
  )tmp
)tmp2
LATERAL VIEW recursion_concat(c_map) a AS cid, pid

This query returns about 1 row, and its status is 0.

query NO.2:
 select
  a.from_pid as from_pid,
  a.to_pid as to_pid, 
  a.from_path as from_path,
  a.to_path as to_path,
  a.status as status
from wl_dc_channels a
  where a.status <> 0

This query returns about 100 rows, and their status is 1 or 2.

query NO.3:
select
  from_pid,
  to_pid,
  from_path,
  to_path,
  status
 from
(
 SELECT
 1 as from_pid,
 1 as to_pid,
 cid as from_path,
 (CASE WHEN pid=0 THEN cid ELSE pid END) as to_path,
 0 as status
FROM
(SELECT union_map(c_map) AS c_map
 FROM
 (SELECT collect_map(id,parent_id)AS c_map
  FROM
  wl_channels
  GROUP BY id,parent_id
  )tmp
)tmp2
LATERAL VIEW recursion_concat(c_map) a AS cid, pid
union all
 select
  a.from_pid as from_pid,
  a.to_pid as to_pid, 
  a.from_path as from_path,
  a.to_path as to_path,
  a.status as status
from wl_dc_channels a
  where a.status <> 0
) unin_tbl

This query has the same result as query NO.2.

> Predicate pushdown doesn't work with multi-insert statements using LATERAL 
> VIEW
> ---
>
> Key: HIVE-3104
> URL: https://issues.apache.org/jira/browse/HIVE-3104
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0
> Environment: Apache Hive 0.9.0, Apache Hadoop 0.20.205.0
>Reporter: Mark Grover
>
> Predicate pushdown seems to work for single-insert queries using LATERAL 
> VIEW. It also seems to work for multi-insert queries *not* using LATERAL 
> VIEW. However, it doesn't work for multi-insert queries using LATERAL VIEW.
> Here are some examples. In the examples below, I use the fact that a query 
> with no partition filtering fails when run under "hive.mapred.mode=strict".
> --Table creation and population
> DROP TABLE IF EXISTS partition_test;
> CREATE TABLE partition_test (col1 array<int>, col2 int) PARTITIONED BY 
> (part_col int);
> INSERT OVERWRITE TABLE partition_test PARTITION (part_col=1) SELECT 
> array(1,2), count(*) FROM partition_test;
> INSERT OVERWRITE TABLE partition_test PARTITION (part_col=2) SELECT 
> array(2,4,6), count(*) FROM partition_test;
> -- Query 1
> -- This succeeds (using LATERAL VIEW with single insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> LATERAL VIEW explode(col1) tmp AS exp_col1
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT exp_col1
> WHERE (part_col=2);
> -- Query 2
> -- This succeeds (NOT using LATERAL VIEW with multi-insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT col1
> WHERE (part_col=2)
> INSERT OVERWRITE DIRECTORY '/test/2'
> SELECT col1
> WHERE (part_col=2);
> -- Query 3
> -- This fails (using LATERAL VIEW with multi-insert)
> set hive.mapred.mode=strict;
> FROM partition_test
> LATERAL VIEW explode(col1) tmp AS exp_col1
> INSERT OVERWRITE DIRECTORY '/test/1'
> SELECT exp_col1
> WHERE (part_col=2)
> INSERT OVERWRITE DIRECTORY '/test/2'
> SELECT exp_col1
> WHERE (part_col=2);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3824) bug if different serdes are used for different partitions

2012-12-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536916#comment-13536916
 ] 

Namit Jain commented on HIVE-3824:
--

https://reviews.facebook.net/D7515

> bug if different serdes are used for different partitions
> -
>
> Key: HIVE-3824
> URL: https://issues.apache.org/jira/browse/HIVE-3824
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>
> Consider the following testcase:
> create table tst5 (key string, value string) partitioned by (ds string) 
> stored as rcfile;
> insert overwrite table tst5 partition (ds='1') select * from src;
> insert overwrite table tst5 partition (ds='2') select * from src;
> insert overwrite table tst5 partition (ds='3') select * from src;
> alter table tst5 stored as sequencefile; 
> insert overwrite table tst5 partition (ds='4') select * from src;
> insert overwrite table tst5 partition (ds='5') select * from src;
> insert overwrite table tst5 partition (ds='6') select * from src;  
> alter table tst5 set serde 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
> insert overwrite table tst5 partition (ds='7') select * from src;
> insert overwrite table tst5 partition (ds='8') select * from src;
> insert overwrite table tst5 partition (ds='9') select * from src;  
> The following query works fine:
>  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
> since both the partitions use ColumnarSerDe
> But the following query fails:
> select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
> (ds='7'));
> since different serdes are used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3824) bug if different serdes are used for different partitions

2012-12-20 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3824:


 Summary: bug if different serdes are used for different partitions
 Key: HIVE-3824
 URL: https://issues.apache.org/jira/browse/HIVE-3824
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain


Consider the following testcase:

create table tst5 (key string, value string) partitioned by (ds string) stored 
as rcfile;
insert overwrite table tst5 partition (ds='1') select * from src;
insert overwrite table tst5 partition (ds='2') select * from src;
insert overwrite table tst5 partition (ds='3') select * from src;

alter table tst5 stored as sequencefile; 

insert overwrite table tst5 partition (ds='4') select * from src;
insert overwrite table tst5 partition (ds='5') select * from src;
insert overwrite table tst5 partition (ds='6') select * from src;  

alter table tst5 set serde 
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 

insert overwrite table tst5 partition (ds='7') select * from src;
insert overwrite table tst5 partition (ds='8') select * from src;
insert overwrite table tst5 partition (ds='9') select * from src;  

The following query works fine:

 select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   

since both the partitions use ColumnarSerDe

But the following query fails:

select key + key, value from tst5 where ((ds = '4') or (ds = '1') or (ds='7'));

since different serdes are used.
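
One quick way to confirm the per-partition divergence behind the failure (a 
sketch; look at the "SerDe Library:" line in each output):

{code}
-- ds='1' should report ColumnarSerDe and ds='7' LazySimpleSerDe,
-- matching the alter statements above
describe formatted tst5 partition (ds='1');
describe formatted tst5 partition (ds='7');
{code}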

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3752) Add a non-sql API in hive to access data.

2012-12-20 Thread Nitay Joffe (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536860#comment-13536860
 ] 

Nitay Joffe commented on HIVE-3752:
---

[~namitjain] here's the initial API proposal: 
https://cwiki.apache.org/confluence/display/Hive/Hadoop-compatible+Input-Output+Format+for+Hive.
 Let me know your thoughts.

> Add a non-sql API in hive to access data.
> -
>
> Key: HIVE-3752
> URL: https://issues.apache.org/jira/browse/HIVE-3752
> Project: Hive
>  Issue Type: Improvement
>Reporter: Nitay Joffe
>Assignee: Nitay Joffe
>
> We would like to add an input/output format for accessing Hive data in Hadoop 
> directly, without having to use e.g. a transform. Using a transform means 
> having to do a whole map-reduce step, with its own disk accesses and its 
> imposed structure. It also means having Hive be the base infrastructure for 
> the entire system being developed, which is not the right fit, as we only 
> need a small part of it (access to the data).
> So we propose adding an API-level InputFormat and OutputFormat to Hive that 
> will make it trivially easy to select a table with a partition spec and read 
> from / write to it. We chose this design to make it compatible with Hadoop, 
> so that existing systems that work with Hadoop's IO API will just work out of 
> the box.
> We need this system for the Giraph graph processing system 
> (http://giraph.apache.org/) as running graph jobs which read/write from Hive 
> is a common use case.
> [~namitjain] [~aching] [~kevinwilfong] [~apresta]
> Input-side (HiveApiInputFormat) review: https://reviews.facebook.net/D7401

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira