[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-27 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977157#comment-14977157
 ] 

Sushanth Sowmyan commented on HIVE-11985:
-

For most things that muck with the typesystem in hive, [~jdere] is my go-to 
person to check with. Tagging him here.

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11988) [hive] security issue with hive & ranger for import table command

2015-10-27 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977645#comment-14977645
 ] 

Sushanth Sowmyan commented on HIVE-11988:
-

Ugh, looks like I missed updating 3 tests:

 * TestMinimrCliDriver.testCliDriver_import_exported_table
 * TestMiniSparkOnYarnCliDriver.testCliDriver_import_exported_table
 * TestCliDriver.testCliDriver_authorization_reset

And a fourth test, which I thought I had updated, is failing, but not because 
of the extra PREHOOK/POSTHOOK:
 * TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import

I'll look into these and post an update tonight.

> [hive] security issue with hive & ranger for import table command
> -
>
> Key: HIVE-11988
> URL: https://issues.apache.org/jira/browse/HIVE-11988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Deepak Sharma
>Assignee: Sushanth Sowmyan
>Priority: Critical
> Attachments: HIVE-11988.2.patch, HIVE-11988.3.patch, HIVE-11988.patch
>
>
> If a user does not have permission to create a table in Hive, but then 
> imports data for a table using the command below, the import also creates 
> the table, and that currently succeeds. Ideally, it should not work.
> STR:
> 
> 1. Put some raw data in the HDFS path /user/user1/tempdata.
> 2. Check the Ranger policy: user1 should not have any permission on any table.
> 3. Log in as user1 through Beeline and try to create a table (this will 
> obviously fail, since the user doesn't have permission to create tables):
> create table tt1(id INT,ff String);
> FAILED: HiveAccessControlException Permission denied: user user1 does not 
> have CREATE privilege on default/tt1 (state=42000,code=4)
> 4. Now try the following command to import data into a table (the table 
> should not already exist):
> import table tt1 from '/user/user1/tempdata';
> ER:
> Since user1 doesn't have permission to create a table, this operation should 
> fail.
> AR:
> The table is created successfully and the data is also imported!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9013) Hive set command exposes metastore db password

2015-10-26 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9013:
---
Attachment: HIVE-9013.5.patch-branch1

Attaching branch-1 version of patch.

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Fix For: 2.0.0
>
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, 
> HIVE-9013.5.patch-branch1
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974863#comment-14974863
 ] 

Sushanth Sowmyan commented on HIVE-9013:


Committed to branch-1 as well.

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, 
> HIVE-9013.5.patch-branch1
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975144#comment-14975144
 ] 

Sushanth Sowmyan commented on HIVE-9013:


The branch-1.2 version of this patch incorporates HIVE-11670's fix as well.

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Fix For: 1.3.0, 2.0.0, 1.2.2
>
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, 
> HIVE-9013.5.patch-branch1, HIVE-9013.5.patch-branch1.2
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11670) Strip out password information from TezSessionState configuration

2015-10-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975140#comment-14975140
 ] 

Sushanth Sowmyan commented on HIVE-11670:
-

Note: while this patch was not committed to branch-1.2 on its own, in the 
process of backporting HIVE-9013 it was effectively merged into the 
branch-1.2 commit for HIVE-9013.

> Strip out password information from TezSessionState configuration
> -
>
> Key: HIVE-11670
> URL: https://issues.apache.org/jira/browse/HIVE-11670
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11670.1.patch
>
>
> Remove password information from the configuration copy that is sent to 
> Yarn/Tez. We don't need it there, and the config entries can potentially be 
> visible to other users.
> HIVE-10508 had the fix that removed this in certain places; however, when I 
> initiated a session via the Hive CLI, I could still see the password 
> information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9013) Hive set command exposes metastore db password

2015-10-26 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9013:
---
Attachment: HIVE-9013.5.patch-branch1.2

Attaching branch-1.2 version of patch as well, committed there too.

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Fix For: 1.3.0, 2.0.0, 1.2.2
>
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, 
> HIVE-9013.5.patch-branch1, HIVE-9013.5.patch-branch1.2
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9013) Hive set command exposes metastore db password

2015-10-26 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9013:
---
Fix Version/s: 1.3.0

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, 
> HIVE-9013.5.patch-branch1
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11988) [hive] security issue with hive & ranger for import table command

2015-10-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975717#comment-14975717
 ] 

Sushanth Sowmyan commented on HIVE-11988:
-

Oh! True. I remember thinking that I needed to update that, but somehow thought 
it was part of my previous plan for how I wanted to have a separate *ForTest 
class. I'll update it.

> [hive] security issue with hive & ranger for import table command
> -
>
> Key: HIVE-11988
> URL: https://issues.apache.org/jira/browse/HIVE-11988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Deepak Sharma
>Assignee: Sushanth Sowmyan
>Priority: Critical
> Attachments: HIVE-11988.patch
>
>
> If a user does not have permission to create a table in Hive, but then 
> imports data for a table using the command below, the import also creates 
> the table, and that currently succeeds. Ideally, it should not work.
> STR:
> 
> 1. Put some raw data in the HDFS path /user/user1/tempdata.
> 2. Check the Ranger policy: user1 should not have any permission on any table.
> 3. Log in as user1 through Beeline and try to create a table (this will 
> obviously fail, since the user doesn't have permission to create tables):
> create table tt1(id INT,ff String);
> FAILED: HiveAccessControlException Permission denied: user user1 does not 
> have CREATE privilege on default/tt1 (state=42000,code=4)
> 4. Now try the following command to import data into a table (the table 
> should not already exist):
> import table tt1 from '/user/user1/tempdata';
> ER:
> Since user1 doesn't have permission to create a table, this operation should 
> fail.
> AR:
> The table is created successfully and the data is also imported!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11988) [hive] security issue with hive & ranger for import table command

2015-10-26 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11988:

Attachment: HIVE-11988.patch

Patch attached.

> [hive] security issue with hive & ranger for import table command
> -
>
> Key: HIVE-11988
> URL: https://issues.apache.org/jira/browse/HIVE-11988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Deepak Sharma
>Assignee: Sushanth Sowmyan
>Priority: Critical
> Attachments: HIVE-11988.patch
>
>
> If a user does not have permission to create a table in Hive, but then 
> imports data for a table using the command below, the import also creates 
> the table, and that currently succeeds. Ideally, it should not work.
> STR:
> 
> 1. Put some raw data in the HDFS path /user/user1/tempdata.
> 2. Check the Ranger policy: user1 should not have any permission on any table.
> 3. Log in as user1 through Beeline and try to create a table (this will 
> obviously fail, since the user doesn't have permission to create tables):
> create table tt1(id INT,ff String);
> FAILED: HiveAccessControlException Permission denied: user user1 does not 
> have CREATE privilege on default/tt1 (state=42000,code=4)
> 4. Now try the following command to import data into a table (the table 
> should not already exist):
> import table tt1 from '/user/user1/tempdata';
> ER:
> Since user1 doesn't have permission to create a table, this operation should 
> fail.
> AR:
> The table is created successfully and the data is also imported!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11988) [hive] security issue with hive & ranger for import table command

2015-10-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975615#comment-14975615
 ] 

Sushanth Sowmyan commented on HIVE-11988:
-

[~thejas], could you please have a look?

> [hive] security issue with hive & ranger for import table command
> -
>
> Key: HIVE-11988
> URL: https://issues.apache.org/jira/browse/HIVE-11988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Deepak Sharma
>Assignee: Sushanth Sowmyan
>Priority: Critical
> Attachments: HIVE-11988.patch
>
>
> If a user does not have permission to create a table in Hive, but then 
> imports data for a table using the command below, the import also creates 
> the table, and that currently succeeds. Ideally, it should not work.
> STR:
> 
> 1. Put some raw data in the HDFS path /user/user1/tempdata.
> 2. Check the Ranger policy: user1 should not have any permission on any table.
> 3. Log in as user1 through Beeline and try to create a table (this will 
> obviously fail, since the user doesn't have permission to create tables):
> create table tt1(id INT,ff String);
> FAILED: HiveAccessControlException Permission denied: user user1 does not 
> have CREATE privilege on default/tt1 (state=42000,code=4)
> 4. Now try the following command to import data into a table (the table 
> should not already exist):
> import table tt1 from '/user/user1/tempdata';
> ER:
> Since user1 doesn't have permission to create a table, this operation should 
> fail.
> AR:
> The table is created successfully and the data is also imported!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974794#comment-14974794
 ] 

Sushanth Sowmyan commented on HIVE-9013:


Ah, I see the issue with making it static: it would require adding another 
parameter there. If the hidden configs were not themselves configurable, it 
would be possible to make it static, but not otherwise. I'm okay with the 
patch as-is, +1. I'll go ahead and commit this.

Thanks, Binglin and Thejas!

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12261) schematool version info exit status should depend on compatibility, not equality

2015-10-25 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973309#comment-14973309
 ] 

Sushanth Sowmyan commented on HIVE-12261:
-

Hi [~thejas] - looks good to me, +1.

I will admit that, for a moment there, I thought 
HiveSchemaTool.verifySchemaVersion was now doing the wrong thing by testing for 
newSchemaVersion >= MetaStoreSchemaInfo.getHiveSchemaVersion() instead of the 
equality check it used before to verify the update, but I see why I was wrong 
to assume so. It may be worth adding a comment there to explain what the 
compatibility check does, and why the direction of >= is correct for it.
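
To illustrate the direction being discussed, a rough sketch (the method and 
comparator names here are illustrative assumptions, not the actual 
HiveSchemaTool code):

{code}
// Illustrative sketch only: a db schema that is equal to or newer than
// the hive software version is treated as compatible, since schema
// upgrades only add tables and columns. Hence ">=" rather than equality.
boolean isSchemaCompatible(String dbSchemaVersion, String hiveVersion) {
  return compareVersions(dbSchemaVersion, hiveVersion) >= 0; // assumed version comparator
}
{code}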


> schematool version info exit status should depend on compatibility, not 
> equality
> 
>
> Key: HIVE-12261
> URL: https://issues.apache.org/jira/browse/HIVE-12261
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-12261-branch-1.0.0.patch, 
> HIVE-12261-branch-1.patch, HIVE-12261.1.patch
>
>
> Newer versions of the metastore schema are compatible with older versions of 
> hive, as only new tables or columns are added to carry additional information.
> HIVE-11613 added a check to the hive schematool -info command to see if the 
> schema version is equal.
> However, the state where the db schema version is ahead of the hive software 
> version is often seen while a 'rolling upgrade' or 'rolling downgrade' is in 
> progress. This is a state where hive is functional, so returning a non-zero 
> status for it is misleading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-23 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971950#comment-14971950
 ] 

Sushanth Sowmyan commented on HIVE-9013:


Hi [~decster], thanks for the update and the patch.

I'd ask for one last update if you don't mind (or we can do that as a separate 
patch):

I think it would be better for HiveConf.stripHiddenConfigurations(Configuration 
conf), as you have introduced it, to be static. That way, it avoids one source 
of confusion later on in the code (as in your patch), where we have to call it 
like this:

{code}
conf.stripHiddenConfigurations(job);
{code}

In that scenario, it is unclear whether we're stripping from conf or from job; 
the truth of the matter is that we're stripping from job. If we made the method 
static, we could call HiveConf.stripHiddenConfigurations(job), which would be 
much clearer.
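
For illustration, a minimal sketch of the static shape (the helper and the loop 
body here are assumptions, not the actual patch, which may mask values rather 
than remove them):

{code}
// Sketch only: a static strip method makes the target Configuration
// explicit at the call site instead of implying an instance receiver.
public static void stripHiddenConfigurations(Configuration target) {
  for (String name : getHiddenConfigNames()) { // assumed helper listing hidden configs
    target.unset(name); // Configuration.unset removes the entry outright
  }
}
{code}

Called that way, HiveConf.stripHiddenConfigurations(job) reads unambiguously as 
stripping from job.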

I think, with that, I'm +1 on this. Thanks for adding in tests. Normally, for 
ql changes such as set behaviour, we make changes to .q files, which are easier 
to develop, but having a proper junit test as you have done is good too. :)

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-22 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970194#comment-14970194
 ] 

Sushanth Sowmyan commented on HIVE-9013:


Hi Binglin, thanks for your update. I think we could use two more minor changes:

a) It'd be good to add a .q test that simply sets one hidden variable and one 
non-hidden variable, and then runs a bare set (to show all) and a set on each 
of these individual variables (to show individual behaviour). That way, we'll 
have a .q.out test that we can check against in the future for regressions.
b) There's another jira, HIVE-10518, which introduced behaviour to strip out 
password details from a jobconf before passing it on. Could you please also 
make a change so that these two are integrated together better? i.e., after 
your patch, Utilities.stripHivePasswordDetails should effectively become 
Utilities.stripRestrictedConfigurations, thereby stripping all the other 
config params that match your new enum as well.

Thanks!

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967775#comment-14967775
 ] 

Sushanth Sowmyan commented on HIVE-9013:


Hi [~decster], please let me know if you're planning on updating this jira per 
[~thejas]'s suggestions above - if you don't mind, I can help update this patch 
to get it in. I think this will be a very useful patch to have in.

Thanks!

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch
>
>
> When auth is enabled, we still need the set command to set some variables 
> (e.g. mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restrict list), which exposes values such 
> as "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump 
> vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12221) Concurrency issue in HCatUtil.getHiveMetastoreClient()

2015-10-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967832#comment-14967832
 ] 

Sushanth Sowmyan commented on HIVE-12221:
-

Per Roshan's mail to me, adding a reference: 
https://en.wikipedia.org/wiki/Double-checked_locking#Usage_in_Java
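
For context, the safe Java variant described there hinges on the field being 
volatile; roughly (a sketch with assumed names, not the actual HCatUtil code):

{code}
// Sketch of double-checked locking done correctly in Java. Without
// 'volatile', a thread can observe a non-null reference to a partially
// constructed object, which is what makes the naive pattern broken.
private static volatile HiveClientCache cache; // assumed field name

static HiveClientCache getCache(HiveConf conf) {
  HiveClientCache c = cache;
  if (c == null) {                      // first check, without the lock
    synchronized (HCatUtil.class) {
      c = cache;
      if (c == null) {                  // second check, under the lock
        cache = c = new HiveClientCache(conf); // assumed constructor
      }
    }
  }
  return c;
}
{code}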

> Concurrency issue in HCatUtil.getHiveMetastoreClient() 
> ---
>
> Key: HIVE-12221
> URL: https://issues.apache.org/jira/browse/HIVE-12221
> Project: Hive
>  Issue Type: Bug
>Reporter: Roshan Naik
>
> HCatUtil.getHiveMetastoreClient() uses the double-checked locking pattern 
> to implement a singleton, which is a broken pattern.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-16 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961662#comment-14961662
 ] 

Sushanth Sowmyan commented on HIVE-12083:
-

Thanks, Thejas!

Committed to branch-1, branch-1.2 and master, where HIVE-10965 exists.

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> -
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 1.0.2
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-12083.2.patch, HIVE-12083.patch
>
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
> AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>List<String> partNames, List<String> colNames, boolean 
> useDensityFunctionForNDVEstimation)
>throws MetaException {
> +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); 
> // Nothing to aggregate.
>  long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
> colNames);
>  List<ColumnStatisticsObj> colStatsList;
>  // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
> 1: required list<ColumnStatisticsObj> colStats,
> 2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally, this would not occur, since HIVE-10965 also includes a client-side 
> guard for colNames.isEmpty() that avoids making the metastore call at all. 
> However, there is no such guard for partNames being empty, so the error would 
> still occur on the metastore side if the thrift call were invoked directly, 
> as would happen when the client is from an older version that predates this 
> patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-14 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957767#comment-14957767
 ] 

Sushanth Sowmyan commented on HIVE-12083:
-

[~thejas]/[~ashutoshc], can I bug either of you for a review for the updated 
patch?

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> -
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 1.0.2
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-12083.2.patch, HIVE-12083.patch
>
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
> AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>List<String> partNames, List<String> colNames, boolean 
> useDensityFunctionForNDVEstimation)
>throws MetaException {
> +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); 
> // Nothing to aggregate.
>  long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
> colNames);
>  List<ColumnStatisticsObj> colStatsList;
>  // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
> 1: required list<ColumnStatisticsObj> colStats,
> 2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally, this would not occur, since HIVE-10965 also includes a client-side 
> guard for colNames.isEmpty() that avoids making the metastore call at all. 
> However, there is no such guard for partNames being empty, so the error would 
> still occur on the metastore side if the thrift call were invoked directly, 
> as would happen when the client is from an older version that predates this 
> patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs

2015-10-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955289#comment-14955289
 ] 

Sushanth Sowmyan commented on HIVE-11149:
-

[~thejas], agreed in theory, but this is blocked by HIVE-11891, which, 
admittedly, is also a reasonable backport candidate.

> Fix issue with sometimes HashMap in PerfLogger.java hangs 
> --
>
> Key: HIVE-11149
> URL: https://issues.apache.org/jira/browse/HIVE-11149
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 1.2.1
>Reporter: WangMeng
>Assignee: WangMeng
> Fix For: 2.0.0
>
> Attachments: HIVE-11149.01.patch, HIVE-11149.02.patch, 
> HIVE-11149.03.patch, HIVE-11149.04.patch
>
>
> In a multi-threaded environment, the HashMap in PerfLogger.java can 
> sometimes cause massive numbers of Java processes to hang and consume large 
> amounts of unnecessary CPU and memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs

2015-10-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955809#comment-14955809
 ] 

Sushanth Sowmyan commented on HIVE-11149:
-

As an update, I do not think we should backport HIVE-11891, since it refactors 
PerfLogger from hive-exec to hive-common, which is a cross-jar change that I 
don't think we should make on maintenance lines. However, this patch is simple 
enough that we could create a 1.2 version of it that modifies PerfLogger in 
hive-exec, where it lived in 1.2.

> Fix issue with sometimes HashMap in PerfLogger.java hangs 
> --
>
> Key: HIVE-11149
> URL: https://issues.apache.org/jira/browse/HIVE-11149
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 1.2.1
>Reporter: WangMeng
>Assignee: WangMeng
> Fix For: 2.0.0
>
> Attachments: HIVE-11149.01.patch, HIVE-11149.02.patch, 
> HIVE-11149.03.patch, HIVE-11149.04.patch
>
>
> In a multi-threaded environment, the HashMap in PerfLogger.java can 
> sometimes cause massive numbers of Java processes to hang and consume large 
> amounts of unnecessary CPU and memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-12 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953421#comment-14953421
 ] 

Sushanth Sowmyan commented on HIVE-12083:
-

> Should we short-circuit the empty-partitions case on the client side as well?

I think that makes sense and we should. I didn't initially because I hadn't 
evaluated the calling codepath to see if there was a difference between a null 
return and an empty return for AggrStats from the HMSC for the empty partNames 
case. Now that I've looked through that in some detail, I am for it. I will 
update the patch.

> Does the case where the table has no partition columns also use the 
> getAggrColStatsFor method? If that is the case, we should not be 
> short-circuiting this way.

I thought of that, but irrespective of whether the client short-circuits, the 
metastore server will short-circuit anyway; it's only a matter of the 
difference between returning null and an empty object.

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> -
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 1.0.2
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-12083.patch
>
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
> AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>List<String> partNames, List<String> colNames, boolean 
> useDensityFunctionForNDVEstimation)
>throws MetaException {
> +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); 
> // Nothing to aggregate.
>  long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
> colNames);
>  List<ColumnStatisticsObj> colStatsList;
>  // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
> 1: required list<ColumnStatisticsObj> colStats,
> 2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally, this would not occur since 

[jira] [Commented] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-12 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953504#comment-14953504
 ] 

Sushanth Sowmyan commented on HIVE-12083:
-

Spoke to Ashutosh about this; going to make one more change. In addition to 
the short-circuit on the client side, the desired client-side behaviour would 
also be to return an empty AggrStats rather than returning null.
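
As a sketch, the client-side guard would look roughly like this fragment 
(illustrative only; the enclosing method is not shown):

{code}
// Illustrative guard: skip the thrift call entirely and hand back a
// well-formed empty result instead of null, so that callers and thrift
// validation both see the required fields set.
if (colNames.isEmpty() || partNames.isEmpty()) {
  return new AggrStats(new ArrayList<ColumnStatisticsObj>(), 0);
}
{code}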

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> -
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 1.0.2
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-12083.patch
>
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
> AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>List<String> partNames, List<String> colNames, boolean 
> useDensityFunctionForNDVEstimation)
>throws MetaException {
> +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); 
> // Nothing to aggregate.
>  long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
> colNames);
>  List<ColumnStatisticsObj> colStatsList;
>  // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
> 1: required list<ColumnStatisticsObj> colStats,
> 2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally, this would not occur, since HIVE-10965 also includes a client-side 
> guard for colNames.isEmpty() that avoids making the metastore call at all. 
> However, there is no such guard for partNames being empty, so the error would 
> still occur on the metastore side if the thrift call were invoked directly, 
> as would happen when the client is from an older version that predates this 
> patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-12 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12083:

Attachment: HIVE-12083.2.patch

Patch updated.

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> -
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 1.0.2
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-12083.2.patch, HIVE-12083.patch
>
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
> AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>List<String> partNames, List<String> colNames, boolean 
> useDensityFunctionForNDVEstimation)
>throws MetaException {
> +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); 
> // Nothing to aggregate.
>  long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
> colNames);
>  List<ColumnStatisticsObj> colStatsList;
>  // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
> 1: required list<ColumnStatisticsObj> colStats,
> 2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally, this would not occur, since HIVE-10965 also includes a client-side 
> guard for colNames.isEmpty() that avoids making the metastore call at all. 
> However, there is no such guard for partNames being empty, so the error would 
> still occur on the metastore side if the thrift call were invoked directly, 
> as would happen when the client is from an older version that predates this 
> patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10965) direct SQL for stats fails in 0-column case

2015-10-12 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954231#comment-14954231
 ] 

Sushanth Sowmyan commented on HIVE-10965:
-

Note: this fix introduces a bug that is fixed by 
https://issues.apache.org/jira/browse/HIVE-12083, and thus that patch must be 
present on all branches to which this one was applied.

> direct SQL for stats fails in 0-column case
> ---
>
> Key: HIVE-10965
> URL: https://issues.apache.org/jira/browse/HIVE-10965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.2.1, 1.0.2
>
> Attachments: HIVE-10965.01.patch, HIVE-10965.02.patch, 
> HIVE-10965.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-09 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12083:

Attachment: HIVE-12083.patch

Patch attached, with tests.

[~sershe]/[~thejas], could you please review?

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> -
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 1.0.2
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-12083.patch
>
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
> AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>List<String> partNames, List<String> colNames, boolean 
> useDensityFunctionForNDVEstimation)
>throws MetaException {
> +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); 
> // Nothing to aggregate.
>  long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
> colNames);
>  List<ColumnStatisticsObj> colStatsList;
>  // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
> 1: required list<ColumnStatisticsObj> colStats,
> 2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally, this would not occur, since HIVE-10965 also includes a client-side 
> guard for colNames.isEmpty() that avoids making the metastore call at all. 
> However, there is no such guard for partNames being empty, so the error would 
> still occur on the metastore side if the thrift call were invoked directly, 
> as would happen when the client is from an older version that predates this 
> patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-09 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12083:

Description: 
In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
AggrStats object to be returned if partNames is empty or colNames is empty:

{code}
diff --git 
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
index 0a56bac..ed810d2 100644
--- metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
+++ metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
@@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
   public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
  List<String> partNames, List<String> colNames, boolean 
useDensityFunctionForNDVEstimation)
   throws MetaException {
+if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); // 
Nothing to aggregate.
 long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
colNames);
 List<ColumnStatisticsObj> colStatsList;
 // Try to read from the cache first
{code}

This runs afoul of thrift requirements that AggrStats have required fields:

{code}
struct AggrStats {
1: required list<ColumnStatisticsObj> colStats,
2: required i64 partsFound // number of partitions for which stats were found
}
{code}

Thus, we get errors as follows:

{noformat}
2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
(TThreadPoolServer.java:run(213)) - Thrift error occurred during processing of 
message.
org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
unset! Struct:AggrStats(colStats:null, partsFound:0)
at 
org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Normally this would not occur, since HIVE-10965 also includes a client-side 
guard for colNames.isEmpty() that skips the metastore call entirely. However, 
there is no such guard for partNames being empty, so the metastore-side error 
can still occur if the thrift call is invoked directly, as would happen with a 
client from an older version that predates this patch.
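
For illustration, one way to keep the metastore-side short-circuit while still 
satisfying the required thrift fields is to return a fully-populated AggrStats 
(a minimal sketch; it assumes the generated all-args thrift constructor):

{code}
// Sketch: short-circuit with a valid AggrStats rather than a bare new AggrStats().
// Both required fields are set, so AggrStats.validate() passes at write time.
if (colNames.isEmpty() || partNames.isEmpty()) {
  return new AggrStats(new ArrayList<ColumnStatisticsObj>(), 0); // nothing to aggregate
}
{code}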

  was:
In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
AggrStats object to be returned if partNames is empty or colNames is empty:

{code}
diff --git 
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
index 0a56bac..ed810d2 100644
--- metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
+++ metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
@@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
   public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
  List<String> partNames, List<String> colNames, boolean 
useDensityFunctionForNDVEstimation)
   throws MetaException {
+if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); // 
Nothing to aggregate.
 long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
colNames);
 List<ColumnStatisticsObj> colStatsList;
 // Try to read from the cache first
{code}

This runs afoul of thrift requirements that AggrStats have 

[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-09 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12083:

Component/s: Metastore

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> -
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
> AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>    List<String> partNames, List<String> colNames, boolean 
> useDensityFunctionForNDVEstimation)
>throws MetaException {
> +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); 
> // Nothing to aggregate.
>  long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
> colNames);
>  List<ColumnStatisticsObj> colStatsList;
>  // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
> 1: required list<ColumnStatisticsObj> colStats,
> 2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally this would not occur, since HIVE-10965 also includes a client-side 
> guard for colNames.isEmpty() that skips the metastore call entirely. However, 
> there is no such guard for partNames being empty, so the metastore-side error 
> can still occur if the thrift call is invoked directly, as would happen with a 
> client from an older version that predates this patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-09 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12083:

Affects Version/s: 1.0.2
   1.2.1

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> -
>
> Key: HIVE-12083
> URL: https://issues.apache.org/jira/browse/HIVE-12083
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 1.0.2
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
> AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ 
> metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>    List<String> partNames, List<String> colNames, boolean 
> useDensityFunctionForNDVEstimation)
>throws MetaException {
> +if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); 
> // Nothing to aggregate.
>  long partsFound = partsFoundForPartitions(dbName, tableName, partNames, 
> colNames);
>  List<ColumnStatisticsObj> colStatsList;
>  // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
> 1: required list<ColumnStatisticsObj> colStats,
> 2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
> (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally this would not occur, since HIVE-10965 also includes a client-side 
> guard for colNames.isEmpty() that skips the metastore call entirely. However, 
> there is no such guard for partNames being empty, so the metastore-side error 
> can still occur if the thrift call is invoked directly, as would happen with a 
> client from an older version that predates this patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables

2015-10-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-4997:
---
Fix Version/s: (was: 0.13.0)

> HCatalog doesn't allow multiple input tables
> 
>
> Key: HIVE-4997
> URL: https://issues.apache.org/jira/browse/HIVE-4997
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Daniel Intskirveli
> Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same 
> MapReduce job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables

2015-10-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-4997:
---
Release Note:   (was: IncompatibleClassChangeError: Found interface 
org.apache.hadoop.mapreduce.JobContext, but class was expected)

> HCatalog doesn't allow multiple input tables
> 
>
> Key: HIVE-4997
> URL: https://issues.apache.org/jira/browse/HIVE-4997
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Daniel Intskirveli
> Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same 
> MapReduce job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables

2015-10-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-4997:
---
Tags:   (was: IncompatibleClassChangeError: Found interface 
org.apache.hadoop.mapreduce.JobContext, but class was expected)

> HCatalog doesn't allow multiple input tables
> 
>
> Key: HIVE-4997
> URL: https://issues.apache.org/jira/browse/HIVE-4997
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Daniel Intskirveli
> Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same 
> MapReduce job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables

2015-10-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949874#comment-14949874
 ] 

Sushanth Sowmyan commented on HIVE-4997:


Hi, [~Abhiram], I notice you marked this issue as resolved - however, the 
patch for this issue has not been committed to hive, and we have not decided 
to abandon it either, so it has not actually been resolved.

I'm reopening it; once the patch is updated, accepted, and committed, it can 
be resolved.

> HCatalog doesn't allow multiple input tables
> 
>
> Key: HIVE-4997
> URL: https://issues.apache.org/jira/browse/HIVE-4997
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Daniel Intskirveli
>Assignee: abhiram
> Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same 
> MapReduce job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-4997) HCatalog doesn't allow multiple input tables

2015-10-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan reopened HIVE-4997:

  Assignee: (was: abhiram)

> HCatalog doesn't allow multiple input tables
> 
>
> Key: HIVE-4997
> URL: https://issues.apache.org/jira/browse/HIVE-4997
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Daniel Intskirveli
> Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same 
> MapReduce job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables

2015-10-08 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-4997:
---
Hadoop Flags:   (was: Incompatible change)

> HCatalog doesn't allow multiple input tables
> 
>
> Key: HIVE-4997
> URL: https://issues.apache.org/jira/browse/HIVE-4997
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Daniel Intskirveli
> Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same 
> MapReduce job. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12012) select query on json table with map containing numeric values fails

2015-10-06 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945911#comment-14945911
 ] 

Sushanth Sowmyan commented on HIVE-12012:
-

Ah, sorry - when you pinged me last, I did not see you'd attached a patch for 
this - but yes, that patch fixes this issue. +1.

> select query on json table with map containing numeric values fails
> ---
>
> Key: HIVE-12012
> URL: https://issues.apache.org/jira/browse/HIVE-12012
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Jagruti Varia
>Assignee: Jason Dere
> Attachments: HIVE-12012.1.patch
>
>
> A select query on a json table throws this error if the table contains a map 
> type column:
> {noformat}
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: 
> org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not 
> numeric, can not use numeric value accessors
>  at [Source: java.io.ByteArrayInputStream@295f79b; line: 1, column: 26]
> {noformat}
> steps to reproduce the issue:
> {noformat}
> hive> create table c_complex(a array<string>, b map<string,int>) row format 
> serde 'org.apache.hive.hcatalog.data.JsonSerDe';
> OK
> Time taken: 0.319 seconds
> hive> insert into table c_complex select array('aaa'),map('aaa',1) from 
> studenttab10k limit 2;
> Query ID = hrt_qa_20150826183232_47deb33a-19c0-4d2b-a92f-726659eb9413
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1440603993714_0010)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> Reducer 2 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 11.75 s   
>  
> 
> Loading data to table default.c_complex
> Table default.c_complex stats: [numFiles=1, numRows=2, totalSize=56, 
> rawDataSize=0]
> OK
> Time taken: 13.706 seconds
> hive> select * from c_complex;
> OK
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: 
> org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not 
> numeric, can not use numeric value accessors
>  at [Source: java.io.ByteArrayInputStream@295f79b; line: 1, column: 26]
> Time taken: 0.115 seconds
> hive> select count(*) from c_complex;
> OK
> 2
> Time taken: 0.205 seconds, Fetched: 1 row(s)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8519) Hive metastore lock wait timeout

2015-10-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944297#comment-14944297
 ] 

Sushanth Sowmyan commented on HIVE-8519:


I notice a similar issue when I try to drop a table with about 5 partitions.

Essentially, what seems to be happening with that flow is the following:

a) Deleting a table requires deleting all partition objects for that table, 
Table->Partition is a 1:many mapping
b) Deleting the partition objects requires deleting all SD objects associated 
with the partitions, Partition->SD is a 1:1 mapping
c) Deleting SD objects requires looking for all CDs pointed to by the SDs, and 
wherever a CD has no more SDs pointing to it, we need to drop the CD in 
question, SD->CD is a many:1 mapping.
d) If a CD is to be deleted, we need to drop all List<MFieldSchema> entries 
associated with it (COLUMNS_V2 where CD_ID in list of CDs to delete.)

The big inefficiency here is that SD->CD is a many:1 mapping intended to allow 
CD reuse for efficiency, but in practice we never reuse them. Yet the fact that 
it is many:1, not 1:1, means we must do that additional reference check before 
dropping rather than simply dropping. This combination gives us the worst of 
both worlds.

We need to rethink the way we use our objects and either drop the many:1 intent 
or actually make sure that we create a unique CD for every SD, or this is not 
going to be scalable. Other solutions that bypass this wonky model may also 
exist that we have to work out.
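
As a rough sketch of the check-before-drop cost described in (c) and (d) above 
(the method and field names are assumptions modeled on the ObjectStore calls 
visible in stack traces like the one below, not the exact implementation):

{code}
// Sketch: the many:1 SD->CD mapping forces a per-SD reference check before a
// CD (and its COLUMNS_V2 rows) can be dropped.
private void dropSdAndMaybeCd(PersistenceManager pm, MStorageDescriptor sd) {
  MColumnDescriptor cd = sd.getCD();
  pm.deletePersistent(sd);
  // Expensive part: a query per SD to see whether any other SD still
  // references the same CD.
  if (cd != null && listStorageDescriptorsWithCD(cd, 1).isEmpty()) {
    pm.deletePersistent(cd); // cascades to the CD's COLUMNS_V2 rows
  }
}
{code}

If SD->CD were 1:1 (a unique CD per SD), the reference check - and its per-SD 
query - would disappear entirely.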

> Hive metastore lock wait timeout
> 
>
> Key: HIVE-8519
> URL: https://issues.apache.org/jira/browse/HIVE-8519
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
>Reporter: Liao, Xiaoge
>
> We got a lot of exceptions like the one below when dropping a table 
> partition, which made hive queries very, very slow. For example, it cost 
> 250s to execute use db_test;
> Log:
> 2014-10-17 04:04:46,873 ERROR Datastore.Persist (Log4JLogger.java:error(115)) 
> - Update of object 
> "org.apache.hadoop.hive.metastore.model.MStorageDescriptor@13c9c4b3" using 
> statement "UPDATE `SDS` SET `CD_ID`=? WHERE `SD_ID`=?" failed : 
> java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1074)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4096)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4028)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2490)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2734)
> at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2458)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2375)
> at 
> com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2359)
> at 
> org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
> at 
> org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
> at 
> org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:399)
> at 
> org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:439)
> at 
> org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:374)
> at 
> org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateTable(RDBMSPersistenceHandler.java:417)
> at 
> org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObject(RDBMSPersistenceHandler.java:390)
> at 
> org.datanucleus.state.JDOStateManager.flush(JDOStateManager.java:5012)
> at org.datanucleus.FlushOrdered.execute(FlushOrdered.java:106)
> at 
> org.datanucleus.ExecutionContextImpl.flushInternal(ExecutionContextImpl.java:4019)
> at 
> org.datanucleus.ExecutionContextThreadedImpl.flushInternal(ExecutionContextThreadedImpl.java:450)
> at org.datanucleus.store.query.Query.prepareDatastore(Query.java:1575)
> at org.datanucleus.store.query.Query.executeQuery(Query.java:1760)
> at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
> at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:243)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.listStorageDescriptorsWithCD(ObjectStore.java:2185)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.removeUnusedColumnDescriptor(ObjectStore.java:2131)
> at 
> 

[jira] [Commented] (HIVE-11676) implement metastore API to do file footer PPD

2015-10-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944199#comment-14944199
 ] 

Sushanth Sowmyan commented on HIVE-11676:
-

+cc [~mithun] who was interested in this sort of api a while back.

> implement metastore API to do file footer PPD
> -
>
> Key: HIVE-11676
> URL: https://issues.apache.org/jira/browse/HIVE-11676
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11676.01.patch, HIVE-11676.patch
>
>
> Need to pass on the expression/sarg, extract column stats from footer (at 
> write time?) and then apply one to the other.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11852) numRows and rawDataSize table properties are not replicated

2015-10-02 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941642#comment-14941642
 ] 

Sushanth Sowmyan commented on HIVE-11852:
-

[~ashutoshc], the problem with a config property here is that this stats squish 
I'm trying to prevent does not happen on the ql-side. This happens on the 
metastore, from the AlterTableHandler where an alter table gets issued from the 
client side. The metastore then decides that since the table has been altered, 
the table is now different, and thus, stats must be nuked.

I feel that if the decision to nuke the stats were made on the ql side rather 
than by the metastore, that would be cleaner and would avoid this problem. But 
if stats squishing and table altering then became two different metastore 
calls, we would run into issues where one succeeds and the other does not, 
leading to incorrect data elsewhere, apart from the performance implications.
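
For what it's worth, one shape the ql-side signal could take is an 
EnvironmentContext flag on the alter call, so the metastore knows the existing 
stats are still valid (a sketch; the flag name and client method shown are 
assumptions based on later Hive code, not this patch):

{code}
// Sketch: the ql side marks the alter as stats-preserving, so the metastore
// alter handler does not nuke numRows/rawDataSize.
EnvironmentContext ec = new EnvironmentContext();
ec.putToProperties(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE);
msClient.alter_table_with_environmentContext(dbName, tblName, newTable, ec);
{code}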

> numRows and rawDataSize table properties are not replicated
> ---
>
> Key: HIVE-11852
> URL: https://issues.apache.org/jira/browse/HIVE-11852
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 1.2.1
>Reporter: Paul Isaychuk
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11852.patch
>
>
> numRows and rawDataSize table properties are not replicated when exported for 
> replication and re-imported.
> {code}
> Table drdbnonreplicatabletable.vanillatable has different TblProps from 
> drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, 
> totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}]
> java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has 
> different TblProps from drdbnonreplicatabletable.vanillatable expected 
> [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found 
> [{numFiles=1, totalSize=560}]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12012) select query on json table with map containing numeric values fails

2015-10-02 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941874#comment-14941874
 ] 

Sushanth Sowmyan commented on HIVE-12012:
-

Thanks for the report, Jason. Sure, I can look further into this. I had looked 
at HCATALOG-630 a long time back but I seem to remember that I could not 
reproduce that at the time. If we have a more recent reproduction, it 
definitely is worth investigating.

Tests for JsonSerDe are mostly in TestJsonSerDe, instead of in .q files, since 
it descends from HCatalog - that seems to test map<string,string> and 
map<string,int> as the basic cases, which work.

I'll try to reproduce and dig further.
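
For a quick local check, a TestJsonSerDe-style round trip over this schema 
would look roughly like the following (a sketch; the property names come from 
serdeConstants, the rest is assumed):

{code}
// Sketch: round-trip a row with a map<string,int> column through JsonSerDe.
Properties props = new Properties();
props.setProperty(serdeConstants.LIST_COLUMNS, "a,b");
props.setProperty(serdeConstants.LIST_COLUMN_TYPES, "array<string>,map<string,int>");
JsonSerDe serde = new JsonSerDe();
serde.initialize(new Configuration(), props);
Text json = new Text("{\"a\":[\"aaa\"],\"b\":{\"aaa\":1}}");
HCatRecord rec = (HCatRecord) serde.deserialize(json); // fails here before the fix
{code}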

> select query on json table with map containing numeric values fails
> ---
>
> Key: HIVE-12012
> URL: https://issues.apache.org/jira/browse/HIVE-12012
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Jagruti Varia
>Assignee: Jason Dere
> Attachments: HIVE-12012.1.patch
>
>
> A select query on a json table throws this error if the table contains a map 
> type column:
> {noformat}
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: 
> org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not 
> numeric, can not use numeric value accessors
>  at [Source: java.io.ByteArrayInputStream@295f79b; line: 1, column: 26]
> {noformat}
> steps to reproduce the issue:
> {noformat}
> hive> create table c_complex(a array<string>, b map<string,int>) row format 
> serde 'org.apache.hive.hcatalog.data.JsonSerDe';
> OK
> Time taken: 0.319 seconds
> hive> insert into table c_complex select array('aaa'),map('aaa',1) from 
> studenttab10k limit 2;
> Query ID = hrt_qa_20150826183232_47deb33a-19c0-4d2b-a92f-726659eb9413
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1440603993714_0010)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> Reducer 2 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 11.75 s   
>  
> 
> Loading data to table default.c_complex
> Table default.c_complex stats: [numFiles=1, numRows=2, totalSize=56, 
> rawDataSize=0]
> OK
> Time taken: 13.706 seconds
> hive> select * from c_complex;
> OK
> Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: 
> org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not 
> numeric, can not use numeric value accessors
>  at [Source: java.io.ByteArrayInputStream@295f79b; line: 1, column: 26]
> Time taken: 0.115 seconds
> hive> select count(*) from c_complex;
> OK
> 2
> Time taken: 0.205 seconds, Fetched: 1 row(s)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11898) support default partition in metastoredirectsql

2015-09-30 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938920#comment-14938920
 ] 

Sushanth Sowmyan commented on HIVE-11898:
-

+1. (I've looked at the patch, and it makes sense as something that's not 
wrong, but I have not verified the test results that Sergey says pass for him.)

One thing though - in order for this jira to be more readable in the future 
when we come across this, could you please edit in a description for this issue?

> support default partition in metastoredirectsql
> ---
>
> Key: HIVE-11898
> URL: https://issues.apache.org/jira/browse/HIVE-11898
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11898.01.patch, HIVE-11898.02.patch, 
> HIVE-11898.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11852) numRows and rawDataSize table properties are not replicated

2015-09-22 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903322#comment-14903322
 ] 

Sushanth Sowmyan commented on HIVE-11852:
-

[~alangates], can I bug you for a review? (Most of the patch file size is the 
.q and the .out, I promise this time it's not a huge patch dump. :D )

> numRows and rawDataSize table properties are not replicated
> ---
>
> Key: HIVE-11852
> URL: https://issues.apache.org/jira/browse/HIVE-11852
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 1.2.1
>Reporter: Paul Isaychuk
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11852.patch
>
>
> numRows and rawDataSize table properties are not replicated when exported for 
> replication and re-imported.
> {code}
> Table drdbnonreplicatabletable.vanillatable has different TblProps from 
> drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, 
> totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}]
> java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has 
> different TblProps from drdbnonreplicatabletable.vanillatable expected 
> [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found 
> [{numFiles=1, totalSize=560}]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11852) numRows and rawDataSize table properties are not replicated

2015-09-16 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11852:

Reporter: Paul Isaychuk  (was: Sushanth Sowmyan)

> numRows and rawDataSize table properties are not replicated
> ---
>
> Key: HIVE-11852
> URL: https://issues.apache.org/jira/browse/HIVE-11852
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 1.2.1
>Reporter: Paul Isaychuk
>Assignee: Sushanth Sowmyan
>
> numRows and rawDataSize table properties are not replicated when exported for 
> replication and re-imported.
> {code}
> Table drdbnonreplicatabletable.vanillatable has different TblProps from 
> drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, 
> totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}]
> java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has 
> different TblProps from drdbnonreplicatabletable.vanillatable expected 
> [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found 
> [{numFiles=1, totalSize=560}]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11852) numRows and rawDataSize table properties are not replicated

2015-09-16 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791297#comment-14791297
 ] 

Sushanth Sowmyan commented on HIVE-11852:
-

The issue here is that a MoveTask, run as part of the import process, issues 
an alter_table that nukes the stats that were just created. On digging 
further, I discovered a couple of other cases that would result in the same 
stats-squishing behaviour.

Patch attached to fix them, and a .q file to test them.

> numRows and rawDataSize table properties are not replicated
> ---
>
> Key: HIVE-11852
> URL: https://issues.apache.org/jira/browse/HIVE-11852
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 1.2.1
>Reporter: Paul Isaychuk
>Assignee: Sushanth Sowmyan
>
> numRows and rawDataSize table properties are not replicated when exported for 
> replication and re-imported.
> {code}
> Table drdbnonreplicatabletable.vanillatable has different TblProps from 
> drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, 
> totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}]
> java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has 
> different TblProps from drdbnonreplicatabletable.vanillatable expected 
> [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found 
> [{numFiles=1, totalSize=560}]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11852) numRows and rawDataSize table properties are not replicated

2015-09-16 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11852:

Attachment: HIVE-11852.patch

> numRows and rawDataSize table properties are not replicated
> ---
>
> Key: HIVE-11852
> URL: https://issues.apache.org/jira/browse/HIVE-11852
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 1.2.1
>Reporter: Paul Isaychuk
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11852.patch
>
>
> numRows and rawDataSize table properties are not replicated when exported for 
> replication and re-imported.
> {code}
> Table drdbnonreplicatabletable.vanillatable has different TblProps from 
> drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, 
> totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}]
> java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has 
> different TblProps from drdbnonreplicatabletable.vanillatable expected 
> [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found 
> [{numFiles=1, totalSize=560}]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11819) HiveServer2 catches OOMs on request threads

2015-09-14 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744414#comment-14744414
 ] 

Sushanth Sowmyan commented on HIVE-11819:
-

Patch makes a lot of sense, and I was talking to Thejas about the possibility 
of a bug like this just last week. [~vgumashta], could you please review? I'm 
+1 on it in theory, but since I'm still fairly new to the HS2 side of things, 
do not consider myself binding on this review.

> HiveServer2 catches OOMs on request threads
> ---
>
> Key: HIVE-11819
> URL: https://issues.apache.org/jira/browse/HIVE-11819
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11819.patch
>
>
> ThriftCLIService methods such as ExecuteStatement are apparently capable of 
> catching OOMs because they get wrapped in RTE by HiveSessionProxy. 
> This shouldn't happen.
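
The usual guard against this (a sketch of the general pattern, not the 
attached patch) is to unwrap the reflective exception in the proxy's invoke() 
and rethrow Errors rather than wrapping them:

{code}
// Sketch: never convert an Error (e.g. OutOfMemoryError) into a catchable RTE.
try {
  return method.invoke(target, args);
} catch (InvocationTargetException e) {
  Throwable cause = e.getCause();
  if (cause instanceof Error) {
    throw (Error) cause;                 // let OOMs propagate
  }
  if (cause instanceof RuntimeException) {
    throw (RuntimeException) cause;
  }
  throw new RuntimeException(cause);
} catch (IllegalAccessException e) {
  throw new RuntimeException(e);
}
{code}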



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views

2015-09-10 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739405#comment-14739405
 ] 

Sushanth Sowmyan commented on HIVE-11510:
-

+1, committing to branch-1 and master.

> Metatool updateLocation warning on views
> 
>
> Key: HIVE-11510
> URL: https://issues.apache.org/jira/browse/HIVE-11510
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 0.14.0
>Reporter: Eric Czech
>Assignee: Wei Zheng
> Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch, 
> HIVE-11510.3.patch
>
>
> If views are present in a hive database, issuing a 'hive metatool 
> -updateLocation' command will result in an error like this:
> ...
> Warning: Found records with bad LOCATION in SDS table.. 
> bad location URI: null
> bad location URI: null
> bad location URI: null
> 
> Based on the source code for Metatool, it looks like there would then be a 
> "bad location URI: null" message for every view and it also appears this is 
> happening simply because the 'sds' table in the hive schema has a column 
> called location that is NULL only for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11657) HIVE-2573 introduces some issues during metastore init (and CLI init)

2015-09-02 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727709#comment-14727709
 ] 

Sushanth Sowmyan commented on HIVE-11657:
-

+1.

> HIVE-2573 introduces some issues during metastore init (and CLI init)
> -
>
> Key: HIVE-11657
> URL: https://issues.apache.org/jira/browse/HIVE-11657
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HIVE-11657.patch
>
>
> HIVE-2573 introduced a static reload-functions call.
> It has a few problems:
> 1) When metastore client is initialized using an externally supplied config 
> (i.e. Hive.get(HiveConf)), it still gets called during static init using the 
> main service config. In my case, even though I have uris in the supplied 
> config to connect to remote MS (which eventually happens), the static call 
> creates objectstore, which is undesirable.
> 2) It breaks compat - old metastores do not support this call so new clients 
> will fail, and there's no workaround like not using a new feature because the 
> static call is always made



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-09-02 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727705#comment-14727705
 ] 

Sushanth Sowmyan commented on HIVE-11668:
-

Change looks good, and I've tested it out on mysql to make sure there are no 
surprises. +1.

> make sure directsql calls pre-query init when needed
> 
>
> Key: HIVE-11668
> URL: https://issues.apache.org/jira/browse/HIVE-11668
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11668.01.patch, HIVE-11668.02.patch, 
> HIVE-11668.patch
>
>
> See comments in HIVE-11123



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.

2015-09-01 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726411#comment-14726411
 ] 

Sushanth Sowmyan commented on HIVE-11123:
-

I think HIVE-11668 is close to being committed (after I verify), so we can 
hold off on reverting this, since it is actually still useful. Otherwise, I'd 
agree that this was a revert candidate.

> Fix how to confirm the RDBMS product name at Metastore.
> ---
>
> Key: HIVE-11123
> URL: https://issues.apache.org/jira/browse/HIVE-11123
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.0
> Environment: PostgreSQL
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch, 
> HIVE-11123.3.patch, HIVE-11123.4.patch, HIVE-11123.4a.patch
>
>
> I use PostgreSQL for the Hive Metastore, and I saw the following messages in 
> the PostgreSQL log.
> {code}
> < 2015-06-26 10:58:15.488 JST >ERROR:  syntax error at or near "@@" at 
> character 5
> < 2015-06-26 10:58:15.488 JST >STATEMENT:  SET @@session.sql_mode=ANSI_QUOTES
> < 2015-06-26 10:58:15.489 JST >ERROR:  relation "v$instance" does not exist 
> at character 21
> < 2015-06-26 10:58:15.489 JST >STATEMENT:  SELECT version FROM v$instance
> < 2015-06-26 10:58:15.490 JST >ERROR:  column "version" does not exist at 
> character 10
> < 2015-06-26 10:58:15.490 JST >STATEMENT:  SELECT @@version
> {code}
> When the Hive CLI or Beeline in embedded mode is run, these messages are 
> output to the PostgreSQL log.
> These queries are issued from MetaStoreDirectSql#determineDbType; if we use 
> MetaStoreDirectSql#getProductName instead, we need not issue them.
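
For context, the getProductName approach goes through standard JDBC metadata 
instead of dialect-specific probes (a minimal sketch; SQLException handling 
abbreviated):

{code}
// Sketch: derive the RDBMS product name from JDBC metadata.
JDOConnection jdoConn = pm.getDataStoreConnection();
try {
  Connection conn = (Connection) jdoConn.getNativeConnection();
  String productName = conn.getMetaData().getDatabaseProductName();
  // e.g. "PostgreSQL", "MySQL", "Oracle", "Apache Derby"
} catch (SQLException e) {
  // fall back to a safe default dialect
} finally {
  jdoConn.close(); // return the connection to the JDO-managed pool
}
{code}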



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-09-01 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726161#comment-14726161
 ] 

Sushanth Sowmyan commented on HIVE-11668:
-

I'll respond to this by the end of the day today - I wanted to test this out.

> make sure directsql calls pre-query init when needed
> 
>
> Key: HIVE-11668
> URL: https://issues.apache.org/jira/browse/HIVE-11668
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11668.01.patch, HIVE-11668.02.patch, 
> HIVE-11668.patch
>
>
> See comments in HIVE-11123



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views

2015-08-31 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723808#comment-14723808
 ] 

Sushanth Sowmyan commented on HIVE-11510:
-

Thank you Wei, this looks good. +1.

> Metatool updateLocation warning on views
> 
>
> Key: HIVE-11510
> URL: https://issues.apache.org/jira/browse/HIVE-11510
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 0.14.0
>Reporter: Eric Czech
>Assignee: Wei Zheng
> Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch
>
>
> If views are present in a hive database, issuing a 'hive metatool 
> -updateLocation' command will result in an error like this:
> ...
> Warning: Found records with bad LOCATION in SDS table.. 
> bad location URI: null
> bad location URI: null
> bad location URI: null
> 
> Based on the source code for Metatool, it looks like there would then be a 
> "bad location URI: null" message for every view and it also appears this is 
> happening simply because the 'sds' table in the hive schema has a column 
> called location that is NULL only for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed

2015-08-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720412#comment-14720412
 ] 

Sushanth Sowmyan commented on HIVE-11668:
-

This patch will still have an issue, as observed by [~wzheng] earlier today:

{noformat}
Caused by: org.datanucleus.api.jdo.exceptions.TransactionNotActiveException: 
Transaction is not active. You either need to define a transaction around this, 
or run your PersistenceManagerFactory with 'NontransactionalRead' and 
'NontransactionalWrite' set to 'true'
FailedObject:org.datanucleus.exceptions.TransactionNotActiveException: 
Transaction is not active. You either need to define a transaction around this, 
or run your PersistenceManagerFactory with 'NontransactionalRead' and 
'NontransactionalWrite' set to 'true'
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:396)
at org.datanucleus.api.jdo.JDOTransaction.rollback(JDOTransaction.java:186)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:196)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.init(MetaStoreDirectSql.java:137)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:335)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:286)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:601)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:579)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:632)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:468)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5815)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:203)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
... 19 more
{noformat}

The issue here is this. Earlier, the runDbCheck() function was instantiating a 
transaction if it wasn't already open. So, as long as we were determining the 
db type by using runDbCheck, we were opening the txn as a side-effect (ugh). 
Now, by determining the product name by the jdbc provider, we're not calling 
runDbCheck, and thus, the txn is never opened.

You need the following in your chain, hopefully in a more sane place than in 
runDbCheck():

{noformat}
 Transaction tx = pm.currentTransaction();
+if (!tx.isActive()) {
+  tx.begin();
+}
{noformat}
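
Concretely, the guard would live in the pre-query init path rather than in 
runDbCheck() (a sketch; the method body is abbreviated):

{code}
// Sketch: open the txn explicitly before issuing pre-query init statements,
// instead of relying on runDbCheck() opening it as a side-effect.
private void ensureDbInit() {
  Transaction tx = pm.currentTransaction();
  if (!tx.isActive()) {
    tx.begin();
  }
  try {
    // ... issue the DB-specific pre-query statements here ...
    tx.commit();
  } finally {
    if (tx.isActive()) {
      tx.rollback(); // clean up if the init statements failed
    }
  }
}
{code}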



 make sure directsql calls pre-query init when needed
 

 Key: HIVE-11668
 URL: https://issues.apache.org/jira/browse/HIVE-11668
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11668.patch


 See comments in HIVE-11123



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.

2015-08-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720402#comment-14720402
 ] 

Sushanth Sowmyan commented on HIVE-11123:
-

Also, this patch broke hive when running against mysql and potentially other 
dbs - I will follow up with comments on HIVE-11668. Testing with derby alone 
in unit test mode is problematic. Sorry I didn't catch this before it was 
committed.

 Fix how to confirm the RDBMS product name at Metastore.
 ---

 Key: HIVE-11123
 URL: https://issues.apache.org/jira/browse/HIVE-11123
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0
 Environment: PostgreSQL
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch, 
 HIVE-11123.3.patch, HIVE-11123.4.patch, HIVE-11123.4a.patch


 I use PostgreSQL for the Hive Metastore, and I saw the following messages in 
 the PostgreSQL log.
 {code}
 < 2015-06-26 10:58:15.488 JST >ERROR:  syntax error at or near "@@" at 
 character 5
 < 2015-06-26 10:58:15.488 JST >STATEMENT:  SET @@session.sql_mode=ANSI_QUOTES
 < 2015-06-26 10:58:15.489 JST >ERROR:  relation "v$instance" does not exist 
 at character 21
 < 2015-06-26 10:58:15.489 JST >STATEMENT:  SELECT version FROM v$instance
 < 2015-06-26 10:58:15.490 JST >ERROR:  column "version" does not exist at 
 character 10
 < 2015-06-26 10:58:15.490 JST >STATEMENT:  SELECT @@version
 {code}
 When the Hive CLI or Beeline in embedded mode is run, these messages are 
 output to the PostgreSQL log.
 These queries are issued from MetaStoreDirectSql#determineDbType; if we use 
 MetaStoreDirectSql#getProductName instead, we need not issue them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views

2015-08-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720434#comment-14720434
 ] 

Sushanth Sowmyan commented on HIVE-11510:
-

With the current patch, the metastore will do a LOG.debug for every single 
null record, which can be a lot of output, and will also slow that process 
down considerably.

Would it be possible to simply update the UpdateMStorageDescriptorTblURIRetVal 
class with an int numNullRecords initialized to zero and incremented each time 
you get a null? Also, in that case, I would imagine that we shouldn't add that 
location to badRecords, since that would bloat up the size of badRecords 
unnecessarily. After we do that, we can log a single summary line in 
HiveMetaTool.printTblURIUpdateSummary along with the other statistics, 
mentioning how many null records we found, and that that is okay if the user 
has that many indexes/views.
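
Roughly, the suggested change looks like this (a sketch; field and method 
names assumed):

{code}
// Sketch: count NULL locations once instead of logging each view/index row.
int numNullRecords = 0; // new field on UpdateMStorageDescriptorTblURIRetVal

if (location == null) {
  numNullRecords++;     // views/indexes legitimately have NULL SDS.LOCATION
  continue;             // and do not belong in badRecords
}

// later, in HiveMetaTool.printTblURIUpdateSummary():
System.out.println("Found " + numNullRecords
    + " records with NULL location (expected if you have that many views/indexes)");
{code}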

 Metatool updateLocation warning on views
 

 Key: HIVE-11510
 URL: https://issues.apache.org/jira/browse/HIVE-11510
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Affects Versions: 0.14.0
Reporter: Eric Czech
Assignee: Wei Zheng
 Attachments: HIVE-11510.1.patch


 If views are present in a hive database, issuing a 'hive metatool 
 -updateLocation' command will result in an error like this:
 ...
 Warning: Found records with bad LOCATION in SDS table.. 
 bad location URI: null
 bad location URI: null
 bad location URI: null
 
 Based on the source code for Metatool, it looks like there would then be a 
 bad location URI: null message for every view and it also appears this is 
 happening simply because the 'sds' table in the hive schema has a column 
 called location that is NULL only for views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-08-27 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan reopened HIVE-8678:


(Actually, maybe "not a problem" is an incorrect status, since it would 
indicate that the report is accurate, but working as designed. Reopening to 
close it again.)

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan
 Fix For: 1.2.2


 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}
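
A type-tolerant version of that conversion (a sketch; variable names assumed) 
would accept either representation:

{code}
// Sketch: handle both the String and java.sql.Date shapes of the field.
java.sql.Date d;
if (o instanceof java.sql.Date) {
  d = (java.sql.Date) o;
} else {
  d = java.sql.Date.valueOf(o.toString()); // expects "yyyy-mm-dd"
}
{code}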



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-08-27 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan resolved HIVE-8678.

Resolution: Cannot Reproduce

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan
 Fix For: 1.2.2


 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-08-27 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717588#comment-14717588
 ] 

Sushanth Sowmyan commented on HIVE-8678:


Closed as Cannot Reproduce.

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan
 Fix For: 1.2.2


 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-08-27 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan resolved HIVE-8678.

   Resolution: Not A Problem
Fix Version/s: 1.2.2

Resolving as Not A Problem as of branch-1.2, since this problem is not 
reproducible in the newer releases of Hive.

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan
 Fix For: 1.2.2


 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-7000) Several issues with javadoc generation

2015-08-25 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan resolved HIVE-7000.

Resolution: Not A Problem

 Several issues with javadoc generation
 --

 Key: HIVE-7000
 URL: https://issues.apache.org/jira/browse/HIVE-7000
 Project: Hive
  Issue Type: Improvement
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7000.1.patch, javadoc_secondstab.patch


 1.
 Ran 'mvn  javadoc:javadoc -Phadoop-2'.  Encountered several issues
 - Generated classes are included in the javadoc
 - generation fails in the top level hcatalog folder because its src folder  
 contains  no java files.
 Patch attached to fix these issues.
 2.
 Tried mvn javadoc:aggregate -Phadoop-2 
 - cannot get an aggregated javadoc for all of hive
 - tried setting 'aggregate' parameter to true. Didn't work
 There are several questions in StackOverflow about multiple project javadoc. 
 Seems like this is broken. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11599) Add metastore command to dump its configs

2015-08-24 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709871#comment-14709871
 ] 

Sushanth Sowmyan commented on HIVE-11599:
-

+1 to intent, this would be most useful.

 Add metastore command to dump its configs
 --

 Key: HIVE-11599
 URL: https://issues.apache.org/jira/browse/HIVE-11599
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Metastore
Affects Versions: 1.0.0
Reporter: Eugene Koifman

 We should have an equivalent of the Hive CLI 'set' command on the Metastore 
 (and likely HS2) which can dump out all the properties this particular 
 process is running with.
 cc [~thejas]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9583) Rolling upgrade of Hive MetaStore Server

2015-08-24 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan resolved HIVE-9583.

   Resolution: Fixed
Fix Version/s: 1.2.2

(Marking as fixed on the 1.2 line, since per Thiruvel, all the tasks inside 
this are done, and were done as of 1.2.0)

 Rolling upgrade of Hive MetaStore Server
 

 Key: HIVE-9583
 URL: https://issues.apache.org/jira/browse/HIVE-9583
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.14.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
  Labels: hcatalog, metastore
 Fix For: 1.2.2


 This is an umbrella JIRA to track all rolling upgrade JIRAs w.r.t MetaStore 
 server. This will be helpful for users deploying Metastore server and 
 connecting to it with HCatalog or Hive CLI interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11607) Export tables broken for data > 32 MB

2015-08-20 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705798#comment-14705798
 ] 

Sushanth Sowmyan commented on HIVE-11607:
-

Looks good to me, +1. (Agree with Swarnim's comment on RB as well, in that 
comments on the default options being set for DistCpOptions might be nice)

+cc [~mithun]

 Export tables broken for data > 32 MB
 -

 Key: HIVE-11607
 URL: https://issues.apache.org/jira/browse/HIVE-11607
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Affects Versions: 1.0.0, 1.2.0, 1.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-11607.patch


 Broken for both hadoop-1 as well as hadoop-2 line



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11607) Export tables broken for data > 32 MB

2015-08-19 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703992#comment-14703992
 ] 

Sushanth Sowmyan commented on HIVE-11607:
-

Also, Hadoop20Shims.runDistCp seems to refer to 
org.apache.hadoop.tools.distcp2 as a classname - since 
org.apache.hadoop.tools.distcp2.DistCp would be the appropriate class, I'm not 
sure it works for 1.0 either unless I'm reading this incorrectly.


 Export tables broken for data > 32 MB
 -

 Key: HIVE-11607
 URL: https://issues.apache.org/jira/browse/HIVE-11607
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Affects Versions: 1.0.0, 1.2.0, 1.1.0
Reporter: Ashutosh Chauhan

 Broken for both hadoop-1 as well as hadoop-2 line



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata

2015-08-14 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697900#comment-14697900
 ] 

Sushanth Sowmyan commented on HIVE-11552:
-

+cc [~thejas]

 implement basic methods for getting/putting file metadata
 -

 Key: HIVE-11552
 URL: https://issues.apache.org/jira/browse/HIVE-11552
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: hbase-metastore-branch

 Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, 
 HIVE-11552.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11456) HCatStorer should honor mapreduce.output.basename

2015-08-05 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658961#comment-14658961
 ] 

Sushanth Sowmyan commented on HIVE-11456:
-

Thanks for the fix - I have an additional question to verify if this causes a 
problem. In the case of appends, where a previous file already exists, it's 
possible that HCat would add an additional suffix to the resultant file, as 
noted by the following:

https://github.com/apache/hive/blob/master/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java#L650-L656

I want to make sure that this is not a surprise to you, and is okay?
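
(For reference, the linked lines implement roughly this idea - a minimal 
sketch with a hypothetical helper, not the actual FileOutputCommitterContainer 
code:)

{code}
// Sketch only (hypothetical helper, not the actual FileOutputCommitterContainer
// code): if the target name is taken, append _copy_1, _copy_2, ... until a
// free name is found.
import java.util.HashSet;
import java.util.Set;

public class AppendSuffixSketch {
  static String disambiguate(String base, Set<String> existing) {
    if (!existing.contains(base)) {
      return base;
    }
    int n = 1;
    while (existing.contains(base + "_copy_" + n)) {
      n++;
    }
    return base + "_copy_" + n;
  }

  public static void main(String[] args) {
    Set<String> files = new HashSet<>();
    files.add("part-r-00000");
    // Prints part-r-00000_copy_1, since the base name is already taken.
    System.out.println(disambiguate("part-r-00000", files));
  }
}
{code}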

 HCatStorer should honor mapreduce.output.basename
 -

 Key: HIVE-11456
 URL: https://issues.apache.org/jira/browse/HIVE-11456
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Rohini Palaniswamy
Assignee: Mithun Radhakrishnan
Priority: Critical
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-11456.1.patch


 Pig on Tez scripts with union directly followed by HCatStorer have a problem 
 due to HCatStorer not honoring mapreduce.output.basename and always using 
 "part". Tez sets mapreduce.output.basename to part-v000-o000 (vertex id 
 followed by output id). With the union optimizer, Pig uses vertex groups to 
 write directly from both the vertices to the final output directory. Since 
 HCat ignores mapreduce.output.basename, both the vertices produce 
 "part-r-n" files, and when they are moved from the temp location to the final 
 directory, they just overwrite each other. There is no failure, and only one 
 of the files with that name makes it into the final directory.
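
(Illustration: a minimal sketch, not the actual HIVE-11456 patch, of deriving 
the output file name from mapreduce.output.basename instead of hardcoding 
"part":)

{code}
// Sketch only (not the HIVE-11456 patch): derive output file names from
// mapreduce.output.basename instead of hardcoding "part", so two vertices
// writing into the same directory get distinct names.
import org.apache.hadoop.conf.Configuration;

public class BasenameSketch {
  // Falls back to the MapReduce default "part" when nothing is configured.
  static String outputBaseName(Configuration conf) {
    return conf.get("mapreduce.output.basename", "part");
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("mapreduce.output.basename", "part-v000-o000"); // as Tez would set it
    // A reducer output file is then named e.g. part-v000-o000-r-00000
    System.out.println(String.format("%s-r-%05d", outputBaseName(conf), 0));
  }
}
{code}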



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-08-03 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652435#comment-14652435
 ] 

Sushanth Sowmyan commented on HIVE-8678:


On digging further, my issues in the 0.13.1 VM were a different issue from the 
one reported here, and were related to Pig's joda-time being an older library 
than needed. It was solved by adding a joda-time-2.1.jar to PIG_CLASSPATH, and 
setting PIG_USER_CLASSPATH_FIRST so that it was picked up first. At this point, 
I am not able to reproduce this issue with 0.13.1 either.



 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan

 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648281#comment-14648281
 ] 

Sushanth Sowmyan commented on HIVE-11407:
-

The edits look good, +1.

 JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
 --

 Key: HIVE-11407
 URL: https://issues.apache.org/jira/browse/HIVE-11407
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch


 With around 7000 tables having around 1500 columns each, and 512MB of HS2 
 memory, I am able to reproduce this OOM .
 Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-07-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646429#comment-14646429
 ] 

Sushanth Sowmyan commented on HIVE-10165:
-

I think the examples are good and on-point as a guideline for new users - thank 
you for finding them. :) I don't think any further emphasis is needed. Also, 
thank you for the clarification on the fix-version setting there. That's 
something that pops up often.

 Improve hive-hcatalog-streaming extensibility and support updates and deletes.
 --

 Key: HIVE-10165
 URL: https://issues.apache.org/jira/browse/HIVE-10165
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
  Labels: TODOC2.0, streaming_api
 Fix For: 2.0.0

 Attachments: HIVE-10165.0.patch, HIVE-10165.10.patch, 
 HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, 
 HIVE-10165.7.patch, HIVE-10165.9.patch, mutate-system-overview.png


 h3. Overview
 I'd like to extend the 
 [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
  API so that it also supports the writing of record updates and deletes in 
 addition to the already supported inserts.
 h3. Motivation
 We have many Hadoop processes outside of Hive that merge changed facts into 
 existing datasets. Traditionally we achieve this by: reading in a 
 ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
 sequence and then applying a function to determine inserted, updated, and 
 deleted rows. However, in our current scheme we must rewrite all partitions 
 that may potentially contain changes. In practice the number of mutated 
 records is very small when compared with the records contained in a 
 partition. This approach results in a number of operational issues:
 * Excessive amount of write activity required for small data changes.
 * Downstream applications cannot robustly read these datasets while they are 
 being updated.
 * Due to scale of the updates (hundreds or partitions) the scope for 
 contention is high. 
 I believe we can address this problem by instead writing only the changed 
 records to a Hive transactional table. This should drastically reduce the 
 amount of data that we need to write and also provide a means for managing 
 concurrent access to the data. Our existing merge processes can read and 
 retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
 an updated form of the hive-hcatalog-streaming API which will then have the 
 required data to perform an update or insert in a transactional manner. 
 h3. Benefits
 * Enables the creation of large-scale dataset merge processes  
 * Opens up Hive transactional functionality in an accessible manner to 
 processes that operate outside of Hive.
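
(Illustration: the group-by-key, sort-by-sequence classification described 
under Motivation above, as a minimal sketch; the names are illustrative and 
not part of the streaming API.)

{code}
// Illustrative only; not part of the hive-hcatalog-streaming API. After
// grouping by key and sorting by sequence, each (ground truth, modified)
// record pair is classified into the mutation to emit.
public class MergeClassifySketch {
  enum Mutation { INSERT, UPDATE, DELETE, UNCHANGED }

  static Mutation classify(Object groundTruth, Object modified) {
    if (groundTruth == null) {
      return modified == null ? Mutation.UNCHANGED : Mutation.INSERT;
    }
    if (modified == null) {
      return Mutation.DELETE;
    }
    return groundTruth.equals(modified) ? Mutation.UNCHANGED : Mutation.UPDATE;
  }

  public static void main(String[] args) {
    System.out.println(classify(null, "row"));   // INSERT
    System.out.println(classify("row", null));   // DELETE
    System.out.println(classify("row", "row2")); // UPDATE
  }
}
{code}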



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-07-27 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642976#comment-14642976
 ] 

Sushanth Sowmyan commented on HIVE-8678:


Actually, after finding a 0.13.1 VM, I'm able to reproduce this. In 1.2, 
however, I am not. So something changed along the way to fix this. I can dig 
further to figure out what the problem was that made it not work in 0.13.1.

In addition, this problem exists with both orc and text formats in 0.13.1.

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan

 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-07-23 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639129#comment-14639129
 ] 

Sushanth Sowmyan commented on HIVE-8678:


No worries - sorry I didn't try experimenting on this earlier. :)

Hopefully this means that this bug was squished in the meanwhile somewhere 
between 0.13.1 and 1.2 and does not exist any longer.

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan

 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it

2015-07-22 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11344:

Attachment: HIVE-11344.patch

Patch implementing (a) attached.

 HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo 
 objects are unusable after it
 

 Key: HIVE-11344
 URL: https://issues.apache.org/jira/browse/HIVE-11344
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11344.patch


 HIVE-9845 introduced a notion of compression for HCatSplits so that when 
 serializing, it finds commonalities between PartInfo and TableInfo objects, 
 and if the two are identical, it nulls out that field in PartInfo, thus 
 making sure that when PartInfo is then serialized, info is not repeated.
 This, however, has the side effect of making the PartInfo object unusable if 
 HCatSplit.write has been called.
 While this does not affect M/R directly, since they do not know about the 
 PartInfo objects and once serialized, the HCatSplit object is recreated by 
 deserializing on the backend, which does restore the split and its PartInfo 
 objects, this does, however, affect framework users of HCat that try to mimic 
 M/R and then use the PartInfo objects to instantiate distinct readers.
 Thus, we need to make it so that PartInfo is still usable after 
 HCatSplit.write is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it

2015-07-22 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637699#comment-14637699
 ] 

Sushanth Sowmyan commented on HIVE-11344:
-

[~mithun], could you please review?

 HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo 
 objects are unusable after it
 

 Key: HIVE-11344
 URL: https://issues.apache.org/jira/browse/HIVE-11344
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11344.patch


 HIVE-9845 introduced a notion of compression for HCatSplits so that when 
 serializing, it finds commonalities between PartInfo and TableInfo objects, 
 and if the two are identical, it nulls out that field in PartInfo, thus 
 making sure that when PartInfo is then serialized, info is not repeated.
 This, however, has the side effect of making the PartInfo object unusable if 
 HCatSplit.write has been called.
 While this does not affect M/R directly, since they do not know about the 
 PartInfo objects and once serialized, the HCatSplit object is recreated by 
 deserializing on the backend, which does restore the split and its PartInfo 
 objects, this does, however, affect framework users of HCat that try to mimic 
 M/R and then use the PartInfo objects to instantiate distinct readers.
 Thus, we need to make it so that PartInfo is still usable after 
 HCatSplit.write is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it

2015-07-22 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11344:

Summary: HIVE-9845 makes HCatSplit.write modify the split so that PartInfo 
objects are unusable after it  (was: HIVE-9845 makes HCatSplit.write modify the 
split so that PartitionInfo objects are unusable after it)

 HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are 
 unusable after it
 ---

 Key: HIVE-11344
 URL: https://issues.apache.org/jira/browse/HIVE-11344
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11344.patch


 HIVE-9845 introduced a notion of compression for HCatSplits so that when 
 serializing, it finds commonalities between PartInfo and TableInfo objects, 
 and if the two are identical, it nulls out that field in PartInfo, thus 
 making sure that when PartInfo is then serialized, info is not repeated.
 This, however, has the side effect of making the PartInfo object unusable if 
 HCatSplit.write has been called.
 While this does not affect M/R directly, since they do not know about the 
 PartInfo objects and once serialized, the HCatSplit object is recreated by 
 deserializing on the backend, which does restore the split and its PartInfo 
 objects, this does, however, affect framework users of HCat that try to mimic 
 M/R and then use the PartInfo objects to instantiate distinct readers.
 Thus, we need to make it so that PartInfo is still usable after 
 HCatSplit.write is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it

2015-07-22 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637678#comment-14637678
 ] 

Sushanth Sowmyan commented on HIVE-11344:
-

There are three routes I see available here:

a) There is decompress logic in PartInfo.setTableInfo, and compress logic in 
PartInfo.writeObject. We could make it so that PartInfo.writeObject does the 
compression, writes itself, and then does the decompression back.
b) We could decompress on demand - wherein if a user calls 
getInputFormatClassName(), we then fetch that info if it's not available, and 
always return values consistently.
c) We could add a new conf parameter that controls whether or not we do 
compression - users with 100k splits would prefer compression, and be okay with 
the fact that PartInfo objects are not usable, and users that want to use the 
PartInfo objects will be okay with the fact that they are going to hog a little 
bit more serialized space.

(c) is a bad solution all-round. [~ashutoshc] would be mad at me for adding 
another conf parameter, and it is entirely possible that those who are trying 
to implement other streaming interfaces/etc. and are mimicking M/R will run 
into a large number of partitions as well.
(b) is nifty, and I probably like the idea of it, but I'm not entirely certain 
whether it will run afoul of other serialization methods in the future that 
call getters to get fields (some JSON serializers), which might result in a 
bloated serialized PartInfo object anyway. Also, it spreads the decompression 
logic across multiple getters, and pushes the assert statement into multiple 
places as well.
(a) is probably the cleanest solution, although it makes a code reader wonder 
why we're going through the gymnastics we are. Some code comments might help 
with that.
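
(To make (a) concrete, a minimal sketch of the compress-write-restore idea; 
the field names are illustrative, not the real PartInfo members:)

{code}
// Sketch of option (a); field names are illustrative, not the actual PartInfo
// members. Compress only for the duration of serialization, then restore, so
// the in-memory object stays usable after HCatSplit.write().
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PartInfoSketch implements Serializable {
  String inputFormatClassName;            // possibly identical to the table-level value
  transient String tableInputFormatName;  // what TableInfo carries

  private void writeObject(ObjectOutputStream out) throws IOException {
    String saved = inputFormatClassName;
    if (saved != null && saved.equals(tableInputFormatName)) {
      inputFormatClassName = null;        // compress: drop the duplicated field
    }
    try {
      out.defaultWriteObject();           // serialize the compressed form
    } finally {
      inputFormatClassName = saved;       // decompress: restore for later callers
    }
  }
}
{code}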


 HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo 
 objects are unusable after it
 

 Key: HIVE-11344
 URL: https://issues.apache.org/jira/browse/HIVE-11344
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

 HIVE-9845 introduced a notion of compression for HCatSplits so that when 
 serializing, it finds commonalities between PartInfo and TableInfo objects, 
 and if the two are identical, it nulls out that field in PartInfo, thus 
 making sure that when PartInfo is then serialized, info is not repeated.
 This, however, has the side effect of making the PartInfo object unusable if 
 HCatSplit.write has been called.
 While this does not affect M/R directly, since they do not know about the 
 PartInfo objects and once serialized, the HCatSplit object is recreated by 
 deserializing on the backend, which does restore the split and its PartInfo 
 objects, this does, however, affect framework users of HCat that try to mimic 
 M/R and then use the PartInfo objects to instantiate distinct readers.
 Thus, we need to make it so that PartInfo is still usable after 
 HCatSplit.write is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-07-22 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637886#comment-14637886
 ] 

Sushanth Sowmyan commented on HIVE-8678:


I'm currently unable to reproduce this issue on hive-1.2 and pig-0.14.0, where 
I get the following:
In hive:
{noformat}

hive> create table tdate(a string, b date) stored as orc;
OK
Time taken: 0.151 seconds
hive> create table tsource(a string, b string) stored as orc;
OK
Time taken: 0.057 seconds
hive> insert into table tsource values ('abc', '2015-02-28');
...
OK
Time taken: 19.875 seconds
hive> select * from tsource;
OK
abc 2015-02-28
Time taken: 0.143 seconds, Fetched: 1 row(s)
hive> select a, cast(b as date) from tsource;
OK
abc 2015-02-28
Time taken: 0.092 seconds, Fetched: 1 row(s)
hive> insert into table tdate select a, cast(b as date) from tsource;
...
OK
Time taken: 20.672 seconds
hive> select * from tdate;
OK
abc 2015-02-28
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> describe tdate;
OK
a   string  
b   date
Time taken: 0.293 seconds, Fetched: 2 row(s)
{noformat}

In pig:
{noformat}
grunt> A = load 'tdate' using org.apache.hive.hcatalog.pig.HCatLoader(); 
grunt> describe A;   
2015-07-22 15:42:26,367 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is 
deprecated. Instead, use fs.defaultFS
A: {a: chararray,b: datetime}
grunt> dump A;
...
(abc,2015-02-28T00:00:00.000-08:00)
grunt>
{noformat}



 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan

 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-07-22 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637888#comment-14637888
 ] 

Sushanth Sowmyan commented on HIVE-8678:


Also, unit tests have existed since the introduction of the DATE capability 
that test date interop between Hive and Pig through HCatalog, and they still 
succeed for me when I run them on Hive 0.13.1.

Could you please show me what hive commands and pig commands you're running to 
recreate this issue?

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan

 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-07-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635548#comment-14635548
 ] 

Sushanth Sowmyan commented on HIVE-8678:


What storage format are you using for the table in question? (i.e. is it Text, 
RCFile, ORC, something else?)

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan

 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by

2015-07-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636010#comment-14636010
 ] 

Sushanth Sowmyan commented on HIVE-11172:
-

Incorrect results make this a good candidate for a backport to branch-1.2.

Pedantic note: 1.2.1 has already shipped. This would go in 1.2.2; please set 
the fix version appropriately after committing.

 Vectorization wrong results for aggregate query with where clause without 
 group by
 --

 Key: HIVE-11172
 URL: https://issues.apache.org/jira/browse/HIVE-11172
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 0.14.0
Reporter: Yi Zhang
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Fix For: 2.0.0

 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, 
 HIVE-11172.3.patch


 create table testvec(id int, dt int, greg_dt string) stored as orc;
 insert into table testvec
 values 
 (1,20150330, '2015-03-30'),
 (2,20150301, '2015-03-01'),
 (3,20150502, '2015-05-02'),
 (4,20150401, '2015-04-01'),
 (5,20150313, '2015-03-13'),
 (6,20150314, '2015-03-14'),
 (7,20150404, '2015-04-04');
 hive> select dt, greg_dt from testvec where id=5;
 OK
 20150313  2015-03-13
 Time taken: 4.435 seconds, Fetched: 1 row(s)
 hive> set hive.vectorized.execution.enabled=true;
 hive> set hive.map.aggr;
 hive.map.aggr=true
 hive> select max(dt), max(greg_dt) from testvec where id=5;
 OK
 20150313  2015-03-30
 hive> set hive.vectorized.execution.enabled=false;
 hive> select max(dt), max(greg_dt) from testvec where id=5;
 OK
 20150313  2015-03-13



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by

2015-07-21 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11172:

Fix Version/s: 1.3.0

 Vectorization wrong results for aggregate query with where clause without 
 group by
 --

 Key: HIVE-11172
 URL: https://issues.apache.org/jira/browse/HIVE-11172
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 0.14.0
Reporter: Yi Zhang
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, 
 HIVE-11172.3.patch


 create table testvec(id int, dt int, greg_dt string) stored as orc;
 insert into table testvec
 values 
 (1,20150330, '2015-03-30'),
 (2,20150301, '2015-03-01'),
 (3,20150502, '2015-05-02'),
 (4,20150401, '2015-04-01'),
 (5,20150313, '2015-03-13'),
 (6,20150314, '2015-03-14'),
 (7,20150404, '2015-04-04');
 hive> select dt, greg_dt from testvec where id=5;
 OK
 20150313  2015-03-13
 Time taken: 4.435 seconds, Fetched: 1 row(s)
 hive> set hive.vectorized.execution.enabled=true;
 hive> set hive.map.aggr;
 hive.map.aggr=true
 hive> select max(dt), max(greg_dt) from testvec where id=5;
 OK
 20150313  2015-03-30
 hive> set hive.vectorized.execution.enabled=false;
 hive> select max(dt), max(greg_dt) from testvec where id=5;
 OK
 20150313  2015-03-13



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-07-17 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan reassigned HIVE-8678:
--

Assignee: Sushanth Sowmyan

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan

 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog

2015-07-17 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631906#comment-14631906
 ] 

Sushanth Sowmyan commented on HIVE-8678:


Something seems weird here - looking at the code, it looks like the current 
code, where it simply casts to Date, should be the right way to do this, since 
it should have called .getPrimitiveJavaObject() on the PrimitiveObjectInspector 
to get this object, and DateObjectInspector.getPrimitiveJavaObject() should 
have returned a Date. However, clearly, from your stack trace, you're getting a 
String. I'll dig into this and update as I find more.
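
For illustration, a minimal defensive sketch (not the actual PigHCatUtil code; names are illustrative) that tolerates both the java.sql.Date promised by the DateObjectInspector contract and the String the stack trace above shows arriving:

{code}
import java.sql.Date;

public class DateFieldConversion {
  // Sketch: accept either the contractual java.sql.Date or a String fallback.
  static Date toSqlDate(Object o) {
    if (o instanceof Date) {
      return (Date) o;                 // expected path per the ObjectInspector contract
    }
    if (o instanceof String) {
      return Date.valueOf((String) o); // fallback; expects "yyyy-[m]m-[d]d"
    }
    throw new IllegalArgumentException("Unexpected DATE value type: " + o.getClass());
  }

  public static void main(String[] args) {
    System.out.println(toSqlDate("2014-10-30")); // prints 2014-10-30
  }
}
{code}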

 Pig fails to correctly load DATE fields using HCatalog
 --

 Key: HIVE-8678
 URL: https://issues.apache.org/jira/browse/HIVE-8678
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Michael McLellan
Assignee: Sushanth Sowmyan

 Using:
 Hadoop 2.5.0-cdh5.2.0 
 Pig 0.12.0-cdh5.2.0
 Hive 0.13.1-cdh5.2.0
 When using pig -useHCatalog to load a Hive table that has a DATE field, when 
 trying to DUMP the field, the following error occurs:
 {code}
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
 java.sql.Date
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375)
 at 
 org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 2014-10-30 22:58:05,469 [main] ERROR 
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting 
 read value to tuple
 {code}
 It seems to be occurring here: 
 https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433
 and that it should be:
 {code}Date d = Date.valueOf(o);{code} 
 instead of 
 {code}Date d = (Date) o;{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11198) Fix load data query file format check for partitioned tables

2015-07-08 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619528#comment-14619528
 ] 

Sushanth Sowmyan commented on HIVE-11198:
-

Very useful, Prasanth! I do believe this should fix the other issue I observed 
with repl. Thanks!

+1

 Fix load data query file format check for partitioned tables
 

 Key: HIVE-11198
 URL: https://issues.apache.org/jira/browse/HIVE-11198
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-11198.patch


 HIVE-11118 added a file format check for the ORC format. The check will throw an 
 exception when non-ORC formats are loaded into an ORC managed table. But it does 
 not work for partitioned tables. Partitioned tables are allowed to have some 
 partitions with a different file format. See this discussion for more details: 
 https://issues.apache.org/jira/browse/HIVE-11118?focusedCommentId=14617271&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14617271



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables

2015-07-07 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617271#comment-14617271
 ] 

Sushanth Sowmyan commented on HIVE-11118:
-

I have a question here - I will open another bug if need be, but if it's a 
simple misunderstanding, it won't matter.

From the patch, I see the following bit:

{code}
private void ensureFileFormatsMatch(TableSpec ts, URI fromURI) throws SemanticException {
  Class<? extends InputFormat> destInputFormat = ts.tableHandle.getInputFormatClass();
  // Other file formats should do similar check to make sure file formats match
  // when doing LOAD DATA .. INTO TABLE
  if (OrcInputFormat.class.equals(destInputFormat)) {
    Path inputFilePath = new Path(fromURI);
    try {
      FileSystem fs = FileSystem.get(fromURI, conf);
      // just creating orc reader is going to do sanity checks to make sure its valid ORC file
      OrcFile.createReader(fs, inputFilePath);
    } catch (FileFormatException e) {
      throw new SemanticException(ErrorMsg.INVALID_FILE_FORMAT_IN_LOAD.getMsg("Destination" +
          " table is stored as ORC but the file being loaded is not a valid ORC file."));
    } catch (IOException e) {
      throw new SemanticException("Unable to load data to destination table." +
          " Error: " + e.getMessage());
    }
  }
}
{code}

Now, it's entirely possible that the table in question is an ORC table, but the 
partition being loaded is of another format, such as Text - Hive supports mixed 
partition scenarios. In fact, this is a likely scenario in the case of a 
replication of a table that used to be Text but has been converted to ORC, so 
that all new partitions will be ORC. In that case, the destination table 
will be a MANAGED_TABLE stored as ORC, but import will try to 
load a Text partition onto it.

Shouldn't this refer to a partitionspec rather than the table's inputformat for 
this check to work with that scenario?
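
To make the question concrete, here is a hedged sketch of a partition-aware variant of the check; the partHandle field and its getInputFormatClass() method are assumed from Hive's TableSpec and Partition classes, not taken from the patch:

{code}
// Sketch only: prefer the targeted partition's input format when the statement
// names a partition, falling back to the table-level format otherwise.
Class<? extends InputFormat> destInputFormat =
    (ts.partHandle != null)
        ? ts.partHandle.getInputFormatClass()
        : ts.tableHandle.getInputFormatClass();
{code}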

 Load data query should validate file formats with destination tables
 

 Key: HIVE-11118
 URL: https://issues.apache.org/jira/browse/HIVE-11118
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, 
 HIVE-11118.4.patch, HIVE-11118.patch


 Load data local inpath queries do not do any validation wrt file format. If 
 the destination table is ORC and we try to load files that are not ORC, 
 the load will succeed, but querying such tables will result in runtime 
 exceptions. We can do some simple sanity checks to prevent loading of files 
 that do not match the destination table's file format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables

2015-07-07 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617264#comment-14617264
 ] 

Sushanth Sowmyan commented on HIVE-11118:
-

Thanks, [~leftylev]! Added.

 Load data query should validate file formats with destination tables
 

 Key: HIVE-11118
 URL: https://issues.apache.org/jira/browse/HIVE-11118
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, 
 HIVE-11118.4.patch, HIVE-11118.patch


 Load data local inpath queries do not do any validation wrt file format. If 
 the destination table is ORC and we try to load files that are not ORC, 
 the load will succeed, but querying such tables will result in runtime 
 exceptions. We can do some simple sanity checks to prevent loading of files 
 that do not match the destination table's file format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11118) Load data query should validate file formats with destination tables

2015-07-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11118:

Fix Version/s: 2.0.0
   1.3.0

 Load data query should validate file formats with destination tables
 

 Key: HIVE-11118
 URL: https://issues.apache.org/jira/browse/HIVE-11118
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, 
 HIVE-11118.4.patch, HIVE-11118.patch


 Load data local inpath queries do not do any validation wrt file format. If 
 the destination table is ORC and we try to load files that are not ORC, 
 the load will succeed, but querying such tables will result in runtime 
 exceptions. We can do some simple sanity checks to prevent loading of files 
 that do not match the destination table's file format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions

2015-07-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11104:

Fix Version/s: 1.3.0

 Select operator doesn't propagate constants appearing in expressions
 

 Key: HIVE-11104
 URL: https://issues.apache.org/jira/browse/HIVE-11104
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-29 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605959#comment-14605959
 ] 

Sushanth Sowmyan commented on HIVE-10983:
-

Not a problem! As part of the release process, I'm required to go unset all 
jiras marked for older released releases, and that's what I was doing. :)

To expand further, the idea is that Fix Version is set to track which branches 
the commits got committed to, and thus, should not be set unless this patch has 
already been committed to those branches. So, now, for example, if this commit 
is committed to branch-1.2 to track 1.2.x, its fix version would be 1.2.2 once 
it is committed. Setting it to 1.2.0 would mean that this was included as part 
of the 1.2.0 release, which it wasn't. So, for this, when a committer commits a 
patch for this bug, if they commit it to branch-1.2, they should then set the 
fix version to 1.2.2.


 SerDeUtils bug  ,when Text is reused 
 -

 Key: HIVE-10983
 URL: https://issues.apache.org/jira/browse/HIVE-10983
 Project: Hive
  Issue Type: Bug
  Components: API, CLI
Affects Versions: 0.14.0, 1.0.0, 1.2.0
 Environment: Hadoop 2.3.0-cdh5.0.0
 Hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang
  Labels: patch
 Fix For: 2.0.0

 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, 
 HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt


 {noformat}
 The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they 
 invoke a bad method of Text, getBytes()!
 The method getBytes of Text returns the raw bytes; however, only data up to 
 Text.length is valid. A better way is to use copyBytes() if you need the 
 returned array to be precisely the length of the data.
 But copyBytes() was only added after hadoop1. 
 {noformat}
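 A minimal sketch of the failure mode and the length-bounded fix (illustrative only; Arrays.copyOf stands in for copyBytes() on Hadoop 1, where copyBytes() is unavailable):
 {code}
 import java.nio.charset.StandardCharsets;
 import java.util.Arrays;
 import org.apache.hadoop.io.Text;

 public class TextReuseDemo {
   public static void main(String[] args) {
     Text t = new Text("a much longer previous record");
     byte[] next = "short".getBytes(StandardCharsets.UTF_8);
     t.set(next, 0, next.length);   // reuse: the larger backing array is kept

     byte[] raw = t.getBytes();     // raw.length is still 29, not 5
     System.out.println(new String(raw, StandardCharsets.UTF_8));
     // -> "shorth longer previous record" (the previous row bleeds through)

     byte[] exact = Arrays.copyOf(raw, t.getLength()); // exact-length copy
     System.out.println(new String(exact, StandardCharsets.UTF_8));
     // -> "short"
   }
 }
 {code}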
 When I query data from an LZO table, I found in the results that the length of the 
 current row is always larger than that of the previous row, and sometimes the 
 current row contains the contents of the previous row. For example, I executed this SQL:
 {code:sql}
 select * from web_searchhub where logdate="2015061003"
 {code}
 The result of the SQL is shown below. Notice that the second row's content contains the 
 first row's content.
 {noformat}
 INFO [03:00:05.589] HttpFrontServer::FrontSH 
 msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 
 session=901,thread=223ession=3151,thread=254 2015061003
 {noformat}
 The content of the original LZO file is shown below; it has just 2 rows.
 {noformat}
 INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb 
 session=3148,thread=285
 INFO [03:00:05.635] HttpFrontServer::FrontSH 
 msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
 {noformat}
 I think this error is caused by the Text reuse, and I found a solution.
 Additionally, the table create SQL is: 
 {code:sql}
 CREATE EXTERNAL TABLE `web_searchhub`(
   `line` string)
 PARTITIONED BY (
   `logdate` string)
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\\U'
 WITH SERDEPROPERTIES (
   'serialization.encoding'='GBK')
 STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
   OUTPUTFORMAT 
 "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
 LOCATION
   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11066) Ensure tests don't share directories on FS

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11066:

Fix Version/s: (was: 1.2.1)
   1.2.2

 Ensure tests don't share directories on FS
 --

 Key: HIVE-11066
 URL: https://issues.apache.org/jira/browse/HIVE-11066
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 1.2.2

 Attachments: HIVE-11066.patch


 Tests often fail with errors like "Could not fully delete 
 D:\w\hv\hcatalog\hcatalog-pig-adapter\target\tmp\dfs\name1" on Windows 
 platforms.
 Attached is a prototype for avoiding these false negatives.
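
 One way to get there (a sketch under assumptions, not the attached patch): derive a unique working directory per test from the class name, test name, and a timestamp, so reruns and parallel tests never collide on paths like the one above. The test.tmp.dir property name is illustrative:
 {code}
 import java.io.File;

 public class TestDirs {
   // Sketch: a collision-free per-test directory under the build's tmp area.
   static File uniqueTestDir(Class<?> testClass, String testName) {
     File dir = new File(System.getProperty("test.tmp.dir", "target/tmp"),
         testClass.getSimpleName() + "-" + testName + "-" + System.nanoTime());
     if (!dir.mkdirs() && !dir.isDirectory()) {
       throw new IllegalStateException("Could not create " + dir);
     }
     return dir;
   }
 }
 {code}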



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11059) hcatalog-server-extensions tests scope should depend on hive-exec

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11059:

Fix Version/s: (was: 1.2.1)
   1.2.2

 hcatalog-server-extensions tests scope should depend on hive-exec
 -

 Key: HIVE-11059
 URL: https://issues.apache.org/jira/browse/HIVE-11059
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.1
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor
 Fix For: 1.2.2

 Attachments: HIVE-11059.patch


 (causes test failures in Windows due to the lack of WindowsPathUtil being 
 available otherwise)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11060) Make test windowing.q robust

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11060:

Fix Version/s: 1.2.2

 Make test windowing.q robust
 

 Key: HIVE-11060
 URL: https://issues.apache.org/jira/browse/HIVE-11060
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: 2.0.0, 1.2.2

 Attachments: HIVE-11060.01.patch, HIVE-11060.patch


 Add partition / order by in over clause to make result set deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11083) Make test cbo_windowing robust

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11083:

Fix Version/s: 1.2.2

 Make test cbo_windowing robust
 --

 Key: HIVE-11083
 URL: https://issues.apache.org/jira/browse/HIVE-11083
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 1.2.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 2.0.0, 1.2.2

 Attachments: HIVE-11083.patch


 Make result set deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11076:

Fix Version/s: 1.2.2

 Explicitly set hive.cbo.enable=true for some tests
 --

 Key: HIVE-11076
 URL: https://issues.apache.org/jira/browse/HIVE-11076
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 2.0.0, 1.2.2

 Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11048) Make test cbo_windowing robust

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11048:

Fix Version/s: 1.2.2

 Make test cbo_windowing robust
 --

 Key: HIVE-11048
 URL: https://issues.apache.org/jira/browse/HIVE-11048
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 1.2.2

 Attachments: HIVE-11048.patch


 Add partition / order by in over clause to make result set deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11050) testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11050:

Fix Version/s: (was: 1.2.1)
   1.2.2

 testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data 
 creation queries
 --

 Key: HIVE-11050
 URL: https://issues.apache.org/jira/browse/HIVE-11050
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Blocker
 Fix For: 1.2.2

 Attachments: HIVE-11050.01.branch-1.patch, HIVE-11050.01.patch


 In some environments the Q file tests vector_outer_join\{1-4\}.q fail because 
 the data creation queries produce different input files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11095:

Fix Version/s: (was: 1.2.0)

 SerDeUtils  another bug ,when Text is reused
 

 Key: HIVE-11095
 URL: https://issues.apache.org/jira/browse/HIVE-11095
 Project: Hive
  Issue Type: Bug
  Components: API, CLI
Affects Versions: 0.14.0, 1.0.0, 1.2.0
 Environment: Hadoop 2.3.0-cdh5.0.0
 Hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang
 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt


 {noformat}
 The method transformTextFromUTF8 has a bug: it invokes a bad method of 
 Text, getBytes()!
 The method getBytes of Text returns the raw bytes; however, only data up to 
 Text.length is valid. A better way is to use copyBytes() if you need the 
 returned array to be precisely the length of the data.
 But copyBytes() was only added after hadoop1. 
 {noformat}
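 A hedged, simplified sketch of a length-safe transform (not the actual SerDeUtils code; the method body is assumed for illustration): copy only the first getLength() bytes of the reused Text before re-encoding UTF-8 content into the table's serialization.encoding:
 {code}
 import java.nio.charset.Charset;
 import java.nio.charset.StandardCharsets;
 import java.util.Arrays;
 import org.apache.hadoop.io.Text;

 public class TextFromUtf8 {
   static Text transformTextFromUTF8(Text in, Charset target) {
     byte[] exact = Arrays.copyOf(in.getBytes(), in.getLength()); // valid prefix only
     String decoded = new String(exact, StandardCharsets.UTF_8);  // interpret as UTF-8
     return new Text(decoded.getBytes(target)); // bytes in the target encoding, e.g. GBK
   }
 }
 {code}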
 How did I find this bug?
 When I query data from an LZO table, I found in the results that the length of the 
 current row is always larger than that of the previous row, and sometimes the 
 current row contains the contents of the previous row. For example, I executed this SQL:
 {code:sql}
 select * from web_searchhub where logdate="2015061003"
 {code}
 The result of the SQL is shown below. Notice that the second row's content contains the 
 first row's content.
 {noformat}
 INFO [03:00:05.589] HttpFrontServer::FrontSH 
 msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 
 session=901,thread=223ession=3151,thread=254 2015061003
 {noformat}
 The content of the original LZO file is shown below; it has just 2 rows.
 {noformat}
 INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb 
 session=3148,thread=285
 INFO [03:00:05.635] HttpFrontServer::FrontSH 
 msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
 {noformat}
 I think this error is caused by the Text reuse, and I found a solution.
 Additionally, the table create SQL is: 
 {code:sql}
 CREATE EXTERNAL TABLE `web_searchhub`(
 `line` string)
 PARTITIONED BY (
 `logdate` string)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\\U'
 WITH SERDEPROPERTIES (
 'serialization.encoding'='GBK')
 STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
 OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
 LOCATION
 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11010:

Fix Version/s: (was: 1.2.1)

 Accumulo storage handler queries via HS2 fail
 -

 Key: HIVE-11010
 URL: https://issues.apache.org/jira/browse/HIVE-11010
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 1.2.1
 Environment: Secure
Reporter: Takahiko Saito
Assignee: Josh Elser

 On a Kerberized cluster, the accumulo storage handler throws an error: 
 "[username]@[principalname] is not allowed to impersonate [username]" 
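
 Impersonation errors of this shape usually point at Hadoop proxy-user configuration. As an assumption about this report (not a confirmed fix), the usual remedy is to allow the service's short name, here assumed to be hive, to impersonate users via core-site.xml and then restart the affected services:
 {code}
 <!-- Assumed remedy sketch: proxy-user grants in core-site.xml. -->
 <property>
   <name>hadoop.proxyuser.hive.hosts</name>
   <value>*</value>
 </property>
 <property>
   <name>hadoop.proxyuser.hive.groups</name>
   <value>*</value>
 </property>
 {code}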



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-06-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605018#comment-14605018
 ] 

Sushanth Sowmyan edited comment on HIVE-4577 at 6/29/15 1:15 AM:
-

Removing fix version of 1.2.1 since this is not part of the already-released 
1.2.1 release. Please set appropriate commit version when this fix is committed.


was (Author: sushanth):
Removing fix version of 1.2.1 since this is not part of the already-released 
1.2.` release. Please set appropriate commit version when this fix is committed.

 hive CLI can't handle hadoop dfs command  with space and quotes.
 

 Key: HIVE-4577
 URL: https://issues.apache.org/jira/browse/HIVE-4577
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
Reporter: Bing Li
Assignee: Bing Li
 Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
 HIVE-4577.3.patch.txt, HIVE-4577.4.patch


 As designed, Hive supports the hadoop dfs command in the hive shell, like 
 hive> dfs -mkdir /user/biadmin/mydir;
 but it behaves differently from hadoop if the path contains spaces and quotes:
 hive> dfs -mkdir hello; 
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
 /user/biadmin/hello
 hive> dfs -mkdir 'world';
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
 /user/biadmin/'world'
 hive> dfs -mkdir "bei jing";
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
 /user/biadmin/bei
 drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
 /user/biadmin/jing
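
 The root cause is whitespace-only tokenization of the dfs argument string. A minimal quote-aware splitter sketch (an assumed approach for illustration; the attached patches may do this differently) keeps quoted segments, embedded spaces included, as one token:
 {code}
 import java.util.ArrayList;
 import java.util.List;

 public class QuoteAwareSplit {
   // Sketch: split on whitespace, but treat '...' and "..." spans as single tokens.
   static List<String> split(String cmd) {
     List<String> out = new ArrayList<>();
     StringBuilder cur = new StringBuilder();
     char quote = 0;
     for (char c : cmd.toCharArray()) {
       if (quote != 0) {
         if (c == quote) quote = 0; else cur.append(c);   // inside a quoted span
       } else if (c == '\'' || c == '"') {
         quote = c;                                       // opening quote
       } else if (Character.isWhitespace(c)) {
         if (cur.length() > 0) { out.add(cur.toString()); cur.setLength(0); }
       } else {
         cur.append(c);
       }
     }
     if (cur.length() > 0) out.add(cur.toString());
     return out;
   }

   public static void main(String[] args) {
     System.out.println(split("dfs -mkdir \"bei jing\"")); // [dfs, -mkdir, bei jing]
   }
 }
 {code}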



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605019#comment-14605019
 ] 

Sushanth Sowmyan commented on HIVE-11010:
-

Removing fix version of 1.2.1 since this is not part of the already-released 
1.2.1 release. Please set appropriate commit version when this fix is committed.

 Accumulo storage handler queries via HS2 fail
 -

 Key: HIVE-11010
 URL: https://issues.apache.org/jira/browse/HIVE-11010
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 1.2.1
 Environment: Secure
Reporter: Takahiko Saito
Assignee: Josh Elser

 On a Kerberized cluster, the accumulo storage handler throws an error: 
 "[username]@[principalname] is not allowed to impersonate [username]" 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases

2015-06-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605017#comment-14605017
 ] 

Sushanth Sowmyan edited comment on HIVE-10792 at 6/29/15 1:15 AM:
--

Removing fix version of 1.2.1 since this is not part of the already-released 
1.2.1 release. Please set appropriate commit version when this fix is committed.


was (Author: sushanth):
Removing fix version of 1.2.1 since this is not part of the already-released 
1.2.` release. Please set appropriate commit version when this fix is committed.

 PPD leads to wrong answer when mapper scans the same table with multiple 
 aliases
 

 Key: HIVE-10792
 URL: https://issues.apache.org/jira/browse/HIVE-10792
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1
Reporter: Dayue Gao
Assignee: Dayue Gao
Priority: Critical
 Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, 
 HIVE-10792.test.sql


 Here are the steps to reproduce the bug.
 First of all, prepare a simple ORC table with one row:
 {code}
 create table test_orc (c0 int, c1 int) stored as ORC;
 {code}
 Table: test_orc
 ||c0||c1||
 |0|1|
 The following SQL gets an empty result, which is not expected:
 {code}
 select * from test_orc t1
 union all
 select * from test_orc t2
 where t2.c0 = 1
 {code}
 Self join is also broken
 {code}
 set hive.auto.convert.join=false; -- force common join
 select * from test_orc t1
 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
 {code}
 It gets an empty result, while the expected answer is
 ||t1.c0||t1.c1||t2.c0||t2.c1||
 |0|1|NULL|NULL|
 In these cases, we push down predicates into OrcInputFormat. As a result, the 
 TableScanOperator for t1 can't receive its rows.
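
 Until a fix lands, a hedged workaround sketch (assuming the pushdown is driven by hive.optimize.index.filter, which enables storage-level filter pushdown such as ORC SARGs; verify against your configuration):
 {code:sql}
 -- Workaround sketch, not a fix: stop filters from being pushed into
 -- OrcInputFormat for the affected query, at the cost of reading more data.
 set hive.optimize.index.filter=false;

 select * from test_orc t1
 union all
 select * from test_orc t2
 where t2.c0 = 1;
 {code}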



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

