[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used
[ https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977157#comment-14977157 ]

Sushanth Sowmyan commented on HIVE-11985:
-----------------------------------------

For most things that muck with the type system in Hive, [~jdere] is my go-to person to check with. Tagging him here.

> don't store type names in metastore when metastore type names are not used
> --------------------------------------------------------------------------
>
>             Key: HIVE-11985
>             URL: https://issues.apache.org/jira/browse/HIVE-11985
>         Project: Hive
>      Issue Type: Bug
>        Reporter: Sergey Shelukhin
>        Assignee: Sergey Shelukhin
>     Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11988) [hive] security issue with hive & ranger for import table command
[ https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977645#comment-14977645 ]

Sushanth Sowmyan commented on HIVE-11988:
-----------------------------------------

Ugh, looks like I missed updating 3 tests:
* TestMinimrCliDriver.testCliDriver_import_exported_table
* TestMiniSparkOnYarnCliDriver.testCliDriver_import_exported_table
* TestCliDriver.testCliDriver_authorization_reset

And a fourth test, which I thought I had updated, is failing, but not for the extra PREHOOK/POSTHOOK:
* TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import

I'll look into these and post an update tonight.

> [hive] security issue with hive & ranger for import table command
> -----------------------------------------------------------------
>
>             Key: HIVE-11988
>             URL: https://issues.apache.org/jira/browse/HIVE-11988
>         Project: Hive
>      Issue Type: Bug
>      Components: Hive
> Affects Versions: 0.14.0, 1.2.1
>        Reporter: Deepak Sharma
>        Assignee: Sushanth Sowmyan
>        Priority: Critical
>     Attachments: HIVE-11988.2.patch, HIVE-11988.3.patch, HIVE-11988.patch
>
> If a user does not have permission to create a table in Hive, importing data for a not-yet-existing table with the IMPORT command will also create that table, and that currently succeeds. Ideally it should not work.
>
> Steps to reproduce:
> 1. Put some raw data in the HDFS path /user/user1/tempdata.
> 2. In the Ranger policy, ensure user1 has no permission on any table.
> 3. Log in as user1 via beeline and try to create a table (this fails, since the user doesn't have CREATE permission):
>    create table tt1(id INT, ff String);
>    FAILED: HiveAccessControlException Permission denied: user user1 does not have CREATE privilege on default/tt1 (state=42000,code=4)
> 4. Now try the following command to import data into a table (the table should not already exist):
>    import table tt1 from '/user/user1/tempdata';
>
> Expected result: since user1 doesn't have permission to create a table, this operation should fail.
> Actual result: the table is created successfully and the data is imported!
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
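The report above boils down to one missing check: IMPORT into a non-existent table implies a table creation, so it must pass the same CREATE-privilege check that CREATE TABLE does. A minimal sketch of that intended behavior, with hypothetical names (not Hive's or Ranger's actual authorization API):

```java
// Sketch of the intended fix (hypothetical names, not the actual Hive/Ranger API):
// IMPORT into a non-existent table must run the same CREATE-privilege check
// that the CREATE TABLE path runs, before creating anything.
import java.util.Set;

public class ImportAuthzSketch {
    // Stand-in for the authorization policy; in reality Ranger answers this.
    static final Set<String> USERS_WITH_CREATE = Set.of("admin");

    static void checkCreatePrivilege(String user, String table) {
        if (!USERS_WITH_CREATE.contains(user)) {
            throw new SecurityException("Permission denied: user " + user
                + " does not have CREATE privilege on default/" + table);
        }
    }

    // IMPORT of a non-existent table implies table creation, so the
    // create check must run on this path too.
    static void importTable(String user, String table, boolean tableExists) {
        if (!tableExists) {
            checkCreatePrivilege(user, table);
        }
        // ... proceed with copying the exported data ...
    }

    public static void main(String[] args) {
        importTable("admin", "tt1", false);       // allowed: has CREATE
        try {
            importTable("user1", "tt1", false);   // must be denied
            throw new AssertionError("import should have been denied");
        } catch (SecurityException expected) {
            System.out.println("denied as expected: " + expected.getMessage());
        }
    }
}
```

The point of the sketch is only the placement of the check: authorization decided per operation type would miss IMPORT's implicit CREATE unless the import path asks the same question.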
[jira] [Updated] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-9013:
-----------------------------------
    Attachment: HIVE-9013.5.patch-branch1

Attaching branch-1 version of patch.

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>         Project: Hive
>      Issue Type: Bug
> Affects Versions: 0.13.1
>        Reporter: Binglin Chang
>        Assignee: Binglin Chang
>         Fix For: 2.0.0
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, HIVE-9013.5.patch-branch1
>
> When auth is enabled, we still need the set command to set some variables (e.g. mapreduce.job.queuename), but the set command alone also lists all information (including vars in the restrict list); this exposes entries like "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restrict list should also be excluded from the dump-vars command.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
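The description above proposes excluding restrict-list vars from the full `set` dump. A minimal sketch of that filtering idea, with illustrative names (the real logic lives in Hive's HiveConf / SetProcessor, not in this class):

```java
// Sketch (illustrative names only) of excluding entries on a hidden/restrict
// list from a full "set" dump, while non-hidden variables remain visible.
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class HiddenConfDumpSketch {
    // Stand-in for the restrict list; Hive keeps its own configurable list.
    static final Set<String> HIDDEN = Set.of(
        "javax.jdo.option.ConnectionPassword");

    // Return a copy of the config with hidden entries filtered out,
    // i.e. what a "set" dump should print.
    static Map<String, String> dumpVisible(Map<String, String> conf) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (!HIDDEN.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new TreeMap<>();
        conf.put("mapreduce.job.queuename", "etl");
        conf.put("javax.jdo.option.ConnectionPassword", "s3cret");
        // The password entry is absent from the dump.
        System.out.println(dumpVisible(conf));
    }
}
```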
[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974863#comment-14974863 ]

Sushanth Sowmyan commented on HIVE-9013:
----------------------------------------

Committed to branch-1 as well.

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>         Fix For: 1.3.0, 2.0.0
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, HIVE-9013.5.patch-branch1
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975144#comment-14975144 ]

Sushanth Sowmyan commented on HIVE-9013:
----------------------------------------

The branch-1.2 version of this patch incorporates HIVE-11670's fix as well.

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>         Fix For: 1.3.0, 2.0.0, 1.2.2
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, HIVE-9013.5.patch-branch1, HIVE-9013.5.patch-branch1.2
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11670) Strip out password information from TezSessionState configuration
[ https://issues.apache.org/jira/browse/HIVE-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975140#comment-14975140 ]

Sushanth Sowmyan commented on HIVE-11670:
-----------------------------------------

Note - while this patch was not committed in branch-1.2, in the process of backporting HIVE-9013, it was effectively merged into the branch-1.2 commit for HIVE-9013 as well.

> Strip out password information from TezSessionState configuration
> -----------------------------------------------------------------
>
>             Key: HIVE-11670
>             URL: https://issues.apache.org/jira/browse/HIVE-11670
>         Project: Hive
>      Issue Type: Bug
>        Reporter: Hari Sankar Sivarama Subramaniyan
>        Assignee: Hari Sankar Sivarama Subramaniyan
>         Fix For: 1.3.0, 2.0.0
>     Attachments: HIVE-11670.1.patch
>
> Remove password information from the configuration copy that is sent to Yarn/Tez. We don't need it there, and the config entries can potentially be visible to other users.
> HIVE-10508 had a fix which removed this in certain places; however, when I initiated a session via the Hive CLI, I could still see the password information.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-9013:
-----------------------------------
    Attachment: HIVE-9013.5.patch-branch1.2

Attaching branch-1.2 version of patch as well, committed there too.

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>         Fix For: 1.3.0, 2.0.0, 1.2.2
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, HIVE-9013.5.patch-branch1, HIVE-9013.5.patch-branch1.2
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-9013:
-----------------------------------
    Fix Version/s: 1.3.0

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>         Fix For: 1.3.0, 2.0.0
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, HIVE-9013.5.patch-branch1
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11988) [hive] security issue with hive & ranger for import table command
[ https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975717#comment-14975717 ]

Sushanth Sowmyan commented on HIVE-11988:
-----------------------------------------

Oh! True. I remember thinking that I needed to update that, but somehow thought it was part of my previous plan for how I wanted to have a separate *ForTest class. I'll update it.

> [hive] security issue with hive & ranger for import table command
> -----------------------------------------------------------------
>
>             Key: HIVE-11988
>             URL: https://issues.apache.org/jira/browse/HIVE-11988
>     Attachments: HIVE-11988.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-11988) [hive] security issue with hive & ranger for import table command
[ https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-11988:
------------------------------------
    Attachment: HIVE-11988.patch

Patch attached.

> [hive] security issue with hive & ranger for import table command
> -----------------------------------------------------------------
>
>             Key: HIVE-11988
>             URL: https://issues.apache.org/jira/browse/HIVE-11988
>     Attachments: HIVE-11988.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11988) [hive] security issue with hive & ranger for import table command
[ https://issues.apache.org/jira/browse/HIVE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975615#comment-14975615 ]

Sushanth Sowmyan commented on HIVE-11988:
-----------------------------------------

[~thejas], could you please have a look?

> [hive] security issue with hive & ranger for import table command
> -----------------------------------------------------------------
>
>             Key: HIVE-11988
>             URL: https://issues.apache.org/jira/browse/HIVE-11988
>     Attachments: HIVE-11988.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974794#comment-14974794 ]

Sushanth Sowmyan commented on HIVE-9013:
----------------------------------------

Ah, I see the issue with making it static, in terms of adding another parameter there. If the hidden configs were not configurable themselves, it would be possible to make it static, but not otherwise.

I'm okay with the patch as-is, +1. I'll go ahead and commit this. Thanks, Binglin and Thejas!

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-12261) schematool version info exit status should depend on compatibility, not equality
[ https://issues.apache.org/jira/browse/HIVE-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973309#comment-14973309 ]

Sushanth Sowmyan commented on HIVE-12261:
-----------------------------------------

Hi [~thejas] - looks good to me, +1.

I will admit that for a moment there, I thought that HiveSchemaTool.verifySchemaVersion was now doing the wrong thing by testing for newSchemaVersion >= MetaStoreSchemaInfo.getHiveSchemaVersion() instead of the equality check used before to verify the update, but I see why I was wrong to assume so. It may be worth adding a comment there to explain what the compatibility check does, and why the >= direction is correct for it.

> schematool version info exit status should depend on compatibility, not equality
> --------------------------------------------------------------------------------
>
>             Key: HIVE-12261
>             URL: https://issues.apache.org/jira/browse/HIVE-12261
>         Project: Hive
>      Issue Type: Bug
>      Components: Metastore
> Affects Versions: 1.3.0, 2.0.0
>        Reporter: Thejas M Nair
>        Assignee: Thejas M Nair
>     Attachments: HIVE-12261-branch-1.0.0.patch, HIVE-12261-branch-1.patch, HIVE-12261.1.patch
>
> Newer versions of the metastore schema are compatible with older versions of hive, as only new tables or columns are added with additional information.
> HIVE-11613 added a check in the hive schematool -info command to see if the schema version is equal. However, the state where the db schema version is ahead of the hive software version is often seen while a 'rolling upgrade' or 'rolling downgrade' is happening. This is a state where hive is functional, and returning a non-zero status for it is misleading.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
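The direction of the >= check discussed above can be made concrete with a small sketch: a newer db schema still works with older hive software, so "db schema version >= software version" should pass, while an older schema should fail. The version parsing here is deliberately simplified (numeric major.minor only); the real MetaStoreSchemaInfo logic is more involved.

```java
// Sketch of a compatibility (not equality) version check, simplified to
// numeric "major.minor" strings. Illustrative only, not HiveSchemaTool's code.
public class SchemaCompatSketch {
    static int[] parse(String v) {
        String[] parts = v.split("\\.");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // true if the db schema is the same as, or newer than, what the
    // running hive software expects -- the ">=" direction from the comment.
    static boolean isCompatible(String dbSchemaVersion, String hiveVersion) {
        int[] db = parse(dbSchemaVersion);
        int[] hive = parse(hiveVersion);
        if (db[0] != hive[0]) {
            return db[0] > hive[0];
        }
        return db[1] >= hive[1];
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("2.0", "1.3")); // true: rolling-upgrade state
        System.out.println(isCompatible("1.3", "1.3")); // true: exact match
        System.out.println(isCompatible("1.2", "1.3")); // false: schema too old
    }
}
```

An equality test would report the first case (db ahead of software during a rolling upgrade) as a failure even though hive is functional, which is exactly the misleading exit status the issue describes.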
[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971950#comment-14971950 ]

Sushanth Sowmyan commented on HIVE-9013:
----------------------------------------

Hi [~decster], thanks for the update and the patch. I'd ask for one last update if you don't mind (or we can do that as a separate patch):

It's better to have HiveConf.stripHiddenConfigurations(Configuration conf), as you have introduced it, be static, I think. That way, it avoids one source of confusion later on in the code (as in your patch), where we have to call it like this:

{code}
conf.stripHiddenConfigurations(job);
{code}

In that scenario, it becomes unclear whether we're stripping from conf or from job, and the truth of the matter is that we're stripping from job. If we made that call static, we could call HiveConf.stripHiddenConfigurations(job), which would be much clearer.

I think, with that, I'm +1 on this. Thanks for adding in tests. Normally, for ql changes such as set behaviour, we make changes to .q files, which is easier to develop, but having a proper junit test as you have done is good too. :)

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, HIVE-9013.4.patch, HIVE-9013.5.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970194#comment-14970194 ]

Sushanth Sowmyan commented on HIVE-9013:
----------------------------------------

Hi Binglin, thanks for your update. I think we could use two more minor changes:

a) It'd be good to have a .q test added that simply sets one hidden variable and one non-hidden variable, and then runs a set (to show all) and a set on each of these individual variables (to show individual behaviour) - that way, we'll have a .q.out test that we can check against in the future for regressions.

b) There's another jira, HIVE-10518, which introduced behaviour to strip out password details from a jobconf before passing it on. Could you please also make a change so that these two are integrated together better? i.e. after your patch, the goal behaviour should not be Utilities.stripHivePasswordDetails but Utilities.stripRestrictedConfigurations, thereby stripping all other config params that match your new enum as well.

Thanks!

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, HIVE-9013.4.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password
[ https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967775#comment-14967775 ]

Sushanth Sowmyan commented on HIVE-9013:
----------------------------------------

Hi [~decster], please let me know if you're planning on updating this jira per [~thejas]'s suggestions above - if you don't mind, I can help update this patch to get it in. I think this will be a very useful patch to have in. Thanks!

> Hive set command exposes metastore db password
> ----------------------------------------------
>
>             Key: HIVE-9013
>             URL: https://issues.apache.org/jira/browse/HIVE-9013
>     Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-12221) Concurrency issue in HCatUtil.getHiveMetastoreClient()
[ https://issues.apache.org/jira/browse/HIVE-12221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967832#comment-14967832 ]

Sushanth Sowmyan commented on HIVE-12221:
-----------------------------------------

Per Roshan's mail to me, adding in a reference: https://en.wikipedia.org/wiki/Double-checked_locking#Usage_in_Java

> Concurrency issue in HCatUtil.getHiveMetastoreClient()
> ------------------------------------------------------
>
>             Key: HIVE-12221
>             URL: https://issues.apache.org/jira/browse/HIVE-12221
>         Project: Hive
>      Issue Type: Bug
>        Reporter: Roshan Naik
>
> HCatUtil.getHiveMetastoreClient() uses the double-checked locking pattern to implement a singleton, which is a broken pattern

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
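The reference above explains why double-checked locking is broken without `volatile`: a second thread may observe a non-null reference to a not-yet-fully-constructed object. A minimal sketch of the usual correct variant (illustrative names, not HCatUtil's actual fields):

```java
// Sketch of double-checked locking done safely in Java: the "volatile"
// modifier is the essential fix, guaranteeing that a thread which sees a
// non-null instance also sees a fully constructed object.
public class DclSketch {
    private static volatile DclSketch instance;  // volatile is essential

    private DclSketch() {}

    public static DclSketch getInstance() {
        DclSketch local = instance;          // one volatile read on the fast path
        if (local == null) {
            synchronized (DclSketch.class) {
                local = instance;            // re-check under the lock
                if (local == null) {
                    instance = local = new DclSketch();
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Same instance on every call.
        System.out.println(getInstance() == getInstance());
    }
}
```

Without `volatile`, the compiler and CPU are free to publish the reference before the constructor's writes, which is exactly the hazard the linked article describes.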
[jira] [Commented] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961662#comment-14961662 ]

Sushanth Sowmyan commented on HIVE-12083:
-----------------------------------------

Thanks, Thejas! Committed to branch-1, branch-1.2 and master, where HIVE-10965 exists.

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> ---------------------------------------------------------------------
>
>             Key: HIVE-12083
>             URL: https://issues.apache.org/jira/browse/HIVE-12083
>         Project: Hive
>      Issue Type: Bug
>      Components: Metastore
> Affects Versions: 1.2.1, 1.0.2
>        Reporter: Sushanth Sowmyan
>        Assignee: Sushanth Sowmyan
>     Attachments: HIVE-12083.2.patch, HIVE-12083.patch
>
> In the fix for HIVE-10965, there is a short-circuit path that causes an empty AggrStats object to be returned if partNames is empty or colNames is empty:
> {code}
> diff --git metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> index 0a56bac..ed810d2 100644
> --- metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> +++ metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
>    public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
>        List<String> partNames, List<String> colNames, boolean useDensityFunctionForNDVEstimation)
>        throws MetaException {
> +    if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); // Nothing to aggregate.
>      long partsFound = partsFoundForPartitions(dbName, tableName, partNames, colNames);
>      List<ColumnStatisticsObj> colStatsList;
>      // Try to read from the cache first
> {code}
> This runs afoul of thrift requirements that AggrStats have required fields:
> {code}
> struct AggrStats {
>   1: required list<ColumnStatisticsObj> colStats,
>   2: required i64 partsFound // number of partitions for which stats were found
> }
> {code}
> Thus, we get errors as follows:
> {noformat}
> 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:0)
>     at org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
>     at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Normally, this would not occur, since HIVE-10965 also includes a client-side guard on colNames.isEmpty() that avoids the metastore call entirely; but there is no guard for partNames being empty, and the metastore side would still error if the thrift call were invoked directly, as would happen if the client is from an older version predating that patch.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
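The failure mode above is general to thrift: a "required" field left null fails struct validation at write time. The remedy is to return an object whose required collections are initialized (empty) rather than null. A minimal stand-in sketch, not the generated AggrStats class:

```java
// Sketch of the thrift required-field failure and its fix. The class is a
// hand-written stand-in mimicking generated-thrift validate() behavior,
// not the actual org.apache.hadoop.hive.metastore.api.AggrStats.
import java.util.ArrayList;
import java.util.List;

public class AggrStatsSketch {
    List<String> colStats;   // "required" in the thrift IDL
    long partsFound;         // "required" in the thrift IDL

    // Mimics generated thrift code: required fields must be set.
    void validate() {
        if (colStats == null) {
            throw new IllegalStateException(
                "Required field 'colStats' is unset! "
                + "Struct:AggrStats(colStats:null, partsFound:" + partsFound + ")");
        }
    }

    // The safe short-circuit result: required fields populated with an
    // empty list and zero, instead of the no-arg constructor's nulls.
    static AggrStatsSketch emptyResult() {
        AggrStatsSketch s = new AggrStatsSketch();
        s.colStats = new ArrayList<>();
        s.partsFound = 0;
        return s;
    }

    public static void main(String[] args) {
        emptyResult().validate();              // passes validation
        try {
            new AggrStatsSketch().validate();  // mirrors the reported error
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In other words, the server-side short-circuit should hand back a fully initialized empty result, not a default-constructed struct, so validation succeeds even when there is nothing to aggregate.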
[jira] [Commented] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957767#comment-14957767 ]

Sushanth Sowmyan commented on HIVE-12083:
-----------------------------------------

[~thejas]/[~ashutoshc], can I bug either of you for a review for the updated patch?

> HIVE-10965 introduces thrift error if partNames or colNames are empty
> ---------------------------------------------------------------------
>
>             Key: HIVE-12083
>             URL: https://issues.apache.org/jira/browse/HIVE-12083
>     Attachments: HIVE-12083.2.patch, HIVE-12083.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs
[ https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955289#comment-14955289 ] Sushanth Sowmyan commented on HIVE-11149: - [~thejas], agreed in theory, but this is blocked by HIVE-11891, which, admittedly, is also a reasonable backport candidate. > Fix issue with sometimes HashMap in PerfLogger.java hangs > -- > > Key: HIVE-11149 > URL: https://issues.apache.org/jira/browse/HIVE-11149 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 1.2.1 >Reporter: WangMeng >Assignee: WangMeng > Fix For: 2.0.0 > > Attachments: HIVE-11149.01.patch, HIVE-11149.02.patch, > HIVE-11149.03.patch, HIVE-11149.04.patch > > > In a multi-threaded environment, the unsynchronized HashMap in PerfLogger.java > can sometimes cause large numbers of Java processes to hang and consume large amounts of > unnecessary CPU and memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs
[ https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955809#comment-14955809 ] Sushanth Sowmyan commented on HIVE-11149: - As an update, I do not think we should backport HIVE-11891, since it refactors PerfLogger from hive-exec to hive-common, and that is a cross-jar change we should not make on maintenance lines. However, this patch is simple enough that we could create a 1.2-specific version of it that modifies PerfLogger in hive-exec, where it lived in 1.2. > Fix issue with sometimes HashMap in PerfLogger.java hangs > -- > > Key: HIVE-11149 > URL: https://issues.apache.org/jira/browse/HIVE-11149 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 1.2.1 >Reporter: WangMeng >Assignee: WangMeng > Fix For: 2.0.0 > > Attachments: HIVE-11149.01.patch, HIVE-11149.02.patch, > HIVE-11149.03.patch, HIVE-11149.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
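The hang described in HIVE-11149 is characteristic of a plain HashMap being mutated from multiple threads without synchronization: a concurrent resize can corrupt the internal bucket chains (famously producing infinite loops on older JDKs), leaving threads spinning on CPU. A minimal sketch of the usual remedy, swapping in ConcurrentHashMap; the SafePerfLog class and its method names below are illustrative stand-ins, not the actual PerfLogger code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for a PerfLogger-style timing map; not the real Hive class.
public class SafePerfLog {
    // ConcurrentHashMap tolerates concurrent put/remove without external locking,
    // unlike a plain HashMap, whose concurrent resize can corrupt internal state.
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

    public void perfLogBegin(String method) {
        startTimes.put(method, System.nanoTime());
    }

    public long perfLogEnd(String method) {
        Long start = startTimes.remove(method);
        return start == null ? 0L : System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        SafePerfLog log = new SafePerfLog();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    String key = "op-" + id + "-" + i;
                    log.perfLogBegin(key);
                    log.perfLogEnd(key);
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Every begin was matched by an end, so the map drains completely.
        System.out.println(log.startTimes.size());
    }
}
```

With a plain HashMap the same workload can silently corrupt the map; with ConcurrentHashMap it reliably drains to size 0.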
[jira] [Commented] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953421#comment-14953421 ] Sushanth Sowmyan commented on HIVE-12083: - > Should we short circuit for the empty partitions case as well on the client side? I think that makes sense and we should. I didn't initially because I hadn't evaluated the calling codepath to see whether there was a difference between a null return and an empty return for AggrStats from the HMSC for the empty partNames case. Now that I've looked through that in some detail, I am for it. I will update the patch. > Does the case where the table has no partition columns also use the > getAggrColStatsFor method? If that is the case we should not be > short-circuiting this way. I thought of that, but irrespective of whether the client short-circuits, the metastore server will short-circuit anyway; it's only a matter of the difference between returning null and returning an empty object. > HIVE-10965 introduces thrift error if partNames or colNames are empty > - > > Key: HIVE-12083 > URL: https://issues.apache.org/jira/browse/HIVE-12083 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 1.0.2 >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-12083.patch > > > In the fix for HIVE-10965, there is a short-circuit path that causes an empty > AggrStats object to be returned if partNames is empty or colNames is empty: > {code} > diff --git > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > index 0a56bac..ed810d2 100644 > --- > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > +++ > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > @@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats( >    public AggrStats aggrColStatsForPartitions(String dbName, String tableName, >        List<String> partNames, List<String> colNames, boolean useDensityFunctionForNDVEstimation) >        throws MetaException { > +    if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); // Nothing to aggregate. >      long partsFound = partsFoundForPartitions(dbName, tableName, partNames, colNames); >      List<ColumnStatisticsObj> colStatsList; >      // Try to read from the cache first > {code} > This runs afoul of the thrift requirement that the required fields of AggrStats be set: > {code} > struct AggrStats { >   1: required list<ColumnStatisticsObj> colStats, >   2: required i64 partsFound // number of partitions for which stats were found > } > {code} > Thus, we get errors as follows: > {noformat} > 2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer > (TThreadPoolServer.java:run(213)) - Thrift error occurred during processing > of message. > org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is > unset! Struct:AggrStats(colStats:null, partsFound:0) > at > org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Normally, this would not occur, since HIVE-10965 also includes a client-side guard on > colNames.isEmpty() that avoids making the metastore call at all; but there is no guard for > partNames being empty, which would still cause an error on the metastore side if the thrift > call were invoked directly, as would happen if the client is from an older version predating > this patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
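The failure mode above is generic to Thrift: the generated write() path calls validate(), which throws if any required field is unset, so even an intentionally empty result must have every required field initialized. A simplified, self-contained model of that behaviour; AggrStatsModel is a hand-written stand-in for the generated class (throwing IllegalStateException where real Thrift throws TProtocolException), not the actual Hive code:

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for the thrift-generated AggrStats; the field names mirror
// the IDL (colStats, partsFound), but the class itself is illustrative only.
public class AggrStatsModel {
    List<Object> colStats;   // 'required' in the IDL
    long partsFound;         // 'required' in the IDL

    // Mirrors generated validate(): required fields must be set before serialization.
    void validate() {
        if (colStats == null) {
            throw new IllegalStateException(
                "Required field 'colStats' is unset! Struct:AggrStats(colStats:null, partsFound:"
                + partsFound + ")");
        }
    }

    public static void main(String[] args) {
        // The buggy short-circuit: a bare constructor leaves colStats null.
        AggrStatsModel bad = new AggrStatsModel();
        boolean threw = false;
        try {
            bad.validate();
        } catch (IllegalStateException e) {
            threw = true;
        }

        // The safe form: initialize every required field, even when logically empty.
        AggrStatsModel ok = new AggrStatsModel();
        ok.colStats = new ArrayList<>();
        ok.partsFound = 0;
        ok.validate(); // passes

        System.out.println(threw);
    }
}
```

The point is that "return new AggrStats();" is only safe once the required fields are populated, which is exactly what the HIVE-12083 patch addresses.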
[jira] [Commented] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953504#comment-14953504 ] Sushanth Sowmyan commented on HIVE-12083: - Spoke to Ashutosh about this - going to make one more change - in addition to the short-circuit on the client side, the desired behaviour on the client side would also be to return an empty AggrStats rather than returning null. > HIVE-10965 introduces thrift error if partNames or colNames are empty > - > > Key: HIVE-12083 > URL: https://issues.apache.org/jira/browse/HIVE-12083 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 1.0.2 >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-12083.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
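The client-side behaviour settled on in the comment above (short-circuit the RPC and return an empty, fully initialized AggrStats rather than null) can be sketched as follows; AggrStatsLike and getAggrStats are hypothetical stand-ins for illustration, not the actual HiveMetaStoreClient API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of the client-side guard discussed above; AggrStatsLike is a
// hypothetical stand-in for the thrift-generated AggrStats, not the real Hive class.
public class ClientShortCircuit {
    static final class AggrStatsLike {
        final List<Object> colStats;
        final long partsFound;
        AggrStatsLike(List<Object> colStats, long partsFound) {
            this.colStats = colStats;
            this.partsFound = partsFound;
        }
    }

    // If there is nothing to ask the metastore for, skip the RPC entirely and hand
    // callers an empty (but fully initialized) result instead of null.
    static AggrStatsLike getAggrStats(List<String> partNames, List<String> colNames) {
        if (partNames.isEmpty() || colNames.isEmpty()) {
            return new AggrStatsLike(Collections.emptyList(), 0L);
        }
        // ... a real client would issue the thrift call here; stubbed out for the sketch ...
        return new AggrStatsLike(new ArrayList<>(), 0L);
    }

    public static void main(String[] args) {
        AggrStatsLike res = getAggrStats(Collections.emptyList(),
                                         Collections.singletonList("col1"));
        // Callers can iterate colStats without a null check.
        System.out.println(res.colStats.size() + "," + res.partsFound);
    }
}
```

Returning the empty object rather than null keeps callers free of null checks while also avoiding the server-side required-field validation failure.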
[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-12083: Attachment: HIVE-12083.2.patch Patch updated. > HIVE-10965 introduces thrift error if partNames or colNames are empty > - > > Key: HIVE-12083 > URL: https://issues.apache.org/jira/browse/HIVE-12083 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 1.0.2 >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-12083.2.patch, HIVE-12083.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10965) direct SQL for stats fails in 0-column case
[ https://issues.apache.org/jira/browse/HIVE-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954231#comment-14954231 ] Sushanth Sowmyan commented on HIVE-10965: - Note: this fix introduces a bug that is fixed by https://issues.apache.org/jira/browse/HIVE-12083 , and thus, that patch must be applied to every branch that received this one. > direct SQL for stats fails in 0-column case > --- > > Key: HIVE-10965 > URL: https://issues.apache.org/jira/browse/HIVE-10965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 1.2.1, 1.0.2 > > Attachments: HIVE-10965.01.patch, HIVE-10965.02.patch, > HIVE-10965.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-12083: Attachment: HIVE-12083.patch Patch attached, with tests. [~sershe]/[~thejas], could you please review? > HIVE-10965 introduces thrift error if partNames or colNames are empty > - > > Key: HIVE-12083 > URL: https://issues.apache.org/jira/browse/HIVE-12083 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 1.0.2 >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-12083.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-12083: Description: In the fix for HIVE-10965, there is a short-circuit path that causes an empty AggrStats object to be returned if partNames is empty or colNames is empty: [the full diff, AggrStats thrift definition, and stack trace are identical to those quoted in the comments above] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-12083: Component/s: Metastore > HIVE-10965 introduces thrift error if partNames or colNames are empty > - > > Key: HIVE-12083 > URL: https://issues.apache.org/jira/browse/HIVE-12083 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty
[ https://issues.apache.org/jira/browse/HIVE-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-12083: Affects Version/s: 1.0.2 1.2.1 > HIVE-10965 introduces thrift error if partNames or colNames are empty > - > > Key: HIVE-12083 > URL: https://issues.apache.org/jira/browse/HIVE-12083 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1, 1.0.2 >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
an > error on the metastore side if the thrift call were called directly, as would > happen if the client is from an older version before this was patched. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
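The failure mode above is easy to model outside Thrift: when every field of a struct is required, even the "nothing to aggregate" path must populate them. A minimal Python sketch (a simplified stand-in for the Thrift-generated AggrStats class, not the real generated code):

```python
class AggrStats:
    """Simplified stand-in for the Thrift-generated class: both
    fields are 'required', so validate() rejects unset values,
    mirroring the TProtocolException quoted above."""

    def __init__(self, col_stats=None, parts_found=None):
        self.col_stats = col_stats
        self.parts_found = parts_found

    def validate(self):
        if self.col_stats is None:
            raise ValueError("Required field 'colStats' is unset!")
        if self.parts_found is None:
            raise ValueError("Required field 'partsFound' is unset!")


def aggr_col_stats_for_partitions(part_names, col_names):
    if not col_names or not part_names:
        # Nothing to aggregate: populate the required fields
        # explicitly instead of returning a bare AggrStats().
        return AggrStats(col_stats=[], parts_found=0)
    raise NotImplementedError("aggregation path omitted in this sketch")
```

Returning a bare `AggrStats()` from the short-circuit leaves `colStats` unset and reproduces the validation failure at write time; populating the required fields explicitly passes validation while still expressing "no stats found".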
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4997: --- Fix Version/s: (was: 0.13.0) > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4997: --- Release Note: (was: IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected) > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4997: --- Tags: (was: IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected) > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949874#comment-14949874 ] Sushanth Sowmyan commented on HIVE-4997: Hi [~Abhiram], I notice you marked this issue as resolved - however, the patch has not been committed to Hive, and we have not decided to abandon it either, so it is not resolved. I'm reopening it; once the patch is updated, accepted, and committed, it can be resolved. > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli >Assignee: abhiram > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan reopened HIVE-4997: Assignee: (was: abhiram) > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4997) HCatalog doesn't allow multiple input tables
[ https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4997: --- Hadoop Flags: (was: Incompatible change) > HCatalog doesn't allow multiple input tables > > > Key: HIVE-4997 > URL: https://issues.apache.org/jira/browse/HIVE-4997 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 0.13.0 >Reporter: Daniel Intskirveli > Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch > > > HCatInputFormat does not allow reading from multiple hive tables in the same > MapReduce job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12012) select query on json table with map containing numeric values fails
[ https://issues.apache.org/jira/browse/HIVE-12012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945911#comment-14945911 ] Sushanth Sowmyan commented on HIVE-12012: - Ah, sorry - when you pinged me last, I did not see you'd attached a patch for this - but yes, that patch fixes this issue. +1. > select query on json table with map containing numeric values fails > --- > > Key: HIVE-12012 > URL: https://issues.apache.org/jira/browse/HIVE-12012 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Jagruti Varia >Assignee: Jason Dere > Attachments: HIVE-12012.1.patch > > > select query on json table throws this error if table contains map type > column: > {noformat} > Failed with exception > java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: > org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not > numeric, can not use numeric value accessors > at [Source: java.io.ByteArrayInputStream@295f79b; line: 1, column: 26] > {noformat} > steps to reproduce the issue: > {noformat} > hive> create table c_complex(a array<string>,b map<string,int>) row format > serde 'org.apache.hive.hcatalog.data.JsonSerDe'; > OK > Time taken: 0.319 seconds > hive> insert into table c_complex select array('aaa'),map('aaa',1) from > studenttab10k limit 2; > Query ID = hrt_qa_20150826183232_47deb33a-19c0-4d2b-a92f-726659eb9413 > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1440603993714_0010) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED > KILLED > > Map 1 .. SUCCEEDED 1 100 0 > 0 > Reducer 2 .. 
SUCCEEDED 1 100 0 > 0 > > VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 11.75 s > > > Loading data to table default.c_complex > Table default.c_complex stats: [numFiles=1, numRows=2, totalSize=56, > rawDataSize=0] > OK > Time taken: 13.706 seconds > hive> select * from c_complex; > OK > Failed with exception > java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: > org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not > numeric, can not use numeric value accessors > at [Source: java.io.ByteArrayInputStream@295f79b; line: 1, column: 26] > Time taken: 0.115 seconds > hive> select count(*) from c_complex; > OK > 2 > Time taken: 0.205 seconds, Fetched: 1 row(s) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8519) Hive metastore lock wait timeout
[ https://issues.apache.org/jira/browse/HIVE-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944297#comment-14944297 ] Sushanth Sowmyan commented on HIVE-8519: I notice a similar issue when I try to drop a table with about 5 partitions. Essentially, what seems to be happening with that flow is the following: a) Deleting a table requires deleting all partition objects for that table, Table->Partition is a 1:many mapping b) Deleting the partition objects requires deleting all SD objects associated with the partitions, Partition->SD is a 1:1 mapping c) Deleting SD objects requires looking for all CDs pointed to by the SDs, and wherever a CD has no more SDs pointing to it, we need to drop the CD in question, SD->CD is a many:1 mapping. d) If a CD is to be deleted, we need to drop all column lists associated with it (COLUMNS_V2 where CD_ID is in the list of CDs to delete). The big inefficiency here is that SD->CD is a many:1 mapping with a goal of reusing CDs for efficiency, but in practice, we don't reuse them. But the fact that it is many:1, not 1:1, means we need to do that additional check before dropping rather than simply dropping. This combination hits us in the worst way possible for both of those. We need to rethink the way we use our objects and either drop the many:1 intent or actually make sure that we create a unique CD for every SD, or this is not going to be scalable. Other solutions that bypass this wonky model may also exist that we have to work out. > Hive metastore lock wait timeout > > > Key: HIVE-8519 > URL: https://issues.apache.org/jira/browse/HIVE-8519 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.10.0 >Reporter: Liao, Xiaoge > > We got a lot of exceptions as below when doing a drop table partition, which > made Hive queries very slow. 
For example, it will cost 250s while > executing use db_test; > Log: > 2014-10-17 04:04:46,873 ERROR Datastore.Persist (Log4JLogger.java:error(115)) > - Update of object > "org.apache.hadoop.hive.metastore.model.MStorageDescriptor@13c9c4b3" using > statement "UPDATE `SDS` SET `CD_ID`=? WHERE `SD_ID`=?" failed : > java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1074) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4096) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4028) > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2490) > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651) > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2734) > at > com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155) > at > com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2458) > at > com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2375) > at > com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2359) > at > org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105) > at > org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105) > at > org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeUpdate(ParamLoggingPreparedStatement.java:399) > at > org.datanucleus.store.rdbms.SQLController.executeStatementUpdate(SQLController.java:439) > at > org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:374) > at > org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateTable(RDBMSPersistenceHandler.java:417) > at > org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObject(RDBMSPersistenceHandler.java:390) > at > org.datanucleus.state.JDOStateManager.flush(JDOStateManager.java:5012) > at org.datanucleus.FlushOrdered.execute(FlushOrdered.java:106) > at > 
org.datanucleus.ExecutionContextImpl.flushInternal(ExecutionContextImpl.java:4019) > at > org.datanucleus.ExecutionContextThreadedImpl.flushInternal(ExecutionContextThreadedImpl.java:450) > at org.datanucleus.store.query.Query.prepareDatastore(Query.java:1575) > at org.datanucleus.store.query.Query.executeQuery(Query.java:1760) > at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) > at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:243) > at > org.apache.hadoop.hive.metastore.ObjectStore.listStorageDescriptorsWithCD(ObjectStore.java:2185) > at > org.apache.hadoop.hive.metastore.ObjectStore.removeUnusedColumnDescriptor(ObjectStore.java:2131) > at >
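The drop-table cascade described in steps a) through d) above can be sketched abstractly; the expensive part is step c), where the many:1 SD->CD mapping forces a "still referenced?" scan per candidate CD before it can be dropped. A hypothetical Python model (names and structures invented for illustration, not Hive's ObjectStore code):

```python
def drop_partitions(partitions, sd_to_cd):
    """Model of the cascade: each partition owns one SD (1:1);
    SDs may share CDs (many:1), so a CD is only droppable once
    no surviving SD references it."""
    candidate_cds = set()
    for part in partitions:
        # Partition -> SD is 1:1, so the SD goes away with the partition.
        candidate_cds.add(sd_to_cd.pop(part["sd"]))
    # The extra scan forced by the many:1 mapping: keep any CD
    # that a remaining SD still points to.
    still_referenced = set(sd_to_cd.values())
    return candidate_cds - still_referenced
```

If every SD had its own CD (a strict 1:1 mapping), `still_referenced` would never contain a candidate and the delete could cascade directly, which is one of the two fixes the comment proposes.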
[jira] [Commented] (HIVE-11676) implement metastore API to do file footer PPD
[ https://issues.apache.org/jira/browse/HIVE-11676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944199#comment-14944199 ] Sushanth Sowmyan commented on HIVE-11676: - +cc [~mithun] who was interested in this sort of api a while back. > implement metastore API to do file footer PPD > - > > Key: HIVE-11676 > URL: https://issues.apache.org/jira/browse/HIVE-11676 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11676.01.patch, HIVE-11676.patch > > > Need to pass on the expression/sarg, extract column stats from footer (at > write time?) and then apply one to the other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11852) numRows and rawDataSize table properties are not replicated
[ https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941642#comment-14941642 ] Sushanth Sowmyan commented on HIVE-11852: - [~ashutoshc], the problem with a config property here is that this stats squish I'm trying to prevent does not happen on the ql-side. This happens on the metastore, from the AlterTableHandler where an alter table gets issued from the client side. The metastore then decides that since the table has been altered, the table is now different, and thus, stats must be nuked. I feel like if the decision to nuke the stats were not made by the metastore, but by the ql-side, that is cleaner and would not result in this problem, but then if stats squishing and table altering were two different metastore calls, we run into issues where one succeeding and the other not would lead to incorrect data elsewhere, apart from other performance implications as well. > numRows and rawDataSize table properties are not replicated > --- > > Key: HIVE-11852 > URL: https://issues.apache.org/jira/browse/HIVE-11852 > Project: Hive > Issue Type: Bug > Components: Import/Export >Affects Versions: 1.2.1 >Reporter: Paul Isaychuk >Assignee: Sushanth Sowmyan > Attachments: HIVE-11852.patch > > > numRows and rawDataSize table properties are not replicated when exported for > replication and re-imported. > {code} > Table drdbnonreplicatabletable.vanillatable has different TblProps from > drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, > totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] > java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has > different TblProps from drdbnonreplicatabletable.vanillatable expected > [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found > [{numFiles=1, totalSize=560}] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
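One way to express the alternative being discussed - not letting an alter-table wipe previously computed stats - is a merge step over the table properties. A hedged sketch (property names taken from the error message in this issue; the helper is hypothetical and not Hive's actual alter-table handling):

```python
# Stats properties seen in the assertion message above; assumed set.
STATS_PROPS = {"numRows", "rawDataSize", "numFiles", "totalSize"}

def merge_stats_props(old_props, new_props):
    """Carry previously computed stats forward across an alter
    unless the new properties explicitly overwrite them."""
    merged = dict(new_props)
    for key in STATS_PROPS:
        if key in old_props and key not in merged:
            merged[key] = old_props[key]
    return merged
```

With this kind of merge, the imported table's `{numFiles=1, totalSize=560}` would be filled back out with the exported `numRows` and `rawDataSize` instead of losing them.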
[jira] [Commented] (HIVE-12012) select query on json table with map containing numeric values fails
[ https://issues.apache.org/jira/browse/HIVE-12012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941874#comment-14941874 ] Sushanth Sowmyan commented on HIVE-12012: - Thanks for the report, Jason. Sure, I can look further into this. I had looked at HCATALOG-630 a long time back but I seem to remember that I could not reproduce that at the time. If we have a more recent reproduction, it definitely is worth investigating. Tests for JsonSerDe are mostly in TestJsonSerDe, instead of in .q files, since it descends from HCatalog - that seems to test the basic map cases, which work. I'll try to reproduce and dig further. > select query on json table with map containing numeric values fails > --- > > Key: HIVE-12012 > URL: https://issues.apache.org/jira/browse/HIVE-12012 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Jagruti Varia >Assignee: Jason Dere > Attachments: HIVE-12012.1.patch > > > select query on json table throws this error if table contains map type > column: > {noformat} > Failed with exception > java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: > org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not > numeric, can not use numeric value accessors > at [Source: java.io.ByteArrayInputStream@295f79b; line: 1, column: 26] > {noformat} > steps to reproduce the issue: > {noformat} > hive> create table c_complex(a array<string>,b map<string,int>) row format > serde 'org.apache.hive.hcatalog.data.JsonSerDe'; > OK > Time taken: 0.319 seconds > hive> insert into table c_complex select array('aaa'),map('aaa',1) from > studenttab10k limit 2; > Query ID = hrt_qa_20150826183232_47deb33a-19c0-4d2b-a92f-726659eb9413 > Total jobs = 1 > Launching Job 1 out of 1 > Status: Running (Executing on YARN cluster with App id > application_1440603993714_0010) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED > KILLED > > Map 1 .. SUCCEEDED 1 100 0 > 0 > Reducer 2 .. 
SUCCEEDED 1 100 0 > 0 > > VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 11.75 s > > > Loading data to table default.c_complex > Table default.c_complex stats: [numFiles=1, numRows=2, totalSize=56, > rawDataSize=0] > OK > Time taken: 13.706 seconds > hive> select * from c_complex; > OK > Failed with exception > java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: > org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not > numeric, can not use numeric value accessors > at [Source: java.io.ByteArrayInputStream@295f79b; line: 1, column: 26] > Time taken: 0.115 seconds > hive> select count(*) from c_complex; > OK > 2 > Time taken: 0.205 seconds, Fetched: 1 row(s) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
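The exception quoted above points at a token-position mismatch in a streaming JSON parse: a numeric value accessor is invoked while the reader is still positioned on a FIELD_NAME token. A toy Python model of that token discipline (not the actual Jackson/JsonSerDe code; token names borrowed from Jackson's JsonToken for illustration) shows why the reader must advance past the field name before reading the value:

```python
def read_string_int_map(tokens):
    """Parse a list of (kind, value) tokens for a map<string,int>.
    A numeric accessor is only legal on a VALUE_NUMBER token;
    invoking it on FIELD_NAME is the class of bug quoted above."""
    result = {}
    it = iter(tokens)
    for kind, value in it:
        if kind != "FIELD_NAME":
            raise ValueError(
                "Current token (%s) not a field name" % kind)
        # Advance to the value token before using a numeric accessor.
        vkind, vvalue = next(it)
        if vkind != "VALUE_NUMBER":
            raise ValueError(
                "Current token (%s) not numeric, can not use "
                "numeric value accessors" % vkind)
        result[value] = vvalue
    return result
```

A correct reader alternates FIELD_NAME/VALUE_NUMBER pairs; calling the numeric accessor without the advance yields exactly the "Current token (FIELD_NAME) not numeric" style of failure.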
[jira] [Commented] (HIVE-11898) support default partition in metastoredirectsql
[ https://issues.apache.org/jira/browse/HIVE-11898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938920#comment-14938920 ] Sushanth Sowmyan commented on HIVE-11898: - +1. (I've looked at the patch, and it makes sense as something that's not wrong, but I have not verified the test results that Sergey says pass for him.) One thing though - in order for this jira to be more readable in the future when we come across this, could you please edit in a description for this issue? > support default partition in metastoredirectsql > --- > > Key: HIVE-11898 > URL: https://issues.apache.org/jira/browse/HIVE-11898 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11898.01.patch, HIVE-11898.02.patch, > HIVE-11898.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11852) numRows and rawDataSize table properties are not replicated
[ https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903322#comment-14903322 ] Sushanth Sowmyan commented on HIVE-11852: - [~alangates], can I bug you for a review? (Most of the patch file size is the .q and the .out, I promise this time it's not a huge patch dump. :D ) > numRows and rawDataSize table properties are not replicated > --- > > Key: HIVE-11852 > URL: https://issues.apache.org/jira/browse/HIVE-11852 > Project: Hive > Issue Type: Bug > Components: Import/Export >Affects Versions: 1.2.1 >Reporter: Paul Isaychuk >Assignee: Sushanth Sowmyan > Attachments: HIVE-11852.patch > > > numRows and rawDataSize table properties are not replicated when exported for > replication and re-imported. > {code} > Table drdbnonreplicatabletable.vanillatable has different TblProps from > drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, > totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] > java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has > different TblProps from drdbnonreplicatabletable.vanillatable expected > [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found > [{numFiles=1, totalSize=560}] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11852) numRows and rawDataSize table properties are not replicated
[ https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11852: Reporter: Paul Isaychuk (was: Sushanth Sowmyan) > numRows and rawDataSize table properties are not replicated > --- > > Key: HIVE-11852 > URL: https://issues.apache.org/jira/browse/HIVE-11852 > Project: Hive > Issue Type: Bug > Components: Import/Export >Affects Versions: 1.2.1 >Reporter: Paul Isaychuk >Assignee: Sushanth Sowmyan > > numRows and rawDataSize table properties are not replicated when exported for > replication and re-imported. > {code} > Table drdbnonreplicatabletable.vanillatable has different TblProps from > drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, > totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] > java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has > different TblProps from drdbnonreplicatabletable.vanillatable expected > [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found > [{numFiles=1, totalSize=560}] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11852) numRows and rawDataSize table properties are not replicated
[ https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791297#comment-14791297 ] Sushanth Sowmyan commented on HIVE-11852: - The issue here is that there is a MoveTask, done as part of the import process, that issues an alter_table, which nukes the stats that were just created. On digging further, I discovered a couple of other cases that would result in the same stats squishing behaviour. Patch attached to fix them, and a .q file to test them. > numRows and rawDataSize table properties are not replicated > --- > > Key: HIVE-11852 > URL: https://issues.apache.org/jira/browse/HIVE-11852 > Project: Hive > Issue Type: Bug > Components: Import/Export >Affects Versions: 1.2.1 >Reporter: Paul Isaychuk >Assignee: Sushanth Sowmyan > > numRows and rawDataSize table properties are not replicated when exported for > replication and re-imported. > {code} > Table drdbnonreplicatabletable.vanillatable has different TblProps from > drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, > totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] > java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has > different TblProps from drdbnonreplicatabletable.vanillatable expected > [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found > [{numFiles=1, totalSize=560}] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11852) numRows and rawDataSize table properties are not replicated
[ https://issues.apache.org/jira/browse/HIVE-11852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11852: Attachment: HIVE-11852.patch > numRows and rawDataSize table properties are not replicated > --- > > Key: HIVE-11852 > URL: https://issues.apache.org/jira/browse/HIVE-11852 > Project: Hive > Issue Type: Bug > Components: Import/Export >Affects Versions: 1.2.1 >Reporter: Paul Isaychuk >Assignee: Sushanth Sowmyan > Attachments: HIVE-11852.patch > > > numRows and rawDataSize table properties are not replicated when exported for > replication and re-imported. > {code} > Table drdbnonreplicatabletable.vanillatable has different TblProps from > drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, > totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}] > java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has > different TblProps from drdbnonreplicatabletable.vanillatable expected > [{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found > [{numFiles=1, totalSize=560}] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11819) HiveServer2 catches OOMs on request threads
[ https://issues.apache.org/jira/browse/HIVE-11819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744414#comment-14744414 ] Sushanth Sowmyan commented on HIVE-11819: - Patch makes a lot of sense, and I was talking to Thejas about the possibility of a bug like this just last week. [~vgumashta], could you please review? I'm +1 on it in theory, but since I'm still fairly new to the HS2 side of things, do not consider myself binding on this review. > HiveServer2 catches OOMs on request threads > --- > > Key: HIVE-11819 > URL: https://issues.apache.org/jira/browse/HIVE-11819 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11819.patch > > > ThriftCLIService methods such as ExecuteStatement are apparently capable of > catching OOMs because they get wrapped in RTE by HiveSessionProxy. > This shouldn't happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739405#comment-14739405 ] Sushanth Sowmyan commented on HIVE-11510: - +1, committing to branch-1 and master. > Metatool updateLocation warning on views > > > Key: HIVE-11510 > URL: https://issues.apache.org/jira/browse/HIVE-11510 > Project: Hive > Issue Type: Bug > Components: Database/Schema >Affects Versions: 0.14.0 >Reporter: Eric Czech >Assignee: Wei Zheng > Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch, > HIVE-11510.3.patch > > > If views are present in a hive database, issuing a 'hive metatool > -updateLocation' command will result in an error like this: > ... > Warning: Found records with bad LOCATION in SDS table.. > bad location URI: null > bad location URI: null > bad location URI: null > > Based on the source code for Metatool, it looks like there would then be a > "bad location URI: null" message for every view and it also appears this is > happening simply because the 'sds' table in the hive schema has a column > called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11657) HIVE-2573 introduces some issues during metastore init (and CLI init)
[ https://issues.apache.org/jira/browse/HIVE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727709#comment-14727709 ] Sushanth Sowmyan commented on HIVE-11657: - +1. > HIVE-2573 introduces some issues during metastore init (and CLI init) > - > > Key: HIVE-11657 > URL: https://issues.apache.org/jira/browse/HIVE-11657 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Critical > Attachments: HIVE-11657.patch > > > HIVE-2573 introduced static reload functions call. > It has a few problems: > 1) When metastore client is initialized using an externally supplied config > (i.e. Hive.get(HiveConf)), it still gets called during static init using the > main service config. In my case, even though I have uris in the supplied > config to connect to remote MS (which eventually happens), the static call > creates objectstore, which is undesirable. > 2) It breaks compat - old metastores do not support this call so new clients > will fail, and there's no workaround like not using a new feature because the > static call is always made -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727705#comment-14727705 ] Sushanth Sowmyan commented on HIVE-11668: - Change looks good, and I've tested it out on mysql to make sure there are no surprises. +1. > make sure directsql calls pre-query init when needed > > > Key: HIVE-11668 > URL: https://issues.apache.org/jira/browse/HIVE-11668 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11668.01.patch, HIVE-11668.02.patch, > HIVE-11668.patch > > > See comments in HIVE-11123 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726411#comment-14726411 ] Sushanth Sowmyan commented on HIVE-11123: - I think HIVE-11668 is close to committing (after I verify), so I think we can hold off on reverting this, since it is actually still useful. Otherwise, I'd agree that this was a revert-candidate. > Fix how to confirm the RDBMS product name at Metastore. > --- > > Key: HIVE-11123 > URL: https://issues.apache.org/jira/browse/HIVE-11123 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.0 > Environment: PostgreSQL >Reporter: Shinichi Yamashita >Assignee: Shinichi Yamashita > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch, > HIVE-11123.3.patch, HIVE-11123.4.patch, HIVE-11123.4a.patch > > > I use PostgreSQL to Hive Metastore. And I saw the following message at > PostgreSQL log. > {code} > < 2015-06-26 10:58:15.488 JST >ERROR: syntax error at or near "@@" at > character 5 > < 2015-06-26 10:58:15.488 JST >STATEMENT: SET @@session.sql_mode=ANSI_QUOTES > < 2015-06-26 10:58:15.489 JST >ERROR: relation "v$instance" does not exist > at character 21 > < 2015-06-26 10:58:15.489 JST >STATEMENT: SELECT version FROM v$instance > < 2015-06-26 10:58:15.490 JST >ERROR: column "version" does not exist at > character 10 > < 2015-06-26 10:58:15.490 JST >STATEMENT: SELECT @@version > {code} > When Hive CLI and Beeline embedded mode are carried out, this message is > output to PostgreSQL log. > These queries are called from MetaStoreDirectSql#determineDbType. And if we > use MetaStoreDirectSql#getProductName, we need not to call these queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726161#comment-14726161 ] Sushanth Sowmyan commented on HIVE-11668: - I'll respond to this by the end of the day today - I wanted to test this out. > make sure directsql calls pre-query init when needed > > > Key: HIVE-11668 > URL: https://issues.apache.org/jira/browse/HIVE-11668 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11668.01.patch, HIVE-11668.02.patch, > HIVE-11668.patch > > > See comments in HIVE-11123 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723808#comment-14723808 ] Sushanth Sowmyan commented on HIVE-11510: - Thank you Wei, this looks good. +1. > Metatool updateLocation warning on views > > > Key: HIVE-11510 > URL: https://issues.apache.org/jira/browse/HIVE-11510 > Project: Hive > Issue Type: Bug > Components: Database/Schema >Affects Versions: 0.14.0 >Reporter: Eric Czech >Assignee: Wei Zheng > Attachments: HIVE-11510.1.patch, HIVE-11510.2.patch > > > If views are present in a hive database, issuing a 'hive metatool > -updateLocation' command will result in an error like this: > ... > Warning: Found records with bad LOCATION in SDS table.. > bad location URI: null > bad location URI: null > bad location URI: null > > Based on the source code for Metatool, it looks like there would then be a > "bad location URI: null" message for every view and it also appears this is > happening simply because the 'sds' table in the hive schema has a column > called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11668) make sure directsql calls pre-query init when needed
[ https://issues.apache.org/jira/browse/HIVE-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720412#comment-14720412 ] Sushanth Sowmyan commented on HIVE-11668: - This patch will still have an issue, as observed by [~wzheng] earlier today: {noformat} Caused by: org.datanucleus.api.jdo.exceptions.TransactionNotActiveException: Transaction is not active. You either need to define a transaction around this, or run your PersistenceManagerFactory with 'NontransactionalRead' and 'NontransactionalWrite' set to 'true' FailedObject:org.datanucleus.exceptions.TransactionNotActiveException: Transaction is not active. You either need to define a transaction around this, or run your PersistenceManagerFactory with 'NontransactionalRead' and 'NontransactionalWrite' set to 'true' at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:396) at org.datanucleus.api.jdo.JDOTransaction.rollback(JDOTransaction.java:186) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:196) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.init(MetaStoreDirectSql.java:137) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:335) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:286) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:57) at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:601) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:579) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:632) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:468) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.init(RetryingHMSHandler.java:66) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5815) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:203) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.init(SessionHiveMetaStoreClient.java:74) ... 19 more {noformat} The issue here is this. Earlier, the runDbCheck() function was instantiating a transaction if it wasn't already open. So, as long as we were determining the db type by using runDbCheck, we were opening the txn as a side-effect (ugh). Now, by determining the product name by the jdbc provider, we're not calling runDbCheck, and thus, the txn is never opened. You need the following in your chain, hopefully in a more sane place than in runDbCheck(): {noformat} Transaction tx = pm.currentTransaction(); +if (!tx.isActive()) { + tx.begin(); +} {noformat} make sure directsql calls pre-query init when needed Key: HIVE-11668 URL: https://issues.apache.org/jira/browse/HIVE-11668 Project: Hive Issue Type: Bug Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11668.patch See comments in HIVE-11123 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
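The guard suggested in the comment above can be sketched in isolation. This is a minimal, self-contained model: the `Transaction` interface here is a stand-in for `javax.jdo.Transaction`, and `ensureTxActive` is a hypothetical helper name, not an actual metastore method.

```java
// Minimal model of the "open the txn if nobody has" guard discussed above.
// Transaction stands in for javax.jdo.Transaction; ensureTxActive is a
// hypothetical helper name, not real metastore code.
public class TxGuardSketch {
    interface Transaction {
        boolean isActive();
        void begin();
    }

    // Begin the transaction only if one is not already active, mirroring the
    // side-effect that runDbCheck() used to provide implicitly.
    static boolean ensureTxActive(Transaction tx) {
        if (!tx.isActive()) {
            tx.begin();
            return true;  // we opened it; caller owns commit/rollback
        }
        return false;     // already active; leave lifecycle to the outer scope
    }

    public static void main(String[] args) {
        final boolean[] begun = {false};
        Transaction tx = new Transaction() {
            public boolean isActive() { return begun[0]; }
            public void begin() { begun[0] = true; }
        };
        System.out.println(ensureTxActive(tx));  // opens: nothing was active
        System.out.println(ensureTxActive(tx));  // no-op: already active
    }
}
```

The point of returning a flag is that the caller can tell whether it owns the transaction it is now inside, rather than unconditionally beginning one inside runDbCheck() as a side effect.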
[jira] [Commented] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720402#comment-14720402 ] Sushanth Sowmyan commented on HIVE-11123: - Also, this patch broke hive working against mysql and potentially other dbs - I will follow up with comments on HIVE-11668. Testing with derby alone in unit test mode is problematic. Sorry I didn't catch this before it was committed. Fix how to confirm the RDBMS product name at Metastore. --- Key: HIVE-11123 URL: https://issues.apache.org/jira/browse/HIVE-11123 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0 Environment: PostgreSQL Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11123.1.patch, HIVE-11123.2.patch, HIVE-11123.3.patch, HIVE-11123.4.patch, HIVE-11123.4a.patch I use PostgreSQL to Hive Metastore. And I saw the following message at PostgreSQL log. {code} 2015-06-26 10:58:15.488 JST ERROR: syntax error at or near @@ at character 5 2015-06-26 10:58:15.488 JST STATEMENT: SET @@session.sql_mode=ANSI_QUOTES 2015-06-26 10:58:15.489 JST ERROR: relation v$instance does not exist at character 21 2015-06-26 10:58:15.489 JST STATEMENT: SELECT version FROM v$instance 2015-06-26 10:58:15.490 JST ERROR: column version does not exist at character 10 2015-06-26 10:58:15.490 JST STATEMENT: SELECT @@version {code} When Hive CLI and Beeline embedded mode are carried out, this message is output to PostgreSQL log. These queries are called from MetaStoreDirectSql#determineDbType. And if we use MetaStoreDirectSql#getProductName, we need not to call these queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
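The approach the patch moved to, deriving the database type from the JDBC driver's reported product name rather than probing with vendor-specific SQL (the `SET @@session...` and `SELECT version FROM v$instance` statements that produced the PostgreSQL errors above), can be sketched as follows. In real code the input string would come from `connection.getMetaData().getDatabaseProductName()`; the enum and method names here are illustrative, not Hive's actual ones.

```java
// Sketch: map a JDBC product name string to a DB type tag without issuing
// any vendor-specific probe queries. Names are illustrative, not Hive's.
public class DbTypeSketch {
    enum DbType { MYSQL, POSTGRES, ORACLE, DERBY, OTHER }

    static DbType fromProductName(String productName) {
        String p = productName == null ? "" : productName.toLowerCase();
        if (p.contains("mysql")) return DbType.MYSQL;
        if (p.contains("postgresql")) return DbType.POSTGRES;
        if (p.contains("oracle")) return DbType.ORACLE;
        if (p.contains("derby")) return DbType.DERBY;
        return DbType.OTHER;
    }

    public static void main(String[] args) {
        // In practice: conn.getMetaData().getDatabaseProductName()
        System.out.println(fromProductName("PostgreSQL"));   // POSTGRES
        System.out.println(fromProductName("Apache Derby")); // DERBY
    }
}
```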
[jira] [Commented] (HIVE-11510) Metatool updateLocation warning on views
[ https://issues.apache.org/jira/browse/HIVE-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720434#comment-14720434 ] Sushanth Sowmyan commented on HIVE-11510: - With the current patch, the metastore will do a LOG.debug for every single null record, which can be a lot, and will also slow down that process a lot. Would it be possible to simply update the UpdateMStorageDescriptorTblURIRetVal class with a int numNullRecords initialized to zero and incremented each time you get a null? Also, in that case, I would imagine that we shouldn't add that location to badRecords, since that would bloat up the size of badRecords unnecessarily. After we do that, we can then do a singular log in HiveMetaTool.printTblURIUpdateSummary along with the other statistics, mentioning how many null records we found, and that that is okay if the user has that many indexes/views. Metatool updateLocation warning on views Key: HIVE-11510 URL: https://issues.apache.org/jira/browse/HIVE-11510 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.14.0 Reporter: Eric Czech Assignee: Wei Zheng Attachments: HIVE-11510.1.patch If views are present in a hive database, issuing a 'hive metatool -updateLocation' command will result in an error like this: ... Warning: Found records with bad LOCATION in SDS table.. bad location URI: null bad location URI: null bad location URI: null Based on the source code for Metatool, it looks like there would then be a bad location URI: null message for every view and it also appears this is happening simply because the 'sds' table in the hive schema has a column called location that is NULL only for views. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
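The counter-based suggestion above can be sketched like this. NULL locations are expected for views and indexes (their SDS.LOCATION column is NULL), so they are counted rather than logged per record or appended to badRecords. Class and field names here are illustrative, not the actual UpdateMStorageDescriptorTblURIRetVal fields, and the "bad URI" check is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: count NULL locations instead of logging/collecting each one.
// Names are illustrative; the startsWith check is a stand-in for whatever
// real URI validation the metatool performs.
public class UpdateSummarySketch {
    static class RetVal {
        final List<String> badRecords = new ArrayList<>();
        int numNullRecords = 0;
    }

    static RetVal scanLocations(List<String> locations) {
        RetVal ret = new RetVal();
        for (String loc : locations) {
            if (loc == null) {
                ret.numNullRecords++;     // expected for views/indexes: just count
            } else if (!loc.startsWith("hdfs://")) {
                ret.badRecords.add(loc);  // genuinely malformed URI
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        List<String> locs = java.util.Arrays.asList(
            "hdfs://nn/warehouse/t1", null, null, "badscheme:/x");
        RetVal ret = scanLocations(locs);
        // One summary line in printTblURIUpdateSummary instead of a
        // "bad location URI: null" warning per view:
        System.out.println("Found " + ret.numNullRecords
            + " records with NULL location (expected for views/indexes); "
            + ret.badRecords.size() + " bad location URI(s)");
    }
}
```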
[jira] [Reopened] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan reopened HIVE-8678: (Actually, maybe not a problem is an incorrect status, since it would indicate that the report is accurate, but working as designed. Reopening to close it again.) Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Fix For: 1.2.2 Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occuring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
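The ClassCastException in the stack trace above can be reproduced in isolation: a String read back from storage cannot be cast to java.sql.Date, whereas Date.valueOf parses a yyyy-mm-dd String, which is the reporter's proposed fix. (Note valueOf takes a String, so the Object would still need a String cast first.) This snippet only illustrates the failure mode, not the HCatalog code path.

```java
import java.sql.Date;

// Reproduction of the cast failure reported above, plus the parse that works.
public class DateCastSketch {
    public static void main(String[] args) {
        Object o = "2014-10-30";                 // value arrives as a String
        Date parsed = Date.valueOf((String) o);  // parses yyyy-mm-dd
        System.out.println(parsed);              // 2014-10-30
        try {
            Date cast = (Date) o;                // the failing pattern
            System.out.println(cast);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the report");
        }
    }
}
```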
[jira] [Resolved] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan resolved HIVE-8678. Resolution: Cannot Reproduce Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Fix For: 1.2.2 Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot 
be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occuring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717588#comment-14717588 ] Sushanth Sowmyan commented on HIVE-8678: Closed as Cannot reproduce Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Fix For: 1.2.2 Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: 
java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occuring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan resolved HIVE-8678. Resolution: Not A Problem Fix Version/s: 1.2.2 Resolving as Not a problem as of branch-1.2, since this problem is not reproducible in the newer releases of hive. Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Fix For: 1.2.2 Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occuring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7000) Several issues with javadoc generation
[ https://issues.apache.org/jira/browse/HIVE-7000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan resolved HIVE-7000. Resolution: Not A Problem Several issues with javadoc generation -- Key: HIVE-7000 URL: https://issues.apache.org/jira/browse/HIVE-7000 Project: Hive Issue Type: Improvement Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7000.1.patch, javadoc_secondstab.patch 1. Ran 'mvn javadoc:javadoc -Phadoop-2'. Encountered several issues - Generated classes are included in the javadoc - generation fails in the top level hcatalog folder because its src folder contains no java files. Patch attached to fix these issues. 2. Tried mvn javadoc:aggregate -Phadoop-2 - cannot get an aggregated javadoc for all of hive - tried setting 'aggregate' parameter to true. Didn't work There are several questions in StackOverflow about multiple project javadoc. Seems like this is broken. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11599) Add metastore command to dump it's configs
[ https://issues.apache.org/jira/browse/HIVE-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709871#comment-14709871 ] Sushanth Sowmyan commented on HIVE-11599: - +1 to intent, this would be most useful. Add metastore command to dump it's configs -- Key: HIVE-11599 URL: https://issues.apache.org/jira/browse/HIVE-11599 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Affects Versions: 1.0.0 Reporter: Eugene Koifman We should have equivalent of Hive CLI set command on Metastore (and likely HS2) which can dump out all properties this particular process is running with. cc [~thejas] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9583) Rolling upgrade of Hive MetaStore Server
[ https://issues.apache.org/jira/browse/HIVE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan resolved HIVE-9583. Resolution: Fixed Fix Version/s: 1.2.2 (Marking as fixed on the 1.2 line, since per Thiruvel, all the tasks inside this are done, and were done as of 1.2.0) Rolling upgrade of Hive MetaStore Server Key: HIVE-9583 URL: https://issues.apache.org/jira/browse/HIVE-9583 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.14.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Labels: hcatalog, metastore Fix For: 1.2.2 This is an umbrella JIRA to track all rolling upgrade JIRAs w.r.t MetaStore server. This will be helpful for users deploying Metastore server and connecting to it with HCatalog or Hive CLI interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11607) Export tables broken for data > 32 MB
[ https://issues.apache.org/jira/browse/HIVE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705798#comment-14705798 ] Sushanth Sowmyan commented on HIVE-11607: - Looks good to me, +1. (Agree with Swarnim's comment on RB as well, in that comments on the default options being set for DistCpOptions might be nice) +cc [~mithun] Export tables broken for data 32 MB - Key: HIVE-11607 URL: https://issues.apache.org/jira/browse/HIVE-11607 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 1.0.0, 1.2.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11607.patch Broken for both hadoop-1 as well as hadoop-2 line -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11607) Export tables broken for data > 32 MB
[ https://issues.apache.org/jira/browse/HIVE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703992#comment-14703992 ] Sushanth Sowmyan commented on HIVE-11607: - Also, Hadoop20Shims.runDistCp seems to refer to org.apache.hadoop.tools.distcp2 as a classname - since org.apache.hadoop.tools.distcp2.DistCp would be the appropriate class, I'm not sure it works for 1.0 either unless I'm reading this incorrectly. Export tables broken for data 32 MB - Key: HIVE-11607 URL: https://issues.apache.org/jira/browse/HIVE-11607 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 1.0.0, 1.2.0, 1.1.0 Reporter: Ashutosh Chauhan Broken for both hadoop-1 as well as hadoop-2 line -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11552) implement basic methods for getting/putting file metadata
[ https://issues.apache.org/jira/browse/HIVE-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697900#comment-14697900 ] Sushanth Sowmyan commented on HIVE-11552: - +cc [~thejas] implement basic methods for getting/putting file metadata - Key: HIVE-11552 URL: https://issues.apache.org/jira/browse/HIVE-11552 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: hbase-metastore-branch Attachments: HIVE-11552.nogen.patch, HIVE-11552.nogen.patch, HIVE-11552.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11456) HCatStorer should honor mapreduce.output.basename
[ https://issues.apache.org/jira/browse/HIVE-11456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658961#comment-14658961 ] Sushanth Sowmyan commented on HIVE-11456: - Thanks for the fix - I have an additional question to verify if this causes a problem. In the case of appends, where a previous file already exists, it's possible that HCat would add an additional suffix to the resultant file, as noted by the following: https://github.com/apache/hive/blob/master/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java#L650-L656 I want to make sure that this is not a surprise to you, and is okay? HCatStorer should honor mapreduce.output.basename - Key: HIVE-11456 URL: https://issues.apache.org/jira/browse/HIVE-11456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Rohini Palaniswamy Assignee: Mithun Radhakrishnan Priority: Critical Fix For: 1.3.0, 1.2.1, 2.0.0 Attachments: HIVE-11456.1.patch Pig on Tez scripts with union directly followed by HCatStorer have a problem due to HCatStorer not honoring mapreduce.output.basename and always using part. Tez sets mapreduce.output.basename to part-v000-o000 (vertex id followed by output id). With union optimizer, Pig uses vertex groups to write directly from both the vertices to the final output directory. Since hcat ignores the mapreduce.output.basename, both the vertices produce part-r-n and when they are moved from the temp location to the final directory, they just overwrite each other. There is no failure and only one of the files with that name makes it into the final directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652435#comment-14652435 ] Sushanth Sowmyan commented on HIVE-8678: On digging further, my issues in the 0.13.1 vm were a different issue from the one reported here, and was related to pig's jodatime being an older library than needed. It was solved by adding a joda-time-2.1.jar to PIG_CLASSPATH, and setting PIG_USER_CLASSPATH_FIRST so that it picked it up first. At this point, I am not able to reproduce this issue with 0.13.1 either. Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occuring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
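The joda-time workaround described in the comment above amounts to the following shell setup. PIG_CLASSPATH and PIG_USER_CLASSPATH_FIRST are real Pig environment variables; the jar path is illustrative and should point at wherever the newer jar actually lives.

```shell
# Make Pig pick up a newer joda-time than the copy it bundles.
# The jar location below is illustrative; adjust to your environment.
export PIG_CLASSPATH="/opt/libs/joda-time-2.1.jar:${PIG_CLASSPATH:-}"
# Put user classpath entries ahead of Pig's own jars so the newer jar wins.
export PIG_USER_CLASSPATH_FIRST=true
```

With both variables set, `pig -useHCatalog` resolves joda-time classes from the user-supplied jar before the bundled one.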
[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM
[ https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648281#comment-14648281 ] Sushanth Sowmyan commented on HIVE-11407: - The edits look good, +1. JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM -- Key: HIVE-11407 URL: https://issues.apache.org/jira/browse/HIVE-11407 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Sushanth Sowmyan Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch With around 7000 tables having around 1500 columns each, and 512MB of HS2 memory, I am able to reproduce this OOM . Most of the memory is consumed by the datanucleus objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646429#comment-14646429 ] Sushanth Sowmyan commented on HIVE-10165: - I think the examples are good and on-point as a guideline to new users - thank you for finding them. :) I don't think any further emphasis is needed. Also, thank you for the bit on the fix version setting clarification there. That's something that pops up often. Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Labels: TODOC2.0, streaming_api Fix For: 2.0.0 Attachments: HIVE-10165.0.patch, HIVE-10165.10.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, HIVE-10165.9.patch, mutate-system-overview.png h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. 
* Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
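The classification step described in the motivation (compare ground truth against a modified snapshot, keyed by record id, and label each row as inserted, updated, or deleted) can be sketched roughly as below. This is an illustrative toy, not the hive-hcatalog-streaming API: records are plain strings, the sort-by-sequence step is omitted, and all names are invented.

```java
import java.util.*;

// Toy sketch of the insert/update/delete classification described above.
public class MergeClassifier {
    enum Op { INSERT, UPDATE, DELETE }

    static Map<String, Op> classify(Map<String, String> base, Map<String, String> changed) {
        Map<String, Op> ops = new TreeMap<>();
        for (Map.Entry<String, String> e : changed.entrySet()) {
            String old = base.get(e.getKey());
            if (old == null) {
                ops.put(e.getKey(), Op.INSERT);   // key absent in ground truth
            } else if (!old.equals(e.getValue())) {
                ops.put(e.getKey(), Op.UPDATE);   // key present, value differs
            }                                      // identical rows need no write
        }
        for (String k : base.keySet()) {
            if (!changed.containsKey(k)) {
                ops.put(k, Op.DELETE);            // key vanished from the new snapshot
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        Map<String, String> base = Map.of("1", "a", "2", "b");
        Map<String, String> changed = Map.of("1", "a", "2", "bb", "3", "c");
        System.out.println(classify(base, changed)); // only the mutated rows
    }
}
```

The point of the proposal is that only the rows this function labels need to be written to the transactional table, rather than rewriting every partition that might contain a change.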
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642976#comment-14642976 ] Sushanth Sowmyan commented on HIVE-8678: Actually, after finding a 0.13.1 VM, I'm able to reproduce this. In 1.2, however, I am not. So something changed along the way to fix this. I can dig further to figure out what the problem was that made it not work in 0.13.1. In addition, this problem exists with both orc and text formats in 0.13.1. Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occurring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639129#comment-14639129 ] Sushanth Sowmyan commented on HIVE-8678: No worries - sorry I didn't try experimenting on this earlier. :) Hopefully this means that this bug was squished in the meanwhile somewhere between 0.13.1 and 1.2 and does not exist any longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it
[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11344: Attachment: HIVE-11344.patch Patch implementing (a) attached. HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it Key: HIVE-11344 URL: https://issues.apache.org/jira/browse/HIVE-11344 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-11344.patch HIVE-9845 introduced a notion of compression for HCatSplits so that when serializing, it finds commonalities between PartInfo and TableInfo objects, and if the two are identical, it nulls out that field in PartInfo, thus making sure that when PartInfo is then serialized, info is not repeated. This, however, has the side effect of making the PartInfo object unusable once HCatSplit.write has been called. This does not affect M/R directly: M/R tasks do not know about the PartInfo objects, and once serialized, the HCatSplit object is recreated by deserializing on the backend, which restores the split and its PartInfo objects. It does, however, affect framework users of HCat that try to mimic M/R and then use the PartInfo objects to instantiate distinct readers. Thus, we need to make it so that PartInfo is still usable after HCatSplit.write is called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it
[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637699#comment-14637699 ] Sushanth Sowmyan commented on HIVE-11344: - [~mithun], could you please review? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it
[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11344: Summary: HIVE-9845 makes HCatSplit.write modify the split so that PartInfo objects are unusable after it (was: HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it
[ https://issues.apache.org/jira/browse/HIVE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637678#comment-14637678 ] Sushanth Sowmyan commented on HIVE-11344: - There are three routes I see available here: a) There is decompress logic in PartInfo.setTableInfo, and compress logic in PartInfo.writeObject. We could make it so that PartInfo.writeObject does the compression, writes itself, and then does the decompression back. b) We could decompress on demand - wherein if a user calls getInputFormatClassName(), we then fetch that info if it's not available, and always return values consistently. c) We could add a new conf parameter that controls whether or not we do compression - users with 100k splits would prefer compression, and be okay with the fact that PartInfo objects are not usable, and users that want to use the PartInfo objects will be okay with the fact that they are going to hog a little bit more serialized space. (c) is a bad solution all-round. [~ashutoshc] would be mad at me for adding another conf parameter, and it is entirely possible that those that are trying to implement other streaming interfaces/etc and are mimicking M/R will run into a large number of partitions as well. (b) is nifty, and I like the idea of it, but I'm not entirely certain if it will run afoul of other serialization methods in the future that call getters to get fields (some json serializers), which might result in a bloated serialized PartInfo object anyway. Also, it spreads the decompression logic across multiple getters, and pushes the assert statement into multiple places as well. (a) is probably the cleanest solution, although it makes a code reader wonder why we're going through the gymnastics we are. Some code comments might help with that. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
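Option (a) above - compress inside writeObject, serialize, then immediately restore the in-memory state so the object stays usable - can be sketched as follows. This is a stand-alone illustration with invented names (PartInfoSketch, a String standing in for the shared TableInfo), not Hive's actual PartInfo class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of option (a): null out the field duplicated by the parent for
// the duration of defaultWriteObject(), then restore it, so callers can
// keep using the object after it has been written.
public class PartInfoSketch implements Serializable {
    private String tableInfo;                    // duplicated by the enclosing table
    private transient String parentTableInfo;    // what the parent already carries

    PartInfoSketch(String tableInfo) {
        this.tableInfo = tableInfo;
        this.parentTableInfo = tableInfo;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        String saved = tableInfo;
        if (tableInfo != null && tableInfo.equals(parentTableInfo)) {
            tableInfo = null;                    // "compress": drop the duplicate
        }
        try {
            out.defaultWriteObject();            // serialize the compressed form
        } finally {
            tableInfo = saved;                   // "decompress": restore in-memory state
        }
    }

    String getTableInfo() { return tableInfo; }

    // Serialize once and return the field afterwards, to show it survives.
    String writeThenGet() {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(this);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return getTableInfo();
    }

    public static void main(String[] args) {
        System.out.println(new PartInfoSketch("schema-v1").writeThenGet()); // prints schema-v1
    }
}
```

The try/finally is the crux: whatever happens during serialization, the in-memory object is put back the way it was, which is the behavior the attached patch aims for.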
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637886#comment-14637886 ] Sushanth Sowmyan commented on HIVE-8678: I'm currently unable to reproduce this issue on hive-1.2 and pig-0.14.0, where I get the following: In hive: {noformat} hive> create table tdate(a string, b date) stored as orc; OK Time taken: 0.151 seconds hive> create table tsource(a string, b string) stored as orc; OK Time taken: 0.057 seconds hive> insert into table tsource values ('abc', '2015-02-28'); ... OK Time taken: 19.875 seconds hive> select * from tsource; OK abc 2015-02-28 Time taken: 0.143 seconds, Fetched: 1 row(s) hive> select a, cast(b as date) from tsource; OK abc 2015-02-28 Time taken: 0.092 seconds, Fetched: 1 row(s) hive> insert into table tdate select a, cast(b as date) from tsource; ... OK Time taken: 20.672 seconds hive> select * from tdate; OK abc 2015-02-28 Time taken: 0.051 seconds, Fetched: 1 row(s) hive> describe tdate; OK a string b date Time taken: 0.293 seconds, Fetched: 2 row(s) {noformat} In pig: {noformat} grunt> A = load 'tdate' using org.apache.hive.hcatalog.pig.HCatLoader(); grunt> describe A; 2015-07-22 15:42:26,367 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS A: {a: chararray,b: datetime} grunt> dump A; ... 
(abc,2015-02-28T00:00:00.000-08:00) grunt> {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637888#comment-14637888 ] Sushanth Sowmyan commented on HIVE-8678: Also, unit tests exist since the introduction of DATE capability that have tested date interop between hive and pig through HCatalog, and that still succeeds for me when I try running them on hive 0.13.1. Could you please show me what hive commands and pig commands you're running to recreate this issue? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635548#comment-14635548 ] Sushanth Sowmyan commented on HIVE-8678: What storage format are you using for the table in question? (i.e. is it Text, RCFile, ORC, something else?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636010#comment-14636010 ] Sushanth Sowmyan commented on HIVE-11172: - Incorrect results make it a good candidate for a backport to branch-1.2. Pedantic note: 1.2.1 has already shipped. This would go in 1.2.2, please set fix version appropriately after committing. Vectorization wrong results for aggregate query with where clause without group by -- Key: HIVE-11172 URL: https://issues.apache.org/jira/browse/HIVE-11172 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Reporter: Yi Zhang Assignee: Hari Sankar Sivarama Subramaniyan Priority: Critical Fix For: 2.0.0 Attachments: HIVE-11172.1.patch, HIVE-11172.2.patch, HIVE-11172.3.patch create table testvec(id int, dt int, greg_dt string) stored as orc; insert into table testvec values (1,20150330, '2015-03-30'), (2,20150301, '2015-03-01'), (3,20150502, '2015-05-02'), (4,20150401, '2015-04-01'), (5,20150313, '2015-03-13'), (6,20150314, '2015-03-14'), (7,20150404, '2015-04-04'); hive> select dt, greg_dt from testvec where id=5; OK 20150313 2015-03-13 Time taken: 4.435 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> set hive.map.aggr; hive.map.aggr=true hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-30 hive> set hive.vectorized.execution.enabled=false; hive> select max(dt), max(greg_dt) from testvec where id=5; OK 20150313 2015-03-13 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11172) Vectorization wrong results for aggregate query with where clause without group by
[ https://issues.apache.org/jira/browse/HIVE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11172: Fix Version/s: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan reassigned HIVE-8678: -- Assignee: Sushanth Sowmyan Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occurring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
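The failure mode and the reporter's suggested fix can be illustrated outside HCatalog. Below is a minimal, hypothetical sketch in plain Java (no Hive dependencies; `parseLenient` is an invented helper, not HCatalog code): the blind `(Date)` cast throws the `ClassCastException` seen in the stack trace when the SerDe hands back a `String`, while parsing the string form succeeds. Note the reporter's one-liner `Date.valueOf(o)` would not compile as-is for an `Object`, so the sketch goes through `toString()`.

```java
import java.sql.Date;

public class DateCastSketch {
    // Hypothetical helper (not part of HCatalog): fall back to parsing the
    // string form when the object is not already a java.sql.Date.
    static Date parseLenient(Object o) {
        if (o instanceof Date) {
            return (Date) o;                  // fine when the ObjectInspector returned a real Date
        }
        return Date.valueOf(o.toString());    // Date.valueOf parses "yyyy-[m]m-[d]d"
    }

    public static void main(String[] args) {
        Object fromSerDe = "2014-10-30";      // what the reporter observed: a String, not a Date
        try {
            Date d = (Date) fromSerDe;        // the original code path: throws ClassCastException
            System.out.println(d);
        } catch (ClassCastException e) {
            System.out.println("blind cast fails: " + e.getMessage());
        }
        System.out.println(parseLenient(fromSerDe)); // prints 2014-10-30
    }
}
```

As Sushanth notes in the follow-up comment, the real question is why the ObjectInspector returned a `String` in the first place; the sketch only shows why the cast is the line that blows up.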
[jira] [Commented] (HIVE-8678) Pig fails to correctly load DATE fields using HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631906#comment-14631906 ] Sushanth Sowmyan commented on HIVE-8678: Something seems weird here - looking at the code, it looks like the current code, where it simply casts to Date should be the right way to do this, since it should have called .getPrimitiveJavaObject() on the PrimitiveObjectInspector to get this object, and DateObjectInspector.getPrimitiveJavaObject() should have returned a Date. However, clearly, from your stack trace, you're getting a string. I'll dig into this and update as I find more. Pig fails to correctly load DATE fields using HCatalog -- Key: HIVE-8678 URL: https://issues.apache.org/jira/browse/HIVE-8678 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Michael McLellan Assignee: Sushanth Sowmyan Using: Hadoop 2.5.0-cdh5.2.0 Pig 0.12.0-cdh5.2.0 Hive 0.13.1-cdh5.2.0 When using pig -useHCatalog to load a Hive table that has a DATE field, when trying to DUMP the field, the following error occurs: {code} 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76) at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.sql.Date at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:457) at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:375) at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64) 2014-10-30 22:58:05,469 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 6018: Error converting read value to tuple {code} It seems to be occurring here: https://github.com/apache/hive/blob/trunk/hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java#L433 and that it should be: {code}Date d = Date.valueOf(o);{code} instead of {code}Date d = (Date) o;{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11198) Fix load data query file format check for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-11198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619528#comment-14619528 ] Sushanth Sowmyan commented on HIVE-11198: - Very useful, Prashant! I do believe this should fix the other issue I observed with repl. Thanks! +1 Fix load data query file format check for partitioned tables Key: HIVE-11198 URL: https://issues.apache.org/jira/browse/HIVE-11198 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11198.patch HIVE-11118 added a file format check for ORC. The check throws an exception when a non-ORC format is loaded into an ORC managed table, but it does not work for partitioned tables. Partitioned tables are allowed to have some partitions with a different file format. See this discussion for more details: https://issues.apache.org/jira/browse/HIVE-11118?focusedCommentId=14617271&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14617271 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617271#comment-14617271 ] Sushanth Sowmyan commented on HIVE-11118: - I have a question here - I will open another bug if need be, but if it's a simple misunderstanding, it won't matter. From the patch, I see the following bit: {code}
private void ensureFileFormatsMatch(TableSpec ts, URI fromURI) throws SemanticException {
  Class<? extends InputFormat> destInputFormat = ts.tableHandle.getInputFormatClass();
  // Other file formats should do similar check to make sure file formats match
  // when doing LOAD DATA .. INTO TABLE
  if (OrcInputFormat.class.equals(destInputFormat)) {
    Path inputFilePath = new Path(fromURI);
    try {
      FileSystem fs = FileSystem.get(fromURI, conf);
      // just creating orc reader is going to do sanity checks to make sure its valid ORC file
      OrcFile.createReader(fs, inputFilePath);
    } catch (FileFormatException e) {
      throw new SemanticException(ErrorMsg.INVALID_FILE_FORMAT_IN_LOAD.getMsg("Destination " +
          "table is stored as ORC but the file being loaded is not a valid ORC file."));
    } catch (IOException e) {
      throw new SemanticException("Unable to load data to destination table. " +
          "Error: " + e.getMessage());
    }
  }
}
{code} Now, it's entirely possible that the table in question is an ORC table, but the partition being loaded is of another format, such as Text - Hive supports mixed partition scenarios. In fact, this is a likely scenario in the case of a replication of a table that used to be Text, but has been converted to Orc, so that all new partitions will be orc. Then, in that case, the destination table will be a MANAGED_TABLE, and will be an orc table, but import will try to load a text partition on to it. Shouldn't this refer to a partitionspec rather than the table's inputformat for this check to work with that scenario?
Load data query should validate file formats with destination tables Key: HIVE-11118 URL: https://issues.apache.org/jira/browse/HIVE-11118 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch Load data local inpath queries do not do any validation wrt file format. If the destination table is ORC and we try to load files that are not ORC, the load will succeed but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading of files that do not match the destination table file format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
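For context, a lightweight version of the kind of sanity check this issue asks for can be sketched without any Hive dependencies. This is a hypothetical, minimal check (the actual patch relies on `OrcFile.createReader`, which validates far more than a header sniff): ORC files begin with the 3-byte magic `ORC`, so an obviously non-ORC file can be rejected by reading its first bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OrcMagicCheck {
    // Hypothetical lightweight check, not the patch's actual mechanism:
    // ORC files start with the 3-byte magic "ORC", so a quick header sniff
    // rejects obviously wrong files before any load is attempted.
    static boolean looksLikeOrc(Path file) throws IOException {
        byte[] head = new byte[3];
        try (InputStream in = Files.newInputStream(file)) {
            if (in.read(head) < 3) {
                return false;                 // too short to be a valid ORC file
            }
        }
        return head[0] == 'O' && head[1] == 'R' && head[2] == 'C';
    }

    public static void main(String[] args) throws IOException {
        Path textFile = Files.createTempFile("load", ".txt");
        Files.write(textFile, "1,foo\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(looksLikeOrc(textFile)); // false: a text file fails the magic check
        Files.deleteIfExists(textFile);
    }
}
```

A magic-byte check of this sort only catches gross mismatches; a corrupt file with the right header still needs the full reader-based validation the patch uses.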
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617264#comment-14617264 ] Sushanth Sowmyan commented on HIVE-11118: - Thanks, [~leftylev]! Added. Load data query should validate file formats with destination tables Key: HIVE-11118 URL: https://issues.apache.org/jira/browse/HIVE-11118 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch Load data local inpath queries do not do any validation wrt file format. If the destination table is ORC and we try to load files that are not ORC, the load will succeed but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading of files that do not match the destination table file format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11118: Fix Version/s: 2.0.0 1.3.0 Load data query should validate file formats with destination tables Key: HIVE-11118 URL: https://issues.apache.org/jira/browse/HIVE-11118 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch Load data local inpath queries do not do any validation wrt file format. If the destination table is ORC and we try to load files that are not ORC, the load will succeed but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading of files that do not match the destination table file format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions
[ https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11104: Fix Version/s: 1.3.0 Select operator doesn't propagate constants appearing in expressions Key: HIVE-11104 URL: https://issues.apache.org/jira/browse/HIVE-11104 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605959#comment-14605959 ] Sushanth Sowmyan commented on HIVE-10983: - Not a problem! As part of the release process, I'm required to go unset all jiras marked for already-released versions, and that's what I was doing. :) To expand further, the idea is that Fix Version is set to track which branches the commits got committed to, and thus, should not be set unless this patch has already been committed to those branches. So, now, for example, if this commit is committed to branch-1.2 to track 1.2.x, its fix version would be 1.2.2 once it is committed. Setting it to 1.2.0 would mean that this was included as part of the 1.2.0 release, which it wasn't. So, for this, when a committer commits a patch for this bug, if they commit it to branch-1.2, they should then set the fix version to 1.2.2. SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 2.0.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt {noformat} The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they invoke a bad method of Text, getBytes(). The getBytes method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data. But copyBytes() was only added after hadoop1.
{noformat} When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL: {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is below, just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
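The getBytes()/copyBytes() distinction the report turns on can be reproduced without Hadoop. Below is a hypothetical stand-in (`ReusedBuffer` is invented for illustration) that mimics how `org.apache.hadoop.io.Text` reuses its backing array: after a long record is followed by a short one, the raw buffer still carries the old record's tail, which is exactly the "current row contains the previous row" symptom described above.

```java
import java.util.Arrays;

public class ReusedBuffer {
    // Invented stand-in mimicking org.apache.hadoop.io.Text's reuse behaviour:
    // the backing array only grows and is never cleared, so bytes of a long
    // earlier record survive past 'length' after a shorter record is set.
    private byte[] bytes = new byte[0];
    private int length = 0;

    void set(byte[] data) {
        if (bytes.length < data.length) {
            bytes = new byte[data.length];    // grow only; never shrink
        }
        System.arraycopy(data, 0, bytes, 0, data.length);
        length = data.length;                 // stale bytes beyond 'length' remain
    }

    byte[] getBytes() {                       // like Text.getBytes(): raw buffer, may be too long
        return bytes;
    }

    byte[] copyBytes() {                      // like Text.copyBytes(): exact length, the safe choice
        return Arrays.copyOf(bytes, length);
    }

    public static void main(String[] args) {
        ReusedBuffer t = new ReusedBuffer();
        t.set("a long first row".getBytes());
        t.set("short".getBytes());
        System.out.println(new String(t.getBytes()));  // prints "shortg first row": the old tail leaks
        System.out.println(new String(t.copyBytes())); // prints "short"
    }
}
```

Any consumer that pairs `getBytes()` with the buffer's own array length instead of `Text.getLength()` sees the leaked tail; the fix is to use `copyBytes()` (or respect the length), which is what the attached patches do.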
[jira] [Updated] (HIVE-11066) Ensure tests don't share directories on FS
[ https://issues.apache.org/jira/browse/HIVE-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11066: Fix Version/s: (was: 1.2.1) 1.2.2 Ensure tests don't share directories on FS -- Key: HIVE-11066 URL: https://issues.apache.org/jira/browse/HIVE-11066 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 1.2.2 Attachments: HIVE-11066.patch Tests often fail with errors like Could not fully delete D:\w\hv\hcatalog\hcatalog-pig-adapter\target\tmp\dfs\name1 on Windows platforms. Attached is a prototype on avoiding these false negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11059) hcatalog-server-extensions tests scope should depend on hive-exec
[ https://issues.apache.org/jira/browse/HIVE-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11059: Fix Version/s: (was: 1.2.1) 1.2.2 hcatalog-server-extensions tests scope should depend on hive-exec - Key: HIVE-11059 URL: https://issues.apache.org/jira/browse/HIVE-11059 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Minor Fix For: 1.2.2 Attachments: HIVE-11059.patch (causes test failures in Windows due to the lack of WindowsPathUtil being available otherwise) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11060) Make test windowing.q robust
[ https://issues.apache.org/jira/browse/HIVE-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11060: Fix Version/s: 1.2.2 Make test windowing.q robust Key: HIVE-11060 URL: https://issues.apache.org/jira/browse/HIVE-11060 Project: Hive Issue Type: Bug Components: Tests Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11060.01.patch, HIVE-11060.patch Add partition / order by in over clause to make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11083) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11083: Fix Version/s: 1.2.2 Make test cbo_windowing robust -- Key: HIVE-11083 URL: https://issues.apache.org/jira/browse/HIVE-11083 Project: Hive Issue Type: Test Components: Tests Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11083.patch Make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11076: Fix Version/s: 1.2.2 Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11048) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11048: Fix Version/s: 1.2.2 Make test cbo_windowing robust -- Key: HIVE-11048 URL: https://issues.apache.org/jira/browse/HIVE-11048 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 1.2.2 Attachments: HIVE-11048.patch Add partition / order by in over clause to make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11050) testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries
[ https://issues.apache.org/jira/browse/HIVE-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11050: Fix Version/s: (was: 1.2.1) 1.2.2 testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries -- Key: HIVE-11050 URL: https://issues.apache.org/jira/browse/HIVE-11050 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Blocker Fix For: 1.2.2 Attachments: HIVE-11050.01.branch-1.patch, HIVE-11050.01.patch In some environments the Q file tests vector_outer_join\{1-4\}.q fail because the data creation queries produce different input files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11095: Fix Version/s: (was: 1.2.0) SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt {noformat} The method transformTextFromUTF8 has a bug: it invokes a bad method of Text, getBytes(). The getBytes method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data. But copyBytes() was only added after hadoop1. {noformat} How did I find this bug? When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL: {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is below, just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by the Text reuse, and I found the solution.
Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail
[ https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11010: Fix Version/s: (was: 1.2.1) Accumulo storage handler queries via HS2 fail - Key: HIVE-11010 URL: https://issues.apache.org/jira/browse/HIVE-11010 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 1.2.1 Environment: Secure Reporter: Takahiko Saito Assignee: Josh Elser On Kerberized cluster, accumulo storage handler throws an error, [usrname]@[principlaname] is not allowed to impersonate [username] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605018#comment-14605018 ] Sushanth Sowmyan edited comment on HIVE-4577 at 6/29/15 1:15 AM: - Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. was (Author: sushanth): Removing fix version of 1.2.1 since this is not part of the already-released 1.2.` release. Please set appropriate commit version when this fix is committed. hive CLI can't handle hadoop dfs command with space and quotes. Key: HIVE-4577 URL: https://issues.apache.org/jira/browse/HIVE-4577 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 Reporter: Bing Li Assignee: Bing Li Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, HIVE-4577.3.patch.txt, HIVE-4577.4.patch By design, Hive supports hadoop dfs commands in the hive shell, like hive> dfs -mkdir /user/biadmin/mydir; but it behaves differently from hadoop if the path contains spaces and quotes: hive> dfs -mkdir "hello"; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 /user/biadmin/hello hive> dfs -mkdir 'world'; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 /user/biadmin/'world' hive> dfs -mkdir "bei jing"; drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/bei drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 /user/biadmin/jing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11010) Accumulo storage handler queries via HS2 fail
[ https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605019#comment-14605019 ] Sushanth Sowmyan commented on HIVE-11010: - Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. Accumulo storage handler queries via HS2 fail - Key: HIVE-11010 URL: https://issues.apache.org/jira/browse/HIVE-11010 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0, 1.2.1 Environment: Secure Reporter: Takahiko Saito Assignee: Josh Elser On Kerberized cluster, accumulo storage handler throws an error, [usrname]@[principlaname] is not allowed to impersonate [username] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases
[ https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605017#comment-14605017 ] Sushanth Sowmyan edited comment on HIVE-10792 at 6/29/15 1:15 AM: -- Removing fix version of 1.2.1 since this is not part of the already-released 1.2.1 release. Please set appropriate commit version when this fix is committed. was (Author: sushanth): Removing fix version of 1.2.1 since this is not part of the already-released 1.2.` release. Please set appropriate commit version when this fix is committed. PPD leads to wrong answer when mapper scans the same table with multiple aliases Key: HIVE-10792 URL: https://issues.apache.org/jira/browse/HIVE-10792 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Dayue Gao Assignee: Dayue Gao Priority: Critical Attachments: HIVE-10792.1.patch, HIVE-10792.2.patch, HIVE-10792.test.sql Here are the steps to reproduce the bug. First of all, prepare a simple ORC table with one row: {code} create table test_orc (c0 int, c1 int) stored as ORC; {code} Table: test_orc ||c0||c1|| |0|1| The following SQL gets an empty result, which is not expected: {code} select * from test_orc t1 union all select * from test_orc t2 where t2.c0 = 1 {code} Self join is also broken: {code} set hive.auto.convert.join=false; -- force common join select * from test_orc t1 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0); {code} It gets an empty result while the expected answer is ||t1.c0||t1.c1||t2.c0||t2.c1|| |0|1|NULL|NULL| In these cases, we push down predicates into OrcInputFormat. As a result, the TableScanOperator for t1 can't receive its rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)