[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=560230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560230 ]

ASF GitHub Bot logged work on HIVE-21737:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Mar/21 00:48
        Start Date: 03/Mar/21 00:48
Worklog Time Spent: 10m
  Work Description: github-actions[bot] closed pull request #1635:
URL: https://github.com/apache/hive/pull/1635

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 560230)
    Time Spent: 8h 40m (was: 8.5h)

> Upgrade Avro to version 1.10.1
> ------------------------------
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Ismaël Mejía
> Assignee: Fokko Driesprong
> Priority: Major
> Labels: pull-request-available
> Attachments: 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
> Time Spent: 8h 40m
> Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without Jackson in the public API and Guava as a dependency. Worth the update.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-24502) Store table level regular expression used during dump for table level replication
[ https://issues.apache.org/jira/browse/HIVE-24502?focusedWorklogId=560229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560229 ]

ASF GitHub Bot logged work on HIVE-24502:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Mar/21 00:48
        Start Date: 03/Mar/21 00:48
Worklog Time Spent: 10m
  Work Description: github-actions[bot] closed pull request #1759:
URL: https://github.com/apache/hive/pull/1759

Issue Time Tracking
-------------------
    Worklog Id: (was: 560229)
    Time Spent: 40m (was: 0.5h)

> Store table level regular expression used during dump for table level replication
> ---------------------------------------------------------------------------------
>
> Key: HIVE-24502
> URL: https://issues.apache.org/jira/browse/HIVE-24502
> Project: Hive
> Issue Type: Task
> Reporter: Aasha Medhi
> Assignee: Aasha Medhi
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24502.01.patch, HIVE-24502.02.patch, HIVE-24502.03.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Store include table list and exclude table list as part of dump metadata file
[jira] [Work logged] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI
[ https://issues.apache.org/jira/browse/HIVE-15444?focusedWorklogId=560228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560228 ]

ASF GitHub Bot logged work on HIVE-15444:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Mar/21 00:48
        Start Date: 03/Mar/21 00:48
Worklog Time Spent: 10m
  Work Description: github-actions[bot] closed pull request #1815:
URL: https://github.com/apache/hive/pull/1815

Issue Time Tracking
-------------------
    Worklog Id: (was: 560228)
    Time Spent: 0.5h (was: 20m)

> tez.queue.name is invalid after tez job running on CLI
> ------------------------------------------------------
>
> Key: HIVE-15444
> URL: https://issues.apache.org/jira/browse/HIVE-15444
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Hui Fei
> Assignee: Oleksiy Sayankin
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.2.0
>
> Attachments: HIVE-15444.1.patch, HIVE-15444.2.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> {code}
> hive> set tez.queue.name;
> tez.queue.name is undefined
> hive> set tez.queue.name=HQ_OLPS;
> hive> set tez.queue.name;
> tez.queue.name=HQ_OLPS
> {code}
> {code}
> hive> insert into abc values(2,2);
> Query ID = hadoop_20161216181208_6c382e49-ac4a-4f52-ba1e-3ed962733fc1
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id application_1481877998678_0011)
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 6.57 s
> ----------------------------------------------------------------------------------------------
> Loading data to table default.abc
> OK
> Time taken: 19.983 seconds
> {code}
> {code}
> hive> set tez.queue.name;
> tez.queue.name is undefined
> hive> set hive.execution.engine;
> hive.execution.engine=tez
> {code}
[jira] [Work logged] (HIVE-24837) Upgrade httpclient to 4.5.13+
[ https://issues.apache.org/jira/browse/HIVE-24837?focusedWorklogId=560218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560218 ]

ASF GitHub Bot logged work on HIVE-24837:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 03/Mar/21 00:09
        Start Date: 03/Mar/21 00:09
Worklog Time Spent: 10m
  Work Description: hsnusonic closed pull request #2032:
URL: https://github.com/apache/hive/pull/2032

Issue Time Tracking
-------------------
    Worklog Id: (was: 560218)
    Time Spent: 20m (was: 10m)

> Upgrade httpclient to 4.5.13+
> -----------------------------
>
> Key: HIVE-24837
> URL: https://issues.apache.org/jira/browse/HIVE-24837
> Project: Hive
> Issue Type: Improvement
> Reporter: Yu-Wen Lai
> Assignee: Yu-Wen Lai
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Hive is using httpclient 4.5.6. We will need to upgrade httpclient and httpcore.
> {quote}
> CVSSv2:
> Base Score: MEDIUM (5.0)
> Vector: /AV:N/AC:L/Au:N/C:N/I:P/A:N
> CVSSv3:
> Base Score: MEDIUM (5.3)
> Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:N
>
> CVE-2020-13956: Apache HttpClient incorrect handling of malformed authority component in request URIs
> Severity: Medium
> Vendor: The Apache Software Foundation
> Versions Affected:
> Apache HttpClient 4.5.12 and prior
> Apache HttpClient 5.0.2 and prior
> Description:
> Apache HttpClient versions prior to version 4.5.13 and 5.0.3 can misinterpret malformed authority component in request URIs passed to the library as java.net.URI object and pick the wrong target host for request execution.
> Mitigation:
> As of release 4.5.13 and 5.0.3 HttpClient will reject URIs with ambiguous malformed authority component as invalid. Users of HttpClient are advised to upgrade to version 4.5.13 or 5.0.3 and sanitize request URIs when using java.net.URI as input.
> Credit:
> This issue was discovered and reported by Priyank Nigam
> {quote}
> Reference:
> * [https://www.openwall.com/lists/oss-security/2020/10/08/4]
> * [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13956]
> * [https://nvd.nist.gov/vuln/detail/CVE-2020-13956]
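The mitigation quoted above advises sanitizing request URIs when `java.net.URI` is used as input. A minimal sketch of what such a check might look like, independent of HttpClient itself: `UriSanityCheck` and `requireParsedHost` are hypothetical names for illustration, not Hive or HttpClient API. The key JDK behavior relied on is that `java.net.URI` leaves `getHost()` null when the authority component cannot be parsed as a server-based authority.

```java
import java.net.URI;

public class UriSanityCheck {

    // Hypothetical helper: reject a URI whose authority did not yield a usable
    // host, before handing it to an HTTP client. Pre-4.5.13 HttpClient could
    // otherwise pick an unexpected target host for such URIs.
    static URI requireParsedHost(String raw) {
        URI uri = URI.create(raw); // throws IllegalArgumentException on outright syntax errors
        if (uri.getHost() == null) {
            // Malformed or ambiguous authority: java.net.URI parsed it only as a
            // registry-based authority, so getHost() is null.
            throw new IllegalArgumentException("Rejecting URI with unparsable authority: " + raw);
        }
        return uri;
    }

    public static void main(String[] args) {
        // A well-formed URI passes through with its host intact.
        System.out.println(requireParsedHost("https://example.com/index.html").getHost());

        // An underscore is not legal in a server-based hostname, so getHost()
        // is null and the helper rejects the URI.
        try {
            requireParsedHost("https://exa_mple.com/");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Upgrading to httpclient 4.5.13+, as this Jira does, remains the real fix; a check like this only hardens callers that construct URIs from untrusted input.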
[jira] [Commented] (HIVE-24783) Store currentNotificationID on target during repl load operation
[ https://issues.apache.org/jira/browse/HIVE-24783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294033#comment-17294033 ]

Haymant Mangla commented on HIVE-24783:
---------------------------------------
My pleasure.

> Store currentNotificationID on target during repl load operation
> ----------------------------------------------------------------
>
> Key: HIVE-24783
> URL: https://issues.apache.org/jira/browse/HIVE-24783
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Haymant Mangla
> Assignee: Haymant Mangla
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Resolved] (HIVE-24783) Store currentNotificationID on target during repl load operation
[ https://issues.apache.org/jira/browse/HIVE-24783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pravin Sinha resolved HIVE-24783.
---------------------------------
    Resolution: Fixed

Committed to master. Thank you for the patch, [~haymant]

> Store currentNotificationID on target during repl load operation
> ----------------------------------------------------------------
>
> Key: HIVE-24783
> URL: https://issues.apache.org/jira/browse/HIVE-24783
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Haymant Mangla
> Assignee: Haymant Mangla
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-24783) Store currentNotificationID on target during repl load operation
[ https://issues.apache.org/jira/browse/HIVE-24783?focusedWorklogId=560163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560163 ]

ASF GitHub Bot logged work on HIVE-24783:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Mar/21 22:15
        Start Date: 02/Mar/21 22:15
Worklog Time Spent: 10m
  Work Description: pkumarsinha merged pull request #2005:
URL: https://github.com/apache/hive/pull/2005

Issue Time Tracking
-------------------
    Worklog Id: (was: 560163)
    Time Spent: 1h 20m (was: 1h 10m)

> Store currentNotificationID on target during repl load operation
> ----------------------------------------------------------------
>
> Key: HIVE-24783
> URL: https://issues.apache.org/jira/browse/HIVE-24783
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Haymant Mangla
> Assignee: Haymant Mangla
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Comment Edited] (HIVE-24783) Store currentNotificationID on target during repl load operation
[ https://issues.apache.org/jira/browse/HIVE-24783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293311#comment-17293311 ]

Pravin Sinha edited comment on HIVE-24783 at 3/2/21, 10:05 PM:
--------------------------------------------------------------
+1

was (Author: pkumarsinha):
+1 Pending test

> Store currentNotificationID on target during repl load operation
> ----------------------------------------------------------------
>
> Key: HIVE-24783
> URL: https://issues.apache.org/jira/browse/HIVE-24783
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Haymant Mangla
> Assignee: Haymant Mangla
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[jira] [Resolved] (HIVE-24837) Upgrade httpclient to 4.5.13+
[ https://issues.apache.org/jira/browse/HIVE-24837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naveen Gangam resolved HIVE-24837.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Fix has been committed to master. Closing the jira. Thanks for the contribution, [~hsnusonic]

> Upgrade httpclient to 4.5.13+
> -----------------------------
>
> Key: HIVE-24837
> URL: https://issues.apache.org/jira/browse/HIVE-24837
> Project: Hive
> Issue Type: Improvement
> Reporter: Yu-Wen Lai
> Assignee: Yu-Wen Lai
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive is using httpclient 4.5.6. We will need to upgrade httpclient and httpcore.
> {quote}
> CVSSv2:
> Base Score: MEDIUM (5.0)
> Vector: /AV:N/AC:L/Au:N/C:N/I:P/A:N
> CVSSv3:
> Base Score: MEDIUM (5.3)
> Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:N
>
> CVE-2020-13956: Apache HttpClient incorrect handling of malformed authority component in request URIs
> Severity: Medium
> Vendor: The Apache Software Foundation
> Versions Affected:
> Apache HttpClient 4.5.12 and prior
> Apache HttpClient 5.0.2 and prior
> Description:
> Apache HttpClient versions prior to version 4.5.13 and 5.0.3 can misinterpret malformed authority component in request URIs passed to the library as java.net.URI object and pick the wrong target host for request execution.
> Mitigation:
> As of release 4.5.13 and 5.0.3 HttpClient will reject URIs with ambiguous malformed authority component as invalid. Users of HttpClient are advised to upgrade to version 4.5.13 or 5.0.3 and sanitize request URIs when using java.net.URI as input.
> Credit:
> This issue was discovered and reported by Priyank Nigam
> {quote}
> Reference:
> * [https://www.openwall.com/lists/oss-security/2020/10/08/4]
> * [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13956]
> * [https://nvd.nist.gov/vuln/detail/CVE-2020-13956]
[jira] [Updated] (HIVE-24837) Upgrade httpclient to 4.5.13+
[ https://issues.apache.org/jira/browse/HIVE-24837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naveen Gangam updated HIVE-24837:
---------------------------------
    Summary: Upgrade httpclient to 4.5.13+ (was: Upgrade httpclient to 4.5.13+ due to CVE-2020-13956)

> Upgrade httpclient to 4.5.13+
> -----------------------------
>
> Key: HIVE-24837
> URL: https://issues.apache.org/jira/browse/HIVE-24837
> Project: Hive
> Issue Type: Improvement
> Reporter: Yu-Wen Lai
> Assignee: Yu-Wen Lai
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive is using httpclient 4.5.6. We will need to upgrade httpclient and httpcore.
> {quote}
> CVSSv2:
> Base Score: MEDIUM (5.0)
> Vector: /AV:N/AC:L/Au:N/C:N/I:P/A:N
> CVSSv3:
> Base Score: MEDIUM (5.3)
> Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:N
>
> CVE-2020-13956: Apache HttpClient incorrect handling of malformed authority component in request URIs
> Severity: Medium
> Vendor: The Apache Software Foundation
> Versions Affected:
> Apache HttpClient 4.5.12 and prior
> Apache HttpClient 5.0.2 and prior
> Description:
> Apache HttpClient versions prior to version 4.5.13 and 5.0.3 can misinterpret malformed authority component in request URIs passed to the library as java.net.URI object and pick the wrong target host for request execution.
> Mitigation:
> As of release 4.5.13 and 5.0.3 HttpClient will reject URIs with ambiguous malformed authority component as invalid. Users of HttpClient are advised to upgrade to version 4.5.13 or 5.0.3 and sanitize request URIs when using java.net.URI as input.
> Credit:
> This issue was discovered and reported by Priyank Nigam
> {quote}
> Reference:
> * [https://www.openwall.com/lists/oss-security/2020/10/08/4]
> * [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13956]
> * [https://nvd.nist.gov/vuln/detail/CVE-2020-13956]
[jira] [Commented] (HIVE-24596) Explain ddl for debugging
[ https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293868#comment-17293868 ]

Harshit Gupta commented on HIVE-24596:
--------------------------------------
Yeah, sure! Assume the following [^table_definitions] and the following [^query]. The explain ddl output for the query will look like [^explain_ddl_output].

> Explain ddl for debugging
> -------------------------
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
> Attachments: explain_ddl_output, query, table_definitions
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, partition details, and query plans are needed.
> It would be good to have "explain ddl" support, which can generate these details. This can help in recreating schema and planner issues without sample data.
[jira] [Updated] (HIVE-24596) Explain ddl for debugging
[ https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshit Gupta updated HIVE-24596:
---------------------------------
    Attachment: table_definitions
                query
                explain_ddl_output

> Explain ddl for debugging
> -------------------------
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
> Attachments: explain_ddl_output, query, table_definitions
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, partition details, and query plans are needed.
> It would be good to have "explain ddl" support, which can generate these details. This can help in recreating schema and planner issues without sample data.
[jira] [Work logged] (HIVE-24841) Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema
[ https://issues.apache.org/jira/browse/HIVE-24841?focusedWorklogId=560025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-560025 ]

ASF GitHub Bot logged work on HIVE-24841:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Mar/21 16:33
        Start Date: 02/Mar/21 16:33
Worklog Time Spent: 10m
  Work Description: kasakrisz commented on a change in pull request #2035:
URL: https://github.com/apache/hive/pull/2035#discussion_r585719471

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ParallelEdgeFixer.java
##
@@ -256,9 +257,20 @@ private static String extractColumnName(ExprNodeDesc expr) throws SemanticException
   public static Optional<Map<String, String>> colMappingInverseKeys(ReduceSinkOperator rs) {
     Map<String, String> ret = new HashMap<>();
     Map<String, ExprNodeDesc> exprMap = rs.getColumnExprMap();
+    Set<String> neededColumns = new HashSet<>();
     try {
       for (Entry<String, ExprNodeDesc> e : exprMap.entrySet()) {
-        ret.put(extractColumnName(e.getValue()), e.getKey());
+        String columnName = extractColumnName(e.getValue());
+        if (rs.getSchema().getColumnInfo(e.getKey()) == null) {
+          // ignore incorrectly mapped columns (if there's any) - but require its input to be present
+          neededColumns.add(columnName);
+        } else {
+          ret.put(columnName, e.getKey());
+        }
+      }
+      neededColumns.removeAll(ret.keySet());
+      if (!neededColumns.isEmpty()) {
+        throw new SemanticException("There is no way to compute: " + neededColumns);

Review comment: It would be useful to log the exception in the catch clause, at least at debug level.

Issue Time Tracking
-------------------
    Worklog Id: (was: 560025)
    Time Spent: 20m (was: 10m)

> Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-24841
> URL: https://issues.apache.org/jira/browse/HIVE-24841
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> This may mean that the RS has an incorrect schema - but that will be investigated separately
[jira] [Resolved] (HIVE-24346) Store HPL/SQL packages into HMS
[ https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa İman resolved HIVE-24346.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Pushed to master. Thank you [~amagyar]

> Store HPL/SQL packages into HMS
> -------------------------------
>
> Key: HIVE-24346
> URL: https://issues.apache.org/jira/browse/HIVE-24346
> Project: Hive
> Issue Type: Sub-task
> Components: hpl/sql, Metastore
> Reporter: Attila Magyar
> Assignee: Attila Magyar
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 7h 20m
> Remaining Estimate: 0h
[jira] [Updated] (HIVE-24685) Remove HiveSubQRemoveRelBuilder
[ https://issues.apache.org/jira/browse/HIVE-24685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-24685:
-------------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed
           Status: Resolved (was: Patch Available)

> Remove HiveSubQRemoveRelBuilder
> -------------------------------
>
> Key: HIVE-24685
> URL: https://issues.apache.org/jira/browse/HIVE-24685
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Fix For: 4.0.0
>
> The class seems to be a close clone of {{RelBuilder}} created due to some bugs in the original implementation. Those issues seem to be fixed now and we should be able to get rid of the copy. In the worst case scenario, if we need to keep it for the time being, we could try to make it extend {{RelBuilder}} and override only the necessary methods.
[jira] [Resolved] (HIVE-24823) Fix ide error in BasePartitionEvaluator
[ https://issues.apache.org/jira/browse/HIVE-24823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich resolved HIVE-24823.
-------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Merged into master. Thank you Rajesh for reviewing the changes!

> Fix ide error in BasePartitionEvaluator
> ---------------------------------------
>
> Key: HIVE-24823
> URL: https://issues.apache.org/jira/browse/HIVE-24823
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Commented] (HIVE-24596) Explain ddl for debugging
[ https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293773#comment-17293773 ]

Zoltan Haindrich commented on HIVE-24596:
-----------------------------------------
Yes, reducing the round-trips in problematic cases would be very useful!

I'm still a bit confused; based on the description I've seen so far I was (somehow) expecting SQL statements as output... and I kind of still feel that should be the case. Note that in the PR the 'explain ddl' outputs seem more like standard explains... so I feel like I'm missing something basic about the concept.

Maybe it will be easier to understand your idea through an example:
{code}
$ create table t (a integer);
$ create table t2 (a integer);
$ explain ddl select 1 from t join t2 where t.a=t2.a;
{code}
For the above I would expect to see a "create table t (a integer)" in the output... could you give a theoretical transcript?

> Explain ddl for debugging
> -------------------------
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, partition details, and query plans are needed.
> It would be good to have "explain ddl" support, which can generate these details. This can help in recreating schema and planner issues without sample data.
[jira] [Work logged] (HIVE-24814) Harmonize Hive Date-Time Formats
[ https://issues.apache.org/jira/browse/HIVE-24814?focusedWorklogId=559978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559978 ]

ASF GitHub Bot logged work on HIVE-24814:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Mar/21 14:15
        Start Date: 02/Mar/21 14:15
Worklog Time Spent: 10m
  Work Description: belugabehr commented on pull request #2009:
URL: https://github.com/apache/hive/pull/2009#issuecomment-788939546

@pgaref Thanks for your interest (as always). The time parsing/formatting code is all over the place in Hive. Did you know there are some areas of the code that allow for 10 digits of nanos, by truncating the last digit, but not others? I am trying to consolidate all that stuff, bring it into one place for visibility, and harmonize it with the pre-canned ISO formats already included in the JDK. The less Hive-specific stuff regarding time handling, the better.

I also plan on adding copious documentation once I get the unit tests all working. I know some of them are broken; I'm having a pretty hard time figuring out where they are going wrong.

And yes, `ISO_LOCAL_TIME` includes an optional NANO field. This will be mentioned in comments.

Issue Time Tracking
-------------------
    Worklog Id: (was: 559978)
    Time Spent: 1h 10m (was: 1h)

> Harmonize Hive Date-Time Formats
> --------------------------------
>
> Key: HIVE-24814
> URL: https://issues.apache.org/jira/browse/HIVE-24814
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}}
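The JDK behavior mentioned above, that `DateTimeFormatter.ISO_LOCAL_TIME` carries an optional nano-of-second field, can be checked with a small standalone snippet (the class and method names here are illustrative, not part of the Hive patch). Note that the ISO formatter caps the fraction at 9 digits, so the 10-digit-with-truncation behavior found in parts of Hive is not something the pre-canned JDK format will accept.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class IsoLocalTimeDemo {

    // Parse a time string with ISO_LOCAL_TIME and return its nano-of-second.
    // Both the seconds and the fractional-seconds fields are optional in this
    // formatter, and the fraction may be anywhere from 0 to 9 digits.
    static int nanosOf(String text) {
        return LocalTime.parse(text, DateTimeFormatter.ISO_LOCAL_TIME).getNano();
    }

    public static void main(String[] args) {
        System.out.println(nanosOf("10:15"));              // no seconds, no fraction
        System.out.println(nanosOf("10:15:30.123456789")); // full 9-digit fraction
    }
}
```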
[jira] [Work logged] (HIVE-24758) Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
[ https://issues.apache.org/jira/browse/HIVE-24758?focusedWorklogId=559954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559954 ]

ASF GitHub Bot logged work on HIVE-24758:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Mar/21 13:33
        Start Date: 02/Mar/21 13:33
Worklog Time Spent: 10m
  Work Description: pgaref commented on a change in pull request #1963:
URL: https://github.com/apache/hive/pull/1963#discussion_r585566958

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
##
@@ -236,6 +239,10 @@ public int execute() {
       throw new HiveException("Operation cancelled");
     }

+    // Log all the info required to find the various logs for this query
+    LOG.info("HS2 Host: [{}], Query ID: [{}], Dag ID: [{}], DAG Session ID: [{}]", getHostNameIP(), queryId,

Review comment: Hey @belugabehr -- taking another look here, and it seems like SessionState could do the job. What about SessionState.getHiveServer2Host()?

Issue Time Tracking
-------------------
    Worklog Id: (was: 559954)
    Time Spent: 1h 40m (was: 1.5h)

> Log Tez Task DAG ID, DAG Session ID, HS2 Hostname
> -------------------------------------------------
>
> Key: HIVE-24758
> URL: https://issues.apache.org/jira/browse/HIVE-24758
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> In order to get the logs for a particular query, submitted to Tez on YARN, the following pieces of information are required:
> * YARN Application ID
> * TEZ DAG ID
> * HS2 Host that ran the job
> Include this information in TezTask output.
[jira] [Commented] (HIVE-24596) Explain ddl for debugging
[ https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293707#comment-17293707 ]

Rajesh Balamohan commented on HIVE-24596:
-----------------------------------------
A lot of the time, for debugging CBO and other aspects of query plans, users have to provide many details like query plans, table details, logs, etc. Sometimes custom dev jars are shipped to gather additional information, and in some cases sample data is also requested from users.

{{explain ddl <query>}} helps in identifying the tables/views of the specific query and generates the schema, partitions, stats, and views for that query. This way, a dev just has to get this SQL output and can run it in a local environment to reproduce the issue (i.e. {{explain cbo <query>}} or {{explain <query>}} should generate the same result without having real data). This is targeted towards easier debugging.

I haven't gone through the test cases in the PR yet. I believe it should cover examples and test cases, if not present already.

> Explain ddl for debugging
> -------------------------
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> For debugging query issues, basic details like table schema, statistics, partition details, and query plans are needed.
> It would be good to have "explain ddl" support, which can generate these details. This can help in recreating schema and planner issues without sample data.
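Based on the description above (re-runnable DDL plus statistics for the tables a query touches), a theoretical transcript for the `t`/`t2` example might look roughly like the following sketch. This is only an illustration of the idea, not actual output of the patch; every statistics value below is invented, and the exact statement shapes are an assumption modeled on Hive's existing `ALTER TABLE ... UPDATE STATISTICS` syntax.

{code}
hive> explain ddl select 1 from t join t2 where t.a=t2.a;

-- hypothetical output: re-runnable DDL + stats, no data required
CREATE TABLE `t` (`a` int);
CREATE TABLE `t2` (`a` int);
ALTER TABLE `t` UPDATE STATISTICS SET ('numRows'='1000', 'rawDataSize'='4000');
ALTER TABLE `t2` UPDATE STATISTICS SET ('numRows'='500', 'rawDataSize'='2000');
ALTER TABLE `t` UPDATE STATISTICS FOR COLUMN `a` SET ('numDVs'='100', 'numNulls'='0');
ALTER TABLE `t2` UPDATE STATISTICS FOR COLUMN `a` SET ('numDVs'='80', 'numNulls'='0');
{code}

Replaying such a transcript in an empty warehouse would let `explain cbo` or `explain` reproduce the original plan without any sample data.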
[jira] [Work logged] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-24839?focusedWorklogId=559944&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559944 ]

ASF GitHub Bot logged work on HIVE-24839:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 02/Mar/21 13:08
        Start Date: 02/Mar/21 13:08
Worklog Time Spent: 10m
  Work Description: kgyrtkirk commented on a change in pull request #2034:
URL: https://github.com/apache/hive/pull/2034#discussion_r585549664

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java
##
@@ -174,8 +174,10 @@ public StatEstimator getStatEstimator() {
   }

   private Optional<Double> getRangeWidth(Range range) {
-    if (range.minValue != null && range.maxValue != null) {
-      return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
+    if (range != null) {

Review comment: could you please also add the testcase from the jira?

Issue Time Tracking
-------------------
    Worklog Id: (was: 559944)
    Time Spent: 20m (was: 10m)

> SubStrStatEstimator.estimate throws NullPointerException
> --------------------------------------------------------
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
> Issue Type: Bug
> Reporter: Robbie Zhang
> Assignee: Robbie Zhang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4)
> {code}
[jira] [Updated] (HIVE-24841) Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema
[ https://issues.apache.org/jira/browse/HIVE-24841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24841:
----------------------------------
    Labels: pull-request-available (was: )

> Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-24841
> URL: https://issues.apache.org/jira/browse/HIVE-24841
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This may mean that the RS has an incorrect schema - but that will be investigated separately
[jira] [Work logged] (HIVE-24841) Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema
[ https://issues.apache.org/jira/browse/HIVE-24841?focusedWorklogId=559942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559942 ] ASF GitHub Bot logged work on HIVE-24841: - Author: ASF GitHub Bot Created on: 02/Mar/21 13:06 Start Date: 02/Mar/21 13:06 Worklog Time Spent: 10m Work Description: kgyrtkirk opened a new pull request #2035: URL: https://github.com/apache/hive/pull/2035 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559942) Remaining Estimate: 0h Time Spent: 10m > Parallel edge fixer may run into NPE when RS is missing a duplicate column > from the output schema > - > > Key: HIVE-24841 > URL: https://issues.apache.org/jira/browse/HIVE-24841 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This may mean that the RS has an incorrect schema - but that will be > investigated separately -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24814) Harmonize Hive Date-Time Formats
[ https://issues.apache.org/jira/browse/HIVE-24814?focusedWorklogId=559940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559940 ] ASF GitHub Bot logged work on HIVE-24814: - Author: ASF GitHub Bot Created on: 02/Mar/21 13:06 Start Date: 02/Mar/21 13:06 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #2009: URL: https://github.com/apache/hive/pull/2009#issuecomment-788895303 Hey @belugabehr can you please take another look on the .out diffs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559940) Time Spent: 1h (was: 50m) > Harmonize Hive Date-Time Formats > > > Key: HIVE-24814 > URL: https://issues.apache.org/jira/browse/HIVE-24814 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24814) Harmonize Hive Date-Time Formats
[ https://issues.apache.org/jira/browse/HIVE-24814?focusedWorklogId=559939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559939 ] ASF GitHub Bot logged work on HIVE-24814: - Author: ASF GitHub Bot Created on: 02/Mar/21 13:05 Start Date: 02/Mar/21 13:05 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2009: URL: https://github.com/apache/hive/pull/2009#discussion_r585545605 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToString.java ## @@ -28,20 +28,12 @@ import java.time.ZoneOffset; import java.time.format.DateTimeFormatter; import java.time.format.DateTimeFormatterBuilder; -import java.time.temporal.ChronoField; public class CastTimestampToString extends TimestampToStringUnaryUDF { private static final long serialVersionUID = 1L; - private static final DateTimeFormatter PRINT_FORMATTER; - - static { -DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder(); -// Date and time parts -builder.append(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); -// Fractional part -builder.optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd(); -PRINT_FORMATTER = builder.toFormatter(); - } + private static final DateTimeFormatter PRINT_FORMATTER = + new DateTimeFormatterBuilder().append(DateTimeFormatter.ISO_LOCAL_DATE).appendLiteral(' ') + .append(DateTimeFormatter.ISO_LOCAL_TIME).toFormatter(); Review comment: is optional Nanotime included in ISO_LOCAL_TIME ? Looks likes its already there: ``.appendFraction(NANO_OF_SECOND, 0, 9, true)`` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559939) Time Spent: 50m (was: 40m) > Harmonize Hive Date-Time Formats > > > Key: HIVE-24814 > URL: https://issues.apache.org/jira/browse/HIVE-24814 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
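The review question above — whether the optional nanosecond fraction is already part of ISO_LOCAL_TIME — can be checked directly. A minimal sketch (not part of the patch) composing the two standard formatters exactly as the proposed PRINT_FORMATTER does:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

public class IsoLocalTimeFractionCheck {
    // Same composition as the proposed PRINT_FORMATTER:
    // ISO_LOCAL_DATE + ' ' + ISO_LOCAL_TIME
    static final DateTimeFormatter F =
        new DateTimeFormatterBuilder()
            .append(DateTimeFormatter.ISO_LOCAL_DATE)
            .appendLiteral(' ')
            .append(DateTimeFormatter.ISO_LOCAL_TIME)
            .toFormatter();

    public static void main(String[] args) {
        // No fractional part: seconds only, no trailing decimal point
        System.out.println(F.format(LocalDateTime.of(2021, 3, 2, 13, 5, 0)));
        // prints "2021-03-02 13:05:00"

        // Nanoseconds present: fraction emitted with trailing zeros trimmed,
        // because ISO_LOCAL_TIME uses appendFraction(NANO_OF_SECOND, 0, 9, true)
        System.out.println(F.format(LocalDateTime.of(2021, 3, 2, 13, 5, 0, 123_000_000)));
        // prints "2021-03-02 13:05:00.123"
    }
}
```

So the nano fraction behaves the same as the hand-built formatter being removed: omitted when zero, variable-width (up to 9 digits) otherwise.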
[jira] [Commented] (HIVE-24596) Explain ddl for debugging
[ https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293684#comment-17293684 ] Zoltan Haindrich commented on HIVE-24596: - [~harshit.gupta] or [~rajesh.balamohan]: Could you please give a sample usage to this feature? > Explain ddl for debugging > - > > Key: HIVE-24596 > URL: https://issues.apache.org/jira/browse/HIVE-24596 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Harshit Gupta >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For debugging query issues, basic details like table schema, statistics, > partition details, query plans are needed. > It would be good to have "explain ddl" support, which can generate these > details. This can help in recreating the schema and planner issues without > sample data. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24814) Harmonize Hive Date-Time Formats
[ https://issues.apache.org/jira/browse/HIVE-24814?focusedWorklogId=559938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559938 ] ASF GitHub Bot logged work on HIVE-24814: - Author: ASF GitHub Bot Created on: 02/Mar/21 13:03 Start Date: 02/Mar/21 13:03 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2009: URL: https://github.com/apache/hive/pull/2009#discussion_r585545605 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToString.java ## @@ -28,20 +28,12 @@ import java.time.ZoneOffset; import java.time.format.DateTimeFormatter; import java.time.format.DateTimeFormatterBuilder; -import java.time.temporal.ChronoField; public class CastTimestampToString extends TimestampToStringUnaryUDF { private static final long serialVersionUID = 1L; - private static final DateTimeFormatter PRINT_FORMATTER; - - static { -DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder(); -// Date and time parts -builder.append(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); -// Fractional part -builder.optionalStart().appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true).optionalEnd(); -PRINT_FORMATTER = builder.toFormatter(); - } + private static final DateTimeFormatter PRINT_FORMATTER = + new DateTimeFormatterBuilder().append(DateTimeFormatter.ISO_LOCAL_DATE).appendLiteral(' ') + .append(DateTimeFormatter.ISO_LOCAL_TIME).toFormatter(); Review comment: is optional Nanotime included in ISO_LOCAL_TIME ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559938) Time Spent: 40m (was: 0.5h) > Harmonize Hive Date-Time Formats > > > Key: HIVE-24814 > URL: https://issues.apache.org/jira/browse/HIVE-24814 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Harmonize Hive on JDK date-time formats courtesy of {{DateTimeFormatter}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293681#comment-17293681 ] Robbie Zhang commented on HIVE-24839: - This bug can be worked around by setting hive.stats.estimators.enable to false. > SubStrStatEstimator.estimate throws NullPointerException > > > Key: HIVE-24839 > URL: https://issues.apache.org/jira/browse/HIVE-24839 > Project: Hive > Issue Type: Bug >Reporter: Robbie Zhang >Assignee: Robbie Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This issue can be reproduced by running the following queries: > {code:java} > create table t0 (s string); > create table t1 (s string, i int); > insert into t0 select "abc"; > insert into t1 select "abc", 4; > select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s; > {code} > The select query fails with error: > {code:java} > Error: Error while compiling statement: FAILED: NullPointerException null > (state=42000,code=4) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24839: -- Labels: pull-request-available (was: ) > SubStrStatEstimator.estimate throws NullPointerException > > > Key: HIVE-24839 > URL: https://issues.apache.org/jira/browse/HIVE-24839 > Project: Hive > Issue Type: Bug >Reporter: Robbie Zhang >Assignee: Robbie Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This issue can be reproduced by running the following queries: > {code:java} > create table t0 (s string); > create table t1 (s string, i int); > insert into t0 select "abc"; > insert into t1 select "abc", 4; > select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s; > {code} > The select query fails with error: > {code:java} > Error: Error while compiling statement: FAILED: NullPointerException null > (state=42000,code=4) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-24839?focusedWorklogId=559936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559936 ] ASF GitHub Bot logged work on HIVE-24839: - Author: ASF GitHub Bot Created on: 02/Mar/21 13:00 Start Date: 02/Mar/21 13:00 Worklog Time Spent: 10m Work Description: ujc714 opened a new pull request #2034: URL: https://github.com/apache/hive/pull/2034 ### What changes were proposed in this pull request? It fixes a bug in UDFSubstr.SubStrStatEstimator. ### Why are the changes needed? The method getRangeWidth didn't check if range is null before it references the properties of range. When Hive estimates the stats on a substr function with a child UDF, the compilation might fail due to NullPointerException. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Start MiniHS2Cluster then run the following queries manually: ``` create table t0 (s string); create table t1 (s string, i int); insert into t0 select "abc"; insert into t1 select "abc", 4; select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559936) Remaining Estimate: 0h Time Spent: 10m > SubStrStatEstimator.estimate throws NullPointerException > > > Key: HIVE-24839 > URL: https://issues.apache.org/jira/browse/HIVE-24839 > Project: Hive > Issue Type: Bug >Reporter: Robbie Zhang >Assignee: Robbie Zhang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This issue can be reproduced by running the following queries: > {code:java} > create table t0 (s string); > create table t1 (s string, i int); > insert into t0 select "abc"; > insert into t1 select "abc", 4; > select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s; > {code} > The select query fails with error: > {code:java} > Error: Error while compiling statement: FAILED: NullPointerException null > (state=42000,code=4) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24596) Explain ddl for debugging
[ https://issues.apache.org/jira/browse/HIVE-24596?focusedWorklogId=559933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559933 ] ASF GitHub Bot logged work on HIVE-24596: - Author: ASF GitHub Bot Created on: 02/Mar/21 12:53 Start Date: 02/Mar/21 12:53 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2033: URL: https://github.com/apache/hive/pull/2033#discussion_r585539220 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java ## @@ -541,12 +606,12 @@ JSONObject collectAuthRelatedEntities(PrintStream out, ExplainWork work) if (delegate != null) { Class itface = SessionState.get().getAuthorizerInterface(); Object authorizer = AuthorizationFactory.create(delegate, itface, - new AuthorizationFactory.AuthorizationExceptionHandler() { -@Override -public void exception(Exception exception) { - exceptions.add(exception.getMessage()); -} - }); + new AuthorizationFactory.AuthorizationExceptionHandler() { +@Override Review comment: there are lots of pure indentation changes in this patch - are we using the same formatter settings? `dev-support/eclipse-styles.xml` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559933) Time Spent: 20m (was: 10m) > Explain ddl for debugging > - > > Key: HIVE-24596 > URL: https://issues.apache.org/jira/browse/HIVE-24596 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Harshit Gupta >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For debugging query issues, basic details like table schema, statistics, > partition details, query plans are needed. > It would be good to have "explain ddl" support, which can generate these > details. 
This can help in recreating the schema and planner issues without > sample data. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293674#comment-17293674 ] Robbie Zhang commented on HIVE-24839: - We can see such backtrace in HS2 log file: {code:java} java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.getRangeWidth(UDFSubstr.java:177) at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.estimate(UDFSubstr.java:156) at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExpression(StatsUtils.java:1576) at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExprMap(StatsUtils.java:1435) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$SelectStatsRule.process(StatsRulesProcFactory.java:197) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143) at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122) at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78) at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:447) at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:185) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:158) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12823) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:422) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221) at 
org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:598) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:544) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:538) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567) at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} The expression "substr(t0.s, t1.i-1)" has a nested function. The second parameter of substr is actually GenericUDFOPMinus. The ColStatistics on it doesn't have a valid range. 
But getRangeWidth doesn't check it:
{code:java}
private Optional<Double> getRangeWidth(Range range) {
  if (range.minValue != null && range.maxValue != null) {
    return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
  }
  return Optional.empty();
}
{code}
Only 4 UDF classes implement StatEstimatorProvider and only UDFSubstr has this bug. > SubStrStatEstimator.estimate throws NullPointerException > > > Key: HIVE-24839 > URL: https://issues.apache.org/jira/browse/HIVE-24839 >
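A standalone sketch of the corrected guard: the Range class below is a simplified stand-in for Hive's ColStatistics range (not the actual Hive class), and the null check is folded into one condition rather than nested as in the PR diff, so treat this as illustrative of the fix rather than the patch itself.

```java
import java.util.Optional;

public class RangeWidthSketch {
    // Simplified stand-in for the column-statistics range (hypothetical class)
    static final class Range {
        final Number minValue;
        final Number maxValue;
        Range(Number min, Number max) { this.minValue = min; this.maxValue = max; }
    }

    // Corrected logic: guard against a null Range before dereferencing it.
    // The ColStatistics computed for a nested expression like t1.i-1
    // carries no range at all, which is what triggered the NPE.
    static Optional<Double> getRangeWidth(Range range) {
        if (range != null && range.minValue != null && range.maxValue != null) {
            return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(getRangeWidth(new Range(1, 4)));  // prints "Optional[3.0]"
        System.out.println(getRangeWidth(null));             // prints "Optional.empty" — previously an NPE
    }
}
```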
[jira] [Updated] (HIVE-24596) Explain ddl for debugging
[ https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24596: -- Labels: pull-request-available (was: ) > Explain ddl for debugging > - > > Key: HIVE-24596 > URL: https://issues.apache.org/jira/browse/HIVE-24596 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Harshit Gupta >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > For debugging query issues, basic details like table schema, statistics, > partition details, query plans are needed. > It would be good to have "explain ddl" support, which can generate these > details. This can help in recreating the schema and planner issues without > sample data. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24596) Explain ddl for debugging
[ https://issues.apache.org/jira/browse/HIVE-24596?focusedWorklogId=559927=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559927 ] ASF GitHub Bot logged work on HIVE-24596: - Author: ASF GitHub Bot Created on: 02/Mar/21 12:31 Start Date: 02/Mar/21 12:31 Worklog Time Spent: 10m Work Description: HarshitGupta11 opened a new pull request #2033: URL: https://github.com/apache/hive/pull/2033 https://issues.apache.org/jira/browse/HIVE-24596 For debugging query issues, basic details like table schema, statistics, partition details, query plans are needed. It would be good to have "explain ddl" support, which can generate these details. This can help in recreating the schema and planner issues without sample data. ### What changes were proposed in this pull request? Added "explain ddl " option which will emit all the DDL plans for the given query. ### Why are the changes needed? For Improving the debugging Process in clusters. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The patch was tested on the local cluster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559927) Remaining Estimate: 0h Time Spent: 10m > Explain ddl for debugging > - > > Key: HIVE-24596 > URL: https://issues.apache.org/jira/browse/HIVE-24596 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Harshit Gupta >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > For debugging query issues, basic details like table schema, statistics, > partition details, query plans are needed. > It would be good to have "explain ddl" support, which can generate these > details. This can help in recreating the schema and planner issues without > sample data. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24841) Parallel edge fixer may run into NPE when RS is missing a duplicate column from the output schema
[ https://issues.apache.org/jira/browse/HIVE-24841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-24841: --- > Parallel edge fixer may run into NPE when RS is missing a duplicate column > from the output schema > - > > Key: HIVE-24841 > URL: https://issues.apache.org/jira/browse/HIVE-24841 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > This may mean that the RS has an incorrect schema - but that will be > investigated separately -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24840) Materialized View incremental rebuild produces wrong result set after compaction
[ https://issues.apache.org/jira/browse/HIVE-24840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-24840: - > Materialized View incremental rebuild produces wrong result set after > compaction > > > Key: HIVE-24840 > URL: https://issues.apache.org/jira/browse/HIVE-24840 > Project: Hive > Issue Type: Bug >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Critical > > {code} > create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES > ('transactional'='true'); > insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, > NULL); > create materialized view mat1 stored as orc TBLPROPERTIES > ('transactional'='true') as > select a,b,c from t1 where a > 0 or a is null; > delete from t1 where a = 1; > alter table t1 compact 'major'; > -- Wait until compaction finished. > alter materialized view mat1 rebuild; > {code} > Expected result of query > {code} > select * from mat1; > {code} > {code} > 2 two 2 > NULL NULL NULL > {code} > but if incremental rebuild is enabled the result is > {code} > 1 one 1 > 2 two 2 > NULL NULL NULL > {code} > Cause: Incremental rebuild queries whether the source tables of a > materialized view has delete or update transaction since the last rebuild > from metastore from COMPLETED_TXN_COMPONENTS table. However when a major > compaction is performed on the source tables the records related to these > tables are deleted from COMPLETED_TXN_COMPONENTS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24827) Hive aggregation query returns incorrect results for non text files
[ https://issues.apache.org/jira/browse/HIVE-24827?focusedWorklogId=559914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559914 ] ASF GitHub Bot logged work on HIVE-24827: - Author: ASF GitHub Bot Created on: 02/Mar/21 12:14 Start Date: 02/Mar/21 12:14 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #2018: URL: https://github.com/apache/hive/pull/2018#discussion_r585508226 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ## @@ -4028,6 +4030,8 @@ public static int getFooterCount(TableDesc table, JobConf job) throws IOExceptio int footerCount; try { footerCount = Integer.parseInt(table.getProperties().getProperty(serdeConstants.FOOTER_COUNT, "0")); + footerCount = + validateHeaderFooter(table, footerCount, "skip.footer.line.count"); Review comment: since `FOOTER_COUNT = "skip.footer.line.count"` ; I think you could also push in this `Integer.parseInt` into you method as well This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559914) Time Spent: 40m (was: 0.5h) > Hive aggregation query returns incorrect results for non text files > --- > > Key: HIVE-24827 > URL: https://issues.apache.org/jira/browse/HIVE-24827 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > When header & footer are configured for non-text files, the aggregation query > returns wrong result. > Propose to ignore this property for non-text files -- This message was sent by Atlassian Jira (v8.3.4#803005)
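The review suggestion — pushing Integer.parseInt into the validation helper — can be sketched as follows. The helper name, the Properties-based plumbing, and the isTextInputFormat flag are hypothetical simplifications of the patch (the real code works against TableDesc and JobConf), shown only to illustrate the proposed shape:

```java
import java.util.Properties;

public class FooterCountSketch {
    static final String FOOTER_COUNT = "skip.footer.line.count";

    // Hypothetical helper: parse and validate in one place, ignoring the
    // setting for non-text input formats, as HIVE-24827 proposes.
    static int getFooterCount(Properties tableProps, boolean isTextInputFormat) {
        int footerCount = Integer.parseInt(tableProps.getProperty(FOOTER_COUNT, "0"));
        if (footerCount != 0 && !isTextInputFormat) {
            // Header/footer line skipping is only meaningful for text files;
            // honoring it for ORC/Parquet is what corrupted aggregate results.
            return 0;
        }
        return footerCount;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(FOOTER_COUNT, "2");
        System.out.println(getFooterCount(props, true));   // prints "2": honored for text tables
        System.out.println(getFooterCount(props, false));  // prints "0": ignored otherwise
    }
}
```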
[jira] [Work logged] (HIVE-24723) Use ExecutorService in TezSessionPool
[ https://issues.apache.org/jira/browse/HIVE-24723?focusedWorklogId=559892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559892 ] ASF GitHub Bot logged work on HIVE-24723: - Author: ASF GitHub Bot Created on: 02/Mar/21 12:12 Start Date: 02/Mar/21 12:12 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #1939: URL: https://github.com/apache/hive/pull/1939#issuecomment-788782335 Thanks for the PR @belugabehr and the time to polish this! :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559892) Time Spent: 3h 20m (was: 3h 10m) > Use ExecutorService in TezSessionPool > - > > Key: HIVE-24723 > URL: https://issues.apache.org/jira/browse/HIVE-24723 > Project: Hive > Issue Type: Improvement > Components: Tez >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently there are some wonky home-made thread pooling action going on in > {{TezSessionPool}}. Replace it with some JDK/Guava goodness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24835) Replace HiveSubQueryFinder with RexUtil.SubQueryFinder
[ https://issues.apache.org/jira/browse/HIVE-24835?focusedWorklogId=559879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559879 ] ASF GitHub Bot logged work on HIVE-24835: - Author: ASF GitHub Bot Created on: 02/Mar/21 11:53 Start Date: 02/Mar/21 11:53 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #2026: URL: https://github.com/apache/hive/pull/2026 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 559879) Time Spent: 20m (was: 10m) > Replace HiveSubQueryFinder with RexUtil.SubQueryFinder > -- > > Key: HIVE-24835 > URL: https://issues.apache.org/jira/browse/HIVE-24835 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > HiveSubQueryFinder has been copied from RexUtil::SubQueryFinder due to > CALCITE-1726. Currently, Hive is in calcite-1.21.0 and this bug is resolved > so the duplicated code can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24817) "not in" clause returns incorrect data when there is coercion
[ https://issues.apache.org/jira/browse/HIVE-24817?focusedWorklogId=559871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559871 ]
ASF GitHub Bot logged work on HIVE-24817: - Author: ASF GitHub Bot Created on: 02/Mar/21 11:50 Start Date: 02/Mar/21 11:50 Worklog Time Spent: 10m
Work Description: kgyrtkirk commented on a change in pull request #2027: URL: https://github.com/apache/hive/pull/2027#discussion_r585391616

File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
The same hunk (@@ -1007,17 +1001,12 @@ in getXpathOrFuncExprNodeDesc) is quoted for each of the three review comments below:
{code:java}
   T columnDesc = children.get(0);
   T valueDesc = interpretNode(columnDesc, children.get(i));
   if (valueDesc == null) {
-    if (hasNullValue) {
-      // Skip if null value has already been added
-      continue;
-    }
-    TypeInfo targetType = exprFactory.getTypeInfo(columnDesc);
+    // Keep original
+    TypeInfo targetType = exprFactory.getTypeInfo(children.get(i));
     if (!expressions.containsKey(targetType)) {
       expressions.put(targetType, columnDesc);
     }
-    T nullConst = exprFactory.createConstantExpr(targetType, null);
-    expressions.put(targetType, nullConst);
-    hasNullValue = true;
+    expressions.put(targetType, children.get(i));
   } else {
     TypeInfo targetType = exprFactory.getTypeInfo(valueDesc);
     if (!expressions.containsKey(targetType)) {
{code}

Review comment 1: I was going through here and there, and I think there might be another way around this problem which could retain this optimization as well:
* introduce a new `NOT` operator which can be controlled to return true/false in case of null values
* in case of filter expressions, start using the new NOT operator, and switch mode below every `NOT` operator

But this feels like a more complicated change - we should only do it if we lose important optimizations.

Review comment 2 (on the `if (!expressions.containsKey(targetType))` in the else branch): this if statement has no effect - the map value will be overwritten anyway; I wonder if we have a bug here.

Review comment 3: for `IN` the original logic is valid as long as it's in `UnknownAs.FALSE` mode... but for `NOT IN` the correct interpretation would be `UnknownAs.TRUE`. I think we might be better off not coping with the `UnknownAs` devils here, and retain the original expressions as in the current proposed patch; I'm not sure how many optimization opportunities / how much performance we will lose that way.
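For background on the `UnknownAs.FALSE` / `UnknownAs.TRUE` distinction discussed above: under SQL three-valued logic, `x NOT IN (v, NULL)` can never be TRUE, because a non-matching `x` yields UNKNOWN rather than FALSE, and `NOT UNKNOWN` is still UNKNOWN. The sketch below is not Hive code; it is a self-contained Java model (class and method names are made up for illustration) of why folding away a NULL in the value list is safe for `IN` in a filter but changes results for `NOT IN`.

{code:java}
import java.util.Arrays;
import java.util.List;

public class ThreeValuedIn {
    // Evaluates "value IN (list)" under SQL semantics.
    // Boolean.TRUE / Boolean.FALSE / null model TRUE / FALSE / UNKNOWN.
    static Boolean in(Integer value, List<Integer> list) {
        if (value == null) {
            return null; // NULL IN (...) is UNKNOWN
        }
        boolean sawNull = false;
        for (Integer item : list) {
            if (item == null) {
                sawNull = true;
                continue;
            }
            if (item.equals(value)) {
                return Boolean.TRUE;
            }
        }
        // No match: UNKNOWN if a NULL was present, otherwise FALSE.
        return sawNull ? null : Boolean.FALSE;
    }

    // NOT under three-valued logic: NOT UNKNOWN stays UNKNOWN.
    static Boolean not(Boolean b) {
        return b == null ? null : !b;
    }

    // A filter keeps a row only when the predicate is TRUE,
    // so UNKNOWN behaves like FALSE here (UnknownAs.FALSE).
    static boolean filterKeeps(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        // 2 IN (1, NULL) is UNKNOWN, so NOT (2 IN (1, NULL)) is also UNKNOWN:
        // the row is dropped even though 2 never equals 1.
        System.out.println(filterKeeps(not(in(2, Arrays.asList(1, null)))));
        // Dropping the NULL changes the NOT IN result: NOT (2 IN (1)) is TRUE.
        System.out.println(filterKeeps(not(in(2, Arrays.asList(1)))));
    }
}
{code}

For plain `IN` in a filter, both UNKNOWN and FALSE drop the row, so the rewrite is harmless; wrapped in `NOT`, the two outcomes diverge, which is the hazard the reviewer points at.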
[jira] [Resolved] (HIVE-24835) Replace HiveSubQueryFinder with RexUtil.SubQueryFinder
[ https://issues.apache.org/jira/browse/HIVE-24835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-24835. --- Resolution: Fixed Pushed to master. Thanks [~zabetak]. > Replace HiveSubQueryFinder with RexUtil.SubQueryFinder > -- > > Key: HIVE-24835 > URL: https://issues.apache.org/jira/browse/HIVE-24835 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > HiveSubQueryFinder was copied from RexUtil::SubQueryFinder because of > CALCITE-1726. Hive is currently on calcite-1.21.0, where this bug is resolved, > so the duplicated code can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robbie Zhang reassigned HIVE-24839: --- > SubStrStatEstimator.estimate throws NullPointerException > > > Key: HIVE-24839 > URL: https://issues.apache.org/jira/browse/HIVE-24839 > Project: Hive > Issue Type: Bug >Reporter: Robbie Zhang >Assignee: Robbie Zhang >Priority: Major > > This issue can be reproduced by running the following queries: > {code:java} > create table t0 (s string); > create table t1 (s string, i int); > insert into t0 select "abc"; > insert into t1 select "abc", 4; > select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s; > {code} > The select query fails with error: > {code:java} > Error: Error while compiling statement: FAILED: NullPointerException null > (state=42000,code=4) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
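In the failing query above, the second argument of substr is a computed expression (t1.i-1) from the joined table, so the stats estimator may receive no usable column statistics for it. The sketch below is not the actual Hive fix or API; all class, field, and method names are hypothetical. It only illustrates the defensive pattern such an estimator needs: fall back to a default estimate when statistics are absent instead of dereferencing them.

{code:java}
public class SubstrEstimatorSketch {
    static final long DEFAULT_AVG_LEN = 100L;

    // Hypothetical stand-in for per-column statistics; a field is null when
    // that statistic is unavailable, e.g. for an expression like t1.i - 1.
    static class ColStats {
        final Long avgLen;    // average string length of the column, if known
        final Long minValue;  // minimum numeric value of the column, if known
        ColStats(Long avgLen, Long minValue) {
            this.avgLen = avgLen;
            this.minValue = minValue;
        }
    }

    // Estimates the average result length of substr(strCol, startCol).
    // The null guards are the point: without them, missing statistics for the
    // start-position argument would cause a NullPointerException.
    static long estimateAvgLen(ColStats strCol, ColStats startCol) {
        long base = (strCol != null && strCol.avgLen != null)
                ? strCol.avgLen : DEFAULT_AVG_LEN;
        if (startCol == null || startCol.minValue == null) {
            return base; // no usable stats for the start position: keep base estimate
        }
        // Starting further into the string can only shorten the result.
        return Math.max(0, base - Math.max(0, startCol.minValue - 1));
    }
}
{code}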
[jira] [Commented] (HIVE-24343) Table partition operations (create, drop, select) fail when the number of partitions is greater than 32767 (signed int)
[ https://issues.apache.org/jira/browse/HIVE-24343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293600#comment-17293600 ] Narayanan Venkateswaran commented on HIVE-24343: Please note that although this issue still exists in Hive, it is exposed through the repair done in the partition management task thread and is fixed by backports of the following JIRAs: * HIVE-23111 * HIVE-23851 * HIVE-24584 > Table partition operations (create, drop, select) fail when the number of > partitions is greater than 32767 (signed int) > --- > > Key: HIVE-24343 > URL: https://issues.apache.org/jira/browse/HIVE-24343 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Minor > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > The table partition operations - create, drop, select - access the underlying > relational database using JDO, which internally routes the operations through > the JDBC driver. Most of the underlying JDBC driver implementations place a > limit on the number of parameters that can be passed through a statement > implementation. The limitations are as follows: > postgreSQL - 32767 > (https://www.postgresql.org/message-id/16832734.post%40talk.nabble.com) > MySQL - 32767 - 2 Byte Integer - num of params > (https://dev.mysql.com/doc/internals/en/com-stmt-prepare-response.html#packet-COM_STMT_PREPARE_OK) > Oracle - 32767 - > https://www.jooq.org/doc/3.12/manual/sql-building/dsl-context/custom-settings/settings-inline-threshold/ > Derby - 32767 - stored in an unsigned integer - note the PreparedStatement > implementation here - > [https://svn.apache.org/repos/asf/db/derby/code/branches/10.1/java/client/org/apache/derby/client/am/PreparedStatement.java] > > These limits should be taken into account when querying the underlying > metastore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
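This is not the patch attached to the issue, but a minimal self-contained sketch of the standard workaround the description implies: split a large value list into batches that each stay under the driver's prepared-statement parameter limit, and build one IN clause per batch. Class and method names are made up for illustration.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class JdbcParamBatcher {
    // Common JDBC driver cap on prepared-statement parameters: the parameter
    // count is carried in a 16-bit field, so at most 32767 placeholders.
    static final int MAX_PARAMS = 32767;

    // Splits values into sublists, each small enough for one prepared statement.
    static <T> List<List<T>> batch(List<T> values, int maxParams) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < values.size(); i += maxParams) {
            batches.add(values.subList(i, Math.min(i + maxParams, values.size())));
        }
        return batches;
    }

    // Builds "IN (?, ?, ...)" with one placeholder per value in a batch.
    static String inClause(int paramCount) {
        StringBuilder sb = new StringBuilder("IN (");
        for (int i = 0; i < paramCount; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(')').toString();
    }
}
{code}

With e.g. 70000 partition names, batch() yields three statements (32767 + 32767 + 4466 parameters), each safely under the limit, and the query runs once per batch with the results merged by the caller.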
[jira] [Work logged] (HIVE-24718) Moving to file based iteration for copying data
[ https://issues.apache.org/jira/browse/HIVE-24718?focusedWorklogId=559852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559852 ]
ASF GitHub Bot logged work on HIVE-24718: - Author: ASF GitHub Bot Created on: 02/Mar/21 08:02 Start Date: 02/Mar/21 08:02 Worklog Time Spent: 10m
Work Description: pkumarsinha commented on a change in pull request #1936: URL: https://github.com/apache/hive/pull/1936#discussion_r585302812

File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
@@ -1836,6 +1837,64 @@ public void testHdfsNameserviceWithDataCopy() throws Throwable {
{code:java}
         .verifyResults(new String[]{"2", "3"});
   }

+  @Test
+  public void testReplWithRetryDisabledIterators() throws Throwable {
+    List clause = new ArrayList<>();
+    //NS replacement parameters has no effect when data is also copied to staging
+    clause.add("'" + HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET + "'='false'");
+    clause.add("'" + HiveConf.ConfVars.REPL_COPY_ITERATOR_RETRY + "'='false'");
+    WarehouseInstance.Tuple tuple = primary.run("use " + primaryDbName)
+        .run("create table acid_table (key int, value int) partitioned by (load_date date) "
+            + "clustered by(key) into 2 buckets stored as orc tblproperties ('transactional'='true')")
+        .run("create table table1 (i String)")
+        .run("insert into table1 values (1)")
+        .run("insert into table1 values (2)")
+        .dump(primaryDbName, clause);
+    assertFalseExternalFileList(new Path(new Path(tuple.dumpLocation,
{code}

Review comment: nit: If you just pass dumpLocation to the method and do the path creation inside the method, this would look cleaner. Anyway, the method assertFalseExternalFileList isn't doing much, so alternatively you can do the fs.exists() check right there and get rid of the method.
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
@@ -653,6 +649,8 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
{code:java}
     new TimeValidator(TimeUnit.HOURS),
     "Total allowed retry duration in hours inclusive of all retries. Once this is exhausted, "
         + "the policy instance will be marked as failed and will need manual intervention to restart."),
+REPL_COPY_ITERATOR_RETRY("hive.repl.copy.iterator.retry", true,
{code}

Review comment: REPL_COPY_FILE_LIST_ITERATOR_RETRY ?

File path: ql/src/test/org/apache/hadoop/hive/ql/exec/repl/util/TestFileList.java
@@ -18,147 +18,266 @@
{code:java}
 package org.apache.hadoop.hive.ql.exec.repl.util;

+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.conf.HiveConf;
-import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.util.Retryable;
+import org.junit.Assert;
 import org.junit.Test;
 import org.junit.runner.RunWith;
 import org.mockito.ArgumentCaptor;
-import org.mockito.Mock;
 import org.mockito.Mockito;
+import org.mockito.junit.MockitoJUnitRunner;
 import org.powermock.core.classloader.annotations.PrepareForTest;
-import org.powermock.modules.junit4.PowerMockRunner;
 import org.slf4j.LoggerFactory;
-import java.io.BufferedWriter;
+import java.io.File;
+import java.io.IOException;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
-import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.TimeUnit;
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertTrue;

 /**
  * Tests the File List implementation.
  */
-@RunWith(PowerMockRunner.class)
+@RunWith(MockitoJUnitRunner.class)
 @PrepareForTest({LoggerFactory.class})
 public class TestFileList {
-  @Mock
-  private BufferedWriter bufferedWriter;
-
-  @Test
-  public void testNoStreaming() throws Exception {
-    Object tuple[] = setupAndGetTuple(100, false);
-    FileList fileList = (FileList) tuple[0];
-    FileListStreamer fileListStreamer = (FileListStreamer) tuple[1];
-    fileList.add("Entry1");
-    fileList.add("Entry2");
-    assertFalse(isStreamingToFile(fileListStreamer));
-  }
+  HiveConf conf = new HiveConf();
+  private FSDataOutputStream outStream;
+  private FSDataOutputStream testFileStream;
+  final String TEST_DATA_DIR = new File(System.getProperty("java.io.tmpdir")
+      + File.separator + TestFileList.class.getCanonicalName() + "-" + System.currentTimeMillis()
+      ).getPath().replaceAll("", "/");
+  private Exception testException = new IOException("test");

   @Test
-  public void testAlwaysStreaming() throws Exception {
-    Object tuple[] = setupAndGetTuple(100, true);
-
{code}