[GitHub] [hive] davidov541 commented on issue #635: HIVE-14888: Fixing the check of current security mode for Spark
davidov541 commented on issue #635: HIVE-14888: Fixing the check of current security mode for Spark URL: https://github.com/apache/hive/pull/635#issuecomment-497911329 ReviewBoard: https://reviews.apache.org/r/70718/ JIRA: https://issues.apache.org/jira/browse/HIVE-14888 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582541 ## File path: ql/src/test/results/clientpositive/llap/multi_count_distinct_null.q.out ## @@ -73,7 +73,7 @@ STAGE PLANS: outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 18 Data size: 1628 Basic stats: COMPLETE Column stats: COMPLETE Select Operator - expressions: CASE WHEN (((_col3 = 3L) and _col0 is not null)) THEN (1) ELSE (null) END (type: int), CASE WHEN (((_col3 = 5L) and _col1 is not null)) THEN (1) ELSE (null) END (type: int), CASE WHEN (((_col3 = 6L) and _col2 is not null)) THEN (1) ELSE (null) END (type: int) + expressions: CASE WHEN (((_col3 = 3L) and _col0 is not null)) THEN (1) ELSE (UDFToInteger(null)) END (type: int), CASE WHEN (((_col3 = 5L) and _col1 is not null)) THEN (1) ELSE (UDFToInteger(null)) END (type: int), CASE WHEN (((_col3 = 6L) and _col2 is not null)) THEN (1) ELSE (UDFToInteger(null)) END (type: int) Review comment: Cast not being constant folded? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582664 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ## @@ -1274,6 +1274,40 @@ protected ExprNodeDesc getXpathOrFuncExprNodeDesc(ASTNode expr, children); } +// introduce cast for expressions which are of different type than parent WHEN expr's type Review comment: WHEN or THEN? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582787 ## File path: ql/src/test/results/clientpositive/perf/spark/query80.q.out ## @@ -782,7 +782,7 @@ STAGE PLANS: 1 Map 33 Statistics: Num rows: 231917759 Data size: 31534108438 Basic stats: COMPLETE Column stats: NONE Select Operator -expressions: _col15 (type: string), _col5 (type: decimal(7,2)), CASE WHEN (_col9 is not null) THEN (_col9) ELSE (0) END (type: decimal(12,2)), (_col6 - CASE WHEN (_col10 is not null) THEN (_col10) ELSE (0) END) (type: decimal(13,2)) +expressions: _col15 (type: string), _col5 (type: decimal(7,2)), CASE WHEN (_col9 is not null) THEN (_col9) ELSE (CAST( 0 AS decimal(7,2))) END (type: decimal(7,2)), (_col6 - CASE WHEN (_col10 is not null) THEN (_col10) ELSE (CAST( 0 AS decimal(7,2))) END) (type: decimal(8,2)) Review comment: Interesting case change, why is it not 12,2 or 13,2? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582808 ## File path: ql/src/test/results/clientpositive/perf/spark/query80.q.out ## @@ -782,7 +782,7 @@ STAGE PLANS: 1 Map 33 Statistics: Num rows: 231917759 Data size: 31534108438 Basic stats: COMPLETE Column stats: NONE Select Operator -expressions: _col15 (type: string), _col5 (type: decimal(7,2)), CASE WHEN (_col9 is not null) THEN (_col9) ELSE (0) END (type: decimal(12,2)), (_col6 - CASE WHEN (_col10 is not null) THEN (_col10) ELSE (0) END) (type: decimal(13,2)) +expressions: _col15 (type: string), _col5 (type: decimal(7,2)), CASE WHEN (_col9 is not null) THEN (_col9) ELSE (CAST( 0 AS decimal(7,2))) END (type: decimal(7,2)), (_col6 - CASE WHEN (_col10 is not null) THEN (_col10) ELSE (CAST( 0 AS decimal(7,2))) END) (type: decimal(8,2)) Review comment: Also, inside it is 7,2, outside it is 8,2? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582682 ## File path: ql/src/test/results/clientpositive/perf/spark/query77.q.out ## @@ -826,7 +826,7 @@ STAGE PLANS: outputColumnNames: _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 95833780 Data size: 13030622681 Basic stats: COMPLETE Column stats: NONE Select Operator - expressions: 'web channel' (type: string), _col3 (type: int), _col4 (type: decimal(17,2)), CASE WHEN (_col1 is not null) THEN (_col1) ELSE (0) END (type: decimal(17,2)), (_col5 - CASE WHEN (_col2 is not null) THEN (_col2) ELSE (0) END) (type: decimal(18,2)) + expressions: 'web channel' (type: string), _col3 (type: int), _col4 (type: decimal(17,2)), CASE WHEN (_col1 is not null) THEN (_col1) ELSE (CAST( 0 AS decimal(17,2))) END (type: decimal(17,2)), (_col5 - CASE WHEN (_col2 is not null) THEN (_col2) ELSE (CAST( 0 AS decimal(17,2))) END) (type: decimal(18,2)) Review comment: No constant folding again? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582586 ## File path: ql/src/test/results/clientpositive/llap/subquery_select.q.out ## @@ -3943,10 +3943,14 @@ STAGE PLANS: mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE -Reduce Output Operator - sort order: - Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE - value expressions: _col0 (type: int) +Select Operator Review comment: Did the join change type due to this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582362 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ## @@ -1274,6 +1274,40 @@ protected ExprNodeDesc getXpathOrFuncExprNodeDesc(ASTNode expr, children); } +// introduce cast for expressions which are of different type than parent WHEN expr's type +// this is done so that vectorization is able to vectorize appropriately +if (genericUDF instanceof GenericUDFWhen && desc instanceof ExprNodeGenericFuncDesc) { + String castTypeName = desc.getTypeInfo().getTypeName(); + if (desc.getTypeInfo() instanceof DecimalTypeInfo) { Review comment: Why is this a special case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582883 ## File path: ql/src/test/results/clientpositive/vectorized_case.q.out ## @@ -705,13 +705,13 @@ STAGE PLANS: native: true vectorizationSchemaColumns: [0:member:decimal(10,0)/DECIMAL_64, 1:attr:decimal(10,0)/DECIMAL_64, 2:ROW__ID:struct] Select Operator - expressions: CASE WHEN ((member = 1)) THEN ((attr + 1)) ELSE (2) END (type: decimal(11,0)) + expressions: CASE WHEN ((member = 1)) THEN ((attr + 1)) ELSE (CAST( 2 AS decimal(11,0))) END (type: decimal(11,0)) outputColumnNames: _col0 Select Vectorization: className: VectorSelectOperator native: true - projectedOutputColumnNums: [9] - selectExpressions: IfExprDecimalColumnColumn(col 6:boolean, col 10:decimal(11,0)col 8:decimal(11,0))(children: Decimal64ColEqualDecimal64Scalar(col 0:decimal(10,0)/DECIMAL_64, decimal64Val 1, decimalVal 1) -> 6:boolean, ConvertDecimal64ToDecimal(col 7:decimal(11,0)/DECIMAL_64)(children: Decimal64ColAddDecimal64Scalar(col 1:decimal(10,0)/DECIMAL_64, decimal64Val 1, decimalVal 1) -> 7:decimal(11,0)/DECIMAL_64) -> 10:decimal(11,0), ConstantVectorExpression(val 2) -> 8:decimal(11,0)) -> 9:decimal(11,0) + projectedOutputColumnNums: [8] + selectExpressions: IfExprDecimal64ColumnDecimal64Scalar(col 6:boolean, col 7:decimal(11,0)/DECIMAL_64, decimal64Val 2, decimalVal 2)(children: Decimal64ColEqualDecimal64Scalar(col 0:decimal(10,0)/DECIMAL_64, decimal64Val 1, decimalVal 1) -> 6:boolean, Decimal64ColAddDecimal64Scalar(col 1:decimal(10,0)/DECIMAL_64, decimal64Val 1, decimalVal 1) -> 7:decimal(11,0)/DECIMAL_64) -> 8:decimal(11,0)/DECIMAL_64 Review comment: This is pretty neat, the expression optimized to a faster one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582420 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ## @@ -1274,6 +1274,40 @@ protected ExprNodeDesc getXpathOrFuncExprNodeDesc(ASTNode expr, children); } +// introduce cast for expressions which are of different type than parent WHEN expr's type +// this is done so that vectorization is able to vectorize appropriately +if (genericUDF instanceof GenericUDFWhen && desc instanceof ExprNodeGenericFuncDesc) { + String castTypeName = desc.getTypeInfo().getTypeName(); + if (desc.getTypeInfo() instanceof DecimalTypeInfo) { +castTypeName = serdeConstants.DECIMAL_TYPE_NAME; + } + FunctionInfo castFun = FunctionRegistry.getFunctionInfo(castTypeName); + + String whenTypeName = desc.getTypeInfo().getTypeName(); + List updatedChildren = new ArrayList<>(); + if (castFun != null) { +GenericUDF castUDF = castFun.getGenericUDF(); +if (desc.getTypeInfo() instanceof DecimalTypeInfo || desc +.getTypeInfo() instanceof VarcharTypeInfo || desc +.getTypeInfo() instanceof CharTypeInfo || desc +.getTypeInfo() instanceof TimestampLocalTZTypeInfo) { + ((SettableUDF) castUDF).setTypeInfo(desc.getTypeInfo()); Review comment: Special case for varchar(1) -> varchar(2) ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582845 ## File path: ql/src/test/results/clientpositive/spark/vectorized_case.q.out ## @@ -688,13 +688,13 @@ STAGE PLANS: native: true vectorizationSchemaColumns: [0:member:decimal(10,0)/DECIMAL_64, 1:attr:decimal(10,0)/DECIMAL_64, 2:ROW__ID:struct] Select Operator -expressions: CASE WHEN ((member = 1)) THEN (1) ELSE ((attr + 2)) END (type: decimal(11,0)) +expressions: CASE WHEN ((member = 1)) THEN (CAST( 1 AS decimal(11,0))) ELSE ((attr + 2)) END (type: decimal(11,0)) Review comment: Missed constant fold again? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582463 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ## @@ -1274,6 +1274,40 @@ protected ExprNodeDesc getXpathOrFuncExprNodeDesc(ASTNode expr, children); } +// introduce cast for expressions which are of different type than parent WHEN expr's type +// this is done so that vectorization is able to vectorize appropriately +if (genericUDF instanceof GenericUDFWhen && desc instanceof ExprNodeGenericFuncDesc) { + String castTypeName = desc.getTypeInfo().getTypeName(); + if (desc.getTypeInfo() instanceof DecimalTypeInfo) { +castTypeName = serdeConstants.DECIMAL_TYPE_NAME; + } + FunctionInfo castFun = FunctionRegistry.getFunctionInfo(castTypeName); + + String whenTypeName = desc.getTypeInfo().getTypeName(); + List updatedChildren = new ArrayList<>(); + if (castFun != null) { +GenericUDF castUDF = castFun.getGenericUDF(); +if (desc.getTypeInfo() instanceof DecimalTypeInfo || desc +.getTypeInfo() instanceof VarcharTypeInfo || desc +.getTypeInfo() instanceof CharTypeInfo || desc +.getTypeInfo() instanceof TimestampLocalTZTypeInfo) { + ((SettableUDF) castUDF).setTypeInfo(desc.getTypeInfo()); +} +for (int i = 0; i < desc.getChildren().size(); i++) { + ExprNodeDesc child = children.get(i); + ExprNodeDesc newChild = child; + if ((i % 2 == 1 || i == desc.getChildren().size() - 1) && !whenTypeName Review comment: Does typeName have the scale/precision? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582305 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java ## @@ -571,7 +571,7 @@ private RexNode handleExplicitCast(ExprNodeGenericFuncDesc func, List c * If a CASE has branches with string/int/boolean branch types; there is no common type. */ private List adjustCaseBranchTypes(List nodes, RelDataType retType) { -List branchTypes = new ArrayList<>(); Review comment: Commenting out code or removing it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting
t3rmin4t0r commented on a change in pull request #656: HIVE-21742 : Vectorization: CASE result type casting URL: https://github.com/apache/hive/pull/656#discussion_r289582521 ## File path: ql/src/test/results/clientpositive/infer_join_preds.q.out ## @@ -1187,7 +1187,7 @@ STAGE PLANS: predicate: prid is not null (type: boolean) Statistics: Num rows: 1 Data size: 668 Basic stats: COMPLETE Column stats: NONE Select Operator -expressions: idp_warehouse_id (type: bigint), prid (type: bigint), concat(CAST( CASE WHEN (prid is null) THEN (1) ELSE (prid) END AS STRING), ',', CASE WHEN (prtimesheetid is null) THEN (1) ELSE (prtimesheetid) END, ',', CASE WHEN (prassignmentid is null) THEN (1) ELSE (prassignmentid) END, ',', CASE WHEN (prchargecodeid is null) THEN (1) ELSE (prchargecodeid) END, ',', CASE WHEN (prtypecodeid is null) THEN ('') ELSE (CAST( prtypecodeid AS STRING)) END, ',', CASE WHEN (practsum is null) THEN (1) ELSE (practsum) END, ',', CASE WHEN (prsequence is null) THEN (1) ELSE (prsequence) END, ',', CASE WHEN (length(prmodby) is null) THEN ('') ELSE (prmodby) END, ',', CASE WHEN (prmodtime is null) THEN (TIMESTAMP'2017-12-08 00:00:00') ELSE (prmodtime) END, ',', CASE WHEN (prrmexported is null) THEN (1) ELSE (prrmexported) END, ',', CASE WHEN (prrmckdel is null) THEN (1) ELSE (prrmckdel) END, ',', CASE WHEN (slice_status is null) THEN (1) ELSE (slice_status) END, ',', CASE WHEN (role_id is null) THEN (1) ELSE (role_id) END, ',', CASE WHEN (length(user_lov1) is null) THEN ('') ELSE (user_lov1) END, ',', CASE WHEN (length(user_lov2) is null) THEN ('') ELSE (user_lov2) END, ',', CASE WHEN (incident_id is null) THEN (1) ELSE (incident_id) END, ',', CASE WHEN (incident_investment_id is null) THEN (1) ELSE (incident_investment_id) END, ',', CASE WHEN (odf_ss_actuals is null) THEN (1) ELSE (odf_ss_actuals) END) (type: string) +expressions: idp_warehouse_id (type: bigint), prid (type: bigint), concat(CAST( CASE WHEN (prid is null) THEN (1L) ELSE (prid) END AS STRING), ',', CASE WHEN (prtimesheetid is null) THEN (1L) ELSE (prtimesheetid) END, ',', CASE WHEN (prassignmentid is null) THEN (1L) ELSE (prassignmentid) END, ',', CASE WHEN (prchargecodeid is null) THEN (1L) ELSE (prchargecodeid) END, ',', CASE WHEN (prtypecodeid is null) THEN ('') ELSE (CAST( prtypecodeid AS STRING)) END, ',', CASE WHEN (practsum is null) THEN (CAST( 1 AS decimal(38,20))) ELSE (practsum) END, ',', CASE WHEN (prsequence is null) THEN (1L) ELSE (prsequence) END, ',', CASE WHEN (length(prmodby) is null) THEN ('') ELSE (CAST( prmodby AS STRING)) END, ',', CASE WHEN (prmodtime is null) THEN (TIMESTAMP'2017-12-08 00:00:00') ELSE (prmodtime) END, ',', CASE WHEN (prrmexported is null) THEN (1L) ELSE (prrmexported) END, ',', CASE WHEN (prrmckdel is null) THEN (1L) ELSE (prrmckdel) END, ',', CASE WHEN (slice_status is null) THEN (1) ELSE (slice_status) END, ',', CASE WHEN (role_id is null) THEN (1L) ELSE (role_id) END, ',', CASE WHEN (length(user_lov1) is null) THEN ('') ELSE (CAST( user_lov1 AS STRING)) END, ',', CASE WHEN (length(user_lov2) is null) THEN ('') ELSE (CAST( user_lov2 AS STRING)) END, ',', CASE WHEN (incident_id is null) THEN (1L) ELSE (incident_id) END, ',', CASE WHEN (incident_investment_id is null) THEN (1L) ELSE (incident_investment_id) END, ',', CASE WHEN (odf_ss_actuals is null) THEN (1L) ELSE (odf_ss_actuals) END) (type: string) Review comment: is the casts kicking in from 1 -> 1L? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] prasanthj commented on a change in pull request #648: HIVE-21783: Accept Hive connections from the same domain without authentication.
prasanthj commented on a change in pull request #648: HIVE-21783: Accept Hive connections from the same domain without authentication. URL: https://github.com/apache/hive/pull/648#discussion_r289528707 ## File path: service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java ## @@ -137,32 +138,47 @@ protected void doPost(HttpServletRequest request, HttpServletResponse response) return; } } - // If the cookie based authentication is already enabled, parse the - // request and validate the request cookies. - if (isCookieAuthEnabled) { -clientUserName = validateCookie(request); -requireNewCookie = (clientUserName == null); -if (requireNewCookie) { - LOG.info("Could not validate cookie sent, will try to generate a new cookie"); -} - } - // If the cookie based authentication is not enabled or the request does - // not have a valid cookie, use the kerberos or password based authentication - // depending on the server setup. - if (clientUserName == null) { -// For a kerberos setup -if (isKerberosAuthMode(authType)) { - String delegationToken = request.getHeader(HIVE_DELEGATION_TOKEN_HEADER); - // Each http request must have an Authorization header - if ((delegationToken != null) && (!delegationToken.isEmpty())) { -clientUserName = doTokenAuth(request, response); - } else { -clientUserName = doKerberosAuth(request); + + clientIpAddress = request.getRemoteAddr(); + LOG.debug("Client IP Address: " + clientIpAddress); + String trustedDomain = HiveConf.getVar(hiveConf, ConfVars.HIVE_SERVER2_TRUST_DOMAIN).trim(); + + // Skip authentication if the connection is from the trusted domain + if (!trustedDomain.isEmpty() && + PlainSaslHelper.isHostFromTrustedDomain(request.getRemoteHost(), trustedDomain)) { +LOG.info("No authentication performed because the connecting host " + request.getRemoteHost() + Review comment: We can only support this for non-kerberos auth mode (password based) and look for "Authorization: Basic" header. Extract the username and discard the password. If cookie comes along with the request, we can use the username from the cookie. My understanding here is that, a new request comes in with "Authorization: Basic" header, we trust the domain, extract the username from auth header, generate a cookie and respond with cookie. If a new request comes back with the cookie, validate the cookie, extract the user name and we are done. We should set the expectation from clients here in the config description (whether clients should send basic auth header and that password will be used if not from trusted domain and for trusted domains password will be discarded). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] odraese commented on a change in pull request #648: HIVE-21783: Accept Hive connections from the same domain without authentication.
odraese commented on a change in pull request #648: HIVE-21783: Accept Hive connections from the same domain without authentication. URL: https://github.com/apache/hive/pull/648#discussion_r289455397 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -3468,6 +3468,10 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal " (Use with property hive.server2.custom.authentication.class)\n" + " PAM: Pluggable authentication module\n" + " NOSASL: Raw transport"), +HIVE_SERVER2_TRUST_DOMAIN("hive.server2.trust.domain", "", +"Specifies the host or a domain to trust connections from. Authentication is skipped " + +"for any connection coming from this domain or the host. By default it is " + +"empty, which means that all the connections to HiveServer2 are authenticated."), Review comment: Explaining the logic that this property provides a suffix (for endswith check) rather than requiring an exact host name, could help. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] odraese commented on a change in pull request #648: HIVE-21783: Accept Hive connections from the same domain without authentication.
odraese commented on a change in pull request #648: HIVE-21783: Accept Hive connections from the same domain without authentication. URL: https://github.com/apache/hive/pull/648#discussion_r289455198 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -3468,6 +3468,10 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal " (Use with property hive.server2.custom.authentication.class)\n" + " PAM: Pluggable authentication module\n" + " NOSASL: Raw transport"), +HIVE_SERVER2_TRUST_DOMAIN("hive.server2.trust.domain", "", Review comment: Maybe we should call this trusted.domain (to be consistent with code) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] odraese commented on a change in pull request #648: HIVE-21783: Accept Hive connections from the same domain without authentication.
odraese commented on a change in pull request #648: HIVE-21783: Accept Hive connections from the same domain without authentication. URL: https://github.com/apache/hive/pull/648#discussion_r289455006 ## File path: service/src/java/org/apache/hive/service/auth/PlainSaslHelper.java ## @@ -65,16 +70,65 @@ public static TTransportFactory getPlainTransportFactory(String authTypeStr) return saslFactory; } + static TTransportFactory getDualPlainTransportFactory(TTransportFactory otherTrans, +String trustedDomain) + throws LoginException { +LOG.info("Created additional transport factory for skipping authentication when client " + +"connection is from the same domain."); +return new DualSaslTransportFactory(otherTrans, trustedDomain); + } + public static TTransport getPlainTransport(String username, String password, TTransport underlyingTransport) throws SaslException { return new TSaslClientTransport("PLAIN", null, null, null, new HashMap(), new PlainCallbackHandler(username, password), underlyingTransport); } + // Return true if the remote host is from the trusted domain, i.e. host URL has the same + // suffix as the trusted domain. + static public boolean isHostFromTrustedDomain(String remoteHost, String trustedDomain) { +return remoteHost.endsWith(trustedDomain); + } + private PlainSaslHelper() { throw new UnsupportedOperationException("Can't initialize class"); } + static final class DualSaslTransportFactory extends TTransportFactory { +TTransportFactory otherFactory; +TTransportFactory noAuthFactory; +String trustedDomain; + +DualSaslTransportFactory(TTransportFactory otherFactory, String trustedDomain) +throws LoginException { + this.noAuthFactory = getPlainTransportFactory(AuthMethods.NONE.toString()); + this.otherFactory = otherFactory; + this.trustedDomain = trustedDomain; +} + +@Override +public TTransport getTransport(final TTransport trans) { + TSocket tSocket = null; + // Attempt to avoid authentication if only we can fetch the client IP address and it + // happens to be from the same domain as the server. + if (trans instanceof TSocket) { +tSocket = (TSocket) trans; + } else if (trans instanceof TSaslServerTransport) { +TSaslServerTransport saslTrans = (TSaslServerTransport) trans; +tSocket = (TSocket)(saslTrans.getUnderlyingTransport()); + } + String remoteHost = tSocket != null ? + tSocket.getSocket().getInetAddress().getCanonicalHostName() : null; + if (remoteHost != null && isHostFromTrustedDomain(remoteHost, trustedDomain)) { +LOG.info("No authentication performed because the connecting host " + remoteHost + " is " + +"from the trusted domain " + trustedDomain); +return noAuthFactory.getTransport(trans); + } else { Review comment: We don't need this else block. Just return the otherFactory.getTransport(trans) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] odraese commented on issue #643: Hive 21785: Add task queue/runtime stats per LLAP daemon to output
odraese commented on issue #643: Hive 21785: Add task queue/runtime stats per LLAP daemon to output URL: https://github.com/apache/hive/pull/643#issuecomment-497766195 Added updated version as patch to https://issues.apache.org/jira/browse/HIVE-21785 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics
pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics URL: https://github.com/apache/hive/pull/633#discussion_r289320836 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -3154,4 +3160,22 @@ public void taskInfoUpdated(TezTaskAttemptID attemptId, boolean isGuaranteed) { + attemptId + ", " + newState); sendUpdateMessageAsync(ti, newState); } + + private void updateMetrics(TaskAttemptImpl taskAttempt) { +// Only do it for successful map tasks +if (!TaskAttemptState.SUCCEEDED.equals(taskAttempt.getState()) || !isMapTask(taskAttempt)) { + return; +} +// Check if this task was already assigned to a node +NodeInfo nodeInfo = knownTasks.get(taskAttempt).assignedNode; +if (nodeInfo == null) { + return; +} + +metrics.addTaskLatency(nodeInfo.shortStringBase, taskAttempt.getFinishTime() - taskAttempt.getLaunchTime()); + } + + private boolean isMapTask(TaskAttemptImpl taskAttempt) { Review comment: I am not yet entirely familiar with this part of the code. I thought that the vertex can tell us more but getVertex is package private method for TaskAttemptImp, and getting the Vertex from would need something like this (found in getTransitiveVertexOutputs): -- DagInfo info = getContext().getCurrentDagInfo(); if (!(info instanceof DAG)) { LOG.warn("DAG info is not a DAG"); return; } DAG dag = (DAG) info; Vertex vertex = dag.getVertex(taskAttempt.getVertexID()); -- I found casting DagInfo to DAG more shady than relying on counters, but feel free to disagree. Also open to any suggestions where should I dig around more to find a better solution! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics
pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics URL: https://github.com/apache/hive/pull/633#discussion_r289320836 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -3154,4 +3160,22 @@ public void taskInfoUpdated(TezTaskAttemptID attemptId, boolean isGuaranteed) { + attemptId + ", " + newState); sendUpdateMessageAsync(ti, newState); } + + private void updateMetrics(TaskAttemptImpl taskAttempt) { +// Only do it for successful map tasks +if (!TaskAttemptState.SUCCEEDED.equals(taskAttempt.getState()) || !isMapTask(taskAttempt)) { + return; +} +// Check if this task was already assigned to a node +NodeInfo nodeInfo = knownTasks.get(taskAttempt).assignedNode; +if (nodeInfo == null) { + return; +} + +metrics.addTaskLatency(nodeInfo.shortStringBase, taskAttempt.getFinishTime() - taskAttempt.getLaunchTime()); + } + + private boolean isMapTask(TaskAttemptImpl taskAttempt) { Review comment: I am not yet entirely familiar with this part of the code. I thought that the vertex can tell us more but getVertex is package private method for TaskAttemptImp, and getting the Vertex from would need something like this (found in getTransitiveVertexOutputs): -- DagInfo info = getContext().getCurrentDagInfo(); if (!(info instanceof DAG)) { LOG.warn("DAG info is not a DAG"); return; } DAG dag = (DAG) info; Vertex vertex = dag.getVertex(taskAttempt.getVertexID()); -- I found casting DagInfo to DAG more shady than relying on counters, but feel free to disagree. Also open to any suggestions where should I dig around more to find a better solution! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics
pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics URL: https://github.com/apache/hive/pull/633#discussion_r289317467 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -1175,6 +1178,9 @@ public boolean deallocateTask( LOG.debug("Processing deallocateTask for task={}, taskSucceeded={}, endReason={}", task, taskSucceeded, endReason); } +if (task instanceof TaskAttemptImpl && metrics != null) { + updateMetrics((TaskAttemptImpl)task); Review comment: The metrics can be null in case of conf.getBoolean(ConfVars.HIVE_IN_TEST.varname, false) I tried to find the style problem, but not sure which part should be written differently This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics
pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics URL: https://github.com/apache/hive/pull/633#discussion_r289292223 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/metrics/LlapTaskSchedulerMetrics.java ## @@ -254,6 +266,14 @@ public void setWmUnusedGuaranteed(int unusedGuaranteed) { wmUnusedGuaranteedCount.set(unusedGuaranteed); } + public void addTaskLatency(String daemonId, long value) { Review comment: Done. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics
pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics URL: https://github.com/apache/hive/pull/633#discussion_r289290466 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java ## @@ -426,9 +428,10 @@ public LlapTaskSchedulerService(TaskSchedulerContext taskSchedulerContext, Clock this.pauseMonitor = new JvmPauseMonitor(conf); pauseMonitor.start(); String displayName = "LlapTaskSchedulerMetrics-" + MetricsUtils.getHostName(); + int latencyMetricWindowSize = HiveConf.getIntVar(conf, ConfVars.LLAP_LATENCY_METRIC_WINDOW_SIZE); Review comment: Added a new feature to turn off this metrics if this value is 0, or negative number. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics
pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics URL: https://github.com/apache/hive/pull/633#discussion_r289284589 ## File path: llap-common/src/test/org/apache/hadoop/hive/llap/metrics/MockMetricsCollector.java ## @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hive.llap.metrics; + +import com.google.common.collect.Lists; +import org.apache.hadoop.metrics2.AbstractMetric; +import org.apache.hadoop.metrics2.MetricsCollector; +import org.apache.hadoop.metrics2.MetricsInfo; +import org.apache.hadoop.metrics2.MetricsRecordBuilder; +import org.apache.hadoop.metrics2.MetricsTag; +import org.apache.hadoop.metrics2.lib.Interns; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * Mock metrics collector for this test only. + * This MetricsCollector implementation is used to get the actual + * MetricsSource data, collected by the Review comment: I guess it does not matter this case, but done it anyway since I doing a new patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] sankarh opened a new pull request #655: HIVE-21811: Load data into partitioned table throws NPE if DB is enabled for replication.
sankarh opened a new pull request #655: HIVE-21811: Load data into partitioned table throws NPE if DB is enabled for replication. URL: https://github.com/apache/hive/pull/655 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics
pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics URL: https://github.com/apache/hive/pull/633#discussion_r289282673 ## File path: hive-site.xml ## @@ -0,0 +1,358 @@ + Review comment: Added by mistake. Removed. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics
pvary commented on a change in pull request #633: HIVE-21740: Collect LLAP execution latency metrics URL: https://github.com/apache/hive/pull/633#discussion_r289282184 ## File path: llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/metrics/LlapTaskSchedulerMetrics.java ## @@ -254,6 +272,14 @@ public void setWmUnusedGuaranteed(int unusedGuaranteed) { wmUnusedGuaranteedCount.set(unusedGuaranteed); } + public void addTaskLatency(String key, long value) { Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org