[jira] [Updated] (HIVE-19247) StatsOptimizer: Missing stats fast-path for Date

2018-04-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19247:
---
Description: 
{code}
2018-04-19T18:57:24,268 DEBUG [67259108-c184-4c92-9e18-9e296922 HiveServer2-Handler-Pool: Thread-73]: optimizer.StatsOptimizer (StatsOptimizer.java:process(614)) - Unsupported type: date encountered in metadata optimizer for column : jour
{code}

{code}
if (udaf instanceof GenericUDAFMin) {
  ExprNodeColumnDesc colDesc = (ExprNodeColumnDesc) exprMap.get(
      ((ExprNodeColumnDesc) aggr.getParameters().get(0)).getColumn());
  String colName = colDesc.getColumn();
  StatType type = getType(colDesc.getTypeString());
  if (!tbl.isPartitioned()) {
    if (!StatsSetupConst.areColumnStatsUptoDate(tbl.getParameters(), colName)) {
      Logger.debug("Stats for table : " + tbl.getTableName()
          + " column " + colName + " are not up to date.");
      return null;
    }
    ColumnStatisticsData statData = hive.getMSC().getTableColumnStatistics(
        tbl.getDbName(), tbl.getTableName(), Lists.newArrayList(colName))
        .get(0).getStatsData();
    String name = colDesc.getTypeString().toUpperCase();
    switch (type) {
      case Integeral: {
        LongSubType subType = LongSubType.valueOf(name);
        LongColumnStatsData lstats = statData.getLongStats();
        if (lstats.isSetLowValue()) {
          oneRow.add(subType.cast(lstats.getLowValue()));
        } else {
          oneRow.add(null);
        }
        break;
      }
      case Double: {
        DoubleSubType subType = DoubleSubType.valueOf(name);
        DoubleColumnStatsData dstats = statData.getDoubleStats();
        if (dstats.isSetLowValue()) {
          oneRow.add(subType.cast(dstats.getLowValue()));
        } else {
          oneRow.add(null);
        }
        break;
      }
      default: // unsupported type (date falls through to here)
        Logger.debug("Unsupported type: " + colDesc.getTypeString()
            + " encountered in metadata optimizer for column : " + colName);
        return null;
    }
  }
}
{code}

{code}
enum StatType {
  Integeral,
  Double,
  String,
  Boolean,
  Binary,
  Unsupported
}

enum LongSubType {
  BIGINT   { @Override Object cast(long longValue) { return longValue; } },
  INT      { @Override Object cast(long longValue) { return (int) longValue; } },
  SMALLINT { @Override Object cast(long longValue) { return (short) longValue; } },
  TINYINT  { @Override Object cast(long longValue) { return (byte) longValue; } };

  abstract Object cast(long longValue);
}
{code}

Date is stored as Integral stats, so the fast-path should handle it (& the "Integeral" typo there should also be fixed).
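Since the metastore keeps date column min/max as long data, the fix amounts to one more subtype case in the fast-path cast table. A hedged sketch of the idea, assuming the stats encode dates as days since the Unix epoch (the enum/member names here are illustrative, not the actual patch):

```java
import java.sql.Date;
import java.time.LocalDate;

// Sketch: extend the fast-path cast table with a DATE member, assuming the
// metastore stores date column min/max as epoch days in LongColumnStatsData.
enum LongSubTypeSketch {
    BIGINT { Object cast(long v) { return v; } },
    INT    { Object cast(long v) { return (int) v; } },
    // hypothetical addition for HIVE-19247:
    DATE   { Object cast(long v) { return Date.valueOf(LocalDate.ofEpochDay(v)); } };

    abstract Object cast(long v);
}

class DateFastPathSketch {
    public static void main(String[] args) {
        // min(date_col) answered from column stats instead of a table scan
        System.out.println(LongSubTypeSketch.DATE.cast(LocalDate.of(2018, 5, 4).toEpochDay())); // prints 2018-05-04
    }
}
```

With this case present, a `min(jour)` over a date column no longer falls into the `default:` branch that logs "Unsupported type: date".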


[jira] [Updated] (HIVE-19247) StatsOptimizer: Missing stats fast-path for Date

2018-04-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19247:
---
Attachment: HIVE-19247.1.patch

> StatsOptimizer: Missing stats fast-path for Date
> 
>
> Key: HIVE-19247
> URL: https://issues.apache.org/jira/browse/HIVE-19247
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.2.0, 3.0.0, 2.3.2
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-19247.1.patch
>
>
> {code}
> 2018-04-19T18:57:24,268 DEBUG [67259108-c184-4c92-9e18-9e296922 
> HiveServer2-Handler-Pool: Thread-73]: optimizer.StatsOptimizer 
> (StatsOptimizer.java:process(614)) - Unsupported type: date encountered in 
> metadata optimizer for column : jour
> {code}
> {code}
> if (udaf instanceof GenericUDAFMin) {
> ExprNodeColumnDesc colDesc = 
> (ExprNodeColumnDesc)exprMap.get(((ExprNodeColumnDesc)aggr.getParameters().get(0)).getColumn());
> String colName = colDesc.getColumn();
> StatType type = getType(colDesc.getTypeString());
> if (!tbl.isPartitioned()) {
>   if 
> (!StatsSetupConst.areColumnStatsUptoDate(tbl.getParameters(), colName)) {
> Logger.debug("Stats for table : " + tbl.getTableName() + " 
> column " + colName
> + " are not up to date.");
> return null;
>   }
>   ColumnStatisticsData statData = 
> hive.getMSC().getTableColumnStatistics(
>   tbl.getDbName(), tbl.getTableName(), 
> Lists.newArrayList(colName))
>   .get(0).getStatsData();
>   String name = colDesc.getTypeString().toUpperCase();
>   switch (type) {
> case Integeral: {
>   LongSubType subType = LongSubType.valueOf(name);
>   LongColumnStatsData lstats = statData.getLongStats();
>   if (lstats.isSetLowValue()) {
> oneRow.add(subType.cast(lstats.getLowValue()));
>   } else {
> oneRow.add(null);
>   }
>   break;
> }
> case Double: {
>   DoubleSubType subType = DoubleSubType.valueOf(name);
>   DoubleColumnStatsData dstats = statData.getDoubleStats();
>   if (dstats.isSetLowValue()) {
> oneRow.add(subType.cast(dstats.getLowValue()));
>   } else {
> oneRow.add(null);
>   }
>   break;
> }
> default: // unsupported type
>   Logger.debug("Unsupported type: " + colDesc.getTypeString() 
> + " encountered in " +
>   "metadata optimizer for column : " + colName);
>   return null;
>   }
> }
> {code}
> {code}
> enum StatType{
>   Integeral,
>   Double,
>   String,
>   Boolean,
>   Binary,
>   Unsupported
> }
> enum LongSubType {
>   BIGINT { @Override
>   Object cast(long longValue) { return longValue; } },
>   INT { @Override
>   Object cast(long longValue) { return (int)longValue; } },
>   SMALLINT { @Override
>   Object cast(long longValue) { return (short)longValue; } },
>   TINYINT { @Override
>   Object cast(long longValue) { return (byte)longValue; } };
>   abstract Object cast(long longValue);
> }
> {code}
> Date is stored in stats (& also the typo there).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19247) StatsOptimizer: Missing stats fast-path for Date

2018-04-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19247:
---
Status: Patch Available  (was: Open)






[jira] [Updated] (HIVE-19247) StatsOptimizer: Missing stats fast-path for Date

2018-04-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19247:
---
Summary: StatsOptimizer: Missing stats fast-path for Date  (was: 
StatsOptimizer: Missing stats fast-path for Date/Timestamp)

> StatsOptimizer: Missing stats fast-path for Date
> 
>
> Key: HIVE-19247
> URL: https://issues.apache.org/jira/browse/HIVE-19247
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.2.0, 3.0.0, 2.3.2
>Reporter: Gopal V
>Priority: Major
> Attachments: HIVE-19247.1.patch
>
>
> {code}
> 2018-04-19T18:57:24,268 DEBUG [67259108-c184-4c92-9e18-9e296922 
> HiveServer2-Handler-Pool: Thread-73]: optimizer.StatsOptimizer 
> (StatsOptimizer.java:process(614)) - Unsupported type: date encountered in 
> metadata optimizer for column : jour
> {code}
> {code}
> if (udaf instanceof GenericUDAFMin) {
> ExprNodeColumnDesc colDesc = 
> (ExprNodeColumnDesc)exprMap.get(((ExprNodeColumnDesc)aggr.getParameters().get(0)).getColumn());
> String colName = colDesc.getColumn();
> StatType type = getType(colDesc.getTypeString());
> if (!tbl.isPartitioned()) {
>   if 
> (!StatsSetupConst.areColumnStatsUptoDate(tbl.getParameters(), colName)) {
> Logger.debug("Stats for table : " + tbl.getTableName() + " 
> column " + colName
> + " are not up to date.");
> return null;
>   }
>   ColumnStatisticsData statData = 
> hive.getMSC().getTableColumnStatistics(
>   tbl.getDbName(), tbl.getTableName(), 
> Lists.newArrayList(colName))
>   .get(0).getStatsData();
>   String name = colDesc.getTypeString().toUpperCase();
>   switch (type) {
> case Integeral: {
>   LongSubType subType = LongSubType.valueOf(name);
>   LongColumnStatsData lstats = statData.getLongStats();
>   if (lstats.isSetLowValue()) {
> oneRow.add(subType.cast(lstats.getLowValue()));
>   } else {
> oneRow.add(null);
>   }
>   break;
> }
> case Double: {
>   DoubleSubType subType = DoubleSubType.valueOf(name);
>   DoubleColumnStatsData dstats = statData.getDoubleStats();
>   if (dstats.isSetLowValue()) {
> oneRow.add(subType.cast(dstats.getLowValue()));
>   } else {
> oneRow.add(null);
>   }
>   break;
> }
> default: // unsupported type
>   Logger.debug("Unsupported type: " + colDesc.getTypeString() 
> + " encountered in " +
>   "metadata optimizer for column : " + colName);
>   return null;
>   }
> }
> {code}
> {code}
> enum StatType{
>   Integeral,
>   Double,
>   String,
>   Boolean,
>   Binary,
>   Unsupported
> }
> enum LongSubType {
>   BIGINT { @Override
>   Object cast(long longValue) { return longValue; } },
>   INT { @Override
>   Object cast(long longValue) { return (int)longValue; } },
>   SMALLINT { @Override
>   Object cast(long longValue) { return (short)longValue; } },
>   TINYINT { @Override
>   Object cast(long longValue) { return (byte)longValue; } };
>   abstract Object cast(long longValue);
> }
> {code}
> Date/Timestamp are stored as Integral stats (& also the typo there).





[jira] [Assigned] (HIVE-19247) StatsOptimizer: Missing stats fast-path for Date

2018-04-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-19247:
--

Assignee: Gopal V






[jira] [Commented] (HIVE-19283) Select count(distinct()) a couple of times stuck in last reducer

2018-04-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450041#comment-16450041
 ] 

Gopal V commented on HIVE-19283:


This was fixed sometime during hive-3.x

https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/tez/multi_count_distinct.q.out

> Select count(distinct()) a couple of times stuck in last reducer
> 
>
> Key: HIVE-19283
> URL: https://issues.apache.org/jira/browse/HIVE-19283
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.1.1
>Reporter: Goun Na
>Assignee: Ashutosh Chauhan
>Priority: Major
>
>  Distinct count query performance is significantly improved due to 
> HIVE-10568. 
> {code:java}
> select count(distinct elevenst_id)
> from 11st.log_table
> where part_dt between '20180101' and '20180131'{code}
>  
> However, some queries with several distinct counts are still slow. They start
> with multiple mappers but get stuck in a single final reducer.
> {code:java}
> select 
>   count(distinct elevenst_id)
> , count(distinct member_id)
> , count(distinct user_id)
> , count(distinct action_id)
> , count(distinct other_id)
>  from 11st.log_table
> where part_dt between '20180101' and '20180131'{code}
>  





[jira] [Commented] (HIVE-19247) StatsOptimizer: Missing stats fast-path for Date

2018-04-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450218#comment-16450218
 ] 

Gopal V commented on HIVE-19247:


Failures are unrelated, but two of the failed tests failed to start up the CliDriver fully.

{code}
java.lang.AssertionError: Failed during initFromDatasets processLine with code=2
	at org.junit.Assert.fail(Assert.java:88)
	at org.apache.hadoop.hive.ql.QTestUtil.initDataset(QTestUtil.java:1227)
	at org.apache.hadoop.hive.ql.QTestUtil.initDataSetForTest(QTestUtil.java:1207)
	at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:1275)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:176)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
	at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:5
{code}






[jira] [Commented] (HIVE-19124) implement a basic major compactor for MM tables

2018-04-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450358#comment-16450358
 ] 

Gopal V commented on HIVE-19124:


LGTM - +1 

This ticket opens up an interesting question about the way the metastore fires off
compactor jobs (i.e. the metastore has to have access to a YARN cluster & needs to
submit jobs to a specific queue, etc.).

> implement a basic major compactor for MM tables
> ---
>
> Key: HIVE-19124
> URL: https://issues.apache.org/jira/browse/HIVE-19124
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: mm-gap-2
> Attachments: HIVE-19124.01.patch, HIVE-19124.02.patch, 
> HIVE-19124.03.patch, HIVE-19124.03.patch, HIVE-19124.04.patch, 
> HIVE-19124.05.patch, HIVE-19124.06.patch, HIVE-19124.07.patch, 
> HIVE-19124.patch
>
>
> For now, it will run a query directly and only major compactions will be 
> supported.





[jira] [Commented] (HIVE-19124) implement a basic major compactor for MM tables

2018-04-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450379#comment-16450379
 ] 

Gopal V commented on HIVE-19124:


Yes, this is true even today, but it will move away with standalone-metastore.

I just realized there's one dropped issue in the code from [~ekoifman] to 
address.

TxnUtils.updateForCompactionQuery is yet another codepath, which we shouldn't 
need - the TxnUtils.createValidCompactWriteIdList goes further ahead than that.

Basically, if the low-water mark is x and x+1, x+2, etc. are aborted, your impl only
compacts up to x.
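In other words, a compactor-side write-id list can advance its watermark past aborted write ids, since compaction simply discards their delta files, while a mark-only approach stops at x. A toy contrast under those assumed semantics (not the actual TxnUtils code):

```java
import java.util.Set;

// Toy contrast between a mark-only watermark and a compactor watermark,
// under the assumed semantics above: aborted ids just above the low-water
// mark can still be compacted away.
class CompactorWatermarkSketch {
    // mark-only approach: stops at the low-water mark x
    static long markOnlyHwm(long lowWaterMark) {
        return lowWaterMark;
    }

    // compactor view: keep advancing while the next write id is aborted
    static long compactorHwm(long lowWaterMark, Set<Long> aborted) {
        long hwm = lowWaterMark;
        while (aborted.contains(hwm + 1)) hwm++;
        return hwm;
    }

    public static void main(String[] args) {
        // low-water mark 5 with 6 and 7 aborted: compactor reaches 7, mark-only stops at 5
        System.out.println(compactorHwm(5, Set.of(6L, 7L)) + " vs " + markOnlyHwm(5));
    }
}
```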






[jira] [Commented] (HIVE-19124) implement a basic major compactor for MM tables

2018-04-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450388#comment-16450388
 ] 

Gopal V commented on HIVE-19124:


You need to modify the TxnUtils to accept a ValidReaderWriteIdList and rewrite 
the current impl as 

{code}
return createValidCompactWriteIdList(createValidReaderWriteIdList(tableWriteIds));
{code}

And then you don't need a new metastore object.

ValidCompactorWriteIdList extends ValidReaderWriteIdList.

You need a ValidReaderWriteIdList.get to pass into compactionWriteIds.






[jira] [Commented] (HIVE-19124) implement a basic major compactor for MM tables

2018-04-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450409#comment-16450409
 ] 

Gopal V commented on HIVE-19124:


bq. I'll take a look into the root canal thru the ear variant today, for now.

Thanks, I'll hold my +1 back for that patch.






[jira] [Commented] (HIVE-19293) Turn on hive.optimize.index.filter

2018-04-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451517#comment-16451517
 ] 

Gopal V commented on HIVE-19293:


Now that the "CREATE INDEX" can no longer return incorrect results in queries, 
we should enable that flag=true, because that now only turns on the ORC/Parquet 
index filters only.
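For context, the ORC/Parquet index filters this flag enables are min/max-based row-group skipping during scans. A minimal self-contained illustration of the idea (not Hive's actual reader code; the stripe layout here is simplified):

```java
import java.util.List;

// Minimal illustration of predicate pushdown via min/max statistics:
// a stripe whose [min, max] range cannot contain the literal is skipped
// without decoding any of its rows.
class RowGroupSkipSketch {
    static class Stripe {
        final long min, max;
        final long[] values;
        Stripe(long min, long max, long[] values) {
            this.min = min; this.max = max; this.values = values;
        }
    }

    static long countEquals(List<Stripe> stripes, long literal) {
        long n = 0;
        for (Stripe s : stripes) {
            if (literal < s.min || literal > s.max) continue; // pruned via stats
            for (long v : s.values) if (v == literal) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Stripe> stripes = List.of(
                new Stripe(1, 5, new long[]{1, 3, 5}),
                new Stripe(10, 20, new long[]{10, 15}));
        // the second stripe is never decoded when searching for 3
        System.out.println(countEquals(stripes, 3));
    }
}
```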

> Turn on hive.optimize.index.filter
> --
>
> Key: HIVE-19293
> URL: https://issues.apache.org/jira/browse/HIVE-19293
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
>
> HIVE-18448 has turned this off. This could cause performance regression. This 
> should be turned on by default





[jira] [Assigned] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-19360:
--

Assignee: Gopal V

> CBO: Add an "optimizedSQL" to QueryPlan object 
> ---
>
> Key: HIVE-19360
> URL: https://issues.apache.org/jira/browse/HIVE-19360
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Diagnosability
>Affects Versions: 3.1.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>
> Calcite RelNodes can be converted back into SQL (as the new JDBC storage 
> handler does), which allows Hive to print out the post CBO plan as a SQL 
> query instead of having to guess the join orders from the subsequent Tez plan.
> The query generated might not be always valid SQL at this point, but is a 
> world ahead of DAG plans in readability.
> Eg. tpc-ds Query4 CTEs gets expanded to
> {code}
> SELECT t16.$f3 customer_preferred_cust_flag
> FROM
>   (SELECT t0.c_customer_id $f0,
>SUM((t2.ws_ext_list_price - 
> t2.ws_ext_wholesale_cost - t2.ws_ext_discount_amt + t2.ws_ext_sales_price) / 
> CAST(2 AS DECIMAL(10, 0))) $f8
>FROM
>  (SELECT c_customer_sk,
>  c_customer_id,
>  c_first_name,
>  c_last_name,
>  c_preferred_cust_flag,
>  c_birth_country,
>  c_login,
>  c_email_address
>   FROM default.customer
>   WHERE c_customer_sk IS NOT NULL
> AND c_customer_id IS NOT NULL) t0
>INNER JOIN (
>  (SELECT ws_sold_date_sk,
>  ws_bill_customer_sk,
>  ws_ext_discount_amt,
>  ws_ext_sales_price,
>  ws_ext_wholesale_cost,
>  ws_ext_list_price
>   FROM default.web_sales
>   WHERE ws_bill_customer_sk IS NOT NULL
> AND ws_sold_date_sk IS NOT NULL) t2
>INNER JOIN
>  (SELECT d_date_sk,
>  CAST(2002 AS INTEGER) d_year
>   FROM default.date_dim
>   WHERE d_year = 2002
> AND d_date_sk IS NOT NULL) t4 ON t2.ws_sold_date_sk = 
> t4.d_date_sk) ON t0.c_customer_sk = t2.ws_bill_customer_sk
>GROUP BY t0.c_customer_id,
> t0.c_first_name,
> t0.c_last_name,
> t0.c_preferred_cust_flag,
> t0.c_birth_country,
> t0.c_login,
> t0.c_email_address) t7
> INNER JOIN (
>   (SELECT t9.c_customer_id $f0,
>t9.c_preferred_cust_flag $f3,
> 
> SUM((t11.ss_ext_list_price - t11.ss_ext_wholesale_cost - 
> t11.ss_ext_discount_amt + t11.ss_ext_sales_price) / CAST(2 AS DECIMAL(10, 
> 0))) $f8
>FROM
>  (SELECT c_customer_sk,
>  c_customer_id,
>  c_first_name,
>  c_last_name,
>  c_preferred_cust_flag,
>  c_birth_country,
>  c_login,
>  c_email_address
>   FROM default.customer
>   WHERE c_customer_sk IS NOT NULL
> AND c_customer_id IS NOT NULL) t9
>INNER JOIN (
>  (SELECT ss_sold_date_sk,
>  ss_customer_sk,
>  ss_ext_discount_amt,
>  ss_ext_sales_price,
>  ss_ext_wholesale_cost,
>  ss_ext_list_price
>   FROM default.store_sales
>   WHERE ss_customer_sk IS NOT NULL
> AND ss_sold_date_sk IS NOT NULL) t11
>INNER JOIN
>  (SELECT d_date_sk,
>  CAST(2002 AS INTEGER) d_year
>   FROM default.date_dim
>   WHERE d_year = 2002
> AND d_date_sk IS NOT NULL) t13 ON 
> t11.ss_sold_date_sk = t13.d_date_sk) ON t9.c_customer_sk = t11.ss_customer_sk
>GROUP BY t9.c_customer_id,
> t9.c_first_name,
> t9.c_last_name,
> t9.c_preferred_cust_flag,
> t9.c_birth_country,
> t9.c_login,
> t9.c_email_address) t16
> INNER JOIN (
>   (SELECT t18.c_customer_id $f0,
>  

[jira] [Updated] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19360:
---
Attachment: HIVE-19360.1.patch

> CBO: Add an "optimizedSQL" to QueryPlan object 
> ---
>
> Key: HIVE-19360
> URL: https://issues.apache.org/jira/browse/HIVE-19360
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Diagnosability
>Affects Versions: 3.1.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-19360.1.patch
> t9.c_birth_country,
> t9.c_login,
> t9.c_email_address) t16
> INNER JOIN (
>   (SELECT t18.c_customer_id $f
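[Editorial sketch, not part of the patch: the rel-to-SQL conversion the description refers to can be approximated with Calcite's own rel2sql machinery. The class and method names below (RelToSqlConverter, HiveSqlDialect, visitRoot) come from Calcite's public API and illustrate the general approach; the actual HIVE-19360 implementation may differ.]

```java
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rel2sql.RelToSqlConverter;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.HiveSqlDialect;

public class OptimizedSqlSketch {
  /**
   * Render an optimized Calcite plan back to SQL text, which is what the
   * proposed "optimizedSQL" field on QueryPlan would carry.
   */
  public static String toOptimizedSql(RelNode optimizedPlan) {
    // Pick a dialect so quoting and identifier rules match Hive conventions.
    HiveSqlDialect dialect = new HiveSqlDialect(HiveSqlDialect.DEFAULT_CONTEXT);
    // Walk the RelNode tree and build an equivalent SqlNode AST.
    RelToSqlConverter converter = new RelToSqlConverter(dialect);
    SqlNode statement = converter.visitRoot(optimizedPlan).asStatement();
    // Unparse the AST into a SQL string for logging/diagnostics.
    return statement.toSqlString(dialect).getSql();
  }
}
```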

[jira] [Updated] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19360:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19360:
---
Attachment: HIVE-19360.1.patch


[jira] [Updated] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19360:
---
Attachment: (was: HIVE-19360.1.patch)


[jira] [Commented] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458788#comment-16458788
 ] 

Gopal V commented on HIVE-19360:


The error is very odd, since there is no instance of HiveProtoLoggingHook in 
the entire Hive codebase.


[jira] [Comment Edited] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458788#comment-16458788
 ] 

Gopal V edited comment on HIVE-19360 at 4/30/18 5:42 PM:
-

-The error is very odd - since there is no instance of HiveProtoLoggingHook in 
the entire hive codebase.-

Never mind, I need to rebase


was (Author: gopalv):
The error is very odd - since there is no instance of HiveProtoLoggingHook in 
the entire hive codebase.


[jira] [Updated] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19360:
---
Attachment: HIVE-19360.2.patch


[jira] [Commented] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459141#comment-16459141
 ] 

Gopal V commented on HIVE-19360:


[~rajesh.balamohan]: it depends on the 3.x JDBC storage handler implementation (as 
you can see, this is not a lot of code).


[jira] [Commented] (HIVE-19362) enable LLAP cache affinity by default

2018-04-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459168#comment-16459168
 ] 

Gopal V commented on HIVE-19362:


LGTM +1

Linked to HIVE-18665 which prevented this config from kicking in when LLAP is 
disabled.

> enable LLAP cache affinity by default
> -
>
> Key: HIVE-19362
> URL: https://issues.apache.org/jira/browse/HIVE-19362
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-19362.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19360:
---
Attachment: HIVE-19360.3.patch


[jira] [Updated] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19360:
---
Attachment: (was: HIVE-19360.3.patch)


[jira] [Updated] (HIVE-19360) CBO: Add an "optimizedSQL" to QueryPlan object

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19360:
---
Attachment: HIVE-19360.3.patch

> CBO: Add an "optimizedSQL" to QueryPlan object 
> ---
>
> Key: HIVE-19360
> URL: https://issues.apache.org/jira/browse/HIVE-19360
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Diagnosability
>Affects Versions: 3.1.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-19360.1.patch, HIVE-19360.2.patch, 
> HIVE-19360.3.patch
>

[jira] [Commented] (HIVE-19206) Automatic memory management for open streaming writers

2018-04-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459233#comment-16459233
 ] 

Gopal V commented on HIVE-19206:


LGTM - +1

Most of the noise in this patch is from the single change in 

{code}
-  String getTable();
+  Table getTable();
{code}



> Automatic memory management for open streaming writers
> --
>
> Key: HIVE-19206
> URL: https://issues.apache.org/jira/browse/HIVE-19206
> Project: Hive
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-19206.1.patch, HIVE-19206.2.patch
>
>
> Problem:
>  When there are 100s of record updaters open, the amount of memory required 
> by orc writers keeps growing because of ORC's internal buffers. This can lead 
> to potential high GC or OOM during streaming ingest.
> Solution:
>  The high level idea is for the streaming connection to remember all the open 
> record updaters and flush the record updater periodically (at some interval). 
> Records written to each record updater can be used as a metric to determine 
> the candidate record updaters for flushing. 
> If the stripe size of the ORC file is 64MB, the default memory management 
> check happens only after every 5000 rows, which may be too late when there 
> are too many concurrent writers in a process. An example case would be 100 
> open writers, each with an almost-full stripe of 64MB buffered data; this 
> would take 100*64MB ~= 6GB of memory. When all of the record writers 
> flush, memory usage drops to 100*~2MB, which is just ~200MB.
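The flush policy described above (remember the open record updaters, check periodically, use buffered-row counts to pick flush candidates) can be sketched roughly as follows. This is an illustrative class, not Hive's actual streaming API; the name `StreamingMemoryManager` and the threshold semantics are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: track rows buffered per open writer and, once the
// global total crosses a threshold, flush the writer holding the most
// buffered rows first.
public class StreamingMemoryManager {
  private final Map<String, Long> bufferedRows = new HashMap<>();
  private final long flushThreshold;

  public StreamingMemoryManager(long flushThreshold) {
    this.flushThreshold = flushThreshold;
  }

  public void recordWrite(String writerId, long rows) {
    bufferedRows.merge(writerId, rows, Long::sum);
  }

  // Picks the writer with the most buffered rows once the global total
  // crosses the threshold; null means no flush is needed yet.
  public String pickWriterToFlush() {
    long total = 0;
    for (long rows : bufferedRows.values()) {
      total += rows;
    }
    if (total < flushThreshold) {
      return null;
    }
    String candidate = null;
    long most = -1;
    for (Map.Entry<String, Long> e : bufferedRows.entrySet()) {
      if (e.getValue() > most) {
        candidate = e.getKey();
        most = e.getValue();
      }
    }
    return candidate;
  }

  public void markFlushed(String writerId) {
    bufferedRows.put(writerId, 0L);
  }
}
```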



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19368) Metastore: log a warning with table-name + partition-count when get_partitions returns >10k partitions

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19368:
---
Affects Version/s: 3.1.0

> Metastore: log a warning with table-name + partition-count when 
> get_partitions returns >10k partitions
> --
>
> Key: HIVE-19368
> URL: https://issues.apache.org/jira/browse/HIVE-19368
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.0
>Reporter: Gopal V
>Priority: Major
>
> Ran into this particular letter from the trenches & would like a normal WARN 
> log for it.
> https://www.slideshare.net/Hadoop_Summit/hive-at-yahoo-letters-from-the-trenches/24





[jira] [Updated] (HIVE-19368) Metastore: log a warning with table-name + partition-count when get_partitions returns >10k partitions

2018-04-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19368:
---
Component/s: Standalone Metastore

> Metastore: log a warning with table-name + partition-count when 
> get_partitions returns >10k partitions
> --
>
> Key: HIVE-19368
> URL: https://issues.apache.org/jira/browse/HIVE-19368
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.0
>Reporter: Gopal V
>Priority: Major
>
> Ran into this particular letter from the trenches & would like a normal WARN 
> log for it.
> https://www.slideshare.net/Hadoop_Summit/hive-at-yahoo-letters-from-the-trenches/24





[jira] [Commented] (HIVE-18570) ACID IOW implemented using base may delete too much data

2018-04-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459367#comment-16459367
 ] 

Gopal V commented on HIVE-18570:


The other optimistic alternative is for the "insert into" to fail, rather than 
block readers - very few people do "insert into" + "insert overwrite" 
concurrently, but a lot more would do "insert overwrite" + "select" 
concurrently.

> ACID IOW implemented using base may delete too much data
> 
>
> Key: HIVE-18570
> URL: https://issues.apache.org/jira/browse/HIVE-18570
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-18570.01.patch
>
>
> Suppose we have a table with delta_0 insert data.
> Txn 1 starts an insert into delta_1.
> Txn 2 starts an IOW into base_2.
> Txn 2 commits.
> Txn 1 commits after txn 2 but its results would be invisible.
> Txn 2 deletes rows committed by txn 1 that according to standard ACID 
> semantics it could have never observed and affected; this sequence of events 
> is only possible under read-uncommitted isolation level (so, 2 deletes rows 
> written by 1 before 1 commits them). 
> This is if we look at IOW as transactional delete+insert. Otherwise we are 
> just saying IOW performs "semi"-transactional delete.
> If 1 ran an update on rows instead of an insert, and 2 still ran an 
> IOW/delete, row lock conflict (or equivalent) should cause one of them to 
> fail.
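The reason the IOW "deletes" txn 1's rows is directory selection: ACID readers keep only the newest base and the deltas written after it, so delta_1 disappears under base_2 even though txn 1 committed later. A simplified sketch of that selection (a hypothetical helper with single-id directory names, not Hive's actual AcidUtils logic):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified ACID directory selection: pick the base with the highest id,
// then keep only deltas newer than that base. delta_1 is dropped once
// base_2 exists, regardless of commit order.
public class AcidDirSelect {
  public static List<String> visible(List<String> dirs) {
    int bestBase = -1;
    for (String d : dirs) {
      if (d.startsWith("base_")) {
        bestBase = Math.max(bestBase, Integer.parseInt(d.substring(5)));
      }
    }
    List<String> out = new ArrayList<>();
    if (bestBase >= 0) {
      out.add("base_" + bestBase);
    }
    for (String d : dirs) {
      if (d.startsWith("delta_") && Integer.parseInt(d.substring(6)) > bestBase) {
        out.add(d);
      }
    }
    return out;
  }
}
```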





[jira] [Commented] (HIVE-19369) Locks: Add new lock implementations for always zero-wait readers

2018-05-01 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459865#comment-16459865
 ] 

Gopal V commented on HIVE-19369:


Yes, despite the big description above - that new lock is 90% of what this 
JIRA adds; the rest is just documentation.

Here's the current LockType

  SHARED_READ(1),
  SHARED_WRITE(2),
  EXCLUSIVE(3);

We already have 3 of the asks above; we just need to add a new one (the one 
you mentioned):

SHARED_READ
SHARED_WRITE
EXCLUSIVE | EXCL_DROP
+ EXCL_WRITE 

The only other behaviour change when EXCLUSIVE gets more granularity is that 
EXCL_DROP is no-wait for the readers, while the current EXCLUSIVE makes the 
readers wait as well.
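As a sketch, the compatibility rules discussed in this thread (readers never wait, and only a pending EXCL_DROP fails them) could be tabulated like this. This is illustrative code following the semantics described above, not Hive's actual lock-manager implementation:

```java
// Hypothetical lock table: names from the JIRA, semantics as described in
// the discussion (not the actual DbTxnManager code).
public class LockCompat {
  public enum LockType { SHARED_READ, SHARED_WRITE, EXCL_WRITE, EXCL_DROP }

  // true if 'requested' can be granted while 'held' is already granted
  public static boolean compatible(LockType requested, LockType held) {
    switch (requested) {
      case SHARED_READ:
        return held != LockType.EXCL_DROP;   // readers only blocked by drops
      case SHARED_WRITE:
        return held == LockType.SHARED_READ
            || held == LockType.SHARED_WRITE; // inserts run concurrently
      case EXCL_WRITE:
        return held == LockType.SHARED_READ;  // excludes all other writers
      case EXCL_DROP:
        return false;                         // waits for everyone
      default:
        throw new IllegalArgumentException("unknown lock type");
    }
  }
}
```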

> Locks: Add new lock implementations for always zero-wait readers
> 
>
> Key: HIVE-19369
> URL: https://issues.apache.org/jira/browse/HIVE-19369
> Project: Hive
>  Issue Type: Improvement
>Reporter: Gopal V
>Priority: Major
>
> Hive Locking with Micro-managed and full-ACID tables needs a better locking 
> implementation which allows for no-wait readers always.
> EXCL_DROP
> EXCL_WRITE
> SHARED_WRITE
> SHARED_READ
> Short write-up
> EXCL_DROP is a "drop partition" or "drop table" and waits for all others to 
> exit
> EXCL_WRITE excludes all writes and will wait for all existing SHARED_WRITE to 
> exit.
> SHARED_WRITE allows all SHARED_WRITES to go through, but will wait for an 
> EXCL_WRITE & EXCL_DROP (waiting so that you can do drop + insert in different 
> threads).
> SHARED_READ does not wait for any lock - it fails fast for a pending 
> EXCL_DROP, because even if there is an EXCL_WRITE or SHARED_WRITE pending, 
> there's no semantic reason to wait for them to succeed before going ahead 
> with a SHARED_READ.
> a select * => SHARED_READ
> an insert into => SHARED_WRITE
> an insert overwrite or MERGE => EXCL_WRITE
> a drop table => EXCL_DROP
> TODO:
> The fate of the compactor needs to be added to this before it is a complete 
> description.





[jira] [Updated] (HIVE-19376) Statistics: switch to 10bit HLL by default for Hive

2018-05-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19376:
---
Status: Patch Available  (was: Open)

> Statistics: switch to 10bit HLL by default for Hive
> ---
>
> Key: HIVE-19376
> URL: https://issues.apache.org/jira/browse/HIVE-19376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Gopal V
>Priority: Major
> Attachments: HIVE-19376.1.patch
>
>
> This reduces the memory usage for the metastore cache and the size of 
> bit-vectors in the DB by 16x.





[jira] [Updated] (HIVE-19376) Statistics: switch to 10bit HLL by default for Hive

2018-05-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19376:
---
Attachment: HIVE-19376.1.patch

> Statistics: switch to 10bit HLL by default for Hive
> ---
>
> Key: HIVE-19376
> URL: https://issues.apache.org/jira/browse/HIVE-19376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Gopal V
>Priority: Major
> Attachments: HIVE-19376.1.patch
>
>
> This reduces the memory usage for the metastore cache and the size of 
> bit-vectors in the DB by 16x.





[jira] [Assigned] (HIVE-19376) Statistics: switch to 10bit HLL by default for Hive

2018-05-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-19376:
--

Assignee: Gopal V

> Statistics: switch to 10bit HLL by default for Hive
> ---
>
> Key: HIVE-19376
> URL: https://issues.apache.org/jira/browse/HIVE-19376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-19376.1.patch
>
>
> This reduces the memory usage for the metastore cache and the size of 
> bit-vectors in the DB by 16x.





[jira] [Commented] (HIVE-19369) Locks: Add new lock implementations for always zero-wait readers

2018-05-01 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459938#comment-16459938
 ] 

Gopal V commented on HIVE-19369:


That means the 2nd part of that is a codepath which already exists (i.e. 
drop_excl -> fail readers), which further reduces the code necessary for 
this ticket.

> Locks: Add new lock implementations for always zero-wait readers
> 
>
> Key: HIVE-19369
> URL: https://issues.apache.org/jira/browse/HIVE-19369
> Project: Hive
>  Issue Type: Improvement
>Reporter: Gopal V
>Priority: Major
>





[jira] [Commented] (HIVE-19376) Statistics: switch to 10bit HLL by default for Hive

2018-05-01 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460146#comment-16460146
 ] 

Gopal V commented on HIVE-19376:


This change already switches the metastore storage to 1KB instead of 16KB.

The parent task right above this JIRA is the one which adds backwards 
compatibility for existing metastores that store the 14-bit data.
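The 16x figure follows directly from the register counts: an HLL with p index bits keeps 2^p registers. A back-of-the-envelope check, assuming a dense, uncompressed register array at roughly one byte per register (the exact on-disk encoding in Hive may differ):

```java
// 2^p registers at ~1 byte each: p=14 -> 16KB, p=10 -> 1KB, a 16x reduction
// in both metastore-cache memory and DB bit-vector size.
public class HllSize {
  public static int registerBytes(int p) {
    return 1 << p; // 2^p registers * 1 byte per register
  }
}
```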

> Statistics: switch to 10bit HLL by default for Hive
> ---
>
> Key: HIVE-19376
> URL: https://issues.apache.org/jira/browse/HIVE-19376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-19376.1.patch
>
>
> This reduces the memory usage for the metastore cache and the size of 
> bit-vectors in the DB by 16x.





[jira] [Commented] (HIVE-19310) Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might need to be run only in test env

2018-05-02 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461520#comment-16461520
 ] 

Gopal V commented on HIVE-19310:


LGTM - +1


> Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might 
> need to be run only in test env
> -
>
> Key: HIVE-19310
> URL: https://issues.apache.org/jira/browse/HIVE-19310
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-19310.02.patch, HIVE-19310.1.patch
>
>
> MetaStoreDirectSql.ensureDbInit has the following 2 calls which we have 
> observed taking a long time in our testing:
> {code}
> initQueries.add(pm.newQuery(MNotificationLog.class, "dbName == ''"));
> initQueries.add(pm.newQuery(MNotificationNextId.class, "nextEventId < -1"));
> {code}
> In a production environment, these tables should be initialized using 
> schematool, however in a test environment, these calls might be needed. 





[jira] [Commented] (HIVE-19310) Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might need to be run only in test env

2018-05-02 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461525#comment-16461525
 ] 

Gopal V commented on HIVE-19310:


The init query regressed when the BTree on PART_COL_STATS went from (DB_NAME 
...) -> (CAT_NAME, DB_NAME)

The test query went from 3ms -> 4.5 seconds. After this fix, it has improved to 
0ms :)



> Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might 
> need to be run only in test env
> -
>
> Key: HIVE-19310
> URL: https://issues.apache.org/jira/browse/HIVE-19310
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-19310.02.patch, HIVE-19310.1.patch
>
>





[jira] [Commented] (HIVE-19041) Thrift deserialization of Partition objects should intern fields

2018-05-03 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463233#comment-16463233
 ] 

Gopal V commented on HIVE-19041:


Hadoop comes with a weak-interner, which is used by Tez.

StringInterner.weakIntern()
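For illustration, a minimal stand-alone equivalent of what a weak interner provides (Hadoop's StringInterner.weakIntern() is backed by a Guava weak interner; this sketch is not that class): duplicate strings collapse to a single instance, while entries with no other references remain garbage-collectable.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Minimal weak-interner sketch: equal strings map to one canonical
// instance, but the pool does not pin strings in memory - once the
// canonical instance is unreachable, the GC can reclaim it.
public class WeakInterner {
  private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

  public synchronized String intern(String s) {
    if (s == null) {
      return null;
    }
    WeakReference<String> ref = pool.get(s);
    String existing = (ref == null) ? null : ref.get();
    if (existing != null) {
      return existing;
    }
    pool.put(s, new WeakReference<>(s));
    return s;
  }
}
```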


> Thrift deserialization of Partition objects should intern fields
> 
>
> Key: HIVE-19041
> URL: https://issues.apache.org/jira/browse/HIVE-19041
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Attachments: HIVE-19041.01.patch
>
>
> When a client is creating large number of partitions, the thrift objects are 
> deserialized into Partition objects. The read method of these objects does 
> not intern the inputformat, location, outputformat which cause large number 
> of duplicate Strings in the HMS memory. We should intern these objects while 
> deserialization to reduce memory pressure. 





[jira] [Comment Edited] (HIVE-19041) Thrift deserialization of Partition objects should intern fields

2018-05-03 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463233#comment-16463233
 ] 

Gopal V edited comment on HIVE-19041 at 5/4/18 12:21 AM:
-

Hadoop comes with a weak-interner, which is used by Tez.

StringInterner.weakIntern()

Looking at the code, it was explicitly removed by HIVE-17237 ??


was (Author: gopalv):
Hadoop comes with a weak-interner, which is used by Tez.

StringInterner.weakIntern()


> Thrift deserialization of Partition objects should intern fields
> 
>
> Key: HIVE-19041
> URL: https://issues.apache.org/jira/browse/HIVE-19041
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Attachments: HIVE-19041.01.patch
>
>
> When a client is creating large number of partitions, the thrift objects are 
> deserialized into Partition objects. The read method of these objects does 
> not intern the inputformat, location, outputformat which cause large number 
> of duplicate Strings in the HMS memory. We should intern these objects while 
> deserialization to reduce memory pressure. 





[jira] [Commented] (HIVE-19418) add background stats updater similar to compactor

2018-05-03 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463257#comment-16463257
 ] 

Gopal V commented on HIVE-19418:


bq. some exceptions like numRows, cannot be aggregated (i.e. you cannot combine 
ndvs from two inserts)

For pure insert queries all stats can be merged - because nDVs are actually 
stored as HyperLogLog bitsets which have a merge() op.

bq. Therefore we will add background logic to metastore (similar to, and 
partially inside, the ACID compactor)

With the standalone metastore, adding more background logic to the metastore 
is going to become a big problem - I'd argue that even the compactor needs to 
be moved out & the metastore should only keep the book-keeping for pending 
tasks (a generic task queue + priorities), because it will no longer have a 
yarn-site.xml in its configurations.
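The nDV merge mentioned above works because two HyperLogLog register arrays of the same precision combine with a register-wise max, which produces the same sketch as scanning both inputs together. A minimal sketch of that merge op (not Hive's actual HyperLogLog class):

```java
// Register-wise max merge of two same-precision HLL sketches: the result
// estimates the cardinality of the union of the two input streams, which
// is why stats from two pure inserts can be combined without a rescan.
public class HllMerge {
  public static int[] merge(int[] a, int[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("precisions differ");
    }
    int[] out = new int[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = Math.max(a[i], b[i]);
    }
    return out;
  }
}
```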


> add background stats updater similar to compactor
> -
>
> Key: HIVE-19418
> URL: https://issues.apache.org/jira/browse/HIVE-19418
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> There's a JIRA HIVE-19416 to add snapshot version to stats for MM/ACID tables 
> to make them usable in a transaction without breaking ACID (for metadata-only 
> optimization). However, stats for ACID tables can still become unusable if 
> e.g. two parallel inserts run - neither sees the data written by the other, 
> so after both finish, the snapshots on either set of stats won't match the 
> current snapshot and the stats will be unusable.
> Additionally, for ACID and non-ACID tables alike, a lot of the stats, with 
> some exceptions like numRows, cannot be aggregated (i.e. you cannot combine 
> ndvs from two inserts), and for ACID even less can be aggregated (you cannot 
> derive min/max if some rows are deleted but you don't scan the rest of the 
> dataset).
> Therefore we will add background logic to metastore (similar to, and 
> partially inside, the ACID compactor) to update stats.
> It will have 3 modes of operation.
> 1) Off.
> 2) Update only the stats that exist but are out of date (generating stats can 
> be expensive, so if the user is only analyzing a subset of tables it should 
> be able to only update that subset). We can simply look at existing stats and 
> only analyze for the relevant partitions and columns.
> 3) On: 2 + create stats for all tables and columns missing stats.
> There will also be a table parameter to skip stats update. 
> In phase 1, the process will operate outside of compactor, and run analyze 
> command on the table. The analyze command will automatically save the stats 
> with ACID snapshot information if needed, based on HIVE-19416, so we don't 
> need to do any special state management and this will work for all table 
> types. However it's also more expensive.
> In phase 2, we can explore adding stats collection during MM compaction that 
> uses a temp table. If we don't have open writers during major compaction (so 
> we overwrite all of the data), the temp table stats can simply be copied over 
> to the main table with correct snapshot information, saving us a table scan.
> In phase 3, we can add custom stats collection logic to full ACID compactor 
> that is not query based, the same way as we'd do for (2). Alternatively we 
> can wait for ACID compactor to become query based and just reuse (2).





[jira] [Updated] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-05-04 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18079:
---
Attachment: HIVE-18079.10.patch

> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18079.1.patch, HIVE-18079.10.patch, 
> HIVE-18079.2.patch, HIVE-18079.4.patch, HIVE-18079.5.patch, 
> HIVE-18079.6.patch, HIVE-18079.7.patch, HIVE-18079.8.patch, HIVE-18079.9.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.
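A sketch of the squash, under the usual HLL convention that the register index is the top p bits of the hash and the rank counts leading zeros of the remaining bits (Hive's actual implementation may differ in details): when dropping from p1 to p2 index bits, the low (p1-p2) index bits of each old register become the leading bits of the remaining stream, so they either determine the new rank themselves (if nonzero) or extend the old rank by (p1-p2) zeros.

```java
// Fold a 2^p1-register HLL down to p2 < p1 index bits without rescanning.
public class HllFold {
  public static int[] foldDown(int[] regs, int p1, int p2) {
    int shift = p1 - p2;
    int[] out = new int[1 << p2];
    for (int i = 0; i < regs.length; i++) {
      int r1 = regs[i];
      if (r1 == 0) {
        continue; // empty register: nothing observed, nothing to fold
      }
      int dropped = i & ((1 << shift) - 1); // low index bits joining the stream
      int ni = i >>> shift;                 // new (shorter) register index
      int r2;
      if (dropped == 0) {
        r2 = shift + r1;                    // shift extra zeros precede the old run
      } else {
        int leadingZeros = shift - (32 - Integer.numberOfLeadingZeros(dropped));
        r2 = leadingZeros + 1;              // first 1-bit lies inside the dropped bits
      }
      out[ni] = Math.max(out[ni], r2);
    }
    return out;
  }
}
```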



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19310) Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might need to be run only in test env

2018-05-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466359#comment-16466359
 ] 

Gopal V commented on HIVE-19310:


[~vgarg]: trivial patch to include for 3.x 

> Metastore: MetaStoreDirectSql.ensureDbInit has some slow DN calls which might 
> need to be run only in test env
> -
>
> Key: HIVE-19310
> URL: https://issues.apache.org/jira/browse/HIVE-19310
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-19310.02.patch, HIVE-19310.03.patch, 
> HIVE-19310.1.patch
>
>





[jira] [Commented] (HIVE-19446) QueryCache: Transaction lists needed for pending cache entries

2018-05-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466555#comment-16466555
 ] 

Gopal V commented on HIVE-19446:


Lookups that happen too early are triggering this NPE.

{code}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.cache.results.QueryResultsCache.entryMatches(QueryResultsCache.java:705)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.cache.results.QueryResultsCache.lookup(QueryResultsCache.java:442)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.checkResultsCache(SemanticAnalyzer.java:14703)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12060)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:334)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
Show more
{code}

> QueryCache: Transaction lists needed for pending cache entries
> --
>
> Key: HIVE-19446
> URL: https://issues.apache.org/jira/browse/HIVE-19446
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Jason Dere
>Priority: Major
>
> Hive query-cache needs a transactional list, even when the entry is pending 
> state so that other identical queries with the same transactional state can 
> wait for the first query to complete, instead of triggering their own 
> instance.





[jira] [Commented] (HIVE-19446) QueryCache: Transaction lists needed for pending cache entries

2018-05-08 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467805#comment-16467805
 ] 

Gopal V commented on HIVE-19446:


Lots of unrelated failures (failed for 3+ runs). Row-order is different for 
tez_dynpart_hashjoin_1.q, but I don't think it is directly related to this 
patch; it might be a flaky test ([~jdere], can you confirm?).

LGTM - +1

> QueryCache: Transaction lists needed for pending cache entries
> --
>
> Key: HIVE-19446
> URL: https://issues.apache.org/jira/browse/HIVE-19446
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-19446.1.patch
>
>
> Hive query-cache needs a transactional list, even when the entry is pending 
> state so that other identical queries with the same transactional state can 
> wait for the first query to complete, instead of triggering their own 
> instance.





[jira] [Commented] (HIVE-19480) Implement and Incorporate MAPREDUCE-207

2018-05-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469539#comment-16469539
 ] 

Gopal V commented on HIVE-19480:


From the HiveConf in Hive 2 and 3:

{code}
While MR remains the default engine for historical reasons, it is itself a 
historical engine
and is deprecated in Hive 2 line. It may be removed without further warning.
{code}

> Implement and Incorporate MAPREDUCE-207
> ---
>
> Key: HIVE-19480
> URL: https://issues.apache.org/jira/browse/HIVE-19480
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Affects Versions: 1.2.3
>Reporter: BELUGA BEHR
>Priority: Major
>
> * HiveServer2 has the ability to run many MapReduce jobs in parallel.
>  * Each MapReduce application calculates the job's file splits at the client 
> level
>  * As a result, HiveServer2 loads many file splits at the same time, putting pressure 
> on memory
> {quote}"The client running the job calculates the splits for the job by 
> calling getSplits(), then sends them to the application master, which uses 
> their storage locations to schedule map tasks that will process them on the 
> cluster."
>  - "Hadoop: The Definitive Guide"{quote}
> MAPREDUCE-207 should address this memory pressure by moving split 
> calculations into ApplicationMaster. Spark and Tez already take this approach.
> Once MAPREDUCE-207 is completed, leverage the capability in HiveServer2.
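The client-side pattern described above can be sketched as follows — a minimal illustration, not Hive's or MapReduce's actual code; `Split` and `getSplits` here are hypothetical stand-ins for `InputSplit` and `FileInputFormat.getSplits`:

```java
import java.util.ArrayList;
import java.util.List;

public class ClientSideSplits {
    // stand-in for a MapReduce InputSplit (path, offset, length)
    static class Split {
        final String path;
        final long offset;
        final long length;
        Split(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    // One Split object per block: a large table means a large in-memory list,
    // and HiveServer2 computing this for many jobs in parallel multiplies
    // the memory pressure described above.
    static List<Split> getSplits(String path, long fileLen, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long off = 0; off < fileLen; off += blockSize) {
            splits.add(new Split(path, off, Math.min(blockSize, fileLen - off)));
        }
        return splits;
    }
}
```

Moving this loop into the ApplicationMaster means the list above lives in each job's AM heap instead of accumulating in one shared HiveServer2 process.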





[jira] [Updated] (HIVE-19480) Implement and Incorporate MAPREDUCE-207

2018-05-09 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19480:
---
Affects Version/s: (was: 3.0.0)
   1.2.3

> Implement and Incorporate MAPREDUCE-207
> ---
>
> Key: HIVE-19480
> URL: https://issues.apache.org/jira/browse/HIVE-19480
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Affects Versions: 1.2.3
>Reporter: BELUGA BEHR
>Priority: Major
>
> * HiveServer2 has the ability to run many MapReduce jobs in parallel.
>  * Each MapReduce application calculates the job's file splits at the client 
> level
>  * As a result, HiveServer2 loads many file splits at the same time, putting pressure 
> on memory
> {quote}"The client running the job calculates the splits for the job by 
> calling getSplits(), then sends them to the application master, which uses 
> their storage locations to schedule map tasks that will process them on the 
> cluster."
>  - "Hadoop: The Definitive Guide"{quote}
> MAPREDUCE-207 should address this memory pressure by moving split 
> calculations into ApplicationMaster. Spark and Tez already take this approach.
> Once MAPREDUCE-207 is completed, leverage the capability in HiveServer2.





[jira] [Commented] (HIVE-19465) Upgrade ORC to 1.5.0

2018-05-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469902#comment-16469902
 ] 

Gopal V commented on HIVE-19465:


The build needs the orc-shims change from HIVE-17463 to work; otherwise, 
queries fail with this change-set.

> Upgrade ORC to 1.5.0
> 
>
> Key: HIVE-19465
> URL: https://issues.apache.org/jira/browse/HIVE-19465
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19465.patch
>
>






[jira] [Commented] (HIVE-17463) ORC: include orc-shims in hive-exec.jar

2018-05-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471036#comment-16471036
 ] 

Gopal V commented on HIVE-17463:


Yes, +1

> ORC: include orc-shims in hive-exec.jar
> ---
>
> Key: HIVE-17463
> URL: https://issues.apache.org/jira/browse/HIVE-17463
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-17463.1.patch
>
>
> ORC-234 added a new shims module - this needs to be part of hive-exec shading 
> to use ORC-1.5.x branch in Hive.





[jira] [Commented] (HIVE-19465) Upgrade ORC to 1.5.0

2018-05-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471038#comment-16471038
 ] 

Gopal V commented on HIVE-19465:


My nightly runs on LLAP have finished without any issue after rebuilding 
against the orc-1.5 branch.

LGTM - +1 tests pending

> Upgrade ORC to 1.5.0
> 
>
> Key: HIVE-19465
> URL: https://issues.apache.org/jira/browse/HIVE-19465
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-19465.patch
>
>






[jira] [Comment Edited] (HIVE-18394) Materialized view: "Create Materialized View" should default to rewritable ones

2018-05-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472533#comment-16472533
 ] 

Gopal V edited comment on HIVE-18394 at 5/11/18 7:37 PM:
-

Neat - +1 tests pending



was (Author: gopalv):
Neat - +1



> Materialized view: "Create Materialized View" should default to rewritable 
> ones
> ---
>
> Key: HIVE-18394
> URL: https://issues.apache.org/jira/browse/HIVE-18394
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
>  Labels: TODOC3.0
> Attachments: HIVE-18394.patch
>
>
> This is a usability ticket, since it is possible to end up creating 
> materialized views and then realize that they need an additional flag to be 
> picked up by the optimizer for rewrites.
> {code:sql}
> create materialized view ca as select * from customer, customer_address where 
> c_current_addr_sk = ca_address_sk;
> set hive.materializedview.rewriting=true;
> select count(1) from customer, customer_address where c_current_addr_sk = 
> ca_address_sk; -- does not use materialized view
> {code}
> Needs another step
> {code:sql}
> alter materialized view ca enable rewrite;
> {code}
> And then, it kicks in 
> {code:sql}
> select count(1) from customer, customer_address where c_current_addr_sk = 
> ca_address_sk;
> OK
> 1200
> Time taken: 0.494 seconds, Fetched: 1 row(s)
> {code}





[jira] [Commented] (HIVE-18394) Materialized view: "Create Materialized View" should default to rewritable ones

2018-05-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472533#comment-16472533
 ] 

Gopal V commented on HIVE-18394:


Neat - +1



> Materialized view: "Create Materialized View" should default to rewritable 
> ones
> ---
>
> Key: HIVE-18394
> URL: https://issues.apache.org/jira/browse/HIVE-18394
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
>  Labels: TODOC3.0
> Attachments: HIVE-18394.patch
>
>
> This is a usability ticket, since it is possible to end up creating 
> materialized views and then realize that they need an additional flag to be 
> picked up by the optimizer for rewrites.
> {code:sql}
> create materialized view ca as select * from customer, customer_address where 
> c_current_addr_sk = ca_address_sk;
> set hive.materializedview.rewriting=true;
> select count(1) from customer, customer_address where c_current_addr_sk = 
> ca_address_sk; -- does not use materialized view
> {code}
> Needs another step
> {code:sql}
> alter materialized view ca enable rewrite;
> {code}
> And then, it kicks in 
> {code:sql}
> select count(1) from customer, customer_address where c_current_addr_sk = 
> ca_address_sk;
> OK
> 1200
> Time taken: 0.494 seconds, Fetched: 1 row(s)
> {code}





[jira] [Updated] (HIVE-18394) Materialized view: "Create Materialized View" should default to rewritable ones

2018-05-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18394:
---
Labels: TODOC3.0  (was: )

> Materialized view: "Create Materialized View" should default to rewritable 
> ones
> ---
>
> Key: HIVE-18394
> URL: https://issues.apache.org/jira/browse/HIVE-18394
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
>  Labels: TODOC3.0
> Attachments: HIVE-18394.patch
>
>
> This is a usability ticket, since it is possible to end up creating 
> materialized views and then realize that they need an additional flag to be 
> picked up by the optimizer for rewrites.
> {code:sql}
> create materialized view ca as select * from customer, customer_address where 
> c_current_addr_sk = ca_address_sk;
> set hive.materializedview.rewriting=true;
> select count(1) from customer, customer_address where c_current_addr_sk = 
> ca_address_sk; -- does not use materialized view
> {code}
> Needs another step
> {code:sql}
> alter materialized view ca enable rewrite;
> {code}
> And then, it kicks in 
> {code:sql}
> select count(1) from customer, customer_address where c_current_addr_sk = 
> ca_address_sk;
> OK
> 1200
> Time taken: 0.494 seconds, Fetched: 1 row(s)
> {code}





[jira] [Commented] (HIVE-19501) Fix HyperLogLog to be threadsafe

2018-05-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472563#comment-16472563
 ] 

Gopal V commented on HIVE-19501:


HIVE-18866: the putLong shows up as a general performance issue as well, because 
there's no hash64(long) method in Murmur3.


> Fix HyperLogLog to be threadsafe
> 
>
> Key: HIVE-19501
> URL: https://issues.apache.org/jira/browse/HIVE-19501
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> Not sure if this is an issue in reality or not, but there are 3 static fields 
> in HyperLogLog which are rewritten during operation; if multiple 
> threads are calculating HLL in the same JVM, there is a theoretical chance 
> that they might overwrite each other's values...
> static fields:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L65
> usage:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L216
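One conventional fix for shared mutable statics like these is to move the state onto the instance (or into a ThreadLocal). A hedged standalone sketch of the idea — `HllScratch` and `fill` are hypothetical names, assuming the statics serve as scratch buffers; this is not Hive's HyperLogLog code:

```java
public class HllScratch {
    // The hazardous pattern flagged in the ticket would look like:
    //   private static byte[] SCRATCH = new byte[8];   // shared by all threads
    // Two threads calling fill() concurrently could interleave their writes.

    // Fix: per-instance scratch state, so independent HLL estimators
    // never share mutable state across threads.
    private final byte[] scratch = new byte[8];

    public byte[] fill(long v) {
        for (int i = 0; i < 8; i++) {
            scratch[i] = (byte) (v >>> (8 * i)); // little-endian byte layout
        }
        return scratch;
    }
}
```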





[jira] [Updated] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-05-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18079:
---
Attachment: HIVE-18079.11.patch

> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18079.1.patch, HIVE-18079.10.patch, 
> HIVE-18079.11.patch, HIVE-18079.2.patch, HIVE-18079.4.patch, 
> HIVE-18079.5.patch, HIVE-18079.6.patch, HIVE-18079.7.patch, 
> HIVE-18079.8.patch, HIVE-18079.9.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.
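The squashing the ticket asks for can be sketched in a simplified model — assuming the register index is the top p bits of the hash and each register stores the leading-zero count of the remaining bits plus one (Hive's HyperLogLog has its own layout, so `build`/`fold` below are illustrative, not its API):

```java
public class HllFold {
    // Build registers at precision p from 64-bit hashes:
    // index = top p bits, value = leading-zero count of the rest + 1.
    static int[] build(long[] hashes, int p) {
        int[] reg = new int[1 << p];
        for (long h : hashes) {
            int idx = (int) (h >>> (64 - p));
            long w = h << p; // remaining bits, shifted to the top
            int val = (w == 0) ? (64 - p) + 1 : Long.numberOfLeadingZeros(w) + 1;
            reg[idx] = Math.max(reg[idx], val);
        }
        return reg;
    }

    // Fold p-bit registers down to pNew < p without rescanning the data:
    // the d dropped index bits rejoin the value bit-stream.
    static int[] fold(int[] reg, int p, int pNew) {
        int d = p - pNew;
        int[] out = new int[1 << pNew];
        for (int i = 0; i < reg.length; i++) {
            if (reg[i] == 0) continue; // empty register contributes nothing
            int j = i >>> d;                // new (shorter) index
            int extra = i & ((1 << d) - 1); // dropped index bits
            int val = (extra != 0)
                ? Integer.numberOfLeadingZeros(extra) - (32 - d) + 1
                : d + reg[i];
            out[j] = Math.max(out[j], val);
        }
        return out;
    }
}
```

In this model, folding a 14-bit register array down to 10 bits produces exactly the registers a direct 10-bit build would have produced, which is why no second scan over the data-set is needed.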





[jira] [Commented] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-05-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472919#comment-16472919
 ] 

Gopal V commented on HIVE-18079:


Rebased the patches. There is some row-order flakiness in

tez_vector_dynpart_hashjoin_1.q.out

{code}
--13036 1
+-8915  1
 -3799  1
 10782  1
--8915  1
+-13036 1
{code}

where the ORDER BY is on the column containing the value 1.

unionDistinct_1.q has a -- SORT_BEFORE_DIFF in it, so the result change won't 
break runs.


> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18079.1.patch, HIVE-18079.10.patch, 
> HIVE-18079.11.patch, HIVE-18079.2.patch, HIVE-18079.4.patch, 
> HIVE-18079.5.patch, HIVE-18079.6.patch, HIVE-18079.7.patch, 
> HIVE-18079.8.patch, HIVE-18079.9.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.





[jira] [Commented] (HIVE-19501) Fix HyperLogLog to be threadsafe

2018-05-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472944#comment-16472944
 ] 

Gopal V commented on HIVE-19501:


FYI, the murmur64 for Long is actually pretty trivial.

{code}
  public static long hash64(long data) {
long hash = DEFAULT_SEED;
long k = Long.reverseBytes(data);
int length = Long.BYTES;
// mix functions
k *= C1;
k = Long.rotateLeft(k, R1);
k *= C2;
hash ^= k;
hash = Long.rotateLeft(hash, R2) * M + N1;
// finalization
hash ^= length;
hash = fmix64(hash);

return hash;
  }
{code}

and does not need a byte[] array.
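A self-contained version of the fast-path above, with the standard Murmur3 mixing and finalization constants filled in (the seed value 104729 is illustrative — Hive keeps its own DEFAULT_SEED in its Murmur3 class):

```java
public class Hash64Long {
    // standard Murmur3 128-bit mixing constants
    private static final long C1 = 0x87c37b91114253d5L;
    private static final long C2 = 0x4cf5ad432745937fL;
    private static final int R1 = 31, R2 = 27;
    private static final int M = 5, N1 = 0x52dcb735;
    private static final long DEFAULT_SEED = 104729; // illustrative seed

    public static long hash64(long data) {
        long hash = DEFAULT_SEED;
        long k = Long.reverseBytes(data); // match byte[] little-endian layout
        int length = Long.BYTES;
        // mix functions
        k *= C1;
        k = Long.rotateLeft(k, R1);
        k *= C2;
        hash ^= k;
        hash = Long.rotateLeft(hash, R2) * M + N1;
        // finalization
        hash ^= length;
        return fmix64(hash);
    }

    // standard Murmur3 64-bit finalization mix
    private static long fmix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}
```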

> Fix HyperLogLog to be threadsafe
> 
>
> Key: HIVE-19501
> URL: https://issues.apache.org/jira/browse/HIVE-19501
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> Not sure if this is an issue in reality or not, but there are 3 static fields 
> in HyperLogLog which are rewritten during operation; if multiple 
> threads are calculating HLL in the same JVM, there is a theoretical chance 
> that they might overwrite each other's values...
> static fields:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L65
> usage:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L216





[jira] [Commented] (HIVE-19501) Fix HyperLogLog to be threadsafe

2018-05-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472950#comment-16472950
 ] 

Gopal V commented on HIVE-19501:


The short and int versions are nearly identical, except that they need to account 
for the sign bit during the widening cast from int -> long by doing 

{code}
long k = Integer.reverseBytes(data) & (-1L >>> 32);
int length = Integer.BYTES;
{code}
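Why the mask matters can be shown in isolation: Integer.reverseBytes returns an int, and widening a negative int to long sign-extends, which would poison the upper 32 bits. A small standalone demo (not Hive code):

```java
public class SignBitDemo {
    // widening without the mask: negative ints sign-extend into the high bits
    public static long widenBad(int data) {
        return Integer.reverseBytes(data);
    }

    // widening with the mask: high 32 bits are forced to zero
    public static long widenGood(int data) {
        return Integer.reverseBytes(data) & (-1L >>> 32);
    }
}
```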


> Fix HyperLogLog to be threadsafe
> 
>
> Key: HIVE-19501
> URL: https://issues.apache.org/jira/browse/HIVE-19501
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> Not sure if this is an issue in reality or not, but there are 3 static fields 
> in HyperLogLog which are rewritten during operation; if multiple 
> threads are calculating HLL in the same JVM, there is a theoretical chance 
> that they might overwrite each other's values...
> static fields:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L65
> usage:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L216





[jira] [Comment Edited] (HIVE-19501) Fix HyperLogLog to be threadsafe

2018-05-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472950#comment-16472950
 ] 

Gopal V edited comment on HIVE-19501 at 5/12/18 7:10 AM:
-

The short and int versions are nearly identical, except that they need to account 
for the sign bit during the widening cast from int -> long by doing 

{code}
  public static long hash64(int data) {
long k1 = Integer.reverseBytes(data) & (-1L >>> 32);
int length = Integer.BYTES;
long hash = DEFAULT_SEED;
k1 *= C1;
k1 = Long.rotateLeft(k1, R1);
k1 *= C2;
hash ^= k1;
// finalization
hash ^= length;
hash = fmix64(hash);
return hash;
  }
{code}



was (Author: gopalv):
The short and int versions are nearly identical, except they need to account 
for the sign-bit during the cast up from int -> long by doing 

{code}
long k = Integer.reverseBytes(data) & (-1L >>> 32);
int length = Integer.BYTES;
{code}


> Fix HyperLogLog to be threadsafe
> 
>
> Key: HIVE-19501
> URL: https://issues.apache.org/jira/browse/HIVE-19501
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> Not sure if this is an issue in reality or not, but there are 3 static fields 
> in HyperLogLog which are rewritten during operation; if multiple 
> threads are calculating HLL in the same JVM, there is a theoretical chance 
> that they might overwrite each other's values...
> static fields:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L65
> usage:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L216





[jira] [Commented] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-05-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472967#comment-16472967
 ] 

Gopal V commented on HIVE-18079:


Fixed the ordering issue in tez_vector_dynpart_hashjoin_1.q

> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18079.1.patch, HIVE-18079.10.patch, 
> HIVE-18079.11.patch, HIVE-18079.12.patch, HIVE-18079.2.patch, 
> HIVE-18079.4.patch, HIVE-18079.5.patch, HIVE-18079.6.patch, 
> HIVE-18079.7.patch, HIVE-18079.8.patch, HIVE-18079.9.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.





[jira] [Updated] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-05-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18079:
---
Attachment: HIVE-18079.12.patch

> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18079.1.patch, HIVE-18079.10.patch, 
> HIVE-18079.11.patch, HIVE-18079.12.patch, HIVE-18079.2.patch, 
> HIVE-18079.4.patch, HIVE-18079.5.patch, HIVE-18079.6.patch, 
> HIVE-18079.7.patch, HIVE-18079.8.patch, HIVE-18079.9.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.





[jira] [Updated] (HIVE-18866) Semijoin: Implement a Long -> Hash64 vector fast-path

2018-05-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18866:
---
Attachment: HIVE-18079.9.patch

> Semijoin: Implement a Long -> Hash64 vector fast-path
> -
>
> Key: HIVE-18866
> URL: https://issues.apache.org/jira/browse/HIVE-18866
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Priority: Major
>  Labels: performance
> Attachments: perf-hash64-long.png
>
>
> A significant amount of CPU is wasted with JMM restrictions on byte[] arrays.
> To transform from one Long -> another Long, this goes into a byte[] array, 
> which shows up as a hotspot.
> !perf-hash64-long.png!





[jira] [Updated] (HIVE-18866) Semijoin: Implement a Long -> Hash64 vector fast-path

2018-05-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18866:
---
Attachment: (was: HIVE-18079.9.patch)

> Semijoin: Implement a Long -> Hash64 vector fast-path
> -
>
> Key: HIVE-18866
> URL: https://issues.apache.org/jira/browse/HIVE-18866
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Priority: Major
>  Labels: performance
> Attachments: perf-hash64-long.png
>
>
> A significant amount of CPU is wasted with JMM restrictions on byte[] arrays.
> To transform from one Long -> another Long, this goes into a byte[] array, 
> which shows up as a hotspot.
> !perf-hash64-long.png!





[jira] [Updated] (HIVE-18866) Semijoin: Implement a Long -> Hash64 vector fast-path

2018-05-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18866:
---
Attachment: 0001-hash64-WIP.patch

> Semijoin: Implement a Long -> Hash64 vector fast-path
> -
>
> Key: HIVE-18866
> URL: https://issues.apache.org/jira/browse/HIVE-18866
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Priority: Major
>  Labels: performance
> Attachments: 0001-hash64-WIP.patch, perf-hash64-long.png
>
>
> A significant amount of CPU is wasted with JMM restrictions on byte[] arrays.
> To transform from one Long -> another Long, this goes into a byte[] array, 
> which shows up as a hotspot.
> !perf-hash64-long.png!





[jira] [Updated] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-05-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18079:
---
Attachment: HIVE-18079.13.patch

> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18079.1.patch, HIVE-18079.10.patch, 
> HIVE-18079.11.patch, HIVE-18079.12.patch, HIVE-18079.13.patch, 
> HIVE-18079.2.patch, HIVE-18079.4.patch, HIVE-18079.5.patch, 
> HIVE-18079.6.patch, HIVE-18079.7.patch, HIVE-18079.8.patch, HIVE-18079.9.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.





[jira] [Commented] (HIVE-18997) Hive column casting from decimal to double is resulting in NULL

2018-03-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405839#comment-16405839
 ] 

Gopal V commented on HIVE-18997:


Have you tested with Hive-3.0 instead of 1.1?

> Hive column casting from decimal to double is resulting in NULL
> ---
>
> Key: HIVE-18997
> URL: https://issues.apache.org/jira/browse/HIVE-18997
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, distribution
>Affects Versions: 1.1.0
> Environment: Hive CLI  and cloudera 5.8.3 distribution
>Reporter: Rangaswamy Narayan
>Priority: Major
>
> I have a Hive table table1; the schema of the table looks like this:
> {{[CREATE TABLE table1(p_decimal1 DECIMAL(38,5)) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ',' STORED AS TEXTFILE] }}
> and I have the below value in the table:
> {{row : col(p_decimal1) row1 : 12345123451234512345123.45123 }}
> At a later stage, if I execute the query
> {{select CAST(p_decimal1 AS DOUBLE) from table1; }}
> I get {{NULL}} as the output. 
> The expected output is a non-null value.





[jira] [Commented] (HIVE-18727) Update GenericUDFEnforceNotNullConstraint to throw an ERROR instead of Exception on failure

2018-03-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406545#comment-16406545
 ] 

Gopal V commented on HIVE-18727:


LGTM - +1

> Update GenericUDFEnforceNotNullConstraint to throw an ERROR instead of 
> Exception on failure
> ---
>
> Key: HIVE-18727
> URL: https://issues.apache.org/jira/browse/HIVE-18727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Kryvenko Igor
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18727.02.patch, HIVE-18727.patch
>
>
> Throwing an exception makes TezProcessor stop retrying the task. Since this 
> is NOT NULL constraint violation we don't want TezProcessor to keep retrying 
> on failure.





[jira] [Updated] (HIVE-18999) Filter operator does not work for List

2018-03-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18999:
---
Description: 
I have reproduced at the current master.
set hive.optimize.point.lookup=false;
  
create table table1(col0 int, col1 bigint, col2 string, col3 bigint, col4 
bigint);

insert into table1 values (1, 1, 'ccl',2014, 11);
insert into table1 values (1, 1, 'ccl',2015, 11);
insert into table1 values (1, 1, 'ccl',2014, 11);
insert into table1 values (1, 1, 'ccl',2013, 11);

-- INCORRECT
SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
(struct(2014,11));
-- CORRECT
SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
(struct('2014','11'));



  was:
The repro is from EAR-8093 , especially by [~jcamachorodriguez]. 
I have reproduced at the current master.
set hive.optimize.point.lookup=false;
  
create table table1(col0 int, col1 bigint, col2 string, col3 bigint, col4 
bigint);

insert into table1 values (1, 1, 'ccl',2014, 11);
insert into table1 values (1, 1, 'ccl',2015, 11);
insert into table1 values (1, 1, 'ccl',2014, 11);
insert into table1 values (1, 1, 'ccl',2013, 11);

-- INCORRECT
SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
(struct(2014,11));
-- CORRECT
SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
(struct('2014','11'));




> Filter operator does not work for List
> --
>
> Key: HIVE-18999
> URL: https://issues.apache.org/jira/browse/HIVE-18999
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Major
>
> I have reproduced at the current master.
> set hive.optimize.point.lookup=false;
>   
> create table table1(col0 int, col1 bigint, col2 string, col3 bigint, col4 
> bigint);
> insert into table1 values (1, 1, 'ccl',2014, 11);
> insert into table1 values (1, 1, 'ccl',2015, 11);
> insert into table1 values (1, 1, 'ccl',2014, 11);
> insert into table1 values (1, 1, 'ccl',2013, 11);
> -- INCORRECT
> SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
> (struct(2014,11));
> -- CORRECT
> SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
> (struct('2014','11'));





[jira] [Updated] (HIVE-18999) Filter operator does not work for List

2018-03-20 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18999:
---
Description: 
set hive.optimize.point.lookup=false;
  
create table table1(col0 int, col1 bigint, col2 string, col3 bigint, col4 
bigint);

insert into table1 values (1, 1, 'ccl',2014, 11);
insert into table1 values (1, 1, 'ccl',2015, 11);
insert into table1 values (1, 1, 'ccl',2014, 11);
insert into table1 values (1, 1, 'ccl',2013, 11);

-- INCORRECT
SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
(struct(2014,11));
-- CORRECT
SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
(struct('2014','11'));



  was:
I have reproduced at the current master.
set hive.optimize.point.lookup=false;
  
create table table1(col0 int, col1 bigint, col2 string, col3 bigint, col4 
bigint);

insert into table1 values (1, 1, 'ccl',2014, 11);
insert into table1 values (1, 1, 'ccl',2015, 11);
insert into table1 values (1, 1, 'ccl',2014, 11);
insert into table1 values (1, 1, 'ccl',2013, 11);

-- INCORRECT
SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
(struct(2014,11));
-- CORRECT
SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
(struct('2014','11'));




> Filter operator does not work for List
> --
>
> Key: HIVE-18999
> URL: https://issues.apache.org/jira/browse/HIVE-18999
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Major
>
> set hive.optimize.point.lookup=false;
>   
> create table table1(col0 int, col1 bigint, col2 string, col3 bigint, col4 
> bigint);
> insert into table1 values (1, 1, 'ccl',2014, 11);
> insert into table1 values (1, 1, 'ccl',2015, 11);
> insert into table1 values (1, 1, 'ccl',2014, 11);
> insert into table1 values (1, 1, 'ccl',2013, 11);
> -- INCORRECT
> SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
> (struct(2014,11));
> -- CORRECT
> SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
> (struct('2014','11'));





[jira] [Commented] (HIVE-18999) Filter operator does not work for List

2018-03-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406630#comment-16406630
 ] 

Gopal V commented on HIVE-18999:


[~steveyeom2017]: are you mixing up internal and external tickets? The EAR 
references make no sense for Apache JIRAs - I removed the one in the 
description.

> Filter operator does not work for List
> --
>
> Key: HIVE-18999
> URL: https://issues.apache.org/jira/browse/HIVE-18999
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Major
>





[jira] [Commented] (HIVE-18999) Filter operator does not work for List

2018-03-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408695#comment-16408695
 ] 

Gopal V commented on HIVE-18999:


This is broken because the constant structs are being constructed as 
Struct(Int, Int), which does not compare equal to the Struct(Long, Long) built 
from the bigint columns on the table side.
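Outside Hive, the same type sensitivity shows up with plain Java boxed numerics - a sketch of why an int-typed constant never matches a bigint-typed column value (illustrative only, not Hive's comparison code):

```java
// Sketch: a boxed Integer never equals a boxed Long, even for the same
// numeric value. This mirrors how struct(2014, 11) built from int literals
// fails to match struct fields backed by bigint (Long) columns.
public class StructTypeMismatch {
    public static void main(String[] args) {
        Object constantField = Integer.valueOf(2014); // int literal in the query
        Object columnField = Long.valueOf(2014L);     // bigint from the table
        System.out.println(constantField.equals(columnField)); // prints false
        System.out.println(columnField.equals(constantField)); // prints false
    }
}
```

Casting both sides to a common type - which is what the string-cast workaround in the description effectively does - is what makes the comparison succeed.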

> Filter operator does not work for List
> --
>
> Key: HIVE-18999
> URL: https://issues.apache.org/jira/browse/HIVE-18999
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Major
>
> {code:sql}
> create table table1(col0 int, col1 bigint, col2 string, col3 bigint, col4 
> bigint);
> insert into table1 values (1, 1, 'ccl',2014, 11);
> insert into table1 values (1, 1, 'ccl',2015, 11);
> insert into table1 values (1, 1, 'ccl',2014, 11);
> insert into table1 values (1, 1, 'ccl',2013, 11);
> -- INCORRECT
> SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
> (struct(2014,11));
> -- CORRECT
> SELECT COUNT(t1.col0) from table1 t1 where struct(col3, col4) in 
> (struct('2014','11'));
> {code}





[jira] [Commented] (HIVE-19038) LLAP: Service loader throws "Provider not found" exception if hive-llap-server is in class path while loading tokens

2018-03-23 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411857#comment-16411857
 ] 

Gopal V commented on HIVE-19038:


[~arunmahadevan]: do you have a patch? This looks like a ".Renewer" -> 
"$Renewer" fix, but if you can confirm that, I can review it.
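For context, ServiceLoader resolves each provider entry with Class.forName, which needs the binary name of a nested class (with {{$}}), not the source-style dotted name. A minimal sketch of the difference, using a stand-in nested class rather than the actual LlapTokenIdentifier:

```java
// Sketch: provider files under META-INF/services must list the binary
// class name. For nested classes that uses '$' (Outer$Inner), while the
// canonical (source) name uses '.' (Outer.Inner) and cannot be loaded.
public class ProviderNames {
    static class Renewer { } // stand-in for a nested Renewer class

    public static void main(String[] args) throws Exception {
        String binary = Renewer.class.getName();             // contains '$'
        String canonical = Renewer.class.getCanonicalName(); // contains '.'
        System.out.println(binary.contains("$"));     // prints true
        System.out.println(canonical.contains("$"));  // prints false
        Class.forName(binary); // succeeds; forName(canonical) would throw
    }
}
```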

> LLAP: Service loader throws "Provider not found" exception if 
> hive-llap-server is in class path while loading tokens
> 
>
> Key: HIVE-19038
> URL: https://issues.apache.org/jira/browse/HIVE-19038
> Project: Hive
>  Issue Type: Bug
>Reporter: Arun Mahadevan
>Assignee: Arun Mahadevan
>Priority: Major
>  Labels: pull-request-available
>
> While testing storm in secure mode, the hive-llap-server jar file was 
> included in the class path and resulted in the below exception while trying 
> to renew credentials when invoking 
> "org.apache.hadoop.security.token.Token.getRenewer"
>  
>  
> {noformat}
> java.util.ServiceConfigurationError: 
> org.apache.hadoop.security.token.TokenRenewer: Provider 
> org.apache.hadoop.hive.llap.security.LlapTokenIdentifier.Renewer not found at 
> java.util.ServiceLoader.fail(ServiceLoader.java:239) ~[?:1.8.0_161] at 
> java.util.ServiceLoader.access$300(ServiceLoader.java:185) ~[?:1.8.0_161] at 
> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372) 
> ~[?:1.8.0_161] at 
> java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) 
> ~[?:1.8.0_161] at 
> java.util.ServiceLoader$1.next(ServiceLoader.java:480) ~[?:1.8.0_161] at 
> org.apache.hadoop.security.token.Token.getRenewer(Token.java:463) 
> ~[hadoop-common-3.0.0.3.0.0.0-1064.jar:?] at 
> org.apache.hadoop.security.token.Token.renew(Token.java:490) 
> ~[hadoop-common-3.0.0.3.0.0.0-1064.jar:?] at 
> org.apache.storm.hdfs.security.AutoHDFS.doRenew(AutoHDFS.java:159) 
> ~[storm-autocreds-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.common.AbstractAutoCreds.renew(AbstractAutoCreds.java:104) 
> ~[storm-autocreds-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_161] at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_161] at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_161] at java.lang.reflect.Method.invoke(Method.java:498) 
> ~[?:1.8.0_161] at 
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) 
> ~[clojure-1.7.0.jar:?] at 
> clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) 
> ~[clojure-1.7.0.jar:?] at 
> org.apache.storm.daemon.nimbus$renew_credentials$fn__9121$fn__9126.invoke(nimbus.clj:1450)
>  ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.daemon.nimbus$renew_credentials$fn__9121.invoke(nimbus.clj:1449)
>  ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.daemon.nimbus$renew_credentials.invoke(nimbus.clj:1439) 
> ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.daemon.nimbus$fn__9547$exec_fn__3301__auto9548$fn__9567.invoke(nimbus.clj:2521)
>  ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.timer$schedule_recurring$this__1656.invoke(timer.clj:105) 
> ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.timer$mk_timer$fn__1639$fn__1640.invoke(timer.clj:50) 
> ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.timer$mk_timer$fn__1639.invoke(timer.clj:42) 
> ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?] at 
> java.lang.Thread.run(Thread.java:748) [?:1.8.0_161] 2018-03-22 22:08:59.088 
> o.a.s.util timer [ERROR] Halting process: ("Error when processing an event") 
> java.lang.RuntimeException: ("Error when processing an event") at 
> org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) 
> ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> clojure.lang.RestFn.invoke(RestFn.java:423) ~[clojure-1.7.0.jar:?] at 
> org.apache.storm.daemon.nimbus$nimbus_data$fn__8334.invoke(nimbus.clj:221) 
> ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.timer$mk_timer$fn__1639$fn__1640.invoke(timer.clj:71) 
> ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> org.apache.storm.timer$mk_timer$fn__1639.invoke(timer.clj:42) 
> ~[storm-core-1.2.1.3.0.0.0-1064.jar:1.2.1.3.0.0.0-1064] at 
> clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?] at 
> java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]{noformat}
>  





[jira] [Commented] (HIVE-19038) LLAP: Service loader throws "Provider not found" exception if hive-llap-server is in class path while loading tokens

2018-03-23 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411859#comment-16411859
 ] 

Gopal V commented on HIVE-19038:


LGTM - +1 tests pending

> LLAP: Service loader throws "Provider not found" exception if 
> hive-llap-server is in class path while loading tokens
> 
>
> Key: HIVE-19038
> URL: https://issues.apache.org/jira/browse/HIVE-19038
> Project: Hive
>  Issue Type: Bug
>Reporter: Arun Mahadevan
>Assignee: Arun Mahadevan
>Priority: Major
>  Labels: pull-request-available
>





[jira] [Commented] (HIVE-18866) Semijoin: Implement a Long -> Hash64 vector fast-path

2018-03-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412467#comment-16412467
 ] 

Gopal V commented on HIVE-18866:


This codepath also runs hot during the HLL computation for column stats on 
Doubles & Longs.

{code}
 15.83 │   movzbq 0x18(%r10,%rsi,1),%rax
  0.03 │   cmp    %ebx,%ecx
       │   jae    32d
  0.03 │   vmovd  %ebx,%xmm0
  0.00 │   vmovd  %edx,%xmm1
  0.02 │   movslq %r13d,%r14
  2.81 │   movzbq 0x19(%r10,%r14,1),%r8
  2.79 │   movzbq 0x1f(%r10,%r14,1),%rcx
  0.45 │   movzbq 0x1a(%r10,%r14,1),%rbx
  0.60 │   movzbq 0x1b(%r10,%r14,1),%rdx
  0.41 │   movzbq 0x1c(%r10,%r14,1),%rsi
  1.63 │   movzbq 0x1d(%r10,%r14,1),%r13
  0.78 │   movzbq 0x1e(%r10,%r14,1),%r14
{code}
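A register-only Long -> Hash64 path avoids the byte[] round-trip entirely; as an illustrative sketch (not the eventual Hive patch), Murmur3's 64-bit finalizer mixes a long without touching memory:

```java
// Sketch: hash a long directly in registers with Murmur3's fmix64
// finalizer, instead of serializing it into a byte[] first.
public class Hash64Long {
    public static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL; // Murmur3 mixing constant
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L; // Murmur3 mixing constant
        k ^= k >>> 33;
        return k;
    }

    public static void main(String[] args) {
        System.out.println(fmix64(0L) == 0L);         // prints true
        System.out.println(fmix64(1L) != fmix64(2L)); // prints true
    }
}
```

Because every step (xor-shift, multiply by an odd constant) is invertible, fmix64 is a bijection on longs, so distinct inputs always hash to distinct outputs.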

> Semijoin: Implement a Long -> Hash64 vector fast-path
> -
>
> Key: HIVE-18866
> URL: https://issues.apache.org/jira/browse/HIVE-18866
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Priority: Major
>  Labels: performance
> Attachments: perf-hash64-long.png
>
>
> A significant amount of CPU is wasted due to JMM restrictions on byte[] arrays.
> To transform one Long into another Long, the value currently goes through a 
> byte[] array, which shows up as a hotspot.
> !perf-hash64-long.png!





[jira] [Comment Edited] (HIVE-18866) Semijoin: Implement a Long -> Hash64 vector fast-path

2018-03-24 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412467#comment-16412467
 ] 

Gopal V edited comment on HIVE-18866 at 3/24/18 7:33 AM:
-

This codepath also runs hot during the HLL computation for column stats on 
Doubles & Longs.

{code}
 15.83 │   movzbq 0x18(%r10,%rsi,1),%rax
  0.03 │   cmp    %ebx,%ecx
       │   jae    32d
  0.03 │   vmovd  %ebx,%xmm0
  0.00 │   vmovd  %edx,%xmm1
  0.02 │   movslq %r13d,%r14
  2.81 │   movzbq 0x19(%r10,%r14,1),%r8
  2.79 │   movzbq 0x1f(%r10,%r14,1),%rcx
  0.45 │   movzbq 0x1a(%r10,%r14,1),%rbx
  0.60 │   movzbq 0x1b(%r10,%r14,1),%rdx
  0.41 │   movzbq 0x1c(%r10,%r14,1),%rsi
  1.63 │   movzbq 0x1d(%r10,%r14,1),%r13
  0.78 │   movzbq 0x1e(%r10,%r14,1),%r14
{code}

{code}
  0.00 │   vmovq  %xmm2,%r10
  0.03 │   mov    %r10d,%r10d
       │   movslq %r11d,%rsi
       │   mov    %dil,0x19(%r8,%rsi,1)
  0.00 │   vmovq  %xmm2,%r11
  0.05 │   sar    $0x8,%r11
       │   vmovq  %xmm2,%r9
       │   sar    $0x10,%r9
  0.00 │   mov    %r11d,%r11d
  0.03 │   mov    %r9d,%r9d
{code}

[jira] [Commented] (HIVE-19052) Vectorization: Disable Vector Pass-Thru SMB MapJoin in the presence of old-style MR FilterMaps

2018-03-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414514#comment-16414514
 ] 

Gopal V commented on HIVE-19052:


+1 tests pending

> Vectorization: Disable Vector Pass-Thru SMB MapJoin in the presence of 
> old-style MR FilterMaps
> --
>
> Key: HIVE-19052
> URL: https://issues.apache.org/jira/browse/HIVE-19052
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Priority: Critical
> Attachments: HIVE-19052.01.patch
>
>
> Pass-Thru VectorSMBMapJoinOperator was not designed to handle old-style MR 
> FilterMaps.





[jira] [Commented] (HIVE-19042) set MALLOC_ARENA_MAX for LLAP

2018-03-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414622#comment-16414622
 ] 

Gopal V commented on HIVE-19042:


Yes, this makes sense - as long as that number stays above the total number of 
NUMA regions, we're good.

LGTM - +1

I'll run more tests with this setup and see if I see the fragmentation issues 
again.

> set MALLOC_ARENA_MAX for LLAP
> -
>
> Key: HIVE-19042
> URL: https://issues.apache.org/jira/browse/HIVE-19042
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-19042.patch
>
>






[jira] [Commented] (HIVE-18928) HS2: Perflogger has a race condition

2018-03-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414861#comment-16414861
 ] 

Gopal V commented on HIVE-18928:


bq. any thoughts as to what is happening?

This happens when I fire off 1000 concurrent users at 1 HS2 - I haven't been 
able to reproduce this error when logging is enabled, so I'm not exactly sure 
what happens when things move that fast.
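The failure mode in the stack trace can be reproduced deterministically in a single thread: any structural change to a plain HashMap while it is being iterated (which is what ImmutableMap.copyOf does under a racing writer) throws the same exception. A sketch, not the actual PerfLogger code:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Sketch: a structural modification during iteration makes HashMap's
// fail-fast iterator throw ConcurrentModificationException, the same
// error ImmutableMap.copyOf hits when another thread is still writing.
public class PerfLoggerRace {
    public static void main(String[] args) {
        Map<String, Long> endTimes = new HashMap<>();
        endTimes.put("compile", 1L);
        endTimes.put("parse", 2L);
        boolean threw = false;
        try {
            for (Map.Entry<String, Long> e : endTimes.entrySet()) {
                endTimes.put("optimize", 3L); // structural change mid-iteration
            }
        } catch (ConcurrentModificationException expected) {
            threw = true;
        }
        System.out.println(threw); // prints true
    }
}
```

Backing the end-times map with a ConcurrentHashMap would make the copy safe, though it would not explain how two threads came to share one PerfLogger instance in the first place.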

> HS2: Perflogger has a race condition
> 
>
> Key: HIVE-18928
> URL: https://issues.apache.org/jira/browse/HIVE-18928
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-18928.1.patch
>
>
> {code}
> Caused by: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) 
> ~[?:1.8.0_112]
> at java.util.HashMap$EntryIterator.next(HashMap.java:1471) 
> ~[?:1.8.0_112]
> at java.util.HashMap$EntryIterator.next(HashMap.java:1469) 
> ~[?:1.8.0_112]
> at java.util.AbstractCollection.toArray(AbstractCollection.java:196) 
> ~[?:1.8.0_112]
> at com.google.common.collect.Iterables.toArray(Iterables.java:316) 
> ~[guava-19.0.jar:?]
> at 
> com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:342) 
> ~[guava-19.0.jar:?]
> at 
> com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:327) 
> ~[guava-19.0.jar:?]
> at 
> org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:218) 
> ~[hive-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1561) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1498) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:198)
>  ~[hive-service-3.0.0.3.0.0.2-132.jar:3.0.0.3.0.0.2-132]
> {code}





[jira] [Commented] (HIVE-19035) Vectorization: Disable exotic STRUCT field reference form

2018-03-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416131#comment-16416131
 ] 

Gopal V commented on HIVE-19035:


Add a positive test-case for this - will the following query vectorize?

{code}
EXPLAIN VECTORIZATION EXPRESSION
FROM src_thrift
SELECT src_thrift.mstringstring['key_9'];
{code}

+1 tests pending.

> Vectorization: Disable exotic STRUCT field reference form
> -
>
> Key: HIVE-19035
> URL: https://issues.apache.org/jira/browse/HIVE-19035
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-19035.01.patch
>
>
> We currently don't support exotic field references, like getting a struct 
> field from an array of structs (which returns an array of the field's type). 
> Attempting it causes a ClassCastException in VectorizationContext that kills 
> query planning.
> The Q file is input_testxpath3.q





[jira] [Commented] (HIVE-18928) HS2: Perflogger has a race condition

2018-03-27 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416445#comment-16416445
 ] 

Gopal V commented on HIVE-18928:


Actually, I think that will hide the actual issue of two threads ending up with 
the same PerfLogger object.

I'll keep this open until the whole test run is complete and I can see whether 
this is a symptom or the actual bug.

> HS2: Perflogger has a race condition
> 
>
> Key: HIVE-18928
> URL: https://issues.apache.org/jira/browse/HIVE-18928
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-18928.1.patch
>
>
> {code}
> Caused by: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) 
> ~[?:1.8.0_112]
> at java.util.HashMap$EntryIterator.next(HashMap.java:1471) 
> ~[?:1.8.0_112]
> at java.util.HashMap$EntryIterator.next(HashMap.java:1469) 
> ~[?:1.8.0_112]
> at java.util.AbstractCollection.toArray(AbstractCollection.java:196) 
> ~[?:1.8.0_112]
> at com.google.common.collect.Iterables.toArray(Iterables.java:316) 
> ~[guava-19.0.jar:?]
> at 
> com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:342) 
> ~[guava-19.0.jar:?]
> at 
> com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:327) 
> ~[guava-19.0.jar:?]
> at 
> org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:218) 
> ~[hive-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1561) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1498) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:198)
>  ~[hive-service-3.0.0.3.0.0.2-132.jar:3.0.0.3.0.0.2-132]
> {code}





[jira] [Commented] (HIVE-19024) Vectorization: Disable complex type constants for VectorUDFAdaptor

2018-03-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418259#comment-16418259
 ] 

Gopal V commented on HIVE-19024:


Patch LGTM - +1

Can you run vectorized_dynamic_semijoin_reduction locally and confirm if it was 
just test flakiness?

> Vectorization: Disable complex type constants for VectorUDFAdaptor
> --
>
> Key: HIVE-19024
> URL: https://issues.apache.org/jira/browse/HIVE-19024
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-19024.01.patch
>
>
> Currently, complex type constants are not detected and cause execution 
> failures.





[jira] [Commented] (HIVE-19032) Vectorization: Disable GROUP BY aggregations with DISTINCT

2018-03-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418263#comment-16418263
 ] 

Gopal V commented on HIVE-19032:


Change looks +1

The Hive count-distinct rewrite in CBO should remove most of these cases, so 
it shouldn't be a common scenario.

However, the q.out for vectorized_distinct_gby might need a golden-file update.


> Vectorization: Disable GROUP BY aggregations with DISTINCT
> --
>
> Key: HIVE-19032
> URL: https://issues.apache.org/jira/browse/HIVE-19032
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-19032.01.patch
>
>
> Vectorized GROUP BY does not support DISTINCT aggregation functions.





[jira] [Commented] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-03-30 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421165#comment-16421165
 ] 

Gopal V commented on HIVE-19085:


bq. The problem is probably due to the fastABS method. This method forces 
"fastSignum" to 1 even when the decimal is 0 (in this case "fastSignum" must 
be equal to 0).

Yes, that seems to be the problem here - good catch.
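The invariant at stake is that a zero decimal must keep signum == 0, including through abs(); java.math.BigDecimal shows the expected behavior (a reference sketch, not the FastHiveDecimalImpl fix itself):

```java
import java.math.BigDecimal;

// Sketch of the invariant HIVE-19085 violates: abs() of a zero decimal
// must leave the sign at 0. Forcing the internal fastSignum to 1 for a
// zero value is what later trips "Unexpected #3" during Parquet encoding.
public class AbsZeroSignum {
    public static void main(String[] args) {
        BigDecimal zero = new BigDecimal("0.00");   // decimal(10,2)-style zero
        System.out.println(zero.signum());          // prints 0
        System.out.println(zero.abs().signum());    // prints 0
        System.out.println(new BigDecimal("-1.50").abs().signum()); // prints 1
    }
}
```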

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>
> Hi,
> We use Parquet tables to store the results of other queries. Some of them 
> use the abs function. If abs takes 0 (type decimal) as input, then the 
> insert into the Parquet table fails.
>  
> +Scenario:+
> create table test (col1 decimal(10,2)) stored as parquet;
> insert into test values(0);
> insert into test select abs(col1) from test;
>  
> +Result:+
> The insert query crashes with the error:
>  
> 2018-03-30 17:39:02,123 FATAL [IPC Server handler 2 on 35885] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1522311557218_0002_m_00_0 - exited : java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row \{"col1":0}
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row \{"col1":0}
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>  ... 8 more
> Caused by: java.lang.RuntimeException: Unexpected #3
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimalImpl.fastBigIntegerBytesUnscaled(FastHiveDecimalImpl.java:2550)
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimalImpl.fastBigIntegerBytesScaled(FastHiveDecimalImpl.java:2806)
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimal.fastBigIntegerBytesScaled(FastHiveDecimal.java:295)
> at 
> org.apache.hadoop.hive.common.type.HiveDecimal.bigIntegerBytesScaled(HiveDecimal.java:712)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.decimalToBinary(DataWritableWriter.java:521)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.write(DataWritableWriter.java:514)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:220)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:91)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOpera

[jira] [Assigned] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-03-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-19085:
--

Assignee: Gopal V

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
>  
> The problem is probably due to the fastABS method. This method forces 
> "fastSignum" to 1 even when the decimal is 0 (in that case "fastSignum" must 
> be equal to 0).
>  
> Have a good day
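The invariant the report describes can be sketched as follows. This is a minimal, hypothetical illustration of the bug, not Hive's actual implementation: the names `fastAbs` and the signum convention here only mirror the report. An absolute-value routine on a (signum, magnitude) decimal representation must keep signum at 0 for a zero value, otherwise downstream writers (such as the Parquet decimal serializer) see an inconsistent "positive zero" and reject it.

```java
public class FastAbsSketch {
    // signum is -1, 0, or 1. A correct abs maps -1 -> 1, 1 -> 1,
    // and crucially leaves 0 as 0. The reported bug is equivalent
    // to unconditionally returning 1 here.
    static int fastAbs(int signum) {
        return (signum == 0) ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(fastAbs(0));   // zero stays zero
        System.out.println(fastAbs(-1));  // negative becomes positive
    }
}
```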



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-03-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19085:
---
Labels: fastdecimal vectorization  (was: vectorization)

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Vectorization
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>  Labels: fastdecimal, vectorization
> Attachments: HIVE-19085.1.patch
>
>

[jira] [Updated] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-03-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19085:
---
Component/s: Vectorization

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Vectorization
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>  Labels: fastdecimal, vectorization
> Attachments: HIVE-19085.1.patch
>
>

[jira] [Updated] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-03-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19085:
---
Attachment: HIVE-19085.1.patch

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Vectorization
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>  Labels: fastdecimal, vectorization
> Attachments: HIVE-19085.1.patch
>
>

[jira] [Updated] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-03-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19085:
---
Labels: vectorization  (was: )

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Vectorization
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>  Labels: fastdecimal, vectorization
> Attachments: HIVE-19085.1.patch
>
>

[jira] [Updated] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-03-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19085:
---
Status: Patch Available  (was: Open)

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Vectorization
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>  Labels: fastdecimal, vectorization
> Attachments: HIVE-19085.1.patch
>
>

[jira] [Updated] (HIVE-19096) query result cache interferes with explain analyze

2018-04-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19096:
---
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-18513

> query result cache interferes with explain analyze 
> ---
>
> Key: HIVE-19096
> URL: https://issues.apache.org/jira/browse/HIVE-19096
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Priority: Major
>
> if the result cache is active, explain analyze doesn't really return useful 
> information; even for unseen queries the result looks like this:
> {code}
> ++
> |Explain |
> ++
> | Stage-0|
> |   Fetch Operator   |
> | Cached Query Result:true,limit:-1  |
> ||
> ++
> {code}
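A hedged workaround until this is fixed: disable the results cache for the session so that EXPLAIN ANALYZE re-executes the query and collects real operator statistics. The config key below is an assumption based on the standard Hive 3.x setting; verify it against your Hive version.

```sql
-- Assumed setting; turn the query results cache off for this session only,
-- so EXPLAIN ANALYZE runs the query instead of replaying a cached result.
set hive.query.results.cache.enabled=false;
explain analyze select count(*) from my_table;
```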



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19098) Hive: impossible to insert data in a parquet's table with "union all" in the select query

2018-04-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19098:
---
Component/s: File Formats

> Hive: impossible to insert data in a parquet's table with "union all" in the 
> select query
> -
>
> Key: HIVE-19098
> URL: https://issues.apache.org/jira/browse/HIVE-19098
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats, Hive
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Janaki Lahorani
>Priority: Minor
>
> Hello
> We have a Parquet table.
> We want to insert data into the table with a query like this:
> "insert into my_table select * from my_select_table_1 union all select * from 
> my_select_table_2"
> It fails with the error:
> 2018-04-03 15:49:28,898 FATAL [IPC Server handler 2 on 38465] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1522749003448_0028_m_00_0 - exited : java.io.IOException: 
> java.lang.reflect.InvocationTargetException
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:217)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
>  at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:695)
>  at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
>  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
>  ... 11 more
> Caused by: java.lang.NullPointerException
>  at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:118)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher.pushProjectionsAndFilters(ProjectionPusher.java:189)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:75)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:75)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:60)
>  at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75)
>  at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:99)
>  ... 16 more
>  
> Scenario:
> create table t1 (col1 string);
> create table t2 (col1 string);
> insert into t2 values ('2017');
> insert into t1 values ('2017');
> create table t3 (col1 string) STORED AS PARQUETFILE;
>  INSERT into t3 select col1 from t1 union all select col1 from t2; 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18909) Metrics for results cache

2018-04-03 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425084#comment-16425084
 ] 

Gopal V commented on HIVE-18909:


LGTM - +1

> Metrics for results cache
> -
>
> Key: HIVE-18909
> URL: https://issues.apache.org/jira/browse/HIVE-18909
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
>  Labels: Metrics
> Attachments: HIVE-18909.1.patch, HIVE-18909.2.patch, 
> HIVE-18909.3.patch
>
>






[jira] [Updated] (HIVE-19111) ACID: PPD & Split-pruning for txn_id filters

2018-04-04 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19111:
---
Issue Type: Improvement  (was: Bug)

> ACID: PPD & Split-pruning for txn_id filters
> 
>
> Key: HIVE-19111
> URL: https://issues.apache.org/jira/browse/HIVE-19111
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Gopal V
>Priority: Major
>
> HIVE-18839 uses transaction id filtering to do incremental scans of a table 
> (for a "from snapshot to snapshot" range).
> This filter can be pushed down into the Split-generation phase to skip entire 
> files and directories.
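A sketch of what such a snapshot-range scan could look like in HiveQL. This is illustrative only: the `ROW__ID` virtual column on ACID tables carries the write/transaction id, but the exact field name and the `${...}` substitution variables here are assumptions, not the actual HIVE-18839 syntax.

```sql
-- Hypothetical incremental read between two snapshots; split generation could
-- evaluate the same predicate against file/directory metadata to skip entire
-- base and delta directories outside the id range.
SELECT key, value
FROM acid_tbl
WHERE ROW__ID.writeid >  ${from_snapshot_id}
  AND ROW__ID.writeid <= ${to_snapshot_id};
```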





[jira] [Updated] (HIVE-19113) Bucketing: Make CLUSTERED BY do CLUSTER BY if no explicit sorting is specified

2018-04-04 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19113:
---
Issue Type: Improvement  (was: Bug)

> Bucketing: Make CLUSTERED BY do CLUSTER BY if no explicit sorting is specified
> --
>
> Key: HIVE-19113
> URL: https://issues.apache.org/jira/browse/HIVE-19113
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Priority: Major
>
> The user's expectation of 
> "create external table bucketed (key int) clustered by (key) into 4 buckets 
> stored as orc;"
> is that the table will cluster the key into 4 buckets, but the file layout 
> does not actually cluster rows within each bucket file.
> In the absence of a "SORTED BY" clause, this could automatically apply a 
> "SORTED BY (key)" to cluster the keys within each file as expected.
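In other words (my reading of the proposal), the first DDL below would implicitly behave like the second; both use standard Hive bucketing syntax:

```sql
-- What the user writes today:
create external table bucketed (key int)
  clustered by (key) into 4 buckets
  stored as orc;

-- What the table would effectively become under this proposal:
create external table bucketed (key int)
  clustered by (key) sorted by (key) into 4 buckets
  stored as orc;
```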





[jira] [Updated] (HIVE-19115) Merge: Semijoin hints are dropped by the merge

2018-04-04 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19115:
---
Description: 
{code}
create table target stored as orc as select ss_ticket_number, ss_item_sk, 
current_timestamp as `ts` from tpcds_bin_partitioned_orc_1000.store_sales;

create table source stored as orc as select sr_ticket_number, sr_item_sk, 
d_date from tpcds_bin_partitioned_orc_1000.store_returns join 
tpcds_bin_partitioned_orc_1000.date_dim where d_date_sk = sr_returned_date_sk;


merge /* +semi(T, sr_ticket_number, S, 1) */ into target T using (select * 
from source where year(d_date) = 1998) S ON T.ss_ticket_number = 
S.sr_ticket_number and sr_item_sk = ss_item_sk 
when matched is null THEN UPDATE SET ts = current_timestamp
when not matched and sr_item_sk is not null and sr_ticket_number is not null 
THEN INSERT VALUES(S.sr_ticket_number, S.sr_item_sk, current_timestamp);
{code}

The semijoin hints are ignored and the code says 

{code}
 todo: do we care to preserve comments in original SQL?
{code}


https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L624

in this case we do.


  was:
{code}
create table target stored as orc as select ss_ticket_number, ss_item_sk, 
current_timestamp as `ts` from tpcds_bin_partitioned_orc_1000.store_sales;

create table source stored as orc as select sr_ticket_number, sr_item_sk, 
d_date from tpcds_bin_partitioned_orc_1000.store_returns join 
tpcds_bin_partitioned_orc_1000.date_dim where d_date_sk = sr_returned_date_sk;


explain
merge /* +semi(T, sr_ticket_number, S, 1) */ into target T using (select * 
from source where year(d_date) = 1998) S ON T.ss_ticket_number = 
S.sr_ticket_number and sr_item_sk = ss_item_sk 
when matched and ss_item_sk is null THEN UPDATE SET ts = current_timestamp
when not matched and ss_item_sk is null and sr_item_sk is not null and 
sr_ticket_number is not null THEN INSERT VALUES(S.sr_ticket_number, 
S.sr_item_sk, current_timestamp);
{code}

The semijoin hints are ignored and the code says 

{code}
 todo: do we care to preserve comments in original SQL?
{code}


https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L624

in this case we do.



> Merge: Semijoin hints are dropped by the merge
> --
>
> Key: HIVE-19115
> URL: https://issues.apache.org/jira/browse/HIVE-19115
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Priority: Major
>
> {code}
> create table target stored as orc as select ss_ticket_number, ss_item_sk, 
> current_timestamp as `ts` from tpcds_bin_partitioned_orc_1000.store_sales;
> create table source stored as orc as select sr_ticket_number, sr_item_sk, 
> d_date from tpcds_bin_partitioned_orc_1000.store_returns join 
> tpcds_bin_partitioned_orc_1000.date_dim where d_date_sk = sr_returned_date_sk;
> merge /* +semi(T, sr_ticket_number, S, 1) */ into target T using (select 
> * from source where year(d_date) = 1998) S ON T.ss_ticket_number = 
> S.sr_ticket_number and sr_item_sk = ss_item_sk 
> when matched is null THEN UPDATE SET ts = current_timestamp
> when not matched and sr_item_sk is not null and sr_ticket_number is not null 
> THEN INSERT VALUES(S.sr_ticket_number, S.sr_item_sk, current_timestamp);
> {code}
> The semijoin hints are ignored and the code says 
> {code}
>  todo: do we care to preserve comments in original SQL?
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L624
> in this case we do.





[jira] [Updated] (HIVE-19115) Merge: Semijoin hints are dropped by the merge

2018-04-04 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19115:
---
Description: 
{code}
create table target stored as orc as select ss_ticket_number, ss_item_sk, 
current_timestamp as `ts` from tpcds_bin_partitioned_orc_1000.store_sales;

create table source stored as orc as select sr_ticket_number, sr_item_sk, 
d_date from tpcds_bin_partitioned_orc_1000.store_returns join 
tpcds_bin_partitioned_orc_1000.date_dim where d_date_sk = sr_returned_date_sk;


merge /* +semi(T, sr_ticket_number, S, 1) */ into target T using (select * 
from source where year(d_date) = 1998) S ON T.ss_ticket_number = 
S.sr_ticket_number and sr_item_sk = ss_item_sk 
when matched THEN UPDATE SET ts = current_timestamp
when not matched and sr_item_sk is not null and sr_ticket_number is not null 
THEN INSERT VALUES(S.sr_ticket_number, S.sr_item_sk, current_timestamp);
{code}

The semijoin hints are ignored and the code says 

{code}
 todo: do we care to preserve comments in original SQL?
{code}


https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L624

in this case we do.


  was:
{code}
create table target stored as orc as select ss_ticket_number, ss_item_sk, 
current_timestamp as `ts` from tpcds_bin_partitioned_orc_1000.store_sales;

create table source stored as orc as select sr_ticket_number, sr_item_sk, 
d_date from tpcds_bin_partitioned_orc_1000.store_returns join 
tpcds_bin_partitioned_orc_1000.date_dim where d_date_sk = sr_returned_date_sk;


merge /* +semi(T, sr_ticket_number, S, 1) */ into target T using (select * 
from source where year(d_date) = 1998) S ON T.ss_ticket_number = 
S.sr_ticket_number and sr_item_sk = ss_item_sk 
when matched is null THEN UPDATE SET ts = current_timestamp
when not matched and sr_item_sk is not null and sr_ticket_number is not null 
THEN INSERT VALUES(S.sr_ticket_number, S.sr_item_sk, current_timestamp);
{code}

The semijoin hints are ignored and the code says 

{code}
 todo: do we care to preserve comments in original SQL?
{code}


https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L624

in this case we do.



> Merge: Semijoin hints are dropped by the merge
> --
>
> Key: HIVE-19115
> URL: https://issues.apache.org/jira/browse/HIVE-19115
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Priority: Major
>
> {code}
> create table target stored as orc as select ss_ticket_number, ss_item_sk, 
> current_timestamp as `ts` from tpcds_bin_partitioned_orc_1000.store_sales;
> create table source stored as orc as select sr_ticket_number, sr_item_sk, 
> d_date from tpcds_bin_partitioned_orc_1000.store_returns join 
> tpcds_bin_partitioned_orc_1000.date_dim where d_date_sk = sr_returned_date_sk;
> merge /* +semi(T, sr_ticket_number, S, 1) */ into target T using (select 
> * from source where year(d_date) = 1998) S ON T.ss_ticket_number = 
> S.sr_ticket_number and sr_item_sk = ss_item_sk 
> when matched THEN UPDATE SET ts = current_timestamp
> when not matched and sr_item_sk is not null and sr_ticket_number is not null 
> THEN INSERT VALUES(S.sr_ticket_number, S.sr_item_sk, current_timestamp);
> {code}
> The semijoin hints are ignored and the code says 
> {code}
>  todo: do we care to preserve comments in original SQL?
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java#L624
> in this case we do.





[jira] [Commented] (HIVE-19117) hiveserver2 org.apache.thrift.transport.TTransportException error when running 2nd query after minute of inactivity

2018-04-05 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427063#comment-16427063
 ] 

Gopal V commented on HIVE-19117:


You can try adding {{;http.header.Connection=close}} to the end of your JDBC 
URLs
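For example (illustrative URL only: the host, port, and httpPath are placeholders, and the suffix only matters when HiveServer2 runs in HTTP transport mode):

```
jdbc:hive2://remotehost:10001/default;transportMode=http;httpPath=cliservice;http.header.Connection=close
```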

> hiveserver2 org.apache.thrift.transport.TTransportException error when 
> running 2nd query after minute of inactivity
> ---
>
> Key: HIVE-19117
> URL: https://issues.apache.org/jira/browse/HIVE-19117
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Metastore, Thrift API
>Affects Versions: 2.1.1
> Environment: * Hive 2.1.1 with hive.server2.transport.mode set to 
> binary (sample JDBC string is jdbc:hive2://remotehost:1/default)
>  * Hadoop 2.8.3
>  * Metastore using MySQL
>  * Java 8
>Reporter: t oo
>Priority: Blocker
>
> I make a JDBC connection from my SQL tool (e.g. SQuirreL SQL, Oracle SQL 
> Developer) to HiveServer2 (running on a remote server) with port 1.
> I am able to run some queries successfully. I then do something else (not in 
> the SQL tool) for 1-2 minutes, return to my SQL tool, and attempt to 
> run a query, but I get this error: 
> {code:java}
> org.apache.thrift.transport.TTransportException: java.net.SocketException: 
> Software caused connection abort: socket write error{code}
> If I now disconnect and reconnect in my SQL tool, I can run queries again. But 
> does anyone know what HiveServer2 settings I should change to prevent the 
> error? I assume something in hive-site.xml.
> From the hiveserver2 logs below, you can see an exact one-minute gap, from the 
> 30th to the 31st minute, where the disconnect happens.
> {code:java}
> 2018-04-05T03:30:41,706 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:30:41,718 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:30:41,719 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,232 INFO [HiveServer2-Handler-Pool: Thread-36] 
> thrift.ThriftCLIService: Session disconnected without closing properly.
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [c81ec0f9-7a9d-46b6-9708-e7d78520a48a]
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> service.CompositeService: Session closed, SessionHandle 
> [c81ec0f9-7a9d-46b6-9708-e7d78520a48a], current sessions:0
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.HiveSessionImpl: Operation log session directory is deleted: 
> /var/hive/hs2log/tmp/c81ec0f9-7a9d-46b6-9708-e7d78520a48a
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Deleted directory: 
> /var/hive/scratch/tmp/anonymous/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs 
> with scheme file
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Deleted directory: 
> /var/hive/ec2-user/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs with scheme file
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> hive.metastore: Closed a connection to metastore, current connections: 1{code}





[jira] [Comment Edited] (HIVE-19117) hiveserver2 org.apache.thrift.transport.TTransportException error when running 2nd query after minute of inactivity

2018-04-05 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427063#comment-16427063
 ] 

Gopal V edited comment on HIVE-19117 at 4/5/18 2:59 PM:


You can try adding {{;http.header.Connection=close}} to the end of your JDBC 
URL, if you are using the HTTP protocol for JDBC (or Knox).


was (Author: gopalv):
You can try adding {{;http.header.Connection=close}} to the end of your JDBC 
URLs

> hiveserver2 org.apache.thrift.transport.TTransportException error when 
> running 2nd query after minute of inactivity
> ---
>
> Key: HIVE-19117
> URL: https://issues.apache.org/jira/browse/HIVE-19117
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Metastore, Thrift API
>Affects Versions: 2.1.1
> Environment: * Hive 2.1.1 with hive.server2.transport.mode set to 
> binary (sample JDBC string is jdbc:hive2://remotehost:1/default)
>  * Hadoop 2.8.3
>  * Metastore using MySQL
>  * Java 8
>Reporter: t oo
>Priority: Blocker
>
> I make a JDBC connection from my SQL tool (e.g. SQuirreL SQL, Oracle SQL 
> Developer) to HiveServer2 (running on a remote server) with port 1.
> I am able to run some queries successfully. I then do something else (not in 
> the SQL tool) for 1-2 minutes, return to my SQL tool, and attempt to 
> run a query, but I get this error: 
> {code:java}
> org.apache.thrift.transport.TTransportException: java.net.SocketException: 
> Software caused connection abort: socket write error{code}
> If I now disconnect and reconnect in my SQL tool, I can run queries again. But 
> does anyone know what HiveServer2 settings I should change to prevent the 
> error? I assume something in hive-site.xml.
> From the hiveserver2 logs below, you can see an exact one-minute gap, from the 
> 30th to the 31st minute, where the disconnect happens.
> {code:java}
> 2018-04-05T03:30:41,706 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:30:41,712 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:30:41,718 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:30:41,719 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,232 INFO [HiveServer2-Handler-Pool: Thread-36] 
> thrift.ThriftCLIService: Session disconnected without closing properly.
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [c81ec0f9-7a9d-46b6-9708-e7d78520a48a]
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> service.CompositeService: Session closed, SessionHandle 
> [c81ec0f9-7a9d-46b6-9708-e7d78520a48a], current sessions:0
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Updating thread name to 
> c81ec0f9-7a9d-46b6-9708-e7d78520a48a HiveServer2-Handler-Pool: Thread-36
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.HiveSessionImpl: Operation log session directory is deleted: 
> /var/hive/hs2log/tmp/c81ec0f9-7a9d-46b6-9708-e7d78520a48a
>  2018-04-05T03:31:41,233 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Resetting thread name to HiveServer2-Handler-Pool: 
> Thread-36
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Deleted directory: 
> /var/hive/scratch/tmp/anonymous/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs 
> with scheme file
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> session.SessionState: Deleted directory: 
> /var/hive/ec2-user/c81ec0f9-7a9d-46b6-9708-e7d78520a48a on fs with scheme file
>  2018-04-05T03:31:41,236 INFO [HiveServer2-Handler-Pool: Thread-36] 
> hive.metastore: Closed a connection to metastore, current connections: 1{code}




[jira] [Updated] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-04-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19085:
---
Fix Version/s: 3.0.0

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Vectorization
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>  Labels: fastdecimal, vectorization
> Attachments: HIVE-19085.1.patch
>
>
> Hi,
> We use Parquet tables to store the results of other queries. Some queries use 
> the abs function. If "abs" takes a 0 (type decimal) as input, then 
> the insert into the Parquet table fails. 
>  
> +Scenario:+
> create table test (col1 decimal(10,2)) stored as parquet;
> insert into test values(0);
> insert into test select abs(col1) from test;
>  
> +Result:+
> The insert query crashes with the error:
>  
> 2018-03-30 17:39:02,123 FATAL [IPC Server handler 2 on 35885] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1522311557218_0002_m_00_0 - exited : java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row \{"col1":0}
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row \{"col1":0}
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>  ... 8 more
> Caused by: java.lang.RuntimeException: Unexpected #3
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimalImpl.fastBigIntegerBytesUnscaled(FastHiveDecimalImpl.java:2550)
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimalImpl.fastBigIntegerBytesScaled(FastHiveDecimalImpl.java:2806)
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimal.fastBigIntegerBytesScaled(FastHiveDecimal.java:295)
> at 
> org.apache.hadoop.hive.common.type.HiveDecimal.bigIntegerBytesScaled(HiveDecimal.java:712)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.decimalToBinary(DataWritableWriter.java:521)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.write(DataWritableWriter.java:514)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:220)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:91)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
>  
> The problem is probably due to the fastABS method. This method forces "fastSignum" 
> to 1 even when the decimal is 0 (in this c
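A minimal self-contained sketch of the suspected issue. The class and field names below are simplified stand-ins modeled on FastHiveDecimal's fastSignum convention, not the actual Hive code: abs() must leave the zero sentinel (signum == 0) untouched, because downstream serialization treats signum 0 as the only valid encoding of zero.

```java
// Simplified model: fastSignum is -1, 0, or +1, where 0 means the value is
// exactly zero. Forcing it to +1 for zero yields an inconsistent state that
// later serialization rejects (the "Unexpected #3" error above).
public class FastAbsSketch {
    int fastSignum; // -1 negative, 0 zero, +1 positive

    void fastAbs() {
        if (fastSignum == 0) {
            return; // zero must stay signum 0
        }
        fastSignum = 1; // any non-zero value becomes positive
    }

    public static void main(String[] args) {
        FastAbsSketch zero = new FastAbsSketch();
        zero.fastAbs();
        System.out.println(zero.fastSignum); // prints 0

        FastAbsSketch neg = new FastAbsSketch();
        neg.fastSignum = -1;
        neg.fastAbs();
        System.out.println(neg.fastSignum); // prints 1
    }
}
```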

[jira] [Updated] (HIVE-19085) HIVE: impossible to insert abs(0) in a decimal column in a parquet's table

2018-04-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19085:
---
Fix Version/s: (was: 3.0.0)

> HIVE: impossible to insert abs(0)  in a decimal column in a parquet's table
> ---
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Vectorization
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>  Labels: fastdecimal, vectorization
> Attachments: HIVE-19085.1.patch
>
>
> Hi,
> We use Parquet tables to store the results of other queries. Some queries use 
> the abs function. If "abs" takes a 0 (type decimal) as input, then 
> the insert into the Parquet table fails. 
>  
> +Scenario:+
> create table test (col1 decimal(10,2)) stored as parquet;
> insert into test values(0);
> insert into test select abs(col1) from test;
>  
> +Result:+
> The insert query crashes with the error:
>  
> 2018-03-30 17:39:02,123 FATAL [IPC Server handler 2 on 35885] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1522311557218_0002_m_00_0 - exited : java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row \{"col1":0}
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row \{"col1":0}
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>  ... 8 more
> Caused by: java.lang.RuntimeException: Unexpected #3
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimalImpl.fastBigIntegerBytesUnscaled(FastHiveDecimalImpl.java:2550)
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimalImpl.fastBigIntegerBytesScaled(FastHiveDecimalImpl.java:2806)
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimal.fastBigIntegerBytesScaled(FastHiveDecimal.java:295)
> at 
> org.apache.hadoop.hive.common.type.HiveDecimal.bigIntegerBytesScaled(HiveDecimal.java:712)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.decimalToBinary(DataWritableWriter.java:521)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.write(DataWritableWriter.java:514)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:220)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:91)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
>  
> The problem is probably due to the fastABS method. This method forces "fastSignum" 
> to 1 even when the decimal is 0

[jira] [Updated] (HIVE-19085) HIVE: FastHiveDecimal abs(0) sets sign to +ve

2018-04-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-19085:
---
Summary: HIVE: FastHiveDecimal abs(0) sets sign to +ve  (was: HIVE: 
impossible to insert abs(0)  in a decimal column in a parquet's table)

> HIVE: FastHiveDecimal abs(0) sets sign to +ve
> -
>
> Key: HIVE-19085
> URL: https://issues.apache.org/jira/browse/HIVE-19085
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Vectorization
>Affects Versions: 2.3.2
>Reporter: ACOSS
>Assignee: Gopal V
>Priority: Minor
>  Labels: fastdecimal, vectorization
> Attachments: HIVE-19085.1.patch
>
>
> Hi,
> We use Parquet tables to store the results of other queries. Some queries use 
> the abs function. If "abs" takes a 0 (type decimal) as input, then 
> the insert into the Parquet table fails. 
>  
> +Scenario:+
> create table test (col1 decimal(10,2)) stored as parquet;
> insert into test values(0);
> insert into test select abs(col1) from test;
>  
> +Result:+
> The insert query crashes with the error:
>  
> 2018-03-30 17:39:02,123 FATAL [IPC Server handler 2 on 35885] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1522311557218_0002_m_00_0 - exited : java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row \{"col1":0}
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"col1":0}
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>  ... 8 more
> Caused by: java.lang.RuntimeException: Unexpected #3
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimalImpl.fastBigIntegerBytesUnscaled(FastHiveDecimalImpl.java:2550)
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimalImpl.fastBigIntegerBytesScaled(FastHiveDecimalImpl.java:2806)
> at 
> org.apache.hadoop.hive.common.type.FastHiveDecimal.fastBigIntegerBytesScaled(FastHiveDecimal.java:295)
> at 
> org.apache.hadoop.hive.common.type.HiveDecimal.bigIntegerBytesScaled(HiveDecimal.java:712)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.decimalToBinary(DataWritableWriter.java:521)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$DecimalDataWriter.write(DataWritableWriter.java:514)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$GroupDataWriter.write(DataWritableWriter.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter$MessageDataWriter.write(DataWritableWriter.java:220)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:91)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
> at 
> org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112)
> at 
> org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
>  
> The problem is probably due to the fastAbs method. This method force
