[jira] [Updated] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

2016-10-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14803:

Status: Patch Available  (was: Open)

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14803.1.patch, HIVE-14803.2.patch, 
> HIVE-14803.3.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking the file sizes, which turns out to be expensive when a large number 
> of partitions are inserted.
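A minimal sketch of why this cost can be reduced, in plain Java (hypothetical: `partitionSize` stands in for the real per-partition S3 listing call, and this is not Hive's actual StatsTask code). The point is that the high-latency metadata calls can overlap in a thread pool instead of running serially:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPartitionStats {

    // Stand-in for an expensive metadata call, e.g. listing a partition
    // directory on S3 and summing the file lengths (hypothetical).
    static long partitionSize(String partition) {
        return partition.length() * 10L; // dummy size for the sketch
    }

    // Submit one size lookup per partition so the high-latency calls
    // overlap instead of running one after another.
    static Map<String, Long> gather(List<String> partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Long>> futures = new LinkedHashMap<>();
            for (String p : partitions) {
                futures.put(p, pool.submit(() -> partitionSize(p)));
            }
            Map<String, Long> sizes = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Long>> e : futures.entrySet()) {
                sizes.put(e.getKey(), e.getValue().get());
            }
            return sizes;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("ds=2016-10-01", "ds=2016-10-02");
        System.out.println(gather(parts, 2));
    }
}
```

With many partitions, wall-clock time drops from roughly latency x partitions to latency x partitions / threads, which is why serial per-partition checks hurt most on high-latency stores like S3.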



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15030) Fixes in inference of collation for Tez cost model

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15030:
---
Status: Patch Available  (was: In Progress)

> Fixes in inference of collation for Tez cost model
> --
>
> Key: HIVE-15030
> URL: https://issues.apache.org/jira/browse/HIVE-15030
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15030.patch
>
>
> The Tez cost model might hit an NPE if the collation returned by the join algorithm is null.
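The defensive pattern the description implies can be sketched as follows (hypothetical names, not the actual Tez cost model code): treat a null collation as an empty one before using it, so the cost term degrades gracefully instead of throwing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CollationGuard {

    // Treat a null collation as "no collation" rather than dereferencing it.
    static List<String> safeCollation(List<String> collation) {
        return collation == null ? Collections.<String>emptyList() : collation;
    }

    // A toy cost term that walks the collation keys; with the guard it
    // returns zero instead of throwing when no collation is known.
    static double collationCost(List<String> collation) {
        double cost = 0.0;
        for (String key : safeCollation(collation)) {
            cost += 1.5; // hypothetical per-key contribution
        }
        return cost;
    }

    public static void main(String[] args) {
        System.out.println(collationCost(null));                    // no NPE
        System.out.println(collationCost(Arrays.asList("a", "b")));
    }
}
```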





[jira] [Work started] (HIVE-15030) Fixes in inference of collation for Tez cost model

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15030 started by Jesus Camacho Rodriguez.
--
> Fixes in inference of collation for Tez cost model
> --
>
> Key: HIVE-15030
> URL: https://issues.apache.org/jira/browse/HIVE-15030
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15030.patch
>
>
> The Tez cost model might hit an NPE if the collation returned by the join algorithm is null.





[jira] [Updated] (HIVE-15030) Fixes in inference of collation for Tez cost model

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15030:
---
Attachment: HIVE-15030.patch

> Fixes in inference of collation for Tez cost model
> --
>
> Key: HIVE-15030
> URL: https://issues.apache.org/jira/browse/HIVE-15030
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15030.patch
>
>
> The Tez cost model might hit an NPE if the collation returned by the join algorithm is null.





[jira] [Work stopped] (HIVE-15030) Fixes in inference of collation for Tez cost model

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15030 stopped by Jesus Camacho Rodriguez.
--
> Fixes in inference of collation for Tez cost model
> --
>
> Key: HIVE-15030
> URL: https://issues.apache.org/jira/browse/HIVE-15030
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15030.patch
>
>
> The Tez cost model might hit an NPE if the collation returned by the join algorithm is null.





[jira] [Work started] (HIVE-15030) Fixes in inference of collation for Tez cost model

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15030 started by Jesus Camacho Rodriguez.
--
> Fixes in inference of collation for Tez cost model
> --
>
> Key: HIVE-15030
> URL: https://issues.apache.org/jira/browse/HIVE-15030
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> The Tez cost model might hit an NPE if the collation returned by the join algorithm is null.





[jira] [Updated] (HIVE-15029) Add logic to estimate stats for BETWEEN operator

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15029:
---
Attachment: (was: HIVE-15029.patch)

> Add logic to estimate stats for BETWEEN operator
> 
>
> Key: HIVE-15029
> URL: https://issues.apache.org/jira/browse/HIVE-15029
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15029.01.patch
>
>
> Currently, the BETWEEN operator is handled by the default case: it reduces 
> the input rows to half. This may lead to wrong estimates for the number of 
> rows produced by Filter operators.
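One common alternative to the fixed halving is a range-based estimate under a uniformity assumption. The sketch below is hypothetical (not the patch's actual logic): it scales selectivity by how much of the column's [min, max] range the BETWEEN bounds cover.

```java
public class BetweenSelectivity {

    // Fraction of rows expected to satisfy "col BETWEEN lo AND hi", assuming
    // values are uniformly distributed between colMin and colMax (both taken
    // from column statistics). Hypothetical sketch, not the patch's code.
    static double selectivity(double colMin, double colMax, double lo, double hi) {
        if (hi < lo || hi < colMin || lo > colMax) {
            return 0.0; // the range misses the column entirely
        }
        if (colMax == colMin) {
            return 1.0; // constant column, every row matches
        }
        double clampedLo = Math.max(lo, colMin);
        double clampedHi = Math.min(hi, colMax);
        return (clampedHi - clampedLo) / (colMax - colMin);
    }

    public static void main(String[] args) {
        long inputRows = 1000;
        // Default halving would estimate 500 rows; the range-based estimate
        // for a narrow BETWEEN is much smaller.
        double sel = selectivity(0, 100, 10, 20);
        System.out.println(Math.round(inputRows * sel)); // prints 100
    }
}
```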





[jira] [Updated] (HIVE-15029) Add logic to estimate stats for BETWEEN operator

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15029:
---
Attachment: HIVE-15029.01.patch

> Add logic to estimate stats for BETWEEN operator
> 
>
> Key: HIVE-15029
> URL: https://issues.apache.org/jira/browse/HIVE-15029
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15029.01.patch
>
>
> Currently, the BETWEEN operator is handled by the default case: it reduces 
> the input rows to half. This may lead to wrong estimates for the number of 
> rows produced by Filter operators.





[jira] [Updated] (HIVE-15029) Add logic to estimate stats for BETWEEN operator

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15029:
---
Attachment: HIVE-15029.patch

> Add logic to estimate stats for BETWEEN operator
> 
>
> Key: HIVE-15029
> URL: https://issues.apache.org/jira/browse/HIVE-15029
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15029.patch
>
>
> Currently, the BETWEEN operator is handled by the default case: it reduces 
> the input rows to half. This may lead to wrong estimates for the number of 
> rows produced by Filter operators.





[jira] [Commented] (HIVE-14839) Improve the stability of TestSessionManagerMetrics

2016-10-21 Thread Marta Kuczora (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594850#comment-15594850
 ] 

Marta Kuczora commented on HIVE-14839:
--

Thanks [~aihuaxu] for committing the patch.

> Improve the stability of TestSessionManagerMetrics
> --
>
> Key: HIVE-14839
> URL: https://issues.apache.org/jira/browse/HIVE-14839
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 2.1.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14839.patch
>
>
> The TestSessionManagerMetrics fails occasionally with the following error: 
> {noformat}
> org.junit.ComparisonFailure: expected:<[0]> but was:<[1]>
>   at 
> org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics(TestSessionManagerMetrics.java:98)
> Failed tests: 
>   TestSessionManagerMetrics.testThreadPoolMetrics:98 expected:<[0]> but 
> was:<[1]>
> {noformat}
> This test starts four background threads with a "wait" call in their run 
> method. The threads use a common "barrier" object as the lock. 
> The expected behaviour is that two threads will be in the async pool (because 
> hive.server2.async.exec.threads is set to 2) and the other two threads 
> will be waiting in the queue. This condition is checked like this:
> {noformat}
> MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE, 
> MetricsConstant.EXEC_ASYNC_POOL_SIZE, 2);
> MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE, 
> MetricsConstant.EXEC_ASYNC_QUEUE_SIZE, 2);
> {noformat}
>   
> Then notifyAll is called on the lock object, so the two threads in the pool 
> should "wake up" and complete, and the other two threads should move from the 
> queue to the pool. This is checked like this in the test:
> {noformat}
> MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE, 
> MetricsConstant.EXEC_ASYNC_POOL_SIZE, 2);
> MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.GAUGE, 
> MetricsConstant.EXEC_ASYNC_QUEUE_SIZE, 0);
> {noformat}
> 
> There are two scenarios which can cause this test to fail:
> # The notifyAll call happens before both threads in the pool are up and 
> running and in the "wait" phase.
> In this case the thread which is not up in time gets stuck in the pool, so 
> the other two threads cannot move from the queue to the pool. 
> # After the notifyAll call, the threads in the pool "wake up" with some 
> delay, so they have not completed and been removed from the pool, and the 
> other two threads have not moved from the queue to the pool, by the time the 
> metrics are checked. Therefore the check fails, since the queue is not empty.
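A standard way to make such checks robust is to poll for the expected metric value with a timeout instead of asserting it at a single instant. A hedged sketch of that pattern (hypothetical helper, not the actual fix in the patch):

```java
import java.util.function.BooleanSupplier;

public class AwaitCondition {

    // Poll the condition until it holds or the timeout expires. This
    // tolerates threads that start or wake up slightly late, which is
    // exactly the race described in the two failure scenarios above.
    static boolean await(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return condition.getAsBoolean();
            }
        }
        return condition.getAsBoolean(); // final check at the deadline
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // The condition becomes true only ~50 ms in, simulating threads that
        // wake up and finish with some delay after notifyAll.
        boolean ok = await(() -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println(ok);
    }
}
```

In the test, the assertion on EXEC_ASYNC_QUEUE_SIZE would then wait for the queue to drain rather than sampling it once.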





[jira] [Updated] (HIVE-15029) Add logic to estimate stats for BETWEEN operator

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15029:
---
Status: Patch Available  (was: In Progress)

> Add logic to estimate stats for BETWEEN operator
> 
>
> Key: HIVE-15029
> URL: https://issues.apache.org/jira/browse/HIVE-15029
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Currently, the BETWEEN operator is handled by the default case: it reduces 
> the input rows to half. This may lead to wrong estimates for the number of 
> rows produced by Filter operators.





[jira] [Work started] (HIVE-15029) Add logic to estimate stats for BETWEEN operator

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15029 started by Jesus Camacho Rodriguez.
--
> Add logic to estimate stats for BETWEEN operator
> 
>
> Key: HIVE-15029
> URL: https://issues.apache.org/jira/browse/HIVE-15029
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Currently, the BETWEEN operator is handled by the default case: it reduces 
> the input rows to half. This may lead to wrong estimates for the number of 
> rows produced by Filter operators.





[jira] [Comment Edited] (HIVE-14543) Create Druid table without specifying data source

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594844#comment-15594844
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-14543 at 10/21/16 11:28 AM:
---

[~bslim], this was not implemented. Thus you can extract the name from the 
table properties; we assume that the user specifies it as e.g.:

{code}
TBLPROPERTIES ("druid.datasource" = "wikipedia")
{code}

Check for usage of DRUID_DATA_SOURCE in 
common/src/java/org/apache/hadoop/hive/conf/Constants.java to see how to 
retrieve it.


was (Author: jcamachorodriguez):
[~bslim], this was not implemented. Thus you can extract the name from the 
table properties; we assume that the user specifies it as e.g.:

{code}
TBLPROPERTIES ("druid.datasource" = "wikipedia")
{code}

Check for usage of DRUID_DATA_SOURCE in 
common/src/java/org/apache/hadoop/hive/conf/Constants.java to see how to read 
it.

> Create Druid table without specifying data source
> -
>
> Key: HIVE-14543
> URL: https://issues.apache.org/jira/browse/HIVE-14543
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> We should be able to omit the Druid datasource from the TBLPROPERTIES. In 
> that case, the Druid datasource name should match the Hive table name.
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.address" = "localhost");
> {code}
> For instance, the statement above creates a table that references the Druid 
> datasource "druid_table_1".
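The proposed fallback can be sketched in a few lines (hypothetical method names; the property key "druid.datasource" is the one mentioned in the thread):

```java
import java.util.HashMap;
import java.util.Map;

public class DruidDataSourceName {

    // If "druid.datasource" is absent from the table properties, fall back
    // to the Hive table name, as the issue proposes.
    static String dataSource(Map<String, String> tblProps, String tableName) {
        String ds = tblProps.get("druid.datasource");
        return (ds == null || ds.isEmpty()) ? tableName : ds;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        System.out.println(dataSource(props, "druid_table_1")); // falls back
        props.put("druid.datasource", "wikipedia");
        System.out.println(dataSource(props, "druid_table_1"));
    }
}
```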





[jira] [Commented] (HIVE-14543) Create Druid table without specifying data source

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594844#comment-15594844
 ] 

Jesus Camacho Rodriguez commented on HIVE-14543:


[~bslim], this was not implemented. Thus you can extract the name from the 
table properties; we assume that the user specifies it as e.g.:

{code}
TBLPROPERTIES ("druid.datasource" = "wikipedia")
{code}

Check for usage of DRUID_DATA_SOURCE in 
common/src/java/org/apache/hadoop/hive/conf/Constants.java to see how to read 
it.

> Create Druid table without specifying data source
> -
>
> Key: HIVE-14543
> URL: https://issues.apache.org/jira/browse/HIVE-14543
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> We should be able to omit the Druid datasource from the TBLPROPERTIES. In 
> that case, the Druid datasource name should match the Hive table name.
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.address" = "localhost");
> {code}
> For instance, the statement above creates a table that references the Druid 
> datasource "druid_table_1".





[jira] [Updated] (HIVE-15026) Option to not merge the views

2016-10-21 Thread Carlos Martinez Moller (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlos Martinez Moller updated HIVE-15026:
--
Attachment: explain.txt
testcase.txt
testcase.png

> Option to not merge the views
> -
>
> Key: HIVE-15026
> URL: https://issues.apache.org/jira/browse/HIVE-15026
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer, Physical Optimizer
>Reporter: Carlos Martinez Moller
> Attachments: explain.txt, testcase.png, testcase.txt
>
>
> Note: I am simplifying a real scenario we are having, and the queries are 
> reduced for the example. I hope they make sense and that the proposal can be 
> understood. The real query is a lot more complex and longer.
> When performing a query of this type:
> --
> SELECT COLUMNA, COLUMNB, MAX (COLUMNC)
> FROM TABLE_A
> WHERE COLUMNA=1 AND COLUMND='Case 1'
> UNION ALL
> SELECT COLUMNA, COLUMNB, MAX (COLUMNC)
> FROM TABLE_A
> WHERE COLUMNA=10 AND COLUMNE='Case 2'
> --
> This creates three stages: the first stage is a full scan of TABLE_A plus a 
> filter (COLUMNA=1/COLUMND='Case 1'), the second stage is a full scan of 
> TABLE_A again plus a filter (COLUMNA=10/COLUMNE='Case 2'), and the third 
> stage is the UNION ALL.
> TABLE_A has 2 TB of data.
> But COLUMNA=1 and COLUMNA=10 together select only 2 GB of it.
> So I thought to use:
> --
> WITH TEMP_VIEW AS
> (SELECT COLUMNA,COLUMNB,COLUMNC,COLUMND
> FROM TABLE_A
> WHERE COLUMNA=1 OR COLUMNA=10)
> SELECT COLUMNA, COLUMNB, MAX (COLUMNC)
> FROM TEMP_VIEW
> WHERE COLUMNA=1 AND COLUMND='Case 1'
> UNION ALL
> SELECT COLUMNA, COLUMNB, MAX (COLUMNC)
> FROM TEMP_VIEW
> WHERE COLUMNA=10 AND COLUMNE='Case 2'
> ---
> I thought that with this it would create 4 Stages:
> - Stage 1: Full Scan of TABLE_A and generate intermediate data
> - Stage 2: In the data of Stage 1 Filter (COLUMNA=1/COLUMND='Case 1')
> - Stage 3: In the data of Stage 1 Filter (COLUMNA=10/COLUMNE='Case 2')
> - Stage 4: UNION ALL
> With this, instead of 4 TB being read from disk, only 2 TB + 4 GB (going 
> through the view twice) would be read. (In our case the complexity is even 
> bigger and we would be saving 20 TB of reads.)
> But it does the same as the original query: it internally pushes the 
> predicates of the "WITH" query into the two parts of the UNION.
> It would be good to have control over this, or for the optimizer to choose 
> the best approach using histogram/statistics information.
> For those who know Oracle RDBMS, this is equivalent to the MERGE/NO_MERGE and 
> NEST behaviour; see http://www.dba-oracle.com/t_hint_no_merge.htm for an 
> explanation.
> Other approaches could apply to my example, such as partitioning or bucketing 
> by COLUMNA, but they are not applicable in our case as COLUMNA is not 
> commonly used when accessing this table.
> The point of this JIRA is to add functionality similar to Oracle's (not 
> merging the query, but generating an in-memory/on-disk temporary view), both 
> for "WITH" clauses and views.
> This is very commonly used in data warehouses managing large amounts of data 
> and provides big performance benefits.





[jira] [Commented] (HIVE-15026) Option to not merge the views

2016-10-21 Thread Carlos Martinez Moller (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594648#comment-15594648
 ] 

Carlos Martinez Moller commented on HIVE-15026:
---

In our production we have an old version, so I can't try Hive on Spark with 
the real query.

But I prepared a test case to check locally, as I have a newer version.

From my test case, using Hive on Spark the file is read twice. It is not 
generating an intermediate RDD. I don't know if I am missing a parameter 
which would control this; I have my parameters set to the default values.

I am uploading:
- The test case, if you want to play with it
- A screenshot of the stages of the job in Spark, where you can see that 523K 
was read twice (the size of the data I play with is 523K). Each subselect 
works with only a single row of the table.
- The explain plan (you can generate it with the test case), where you can see 
that two map tasks are generated, one for each subselect. There is none for 
the "logical view" of the WITH clause.

Creating temp tables is a workaround we thought of. But it would be nice to be 
able to write a single SELECT that executes an optimized plan; otherwise it 
implies a bit of development to do something that could be done with a single 
optimized SQL statement, which is the reason for this JIRA.

> Option to not merge the views
> -
>
> Key: HIVE-15026
> URL: https://issues.apache.org/jira/browse/HIVE-15026
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer, Physical Optimizer
>Reporter: Carlos Martinez Moller
>
> Note: I am simplifying a real scenario we are having, and the queries are 
> reduced for the example. I hope they make sense and that the proposal can be 
> understood. The real query is a lot more complex and longer.
> When performing a query of this type:
> --
> SELECT COLUMNA, COLUMNB, MAX (COLUMNC)
> FROM TABLE_A
> WHERE COLUMNA=1 AND COLUMND='Case 1'
> UNION ALL
> SELECT COLUMNA, COLUMNB, MAX (COLUMNC)
> FROM TABLE_A
> WHERE COLUMNA=10 AND COLUMNE='Case 2'
> --
> This creates three stages: the first stage is a full scan of TABLE_A plus a 
> filter (COLUMNA=1/COLUMND='Case 1'), the second stage is a full scan of 
> TABLE_A again plus a filter (COLUMNA=10/COLUMNE='Case 2'), and the third 
> stage is the UNION ALL.
> TABLE_A has 2 TB of data.
> But COLUMNA=1 and COLUMNA=10 together select only 2 GB of it.
> So I thought to use:
> --
> WITH TEMP_VIEW AS
> (SELECT COLUMNA,COLUMNB,COLUMNC,COLUMND
> FROM TABLE_A
> WHERE COLUMNA=1 OR COLUMNA=10)
> SELECT COLUMNA, COLUMNB, MAX (COLUMNC)
> FROM TEMP_VIEW
> WHERE COLUMNA=1 AND COLUMND='Case 1'
> UNION ALL
> SELECT COLUMNA, COLUMNB, MAX (COLUMNC)
> FROM TEMP_VIEW
> WHERE COLUMNA=10 AND COLUMNE='Case 2'
> ---
> I thought that with this it would create 4 Stages:
> - Stage 1: Full Scan of TABLE_A and generate intermediate data
> - Stage 2: In the data of Stage 1 Filter (COLUMNA=1/COLUMND='Case 1')
> - Stage 3: In the data of Stage 1 Filter (COLUMNA=10/COLUMNE='Case 2')
> - Stage 4: UNION ALL
> With this, instead of 4 TB being read from disk, only 2 TB + 4 GB (going 
> through the view twice) would be read. (In our case the complexity is even 
> bigger and we would be saving 20 TB of reads.)
> But it does the same as the original query: it internally pushes the 
> predicates of the "WITH" query into the two parts of the UNION.
> It would be good to have control over this, or for the optimizer to choose 
> the best approach using histogram/statistics information.
> For those who know Oracle RDBMS, this is equivalent to the MERGE/NO_MERGE and 
> NEST behaviour; see http://www.dba-oracle.com/t_hint_no_merge.htm for an 
> explanation.
> Other approaches could apply to my example, such as partitioning or bucketing 
> by COLUMNA, but they are not applicable in our case as COLUMNA is not 
> commonly used when accessing this table.
> The point of this JIRA is to add functionality similar to Oracle's (not 
> merging the query, but generating an in-memory/on-disk temporary view), both 
> for "WITH" clauses and views.
> This is very commonly used in data warehouses managing large amounts of data 
> and provides big performance benefits.





[jira] [Updated] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

2016-10-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14803:

Attachment: HIVE-14803.3.patch

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14803.1.patch, HIVE-14803.2.patch, 
> HIVE-14803.3.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking the file sizes, which turns out to be expensive when a large number 
> of partitions are inserted.





[jira] [Commented] (HIVE-14968) Fix compilation failure on branch-1

2016-10-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594540#comment-15594540
 ] 

Thejas M Nair commented on HIVE-14968:
--

[~sershe] / [~prasanth_j] Can you please review this patch ? (Also, see 
question from [~spena])



> Fix compilation failure on branch-1
> ---
>
> Key: HIVE-14968
> URL: https://issues.apache.org/jira/browse/HIVE-14968
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 1.3.0
>
> Attachments: HIVE-14968-branch-1.1.patch, HIVE-14968.1.patch
>
>
> branch-1 compilation failure due to:
> HIVE-14436: Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException 
> Error: , expected at the end of 'decimal(9'" after enabling 
> hive.optimize.skewjoin and with MR engine
> HIVE-14483 : java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory.commonReadByteArrays
> 1.2 branch is fine.





[jira] [Updated] (HIVE-13873) Column pruning for nested fields

2016-10-21 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-13873:

Attachment: HIVE-13873.5.patch

> Column pruning for nested fields
> 
>
> Key: HIVE-13873
> URL: https://issues.apache.org/jira/browse/HIVE-13873
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Reporter: Xuefu Zhang
>Assignee: Ferdinand Xu
> Attachments: HIVE-13873.1.patch, HIVE-13873.2.patch, 
> HIVE-13873.3.patch, HIVE-13873.4.patch, HIVE-13873.5.patch, HIVE-13873.patch, 
> HIVE-13873.wip.patch
>
>
> Some columnar file formats, such as Parquet, also store fields of struct type 
> column by column, using the encoding described in the Google Dremel paper. It 
> is very common in big data for data to be stored in structs while queries 
> need only a subset of the fields in the structs. However, Hive presently 
> still needs to read the whole struct regardless of whether all fields are 
> selected. Therefore, pruning unwanted sub-fields of structs or nested fields 
> at file reading time would be a big performance boost for such scenarios.
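To illustrate the idea with an invented toy model (not Hive's or Parquet's actual pruning API): given the dotted field paths a query references, only those sub-fields of a struct need to be materialized.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class NestedPruning {

    // Keep only the entries of a (possibly nested) struct, represented here
    // as a Map, that lie on one of the requested dotted paths, e.g. "b.c".
    @SuppressWarnings("unchecked")
    static Map<String, Object> prune(Map<String, Object> struct, Set<String> paths) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : struct.entrySet()) {
            Set<String> subPaths = new LinkedHashSet<>();
            boolean keepWhole = false;
            for (String p : paths) {
                if (p.equals(e.getKey())) {
                    keepWhole = true; // the whole field is referenced
                } else if (p.startsWith(e.getKey() + ".")) {
                    subPaths.add(p.substring(e.getKey().length() + 1));
                }
            }
            if (keepWhole) {
                out.put(e.getKey(), e.getValue());
            } else if (!subPaths.isEmpty() && e.getValue() instanceof Map) {
                out.put(e.getKey(), prune((Map<String, Object>) e.getValue(), subPaths));
            }
            // fields on no referenced path are dropped entirely
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("c", 1);
        inner.put("d", 2);
        Map<String, Object> struct = new LinkedHashMap<>();
        struct.put("a", 0);
        struct.put("b", inner);
        // Only b.c is referenced, so a and b.d are pruned away.
        System.out.println(prune(struct, new HashSet<>(Arrays.asList("b.c"))));
    }
}
```

A columnar reader can apply the same path computation at file-reading time, skipping the column chunks for unreferenced sub-fields instead of decoding the full struct.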





[jira] [Commented] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

2016-10-21 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594460#comment-15594460
 ] 

Rajesh Balamohan commented on HIVE-14803:
-

Will check by running the patch again. I tried it in a local environment and 
did not get any of the test failures reported by Jenkins (e.g. TestCliDriver).


{noformat}
In qtest, "mvn test -Dtest=TestCliDriver 
-Dqfile=index_auto_partitioned.q,autoColumnStats_4.q,nonmr_fetch.q,outer_join_ppr.q,ppd2.q,input_part9.q,orc_merge9.q,deleteAnalyze.q,louter_join_ppr.q,transform_ppr2.q,ppd_udf_case.q,acid_table_stats.q,pcr.q,stats2.q,insert_values_orig_table_use_metadata.q,union25.q,ppr_allchildsarenull.q
  -Dtest.output.overwrite=true"

---
 T E S T S
---
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was 
removed in 8.0
Running org.apache.hadoop.hive.cli.TestCliDriver
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 850.707 sec - 
in org.apache.hadoop.hive.cli.TestCliDriver

Results :
{noformat}

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14803.1.patch, HIVE-14803.2.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking the file sizes, which turns out to be expensive when a large number 
> of partitions are inserted.





[jira] [Updated] (HIVE-14866) Remove logic to set global limit from SemanticAnalyzer

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14866:
---
Attachment: (was: HIVE-14866.01.patch)

> Remove logic to set global limit from SemanticAnalyzer
> --
>
> Key: HIVE-14866
> URL: https://issues.apache.org/jira/browse/HIVE-14866
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14866.01.patch, HIVE-14866.patch
>
>
> Currently, we set up the global limit for the query in the SemanticAnalyzer. 
> In addition, we have an optimization rule GlobalLimitOptimizer that prunes 
> the input depending on the global limit and under certain conditions (off by 
> default).
> We would like to remove the dependency on the SemanticAnalyzer and set the 
> global limit within GlobalLimitOptimizer.
> Further, we need to solve the problem with SimpleFetchOptimizer, which only 
> checks the limit but does not take into account the offset of the query, 
> which I think might lead to incorrect results if FetchOptimizer kicks in (not 
> verified yet).
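The SimpleFetchOptimizer concern can be illustrated with a toy threshold check (hypothetical names, not Hive's actual code): with LIMIT n OFFSET k, a fetch-only plan must actually produce k + n rows, so checking only n against the fetch threshold is too permissive.

```java
public class FetchThreshold {

    // Buggy variant: decides whether a fetch-only plan is allowed by
    // comparing only the LIMIT against the threshold, ignoring OFFSET.
    static boolean fetchAllowedLimitOnly(long limit, long offset, long threshold) {
        return limit <= threshold;
    }

    // Corrected variant: the rows that must be scanned to honor
    // "LIMIT limit OFFSET offset" is limit + offset.
    static boolean fetchAllowed(long limit, long offset, long threshold) {
        return limit + offset <= threshold;
    }

    public static void main(String[] args) {
        // LIMIT 100 OFFSET 10000 against a threshold of 1000 rows:
        System.out.println(fetchAllowedLimitOnly(100, 10000, 1000)); // true (wrong)
        System.out.println(fetchAllowed(100, 10000, 1000));          // false
    }
}
```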





[jira] [Work started] (HIVE-14866) Remove logic to set global limit from SemanticAnalyzer

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-14866 started by Jesus Camacho Rodriguez.
--
> Remove logic to set global limit from SemanticAnalyzer
> --
>
> Key: HIVE-14866
> URL: https://issues.apache.org/jira/browse/HIVE-14866
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14866.01.patch, HIVE-14866.patch
>
>
> Currently, we set up the global limit for the query in the SemanticAnalyzer. 
> In addition, we have an optimization rule GlobalLimitOptimizer that prunes 
> the input depending on the global limit and under certain conditions (off by 
> default).
> We would like to remove the dependency on the SemanticAnalyzer and set the 
> global limit within GlobalLimitOptimizer.
> Further, we need to solve the problem with SimpleFetchOptimizer, which only 
> checks the limit but does not take into account the offset of the query, 
> which I think might lead to incorrect results if FetchOptimizer kicks in (not 
> verified yet).





[jira] [Updated] (HIVE-14866) Remove logic to set global limit from SemanticAnalyzer

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14866:
---
Attachment: HIVE-14866.01.patch



[jira] [Updated] (HIVE-14866) Remove logic to set global limit from SemanticAnalyzer

2016-10-21 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14866:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-14920) S3: Optimize SimpleFetchOptimizer::checkThreshold()

2016-10-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14920:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed patch to master. Thanks [~rajesh.balamohan] for the patch!

> S3: Optimize SimpleFetchOptimizer::checkThreshold()
> ---
>
> Key: HIVE-14920
> URL: https://issues.apache.org/jira/browse/HIVE-14920
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14920.1.patch, HIVE-14920.2.patch
>
>
> A simple query like the following takes far longer in the query compilation 
> phase (~330 seconds for a 200 GB TPC-DS dataset):
> {noformat}
> select ws_item_sk from web_sales where ws_item_sk > 10 limit 10;
> {noformat}
> Such a query triggers {{SimpleFetchOptimizer}}, which internally tries to 
> figure out whether the size of the input data is within the threshold defined 
> by {{hive.fetch.task.conversion.threshold}} (~1 GB).
> This turns out to be very expensive when the dataset is partitioned; an 
> example stacktrace is given below. Note that this happens on the client side, 
> which fetches the length of 1800+ partitions before proceeding to the next 
> rule.
> {noformat}
> at 
> org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1486)
> at 
> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.getFileLength(SimpleFetchOptimizer.java:466)
> at 
> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.calculateLength(SimpleFetchOptimizer.java:451)
> at 
> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.getInputLength(SimpleFetchOptimizer.java:423)
> at 
> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.access$300(SimpleFetchOptimizer.java:323)
> at 
> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.checkThreshold(SimpleFetchOptimizer.java:168)
> at 
> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.optimize(SimpleFetchOptimizer.java:133)
> at 
> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:105)
> at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:207)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10466)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:216)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:230)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:230)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:464)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1219)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1260)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1146)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {noformat}
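A common mitigation for the serial getContentSummary() calls in the stacktrace above is to fetch partition sizes in parallel and stop adding work once the running total has already crossed the threshold. The following is a hedged sketch of that idea in plain Java, not Hive's actual patch: partition sizes are stubbed as Callable<Long> suppliers instead of real FileSystem calls, and all names are illustrative.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class ThresholdCheckSketch {
    // Returns true if the total size of all inputs stays under the threshold.
    // Sizes are fetched by a thread pool, and tasks stop doing work once the
    // running total already exceeds the threshold, instead of summing every
    // partition serially.
    static boolean underThreshold(List<Callable<Long>> sizeFetchers,
                                  long threshold, int parallelism)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        AtomicLong total = new AtomicLong();
        AtomicBoolean failed = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(sizeFetchers.size());
        for (Callable<Long> fetcher : sizeFetchers) {
            pool.submit(() -> {
                try {
                    // Skip the (possibly slow) size lookup once over budget.
                    if (!failed.get() && total.get() <= threshold) {
                        total.addAndGet(fetcher.call());
                    }
                } catch (Exception e) {
                    failed.set(true); // treat an unreadable partition as over budget
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        return !failed.get() && total.get() <= threshold;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<Long>> parts = List.of(() -> 400L, () -> 400L, () -> 400L);
        // 400 + 400 + 400 = 1200 > 1000, so this prints false
        System.out.println(underThreshold(parts, 1_000L, 2));
    }
}
```

The early-exit check is approximate by design: concurrent tasks may still add a few sizes past the threshold, but the answer is unaffected, since any skip can only happen after the total has provably exceeded the budget.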





[jira] [Updated] (HIVE-14920) S3: Optimize SimpleFetchOptimizer::checkThreshold()

2016-10-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14920:
-
Affects Version/s: 2.2.0



[jira] [Assigned] (HIVE-15028) LLAP: Enable KeepAlive by default

2016-10-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-15028:
--

Assignee: Gopal V

> LLAP: Enable KeepAlive by default
> -
>
> Key: HIVE-15028
> URL: https://issues.apache.org/jira/browse/HIVE-15028
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> {code}
>   public static final String SHUFFLE_CONNECTION_KEEP_ALIVE_ENABLED =
>   "llap.shuffle.connection-keep-alive.enable";
>   public static final boolean DEFAULT_SHUFFLE_CONNECTION_KEEP_ALIVE_ENABLED = 
> false;
> {code}
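Enabling keep-alive by default would amount to flipping that constant to true; until then, a deployment can override it in configuration. A hedged sketch of the override follows: the property name comes directly from the constant above, but the exact configuration file in which the shuffle handler reads it is an assumption, not confirmed by this issue.

```
<!-- Assumed placement: the site configuration consumed by the LLAP shuffle
     handler. Property name taken from SHUFFLE_CONNECTION_KEEP_ALIVE_ENABLED. -->
<property>
  <name>llap.shuffle.connection-keep-alive.enable</name>
  <value>true</value>
</property>
```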




