date:20220111

[jira] [Assigned] (HIVE-25862) Persist the time of last run in the initiator

2022-01-11 Thread Antal Sinkovits (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-25862:
--


> Persist the time of last run in the initiator
> -
>
> Key: HIVE-25862
> URL: https://issues.apache.org/jira/browse/HIVE-25862
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>
> The time of the last run is used as a filter when finding compaction candidates.
> Because it's only stored in memory, we lose this filtering capability if the
> service restarts, so it would make sense to persist it.
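A minimal sketch of what persisting the value could look like. It assumes a plain local file as the durable store purely for illustration (the ticket does not say where the value would live), and `LastRunStore`, `load` and `save` are hypothetical names, not Hive API. The value is written to a temporary file and then renamed, so a crash mid-write cannot leave a truncated timestamp behind.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical sketch: persists the Initiator's last run time so the
 * "changed since last run" filter survives a restart. A local file stands in
 * for whatever durable store would actually be used.
 */
class LastRunStore {
  private final Path file;

  LastRunStore(Path file) {
    this.file = file;
  }

  /** Returns the stored last-run time in epoch millis, or 0 if nothing was stored yet. */
  long load() throws IOException {
    if (!Files.exists(file)) {
      return 0L; // first start: no previous run, so nothing can be filtered out
    }
    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
    return text.isEmpty() ? 0L : Long.parseLong(text);
  }

  /** Writes to a temporary sibling file, then renames it over the target in one step. */
  void save(long epochMillis) throws IOException {
    Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
    Files.write(tmp, Long.toString(epochMillis).getBytes(StandardCharsets.UTF_8));
    // With ATOMIC_MOVE, replacing an existing target is platform-specific per the
    // Files.move javadoc; a POSIX rename() replaces it.
    Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
  }
}
```

On restart, the Initiator would construct the store on the same path and feed `load()` into its candidate filter; a value of 0 simply means every candidate is considered, which matches the current behavior after a restart.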



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work stopped] (HIVE-25861) When ConstantPropagate optimizer optimizes case when equals case when twice, got wrong logical execution plan

2022-01-11 Thread Jun Di (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25861 stopped by Jun Di.
-
> When ConstantPropagate optimizer optimizes case when equals case when twice, 
> got wrong logical execution plan
> -
>
> Key: HIVE-25861
> URL: https://issues.apache.org/jira/browse/HIVE-25861
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Jun Di
>Assignee: Jun Di
>Priority: Critical
> Attachments: 1.png, 2.png
>
>
> When running the following SQL:
> {code:sql}
> select
> t1.column_1,
> t2.column_1,
> t1.column_2,
> t1.column_3,
> case 
> when (
> case 
> when t1.column_1 in (31, 32, 33, 34) 
> then 31
> else t1.column_1
> end
> ) = (
> case
> when t2.column_1 in (31, 32, 33, 34) 
> then 31
> else t2.column_1
> end
> )
> then t1.column_2
> else t1.column_3
> end as result
> from
> dim.dim_xmf_center t1
> left join dim.dim_xmf_center t2
> where
> t1.mt = '202201';
> {code}
> When t1.column_1 is 44 and t2.column_1 is 44, the expected result is t1.column_2, but the query returns t1.column_3.
> See the attached 1.png for the result.
> The CASE WHEN part of the execution plan is wrong:
> {code:sql}
> CASE WHEN (CASE WHEN ((_col20) IN (31, 32, 33, 34))
>   THEN (CASE WHEN ((_col46) IN (31, 32, 33, 34)) THEN ((true = _col20)) ELSE (((_col46 = 31) = _col20)) END)
>   ELSE (CASE WHEN ((_col46) IN (31, 32, 33, 34)) THEN ((true = _col20)) ELSE (((_col46 = 31) = _col20)) END) END)
> THEN (_col12) ELSE (_col15) END
> {code}
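To make the expected result concrete, the sketch below evaluates the two CASE expressions from the query by hand. It assumes integer columns and ignores NULLs (which the LEFT JOIN could produce); `ExpectedCaseResult`, `normalize` and `pick` are hypothetical helpers that mirror the SQL, not Hive code. For 44 and 44, neither value is in (31, 32, 33, 34), both sides stay 44, the comparison is true, and `column_2` should be chosen.

```java
import java.util.Arrays;
import java.util.List;

/** Evaluates the report's CASE WHEN logic by hand to show which branch is expected. */
class ExpectedCaseResult {
  private static final List<Integer> MAPPED = Arrays.asList(31, 32, 33, 34);

  /** CASE WHEN c IN (31, 32, 33, 34) THEN 31 ELSE c END */
  static int normalize(int c) {
    return MAPPED.contains(c) ? 31 : c;
  }

  /** CASE WHEN normalize(t1.column_1) = normalize(t2.column_1) THEN column_2 ELSE column_3 END */
  static String pick(int t1Column1, int t2Column1, String column2, String column3) {
    return normalize(t1Column1) == normalize(t2Column1) ? column2 : column3;
  }

  public static void main(String[] args) {
    System.out.println(pick(44, 44, "column_2", "column_3")); // prints column_2
    System.out.println(pick(32, 34, "column_2", "column_3")); // prints column_2 (both map to 31)
    System.out.println(pick(44, 31, "column_2", "column_3")); // prints column_3
  }
}
```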




[jira] [Comment Edited] (HIVE-25861) When ConstantPropagate optimizer optimizes case when equals case when twice, got wrong logical execution plan

2022-01-11 Thread Jun Di (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474311#comment-17474311
 ] 

Jun Di edited comment on HIVE-25861 at 1/12/22, 7:28 AM:
-

The unoptimized ExprNode for the CASE WHEN comparison is:
{code:sql}
GenericUDFOPEqual(
  GenericUDFWhen(
    GenericUDFIn(Column[_col20], Const int 31, Const int 32, Const int 33, Const int 34),
    Const int 31,
    Column[_col20]),
  GenericUDFWhen(
    GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34),
    Const int 31,
    Column[_col46])
)
{code}

After the first ConstantPropagate pass, the ExprNode is:

{code:sql}
GenericUDFWhen(
  GenericUDFIn(Column[_col20], Const int 31, Const int 32, Const int 33, Const int 34),
  GenericUDFOPEqual(
    Const int 31,
    GenericUDFWhen(GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34), Const int 31, Column[_col46])),
  GenericUDFOPEqual(
    Column[_col20],
    GenericUDFWhen(GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34), Const int 31, Column[_col46])))
{code}

However, the two GenericUDFWhen nodes inside the GenericUDFOPEqual nodes are the same object,
so the second ConstantPropagate pass produces a wrong result.

 !2.png! 
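The aliasing problem described above can be reproduced on a toy expression tree. In this sketch, `AliasingDemo`, `Node`, `pushEqIntoCase` and `foldConstEqInPlace` are hypothetical stand-ins, not Hive's `ExprNodeDesc` or ConstantPropagate code. The first pass rewrites `(CASE WHEN c THEN t ELSE e END) = x` into `CASE WHEN c THEN t = x ELSE e = x END`, mirroring the shape of the first-pass tree above, and the second pass simplifies the THEN branch by mutating a node in place. When both branches hold the same `x` object, that mutation leaks into the ELSE branch, which ends up comparing `_col20` against `true` and `(_col46 = 31)`, resembling the bad plan; copying `x` for one branch keeps them independent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the shared-subtree bug; hypothetical classes, not Hive's ExprNodeDesc API. */
class AliasingDemo {
  /** Minimal mutable expression node: kind is CASE, EQ or LEAF (column, constant or predicate text). */
  static final class Node {
    final String kind;
    final String text;                             // only used by LEAF nodes
    final List<Node> children = new ArrayList<>(); // mutable, so rewrites can change it in place

    Node(String kind, String text, Node... kids) {
      this.kind = kind;
      this.text = text;
      children.addAll(Arrays.asList(kids));
    }

    Node deepCopy() {
      Node copy = new Node(kind, text);
      for (Node kid : children) {
        copy.children.add(kid.deepCopy());
      }
      return copy;
    }

    @Override
    public String toString() {
      if (kind.equals("LEAF")) {
        return text;
      }
      if (kind.equals("EQ")) {
        return "(" + children.get(0) + " = " + children.get(1) + ")";
      }
      return "CASE WHEN " + children.get(0) + " THEN " + children.get(1)
          + " ELSE " + children.get(2) + " END";
    }
  }

  static Node leaf(String text) { return new Node("LEAF", text); }
  static Node eq(Node a, Node b) { return new Node("EQ", null, a, b); }
  static Node caseWhen(Node c, Node t, Node e) { return new Node("CASE", null, c, t, e); }

  /** Pass 1: (CASE WHEN c THEN t ELSE e END) = x  becomes  CASE WHEN c THEN t = x ELSE e = x END. */
  static Node pushEqIntoCase(Node caseNode, Node x, boolean copyX) {
    Node elseX = copyX ? x.deepCopy() : x; // without the copy, both branches share one x object
    return caseWhen(caseNode.children.get(0),
        eq(caseNode.children.get(1), x),
        eq(caseNode.children.get(2), elseX));
  }

  /** Pass 2: k = CASE WHEN c THEN t ELSE e END, simplified by mutating the inner CASE in place. */
  static void foldConstEqInPlace(Node eqNode) {
    Node k = eqNode.children.get(0);
    Node inner = eqNode.children.get(1);
    boolean same = k.text.equals(inner.children.get(1).text); // constant = constant
    inner.children.set(1, leaf(String.valueOf(same)));
    inner.children.set(2, eq(inner.children.get(2), k));
  }

  /** Runs both passes and returns the ELSE branch, which should still hold the original CASE. */
  static String elseBranchAfterTwoPasses(boolean copyX) {
    Node when20 = caseWhen(leaf("_col20 IN (31, 32, 33, 34)"), leaf("31"), leaf("_col20"));
    Node when46 = caseWhen(leaf("_col46 IN (31, 32, 33, 34)"), leaf("31"), leaf("_col46"));
    Node pushed = pushEqIntoCase(when20, when46, copyX);
    foldConstEqInPlace(pushed.children.get(1)); // rewrites the THEN branch only
    return pushed.children.get(2).toString();
  }

  public static void main(String[] args) {
    // Shared x: the ELSE branch now sees "THEN true ELSE (_col46 = 31)"
    System.out.println(elseBranchAfterTwoPasses(false));
    // Copied x: the ELSE branch keeps the original CASE expression
    System.out.println(elseBranchAfterTwoPasses(true));
  }
}
```

Copying a subtree at the point where it gets duplicated is one possible remedy; another is to have rewrites build new nodes rather than mutate children in place. Which approach the eventual Hive patch takes is not stated in this thread.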



was (Author: JIRAUSER283421):
The unoptimized ExprNode for the CASE WHEN comparison in the SQL is:
{code:sql}
GenericUDFOPEqual(
  GenericUDFWhen(
    GenericUDFIn(Column[_col20], Const int 31, Const int 32, Const int 33, Const int 34),
    Const int 31,
    Column[_col20]),
  GenericUDFWhen(
    GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34),
    Const int 31,
    Column[_col46])
)
{code}

After the first ConstantPropagate pass, the ExprNode is:

{code:sql}
GenericUDFWhen(
  GenericUDFIn(Column[_col20], Const int 31, Const int 32, Const int 33, Const int 34),
  GenericUDFOPEqual(
    Const int 31,
    GenericUDFWhen(GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34), Const int 31, Column[_col46])),
  GenericUDFOPEqual(
    Column[_col20],
    GenericUDFWhen(GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34), Const int 31, Column[_col46])))
{code}

However, the two GenericUDFWhen nodes inside the GenericUDFOPEqual nodes are the same object,
so the second ConstantPropagate pass produces a wrong result.

 !2.png! 






[jira] [Commented] (HIVE-25384) Bump ORC to 1.6.9

2022-01-11 Thread Dongjoon Hyun (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474312#comment-17474312
 ] 

Dongjoon Hyun commented on HIVE-25384:
--

[~euigeun_chung] Please see HIVE-25497. The Hive community is already trying to
use Apache ORC 1.7.2.

> Bump ORC to 1.6.9
> -
>
> Key: HIVE-25384
> URL: https://issues.apache.org/jira/browse/HIVE-25384
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ORC-804 affects ORC 1.6.0 ~ 1.6.8.
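The affected range quoted above can be checked mechanically when auditing a build. This is only a sketch: `Orc804Check` is a hypothetical helper, and it encodes nothing beyond the range stated in the ticket (1.6.0 through 1.6.8); it knows nothing about any other ORC issue.

```java
/** Hypothetical helper encoding only the range stated in the ticket: ORC 1.6.0 through 1.6.8. */
class Orc804Check {
  static boolean isAffected(String version) {
    String[] parts = version.split("\\.");
    if (parts.length < 3) {
      return false; // expects major.minor.patch
    }
    try {
      int major = Integer.parseInt(parts[0]);
      int minor = Integer.parseInt(parts[1]);
      // Drop any suffix such as "-SNAPSHOT" from the patch component.
      int patch = Integer.parseInt(parts[2].replaceAll("[^0-9].*$", ""));
      return major == 1 && minor == 6 && patch <= 8;
    } catch (NumberFormatException e) {
      return false; // not a numeric version
    }
  }

  public static void main(String[] args) {
    for (String v : new String[] {"1.6.8", "1.6.9", "1.6.12", "1.7.2"}) {
      System.out.println(v + " affected by ORC-804: " + isAffected(v));
    }
  }
}
```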




[jira] [Work logged] (HIVE-25497) Bump ORC to 1.7.1

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25497?focusedWorklogId=707305=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707305
 ]

ASF GitHub Bot logged work on HIVE-25497:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 07:25
Start Date: 12/Jan/22 07:25
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #2853:
URL: https://github.com/apache/hive/pull/2853#issuecomment-1010720407


   Hi, @pgaref .
   - What is the current status of this PR?
   - Could you update the PR title to 1.7.2?
   
   cc @williamhyun


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 707305)
Time Spent: 1h 40m  (was: 1.5h)

> Bump ORC to 1.7.1
> -
>
> Key: HIVE-25497
> URL: https://issues.apache.org/jira/browse/HIVE-25497
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: William Hyun
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-25497) Bump ORC to 1.7.2

2022-01-11 Thread Dongjoon Hyun (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated HIVE-25497:
-
Summary: Bump ORC to 1.7.2  (was: Bump ORC to 1.7.1)

> Bump ORC to 1.7.2
> -
>
> Key: HIVE-25497
> URL: https://issues.apache.org/jira/browse/HIVE-25497
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: William Hyun
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-25861) When ConstantPropagate optimizer optimizes case when equals case when twice, got wrong logical execution plan

2022-01-11 Thread Jun Di (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Di updated HIVE-25861:
--
Attachment: 2.png





[jira] [Commented] (HIVE-25861) When ConstantPropagate optimizer optimizes case when equals case when twice, got wrong logical execution plan

2022-01-11 Thread Jun Di (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474311#comment-17474311
 ] 

Jun Di commented on HIVE-25861:
---

The unoptimized ExprNode for the CASE WHEN comparison in the SQL is:
{code:sql}
GenericUDFOPEqual(
  GenericUDFWhen(
    GenericUDFIn(Column[_col20], Const int 31, Const int 32, Const int 33, Const int 34),
    Const int 31,
    Column[_col20]),
  GenericUDFWhen(
    GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34),
    Const int 31,
    Column[_col46])
)
{code}

After the first ConstantPropagate pass, the ExprNode is:

{code:sql}
GenericUDFWhen(
  GenericUDFIn(Column[_col20], Const int 31, Const int 32, Const int 33, Const int 34),
  GenericUDFOPEqual(
    Const int 31,
    GenericUDFWhen(GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34), Const int 31, Column[_col46])),
  GenericUDFOPEqual(
    Column[_col20],
    GenericUDFWhen(GenericUDFIn(Column[_col46], Const int 31, Const int 32, Const int 33, Const int 34), Const int 31, Column[_col46])))
{code}

However, the two GenericUDFWhen nodes inside the GenericUDFOPEqual nodes are the same object,
so the second ConstantPropagate pass produces a wrong result.

 !2.png! 






[jira] [Work started] (HIVE-25861) When ConstantPropagate optimizer optimizes case when equals case when twice, got wrong logical execution plan

2022-01-11 Thread Jun Di (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25861 started by Jun Di.
-
> When ConstantPropagate optimizer optimizes case when equals case when twice, 
> got wrong logical execution plan
> -
>
> Key: HIVE-25861
> URL: https://issues.apache.org/jira/browse/HIVE-25861
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Jun Di
>Assignee: Jun Di
>Priority: Critical
> Attachments: 1.png
>
>
> When running the following SQL:
> {code:sql}
> select
> t1.column_1,
> t2.column_1,
> t1.column_2,
> t1.column_3,
> case 
> when (
> case 
> when t1.column_1 in (31, 32, 33, 34) 
> then 31
> else t1.column_1
> end
> ) = (
> case
> when t2.column_1 in (31, 32, 33, 34) 
> then 31
> else t2.column_1
> end
> )
> then t1.column_2
> else t1.column_3
> end as result
> from
> dim.dim_xmf_center t1
> left join dim.dim_xmf_center t2
> where
> t1.mt = '202201';
> {code}
> When t1.column_1 is 44 and t2.column_1 is 44, the expected result is t1.column_2, but the query returns t1.column_3.
> See the attached 1.png for the result.
> The CASE WHEN part of the execution plan is wrong:
> {code:sql}
> CASE WHEN (CASE WHEN ((_col20) IN (31, 32, 33, 34))
>   THEN (CASE WHEN ((_col46) IN (31, 32, 33, 34)) THEN ((true = _col20)) ELSE (((_col46 = 31) = _col20)) END)
>   ELSE (CASE WHEN ((_col46) IN (31, 32, 33, 34)) THEN ((true = _col20)) ELSE (((_col46 = 31) = _col20)) END) END)
> THEN (_col12) ELSE (_col15) END
> {code}




[jira] [Assigned] (HIVE-25861) When ConstantPropagate optimizer optimizes case when equals case when twice, got wrong logical execution plan

2022-01-11 Thread Jun Di (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Di reassigned HIVE-25861:
-


> When ConstantPropagate optimizer optimizes case when equals case when twice, 
> got wrong logical execution plan
> -
>
> Key: HIVE-25861
> URL: https://issues.apache.org/jira/browse/HIVE-25861
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Jun Di
>Assignee: Jun Di
>Priority: Critical
> Attachments: 1.png
>
>
> When running the following SQL:
> {code:sql}
> select
> t1.column_1,
> t2.column_1,
> t1.column_2,
> t1.column_3,
> case 
> when (
> case 
> when t1.column_1 in (31, 32, 33, 34) 
> then 31
> else t1.column_1
> end
> ) = (
> case
> when t2.column_1 in (31, 32, 33, 34) 
> then 31
> else t2.column_1
> end
> )
> then t1.column_2
> else t1.column_3
> end as result
> from
> dim.dim_xmf_center t1
> left join dim.dim_xmf_center t2
> where
> t1.mt = '202201';
> {code}
> When t1.column_1 is 44 and t2.column_1 is 44, the expected result is t1.column_2, but the query returns t1.column_3.
> See the attached 1.png for the result.
> The CASE WHEN part of the execution plan is wrong:
> {code:sql}
> CASE WHEN (CASE WHEN ((_col20) IN (31, 32, 33, 34))
>   THEN (CASE WHEN ((_col46) IN (31, 32, 33, 34)) THEN ((true = _col20)) ELSE (((_col46 = 31) = _col20)) END)
>   ELSE (CASE WHEN ((_col46) IN (31, 32, 33, 34)) THEN ((true = _col20)) ELSE (((_col46 = 31) = _col20)) END) END)
> THEN (_col12) ELSE (_col15) END
> {code}




[jira] [Commented] (HIVE-25384) Bump ORC to 1.6.9

2022-01-11 Thread Eugene Chung (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474274#comment-17474274
 ] 

Eugene Chung commented on HIVE-25384:
-

[~dongjoon] Would you please update the ORC version of Hive 4 to 1.6.12?
An IllegalArgumentException occurs when using the zstd option with my
company's data. It doesn't happen with 1.6.12.

{{Caused by: java.lang.IllegalArgumentException: fromIndex(4) > toIndex(0)
  at java.util.ArrayList.subListRangeCheck(ArrayList.java:1006)
  at java.util.ArrayList.subList(ArrayList.java:996)
  at java.util.Collections$UnmodifiableRandomAccessList.subList(Collections.java:1400)
  at org.apache.orc.impl.writer.TreeWriterBase.removeIsPresentPositions(TreeWriterBase.java:228)
  at org.apache.orc.impl.writer.TreeWriterBase.writeStripe(TreeWriterBase.java:257)
  at org.apache.orc.impl.writer.StructTreeWriter.writeStripe(StructTreeWriter.java:112)
  at org.apache.orc.impl.WriterImpl.flushStripe(WriterImpl.java:526)
  at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:728)
  at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:338)
  at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:113)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:229)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeWriters(FileSinkOperator.java:1218)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1209)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
  at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
  ... 19 more}}
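The top frames of this trace come from the JDK's range check in `List.subList`. The sketch below does not reproduce the ORC writer bug; it only shows that calling `subList(4, 0)` on a list wrapped as unmodifiable (mirroring the `Collections$UnmodifiableRandomAccessList` frame) yields exactly this message, which suggests that `removeIsPresentPositions` computed a start index greater than its end index. `SubListRangeDemo` and `trySubList` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrates the JDK range check behind the trace; it does not reproduce the ORC bug itself. */
class SubListRangeDemo {
  static String trySubList(List<Long> positions, int from, int to) {
    try {
      // Wrapping mirrors the Collections$UnmodifiableRandomAccessList frame in the trace.
      List<Long> view = Collections.unmodifiableList(positions).subList(from, to);
      return "ok, size " + view.size();
    } catch (IllegalArgumentException e) {
      // subList throws IllegalArgumentException when from > to (and to is within bounds).
      return e.getClass().getSimpleName() + ": " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    // prints "IllegalArgumentException: fromIndex(4) > toIndex(0)"
    System.out.println(trySubList(new ArrayList<>(), 4, 0));
  }
}
```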





[jira] [Work logged] (HIVE-25637) Hive on Tez: inserting data failing into the non native hive external table managed by kafka storage handler

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25637?focusedWorklogId=707187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707187
 ]

ASF GitHub Bot logged work on HIVE-25637:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 00:13
Start Date: 12/Jan/22 00:13
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2753:
URL: https://github.com/apache/hive/pull/2753


   




Issue Time Tracking
---

Worklog Id: (was: 707187)
Time Spent: 50m  (was: 40m)

> Hive on Tez: inserting data failing into the non native hive external table 
> managed by kafka storage handler 
> -
>
> Key: HIVE-25637
> URL: https://issues.apache.org/jira/browse/HIVE-25637
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a follow-up to HIVE-23408; the repro is below:
> {code}
> CREATE EXTERNAL TABLE `kafka_table`( 
>   `timestamp` timestamp COMMENT 'from deserializer',
>   `page` string COMMENT 'from deserializer', 
>   `newpage` boolean COMMENT 'from deserializer', 
>   `added` int COMMENT 'from deserializer',   
>   `deleted` bigint COMMENT 'from deserializer',  
>   `delta` double COMMENT 'from deserializer')
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.kafka.KafkaSerDe'  
> STORED BY
>   'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
> WITH SERDEPROPERTIES (   
>   'serialization.format'='1')
> LOCATION
>   'hdfs://lbodorkafkaunsec-2.lbodorkafkaunsec.root.hwx.site:8020/warehouse/tablespace/external/hive/kafka_table'
> TBLPROPERTIES (
>   'bucketing_version'='2',
>   'hive.kafka.max.retries'='6',
>   'hive.kafka.metadata.poll.timeout.ms'='3',
>   'hive.kafka.optimistic.commit'='false',
>   'hive.kafka.poll.timeout.ms'='5000',
>   'kafka.bootstrap.servers'='lbodorkafkaunsec-1.lbodorkafkaunsec.root.hwx.site:9092,lbodorkafkaunsec-2.lbodorkafkaunsec.root.hwx.site:9092,lbodorkafkaunsec-3.lbodorkafkaunsec.root.hwx.site:9092',
>   'kafka.serde.class'='org.apache.hadoop.hive.serde2.JsonSerDe',
>   'kafka.topic'='hit-topic-1',   
>   'kafka.write.semantic'='AT_LEAST_ONCE');
> -- works due to HIVE-23408
> SELECT COUNT(*) FROM kafka_table WHERE `__timestamp` > 1000 * to_unix_timestamp(CURRENT_TIMESTAMP - interval '10' MINUTES);
> -- fails
> insert into kafka_table values(NULL, 'comment', 0, 1, 2, 3.0, NULL, NULL, NULL, NULL);
> {code}
> The exception I get:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.kafkaesque.common.KafkaException: Failed to construct kafka producer
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:829)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1004)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>   at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:133)
>   at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45)
>   at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:110)
>   at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFInline.process(GenericUDTFInline.java:64)
>   at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>   at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
>   at 
>

[jira] [Work logged] (HIVE-24805) Compactor: Initiator shouldn't fetch table details again and again for partitioned tables

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24805?focusedWorklogId=707186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707186
 ]

ASF GitHub Bot logged work on HIVE-24805:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 00:10
Start Date: 12/Jan/22 00:10
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #2906:
URL: https://github.com/apache/hive/pull/2906#discussion_r782617149



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
##
@@ -164,7 +164,7 @@ public void run() {
 
   for (CompactionInfo ci : potentials) {
 try {
-  Table t = resolveTable(ci);
+  Table t = resolveTableAndCache(ci);

Review comment:
   The issue is more pronounced in the following code path:
   
   {code}
   Set potentials = compactionExecutor.submit(() ->
       txnHandler.findPotentialCompactions(abortedThreshold, abortedTimeThreshold, compactionInterval)
           .parallelStream()
           .filter(ci -> isEligibleForCompaction(ci, currentCompactions, skipDBs, skipTables))
           .collect(Collectors.toSet())).get();
   {code}
   
   i.e. in `isEligibleForCompaction`.
   
   Instead of introducing the cache and invalidating it later, is it possible to create a simple hashmap in the method itself and use it as needed? (e.g. if there are multiple partitions, it would do just one lookup for the table, and the remaining partitions would reuse the value in the map).
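A sketch of the reviewer's suggestion, under stated assumptions: `Candidate` is a hypothetical stand-in for `CompactionInfo`, and `fetchTable` stands in for the metastore lookup that `resolveTable` performs; neither the names nor the signatures are Hive's. The map is created per call, so there is nothing to invalidate, and a `ConcurrentHashMap` is used because the quoted code filters with `parallelStream()`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch: a lookup map scoped to one Initiator cycle, so each table is fetched once. */
class PerCycleTableLookup {
  /** Hypothetical stand-in for CompactionInfo, reduced to the fields this sketch needs. */
  static final class Candidate {
    final String db;
    final String table;
    final String partition;

    Candidate(String db, String table, String partition) {
      this.db = db;
      this.table = table;
      this.partition = partition;
    }

    String tableKey() {
      return db + "." + table;
    }
  }

  /** Filters candidates, invoking fetchTable at most once per db.table within this call. */
  static <T> List<Candidate> filterEligible(List<Candidate> candidates,
      Function<String, T> fetchTable, BiPredicate<Candidate, T> isEligible) {
    // Scoped to this call: nothing outlives one Initiator cycle, so nothing needs invalidating.
    Map<String, T> tablesSeen = new ConcurrentHashMap<>();
    return candidates.parallelStream()
        .filter(ci -> isEligible.test(ci, tablesSeen.computeIfAbsent(ci.tableKey(), fetchTable)))
        .collect(Collectors.toList());
  }
}
```

With three partitions of one table and one partition of another, `fetchTable` runs twice rather than four times. One trade-off: `ConcurrentHashMap.computeIfAbsent` may block other updates to the map while a lookup is in progress, which is what guarantees a single fetch per table but can briefly stall other threads.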






Issue Time Tracking
---

Worklog Id: (was: 707186)
Time Spent: 2h  (was: 1h 50m)

> Compactor: Initiator shouldn't fetch table details again and again for 
> partitioned tables
> -
>
> Key: HIVE-24805
> URL: https://issues.apache.org/jira/browse/HIVE-24805
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Rajesh Balamohan
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The Initiator shouldn't fetch table details for every partition. When there
> are a large number of databases/tables, it takes a lot of time for the Initiator to
> complete its initial iteration, and the load on the DB also goes up.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L129
> https://github.com/apache/hive/blob/64bb52316f19426ebea0087ee15e282cbde1d852/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L456
> For all of the following partitions, the table details are the same. However,
> the Initiator fetches them from HMS again and again.
> {noformat}
> 2021-02-22 08:13:16,106 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2451899
> 2021-02-22 08:13:16,124 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2451830
> 2021-02-22 08:13:16,140 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452586
> 2021-02-22 08:13:16,149 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452698
> 2021-02-22 08:13:16,158 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452063
> {noformat}




[jira] [Work logged] (HIVE-25829) Tez exec mode support for credential provider for jobs

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25829?focusedWorklogId=707070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707070
 ]

ASF GitHub Bot logged work on HIVE-25829:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 19:06
Start Date: 11/Jan/22 19:06
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2911:
URL: https://github.com/apache/hive/pull/2911#issuecomment-1010274209


   > I would expect to add the TEZ case as part of 
https://github.com/apache/hive/blob/master/ql/src/test/org/apache/hadoop/hive/ql/exec/TestHiveCredentialProviders.java
   > 
   > This can also be done as a follow-up; the changes look straightforward to me.
   
   Absolutely makes sense; I added corresponding cases to the existing unit tests.
   If it passes in precommit, I'll merge this.




Issue Time Tracking
---

Worklog Id: (was: 707070)
Time Spent: 2.5h  (was: 2h 20m)

> Tez exec mode support for credential provider for jobs
> --
>
> Key: HIVE-25829
> URL: https://issues.apache.org/jira/browse/HIVE-25829
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ádám Szita
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> HIVE-14822 introduced support for securely forwarding a job-specific Java
> credential store path and the corresponding password to the backend executors.
> This is currently implemented only for the MR2 and Spark execution engines. I
> propose we extend this feature by adding Tez mode to that list.




[jira] [Work logged] (HIVE-25829) Tez exec mode support for credential provider for jobs

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25829?focusedWorklogId=707067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707067
 ]

ASF GitHub Bot logged work on HIVE-25829:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 18:58
Start Date: 11/Jan/22 18:58
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2911:
URL: https://github.com/apache/hive/pull/2911#discussion_r782442107



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java
##
@@ -244,6 +256,16 @@ public static String getJobCredentialProviderPassword(Configuration conf) {
 return null;
   }
 
+  /**
+   * Sets a "keyName=newKeyValue" pair to a jobConf to a given property.
+   * If the property is empty, is simply inserts keyName=newKeyValue,

Review comment:
   going to fix it before committing
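For readers unfamiliar with this kind of helper, here is a sketch of the behavior the Javadoc starts to describe. The quoted Javadoc is cut off after the empty-property case, so the replace-or-append behavior below is an assumption, as is the property format (comma-separated `KEY=VALUE` pairs); `KeyValuePropertyUpdater.setKeyValue` is a hypothetical name, not the method under review.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: sets KEY=VALUE inside a property holding comma-separated KEY=VALUE pairs. */
class KeyValuePropertyUpdater {
  static String setKeyValue(String current, String key, String value) {
    String pair = key + "=" + value;
    if (current == null || current.trim().isEmpty()) {
      return pair; // empty property: the pair becomes the whole value
    }
    List<String> entries = new ArrayList<>();
    boolean replaced = false;
    for (String entry : current.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.startsWith(key + "=")) {
        if (!replaced) {
          entries.add(pair); // replace the first existing entry for this key, keeping its position
          replaced = true;
        }
        // later duplicates of the same key are dropped
      } else if (!trimmed.isEmpty()) {
        entries.add(trimmed); // keep unrelated entries unchanged
      }
    }
    if (!replaced) {
      entries.add(pair); // key not present yet: append it
    }
    return String.join(",", entries);
  }
}
```

Values that themselves contain commas would break this simple split; a real implementation would need to handle or reject them.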






Issue Time Tracking
---

Worklog Id: (was: 707067)
Time Spent: 2h 20m  (was: 2h 10m)

> Tez exec mode support for credential provider for jobs
> --
>
> Key: HIVE-25829
> URL: https://issues.apache.org/jira/browse/HIVE-25829
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ádám Szita
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> HIVE-14822 introduced support for securely forwarding a job-specific Java 
> credential store path and the corresponding password to the backend executors. 
> This is currently implemented only for the MR2 and Spark execution engines. I 
> propose we extend this feature by adding Tez mode to that list.




[jira] [Work logged] (HIVE-25829) Tez exec mode support for credential provider for jobs

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25829?focusedWorklogId=707040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707040
 ]

ASF GitHub Bot logged work on HIVE-25829:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 18:21
Start Date: 11/Jan/22 18:21
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #2911:
URL: https://github.com/apache/hive/pull/2911#discussion_r782417105



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java
##
@@ -244,6 +256,16 @@ public static String getJobCredentialProviderPassword(Configuration conf) {
 return null;
   }
 
+  /**
+   * Sets a "keyName=newKeyValue" pair to a jobConf to a given property.
+   * If the property is empty, is simply inserts keyName=newKeyValue,

Review comment:
   typo: it simply






Issue Time Tracking
---

Worklog Id: (was: 707040)
Time Spent: 2h 10m  (was: 2h)

> Tez exec mode support for credential provider for jobs
> --
>
> Key: HIVE-25829
> URL: https://issues.apache.org/jira/browse/HIVE-25829
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ádám Szita
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> HIVE-14822 introduced support for securely forwarding a job-specific Java 
> credential store path and the corresponding password to the backend executors. 
> This is currently implemented only for the MR2 and Spark execution engines. I 
> propose we extend this feature by adding Tez mode to that list.




[jira] [Updated] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25859:

Fix Version/s: 4.0.0

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Ayush Saxena
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: hive.log
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> repro
> {code}
> mvn clean install -Dtest.output.overwrite=true -Pitests,hadoop-2 -Denforcer.skip=true -pl itests/qtest -pl itests/util -am -Dtest=TestMiniLlapLocalCliDriver -Dqfile=load_non_hdfs_path.q
> {code}
> {code}
> Caused by: java.io.FileNotFoundException: File file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc does not exist
>   at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>   at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
>   ... 64 more
> {code}
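For context on the file reported missing: `.1.txt.crc` follows the naming convention that Hadoop's `ChecksumFileSystem` (present in the trace) uses for the hidden checksum sidecar it keeps next to each data file on local paths, namely `.` + file name + `.crc`. The sketch below only reproduces that naming rule with `java.nio.file`, so it runs without Hadoop on the classpath; the class and method names are illustrative, and it makes no claim about the root cause fixed in PR #2939.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class CrcSidecarSketch {
  // Same naming rule as Hadoop's ChecksumFileSystem#getChecksumFile:
  // the checksum of <dir>/<name> is stored in <dir>/.<name>.crc
  static Path checksumFileFor(Path dataFile) {
    String crcName = "." + dataFile.getFileName() + ".crc";
    Path parent = dataFile.getParent();
    return parent == null ? Paths.get(crcName) : parent.resolve(crcName);
  }

  // True if a file name looks like such a sidecar, e.g. when filtering a directory listing.
  static boolean isChecksumSidecar(Path file) {
    String name = file.getFileName().toString();
    return name.startsWith(".") && name.endsWith(".crc");
  }

  public static void main(String[] args) {
    Path data = Paths.get("target", "tmp", "non_hdfs_path", "1.txt");
    Path crc = checksumFileFor(data);
    System.out.println(crc);
    System.out.println(isChecksumSidecar(crc));
  }
}
```

A copy routine that lists a local directory will see these sidecars alongside the data files, which is why code working with raw local listings often filters them out.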




[jira] [Commented] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17473007#comment-17473007
 ] 

László Bodor commented on HIVE-25859:
-

Thanks for the fix [~ayushtkn], I assigned the jira to you for credit.
I'm not resolving this at the moment, in case we want to double-check why 
exactly it hadn't failed until now.


> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Ayush Saxena
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: hive.log
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




[jira] [Assigned] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-25859:
---

Assignee: Ayush Saxena

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Ayush Saxena
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: hive.log
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?focusedWorklogId=707038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707038
 ]

ASF GitHub Bot logged work on HIVE-25859:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 18:17
Start Date: 11/Jan/22 18:17
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2939:
URL: https://github.com/apache/hive/pull/2939


   




Issue Time Tracking
---

Worklog Id: (was: 707038)
Time Spent: 20m  (was: 10m)

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: hive.log
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?focusedWorklogId=707029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707029
 ]

ASF GitHub Bot logged work on HIVE-25859:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 18:05
Start Date: 11/Jan/22 18:05
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2939:
URL: https://github.com/apache/hive/pull/2939#issuecomment-1010225129


   I'm about to approve this PR: I tested the patch locally and it works. I can 
also see TestMetastoreVersion failing, which is unrelated (and passes locally). 
Let's unblock master as soon as possible.
   Thanks a lot @ayushtkn for the quick fix!
   +1




Issue Time Tracking
---

Worklog Id: (was: 707029)
Remaining Estimate: 0h
Time Spent: 10m

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
> Attachments: hive.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25859:
--
Labels: pull-request-available  (was: )

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: hive.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HIVE-25829) Tez exec mode support for credential provider for jobs

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25829?focusedWorklogId=707022=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707022
 ]

ASF GitHub Bot logged work on HIVE-25829:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 17:55
Start Date: 11/Jan/22 17:55
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2911:
URL: https://github.com/apache/hive/pull/2911#issuecomment-1010217059


   Can I get an approval or rejection on this one? It looks like there is 
extreme customer pressure on this case.
   
   test failures are all due to https://issues.apache.org/jira/browse/HIVE-25859




Issue Time Tracking
---

Worklog Id: (was: 707022)
Time Spent: 2h  (was: 1h 50m)

> Tez exec mode support for credential provider for jobs
> --
>
> Key: HIVE-25829
> URL: https://issues.apache.org/jira/browse/HIVE-25829
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ádám Szita
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HIVE-14822 introduced support for securely forwarding a job-specific Java 
> credential store path and the corresponding password to the backend executors. 
> This is currently implemented only for the MR2 and Spark execution engines. I 
> propose we extend this feature by adding Tez mode to that list.




[jira] [Work logged] (HIVE-24805) Compactor: Initiator shouldn't fetch table details again and again for partitioned tables

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24805?focusedWorklogId=706982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706982
 ]

ASF GitHub Bot logged work on HIVE-24805:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 16:33
Start Date: 11/Jan/22 16:33
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on pull request #2906:
URL: https://github.com/apache/hive/pull/2906#issuecomment-1010144652


   @asinkovits, please remove the partition cache from here




Issue Time Tracking
---

Worklog Id: (was: 706982)
Time Spent: 1h 50m  (was: 1h 40m)

> Compactor: Initiator shouldn't fetch table details again and again for 
> partitioned tables
> -
>
> Key: HIVE-24805
> URL: https://issues.apache.org/jira/browse/HIVE-24805
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Rajesh Balamohan
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The Initiator shouldn't fetch the table details separately for every partition. When there 
> are a large number of databases/tables, it takes a lot of time for the Initiator to 
> complete its initial iteration, and the load on the DB also goes up.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L129
> https://github.com/apache/hive/blob/64bb52316f19426ebea0087ee15e282cbde1d852/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L456
> For all the following partitions, table details would be the same. However, 
> it ends up fetching table details from HMS again and again.
> {noformat}
> 2021-02-22 08:13:16,106 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2451899
> 2021-02-22 08:13:16,124 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2451830
> 2021-02-22 08:13:16,140 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452586
> 2021-02-22 08:13:16,149 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452698
> 2021-02-22 08:13:16,158 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452063
> {noformat}
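The improvement asked for here amounts to memoizing the table lookup within one Initiator pass, so that all partitions of a table share a single metastore fetch. A minimal sketch of that idea, assuming a hypothetical `fetchFromMetastore` function standing in for the HMS call and a cache that is created fresh for each Initiator cycle (so table details are not reused across cycles); `TableDetailsCache` and `TableInfo` are illustrative names, not Hive classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TableDetailsCache {
  // Hypothetical stand-in for the table object returned by the metastore.
  static final class TableInfo {
    final String fullName;
    TableInfo(String fullName) { this.fullName = fullName; }
  }

  private final Map<String, TableInfo> cache = new HashMap<>();
  private final Function<String, TableInfo> fetchFromMetastore;

  TableDetailsCache(Function<String, TableInfo> fetchFromMetastore) {
    this.fetchFromMetastore = fetchFromMetastore;
  }

  // Calls the metastore only the first time "db.table" is requested in this cycle.
  TableInfo get(String dbName, String tableName) {
    return cache.computeIfAbsent(dbName + "." + tableName, fetchFromMetastore);
  }

  public static void main(String[] args) {
    int[] fetches = {0};
    TableDetailsCache tables = new TableDetailsCache(name -> {
      fetches[0]++; // count simulated HMS round trips
      return new TableInfo(name);
    });
    List<String> partitions = List.of(
        "sr_returned_date_sk=2451899", "sr_returned_date_sk=2451830", "sr_returned_date_sk=2452586");
    for (String partition : partitions) {
      TableInfo t = tables.get("tpcds_bin_partitioned_orc_1000", "store_returns_tmp2");
      System.out.println("Checking to see if we should compact " + t.fullName + "." + partition);
    }
    System.out.println("metastore fetches: " + fetches[0]);
  }
}
```

Scoping the cache to a single cycle is a deliberate trade-off in this sketch: it removes the per-partition round trips while still picking up table changes on the next cycle.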




[jira] [Resolved] (HIVE-24029) MV fails for queries with subqueries

2022-01-11 Thread Aman Sinha (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved HIVE-24029.
---
Resolution: Cannot Reproduce

Resolving based on the above.

> MV fails for queries with subqueries
> 
>
> Key: HIVE-24029
> URL: https://issues.apache.org/jira/browse/HIVE-24029
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Priority: Major
>
> {noformat}
>  explain create materialized view q16 as select  
>count(distinct cs_order_number) as `order count`
>   ,sum(cs_ext_ship_cost) as `total shipping cost`
>   ,sum(cs_net_profit) as `total net profit`
> from
>catalog_sales cs1
>   ,date_dim
>   ,customer_address
>   ,call_center
> where
> d_date between '1999-4-01' and 
>(cast('1999-4-01' as date) + 60 days)
> and cs1.cs_ship_date_sk = d_date_sk
> and cs1.cs_ship_addr_sk = ca_address_sk
> and ca_state = 'IL'
> and cs1.cs_call_center_sk = cc_call_center_sk
> and cc_county in ('Richland County','Bronx County','Maverick County','Mesa County',
>   'Raleigh County'
> )
> and exists (select *
> from catalog_sales cs2
> where cs1.cs_order_number = cs2.cs_order_number
>   and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
> and not exists(select *
>from catalog_returns cr1
>where cs1.cs_order_number = cr1.cr_order_number)
> {noformat}
> Error
> {noformat}
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10249]: Line 24:8 Unsupported SubQuery Expression 'cr_order_number': Only 1 
> SubQuery expression is supported. (state=42000,code=10249)
> {noformat}




[jira] [Resolved] (HIVE-24028) MV query fails with CalciteViewSemanticException

2022-01-11 Thread Aman Sinha (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved HIVE-24028.
---
Resolution: Not A Bug

Resolving based on the above.

> MV query fails with CalciteViewSemanticException
> 
>
> Key: HIVE-24028
> URL: https://issues.apache.org/jira/browse/HIVE-24028
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Reporter: Rajesh Balamohan
>Priority: Major
>
> {noformat}
> explain create materialized view qmv39 as 
> with inv as
> (select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy
>,stdev,mean, case mean when 0 then null else stdev/mean end cov
>  from(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy
> ,stddev_samp(inv_quantity_on_hand) 
> stdev,avg(inv_quantity_on_hand) mean
>   from inventory
>   ,item
>   ,warehouse
>   ,date_dim
>   where inv_item_sk = i_item_sk
> and inv_warehouse_sk = w_warehouse_sk
> and inv_date_sk = d_date_sk
> and d_year =2000
>   group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy) foo
>  where case mean when 0 then 0 else stdev/mean end > 1)
> select inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean, inv1.cov
> ,inv2.w_warehouse_sk,inv2.i_item_sk,inv2.d_moy,inv2.mean, inv2.cov
> from inv inv1,inv inv2
> where inv1.i_item_sk = inv2.i_item_sk
>   and inv1.w_warehouse_sk =  inv2.w_warehouse_sk
>   and inv1.d_moy=2
>   and inv2.d_moy=2+1
> {noformat}
> {noformat}
> Error: Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteViewSemanticException: 
> Duplicate column name: w_warehouse_sk (state=42000,code=4)
> {noformat}
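The reported clash comes from the select list projecting `inv1.w_warehouse_sk` and `inv2.w_warehouse_sk` (and likewise `i_item_sk`, `d_moy`, `mean` and `cov`) without aliases, so both sides end up with the same output column name, while a materialized view, like any table, needs unique column names. Aliasing one side (for example `inv2.w_warehouse_sk as w_warehouse_sk2`) avoids the clash. The Java sketch below only illustrates the naming rule assumed here (an unaliased `qualifier.column` reference is named `column`, an `expr as alias` is named `alias`); it is not Hive's actual name-derivation code and handles only these two simple forms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class DuplicateColumnCheck {
  // Assumed rule: "expr as alias" -> alias; "qualifier.column" -> column.
  static String outputName(String selectExpr) {
    String expr = selectExpr.trim();
    int as = expr.toLowerCase(Locale.ROOT).lastIndexOf(" as ");
    if (as >= 0) {
      return expr.substring(as + 4).trim();
    }
    int dot = expr.lastIndexOf('.');
    return dot < 0 ? expr : expr.substring(dot + 1);
  }

  // Returns output names that occur more than once, in first-seen order.
  static List<String> duplicateNames(List<String> selectList) {
    Set<String> seen = new HashSet<>();
    Set<String> dups = new LinkedHashSet<>();
    for (String expr : selectList) {
      String name = outputName(expr);
      if (!seen.add(name)) {
        dups.add(name);
      }
    }
    return new ArrayList<>(dups);
  }

  public static void main(String[] args) {
    List<String> select = List.of(
        "inv1.w_warehouse_sk", "inv1.i_item_sk", "inv1.d_moy", "inv1.mean", "inv1.cov",
        "inv2.w_warehouse_sk", "inv2.i_item_sk", "inv2.d_moy", "inv2.mean", "inv2.cov");
    System.out.println(duplicateNames(select));
  }
}
```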




[jira] [Work logged] (HIVE-25801) Custom queue settings is not honoured by Query based compaction StatsUpdater

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25801?focusedWorklogId=706898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706898
 ]

ASF GitHub Bot logged work on HIVE-25801:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:43
Start Date: 11/Jan/22 14:43
Worklog Time Spent: 10m 
  Work Description: veghlaci05 commented on a change in pull request #2879:
URL: https://github.com/apache/hive/pull/2879#discussion_r782214222



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3257,11 +3257,23 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 HIVE_COMPACTOR_WAIT_TIMEOUT("hive.compactor.wait.timeout", 30L, "Time out in "
 + "milliseconds for blocking compaction. It's value has to be higher than 2000 milliseconds. "),
 
-HIVE_MR_COMPACTOR_GATHER_STATS("hive.mr.compactor.gather.stats", true, "If set to true MAJOR compaction " +
+/**
+ * @deprecated This config value is honoured by the MR based compaction only.
+ * Use the {@link HiveConf.ConfVars#HIVE_COMPACTOR_GATHER_STATS}
+ * config instead which is honoured by both the MR and Query based compaction.
+ */
+@Deprecated
+HIVE_MR_COMPACTOR_GATHER_STATS("hive.mr.compactor.gather.stats", false, "If set to true MAJOR compaction " +
 "will gather stats if there are stats already associated with the table/partition.\n" +
 "Turn this off to save some resources and the stats are not used anyway.\n" +
 "Works only for MR based compaction, CRUD based compaction uses hive.stats.autogather."),
 
+HIVE_COMPACTOR_GATHER_STATS("hive.compactor.gather.stats", true, "If set to true MAJOR compaction " +

Review comment:
   Yes, this is the current behavior






Issue Time Tracking
---

Worklog Id: (was: 706898)
Time Spent: 3.5h  (was: 3h 20m)

> Custom queue settings is not honoured by Query based compaction StatsUpdater
> 
>
> Key: HIVE-25801
> URL: https://issues.apache.org/jira/browse/HIVE-25801
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The {{hive.compactor.job.queue}} config limits the resources available for 
> compaction, so users can limit the effects of compaction on the cluster. 
> However, this setting does not affect stats collection, which uses the Driver.
> HIVE-25595 addresses the above issue for MR-based compaction. We need to 
> do the same for Query-based compaction.
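One plausible way to honour `hive.compactor.job.queue` for the stats step is to copy it into the queue properties of the configuration handed to the Driver before the stats query runs. The sketch below models configurations as plain `Map`s to stay self-contained; `mapreduce.job.queuename` and `tez.queue.name` are the standard MapReduce and Tez queue properties, but whether the HIVE-25595/HIVE-25801 patches set exactly these is an assumption, and `statsDriverConf` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

class CompactorQueueSketch {
  static final String COMPACTOR_QUEUE = "hive.compactor.job.queue";

  // Returns a copy of the compactor's configuration for the stats Driver, with the
  // compactor queue (if set) applied to the MapReduce and Tez queue properties.
  // The input map is left unchanged.
  static Map<String, String> statsDriverConf(Map<String, String> compactorConf) {
    Map<String, String> conf = new HashMap<>(compactorConf);
    String queue = compactorConf.get(COMPACTOR_QUEUE);
    if (queue != null && !queue.isEmpty()) {
      conf.put("mapreduce.job.queuename", queue);
      conf.put("tez.queue.name", queue);
    }
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> compactorConf = new HashMap<>();
    compactorConf.put(COMPACTOR_QUEUE, "compaction");
    System.out.println(statsDriverConf(compactorConf).get("tez.queue.name"));
  }
}
```

Working on a copy keeps the queue override local to the stats Driver instead of leaking into the shared configuration.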




[jira] [Work logged] (HIVE-25801) Custom queue settings is not honoured by Query based compaction StatsUpdater

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25801?focusedWorklogId=706888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706888
 ]

ASF GitHub Bot logged work on HIVE-25801:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:28
Start Date: 11/Jan/22 14:28
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2879:
URL: https://github.com/apache/hive/pull/2879#discussion_r782198917



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -3257,11 +3257,23 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 HIVE_COMPACTOR_WAIT_TIMEOUT("hive.compactor.wait.timeout", 30L, "Time out in "
 + "milliseconds for blocking compaction. It's value has to be higher than 2000 milliseconds. "),
 
-HIVE_MR_COMPACTOR_GATHER_STATS("hive.mr.compactor.gather.stats", true, "If set to true MAJOR compaction " +
+/**
+ * @deprecated This config value is honoured by the MR based compaction only.
+ * Use the {@link HiveConf.ConfVars#HIVE_COMPACTOR_GATHER_STATS}
+ * config instead which is honoured by both the MR and Query based compaction.
+ */
+@Deprecated
+HIVE_MR_COMPACTOR_GATHER_STATS("hive.mr.compactor.gather.stats", false, "If set to true MAJOR compaction " +
 "will gather stats if there are stats already associated with the table/partition.\n" +
 "Turn this off to save some resources and the stats are not used anyway.\n" +
 "Works only for MR based compaction, CRUD based compaction uses hive.stats.autogather."),
 
+HIVE_COMPACTOR_GATHER_STATS("hive.compactor.gather.stats", true, "If set to true MAJOR compaction " +

Review comment:
   Do we re-calculate stats only in case of MAJOR compaction?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 706888)
Time Spent: 3h 20m  (was: 3h 10m)

> Custom queue settings is not honoured by Query based compaction StatsUpdater
> 
>
> Key: HIVE-25801
> URL: https://issues.apache.org/jira/browse/HIVE-25801
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {{hive.compactor.job.queue}} config limits resources available for 
> compaction, so users can limit the effects of compaction on the cluster. 
> However, this setting does not affect stats collection, which uses the Driver.
> HIVE-25595 is addressing the above issue for MR-based compaction. We need to 
> incorporate the same thing for the Query-based compaction.




[jira] [Work logged] (HIVE-25801) Custom queue settings is not honoured by Query based compaction StatsUpdater

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25801?focusedWorklogId=706882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706882
 ]

ASF GitHub Bot logged work on HIVE-25801:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:22
Start Date: 11/Jan/22 14:22
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2879:
URL: https://github.com/apache/hive/pull/2879#discussion_r782193090



##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java
##
@@ -833,10 +833,6 @@ public void testCompactStatsGather() throws Exception {
 Assert.assertEquals("Unexpected number of compactions in history", 1, resp.getCompactsSize());
 Assert.assertEquals("Unexpected 0 compaction state", TxnStore.CLEANING_RESPONSE, resp.getCompacts().get(0).getState());
 
Assert.assertTrue(resp.getCompacts().get(0).getHadoopJobId().startsWith("job_local"));
-
-//now check that stats were updated
-map = hms.getPartitionColumnStatistics("default","T", partNames, colNames, Constants.HIVE_ENGINE);

Review comment:
   should we check basic statistics here instead of colStat?
   






Issue Time Tracking
---

Worklog Id: (was: 706882)
Time Spent: 3h 10m  (was: 3h)

> Custom queue settings is not honoured by Query based compaction StatsUpdater
> 
>
> Key: HIVE-25801
> URL: https://issues.apache.org/jira/browse/HIVE-25801
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> {{hive.compactor.job.queue}} config limits resources available for 
> compaction, so users can limit the effects of compaction on the cluster. 
> However, this setting does not affect stats collection, which uses the Driver.
> HIVE-25595 is addressing the above issue for MR-based compaction. We need to 
> incorporate the same thing for the Query-based compaction.




[jira] [Work logged] (HIVE-25801) Custom queue settings is not honoured by Query based compaction StatsUpdater

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25801?focusedWorklogId=706881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706881
 ]

ASF GitHub Bot logged work on HIVE-25801:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:21
Start Date: 11/Jan/22 14:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2879:
URL: https://github.com/apache/hive/pull/2879#discussion_r782193090



##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java
##
@@ -833,10 +833,6 @@ public void testCompactStatsGather() throws Exception {
 Assert.assertEquals("Unexpected number of compactions in history", 1, resp.getCompactsSize());
 Assert.assertEquals("Unexpected 0 compaction state", TxnStore.CLEANING_RESPONSE, resp.getCompacts().get(0).getState());
 
Assert.assertTrue(resp.getCompacts().get(0).getHadoopJobId().startsWith("job_local"));
-
-//now check that stats were updated
-map = hms.getPartitionColumnStatistics("default","T", partNames, colNames, Constants.HIVE_ENGINE);

Review comment:
   should we check basic statistics here instead of colStat?
   






Issue Time Tracking
---

Worklog Id: (was: 706881)
Time Spent: 3h  (was: 2h 50m)

> Custom queue settings is not honoured by Query based compaction StatsUpdater
> 
>
> Key: HIVE-25801
> URL: https://issues.apache.org/jira/browse/HIVE-25801
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {{hive.compactor.job.queue}} config limits resources available for 
> compaction, so users can limit the effects of compaction on the cluster. 
> However, this setting does not affect stats collection, which uses the Driver.
> HIVE-25595 is addressing the above issue for MR-based compaction. We need to 
> incorporate the same thing for the Query-based compaction.




[jira] [Work logged] (HIVE-25801) Custom queue settings is not honoured by Query based compaction StatsUpdater

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25801?focusedWorklogId=706875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706875
 ]

ASF GitHub Bot logged work on HIVE-25801:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:17
Start Date: 11/Jan/22 14:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2879:
URL: https://github.com/apache/hive/pull/2879#discussion_r782189125



##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
##
@@ -335,153 +330,144 @@ public void schemaEvolutionAddColDynamicPartitioningUpdate() throws Exception {
   }
 
   /**
-   * After each major compaction, stats need to be updated on each column of the
-   * table/partition which previously had stats.
-   * 1. create a bucketed ORC backed table (Orc is currently required by ACID)
-   * 2. populate 2 partitions with data
+   * After each major compaction, stats need to be updated on the table
+   * 1. create a partitioned ORC backed table (Orc is currently required by ACID)
+   * 2. populate with data
* 3. compute stats
-   * 4. insert some data into the table using StreamingAPI
-   * 5. Trigger major compaction (which should update stats)
-   * 6. check that stats have been updated
+   * 4. Trigger major compaction on one of the partitions (which should update stats)
+   * 5. check that stats have been updated for that partition only
*
* @throws Exception todo:
-   *   2. add non-partitioned test
*   4. add a test with sorted table?
*/
   @Test
   public void testStatsAfterCompactionPartTbl() throws Exception {
 //as of (8/27/2014) Hive 0.14, ACID/Orc requires HiveInputFormat
+String dbName = "default";
 String tblName = "compaction_test";
-String tblNameStg = tblName + "_stg";
-List colNames = Arrays.asList("a", "b");
 executeStatementOnDriver("drop table if exists " + tblName, driver);
-executeStatementOnDriver("drop table if exists " + tblNameStg, driver);
 executeStatementOnDriver("CREATE TABLE " + tblName + "(a INT, b STRING) " +
   " PARTITIONED BY(bkt INT)" +
   " CLUSTERED BY(a) INTO 4 BUCKETS" + //currently ACID requires table to be bucketed
   " STORED AS ORC  TBLPROPERTIES ('transactional'='true')", driver);
-executeStatementOnDriver("CREATE EXTERNAL TABLE " + tblNameStg + "(a INT, b STRING)" +
-  " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'" +
-  " STORED AS TEXTFILE" +
-  " LOCATION '" + stagingFolder.newFolder().toURI().getPath() + "'", driver);
-
-executeStatementOnDriver("load data local inpath '" + BASIC_FILE_NAME +
-  "' overwrite into table " + tblNameStg, driver);
-execSelectAndDumpData("select * from " + tblNameStg, driver, "Dumping data for " +
-  tblNameStg + " after load:");
-executeStatementOnDriver("FROM " + tblNameStg +
-  " INSERT INTO TABLE " + tblName + " PARTITION(bkt=0) " +
-  "SELECT a, b where a < 2", driver);
-executeStatementOnDriver("FROM " + tblNameStg +
-  " INSERT INTO TABLE " + tblName + " PARTITION(bkt=1) " +
-  "SELECT a, b where a >= 2", driver);
+executeStatementOnDriver("INSERT INTO TABLE " + tblName + " PARTITION(bkt=0)" +
+  " values(55, 'London')", driver);
+executeStatementOnDriver("INSERT INTO TABLE " + tblName + " PARTITION(bkt=0)" +
+  " values(56, 'Paris')", driver);
+executeStatementOnDriver("INSERT INTO TABLE " + tblName + " PARTITION(bkt=1)" +
+  " values(57, 'Budapest')", driver);
+executeStatementOnDriver("INSERT INTO TABLE " + tblName + " PARTITION(bkt=1)" +
+" values(58, 'Milano')", driver);
 execSelectAndDumpData("select * from " + tblName, driver, "Dumping data for " +
   tblName + " after load:");
 
 TxnStore txnHandler = TxnUtils.getTxnStore(conf);
-CompactionInfo ci = new CompactionInfo("default", tblName, "bkt=0", CompactionType.MAJOR);
-Table table = msClient.getTable("default", tblName);
-LOG.debug("List of stats columns before analyze Part1: " + txnHandler.findColumnsWithStats(ci));
-Worker.StatsUpdater su = Worker.StatsUpdater.init(ci, colNames, conf,
-  System.getProperty("user.name"), CompactorUtil.getCompactorJobQueueName(conf, ci, table));
-su.gatherStats();//compute stats before compaction
-LOG.debug("List of stats columns after analyze Part1: " + txnHandler.findColumnsWithStats(ci));
-
-CompactionInfo ciPart2 = new CompactionInfo("default", tblName, "bkt=1", CompactionType.MAJOR);
-LOG.debug("List of stats columns before analyze Part2: " + txnHandler.findColumnsWithStats(ci));
-su = Worker.StatsUpdater.init(ciPart2, colNames, conf, System.getProperty("user.name"),
-CompactorUtil.getCompactorJobQueueName(conf, ciPart2,

[jira] [Work logged] (HIVE-25801) Custom queue settings is not honoured by Query based compaction StatsUpdater

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25801?focusedWorklogId=706876=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706876
 ]

ASF GitHub Bot logged work on HIVE-25801:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:17
Start Date: 11/Jan/22 14:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2879:
URL: https://github.com/apache/hive/pull/2879#discussion_r782189473



##
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
##
@@ -90,7 +67,25 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import com.google.common.collect.Lists;
+import java.io.File;

Review comment:
   The import structure is not restored.






Issue Time Tracking
---

Worklog Id: (was: 706876)
Time Spent: 2h 50m  (was: 2h 40m)

> Custom queue settings is not honoured by Query based compaction StatsUpdater
> 
>
> Key: HIVE-25801
> URL: https://issues.apache.org/jira/browse/HIVE-25801
> Project: Hive
>  Issue Type: Bug
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {{hive.compactor.job.queue}} config limits resources available for 
> compaction, so users can limit the effects of compaction on the cluster. 
> However, this setting does not affect stats collection, which uses the Driver.
> HIVE-25595 is addressing the above issue for MR-based compaction. We need to 
> incorporate the same thing for the Query-based compaction.




[jira] [Work logged] (HIVE-25858) DISTINCT with ORDER BY on ordinals fails with NPE

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25858?focusedWorklogId=706873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706873
 ]

ASF GitHub Bot logged work on HIVE-25858:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:17
Start Date: 11/Jan/22 14:17
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on pull request #2936:
URL: https://github.com/apache/hive/pull/2936#issuecomment-1010006488


   #2941




Issue Time Tracking
---

Worklog Id: (was: 706873)
Time Spent: 0.5h  (was: 20m)

> DISTINCT with ORDER BY on ordinals fails with NPE
> -
>
> Key: HIVE-25858
> URL: https://issues.apache.org/jira/browse/HIVE-25858
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> explain cbo select distinct int_col x, bigint_col y from alltypes order by 1, 2;
> {code}




[jira] [Work logged] (HIVE-25858) DISTINCT with ORDER BY on ordinals fails with NPE

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25858?focusedWorklogId=706874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706874
 ]

ASF GitHub Bot logged work on HIVE-25858:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:17
Start Date: 11/Jan/22 14:17
Worklog Time Spent: 10m 
  Work Description: kasakrisz closed pull request #2936:
URL: https://github.com/apache/hive/pull/2936


   




Issue Time Tracking
---

Worklog Id: (was: 706874)
Time Spent: 40m  (was: 0.5h)

> DISTINCT with ORDER BY on ordinals fails with NPE
> -
>
> Key: HIVE-25858
> URL: https://issues.apache.org/jira/browse/HIVE-25858
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> explain cbo select distinct int_col x, bigint_col y from alltypes order by 1, 2;
> {code}




[jira] [Work logged] (HIVE-25858) DISTINCT with ORDER BY on ordinals fails with NPE

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25858?focusedWorklogId=706872=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706872
 ]

ASF GitHub Bot logged work on HIVE-25858:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 14:16
Start Date: 11/Jan/22 14:16
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2941:
URL: https://github.com/apache/hive/pull/2941


   
   
   ### What changes were proposed in this pull request?
   Return the output `RowResolver` object instead of null when generating the Select logical plan.
   
   ### Why are the changes needed?
   The Select row resolver is required for generating the Order By logical plan when the columns are referenced by position (ordinals).
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=order_by_pos.q -pl itests/qtest -Pitests
   ```
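The role of the select row resolver can be illustrated with a simplified, hypothetical model in which the select output is just a list of column aliases: ORDER BY positions are 1-based indexes into that list, so they cannot be resolved when the select output is missing. `OrdinalResolver` and `resolve` are illustrative names only; this is not Hive's `RowResolver` API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: the select output is reduced to a list of column aliases. */
final class OrdinalResolver {
  /** Maps 1-based ORDER BY positions (e.g. "order by 1, 2") to select-list aliases. */
  static List<String> resolve(List<String> selectOutput, int... positions) {
    if (selectOutput == null) {
      // Without the select output there is nothing to resolve positions against;
      // dereferencing it blindly is what would surface as a NullPointerException.
      throw new IllegalStateException("select output is not available");
    }
    List<String> sortKeys = new ArrayList<>();
    for (int pos : positions) {
      if (pos < 1 || pos > selectOutput.size()) {
        throw new IllegalArgumentException("ORDER BY position " + pos + " is out of range");
      }
      sortKeys.add(selectOutput.get(pos - 1));
    }
    return sortKeys;
  }
}
```

For the query in the issue, `resolve(Arrays.asList("x", "y"), 1, 2)` yields `[x, y]`.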




Issue Time Tracking
---

Worklog Id: (was: 706872)
Time Spent: 20m  (was: 10m)

> DISTINCT with ORDER BY on ordinals fails with NPE
> -
>
> Key: HIVE-25858
> URL: https://issues.apache.org/jira/browse/HIVE-25858
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> explain cbo select distinct int_col x, bigint_col y from alltypes order by 1, 2;
> {code}




[jira] [Work logged] (HIVE-25609) Preserve XAttrs in normal file copy case.

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25609?focusedWorklogId=706826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706826
 ]

ASF GitHub Bot logged work on HIVE-25609:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 12:53
Start Date: 11/Jan/22 12:53
Worklog Time Spent: 10m 
  Work Description: szlta opened a new pull request #2938:
URL: https://github.com/apache/hive/pull/2938


   This reverts #2793 as it is causing 20+ failing tests.




Issue Time Tracking
---

Worklog Id: (was: 706826)
Time Spent: 50m  (was: 40m)

> Preserve XAttrs in normal file copy case.
> -
>
> Key: HIVE-25609
> URL: https://issues.apache.org/jira/browse/HIVE-25609
> Project: Hive
>  Issue Type: Improvement
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Commented] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately

2022-01-11 Thread Zoltan Haindrich (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472679#comment-17472679
 ] 

Zoltan Haindrich commented on HIVE-25844:
-

makes sense; backported HIVE-24772 instead

> Exception deserialization error-s may cause beeline to terminate immediately
> 
>
> Key: HIVE-25844
> URL: https://issues.apache.org/jira/browse/HIVE-25844
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 3.1.2
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The problem occurs as follows:
>  * fetch task conversion is on
>  * an exception occurs on the server side while reading the table, and the error bubbles up
>  * => a message is transmitted to beeline stating that the error class name is 
> "org.apache.phoenix.schema.ColumnNotFoundException", along with the message
>  * beeline tries to reconstruct the exception around HiveSQLException
>  * but the constructor call needs org.apache.phoenix.exception.SQLExceptionCode, 
> which fails to load org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
>  * a java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown, which 
> is not handled in that method, so it becomes a real error and shuts down 
> the client
> {code:java}
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
> [...]
> at java.lang.Class.forName(Class.java:264)
> at 
> org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245)
> at 
> org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211)
> [...]
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.shaded.com.google.protobuf.Service
> [...]
> {code}
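The stack trace shows why the client dies: `NoClassDefFoundError` is a `LinkageError`, not an `Exception`, so it presumably escapes code that only expects `ClassNotFoundException` around `Class.forName`. A defensive way to rebuild a remote exception by class name is sketched below. `ExceptionRebuilder` and `rebuild` are hypothetical; this is not the code of `HiveSQLException.newInstance`, and it does not describe what HIVE-24772 changes.

```java
/** Hypothetical helper for re-creating a server-side exception on the client. */
final class ExceptionRebuilder {
  /**
   * Tries to instantiate the named exception class with a String message and
   * falls back to a generic Exception when the class, or one of its
   * dependencies, cannot be loaded or instantiated on the client.
   */
  static Throwable rebuild(String className, String message) {
    try {
      Class<?> clazz = Class.forName(className);
      if (Throwable.class.isAssignableFrom(clazz)) {
        return (Throwable) clazz.getConstructor(String.class).newInstance(message);
      }
    } catch (ReflectiveOperationException | LinkageError e) {
      // ReflectiveOperationException covers ClassNotFoundException, missing
      // constructors and failing constructors; LinkageError covers
      // NoClassDefFoundError and ExceptionInInitializerError, which would
      // otherwise propagate and terminate the client.
    }
    return new Exception(className + ": " + message);
  }
}
```

Catching `LinkageError` is deliberately narrower than catching `Throwable`, so genuinely fatal errors such as `OutOfMemoryError` still propagate.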




[jira] [Work logged] (HIVE-25726) Upgrade velocity to 2.3 due to CVE-2020-13936

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25726?focusedWorklogId=706794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706794
 ]

ASF GitHub Bot logged work on HIVE-25726:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 11:37
Start Date: 11/Jan/22 11:37
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on pull request #2805:
URL: https://github.com/apache/hive/pull/2805#issuecomment-1009878238


   Thank you @nrg4878  for approving and merging the fix. 




Issue Time Tracking
---

Worklog Id: (was: 706794)
Time Spent: 1h  (was: 50m)

> Upgrade velocity to 2.3 due to CVE-2020-13936
> -
>
> Key: HIVE-25726
> URL: https://issues.apache.org/jira/browse/HIVE-25726
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Velocity project announced CVE-2020-13936 on 20210309; it was published 
> through the NVD on 20210317, which is how it got detected internally.
>  * [https://nvd.nist.gov/vuln/detail/CVE-2020-13936]
>  * [http://velocity.apache.org/news.html]




[jira] [Work logged] (HIVE-25726) Upgrade velocity to 2.3 due to CVE-2020-13936

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25726?focusedWorklogId=706795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706795
 ]

ASF GitHub Bot logged work on HIVE-25726:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 11:37
Start Date: 11/Jan/22 11:37
Worklog Time Spent: 10m 
  Work Description: sourabh912 closed pull request #2805:
URL: https://github.com/apache/hive/pull/2805


   




Issue Time Tracking
---

Worklog Id: (was: 706795)
Time Spent: 1h 10m  (was: 1h)

> Upgrade velocity to 2.3 due to CVE-2020-13936
> -
>
> Key: HIVE-25726
> URL: https://issues.apache.org/jira/browse/HIVE-25726
> Project: Hive
>  Issue Type: Task
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The Velocity project announced CVE-2020-13936 on 20210309; it was published 
> through the NVD on 20210317, which is how it got detected internally.
>  * [https://nvd.nist.gov/vuln/detail/CVE-2020-13936]
>  * [http://velocity.apache.org/news.html]




[jira] [Updated] (HIVE-25860) Optimize the synchronized scope of the renewToken method to improve concurrency

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25860:
--
Labels: pull-request-available  (was: )

> Optimize the synchronized scope of the renewToken method to improve 
> concurrency
> ---
>
> Key: HIVE-25860
> URL: https://issues.apache.org/jira/browse/HIVE-25860
> Project: Hive
>  Issue Type: Bug
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-22033: using tokenStore in the renewToken method does not need to be in 
> the synchronized scope.
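Narrowing a synchronized scope usually means holding the lock only while shared in-memory state is read and updated, and calling the store after the lock is released. The sketch below assumes the store is thread-safe on its own; `TokenRenewalSketch`, its fields and its expiry logic are hypothetical and are not Hive's or Hadoop's delegation token classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical renewer: only in-memory bookkeeping is done under the lock. */
final class TokenRenewalSketch {
  private final Map<String, Long> expiryByToken = new HashMap<>(); // guarded by "this"
  // Stands in for an external token store; assumed to be thread-safe by itself.
  private final ConcurrentMap<String, Long> tokenStore = new ConcurrentHashMap<>();
  private final long renewIntervalMs;

  TokenRenewalSketch(long renewIntervalMs) {
    this.renewIntervalMs = renewIntervalMs;
  }

  synchronized void addToken(String id, long expiryMs) {
    expiryByToken.put(id, expiryMs);
  }

  long renewToken(String id, long nowMs) {
    long newExpiry;
    synchronized (this) {
      // Shared in-memory state is only read and written while holding the lock.
      Long current = expiryByToken.get(id);
      if (current == null || current < nowMs) {
        throw new IllegalStateException("token " + id + " is unknown or expired");
      }
      newExpiry = nowMs + renewIntervalMs;
      expiryByToken.put(id, newExpiry);
    }
    // The potentially slow store update runs outside the lock, so other
    // renewals are not blocked while it is in progress.
    tokenStore.put(id, newExpiry);
    return newExpiry;
  }

  Long storedExpiry(String id) {
    return tokenStore.get(id);
  }
}
```

The trade-off is that two concurrent renewals of the same token may reach the store in a different order than they updated the in-memory map, so this shape is only safe if the store tolerates such reordering.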




[jira] [Work logged] (HIVE-25860) Optimize the synchronized scope of the renewToken method to improve concurrency

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25860?focusedWorklogId=706783=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706783
 ]

ASF GitHub Bot logged work on HIVE-25860:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 11:08
Start Date: 11/Jan/22 11:08
Worklog Time Spent: 10m 
  Work Description: cxzl25 opened a new pull request #2937:
URL: https://github.com/apache/hive/pull/2937


   ### What changes were proposed in this pull request?
   Optimize the synchronized scope of the renewToken method to improve concurrency.
   
   ### Why are the changes needed?
   HIVE-22033: using `tokenStore` in the `renewToken` method does not need to be in the synchronized scope.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Existing unit tests.




Issue Time Tracking
---

Worklog Id: (was: 706783)
Remaining Estimate: 0h
Time Spent: 10m

> Optimize the synchronized scope of the renewToken method to improve 
> concurrency
> ---
>
> Key: HIVE-25860
> URL: https://issues.apache.org/jira/browse/HIVE-25860
> Project: Hive
>  Issue Type: Bug
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-22033: using tokenStore in the renewToken method does not need to be in 
> the synchronized scope.




[jira] [Work logged] (HIVE-25856) Intermittent null ordering in plans of queries with GROUP BY and LIMIT

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25856?focusedWorklogId=706775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706775
 ]

ASF GitHub Bot logged work on HIVE-25856:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 10:57
Start Date: 11/Jan/22 10:57
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on pull request #2932:
URL: https://github.com/apache/hive/pull/2932#issuecomment-1009846688


   @zabetak  Nice catch!




Issue Time Tracking
---

Worklog Id: (was: 706775)
Time Spent: 40m  (was: 0.5h)

> Intermittent null ordering in plans of queries with GROUP BY and LIMIT
> --
>
> Key: HIVE-25856
> URL: https://issues.apache.org/jira/browse/HIVE-25856
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:sql}
> CREATE TABLE person (id INTEGER, country STRING);
> EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5;
> {code}
> The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls) 
> from one execution to another.
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5])
>   HiveProject(country=[$0], $f1=[$1])
> HiveAggregate(group=[{1}], agg#0=[count()])
>   HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5])
>   HiveProject(country=[$0], $f1=[$1])
> HiveAggregate(group=[{1}], agg#0=[count()])
>   HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> This is unlikely to cause wrong results, because most aggregate functions (though not 
> all) do not return nulls, so null ordering does not matter much; however, it can 
> lead to other problems, such as:
> * intermittent CI failures
> * query/plan caching
> I bumped into this problem after investigating test failures in CI. The 
> following query in 
> [offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q]
>  returns a different plan when it runs individually than when it runs along with 
> some other qtest files.
> {code:sql}
> explain
> select * from
> (select key, count(1) from src group by key order by key limit 10,20) subq
> join
> (select key, count(1) from src group by key limit 20,20) subq2
> on subq.key=subq2.key limit 3,5;
> {code}




[jira] [Work logged] (HIVE-25856) Intermittent null ordering in plans of queries with GROUP BY and LIMIT

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25856?focusedWorklogId=706773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706773
 ]

ASF GitHub Bot logged work on HIVE-25856:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 10:55
Start Date: 11/Jan/22 10:55
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2932:
URL: https://github.com/apache/hive/pull/2932#discussion_r782032263



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateSortLimitRule.java
##
@@ -55,29 +54,13 @@
  */
 public class HiveAggregateSortLimitRule extends RelOptRule {
 
-  private static HiveAggregateSortLimitRule instance = null;
-
-  public static final HiveAggregateSortLimitRule getInstance(HiveConf hiveConf) {
-if (instance == null) {
-  RelFieldCollation.NullDirection defaultAscNullDirection;
-  if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_DEFAULT_NULLS_LAST)) {
-defaultAscNullDirection = RelFieldCollation.NullDirection.LAST;
-  } else {
-defaultAscNullDirection = RelFieldCollation.NullDirection.FIRST;
-  }
-  instance = new HiveAggregateSortLimitRule(defaultAscNullDirection);
-}
-
-return instance;
-  }
-
   private final RelFieldCollation.NullDirection defaultAscNullDirection;
 
-
-  private HiveAggregateSortLimitRule(RelFieldCollation.NullDirection 
defaultAscNullDirection) {
+  public HiveAggregateSortLimitRule(boolean nullsLast) {

Review comment:
   I generally avoid hardcoding it because the null ordering behavior affects 
the Top N Key operator pushdown optimization:
   
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java
   
   The `Top N Key` (TNK) operator is introduced into the physical plan as a parent 
operator of the `Reduce Sink`. It takes the sort keys and ordering parameters 
from the `Reduce Sink`. The pushdown optimization then tries to move the TNK down 
to the `Table Scan` (TS) if possible.
   More complex queries may have more `Reduce Sinks`, or even other TNKs that 
should be merged. This is where null ordering also matters.
   
   I think it is safer to use the config.
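The review's point is that null ordering matters once several `Reduce Sink` or TNK operators have to be combined. A simplified Java model of that point, assuming each operator's sort specification can be seen as a list of keys and that two operators may only share a key prefix when column, direction and null placement all agree. `SortKey` and `commonPrefixLength` are hypothetical names; this is not the actual logic of `TopNKeyPushdownProcessor`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class TopNKeyMergeSketch {

    // One sort key: column, direction and null placement (hypothetical model).
    static final class SortKey {
        final String column;
        final boolean ascending;
        final boolean nullsFirst;

        SortKey(String column, boolean ascending, boolean nullsFirst) {
            this.column = column;
            this.ascending = ascending;
            this.nullsFirst = nullsFirst;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SortKey)) {
                return false;
            }
            SortKey k = (SortKey) o;
            return column.equals(k.column)
                    && ascending == k.ascending
                    && nullsFirst == k.nullsFirst;
        }

        @Override
        public int hashCode() {
            return Objects.hash(column, ascending, nullsFirst);
        }
    }

    // Number of leading keys shared by both operators; a key only matches
    // if column, direction AND null placement are all identical.
    static int commonPrefixLength(List<SortKey> a, List<SortKey> b) {
        int n = Math.min(a.size(), b.size());
        int i = 0;
        while (i < n && a.get(i).equals(b.get(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<SortKey> tnk = Arrays.asList(new SortKey("key", true, false));
        List<SortKey> sameNulls = Arrays.asList(
                new SortKey("key", true, false), new SortKey("cnt", true, false));
        List<SortKey> otherNulls = Arrays.asList(new SortKey("key", true, true));

        System.out.println(commonPrefixLength(tnk, sameNulls));  // 1: can share "key"
        System.out.println(commonPrefixLength(tnk, otherNulls)); // 0: null order differs
    }
}
```

In this model, if one rule hardcoded nulls-first while the configuration said nulls-last, the keys would stop matching and a merge would be missed even though column and direction agree.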
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 706773)
Time Spent: 0.5h  (was: 20m)

> Intermittent null ordering in plans of queries with GROUP BY and LIMIT
> --
>
> Key: HIVE-25856
> URL: https://issues.apache.org/jira/browse/HIVE-25856
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:sql}
> CREATE TABLE person (id INTEGER, country STRING);
> EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5;
> {code}
> The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls) 
> from one execution to another.
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5])
>   HiveProject(country=[$0], $f1=[$1])
> HiveAggregate(group=[{1}], agg#0=[count()])
>   HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5])
>   HiveProject(country=[$0], $f1=[$1])
> HiveAggregate(group=[{1}], agg#0=[count()])
>   HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> This is unlikely to cause wrong results because most aggregate functions (not 
> all) do not return nulls, so null ordering rarely matters, but it can 
> lead to other problems such as:
> * intermittent CI failures
> * query/plan caching
> I bumped into this problem while investigating test failures in CI. The 
> following query in 
> [offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q]
>  returns a different plan when it runs individually than when it runs along 
> with some other qtest files.
> {code:sql}
> explain
> select * from
> (select key, count(1) from src group by key order by key limit 10,20) subq
> join
> (select key, count(1) from src group by key limit 20,20) subq2
> on subq.key=subq2.key limit 3,5;
> {code}
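One plausible reading of the CI symptom, consistent with the `getInstance(HiveConf)` singleton that PR #2932 removes in the diff above, is that a lazily created static instance captures the null direction from whichever configuration first calls it, so later sessions in the same JVM (for example, other qtest files) reuse that value. The sketch below models this with a local `NullDirection` enum standing in for Calcite's; `CachedRule` and `PerCallRule` are hypothetical names, and the boolean stands in for `HIVE_DEFAULT_NULLS_LAST`.

```java
public class SingletonConfigSketch {

    // Local stand-in for RelFieldCollation.NullDirection.
    enum NullDirection { FIRST, LAST }

    // Old pattern: lazily created singleton (also not thread-safe);
    // the configuration of the FIRST caller wins for the JVM's lifetime.
    static final class CachedRule {
        private static CachedRule instance = null;
        final NullDirection defaultAscNullDirection;

        private CachedRule(NullDirection direction) {
            this.defaultAscNullDirection = direction;
        }

        static CachedRule getInstance(boolean nullsLast) {
            if (instance == null) {
                instance = new CachedRule(nullsLast ? NullDirection.LAST : NullDirection.FIRST);
            }
            return instance;
        }
    }

    // New pattern: the rule is built from the current configuration each time.
    static final class PerCallRule {
        final NullDirection defaultAscNullDirection;

        PerCallRule(boolean nullsLast) {
            this.defaultAscNullDirection = nullsLast ? NullDirection.LAST : NullDirection.FIRST;
        }
    }

    public static void main(String[] args) {
        // Two "sessions" in one JVM with different nulls-last settings.
        System.out.println(CachedRule.getInstance(false).defaultAscNullDirection); // FIRST
        System.out.println(CachedRule.getInstance(true).defaultAscNullDirection);  // FIRST (stale)
        System.out.println(new PerCallRule(true).defaultAscNullDirection);         // LAST
    }
}
```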




[jira] [Commented] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472623#comment-17472623
 ] 

László Bodor commented on HIVE-25859:
-

According to git bisect, this is caused by HIVE-25609. Could you please take a 
look, [~haymant], [~ayushtkn], [~pkumarsinha]?

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
> Attachments: hive.log
>
>
> repro
> {code}
>  mvn clean install -Dtest.output.overwrite=true -Pitests,hadoop-2 
> -Denforcer.skip=true -pl itests/qtest -pl itests/util -am 
> -Dtest=TestMiniLlapLocalCliDriver -Dqfile=load_non_hdfs_path.q
> {code}
> {code}
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
>   ... 64 more
> {code}
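The missing `.1.txt.crc` is the checksum sidecar that Hadoop's local `ChecksumFileSystem` keeps next to a data file (its `getChecksumFile(Path)` places `"." + name + ".crc"` in the same parent directory). The sketch below reproduces only that naming convention with `java.nio`, plus a hypothetical helper for recognizing such sidecars in a listing; it does not call Hadoop and does not diagnose the regression itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChecksumSidecarSketch {

    // Builds "<parent>/.<name>.crc", mirroring ChecksumFileSystem's naming convention.
    static Path checksumFileFor(Path dataFile) {
        String sidecar = "." + dataFile.getFileName().toString() + ".crc";
        Path parent = dataFile.getParent();
        return parent == null ? Paths.get(sidecar) : parent.resolve(sidecar);
    }

    // Hypothetical helper: true for names that look like checksum sidecars.
    static boolean isChecksumFile(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith(".") && name.endsWith(".crc");
    }

    public static void main(String[] args) {
        Path data = Paths.get("target/tmp/non_hdfs_path/1.txt");
        Path crc = checksumFileFor(data);
        System.out.println(crc);                 // target/tmp/non_hdfs_path/.1.txt.crc (POSIX)
        System.out.println(isChecksumFile(crc));  // true
        System.out.println(isChecksumFile(data)); // false
    }
}
```

In the trace above, the `getFileStatus` call that fails is made for the `.crc` path itself, so checking whether sidecar entries end up in the list of files handed to `FileUtils.copy` may be a useful starting point.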




[jira] [Updated] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25859:

Description: 
repro
{code}
 mvn clean install -Dtest.output.overwrite=true -Pitests,hadoop-2 
-Denforcer.skip=true -pl itests/qtest -pl itests/util -am 
-Dtest=TestMiniLlapLocalCliDriver -Dqfile=load_non_hdfs_path.q
{code}

{code}
Caused by: java.io.FileNotFoundException: File 
file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
 does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
... 64 more
{code}

  was:
{code}
Caused by: java.io.FileNotFoundException: File 
file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
 does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
... 64 more
{code}


> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
> Attachments: hive.log
>
>
> repro
> {code}
>  mvn clean install -Dtest.output.overwrite=true -Pitests,hadoop-2 
> -Denforcer.skip=true -pl itests/qtest -pl itests/util -am 
> -Dtest=TestMiniLlapLocalCliDriver -Dqfile=load_non_hdfs_path.q
> {code}
> {code}
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at

[jira] [Work logged] (HIVE-25829) Tez exec mode support for credential provider for jobs

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25829?focusedWorklogId=706710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706710
 ]

ASF GitHub Bot logged work on HIVE-25829:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 09:20
Start Date: 11/Jan/22 09:20
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2911:
URL: https://github.com/apache/hive/pull/2911#issuecomment-1009749956


   Test failures are due to https://issues.apache.org/jira/browse/HIVE-25859




Issue Time Tracking
---

Worklog Id: (was: 706710)
Time Spent: 1h 50m  (was: 1h 40m)

> Tez exec mode support for credential provider for jobs
> --
>
> Key: HIVE-25829
> URL: https://issues.apache.org/jira/browse/HIVE-25829
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ádám Szita
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> HIVE-14822 introduced support for securely forwarding a job-specific Java 
> credential store path and the corresponding password to the backend executors. 
> This is currently implemented only for the MR2 and Spark execution engines. I 
> propose we extend this feature by adding Tez mode to that list.




[jira] [Comment Edited] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472588#comment-17472588
 ] 

László Bodor edited comment on HIVE-25859 at 1/11/22, 9:17 AM:
---

I guess the same applies to some blobstore tests:
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2911/4/tests
{code}
 at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileInDfs(MoveTask.java:158)
 at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:113)
 ... 63 more
Caused by: java.lang.RuntimeException: Error while running command to get file 
permissions : ExitCodeException exitCode=2: ls: cannot access 
'/home/jenkins/agent/workspace/hive-precommit_PR-2911/itests/hive-blobstore/target/tmp/bucket/CoreBlobstoreCliDriver/20220110.150226.667-809/ctas_hdfs_to_blobstore/target_db/.hive-staging_hive_2022-01-10_15-03-22_694_8267272566326808457-1/-ext-10002/.00_0.crc':
 No such file or directory
{code}


was (Author: abstractdog):
I guess the same applies to some blobstore tests:
{code}
apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4872)
 at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileInDfs(MoveTask.java:158)
 at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:113)
 ... 63 more
Caused by: java.lang.RuntimeException: Error while running command to get file 
permissions : ExitCodeException exitCode=2: ls: cannot access 
'/home/jenkins/agent/workspace/hive-precommit_PR-2911/itests/hive-blobstore/target/tmp/bucket/CoreBlobstoreCliDriver/20220110.150226.667-809/ctas_hdfs_to_blobstore/target_db/.hive-staging_hive_2022-01-10_15-03-22_694_8267272566326808457-1/-ext-10002/.00_0.crc':
 No such file or directory
{code}

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
> Attachments: hive.log
>
>
> {code}
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
>   ... 64 more
> {code}




[jira] [Commented] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472588#comment-17472588
 ] 

László Bodor commented on HIVE-25859:
-

I guess the same applies to some blobstore tests:
{code}
apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4872)
 at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileInDfs(MoveTask.java:158)
 at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:113)
 ... 63 more
Caused by: java.lang.RuntimeException: Error while running command to get file 
permissions : ExitCodeException exitCode=2: ls: cannot access 
'/home/jenkins/agent/workspace/hive-precommit_PR-2911/itests/hive-blobstore/target/tmp/bucket/CoreBlobstoreCliDriver/20220110.150226.667-809/ctas_hdfs_to_blobstore/target_db/.hive-staging_hive_2022-01-10_15-03-22_694_8267272566326808457-1/-ext-10002/.00_0.crc':
 No such file or directory
{code}

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
> Attachments: hive.log
>
>
> {code}
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
>   ... 64 more
> {code}




[jira] [Updated] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25859:

Description: 
{code}
Caused by: java.io.FileNotFoundException: File 
file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
 does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
... 64 more
{code}

  was:
Caused by: java.io.FileNotFoundException: File 
file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
 does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
... 64 more



> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
> Attachments: hive.log
>
>
> {code}
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
>   ...

[jira] [Updated] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25859:

Attachment: hive.log

> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
> Attachments: hive.log
>
>
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
>   ... 64 more




[jira] [Updated] (HIVE-25859) load_non_hdfs_path.q fails on master: .1.txt.crc does not exist

2022-01-11 Thread Jira



 [ 
https://issues.apache.org/jira/browse/HIVE-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25859:

Description: 
Caused by: java.io.FileNotFoundException: File 
file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
 does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
... 64 more


> load_non_hdfs_path.q fails on master: .1.txt.crc does not exist
> ---
>
> Key: HIVE-25859
> URL: https://issues.apache.org/jira/browse/HIVE-25859
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Blocker
>
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/laszlobodor/apache/hive/itests/qtest/target/tmp/non_hdfs_path/.1.txt.crc
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.ProxyFileSystem.open(ProxyFileSystem.java:153)
>   at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:164)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:695)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:685)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:667)
>   at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:634)
>   at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:4809)
>   ... 64 more




[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=706692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706692
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 08:54
Start Date: 11/Jan/22 08:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ edited a comment on pull request #2915:
URL: https://github.com/apache/hive/pull/2915#issuecomment-1009723611


   Per design doc:
   
   Since we do not want to access the FileSystem in a new, separate 
MetricsSystem, this can only be collected at points where we already list the 
table/partition directory content. 
   One way would be to use the Initiator / Cleaner for this purpose, but that won’t 
be available in DWX for default DBC-s. 
   The best option seems to be the AcidUtils.getAcidState call, which is made 
by every read query (in the Tez AM).
   
   The idea was to avoid adding extra overhead when the metrics are activated. 
   - Are we addressing these concerns here? 
   - How do we handle default DBC-s on DWX with the Cleaner and Initiator disabled?
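The design-doc constraint quoted above (collect delta metrics only where the table/partition directory is already listed, such as `AcidUtils.getAcidState`) can be sketched as a pure function over the names that listing already returned, so no extra FileSystem access is needed. This assumes the usual ACID directory prefixes `delta_`, `delete_delta_` and `base_`; `DeltaCountSketch` and `count` are hypothetical and are not the `DeltaFilesMetricsReporter` implementation.

```java
import java.util.Arrays;
import java.util.List;

public class DeltaCountSketch {

    // Simple holder for the counts derived from one directory listing.
    static final class DeltaCounts {
        final int deltas;
        final int deleteDeltas;
        final int bases;

        DeltaCounts(int deltas, int deleteDeltas, int bases) {
            this.deltas = deltas;
            this.deleteDeltas = deleteDeltas;
            this.bases = bases;
        }

        @Override
        public String toString() {
            return "deltas=" + deltas + ", deleteDeltas=" + deleteDeltas + ", bases=" + bases;
        }
    }

    // Classifies names the caller has ALREADY listed; makes no FileSystem calls.
    static DeltaCounts count(List<String> dirNames) {
        int deltas = 0;
        int deleteDeltas = 0;
        int bases = 0;
        for (String name : dirNames) {
            if (name.startsWith("delete_delta_")) {
                deleteDeltas++;
            } else if (name.startsWith("delta_")) {
                deltas++;
            } else if (name.startsWith("base_")) {
                bases++;
            }
        }
        return new DeltaCounts(deltas, deleteDeltas, bases);
    }

    public static void main(String[] args) {
        // Made-up names in the usual ACID layout.
        List<String> listing = Arrays.asList(
                "base_0000005", "delta_0000006_0000006_0000",
                "delta_0000007_0000007_0000", "delete_delta_0000007_0000007_0000");
        System.out.println(count(listing)); // deltas=2, deleteDeltas=1, bases=1
    }
}
```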




Issue Time Tracking
---

Worklog Id: (was: 706692)
Time Spent: 50m  (was: 40m)

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table 
> (select * and select count( * ) don't update the metrics).
> Metrics aren't updated after compaction or cleaning after compaction, so 
> users will probably see "issues" with compaction (like many active or 
> obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries – we tried to put a try-catch 
> around each method in DeltaFilesMetricsReporter but of course this isn't 
> foolproof. This is a HUGE performance and functionality liability. Tests 
> caught some issues, but our tests aren't perfect.
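The RISK paragraph describes wrapping each metrics method in a try-catch so collection cannot fail the query. A minimal sketch of that pattern, assuming a hypothetical `safely` wrapper (not an actual `DeltaFilesMetricsReporter` method); its comments spell out why it is not foolproof.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SafeMetricsSketch {

    // Number of metric-collection attempts that threw and were swallowed.
    static final AtomicInteger failures = new AtomicInteger();

    // Runs a metrics action on the query path; a RuntimeException is counted
    // and swallowed so it cannot fail the query. Errors (e.g. OutOfMemoryError)
    // still propagate, and the time spent inside the action is not bounded.
    static void safely(Runnable metricsAction) {
        try {
            metricsAction.run();
        } catch (RuntimeException e) {
            failures.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        safely(() -> System.out.println("collecting delta metrics"));
        safely(() -> {
            throw new IllegalStateException("listing failed");
        });
        System.out.println("query continues, failures=" + failures.get()); // failures=1
    }
}
```

The wrapper only isolates exceptions; any latency the collection adds still lands on the query, which is the performance side of the liability the description mentions.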




[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=706691=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706691
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 08:51
Start Date: 11/Jan/22 08:51
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on pull request #2915:
URL: https://github.com/apache/hive/pull/2915#issuecomment-1009723611


   Per the design doc:
   
   Since we do not want to access the FileSystem in a new, separate MetricsSystem, these metrics can only be collected at points where we already list the table/partition directory content. One option would be to use the Initiator / Cleaner for this purpose, but those won't be available in DWX for default DBC-s. The best option seems to be the AcidUtils.getAcidState call, which is made by every read query (in the Tez AM).
   
   The idea was not to add any extra overhead when metrics are activated. Are we addressing these concerns here?
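The design-doc constraint is to make no extra FileSystem access and instead reuse a directory listing the query already performs. A minimal sketch can illustrate this by deriving delta counts purely from an in-memory listing. The class, its `Entry` type, and the small-delta threshold are hypothetical. The directory names follow Hive's ACID conventions (`base_N`, `delta_MIN_MAX`, `delete_delta_MIN_MAX`). Treating a delta as obsolete when a base covers its max write id is a simplification for illustration only.

```java
import java.util.List;

/**
 * Hypothetical sketch: compute delta-file metrics from a listing that was
 * already fetched (e.g. during ACID state resolution), so no additional
 * FileSystem calls are made. Not Hive's actual implementation.
 */
final class DeltaMetricsFromListing {
    /** One already-listed child of a table/partition directory. */
    static final class Entry {
        final String name;
        final long sizeBytes;
        Entry(String name, long sizeBytes) { this.name = name; this.sizeBytes = sizeBytes; }
    }

    int active, obsolete, small;

    DeltaMetricsFromListing(List<Entry> listing, long smallDeltaBytes) {
        // Highest write id covered by any base_N directory (-1 if none).
        long baseWriteId = -1;
        for (Entry e : listing) {
            if (e.name.startsWith("base_")) {
                long id = Long.parseLong(e.name.substring("base_".length()).split("_")[0]);
                baseWriteId = Math.max(baseWriteId, id);
            }
        }
        for (Entry e : listing) {
            // Strip the prefix; check delete_delta_ first since it is longer.
            String ids = e.name.startsWith("delete_delta_") ? e.name.substring("delete_delta_".length())
                       : e.name.startsWith("delta_") ? e.name.substring("delta_".length())
                       : null;
            if (ids == null) continue; // not a delta directory
            long maxWriteId = Long.parseLong(ids.split("_")[1]);
            if (maxWriteId <= baseWriteId) {
                obsolete++;                 // covered by a base, waiting for cleanup
            } else {
                active++;
                if (e.sizeBytes < smallDeltaBytes) small++;
            }
        }
    }

    public static void main(String[] args) {
        DeltaMetricsFromListing m = new DeltaMetricsFromListing(List.of(
                new Entry("base_0000005", 1_000),
                new Entry("delta_0000006_0000006_0000", 10)), 100);
        System.out.printf("active=%d obsolete=%d small=%d%n", m.active, m.obsolete, m.small);
    }
}
```

The sketch only parses names and sizes that the caller already holds. This is the property the design doc asks for, although the cost of the extra in-memory pass still lands on the read query.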




Issue Time Tracking
---

Worklog Id: (was: 706691)
Time Spent: 40m  (was: 0.5h)

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs on a table
> (select * and select count(*) don't update the metrics).
> Metrics aren't updated after compaction or after the cleaning that follows
> compaction, so users will probably see compaction "issues" (such as many
> active, obsolete, or small deltas) that don't actually exist.
> RISK: Metrics are collected during queries. We tried to put a try-catch
> around each method in DeltaFilesMetricsReporter, but of course this isn't
> foolproof. This is a HUGE performance and functionality liability. Tests
> caught some issues, but our tests aren't perfect.




[jira] [Work logged] (HIVE-25858) DISTINCT with ORDER BY on ordinals fails with NPE

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25858?focusedWorklogId=706670=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706670
 ]

ASF GitHub Bot logged work on HIVE-25858:
-

Author: ASF GitHub Bot
Created on: 11/Jan/22 08:01
Start Date: 11/Jan/22 08:01
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2936:
URL: https://github.com/apache/hive/pull/2936


   
   
   ### What changes were proposed in this pull request?
   Return the output `RowResolver` object instead of null when generating the Select logical plan.
   
   ### Why are the changes needed?
   The Select row resolver is required for generating the Order By logical plan when columns are referenced by position (ordinals).
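The following sketch shows why the Order By step needs the Select output. It assumes a hypothetical `SelectOutput` class as a stand-in for Hive's `RowResolver`. Positions such as `order by 1, 2` can only be resolved against the select list, so a null Select output fails on the first dereference. In the sketch, a null argument models the pre-fix behaviour described above, and `Objects.requireNonNull` makes the failure explicit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical sketch: map 1-based ORDER BY ordinals to select-list columns.
 * SelectOutput is an illustrative stand-in for Hive's RowResolver.
 */
final class OrdinalOrderBy {
    static final class SelectOutput {
        final List<String> columns; // output aliases in select-list order
        SelectOutput(List<String> columns) { this.columns = columns; }
    }

    static List<String> resolve(SelectOutput selectOutput, int... ordinals) {
        // A null here models the Select step returning null instead of its
        // row resolver: resolution cannot proceed and an NPE is thrown.
        Objects.requireNonNull(selectOutput, "Select step must return its output row resolver");
        List<String> keys = new ArrayList<>();
        for (int pos : ordinals) {
            if (pos < 1 || pos > selectOutput.columns.size()) {
                throw new IllegalArgumentException("ORDER BY position " + pos + " is not in the select list");
            }
            keys.add(selectOutput.columns.get(pos - 1));
        }
        return keys;
    }

    public static void main(String[] args) {
        // select distinct int_col x, bigint_col y from alltypes order by 1, 2
        SelectOutput out = new SelectOutput(List.of("x", "y"));
        System.out.println(resolve(out, 1, 2));
    }
}
```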
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=order_by_pos.q -pl itests/qtest -Pitests
   ```




Issue Time Tracking
---

Worklog Id: (was: 706670)
Remaining Estimate: 0h
Time Spent: 10m

> DISTINCT with ORDER BY on ordinals fails with NPE
> -
>
> Key: HIVE-25858
> URL: https://issues.apache.org/jira/browse/HIVE-25858
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> explain cbo select distinct int_col x, bigint_col y from alltypes order by 1, 2;
> {code}




[jira] [Updated] (HIVE-25858) DISTINCT with ORDER BY on ordinals fails with NPE

2022-01-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25858:
--
Labels: pull-request-available  (was: )

> DISTINCT with ORDER BY on ordinals fails with NPE
> -
>
> Key: HIVE-25858
> URL: https://issues.apache.org/jira/browse/HIVE-25858
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> explain cbo select distinct int_col x, bigint_col y from alltypes order by 1, 2;
> {code}



