[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2

2022-09-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601740#comment-17601740
 ] 

Ayush Saxena commented on HIVE-24484:
-

Nope, no such plans.

> Upgrade Hadoop to 3.3.1 And Tez to 0.10.2 
> --
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 15.05h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2

2022-09-08 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601737#comment-17601737
 ] 

Takanobu Asanuma commented on HIVE-24484:
-

Great work! Do you have any plans to move to Hadoop-3.3.1 & Tez-0.10.2 in 
Hive-3.x?



[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2

2022-08-03 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574635#comment-17574635
 ] 

Steve Loughran commented on HIVE-24484:
---

nice!



[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2

2022-08-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574496#comment-17574496
 ] 

Ayush Saxena commented on HIVE-24484:
-

Committed to master.
Thanks, everyone. Moved to Hadoop 3.3.1 & Tez 0.10.2 :-)

> Upgrade Hadoop to 3.3.1 And Tez to 0.10.2 
> --
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 14h 53m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2

2022-07-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564175#comment-17564175
 ] 

Ayush Saxena commented on HIVE-24484:
-

Moving to Hadoop 3.3.1, combined with the Tez upgrade to 0.10.2, since both 
need to go in together for a green run. Changed the title to reflect that.

We have the tests sorted out in the PR now. Once we get an official Tez 
release, most probably within 2 weeks, I will go ahead and commit the PR.

 

> Upgrade Hadoop to 3.3.1 And Tez to 0.10.2 
> --
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13.05h
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-09-09 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412556#comment-17412556
 ] 

Steve Loughran commented on HIVE-24484:
---

HADOOP-17313 actually went in to deal with Hive processes having problems 
instantiating ABFS clients across many threads. Every thread would create its 
own client, only for all but one of these to be discarded. There was enough 
contention in that creation process that things would get really slow. This 
was most noticeable when a service like Ranger was involved in 
FileSystem.initialize().

The patch considers being interrupted as a failure. What is Tez expecting?

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 43m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371600#comment-17371600
 ] 

David Mollitor commented on HIVE-24484:
---

OK. When there is Hive+Tez and a LIMIT clause, each Tez Vertex is "interrupted" 
to signal that it should stop running when the LIMIT is reached.  
[HADOOP-17313] adds a lock into the FileSystem API that throws an 
{{InterruptedIOException}} if the thread is interrupted (and clears the 
interrupt flag).  Tez sees this exception as a failure and reports an error.  
Tez probably needs to be updated to handle this situation, but it ain't fun.

https://github.com/apache/hadoop/blob/a3b9c37a397ad4188041dd80621bdeefc46885f2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L3556-L3560

{code:none}
Error while running task ( failure ) : java.lang.RuntimeException: java.io.IOException: java.io.IOException: java.io.InterruptedIOException: java.lang.InterruptedException
 at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
 at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:145)
 at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
 at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:156)
 at org.apache.tez.mapreduce.lib.MRReaderMapred.<init>(MRReaderMapred.java:75)
 at org.apache.tez.mapreduce.input.MultiMRInput.initFromEvent(MultiMRInput.java:196)
 at org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:154)
 at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:739)
 at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:108)
 at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:816)
 at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.io.IOException: java.io.InterruptedIOException: java.lang.InterruptedException
 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:422)
 at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203)
 ... 11 more
Caused by: java.io.IOException: java.io.InterruptedIOException: java.lang.InterruptedException
 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat.getRecordReader(LlapInputFormat.java:141)
 at org.apache.hadoop.hive.ql.io.RecordReaderWrapper.create(RecordReaderWrapper.java:72)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:419)
 ... 12 more
Caused by: java.io.InterruptedIOException: java.lang.InterruptedException
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3559)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
 at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
 at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat.getRecordReader(LlapInputFormat.java:123)
 ... 14 more
Caused by: java.lang.InterruptedException
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1306)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3556)
 ... 20 more
{code}
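
For illustration only, the bottom of the trace above can be reproduced with plain JDK classes. This is a minimal sketch, not the Hadoop code: the class {{InterruptedAcquireDemo}} and the method {{acquireOrFail}} are hypothetical names, and the only assumption is the pattern visible in the trace, where an interrupted {{Semaphore.acquire()}} is rethrown as an {{InterruptedIOException}}.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.Semaphore;

public class InterruptedAcquireDemo {

    // Hypothetical stand-in for a guarded creation step: take a permit,
    // and report an interrupt as an I/O failure (the pattern seen in the trace).
    static void acquireOrFail(Semaphore permits) throws InterruptedIOException {
        try {
            // acquire() throws InterruptedException if the thread's interrupt
            // flag is set on entry, and clears the flag when it does so.
            permits.acquire();
        } catch (InterruptedException e) {
            InterruptedIOException wrapped = new InterruptedIOException(e.toString());
            wrapped.initCause(e);
            throw wrapped;
        }
    }

    public static void main(String[] args) {
        Semaphore permits = new Semaphore(1);

        // Simulate the "stop running" signal arriving before the call.
        Thread.currentThread().interrupt();

        try {
            acquireOrFail(permits);
            System.out.println("acquired");
        } catch (InterruptedIOException e) {
            // The caller now sees an exception, and the interrupt flag is gone,
            // so it cannot tell a cancellation from a real I/O failure by the flag.
            System.out.println("caught " + e.getClass().getSimpleName()
                    + ", interrupt flag is now " + Thread.currentThread().isInterrupted());
        }
    }
}
```

A caller that wants to treat this as a cancellation rather than an error would have to recognise the exception type itself, since the interrupt flag no longer carries that information.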

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 23m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371578#comment-17371578
 ] 

David Mollitor commented on HIVE-24484:
---

It seems that [HADOOP-17313] is causing one test to fail with 
{{InterruptedException}}.



[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371568#comment-17371568
 ] 

David Mollitor commented on HIVE-24484:
---

bq. Apparently Hive is doing something that it shouldn't be in one of its tests 
for moving data across encryption zones.

As I understand it, the test is checking to make sure that this is not allowed. 
Hive is using DistCP for the copy. As I understand it, DistCP would previously 
fail quietly when moving across encryption zones (RAW zones), and the test 
would check that there was no replication. However, after [HDFS-14884], DistCP 
actually fails and Hive isn't handling it. I updated the unit test to catch 
this Exception instead of looking for no changes in the destination file system.



[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-29 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371529#comment-17371529
 ] 

David Mollitor commented on HIVE-24484:
---

Apparently Hive is doing something that it shouldn't be in one of its tests for 
moving data across encryption zones. This is now forbidden via [HDFS-14884].

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-28 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370667#comment-17370667
 ] 

David Mollitor commented on HIVE-24484:
---

Just bumped into another one... the latest Hadoop added some new DEBUG logging 
that is *very* chatty. Hive has a feature that allows clients to download the 
logging; however, it is currently capped at 
{{HIVE_SERVER2_THRIFT_RESULTSET_MAX_FETCH_SIZE}} (default: 1) right now. 
Why is it capped at the fetch size? Because HS will truncate the row count and 
return {{hasMoreRows}} 'false' to the client, so the client does not know there 
are more rows to fetch. The extra DEBUG log lines explode the line count and 
push the log lines required by the unit tests past the truncation mark.

https://github.com/apache/hive/blob/f7a21abf5579a8df07117928caff2d72ecae27e3/service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java#L888-L912

It looks like there was some other work related to this via [HIVE-24861].
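
The truncation effect described above can be sketched with a toy server and client. This is a hedged illustration, not Hive code: {{FetchTruncationDemo}}, {{FetchResult}}, {{fetch}} and {{clientFetchAll}} are hypothetical names, and the cap of 3 rows is chosen only to keep the example small. The one behaviour taken from the comment is that the server caps the row count and still reports {{hasMoreRows}} as false.

```java
import java.util.ArrayList;
import java.util.List;

public class FetchTruncationDemo {

    // Hypothetical response type: the rows returned plus the "more rows" flag.
    static final class FetchResult {
        final List<String> rows;
        final boolean hasMoreRows;

        FetchResult(List<String> rows, boolean hasMoreRows) {
            this.rows = rows;
            this.hasMoreRows = hasMoreRows;
        }
    }

    // Toy server: caps the request at maxFetchSize and always answers
    // hasMoreRows = false, even when rows were cut off.
    static FetchResult fetch(List<String> log, int offset, int requested, int maxFetchSize) {
        int count = Math.min(requested, maxFetchSize);
        int end = Math.min(log.size(), offset + count);
        return new FetchResult(new ArrayList<>(log.subList(offset, end)), false);
    }

    // Toy client: keeps fetching only while the server says there is more.
    static List<String> clientFetchAll(List<String> log, int requested, int maxFetchSize) {
        List<String> received = new ArrayList<>();
        int offset = 0;
        FetchResult result;
        do {
            result = fetch(log, offset, requested, maxFetchSize);
            received.addAll(result.rows);
            offset += result.rows.size();
        } while (result.hasMoreRows);
        return received;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            log.add("DEBUG chatty line " + i);   // new, noisy lines come first
        }
        log.add("REQUIRED line the test looks for");

        List<String> received = clientFetchAll(log, 100, 3);
        System.out.println("received " + received.size() + " of " + log.size() + " lines");
        System.out.println("required line present: "
                + received.contains("REQUIRED line the test looks for"));
    }
}
```

With the cap at 3, the client stops after the first batch, so the required line, which sits behind the extra DEBUG lines, never reaches it.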



[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-25 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369708#comment-17369708
 ] 

David Mollitor commented on HIVE-24484:
---

A lot of heartburn still with [HADOOP-17367]. Ugh. So, HMS uses Hadoop's 
{{ProxyUsers}} class, which stores its configuration in a static variable. Well, 
there are some tests in Hive that launch two HMS instances within the same JVM. 
So, setting the configuration for one instance of HMS blows away the other 
instance's proxy configuration. This was working previously because the code 
would only load the instance if it had not already been loaded (first-loader 
wins). But since the change with [HADOOP-17367] this setup in HMS no longer 
works (it cannot detect if the {{ProxyUsers}} instance has already been created 
because now a default instance is always returned).
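
A minimal sketch of why the "already loaded?" check stops working, assuming the guard is a simple null check. Nothing here is the Hadoop or HMS code: {{StaticProviderGuardDemo}}, {{getNullable}}, {{getWithDefault}} and {{startInstance}} are hypothetical names, and a plain string stands in for the provider and its configuration.

```java
import java.util.function.Supplier;

public class StaticProviderGuardDemo {

    // Hypothetical JVM-wide state: which configuration the provider was built from.
    static String provider;

    // Older-style getter: returns null until some caller has initialized the provider.
    static String getNullable() {
        return provider;
    }

    // Newer-style getter: never returns null, it creates a default on first use.
    static String getWithDefault() {
        if (provider == null) {
            provider = "default";
        }
        return provider;
    }

    // Hypothetical instance startup: load our own configuration only if
    // nobody has loaded one yet (first-loader wins).
    static String startInstance(Supplier<String> getter, String ownConf) {
        if (getter.get() == null) {
            provider = ownConf;
        }
        return provider;
    }

    public static void main(String[] args) {
        // With the nullable getter, the first instance wins and the second one backs off.
        provider = null;
        System.out.println(startInstance(StaticProviderGuardDemo::getNullable, "confA"));
        System.out.println(startInstance(StaticProviderGuardDemo::getNullable, "confB"));

        // With the default-creating getter, the guard never sees null,
        // so not even the first instance gets to load its own configuration.
        provider = null;
        System.out.println(startInstance(StaticProviderGuardDemo::getWithDefault, "confA"));
    }
}
```

The first two calls print {{confA}} twice; the third prints {{default}}, because the getter itself has already filled in the static field by the time the guard runs.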



[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-24 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368871#comment-17368871
 ] 

David Mollitor commented on HIVE-24484:
---

[HADOOP-17367] is another biting issue for Hive. Between 3.1.0 and 3.3.1, the 
return value from {{ProxyUsers#getDefaultImpersonationProvider}} changed. In 
3.1.0, the method could return a {{null}} value and then it was up to the 
caller to create a new one and initialize it. It seems like in 3.3.1, it always 
returns a value, but it looks like the initialization isn't what Hive is 
expecting. The initialization ({{refreshSuperUserGroupsConfiguration}}) creates 
its own configuration, whereas before Hive was passing in its own Configuration.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-23 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368433#comment-17368433
 ] 

David Mollitor commented on HIVE-24484:
---

[HDFS-13505] changed the default of {{dfs.namenode.acls.enabled}} from 
{{false}} to {{true}}. This broke a test in {{TestHCatMultiOutputFormat}}. The 
test creates some files and then changes their permissions manually, with the 
overall effect that the files inherit the permissions of their parent 
directories. The test then checks that the file permissions are a certain 
value. With {{dfs.namenode.acls.enabled}} set to {{true}}, 
{{HdfsUtils#setFullFileStatus}} does not perform this manual process.

> Upgrade Hadoop to 3.3.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1

2021-06-23 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368073#comment-17368073
 ] 

Steve Loughran commented on HIVE-24484:
---

bq. Would be great if folks could work on syncing the version of Guava which 
these products use, especially upgrading Druid.

Hadoop trunk is trying to rip out a lot of its uses of Guava, as it's too 
brittle a dependency for everything.
