[jira] [Commented] (TEZ-4451) ThreadLevel IO Stats Support for TEZ

2024-01-29 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811931#comment-17811931
 ] 

Steve Loughran commented on TEZ-4451:
-

Write of 204 bytes to an output stream; nothing else collected. Has the output 
stream been closed? Were any input streams used?

Stat names are intended to be shared across the different stores:
* org.apache.hadoop.fs.statistics.StoreStatisticNames : store-level stats, 
including fs api "op_" operations
* org.apache.hadoop.fs.statistics.StreamStatisticNames : stream stats; the 
javadocs should explain them
* full set of s3a stats: org.apache.hadoop.fs.s3a.Statistic ; includes 
description and type.


Types:
* counters: aggregation is "add"
* gauges: can go up and down; they don't really aggregate well, but adding is 
what is used
* minimums: minimum value for something (usually a duration or data read/write 
size). Aggregation: smallest wins
* maximums: maximum value for something (usually a duration or data read/write 
size). Aggregation: largest wins
* means: average value, usually a duration. Aggregation: compute the new average 
of the combined values
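The aggregation rules above can be sketched in plain Java. This is an illustration only, not the Hadoop implementation (the real classes live under org.apache.hadoop.fs.statistics); the class and method names here are invented for the example:

```java
// Illustration of the per-type aggregation rules; not the Hadoop code.
public class StatAggregation {

    // counters (and, in practice, gauges): aggregation is "add"
    static long aggregateCounter(long a, long b) { return a + b; }

    // minimums: smallest wins
    static long aggregateMin(long a, long b) { return Math.min(a, b); }

    // maximums: largest wins
    static long aggregateMax(long a, long b) { return Math.max(a, b); }

    // means: combine two (samples, sum) pairs into a weighted average
    static double aggregateMean(long samplesA, long sumA,
                                long samplesB, long sumB) {
        return (double) (sumA + sumB) / (samplesA + samplesB);
    }

    public static void main(String[] args) {
        System.out.println(aggregateCounter(3, 4));        // 7
        System.out.println(aggregateMin(812, 119));        // 119
        System.out.println(aggregateMax(812, 958));        // 958
        // e.g. combining the two object_list_request durations 119 and 154
        System.out.println(aggregateMean(1, 119, 1, 154)); // 136.5
    }
}
```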

A lot of the stats collected are durations, which have a counter, min, max and 
mean; these are updated on each call and aggregated into the current set. This 
lets you see things like the mean time for an operation, but also when there's 
a very slow outlier.

Input/output streams only update their filesystems and context in close(), 
rather than on every read/write call. The s3a input stream also updates on 
unbuffer(), so apps like Impala can get updated values while still keeping the 
stream open.
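That close-time propagation can be sketched like this. A simplified stand-in, not the s3a code: the class, map and counter names are invented, with a ThreadLocal map playing the role of the per-thread IOStatisticsContext:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of close-time stats propagation; names are invented.
public class StreamStatsDemo {

    // stand-in for the per-thread IOStatisticsContext
    static final ThreadLocal<Map<String, Long>> CONTEXT =
        ThreadLocal.withInitial(ConcurrentHashMap::new);

    static class CountingStream implements AutoCloseable {
        private long bytesWritten;

        void write(byte[] data) { bytesWritten += data.length; }

        // merge local counters into the thread context; the real s3a input
        // stream does this in unbuffer() as well as close()
        @Override
        public void close() {
            CONTEXT.get().merge("stream_write_bytes", bytesWritten, Long::sum);
        }
    }

    public static void main(String[] args) {
        try (CountingStream out = new CountingStream()) {
            out.write(new byte[204]);
        }
        // only visible after close()
        System.out.println(CONTEXT.get().get("stream_write_bytes")); // 204
    }
}
```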


{code}
 bin/hadoop fs -ls $BUCKET
Found 1 items
drwxrwxrwx   - stevel stevel  0 2024-01-29 15:05 
s3a://stevel-london/user/stevel/target
2024-01-29 15:05:15,265 [shutdown-hook-0] INFO  statistics.IOStatisticsLogging 
(IOStatisticsLogging.java:logIOStatisticsAtLevel(269)) - IOStatistics: 
counters=((action_http_head_request=1)
(audit_request_execution=3)
(audit_span_creation=4)
(object_list_request=2)
(object_metadata_request=1)
(op_get_file_status=1)
(op_glob_status=1)
(op_list_status=1)
(store_io_request=3));

gauges=();

minimums=((action_http_head_request.min=812)
(object_list_request.min=119)
(op_get_file_status.min=943)
(op_glob_status.min=958)
(op_list_status.min=163));

maximums=((action_http_head_request.max=812)
(object_list_request.max=154)
(op_get_file_status.max=943)
(op_glob_status.max=958)
(op_list_status.max=163));

means=((action_http_head_request.mean=(samples=1, sum=812, mean=812.0000))
(object_list_request.mean=(samples=2, sum=273, mean=136.5000))
(op_get_file_status.mean=(samples=1, sum=943, mean=943.0000))
(op_glob_status.mean=(samples=1, sum=958, mean=958.0000))
(op_list_status.mean=(samples=1, sum=163, mean=163.0000)));


{code}


> ThreadLevel IO Stats Support for TEZ
> 
>
> Key: TEZ-4451
> URL: https://issues.apache.org/jira/browse/TEZ-4451
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Harshit Gupta
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Dump IO Statistics for each of the tasks in the log.
> This will require upgrading Tez to use Hadoop-3.3.9-SNAPSHOT
>  
> cc: [~rbalamohan] [~abstractdog] [~mthakur] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TEZ-4451) ThreadLevel IO Stats Support for TEZ

2024-01-26 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811242#comment-17811242
 ] 

Steve Loughran commented on TEZ-4451:
-

AFAIK it's only the S3A input/output/listing/committers which update the 
context, plus RawLocal (for testing). ABFS is still on the TODO list, though 
I'd love it.

Now that Spark is hadoop-3.3.6+ only, I'd like to collect the context for each 
task and report back the total stats spanning the entire job.

> ThreadLevel IO Stats Support for TEZ
> 
>
> Key: TEZ-4451
> URL: https://issues.apache.org/jira/browse/TEZ-4451
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Harshit Gupta
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Dump IO Statistics for each of the tasks in the log.
> This will require upgrading Tez to use Hadoop-3.3.9-SNAPSHOT
>  
> cc: [~rbalamohan] [~abstractdog] [~mthakur] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TEZ-4451) ThreadLevel IO Stats Support for TEZ

2024-01-26 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811241#comment-17811241
 ] 

Steve Loughran commented on TEZ-4451:
-

look at {{IOStatisticsLogging.ioStatisticsToPrettyString()}}; this renders 
better and strips out the zero entries. It's a bit more work (so wasteful for 
debug logs) but nicer here. There's other stuff there to help too, including a 
demandStringifyIOStatistics() which only does the (non-pretty) rendering when 
.toString() is called:

{code}
LOG.info("context stats: {}",
    demandStringifyIOStatistics(getCurrentIOStatisticsContext()));

{code}

Cost of one new() but nothing else if it's not printed. And you could always 
add a specific logger for stats, so as to avoid having a new context.
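The demand-stringify trick is just an object whose toString() does the rendering, so a disabled logger only pays for one small allocation. A minimal sketch of the pattern (names invented; not the Hadoop implementation):

```java
import java.util.function.Supplier;

// Sketch of the demand-stringify pattern: rendering is deferred until
// toString() is actually called, e.g. by a logger at an enabled level.
public class DemandStringify {

    static Object demandStringify(Supplier<String> render) {
        return new Object() {
            @Override
            public String toString() { return render.get(); }
        };
    }

    public static void main(String[] args) {
        Object lazy = demandStringify(() -> "counters=((op_list_status=1))");
        // nothing has been rendered yet; a logger at a disabled level
        // would never call toString() on the argument
        System.out.println(lazy); // rendering happens here
    }
}
```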

> ThreadLevel IO Stats Support for TEZ
> 
>
> Key: TEZ-4451
> URL: https://issues.apache.org/jira/browse/TEZ-4451
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Harshit Gupta
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Dump IO Statistics for each of the tasks in the log.
> This will require upgrading Tez to use Hadoop-3.3.9-SNAPSHOT
>  
> cc: [~rbalamohan] [~abstractdog] [~mthakur] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TEZ-4451) ThreadLevel IO Stats Support for TEZ

2023-07-14 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743248#comment-17743248
 ] 

Steve Loughran commented on TEZ-4451:
-

You can log filesystem stats in filesystem close() for abfs and s3a. What the 
stream now does is let you collect aggregate stream statistics for each thread, 
so if you do different work in different threads, you can collect the isolated 
work.

If you want all the work for the entire life of a FileSystem instance, that is 
much easier. FileSystem.getIOStatistics() will return the IO stats or null; you 
can create a snapshot of that, which can be marshalled as a Java serializable 
or to JSON and back. Enjoy:

{code}
IOStatisticsSupport.snapshotIOStatistics(fs.getIOStatistics())
{code}
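The snapshot/marshalling idea can be illustrated with plain Java serialization. This is a stand-in for the real org.apache.hadoop.fs.statistics.IOStatisticsSnapshot (which also supports JSON); the class here is invented and only shows the roundtrip:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Stand-in for IOStatisticsSnapshot: a serializable copy of the counters
// that can be shipped between processes and aggregated later.
public class SnapshotDemo {

    static class StatsSnapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Long> counters;

        StatsSnapshot(Map<String, Long> counters) {
            this.counters = new HashMap<>(counters); // defensive copy
        }
    }

    static byte[] toBytes(StatsSnapshot s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(s);
        }
        return bos.toByteArray();
    }

    static StatsSnapshot fromBytes(byte[] b)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(b))) {
            return (StatsSnapshot) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Long> live = new HashMap<>();
        live.put("op_list_status", 1L);
        // serialize and deserialize: the counters survive the roundtrip
        StatsSnapshot restored = fromBytes(toBytes(new StatsSnapshot(live)));
        System.out.println(restored.counters); // {op_list_status=1}
    }
}
```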


> ThreadLevel IO Stats Support for TEZ
> 
>
> Key: TEZ-4451
> URL: https://issues.apache.org/jira/browse/TEZ-4451
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Harshit Gupta
>Priority: Major
>
> Dump IO Statistics for each of the tasks in the log.
> This will require upgrading Tez to use Hadoop-3.3.9-SNAPSHOT
>  
> cc: [~rbalamohan] [~abstractdog] [~mthakur] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown

2020-07-31 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168587#comment-17168587
 ] 

Steve Loughran commented on TEZ-1661:
-

Just hit this problem in a hadoop-aws test run inside log4j. Funny that on the 
first page of Google results, up come my colleagues and other ASF people.

Did anyone ever come up with a root cause for the hang?

> LocalTaskScheduler hangs when shutdown
> --
>
> Key: TEZ-1661
> URL: https://issues.apache.org/jira/browse/TEZ-1661
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0
> Environment: Local Mode
>Reporter: Oleg Zhurakousky
>Assignee: Jeff Zhang
>Priority: Major
> Fix For: 0.7.0, 0.6.1
>
> Attachments: TEZ-1661-1.patch, TEZ-1661-2.patch
>
>
> LocalTaskScheduler hangs on 'take' from the 'taskRequestQueue ' when 
> TezClient shuts down (e.g., TezClient.stop).
> Below is jstack output observed when running in Tez local mode:
> {code}
> "Thread-53" prio=5 tid=0x7fc876d8f800 nid=0xac07 runnable 
> [0x00011df9]
>java.lang.Thread.State: RUNNABLE
> at java.lang.Throwable.fillInStackTrace(Native Method)
> at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
> - locked <0x0007b6ce60a0> (a java.lang.InterruptedException)
> at java.lang.Throwable.<init>(Throwable.java:250)
> at java.lang.Exception.<init>(Exception.java:54)
> at java.lang.InterruptedException.<init>(InterruptedException.java:57)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
> at 
> java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:535)
> at 
> org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.processRequest(LocalTaskSchedulerService.java:310)
> at 
> org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.run(LocalTaskSchedulerService.java:304)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4064) Integrate Tez with Github

2019-04-23 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824496#comment-16824496
 ] 

Steve Loughran commented on TEZ-4064:
-

Create a gmail account and then use that to create a GitHub account; the ASF 
email mechanism to create a list doesn't quite work for the GitHub world and 
it's easiest to roll your own alias.

> Integrate Tez with Github
> -
>
> Key: TEZ-4064
> URL: https://issues.apache.org/jira/browse/TEZ-4064
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
>
> According to HADOOP-16035, steps are as follows.
> - an account that can read Github
> - Apache Yetus 0.9.0+
> - a Jenkinsfile that uses the above



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3916) Add hadoop-azure-datalake jar to azure profile

2018-05-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463857#comment-16463857
 ] 

Steve Loughran commented on TEZ-3916:
-

# If you want to use adl:// as a source or dest, you need hadoop-azure-datalake 
on your classpath.
# Same as s3a: => hadoop-aws, and wasb: => hadoop-azure.

If you put the new azure JAR in the azure profile, then people who don't ask 
for the azure packaging don't get it. Which is clearly what they want.

> Add hadoop-azure-datalake jar to azure profile
> --
>
> Key: TEZ-3916
> URL: https://issues.apache.org/jira/browse/TEZ-3916
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Critical
> Fix For: 0.10.0
>
> Attachments: TEZ-3916.1.patch
>
>
> This jar is required for secure access to Azure object storage: 
> https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html
> There is already an azure profile in Tez but it doesn't include this jar.
> Since the jar is only supported on Hadoop 2.8+, will either need to:
> 1. Determine that including it in a 2.7 build is fine
> 2. Or if it is not fine, then include the jar only when both the 2.8 profile 
> and the azure profile are activated



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available

2016-06-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323031#comment-15323031
 ] 

Steve Loughran commented on TEZ-3291:
-

I don't see any problem with adding fake locations; we know that it hasn't 
caused problems on Azure, and it isn't going to interfere with things that 
don't care about locality. If that's all that's needed for a fix, it's easily 
done.

That doesn't mean that the Tez-side patch won't be good: it will work for other 
filesystems. A quick check of {{RawLocalFileSystem}} shows it hard-codes to 
local too... if ever someone tried to run Tez against a large NFS cluster or 
another distributed FS accessed via the native OS, it'd replicate the problem.

> Optimize splits grouping when locality information is not available
> ---
>
> Key: TEZ-3291
> URL: https://issues.apache.org/jira/browse/TEZ-3291
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: TEZ-3291.WIP.patch
>
>
> There are scenarios where splits might not contain the location details. S3 
> is an example, where all splits would have "localhost" for the location 
> details. In such cases, current split computation does not go through the 
> rack-local and allow-small-groups optimizations and ends up creating a small 
> number of splits. Depending on the cluster, this can end up creating 
> long-running map jobs.
> Example with hive:
> ==
> 1. Inventory table in tpc-ds dataset is partitioned and is relatively a small 
> table.
> 2. With query-22, hive requests with the original splits count as 52 and 
> overall length of splits themselves is around 12061817 bytes. 
> {{tez.grouping.min-size}} was set to 16 MB.
> 3. In tez splits grouping, this ends up creating a single split with 52+ 
> files be processed in the split.  In clusters with split locations, this 
> would have landed up with multiple splits since {{allowSmallGroups}} would 
> have kicked in.
> But in S3, since everything would have "localhost" all splits get added to 
> single group. This makes things a lot worse.
> 4. Depending on the dataset and the format, this can be problematic. For 
> instance, file open calls and random seeks can be expensive in S3.
> 5. In this case, 52 files have to be opened and processed by single task in 
> sequential fashion. Had it been processed by multiple tasks, response time 
> would have drastically reduced.
> E.g log details
> {noformat}
> 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] 
> |split.TezMapredSplitsGrouper|: Grouping splits in Tez
> 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] 
> |split.TezMapredSplitsGrouper|: Desired splits: 110 too large.  Desired 
> splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total 
> length: 12061817 Original splits: 52
> 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] 
> |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 
> numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: 
> 12061817 numOriginalSplits: 52 . Grouping by length: true count: false
> 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] 
> |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 
> splitsProcessed: 52
> {noformat}
> Alternate options:
> ==
> 1. Force Hadoop to provide bogus locations for S3. But not sure, if that 
> would be accepted anytime soon. Ref: HADOOP-12878
> 2. Set {{tez.grouping.min-size}} to very very low value. But should the end 
> user always be doing this on query to query basis?
> 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute 
> desiredNumSplits only when number of distinct locations in the splits is > 1. 
> This would force more number of splits to be generated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1529) ATS and TezClient integration in secure kerberos enabled cluster

2015-06-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594511#comment-14594511
 ] 

Steve Loughran commented on TEZ-1529:
-

Having just been through the pain that is Kerberos integration with the Spark 
history server, one thing I can see from this code is that it's using delegation 
tokens, not direct Kerberos keytab auth. Is this intended to be used in YARN 
components, or in actual client-side code with keytabs?

> ATS and TezClient integration in secure kerberos enabled cluster
> -
>
> Key: TEZ-1529
> URL: https://issues.apache.org/jira/browse/TEZ-1529
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
>Priority: Blocker
> Attachments: TEZ-1529-branch6.2.patch, TEZ-1529.1.patch, 
> TEZ-1529.2.patch, TEZ-1529.3.patch, TEZ-1529.4.patch, TEZ-1529.5.patch
>
>
> This is a follow up for TEZ-1495 which addresses ATS - TezClient integration; 
> however it does not enable it in a secure kerberos enabled cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1529) ATS and TezClient integration in secure kerberos enabled cluster

2015-04-29 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519545#comment-14519545
 ] 

Steve Loughran commented on TEZ-1529:
-

Oh, this is exactly what I'm looking for. I've been trying to work out how to 
use the YARN {{TimelineClient}} to do the token renewal for the read side of 
things.

> ATS and TezClient integration in secure kerberos enabled cluster
> -
>
> Key: TEZ-1529
> URL: https://issues.apache.org/jira/browse/TEZ-1529
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Prakash Ramachandran
>Priority: Blocker
> Attachments: TEZ-1529.1.patch
>
>
> This is a follow up for TEZ-1495 which addresses ATS - TezClient integration; 
> however it does not enable it in a secure kerberos enabled cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)