GitHub user steveloughran commented on the pull request:
https://github.com/apache/spark/pull/12004#issuecomment-214747671
Among other things, the latest version of this patch calls
FileSystem.toString() after operations. With HADOOP-13028, alongside the seek
optimisation, S3AFileSystem.toString() now dumps all the statistics collected
to date. This means the aggregate state of all test runs is displayed; if you
run a specific test standalone, you see the stats purely for that test.
Here's a test run with the Maven args `-Phadoop-2.7
-DwildcardSuites=org.apache.spark.cloud.s3.S3aIOSuite
-Dcloud.test.configuration.file=../cloud.xml -Dhadoop.version=2.9.0-SNAPSHOT
-Dtest.method.keys=CSVgz`:
```
2016-04-26 14:32:17,104 INFO scheduler.TaskSetManager
(Logging.scala:logInfo(54)) - Starting task 0.0 in stage 0.0 (TID 0, localhost,
partition 0,PROCESS_LOCAL, 5261 bytes)
2016-04-26 14:32:17,105 INFO executor.Executor (Logging.scala:logInfo(54))
- Running task 0.0 in stage 0.0 (TID 0)
2016-04-26 14:32:17,111 INFO rdd.HadoopRDD (Logging.scala:logInfo(54)) -
Input split: s3a://landsat-pds/scene_list.gz:0+20430493
2016-04-26 14:32:17,285 INFO compress.CodecPool
(CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.gz]
2016-04-26 14:32:21,724 INFO executor.Executor (Logging.scala:logInfo(54))
- Finished task 0.0 in stage 0.0 (TID 0). 2643 bytes result sent to driver
2016-04-26 14:32:21,727 INFO scheduler.TaskSetManager
(Logging.scala:logInfo(54)) - Finished task 0.0 in stage 0.0 (TID 0) in 4625 ms
on localhost (1/1)
2016-04-26 14:32:21,727 INFO scheduler.TaskSchedulerImpl
(Logging.scala:logInfo(54)) - Removed TaskSet 0.0, whose tasks have all
completed, from pool
2016-04-26 14:32:21,728 INFO scheduler.DAGScheduler
(Logging.scala:logInfo(54)) - ResultStage 0 (count at S3aIOSuite.scala:127)
finished in 4.626 s
2016-04-26 14:32:21,728 INFO scheduler.DAGScheduler
(Logging.scala:logInfo(54)) - Job 0 finished: count at S3aIOSuite.scala:127,
took 4.636417 s
2016-04-26 14:32:21,729 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
size of s3a://landsat-pds/scene_list.gz = 464105 rows read in 4815885000 nS
2016-04-26 14:32:21,729 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Filesystem statistics S3AFileSystem{uri=s3a://landsat-pds,
workingDir=s3a://landsat-pds/user/stevel, partSize=104857600,
enableMultiObjectsDelete=true, multiPartThreshold=2147483647, statistics
{40864879 bytes read, 7786 bytes written, 110 read ops, 0 large read ops, 26
write ops}, metrics {{Context=S3AFileSystem}
{FileSystemId=bc5db77d-e17d-41bb-88ab-44b26cf3eda4-landsat-pds}
{fsURI=s3a://landsat-pds/scene_list.gz} {files_created=0} {files_copied=0}
{files_copied_bytes=0} {files_deleted=0} {directories_created=0}
{directories_deleted=0} {ignored_errors=0} {streamForwardSeekOperations=0}
{streamCloseOperations=2} {streamBytesSkippedOnSeek=0}
{streamReadOperations=2821} {streamReadExceptions=0} {streamAborted=0}
{streamBackwardSeekOperations=0} {streamClosed=2} {streamOpened=2}
{streamSeekOperations=0} {streamBytesRead=40860986}
{streamReadOperationsIncomplete=2821} {streamReadFullyOperations=0} }}
2016-04-26 14:32:21,729 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
```
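To compare runs, the `{name=value}` pairs in a dump like the one above can be pulled into a dictionary. The following is only a rough sketch in Python: `parse_metrics` and `sample` are hypothetical names, not part of Spark or Hadoop, and the pattern assumes the `toString()` layout seen in this log, which may change between Hadoop versions.

```python
import re

# Matches "{name=value}" pairs as they appear in the log above.
# Assumption: this layout is taken from the log, not from a documented format.
_METRIC = re.compile(r"\{(\w+)=([^{}]*)\}")


def parse_metrics(dump):
    """Return the {name=value} pairs found in a statistics dump as a dict.

    Purely numeric values become int; everything else stays a string.
    """
    # Log lines are wrapped, so collapse all whitespace runs first.
    text = " ".join(dump.split())
    metrics = {}
    for name, value in _METRIC.findall(text):
        value = value.strip()
        metrics[name] = int(value) if value.isdigit() else value
    return metrics


# Excerpt copied from the log output above.
sample = """metrics {{Context=S3AFileSystem}
{fsURI=s3a://landsat-pds/scene_list.gz} {streamOpened=2}
{streamReadOperations=2821} {streamSeekOperations=0}
{streamBytesRead=40860986} }}"""

stats = parse_metrics(sample)
# Average bytes returned per stream read call in this run.
bytes_per_read = stats["streamBytesRead"] / stats["streamReadOperations"]
```

The leading `uri=...` and `statistics {...}` parts of the dump are deliberately skipped, since they are not written as single `{name=value}` pairs.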