[GitHub] spark pull request: [SPARK-7481][build][WIP] Add Hadoop 2.6+ spark...

steveloughran Tue, 26 Apr 2016 06:39:07 -0700

Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/12004#issuecomment-214747671
  
    The latest version of this patch, among other things, calls 
FileSystem.toString() after operations. In HADOOP-13028, alongside the seek 
optimisation, S3AFileSystem.toString() now dumps all the statistics collected 
to date. This means that the aggregate state of all test runs is displayed; if 
you run a specific test standalone, you see the stats for that test alone.
    
    Here's a test run with the Maven args `-Phadoop-2.7 -DwildcardSuites=org.apache.spark.cloud.s3.S3aIOSuite -Dcloud.test.configuration.file=../cloud.xml -Dhadoop.version=2.9.0-SNAPSHOT -Dtest.method.keys=CSVgz`:
    ```
    2016-04-26 14:32:17,104 INFO  scheduler.TaskSetManager (Logging.scala:logInfo(54)) - Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 5261 bytes)
    2016-04-26 14:32:17,105 INFO  executor.Executor (Logging.scala:logInfo(54)) - Running task 0.0 in stage 0.0 (TID 0)
    2016-04-26 14:32:17,111 INFO  rdd.HadoopRDD (Logging.scala:logInfo(54)) - Input split: s3a://landsat-pds/scene_list.gz:0+20430493
    2016-04-26 14:32:17,285 INFO  compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.gz]
    2016-04-26 14:32:21,724 INFO  executor.Executor (Logging.scala:logInfo(54)) - Finished task 0.0 in stage 0.0 (TID 0). 2643 bytes result sent to driver
    2016-04-26 14:32:21,727 INFO  scheduler.TaskSetManager (Logging.scala:logInfo(54)) - Finished task 0.0 in stage 0.0 (TID 0) in 4625 ms on localhost (1/1)
    2016-04-26 14:32:21,727 INFO  scheduler.TaskSchedulerImpl (Logging.scala:logInfo(54)) - Removed TaskSet 0.0, whose tasks have all completed, from pool
    2016-04-26 14:32:21,728 INFO  scheduler.DAGScheduler (Logging.scala:logInfo(54)) - ResultStage 0 (count at S3aIOSuite.scala:127) finished in 4.626 s
    2016-04-26 14:32:21,728 INFO  scheduler.DAGScheduler (Logging.scala:logInfo(54)) - Job 0 finished: count at S3aIOSuite.scala:127, took 4.636417 s
    2016-04-26 14:32:21,729 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - size of s3a://landsat-pds/scene_list.gz = 464105 rows read in 4815885000 nS
    2016-04-26 14:32:21,729 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - Filesystem statistics S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel, partSize=104857600, enableMultiObjectsDelete=true, multiPartThreshold=2147483647, statistics {40864879 bytes read, 7786 bytes written, 110 read ops, 0 large read ops, 26 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=bc5db77d-e17d-41bb-88ab-44b26cf3eda4-landsat-pds} {fsURI=s3a://landsat-pds/scene_list.gz} {files_created=0} {files_copied=0} {files_copied_bytes=0} {files_deleted=0} {directories_created=0} {directories_deleted=0} {ignored_errors=0} {streamForwardSeekOperations=0} {streamCloseOperations=2} {streamBytesSkippedOnSeek=0} {streamReadOperations=2821} {streamReadExceptions=0} {streamAborted=0} {streamBackwardSeekOperations=0} {streamClosed=2} {streamOpened=2} {streamSeekOperations=0} {streamBytesRead=40860986} {streamReadOperationsIncomplete=2821} {streamReadFullyOperations=0} }}
    2016-04-26 14:32:21,729 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
    ```
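
To compare runs, the `{name=value}` metrics in the dump above can be pulled out of the log mechanically. The following is a minimal Python sketch and not part of the pull request: `parse_metrics`, `PAIR` and `SAMPLE` are hypothetical names, `SAMPLE` is an excerpt of the logged line, and the regular expression assumes the `toString()` layout shown in this log, which may differ in other Hadoop versions.

```python
import re

# Excerpt of the S3AFileSystem.toString() output, copied from the log above.
SAMPLE = (
    "statistics {40864879 bytes read, 7786 bytes written, 110 read ops, "
    "0 large read ops, 26 write ops}, metrics {{Context=S3AFileSystem} "
    "{fsURI=s3a://landsat-pds/scene_list.gz} {files_created=0} "
    "{streamReadOperations=2821} {streamOpened=2} {streamSeekOperations=0} "
    "{streamBytesRead=40860986} {streamReadOperationsIncomplete=2821} }}"
)

# One "{name=value}" pair; the value runs up to the closing brace.
# Assumption: values never contain braces, as in the log above.
PAIR = re.compile(r"\{(\w+)=([^{}]*)\}")


def parse_metrics(dump):
    """Return the {name=value} pairs found in dump as a dict.

    Purely numeric values are converted to int; everything else stays a string.
    """
    metrics = {}
    for name, value in PAIR.findall(dump):
        metrics[name] = int(value) if value.isdigit() else value
    return metrics


if __name__ == "__main__":
    m = parse_metrics(SAMPLE)
    # Average number of bytes returned per stream read call in this run.
    print(m["streamBytesRead"] / m["streamReadOperations"])
```

The free-text `statistics {...}` group is deliberately skipped, because it contains no `name=value` pairs; only the per-metric entries end up in the dict.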



