Hi Gavin,

Shuffling is exactly the same in both requests, and it is minimal: each request produces a single shuffle task. Running time is the only difference I can see in the metrics:
timeit.timeit(spark.read.csv('file:///data/dump/test_csv', schema=schema).groupBy().sum(*(['dd_convs'] * 57) ).collect, number=1)
0.713730096817

{ "id" : 368, "name" : "duration total (min, med, max)", "value" : "524" },
{ "id" : 375, "name" : "internal.metrics.executorRunTime", "value" : "527" },
{ "id" : 391, "name" : "internal.metrics.shuffle.write.writeTime", "value" : "244495" }

timeit.timeit(spark.read.csv('file:///data/dump/test_csv', schema=schema).groupBy().sum(*(['dd_convs'] * 58) ).collect, number=1)
2.97951102257

{ "id" : 469, "name" : "duration total (min, med, max)", "value" : "2654" },
{ "id" : 476, "name" : "internal.metrics.executorRunTime", "value" : "2661" },
{ "id" : 492, "name" : "internal.metrics.shuffle.write.writeTime", "value" : "371883" }

Full metrics are in the attachment.

> Saturday, September 3, 2016, 19:53 +03:00 from Gavin Yue <yue.yuany...@gmail.com>:
>
> Any shuffling?
>
> On Sep 3, 2016, at 5:50 AM, Сергей Романов <romano...@inbox.ru.INVALID> wrote:
>
>> The same problem happens with a CSV data file, so it is not parquet-related either.
>>
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/ '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
>>       /_/
>>
>> Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
>> SparkSession available as 'spark'.
>> >>> import timeit
>> >>> from pyspark.sql.types import *
>> >>> schema = StructType([StructField('dd_convs', FloatType(), True)])
>> >>> for x in range(50, 70): print x, timeit.timeit(spark.read.csv('file:///data/dump/test_csv', schema=schema).groupBy().sum(*(['dd_convs'] * x) ).collect, number=1)
>> 50 0.372850894928
>> 51 0.376906871796
>> 52 0.381325960159
>> 53 0.385444164276
>> 54 0.386877775192
>> 55 0.388918161392
>> 56 0.397624969482
>> 57 0.391713142395
>> 58 2.62714004517
>> 59 2.68421196938
>> 60 2.74627685547
>> 61 2.81081581116
>> 62 3.43532109261
>> 63 3.07742786407
>> 64 3.03904604912
>> 65 3.01616096497
>> 66 3.06293702126
>> 67 3.09386610985
>> 68 3.27610206604
>> 69 3.2041969299
>>
>> Saturday, September 3, 2016, 15:40 +03:00 from Сергей Романов <romano...@inbox.ru.INVALID>:
>>>
>>> Hi,
>>> I have narrowed my problem down to a very simple case. I am sending a 27 KB parquet file as an attachment (file:///data/dump/test2 in the example).
>>> Please, can you take a look at it? Why is there a performance drop after 57 sum columns?
>>>
>>> Welcome to
>>>       ____              __
>>>      / __/__  ___ _____/ /__
>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
>>>       /_/
>>>
>>> Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
>>> SparkSession available as 'spark'.
>>> >>> import timeit
>>> >>> for x in range(70): print x, timeit.timeit(spark.read.parquet('file:///data/dump/test2').groupBy().sum(*(['dd_convs'] * x) ).collect, number=1)
>>> ...
>>> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>>> SLF4J: Defaulting to no-operation (NOP) logger implementation
>>> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>>> 0 1.05591607094
>>> 1 0.200426101685
>>> 2 0.203800916672
>>> 3 0.176458120346
>>> 4 0.184863805771
>>> 5 0.232321023941
>>> 6 0.216032981873
>>> 7 0.201778173447
>>> 8 0.292424917221
>>> 9 0.228524923325
>>> 10 0.190534114838
>>> 11 0.197028160095
>>> 12 0.270443916321
>>> 13 0.429781913757
>>> 14 0.270851135254
>>> 15 0.776989936829
>>> 16 0.233337879181
>>> 17 0.227638959885
>>> 18 0.212944030762
>>> 19 0.2144780159
>>> 20 0.22200012207
>>> 21 0.262261152267
>>> 22 0.254227876663
>>> 23 0.275084018707
>>> 24 0.292124032974
>>> 25 0.280488014221
>>> 16/09/03 15:31:28 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
>>> 26 0.290093898773
>>> 27 0.238478899002
>>> 28 0.246420860291
>>> 29 0.241401195526
>>> 30 0.255286931992
>>> 31 0.42702794075
>>> 32 0.327946186066
>>> 33 0.434395074844
>>> 34 0.314198970795
>>> 35 0.34576010704
>>> 36 0.278323888779
>>> 37 0.289474964142
>>> 38 0.290827989578
>>> 39 0.376291036606
>>> 40 0.347742080688
>>> 41 0.363158941269
>>> 42 0.318687915802
>>> 43 0.376327991486
>>> 44 0.374994039536
>>> 45 0.362971067429
>>> 46 0.425967931747
>>> 47 0.370860099792
>>> 48 0.443903923035
>>> 49 0.374128103256
>>> 50 0.378985881805
>>> 51 0.476850986481
>>> 52 0.451028823853
>>> 53 0.432540893555
>>> 54 0.514838933945
>>> 55 0.53990483284
>>> 56 0.449142932892
>>> 57 0.465240001678  // 5x slower after 57 columns
>>> 58 2.40412116051
>>> 59 2.41632795334
>>> 60 2.41812801361
>>> 61 2.55726218224
>>> 62 2.55484509468
>>> 63 2.56128406525
>>> 64 2.54642391205
>>> 65 2.56381797791
>>> 66 2.56871509552
>>> 67 2.66187620163
>>> 68 2.63496208191
>>> 69 2.81545996666
>>>
>>> Sergei Romanov
>>
>> Sergei Romanov
>> <bad.csv.tgz>
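By the way, the loop above is easy to run outside the shell too. Here is a standalone sketch of the same benchmark; it assumes my setup (the single-column file:///data/dump/test_csv file, local Spark 2.0, Python 2.7), so adjust the path for yours:

import timeit

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, FloatType

# Same objects as in the shell session: a local SparkSession and a
# one-column float schema for the test CSV file.
spark = SparkSession.builder.getOrCreate()
schema = StructType([StructField('dd_convs', FloatType(), True)])

for x in range(50, 70):
    df = spark.read.csv('file:///data/dump/test_csv', schema=schema)
    # Sum the same column x times, so only the width of the aggregation varies.
    run = df.groupBy().sum(*(['dd_convs'] * x)).collect
    print x, timeit.timeit(run, number=1)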
timeit.timeit(spark.read.csv('file:///data/dump/test_csv', schema=schema).groupBy().sum(*(['dd_convs'] * 57) ).collect, number=1)
0.713730096817

{ "jobId" : 4, "name" : "collect at /usr/lib/python2.7/timeit.py:100",
  "submissionTime" : "2016-09-05T10:51:45.764GMT", "completionTime" : "2016-09-05T10:51:46.327GMT",
  "stageIds" : [ 9, 8 ], "status" : "SUCCEEDED",
  "numTasks" : 2, "numActiveTasks" : 0, "numCompletedTasks" : 2, "numSkippedTasks" : 0, "numFailedTasks" : 0,
  "numActiveStages" : 0, "numCompletedStages" : 2, "numSkippedStages" : 0, "numFailedStages" : 0 }

[ { "status" : "COMPLETE", "stageId" : 8, "attemptId" : 0,
    "numActiveTasks" : 0, "numCompleteTasks" : 1, "numFailedTasks" : 0,
    "executorRunTime" : 527,
    "submissionTime" : "2016-09-05T10:51:45.770GMT", "firstTaskLaunchedTime" : "2016-09-05T10:51:45.770GMT", "completionTime" : "2016-09-05T10:51:46.311GMT",
    "inputBytes" : 1538820, "inputRecords" : 769163, "outputBytes" : 0, "outputRecords" : 0,
    "shuffleReadBytes" : 0, "shuffleReadRecords" : 0, "shuffleWriteBytes" : 68, "shuffleWriteRecords" : 1,
    "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0,
    "name" : "collect at /usr/lib/python2.7/timeit.py:100",
    "details" : "org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:2512)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\npy4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)\npy4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\npy4j.Gateway.invoke(Gateway.java:280)\npy4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)\npy4j.commands.CallCommand.execute(CallCommand.java:79)\npy4j.GatewayConnection.run(GatewayConnection.java:211)\njava.lang.Thread.run(Thread.java:745)",
    "schedulingPool" : "default",
    "accumulatorUpdates" : [
      { "id" : 391, "name" : "internal.metrics.shuffle.write.writeTime", "value" : "244495" },
      { "id" : 373, "name" : "number of output rows", "value" : "769163" },
      { "id" : 376, "name" : "internal.metrics.resultSize", "value" : "1871" },
      { "id" : 369, "name" : "number of output rows", "value" : "1" },
      { "id" : 390, "name" : "internal.metrics.shuffle.write.recordsWritten", "value" : "1" },
      { "id" : 372, "name" : "aggregate time total (min, med, max)", "value" : "524" },
      { "id" : 375, "name" : "internal.metrics.executorRunTime", "value" : "527" },
      { "id" : 393, "name" : "internal.metrics.input.recordsRead", "value" : "769163" },
      { "id" : 392, "name" : "internal.metrics.input.bytesRead", "value" : "1538820" },
      { "id" : 377, "name" : "internal.metrics.jvmGCTime", "value" : "4" },
      { "id" : 368, "name" : "duration total (min, med, max)", "value" : "524" },
      { "id" : 389, "name" : "internal.metrics.shuffle.write.bytesWritten", "value" : "68" },
      { "id" : 362, "name" : "data size total (min, med, max)", "value" : "462" },
      { "id" : 374, "name" : "internal.metrics.executorDeserializeTime", "value" : "9" } ],
    "tasks" : { "7" : {
      "taskId" : 7, "index" : 0, "attempt" : 0, "launchTime" : "2016-09-05T10:51:45.770GMT",
      "executorId" : "driver", "host" : "localhost", "taskLocality" : "PROCESS_LOCAL", "speculative" : false,
      "accumulatorUpdates" : [ ],
      "taskMetrics" : {
        "executorDeserializeTime" : 9, "executorRunTime" : 527, "resultSize" : 1871, "jvmGcTime" : 4,
        "resultSerializationTime" : 0, "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0,
        "inputMetrics" : { "bytesRead" : 1538820, "recordsRead" : 769163 },
        "outputMetrics" : { "bytesWritten" : 0, "recordsWritten" : 0 },
        "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, "localBlocksFetched" : 0, "fetchWaitTime" : 0, "remoteBytesRead" : 0, "localBytesRead" : 0, "recordsRead" : 0 },
        "shuffleWriteMetrics" : { "bytesWritten" : 68, "writeTime" : 244495, "recordsWritten" : 1 } } } },
    "executorSummary" : { "driver" : {
      "taskTime" : 540, "failedTasks" : 0, "succeededTasks" : 1,
      "inputBytes" : 1538820, "outputBytes" : 0, "shuffleRead" : 0, "shuffleWrite" : 68,
      "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0 } } } ]

[ { "status" : "COMPLETE", "stageId" : 9, "attemptId" : 0,
    "numActiveTasks" : 0, "numCompleteTasks" : 1, "numFailedTasks" : 0,
    "executorRunTime" : 2,
    "submissionTime" : "2016-09-05T10:51:46.315GMT", "firstTaskLaunchedTime" : "2016-09-05T10:51:46.315GMT", "completionTime" : "2016-09-05T10:51:46.327GMT",
    "inputBytes" : 0, "inputRecords" : 0, "outputBytes" : 0, "outputRecords" : 0,
    "shuffleReadBytes" : 68, "shuffleReadRecords" : 1, "shuffleWriteBytes" : 0, "shuffleWriteRecords" : 0,
    "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0,
    "name" : "collect at /usr/lib/python2.7/timeit.py:100",
    "details" : "org.apache.spark.rdd.RDD.collect(RDD.scala:892)\norg.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)\norg.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply$mcI$sp(Dataset.scala:2513)\norg.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:2513)\norg.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:2513)\norg.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)\norg.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)\norg.apache.spark.sql.Dataset.collectToPython(Dataset.scala:2512)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\npy4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)\npy4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\npy4j.Gateway.invoke(Gateway.java:280)\npy4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)\npy4j.commands.CallCommand.execute(CallCommand.java:79)\npy4j.GatewayConnection.run(GatewayConnection.java:211)\njava.lang.Thread.run(Thread.java:745)",
    "schedulingPool" : "default",
    "accumulatorUpdates" : [
      { "id" : 427, "name" : "internal.metrics.shuffle.read.remoteBlocksFetched", "value" : "0" },
      { "id" : 364, "name" : "number of output rows", "value" : "1" },
      { "id" : 418, "name" : "internal.metrics.executorDeserializeTime", "value" : "6" },
      { "id" : 430, "name" : "internal.metrics.shuffle.read.localBytesRead", "value" : "68" },
      { "id" : 420, "name" : "internal.metrics.resultSize", "value" : "6543" },
      { "id" : 429, "name" : "internal.metrics.shuffle.read.remoteBytesRead", "value" : "0" },
      { "id" : 432, "name" : "internal.metrics.shuffle.read.recordsRead", "value" : "1" },
      { "id" : 363, "name" : "duration total (min, med, max)", "value" : "0" },
      { "id" : 428, "name" : "internal.metrics.shuffle.read.localBlocksFetched", "value" : "1" },
      { "id" : 419, "name" : "internal.metrics.executorRunTime", "value" : "2" },
      { "id" : 431, "name" : "internal.metrics.shuffle.read.fetchWaitTime", "value" : "0" } ],
    "tasks" : { "8" : {
      "taskId" : 8, "index" : 0, "attempt" : 0, "launchTime" : "2016-09-05T10:51:46.315GMT",
      "executorId" : "driver", "host" : "localhost", "taskLocality" : "ANY", "speculative" : false,
      "accumulatorUpdates" : [ ],
      "taskMetrics" : {
        "executorDeserializeTime" : 6, "executorRunTime" : 2, "resultSize" : 6543, "jvmGcTime" : 0,
        "resultSerializationTime" : 0, "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0,
        "inputMetrics" : { "bytesRead" : 0, "recordsRead" : 0 },
        "outputMetrics" : { "bytesWritten" : 0, "recordsWritten" : 0 },
        "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, "localBlocksFetched" : 1, "fetchWaitTime" : 0, "remoteBytesRead" : 0, "localBytesRead" : 68, "recordsRead" : 1 },
        "shuffleWriteMetrics" : { "bytesWritten" : 0, "writeTime" : 0, "recordsWritten" : 0 } } } },
    "executorSummary" : { "driver" : {
      "taskTime" : 11, "failedTasks" : 0, "succeededTasks" : 1,
      "inputBytes" : 0, "outputBytes" : 0, "shuffleRead" : 68, "shuffleWrite" : 0,
      "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0 } } } ]

timeit.timeit(spark.read.csv('file:///data/dump/test_csv', schema=schema).groupBy().sum(*(['dd_convs'] * 58) ).collect, number=1)
2.97951102257

{ "jobId" : 5, "name" : "collect at /usr/lib/python2.7/timeit.py:100",
  "submissionTime" : "2016-09-05T10:51:46.670GMT", "completionTime" : "2016-09-05T10:51:49.372GMT",
  "stageIds" : [ 10, 11 ], "status" : "SUCCEEDED",
  "numTasks" : 2, "numActiveTasks" : 0, "numCompletedTasks" : 2, "numSkippedTasks" : 0, "numFailedTasks" : 0,
  "numActiveStages" : 0, "numCompletedStages" : 2, "numSkippedStages" : 0, "numFailedStages" : 0 }

[ { "status" : "COMPLETE", "stageId" : 10, "attemptId" : 0,
    "numActiveTasks" : 0, "numCompleteTasks" : 1, "numFailedTasks" : 0,
    "executorRunTime" : 2661,
    "submissionTime" : "2016-09-05T10:51:46.677GMT", "firstTaskLaunchedTime" : "2016-09-05T10:51:46.677GMT", "completionTime" : "2016-09-05T10:51:49.351GMT",
    "inputBytes" : 1538820, "inputRecords" : 769163, "outputBytes" : 0, "outputRecords" : 0,
    "shuffleReadBytes" : 0, "shuffleReadRecords" : 0, "shuffleWriteBytes" : 68, "shuffleWriteRecords" : 1,
    "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0,
    "name" : "collect at /usr/lib/python2.7/timeit.py:100",
    "details" : "org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:2512)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\npy4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)\npy4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\npy4j.Gateway.invoke(Gateway.java:280)\npy4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)\npy4j.commands.CallCommand.execute(CallCommand.java:79)\npy4j.GatewayConnection.run(GatewayConnection.java:211)\njava.lang.Thread.run(Thread.java:745)",
    "schedulingPool" : "default",
    "accumulatorUpdates" : [
      { "id" : 478, "name" : "internal.metrics.jvmGCTime", "value" : "5" },
      { "id" : 469, "name" : "duration total (min, med, max)", "value" : "2654" },
      { "id" : 463, "name" : "data size total (min, med, max)", "value" : "470" },
      { "id" : 490, "name" : "internal.metrics.shuffle.write.bytesWritten", "value" : "68" },
      { "id" : 492, "name" : "internal.metrics.shuffle.write.writeTime", "value" : "371883" },
      { "id" : 474, "name" : "number of output rows", "value" : "769163" },
      { "id" : 477, "name" : "internal.metrics.resultSize", "value" : "1871" },
      { "id" : 470, "name" : "number of output rows", "value" : "1" },
      { "id" : 473, "name" : "aggregate time total (min, med, max)", "value" : "2654" },
      { "id" : 491, "name" : "internal.metrics.shuffle.write.recordsWritten", "value" : "1" },
      { "id" : 476, "name" : "internal.metrics.executorRunTime", "value" : "2661" },
      { "id" : 494, "name" : "internal.metrics.input.recordsRead", "value" : "769163" },
      { "id" : 493, "name" : "internal.metrics.input.bytesRead", "value" : "1538820" },
      { "id" : 475, "name" : "internal.metrics.executorDeserializeTime", "value" : "9" } ],
    "tasks" : { "9" : {
      "taskId" : 9, "index" : 0, "attempt" : 0, "launchTime" : "2016-09-05T10:51:46.677GMT",
      "executorId" : "driver", "host" : "localhost", "taskLocality" : "PROCESS_LOCAL", "speculative" : false,
      "accumulatorUpdates" : [ ],
      "taskMetrics" : {
        "executorDeserializeTime" : 9, "executorRunTime" : 2661, "resultSize" : 1871, "jvmGcTime" : 5,
        "resultSerializationTime" : 0, "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0,
        "inputMetrics" : { "bytesRead" : 1538820, "recordsRead" : 769163 },
        "outputMetrics" : { "bytesWritten" : 0, "recordsWritten" : 0 },
        "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, "localBlocksFetched" : 0, "fetchWaitTime" : 0, "remoteBytesRead" : 0, "localBytesRead" : 0, "recordsRead" : 0 },
        "shuffleWriteMetrics" : { "bytesWritten" : 68, "writeTime" : 371883, "recordsWritten" : 1 } } } },
    "executorSummary" : { "driver" : {
      "taskTime" : 2674, "failedTasks" : 0, "succeededTasks" : 1,
      "inputBytes" : 1538820, "outputBytes" : 0, "shuffleRead" : 0, "shuffleWrite" : 68,
      "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0 } } } ]

[ { "status" : "COMPLETE", "stageId" : 11, "attemptId" : 0,
    "numActiveTasks" : 0, "numCompleteTasks" : 1, "numFailedTasks" : 0,
    "executorRunTime" : 8,
    "submissionTime" : "2016-09-05T10:51:49.355GMT", "firstTaskLaunchedTime" : "2016-09-05T10:51:49.355GMT", "completionTime" : "2016-09-05T10:51:49.372GMT",
    "inputBytes" : 0, "inputRecords" : 0, "outputBytes" : 0, "outputRecords" : 0,
    "shuffleReadBytes" : 68, "shuffleReadRecords" : 1, "shuffleWriteBytes" : 0, "shuffleWriteRecords" : 0,
    "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0,
    "name" : "collect at /usr/lib/python2.7/timeit.py:100",
    "details" : "org.apache.spark.rdd.RDD.collect(RDD.scala:892)\norg.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)\norg.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply$mcI$sp(Dataset.scala:2513)\norg.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:2513)\norg.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:2513)\norg.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)\norg.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)\norg.apache.spark.sql.Dataset.collectToPython(Dataset.scala:2512)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\npy4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)\npy4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\npy4j.Gateway.invoke(Gateway.java:280)\npy4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)\npy4j.commands.CallCommand.execute(CallCommand.java:79)\npy4j.GatewayConnection.run(GatewayConnection.java:211)\njava.lang.Thread.run(Thread.java:745)",
    "schedulingPool" : "default",
    "accumulatorUpdates" : [
      { "id" : 532, "name" : "internal.metrics.shuffle.read.fetchWaitTime", "value" : "0" },
      { "id" : 528, "name" : "internal.metrics.shuffle.read.remoteBlocksFetched", "value" : "0" },
      { "id" : 465, "name" : "number of output rows", "value" : "1" },
      { "id" : 519, "name" : "internal.metrics.executorDeserializeTime", "value" : "6" },
      { "id" : 531, "name" : "internal.metrics.shuffle.read.localBytesRead", "value" : "68" },
      { "id" : 521, "name" : "internal.metrics.resultSize", "value" : "6623" },
      { "id" : 530, "name" : "internal.metrics.shuffle.read.remoteBytesRead", "value" : "0" },
      { "id" : 533, "name" : "internal.metrics.shuffle.read.recordsRead", "value" : "1" },
      { "id" : 464, "name" : "duration total (min, med, max)", "value" : "0" },
      { "id" : 520, "name" : "internal.metrics.executorRunTime", "value" : "8" },
      { "id" : 529, "name" : "internal.metrics.shuffle.read.localBlocksFetched", "value" : "1" } ],
    "tasks" : { "10" : {
      "taskId" : 10, "index" : 0, "attempt" : 0, "launchTime" : "2016-09-05T10:51:49.355GMT",
      "executorId" : "driver", "host" : "localhost", "taskLocality" : "ANY", "speculative" : false,
      "accumulatorUpdates" : [ ],
      "taskMetrics" : {
        "executorDeserializeTime" : 6, "executorRunTime" : 8, "resultSize" : 6623, "jvmGcTime" : 0,
        "resultSerializationTime" : 0, "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0,
        "inputMetrics" : { "bytesRead" : 0, "recordsRead" : 0 },
        "outputMetrics" : { "bytesWritten" : 0, "recordsWritten" : 0 },
        "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, "localBlocksFetched" : 1, "fetchWaitTime" : 0, "remoteBytesRead" : 0, "localBytesRead" : 68, "recordsRead" : 1 },
        "shuffleWriteMetrics" : { "bytesWritten" : 0, "writeTime" : 0, "recordsWritten" : 0 } } } },
    "executorSummary" : { "driver" : {
      "taskTime" : 16, "failedTasks" : 0, "succeededTasks" : 1,
      "inputBytes" : 0, "outputBytes" : 0, "shuffleRead" : 68, "shuffleWrite" : 0,
      "memoryBytesSpilled" : 0, "diskBytesSpilled" : 0 } } } ]
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org