Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12252 )

Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................

[spark] Add some logging to trace KuduContext operations

This patch adds some logging to help track how long an entire
KuduContext operation takes and also how long each part takes on each
executor. This information has been sorely lacking in some cases where
Spark's laziness makes attributing slowness to Kudu (vs other
components of the Spark job) very difficult.

Unfortunately, it's not as straightforward to add this sort of logging
to reading from Kudu (KuduRDD) because Spark may lazily read batches
from Kudu, and batches may be small enough that logging for each batch
is so verbose that it is not useful.

I tested this patch manually on a 3-node cluster and confirmed I saw
the expected log messages on the driver and on the executors, e.g.

  19/01/22 15:18:13 INFO kudu.KuduContext: applying operations of type 'insert' to table 'impala::default.aaa'
  19/01/22 15:18:13 INFO kudu.KuduContext: applied 1 operations of type 'insert' to table 'impala::default.aaa'

Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
2 files changed, 26 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/12252/1
--
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <[email protected]>
