[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-11-27 Thread awarrior
Github user awarrior commented on the issue:

https://github.com/apache/spark/pull/19118
  
@jiangxb1987 well, the test case is hard to construct if we only run the app in local mode, as the comments above explain. Any ideas on how to crack this?
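
For reference, a minimal sketch of the kind of check such a test could make, using a SparkListener to sum outputMetrics.bytesWritten over finished tasks (the object name, output path, and RDD contents are placeholders I made up). As discussed further down in this thread, in local mode the driver may already have initialized the local FileSystem, so this alone may not reproduce the original bug:

    import java.nio.file.Files

    import org.apache.hadoop.mapred.TextOutputFormat
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    object OutputMetricsCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName("output-metrics-check"))

        // Sum the written bytes reported by OutputMetrics across all finished tasks.
        var bytesWritten = 0L
        sc.addSparkListener(new SparkListener {
          override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
            bytesWritten += taskEnd.taskMetrics.outputMetrics.bytesWritten
          }
        })

        val out = Files.createTempDirectory("spark-21882").resolve("out").toString
        sc.parallelize(1 to 1000, 2)
          .map(i => (i, i.toString))
          .saveAsHadoopFile[TextOutputFormat[Int, String]](out)

        sc.stop() // drains the listener bus before the counter is read
        // With the fix the sum should be non-zero; with the bug it can stay 0.
        println(s"bytesWritten reported by OutputMetrics: $bytesWritten")
      }
    }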




[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-12 Thread awarrior
Github user awarrior commented on the issue:

https://github.com/apache/spark/pull/19118
  
@jiangxb1987 well, I got past that part above, but there are other places where initialization can happen before runJob. They are in the write function of SparkHadoopWriter.

    // Assert the output format/key/value class is set in JobConf.
    config.assertConf(jobContext, rdd.conf)  // <= possible initialization point

    val committer = config.createCommitter(stageId)
    committer.setupJob(jobContext)  // <= possible initialization point

    // Try to write all RDD partitions as a Hadoop OutputFormat.
    try {
      val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => {
        executeTask(
          context = context,
          config = config,
          jobTrackerId = jobTrackerId,
          sparkStageId = context.stageId,
          sparkPartitionId = context.partitionId,
          sparkAttemptNumber = context.attemptNumber,
          committer = committer,
          iterator = iter)
      })
      // ... (rest of write elided)

One stack trace:

> java.lang.Thread.State: RUNNABLE
  at org.apache.hadoop.fs.FileSystem.getStatistics(FileSystem.java:3270)
  - locked <0x126a> (a java.lang.Class)
  at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:202)
  at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:92)
  at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2598)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
  at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:91)
  at org.apache.hadoop.mapred.FileOutputCommitter.getWrapped(FileOutputCommitter.java:65)
  at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131)
  at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:233)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:125)
  at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:74)
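
If I read the code correctly, the ordering matters because the bytes-written callback snapshots the FileSystem statistics that exist at the moment it is created. A rough, simplified paraphrase of that helper (modeled on SparkHadoopUtil.getFSBytesWrittenOnThreadCallback in 2.2; not the exact code):

    import org.apache.hadoop.fs.FileSystem
    import scala.collection.JavaConverters._

    // Capture the Statistics objects registered *now*, then report the delta
    // of their per-thread bytesWritten each time the callback is invoked.
    def bytesWrittenOnThreadCallback(): () => Long = {
      val threadStats = FileSystem.getAllStatistics.asScala.map(_.getThreadStatistics)
      val sum = () => threadStats.map(_.getBytesWritten).sum
      val baseline = sum()
      () => sum() - baseline
    }
    // If the target FileSystem is only initialized later (e.g. by committer.setupJob
    // or the writer itself, as in the trace above), its Statistics entry is missing
    // from threadStats and its writes are never counted.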







[GitHub] spark pull request #19118: [SPARK-21882][CORE] OutputMetrics doesn't count w...

2017-09-12 Thread awarrior
Github user awarrior commented on a diff in the pull request:

https://github.com/apache/spark/pull/19118#discussion_r138263099
  
--- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -112,11 +112,12 @@ object SparkHadoopWriter extends Logging {
   jobTrackerId, sparkStageId, sparkPartitionId, sparkAttemptNumber)
 committer.setupTask(taskContext)
 
-val (outputMetrics, callback) = initHadoopOutputMetrics(context)
-
 // Initiate the writer.
 config.initWriter(taskContext, sparkPartitionId)
 var recordsWritten = 0L
+
+// Initialize callback function after the writer.
--- End diff --

ok





[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-11 Thread awarrior
Github user awarrior commented on the issue:

https://github.com/apache/spark/pull/19118
  
I ran into trouble while writing a test case. It seems this issue isn't triggered when everything runs on a single node: the driver already calls createPathFromString, which initializes the FileSystem in the same JVM, so the problem doesn't show up.

> java.lang.Thread.State: RUNNABLE
  at org.apache.hadoop.fs.FileSystem.getStatistics(FileSystem.java:3271)
  - locked <0x1211> (a java.lang.Class)
  at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:202)
  at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:92)
  at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2598)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
  at org.apache.spark.internal.io.SparkHadoopWriterUtils$.createPathFromString(SparkHadoopWriterUtils.scala:55)

Does anyone know how to write a test for this case?
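
A small illustration of why a single local JVM hides this (purely illustrative, not from the patch): once any code in the process has resolved the output path, the "file" scheme's Statistics object is registered globally, so a callback created later in the same JVM sees it.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import scala.collection.JavaConverters._

    // In a fresh JVM, before any FileSystem has been used, this list may be empty.
    println(FileSystem.getAllStatistics.asScala.map(_.getScheme))

    // Resolving a local path initializes LocalFileSystem and registers "file",
    // which is what the driver-side createPathFromString call does in local mode.
    new Path("/tmp/spark-21882-demo").getFileSystem(new Configuration())
    println(FileSystem.getAllStatistics.asScala.map(_.getScheme)) // now includes "file"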





[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-09 Thread awarrior
Github user awarrior commented on the issue:

https://github.com/apache/spark/pull/19118
  
@jiangxb1987 ok, I'll add one later.





[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-05 Thread awarrior
Github user awarrior commented on the issue:

https://github.com/apache/spark/pull/19115
  
@markhamstra sorry for the trouble; I have opened a new PR, #19118.





[GitHub] spark pull request #19118: [SPARK-21882][CORE] OutputMetrics doesn't count w...

2017-09-04 Thread awarrior
GitHub user awarrior opened a pull request:

https://github.com/apache/spark/pull/19118

[SPARK-21882][CORE] OutputMetrics doesn't count written bytes correctly in 
the saveAsHadoopDataset function

spark-21882

## What changes were proposed in this pull request?

Switch the initialization order so that the Hadoop output metrics callback is created after the writer is initialized in SparkHadoopWriter.
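
A condensed sketch of the reordering inside executeTask, following the diff quoted above (surrounding code omitted):

    // Before: the callback snapshotted FileSystem statistics before the writer
    // (and possibly the FileSystem itself) existed, so written bytes were missed.
    //   val (outputMetrics, callback) = initHadoopOutputMetrics(context)
    //   config.initWriter(taskContext, sparkPartitionId)

    // After: initialize the writer first, then create the callback.
    config.initWriter(taskContext, sparkPartitionId)
    var recordsWritten = 0L

    // Initialize callback function after the writer.
    val (outputMetrics, callback) = initHadoopOutputMetrics(context)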

## How was this patch tested?

Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/awarrior/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19118


commit 0f0c3b1c91b4f06c7e48874b8f6329c5c1c1b3ce
Author: Jarvis <awarr...@users.noreply.github.com>
Date:   2017-09-04T06:21:13Z

Update SparkHadoopWriter.scala

spark-21882







[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-04 Thread awarrior
Github user awarrior commented on the issue:

https://github.com/apache/spark/pull/19115
  
ok, thx





[GitHub] spark pull request #19115: [SPARK-21882][CORE] OutputMetrics doesn't count w...

2017-09-04 Thread awarrior
Github user awarrior closed the pull request at:

https://github.com/apache/spark/pull/19115





[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-03 Thread awarrior
Github user awarrior commented on the issue:

https://github.com/apache/spark/pull/19115
  
@jerryshao hi~ I have updated this PR, but the patch only applies to 2.2.0 (the relevant code has since changed on master). I want to confirm whether I need to create a new PR against master. Thanks!





[GitHub] spark pull request #19114: Update PairRDDFunctions.scala

2017-09-03 Thread awarrior
Github user awarrior closed the pull request at:

https://github.com/apache/spark/pull/19114





[GitHub] spark pull request #19115: Update PairRDDFunctions.scala

2017-09-03 Thread awarrior
GitHub user awarrior opened a pull request:

https://github.com/apache/spark/pull/19115

Update PairRDDFunctions.scala

https://issues.apache.org/jira/browse/SPARK-21882

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/awarrior/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19115


commit a096970b2f2cfa497a96870ebd26f83a106b4e07
Author: Jarvis <awarr...@users.noreply.github.com>
Date:   2017-09-04T02:48:35Z

Update PairRDDFunctions.scala

spark-21882







[GitHub] spark pull request #19114: Update PairRDDFunctions.scala

2017-09-03 Thread awarrior
GitHub user awarrior opened a pull request:

https://github.com/apache/spark/pull/19114

Update PairRDDFunctions.scala

https://issues.apache.org/jira/browse/SPARK-21882


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/awarrior/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19114


commit e7e42802b07c5148ba02761af1edd2ee81d6ef95
Author: Jarvis <awarr...@users.noreply.github.com>
Date:   2017-09-04T02:52:01Z

Update PairRDDFunctions.scala

spark-21882



