[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...

2018-04-17 Thread guoxiaolongzte
Github user guoxiaolongzte closed the pull request at:

https://github.com/apache/spark/pull/21036


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...

2018-04-17 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21036#discussion_r181963024
  
--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -323,7 +323,7 @@ package object config {
       .internal()
       .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for empty input splits.")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
--- End diff --

This seems to silently change the behavior. I would suggest not doing it.


---




[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...

2018-04-11 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/21036#discussion_r180655799
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -55,7 +56,8 @@ private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: Inp

   /**
    * Get any environment variables that should be added to the users environment when running pipes
-   * @return a Map with the environment variables and corresponding values, it could be empty
+   *
+   * @return a Map with the environment variables and corresponding values, it could be empty
--- End diff --

Thanks.


---




[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...

2018-04-11 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/21036#discussion_r180652894
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -55,7 +56,8 @@ private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: Inp

   /**
    * Get any environment variables that should be added to the users environment when running pipes
-   * @return a Map with the environment variables and corresponding values, it could be empty
+   *
+   * @return a Map with the environment variables and corresponding values, it could be empty
--- End diff --

What does this mean?


---




[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...

2018-04-11 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21036#discussion_r180650637
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -55,7 +56,8 @@ private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: Inp

   /**
    * Get any environment variables that should be added to the users environment when running pipes
-   * @return a Map with the environment variables and corresponding values, it could be empty
+   *
+   * @return a Map with the environment variables and corresponding values, it could be empty
--- End diff --

nit: comment style


---




[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...

2018-04-11 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21036#discussion_r180650706
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -86,8 +88,7 @@ private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: Inp
  * @param keyClass Class of the key associated with the inputFormatClass.
  * @param valueClass Class of the value associated with the inputFormatClass.
  * @param minPartitions Minimum number of HadoopRDD partitions (Hadoop Splits) to generate.
- *
- * @note Instantiating this class directly is not recommended, please use
+  * @note Instantiating this class directly is not recommended, please use
--- End diff --

ditto.


---




[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...

2018-04-10 Thread guoxiaolongzte
GitHub user guoxiaolongzte opened a pull request:

https://github.com/apache/spark/pull/21036

[SPARK-23958][CORE] HadoopRdd filters out empty files to avoid generating empty tasks that hurt Spark's computing performance.

## What changes were proposed in this pull request?

HadoopRdd filters out empty files to avoid generating empty tasks that hurt Spark's computing performance.

An empty file is one whose length is zero.
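The idea behind the change can be sketched outside Spark with a minimal, self-contained example. The `FileSplit` case class below is a hypothetical stand-in for Hadoop's `InputSplit` (which exposes a length in bytes), not Spark's actual API:

```scala
// Minimal sketch, assuming each split reports its length in bytes.
// FileSplit is a hypothetical stand-in for Hadoop's InputSplit.
case class FileSplit(path: String, length: Long)

object EmptySplitFilter {
  // When ignoreEmptySplits is true, drop zero-length splits so that
  // no (empty) task is ever scheduled for them.
  def filterSplits(splits: Seq[FileSplit], ignoreEmptySplits: Boolean): Seq[FileSplit] =
    if (ignoreEmptySplits) splits.filter(_.length > 0) else splits

  def main(args: Array[String]): Unit = {
    val splits = Seq(
      FileSplit("part-0", 0L),    // empty file -> filtered out
      FileSplit("part-1", 128L),  // non-empty  -> kept
      FileSplit("part-2", 0L))    // empty file -> filtered out
    println(filterSplits(splits, ignoreEmptySplits = true).map(_.path).mkString(","))
  }
}
```

With filtering enabled, only `part-1` survives, so only one partition (and one task) is created instead of three.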

## How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guoxiaolongzte/spark SPARK-23958

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21036


commit e4ccdf913157b45f11efe8b8900d1f805d853278
Author: guoxiaolong 
Date:   2018-04-11T02:48:51Z

[SPARK-23958][CORE] HadoopRdd filters out empty files to avoid generating empty tasks that hurt Spark's computing performance.




---
