[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27844: [MINOR][SQL] Add back ImageSchema.readImages in Spark 3.0

GitBox Sat, 07 Mar 2020 10:33:07 -0800

dongjoon-hyun commented on a change in pull request #27844: [MINOR][SQL] Add back ImageSchema.readImages in Spark 3.0
URL: https://github.com/apache/spark/pull/27844#discussion_r389302400


 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
 ##########
 @@ -188,4 +189,76 @@ object ImageSchema {
       Some(Row(Row(origin, height, width, nChannels, mode, decoded)))
     }
   }
+
+  /**
+   * Read the directory of images from the local or remote source
+   *
+   * @note If multiple jobs are run in parallel with different sampleRatio or recursive flag,
+   * there may be a race condition where one job overwrites the hadoop configs of another.
+   * @note If sample ratio is less than 1, sampling uses a PathFilter that is efficient but
+   * potentially non-deterministic.
+   *
+   * @param path Path to the image directory
+   * @return DataFrame with a single column "image" of images;
+   *         see ImageSchema for the details
+   */
+  @deprecated("use `spark.read.format(\"image\").load(path)` and this `readImages` will be " +
+    "removed in 3.1.0.", "2.4.0")
+  def readImages(path: String): DataFrame = readImages(path, null, false, -1, false, 1.0, 0)
+
+  /**
+   * Read the directory of images from the local or remote source
+   *
+   * @note If multiple jobs are run in parallel with different sampleRatio or recursive flag,
+   * there may be a race condition where one job overwrites the hadoop configs of another.
+   * @note If sample ratio is less than 1, sampling uses a PathFilter that is efficient but
+   * potentially non-deterministic.
+   *
+   * @param path Path to the image directory
+   * @param sparkSession Spark Session, if omitted gets or creates the session
+   * @param recursive Recursive path search flag
+   * @param numPartitions Number of the DataFrame partitions,
+   *                      if omitted uses defaultParallelism instead
+   * @param dropImageFailures Drop the files that are not valid images from the result
+   * @param sampleRatio Fraction of the files loaded
+   * @return DataFrame with a single column "image" of images;
+   *         see ImageSchema for the details
+   */
+  @deprecated("use `spark.read.format(\"image\").load(path)` and this `readImages` will be " +
+    "removed in 3.1.0.", "2.4.0")
 
 Review comment:
   Ditto. Are you sure that we agree on `will be removed in 3.1.0`?
