[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21623
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r199116993

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -660,6 +688,62 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
       assert(df.where("col > 0").count() === 2)
     }
   }
+
+  test("filter pushdown - StringStartsWith") {
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i + "str" + i))) { implicit df =>
+      // Test canDrop()
--- End diff --

Both methods have been executed, but it can't be confirmed which of them has actually taken effect.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r199081727

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -660,6 +688,62 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
       assert(df.where("col > 0").count() === 2)
     }
   }
+
+  test("filter pushdown - StringStartsWith") {
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i + "str" + i))) { implicit df =>
+      // Test canDrop()
--- End diff --

to confirm, do they test `canDrop` or `keep`?
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r199080962

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -140,6 +142,32 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
     checkBinaryFilterPredicate(predicate, filterClass, Seq(Row(expected)))(df)
   }

+  // This function tests the filters that exactly go through `canDrop` and `inverseCanDrop`.
+  private def testStringStartsWith(dataFrame: DataFrame, filter: String): Unit = {
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+      dataFrame.write.option("parquet.block.size", 512).parquet(path)
+      Seq(true, false).foreach { pushDown =>
+        withSQLConf(
+          SQLConf.PARQUET_FILTER_PUSHDOWN_STRING_STARTSWITH_ENABLED.key -> pushDown.toString) {
+          val accu = new NumRowGroupsAcc
+          sparkContext.register(accu)
+
+          val df = spark.read.parquet(path).filter(filter)
+          df.foreachPartition((it: Iterator[Row]) => it.foreach(v => accu.add(0)))
+          df.collect
--- End diff --

What does this `collect` do? `foreachPartition` is already an action.
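In other words, `foreachPartition` already triggers execution, so the trailing `collect` just runs the plan a second time. A minimal, self-contained illustration of the accumulator pattern with a single action (the local-mode setup below is illustrative, not the suite's actual fixture):

```scala
import org.apache.spark.sql.{Row, SparkSession}

object SingleActionExample {
  def main(args: Array[String]): Unit = {
    // Illustrative local session; ParquetFilterSuite uses its own test fixture.
    val spark = SparkSession.builder().master("local[1]").appName("single-action").getOrCreate()
    val acc = spark.sparkContext.longAccumulator("rowsSeen")

    val df = spark.range(10).toDF("id")
    // foreachPartition is an action: the accumulator is populated here,
    // so no additional collect() is needed afterwards.
    df.foreachPartition((it: Iterator[Row]) => it.foreach(_ => acc.add(1)))
    println(s"rows seen = ${acc.value}")  // 10

    spark.stop()
  }
}
```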
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r199078944

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +273,36 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
     case sources.Not(pred) =>
       createFilter(schema, pred).map(FilterApi.not)

+    case sources.StringStartsWith(name, prefix) if pushDownStartWith && canMakeFilterOn(name) =>
+      Option(prefix).map { v =>
+        FilterApi.userDefined(binaryColumn(name),
+          new UserDefinedPredicate[Binary] with Serializable {
+            private val strToBinary = Binary.fromReusedByteArray(v.getBytes)
+            private val size = strToBinary.length
+
+            override def canDrop(statistics: Statistics[Binary]): Boolean = {
+              val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR
+              val max = statistics.getMax
+              val min = statistics.getMin
+              comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) < 0 ||
+                comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) > 0
+            }
+
+            override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = {
+              val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR
+              val max = statistics.getMax
+              val min = statistics.getMin
+              comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) == 0 &&
+                comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) == 0
+            }
+
+            override def keep(value: Binary): Boolean = {
+              UTF8String.fromBytes(value.getBytes).startsWith(UTF8String.fromString(v))
--- End diff --

`UTF8String.fromString(v)` -> `UTF8String.fromBytes(strToBinary.getBytes)`?
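For readers following along, a standalone sketch of what that suggestion amounts to: build the prefix `UTF8String` once from the already-materialized `Binary`, instead of re-encoding the `String` with `UTF8String.fromString(v)` on every `keep` call. The class name and shape here are hypothetical, not the PR's actual code:

```scala
import org.apache.parquet.io.api.Binary
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical sketch: cache the prefix as a UTF8String built from the
// cached Binary, so keep() performs one conversion per row (the value)
// rather than two (value + prefix).
class StartsWithKeep(prefix: String) extends Serializable {
  private val strToBinary = Binary.fromReusedByteArray(prefix.getBytes)
  private val prefixUtf8 = UTF8String.fromBytes(strToBinary.getBytes)

  def keep(value: Binary): Boolean =
    UTF8String.fromBytes(value.getBytes).startsWith(prefixUtf8)
}
```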
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r199078056

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -378,6 +378,14 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val PARQUET_FILTER_PUSHDOWN_STRING_STARTSWITH_ENABLED =
+    buildConf("spark.sql.parquet.filterPushdown.string.startsWith")
+      .doc("If true, enables Parquet filter push-down optimization for string starts with. " +
--- End diff --

`for string startsWith function`
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user stanzhai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r199062132

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -378,6 +378,14 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val PARQUET_FILTER_PUSHDOWN_STRING_STARTSWITH_ENABLED =
+    buildConf("spark.sql.parquet.filterPushdown.string.startsWith")
--- End diff --

It would be better if we added an `.enabled` suffix.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r199043411

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -660,6 +661,56 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
       assert(df.where("col > 0").count() === 2)
     }
   }
+
+  test("filter pushdown - StringStartsWith") {
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i + "str" + i))) { implicit df =>
--- End diff --

Added `testStringStartsWith` to verify that the filters exactly go through `canDrop` and `inverseCanDrop`.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r199043210

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -22,16 +22,23 @@ import java.sql.Date

 import org.apache.parquet.filter2.predicate._
 import org.apache.parquet.filter2.predicate.FilterApi._
 import org.apache.parquet.io.api.Binary
+import org.apache.parquet.schema.PrimitiveComparator

 import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.sources
 import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String

 /**
  * Some utility function to convert Spark data source filters to Parquet filters.
  */
-private[parquet] class ParquetFilters(pushDownDate: Boolean) {
+private[parquet] class ParquetFilters() {
+
+  val sqlConf: SQLConf = SQLConf.get
--- End diff --

You are right. I hit a bug here.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198553569

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -22,16 +22,23 @@ import java.sql.Date

 import org.apache.parquet.filter2.predicate._
 import org.apache.parquet.filter2.predicate.FilterApi._
 import org.apache.parquet.io.api.Binary
+import org.apache.parquet.schema.PrimitiveComparator

 import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.sources
 import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String

 /**
  * Some utility function to convert Spark data source filters to Parquet filters.
  */
-private[parquet] class ParquetFilters(pushDownDate: Boolean) {
+private[parquet] class ParquetFilters() {
+
+  val sqlConf: SQLConf = SQLConf.get
--- End diff --

This should pass in `pushDownDate` and `pushDownStartWith` like the previous version did with just the date setting. The SQLConf is already available in ParquetFileFormat and it *would* be better to pass it in. The problem is that this class is instantiated in the function (`(file: PartitionedFile) => { ... }`) that gets serialized and sent to executors. That means we don't want SQLConf and its references in the function's closure. The way we got around this before was to put boolean config vals in the closure instead. I think you should go with that approach.

I'm not sure what `SQLConf.get` is for or what a correct use would be. @gatorsmile, can you comment on use of `SQLConf.get`?
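A minimal, self-contained illustration of that closure pattern — copy the primitive config values out on the driver so the serialized function captures two booleans rather than the conf object. All names below are illustrative stand-ins, not Spark's actual internals:

```scala
// Stand-in for a non-serializable config holder like SQLConf.
class Conf { val pushDownDate = true; val pushDownStartWith = true }

object ClosureCaptureExample {
  // Returns the per-file reader function that would be shipped to executors.
  def buildReader(conf: Conf): String => String = {
    // Read the booleans on the driver, outside the returned closure...
    val pushDownDate = conf.pushDownDate
    val pushDownStartWith = conf.pushDownStartWith
    // ...so the closure refers only to the two primitives, not to `conf`.
    (file: String) =>
      s"$file: pushDownDate=$pushDownDate, pushDownStartWith=$pushDownStartWith"
  }

  def main(args: Array[String]): Unit =
    println(buildReader(new Conf)("part-00000.parquet"))
}
```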
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198551889

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -660,6 +661,56 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
       assert(df.where("col > 0").count() === 2)
     }
   }
+
+  test("filter pushdown - StringStartsWith") {
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i + "str" + i))) { implicit df =>
--- End diff --

I think all of these tests go through the `keep` method instead of `canDrop` and `inverseCanDrop`. I think those methods need to be tested. You can do that by constructing a Parquet file with row groups that have predictable statistics, but that would be difficult. An easier way is to define the predicate class elsewhere and create a unit test for it that passes in different statistics values.
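A hedged sketch of that "define the predicate class elsewhere" idea: extract the min/max comparison into a standalone class so a unit test can feed it arbitrary statistics values, with no Parquet files involved. The class and test names are hypothetical; the comparison logic is copied from the PR's `canDrop`:

```scala
import org.apache.parquet.io.api.Binary
import org.apache.parquet.schema.PrimitiveComparator

// Hypothetical extraction of the PR's canDrop logic into a testable unit.
class StartsWithRangeCheck(prefix: String) extends Serializable {
  private val strToBinary = Binary.fromReusedByteArray(prefix.getBytes)
  private val size = strToBinary.length
  private val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR

  // True when no value in [min, max] can start with `prefix`.
  def canDrop(min: Binary, max: Binary): Boolean =
    comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) < 0 ||
      comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) > 0
}

object StartsWithRangeCheckSpec {
  def main(args: Array[String]): Unit = {
    val check = new StartsWithRangeCheck("C")
    // Range entirely before the prefix: droppable.
    assert(check.canDrop(Binary.fromString("AAA"), Binary.fromString("BBB")))
    // Range entirely after the prefix: droppable.
    assert(check.canDrop(Binary.fromString("DDD"), Binary.fromString("EEE")))
    // Range straddling the prefix: must be kept.
    assert(!check.canDrop(Binary.fromString("BBB"), Binary.fromString("DDD")))
    println("all statistics cases passed")
  }
}
```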
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198269733

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +277,29 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
     case sources.Not(pred) =>
       createFilter(schema, pred).map(FilterApi.not)

+    case sources.StringStartsWith(name, prefix) if pushDownStartWith && canMakeFilterOn(name) =>
+      Option(prefix).map { v =>
+        FilterApi.userDefined(binaryColumn(name),
+          new UserDefinedPredicate[Binary] with Serializable {
+            private val strToBinary = Binary.fromReusedByteArray(v.getBytes)
+            private val size = strToBinary.length
+
+            override def canDrop(statistics: Statistics[Binary]): Boolean = {
+              val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR
+              val max = statistics.getMax
+              val min = statistics.getMin
+              comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) < 0 ||
+                comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) > 0
+            }
+
+            override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = false
--- End diff --

@rdblue oh, sorry, I had not seen your reply. Yes, in that case we can, and you are right that it is worth doing.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198245076

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +277,29 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
     case sources.Not(pred) =>
       createFilter(schema, pred).map(FilterApi.not)

+    case sources.StringStartsWith(name, prefix) if pushDownStartWith && canMakeFilterOn(name) =>
+      Option(prefix).map { v =>
+        FilterApi.userDefined(binaryColumn(name),
+          new UserDefinedPredicate[Binary] with Serializable {
+            private val strToBinary = Binary.fromReusedByteArray(v.getBytes)
+            private val size = strToBinary.length
+
+            override def canDrop(statistics: Statistics[Binary]): Boolean = {
+              val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR
+              val max = statistics.getMax
+              val min = statistics.getMin
+              comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) < 0 ||
+                comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) > 0
+            }
+
+            override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = false
--- End diff --

There is one rare case where the inverse can drop: when both min and max start with the prefix.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198244664

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +277,29 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
     case sources.Not(pred) =>
       createFilter(schema, pred).map(FilterApi.not)

+    case sources.StringStartsWith(name, prefix) if pushDownStartWith && canMakeFilterOn(name) =>
+      Option(prefix).map { v =>
+        FilterApi.userDefined(binaryColumn(name),
+          new UserDefinedPredicate[Binary] with Serializable {
+            private val strToBinary = Binary.fromReusedByteArray(v.getBytes)
+            private val size = strToBinary.length
+
+            override def canDrop(statistics: Statistics[Binary]): Boolean = {
+              val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR
+              val max = statistics.getMax
+              val min = statistics.getMin
+              comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) < 0 ||
+                comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) > 0
+            }
+
+            override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = false
--- End diff --

Sorry, I meant that if min and max both *include* the prefix, then we should be able to drop the range. The situation is where both min and max match, so all values must also match the filter. If we are looking for values that do not match the filter, then we can eliminate the row group. For example, with prefix=CCC and values between min=CCCa and max=CCCZ: all values start with CCC, so the entire row group can be skipped.
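A script-style check of that example, reusing the PR's comparator and truncation logic (the standalone shape below is illustrative): when both truncated bounds equal the prefix, `inverseCanDrop` fires; when the range merely straddles the prefix, it does not:

```scala
import org.apache.parquet.io.api.Binary
import org.apache.parquet.schema.PrimitiveComparator

object InverseCanDropExample {
  private val cmp = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR

  // The PR's inverseCanDrop condition: both bounds, truncated to the prefix
  // length, compare equal to the prefix -- i.e. every value starts with it.
  def inverseCanDrop(prefix: String, min: Binary, max: Binary): Boolean = {
    val strToBinary = Binary.fromReusedByteArray(prefix.getBytes)
    val size = strToBinary.length
    cmp.compare(max.slice(0, math.min(size, max.length)), strToBinary) == 0 &&
      cmp.compare(min.slice(0, math.min(size, min.length)), strToBinary) == 0
  }

  def main(args: Array[String]): Unit = {
    // rdblue's case: all values in the row group start with "CCC", so
    // NOT startsWith("CCC") matches nothing and the group can be skipped.
    println(inverseCanDrop("CCC", Binary.fromString("CCCa"), Binary.fromString("CCCZ")))  // true
    // A range that merely straddles the prefix cannot be skipped.
    println(inverseCanDrop("C", Binary.fromString("BBB"), Binary.fromString("DDD")))      // false
  }
}
```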
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198241792

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +277,29 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
     case sources.Not(pred) =>
       createFilter(schema, pred).map(FilterApi.not)

+    case sources.StringStartsWith(name, prefix) if pushDownStartWith && canMakeFilterOn(name) =>
+      Option(prefix).map { v =>
+        FilterApi.userDefined(binaryColumn(name),
+          new UserDefinedPredicate[Binary] with Serializable {
+            private val strToBinary = Binary.fromReusedByteArray(v.getBytes)
+            private val size = strToBinary.length
+
+            override def canDrop(statistics: Statistics[Binary]): Boolean = {
+              val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR
+              val max = statistics.getMax
+              val min = statistics.getMin
+              comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) < 0 ||
+                comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) > 0
+            }
+
+            override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = false
--- End diff --

No. Let me illustrate this with an example: assume min="BBB" and max="DDD". canDrop() means that if your prefix is before "BBB" (like "A"), we can stop, as there is no reason to search within this range. This is also true for prefixes after "DDD" (like "E"). Now suppose the operator is negated: what can you say when your prefix is "C" and the range is "BBB" to "DDD"? Can you drop it? No. And if the prefix is "A" or "E"? Still no. You see, you have to check the range.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198230713

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +277,29 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
     case sources.Not(pred) =>
       createFilter(schema, pred).map(FilterApi.not)

+    case sources.StringStartsWith(name, prefix) if pushDownStartWith && canMakeFilterOn(name) =>
+      Option(prefix).map { v =>
+        FilterApi.userDefined(binaryColumn(name),
+          new UserDefinedPredicate[Binary] with Serializable {
+            private val strToBinary = Binary.fromReusedByteArray(v.getBytes)
+            private val size = strToBinary.length
+
+            override def canDrop(statistics: Statistics[Binary]): Boolean = {
+              val comparator = PrimitiveComparator.UNSIGNED_LEXICOGRAPHICAL_BINARY_COMPARATOR
+              val max = statistics.getMax
+              val min = statistics.getMin
+              comparator.compare(max.slice(0, math.min(size, max.length)), strToBinary) < 0 ||
+                comparator.compare(min.slice(0, math.min(size, min.length)), strToBinary) > 0
+            }
+
+            override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = false
--- End diff --

Why can't this evaluate the inverse of `StartsWith`? If the min and max values exclude the prefix, then this should be able to filter.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r197992151

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -660,6 +660,30 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
       assert(df.where("col > 0").count() === 2)
     }
   }
+
+  test("filter pushdown - StringStartsWith") {
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i + "str" + i))) { implicit df =>
+      Seq("2", "2s", "2st", "2str", "2str2").foreach { prefix =>
+        checkFilterPredicate(
+          '_1.startsWith(prefix).asInstanceOf[Predicate],
+          classOf[UserDefinedByInstance[_, _]],
+          "2str2")
+      }
+
+      Seq("2S", "null", "2str22").foreach { prefix =>
+        checkFilterPredicate(
+          '_1.startsWith(prefix).asInstanceOf[Predicate],
+          classOf[UserDefinedByInstance[_, _]],
+          Seq.empty[Row])
+      }
+
+      assertResult(None) {
+        parquetFilters.createFilter(
+          df.schema,
+          sources.StringStartsWith("_1", null))
--- End diff --

Thanks @attilapiros, `sources.StringStartsWith("_1", null)` will not match them, same as before.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r197988392

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +272,29 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
     case sources.Not(pred) =>
       createFilter(schema, pred).map(FilterApi.not)

+    case sources.StringStartsWith(name, prefix) if canMakeFilterOn(name) =>
--- End diff --

What do you think about adding a configuration to control this, set to true by default? It basically depends on a user-defined predicate we wrote manually here.
[GitHub] spark pull request #21623: [SPARK-24638][SQL] StringStartsWith support push ...
GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/21623

    [SPARK-24638][SQL] StringStartsWith support push down

## What changes were proposed in this pull request?

`StringStartsWith` now supports push down, with about 50% savings in compute time.

## How was this patch tested?

Unit tests and manual tests.

Performance test:

```scala
cat <<EOF > SPARK-24638.scala
spark.range(1000).selectExpr("concat(id, 'str', id) as id").coalesce(1)
  .write.option("parquet.block.size", 1048576).parquet("/tmp/spark/parquet/string")

val df = spark.read.parquet("/tmp/spark/parquet/string/")

spark.sql("set spark.sql.parquet.filterPushdown=true")
val pushdownEnableStart = System.currentTimeMillis()
for (i <- 0 until 100) {
  df.where("id like '98%'").count()
}
val pushdownEnable = System.currentTimeMillis() - pushdownEnableStart

spark.sql("set spark.sql.parquet.filterPushdown=false")
val pushdownDisableStart = System.currentTimeMillis()
for (i <- 0 until 100) {
  df.where("id like '98%'").count()
}
val pushdownDisable = System.currentTimeMillis() - pushdownDisableStart

val improvements = pushdownDisable.toDouble - pushdownEnable.toDouble
println(s"improvements: ${improvements}")
EOF

bin/spark-shell -i SPARK-24638.scala
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-24638

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21623.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21623

commit 5b52ace44c8a41631535c883b7a5c8545959e5e5
Author: Yuming Wang
Date:   2018-06-23T13:27:30Z

    StringStartsWith support push down