[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22227

Please remove the 0 semantics. IMO the zero vs negative number difference is too subtle. Java's String is the only implementation I can find that supports it; Python doesn't.
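For context, the Java String.split semantics under discussion, shown as a quick Scala sketch (this is standard-library behavior, not Spark code):

```scala
// Java/Scala String.split limit semantics:
"a,b,c,,".split(",", 2)   // Array("a", "b,c,,")           at most `limit` elements
"a,b,c,,".split(",", -1)  // Array("a", "b", "c", "", "")  trailing empties kept
"a,b,c,,".split(",", 0)   // Array("a", "b", "c")          trailing empties removed
```

Python's str.split(sep, maxsplit) and re.split only have the "at most N splits" notion, which is the subtlety the comment is pointing at.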
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r214135400

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---

@@ -229,33 +229,58 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpression

 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around matches of the given regex.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" +
+    " and returns an array of at most `limit`",
+  arguments = """
+    Arguments:
+      * str - a string expression to split.
+      * regex - a string representing a regular expression. The regex string should be a
+        Java regular expression.
+      * limit - an integer expression which controls the number of times the regex is applied.
+
+          limit > 0: The resulting array's length will not be more than `limit`, and the resulting
+            array's last entry will contain all input beyond the last matched regex.
+
+          limit < 0: `regex` will be applied as many times as possible, and the resulting
+            array can be of any size.
+
+          limit = 0: `regex` will be applied as many times as possible, the resulting array can
--- End diff --

yea but i'd focus on what behavior we want to enable. do other database systems have this split=0 semantics? if not, i'd rewrite split=0 internally to just -1.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r214131195

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---

@@ -229,33 +229,58 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpression

 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around matches of the given regex.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`" +
+    " and returns an array of at most `limit`",
+  arguments = """
+    Arguments:
+      * str - a string expression to split.
+      * regex - a string representing a regular expression. The regex string should be a
+        Java regular expression.
+      * limit - an integer expression which controls the number of times the regex is applied.
+
+          limit > 0: The resulting array's length will not be more than `limit`, and the resulting
+            array's last entry will contain all input beyond the last matched regex.
+
+          limit < 0: `regex` will be applied as many times as possible, and the resulting
+            array can be of any size.
+
+          limit = 0: `regex` will be applied as many times as possible, the resulting array can
--- End diff --

why do we want to build into split to remove trailing empty strings?
[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22010

Thanks for pinging. Please don't merge this until you've addressed the OOM issue. The aggregators were created to handle incoming data larger than the available memory. We should never use a Scala or Java hash set to put all the data in.
[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22010#discussion_r214103667

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---

@@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag](
  * Return a new RDD containing the distinct elements in this RDD.
  */
 def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
-  map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
+  // If the data is already appropriately partitioned with a known partitioner we can work locally.
+  def removeDuplicatesInPartition(itr: Iterator[T]): Iterator[T] = {
+    val set = new mutable.HashSet[T]()
+    itr.filter(set.add(_))
--- End diff --

This is a bad implementation and could OOM. You should reuse reduceByKey.
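A hedged sketch of the spill-safe direction suggested above: instead of an in-memory HashSet, reuse Spark's aggregation machinery, which can spill to disk. Note that ExternalAppendOnlyMap is Spark-internal (private[spark]), so this only compiles inside Spark's own source tree; it is an illustration, not the PR's final code.

```scala
import org.apache.spark.util.collection.ExternalAppendOnlyMap

// Deduplicate one partition without holding every distinct key on the heap:
// the map spills to disk once it exceeds its memory threshold.
def removeDuplicatesInPartition[T](partition: Iterator[T]): Iterator[T] = {
  val map = new ExternalAppendOnlyMap[T, Null, Null](
    _ => null,     // createCombiner: the key itself carries all the information
    (c, _) => c,   // mergeValue: duplicates collapse into the existing entry
    (c, _) => c)   // mergeCombiners
  map.insertAll(partition.map(x => (x, null)))
  map.iterator.map(_._1)
}
```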
[GitHub] spark issue #22258: [SPARK-25266] Fix memory leak vulnerability in Barrier E...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22258

Can you remove vulnerability from the title? Otherwise it sounds like a security vulnerability here.
[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213100428

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
     // since the other rules might make two separate Unions operators adjacent.
     Batch("Union", Once,
       CombineUnions) ::
+    // run this once earlier. this might simplify the plan and reduce cost of optimizer
--- End diff --

can you put your comment above into the comment in code?
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18447

Yea I'd probably reject this for now, until we see bigger needs for it.
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22162#discussion_r213026874

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---

@@ -815,6 +815,24 @@ class Dataset[T] private[sql](
     println(showString(numRows, truncate, vertical))
     // scalastyle:on println

+  /**
+   * Returns the default number of rows to show when the show function is called without
+   * a user specified max number of rows.
+   * @since 2.3.0
+   */
+  private def numberOfRowsToShow(): Int = {
--- End diff --

we shouldn't be adding methods here
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r212815703

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---

@@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpression

 * Splits str around pat (pattern is a regular expression).
 */
@ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." +
+    "The `limit` parameter controls the number of times the pattern is applied and " +
--- End diff --

you should say whether limit is ignored if it is a non-positive number.
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r212815685

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---

@@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpression

 * Splits str around pat (pattern is a regular expression).
 */
@ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." +
+    "The `limit` parameter controls the number of times the pattern is applied and " +
--- End diff --

can we be more concise? e.g. presto's doc is just "Splits string on delimiter and returns an array of size at most limit. The last element in the array always contain everything left in the string. limit must be a positive number."
[GitHub] spark issue #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatal...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22185

cc @rdblue @cloud-fan @gatorsmile
[GitHub] spark pull request #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDo...
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/22185

[SPARK-25127] DataSourceV2: Remove SupportsPushDownCatalystFilters

## What changes were proposed in this pull request?

They depend on internal Expression APIs. Let's see how far we can get without it.

## How was this patch tested?

Just some code removal. There are no existing tests as far as I can tell, so it's easy to remove.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-25127

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22185.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22185

commit 3c4115d26cce3c0bc8b297da68cf752a0f666768
Author: Reynold Xin
Date: 2018-08-22T16:14:30Z

    [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatalystFilters
[GitHub] spark issue #16600: [SPARK-19242][SQL] SHOW CREATE TABLE should generate new...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16600

Can you close this pr?
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22065

Are we talking about a 0.7% margin improvement? It doesn't seem like it's worth the complexity.
[GitHub] spark issue #22157: [SPARK-25126] Avoid creating Reader for all orc files
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22157

Do we have a similar issue for Parquet?
[GitHub] spark issue #22160: Revert "[SPARK-24418][BUILD] Upgrade Scala to 2.11.12 an...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22160

Can you add to the pr description why we are reverting? Just copy paste what you had above. Thanks.
[GitHub] spark issue #22134: [SPARK-25143][SQL] Support data source name mapping conf...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22134

I think it's premature to introduce this. The extra layer of abstraction actually makes it more difficult to reason about what's going on. We don't have that many data sources that require flexibility here, and we can always add the flexibility if needed later.
[GitHub] spark issue #21944: [SPARK-24988][SQL]Add a castBySchema method which casts ...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21944

Thanks, Mahmoud!
[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21951

LGTM.

On Thu, Aug 2, 2018 at 1:14 AM Xiao Li wrote:
> This will simplify the code and improve the readability. We can do the
> same in the other expression.
[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21951

Why would we want to use the DSL here? Do we use it in other expressions?
[GitHub] spark pull request #21938: [SPARK-24982][SQL] UDAF resolution should not thr...
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21938

[SPARK-24982][SQL] UDAF resolution should not throw AssertionError

## What changes were proposed in this pull request?

When a user calls a UDAF with the wrong number of arguments, Spark previously throws an AssertionError, which is not supposed to be a user-facing exception. This patch updates it to throw AnalysisException instead.

## How was this patch tested?

Updated test case udaf.sql.out.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-24982

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21938.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21938

commit 84262dc21dd9f9aa409dd5e873d31d5b26a231f3
Author: Reynold Xin
Date: 2018-07-31T21:00:20Z

    [SPARK-24982][SQL] UDAF resolution should not throw java.lang.AssertionError
[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21934#discussion_r206681479

--- Diff: sql/core/src/test/resources/sql-tests/results/table-valued-functions.sql.out ---

@@ -83,8 +83,13 @@ select * from range(1, null)
 -- !query 6 schema
 struct<>
 -- !query 6 output
-java.lang.IllegalArgumentException
-Invalid arguments for resolved function: 1, null
+org.apache.spark.sql.AnalysisException
+error: table-valued function range with alternatives:
--- End diff --

probably could go either way
[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21934

Jenkins, retest this please.
[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21934

cc @gatorsmile @ericl who originally wrote this.
[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21934

[SPARK-24951][SQL] Table valued functions should throw AnalysisException

## What changes were proposed in this pull request?

Previously TVF resolution could throw an IllegalArgumentException if the data type is null type. This patch replaces that exception with AnalysisException, enriched with positional information, to improve error message reporting and to be more consistent with the rest of Spark SQL.

## How was this patch tested?

Updated the test case in table-valued-functions.sql.out, which is how I identified this problem in the first place.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-24951

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21934.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21934

commit 08a6b7fc1d2a1a6149c4da2ca1657258347fd18f
Author: Reynold Xin
Date: 2018-07-31T17:47:55Z

    [SPARK-24951][SQL] Table valued functions should throw AnalysisException

commit 514fd77501194e43e8029734e4a3669f12fbf749
Author: Reynold Xin
Date: 2018-07-31T17:53:21Z

    Improve error message
[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21932

Do we really need this? It's almost always the case for resolution that you'd want to go bottom up, so I thought Michael's original design to just call it resolveOperators makes a lot of sense.
[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21923

Are there more specific use cases? I always feel it'd be impossible to design APIs without seeing a couple of different use cases.
[GitHub] spark issue #21922: [WIP] Add an ANSI SQL parser mode
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21922

what are the actual changes?
[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21318

LGTM.

On Fri, Jul 27, 2018 at 10:58 PM Hyukjin Kwon wrote:
> @rxin <https://github.com/rxin> re: #21318 (comment)
> <https://github.com/apache/spark/pull/21318#discussion_r20582> I
> meant to say, for instance, like "Please refer to the SQL function
> documentation in the corresponding version". We don't have to bother
> updating, and it also makes sense ..
[GitHub] spark pull request #21318: [minor] Update docs for functions.scala to make i...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21318#discussion_r20582

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---

@@ -39,7 +39,21 @@ import org.apache.spark.util.Utils

 /**
- * Functions available for DataFrame operations.
+ * Commonly used functions available for DataFrame operations. Using functions defined here provides
+ * a little bit more compile-time safety to make sure the function exists.
+ *
+ * Spark also includes more built-in functions that are less common and are not defined here.
+ * You can still access them (and all the functions defined here) using the [[functions.expr()]] API
+ * and calling them through a SQL expression string. You can find the entire list of functions for
+ * the latest version of Spark at [[https://spark.apache.org/docs/latest/api/sql/index.html]].
--- End diff --

it's just a lot of work and i'm sure we will forget to update ... so i'm pointing to the latest.
[GitHub] spark issue #21897: [minor] Improve documentation for HiveStringType's
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21897

cc @gatorsmile @hvanhovell why did we expose these types as public Scala APIs? I feel they should not have been public. If they are public, we should have more generic VarcharType and CharType instead. Something to fix in 3.0?
[GitHub] spark pull request #21897: [minor] Improve documentation for HiveStringType'...
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21897

[minor] Improve documentation for HiveStringType's

The diff should be self-explanatory.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark hivestringtypedoc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21897.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21897

commit 8b954a48dab904f192c60872159d9c876fa2ee20
Author: Reynold Xin
Date: 2018-07-27T18:08:19Z

    [minor] Improve documentation for HiveStringType's
[GitHub] spark pull request #21706: [SPARK-24702] Fix Unable to cast to calendar inte...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21706#discussion_r205851385

--- Diff: sql/core/src/test/resources/sql-tests/inputs/cast.sql ---

@@ -42,4 +42,38 @@ SELECT CAST('9223372036854775808' AS long);
 DESC FUNCTION boolean;
 DESC FUNCTION EXTENDED boolean;
+
+-- cast null to calendar interval should return null
+SELECT CAST(NULL as calendarinterval);
+SELECT CALENDARINTERVAL(NULL);
+
+-- cast invalid strings to calendar interval should return null
+SELECT CAST('interval 10' as calendarinterval);
+SELECT CAST('interval 100 nanoseconds' as calendarinterval);
+SELECT CAST('interval 1 second 10 years -10 months 1 minute' as calendarinterval);
+SELECT CAST('interval 60 hours + 1 minute' as calendarinterval);
+SELECT CAST('interval 1 day +5 minutes' as calendarinterval);
+
+-- cast valid strings to calendar interval should return calendar interval
+SELECT CAST('interval 5 minutes' as calendarinterval);
+SELECT CAST('interval 10 hours' as calendarinterval);
+SELECT CAST('interval 1 second' as calendarinterval);
+SELECT CAST('interval 3 years -3 month 7 week 123 microseconds' as calendarinterval);
+SELECT CAST('interval 100 years 15 months -24 weeks 66 seconds' as calendarinterval);
+
+-- casting invalid strings to calendar interval using the function should return null
--- End diff --

i wouldn't add this many test cases, since they are just simple aliases. just add one for calendarinterval and keep the cast ones.
[GitHub] spark issue #21896: [SPARK-24865][SQL] Remove AnalysisBarrier addendum
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21896

cc @gatorsmile @cloud-fan
[GitHub] spark pull request #21896: [SPARK-24865][SQL] Remove AnalysisBarrier addendu...
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21896

[SPARK-24865][SQL] Remove AnalysisBarrier addendum

## What changes were proposed in this pull request?

I didn't want to pollute the diff in the previous PR and left some TODOs. This is a follow-up to address those TODOs.

## How was this patch tested?

Should be covered by existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-24865-addendum

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21896.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21896

commit 5751b027918ee5b70d22327120077908b3497900
Author: Reynold Xin
Date: 2018-07-27T17:33:07Z

    [SPARK-24865][SQL] Remove AnalysisBarrier addendum
[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21318

Yup will do.

On Fri, Jul 27, 2018 at 10:23 AM Sean Owen wrote:
> Just browsing old PRs .. want to finish this one up @rxin
> <https://github.com/rxin>? looks simple and useful.
[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21699

I'm OK with it.
[GitHub] spark pull request #21873: [SPARK-24919][BUILD] New linter rule for sparkCon...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21873#discussion_r205252848

--- Diff: scalastyle-config.xml ---

@@ -150,6 +150,19 @@ This file is divided into 3 sections:
   // scalastyle:on println]]>

+
+  spark(.sqlContext)?.sparkContext.hadoopConfiguration
--- End diff --

can we just match on hadoopConfiguration?
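For reference, a sketch of what such a rule could look like in scalastyle-config.xml, following the RegexChecker pattern the file already uses (the customId and the message wording below are assumptions, not the PR's final rule):

```xml
<check customId="hadoopconfiguration" level="error"
       class="org.scalastyle.file.RegexChecker" enabled="true">
  <parameters><parameter name="regex">\.hadoopConfiguration</parameter></parameters>
  <customMessage><![CDATA[
    Are you sure you want sc.hadoopConfiguration? In most cases
    spark.sessionState.newHadoopConf() is preferred, so that session-level
    Hadoop options are picked up. If this use is intentional, wrap the line
    with // scalastyle:off hadoopconfiguration.
  ]]></customMessage>
</check>
```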
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21758

What's the failure mode if there are not enough slots for the barrier mode? We should throw an exception right?
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205250930

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ActiveJob.scala ---

@@ -60,4 +60,10 @@ private[spark] class ActiveJob(
   val finished = Array.fill[Boolean](numPartitions)(false)

   var numFinished = 0
+
+  // Mark all the partitions of the stage to be not finished.
--- End diff --

use `/** */` style. also the sentence is a bit awkward. perhaps "Resets the status of all partitions in this stage so they are marked as not finished."
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205250352

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---

@@ -1839,6 +1847,20 @@ abstract class RDD[T: ClassTag](
   def toJavaRDD() : JavaRDD[T] = {
     new JavaRDD(this)(elementClassTag)
   }
+
+  /**
+   * Whether the RDD is in a barrier stage. Spark must launch all the tasks at the same time for a
+   * barrier stage.
+   *
+   * An RDD is in a barrier stage, if at least one of its parent RDD(s), or itself, are mapped from
+   * a RDDBarrier. This function always returns false for a [[ShuffledRDD]], since a
+   * [[ShuffledRDD]] indicates start of a new stage.
+   */
+  private[spark] def isBarrier(): Boolean = isBarrier_
+
+  // From performance concern, cache the value to avoid repeatedly compute `isBarrier()` on a long
+  // RDD chain.
+  @transient protected lazy val isBarrier_ : Boolean = dependencies.exists(_.rdd.isBarrier())
--- End diff --

you need to explain why mappartitionsrdd has a different isBarrier implementation.
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205249547

--- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala ---

@@ -27,7 +27,8 @@ import org.apache.spark.{Partition, TaskContext}
 private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
     var prev: RDD[T],
     f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, partition index, iterator)
-    preservesPartitioning: Boolean = false)
+    preservesPartitioning: Boolean = false,
+    isFromBarrier: Boolean = false)
--- End diff --

we should explain what this flag does in classdoc
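One way to address both review comments above, as a hedged sketch (not necessarily the PR's final code): MapPartitionsRDD is the RDD produced from an RDDBarrier, so it seeds the barrier chain by OR-ing its own flag into the parent-based check quoted earlier.

```scala
// Sketch, inside MapPartitionsRDD: `isFromBarrier` marks an RDD produced by
// RDDBarrier.mapPartitions(), i.e. the start of a barrier chain; every other
// RDD only inherits the flag from its parents.
@transient protected override lazy val isBarrier_ : Boolean =
  isFromBarrier || dependencies.exists(_.rdd.isBarrier())
```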
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205249449

--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---

@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+/** A [[TaskContext]] with extra info and tooling for a barrier stage. */
+trait BarrierTaskContext extends TaskContext {
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit this barrier. Similar to
+   * the MPI_Barrier function in MPI, the barrier() function call blocks until all tasks in the
+   * same stage have reached this routine.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit
+
+  /**
+   * :: Experimental ::
+   * Returns all the task infos in this barrier stage, ordered by partitionId.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def getTaskInfos(): Array[BarrierTaskInfo]
--- End diff --

what other things do you expect to be included in the future in BarrierTaskInfo? It seems overkill to have a new class for a single field (address).
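For context, a hedged usage sketch of the API under review. The BarrierTaskContext.get() accessor is an assumption (the diff only defines the trait); barrier() and getTaskInfos() are taken from the diff itself.

```scala
// Illustrative barrier-stage task body:
rdd.barrier().mapPartitions { iter =>
  val context = BarrierTaskContext.get()     // assumed accessor
  // e.g. gather peer locations before an all-to-all step
  val addresses = context.getTaskInfos().map(_.address)
  context.barrier()                          // block until every task in the stage arrives
  iter
}
```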
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205249225

--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---

@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address (host:port) of the executor that a barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)
--- End diff --

Can we just bake address into TaskInfo?
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205249297

--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---

@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address (host:port) of the executor that a barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)
--- End diff --

or is TaskInfo not a public API?
[GitHub] spark issue #21875: [SPARK-24288][SQL] Enable preventing predicate pushdown
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21875

Can you add JDBC to the title?
[GitHub] spark pull request #21866: [SPARK-24768][FollowUp][SQL]Avro migration follow...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21866#discussion_r204961291

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---

@@ -56,7 +56,7 @@ private[avro] class AvroFileFormat extends FileFormat with DataSourceRegister {
     spark: SparkSession,
     options: Map[String, String],
     files: Seq[FileStatus]): Option[StructType] = {
-  val conf = spark.sparkContext.hadoopConfiguration
+  val conf = spark.sessionState.newHadoopConf()
--- End diff --

can we add a linter rule for this?
[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21867#discussion_r204959300

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---

@@ -731,7 +731,14 @@ private[spark] class BlockManager(
   }

   if (data != null) {
-    return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize))
+    // SPARK-24307 undocumented "escape-hatch" in case there are any issues in converting to
+    // ChunkedByteBuffer, to go back to old code-path. Can be removed post Spark 2.4 if
+    // new path is stable.
+    if (conf.getBoolean("spark.fetchToNioBuffer", false)) {
--- End diff --

can we have a better prefix, rather than just spark. ?
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204957474

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---

@@ -751,7 +751,8 @@ object TypeCoercion {
  */
 case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {

-  override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
+  override protected def coerceTypes(
+      plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown { case p =>
--- End diff --

im using a weird wrapping here to minimize the diff.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822

I changed the way we do the checks in test to use a thread local rather than checking the stacktrace, so they should run faster now. Also added test cases for the various new methods. Also moved the relevant code into AnalysisHelper for better code structure. This should be ready now if tests pass.
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204955869

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

@@ -787,6 +782,7 @@ class Analyzer(
   right
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
+  // TODO(rxin): Why do we need transformUp here?
--- End diff --

@cloud-fan ?
[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21845

If that's the only one, I think that PR itself needs to be fixed (it significantly increases test runtime), and I wouldn't increase the time here.

On Mon, Jul 23, 2018 at 11:44 PM Hyukjin Kwon wrote:
> Yup, looks so in your PR #21822 (comment)
> <https://github.com/apache/spark/pull/21822#issuecomment-407298841>
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822

Yea the extra check in test cases might've contributed to the longer test time. Let me think about how to reduce it.

On Mon, Jul 23, 2018 at 11:28 PM Hyukjin Kwon wrote:
> @rxin <https://github.com/rxin>, I think this PR could possibly cause
> some performance effect given the latest test ran above
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93470/
> and from a rough scan of other builds:
>
> ArithmeticExpressionSuite:
> - SPARK-22499: Least and greatest should not generate codes beyond 64KB (11 minutes, 51 seconds)
>
> CastSuite:
> - cast string to timestamp (8 minutes, 42 seconds)
>
> TPCDSQuerySuite:
> - q14a-v2.7 (2 minutes, 29 seconds)
>
> SQLQueryTestSuite:
> - subquery/in-subquery/in-joins.sql (2 minutes, 36 seconds)
>
> ContinuousStressSuite:
> - only one epoch (3 minutes, 21 seconds)
> - automatic epoch advancement (3 minutes, 21 seconds)
>
> vs
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/4699/consoleFull
>
> ArithmeticExpressionSuite:
> - SPARK-22499: Least and greatest should not generate codes beyond 64KB (7 minutes, 49 seconds)
>
> CastSuite:
> - cast string to timestamp (1 minute)
>
> TPCDSQuerySuite:
> - q14a-v2.7 (3 seconds, 442 milliseconds)
>
> SQLQueryTestSuite:
> - subquery/in-subquery/in-joins.sql (2 minutes, 21 seconds)
>
> ContinuousStressSuite:
> - only one epoch (3 minutes, 21 seconds)
> - automatic epoch advancement (3 minutes, 21 seconds)
>
> There could of course be other factors like the machine's status as well, but I
> thought it's good to note while I am taking a look at those.
[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21845

Are more pull requests failing due to time out right now?

On Mon, Jul 23, 2018 at 6:30 PM Hyukjin Kwon wrote:
> @rxin <https://github.com/rxin>, btw you want me close this one or get
> this in? Will take a look for the build and tests thing again during this
> week for sure anyway.
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r204504127

--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---

@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address (host:port) of the executor that a barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)
--- End diff --

does this need to be a public API?
[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21826

No we can't, because you can still use string concat in filters, e.g. colA || colB == "ab". What is "||" here?
[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21845

This helps, but it is not sustainable to keep increasing the threshold. What we need to do is to look at test time distribution and figure out what test suites are unnecessarily long and actually cut down the time there. @HyukjinKwon Would you be interested in doing that?
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822

Jenkins, retest this please.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822

retest this please
[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21802

Do we really need full codegen for all of these collection functions? They seem pretty slow and specialization with full codegen won't help perf that much (and might even hurt by blowing up the code size) right?
[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21826

cc @gatorsmile @cloud-fan @HyukjinKwon is this a good thing to do?
[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21826

Jenkins, test this please.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822

retest this please
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163484

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---

@@ -33,6 +49,116 @@ abstract class LogicalPlan
   with QueryPlanConstraints
   with Logging {

+  private var _analyzed: Boolean = false
+
+  /**
+   * Marks this plan as already analyzed. This should only be called by [[CheckAnalysis]].
+   */
+  private[catalyst] def setAnalyzed(): Unit = { _analyzed = true }
+
+  /**
+   * Returns true if this node and its children have already gone through analysis and
+   * verification. Note that this is only an optimization used to avoid analyzing trees that
+   * have already been analyzed, and can be reset by transformations.
+   */
+  def analyzed: Boolean = _analyzed
+
+  /**
+   * Returns a copy of this node where `rule` has been recursively applied first to all of its
+   * children and then itself (post-order, bottom-up). When `rule` does not apply to a given node,
+   * it is left unchanged. This function is similar to `transformUp`, but skips sub-trees that
+   * have already been marked as analyzed.
+   *
+   * @param rule the function used to transform this node's children
+   */
+  def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
--- End diff --

todo: add unit tests
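A hedged sketch of what the skipping traversal can look like, modeled on Spark's transformUp (the method body is elided in the diff above, so this is an approximation rather than the PR's exact code):

```scala
def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
  if (!analyzed) {
    // Post-order: transform children first, then apply the rule to this node.
    val afterRuleOnChildren = mapChildren(_.resolveOperators(rule))
    if (this fastEquals afterRuleOnChildren) {
      CurrentOrigin.withOrigin(origin) {
        rule.applyOrElse(this, identity[LogicalPlan])
      }
    } else {
      CurrentOrigin.withOrigin(origin) {
        rule.applyOrElse(afterRuleOnChildren, identity[LogicalPlan])
      }
    }
  } else {
    this // subtree already analyzed: skip it entirely
  }
}
```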
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163424

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---

@@ -23,8 +23,24 @@ import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.QueryPlan
 import org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, TreeNode}
 import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.Utils
+
+
+object LogicalPlan {
+
+  private val resolveOperatorDepth = new ThreadLocal[Int] {
--- End diff --

todo: explain what this is
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163328

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

@@ -2390,16 +2375,21 @@ class Analyzer(
  * scoping information for attributes and can be removed once analysis is complete.
  */
 object EliminateSubqueryAliases extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
-    case SubqueryAlias(_, child) => child
+  // This is actually called in the beginning of the optimization phase, and as a result
+  // is using transformUp rather than resolveOperators. This is also often called in the
+  //
--- End diff --

note: finish comment
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204160853

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---

@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
   // Simplify the predicates before validating any unsupported correlation patterns
   // in the plan.
-  BooleanSimplification(sub).foreachUp {
+  // TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

Thanks. I'm going to add it back.
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204160150

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

@@ -787,6 +782,7 @@ class Analyzer(
   right
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
+  // TODO(rxin): Why do we need transformUp here?
--- End diff --

cc @cloud-fan why do we need transformUp here?
[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21803

Should we do schema.toDDL, or StructType.toDDL(schema)?
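To make the two candidate API shapes concrete (illustrative only; the exact rendered DDL string is an assumption, and the second form is hypothetical):

```scala
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("a", IntegerType),
  StructField("b", StringType)))

schema.toDDL                 // instance method, e.g. "`a` INT,`b` STRING" (format assumed)
// StructType.toDDL(schema)  // static-style alternative (hypothetical)
```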
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r203918981

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---

@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
   // Simplify the predicates before validating any unsupported correlation patterns
   // in the plan.
-  BooleanSimplification(sub).foreachUp {
+  // TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

cc @dilipbiswal @cloud-fan @gatorsmile Why did we need BooleanSimplification here?
[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18784 Let's remove it in 3.0 then. We can do it after the 2.4 release.
[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r203496489
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+package object avro {
+  /**
+   * Adds a method, `avro`, to DataFrameWriter that allows you to write avro files using
+   * the DataFileWriter
+   */
+  implicit class AvroDataFrameWriter[T](writer: DataFrameWriter[T]) {
+    def avro: String => Unit = writer.format("avro").save
--- End diff --
I'd remove the Scala API.
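For comparison, here is what user code looks like with the generic writer API versus the implicit shorthand under discussion; the path is illustrative, this assumes the built-in avro source from this PR is on the classpath, and the shorthand is shown commented out since the suggestion above is to drop it:
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.range(10).toDF("id")

// The generic data source API works the same from Scala, Java, Python, and R.
df.write.format("avro").save("/tmp/ids.avro")

// The implicit class would additionally enable this Scala-only spelling:
// import org.apache.spark.sql.avro._
// df.write.avro("/tmp/ids.avro")
```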
[GitHub] spark issue #21766: [SPARK-24803][SQL] add support for numeric
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21766 Why did you need this change? Given it's very difficult to revert the change (or introduce a proper numeric type if ever needed in the future), I would not merge this pull request unless there is sufficient justification.
[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 To me it is actually confusing to have the decimal one in there at all: defining a list of queries that get reused for different functional tests makes it very easy to ignore the subtle differences. We also risk over-engineering this with only one use case.
On Tue, Jul 10, 2018 at 8:20 AM Wenchen Fan wrote:
> We can deal with the decimal test file specially if that's the only use
> case. For now I'd say the join test is more important and let's finish it
> first.
[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 What are the use cases other than decimal? I am not sure if we need to build a lot of infrastructure just for one or two use cases.
[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 If they produce different results, why do you need any infrastructure for them? They are just part of the normal test flow. If they produce the same result, and you don't want to define the same test queries twice, we can create an infra for that. I thought that's what this is about?
[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 Can you just define a config matrix at the beginning of the file, and have each file run once per entry in the matrix?
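A rough sketch of that idea, with the matrix spelled as data rather than encoded in file names; the directive format a test harness would parse is left out, and the config values here are purely illustrative:
```
import org.apache.spark.sql.SparkSession

// Hypothetical per-file config matrix: every query in the test file is run
// once under each entry, and results can be compared across entries.
val configMatrix: Seq[Map[String, String]] = Seq(
  Map("spark.sql.codegen.wholeStage" -> "true"),
  Map("spark.sql.codegen.wholeStage" -> "false"))

def runFile(spark: SparkSession, queries: Seq[String]): Unit = {
  for (configs <- configMatrix) {
    configs.foreach { case (k, v) => spark.conf.set(k, v) }
    queries.foreach(q => spark.sql(q).collect())
  }
}
```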
[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 I think it's super confusing to have the config names encoded in file names. It makes the names super long and difficult to read, makes it hard to verify what was set, and makes it difficult to combine multiple configs.
[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21705#discussion_r199940775
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ---
@@ -66,6 +66,12 @@ object StaticSQLConf {
     .checkValue(cacheSize => cacheSize >= 0, "The maximum size of the cache must not be negative")
     .createWithDefault(1000)
+  val CODEGEN_CACHE_SIZE = buildStaticConf("spark.sql.codegen.cacheSize")
+    .doc("The maximum cache size for generated classes.")
--- End diff --
spark.sql.codegen.cache.maxEntries?
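Spelled out, the suggested rename would make the entry look roughly like the sketch below; the doc text and default value are illustrative, only the key name comes from this thread:
```
// Sketch only -- this lives inside object StaticSQLConf, where buildStaticConf is in scope.
val CODEGEN_CACHE_MAX_ENTRIES = buildStaticConf("spark.sql.codegen.cache.maxEntries")
  .doc("The maximum number of entries in the cache of generated classes.")
  .intConf
  .checkValue(maxEntries => maxEntries >= 0, "The maximum must not be negative")
  .createWithDefault(100)
```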
[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21686 Thanks. Awesome. This matches what I had in mind then.
[GitHub] spark issue #21459: [SPARK-24420][Build] Upgrade ASM to 6.1 to support JDK9+
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21459 SGTM.
On Mon, Jul 2, 2018 at 4:38 PM DB Tsai wrote:
> There are three approvals from the committers, and the changes are pretty
> trivial to revert if we see any performance regression, which is unlikely.
> To move things forward, if there is no further objection, I'll merge it
> tomorrow. Thanks.
[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21686 Does this actually work in SQL? How does it work when we don't have a data type that's a schema?
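If the answer is that the function returns the inferred schema as a plain DDL-formatted string (an assumption worth confirming against the PR), then no schema-valued data type is needed, and the result composes with from_json, which already accepts a schema string:
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Returns a string column, e.g. "struct<a:bigint>" (the expected output is an assumption).
spark.sql("""SELECT schema_of_json('{"a": 1}')""").show(truncate = false)

// That string feeds straight into from_json's schema argument.
spark.sql("""SELECT from_json('{"a": 1}', 'struct<a:bigint>')""").show()
```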
[GitHub] spark issue #21626: [SPARK-24642][SQL] New function infers schema for JSON c...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21626 It is on the public list: https://issues.apache.org/jira/browse/SPARK-24642
[GitHub] spark pull request #21598: [SPARK-24605][SQL] size(null) returns null instea...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21598#discussion_r198364343
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1324,6 +1324,12 @@ object SQLConf {
     "Other column values can be ignored during parsing even if they are malformed.")
     .booleanConf
     .createWithDefault(true)
+
+  val LEGACY_SIZE_OF_NULL = buildConf("spark.sql.legacy.sizeOfNull")
+    .doc("If it is set to true, size of null returns -1. This behavior was inherited from Hive. " +
+      "The size function returns null for null input if the flag is disabled.")
--- End diff --
perhaps you should say this will be updated to false in spark 3.0?
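The flag's effect, as a quick sketch: the config name and both behaviors come from the doc text in the diff, and since it is a runtime conf it can be toggled within a session (the cast is there only to give the null a concrete array type):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

spark.conf.set("spark.sql.legacy.sizeOfNull", "true")      // Hive-inherited legacy behavior
spark.sql("SELECT size(cast(null AS array<int>))").show()  // -1

spark.conf.set("spark.sql.legacy.sizeOfNull", "false")     // proposed future default
spark.sql("SELECT size(cast(null AS array<int>))").show()  // null
```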
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 OK, I double-checked. I don't think we should be adding this functionality, since different databases implement it differently, and it is somewhat difficult to create Infinity in Spark SQL given we return null or NaN. On top of that, we already support equality for infinity, e.g.
```
spark.range(1).select(
  org.apache.spark.sql.functions.lit(java.lang.Double.POSITIVE_INFINITY) ===
    org.apache.spark.sql.functions.lit(java.lang.Double.POSITIVE_INFINITY)).show()
```
The above shows true. If we start adding isinf, we'd soon need to add functions for negative infinity, and these functions provide very little value beyond what we already support using equality.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 Hey I have an additional thought on this. Will leave it in the next ten mins.
[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 Here: https://en.wikipedia.org/wiki/Bug_compatibility
On Tue, Jun 26, 2018 at 9:28 AM Reynold Xin wrote:
> It's actually common software engineering practice to keep "buggy"
> semantics if a bug has been out there long enough and a lot of applications
> depend on the semantics.
[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 It's actually common software engineering practice to keep "buggy" semantics if a bug has been out there long enough and a lot of applications depend on the semantics.
On Tue, Jun 26, 2018 at 9:18 AM Hyukjin Kwon wrote:
> I don't think it's too discretionary. We have a safeguard to control the
> behaviour. Spark mentions it in the migration guide. In the case of changing a
> public interface in a way that breaks binary or source compatibility, there
> should really be a strong argument, sure. For clarification, I don't think
> such a change is made usually.
>
> In this case, it changes a behaviour even with a safeguard. Sounds pretty
> minor. I wonder why this has suddenly popped up. As I said, if the standards
> don't reflect the practice, the standards should be corrected or the
> practice should be complied with. Committers' judgement is needed from time
> to time. We need more committers for a more proper review iteration. Let's
> get it rolling.
>
> If you prefer a more conservative distribution, it should be an option to
> consider using a maintenance release.
>
> "we may choose to retain that buggy behavior"
>
> I strongly disagree. We should fix the buggy behavior. There's no point in
> having newer versions.
>
> If you strongly doubt it, please open a discussion on the mailing list and
> see if we get agreement at some point.
[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 Do we have other "legacy" configs that we haven't released and can change to match this prefix? It's pretty nice to have a single prefix for stuff like this.
[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 This is not a "bug" and there is no "right" behavior in APIs. It's been defined as -1 since the very beginning (when was it added?), so we can't just change the default value in a feature release.
[GitHub] spark issue #21544: add one supported type missing from the javadoc
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21544 Thanks. Merging in master.
spark git commit: add one supported type missing from the javadoc
Repository: spark
Updated Branches: refs/heads/master e4fee395e -> c7c0b086a

add one supported type missing from the javadoc

## What changes were proposed in this pull request?
The supported java.math.BigInteger type is not mentioned in the javadoc of Encoders.bean()

## How was this patch tested?
only Javadoc fix

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: James Yu

Closes #21544 from yuj/master.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c7c0b086
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c7c0b086
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c7c0b086
Branch: refs/heads/master
Commit: c7c0b086a0b18424725433ade840d5121ac2b86e
Parents: e4fee39
Author: James Yu
Authored: Fri Jun 15 21:04:04 2018 -0700
Committer: Reynold Xin
Committed: Fri Jun 15 21:04:04 2018 -0700
--
sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/spark/blob/c7c0b086/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
index 0b95a88..b47ec0b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
@@ -132,7 +132,7 @@ object Encoders {
  * - primitive types: boolean, int, double, etc.
  * - boxed types: Boolean, Integer, Double, etc.
  * - String
- * - java.math.BigDecimal
+ * - java.math.BigDecimal, java.math.BigInteger
  * - time related: java.sql.Date, java.sql.Timestamp
  * - collection types: only array and java.util.List currently, map support is in progress
  * - nested java bean.
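A small sketch exercising the newly documented type; the bean class is made up for illustration, and the resulting Spark type for BigInteger (a decimal-backed field) is an assumption:
```
import java.math.BigInteger
import org.apache.spark.sql.Encoders

// Encoders.bean introspects JavaBean conventions: a no-arg constructor
// plus getter/setter pairs.
class Balance {
  private var amount: BigInteger = BigInteger.ZERO
  def getAmount: BigInteger = amount
  def setAmount(v: BigInteger): Unit = { amount = v }
}

val encoder = Encoders.bean(classOf[Balance])
println(encoder.schema)  // expected: a single decimal-typed field named "amount" (assumption)
```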
[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 I'm confused by the description. What does this PR actually do?
[GitHub] spark issue #21574: [SPARK-24478][SQL][followup] Move projection and filter ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21574 Does this move actually make sense? It'd destroy stats estimation for partition pruning.
[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21502 How does this solve the problem you described? If the container is gone, the process is gone and users can't destroy things anymore.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19498 LGTM
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 Thanks, Henry. In general I'm not a huge fan of adding something because hypothetically somebody might want it. Also if you want this to be compatible with Impala, wouldn't you want to name this the same way as Impala?
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 @henryr 1.0/0.0 also returns null in Spark SQL ...
```
scala> sql("select cast(1.0 as double)/cast(0 as double)").show()
+-----------------------------------------+
|(CAST(1.0 AS DOUBLE) / CAST(0 AS DOUBLE))|
+-----------------------------------------+
|                                     null|
+-----------------------------------------+
```