[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22227
  
Please remove the 0 semantics. IMO the zero vs. negative number difference 
is too subtle. Java's String is the only API I've found that supports it; Python doesn't. 
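
For reference, this is the java.lang.String.split behavior being discussed (a quick 
illustration in Scala, not code from this PR):

```scala
// limit = 0 silently drops trailing empty strings, while a negative limit keeps them:
"a,b,,".split(",", -1)   // Array(a, b, "", "")
"a,b,,".split(",", 0)    // Array(a, b)      -- trailing empty strings removed
"a,b,,".split(",", 2)    // Array(a, b,,)    -- at most 2 elements, remainder kept intact
```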


---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r214135400
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -229,33 +229,58 @@ case class RLike(left: Expression, right: Expression) 
extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around matches of the given regex.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match 
`regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences 
that match `regex`" +
+" and returns an array of at most `limit`",
+  arguments = """
+Arguments:
+  * str - a string expression to split.
+  * regex - a string representing a regular expression. The regex 
string should be a
+Java regular expression.
+  * limit - an integer expression which controls the number of times 
the regex is applied.
+
+limit > 0: The resulting array's length will not be more than 
`limit`, and the resulting
+   array's last entry will contain all input beyond the 
last matched regex.
+
+limit < 0: `regex` will be applied as many times as possible, and 
the resulting
+   array can be of any size.
+
+limit = 0: `regex` will be applied as many times as possible, the 
resulting array can
--- End diff --

Yeah, but I'd focus on what behavior we want to enable. Do other database 
systems have this limit=0 semantics? If not, I'd rewrite limit=0 internally to 
just -1.
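
A minimal sketch of that internal rewrite, assuming `limit` is available as a plain 
Int (the names here are illustrative, not the PR's implementation):

```scala
// Treat limit = 0 the same as "no limit" by delegating to Java's split with -1,
// so the subtle "drop trailing empty strings" behavior never surfaces.
def splitWithLimit(str: String, regex: String, limit: Int): Array[String] =
  str.split(regex, if (limit == 0) -1 else limit)

splitWithLimit("a,b,,", ",", 0)   // Array(a, b, "", "") -- same result as limit = -1
```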



---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r214131195
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -229,33 +229,58 @@ case class RLike(left: Expression, right: Expression) 
extends StringRegexExpress
 
 
 /**
- * Splits str around pat (pattern is a regular expression).
+ * Splits str around matches of the given regex.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match 
`regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences 
that match `regex`" +
+" and returns an array of at most `limit`",
+  arguments = """
+Arguments:
+  * str - a string expression to split.
+  * regex - a string representing a regular expression. The regex 
string should be a
+Java regular expression.
+  * limit - an integer expression which controls the number of times 
the regex is applied.
+
+limit > 0: The resulting array's length will not be more than 
`limit`, and the resulting
+   array's last entry will contain all input beyond the 
last matched regex.
+
+limit < 0: `regex` will be applied as many times as possible, and 
the resulting
+   array can be of any size.
+
+limit = 0: `regex` will be applied as many times as possible, the 
resulting array can
--- End diff --

Why do we want to build trailing-empty-string removal into split?


---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22010
  
Thanks for pinging. Please don't merge this until you've addressed the OOM 
issue. The aggregators were created to handle incoming data larger than the size 
of memory. We should never use a Scala or Java hash set to hold all the data.



---




[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22010#discussion_r214103667
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag](
* Return a new RDD containing the distinct elements in this RDD.
*/
   def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): 
RDD[T] = withScope {
-map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
+// If the data is already appropriately partitioned with a known 
partitioner we can work locally.
+def removeDuplicatesInPartition(itr: Iterator[T]): Iterator[T] = {
+  val set = new mutable.HashSet[T]()
+  itr.filter(set.add(_))
--- End diff --

This is a bad implementation and could OOM. You should reuse reduceByKey.
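
A sketch of the direction suggested here, assuming the RDD's existing partitioner is 
passed in (this is not the PR's code): reduceByKey with the same partitioner avoids a 
shuffle and goes through Spark's spillable aggregator instead of an in-memory hash set.

```scala
import scala.reflect.ClassTag

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Deduplicate an RDD that is already partitioned by `part`: because the partitioner
// matches, reduceByKey runs within each partition (no shuffle) and can spill to disk.
def distinctWithKnownPartitioner[T: ClassTag](rdd: RDD[T], part: Partitioner): RDD[T] =
  rdd.map(x => (x, null)).reduceByKey(part, (x, _) => x).map(_._1)
```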


---




[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22010#discussion_r214103223
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag](
* Return a new RDD containing the distinct elements in this RDD.
*/
   def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): 
RDD[T] = withScope {
-map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
+// If the data is already appropriately partitioned with a known 
partitioner we can work locally.
+def removeDuplicatesInPartition(itr: Iterator[T]): Iterator[T] = {
+  val set = new mutable.HashSet[T]()
+  itr.filter(set.add(_))
--- End diff --

This is a bad implementation and could OOM. You should reuse reduceByKey.



---




[GitHub] spark issue #22258: [SPARK-25266] Fix memory leak vulnerability in Barrier E...

2018-08-28 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22258
  
Can you remove vulnerability from the title? Otherwise it sounds like a 
security vulnerability here.



---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213100428
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: 
SessionCatalog)
 //   since the other rules might make two separate Unions operators 
adjacent.
 Batch("Union", Once,
   CombineUnions) ::
+// run this once earlier. this might simplify the plan and reduce cost 
of optimizer
--- End diff --

Can you put your comment above into the code comment?


---




[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-08-27 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18447
  
Yea I'd probably reject this for now, until we see bigger needs for it.



---




[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

2018-08-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22162#discussion_r213026874
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -815,6 +815,24 @@ class Dataset[T] private[sql](
 println(showString(numRows, truncate, vertical))
   // scalastyle:on println
 
+  /**
+   * Returns the default number of rows to show when the show function is 
called without
+   * a user specified max number of rows.
+   * @since 2.3.0
+   */
+  private def numberOfRowsToShow(): Int = {
--- End diff --

we shouldn't be adding methods here


---




[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...

2018-08-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r212815703
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) 
extends StringRegexExpress
  * Splits str around pat (pattern is a regular expression).
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match 
`regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences 
that match `regex`." +
+"The `limit` parameter controls the number of times the pattern is 
applied and " +
--- End diff --

You should say whether limit is ignored when it is a non-positive number.


---




[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...

2018-08-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22227#discussion_r212815685
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 ---
@@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) 
extends StringRegexExpress
  * Splits str around pat (pattern is a regular expression).
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match 
`regex`.",
+  usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences 
that match `regex`." +
+"The `limit` parameter controls the number of times the pattern is 
applied and " +
--- End diff --

can we be more concise? e.g. presto's doc is just

"Splits string on delimiter and returns an array of size at most limit. The 
last element in the array always contain everything left in the string. limit 
must be a positive number."


---




[GitHub] spark issue #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatal...

2018-08-22 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22185
  
cc @rdblue @cloud-fan @gatorsmile 


---




[GitHub] spark pull request #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDo...

2018-08-22 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/22185

[SPARK-25127] DataSourceV2: Remove SupportsPushDownCatalystFilters

## What changes were proposed in this pull request?
They depend on internal Expression APIs. Let's see how far we can get 
without it.

## How was this patch tested?
Just some code removal. There are no existing tests as far as I can tell, so 
it's easy to remove.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-25127

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22185


commit 3c4115d26cce3c0bc8b297da68cf752a0f666768
Author: Reynold Xin 
Date:   2018-08-22T16:14:30Z

[SPARK-25127] DataSourceV2: Remove SupportsPushDownCatalystFilters




---




[GitHub] spark issue #16600: [SPARK-19242][SQL] SHOW CREATE TABLE should generate new...

2018-08-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16600
  
Can you close this pr?



---




[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22065
  
Are we talking about a marginal 0.7% improvement? It doesn't seem like it's 
worth the complexity.



---




[GitHub] spark issue #22157: [SPARK-25126] Avoid creating Reader for all orc files

2018-08-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22157
  
Do we have a similar issue for Parquet?



---




[GitHub] spark issue #22160: Revert "[SPARK-24418][BUILD] Upgrade Scala to 2.11.12 an...

2018-08-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22160
  
Can you add to the PR description why we are reverting? Just copy-paste 
what you had above. Thanks.



---




[GitHub] spark issue #22134: [SPARK-25143][SQL] Support data source name mapping conf...

2018-08-17 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22134
  
I think it's premature to introduce this. The extra layer of abstraction 
actually makes it more difficult to reason about what's going on. We don't have 
that many data sources that require flexibility here, and we can always add the 
flexibility if needed later.



---




[GitHub] spark issue #21944: [SPARK-24988][SQL]Add a castBySchema method which casts ...

2018-08-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21944
  
Thanks, Mahmoud!



---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21951
  
LGTM.

On Thu, Aug 2, 2018 at 1:14 AM Xiao Li  wrote:

> This will simplify the code and improve the readability. We can do the
> same in the other expression.



---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21951
  
Why would we want to use the DSL here? Do we use it in other expressions?
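
For context, this is roughly what the choice looks like; a hedged illustration of the 
Catalyst expression DSL versus explicit constructors, not the code from this follow-up:

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

val sum = AttributeReference("sum", DoubleType)()
val count = AttributeReference("count", LongType)()

// Explicit constructors:
val avgExplicit = Divide(Cast(sum, DoubleType), Cast(count, DoubleType))
// The same expression written with the DSL:
val avgDsl = sum.cast(DoubleType) / count.cast(DoubleType)
```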


---




[GitHub] spark pull request #21938: [SPARK-24982][SQL] UDAF resolution should not thr...

2018-07-31 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21938

[SPARK-24982][SQL] UDAF resolution should not throw AssertionError

## What changes were proposed in this pull request?
When a user calls a UDAF with the wrong number of arguments, Spark previously 
threw an AssertionError, which is not supposed to be a user-facing exception. 
This patch updates it to throw AnalysisException instead.

## How was this patch tested?
Updated test case udaf.sql.out.
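
A self-contained way to reproduce the behavior described above (the UDAF below is a 
hypothetical stand-in; the actual test goes through udaf.sql):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Minimal one-argument UDAF, only here to exercise the resolution path.
class MyDoubleAvg extends UserDefinedAggregateFunction {
  def inputSchema: StructType = new StructType().add("value", DoubleType)
  def bufferSchema: StructType = new StructType().add("sum", DoubleType).add("cnt", LongType)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0.0; buffer(1) = 0L }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = buffer.getDouble(0) + input.getDouble(0)
    buffer(1) = buffer.getLong(1) + 1L
  }
  def merge(b1: MutableAggregationBuffer, b2: Row): Unit = {
    b1(0) = b1.getDouble(0) + b2.getDouble(0)
    b1(1) = b1.getLong(1) + b2.getLong(1)
  }
  def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null else buffer.getDouble(0) / buffer.getLong(1)
}

val spark = SparkSession.builder().master("local[*]").appName("udaf-arity").getOrCreate()
spark.udf.register("myDoubleAvg", new MyDoubleAvg)
spark.range(10).toDF("v").createOrReplaceTempView("t")

// The UDAF takes one argument but is called with two. Before this patch the analyzer
// surfaced java.lang.AssertionError; with it, the query fails with AnalysisException.
spark.sql("SELECT myDoubleAvg(v, 3) FROM t").show()
```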


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-24982

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21938.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21938


commit 84262dc21dd9f9aa409dd5e873d31d5b26a231f3
Author: Reynold Xin 
Date:   2018-07-31T21:00:20Z

[SPARK-24982][SQL] UDAF resolution should not throw java.lang.AssertionError




---




[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...

2018-07-31 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21934#discussion_r206681479
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/table-valued-functions.sql.out ---
@@ -83,8 +83,13 @@ select * from range(1, null)
 -- !query 6 schema
 struct<>
 -- !query 6 output
-java.lang.IllegalArgumentException
-Invalid arguments for resolved function: 1, null
+org.apache.spark.sql.AnalysisException
+error: table-valued function range with alternatives:
--- End diff --

probably could go either way


---




[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21934
  
Jenkins, retest this please.



---




[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21934
  
cc @gatorsmile @ericl who originally wrote this.



---




[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...

2018-07-31 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21934

[SPARK-24951][SQL] Table valued functions should throw AnalysisException

## What changes were proposed in this pull request?
Previously TVF resolution could throw IllegalArgumentException if the data 
type is null type. This patch replaces that exception with AnalysisException, 
enriched with positional information, to improve error message reporting and to 
be more consistent with the rest of Spark SQL.

## How was this patch tested?
Updated the test case in table-valued-functions.sql.out, which is how I 
identified this problem in the first place.
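
The failing query from table-valued-functions.sql, for reference (`spark` is the usual 
SparkSession, e.g. the one predefined in spark-shell):

```scala
// The query that surfaced the problem:
spark.sql("SELECT * FROM range(1, null)")
// Before this patch: java.lang.IllegalArgumentException
//   "Invalid arguments for resolved function: 1, null"
// After this patch: org.apache.spark.sql.AnalysisException describing the supported
//   range(...) alternatives, including the position of the offending call.
```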

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-24951

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21934.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21934


commit 08a6b7fc1d2a1a6149c4da2ca1657258347fd18f
Author: Reynold Xin 
Date:   2018-07-31T17:47:55Z

[SPARK-24951][SQL] Table valued functions should throw AnalysisException

commit 514fd77501194e43e8029734e4a3669f12fbf749
Author: Reynold Xin 
Date:   2018-07-31T17:53:21Z

Improve error message




---




[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp

2018-07-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21932
  
Do we really need this? It's almost always the case for resolution that 
you'd want to go bottom-up, so I thought Michael's original design of just calling 
it resolveOperators makes a lot of sense.



---




[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-07-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21923
  
Are there more specific use cases? I always feel it'd be impossible to 
design APIs without seeing a couple of different use cases.



---




[GitHub] spark issue #21922: [WIP] Add an ANSI SQL parser mode

2018-07-30 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21922
  
what are the actual changes?


---




[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-07-27 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21318
  
LGTM.

On Fri, Jul 27, 2018 at 10:58 PM Hyukjin Kwon 
wrote:

> @rxin <https://github.com/rxin> re: #21318 (comment)
> <https://github.com/apache/spark/pull/21318#discussion_r20582> I
> meant to say for instance, like "Please refer the SQL function
> documentation in the corresponding version". We don't have to bother 
update
> and also it makes sense ..



---




[GitHub] spark pull request #21318: [minor] Update docs for functions.scala to make i...

2018-07-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21318#discussion_r20582
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -39,7 +39,21 @@ import org.apache.spark.util.Utils
 
 
 /**
- * Functions available for DataFrame operations.
+ * Commonly used functions available for DataFrame operations. Using 
functions defined here provides
+ * a little bit more compile-time safety to make sure the function exists.
+ *
+ * Spark also includes more built-in functions that are less common and 
are not defined here.
+ * You can still access them (and all the functions defined here) using 
the [[functions.expr()]] API
+ * and calling them through a SQL expression string. You can find the 
entire list of functions for
+ * the latest version of Spark at 
[[https://spark.apache.org/docs/latest/api/sql/index.html]].
--- End diff --

It's just a lot of work and I'm sure we will forget to update it ... so I'm 
pointing to the latest.
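
A short example of what the doc change describes; `percentile` is used here as an 
assumed example of a built-in SQL function without a dedicated wrapper in functions.scala:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("expr-example").getOrCreate()
val df = spark.range(100).toDF("age")

// Compile-time-checked wrapper defined in functions.scala:
df.select(avg(col("age"))).show()
// A built-in SQL function reached through expr() and a SQL expression string:
df.select(expr("percentile(age, 0.5)")).show()
```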



---




[GitHub] spark issue #21897: [minor] Improve documentation for HiveStringType's

2018-07-27 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21897
  
cc @gatorsmile 

cc @hvanhovell why did we expose these types as public Scala APIs? I feel 
they should not have been public. If they are public, we should have more 
generic VarcharType and CharType instead.

Something to fix in 3.0?



---




[GitHub] spark pull request #21897: [minor] Improve documentation for HiveStringType'...

2018-07-27 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21897

[minor] Improve documentation for HiveStringType's

The diff should be self-explanatory. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark hivestringtypedoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21897.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21897


commit 8b954a48dab904f192c60872159d9c876fa2ee20
Author: Reynold Xin 
Date:   2018-07-27T18:08:19Z

[minor] Improve documentation for HiveStringType's




---




[GitHub] spark pull request #21706: [SPARK-24702] Fix Unable to cast to calendar inte...

2018-07-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21706#discussion_r205851385
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/cast.sql ---
@@ -42,4 +42,38 @@ SELECT CAST('9223372036854775808' AS long);
 
 DESC FUNCTION boolean;
 DESC FUNCTION EXTENDED boolean;
+
+-- cast null to calendar interval should return null
+SELECT CAST(NULL as calendarinterval);
+SELECT CALENDARINTERVAL(NULL);
+
+-- cast invalid strings to calendar interval should return null
+SELECT CAST('interval 10' as calendarinterval);
+SELECT CAST('interval 100 nanoseconds' as calendarinterval);
+SELECT CAST('interval 1 second 10 years -10 months 1 minute' as 
calendarinterval);
+SELECT CAST('interval 60 hours + 1 minute' as calendarinterval);
+SELECT CAST('interval 1 day +5 minutes' as calendarinterval);
+
+-- cast valid strings to calendar interval should return calendar interval
+SELECT CAST('interval 5 minutes' as calendarinterval);
+SELECT CAST('interval 10 hours' as calendarinterval);
+SELECT CAST('interval 1 second' as calendarinterval);
+SELECT CAST('interval 3 years -3 month 7 week 123 microseconds' as 
calendarinterval);
+SELECT CAST('interval 100 years 15 months -24 weeks 66 seconds' as 
calendarinterval);
+
+-- casting invalid strings to calendar interval using the function should 
return null
--- End diff --

I wouldn't add this many test cases, since they are just simple aliases. 
Just add one for calendarinterval and keep the cast ones.


---




[GitHub] spark issue #21896: [SPARK-24865][SQL] Remove AnalysisBarrier addendum

2018-07-27 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21896
  
cc @gatorsmile @cloud-fan 


---




[GitHub] spark pull request #21896: [SPARK-24865][SQL] Remove AnalysisBarrier addendu...

2018-07-27 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21896

[SPARK-24865][SQL] Remove AnalysisBarrier addendum

## What changes were proposed in this pull request?
I didn't want to pollute the diff in the previous PR and left some TODOs. 
This is a follow-up to address those TODOs.

## How was this patch tested?
Should be covered by existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-24865-addendum

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21896.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21896


commit 5751b027918ee5b70d22327120077908b3497900
Author: Reynold Xin 
Date:   2018-07-27T17:33:07Z

[SPARK-24865][SQL] Remove AnalysisBarrier addendum




---




[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-07-27 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21318
  
Yup will do.


On Fri, Jul 27, 2018 at 10:23 AM Sean Owen  wrote:

> Just browsing old PRs .. want to finish this one up @rxin
> <https://github.com/rxin>? looks simple and useful.



---




[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21699
  
I'm OK with it.



---




[GitHub] spark pull request #21873: [SPARK-24919][BUILD] New linter rule for sparkCon...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21873#discussion_r205252848
  
--- Diff: scalastyle-config.xml ---
@@ -150,6 +150,19 @@ This file is divided into 3 sections:
   // scalastyle:on println]]>
   
 
+  
+spark(.sqlContext)?.sparkContext.hadoopConfiguration
--- End diff --

can we just match on hadoopConfiguration ?


---




[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21758
  
What's the failure mode if there are not enough slots for the barrier mode? 
We should throw an exception right?



---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205250930
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ActiveJob.scala ---
@@ -60,4 +60,10 @@ private[spark] class ActiveJob(
   val finished = Array.fill[Boolean](numPartitions)(false)
 
   var numFinished = 0
+
+  // Mark all the partitions of the stage to be not finished.
--- End diff --

use `/** */` style. also the sentence is a bit awkward. perhaps

"Resets the status of all partitions in this stage so they are marked as 
not finished."


---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205250352
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1839,6 +1847,20 @@ abstract class RDD[T: ClassTag](
   def toJavaRDD() : JavaRDD[T] = {
 new JavaRDD(this)(elementClassTag)
   }
+
+  /**
+   * Whether the RDD is in a barrier stage. Spark must launch all the 
tasks at the same time for a
+   * barrier stage.
+   *
+   * An RDD is in a barrier stage, if at least one of its parent RDD(s), 
or itself, are mapped from
+   * a RDDBarrier. This function always returns false for a 
[[ShuffledRDD]], since a
+   * [[ShuffledRDD]] indicates start of a new stage.
+   */
+  private[spark] def isBarrier(): Boolean = isBarrier_
+
+  // From performance concern, cache the value to avoid repeatedly compute 
`isBarrier()` on a long
+  // RDD chain.
+  @transient protected lazy val isBarrier_ : Boolean = 
dependencies.exists(_.rdd.isBarrier())
--- End diff --

You need to explain why MapPartitionsRDD has a different isBarrier 
implementation.
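
For readers of the archive, a hedged sketch of the likely answer (a fragment, not 
necessarily the PR's exact code): MapPartitionsRDD is the RDD type that 
RDDBarrier.mapPartitions() produces, so it has to consider its own isFromBarrier flag 
in addition to its parents:

```scala
// Inside MapPartitionsRDD: barrier status is on when this RDD itself was created from
// an RDDBarrier (isFromBarrier) or when any parent RDD is already in a barrier stage.
@transient protected override lazy val isBarrier_ : Boolean =
  isFromBarrier || dependencies.exists(_.rdd.isBarrier())
```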



---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205249547
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala 
---
@@ -27,7 +27,8 @@ import org.apache.spark.{Partition, TaskContext}
 private[spark] class MapPartitionsRDD[U: ClassTag, T: ClassTag](
 var prev: RDD[T],
 f: (TaskContext, Int, Iterator[T]) => Iterator[U],  // (TaskContext, 
partition index, iterator)
-preservesPartitioning: Boolean = false)
+preservesPartitioning: Boolean = false,
+isFromBarrier: Boolean = false)
--- End diff --

we should explain what this flag does in classdoc


---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205249449
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+/** A [[TaskContext]] with extra info and tooling for a barrier stage. */
+trait BarrierTaskContext extends TaskContext {
+
+  /**
+   * :: Experimental ::
+   * Sets a global barrier and waits until all tasks in this stage hit 
this barrier. Similar to
+   * MPI_Barrier function in MPI, the barrier() function call blocks until 
all tasks in the same
+   * stage have reached this routine.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def barrier(): Unit
+
+  /**
+   * :: Experimental ::
+   * Returns the all task infos in this barrier stage, the task infos are 
ordered by partitionId.
+   */
+  @Experimental
+  @Since("2.4.0")
+  def getTaskInfos(): Array[BarrierTaskInfo]
--- End diff --

what other things do you expect to be included in the future in 
BarrierTaskInfo? It seems overkill to have a new class for a single field 
(address).



---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205249225
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address(host:port) of the executor that a 
barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)
--- End diff --

Can we just bake address into TaskInfo?



---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205249297
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address(host:port) of the executor that a 
barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)
--- End diff --

Or is TaskInfo not a public API?



---




[GitHub] spark issue #21875: [SPARK-24288][SQL] Enable preventing predicate pushdown

2018-07-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21875
  
Can you add JDBC to the title?



---




[GitHub] spark pull request #21866: [SPARK-24768][FollowUp][SQL]Avro migration follow...

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21866#discussion_r204961291
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -56,7 +56,7 @@ private[avro] class AvroFileFormat extends FileFormat 
with DataSourceRegister {
   spark: SparkSession,
   options: Map[String, String],
   files: Seq[FileStatus]): Option[StructType] = {
-val conf = spark.sparkContext.hadoopConfiguration
+val conf = spark.sessionState.newHadoopConf()
--- End diff --

can we add a linter rule for this?



---




[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21867#discussion_r204959300
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -731,7 +731,14 @@ private[spark] class BlockManager(
   }
 
   if (data != null) {
-return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize))
+// SPARK-24307 undocumented "escape-hatch" in case there are any 
issues in converting to
+// to ChunkedByteBuffer, to go back to old code-path.  Can be 
removed post Spark 2.4 if
+// new path is stable.
+if (conf.getBoolean("spark.fetchToNioBuffer", false)) {
--- End diff --

Can we have a better prefix, rather than just `spark.`?


---




[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204957474
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -751,7 +751,8 @@ object TypeCoercion {
*/
   case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
 
-override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = 
plan transform { case p =>
+override protected def coerceTypes(
+  plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown { case p 
=>
--- End diff --

I'm using a weird wrapping here to minimize the diff.



---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-24 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822
  
I changed the way we do the checks in tests to use a thread local rather 
than checking the stack trace, so they should run faster now. Also added test 
cases for the various new methods. Also moved the relevant code into 
AnalysisHelper for better code structure.

This should be ready now if tests pass.
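
A minimal, self-contained sketch of the thread-local guard described here (the names 
are assumptions based on the diff earlier in this thread, not the PR's exact code):

```scala
object ResolveGuard {
  private val depth: ThreadLocal[Int] = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }

  // Wraps a resolveOperators-style traversal so that nested transform calls can tell
  // they are running inside the analyzer and skip the "plan already analyzed" assertion.
  def inAnalyzer[T](f: => T): T = {
    depth.set(depth.get + 1)
    try f finally depth.set(depth.get - 1)
  }

  // Cheap check used by test-only assertions, replacing a walk over the stack trace.
  def insideAnalyzer: Boolean = depth.get > 0
}
```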



---




[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204955869
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -787,6 +782,7 @@ class Analyzer(
   right
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
+  // TODO(rxin): Why do we need transformUp here?
--- End diff --

@cloud-fan  ?


---




[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21845
  
If that's the only one, I think that PR itself needs to be fixed 
(it significantly increases test runtime), and I wouldn't increase the time
here.


On Mon, Jul 23, 2018 at 11:44 PM Hyukjin Kwon 
wrote:

> Yup, looks so in your PR #21822 (comment)
> <https://github.com/apache/spark/pull/21822#issuecomment-407298841>



---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822
  
Yea the extra check in test cases might've contributed to the longer test
time. Let me think about how to reduce it.


On Mon, Jul 23, 2018 at 11:28 PM Hyukjin Kwon 
wrote:

    > @rxin <https://github.com/rxin>, I think this PR could possibly cause
> some performance effect given the latest test ran above
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93470/
> and from a rough scan of other builds:
>
> ArithmeticExpressionSuite:
> - SPARK-22499: Least and greatest should not generate codes beyond 64KB 
(11 minutes, 51 seconds)
>
> CastSuite:
> - cast string to timestamp (8 minutes, 42 seconds)
>
> TPCDSQuerySuite:
> - q14a-v2.7 (2 minutes, 29 seconds)
>
> SQLQueryTestSuite:
> - subquery/in-subquery/in-joins.sql (2 minutes, 36 seconds)
>
> ContinuousStressSuite:
> - only one epoch (3 minutes, 21 seconds)
> - automatic epoch advancement (3 minutes, 21 seconds)
>
> vs
> 
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/4699/consoleFull
>
>
> ArithmeticExpressionSuite:
> - SPARK-22499: Least and greatest should not generate codes beyond 64KB 
(7 minutes, 49 seconds)
>
> CastSuite:
> - cast string to timestamp (1 minute)
>
> TPCDSQuerySuite:
> - q14a-v2.7 (3 seconds, 442 milliseconds)
>
> SQLQueryTestSuite:
> - subquery/in-subquery/in-joins.sql (2 minutes, 21 seconds)
>
> ContinuousStressSuite:
> - only one epoch (3 minutes, 21 seconds)
> - automatic epoch advancement (3 minutes, 21 seconds)
>
> There could of course other factors like machine's status as well but I
> thought it's good to note while I am taking a look for those.



---




[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21845
  
Are more pull requests failing due to time out right now?

On Mon, Jul 23, 2018 at 6:30 PM Hyukjin Kwon 
wrote:

> @rxin <https://github.com/rxin>, btw you want me close this one or get
> this in? Will take a look for the build and tests thing again during this
> week for sure anyway.



---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r204504127
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import org.apache.spark.annotation.{Experimental, Since}
+
+
+/**
+ * :: Experimental ::
+ * Carries all task infos of a barrier task.
+ *
+ * @param address the IPv4 address(host:port) of the executor that a 
barrier task is running on
+ */
+@Experimental
+@Since("2.4.0")
+class BarrierTaskInfo(val address: String)
--- End diff --

does this need to be a public API?


---




[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21826
  
No, we can't, because you can still use string concat in filters, e.g.

colA || colB == "ab"

What is "||" here?


---




[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21845
  
This helps, but it is not sustainable to keep increasing the threshold. 
What we need to do is to look at test time distribution and figure out what 
test suites are unnecessarily long and actually cut down the time there. 
@HyukjinKwon  Would you be interested in doing that?



---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-22 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822
  
Jenkins, retest this please.



---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822
  
retest this please


---




[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21802
  
Do we really need full codegen for all of these collection functions? They 
seem pretty slow and specialization with full codegen won't help perf that much 
(and might even hurt by blowing up the code size) right?



---




[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21826
  
cc @gatorsmile @cloud-fan @HyukjinKwon: is this a good thing to do?



---




[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21826
  
Jenkins, test this please.



---




[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21822
  
retest this please


---




[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163484
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -33,6 +49,116 @@ abstract class LogicalPlan
   with QueryPlanConstraints
   with Logging {
 
+  private var _analyzed: Boolean = false
+
+  /**
+   * Marks this plan as already analyzed. This should only be called by 
[[CheckAnalysis]].
+   */
+  private[catalyst] def setAnalyzed(): Unit = { _analyzed = true }
+
+  /**
+   * Returns true if this node and its children have already been gone 
through analysis and
+   * verification.  Note that this is only an optimization used to avoid 
analyzing trees that
+   * have already been analyzed, and can be reset by transformations.
+   */
+  def analyzed: Boolean = _analyzed
+
+  /**
+   * Returns a copy of this node where `rule` has been recursively applied 
first to all of its
+   * children and then itself (post-order, bottom-up). When `rule` does 
not apply to a given node,
+   * it is left unchanged.  This function is similar to `transformUp`, but 
skips sub-trees that
+   * have already been marked as analyzed.
+   *
+   * @param rule the function use to transform this nodes children
+   */
+  def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): 
LogicalPlan = {
--- End diff --

todo: add unit tests


---




[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163424
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -23,8 +23,24 @@ import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.QueryPlan
 import 
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, TreeNode}
 import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.Utils
+
+
+object LogicalPlan {
+
+  private val resolveOperatorDepth = new ThreadLocal[Int] {
--- End diff --

todo: explain what this is


---




[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163328
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2390,16 +2375,21 @@ class Analyzer(
  * scoping information for attributes and can be removed once analysis is 
complete.
  */
 object EliminateSubqueryAliases extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
-case SubqueryAlias(_, child) => child
+  // This is actually called in the beginning of the optimization phase, 
and as a result
+  // is using transformUp rather than resolveOperators. This is also often 
called in the
+  //
--- End diff --

note: finish comment


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204160853
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

Thanks. I'm going to add it back.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204160150
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -787,6 +782,7 @@ class Analyzer(
   right
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
+  // TODO(rxin): Why do we need transformUp here?
--- End diff --

cc @cloud-fan why do we need transformUp here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21803
  
Should we do schema.toDDL, or StructType.toDDL(schema)?
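
Purely to illustrate the two shapes being weighed (only one of them would exist in the final API; `schema` is an assumed StructType value):

```scala
// Option 1: an instance method on StructType
val ddl1 = schema.toDDL

// Option 2: a static helper on the StructType companion object
val ddl2 = StructType.toDDL(schema)
```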



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r203918981
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

cc @dilipbiswal @cloud-fan @gatorsmile 

Why did we need BooleanSimplification here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-07-18 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18784
  
Let's remove it in 3.0 then. We can do it after 2.4 release.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21742#discussion_r203496489
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+package object avro {
+  /**
+   * Adds a method, `avro`, to DataFrameWriter that allows you to write 
avro files using
+   * the DataFileWriter
+   */
+  implicit class AvroDataFrameWriter[T](writer: DataFrameWriter[T]) {
+def avro: String => Unit = writer.format("avro").save
--- End diff --

I'd remove the Scala API.
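
For context, the implicit only saves an explicit format call; a rough comparison, with `df` as an assumed DataFrame and an illustrative path:

```scala
import org.apache.spark.sql.avro._  // brings the AvroDataFrameWriter implicit into scope

// With the Scala-only implicit under discussion:
df.write.avro("/tmp/events_avro")

// Equivalent without it, using the generic DataFrameWriter API:
df.write.format("avro").save("/tmp/events_avro")
```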



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21766: [SPARK-24803][SQL] add support for numeric

2018-07-16 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21766
  
Why did you need this change? Given it's very difficult to revert the 
change (or introduce a proper numeric type if ever needed in the future), I 
would not merge this pull request unless there is sufficient justification.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21568
  
To me it is actually confusing to have the decimal one in there at all, by
defining a list of queries that are reused for different functional
testing. It is very easy to just ignore the subtle differences.

We also risk over-engineering this with only one use case.

On Tue, Jul 10, 2018 at 8:20 AM Wenchen Fan 
wrote:

> We can deal with the decimal test file specially if that's the only use
> case. For now I'd say the join test is more important and let's finish it
> first.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/21568#issuecomment-403860889>, or 
mute
> the thread
> 
<https://github.com/notifications/unsubscribe-auth/AATvPMjJPsZhXrOo_pbuxz-GwvKdds9lks5uFMYkgaJpZM4UoVQo>
> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21568
  
What are the use cases other than decimal? I am not sure if we need to 
build a lot of infrastructure just for one or two use cases.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21568
  
If they produce different results, why do you need any infrastructure for 
them? They are just part of the normal test flow.

If they produce the same result, and you don't want to define the same test 
queries twice, we can create an infra for that. I thought that's what this is 
about?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21568
  
Can you just define a config matrix at the beginning of the file, so that 
each file is run once per combination in the matrix?
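
A rough sketch of the idea, with an illustrative config and a hypothetical harness hook (not existing Spark test infrastructure):

```scala
// Illustrative only: run the same query file once per combination in the matrix.
val configMatrix = Seq(
  Map("spark.sql.decimalOperations.allowPrecisionLoss" -> "true"),
  Map("spark.sql.decimalOperations.allowPrecisionLoss" -> "false"))

configMatrix.foreach { configs =>
  configs.foreach { case (k, v) => spark.conf.set(k, v) }
  // runQueryFile("decimalArithmeticOperations.sql")  // hypothetical harness call
}
```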


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21568
  
I think it's super confusing to have the config names encoded in file 
names. It makes the names super long and difficult to read, hard to verify 
what was set, and difficult to combine multiple configs.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...

2018-07-03 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21705#discussion_r199940775
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala 
---
@@ -66,6 +66,12 @@ object StaticSQLConf {
   .checkValue(cacheSize => cacheSize >= 0, "The maximum size of the 
cache must not be negative")
   .createWithDefault(1000)
 
+  val CODEGEN_CACHE_SIZE = buildStaticConf("spark.sql.codegen.cacheSize")
+  .doc("The maximum cache size for generated classes.")
--- End diff --

spark.sql.codegen.cache.maxEntries?
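
Roughly what the rename would look like, following the builder pattern already used in StaticSQLConf; the doc text and default below are placeholders, not the PR's final values:

```scala
val CODEGEN_CACHE_MAX_ENTRIES = buildStaticConf("spark.sql.codegen.cache.maxEntries")
  .doc("The maximum number of entries kept in the cache of generated classes.")
  .intConf
  .checkValue(_ >= 0, "The maximum must not be negative")
  .createWithDefault(100)
```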



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...

2018-07-02 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21686
  
Thanks. Awesome. This matches what I had in mind then.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21459: [SPARK-24420][Build] Upgrade ASM to 6.1 to support JDK9+

2018-07-02 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21459
  
SGTM.

On Mon, Jul 2, 2018 at 4:38 PM DB Tsai  wrote:

> There are three approvals from the committers, and the changes are pretty
> trivial to revert if we see any performance regression which is unlikely.
> To move thing forward, if there is no further objection, I'll merge it
> tomorrow. Thanks.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/21459#issuecomment-401968866>, or 
mute
> the thread
> 
<https://github.com/notifications/unsubscribe-auth/AATvPGuKGHBfHvFrtH9xZ79XWTJbrGSXks5uCq7hgaJpZM4USoCT>
> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...

2018-07-02 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21686
  
Does this actually work in SQL? How does it work when we don't have a data 
type that's a schema?
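
For context, a sketch of the SQL-side usage being discussed; the function is intended to return the inferred schema rendered as a string, so no schema-valued data type is needed (the output shown is approximate):

```scala
spark.sql("""SELECT schema_of_json('{"a": 1, "b": [1.5, 2.5]}')""").show(false)
// prints something like: struct<a:bigint,b:array<double>>
```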


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21626: [SPARK-24642][SQL] New function infers schema for JSON c...

2018-06-28 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21626
  
It is on the public list: https://issues.apache.org/jira/browse/SPARK-24642


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21598: [SPARK-24605][SQL] size(null) returns null instea...

2018-06-26 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21598#discussion_r198364343
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1324,6 +1324,12 @@ object SQLConf {
   "Other column values can be ignored during parsing even if they are 
malformed.")
 .booleanConf
 .createWithDefault(true)
+
+  val LEGACY_SIZE_OF_NULL = buildConf("spark.sql.legacy.sizeOfNull")
+.doc("If it is set to true, size of null returns -1. This behavior was 
inherited from Hive. " +
+  "The size function returns null for null input if the flag is 
disabled.")
--- End diff --

perhaps you should say this will be updated to false in spark 3.0?
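
For reference, a sketch of the two behaviors the flag toggles, assuming it stays a runtime conf as in the diff above:

```scala
// Legacy behavior (flag = true): size(null) returns -1, as inherited from Hive.
spark.conf.set("spark.sql.legacy.sizeOfNull", true)
spark.sql("SELECT size(cast(null AS array<int>))").show()   // -1

// With the flag disabled: size(null) returns null.
spark.conf.set("spark.sql.legacy.sizeOfNull", false)
spark.sql("SELECT size(cast(null AS array<int>))").show()   // null
```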



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21482
  
OK, I double-checked. I don't think we should be adding this functionality, 
since different databases implement it differently, and it is somewhat 
difficult to create Infinity in Spark SQL given we return null or NaN.

On top of that, we already support equality for infinity, e.g.
```
spark.range(1).select(
  org.apache.spark.sql.functions.lit(java.lang.Double.POSITIVE_INFINITY) ===
    org.apache.spark.sql.functions.lit(java.lang.Double.POSITIVE_INFINITY)).show()
```

The above shows true.


If you start adding inf, we'd soon need to add functions for negative 
infinity as well, and these functions provide very little value beyond what we 
already support using equality.
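
A sketch of what users can already write today with equality, covering both infinities; the helper name is illustrative, not an existing API:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.lit

// Equality against literal infinities; no dedicated isinf builtin required.
def isInf(c: Column): Column =
  c === lit(Double.PositiveInfinity) || c === lit(Double.NegativeInfinity)
```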



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21482
  
Hey I have an additional thought on this. Will leave it in the next ten 
mins.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21598
  
Here: https://en.wikipedia.org/wiki/Bug_compatibility


On Tue, Jun 26, 2018 at 9:28 AM Reynold Xin  wrote:

> It’s actually common software engineering practice to keep “buggy”
> semantics if a bug has been out there long enough and a lot of 
applications
> depend on the semantics.
>
> On Tue, Jun 26, 2018 at 9:18 AM Hyukjin Kwon 
> wrote:
>
>> I don't think it's too discretionary. We have a safeguard to control the
>> behaviour. Spark mentions it in the migration guide. In case of changing
>> public interface which breaks binary or source compatibility, there 
should
>> really be strong argument, sure. For clarification, I don't think such
>> change is made usually.
>>
>> In this case, it changes a behaviour even with a safeguard. Sounds pretty
>> minor. I wonder why this is suddenly poped up. As I said, if the 
standards
>> don't reflect the practice, the standards should be corrected or the
>> practice should be complied. Committer's judgement is needed time to 
time.
>> We need more committers for the more proper review iteration. Let's back 
it
>> roll.
>>
>> If you prefer more conservative distribution, it should be an option to
>> consider using a maintenance release.
>>
>> we may choose to retain that buggy behavior
>>
>> I strongly disagree. We should fix the buggy behavior. There's no point
>> of having upper versions.
>>
>> If you strongly doubt it, please open a discussion in the mailing list
>> and see if we get agreed at some point.
>>
>> —
>> You are receiving this because you were mentioned.
>> Reply to this email directly, view it on GitHub
>> <https://github.com/apache/spark/pull/21598#issuecomment-400372980>, or 
mute
>> the thread
>> 
<https://github.com/notifications/unsubscribe-auth/AATvPGegZOtghAWSAzJFk-PHGWTMthCMks5uAl7VgaJpZM4UvHQD>
>> .
>>
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21598
  
It’s actually common software engineering practice to keep “buggy”
semantics if a bug has been out there long enough and a lot of applications
depend on the semantics.

On Tue, Jun 26, 2018 at 9:18 AM Hyukjin Kwon 
wrote:

> I don't think it's too discretionary. We have a safeguard to control the
> behaviour. Spark mentions it in the migration guide. In case of changing
> public interface which breaks binary or source compatibility, there should
> really be strong argument, sure. For clarification, I don't think such
> change is made usually.
>
> In this case, it changes a behaviour even with a safeguard. Sounds pretty
> minor. I wonder why this is suddenly poped up. As I said, if the standards
> don't reflect the practice, the standards should be corrected or the
> practice should be complied. Committer's judgement is needed time to time.
> We need more committers for the more proper review iteration. Let's back 
it
> roll.
>
> If you prefer more conservative distribution, it should be an option to
> consider using a maintenance release.
>
> we may choose to retain that buggy behavior
>
> I strongly disagree. We should fix the buggy behavior. There's no point of
> having upper versions.
>
> If you strongly doubt it, please open a discussion in the mailing list and
> see if we get agreed at some point.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/21598#issuecomment-400372980>, or 
mute
> the thread
> 
<https://github.com/notifications/unsubscribe-auth/AATvPGegZOtghAWSAzJFk-PHGWTMthCMks5uAl7VgaJpZM4UvHQD>
> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21598
  
Do we have other "legacy" configs that we haven't released and can change 
to match this prefix? It's pretty nice to have a single prefix for stuff like 
this.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21598
  
This is not a "bug" and there is no "right" behavior in APIs. It's been 
defined as -1 since the very beginning (when was it added?), so we can't just 
change the default value in a feature release.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21544: add one supported type missing from the javadoc

2018-06-15 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21544
  
Thanks. Merging in master.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: add one supported type missing from the javadoc

2018-06-15 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master e4fee395e -> c7c0b086a


add one supported type missing from the javadoc

## What changes were proposed in this pull request?

The supported java.math.BigInteger type is not mentioned in the javadoc of 
Encoders.bean()

## How was this patch tested?

only Javadoc fix

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

Author: James Yu 

Closes #21544 from yuj/master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c7c0b086
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c7c0b086
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c7c0b086

Branch: refs/heads/master
Commit: c7c0b086a0b18424725433ade840d5121ac2b86e
Parents: e4fee39
Author: James Yu 
Authored: Fri Jun 15 21:04:04 2018 -0700
Committer: Reynold Xin 
Committed: Fri Jun 15 21:04:04 2018 -0700

--
 sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c7c0b086/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
index 0b95a88..b47ec0b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala
@@ -132,7 +132,7 @@ object Encoders {
*  - primitive types: boolean, int, double, etc.
*  - boxed types: Boolean, Integer, Double, etc.
*  - String
-   *  - java.math.BigDecimal
+   *  - java.math.BigDecimal, java.math.BigInteger
*  - time related: java.sql.Date, java.sql.Timestamp
*  - collection types: only array and java.util.List currently, map support 
is in progress
*  - nested java bean.
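
A minimal sketch of a bean with a java.math.BigInteger field, illustrating the type the javadoc now mentions; the class and field names are illustrative, not part of the commit:

```scala
import java.math.BigInteger
import scala.beans.BeanProperty
import org.apache.spark.sql.Encoders

class Account {                        // Encoders.bean needs a public no-arg constructor
  @BeanProperty var id: BigInteger = _
  @BeanProperty var name: String = _
}

val accountEncoder = Encoders.bean(classOf[Account])
```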


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-06-15 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21568
  
I'm confused by the description. What does this PR actually do?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21574: [SPARK-24478][SQL][followup] Move projection and filter ...

2018-06-15 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21574
  
Does this move actually make sense? It'd destroy stats estimation for 
partition pruning.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-08 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21502
  
How does this solve the problem you described? If the container is gone, 
the process is gone and users can't destroy things anymore.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2018-06-08 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/19498
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21482
  
Thanks, Henry. In general I'm not a huge fan of adding something because 
hypothetically somebody might want it. Also if you want this to be compatible 
with Impala, wouldn't you want to name this the same way as Impala?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-06 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21482
  
@henryr 1.0/0.0 also returns null in Spark SQL ...

```
scala> sql("select cast(1.0 as double)/cast(0 as double)").show()
+-----------------------------------------+
|(CAST(1.0 AS DOUBLE) / CAST(0 AS DOUBLE))|
+-----------------------------------------+
|                                     null|
+-----------------------------------------+
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    1   2   3   4   5   6   7   8   9   10   >