Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21758#discussion_r205249547
--- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
---
@@ -27,7 +27,8 @@ import org.apache.spark.{Partition, TaskContext}
private
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21758#discussion_r205249449
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21758#discussion_r205249225
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21758#discussion_r205249297
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21875
Can you add JDBC to the title?
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21866#discussion_r204961291
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -56,7 +56,7 @@ private[avro] class AvroFileFormat extends
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21867#discussion_r204959300
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -731,7 +731,14 @@ private[spark] class BlockManager
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21822#discussion_r204957474
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
---
@@ -751,7 +751,8 @@ object TypeCoercion
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21822
I changed the way we do the checks in tests to use a thread local rather
than checking the stack trace, so they should run faster now. I also added test
cases for the various new methods. Also moved the
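A minimal sketch of the thread-local pattern described above (the object and method names are illustrative, not Spark's actual internals):
```scala
// Hypothetical sketch: a thread-local flag set by the test harness
// replaces inspecting the stack trace to detect test-only code paths.
object TestMode {
  private val flag = new ThreadLocal[java.lang.Boolean] {
    override def initialValue(): java.lang.Boolean = java.lang.Boolean.FALSE
  }

  // Cheap O(1) check, versus scanning Thread.currentThread.getStackTrace.
  def isTest: Boolean = flag.get()

  // The test harness wraps each test body with this.
  def withTestMode[T](body: => T): T = {
    flag.set(java.lang.Boolean.TRUE)
    try body finally flag.set(java.lang.Boolean.FALSE)
  }
}
```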
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21822#discussion_r204955869
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -787,6 +782,7 @@ class Analyzer(
right
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21845
If that's the only one I think that PR itself needs to be fixed
(significantly increases test runtime), and I wouldn't increase the time
here.
On Mon, Jul 23, 2018 a
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21822
Yea the extra check in test cases might've contributed to the longer test
time. Let me think about how to reduce it.
On Mon, Jul 23, 2018 at 11:28 PM Hyukjin Kwon
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21845
Are more pull requests failing due to time out right now?
On Mon, Jul 23, 2018 at 6:30 PM Hyukjin Kwon
wrote:
> @rxin <https://github.com/rxin>, btw you want me close th
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21758#discussion_r204504127
--- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala ---
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21826
No we can't because you can still use string concat in filters, e.g.
colA || colB == "ab"
Wh
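A hedged illustration of the point above (table and column names are made up): in Spark SQL, `||` is string concatenation, so it legitimately appears inside filter predicates and cannot be reinterpreted.
```scala
// In spark-shell, `spark` is the predefined SparkSession.
// The || below is string concat, not logical OR: the filter matches
// rows where colA and colB concatenate to "ab".
spark.sql("SELECT * FROM t WHERE colA || colB = 'ab'")
```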
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21845
This helps, but it is not sustainable to keep increasing the threshold.
What we need to do is to look at test time distribution and figure out what
test suites are unnecessarily long and actually cut
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21822
Jenkins, retest this please.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21822
retest this please
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21802
Do we really need full codegen for all of these collection functions? They
seem pretty slow and specialization with full codegen won't help perf that much
(and might even hurt by blowing up the
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21826
cc @gatorsmile @cloud-fan @HyukjinKwon is this a good thing to do?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21826
Jenkins, test this please.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21822
retest this please
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21822#discussion_r204163484
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
---
@@ -33,6 +49,116 @@ abstract class LogicalPlan
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21822#discussion_r204163424
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
---
@@ -23,8 +23,24 @@ import
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21822#discussion_r204163328
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -2390,16 +2375,21 @@ class Analyzer(
* scoping
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21822#discussion_r204160853
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21822#discussion_r204160150
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -787,6 +782,7 @@ class Analyzer(
right
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21803
Should we do schema.toDDL, or StructType.toDDL(schema)?
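The two call shapes being compared, sketched on a toy schema; the instance-method form is what Spark ultimately shipped, and the static-helper form is shown only for contrast (it does not exist):
```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(StructField("name", StringType)))

val ddl = schema.toDDL                  // instance-method shape
// val ddl2 = StructType.toDDL(schema)  // static-helper shape (hypothetical)
```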
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21822#discussion_r203918981
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18784
Let's remove it in 3.0 then. We can do it after 2.4 release.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21742#discussion_r203496489
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21766
Why did you need this change? Given it's very difficult to revert the
change (or introduce a proper numeric type if ever needed in the future), I
would not merge this pull request unless ther
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21568
To me it is actually confusing to have the decimal one in there at all, by
defining a list of queries that are reused for different functional
testing. It is very easy to just ignore the subtle
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21568
What are the use cases other than decimal? I am not sure if we need to
build a lot of infrastructure just for one or two use cases
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21568
If they produce different results why do you need any infrastructure for
them? They are just part of the normal test flow.
If they produce the same result, and you don't want to defin
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21568
Can you just define a config matrix in the beginning of the file, and each
file is run with the config matrix?
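A sketch of the suggested shape, assuming a hypothetical `runQueryFile` helper and using a decimal-related config key purely as an example:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()

// Declare the config matrix once at the top of the suite; every query
// file then runs under each combination instead of encoding configs
// into file names.
val configMatrix: Seq[(String, String)] = Seq(
  "spark.sql.decimalOperations.allowPrecisionLoss" -> "true",
  "spark.sql.decimalOperations.allowPrecisionLoss" -> "false"
)

for ((key, value) <- configMatrix) {
  val previous = spark.conf.getOption(key)
  spark.conf.set(key, value)
  try {
    // runQueryFile(testFile)  // hypothetical: replay one .sql test file
  } finally {
    previous match {
      case Some(v) => spark.conf.set(key, v)  // restore the old value
      case None    => spark.conf.unset(key)
    }
  }
}
```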
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21568
I think it's super confusing to have the config names encoded in file
names. Makes the names super long and difficult to read, and also hard to
verify what was set, and difficult to get mul
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21705#discussion_r199940775
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
---
@@ -66,6 +66,12 @@ object StaticSQLConf {
.checkValue
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21686
Thanks. Awesome. This matches what I had in mind then.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21459
SGTM.
On Mon, Jul 2, 2018 at 4:38 PM DB Tsai wrote:
> There are three approvals from the committers, and the changes are pretty
> trivial to revert if we see any perfo
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21686
Does this actually work in SQL? How does it work when we don't have a data
type that's a schema?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21626
It is on the public list: https://issues.apache.org/jira/browse/SPARK-24642
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21598#discussion_r198364343
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1324,6 +1324,12 @@ object SQLConf {
"Other column v
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21482
OK I double checked. I don't think we should be adding this functionality,
since different databases implemented it differently, and it is somewhat
difficult to create Infinity in Spark SQL giv
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21482
Hey I have an additional thought on this. Will leave it in the next ten
mins.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21598
Here: https://en.wikipedia.org/wiki/Bug_compatibility
On Tue, Jun 26, 2018 at 9:28 AM Reynold Xin wrote:
> It's actually common software engineering practice to keep "bug
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21598
It's actually common software engineering practice to keep "buggy"
semantics if a bug has been out there long enough and a lot of applications
depend on the semantics.
On Tue, Jun
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21598
Do we have other "legacy" configs that we haven't released and can change
to match this prefix? It's pretty nice to have a single prefix
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21598
This is not a "bug" and there is no "right" behavior in APIs. It's been
defined as -1 since the very beginning (when was it added?), so we can't just
change the defa
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21544
Thanks. Merging in master.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21568
I'm confused by the description. What does this PR actually do?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21574
Does this move actually make sense? It'd destroy stats estimation for
partition pruning.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21502
How does this solve the problem you described? If the container is gone,
the process is gone and users can't destroy things an
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19498
LGTM
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21482
Thanks, Henry. In general I'm not a huge fan of adding something because
hypothetically somebody might want it. Also if you want this to be compatible
with Impala, wouldn't you want to nam
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21482
@henryr 1.0/0.0 also returns null in Spark SQL ...
```
scala> sql("select cast(1.0 as double)/cast(0 as double)").show()
+-+
```
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21482
How is this done in other databases? I don't think we want to invent new
ways on these basic primitives.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r193230476
--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
"initcap",
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21448
I'd only move abs and nothing else.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21459
What's driving this (is it java 9)? I'm in general scared by core library
updates like this. Maybe Spark 3.0 is a good time (and we should just do it
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21453
Jenkins, add to whitelist.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21453
Jenkins, test this please.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21416
LGTM (I didn't look that carefully though)
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21416#discussion_r191306678
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21416#discussion_r191306654
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
If we can fix it without breaking existing behavior that would be awesome.
On Fri, May 25, 2018 at 9:59 AM Bryan Cutler
wrote:
> I've been thinking about this and came to
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
On the config part, I haven't looked at the code but can't we just
reorder
the columns on the JVM side? Why do we need to reorder them on the Python
side?
On Fri, May 25, 2018 at
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
I agree it should have started as experimental. It is pretty weird to mark
something experimental after the fact, though.
On Fri, May 25, 2018 at 12:23 AM Hyukjin Kwon
wrote
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
Why is it difficult?
On Fri, May 25, 2018 at 12:03 AM Hyukjin Kwon
wrote:
> but as I said it's difficult to have a configuration there. Shall we just
> target 3.0.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r190803873
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
name | Bob
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r190803855
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
name | Bob
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r190803772
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also
available, and may be useful
from JVM to
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r190803641
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also
available, and may be useful
from JVM to
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
If this has been released you can't just change it like this; it will break
users' programs immediately. At the very least introduce a flag so it can be
set by the user to avoid breaking
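A minimal sketch of the escape hatch being asked for, from the user-facing side (the config key and both code paths are hypothetical):
```scala
// In spark-shell, `spark` is the predefined SparkSession.
// A boolean conf gates the behavior change: the default is the new
// behavior, but released programs can opt back into the old one.
val legacy = spark.conf.get("spark.some.legacy.behavior.enabled", "false").toBoolean
if (legacy) {
  // pre-change code path, kept so existing programs don't break
} else {
  // new code path
}
```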
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21242
Thanks Ryan. I'm not a fan of just exposing internal classes like this. The
APIs haven't really been designed or audited for the purpose of external
consumption. If we want to expose the int
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r189669772
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also
available, and may be useful
from JVM to
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21370
Can we also do something a bit more generic that works for non-Jupyter
notebooks as well? For example, in IPython or just plain Python REPL
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21329
Why are we cleaning up stuff like this?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21192
My point is that I don't consider a sequence of chars an array to begin
with. It is not natural to me.
I'd want an array if it is a different set of
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21192
Eh, I actually think keeping them separated makes it much simpler to look at, compared
with an array. Why complicate the API and require users to understand how to
specify an array (in all languages)?
One
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21318
It's still going to fail because I haven't updated it yet. Will do tomorrow.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21316#discussion_r188104204
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
*/
@Experimental
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21318
Hm the failure doesn't look like it's caused by this PR. Do you guys know
what's going on?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21318
cc @gatorsmile @HyukjinKwon
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/21318
[minor] Update docs for functions.scala to make it clear not all the
built-in functions are defined there
The title summarizes the change.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21316#discussion_r187838099
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
*/
@Experimental
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21309
Better compile time error. Plus a lot of people are already using these.
On Fri, May 11, 2018 at 7:35 PM Hyukjin Kwon
wrote:
> Yup, then why not just deprecate other functions
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21309
Adding it to sql would allow it to be available everywhere (through expr)
right?
On Fri, May 11, 2018 at 7:30 PM Hyukjin Kwon
wrote:
> Thing is, I am a bit confused when
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21309
Btw it's always been the case that the less commonly used functions are
not
part of this file. There is just a lot of overhead to maintaining all of
them.
I'm not even sure if the
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21054
There is not a single function that can't be called by expr. It mainly
adds
some type safety.
On Fri, May 11, 2018 at 7:18 PM Hyukjin Kwon
wrote:
> *@HyukjinKwon* commen
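Illustrating the claim above: any function in the SQL registry is reachable through `expr`, even with no typed wrapper in functions.scala. `percentile` is used here because it has no Scala wrapper; the data is made up.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

val df = Seq((1.0, 2.0), (2.0, 4.0)).toDF("x", "y")

// No percentile method exists in functions.scala, yet the SQL
// registry version is fully usable from the DataFrame API.
df.select(expr("percentile(y, 0.5)")).show()
```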
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21309
cc @gatorsmile @mgaido91
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/21309
[SPARK-23907] Removes regr_* functions in functions.scala
## What changes were proposed in this pull request?
This patch removes the various regr_* functions in functions.scala. They
are so
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21054#discussion_r187751801
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -775,6 +775,178 @@ object functions {
*/
def var_pop(columnName
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21121
@lokm01 wouldn't @ueshin's suggestion on adding a second parameter to
transform work for you? You can just do something similar to `transform(x,
(entry, index) -> struct(entry, inde
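A hedged sketch of the suggested workaround (`x` is an illustrative array column): the two-argument lambda form of `transform` exposes the element index, which can be packed into a struct alongside the element.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

val df = Seq(Seq("a", "b", "c")).toDF("x")

// Each array element is paired with its position via the (entry, index)
// lambda, mirroring the snippet quoted above.
df.selectExpr("transform(x, (entry, index) -> struct(entry, index)) AS withIndex")
  .show(truncate = false)
```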
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21187#discussion_r185084802
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/PivotSuite.scala ---
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21169#discussion_r184596334
--- Diff: docs/sql-programming-guide.md ---
@@ -1805,12 +1805,13 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20560
Just saw this - this seems like a somewhat awkward way to do it by just
matching on filter / project. Is the main thing lacking a way to do back
propagation for properties? (We can only do forward
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21071
@devaraj-kavali can you close this PR first?
Looks like there isn't any reason to really use htrace anymore ...
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19222
@kiszk do you have more data now?
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19222
OK thanks please do that. Does TPC-DS even trigger 2 call sites? E.g.
ByteArrayMemoryBlock and OnHeapMemoryBlock. Even there it might introduce a
conditional branch after JIT that could lead to perf