Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21482
How is this done in other databases? I don't think we want to invent new
ways on these basic primitives.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21482#discussion_r193230476
--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
"initcap",
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21448
I'd only move abs and nothing else.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21459
What's driving this (is it java 9)? I'm in general scared by core library
updates like this. Maybe Spark 3.0 is a good time (and we should just do it
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21453
Jenkins, add to whitelist.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21453
Jenkins, test this please.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21416
LGTM (I didn't look that carefully though)
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21416#discussion_r191306678
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21416#discussion_r191306654
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest with
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
If we can fix it without breaking existing behavior that would be awesome.
On Fri, May 25, 2018 at 9:59 AM Bryan Cutler
wrote:
> I've been thinking about this and came to
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
On the config part, I haven't looked at the code but can't we just reorder
the columns on the JVM side? Why do we need to reorder them on the Python
side?
On Fri, May 25, 2018 at
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
I agree it should have started experimental. It is pretty weird to mark
something experimental after the fact, though.
On Fri, May 25, 2018 at 12:23 AM Hyukjin Kwon
wrote
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
Why is it difficult?
On Fri, May 25, 2018 at 12:03 AM Hyukjin Kwon
wrote:
> but as I said it's difficult to have a configuration there. Shall we just
> target 3.0.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r190803873
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
name | Bob
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r190803855
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
name | Bob
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r190803772
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also
available, and may be useful
from JVM to
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r190803641
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also
available, and may be useful
from JVM to
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
If this has been released you can't just change it like this; it will break
users' programs immediately. At the very least introduce a flag so it can be
set by the user to avoid breaking
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21242
Thanks Ryan. I'm not a fan of just exposing internal classes like this. The
APIs haven't really been designed or audited for the purpose of external
consumption. If we want to expose the int
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r189669772
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also
available, and may be useful
from JVM to
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21370
Can we also do something a bit more generic that works for non-Jupyter
notebooks as well? For example, in IPython or just plain Python REPL
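The generic mechanism being alluded to is Python's rich-repr protocol: Jupyter looks for a `_repr_html_` method and prefers it, while IPython and the plain REPL fall back to `__repr__`. A minimal sketch of how one object can serve both (the `Table` class is a hypothetical illustration, not Spark's actual implementation):

```python
class Table:
    """Toy table that renders as text in a plain REPL and as HTML in Jupyter.

    Hypothetical example; Spark's real DataFrame rendering works differently.
    """

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        # Plain-text rendering used by the standard Python REPL and IPython.
        return "\n".join(" | ".join(str(c) for c in row) for row in self.rows)

    def _repr_html_(self):
        # Jupyter checks for this method and uses its HTML output when present.
        cells = "".join(
            "<tr>" + "".join(f"<td>{c}</td>" for c in row) + "</tr>"
            for row in self.rows
        )
        return f"<table>{cells}</table>"


t = Table([["name", "Bob"], ["age", 30]])
print(repr(t))            # text form for REPLs
print(t._repr_html_())    # HTML form for notebook front-ends
```

Because both renderings live on the same object, no front-end-specific hooks are needed: each environment picks the representation it understands.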
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21329
Why are we cleaning up stuff like this?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21192
My point is that I don't consider a sequence of chars an array to begin
with. It is not natural to me.
I'd want an array if it is a different set of
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21192
eh I actually think separated makes it much simpler to look at, compared
with an array. Why complicate the API and require users to understand how to
specify an array (in all languages)?
One
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21318
It's still going to fail because I haven't updated it yet. Will do tomorrow.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21316#discussion_r188104204
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
*/
@Experimental
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21318
Hm the failure doesn't look like it's caused by this PR. Do you guys know
what's going on?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21318
cc @gatorsmile @HyukjinKwon
---
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/21318
[minor] Update docs for functions.scala to make it clear not all the
built-in functions are defined there
The title summarizes the change.
You can merge this pull request into a Git repository by
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21316#discussion_r187838099
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1607,7 +1607,9 @@ class Dataset[T] private[sql](
*/
@Experimental
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21309
Better compile time error. Plus a lot of people are already using these.
On Fri, May 11, 2018 at 7:35 PM Hyukjin Kwon
wrote:
> Yup, then why not just deprecate other functions
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21309
Adding it to sql would allow it to be available everywhere (through expr)
right?
On Fri, May 11, 2018 at 7:30 PM Hyukjin Kwon
wrote:
> Thing is, I am a bit confused when
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21309
Btw it's always been the case that the less commonly used functions are not
part of this file. There is just a lot of overhead to maintaining all of
them.
I'm not even sure if the
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21054
There is not a single function that can't be called by expr. It mainly adds
some type safety.
On Fri, May 11, 2018 at 7:18 PM Hyukjin Kwon
wrote:
> *@HyukjinKwon* commen
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21309
cc @gatorsmile @mgaido91
---
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/21309
[SPARK-23907] Removes regr_* functions in functions.scala
## What changes were proposed in this pull request?
This patch removes the various regr_* functions in functions.scala. They
are so
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21054#discussion_r187751801
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -775,6 +775,178 @@ object functions {
*/
def var_pop(columnName
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21121
@lokm01 wouldn't @ueshin's suggestion on adding a second parameter to
transform work for you? You can just do something similar to `transform(x,
(entry, index) -> struct(entry, inde
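For readers outside the thread: the suggestion refers to the two-argument form of the SQL higher-order function `transform`, whose lambda receives each array element together with its index. A rough Python analog of those semantics (the `transform` helper below is a toy stand-in, not Spark's API):

```python
def transform(xs, f):
    """Toy analog of SQL's higher-order transform(array, (elem, idx) -> expr).

    Applies f(element, index) to each element, like the two-argument lambda
    form discussed in the thread. Hypothetical helper, not Spark code.
    """
    return [f(entry, index) for index, entry in enumerate(xs)]


# Pair each element with its position, similar in spirit to
# transform(x, (entry, index) -> struct(entry, index)) in Spark SQL.
paired = transform(["a", "b", "c"], lambda entry, index: (entry, index))
print(paired)  # [('a', 0), ('b', 1), ('c', 2)]
```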
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21187#discussion_r185084802
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/PivotSuite.scala ---
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21169#discussion_r184596334
--- Diff: docs/sql-programming-guide.md ---
@@ -1805,12 +1805,13 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20560
Just saw this - this seems like a somewhat awkward way to do it by just
matching on filter / project. Is the main thing lacking a way to do back
propagation for properties? (We can only do forward
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21071
@devaraj-kavali can you close this PR first?
Looks like there isn't any reason to really use htrace anymore ...
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19222
@kiszk do you have more data now?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19222
OK thanks please do that. Does TPC-DS even trigger 2 call sites? E.g.
ByteArrayMemoryBlock and OnHeapMemoryBlock. Even there it might introduce a
conditional branch after JIT that could lead to perf
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19222
Sorry this thread is too long for me to follow. I might be bringing up a
point that has been brought up before.
@kiszk did your perf tests take into account megamorphic callsites? It
seems to
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19881
Thanks @jcuquemelle
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21071
This probably deserves its own SPIP. Also unclear whether we should just
support htrace, or have an extension api so users can plug in whatever they
want
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21060
It looks to me like this is a bug fix that can merit backporting, as
QueryExecutionListener is also marked as experimental.
In this case, I think @gatorsmile is worried one might have written a
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20992
What are the performance improvements? Without additional data this seems
like just an invasive change without any real benefits
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21031
If there is already size, why do we need to create a new implementation?
Why can't we just rewrite cardinality to size?
Also I wouldn't add any programming API for this, sinc
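One way to read "rewrite cardinality to size" is to register the new name as an alias of the existing implementation rather than maintaining a second copy. A toy sketch of that design choice (hypothetical registry, not Spark's actual function registry):

```python
def size(xs):
    """Return the number of elements in a collection."""
    return len(xs)


# Register "cardinality" as an alias that reuses the same implementation,
# instead of creating a new one. (Toy registry, not Spark code.)
FUNCTIONS = {
    "size": size,
    "cardinality": size,
}

print(FUNCTIONS["cardinality"]([1, 2, 3]))  # same result as size
```

The alias keeps one code path to test and optimize while still accepting both names.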
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21056#discussion_r181530121
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2128,38 +2128,60 @@ class JsonSuite extends
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21053#discussion_r181529978
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21053#discussion_r181529901
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20933#discussion_r181529318
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcDataSourceV2.scala
---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to
Repository: spark-website
Updated Branches:
refs/heads/asf-site 91b561749 -> 658467248
http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/strata-exercises-now-available-online.html
--
diff --git a/site
Update text/wording to be more consistent and to reflect more "modern" Spark.
1. Use DataFrame examples.
2. Reduce explicit comparison with MapReduce, since the topic does not really
come up.
3. More focus on analytics rather than "cluster compute".
4. Update committer affiliation.
5. Make it more clear Sp
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19881
I thought about this more, and I actually think something like this makes
more sense: `executorAllocationRatio`. Basically it is just a ratio that
determines how aggressive we want Spark to request
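The ratio idea can be sketched as a multiplier on the fully-parallel executor count. The helper below is hypothetical (names and formula are illustrative only, not Spark's actual dynamic-allocation code):

```python
import math


def target_executors(pending_tasks, tasks_per_executor, allocation_ratio=1.0):
    """Scale the fully-parallel executor count by a configurable ratio.

    allocation_ratio=1.0 requests enough executors to run every pending task
    at once; 0.5 requests half as many, trading latency for cost.
    (Hypothetical helper, not Spark's implementation.)
    """
    full_parallelism = math.ceil(pending_tasks / tasks_per_executor)
    return max(1, math.ceil(full_parallelism * allocation_ratio))


print(target_executors(100, 4))        # ratio 1.0 -> 25 executors
print(target_executors(100, 4, 0.5))   # half as aggressive -> 13 executors
```

A multiplicative ratio also admits values above 1.0 (over-provisioning), which a divisor-style config cannot express as naturally.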
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19881
SGTM on divisor.
Do we need "full" there in the config?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20045
Can we add them to the file based test suites instead?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19881
Maybe instead of "divisor", we just have a "rate" or "factor" that can be a
floating point value, and use multiplication rather than division? This way
people can als
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20937
Seems fine to me ...
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20959
I'm good with having this option given the data @MaxGekk posted. (I haven't
reviewed the code - somebody else should do that before merging).
`val sampledSchema = spark.r
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19881
Can you wait another day? I just find the name pretty weird. Do we have
other configs that use the "divisor" suffix?
On Wed, Mar 28, 2018 at 7:23 AM Tom Graves wrote:
> I
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20877
We can also change both if they haven't been released yet.
On Sun, Mar 25, 2018 at 10:37 AM Maxim Gekk
wrote:
> @gatorsmile <https://github.com/gatorsmile> The PR
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20731
Yea we gotta be careful with adding commercial vendor logos here. It's part
of the complexity we need to navigate being hosted at the Apache Software
Foundation. The project needs to be very v
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20774#discussion_r175335072
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -479,6 +479,15 @@ object SQLConf {
.checkValues
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20774#discussion_r175334948
--- Diff:
sql/core/src/test/resources/sql-tests/inputs/predicate-functions.sql ---
@@ -39,3 +43,4 @@ select 2.0 <= '2.2';
selec
Squashed commit of the following:
commit 8e2dd71cf5613be6f019bb76b46226771422a40e
Merge: 8bd24fb6d 01f0b4e0c
Author: Reynold Xin
Date: Fri Mar 16 10:24:54 2018 -0700
Merge pull request #104 from mateiz/history
Add a project history page
commit 01f0b4e0c1fe77781850cf994058980664201bce
Repository: spark-website
Updated Branches:
refs/heads/asf-site 8bd24fb6d -> a1d84bcbf
http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2016-agenda-posted.html
--
diff --git a/site/
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20800
So the API looks useful, but I don't know if this is the right
implementation. How important is it to add this? It seems like the value is not
super high e
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20800#discussion_r174016939
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -511,6 +511,14 @@ class Dataset[T] private[sql](
*/
def isLocal
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20674
I personally wouldn't include this since it's a simple function users can
write ...
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20706#discussion_r171666996
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -267,44 +264,20 @@ private[spark] object Utils extends Logging
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20567
A quick bit: fallback is a single word.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20490#discussion_r167137165
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java
---
@@ -62,6 +62,16 @@
*/
DataWriterFactory
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20499
I'd fix this in 2.3, and 2.2.1 as well.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20535#discussion_r166701501
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceOptions.java
---
@@ -27,6 +27,39 @@
/**
* An immutable string-to-string
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20491
This should also go into branch-2.3.
---
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/20491
[SQL] Minor doc update: Add an example in DataFrameReader.schema
## What changes were proposed in this pull request?
This patch adds a small example to the schema string definition of schema
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16793
Also the implementation doesn't match what was proposed in
https://issues.apache.org/jira/browse/SPARK-19454
Having null value as the default in a function called replace is too risky
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16793
Sorry, I object to this change. Why would we put null as the default replace
value, in a function called replace? That seems very counterintuitive and
error prone
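The risk being described: with null as the default, a caller who forgets the replacement argument silently nulls out matching values instead of failing fast. A toy Python illustration (the `replace` function here is hypothetical, not the actual DataFrame API):

```python
def replace(values, to_replace, value=None):
    """Hypothetical replace() whose replacement value defaults to None (null).

    This mirrors the objected-to design: omitting the third argument does not
    raise an error, it quietly substitutes None.
    """
    return [value if v == to_replace else v for v in values]


# A caller who forgets the replacement value silently destroys data:
print(replace(["a", "b", "a"], "a"))        # [None, 'b', None]
# The intended call has to pass the value explicitly:
print(replace(["a", "b", "a"], "a", "z"))   # ['z', 'b', 'z']
```

Making the replacement value a required parameter would turn the silent data loss into an immediate error at the call site.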
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20219
But it is possible to generate NullType data right?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20152
cc @gatorsmile @cloud-fan
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20072#discussion_r159573530
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -261,6 +261,17 @@ object SQLConf {
.booleanConf
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20076
Thanks for the PR. Why are we complicating the PR by doing the rename? Does
this actually gain anything other than minor cosmetic changes? It makes the
simple PR pretty long
our fork. Rest is documentation.
cc rxin mateiz (shepherd)
k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926
erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko
reviewers: vanzin felixcheung jiangxb1987 mridulm
TODO:
- [x] Add dockerfiles directory t
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19946
Merging in master.
---
Github user rxin closed the pull request at:
https://github.com/apache/spark/pull/19973
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19946#discussion_r158205893
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by
the
to be runnable, use `./dev/make
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/20014
Overall change lgtm.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20014#discussion_r157673852
--- Diff:
core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19973
@vanzin you got a min to submit a patch?
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19946#discussion_r156821519
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by
the
to be runnable, use `./dev/make
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19973
That's what the "default" is, isn't it?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19973
The issue is in
```
/**
* Return the `string` value of Spark SQL configuration property for the
given key. If the key is
* not set yet, return `defaultValue
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/19973
[SPARK-22779] ConfigEntry's default value should actually be a value
## What changes were proposed in this pull request?
ConfigEntry's config value right now shows a human readable m
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19973
cc @vanzin @gatorsmile
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19861#discussion_r155693977
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ConfigSupport.scala
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19861#discussion_r155693966
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ConfigSupport.scala
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/19905
cc @vanzin
---