GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/7104
[SPARK-8718] [GRAPHX] Improve EdgePartition2D for non-perfect-square number
of partitions
See https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb
You can merge this pull request into a Git
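The 2D edge-partitioning idea under discussion can be sketched in plain Python. This is a simplified model only (the mixing prime and the final modulo fold are illustrative assumptions, not the Spark source): hash the source vertex to a grid column and the destination to a grid row, then fold the grid cell into the partition range.

```python
import math

def edge_partition_2d(src: int, dst: int, num_parts: int) -> int:
    """Assign an edge (src, dst) to a partition via a 2D grid.

    Sketch: lay partitions out as a ceil(sqrt(n)) x ceil(sqrt(n)) grid,
    hash src to a column and dst to a row, fold back into [0, num_parts).
    The trailing modulo is a naive way to handle non-perfect-square
    partition counts; the real implementation is more careful about balance.
    """
    mixing_prime = 1125899906842597  # large prime to spread vertex ids
    ceil_sqrt = math.ceil(math.sqrt(num_parts))
    col = (src * mixing_prime) % ceil_sqrt
    row = (dst * mixing_prime) % ceil_sqrt
    return (col * ceil_sqrt + row) % num_parts
```

A useful property of this layout is that all edges sharing a source land in at most one grid column, bounding vertex replication.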
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/7841#issuecomment-127265753
@rxin it looks like Jenkins forgot about building this. Can you help
trigger the build again?
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/7104#discussion_r33529689
--- Diff:
graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala ---
@@ -32,7 +32,7 @@ trait PartitionStrategy extends Serializable {
object
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/7841
[SPARK-8992] [SQL] Add pivot to dataframe api
This adds a pivot method to the dataframe api.
Following the lead of cube and rollup this adds a Pivot operator that is
translated
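The semantics of the pivot operator being added can be sketched in plain Python (illustrative only, with hypothetical row data; this is not Spark code): distinct values of the pivot column become output columns, one row per group.

```python
from collections import defaultdict

def pivot_sum(rows, group_key, pivot_key, value_key, pivot_values):
    """Sketch of what groupBy(group_key).pivot(pivot_key).sum(value_key)
    computes: one output row per group, one column per pivot value."""
    wanted = set(pivot_values)
    out = defaultdict(lambda: {v: 0 for v in pivot_values})
    for row in rows:
        pv = row[pivot_key]
        if pv in wanted:
            out[row[group_key]][pv] += row[value_key]
    return dict(out)

rows = [
    {"year": 2015, "course": "dotNET", "earnings": 10000},
    {"year": 2015, "course": "Java",   "earnings": 20000},
    {"year": 2016, "course": "dotNET", "earnings": 5000},
]
result = pivot_sum(rows, "year", "course", "earnings", ["dotNET", "Java"])
```

Here `result` maps each year to a row of per-course sums, with 0 filling combinations that have no input rows.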
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/7841#issuecomment-150620321
@rxin here is my summary of other frameworks' APIs
I'm going to use an example dataset from the pandas docs for all the
examples (as df)
|A|B|C|D
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/7841#issuecomment-150745807
@rxin, Not requiring the values would necessitate doing a separate query
for the distinct values of the column before the pivot query. It looks like at
least some DF
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/7841#issuecomment-150464038
@rxin and @JoshRosen, this is ready for review now.
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/7841#discussion_r44545811
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
---
@@ -385,6 +385,20 @@ case class Rollup
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/7841#issuecomment-155985411
@rxin sure I'll put together a PR for the python API tonight
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/9653
[SPARK-11690][PYSPARK] Add pivot to python api
This PR adds pivot to the python api.
@rxin can you take a look?
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/9653#issuecomment-156481590
@rxin or @yhuai since you helped with the original pr
https://github.com/apache/spark/pull/7841 can you take a look?
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/7841#discussion_r44566886
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala ---
@@ -273,6 +280,60 @@ class GroupedData protected[sql](
def sum(colNames
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/7841#discussion_r44572982
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -248,6 +253,43 @@ class Analyzer
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/7841#issuecomment-155871674
@yhuai RE your questions (3 was already addressed above):
>1. Should we always ask users to provide pivot values?
The argument for not requiring values I th
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/7841#issuecomment-155916926
@yhuai I think this addresses everything we discussed; let me know if I
missed anything or if there is anything else I can do. Again, thanks for the
code review
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/9429
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/9429#issuecomment-157453036
I'm going to close this PR in favor of just fixing the current
implementation for now since it has recently become more optimized with support
for unsafe rows. Thanks
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/7841#issuecomment-155223109
@rxin Updated, the values are now optional.
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/7841#discussion_r44352381
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -989,6 +989,41 @@ class DataFrame private[sql
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/9429
[SPARK-11275][SQL] Reimplement Expand as a Generator and fix existing
implementation bugs
This is an alternative to https://github.com/apache/spark/pull/9419
I got tired of fighting/fixing
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/9429#discussion_r43810369
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -205,45 +205,30 @@ class Analyzer(
GroupingSets
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/9429#discussion_r43811041
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -205,45 +205,30 @@ class Analyzer(
GroupingSets
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/9815
[SPARK-11275] [SQL] Incorrect results when using rollup/cube
Fixes bug with grouping sets (including cube/rollup) where aggregates that
included grouping expressions would return the wrong (null
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/9815#issuecomment-157862146
retest this please
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/10202
[SPARK-12205][SQL] Pivot fails Analysis when aggregate is UnresolvedFunction
Delays application of ResolvePivot until all aggregates are resolved to
prevent problems with UnresolvedFunction and adds
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/10202#issuecomment-162975930
@yhuai can you take a look at this small patch to pivot?
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/10206
[SPARK-12211][DOC][GRAPHX] Fix version number in graphx doc for migration
from 1.1
Migration from 1.1 section added to the GraphX doc in 1.2.0 (see
https://spark.apache.org/docs/1.2.0/graphx
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/10176
[SPARK-12184][Python] Make python api doc for pivot consistent with scala
doc
In SPARK-11946 the API for pivot was changed a bit and got updated doc, the
doc changes were not made for the python api
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/10176#issuecomment-162662044
@rxin or @yhuai can we get this doc change merged for 1.6?
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/10218#discussion_r47166749
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -1271,10 +1271,11 @@ class DataFrame private[sql](
* @since 1.6.0
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/10218#discussion_r47232368
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -1271,10 +1271,11 @@ class DataFrame private[sql](
* @since 1.6.0
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/9929#discussion_r45806267
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala ---
@@ -282,74 +282,96 @@ class GroupedData protected[sql
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/9815#issuecomment-157922956
@yhuai can you take a look at this pr?
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/9815#issuecomment-157876634
retest this please
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/9815#issuecomment-157950224
@yhuai I do think this is the minimal fix. However, as I stated in the
summary, we are simplifying instead of making more exceptions that might
themselves have bugs. Let
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/9815#discussion_r45298904
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -211,45 +211,31 @@ class Analyzer(
GroupingSets
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/9815#discussion_r45298806
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
---
@@ -323,6 +323,10 @@ trait GroupingAnalytics extends
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/9815#discussion_r45300236
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -211,45 +211,31 @@ class Analyzer(
GroupingSets
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/9815#discussion_r45346085
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -211,45 +211,31 @@ class Analyzer(
GroupingSets
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/9815#discussion_r45345199
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -60,6 +60,68 @@ class DataFrameAggregateSuite extends QueryTest
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/11100#issuecomment-182891207
LGTM
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/10677#discussion_r50137528
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -324,6 +324,51 @@ object functions extends LegacyFunctions {
*/
def
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/11583#issuecomment-194961885
Here are some quick benchmark results on a ~1 million row dataset
![](http://i.imgur.com/sreUTO3.png)
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/11583#issuecomment-200516430
@yhuai do you have time this week to look at this patch?
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/11632#issuecomment-200515122
While this may help with join ambiguity, I think the more fundamental
problem is that a transformed DataFrame should not be giving the same column
references
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/11583
[SPARK-13749][SQL] Faster pivot implementation for many distinct values
with two phase aggregation
## What changes were proposed in this pull request?
The existing implementation of pivot
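The two-phase aggregation idea behind this faster pivot can be sketched in plain Python (a hypothetical model of the approach, not the PivotFirst implementation): phase one collapses each input chunk into one fixed-width array per group, with a slot per pivot value; phase two merges those arrays element-wise.

```python
def pivot_first_partial(rows, group_key, pivot_key, value_key, pivot_values):
    """Phase 1: collapse a chunk of rows into one fixed-width array per
    group, one slot per pivot value (sketch of the PivotFirst idea)."""
    index = {v: i for i, v in enumerate(pivot_values)}
    partial = {}
    for row in rows:
        slot = index.get(row[pivot_key])
        if slot is None:
            continue  # pivot value not requested; ignore the row
        arr = partial.setdefault(row[group_key], [0] * len(pivot_values))
        arr[slot] += row[value_key]
    return partial

def pivot_merge(a, b):
    """Phase 2: merge partial per-group arrays element-wise."""
    out = {k: list(v) for k, v in a.items()}
    for k, arr in b.items():
        acc = out.setdefault(k, [0] * len(arr))
        for i, x in enumerate(arr):
            acc[i] += x
    return out
```

Because the partial state is a fixed-width array rather than one aggregate per (group, pivot value) pair, far fewer aggregation buffers are needed when there are many distinct values.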
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/11583#issuecomment-193933833
cc @rxin and @yhuai since you two were involved in the original version
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/12861
[SPARK-13749][SQL][FOLLOW-UP] Faster pivot implementation for many distinct
values with two phase aggregation
## What changes were proposed in this pull request?
This is a follow up PR
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/11583#issuecomment-216292927
Sure, will do tonight.
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/11583#issuecomment-216274651
@yhuai can we get this merged for 2.0?
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/11583#discussion_r60099683
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PivotFirst.scala
---
@@ -0,0 +1,141 @@
+/*
+ * Licensed
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/11583#discussion_r60099379
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -309,38 +309,64 @@ class Analyzer(
object
Github user aray commented on the pull request:
https://github.com/apache/spark/pull/11583#issuecomment-211553866
@yhuai I've addressed all your comments, ready for you to take another
look. Sorry for the delay.
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/15415#discussion_r97168170
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/15415#discussion_r97162464
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/15415#discussion_r97168311
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/15415#discussion_r97166816
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16483
@rxin can you take a look?
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/16539
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/15111
[SPARK-17458][SQL] Alias specified for aggregates in a pivot are not honored
## What changes were proposed in this pull request?
This change preserves aliases that are given for pivot
Github user aray commented on the issue:
https://github.com/apache/spark/pull/15898
The code that is being changed originated 2 years ago with the addition of
Hive 0.13 support by @zhzhan, see
https://github.com/apache/spark/commit/7c89a8f0c81ecf91dba34c1f44393f45845d438c#diff
Github user aray commented on the issue:
https://github.com/apache/spark/pull/15898
@tejasapatil yes that is the use case where this applies. It's only tested
against whatever version is included in the hadoop2.7+hive build configuration
listed above. Is there anything in particular
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/15898
[SPARK-18457][SQL] ORC and other columnar formats using HiveShim read all
columns when doing a simple count
## What changes were proposed in this pull request?
When reading zero columns
Github user aray closed the pull request at:
https://github.com/apache/spark/pull/16197
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16161
I would be happy to create a separate PR for adding support for
`mutable.Map` (and `List`) if that is wanted. But there is no _generic_
solution, as there is no type that is assignable to both
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16271
ping @srowen @dbtsai @rxin @ankurdave @jegonzal
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16271
Yes the improvement is from the sum of magnitudes of initial values being
closer to the (known) sum of the solution. Fiddling with resetProb controls a
completely different thing. The current
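The convergence argument can be sketched with a plain power iteration (hypothetical Python, not the GraphX code). In this unnormalized formulation, a sink-free graph converges to ranks summing to roughly N, so initializing every vertex at 1.0 (total mass N) starts much closer to the fixed point than initializing at resetProb (total mass 0.15N).

```python
def pagerank(links, num_iters=20, reset_prob=0.15, initial_rank=1.0):
    """Simple power-iteration PageRank sketch.

    links maps each vertex to its list of out-neighbors. With no sinks,
    ranks converge to a vector summing to ~N, which is why initial_rank=1.0
    (the change argued for here) beats initial_rank=reset_prob.
    """
    nodes = set(links) | {d for ds in links.values() for d in ds}
    ranks = {v: initial_rank for v in nodes}
    for _ in range(num_iters):
        contribs = {v: 0.0 for v in nodes}
        for src, dests in links.items():
            if dests:
                share = ranks[src] / len(dests)  # split rank over out-edges
                for d in dests:
                    contribs[d] += share
        ranks = {v: reset_prob + (1 - reset_prob) * contribs[v]
                 for v in nodes}
    return ranks
```

On a 3-cycle, for instance, initializing at 1.0 is already the fixed point, while initializing at 0.15 needs many iterations to climb up to it.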
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16161
The approach is to change the deserializer (via
`ScalaReflection#deserializerFor`) to return the more specific type
`scala.collection.immutable.Map` instead of `scala.collection.Map` as it does
now
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16271
**References**
[Pagerank paper](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf)
> We need to make an initial assignment of the ranks. This assignment can
be made by one of several strateg
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/16240#discussion_r92546082
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala
---
@@ -100,31 +100,76 @@ abstract class SQLImplicits {
// Seqs
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/16271#discussion_r92621591
--- Diff:
graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala ---
@@ -70,10 +70,10 @@ class PageRankSuite extends SparkFunSuite
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16271
[SPARK-18845][GraphX] PageRank has incorrect initialization value that
leads to slow convergence
## What changes were proposed in this pull request?
Change the initial value in all PageRank
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16271
Updated the above benchmark code with a log normal random graph on 10,000
vertices the difference is much more drastic.
![](http://i.imgur.com/Zo56dEO.png)
(take the very bottom of the graph
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16539
[SPARK-8855][MLlib][PySpark] Python API for Association Rules
## What changes were proposed in this pull request?
This patch adds a `generateAssociationRules(confidence)` method
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16559
It can already be done with the `posexplode` UDTF like
```
with t as (select * from values (array(1,2,3)), (array(4,5,6)) as v(a))
select col from t lateral view posexplode(a) tt as pos, col where pos = 2
```
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16555
The title should say 2.
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16577
[SPARK-19214][SQL] Typed aggregate count output field name should be "count"
## What changes were proposed in this pull request?
Changes the output field name of typed aggreg
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16161
[SPARK-18717][SQL] Make code generation for Scala Map work with
immutable.Map also
## What changes were proposed in this pull request?
Fixes compile errors in generated code when user has
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16121
@davies, @zero323, and @holdenk this is in a good place for review if you
want to take a look.
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16121
[SPARK-16589][PYTHON] Chained cartesian produces incorrect number of records
## What changes were proposed in this pull request?
Fixes a bug in the python implementation of rdd cartesian
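The expected semantics of chained cartesian products can be sketched with itertools (illustrative sample lists; this is not the PySpark serializer code the fix touches): the record count should be the product of the input sizes at every level of chaining.

```python
from itertools import product

a, b, c = [1, 2], ["x", "y", "z"], [True, False]

# Chaining cartesian products multiplies the sizes:
# |a x b| = 2 * 3 = 6, |(a x b) x c| = 6 * 2 = 12.
ab = list(product(a, b))
abc = list(product(ab, c))
assert len(ab) == len(a) * len(b)
assert len(abc) == len(a) * len(b) * len(c)
```

The bug being fixed was precisely that `rdd.cartesian(other).cartesian(third)` in the Python API could return a different number of records than this product.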
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16121
@davies I was trying to make minimal changes to `PairDeserializer`, but you
are right, it needs to be changed as well. I'll update the PR shortly.
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16483
ping @srowen @ankurdave can you take a look at this?
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16483
[SPARK-18847][GraphX] PageRank gives incorrect results for graphs with sinks
## What changes were proposed in this pull request?
Graphs with sinks (vertices with no outgoing edges) don't have
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16177
[SPARK-17760][SQL] AnalysisException with dataframe pivot when groupBy
column is not attribute
## What changes were proposed in this pull request?
Fixes AnalysisException for pivot queries
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/16197
[SPARK-17760][SQL][Backport] AnalysisException with dataframe pivot when
groupBy column is not attribute
## What changes were proposed in this pull request?
Backport of #16177 to branch-2.0
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16161
Right now it's not supported to have the following:
```
case class Foo(a: Map[Int, Int])
```
(using the scala Predef version of Map)
The
[documented](http://spark.apache.org
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17348
LGTM
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16483
@rxin can anyone else review this? It would be nice to get this correctness
fix into 2.2.
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106546448
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala
---
@@ -322,13 +335,12 @@ object PageRank extends Logging {
def
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/16483#discussion_r106548090
--- Diff:
graphx/src/test/scala/org/apache/spark/graphx/lib/PageRankSuite.scala ---
@@ -68,26 +69,34 @@ class PageRankSuite extends SparkFunSuite
Github user aray commented on the issue:
https://github.com/apache/spark/pull/16483
@thunterdb The extra step -- as implemented -- is only at the end as that
gives the same result as doing it after every iteration but without the extra
overhead.
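One way to read this end-of-run correction step (a rough sketch under assumptions, not the actual PageRank.scala change): rank mass that leaks out through sinks is restored by rescaling once after the last iteration, which reaches the same result as rescaling after every iteration but with far less overhead.

```python
def normalize_ranks(ranks):
    """Rescale ranks once, at the end, so total rank mass equals the
    expected sum of N (sketch of the final sink-correction step)."""
    n = len(ranks)
    total = sum(ranks.values())
    return {v: r * n / total for v, r in ranks.items()}
```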
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
@HyukjinKwon we're not introducing a regression in this PR by fixing the
NPE; the answer given by 1.6 was incorrect under any interpretation. Again,
there is a completely separate issue of what
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
@HyukjinKwon There is an inconsistency/regression, but it's not being
introduced in this PR; it's already there. Take an example without null as a
pivot column value like below. The only difference
GitHub user aray opened a pull request:
https://github.com/apache/spark/pull/17226
[SPARK-19882][SQL] Pivot with null as a distinct pivot value throws NPE
## What changes were proposed in this pull request?
Allows null values of the pivot column to be included in the pivot
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105322758
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
@@ -216,4 +216,10 @@ class DataFramePivotSuite extends QueryTest
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
@HyukjinKwon As stated in 17226#discussion_r105322758 I think we should
open a second JIRA to have the discussion on whether or not count(1) of no
values in a pivot should be filled with 0's
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105324124
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -522,7 +522,7 @@ class Analyzer(
} else
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
BTW for 3 above if we decide it should be 0, we can add an initial value
for `PivotFirst` to make the fix.
Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
There are three things going on here in your one example.
1. Spark 1.6 [first version with pivot] (and Spark 2.0+ with an aggregate
output type unsupported by PivotFirst) gives incorrect
Github user aray commented on a diff in the pull request:
https://github.com/apache/spark/pull/18697#discussion_r130396904
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -65,6 +65,10 @@ abstract class SparkPlan extends QueryPlan[SparkPlan