[GitHub] spark pull request: [SPARK-8992] [SQL] Add pivot to dataframe api

aray Fri, 23 Oct 2015 19:58:27 -0700

Github user aray commented on the pull request:

    https://github.com/apache/spark/pull/7841#issuecomment-150745807
  
    @rxin, not requiring the values would mean running a separate query for the distinct values of the column before the pivot query. It looks like at least some DF operations (e.g., drop) need that result (the pivot's output columns), so even if we made Pivot.output lazy, we would be running an unexpected job.
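
    To make the concern concrete, here is a minimal plain-Scala sketch of what a lazily computed `output` would mean. It does not use Spark; `PivotNode`, `scans` and `dropColumn` are hypothetical names, not Spark's `Pivot` class or API. The first schema-dependent operation, analogous to a drop, forces a read of the data even though the user never called an action.

    ```scala
    object LazyPivotSketch {
      // Hypothetical stand-in for a logical plan node; NOT Spark's Pivot class.
      final class PivotNode(groupCols: Seq[String], pivotData: () => Seq[String]) {
        var scans = 0 // counts how many times the underlying data was read

        // Output columns depend on the distinct pivot values, so computing them
        // lazily still requires reading the data the first time they are needed.
        lazy val output: Seq[String] = {
          scans += 1
          groupCols ++ pivotData().distinct.sorted
        }

        // A schema-dependent operation, loosely analogous to DataFrame.drop.
        def dropColumn(name: String): Seq[String] = output.filterNot(_ == name)
      }

      def main(args: Array[String]): Unit = {
        val node = new PivotNode(Seq("A", "B"), () => Seq("x", "y", "x"))
        println(node.scans)           // 0: nothing has been read yet
        println(node.dropColumn("B")) // forces the "unexpected job"
        println(node.scans)           // 1
      }
    }
    ```

    Laziness only postpones the scan; it does not avoid it, because any operation that needs the schema triggers it.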
    
    If a user really doesn't want to specify the values, they can run that query explicitly:
    
    ```scala
        // First job: collect the distinct values of "C" (getString assumes "C" is a string column)
        df.groupBy("A", "B")
          .pivot("C", df.select("C").distinct.collect.map(_.getString(0)): _*)
          .sum("D")
    ```
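
    The two passes in that query can be modelled with ordinary Scala collections. This is only an in-memory sketch under assumed names (`Row`, `distinctValues` and `pivotSum` are hypothetical, and rows are simplified to four fields); it mirrors the shape of the query: one pass to find the distinct values of `C`, then a grouped aggregation that emits one `sum(D)` cell per value.

    ```scala
    object PivotTwoPassSketch {
      // Hypothetical simplified row: grouping keys A and B, pivot column C, measure D.
      final case class Row(a: String, b: String, c: String, d: Long)

      // Pass 1: the separate "distinct values" query.
      def distinctValues(rows: Seq[Row]): Seq[String] = rows.map(_.c).distinct.sorted

      // Pass 2: group by (A, B) and emit one summed D per requested pivot value.
      // A value absent from a group yields None, analogous to a null cell.
      def pivotSum(rows: Seq[Row], values: Seq[String]): Map[(String, String), Seq[Option[Long]]] =
        rows.groupBy(r => (r.a, r.b)).map { case (key, group) =>
          val sums = values.map { v =>
            val matching = group.filter(_.c == v)
            if (matching.isEmpty) None else Some(matching.map(_.d).sum)
          }
          key -> sums
        }

      def main(args: Array[String]): Unit = {
        val rows = Seq(
          Row("a1", "b1", "x", 1L),
          Row("a1", "b1", "y", 2L),
          Row("a1", "b1", "x", 3L),
          Row("a2", "b1", "y", 4L)
        )
        val values = distinctValues(rows) // Seq("x", "y")
        pivotSum(rows, values).toSeq.sortBy(_._1).foreach(println)
      }
    }
    ```

    Supplying `values` directly skips pass 1 entirely, which is the case the proposed API is built around.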
    
    Needing to know an operator's output columns during analysis/planning is probably why other SQL implementations also require the values (technically, Oracle supports omitting them, but only in XML mode, where you essentially get just one column).
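
    The analysis-time argument can be written as a pure function: with explicit values, the pivot's output schema follows from the query alone, before any data is read. A minimal sketch, assuming a hypothetical `pivotOutput` helper, a simplified name-to-type `Schema`, and a single aggregate whose result type is passed in (Spark's real analyzer and column naming are not modelled here):

    ```scala
    object PivotSchemaSketch {
      // Hypothetical, simplified schema: (column name, type name) pairs.
      type Schema = Seq[(String, String)]

      // Derive the pivot's output schema from the grouping columns, the explicit
      // pivot values and the aggregate's result type; no data is read.
      def pivotOutput(input: Schema, groupCols: Seq[String], values: Seq[String], aggType: String): Schema = {
        val types = input.toMap
        val groupFields = groupCols.map { c =>
          c -> types.getOrElse(c, throw new IllegalArgumentException(s"unknown column: $c"))
        }
        // One output column per pivot value, each holding the aggregate result.
        groupFields ++ values.map(v => v -> aggType)
      }

      def main(args: Array[String]): Unit = {
        val input: Schema = Seq("A" -> "string", "B" -> "string", "C" -> "string", "D" -> "long")
        println(pivotOutput(input, Seq("A", "B"), Seq("x", "y"), "long"))
      }
    }
    ```

    Without the `values` argument, that list could only come from executing a query over the data, which is the trade-off described above.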
