RE: Equivalent of Redshift ListAgg function in Spark (PySpark)

2017-10-09 Thread Mahesh Sawaiker
c_tax_percentage|
+-+-+-+---+-+---+-++++++-+++-+---++--+---++--+--+---+---+-++--+-+-+-+
|1| BAAA| 1998-01-01| null| null|2450997| NY Metro| large|2325| 1374075|8AM-12AM|Keith Cunningham|4|Matters may hear ...|New, cold plants ...| Dante Cook| 3| pri| 4| ese| 995| Park 3rd| Dr.| Suite 470|Five Points| Ziebach County| SD| 56098|United States| -6.0| 0.02|
|2| CAAA| 1998-01-01| 2000-12-31| null|2450876| Mid Atlantic| large|4208| 837392| 8AM-4PM| Stephen Clem|3|Classes devote la...|Free germans prov...|Christopher Perez| 6| cally| 3|pri| 245| Johnson |Circle| Suite 200| Fairview|Williamson County| TN| 35709|United States| -5.0| 0.03|
|3| CAAA| 2001-01-01| null| null|2450876| Mid Atlantic| small|3251| 837392| 8AM-4PM|William Johnson|3|Classes devote la...|Ridiculous requir...|Derrick Burke| 6| cally| 3|pri| 245| Johnson |Circle| Suite 200| Fairview|Williamson County| TN| 35709|United States| -5.0| 0.03|
|4| EAAA| 1998-01-01| 2000-01-01| null|2450872|North Midwest| large|2596| 708708| 8AM-4PM| Lamont Greene|3|Events must find ...|Great rates must ...| Marvin Dean| 2|able| 2| able| 927| Oak Main|ST| Suite 150|Five Points|Williamson County| TN| 36098|United States| -5.0| 0.03|
|5| EAAA| 2000-01-02| 2001-12-31| null|2450872|North Midwest| medium|2596| 708708|8AM-12AM| Lamont Greene|3|Events must find ...|So fresh supplies...| Matthew Williams| 2|able| 1| able| 927| Oak Main|ST| Suite 150|Five Points|Williamson County| TN| 36098|United States| -5.0| 0.0|
|6| EAAA| 2002-01-01| null| null|2450872|North Midwest| small|2596| 708708| 8AM-4PM| Emilio Romano|6|As well novel sen...|Sophisticated cit...| William Johnson| 5|anti| 1| able| 927| Oak Main|ST| Suite 150|Five Points|Williamson County| TN| 36098|United States| -5.0| 0.07|
+-+-+-+---+-+---+-++++++-+++-+---++--+---++--+--+---+---+-++--+-+-+-+

From: Somasundaram Sekar [mailto:somasundar.se...@tigeranalytics.com]
Sent: Sunday, October 08, 2017 5:30 PM
To: user@spark.apache.org
Subject: Equivalent of Redshift ListAgg function in Spark (PySpark)

Hi,

I want to concatenate multiple columns into a single column after grouping
the DataFrame.

I want a functional equivalent of the Redshift ListAgg function:

pg_catalog.Listagg(column, '|') WITHIN GROUP (ORDER BY column) AS name

LISTAGG Function: For each group in a query, the LISTAGG aggregate function
orders the rows for that group according to the ORDER BY expression, then
concatenates the values into a single string.


Equivalent of Redshift ListAgg function in Spark (PySpark)

2017-10-08 Thread Somasundaram Sekar
Hi,

I want to concatenate multiple columns into a single column after grouping
the DataFrame.

I want a functional equivalent of the Redshift ListAgg function:

pg_catalog.Listagg(column, '|') WITHIN GROUP (ORDER BY column) AS name

LISTAGG Function: For each group in a query, the LISTAGG aggregate function
orders the rows for that group according to the ORDER BY expression, then
concatenates the values into a single string.
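
To pin down the behaviour being asked for, here is a minimal plain-Python sketch of the three steps in the LISTAGG description above (group, order within the group, join with a separator). It is only an illustration of the semantics, not Spark code: the helper name listagg, the sample rows and the column names dept and name are all made up for the example, and skipping None values is an assumption about how NULLs should be treated.

```python
from collections import defaultdict


def listagg(rows, group_key, value_key, sep="|"):
    """Hypothetical helper: emulate LISTAGG(value, sep) WITHIN GROUP (ORDER BY value)."""
    groups = defaultdict(list)
    for row in rows:
        value = row[value_key]
        if value is not None:  # assumption: NULL-like values are skipped
            groups[row[group_key]].append(value)
    # Order each group's values, then concatenate them into one string.
    return {
        key: sep.join(str(v) for v in sorted(values))
        for key, values in groups.items()
    }


# Made-up sample data, standing in for the rows of a DataFrame.
rows = [
    {"dept": "a", "name": "carol"},
    {"dept": "b", "name": "bob"},
    {"dept": "a", "name": "alice"},
    {"dept": "a", "name": None},
]

result = listagg(rows, "dept", "name")
print(result)
```

In PySpark the same three steps would probably map onto a groupBy followed by concat_ws(sep, sort_array(collect_list(column))) from pyspark.sql.functions; that mapping is a suggestion and has not been run against the data in this thread.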