e.g. to select the max value for column "foo":

from pyspark.sql.functions import max, col
df.select(max(col("foo"))).show()
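If you need the max as a plain Python value (e.g. for the numDimensions computation below), you can collect the single aggregated row. A minimal sketch, assuming a DataFrame df with a numeric column "foo":

from pyspark.sql.functions import max as sql_max  # alias avoids shadowing Python's builtin max

# agg() returns a one-row DataFrame; collect it and pull the scalar out of the Row
max_value = df.agg(sql_max("foo")).collect()[0][0]
numDimensions = 1 + int(max_value)
print(numDimensions)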

On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <
a...@santacruzintegration.com> wrote:

> I am using pyspark 1.6.1 and python3.
>
>
> *Given:*
>
> idDF2 = idDF.select(idDF.id, idDF.col.id)
> idDF2.printSchema()
> idDF2.show()
>
> root
>  |-- id: string (nullable = true)
>  |-- col[id]: long (nullable = true)
>
> +----------+----------+
> |        id|   col[id]|
> +----------+----------+
> |1008930924| 534494917|
> |1008930924| 442237496|
> |1008930924|  98069752|
> |1008930924|2790311425|
> |1008930924|3300869821|
>
>
>
> *I have to do a lot of work to get the max value*
>
>
> rows = idDF2.select("col[id]").describe().collect()
> hack = [s for s in rows if s.summary == 'max']
> print(hack)
> print(hack[0].summary)
> print(type(hack[0]))
> print(hack[0].asDict()['col[id]'])
> maxStr = hack[0].asDict()['col[id]']
> ttt = int(maxStr)
> numDimensions = 1 + ttt
> print(numDimensions)
>
>
> Is there an easier way?
>
>
> Kind regards
>
>
> Andy
>
>


-- 
Regards,
Alexander
