I am using PySpark 1.6.1 with Python 3.
Given:
idDF2 = idDF.select(idDF.id, idDF.col.id)
idDF2.printSchema()
idDF2.show()
root
|-- id: string (nullable = true)
|-- col[id]: long (nullable = true)
+----------+----------+
| id| col[id]|
+----------+----------+
|1008930924| 534494917|
|1008930924| 442237496|
|1008930924| 98069752|
|1008930924|2790311425|
|1008930924|3300869821|
+----------+----------+
I have to do a lot of work just to get the maximum value of col[id]:
rows = idDF2.select("col[id]").describe().collect()
# describe() returns its statistics as strings, so I pick out the 'max' row by hand
hack = [s for s in rows if s.summary == 'max']
print(hack)
print(hack[0].summary)
print(type(hack[0]))
print(hack[0].asDict()['col[id]'])
maxStr = hack[0].asDict()['col[id]']
ttt = int(maxStr)  # the stat comes back as a string and has to be converted
numDimensions = 1 + ttt
print(numDimensions)
Is there an easier way?
Kind regards
Andy