I am using PySpark 1.6.1 and Python 3.
Given:

    idDF2 = idDF.select(idDF.id, idDF.col.id)
    idDF2.printSchema()
    idDF2.show()

    root
     |-- id: string (nullable = true)
     |-- col[id]: long (nullable = true)

    +----------+----------+
    |        id|   col[id]|
    +----------+----------+
    |1008930924| 534494917|
    |1008930924| 442237496|
    |1008930924|  98069752|
    |1008930924|2790311425|
    |1008930924|3300869821|
    +----------+----------+

I have to do a lot of work to get the max value:

    rows = idDF2.select("col[id]").describe().collect()
    hack = [s for s in rows if s.summary == 'max']
    print(hack)
    print(hack[0].summary)
    print(type(hack[0]))
    print(hack[0].asDict()['col[id]'])

    maxStr = hack[0].asDict()['col[id]']
    ttt = int(maxStr)
    numDimensions = 1 + ttt
    print(numDimensions)

Is there an easier way?

Kind regards

Andy