I am using PySpark 1.6.1 with Python 3.
Given:
idDF2 = idDF.select(idDF.id, idDF.col.id)
idDF2.printSchema()
idDF2.show()
root
|-- id: string (nullable = true)
|-- col[id]: long (nullable = true)
+----------+----------+
| id| col[id]|
+----------+----------+
|1008930924| 534494917|
|1008930924| 442237496|
|1008930924| 98069752|
|1008930924|2790311425|
|1008930924|3300869821|
+----------+----------+
I have to do a lot of work just to get the maximum value of col[id]:
rows = idDF2.select("col[id]").describe().collect()
# describe() returns its statistics as strings, so I pick out the 'max' row by hand
hack = [s for s in rows if s.summary == 'max']
print(hack)
print(hack[0].summary)
print(type(hack[0]))
print(hack[0].asDict()['col[id]'])
maxStr = hack[0].asDict()['col[id]']
ttt = int(maxStr)  # the stat comes back as a string and has to be converted
numDimensions = 1 + ttt
print(numDimensions)
Is there an easier way?
Kind regards
Andy