e.g. select max value for column "foo":

from pyspark.sql.functions import max, col

df.select(max(col("foo"))).show()
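If you need the result as a plain Python value rather than a printed DataFrame, you can collect the single aggregated row and index into it. A minimal sketch against your idDF2 (assuming the column is still named col[id], as in your printSchema output):

from pyspark.sql import functions as F

# aggregate to a single row, then pull out its only field as a Python value
max_id = idDF2.select(F.max("col[id]")).first()[0]
numDimensions = 1 + max_id
print(numDimensions)

Importing the functions module as F also avoids shadowing Python's built-in max.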
On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <a...@santacruzintegration.com> wrote:

> I am using pyspark 1.6.1 and python3.
>
> *Given:*
>
> idDF2 = idDF.select(idDF.id, idDF.col.id)
> idDF2.printSchema()
> idDF2.show()
>
> root
>  |-- id: string (nullable = true)
>  |-- col[id]: long (nullable = true)
>
> +----------+----------+
> |        id|   col[id]|
> +----------+----------+
> |1008930924| 534494917|
> |1008930924| 442237496|
> |1008930924|  98069752|
> |1008930924|2790311425|
> |1008930924|3300869821|
>
> *I have to do a lot of work to get the max value:*
>
> rows = idDF2.select("col[id]").describe().collect()
> hack = [s for s in rows if s.summary == 'max']
> print(hack)
> print(hack[0].summary)
> print(type(hack[0]))
> print(hack[0].asDict()['col[id]'])
> maxStr = hack[0].asDict()['col[id]']
> ttt = int(maxStr)
> numDimensions = 1 + ttt
> print(numDimensions)
>
> Is there an easier way?
>
> Kind regards
>
> Andy

--
Regards, Alexander