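You can also rely on the fact that PySpark Row objects expose their fields as attributes: alias the aggregate and read it by name.

from pyspark.sql.functions import max

rows = idDF2.select(max("col[id]").alias("max")).collect()
firstRow = rows[0]
maxValue = firstRow.max  # a new name, so the imported max() is not shadowed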

On Tue, Mar 29, 2016 at 7:14 PM, Alexander Krasnukhin <the.malk...@gmail.com> wrote:

> You should be able to index columns directly, either by position or by column
> name, e.g.
>
> from pyspark.sql.functions import max
>
> rows = idDF2.select(max("col[id]")).collect()
> firstRow = rows[0]
>
> # by index
> maxValue = firstRow[0]
>
> # by column name
> maxValue = firstRow["max(col[id])"]
>
> On Tue, Mar 29, 2016 at 6:58 PM, Andy Davidson <a...@santacruzintegration.com> wrote:
>
>> Hi Alexander
>>
>> Many thanks. I think the key was that I needed to import the max function.
>> It turns out you do not need to use col:
>> df.select(max("foo")).show()
>>
>> To get the actual value of max you still need to write more code than I
>> would expect. I wonder if there is an easier way to work with Rows?
>>
>> from pyspark.sql.functions import max
>>
>> maxRow = idDF2.select(max("col[id]")).collect()
>> maxValue = maxRow[0].asDict()['max(col[id])']
>> print(maxValue)  # 713912692155621376
>>
>>
>> From: Alexander Krasnukhin <the.malk...@gmail.com>
>> Date: Monday, March 28, 2016 at 5:55 PM
>> To: Andrew Davidson <a...@santacruzintegration.com>
>> Cc: "user @spark" <user@spark.apache.org>
>> Subject: Re: looking for an easy to to find the max value of a column in
>> a data frame
>>
>> For example, to select the max value of column "foo":
>>
>> from pyspark.sql.functions import max, col
>> df.select(max(col("foo"))).show()
>>
>> On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <a...@santacruzintegration.com> wrote:
>>
>>> I am using PySpark 1.6.1 and Python 3.
>>>
>>>
>>> *Given:*
>>>
>>> idDF2 = idDF.select(idDF.id, idDF.col.id)
>>> idDF2.printSchema()
>>> idDF2.show()
>>>
>>> root
>>>  |-- id: string (nullable = true)
>>>  |-- col[id]: long (nullable = true)
>>>
>>> +----------+----------+
>>> |        id|   col[id]|
>>> +----------+----------+
>>> |1008930924| 534494917|
>>> |1008930924| 442237496|
>>> |1008930924|  98069752|
>>> |1008930924|2790311425|
>>> |1008930924|3300869821|
>>>
>>>
>>>
>>> *I have to do a lot of work to get the max value*
>>>
>>>
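>>> # describe() computes count, mean, stddev, min and max for the column
>>> rows = idDF2.select("col[id]").describe().collect()
>>> # keep only the row whose summary label is 'max'
>>> hack = [s for s in rows if s.summary == 'max']
>>> print(hack)
>>> print(hack[0].summary)
>>> print(type(hack[0]))
>>> print(hack[0].asDict()['col[id]'])
>>> maxStr = hack[0].asDict()['col[id]']
>>> # the statistic comes back as a string, so convert it before doing math
>>> ttt = int(maxStr)
>>> numDimensions = 1 + ttt
>>> print(numDimensions)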
>>>
>>>
>>> Is there an easier way?
>>>
>>>
>>> Kind regards
>>>
>>>
>>> Andy
>>>
>>>
>>
>>
>> --
>> Regards,
>> Alexander
>>
>>
>
>
> --
> Regards,
> Alexander
>



-- 
Regards,
Alexander
