bjornjorgensen commented on PR #40913:
URL: https://github.com/apache/spark/pull/40913#issuecomment-1519049541

   For me it seems like we can just add `show_counts` to this function. We already have this max row limit to calculate against.
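pandas documents `DataFrame.info(show_counts=...)` this way: `True` always shows the non-null counts, `False` never does, and the default (`None`) shows them only when the frame is within the `display.max_info_rows` / `display.max_info_columns` limits. Below is a minimal sketch of how such an argument could be resolved against an existing max row limit. The helper name `resolve_show_counts` and the single `max_rows` threshold are hypothetical, and it ignores the column limit.

```python
from typing import Optional


def resolve_show_counts(show_counts: Optional[bool], n_rows: int, max_rows: int) -> bool:
    """Decide whether to compute and print non-null counts.

    Hypothetical helper: an explicit True/False is honored as-is, while None
    shows counts only when the frame fits within the (assumed) max_rows
    threshold, so the full non-null scan is skipped for very large frames.
    """
    if show_counts is not None:
        return show_counts
    return n_rows <= max_rows
```

With this in place, the per-column non-null aggregation would only need to run when the helper returns `True`.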
   
   Or we can implement something like this:
   
   ```python
   from collections import Counter
   from pyspark.sql.functions import col, count
   
   def spark_info(df):
       # Print basic DataFrame information
       print(f"<class '{df.__class__.__module__}.{df.__class__.__name__}'>")
       print(f"Number of rows: {df.count()}")
       print(f"Number of columns: {len(df.columns)}")
   
       # Print column header for the detailed DataFrame information
       print("\nColumn" + " " * 110 + "Non-Null Count" + " " + "Dtype")
       print("-" * 6, " " * 108, "-" * 14, "-" * 5)
   
       # Calculate non-null counts for all columns in a single aggregation;
       # count() skips nulls, and backticks protect names with dots or spaces
       non_null_counts = df.agg(
           *[count(col(f"`{c}`")).alias(c) for c in df.columns]
       ).collect()[0]
   
       # Initialize a counter to store data type counts
       dtype_counter = Counter()
   
       # Iterate through the schema fields and print detailed column information
       for field in df.schema.fields:
           non_null_count = non_null_counts[field.name]
           dtype = field.dataType.simpleString()
           print(f"{field.name:<90} {non_null_count:>30} non-null {dtype}")
   
           # Update the data type counter
           dtype_counter[dtype] += 1
   
       # Print data type summary (use `n`, not `count`, to avoid shadowing
       # the imported pyspark function)
       dtypes_summary = ", ".join(
           f"{dtype}({n})" for dtype, n in dtype_counter.items()
       )
       print(f"\ndtypes: {dtypes_summary}")
   ```
   
   
![image](https://user-images.githubusercontent.com/47577197/233838325-b1b7b5ef-b358-4c41-a20c-f841f3484d2c.png)
   (...)
   
   
![image](https://user-images.githubusercontent.com/47577197/233838368-5599bfe9-2a05-44d6-b583-cd2bbb444127.png)
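The formatting half of `spark_info` does not depend on Spark, so it can be checked on its own. This is a minimal sketch that assumes the per-column stats were already collected into `(name, non_null_count, dtype)` tuples. That intermediate form, the function name `format_info`, its `show_counts` flag, and the narrower column widths are all hypothetical choices for illustration. It returns the lines instead of printing them.

```python
from collections import Counter
from typing import List, Tuple


def format_info(columns: List[Tuple[str, int, str]], show_counts: bool = True) -> List[str]:
    """Render pandas-style info rows plus a dtype summary line.

    `columns` holds hypothetical (name, non_null_count, dtype) tuples, e.g.
    built from df.schema.fields and the collected non-null counts.
    """
    lines = []
    dtype_counter = Counter()
    for name, non_null, dtype in columns:
        if show_counts:
            lines.append(f"{name:<20} {non_null:>10} non-null {dtype}")
        else:
            # Omit the Non-Null Count column entirely
            lines.append(f"{name:<20} {dtype}")
        dtype_counter[dtype] += 1
    # Counter keeps first-seen order, like the loop in spark_info
    summary = ", ".join(f"{dtype}({n})" for dtype, n in dtype_counter.items())
    lines.append(f"dtypes: {summary}")
    return lines
```

For example, `format_info([("id", 3, "bigint"), ("name", 2, "string"), ("age", 3, "bigint")])` ends with the summary line `dtypes: bigint(2), string(1)`.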
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

