ala commented on PR #37228:
URL: https://github.com/apache/spark/pull/37228#issuecomment-1213402734

   @sadikovi The cost of reading the `row_index` column is in the same ballpark as the other metadata columns:
   
   ```
   [info] Vectorized Parquet:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] no metadata columns                                 332            370          15         15.1          66.3       1.0X
   [info] _metadata.file_path                                 436            491          33         11.5          87.1       0.8X
   [info] _metadata.file_name                                 440            479          20         11.4          88.0       0.8X
   [info] _metadata.file_size                                 377            420          24         13.3          75.4       0.9X
   [info] _metadata.file_modification_time                    391            420          19         12.8          78.1       0.8X
   [info] _metadata.row_index                                 434            489          27         11.5          86.7       0.8X
   [info] _metadata                                           676            766          34          7.4         135.2       0.5X
   
   [info] Parquet-mr:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] no metadata columns                                1250           1447          78          4.0         250.0       1.0X
   [info] _metadata.file_path                                1688           1898         116          3.0         337.6       0.7X
   [info] _metadata.file_name                                1678           1867          87          3.0         335.6       0.7X
   [info] _metadata.file_size                                1518           1711          79          3.3         303.6       0.8X
   [info] _metadata.file_modification_time                   1596           1701          60          3.1         319.3       0.8X
   [info] _metadata.row_index                                1526           1725          79          3.3         305.3       0.8X
   [info] _metadata                                          2268           2578         134          2.2         453.5       0.6X
   ```
   These numbers are also in the same ballpark as those on the vanilla `master` branch:
   ```
   [info] Vectorized Parquet:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] no metadata columns                                 346            411          31         14.5          69.1       1.0X
   [info] _metadata.file_path                                 452            524          49         11.1          90.5       0.8X
   [info] _metadata.file_name                                 446            489          24         11.2          89.2       0.8X
   [info] _metadata.file_size                                 389            436          38         12.9          77.8       0.9X
   [info] _metadata.file_modification_time                    387            421          19         12.9          77.4       0.9X
   [info] _metadata                                           592            672          30          8.4         118.4       0.6X
   
   [info] Parquet-mr:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] no metadata columns                                1209           1351          73          4.1         241.8       1.0X
   [info] _metadata.file_path                                1595           1807         112          3.1         318.9       0.8X
   [info] _metadata.file_name                                1592           1777         100          3.1         318.3       0.8X
   [info] _metadata.file_size                                1493           1692         102          3.3         298.7       0.8X
   [info] _metadata.file_modification_time                   1507           1688          87          3.3         301.5       0.8X
   [info] _metadata                                          1998           2238         107          2.5         399.6       0.6X
   ```
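
The "same ballpark" comparison can be made concrete by turning the "Per Row(ns)" figures above into a percentage overhead over the "no metadata columns" baseline. The following is only a minimal sketch in Python: the numbers are copied from the tables above (a subset of rows), and the names `PR_BRANCH`, `MASTER`, `overhead_pct` and `report` are illustrative, not part of Spark or its benchmark tooling.

```python
# Per-row cost in nanoseconds, copied from the "Per Row(ns)" column above.
# Only a subset of the benchmark rows is included to keep the sketch short.
BASELINE = "no metadata columns"

PR_BRANCH = {
    "Vectorized Parquet": {
        BASELINE: 66.3,
        "_metadata.file_path": 87.1,
        "_metadata.row_index": 86.7,
        "_metadata": 135.2,
    },
    "Parquet-mr": {
        BASELINE: 250.0,
        "_metadata.file_path": 337.6,
        "_metadata.row_index": 305.3,
        "_metadata": 453.5,
    },
}

MASTER = {
    "Vectorized Parquet": {
        BASELINE: 69.1,
        "_metadata.file_path": 90.5,
        "_metadata": 118.4,
    },
    "Parquet-mr": {
        BASELINE: 241.8,
        "_metadata.file_path": 318.9,
        "_metadata": 399.6,
    },
}


def overhead_pct(per_row_ns, baseline_ns):
    """Extra per-row cost relative to the baseline, in percent."""
    return (per_row_ns / baseline_ns - 1.0) * 100.0


def report(branch, results):
    """Print the overhead of every non-baseline case for each reader."""
    for reader, rows in results.items():
        base = rows[BASELINE]
        for case, ns in rows.items():
            if case != BASELINE:
                pct = overhead_pct(ns, base)
                print(f"{branch:8s} {reader:20s} {case:22s} +{pct:5.1f}%")


report("PR", PR_BRANCH)
report("master", MASTER)
```

On these figures, `_metadata.row_index` comes out at roughly 31% over the baseline for the vectorized reader and roughly 22% for Parquet-mr, which is close to or below the overhead of `_metadata.file_path` in the same run.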



