[ https://issues.apache.org/jira/browse/PARQUET-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728039#comment-17728039 ]
Uma Maheswari commented on PARQUET-196:
---------------------------------------

Could someone explain the difference between pretty and detailed size? Both give the same results, and the code also looks the same. Is it that size -pretty reports only the data block size, while size -detailed reports the complete Parquet file size (header + data blocks + footer)?

> parquet-tools command to get rowcount & size
> --------------------------------------------
>
>                 Key: PARQUET-196
>                 URL: https://issues.apache.org/jira/browse/PARQUET-196
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.6.0
>            Reporter: Swapnil
>            Priority: Minor
>              Labels: features
>             Fix For: 1.10.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> Parquet files contain metadata about row count and file size. We should have
> new commands to get the row count and size.
> These commands can be added to parquet-tools:
> 1. rowcount: This should add up the number of rows in all footers to give the
> total rows in the data.
> 2. size: This should give the compressed size in bytes and in a human-readable
> format too.
> These commands help us avoid parsing job logs or loading the data again to
> find the number of rows. They come in very handy in complex processes, stats
> generation, QA, etc.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
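The two proposed commands boil down to simple aggregation over footer metadata. The sketch below illustrates that logic: rowcount sums the row count of every row group across all footers, and size sums the compressed sizes and prints them both in bytes and in a human-readable form. It is a minimal sketch, not parquet-tools' implementation. It uses no Parquet library, and RowGroupInfo, its fields, and the helper functions are hypothetical stand-ins for the metadata a real footer reader would return. The 1024-based units are an assumption, and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical stand-in for one row group's entry in a Parquet footer.
# The field names are illustrative, not the parquet-mr API.
@dataclass
class RowGroupInfo:
    num_rows: int
    compressed_bytes: int

# One footer per file, each listing that file's row groups.
Footers = Sequence[List[RowGroupInfo]]

def row_count(footers: Footers) -> int:
    """'rowcount': add up the rows of every row group in every footer."""
    return sum(rg.num_rows for footer in footers for rg in footer)

def compressed_size(footers: Footers) -> int:
    """'size': total compressed bytes across all row groups."""
    return sum(rg.compressed_bytes for footer in footers for rg in footer)

def human_readable(num_bytes: int) -> str:
    """Format a byte count with assumed 1024-based units, e.g. '1.5 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"

if __name__ == "__main__":
    # Made-up metadata for two files: the first has two row groups.
    footers = [
        [RowGroupInfo(1000, 40_000), RowGroupInfo(500, 20_000)],
        [RowGroupInfo(250, 10_000)],
    ]
    total = compressed_size(footers)
    print(row_count(footers))               # 1750
    print(total, human_readable(total))     # 70000 68.4 KB
```

Because only footer metadata is read, neither command needs to scan the data pages, which is what makes them cheaper than reloading the data.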