[GitHub] spark pull request: [SPARK-7743] [SQL] Parquet 1.7

JoshRosen Thu, 04 Jun 2015 15:06:03 -0700

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/6597#issuecomment-109069355
  
    Looking through some of the Jenkins pull request builder logs, I've noticed some noisier log output from Parquet:
    
    ```
    ileReader: Initiating action with parallelism: 5
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.ParquetFileReader: reading summary file: file:/tmp/spark-a698f45b-c33a-4601-abdc-ca878a2fa499/test_insert_parquet/_common_metadata
    Jun 4, 2015 3:04:45 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
    Jun 4, 2015 3:04:45 PM WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 5 records.
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 0 ms. row count = 5
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 5 records.
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 1 ms. row count = 5
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: GZIP
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
    Jun 4, 2015 3:04:45 PM INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
    Jun 4, 2015 3:04:45 PM INFO: 
    ```
    
    Can someone file a JIRA and investigate?
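
As a starting point for that investigation, here is a minimal sketch of one way the threshold might be raised. It assumes these messages go through `java.util.logging` (the timestamp format in the output above suggests so, but this is not confirmed here) and that the logger names match the class names shown in the log. `QuietParquetLogs` and `raiseThreshold` are hypothetical names, not part of Spark or Parquet.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; not an existing Spark or Parquet class.
public class QuietParquetLogs {
    // java.util.logging only holds loggers weakly, so keep a strong
    // reference to make sure the configured level is not lost.
    private static final Logger PARQUET = Logger.getLogger("org.apache.parquet");

    // Sets the level on the package-level logger. Child loggers such as
    // org.apache.parquet.hadoop.ParquetFileReader inherit it unless they
    // have a level of their own.
    public static void raiseThreshold(Level level) {
        PARQUET.setLevel(level);
    }

    public static void main(String[] args) {
        raiseThreshold(Level.SEVERE);
        Logger reader =
            Logger.getLogger("org.apache.parquet.hadoop.ParquetFileReader");
        System.out.println(reader.isLoggable(Level.INFO));   // false
        System.out.println(reader.isLoggable(Level.SEVERE)); // true
    }
}
```

Whether this is enough in practice depends on how Parquet sets up its own handlers and levels at class-load time, which would need to be checked against the Parquet source before relying on it.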



