tooptoop4 commented on pull request #33332: URL: https://github.com/apache/spark/pull/33332#issuecomment-880665317
It's an s3a URL; the bucket is in one of the Asia Pacific regions and is accessed via an HTTP proxy and with STS. Some snippets of the log:

```
21-07-12 11:58:36 INFO org.apache.spark.sql.execution.datasources.FileScanRDD: Reading File path: s3a://xxxx/yyyy.csv, range: 0-1951, partition values: [empty row]
21-07-12 11:58:37 INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: Code generated in 9.010979 ms
21-07-12 11:58:37 INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: Code generated in 8.593598 ms
21-07-12 11:58:37 INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 2
21-07-12 11:58:37 INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
21-07-12 11:58:37 INFO org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol: Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
21-07-12 11:58:37 INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 2
21-07-12 11:58:37 INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
21-07-12 11:58:37 INFO org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
21-07-12 11:58:37 INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: Code generated in 6.470666 ms
21-07-12 11:58:37 INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: Code generated in 4.507112 ms
21-07-12 11:58:37 INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: Code generated in 13.859354 ms
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 8388608 bytes
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Page size checking is: estimated
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Min row count for page size check is: 100
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Max row count for page size check is: 10000
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Truncate length for column indexes is: 64
21-07-12 11:58:37 INFO org.apache.parquet.hadoop.ParquetOutputFormat: Page row count limit to 20000
21-07-12 11:58:37 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
21-07-12 11:58:38 INFO org.apache.spark.sql.execution.datasources.BasicWriteTaskStatsTracker: Expected 1 files, but only saw 0. This could be due to the output format not writing empty files, or files being not immediately visible in the filesystem.
21-07-12 11:58:38 INFO org.apache.spark.mapred.SparkHadoopMapRedUtil: No need to commit output of task because needsTaskCommit=false: attempt_xxxx836_0021_m_000000_617
```

I can't see a fat jar of cloudstore and don't have access to a Hadoop installation at the moment:

```
java -jar cloudstore-1.0.jar storediag
no main manifest attribute, in cloudstore-1.0.jar
```
