[ https://issues.apache.org/jira/browse/PARQUET-544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Le Dem resolved PARQUET-544.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.9.0

Issue resolved by pull request 345
[https://github.com/apache/parquet-mr/pull/345]

> ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-544
>                 URL: https://issues.apache.org/jira/browse/PARQUET-544
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.8.1
>            Reporter: Michal Turek
>            Assignee: Michal Turek
>            Priority: Minor
>             Fix For: 1.9.0
>
>
> {{org.apache.parquet.hadoop.ParquetWriter}} implements {{java.io.Closeable}}, but its {{close()}} method does not honor the contract. The interface specifies "If the stream is already closed then invoking this method has no effect.", whereas {{ParquetWriter}} throws {{NullPointerException}} instead.
> The cause is quite obvious: {{columnStore}} is set to null during the first close and dereferenced again on the second one. There is no "already closed" guard to prevent it.
> {noformat}
> java.lang.NullPointerException: null
> 	at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:157) ~[parquet-hadoop-1.8.1.jar:1.8.1]
> 	at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113) ~[parquet-hadoop-1.8.1.jar:1.8.1]
> 	at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297) ~[parquet-hadoop-1.8.1.jar:1.8.1]
> {noformat}
> {noformat}
> private void flushRowGroupToStore()
>     throws IOException {
>   LOG.info(format("Flushing mem columnStore to file. allocated memory: %,d", columnStore.getAllocatedSize()));
>   if (columnStore.getAllocatedSize() > (3 * rowGroupSizeThreshold)) {
>     LOG.warn("Too much memory used: " + columnStore.memUsageString());
>   }
>   if (recordCount > 0) {
>     parquetFileWriter.startBlock(recordCount);
>     columnStore.flush();
>     pageStore.flushToFileWriter(parquetFileWriter);
>     recordCount = 0;
>     parquetFileWriter.endBlock();
>     this.nextRowGroupSize = Math.min(
>         parquetFileWriter.getNextRowGroupSize(),
>         rowGroupSizeThreshold);
>   }
>   columnStore = null;
>   pageStore = null;
> }
> {noformat}
> A known workaround is to guard against the second and subsequent close() calls explicitly in the application code.
> {noformat}
> private final ParquetWriter<V> writer;
> private boolean closed;
>
> private void closeWriterOnlyOnce() throws IOException {
>   if (!closed) {
>     closed = true;
>     writer.close();
>   }
> }
> {noformat}
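> The second call typically comes from resource-management plumbing rather than from an obvious double close in application code. A minimal sketch of how it can arise (illustrative only; the writer construction is omitted and the helper class and method names are made up):
> {noformat}
> import java.io.IOException;
> import org.apache.parquet.hadoop.ParquetWriter;
>
> final class WriteAll {
>   // Writes all records and closes the writer. The explicit close() surfaces
>   // flush errors inside the try block; try-with-resources then closes the
>   // already-closed writer a second time and, on 1.8.1, hits the NPE above.
>   static <V> void writeAll(ParquetWriter<V> writer, Iterable<V> records) throws IOException {
>     try (ParquetWriter<V> w = writer) {
>       for (V record : records) {
>         w.write(record);
>       }
>       w.close();  // first close: flushes the last row group and writes the footer
>     }             // implicit second close from try-with-resources: NullPointerException
>   }
> }
> {noformat}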
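> For comparison, the Closeable contract can be honored inside the writer itself with an "already closed" guard. The sketch below is an assumption for illustration only (the {{closed}} flag is made up here) and is not necessarily the change merged in pull request 345:
> {noformat}
> // Hypothetical guard in InternalParquetRecordWriter; illustration only.
> private boolean closed;
>
> public void close() throws IOException, InterruptedException {
>   if (closed) {
>     return;  // "If the stream is already closed then invoking this method has no effect."
>   }
>   closed = true;
>   flushRowGroupToStore();
>   // ... finalize the write support and write the file footer as today ...
> }
> {noformat}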