[jira] [Commented] (PARQUET-408) Shutdown hook in parquet-avro library corrupts data and disables logging

2016-02-15 Thread Michal Turek (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148233#comment-15148233
 ] 

Michal Turek commented on PARQUET-408:
--

Hi Jim, can you point to the code where log4j2 comes from? I can't find 
anything; where did you find it? Is it shaded somewhere, similarly to 
slf4j-api and slf4j-nop in parquet-format? Our project doesn't use it at all 
and there is no related JAR in the dependencies. We use the standard 
slf4j-api, various slf4j bridges and logback as the logger implementation. 
It's also quite a bad idea to configure logging of everything in e.g. 
logback.xml and then need a second configuration file just because one 
concrete library is not well behaved. Am I the only one who sees it? ;-)

{noformat}
find . -name '*log4j*'
./log4j-over-slf4j-1.7.14.jar
{noformat}

> Shutdown hook in parquet-avro library corrupts data and disables logging
> 
>
> Key: PARQUET-408
> URL: https://issues.apache.org/jira/browse/PARQUET-408
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-avro
>Affects Versions: 1.8.1
>Reporter: Michal Turek
> Fix For: 1.9.0
>
> Attachments: parquet-broken-shutdown_2015-12-16.tar.gz
>
>
> Parquet-avro, and probably other Parquet libraries as well, is not well 
> behaved. It registers a shutdown hook that bypasses the application shutdown 
> sequence, corrupts data written to currently open Parquet file(s), and 
> disables or reconfigures the slf4j/logback logger so that no further log 
> messages are visible.
> h3. Scope
> Our application is a microservice that handles a stop request in the form of 
> a SIGTERM signal, i.e. a JVM shutdown hook. When it arrives, the application 
> closes all open files (writers), releases all other resources and shuts down 
> gracefully. We are switching from sequence files to Parquet at the moment and 
> use the Maven dependency {{org.apache.parquet:parquet-avro:1.8.1}}, which is 
> the current latest version. We are using 
> {{Runtime.getRuntime().addShutdownHook()}} to handle SIGTERM.
> h3. Example code
> See archive in attachment.
> - Optionally update version of {{hadoop-client}} in {{pom.xml}} to match your 
> Hadoop.
> - Use {{mvn package}} to compile.
> - Copy Hadoop configuration XMLs to {{config}} directory.
> - Update configuration at the top of {{ParquetBrokenShutdown}} class.
> - Execute {{ParquetBrokenShutdown}} class.
> - Send SIGTERM to shutdown the application ({{kill PID}}).
> h3. Initial analysis
> The Parquet library tries to take care of application shutdown itself, but 
> this introduces more issues than it solves. If the application is writing to 
> a file and the library asynchronously decides to close the underlying writer, 
> data loss will occur. The handle is simply closed and the remaining records 
> can't be written.
> {noformat}
> Writing to HDFS/Parquet failed
> java.io.IOException: can not write PageHeader(type:DICTIONARY_PAGE, 
> uncompressed_page_size:14, compressed_page_size:34, 
> dictionary_page_header:DictionaryPageHeader(num_values:1, encoding:PLAIN))
>   at org.apache.parquet.format.Util.write(Util.java:224)
>   at org.apache.parquet.format.Util.writePageHeader(Util.java:61)
>   at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.writeDictionaryPageHeader(ParquetMetadataConverter.java:760)
>   at 
> org.apache.parquet.hadoop.ParquetFileWriter.writeDictionaryPage(ParquetFileWriter.java:307)
>   at 
> org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:179)
>   at 
> org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:238)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:165)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
>   at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.writeParquetFile(ParquetBrokenShutdown.java:86)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.run(ParquetBrokenShutdown.java:53)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.main(ParquetBrokenShutdown.java:153)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: parquet.org.apache.thrift.transport.TTransportException: 
> 
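
For context, here is a minimal sketch of the application-managed shutdown the 
report argues for, where the application itself closes the writer before the 
JVM exits. The class and its record-draining loop are illustrative 
assumptions; only {{ParquetWriter.write()}} and {{ParquetWriter.close()}} are 
actual parquet-mr calls.

{code:java}
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

import org.apache.avro.generic.GenericRecord;
import org.apache.parquet.hadoop.ParquetWriter;

// Illustrative sketch: the application, not the library, owns writer shutdown.
public class GracefulParquetShutdown {
  private final ParquetWriter<GenericRecord> writer; // built elsewhere, e.g. via AvroParquetWriter
  private final CountDownLatch stopped = new CountDownLatch(1);
  private volatile boolean stopRequested = false;

  public GracefulParquetShutdown(ParquetWriter<GenericRecord> writer) {
    this.writer = writer;
    // SIGTERM triggers this hook; it only *requests* a stop and waits.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      stopRequested = true;
      try {
        stopped.await(); // block JVM exit until the writer is closed cleanly
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }));
  }

  public void run(Iterable<GenericRecord> records) throws IOException {
    try {
      for (GenericRecord record : records) {
        if (stopRequested) {
          break; // stop at a record boundary, never mid-write
        }
        writer.write(record);
      }
    } finally {
      writer.close();      // flush open row groups and write the Parquet footer
      stopped.countDown(); // only now may the shutdown hook let the JVM exit
    }
  }
}
{code}

The point is ordering: the hook only requests a stop and waits, so the writer 
is closed exactly once, at a record boundary, by the thread that owns it.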

[jira] [Commented] (PARQUET-41) Add bloom filters to parquet statistics

2016-02-15 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148130#comment-15148130
 ] 

Ferdinand Xu commented on PARQUET-41:
-

Hi [~rdblue],
I have a basic idea about how to estimate the expected number of entries 
required by the bloom filter.
AFAIK we can't get the row count for a row group before all of its data is 
flushed to disk, so we have to estimate the size in the following way.
For the first row group, we don't create bloom filter statistics at the 
beginning. Flushing the first row group gives us a general idea of the row 
count, and for the rest of the row groups we use this row count to create the 
bloom filter bit set.
We can make a small improvement to the strategy above. Since we know the size 
of the whole row group, we can calculate the expected entry count from the 
average size of the first 100 or 1000 rows. Due to the nature of a bloom 
filter, we need to store these initial items in a temporary buffer; once the 
bloom filter bit set is created, we flush the buffered data into the bit set 
and then drop the buffer.
One thing I want to highlight is that we don't need to know the *exact* row 
count; an estimated value is enough.
Any thoughts about the idea?
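
To make the buffering strategy concrete, below is a minimal Java sketch. It is 
illustrative only: the {{EstimatingBloomFilter}} class, its two hash probes and 
the 10-bits-per-entry sizing are assumptions for the example, not parquet-mr 
code.

{code:java}
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch: size a bloom filter without knowing the row count up front.
public class EstimatingBloomFilter<T> {
  private final long targetRowGroupBytes; // configured row group size in bytes
  private final int sampleSize;           // e.g. 100 or 1000 rows
  private final ToIntFunction<T> sizeOf;  // estimated serialized size of one row
  private final List<T> buffer = new ArrayList<>(); // tmp buffer while sampling
  private long bufferedBytes = 0;
  private BitSet bits;                    // allocated once we have an estimate
  private int numBits;

  public EstimatingBloomFilter(long targetRowGroupBytes, int sampleSize,
                               ToIntFunction<T> sizeOf) {
    this.targetRowGroupBytes = targetRowGroupBytes;
    this.sampleSize = sampleSize;
    this.sizeOf = sizeOf;
  }

  public void add(T value) {
    if (bits != null) {
      insert(value);
      return;
    }
    buffer.add(value);
    bufferedBytes += sizeOf.applyAsInt(value);
    if (buffer.size() >= sampleSize) {
      // Expected entries = row group size / average row size of the sample.
      long avgRowBytes = Math.max(1, bufferedBytes / buffer.size());
      long expectedEntries = targetRowGroupBytes / avgRowBytes;
      // ~10 bits per entry gives roughly a 1% false positive rate.
      numBits = (int) Math.min(Integer.MAX_VALUE, Math.max(64, expectedEntries * 10));
      bits = new BitSet(numBits);
      for (T buffered : buffer) {
        insert(buffered);                 // flush the tmp buffer into the bit set
      }
      buffer.clear();                     // then drop it
    }
  }

  private void insert(T value) {
    // Two simple probes; a real filter would use k independent hash functions.
    int h = value.hashCode();
    bits.set(Math.floorMod(h, numBits));
    bits.set(Math.floorMod(h * 31 + 17, numBits));
  }
}
{code}

Since a bloom filter degrades gracefully, an estimate that is off by a factor 
of two mainly shifts the false positive rate, which is why the exact row count 
is not needed.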


> Add bloom filters to parquet statistics
> ---
>
> Key: PARQUET-41
> URL: https://issues.apache.org/jira/browse/PARQUET-41
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-format, parquet-mr
>Reporter: Alex Levenson
>Assignee: Ferdinand Xu
>  Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: parquet-cpp first 0.1 release planning & timeline

2016-02-15 Thread Majeti, Deepak
This looks good to me as well.
With a little more effort towards testing, March 1 should be a doable target.

On 02/15/2016 07:11 PM, Julien Le Dem wrote:
> Looks good. I commented in the doc
>
> On Sun, Feb 14, 2016 at 9:18 AM, Aliaksei Sandryhaila 
> wrote:
>
>> The list looks reasonable. I've added PARQUET-530 to add support for LZO
>> compression.
>>
>> Not sure we'll be able to get everything done by March 1, but it's a good
>> target.
>>
>>
>>
>> On 02/13/2016 12:32 PM, Wes McKinney wrote:
>>
>>> Dear friends,
>>>
>>> I made a pass through the JIRAs and feature roadmap and listed out
>>> essential tasks for reaching a milestone that would merit a versioned
>>> code release, see:
>>>
>>>
>>> https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#
>>>
>>> I will be pressing for all of this to be completed with target date
>>> (for a parquet-cpp 0.1 RC) of March 1. I think with the current
>>> cadence and progress this should be achievable.
>>>
>>> It would be great if Aliaksei, Deepak, Julien, and Nong could briefly
>>> review the items and let me know if there are any blind spots / things
>>> we've missed?
>>>
>>> If we agree on the scope of the release and timeline, we should create
>>> ordered "tracks" of interrelated JIRAs as many of them depend on the
>>> others for functionality and unit test tools that will be created.
>>> This will reduce ambiguity about what needs doing next and reduce
>>> refactor churn / merge conflicts.
>>>
>>> Thank you
>>> Wes
>>>
>>
>


-- 
--
Deepak Majeti



[jira] [Resolved] (PARQUET-431) Make ParquetOutputFormat.memoryManager volatile

2016-02-15 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PARQUET-431.
---
Resolution: Fixed

Issue resolved by pull request 313
[https://github.com/apache/parquet-mr/pull/313]

> Make ParquetOutputFormat.memoryManager volatile
> ---
>
> Key: PARQUET-431
> URL: https://issues.apache.org/jira/browse/PARQUET-431
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
> Fix For: 1.9.0
>
>
> Currently ParquetOutputFormat.getRecordWriter() contains an unsynchronized 
> lazy initialization of the non-volatile static field *memoryManager*.
> Because the compiler or processor may reorder instructions, threads are not 
> guaranteed to see a completely initialized object when 
> ParquetOutputFormat.getRecordWriter() is called by multiple threads.
> This ticket proposes to make *memoryManager* volatile to correct the problem.
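
For illustration, here is a minimal sketch of the safe-publication idiom 
involved, with a simplified stand-in for {{MemoryManager}}. This is the 
textbook double-checked locking form rather than the exact patch; the key 
point is the volatile keyword on the field.

{code:java}
// Sketch of safe lazy initialization; MemoryManager here is a stand-in class,
// not the actual org.apache.parquet.hadoop.MemoryManager.
public class ParquetOutputFormatSketch {
  static class MemoryManager {
    MemoryManager(float poolRatio, long minAllocation) { /* ... */ }
  }

  // volatile ensures readers see a fully constructed object; without it,
  // instruction reordering may publish the reference before the constructor
  // has finished running.
  private static volatile MemoryManager memoryManager;

  static MemoryManager getMemoryManager(float poolRatio, long minAllocation) {
    if (memoryManager == null) {                      // fast path, no lock
      synchronized (ParquetOutputFormatSketch.class) {
        if (memoryManager == null) {                  // re-check under the lock
          memoryManager = new MemoryManager(poolRatio, minAllocation);
        }
      }
    }
    return memoryManager;
  }
}
{code}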



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PARQUET-430) Change to use Locale parameterized version of String.toUpperCase()/toLowerCase

2016-02-15 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PARQUET-430.
---
Resolution: Fixed

Issue resolved by pull request 312
[https://github.com/apache/parquet-mr/pull/312]

> Change to use Locale parameterized version of String.toUpperCase()/toLowerCase
> --
>
> Key: PARQUET-430
> URL: https://issues.apache.org/jira/browse/PARQUET-430
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
>Priority: Minor
> Fix For: 1.9.0
>
>
> A String is being converted to upper or lower case using the platform's 
> default locale. This may result in improper conversions when used with 
> international characters.
> For instance, "TITLE".toLowerCase() in a Turkish locale returns "tıtle", 
> where 'ı' -- without a dot -- is the LATIN SMALL LETTER DOTLESS I character. 
> To obtain correct results for locale-insensitive strings, we'd better use 
> toLowerCase(Locale.ENGLISH).
> For more information on this, please see:
> - 
> http://stackoverflow.com/questions/11063102/using-locales-with-javas-tolowercase-and-touppercase
> - 
> http://lotusnotus.com/lotusnotus_en.nsf/dx/dotless-i-tolowercase-and-touppercase-functions-use-responsibly.htm
> - http://java.sys-con.com/node/46241
> This ticket proposes to change our use of String.toUpperCase()/toLowerCase() 
> to String.toUpperCase(Locale.ENGLISH)/toLowerCase(Locale.ENGLISH).
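
For reference, a self-contained demonstration of the pitfall and the fix, 
using only standard JDK calls (the class name is ours):

{code:java}
import java.util.Locale;

public class DotlessIDemo {
  public static void main(String[] args) {
    // Under a Turkish locale, 'I' lowercases to the dotless 'ı' (U+0131).
    System.out.println("TITLE".toLowerCase(new Locale("tr"))); // prints "tıtle"
    // Locale-insensitive code should pin the locale explicitly.
    System.out.println("TITLE".toLowerCase(Locale.ENGLISH));   // prints "title"
  }
}
{code}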



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PARQUET-277) Remove boost dependency

2016-02-15 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved PARQUET-277.
--
   Resolution: Fixed
 Assignee: Wes McKinney
Fix Version/s: cpp-0.1

Resolved by https://github.com/apache/parquet-cpp/pull/49

> Remove boost dependency
> ---
>
> Key: PARQUET-277
> URL: https://issues.apache.org/jira/browse/PARQUET-277
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Hyunsik Choi
>Assignee: Wes McKinney
> Fix For: cpp-0.1
>
>
> At a glance, parquet-cpp makes only light use of boost. It seems possible to 
> remove the boost dependency if we use C++11 features.
> If we remove the boost dependency, parquet-cpp can be more portable and 
> lightweight. Also, C++11 would allow us to modernize the C++ code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PARQUET-534) Add Travis CI and codecov.io badges to README.md

2016-02-15 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-534:


 Summary: Add Travis CI and codecov.io badges to README.md
 Key: PARQUET-534
 URL: https://issues.apache.org/jira/browse/PARQUET-534
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-cpp
Reporter: Wes McKinney
Priority: Minor


This will give users finding parquet-cpp more confidence that it is a reliable 
piece of software. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: parquet-cpp first 0.1 release planning & timeline

2016-02-15 Thread Julien Le Dem
Looks good. I commented in the doc

On Sun, Feb 14, 2016 at 9:18 AM, Aliaksei Sandryhaila 
wrote:

> The list looks reasonable. I've added PARQUET-530 to add support for LZO
> compression.
>
> Not sure we'll be able to get everything done by March 1, but it's a good
> target.
>
>
>
> On 02/13/2016 12:32 PM, Wes McKinney wrote:
>
>> Dear friends,
>>
>> I made a pass through the JIRAs and feature roadmap and listed out
>> essential tasks for reaching a milestone that would merit a versioned
>> code release, see:
>>
>>
>> https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#
>>
>> I will be pressing for all of this to be completed with target date
>> (for a parquet-cpp 0.1 RC) of March 1. I think with the current
>> cadence and progress this should be achievable.
>>
>> It would be great if Aliaksei, Deepak, Julien, and Nong could briefly
>> review the items and let me know if there are any blind spots / things
>> we've missed?
>>
>> If we agree on the scope of the release and timeline, we should create
>> ordered "tracks" of interrelated JIRAs as many of them depend on the
>> others for functionality and unit test tools that will be created.
>> This will reduce ambiguity about what needs doing next and reduce
>> refactor churn / merge conflicts.
>>
>> Thank you
>> Wes
>>
>
>


-- 
Julien


[jira] [Resolved] (PARQUET-446) Hide thrift dependency in parquet-cpp

2016-02-15 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PARQUET-446.
---
   Resolution: Fixed
Fix Version/s: cpp-0.1

Issue resolved by pull request 49
[https://github.com/apache/parquet-cpp/pull/49]

> Hide thrift dependency in parquet-cpp
> -
>
> Key: PARQUET-446
> URL: https://issues.apache.org/jira/browse/PARQUET-446
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Nong Li
> Fix For: cpp-0.1
>
>
> Pulling in thrift-compiled headers tends to pull in a lot of things. It would 
> be nice to not expose them in the parquet library (the application should be 
> able to use a different version of thrift, etc.). 
> We can also see if it is practical to not depend on thrift at all and 
> replicate the logic we need. Thrift is fairly stable at this point, so this 
> might be feasible. This would allow us to do things like not rely on boost. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PARQUET-408) Shutdown hook in parquet-avro library corrupts data and disables logging

2016-02-15 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved PARQUET-408.
---
Resolution: Not A Problem

> Shutdown hook in parquet-avro library corrupts data and disables logging
> 
>
> Key: PARQUET-408
> URL: https://issues.apache.org/jira/browse/PARQUET-408
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-avro
>Affects Versions: 1.8.1
>Reporter: Michal Turek
> Fix For: 1.9.0
>
> Attachments: parquet-broken-shutdown_2015-12-16.tar.gz
>
>
> Parquet-avro, and probably other Parquet libraries as well, is not well 
> behaved. It registers a shutdown hook that bypasses the application shutdown 
> sequence, corrupts data written to currently open Parquet file(s), and 
> disables or reconfigures the slf4j/logback logger so that no further log 
> messages are visible.
> h3. Scope
> Our application is a microservice that handles a stop request in the form of 
> a SIGTERM signal, i.e. a JVM shutdown hook. When it arrives, the application 
> closes all open files (writers), releases all other resources and shuts down 
> gracefully. We are switching from sequence files to Parquet at the moment and 
> use the Maven dependency {{org.apache.parquet:parquet-avro:1.8.1}}, which is 
> the current latest version. We are using 
> {{Runtime.getRuntime().addShutdownHook()}} to handle SIGTERM.
> h3. Example code
> See archive in attachment.
> - Optionally update version of {{hadoop-client}} in {{pom.xml}} to match your 
> Hadoop.
> - Use {{mvn package}} to compile.
> - Copy Hadoop configuration XMLs to {{config}} directory.
> - Update configuration at the top of {{ParquetBrokenShutdown}} class.
> - Execute {{ParquetBrokenShutdown}} class.
> - Send SIGTERM to shutdown the application ({{kill PID}}).
> h3. Initial analysis
> The Parquet library tries to take care of application shutdown itself, but 
> this introduces more issues than it solves. If the application is writing to 
> a file and the library asynchronously decides to close the underlying writer, 
> data loss will occur. The handle is simply closed and the remaining records 
> can't be written.
> {noformat}
> Writing to HDFS/Parquet failed
> java.io.IOException: can not write PageHeader(type:DICTIONARY_PAGE, 
> uncompressed_page_size:14, compressed_page_size:34, 
> dictionary_page_header:DictionaryPageHeader(num_values:1, encoding:PLAIN))
>   at org.apache.parquet.format.Util.write(Util.java:224)
>   at org.apache.parquet.format.Util.writePageHeader(Util.java:61)
>   at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.writeDictionaryPageHeader(ParquetMetadataConverter.java:760)
>   at 
> org.apache.parquet.hadoop.ParquetFileWriter.writeDictionaryPage(ParquetFileWriter.java:307)
>   at 
> org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:179)
>   at 
> org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:238)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:165)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
>   at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.writeParquetFile(ParquetBrokenShutdown.java:86)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.run(ParquetBrokenShutdown.java:53)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.main(ParquetBrokenShutdown.java:153)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: parquet.org.apache.thrift.transport.TTransportException: 
> java.nio.channels.ClosedChannelException
>   at 
> parquet.org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
>   at 
> parquet.org.apache.thrift.transport.TTransport.write(TTransport.java:105)
>   at 
> parquet.org.apache.thrift.protocol.TCompactProtocol.writeByteDirect(TCompactProtocol.java:424)
>   at 
> parquet.org.apache.thrift.protocol.TCompactProtocol.writeByteDirect(TCompactProtocol.java:431)
>   at 
> parquet.org.apache.thrift.protocol.TCompactProtocol.writeFieldBeginInternal(TCompactProtocol.java:194)
>   at 
> parquet.org.apache.thrift.protocol.TCompactProtocol.writeFieldBegin(TCompactProtocol.java:176)
>   at 
> 

[jira] [Commented] (PARQUET-408) Shutdown hook in parquet-avro library corrupts data and disables logging

2016-02-15 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147589#comment-15147589
 ] 

Ryan Blue commented on PARQUET-408:
---

Thanks, Jim! I think that closes out what we needed to explain here. We'll 
follow up on the Parquet logging changes in PARQUET-412 and I'll close this 
issue. Feel free to re-open if I missed something.

> Shutdown hook in parquet-avro library corrupts data and disables logging
> 
>
> Key: PARQUET-408
> URL: https://issues.apache.org/jira/browse/PARQUET-408
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-avro
>Affects Versions: 1.8.1
>Reporter: Michal Turek
> Fix For: 1.9.0
>
> Attachments: parquet-broken-shutdown_2015-12-16.tar.gz
>
>
> Parquet-avro, and probably other Parquet libraries as well, is not well 
> behaved. It registers a shutdown hook that bypasses the application shutdown 
> sequence, corrupts data written to currently open Parquet file(s), and 
> disables or reconfigures the slf4j/logback logger so that no further log 
> messages are visible.
> h3. Scope
> Our application is a microservice that handles a stop request in the form of 
> a SIGTERM signal, i.e. a JVM shutdown hook. When it arrives, the application 
> closes all open files (writers), releases all other resources and shuts down 
> gracefully. We are switching from sequence files to Parquet at the moment and 
> use the Maven dependency {{org.apache.parquet:parquet-avro:1.8.1}}, which is 
> the current latest version. We are using 
> {{Runtime.getRuntime().addShutdownHook()}} to handle SIGTERM.
> h3. Example code
> See archive in attachment.
> - Optionally update version of {{hadoop-client}} in {{pom.xml}} to match your 
> Hadoop.
> - Use {{mvn package}} to compile.
> - Copy Hadoop configuration XMLs to {{config}} directory.
> - Update configuration at the top of {{ParquetBrokenShutdown}} class.
> - Execute {{ParquetBrokenShutdown}} class.
> - Send SIGTERM to shutdown the application ({{kill PID}}).
> h3. Initial analysis
> The Parquet library tries to take care of application shutdown itself, but 
> this introduces more issues than it solves. If the application is writing to 
> a file and the library asynchronously decides to close the underlying writer, 
> data loss will occur. The handle is simply closed and the remaining records 
> can't be written.
> {noformat}
> Writing to HDFS/Parquet failed
> java.io.IOException: can not write PageHeader(type:DICTIONARY_PAGE, 
> uncompressed_page_size:14, compressed_page_size:34, 
> dictionary_page_header:DictionaryPageHeader(num_values:1, encoding:PLAIN))
>   at org.apache.parquet.format.Util.write(Util.java:224)
>   at org.apache.parquet.format.Util.writePageHeader(Util.java:61)
>   at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.writeDictionaryPageHeader(ParquetMetadataConverter.java:760)
>   at 
> org.apache.parquet.hadoop.ParquetFileWriter.writeDictionaryPage(ParquetFileWriter.java:307)
>   at 
> org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:179)
>   at 
> org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:238)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:165)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
>   at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.writeParquetFile(ParquetBrokenShutdown.java:86)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.run(ParquetBrokenShutdown.java:53)
>   at 
> com.avast.bugreport.parquet.ParquetBrokenShutdown.main(ParquetBrokenShutdown.java:153)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: parquet.org.apache.thrift.transport.TTransportException: 
> java.nio.channels.ClosedChannelException
>   at 
> parquet.org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
>   at 
> parquet.org.apache.thrift.transport.TTransport.write(TTransport.java:105)
>   at 
> parquet.org.apache.thrift.protocol.TCompactProtocol.writeByteDirect(TCompactProtocol.java:424)
>   at 
> parquet.org.apache.thrift.protocol.TCompactProtocol.writeByteDirect(TCompactProtocol.java:431)
>   at 
>