svn commit: r17889 - in /dev/parquet/apache-parquet-1.8.2-rc1: ./ apache-parquet-1.8.2.tar.gz apache-parquet-1.8.2.tar.gz.asc apache-parquet-1.8.2.tar.gz.md5 apache-parquet-1.8.2.tar.gz.sha

2017-01-18 Thread blue
Author: blue Date: Thu Jan 19 03:04:40 2017 New Revision: 17889 Log: Apache Parquet MR $version RC${rc} Added: dev/parquet/apache-parquet-1.8.2-rc1/ dev/parquet/apache-parquet-1.8.2-rc1/apache-parquet-1.8.2.tar.gz (with props)

[31/50] [abbrv] parquet-mr git commit: PARQUET-358: Add support for Avro's logical types API.

2017-01-18 Thread blue
PARQUET-358: Add support for Avro's logical types API. This adds support for Avro's logical types API to parquet-avro. * The logical types API was introduced in Avro 1.8.0, so this bumps the Avro dependency version to 1.8.0. * Types supported are: decimal, date, time-millis, time-micros,
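The logical types listed above are annotations on Avro primitive values. The Avro 1.8 specification defines `date` as an int counting days since 1970-01-01, `time-millis` as an int counting milliseconds after midnight, and `time-micros` as a long counting microseconds after midnight. The sketch below illustrates only those encodings with `java.time`. The class `AvroLogicalEncodings` is hypothetical. It is not the parquet-avro or Avro conversion API.

```java
import java.time.LocalDate;
import java.time.LocalTime;

// Hypothetical helper showing the primitive encodings behind three Avro logical types.
final class AvroLogicalEncodings {
    // date: int number of days since the Unix epoch (1970-01-01); negative before it
    static int date(LocalDate d) {
        return Math.toIntExact(d.toEpochDay());
    }

    // time-millis: int milliseconds after midnight; sub-millisecond precision is truncated
    static int timeMillis(LocalTime t) {
        return (int) (t.toNanoOfDay() / 1_000_000L);
    }

    // time-micros: long microseconds after midnight; sub-microsecond precision is truncated
    static long timeMicros(LocalTime t) {
        return t.toNanoOfDay() / 1_000L;
    }
}
```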

[22/50] [abbrv] parquet-mr git commit: PARQUET-423: Replace old Log class with SLF4J Logging

2017-01-18 Thread blue
PARQUET-423: Replace old Log class with SLF4J Logging And make writing files less noisy Author: Niels Basjes Closes #369 from nielsbasjes/PARQUET-423-2 and squashes the following commits: b31e30f [Niels Basjes] Merge branch 'master' of github.com:apache/parquet-mr into

[09/50] [abbrv] parquet-mr git commit: PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32

2017-01-18 Thread blue
PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32 The following is documented in [LogicalTypes.md](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#decimal): > int32: for 1 <= precision <= 9 > int64: for 1 <= precision <= 18; precision < 10 will
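The quoted LogicalTypes.md rules tie a decimal's precision to the narrowest integer physical type that can hold it. Nine decimal digits (at most 999,999,999) fit in an int32, and eighteen digits fit in an int64. The helper below is a minimal sketch of that rule and of the condition the commit warns about, which is a decimal stored as INT64 when its precision is below 10. `DecimalStorage` and its methods are hypothetical names. They do not reproduce the actual parquet-mr check.

```java
// Hypothetical helper illustrating the LogicalTypes.md decimal rules; not parquet-mr code.
final class DecimalStorage {
    enum PhysicalType { INT32, INT64, FIXED_OR_BINARY }

    /** Returns the narrowest physical type able to hold a decimal of the given precision. */
    static PhysicalType narrowestType(int precision) {
        if (precision < 1) {
            throw new IllegalArgumentException("precision must be >= 1: " + precision);
        }
        if (precision <= 9) {
            return PhysicalType.INT32;   // up to 9 decimal digits fit in a 32-bit int
        }
        if (precision <= 18) {
            return PhysicalType.INT64;   // up to 18 decimal digits fit in a 64-bit long
        }
        return PhysicalType.FIXED_OR_BINARY; // larger precisions need fixed_len_byte_array or binary
    }

    /** True when a decimal of this precision stored as INT64 could have used INT32 instead. */
    static boolean shouldWarnForInt64(int precision) {
        return narrowestType(precision) == PhysicalType.INT32;
    }
}
```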

[26/50] [abbrv] parquet-mr git commit: PARQUET-660: Ignore extension fields in protobuf messages.

2017-01-18 Thread blue
PARQUET-660: Ignore extension fields in protobuf messages. Currently, converting protobuf messages with extensions can result in an uninformative error or in data corruption. A more detailed explanation is in the corresponding [jira](https://issues.apache.org/jira/browse/PARQUET-660). This patch

[10/50] [abbrv] parquet-mr git commit: PARQUET-431: Make ParquetOutputFormat.memoryManager volatile

2017-01-18 Thread blue
PARQUET-431: Make ParquetOutputFormat.memoryManager volatile Currently ParquetOutputFormat.getRecordWriter() contains an unsynchronized lazy initialization of the non-volatile static field *memoryManager*. Because the compiler or processor may reorder instructions, threads are not guaranteed
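Unsynchronized lazy initialization of a non-volatile static field can publish a reference before the object's constructor writes are visible to other threads. The general-purpose remedy is double-checked locking on a `volatile` field, sketched below. `SharedManager` and `ManagerHolder` are hypothetical stand-ins. This is not the actual PARQUET-431 patch, whose full diff is not shown in this preview.

```java
// Hypothetical stand-in for a shared, expensive-to-build object; not parquet-mr's MemoryManager.
final class SharedManager {
    final long poolSize;

    SharedManager(long poolSize) {
        this.poolSize = poolSize;
    }
}

final class ManagerHolder {
    // volatile: a thread that reads a non-null reference also sees the fully constructed object
    private static volatile SharedManager manager;

    static SharedManager get(long poolSize) {
        SharedManager local = manager;          // one volatile read on the fast path
        if (local == null) {
            synchronized (ManagerHolder.class) {
                local = manager;                // re-check while holding the lock
                if (local == null) {
                    local = new SharedManager(poolSize);
                    manager = local;            // volatile write publishes the instance safely
                }
            }
        }
        return local;
    }
}
```

The first caller's arguments win. Later calls return the same instance and ignore their own `poolSize`, matching the "initialize once" semantics of a static field.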

[47/50] [abbrv] parquet-mr git commit: PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed

2017-01-18 Thread blue
PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed This PR addresses https://issues.apache.org/jira/browse/PARQUET-783. `ParquetFileReader` opens a `SeekableInputStream` to read a footer. In the process, it opens a new `FSDataInputStream` and wraps it. However,
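The leak described here is a wrapper stream whose `close()` does not reach the stream it wraps. `java.io.InputStream.close()` is a no-op by default, so a wrapper must forward it explicitly. The sketch below shows that delegation with plain `java.io` types. `WrappingInputStream` is a hypothetical stand-in for an adapter such as `H2SeekableInputStream`, and `TrackingInputStream` exists only to make the close observable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper standing in for a SeekableInputStream adapter; not the parquet-mr class.
final class WrappingInputStream extends InputStream {
    private final InputStream wrapped;

    WrappingInputStream(InputStream wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public int read() throws IOException {
        return wrapped.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return wrapped.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // Without this override, InputStream.close() does nothing and the wrapped stream leaks.
        wrapped.close();
    }
}

// Test helper that records whether close() reached it.
final class TrackingInputStream extends ByteArrayInputStream {
    boolean closed;

    TrackingInputStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}
```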

[40/50] [abbrv] parquet-mr git commit: PARQUET-511: Integer overflow when counting values in column.

2017-01-18 Thread blue
PARQUET-511: Integer overflow when counting values in column. This commit fixes an issue when the number of entries in a column page exceeds the maximum value of an int. No exception is thrown directly, but the definition level is set incorrectly, leading to a null value being returned during read.
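Java `int` arithmetic wraps silently past `Integer.MAX_VALUE`, which explains why no exception appears and a wrong value propagates instead. The sketch below contrasts three ways of keeping a running count: a plain `int`, which wraps to a negative number; a `long`, which does not wrap at realistic counts; and `Math.addExact`, which fails loudly. `ValueCounter` is a hypothetical class. It does not reproduce the parquet-mr change.

```java
// Hypothetical illustration of why a value counter that can exceed Integer.MAX_VALUE must be a long.
final class ValueCounter {
    static int addAsInt(int count, int increment) {
        return count + increment;                 // silently wraps around on overflow
    }

    static long addAsLong(long count, int increment) {
        return count + increment;                 // computed in 64 bits, no wrap at int range
    }

    static int addExact(int count, int increment) {
        return Math.addExact(count, increment);   // throws ArithmeticException instead of wrapping
    }
}
```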

[13/50] [abbrv] parquet-mr git commit: PARQUET-580: Switch int[] initialization in IntList to be lazy

2017-01-18 Thread blue
PARQUET-580: Switch int[] initialization in IntList to be lazy We noticed that for a dataset we were trying to import, which had a lot of unused columns (a few thousand), we ended up allocating a lot of unnecessary int arrays (each 64K in size). Heap footprint for all those
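Lazy initialization moves the array allocation from construction to the first write. A list that is never written to then costs only an object header and a null reference. The sketch below shows the idea with a hypothetical `LazyIntList`. The 64K-int initial size mirrors the figure in the commit message but is an assumption. This sketch grows by copying into a larger array, which may differ from how parquet-mr's `IntList` actually grows.

```java
import java.util.Arrays;

// Hypothetical int list that defers allocating its backing array until the first add().
final class LazyIntList {
    private static final int INITIAL_SIZE = 64 * 1024; // assumed initial capacity, in ints
    private int[] values;   // stays null until first write, so unused columns allocate nothing
    private int size;

    void add(int value) {
        if (values == null) {
            values = new int[INITIAL_SIZE];                       // first write: allocate
        } else if (size == values.length) {
            values = Arrays.copyOf(values, values.length * 2);   // full: double capacity
        }
        values[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return values[index];
    }

    int size() {
        return size;
    }

    boolean isAllocated() {
        return values != null;
    }
}
```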

[21/50] [abbrv] parquet-mr git commit: PARQUET-423: Replace old Log class with SLF4J Logging

2017-01-18 Thread blue
http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/8e2009b8/parquet-column/src/main/java/org/apache/parquet/io/RecordConsumerLoggingWrapper.java -- diff --git

[parquet-mr] Git Push Summary

2017-01-18 Thread blue
Repository: parquet-mr Updated Tags: refs/tags/apache-parquet-1.8.2 [created] beaf00345

[11/50] [abbrv] parquet-mr git commit: PARQUET-430: Change to use Locale parameterized version of String.toUpperCase()/toLowerCase

2017-01-18 Thread blue
PARQUET-430: Change to use Locale parameterized version of String.toUpperCase()/toLowerCase A String is being converted to upper or lowercase using the platform's default locale. This may result in improper conversions when used with international characters. For instance,
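The classic failure case is the Turkish locale. The `String.toUpperCase` Javadoc shows that `"title"` becomes `"T\u0130TLE"` (dotted capital I) under locale `tr`, so code that upper-cases an identifier and compares it against `"TITLE"` breaks on such machines. Passing `Locale.ROOT` makes the result independent of the JVM's default locale. `CaseConversion` is a hypothetical helper, not the code touched by the commit.

```java
import java.util.Locale;

// Hypothetical helper contrasting locale-sensitive and locale-neutral case conversion.
final class CaseConversion {
    // Result depends on the given locale (e.g. Turkish dotted/dotless i).
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    // Locale.ROOT yields the same result on every machine; suitable for identifiers and keywords.
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```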

[39/50] [abbrv] parquet-mr git commit: PARQUET-685 - Deprecated ParquetInputSplit constructor passes paramet…

2017-01-18 Thread blue
PARQUET-685 - Deprecated ParquetInputSplit constructor passes paramet… The problem was not discovered because the test was buggy. Updated both sides. Author: Gabor Szadovszky Closes #372 from gszadovszky/PARQUET-685 and squashes the following commits: 9cbeee2

[15/50] [abbrv] parquet-mr git commit: PARQUET-528: Fix flush() for RecordConsumer and implementations

2017-01-18 Thread blue
PARQUET-528: Fix flush() for RecordConsumer and implementations `flush()` was added in `RecordConsumer` and `MessageColumnIO` to help implement null caching. However, other `RecordConsumer` implementations should also implement `flush()` properly. For instance,
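The general rule behind this fix is that a consumer which buffers or wraps another must, on `flush()`, push out what it holds and then forward `flush()` to its delegate. Otherwise buffered data in any downstream layer never drains. The sketch below uses a deliberately tiny, hypothetical `SimpleConsumer` interface instead of parquet-mr's `RecordConsumer`, which has many more methods. `BufferingConsumer` and `CollectingConsumer` are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified consumer interface; parquet-mr's RecordConsumer is much richer.
interface SimpleConsumer {
    void addValue(String value);

    default void flush() { }   // no-op unless an implementation holds buffered state
}

// Terminal consumer that records what reached it.
final class CollectingConsumer implements SimpleConsumer {
    final List<String> received = new ArrayList<>();

    @Override
    public void addValue(String value) {
        received.add(value);
    }
}

// Buffers values and, on flush(), drains them downstream and propagates the flush.
final class BufferingConsumer implements SimpleConsumer {
    private final SimpleConsumer delegate;
    private final List<String> buffer = new ArrayList<>();

    BufferingConsumer(SimpleConsumer delegate) {
        this.delegate = delegate;
    }

    @Override
    public void addValue(String value) {
        buffer.add(value);          // held back until flush()
    }

    @Override
    public void flush() {
        for (String v : buffer) {
            delegate.addValue(v);   // drain buffered values in order
        }
        buffer.clear();
        delegate.flush();           // forward so nested wrappers also drain
    }
}
```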

[32/50] [abbrv] parquet-mr git commit: PARQUET-654: Add option to disable record-level filtering.

2017-01-18 Thread blue
PARQUET-654: Add option to disable record-level filtering. This can be used by frameworks that use codegen for filtering to avoid running filters within Parquet. Author: Ryan Blue Closes #353 from rdblue/PARQUET-654-add-record-level-filter-option and squashes the following

[28/50] [abbrv] parquet-mr git commit: PARQUET-642: Improve performance of ByteBuffer based read / write paths

2017-01-18 Thread blue
PARQUET-642: Improve performance of ByteBuffer based read / write paths While trying out the newest Parquet version, we noticed that the changes to start using ByteBuffers: https://github.com/apache/parquet-mr/commit/6b605a4ea05b66e1a6bf843353abcb4834a4ced8 and

[35/50] [abbrv] parquet-mr git commit: PARQUET-651: Improve Avro's isElementType check.

2017-01-18 Thread blue
PARQUET-651: Improve Avro's isElementType check. The Avro implementation needs to check whether the read schema that is passed by the user (or automatically converted from the file schema) expects an extra 1-field layer to be returned, which matches the previous behavior of Avro when reading a

[23/50] [abbrv] parquet-mr git commit: PARQUET-726: Increase max difference of testMemoryManagerUpperLimit to 10%

2017-01-18 Thread blue
PARQUET-726: Increase max difference of testMemoryManagerUpperLimit to 10% Author: Niels Basjes Closes #370 from nielsbasjes/PARQUET-726 and squashes the following commits: f385ede [Niels Basjes] PARQUET-726: Increase max difference of testMemoryManagerUpperLimit to 10%

[30/50] [abbrv] parquet-mr git commit: PARQUET-358: Add support for Avro's logical types API.

2017-01-18 Thread blue
http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/36e14294/parquet-avro/src/test/java/org/apache/parquet/avro/TestCircularReferences.java -- diff --git

[01/50] [abbrv] parquet-mr git commit: PARQUET-422: Fix a potential bug in MessageTypeParser where we ignore…

2017-01-18 Thread blue
Repository: parquet-mr Updated Branches: refs/heads/parquet-1.8.x [created] c65227886 PARQUET-422: Fix a potential bug in MessageTypeParser where we ignore… … and overwrite the initial value of a method parameter In org.apache.parquet.schema.MessageTypeParser, for addGroupType() and

[04/50] [abbrv] parquet-mr git commit: PARQUET-495: Fix mismatches in Types class comments

2017-01-18 Thread blue
PARQUET-495: Fix mismatches in Types class comments To produce > required group User { required int64 id; **optional** binary email (UTF8); } we should do: > Types.requiredGroup() .required(INT64).named("id") .~~**required** (BINARY).as(UTF8).named("email")~~