Author: blue
Date: Thu Jan 19 03:04:40 2017
New Revision: 17889
Log:
Apache Parquet MR $version RC${rc}
Added:
dev/parquet/apache-parquet-1.8.2-rc1/
dev/parquet/apache-parquet-1.8.2-rc1/apache-parquet-1.8.2.tar.gz (with
props)
PARQUET-358: Add support for Avro's logical types API.
This adds support for Avro's logical types API to parquet-avro.
* The logical types API was introduced in Avro 1.8.0, so this bumps the Avro
dependency version to 1.8.0.
* Types supported are: decimal, date, time-millis, time-micros,
PARQUET-423: Replace old Log class with SLF4J Logging
And make writing files less noisy
Author: Niels Basjes
Closes #369 from nielsbasjes/PARQUET-423-2 and squashes the following commits:
b31e30f [Niels Basjes] Merge branch 'master' of github.com:apache/parquet-mr
into
PARQUET-484: Warn when Decimal is stored as INT64 while could be stored as INT32
Below is documented in
[LogicalTypes.md](https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#decimal):
> int32: for 1 <= precision <= 9
> int64: for 1 <= precision <= 18; precision < 10 will
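The precision limits quoted above follow from how many decimal digits fit in each signed integer width. A minimal sketch of that selection rule follows; the class and the helper `smallestPhysicalType` are hypothetical names for illustration and are not parquet-mr APIs.

```java
public class DecimalPhysicalType {
    // Hypothetical helper for illustration only; not part of parquet-mr.
    // Returns the narrowest physical type that can hold the unscaled
    // value of a decimal with the given precision.
    static String smallestPhysicalType(int precision) {
        if (precision < 1) {
            throw new IllegalArgumentException("precision must be >= 1");
        }
        if (precision <= 9) {
            return "INT32";  // 999_999_999 fits below Integer.MAX_VALUE
        }
        if (precision <= 18) {
            return "INT64";  // eighteen nines fit below Long.MAX_VALUE
        }
        return "BINARY";     // larger precisions need a byte-array type
    }

    public static void main(String[] args) {
        System.out.println(smallestPhysicalType(9));
        System.out.println(smallestPhysicalType(10));
    }
}
```

Under this rule, a decimal with precision below 10 that is stored as INT64 is legal but uses more space than needed, which is the situation the warning targets.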
PARQUET-660: Ignore extension fields in protobuf messages.
Currently, converting protobuf messages with extensions can result in an
uninformative error or data corruption. A more detailed explanation is in the
corresponding [jira](https://issues.apache.org/jira/browse/PARQUET-660).
This patch
PARQUET-431: Make ParquetOutputFormat.memoryManager volatile
Currently ParquetOutputFormat.getRecordWriter() contains an unsynchronized lazy
initialization of the non-volatile static field *memoryManager*.
Because the compiler or processor may reorder instructions, threads are not
guaranteed
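The hazard described in this entry can be sketched with a generic lazily initialized static field. The names `LazyHolder` and `Manager` are hypothetical stand-ins, not the actual `ParquetOutputFormat` code, and the double-checked locking shown is one common pattern rather than necessarily what the patch does beyond making the field volatile.

```java
public class LazyHolder {
    // Hypothetical stand-in for a lazily created shared object.
    static final class Manager {
        final long limit;
        Manager(long limit) { this.limit = limit; }
    }

    // volatile ensures that a thread observing a non-null reference
    // also observes the fully constructed object behind it.
    static volatile Manager manager;

    static Manager getManager() {
        Manager local = manager;            // one volatile read on the fast path
        if (local == null) {
            synchronized (LazyHolder.class) {
                local = manager;            // re-check under the lock
                if (local == null) {
                    local = new Manager(1024L);
                    manager = local;        // safe publication via volatile write
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(getManager() == getManager());
    }
}
```

Without `volatile`, another thread could in principle see a non-null `manager` before the constructor's writes become visible to it.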
PARQUET-783: Close the underlying stream when an H2SeekableInputStream is closed
This PR addresses https://issues.apache.org/jira/browse/PARQUET-783.
`ParquetFileReader` opens a `SeekableInputStream` to read a footer. In the
process, it opens a new `FSDataInputStream` and wraps it. However,
PARQUET-511: Integer overflow when counting values in column.
This commit fixes an issue when the number of entries in a column page is
larger than the size of an integer. No exception is thrown directly, but the
def level is set incorrectly, leading to a null value being returned during
read.
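The class of bug described here can be reproduced in isolation. The sketch below is not the parquet-mr code; it only shows, with hypothetical method names, how an `int` accumulator wraps silently once a count passes `Integer.MAX_VALUE`, while a `long` accumulator does not.

```java
public class ValueCountOverflow {
    // Sums page sizes into an int; wraps around without any exception.
    static int countWithInt(int[] pageSizes) {
        int total = 0;
        for (int size : pageSizes) {
            total += size;
        }
        return total;
    }

    // Same loop with a long accumulator; each int is widened before adding.
    static long countWithLong(int[] pageSizes) {
        long total = 0L;
        for (int size : pageSizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] pages = {Integer.MAX_VALUE, 1};
        System.out.println(countWithInt(pages));   // negative after wrap-around
        System.out.println(countWithLong(pages));  // 2147483648
    }
}
```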
PARQUET-580: Switch int[] initialization in IntList to be lazy
Noticed that for a dataset that we were trying to import that had a lot of
columns (few thousand) that weren't being used, we ended up allocating a lot of
unnecessary int arrays (each 64K in size). Heap footprint for all those
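The idea of lazy allocation can be sketched as follows. `LazyIntList` is a hypothetical, simplified class and not the real `IntList`; the slab size of 64K entries is an assumption based on the "64K in size" figure above, which may refer to bytes instead.

```java
public class LazyIntList {
    // Assumed slab size for illustration (64K entries).
    static final int SLAB_SIZE = 64 * 1024;

    private int[] currentSlab;  // stays null until the first add()
    private int position;
    int allocations;            // counter exposed only for this illustration

    void add(int value) {
        if (currentSlab == null) {
            // Allocate on first use, so unused columns cost nothing.
            currentSlab = new int[SLAB_SIZE];
            allocations++;
        }
        if (position == SLAB_SIZE) {
            // A real list would start another slab; this sketch stops here.
            throw new IllegalStateException("slab full");
        }
        currentSlab[position++] = value;
    }

    int size() {
        return position;
    }

    public static void main(String[] args) {
        LazyIntList unused = new LazyIntList();
        System.out.println(unused.allocations);  // no array allocated yet
    }
}
```

With eager allocation, a few thousand unused instances would each hold a full slab; with this approach they hold only a null reference.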
http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/8e2009b8/parquet-column/src/main/java/org/apache/parquet/io/RecordConsumerLoggingWrapper.java
--
diff --git
Repository: parquet-mr
Updated Tags: refs/tags/apache-parquet-1.8.2 [created] beaf00345
PARQUET-430: Change to use Locale parameterized version of
String.toUpperCase()/toLowerCase
A String is being converted to upper or lowercase, using the platform's default
encoding. This may result in improper conversions when used with international
characters.
For instance,
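One well-known case is the Turkish locale, where lowercase `i` uppercases to a dotted capital `İ` (U+0130) rather than `I`. The sketch below uses only `java.util.Locale` and `String.toUpperCase(Locale)`; the class and method names are hypothetical, and the input `"int32"` is just an illustrative identifier.

```java
import java.util.Locale;

public class LocaleCase {
    // Locale-independent conversion, suitable for identifiers and keywords.
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Conversion under Turkish rules, standing in for a platform whose
    // default locale happens to be Turkish.
    static String upperTurkish(String s) {
        return s.toUpperCase(Locale.forLanguageTag("tr-TR"));
    }

    public static void main(String[] args) {
        System.out.println(upperRoot("int32"));
        System.out.println(upperTurkish("int32"));
    }
}
```

A comparison such as `upperTurkish("int32").equals("INT32")` fails, which is why passing an explicit `Locale` makes the result independent of the machine the code runs on.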
PARQUET-685 - Deprecated ParquetInputSplit constructor passes paramet…
The problem was not discovered because the test was buggy. Updated both sides.
Author: Gabor Szadovszky
Closes #372 from gszadovszky/PARQUET-685 and squashes the following commits:
9cbeee2
PARQUET-528: Fix flush() for RecordConsumer and implementations
`flush()` was added in `RecordConsumer` and `MessageColumnIO` to help
implement null caching.
However, other `RecordConsumer` implementations should also implement
`flush()` properly. For instance,
PARQUET-654: Add option to disable record-level filtering.
This can be used by frameworks that use codegen for filtering to avoid
running filters within Parquet.
Author: Ryan Blue
Closes #353 from rdblue/PARQUET-654-add-record-level-filter-option and squashes
the following
PARQUET-642: Improve performance of ByteBuffer based read / write paths
While trying out the newest Parquet version, we noticed that the changes to
start using ByteBuffers:
https://github.com/apache/parquet-mr/commit/6b605a4ea05b66e1a6bf843353abcb4834a4ced8
and
PARQUET-651: Improve Avro's isElementType check.
The Avro implementation needs to check whether the read schema that is
passed by the user (or automatically converted from the file schema)
expects an extra 1-field layer to be returned, which matches the
previous behavior of Avro when reading a
PARQUET-726: Increase max difference of testMemoryManagerUpperLimit to 10%
Author: Niels Basjes
Closes #370 from nielsbasjes/PARQUET-726 and squashes the following commits:
f385ede [Niels Basjes] PARQUET-726: Increase max difference of
testMemoryManagerUpperLimit to 10%
http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/36e14294/parquet-avro/src/test/java/org/apache/parquet/avro/TestCircularReferences.java
--
diff --git
Repository: parquet-mr
Updated Branches:
refs/heads/parquet-1.8.x [created] c65227886
PARQUET-422: Fix a potential bug in MessageTypeParser where we ignore…
… and overwrite the initial value of a method parameter
In org.apache.parquet.schema.MessageTypeParser, for addGroupType() and
PARQUET-495: Fix mismatches in Types class comments
To produce
> required group User {
required int64 id;
**optional** binary email (UTF8);
}
we should do:
>
Types.requiredGroup()
.required(INT64).named("id")
.~~**required** (BINARY).as(UTF8).named("email")~~