Re: Date and time for the next Parquet sync

2018-04-18 Thread Zoltan Ivanfi
+1, thanks Lars!

On Wed, Apr 18, 2018 at 6:20 PM Lars Volker  wrote:

> Hi All,
>
> It has been 3 weeks since our last Parquet community sync and I think it
> would be great to have one next week. Last time we met on a Wednesday, so
> this time it should be Tuesday.
>
> I'd like to propose next Tuesday, April 24th, at 6pm CET / 9 am PST.
>
> Please speak up if that time does not work for you.
>
> Cheers, Lars
>


Date and time for the next Parquet sync

2018-04-18 Thread Lars Volker
Hi All,

It has been 3 weeks since our last Parquet community sync and I think it
would be great to have one next week. Last time we met on a Wednesday, so
this time it should be Tuesday.

I'd like to propose next Tuesday, April 24th, at 6pm CET / 9 am PST.

Please speak up if that time does not work for you.

Cheers, Lars


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442647#comment-16442647
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182451175
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/GroupType.java
 ##
 @@ -88,6 +88,7 @@ public GroupType(Repetition repetition, String name, 
OriginalType originalType,
* @param fields the contained fields
* @param id the id of the field
*/
+  @Deprecated
 
 Review comment:
   I think it is enough to deprecate the public API. If a method is not public, we
can freely modify or remove it.
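
To illustrate the convention, a minimal Java sketch (Example is a made-up class, not a parquet-mr type): only the public constructor is kept and deprecated, while the package-private one can simply change.

{code:java}
public class Example {
  /**
   * Part of the public API, so the old signature is kept and only marked deprecated.
   *
   * @deprecated use {@link #Example(String, int)} instead
   */
  @Deprecated
  public Example(String name) {
    this(name, 0);
  }

  // Package-private: external callers cannot reach this constructor,
  // so it can be modified or removed without a deprecation cycle.
  Example(String name, int id) {
  }
}
{code}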


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.
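
As a rough sketch of what writing both representations could look like with the Thrift-generated Java classes from parquet-format (a sketch only; the exact generated constructors and setter names are assumptions, and "event_time" is a made-up column):

{code:java}
import org.apache.parquet.format.ConvertedType;
import org.apache.parquet.format.LogicalType;
import org.apache.parquet.format.MicroSeconds;
import org.apache.parquet.format.SchemaElement;
import org.apache.parquet.format.TimeUnit;
import org.apache.parquet.format.TimestampType;

public class LogicalTypeSketch {
  public static void main(String[] args) {
    SchemaElement element = new SchemaElement("event_time");
    // New representation: a parametrized, UTC-normalized timestamp.
    element.setLogicalType(LogicalType.TIMESTAMP(
        new TimestampType(/* isAdjustedToUTC */ true, TimeUnit.MICROS(new MicroSeconds()))));
    // Old representation written alongside it, so readers that only know
    // converted_type keep working unchanged.
    element.setConverted_type(ConvertedType.TIMESTAMP_MICROS);
    System.out.println(element);
  }
}
{code}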



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442648#comment-16442648
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182458050
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/Type.java
 ##
 @@ -146,11 +146,18 @@ public Type(String name, Repetition repetition, 
OriginalType originalType) {
* @param repetition OPTIONAL, REPEATED, REQUIRED
* @param originalType (optional) the original type to help with cross 
schema conversion (LIST, MAP, ...)
* @param id (optional) the id of the fields.
+   *
+   * @deprecated use {@link #Type(String, Repetition, LogicalTypeAnnotation, 
ID)} instead
*/
+  @Deprecated
 
 Review comment:
   Not public, so no deprecation is required.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442643#comment-16442643
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182452787
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -436,13 +438,20 @@ public PrimitiveType(Repetition repetition, 
PrimitiveTypeName primitive,
* @param originalType (optional) the original type (MAP, DECIMAL, UTF8, ...)
* @param decimalMeta (optional) metadata about the decimal type
* @param id the id of the field
+   *
+   * @deprecated use {@link #PrimitiveType(Repetition, PrimitiveTypeName, int, 
String, LogicalTypeAnnotation, ID)} instead
 
 Review comment:
   See above.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442644#comment-16442644
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182452368
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -401,15 +400,18 @@ public PrimitiveType(Repetition repetition, 
PrimitiveTypeName primitive,
* @param name the name of the type
*/
   public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive, int 
length, String name) {
-this(repetition, primitive, length, name, null, null, null);
+this(repetition, primitive, length, name, (LogicalTypeAnnotation) null, 
null, null);
   }
 
   /**
* @param repetition OPTIONAL, REPEATED, REQUIRED
* @param primitive STRING, INT64, ...
* @param name the name of the type
* @param originalType (optional) the original type to help with cross 
schema convertion (LIST, MAP, ...)
+   *
+   * @deprecated use {@link #PrimitiveType(Repetition, PrimitiveTypeName, 
String, LogicalTypeAnnotation)} instead
 
 Review comment:
   The usual pattern in parquet for deprecating is to mention that the related
method will be removed in 2.0.0. In case the suggestion is to use one of the
overloaded methods, it is fine not to mention it.
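
A minimal sketch of the two Javadoc wordings described above (Example and its methods are made up for illustration):

{code:java}
public class Example {
  /**
   * Deprecated with no replacement: state when it will go away.
   *
   * @deprecated will be removed in 2.0.0
   */
  @Deprecated
  public void legacyMethod() {
  }

  /**
   * Deprecated in favor of an overload: pointing at the overload is enough.
   *
   * @deprecated use {@link #process(String, int)} instead
   */
  @Deprecated
  public void process(String name) {
    process(name, 0);
  }

  public void process(String name, int id) {
  }
}
{code}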


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442646#comment-16442646
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182456699
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -459,6 +468,37 @@ public PrimitiveType(Repetition repetition, 
PrimitiveTypeName primitive,
 this.columnOrder = requireValidColumnOrder(columnOrder);
   }
 
+  public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
 
 Review comment:
   I think in the long term we do not want to expose the construction of types
from the API. We would like the clients to use the builder instead (see the
sketch below). Therefore, I would suggest not adding public constructors if
your code does not need them.
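
For illustration, a small sketch of building a schema through the existing org.apache.parquet.schema.Types builder instead of the public constructors (the field names are made up):

{code:java}
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.OriginalType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class BuilderSketch {
  public static void main(String[] args) {
    // Construct types via the fluent builder rather than calling the
    // PrimitiveType/GroupType constructors directly.
    MessageType schema = Types.buildMessage()
        .required(PrimitiveTypeName.INT64)
            .as(OriginalType.TIMESTAMP_MILLIS)
            .named("event_time")
        .optional(PrimitiveTypeName.BINARY)
            .as(OriginalType.UTF8)
            .named("payload")
        .named("example_schema");
    System.out.println(schema);
  }
}
{code}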


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442645#comment-16442645
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182452892
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -436,13 +438,20 @@ public PrimitiveType(Repetition repetition, 
PrimitiveTypeName primitive,
* @param originalType (optional) the original type (MAP, DECIMAL, UTF8, ...)
* @param decimalMeta (optional) metadata about the decimal type
* @param id the id of the field
+   *
+   * @deprecated use {@link #PrimitiveType(Repetition, PrimitiveTypeName, int, 
String, LogicalTypeAnnotation, ID)} instead
*/
+  @Deprecated
   public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
int length, String name, OriginalType originalType,
DecimalMetadata decimalMeta, ID id) {
 this(repetition, primitive, length, name, originalType, decimalMeta, id, 
null);
   }
 
+  /**
+   * @deprecated use {@link #PrimitiveType(Repetition, PrimitiveTypeName, int, 
String, LogicalTypeAnnotation, ID, ColumnOrder)} instead
 
 Review comment:
   Not public, no need for deprecation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442641#comment-16442641
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182458122
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/Type.java
 ##
 @@ -146,11 +146,18 @@ public Type(String name, Repetition repetition, 
OriginalType originalType) {
* @param repetition OPTIONAL, REPEATED, REQUIRED
* @param originalType (optional) the original type to help with cross 
schema conversion (LIST, MAP, ...)
* @param id (optional) the id of the fields.
+   *
+   * @deprecated use {@link #Type(String, Repetition, LogicalTypeAnnotation, 
ID)} instead
*/
+  @Deprecated
   Type(String name, Repetition repetition, OriginalType originalType, ID id) {
 this(name, repetition, originalType, null, id);
   }
 
+  /**
+   * @deprecated use {@link #Type(String, Repetition, LogicalTypeAnnotation, 
ID)} instead
+   */
+  @Deprecated
 
 Review comment:
   see above


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PARQUET-1120) Parquet-mr project build fails with parquet format 2.3.2-SNAPSHOT

2018-04-18 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442497#comment-16442497
 ] 

Nandor Kollar edited comment on PARQUET-1120 at 4/18/18 1:34 PM:
-

I think this issue is already addressed in PARQUET-1143. [~zi] can we close 
this Jira as duplicate?


was (Author: nkollar):
I think this issue is already addressed in PARQUET-1143. [~zi] can we close 
this Jira with duplicate?

> Parquet-mr project build fails with parquet format 2.3.2-SNAPSHOT
> -
>
> Key: PARQUET-1120
> URL: https://issues.apache.org/jira/browse/PARQUET-1120
> Project: Parquet
>  Issue Type: Wish
>  Components: parquet-mr
>Affects Versions: 1.9.1
>Reporter: Krishnaprasad A S
>Priority: Minor
>
> The current parquet-mr (1.9.1-SNAPSHOT) project build fails with parquet-format
> 2.3.2-SNAPSHOT, which blocks incorporating the parquet-format related
> changes into the parquet-mr project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1120) Parquet-mr project build fails with parquet format 2.3.2-SNAPSHOT

2018-04-18 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442497#comment-16442497
 ] 

Nandor Kollar commented on PARQUET-1120:


I think this issue is already addressed in PARQUET-1143. [~zi] can we close 
this Jira with duplicate?

> Parquet-mr project build fails with parquet format 2.3.2-SNAPSHOT
> -
>
> Key: PARQUET-1120
> URL: https://issues.apache.org/jira/browse/PARQUET-1120
> Project: Parquet
>  Issue Type: Wish
>  Components: parquet-mr
>Affects Versions: 1.9.1
>Reporter: Krishnaprasad A S
>Priority: Minor
>
> The current parquet-mr (1.9.1-SNAPSHOT) project build fails with parquet-format
> 2.3.2-SNAPSHOT, which blocks incorporating the parquet-format related
> changes into the parquet-mr project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1171) [C++] Clarify valid uses for RLE, BIT_PACKED encodings

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1171.
-

> [C++] Clarify valid uses for RLE, BIT_PACKED encodings
> --
>
> Key: PARQUET-1171
> URL: https://issues.apache.org/jira/browse/PARQUET-1171
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Uwe L. Korn
>Assignee: Wes McKinney
>Priority: Major
> Fix For: format-2.5.0
>
>
> Currently we only support these encodings for levels but not for data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1258) Update scm developer connection to github

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1258.
-

> Update scm developer connection to github
> -
>
> Key: PARQUET-1258
> URL: https://issues.apache.org/jira/browse/PARQUET-1258
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format, parquet-mr
>Affects Versions: 1.10.0, format-2.5.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Minor
> Fix For: 1.10.0, format-2.5.0
>
>
> After moving to gitbox the old apache repo 
> (https://git-wip-us.apache.org/repos/asf/parquet-format.git) is not working 
> anymore. The pom.xml shall be updated accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1251) Clarify ambiguous min/max stats for FLOAT/DOUBLE

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1251.
-

> Clarify ambiguous min/max stats for FLOAT/DOUBLE
> 
>
> Key: PARQUET-1251
> URL: https://issues.apache.org/jira/browse/PARQUET-1251
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Affects Versions: format-2.4.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: format-2.5.0
>
>
> Describe the handling of the ambiguous min/max statistics for FLOAT/DOUBLE
> types in case of TypeDefinedOrder. (See PARQUET-1222 for details; a reader-side
> sketch of these rules follows the list below.)
> * When looking for NaN values, min and max should be ignored.
> * If the min is a NaN, it should be ignored.
> * If the max is a NaN, it should be ignored.
> * If the min is +0, the row group may contain -0 values as well.
> * If the max is -0, the row group may contain +0 values as well.
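
A reader-side sketch of these rules (an illustrative helper only, with made-up names; not parquet-mr API):

{code:java}
public final class FloatStatsRules {
  /** Min/max must not be used at all when the query is looking for NaN values. */
  public static boolean canUseMinMax(boolean predicateLooksForNaN) {
    return !predicateLooksForNaN;
  }

  /** Returns a usable lower bound, or null if the stored min must be ignored. */
  public static Double adjustedMin(double min) {
    if (Double.isNaN(min)) {
      return null;                   // a NaN min is ignored
    }
    if (Double.compare(min, +0.0d) == 0) {
      return -0.0d;                  // min of +0: the row group may also contain -0
    }
    return min;
  }

  /** Returns a usable upper bound, or null if the stored max must be ignored. */
  public static Double adjustedMax(double max) {
    if (Double.isNaN(max)) {
      return null;                   // a NaN max is ignored
    }
    if (Double.compare(max, -0.0d) == 0) {
      return +0.0d;                  // max of -0: the row group may also contain +0
    }
    return max;
  }
}
{code}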



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1236) Upgrade org.slf4j:slf4j-api:1.7.2 to 1.7.12

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1236.
-

> Upgrade org.slf4j:slf4j-api:1.7.2 to 1.7.12
> ---
>
> Key: PARQUET-1236
> URL: https://issues.apache.org/jira/browse/PARQUET-1236
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Affects Versions: format-2.5.0
>Reporter: PandaMonkey
>Assignee: PandaMonkey
>Priority: Major
> Fix For: format-2.5.0
>
> Attachments: parquet-format.txt
>
>
> Hi, I found two versions of the library org.slf4j:slf4j-api in your project. It
> would be nice to keep the versions consistent.
> Their introduction paths are:
>  # 
> org.apache.parquet:parquet-format:2.4.1-SNAPSHOT::null->org.slf4j:slf4j-api:1.7.2::compile
>  # 
> org.apache.parquet:parquet-format:2.4.1-SNAPSHOT::null->org.apache.thrift:libthrift:0.9.3::compile->org.slf4j:slf4j-api:1.7.12::compile
>  Thanks!
>  
> Regards,
>     Panda



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1065) Deprecate type-defined sort ordering for INT96 type

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1065.
-

> Deprecate type-defined sort ordering for INT96 type
> ---
>
> Key: PARQUET-1065
> URL: https://issues.apache.org/jira/browse/PARQUET-1065
> Project: Parquet
>  Issue Type: Bug
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
> Fix For: 1.10.0, format-2.5.0
>
>
> [parquet.thrift in 
> parquet-format|https://github.com/apache/parquet-format/blob/041708da1af52e7cb9288c331b542aa25b68a2b6/src/main/thrift/parquet.thrift#L37]
>  defines the sort order for INT96 to be signed. 
> [ParquetMetadataConverter.java in 
> parquet-mr|https://github.com/apache/parquet-mr/blob/352b906996f392030bfd53b93e3cf4adb78d1a55/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L422]
>  uses unsigned ordering instead. In practice, INT96 is only used for 
> timestamps and neither signed nor unsigned ordering of the numeric values is 
> correct for this purpose. For this reason, the INT96 sort order should be 
> specified as undefined.
> (As a special case, min == max signifies that all values are the same, and 
> can be considered valid even for undefined orderings.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1242) parquet.thrift refers to wrong releases for the new compressions

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1242.
-

> parquet.thrift refers to wrong releases for the new compressions
> 
>
> Key: PARQUET-1242
> URL: https://issues.apache.org/jira/browse/PARQUET-1242
> Project: Parquet
>  Issue Type: Bug
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
> Fix For: format-2.5.0
>
>
> parquet.thrift contains the following:
> {{/**}}
> {{ * Supported compression algorithms.}}
> {{ *}}
> {{ * Codecs added in {color:red}2.3.2{color} can be read by readers based on 
> {color:red}2.3.2{color} and later.}}
> {{ * Codec support may vary between readers based on the format version and}}
> {{ * libraries available at runtime. Gzip, Snappy, and LZ4 codecs are}}
> {{ * widely available, while Zstd and Brotli require additional libraries.}}
> {{ */}}
> {{enum CompressionCodec {}}
> {{  UNCOMPRESSED = 0;}}
> {{  SNAPPY = 1;}}
> {{  GZIP = 2;}}
> {{  LZO = 3;}}
> {{  BROTLI = 4; // Added in {color:red}2.3.2{color}}}
> {{  LZ4 = 5;// Added in {color:red}2.3.2{color}}}
> {{  ZSTD = 6;   // Added in {color:red}2.3.2{color}}}
> {{}}}
> In reality, there was no 2.3.2 release. These compression codecs were added 
> in version 2.4.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1145) Add license to .gitignore and .travis.yml

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1145.
-

> Add license to .gitignore and .travis.yml
> -
>
> Key: PARQUET-1145
> URL: https://issues.apache.org/jira/browse/PARQUET-1145
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: format-2.3.1, format-2.4.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Trivial
> Fix For: format-2.5.0
>
>
> The .gitignore file could have the ASF license. I'll post a PR momentarily.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-323) INT96 should be marked as deprecated

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-323.


> INT96 should be marked as deprecated
> 
>
> Key: PARQUET-323
> URL: https://issues.apache.org/jira/browse/PARQUET-323
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Cheng Lian
>Assignee: Lars Volker
>Priority: Major
> Fix For: format-2.5.0
>
>
> As discussed in the mailing list, {{INT96}} is only used to represent nanosec 
> timestamp in Impala for some historical reasons, and should be deprecated. 
> Since nanosec precision is rarely a real requirement, one possible and simple 
> solution would be replacing {{INT96}} with {{INT64 (TIMESTAMP_MILLIS)}} or 
> {{INT64 (TIMESTAMP_MICROS)}}.
> Several projects (Impala, Hive, Spark, ...) support INT96.
> We need a clear spec of the replacement and the path to deprecation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1234) Release Parquet format 2.5.0

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1234.
-

> Release Parquet format 2.5.0
> 
>
> Key: PARQUET-1234
> URL: https://issues.apache.org/jira/browse/PARQUET-1234
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Affects Versions: format-2.5.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: format-2.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1064) Deprecate type-defined sort ordering for INTERVAL type

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1064.
-

> Deprecate type-defined sort ordering for INTERVAL type
> --
>
> Key: PARQUET-1064
> URL: https://issues.apache.org/jira/browse/PARQUET-1064
> Project: Parquet
>  Issue Type: Bug
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Minor
> Fix For: format-2.5.0
>
>
> [LogicalTypes.md in 
> parquet-format|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]
>  defines the sort order for INTERVAL to be produced by sorting by the 
> value of months, then days, then milliseconds with unsigned comparison.
> According to these rules, 1d0h0s > 0d48h0s, which is counter-intuitive and 
> does not seem to have any practical uses. Unless somebody is aware of an 
> actual use-case in which this makes sense, I think the sort order should be 
> undefined instead. The [reference implementation in 
> parquet-mr|https://github.com/apache/parquet-mr/blob/352b906996f392030bfd53b93e3cf4adb78d1a55/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L459]
>  already considers the ordering to be unknown.
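
A tiny worked example of why this order is counter-intuitive (the comparator below is illustrative, not parquet-mr code):

{code:java}
public class IntervalOrderSketch {
  // An INTERVAL value is (months, days, millis); the type-defined order
  // compares the three fields in that sequence using unsigned comparison.
  static int typeDefinedCompare(int[] a, int[] b) {
    for (int i = 0; i < 3; i++) {
      int cmp = Integer.compareUnsigned(a[i], b[i]);
      if (cmp != 0) {
        return cmp;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    int[] oneDay = {0, 1, 0};                         // 1d0h0s
    int[] fortyEightHours = {0, 0, 48 * 3600 * 1000}; // 0d48h0s
    // Prints a positive number: "1 day" sorts above "48 hours" even though it is
    // the shorter wall-clock duration, which is why the order should be undefined.
    System.out.println(typeDefinedCompare(oneDay, fortyEightHours));
  }
}
{code}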



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1156) dev/merge_parquet_pr.py problems

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1156.
-

> dev/merge_parquet_pr.py problems
> 
>
> Key: PARQUET-1156
> URL: https://issues.apache.org/jira/browse/PARQUET-1156
> Project: Parquet
>  Issue Type: Bug
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
> Fix For: 1.10.0, format-2.5.0
>
>
> I have run into several issues while trying to run dev/merge_parquet_pr.py 
> according to the [instructions|https://parquet.apache.org/contribute/]:
> * The optional import {{jira.client}} is only checked for the 
> {{resolve_jira()}} call, but the script fails much earlier if {{jira.client}} 
> is not available in {{check_jira()}}, so a check should be added there as 
> well. In fact, the script shouldn't even ask for {{JIRA_USERNAME}} and 
> {{JIRA_PASSWORD}} if {{jira.client}} is not available.
> * I had to issue {{pip install jira}} instead of {{pip install jira-python}} 
> that was suggested by the script.
> * Once you have {{jira.client}} installed, the script still fails when 
> following the [instructions|https://parquet.apache.org/contribute/], because 
> the instructions specify a remote named {{github-apache}} but the script 
> tries to use {{apache-github}} instead. Either the instructions or the script 
> should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PARQUET-1197) Log rat failures

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky closed PARQUET-1197.
-

> Log rat failures
> 
>
> Key: PARQUET-1197
> URL: https://issues.apache.org/jira/browse/PARQUET-1197
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Minor
> Fix For: 1.10.0, format-2.5.0
>
>
> Currently, rat plugin does not log anything useful in case of failure (e.g. 
> found files without a license). The details can be found only in the 
> generated {{rat.txt}} file. This can make hard to find the problem in 
> environments where the build workspace is not easily accessible (e.g. 
> Jenkins).
> The rat plugin should be configured to log the erroneous files so accessing 
> {{rat.txt}} is not required to fix the issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1234) Release Parquet format 2.5.0

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky resolved PARQUET-1234.
---
Resolution: Fixed

> Release Parquet format 2.5.0
> 
>
> Key: PARQUET-1234
> URL: https://issues.apache.org/jira/browse/PARQUET-1234
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Affects Versions: format-2.5.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: format-2.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1234) Release Parquet format 2.5.0

2018-04-18 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky reassigned PARQUET-1234:
-

Assignee: Gabor Szadovszky

> Release Parquet format 2.5.0
> 
>
> Key: PARQUET-1234
> URL: https://issues.apache.org/jira/browse/PARQUET-1234
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Affects Versions: format-2.5.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: format-2.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[RESULT][VOTE] Release Apache Parquet Format 2.5.0 RC0

2018-04-18 Thread Gabor Szadovszky
Hi All,

For the vote for this parquet-format release we have seen:
4   "+1" votes
0   "0" votes
0   "-1" votes

With 4 binding “+1” votes this vote PASSES. We’ll release the artifacts and 
send an announcement.

Regards,
Gabor

> On 18 Apr 2018, at 12:38, Zoltan Ivanfi  wrote:
> 
> +1 (binding)
> 
> Checked sigs, built and tested.
> 
> On Tue, Apr 17, 2018 at 1:39 PM Gabor Szadovszky <
> gabor.szadovs...@cloudera.com> wrote:
> 
>> Hi everyone,
>> 
>> We reached the required 3 binding +1 votes. As there was no deadline
>> defined for this vote we’ll wait another 24 hours and close it tomorrow.
>> Thanks a lot for your efforts.
>> 
>> Regards,
>> Gabor
>> 
>>> On 16 Apr 2018, at 18:46, Daniel Weeks 
>> wrote:
>>> 
>>> +1 (binding)
>>> 
>>> Checked sigs, built and tested.
>>> 
>>> 
>>> On Thu, Apr 12, 2018 at 2:48 PM, Julien Le Dem 
>>> wrote:
>>> 
 +1 (binding)
 checked signature
 ran build and tests
 
 On Mon, Apr 9, 2018 at 8:44 AM, Ryan Blue 
 wrote:
 
> +1 (binding)
> 
> Checked this for the last vote.
> 
> On Mon, Apr 9, 2018 at 4:53 AM, Gabor Szadovszky <
> gabor.szadovs...@cloudera.com> wrote:
> 
>> Hi everyone,
>> 
>> Unfortunately, the previous vote has failed due to timeout. Now,
>> Zoltan
>> and I propose a new vote for the same RC to be released as official
> Apache
>> Parquet Format 2.5.0 release.
>> 
>> The commit id is f0fa7c14a4699581b41d8ba9aff1512663cc0fb4
>> * This corresponds to the tag: apache-parquet-format-2.5.0
>> * https://github.com/apache/parquet-format/tree/f0fa7c14a4699581b41d8ba9aff1512663cc0fb4
>> 
>> The release tarball, signature, and checksums are here:
>> * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.5.0-rc0/
>> 
>> You can find the KEYS file here:
>> * https://dist.apache.org/repos/dist/dev/parquet/KEYS
>> 
>> Binary artifacts are staged in Nexus here:
>> * https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/
>> 
>> This release includes important changes that I should have summarized
>> here, but I'm lazy.
>> See https://github.com/apache/parquet-format/blob/f0fa7c14a4699581b41d8ba9aff1512663cc0fb4/CHANGES.md for details.
>> 
>> Please download, verify, and test.
>> 
>> [ ] +1 Release this as Apache Parquet Format 2.5.0
>> [ ] +0
>> [ ] -1 Do not release this because…
> 
> 
> 
> 
> --
> Ryan Blue
> Software Engineer
> Netflix
> 
 
>> 
>> 



[jira] [Commented] (PARQUET-1272) [C++] ScanFileContents reports wrong row count for nested columns

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442295#comment-16442295
 ] 

ASF GitHub Bot commented on PARQUET-1272:
-

xhochy closed pull request #457: PARQUET-1272: Return correct row count for 
nested columns in ScanFileContents
URL: https://github.com/apache/parquet-cpp/pull/457
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/src/parquet/file_reader.cc b/src/parquet/file_reader.cc
index 983d2d0b..0632872c 100644
--- a/src/parquet/file_reader.cc
+++ b/src/parquet/file_reader.cc
@@ -347,9 +347,18 @@ int64_t ScanFileContents(std::vector<int> columns, const int32_t column_batch_si
 
       int64_t values_read = 0;
       while (col_reader->HasNext()) {
-        total_rows[col] +=
+        int64_t levels_read =
            ScanAllValues(column_batch_size, def_levels.data(), rep_levels.data(),
                          values.data(), &values_read, col_reader.get());
+        if (col_reader->descr()->max_repetition_level() > 0) {
+          for (int64_t i = 0; i < levels_read; i++) {
+            if (rep_levels[i] == 0) {
+              total_rows[col]++;
+            }
+          }
+        } else {
+          total_rows[col] += levels_read;
+        }
       }
       col++;
     }


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] ScanFileContents reports wrong row count for nested columns
> -
>
> Key: PARQUET-1272
> URL: https://issues.apache.org/jira/browse/PARQUET-1272
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: cpp-1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1272) [C++] ScanFileContents reports wrong row count for nested columns

2018-04-18 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved PARQUET-1272.
--
Resolution: Fixed

Issue resolved by pull request 457
[https://github.com/apache/parquet-cpp/pull/457]

> [C++] ScanFileContents reports wrong row count for nested columns
> -
>
> Key: PARQUET-1272
> URL: https://issues.apache.org/jira/browse/PARQUET-1272
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: cpp-1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1260) Add Zoltan Ivanfi's code signing key to the KEYS file

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442294#comment-16442294
 ] 

ASF GitHub Bot commented on PARQUET-1260:
-

zivanfi commented on issue #91: PARQUET-1260: Add Zoltan Ivanfi's code signing 
key to the KEYS file
URL: https://github.com/apache/parquet-format/pull/91#issuecomment-382349610
 
 
   When I added my key to the KEYS files, they had already diverged at that 
point and I couldn't figure out which one should "win", so I just added my key 
to both of them, but I didn't cause any further inconsistencies (I believe).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Zoltan Ivanfi's code signing key to the KEYS file
> -
>
> Key: PARQUET-1260
> URL: https://issues.apache.org/jira/browse/PARQUET-1260
> Project: Parquet
>  Issue Type: Task
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>
> To make a release, I would need to have my gpg key added to the KEYS file. I 
> can add it to the repos in a commit, but I don't know how to update 
> [https://dist.apache.org/repos/dist/dev/parquet/KEYS]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1274) [Python] SegFault in pyarrow.parquet.write_table with specific options

2018-04-18 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved PARQUET-1274.
--
Resolution: Fixed

Issue resolved by pull request 456
[https://github.com/apache/parquet-cpp/pull/456]

> [Python] SegFault in pyarrow.parquet.write_table with specific options
> --
>
> Key: PARQUET-1274
> URL: https://issues.apache.org/jira/browse/PARQUET-1274
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: tested on MacOS High Sierra with python 3.6 and Ubuntu 
> Xenial (Python 3.5)
>Reporter: Clément Bouscasse
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
> Fix For: cpp-1.5.0
>
>
> I originally filed an issue in the pandas project but we've tracked it down 
> to arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> basically using
> {code:java}
>  df.to_parquet('filename.parquet', flavor='spark'){code}
> gives a seg fault if `df` contains a datetime column.
> Under the covers,  pandas translates this to the following call:
> {code:java}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy', 
> coerce_timestamps='ms')
> {code}
> which gives me an instant crash.
> There is a repro on the github ticket.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1274) [Python] SegFault in pyarrow.parquet.write_table with specific options

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442274#comment-16442274
 ] 

ASF GitHub Bot commented on PARQUET-1274:
-

xhochy commented on issue #456: PARQUET-1274: Prevent segfault that was 
occurring when writing a nanosecond timestamp with arrow writer properties set 
to coerce timestamps and support deprecated int96 timestamps.
URL: https://github.com/apache/parquet-cpp/pull/456#issuecomment-382347711
 
 
   Thanks @joshuastorck 
   
   I moved the ARROW issue over to the PARQUET tracker and also gave you karma 
to assign issues to yourself there, too.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] SegFault in pyarrow.parquet.write_table with specific options
> --
>
> Key: PARQUET-1274
> URL: https://issues.apache.org/jira/browse/PARQUET-1274
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: tested on MacOS High Sierra with python 3.6 and Ubuntu 
> Xenial (Python 3.5)
>Reporter: Clément Bouscasse
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
> Fix For: cpp-1.5.0
>
>
> I originally filed an issue in the pandas project but we've tracked it down 
> to arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> basically using
> {code:java}
>  df.to_parquet('filename.parquet', flavor='spark'){code}
> gives a seg fault if `df` contains a datetime column.
> Under the covers,  pandas translates this to the following call:
> {code:java}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy', 
> coerce_timestamps='ms')
> {code}
> which gives me an instant crash.
> There is a repro on the github ticket.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1274) [Python] SegFault in pyarrow.parquet.write_table with specific options

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442273#comment-16442273
 ] 

ASF GitHub Bot commented on PARQUET-1274:
-

xhochy closed pull request #456: PARQUET-1274: Prevent segfault that was 
occurring when writing a nanosecond timestamp with arrow writer properties set 
to coerce timestamps and support deprecated int96 timestamps.
URL: https://github.com/apache/parquet-cpp/pull/456
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/src/parquet/arrow/arrow-reader-writer-test.cc b/src/parquet/arrow/arrow-reader-writer-test.cc
index 79a393f6..bf6f3022 100644
--- a/src/parquet/arrow/arrow-reader-writer-test.cc
+++ b/src/parquet/arrow/arrow-reader-writer-test.cc
@@ -1403,6 +1403,64 @@ TEST(TestArrowReadWrite, ConvertedDateTimeTypes) {
   AssertTablesEqual(*ex_table, *result);
 }
 
+// Regression for ARROW-2802
+TEST(TestArrowReadWrite, CoerceTimestampsAndSupportDeprecatedInt96) {
+  using ::arrow::Column;
+  using ::arrow::Field;
+  using ::arrow::Schema;
+  using ::arrow::Table;
+  using ::arrow::TimeUnit;
+  using ::arrow::TimestampType;
+  using ::arrow::TimestampBuilder;
+  using ::arrow::default_memory_pool;
+
+  auto timestamp_type = std::make_shared<TimestampType>(TimeUnit::NANO);
+
+  TimestampBuilder builder(timestamp_type, default_memory_pool());
+  for (std::int64_t ii = 0; ii < 10; ++ii) {
+    ASSERT_OK(builder.Append(10L * ii));
+  }
+  std::shared_ptr<Array> values;
+  ASSERT_OK(builder.Finish(&values));
+
+  std::vector<std::shared_ptr<Field>> fields;
+  auto field = std::make_shared<Field>("nanos", timestamp_type);
+  fields.emplace_back(field);
+
+  auto schema = std::make_shared<Schema>(fields);
+
+  std::vector<std::shared_ptr<Column>> columns;
+  auto column = std::make_shared<Column>("nanos", values);
+  columns.emplace_back(column);
+
+  auto table = Table::Make(schema, columns);
+
+  auto arrow_writer_properties = ArrowWriterProperties::Builder()
+                                     .coerce_timestamps(TimeUnit::MICRO)
+                                     ->enable_deprecated_int96_timestamps()
+                                     ->build();
+
+  std::shared_ptr<Table> result;
+  DoSimpleRoundtrip(table, 1, table->num_rows(), {}, &result, arrow_writer_properties);
+
+  ASSERT_EQ(table->num_columns(), result->num_columns());
+  ASSERT_EQ(table->num_rows(), result->num_rows());
+
+  auto actual_column = result->column(0);
+  auto data = actual_column->data();
+  auto expected_values =
+      static_cast<::arrow::NumericArray<TimestampType>*>(values.get())->raw_values();
+  for (int ii = 0; ii < data->num_chunks(); ++ii) {
+    auto chunk =
+        static_cast<::arrow::NumericArray<TimestampType>*>(data->chunk(ii).get());
+    auto values = chunk->raw_values();
+    for (int64_t jj = 0; jj < chunk->length(); ++jj, ++expected_values) {
+      // Check that the nanos have been converted to micros
+      ASSERT_EQ(*expected_values / 1000, values[jj]);
+    }
+  }
+}
+
 void MakeDoubleTable(int num_columns, int num_rows, int nchunks,
                      std::shared_ptr<Table>* out) {
   std::shared_ptr<::arrow::Column> column;
diff --git a/src/parquet/arrow/writer.cc b/src/parquet/arrow/writer.cc
index 5040e0cc..9eca41ec 100644
--- a/src/parquet/arrow/writer.cc
+++ b/src/parquet/arrow/writer.cc
@@ -595,7 +595,14 @@ Status ArrowColumnWriter::WriteTimestamps(const Array& values, int64_t num_level
 
   const bool is_nanosecond = type.unit() == TimeUnit::NANO;
 
-  if (is_nanosecond && ctx_->properties->support_deprecated_int96_timestamps()) {
+  // In the case where support_deprecated_int96_timestamps was specified
+  // and coerce_timestamps_enabled was specified, a nanosecond column
+  // will have a physical type of int64. In that case, we fall through
+  // to the else if below.
+  //
+  // See https://issues.apache.org/jira/browse/ARROW-2082
+  if (is_nanosecond && ctx_->properties->support_deprecated_int96_timestamps() &&
+      !ctx_->properties->coerce_timestamps_enabled()) {
     return TypedWriteBatch<Int96Type, ::arrow::TimestampType>(values, num_levels,
                                                               def_levels, rep_levels);
   } else if (is_nanosecond ||


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] SegFault in pyarrow.parquet.write_table with specific options
> --
>
> Key: PARQUET-1274
> URL: 

[jira] [Assigned] (PARQUET-1274) [Python] SegFault in pyarrow.parquet.write_table with specific options

2018-04-18 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned PARQUET-1274:


Assignee: Joshua Storck

> [Python] SegFault in pyarrow.parquet.write_table with specific options
> --
>
> Key: PARQUET-1274
> URL: https://issues.apache.org/jira/browse/PARQUET-1274
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: tested on MacOS High Sierra with python 3.6 and Ubuntu 
> Xenial (Python 3.5)
>Reporter: Clément Bouscasse
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
> Fix For: cpp-1.5.0
>
>
> I originally filed an issue in the pandas project but we've tracked it down 
> to arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> basically using
> {code:java}
>  df.to_parquet('filename.parquet', flavor='spark'){code}
> gives a seg fault if `df` contains a datetime column.
> Under the covers,  pandas translates this to the following call:
> {code:java}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy', 
> coerce_timestamps='ms')
> {code}
> which gives me an instant crash.
> There is a repro on the github ticket.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Moved] (PARQUET-1274) [Python] SegFault in pyarrow.parquet.write_table with specific options

2018-04-18 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn moved ARROW-2082 to PARQUET-1274:
-

Fix Version/s: (was: 0.10.0)
   cpp-1.5.0
Affects Version/s: (was: 0.8.0)
  Component/s: (was: Python)
   parquet-cpp
 Workflow: patch-available, re-open possible  (was: jira)
  Key: PARQUET-1274  (was: ARROW-2082)
  Project: Parquet  (was: Apache Arrow)

> [Python] SegFault in pyarrow.parquet.write_table with specific options
> --
>
> Key: PARQUET-1274
> URL: https://issues.apache.org/jira/browse/PARQUET-1274
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: tested on MacOS High Sierra with python 3.6 and Ubuntu 
> Xenial (Python 3.5)
>Reporter: Clément Bouscasse
>Priority: Major
>  Labels: pull-request-available
> Fix For: cpp-1.5.0
>
>
> I originally filed an issue in the pandas project but we've tracked it down 
> to arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> basically using
> {code:java}
>  df.to_parquet('filename.parquet', flavor='spark'){code}
> gives a seg fault if `df` contains a datetime column.
> Under the covers,  pandas translates this to the following call:
> {code:java}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy', 
> coerce_timestamps='ms')
> {code}
> which gives me an instant crash.
> There is a repro on the github ticket.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Release Apache Parquet Format 2.5.0 RC0

2018-04-18 Thread Zoltan Ivanfi
+1 (binding)

Checked sigs, built and tested.

On Tue, Apr 17, 2018 at 1:39 PM Gabor Szadovszky <
gabor.szadovs...@cloudera.com> wrote:

> Hi everyone,
>
> We reached the required 3 binding +1 votes. As there was no deadline
> defined for this vote we’ll wait another 24 hours and close it tomorrow.
> Thanks a lot for your efforts.
>
> Regards,
> Gabor
>
> > On 16 Apr 2018, at 18:46, Daniel Weeks 
> wrote:
> >
> > +1 (binding)
> >
> > Checked sigs, built and tested.
> >
> >
> > On Thu, Apr 12, 2018 at 2:48 PM, Julien Le Dem 
> > wrote:
> >
> >> +1 (binding)
> >> checked signature
> >> ran build and tests
> >>
> >> On Mon, Apr 9, 2018 at 8:44 AM, Ryan Blue 
> >> wrote:
> >>
> >>> +1 (binding)
> >>>
> >>> Checked this for the last vote.
> >>>
> >>> On Mon, Apr 9, 2018 at 4:53 AM, Gabor Szadovszky <
> >>> gabor.szadovs...@cloudera.com> wrote:
> >>>
>  Hi everyone,
> 
>  Unfortunately, the previous vote has failed due to timeout. Now,
> Zoltan
>  and I propose a new vote for the same RC to be released as official
> >>> Apache
>  Parquet Format 2.5.0 release.
> 
>  The commit id is f0fa7c14a4699581b41d8ba9aff1512663cc0fb4
>  * This corresponds to the tag: apache-parquet-format-2.5.0
>  * https://github.com/apache/parquet-format/tree/f0fa7c14a4699581b41d8ba9aff1512663cc0fb4
> 
>  The release tarball, signature, and checksums are here:
>  * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.5.0-rc0/
> 
>  You can find the KEYS file here:
>  * https://dist.apache.org/repos/dist/dev/parquet/KEYS
> 
>  Binary artifacts are staged in Nexus here:
>  * https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/
> 
>  This release includes important changes that I should have summarized
>  here, but I'm lazy.
>  See https://github.com/apache/parquet-format/blob/f0fa7c14a4699581b41d8ba9aff1512663cc0fb4/CHANGES.md for details.
> 
>  Please download, verify, and test.
> 
>  [ ] +1 Release this as Apache Parquet Format 2.5.0
>  [ ] +0
>  [ ] -1 Do not release this because…
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Ryan Blue
> >>> Software Engineer
> >>> Netflix
> >>>
> >>
>
>


[jira] [Commented] (PARQUET-1179) [C++] Support Apache Thrift 0.11

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442180#comment-16442180
 ] 

ASF GitHub Bot commented on PARQUET-1179:
-

xhochy commented on issue #433: PARQUET-1179: Upgrade to Thrift 0.11, use 
std::shared_ptr instead of boost::shared_ptr
URL: https://github.com/apache/parquet-cpp/pull/433#issuecomment-382324782
 
 
   @thamht4190 You need to do `export PATH=/usr/local/opt/bison/bin:$PATH` and 
then a fresh build to get rid of the error. Alternatively install Thrift via 
homebrew or conda and use that instead of the one that is built by parquet-cpp.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Support Apache Thrift 0.11
> 
>
> Key: PARQUET-1179
> URL: https://issues.apache.org/jira/browse/PARQUET-1179
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: OSX 10.13.2
> Apple Clang
> {code:java}
> Apple LLVM version 9.0.0 (clang-900.0.39.2)
> Target: x86_64-apple-darwin17.3.0
> Thread model: posix
> InstalledDir: 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> {code}
>Reporter: Stephen Carman
>Assignee: Wes McKinney
>Priority: Major
> Fix For: cpp-1.4.0
>
>
> I am not sure if this is an OSX specific issue or something with a new 
> version of Boost, but parquet does not seem to build with the current setup.
> {code:java}
> In file included from 
> /Users/steve_carman/software/parquet-cpp/src/parquet/schema.cc:28:
> /Users/steve_carman/software/parquet-cpp/src/parquet/thrift.h:105:34: error: 
> no viable conversion from 
> 'boost::shared_ptr' to 
> 'stdcxx::shared_ptr'
>   tproto_factory.getProtocol(tmem_transport);
>  ^~
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3900:23:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 'nullptr_t' 
> for 1st argument
> _LIBCPP_CONSTEXPR shared_ptr(nullptr_t) _NOEXCEPT;
>   ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3914:5:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 'const 
> std::__1::shared_ptr &' for 1st 
> argument
> shared_ptr(const shared_ptr& __r) _NOEXCEPT;
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3922:5:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 
> 'std::__1::shared_ptr &&' for 1st 
> argument
> shared_ptr(shared_ptr&& __r) _NOEXCEPT;
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3917:9:
>  note: candidate template ignored: could not match 'std::__1::shared_ptr' 
> against 'boost::shared_ptr'
> shared_ptr(const shared_ptr<_Yp>& __r,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3923:52:
>  note: candidate template ignored: could not match 'std::__1::shared_ptr' 
> against 'boost::shared_ptr'
> template _LIBCPP_INLINE_VISIBILITY  
> shared_ptr(shared_ptr<_Yp>&& __r,
>^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3931:9:
>  note: candidate template ignored: could not match 'auto_ptr' against 
> 'shared_ptr'
> shared_ptr(auto_ptr<_Yp>&& __r,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3940:9:
>  note: candidate template ignored: could not match 'unique_ptr' against 
> 'shared_ptr'
> shared_ptr(unique_ptr<_Yp, _Dp>&&,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3949:9:
>  note: candidate template ignored: could not match 'unique_ptr' against 
> 'shared_ptr'
> shared_ptr(unique_ptr<_Yp, _Dp>&&,
> ^
> /usr/local/include/thrift/protocol/TCompactProtocol.h:242:76: note: passing 
> argument to parameter 'trans' here
>   stdcxx::shared_ptr getProtocol(stdcxx::shared_ptr 
> trans) {
>^
> In file included from 
> /Users/steve_carman/software/parquet-cpp/src/parquet/schema.cc:28:
> 

[jira] [Resolved] (PARQUET-1273) [Python] Error writing to partitioned Parquet dataset

2018-04-18 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved PARQUET-1273.
--
Resolution: Fixed

Issue resolved by pull request 453
[https://github.com/apache/parquet-cpp/pull/453]

> [Python] Error writing to partitioned Parquet dataset
> -
>
> Key: PARQUET-1273
> URL: https://issues.apache.org/jira/browse/PARQUET-1273
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: Linux (Ubuntu 16.04)
>Reporter: Robert Dailey
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
> Fix For: cpp-1.5.0
>
> Attachments: ARROW-1938-test-data.csv.gz, ARROW-1938.py, 
> pyarrow_dataset_error.png
>
>
> I receive the following error after upgrading to pyarrow 0.8.0 when writing 
> to a dataset:
> * ArrowIOError: Column 3 had 187374 while previous column had 1
> The command was:
> write_table_values = {'row_group_size': 1}
> pq.write_to_dataset(pa.Table.from_pandas(df, preserve_index=True), 
> '/logs/parsed/test', partition_cols=['Product', 'year', 'month', 'day', 
> 'hour'], **write_table_values)
> I've also tried write_table_values = {'chunk_size': 1} and received the 
> same error.
> This same command works in version 0.7.1.  I am trying to troubleshoot the 
> problem but wanted to submit a ticket.
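
For reference, here is a minimal pyarrow sketch of the call pattern described in the report above. The DataFrame contents, the "category" and "value" columns, the output path, and the row_group_size value are hypothetical stand-ins (the reporter's actual data is not available); only the partition columns and the from_pandas/write_to_dataset shape mirror the report. A Categorical column is included because the fix merged in pull request 453 concerns dictionary-encoded columns, which is what a pandas Categorical becomes in Arrow.

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical stand-in data; only the call shape follows the report.
df = pd.DataFrame({
    "Product": ["A", "A", "B"],
    "year": [2018, 2018, 2018],
    "month": [4, 4, 4],
    "day": [18, 18, 18],
    "hour": [9, 10, 11],
    "category": pd.Categorical(["x", "y", "x"]),  # becomes an Arrow dictionary column
    "value": [1.0, 2.0, 3.0],
})

# Extra keyword arguments are forwarded to write_table for each file that
# write_to_dataset produces; row_group_size=1 is an arbitrary small value here.
write_table_values = {"row_group_size": 1}
pq.write_to_dataset(
    pa.Table.from_pandas(df, preserve_index=True),
    "/tmp/parsed_test",  # hypothetical path standing in for /logs/parsed/test
    partition_cols=["Product", "year", "month", "day", "hour"],
    **write_table_values,
)
```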



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1273) [Python] Error writing to partitioned Parquet dataset

2018-04-18 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned PARQUET-1273:


Assignee: Joshua Storck

> [Python] Error writing to partitioned Parquet dataset
> -
>
> Key: PARQUET-1273
> URL: https://issues.apache.org/jira/browse/PARQUET-1273
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: Linux (Ubuntu 16.04)
>Reporter: Robert Dailey
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
> Fix For: cpp-1.5.0
>
> Attachments: ARROW-1938-test-data.csv.gz, ARROW-1938.py, 
> pyarrow_dataset_error.png
>
>
> I receive the following error after upgrading to pyarrow 0.8.0 when writing 
> to a dataset:
> * ArrowIOError: Column 3 had 187374 while previous column had 1
> The command was:
> write_table_values = {'row_group_size': 1}
> pq.write_to_dataset(pa.Table.from_pandas(df, preserve_index=True), 
> '/logs/parsed/test', partition_cols=['Product', 'year', 'month', 'day', 
> 'hour'], **write_table_values)
> I've also tried write_table_values = {'chunk_size': 1} and received the 
> same error.
> This same command works in version 0.7.1.  I am trying to troubleshoot the 
> problem but wanted to submit a ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1273) [Python] Error writing to partitioned Parquet dataset

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442096#comment-16442096
 ] 

ASF GitHub Bot commented on PARQUET-1273:
-

xhochy closed pull request #453: PARQUET-1273: Properly write dictionary values 
when writing in chunks
URL: https://github.com/apache/parquet-cpp/pull/453
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/src/parquet/arrow/arrow-reader-writer-test.cc 
b/src/parquet/arrow/arrow-reader-writer-test.cc
index 79a393f6..92b67353 100644
--- a/src/parquet/arrow/arrow-reader-writer-test.cc
+++ b/src/parquet/arrow/arrow-reader-writer-test.cc
@@ -1726,6 +1726,69 @@ TEST(TestArrowReadWrite, TableWithDuplicateColumns) {
   CheckSimpleRoundtrip(table, table->num_rows());
 }
 
+TEST(TestArrowReadWrite, DictionaryColumnChunkedWrite) {
+  // This is a regression test for this:
+  //
+  // https://issues.apache.org/jira/browse/ARROW-1938
+  //
+  // As of the writing of this test, columns of type
+  // dictionary are written as their raw/expanded values.
+  // The regression was that the whole column was being
+  // written for each chunk.
+  using ::arrow::ArrayFromVector;
+
+  std::vector values = {"first", "second", "third"};
+  auto type = ::arrow::utf8();
+  std::shared_ptr dict_values;
+  ArrayFromVector<::arrow::StringType, std::string>(values, _values);
+
+  auto dict_type = ::arrow::dictionary(::arrow::int32(), dict_values);
+  auto f0 = field("dictionary", dict_type);
+  std::vector> fields;
+  fields.emplace_back(f0);
+  auto schema = ::arrow::schema(fields);
+
+  std::shared_ptr f0_values, f1_values;
+  ArrayFromVector<::arrow::Int32Type, int32_t>({0, 1, 0, 2, 1}, _values);
+  ArrayFromVector<::arrow::Int32Type, int32_t>({2, 0, 1, 0, 2}, _values);
+  ::arrow::ArrayVector dict_arrays = {
+  std::make_shared<::arrow::DictionaryArray>(dict_type, f0_values),
+  std::make_shared<::arrow::DictionaryArray>(dict_type, f1_values)};
+
+  std::vector> columns;
+  auto column = MakeColumn("column", dict_arrays, false);
+  columns.emplace_back(column);
+
+  auto table = Table::Make(schema, columns);
+
+  std::shared_ptr result;
+  DoSimpleRoundtrip(table, 1,
+// Just need to make sure that we make
+// a chunk size that is smaller than the
+// total number of values
+2, {}, );
+
+  std::vector expected_values = {"first",  "second", "first", 
"third",
+  "second", "third",  "first", 
"second",
+  "first",  "third"};
+  columns.clear();
+
+  std::shared_ptr expected_array;
+  ArrayFromVector<::arrow::StringType, std::string>(expected_values, 
_array);
+
+  // The column name gets changed on output to the name of the
+  // field, and it also turns into a nullable column
+  columns.emplace_back(MakeColumn("dictionary", expected_array, true));
+
+  fields.clear();
+  fields.emplace_back(::arrow::field("dictionary", ::arrow::utf8()));
+  schema = ::arrow::schema(fields);
+
+  auto expected_table = Table::Make(schema, columns);
+
+  AssertTablesEqual(*expected_table, *result, false);
+}
+
 TEST(TestArrowWrite, CheckChunkSize) {
   const int num_columns = 2;
   const int num_rows = 128;
diff --git a/src/parquet/arrow/writer.cc b/src/parquet/arrow/writer.cc
index 5040e0cc..ce05ef0b 100644
--- a/src/parquet/arrow/writer.cc
+++ b/src/parquet/arrow/writer.cc
@@ -962,7 +962,7 @@ class FileWriter::Impl {
   ::arrow::compute::Datum cast_output;
   RETURN_NOT_OK(Cast(, cast_input, dict_type.dictionary()->type(), 
CastOptions(),
  _output));
-  return WriteColumnChunk(cast_output.chunked_array(), 0, data->length());
+  return WriteColumnChunk(cast_output.chunked_array(), offset, size);
 }
 
 ColumnWriter* column_writer;
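
As a rough user-level analogue of the regression test above, the pyarrow sketch below (file path and exact values are hypothetical, chosen to mirror the test) writes a dictionary column, i.e. a pandas Categorical, with a row group size smaller than the table, which exercises the chunked write path corrected in writer.cc. Per the test's comments, the dictionary column is written as its expanded values rather than as indices plus a dictionary.

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# One dictionary-typed column, mirroring the "dictionary" field in the test.
df = pd.DataFrame({
    "dictionary": pd.Categorical(["first", "second", "first", "third", "second"]),
})
table = pa.Table.from_pandas(df, preserve_index=False)

# A row_group_size smaller than the row count makes the writer emit the
# column in several chunks -- the code path changed above.
pq.write_table(table, "/tmp/dict_chunked.parquet", row_group_size=2)

values = pq.read_table("/tmp/dict_chunked.parquet").to_pandas()["dictionary"].tolist()
print(values)
# With the fix, this prints the original five values exactly once; the
# regression wrote the whole column again for every chunk.
```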


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Error writing to partitioned Parquet dataset
> -
>
> Key: PARQUET-1273
> URL: https://issues.apache.org/jira/browse/PARQUET-1273
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: Linux (Ubuntu 16.04)
>Reporter: Robert Dailey
>Assignee: Joshua Storck
>

[jira] [Moved] (PARQUET-1273) [Python] Error writing to partitioned Parquet dataset

2018-04-18 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn moved ARROW-1938 to PARQUET-1273:
-

Fix Version/s: (was: 0.10.0)
   cpp-1.5.0
Affects Version/s: (was: 0.8.0)
  Component/s: (was: Python)
   parquet-cpp
 Workflow: patch-available, re-open possible  (was: jira)
  Key: PARQUET-1273  (was: ARROW-1938)
  Project: Parquet  (was: Apache Arrow)

> [Python] Error writing to partitioned Parquet dataset
> -
>
> Key: PARQUET-1273
> URL: https://issues.apache.org/jira/browse/PARQUET-1273
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: Linux (Ubuntu 16.04)
>Reporter: Robert Dailey
>Priority: Major
>  Labels: pull-request-available
> Fix For: cpp-1.5.0
>
> Attachments: ARROW-1938-test-data.csv.gz, ARROW-1938.py, 
> pyarrow_dataset_error.png
>
>
> I receive the following error after upgrading to pyarrow 0.8.0 when writing 
> to a dataset:
> * ArrowIOError: Column 3 had 187374 while previous column had 1
> The command was:
> write_table_values = {'row_group_size': 1}
> pq.write_to_dataset(pa.Table.from_pandas(df, preserve_index=True), 
> '/logs/parsed/test', partition_cols=['Product', 'year', 'month', 'day', 
> 'hour'], **write_table_values)
> I've also tried write_table_values = {'chunk_size': 1} and received the 
> same error.
> This same command works in version 0.7.1.  I am trying to troubleshoot the 
> problem but wanted to submit a ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1179) [C++] Support Apache Thrift 0.11

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442043#comment-16442043
 ] 

ASF GitHub Bot commented on PARQUET-1179:
-

thamht4190 commented on issue #433: PARQUET-1179: Upgrade to Thrift 0.11, use 
std::shared_ptr instead of boost::shared_ptr
URL: https://github.com/apache/parquet-cpp/pull/433#issuecomment-382291072
 
 
   I tried checking `bison --version`, and it showed bison version 2.3. Then I found
   [this article on Stack Overflow](https://stackoverflow.com/questions/10778905/why-not-gnu-bison-upgrade-to-2-5-on-macosx-10-7-3/30844621#30844621).
   It produced other errors, but it got past the bison issue ;)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Support Apache Thrift 0.11
> 
>
> Key: PARQUET-1179
> URL: https://issues.apache.org/jira/browse/PARQUET-1179
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: OSX 10.13.2
> Apple Clang
> {code:java}
> Apple LLVM version 9.0.0 (clang-900.0.39.2)
> Target: x86_64-apple-darwin17.3.0
> Thread model: posix
> InstalledDir: 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> {code}
>Reporter: Stephen Carman
>Assignee: Wes McKinney
>Priority: Major
> Fix For: cpp-1.4.0
>
>
> I am not sure if this is an OSX specific issue or something with a new 
> version of Boost, but parquet does not seem to build with the current setup.
> {code:java}
> In file included from 
> /Users/steve_carman/software/parquet-cpp/src/parquet/schema.cc:28:
> /Users/steve_carman/software/parquet-cpp/src/parquet/thrift.h:105:34: error: 
> no viable conversion from 
> 'boost::shared_ptr' to 
> 'stdcxx::shared_ptr'
>   tproto_factory.getProtocol(tmem_transport);
>  ^~
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3900:23:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 'nullptr_t' 
> for 1st argument
> _LIBCPP_CONSTEXPR shared_ptr(nullptr_t) _NOEXCEPT;
>   ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3914:5:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 'const 
> std::__1::shared_ptr &' for 1st 
> argument
> shared_ptr(const shared_ptr& __r) _NOEXCEPT;
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3922:5:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 
> 'std::__1::shared_ptr &&' for 1st 
> argument
> shared_ptr(shared_ptr&& __r) _NOEXCEPT;
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3917:9:
>  note: candidate template ignored: could not match 'std::__1::shared_ptr' 
> against 'boost::shared_ptr'
> shared_ptr(const shared_ptr<_Yp>& __r,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3923:52:
>  note: candidate template ignored: could not match 'std::__1::shared_ptr' 
> against 'boost::shared_ptr'
> template _LIBCPP_INLINE_VISIBILITY  
> shared_ptr(shared_ptr<_Yp>&& __r,
>^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3931:9:
>  note: candidate template ignored: could not match 'auto_ptr' against 
> 'shared_ptr'
> shared_ptr(auto_ptr<_Yp>&& __r,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3940:9:
>  note: candidate template ignored: could not match 'unique_ptr' against 
> 'shared_ptr'
> shared_ptr(unique_ptr<_Yp, _Dp>&&,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3949:9:
>  note: candidate template ignored: could not match 'unique_ptr' against 
> 'shared_ptr'
> shared_ptr(unique_ptr<_Yp, _Dp>&&,
> ^
> /usr/local/include/thrift/protocol/TCompactProtocol.h:242:76: note: passing 
> argument to parameter 'trans' here
>   stdcxx::shared_ptr getProtocol(stdcxx::shared_ptr 
> trans) {
>^
> In file included from 
> 

[jira] [Commented] (PARQUET-1179) [C++] Support Apache Thrift 0.11

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441991#comment-16441991
 ] 

ASF GitHub Bot commented on PARQUET-1179:
-

thamht4190 commented on issue #433: PARQUET-1179: Upgrade to Thrift 0.11, use 
std::shared_ptr instead of boost::shared_ptr
URL: https://github.com/apache/parquet-cpp/pull/433#issuecomment-382281162
 
 
   I still get an error on macOS even though the latest bison is installed:
   ```
   /Users/dev/tham/cortex-v2/src/ThirdParty/parquet-cpp/build/thrift_ep-prefix/src/thrift_ep/compiler/cpp/src/thrift/thrifty.yy:1.1-5: invalid directive: `%code'
   /Users/dev/tham/cortex-v2/src/ThirdParty/parquet-cpp/build/thrift_ep-prefix/src/thrift_ep/compiler/cpp/src/thrift/thrifty.yy:1.7-14: syntax error, unexpected identifier
   ```
   Here is the bison version:
   ```
   Mac-CoreTeam:ThirdParty dev$ brew upgrade bison
   Error: bison 3.0.4_1 already installed
   ```
   Do you have any idea? Thanks!
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Support Apache Thrift 0.11
> 
>
> Key: PARQUET-1179
> URL: https://issues.apache.org/jira/browse/PARQUET-1179
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
> Environment: OSX 10.13.2
> Apple Clang
> {code:java}
> Apple LLVM version 9.0.0 (clang-900.0.39.2)
> Target: x86_64-apple-darwin17.3.0
> Thread model: posix
> InstalledDir: 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> {code}
>Reporter: Stephen Carman
>Assignee: Wes McKinney
>Priority: Major
> Fix For: cpp-1.4.0
>
>
> I am not sure if this is an OSX specific issue or something with a new 
> version of Boost, but parquet does not seem to build with the current setup.
> {code:java}
> In file included from 
> /Users/steve_carman/software/parquet-cpp/src/parquet/schema.cc:28:
> /Users/steve_carman/software/parquet-cpp/src/parquet/thrift.h:105:34: error: 
> no viable conversion from 
> 'boost::shared_ptr' to 
> 'stdcxx::shared_ptr'
>   tproto_factory.getProtocol(tmem_transport);
>  ^~
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3900:23:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 'nullptr_t' 
> for 1st argument
> _LIBCPP_CONSTEXPR shared_ptr(nullptr_t) _NOEXCEPT;
>   ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3914:5:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 'const 
> std::__1::shared_ptr &' for 1st 
> argument
> shared_ptr(const shared_ptr& __r) _NOEXCEPT;
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3922:5:
>  note: candidate constructor not viable: no known conversion from 
> 'boost::shared_ptr' to 
> 'std::__1::shared_ptr &&' for 1st 
> argument
> shared_ptr(shared_ptr&& __r) _NOEXCEPT;
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3917:9:
>  note: candidate template ignored: could not match 'std::__1::shared_ptr' 
> against 'boost::shared_ptr'
> shared_ptr(const shared_ptr<_Yp>& __r,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3923:52:
>  note: candidate template ignored: could not match 'std::__1::shared_ptr' 
> against 'boost::shared_ptr'
> template _LIBCPP_INLINE_VISIBILITY  
> shared_ptr(shared_ptr<_Yp>&& __r,
>^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3931:9:
>  note: candidate template ignored: could not match 'auto_ptr' against 
> 'shared_ptr'
> shared_ptr(auto_ptr<_Yp>&& __r,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3940:9:
>  note: candidate template ignored: could not match 'unique_ptr' against 
> 'shared_ptr'
> shared_ptr(unique_ptr<_Yp, _Dp>&&,
> ^
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/memory:3949:9:
>  note: candidate template ignored: could not match 'unique_ptr' against 
> 'shared_ptr'
> shared_ptr(unique_ptr<_Yp, _Dp>&&,
> ^
> 
