[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2018-07-04 Thread Al M (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532534#comment-16532534
 ] 

Al M commented on SPARK-13127:
--

It would be great to get this resolved in Spark 2.3.2, especially since Parquet 
1.9 supports delta encoding: https://issues.apache.org/jira/browse/PARQUET-225

> Upgrade Parquet to 1.9 (Fixes parquet sorting)
> --
>
> Key: SPARK-13127
> URL: https://issues.apache.org/jira/browse/SPARK-13127
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Justin Pihony
>Priority: Major
>
> Currently, when you write a sorted DataFrame to Parquet and then read the 
> data back, it is not sorted by default. [This is due to a bug in 
> Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 
> 1.9.
> As a workaround, read the file back in using a file glob (filepath/*).
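
The behaviour described above can be modelled without Spark. The following is a toy sketch in plain Python. It assumes (this thread does not confirm it) that each part file is internally sorted and that the global order is lost only because the part files are read back in a different order than they were written. The names write_sorted_parts and read_parts are hypothetical helpers for illustration, not Spark or Parquet APIs.

```python
def write_sorted_parts(values, num_parts):
    """Sort values and split them into contiguous, named part files (modelled as a dict)."""
    ordered = sorted(values)
    size = -(-len(ordered) // num_parts)  # ceiling division
    return {
        f"part-{i:05d}": ordered[i * size:(i + 1) * size]
        for i in range(num_parts)
    }


def read_parts(parts, file_order):
    """Concatenate the part files in the order the reader happens to visit them."""
    rows = []
    for name in file_order:
        rows.extend(parts[name])
    return rows


parts = write_sorted_parts(range(12), num_parts=4)

# Assumption: a reader that visits the files in some arbitrary order.
shuffled_order = ["part-00002", "part-00000", "part-00003", "part-00001"]
unordered_rows = read_parts(parts, shuffled_order)

# A reader that visits the files in name order keeps the global sort.
ordered_rows = read_parts(parts, sorted(parts))

print(unordered_rows)
print(ordered_rows)
```

In this model, visiting the files in name order restores the global order. Whether the glob workaround behaves the same way in a given Spark version should be checked rather than assumed.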



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2018-02-16 Thread Li Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367925#comment-16367925
 ] 

Li Jin commented on SPARK-13127:


Hi all,

The status of this Jira is "Progress". Is this being actively worked on?







[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-12-28 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305691#comment-16305691
 ] 

Dong Jiang commented on SPARK-13127:


[~gaurav24], it looks like you, like me, are waiting for this ticket to be 
worked on.
If you would like to help, please comment on this developer-list thread to 
advocate for resolving this issue in the Spark 2.3 release:
http://apache-spark-developers-list.1001551.n3.nabble.com/Timeline-for-Spark-2-3-td22793.html







[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-12-28 Thread Gaurav Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305612#comment-16305612
 ] 

Gaurav Shah commented on SPARK-13127:
-

I am surprised people haven't hit 
https://issues.apache.org/jira/browse/PARQUET-353; I constantly face OOM errors 
in a continuous streaming application. I wonder whether we could get Parquet 
1.9.1 and then upgrade Spark to use it: 
https://issues.apache.org/jira/browse/PARQUET-1027







[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-11-13 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250512#comment-16250512
 ] 

Dong Jiang commented on SPARK-13127:


[~igozali], I think you are referring to this Parquet ticket: 
https://issues.apache.org/jira/browse/PARQUET-686
The Parquet ticket indicates that the fix is in 1.9.0, so we still need Spark 
to upgrade Parquet to 1.9.0.
I have examined a Parquet file generated by Spark 2.2: the string column 
doesn't have min/max values in the footer. I believe it is disabled.







[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-11-10 Thread Ivan Gozali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248235#comment-16248235
 ] 

Ivan Gozali commented on SPARK-13127:
-

Hello! I was looking to find more information on Parquet string comparison 
issues and eventually found my way here. I was just curious to see if this 
issue has been resolved by upgrading Parquet to 1.8.2, since that's what the PR 
referenced here seems to suggest? Are there still any benefits to be gained by 
upgrading to Parquet 1.9.0? 







[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2016-12-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749118#comment-15749118
 ] 

Apache Spark commented on SPARK-13127:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/16281







[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2016-02-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136236#comment-15136236
 ] 

Sean Owen commented on SPARK-13127:
---

[~JustinPihony] I suspect this is a good idea, but whenever someone suggests a 
dependency upgrade, the question is of course: Are there incompatible changes? 
Is it compatible with other dependencies? Does it work with all transitive 
dependencies?

Would you mind opening a PR with the change, which will entail running the 
dependency update scripts to check and declare the changed transitive 
dependencies, and then also reviewing the release notes to identify any 
breaking changes we should know about? For 2.0.0 we can tolerate most 
incompatibilities, but it is good to know.

> Upgrade Parquet to 1.9 (Fixes parquet sorting)
> --
>
> Key: SPARK-13127
> URL: https://issues.apache.org/jira/browse/SPARK-13127
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Justin Pihony
>Priority: Minor
>
> Currently, when you write a sorted DataFrame to Parquet and then read the 
> data back, it is not sorted by default. [This is due to a bug in 
> Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 
> 1.9.
> As a workaround, read the file back in using a file glob (filepath/*).


