[jira] [Created] (PARQUET-2275) Upgrade `cyclonedx-maven-plugin` to 2.7.6

2023-04-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created PARQUET-2275:
--

 Summary: Upgrade `cyclonedx-maven-plugin` to 2.7.6
 Key: PARQUET-2275
 URL: https://issues.apache.org/jira/browse/PARQUET-2275
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.13.1
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705962#comment-17705962
 ] 

Dongjoon Hyun commented on PARQUET-2224:


To [~ste...@apache.org], I don't think this is the right place to discuss 
that, because we are now in the Apache Parquet community, which already has a 
successful RC release with SBOM. Like the Apache ORC and Zookeeper communities, 
each community develops and maintains its own SBOM features, for example:
- Apache ORC: 
https://repo1.maven.org/maven2/org/apache/orc/orc-core/1.8.3/orc-core-1.8.3-cyclonedx.json
- Apache Zookeeper: 
https://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.8.1/zookeeper-3.8.1-cyclonedx.json
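
The two links above follow the standard Maven repository layout, with `cyclonedx` as the artifact classifier. As a minimal sketch (the class and method names are hypothetical and only for illustration), the location of a published SBOM can be derived from the Maven coordinates like this:

```java
final class SbomUrlSketch {

  // Builds <repo>/<group path>/<artifact>/<version>/<artifact>-<version>-cyclonedx.json.
  // The "-cyclonedx.json" suffix is taken from the links above.
  static String sbomUrl(String repoBase, String groupId, String artifactId, String version) {
    return repoBase + "/" + groupId.replace('.', '/') + "/" + artifactId + "/" + version
        + "/" + artifactId + "-" + version + "-cyclonedx.json";
  }

  public static void main(String[] args) {
    String central = "https://repo1.maven.org/maven2";
    System.out.println(sbomUrl(central, "org.apache.orc", "orc-core", "1.8.3"));
    System.out.println(sbomUrl(central, "org.apache.zookeeper", "zookeeper", "3.8.1"));
  }
}
```

The same helper can be pointed at a staging repository base URL to check RC artifacts before a release.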

Just to answer your question: the Apache Spark and ORC communities have never 
claimed that we need to support all Maven versions (and their plugin 
combinations), because we have already run into many Maven and plugin bugs. 
It's an infeasible goal.

> Publish SBOM artifacts
> --
>
> Key: PARQUET-2224
> URL: https://issues.apache.org/jira/browse/PARQUET-2224
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.13.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-27 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705697#comment-17705697
 ] 

Dongjoon Hyun commented on PARQUET-2224:


As you know, for SPARK-42380, we verified that it is an issue with the 
combination of Maven and its plugins, and we avoid it by pinning the Maven and 
plugin versions. Thus, we don't think it is an Apache Spark issue, and we will 
retry once we can verify that Maven and its plugins work fine.

> Publish SBOM artifacts
> --
>
> Key: PARQUET-2224
> URL: https://issues.apache.org/jira/browse/PARQUET-2224
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.13.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-27 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705436#comment-17705436
 ] 

Dongjoon Hyun commented on PARQUET-2224:


Never mind. I found it in the Apache Parquet 1.12.4 RC artifacts. It looks good 
to me. Thank you, [~wgtmac].

- 
https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-common/1.12.4/parquet-common-1.12.4-cyclonedx.json

> Publish SBOM artifacts
> --
>
> Key: PARQUET-2224
> URL: https://issues.apache.org/jira/browse/PARQUET-2224
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.13.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-27 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705433#comment-17705433
 ] 

Dongjoon Hyun commented on PARQUET-2224:


BTW, [~wgtmac], what is the `Fix Version` of this issue? I want to check the 
released artifacts.

> Publish SBOM artifacts
> --
>
> Key: PARQUET-2224
> URL: https://issues.apache.org/jira/browse/PARQUET-2224
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.13.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-27 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705430#comment-17705430
 ] 

Dongjoon Hyun commented on PARQUET-2224:


Apache Maven and its plugin ecosystem are themselves open source projects 
with many issues of their own.

FYI, Apache ORC 1.7.8 and 1.8.3 also have no issues, [~ste...@apache.org].
- 
https://repo1.maven.org/maven2/org/apache/orc/orc-core/1.7.8/orc-core-1.7.8-cyclonedx.json
- 
https://repo1.maven.org/maven2/org/apache/orc/orc-core/1.8.3/orc-core-1.8.3-cyclonedx.json

> Publish SBOM artifacts
> --
>
> Key: PARQUET-2224
> URL: https://issues.apache.org/jira/browse/PARQUET-2224
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.13.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (PARQUET-2224) Publish SBOM artifacts

2023-03-26 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705142#comment-17705142
 ] 

Dongjoon Hyun commented on PARQUET-2224:


Thank you for resolving this, [~wgtmac]. Could you set the assignee field too?

> Publish SBOM artifacts
> --
>
> Key: PARQUET-2224
> URL: https://issues.apache.org/jira/browse/PARQUET-2224
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.13.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Created] (PARQUET-2224) Publish SBOM artifacts

2023-01-05 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created PARQUET-2224:
--

 Summary: Publish SBOM artifacts
 Key: PARQUET-2224
 URL: https://issues.apache.org/jira/browse/PARQUET-2224
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.13.0
Reporter: Dongjoon Hyun








[jira] [Created] (PARQUET-2044) Enable ZSTD buffer pool by default

2021-05-05 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created PARQUET-2044:
--

 Summary: Enable ZSTD buffer pool by default
 Key: PARQUET-2044
 URL: https://issues.apache.org/jira/browse/PARQUET-2044
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.13.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (PARQUET-2022) ZstdDecompressorStream should close `zstdInputStream`

2021-04-11 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created PARQUET-2022:
--

 Summary: ZstdDecompressorStream should close `zstdInputStream`
 Key: PARQUET-2022
 URL: https://issues.apache.org/jira/browse/PARQUET-2022
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.12.0
Reporter: Dongjoon Hyun


`ZstdDecompressorStream` should close its own `zstdInputStream` because 
`CompressionInputStream.close` closes only the inner stream.

{code}
public class ZstdDecompressorStream extends CompressionInputStream {

  private ZstdInputStream zstdInputStream;

  public ZstdDecompressorStream(InputStream stream) throws IOException {
    super(stream);
    zstdInputStream = new ZstdInputStream(stream);
  }
}
{code}
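
One possible shape of the fix is to override `close()` so that the wrapper stream is closed before delegating to the parent. The sketch below is self-contained and uses only `java.io`; `InnerStream` and `BaseStream` are hypothetical stand-ins for `ZstdInputStream` and `CompressionInputStream`, so this illustrates the pattern and is not the actual parquet-mr patch.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ZstdCloseSketch {

  // Hypothetical stand-in for ZstdInputStream: it owns a resource of its own.
  static final class InnerStream extends FilterInputStream {
    boolean released = false;

    InnerStream(InputStream in) {
      super(in);
    }

    @Override
    public void close() throws IOException {
      released = true; // the real class would release whatever it holds here
      super.close();
    }
  }

  // Hypothetical stand-in for CompressionInputStream: close() only closes the raw stream.
  abstract static class BaseStream extends InputStream {
    protected final InputStream in;

    BaseStream(InputStream in) {
      this.in = in;
    }

    @Override
    public void close() throws IOException {
      in.close();
    }
  }

  static final class DecompressorStream extends BaseStream {
    final InnerStream inner;

    DecompressorStream(InputStream stream) {
      super(stream);
      inner = new InnerStream(stream);
    }

    @Override
    public int read() throws IOException {
      return inner.read();
    }

    // Close the wrapper first, then let the parent close the raw stream.
    @Override
    public void close() throws IOException {
      try {
        inner.close();
      } finally {
        super.close();
      }
    }
  }

  // Reads one byte, closes the stream, and reports whether the wrapper was released.
  static boolean closeReleasesInner() {
    DecompressorStream s = new DecompressorStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
    try {
      s.read();
      s.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return s.inner.released;
  }

  public static void main(String[] args) {
    System.out.println("inner stream released: " + closeReleasesInner());
  }
}
```

The `try`/`finally` ensures the parent `close()` still runs if closing the wrapper throws.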





[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2021-04-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314618#comment-17314618
 ] 

Dongjoon Hyun commented on PARQUET-1143:


Hi, [~rdblue]. Could you set the Fix Version, please?

> Update Java for format 2.4.0 changes
> 
>
> Key: PARQUET-1143
> URL: https://issues.apache.org/jira/browse/PARQUET-1143
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-mr
>Affects Versions: 1.9.0, 1.8.2
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
>






[jira] [Created] (PARQUET-1994) Upgrade ZSTD JNI to 1.4.9-1

2021-03-08 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created PARQUET-1994:
--

 Summary: Upgrade ZSTD JNI to 1.4.9-1
 Key: PARQUET-1994
 URL: https://issues.apache.org/jira/browse/PARQUET-1994
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.12.0
Reporter: Dongjoon Hyun








[jira] [Updated] (PARQUET-1988) Upgrade to ZSTD 1.4.8-6

2021-02-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated PARQUET-1988:
---
Summary: Upgrade to ZSTD 1.4.8-6  (was: Upgrade to ZSTD 1.4.8-5)

> Upgrade to ZSTD 1.4.8-6
> ---
>
> Key: PARQUET-1988
> URL: https://issues.apache.org/jira/browse/PARQUET-1988
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.12.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Created] (PARQUET-1988) Upgrade to ZSTD 1.4.8-5

2021-02-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created PARQUET-1988:
--

 Summary: Upgrade to ZSTD 1.4.8-5
 Key: PARQUET-1988
 URL: https://issues.apache.org/jira/browse/PARQUET-1988
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.12.0
Reporter: Dongjoon Hyun








[jira] [Commented] (PARQUET-1809) Add new APIs for nested predicate pushdown

2021-02-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284525#comment-17284525
 ] 

Dongjoon Hyun commented on PARQUET-1809:


Hi, All.

According to the discussion history and the long pause (one year), it looks 
like this was rejected due to the potential collision. 

Since the Apache Parquet 1.12.0 RCs also do not include it, is this the final 
conclusion?

>  Add new APIs for nested predicate pushdown
> ---
>
> Key: PARQUET-1809
> URL: https://issues.apache.org/jira/browse/PARQUET-1809
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: DB Tsai
>Priority: Major
>
> Currently, Parquet's *org.apache.parquet.filter2.predicate.FilterApi* uses 
> *dot* to split a column name into the parts of a nested field path. The 
> drawback is that this causes issues when a field name itself contains a *dot*.
> The new APIs will take an array of strings directly for the parts of a 
> nested field path, so there is no ambiguity from using *dot* as a separator.  
> See https://github.com/apache/spark/pull/27728 and [SPARK-17636] for details.
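
The ambiguity described in the quoted summary can be sketched in a few lines. The helpers below are hypothetical and are not part of `FilterApi`; they only show that a dot-separated string cannot distinguish a field literally named `a.b` from a field `b` nested under `a`, whereas an explicit array of parts can.

```java
import java.util.Arrays;
import java.util.List;

final class NestedPathSketch {

  // Dot-splitting approach: every dot is treated as a nesting separator.
  static List<String> splitOnDot(String dotted) {
    return Arrays.asList(dotted.split("\\."));
  }

  // Array approach: the caller states each path part explicitly.
  static List<String> explicitPath(String... parts) {
    return Arrays.asList(parts);
  }

  public static void main(String[] args) {
    // Field "c" nested under "b" nested under "a".
    List<String> fullyNested = explicitPath("a", "b", "c");
    // Field "c" nested under a single field whose name is "a.b".
    List<String> dottedName = explicitPath("a.b", "c");

    // Both columns collapse to the same string "a.b.c" when joined with dots,
    // so splitting that string can only ever recover one of them.
    System.out.println(splitOnDot("a.b.c").equals(fullyNested));
    System.out.println(splitOnDot("a.b.c").equals(dottedName));
  }
}
```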





[jira] [Created] (PARQUET-1973) Support ZSTD JNI BufferPool

2021-02-04 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created PARQUET-1973:
--

 Summary: Support ZSTD JNI BufferPool
 Key: PARQUET-1973
 URL: https://issues.apache.org/jira/browse/PARQUET-1973
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.12.0
Reporter: Dongjoon Hyun








[jira] [Updated] (PARQUET-1967) Upgrade Zstd-jni to 1.4.8-3

2021-02-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated PARQUET-1967:
---
Summary: Upgrade Zstd-jni to 1.4.8-3  (was: Upgrade Zstd-jni to 1.4.8-2)

> Upgrade Zstd-jni to 1.4.8-3
> ---
>
> Key: PARQUET-1967
> URL: https://issues.apache.org/jira/browse/PARQUET-1967
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cli, parquet-mr
>Affects Versions: 1.13.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Created] (PARQUET-1967) Upgrade Zstd-jni to 1.4.8-2

2021-01-29 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created PARQUET-1967:
--

 Summary: Upgrade Zstd-jni to 1.4.8-2
 Key: PARQUET-1967
 URL: https://issues.apache.org/jira/browse/PARQUET-1967
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-cli, parquet-mr
Affects Versions: 1.13.0
Reporter: Dongjoon Hyun








[jira] [Commented] (PARQUET-1946) Parquet File not readable by Google big query (works with Spark)

2020-11-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238881#comment-17238881
 ] 

Dongjoon Hyun commented on PARQUET-1946:


BTW, Spark 3.0/2.4 use Parquet 1.10.1.

> Parquet File not readable by Google big query (works with Spark)
> 
>
> Key: PARQUET-1946
> URL: https://issues.apache.org/jira/browse/PARQUET-1946
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-avro
>Affects Versions: 1.11.0
> Environment: [secor|https://github.com/pinterest/secor]
> GCP 
> Big Query google cloud
> Parquet writer 1.11
>  
>  
>Reporter: Richard Grossman
>Priority: Blocker
>
> Hi,
> I'm trying to write Avro messages to Parquet on GCS. These Parquet files 
> should be queryable by the BigQuery engine, which now supports Parquet.
> To do this I'm using Secor, a Kafka log persistence tool from Pinterest.
> At first I didn't notice any problem: using Spark, the same file can be read 
> without any problem and everything works perfectly.
> Now, using BigQuery produces an error like this:
> Error while reading table: , error message: Read less values than expected: 
> Actual: 29333, Expected: 33827. Row group: 0, Column: , File:
> After investigating with parquet-tools, I figured out that Parquet stores 
> metadata about the total number of unique values for each column, e.g. from 
> parquet-tools:
> page 0: DLE:BIT_PACKED RLE:BIT_PACKED [more]... CRC:[PAGE CORRUPT] VC:547
> So the VC value indicates that the total number of unique values in the file 
> is 547.
> Now when I run a Spark SQL query like SELECT DISTINCT COUNT(column) FROM ... 
> I get 421, which means this number in the metadata is incorrect.
> So what is not a problem for Spark to read is a blocking problem for 
> BigQuery, because it relies on these values and finds them incorrect.
> Is there any writer configuration that can prevent these errors in the 
> metadata? Or is this normal behavior that should not cause a problem?
> Thanks





[jira] [Commented] (PARQUET-1946) Parquet File not readable by Google big query (works with Spark)

2020-11-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238879#comment-17238879
 ] 

Dongjoon Hyun commented on PARQUET-1946:


Hi, [~richiesgr]. This is only for Parquet 1.11.0, right? Did you try 
Parquet 1.11.1?

> Parquet File not readable by Google big query (works with Spark)
> 
>
> Key: PARQUET-1946
> URL: https://issues.apache.org/jira/browse/PARQUET-1946
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-avro
>Affects Versions: 1.11.0
> Environment: [secor|https://github.com/pinterest/secor]
> GCP 
> Big Query google cloud
> Parquet writer 1.11
>  
>  
>Reporter: Richard Grossman
>Priority: Blocker
>





[jira] [Commented] (PARQUET-1774) Release parquet 1.11.1

2020-01-22 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021327#comment-17021327
 ] 

Dongjoon Hyun commented on PARQUET-1774:


[~gszadovszky]. Could you add `is blocked by` links to those required issues?

> Release parquet 1.11.1
> --
>
> Key: PARQUET-1774
> URL: https://issues.apache.org/jira/browse/PARQUET-1774
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: 1.11.1
>
>
> Some issues were discovered during the migration to parquet-mr release 
> 1.11.0 in Spark. These issues are to be fixed and released in the minor 
> release 1.11.1.





[jira] [Commented] (PARQUET-1520) Update README to use correct build and version info

2019-01-31 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757672#comment-16757672
 ] 

Dongjoon Hyun commented on PARQUET-1520:


Thanks!

> Update README to use correct build and version info
> ---
>
> Key: PARQUET-1520
> URL: https://issues.apache.org/jira/browse/PARQUET-1520
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.10.2
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1520) Update README to use correct build and version info

2019-01-31 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757659#comment-16757659
 ] 

Dongjoon Hyun commented on PARQUET-1520:


Hi, [~rdblue].
Could you resolve this issue (maybe for 1.10.2) since it's merged to 
`branch-1.10.x` now?

> Update README to use correct build and version info
> ---
>
> Key: PARQUET-1520
> URL: https://issues.apache.org/jira/browse/PARQUET-1520
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>






[jira] [Created] (PARQUET-1520) Update README to use correct build and version info

2019-01-30 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created PARQUET-1520:
--

 Summary: Update README to use correct build and version info
 Key: PARQUET-1520
 URL: https://issues.apache.org/jira/browse/PARQUET-1520
 Project: Parquet
  Issue Type: Bug
Affects Versions: 1.10.1
Reporter: Dongjoon Hyun








[jira] [Commented] (PARQUET-1510) Dictionary filter skips null values when evaluating not-equals.

2019-01-25 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752623#comment-16752623
 ] 

Dongjoon Hyun commented on PARQUET-1510:


Hmm, [~rdblue], is this really an `Improvement` rather than a `Bug`?

> Dictionary filter skips null values when evaluating not-equals.
> ---
>
> Key: PARQUET-1510
> URL: https://issues.apache.org/jira/browse/PARQUET-1510
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Ryan Blue
>Priority: Major
>  Labels: pull-request-available
>
> This was discovered in Spark, see SPARK-26677. From the Spark PR:
> {code}
> // Repeat the values to get dictionary encoding.
> Seq(Some("A"), Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/foo")
> spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> {code}
> // Use plain encoding.
> Seq(Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/bar")
> spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
> +-----+
> |value|
> +-----+
> | null|
> +-----+
> {code}
> This is a correctness issue.
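
The behaviour in the quoted report can be illustrated with a simplified model of a dictionary-based row-group filter. This is a sketch under stated assumptions, not the parquet-mr implementation: the method names are hypothetical, and the model assumes that a dictionary lists only non-null values, so null rows have to be tracked separately.

```java
import java.util.Collections;
import java.util.Set;

final class DictionaryNotEqSketch {

  // Null-unaware check: drops the row group for "column != value" whenever the
  // dictionary holds only that value. Null rows never appear in the dictionary,
  // so they are silently skipped.
  static boolean canDropIgnoringNulls(Set<String> dictionary, String value) {
    return dictionary.size() == 1 && dictionary.contains(value);
  }

  // Null-aware check: a null row still satisfies "column != value", so the
  // row group may only be dropped when it contains no nulls.
  static boolean canDropNullAware(Set<String> dictionary, boolean hasNulls, String value) {
    return !hasNulls && canDropIgnoringNulls(dictionary, value);
  }

  public static void main(String[] args) {
    // Models the first example: values "A", "A", null, so the dictionary is {"A"}.
    Set<String> dictionary = Collections.singleton("A");
    System.out.println("null-unaware drop: " + canDropIgnoringNulls(dictionary, "A"));
    System.out.println("null-aware drop:   " + canDropNullAware(dictionary, true, "A"));
  }
}
```

In this model the null-unaware check discards the row group that contains the null row, which matches the empty result in the dictionary-encoded example above.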


