[jira] [Resolved] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-10-28 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-758. -- Resolution: Fixed > [Format] HALF precision FLOAT Logical t

[jira] [Assigned] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-10-28 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-758: Assignee: Anja Boskovic > [Format] HALF precision FLOAT Logical t

[jira] [Commented] (PARQUET-2340) appendRowGroup will loose pageIndex

2023-08-22 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757396#comment-17757396 ] Gabor Szadovszky commented on PARQUET-2340: --- [~NathanKan], I don't think these methods

[jira] [Resolved] (PARQUET-2318) Implement a tool to list page headers

2023-06-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2318. --- Resolution: Fixed > Implement a tool to list page head

[jira] [Created] (PARQUET-2318) Implement a tool to list page headers

2023-06-27 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2318: - Summary: Implement a tool to list page headers Key: PARQUET-2318 URL: https://issues.apache.org/jira/browse/PARQUET-2318 Project: Parquet Issue

[jira] [Commented] (PARQUET-2317) parquet-format and parquet-format-structures defines Util with inconsitent methods provided

2023-06-25 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736990#comment-17736990 ] Gabor Szadovszky commented on PARQUET-2317: --- [~wgtmac], Let me summarize the history

[jira] [Comment Edited] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-06-09 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730988#comment-17730988 ] Gabor Szadovszky edited comment on PARQUET-2222 at 6/9/23 2:40 PM

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-06-09 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730988#comment-17730988 ] Gabor Szadovszky commented on PARQUET-2222: --- [~mwish], This is specifically about BOOLEAN

[jira] [Commented] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-06-09 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730904#comment-17730904 ] Gabor Szadovszky commented on PARQUET-2222: --- [~apitrou], [~wgtmac], It seems my review

[jira] [Assigned] (PARQUET-2222) [Format] RLE encoding spec incorrect for v2 data pages

2023-06-09 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2222: - Assignee: Gang Wu > [Format] RLE encoding spec incorrect for v2 data pa

[jira] [Commented] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-06-06 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17729630#comment-17729630 ] Gabor Szadovszky commented on PARQUET-758: -- Thanks for your reply, [~anjakefala]! I've

[jira] [Commented] (PARQUET-758) [Format] HALF precision FLOAT Logical type

2023-06-05 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17729214#comment-17729214 ] Gabor Szadovszky commented on PARQUET-758: -- Hey everyone, who is interested in the half-float

[jira] [Commented] (PARQUET-2276) ParquetReader reads do not work with Hadoop version 2.8.5

2023-04-18 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713635#comment-17713635 ] Gabor Szadovszky commented on PARQUET-2276: --- I think it is fine to drop support of older

[jira] [Commented] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-17 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701575#comment-17701575 ] Gabor Szadovszky commented on PARQUET-2256: --- [~mwish], would you mind to do some

[jira] [Assigned] (PARQUET-2256) Adding Compression for BloomFilter

2023-03-17 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2256: - Assignee: Xuwei Fu > Adding Compression for BloomFil

[jira] [Commented] (PARQUET-2258) Storing toString fields in FilterPredicate instances can lead to memory pressure

2023-03-17 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701568#comment-17701568 ] Gabor Szadovszky commented on PARQUET-2258: --- Thanks for fixing this, [~abstractdog]! As far

[jira] [Commented] (PARQUET-1690) Integer Overflow of BinaryStatistics#isSmallerThan()

2023-03-17 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17701561#comment-17701561 ] Gabor Szadovszky commented on PARQUET-1690: --- [~humanoid], I don't know/remember

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699732#comment-17699732 ] Gabor Szadovszky commented on PARQUET-2255: --- But we don't build the dictionary for filtering

[jira] [Commented] (PARQUET-2255) BloomFilter and float point is ambiguous

2023-03-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699712#comment-17699712 ] Gabor Szadovszky commented on PARQUET-2255: --- Bloom filters are for searching for exact values

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697510#comment-17697510 ] Gabor Szadovszky commented on PARQUET-2254: --- 1) I think, for creating bloom filters we have

[jira] [Commented] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697301#comment-17697301 ] Gabor Szadovszky commented on PARQUET-2254: --- I think this is a good idea. Meanwhile, it would

[jira] [Assigned] (PARQUET-2254) Build a BloomFilter with a more precise size

2023-03-07 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2254: - Assignee: Mars > Build a BloomFilter with a more precise s

[jira] [Resolved] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-23 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2246. --- Resolution: Fixed > Add short circuit logic to column index fil

[jira] [Assigned] (PARQUET-2246) Add short circuit logic to column index filter

2023-02-23 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2246: - Assignee: Yujiang Zhong > Add short circuit logic to column index fil

[jira] [Resolved] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-22 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2243. --- Resolution: Fixed > Support zstd-jni in DirectCodecFact

[jira] [Assigned] (PARQUET-2247) Fail-fast if CapacityByteArrayOutputStream write overflow

2023-02-22 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2247: - Assignee: dzcxzl (was: Gabor Szadovszky) > Fail-f

[jira] [Resolved] (PARQUET-2247) Fail-fast if CapacityByteArrayOutputStream write overflow

2023-02-22 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2247. --- Resolution: Fixed > Fail-fast if CapacityByteArrayOutputStream write overf

[jira] [Assigned] (PARQUET-2247) Fail-fast if CapacityByteArrayOutputStream write overflow

2023-02-22 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2247: - Assignee: Gabor Szadovszky > Fail-fast if CapacityByteArrayOutputStream wr

[jira] [Resolved] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-21 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2241. --- Resolution: Fixed > ByteStreamSplitDecoder broken in presence of nu

[jira] [Resolved] (PARQUET-2228) ParquetRewriter supports more than one input file

2023-02-21 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2228. --- Resolution: Fixed > ParquetRewriter supports more than one input f

[jira] [Assigned] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2244: - Assignee: Yujiang Zhong > Dictionary filter may skip row-groups incorrec

[jira] [Resolved] (PARQUET-2244) Dictionary filter may skip row-groups incorrectly when evaluating notIn

2023-02-15 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2244. --- Resolution: Fixed > Dictionary filter may skip row-groups incorrectly w

[jira] [Created] (PARQUET-2243) Support zstd-jni in DirectCodecFactory

2023-02-14 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2243: - Summary: Support zstd-jni in DirectCodecFactory Key: PARQUET-2243 URL: https://issues.apache.org/jira/browse/PARQUET-2243 Project: Parquet Issue

[jira] [Commented] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688363#comment-17688363 ] Gabor Szadovszky commented on PARQUET-2241: --- [~wgtmac], related to your question about

[jira] [Comment Edited] (PARQUET-2241) ByteStreamSplitDecoder broken in presence of nulls

2023-02-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688363#comment-17688363 ] Gabor Szadovszky edited comment on PARQUET-2241 at 2/14/23 8:37 AM

[jira] [Resolved] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2226. --- Resolution: Fixed > Support merge Bloom Fil

[jira] [Assigned] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2226: - Assignee: miracle > Support merge Bloom Fil

[jira] [Assigned] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2226: - Assignee: (was: miracle) > Support merge Bloom Fil

[jira] [Assigned] (PARQUET-2226) Support merge Bloom Filter

2023-01-16 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2226: - Assignee: miracle > Support merge Bloom Fil

[jira] [Commented] (PARQUET-1980) Build and test Apache Parquet on ARM64 CPU architecture

2023-01-10 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656754#comment-17656754 ] Gabor Szadovszky commented on PARQUET-1980: --- Perfect. Thank you, [~mgrigorov]! > Bu

[jira] [Reopened] (PARQUET-1980) Build and test Apache Parquet on ARM64 CPU architecture

2023-01-08 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reopened PARQUET-1980: --- [~mgrigorov], PMC just got a note from Apache IT that they are about to "move

[jira] [Commented] (PARQUET-2220) Parquet Filter predicate storing nested string causing OOM's

2022-12-31 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653313#comment-17653313 ] Gabor Szadovszky commented on PARQUET-2220: --- [~abhiSumo304], I agree eagerly storing

[jira] [Assigned] (PARQUET-2159) Parquet bit-packing de/encode optimization

2022-11-25 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2159: - Assignee: Fang-Xie > Parquet bit-packing de/encode optimizat

[jira] [Commented] (PARQUET-2020) Remove deprecated modules

2022-10-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616825#comment-17616825 ] Gabor Szadovszky commented on PARQUET-2020: --- [~Unsta], the module {{parquet-cli}} is meant

[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2022-10-10 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614907#comment-17614907 ] Gabor Szadovszky commented on PARQUET-1222: --- [~emkornfield], There are a couple of docs

[jira] [Commented] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2022-09-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611398#comment-17611398 ] Gabor Szadovszky commented on PARQUET-1222: --- [~emkornfield], I think we do not need to handle

[jira] [Created] (PARQUET-2182) Handle unknown logical types

2022-08-30 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2182: - Summary: Handle unknown logical types Key: PARQUET-2182 URL: https://issues.apache.org/jira/browse/PARQUET-2182 Project: Parquet Issue Type: Bug

[jira] [Updated] (PARQUET-2094) Handle negative values in page headers

2021-12-20 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky updated PARQUET-2094: -- External issue ID: CVE-2021-41561 External issue URL: https://cve.mitre.org/cgi

[jira] [Updated] (PARQUET-2106) BinaryComparator should avoid doing ByteBuffer.wrap in the hot-path

2021-12-09 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky updated PARQUET-2106: -- Issue Type: Improvement (was: Task) > BinaryComparator should avoid do

[jira] [Assigned] (PARQUET-2106) BinaryComparator should avoid doing ByteBuffer.wrap in the hot-path

2021-12-09 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2106: - Assignee: Alexey Kudinkin > BinaryComparator should avoid do

[jira] [Resolved] (PARQUET-2107) Travis failures

2021-12-08 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2107. --- Resolution: Fixed > Travis failures

[jira] [Created] (PARQUET-2107) Travis failures

2021-12-07 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2107: - Summary: Travis failures Key: PARQUET-2107 URL: https://issues.apache.org/jira/browse/PARQUET-2107 Project: Parquet Issue Type: Bug

[jira] [Commented] (PARQUET-2104) parquet-cli broken in master

2021-11-24 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448483#comment-17448483 ] Gabor Szadovszky commented on PARQUET-2104: --- [~gamaken], I am not sure about a workaround

Re: unable to get rid of NoSuchMethodError with parquet-cli

2021-11-23 Thread Gabor Szadovszky
Hey, I can reproduce the same on master. It seems the same issue happens with older versions as well. I don't know how we did not find it yet. (Or I am making the same mistake as you :) ). Could you please create a jira about it and continue the discussion there? Thanks a lot, Gabor On Tue, Nov

[jira] [Commented] (PARQUET-2103) crypto exception in print toPrettyJSON

2021-11-22 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447455#comment-17447455 ] Gabor Szadovszky commented on PARQUET-2103: --- I think, we need to update

[jira] [Resolved] (PARQUET-2101) Fix wrong descriptions about the default block size

2021-11-02 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2101. --- Resolution: Fixed > Fix wrong descriptions about the default block s

Re: Map Type duplicate keys

2021-10-26 Thread Gabor Szadovszky
Hi Micah, Parquet-MR does not have its own data model (except an example implementation used for unit tests). So it is up to the data model how the values are handled. I think it is possible to store key-value pairs with the same key using the example implementation but there are no such tests. I

[jira] [Updated] (PARQUET-2094) Handle negative values in page headers

2021-09-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky updated PARQUET-2094: -- Fix Version/s: 1.12.2 1.11.2 > Handle negative values in p

[jira] [Resolved] (PARQUET-2094) Handle negative values in page headers

2021-09-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2094. --- Resolution: Fixed > Handle negative values in page head

[jira] [Resolved] (PARQUET-1968) FilterApi support In predicate

2021-09-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-1968. --- Resolution: Fixed > FilterApi support In predic

[jira] [Assigned] (PARQUET-1968) FilterApi support In predicate

2021-09-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-1968: - Assignee: Huaxin Gao > FilterApi support In predic

[jira] [Resolved] (PARQUET-2096) Upgrade Thrift to 0.15.0

2021-09-28 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2096. --- Resolution: Fixed > Upgrade Thrift to 0.1

[jira] [Assigned] (PARQUET-2096) Upgrade Thrift to 0.15.0

2021-09-28 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2096: - Assignee: Vinoo Ganesh > Upgrade Thrift to 0.1

[jira] [Commented] (PARQUET-2080) Deprecate RowGroup.file_offset

2021-09-28 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421270#comment-17421270 ] Gabor Szadovszky commented on PARQUET-2080: --- [~gershinsky], could you make the doc available

Re: Guidelines for Thrift max message size? (Thrift 0.14+)

2021-09-27 Thread Gabor Szadovszky
Hi Antoine, I do not have too much to add just hate getting no replies on the dev list. Parquet-mr doesn't have a release with thrift 0.14+ yet. (The latest release 1.12.1 went out with 0.13.0.) I don't know how common a >100MB file footer is. Since we read the whole footer at once to memory and

[jira] [Created] (PARQUET-2094) Handle negative values in page headers

2021-09-22 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2094: - Summary: Handle negative values in page headers Key: PARQUET-2094 URL: https://issues.apache.org/jira/browse/PARQUET-2094 Project: Parquet Issue

[jira] [Commented] (PARQUET-118) Provide option to use on-heap buffers for Snappy compression/decompression

2021-09-21 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418202#comment-17418202 ] Gabor Szadovszky commented on PARQUET-118: -- [~MasterDDT], Unfortunately I can only say

[jira] [Commented] (PARQUET-2091) Fix release build error introduced by PARQUET-2043

2021-09-20 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417540#comment-17417540 ] Gabor Szadovszky commented on PARQUET-2091: --- Strange to me because the release command should

[jira] [Commented] (PARQUET-2088) Different created_by field values for application and library

2021-09-15 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415378#comment-17415378 ] Gabor Szadovszky commented on PARQUET-2088: --- parquet-mr automatically fills the {{created_by

[jira] [Commented] (PARQUET-2091) Fix release build error introduced by PARQUET-2043

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414886#comment-17414886 ] Gabor Szadovszky commented on PARQUET-2091: --- [~sha...@uber.com], do you have issues

[jira] [Resolved] (PARQUET-2084) Upgrade Thrift to 0.14.2

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2084. --- Resolution: Fixed > Upgrade Thrift to 0.1

[jira] [Resolved] (PARQUET-2083) Expose getFieldPath from ColumnIO

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2083. --- Resolution: Fixed > Expose getFieldPath from Colum

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Gabor Szadovszky
Thanks for the new RC, Xinli. The content seems correct to me. The checksum and signature are correct. Unit tests pass. My vote is +1 (binding) On Mon, Sep 13, 2021 at 8:11 PM Xinli shang wrote: > Hi everyone, > > > I propose the following RC to be released as the official Apache Parquet > 1.12.1

[jira] [Commented] (PARQUET-2088) Different created_by field values for application and library

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414829#comment-17414829 ] Gabor Szadovszky commented on PARQUET-2088: --- Ah, I see. So, that code part is not about

[jira] [Commented] (PARQUET-2085) Formatting is broken for description of BIT_PACKED

2021-09-14 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414823#comment-17414823 ] Gabor Szadovszky commented on PARQUET-2085: --- [~alexott], I got it now. You are talking about

Re: Concatenation of parquet files

2021-09-14 Thread Gabor Szadovszky
Hi Pau, I guess attachments are not allowed in the apache lists so we cannot see the image. If the two row groups contain the very same data in the same order and encoded with the same encoding, compressed with the same codec I think, they should be the same binary. I am not sure why you have

Re: [VOTE] Release Apache Parquet 1.12.1 RC0

2021-09-13 Thread Gabor Szadovszky
Thanks a lot for working on this, Xinli. Do not forget that you also have a vote :) I have some issues with the content of the release. I would not include the change PARQUET-2043. It is not a bugfix and contains a lot of changes around dependencies. I feel it is a bit risky to include it in a patch

[jira] [Resolved] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-09-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2078. --- Resolution: Fixed Since the PR is merged I am resolving this. > Failed to r

[jira] [Commented] (PARQUET-2088) Different created_by field values for application and library

2021-09-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414092#comment-17414092 ] Gabor Szadovszky commented on PARQUET-2088: --- Could you please list what exact features do you

[jira] [Commented] (PARQUET-2080) Deprecate RowGroup.file_offset

2021-09-13 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414053#comment-17414053 ] Gabor Szadovszky commented on PARQUET-2080: --- [~gershinsky], however the original topic

Re: Any Parquet implementations might be impacted by PARQUET-2078

2021-08-30 Thread Gabor Szadovszky
value of ColumnChunk.file_offset at least in cases when the file was written by parquet-mr 1.12.0. I've also created PARQUET-2080 <https://issues.apache.org/jira/browse/PARQUET-2080> to deprecate the field in the format. Regards, Gabor On Fri, Aug 27, 2021 at 11:11 AM Gabor Szadovszky

[jira] [Created] (PARQUET-2080) Deprecate RowGroup.file_offset

2021-08-30 Thread Gabor Szadovszky (Jira)
Gabor Szadovszky created PARQUET-2080: - Summary: Deprecate RowGroup.file_offset Key: PARQUET-2080 URL: https://issues.apache.org/jira/browse/PARQUET-2080 Project: Parquet Issue Type: Bug

[jira] [Assigned] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2078: - Assignee: Nemon Lou > Failed to read parquet file after writing with the s

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406722#comment-17406722 ] Gabor Szadovszky commented on PARQUET-2078: --- [~nemon], you are right, so

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-30 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406621#comment-17406621 ] Gabor Szadovszky commented on PARQUET-2078: --- [~nemon], I am not sure how it would be possible

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-27 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405698#comment-17405698 ] Gabor Szadovszky commented on PARQUET-2078: --- Added the dev list thread link here to keep both

Any Parquet implementations might be impacted by PARQUET-2078

2021-08-27 Thread Gabor Szadovszky
Hi everyone, It turned out that since parquet-mr 1.12.0 in certain conditions we write wrong values into ColumnMetaData.dictionary_page_offset and ColumnChunk.file_offset

[jira] [Comment Edited] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-27 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405677#comment-17405677 ] Gabor Szadovszky edited comment on PARQUET-2078 at 8/27/21, 8:50 AM

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-27 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405677#comment-17405677 ] Gabor Szadovszky commented on PARQUET-2078: --- [~nemon], thanks a lot for the detailed

[jira] [Commented] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-26 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405227#comment-17405227 ] Gabor Szadovszky commented on PARQUET-2078: --- [~nemon], thanks a lot for the investigation

[jira] [Updated] (PARQUET-2078) Failed to read parquet file after writing with the same parquet version

2021-08-26 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky updated PARQUET-2078: -- Fix Version/s: 1.12.1 1.13.0 > Failed to read parquet file af

[jira] [Commented] (PARQUET-2071) Encryption translation tool

2021-08-23 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403039#comment-17403039 ] Gabor Szadovszky commented on PARQUET-2071: --- [~sha...@uber.com], sure, I am fine with having

Re: 【vulnerability confirmation】parquet-format-structures-1.12.0

2021-08-17 Thread Gabor Szadovszky
Hi, It is required to shade the thrift library into parquet-format-structures because we use thrift to serialize/deserialize the metadata structures in the parquet files. So, you really don't have any way to change it at runtime. If it is urgent you may build your parquet-mr on your own with an

[jira] [Resolved] (PARQUET-2064) Make Range public accessible in RowRanges

2021-08-16 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2064. --- Resolution: Fixed > Make Range public accessible in RowRan

[jira] [Resolved] (PARQUET-2073) Is there something wrong calculate usedMem in ColumnWriteStoreBase.java

2021-08-16 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2073. --- Resolution: Fixed > Is there something wrong calculate used

[jira] [Resolved] (PARQUET-2059) Tests require too much memory

2021-08-16 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2059. --- Resolution: Fixed > Tests require too much mem

[jira] [Resolved] (PARQUET-2043) Fail build for used but not declared direct dependencies

2021-08-16 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2043. --- Resolution: Fixed > Fail build for used but not declared direct dependenc

[jira] [Resolved] (PARQUET-2063) Remove Compile Warnings from MemoryManager

2021-08-10 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky resolved PARQUET-2063. --- Resolution: Fixed > Remove Compile Warnings from MemoryMana

[jira] [Commented] (PARQUET-2074) Upgrade to JDK 9+

2021-08-09 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396113#comment-17396113 ] Gabor Szadovszky commented on PARQUET-2074: --- [~belugabehr], it sounds good to me but also

[jira] [Assigned] (PARQUET-2073) Is there something wrong calculate usedMem in ColumnWriteStoreBase.java

2021-08-09 Thread Gabor Szadovszky (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky reassigned PARQUET-2073: - Assignee: JiangYang > Is there something wrong calculate used
