[jira] [Updated] (PARQUET-1301) [C++] Crypto package in parquet-cpp
[ https://issues.apache.org/jira/browse/PARQUET-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepak Majeti updated PARQUET-1301:
-----------------------------------
    Fix Version/s: 1.5.0

> [C++] Crypto package in parquet-cpp
> -----------------------------------
>
>                 Key: PARQUET-1301
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1301
>             Project: Parquet
>          Issue Type: Sub-task
>          Components: parquet-cpp
>            Reporter: Gidon Gershinsky
>            Assignee: Gidon Gershinsky
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
> The C++ implementation of basic AES-GCM encryption and decryption

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PARQUET-1301) [C++] Crypto package in parquet-cpp
[ https://issues.apache.org/jira/browse/PARQUET-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated PARQUET-1301:
------------------------------------
    Labels: pull-request-available  (was: )

> [C++] Crypto package in parquet-cpp
> -----------------------------------
>
>                 Key: PARQUET-1301
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1301
>             Project: Parquet
>          Issue Type: Sub-task
>          Components: parquet-cpp
>            Reporter: Gidon Gershinsky
>            Assignee: Gidon Gershinsky
>            Priority: Major
>              Labels: pull-request-available
>
> The C++ implementation of basic AES-GCM encryption and decryption

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (PARQUET-1301) [C++] Crypto package in parquet-cpp
[ https://issues.apache.org/jira/browse/PARQUET-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569313#comment-16569313 ]

ASF GitHub Bot commented on PARQUET-1301:
-----------------------------------------

majetideepak closed pull request #464: PARQUET-1301: [C++] Crypto package in parquet-cpp
URL: https://github.com/apache/parquet-cpp/pull/464

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/src/parquet/types.h b/src/parquet/types.h
index 0f4cfc21..aec99656 100644
--- a/src/parquet/types.h
+++ b/src/parquet/types.h
@@ -113,6 +113,14 @@ struct Compression {
   enum type { UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD };
 };
 
+struct Encryption {
+  enum type {
+    AES_GCM_V1 = 0,
+    AES_GCM_CTR_V1 = 1
+  };
+};
+
+
 // parquet::PageType
 struct PageType {
   enum type { DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE, DATA_PAGE_V2 };
diff --git a/src/parquet/util/crypto.cc b/src/parquet/util/crypto.cc
new file mode 100644
index ..59383d18
--- /dev/null
+++ b/src/parquet/util/crypto.cc
@@ -0,0 +1,369 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "parquet/util/crypto.h"
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "parquet/exception.h"
+
+using parquet::ParquetException;
+
+namespace parquet_encryption {
+
+constexpr int aesGcm = 0;
+constexpr int aesCtr = 1;
+constexpr int encryptType = 0;
+constexpr int decryptType = 1;
+constexpr int gcmTagLen = 16;
+constexpr int gcmIvLen = 12;
+constexpr int ctrIvLen = 16;
+constexpr int rndMaxBytes = 32;
+
+#define ENCRYPT_INIT(CTX, ALG)                                        \
+  if (1 != EVP_EncryptInit_ex(CTX, ALG, nullptr, nullptr, nullptr)) { \
+    throw ParquetException("Couldn't init ALG encryption");           \
+  }
+
+#define DECRYPT_INIT(CTX, ALG)                                        \
+  if (1 != EVP_DecryptInit_ex(CTX, ALG, nullptr, nullptr, nullptr)) { \
+    throw ParquetException("Couldn't init ALG decryption");           \
+  }
+
+class EvpCipher {
+ public:
+  explicit EvpCipher(int cipher, int key_len, int type) {
+    ctx_ = nullptr;
+
+    if (aesGcm != cipher && aesCtr != cipher) {
+      std::stringstream ss;
+      ss << "Wrong cipher: " << cipher;
+      throw ParquetException(ss.str());
+    }
+
+    if (16 != key_len && 24 != key_len && 32 != key_len) {
+      std::stringstream ss;
+      ss << "Wrong key length: " << key_len;
+      throw ParquetException(ss.str());
+    }
+
+    if (encryptType != type && decryptType != type) {
+      std::stringstream ss;
+      ss << "Wrong cipher type: " << type;
+      throw ParquetException(ss.str());
+    }
+
+    ctx_ = EVP_CIPHER_CTX_new();
+    if (nullptr == ctx_) {
+      throw ParquetException("Couldn't init cipher context");
+    }
+
+    if (aesGcm == cipher) {
+      // Init AES-GCM with specified key length
+      if (16 == key_len) {
+        if (encryptType == type) {
+          ENCRYPT_INIT(ctx_, EVP_aes_128_gcm());
+        } else {
+          DECRYPT_INIT(ctx_, EVP_aes_128_gcm());
+        }
+      } else if (24 == key_len) {
+        if (encryptType == type) {
+          ENCRYPT_INIT(ctx_, EVP_aes_192_gcm());
+        } else {
+          DECRYPT_INIT(ctx_, EVP_aes_192_gcm());
+        }
+      } else if (32 == key_len) {
+        if (encryptType == type) {
+          ENCRYPT_INIT(ctx_, EVP_aes_256_gcm());
+        } else {
+          DECRYPT_INIT(ctx_, EVP_aes_256_gcm());
+        }
+      }
+    } else {
+      // Init AES-CTR with specified key length
+      if (16 == key_len) {
+        if (encryptType == type) {
+          ENCRYPT_INIT(ctx_, EVP_aes_128_ctr());
+        } else {
+          DECRYPT_INIT(ctx_, EVP_aes_128_ctr());
+        }
+      } else if (24 == key_len) {
+        if (encryptType == type) {
+          ENCRYPT_INIT(ctx_, EVP_aes_192_ctr());
+        } else {
+          DECRYPT_INIT(ctx_, EVP_aes_192_ctr());
+        }
+      } else if (32 == key_len) {
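The constructor above dispatches on cipher mode and key length through nested if/else chains to pick one of six OpenSSL EVP ciphers. A hypothetical standalone helper (the function name and string return are mine, not part of the patch) captures the same mapping and makes it easy to check: a 16/24/32-byte key selects AES-128/192/256, and the mode constant selects GCM or CTR.

```cpp
#include <stdexcept>
#include <string>

// Illustrative sketch of the dispatch in EvpCipher's constructor.
// cipher: 0 = AES-GCM, 1 = AES-CTR (matching the patch's aesGcm/aesCtr).
// key_len: key size in bytes; {16, 24, 32} map to AES-{128, 192, 256}.
// Returns the name of the OpenSSL EVP cipher the constructor would
// initialize via ENCRYPT_INIT/DECRYPT_INIT.
std::string SelectEvpCipherName(int cipher, int key_len) {
  if (cipher != 0 && cipher != 1) {
    throw std::invalid_argument("Wrong cipher: " + std::to_string(cipher));
  }
  if (key_len != 16 && key_len != 24 && key_len != 32) {
    throw std::invalid_argument("Wrong key length: " + std::to_string(key_len));
  }
  const std::string bits = std::to_string(key_len * 8);  // 16 bytes -> 128 bits
  const std::string mode = (cipher == 0) ? "gcm" : "ctr";
  return "EVP_aes_" + bits + "_" + mode;
}
```

For example, `SelectEvpCipherName(0, 32)` yields `"EVP_aes_256_gcm"`, the same cipher the 32-byte GCM branch of the patch initializes; collapsing the branches this way avoids the six-way duplication at the cost of building the name at runtime.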
[jira] [Commented] (PARQUET-1370) Read consecutive column chunks in a single scan
[ https://issues.apache.org/jira/browse/PARQUET-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569210#comment-16569210 ]

Wes McKinney commented on PARQUET-1370:
---------------------------------------

I have opened some issues related to buffering / concurrent IO in C++, e.g.
https://issues.apache.org/jira/browse/ARROW-501

[~rgruener] In 0.10.0 the pyarrow file handles implement RawIOBase now. I
don't think it would be too difficult to add a buffering reader to the
Parquet hot path with a configurable buffer size. We already have a
{{BufferedInputStream}} which may help

> Read consecutive column chunks in a single scan
> -----------------------------------------------
>
>                 Key: PARQUET-1370
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1370
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Robert Gruener
>            Priority: Major
>
> Currently parquet-cpp calls for a filesystem scan with every single data page, see
> [https://github.com/apache/parquet-cpp/blob/a0d1669cf67b055cd7b724dea04886a0ded53c8f/src/parquet/column_reader.cc#L181]
> For remote filesystems this can be very inefficient when reading many small
> columns. The java implementation already does this and will read consecutive
> column chunks (and the resulting pages) in a single scan, see
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L786]
>
> This might be a bit difficult to do, as it would require changing a lot of
> the code structure, but it would certainly be valuable for workloads concerned
> with optimal read performance.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PARQUET-1370) [C++] Read consecutive column chunks in a single scan
[ https://issues.apache.org/jira/browse/PARQUET-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated PARQUET-1370:
----------------------------------
    Summary: [C++] Read consecutive column chunks in a single scan  (was: Read consecutive column chunks in a single scan)

> [C++] Read consecutive column chunks in a single scan
> -----------------------------------------------------
>
>                 Key: PARQUET-1370
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1370
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Robert Gruener
>            Priority: Major
>
> Currently parquet-cpp calls for a filesystem scan with every single data page, see
> [https://github.com/apache/parquet-cpp/blob/a0d1669cf67b055cd7b724dea04886a0ded53c8f/src/parquet/column_reader.cc#L181]
> For remote filesystems this can be very inefficient when reading many small
> columns. The java implementation already does this and will read consecutive
> column chunks (and the resulting pages) in a single scan, see
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L786]
>
> This might be a bit difficult to do, as it would require changing a lot of
> the code structure, but it would certainly be valuable for workloads concerned
> with optimal read performance.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
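The buffering idea discussed in this thread, turning many small page-sized reads into a few large underlying scans, can be sketched with a minimal stand-alone class. This is a simplified illustration, not the actual parquet-cpp {{BufferedInputStream}} API; the class name, the string-backed "source", and the scan counter are all mine, used only to make the coalescing effect observable.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Illustrative buffered reader: each Refill() models one underlying
// filesystem scan that fetches up to buffer_size bytes at once, so many
// small Read() calls (one per data page) cost only a few scans.
class BufferedReader {
 public:
  BufferedReader(std::string source, size_t buffer_size)
      : source_(std::move(source)), buffer_size_(buffer_size) {}

  // Copies up to `n` bytes into `out`, refilling the buffer on demand.
  // Returns the number of bytes actually copied.
  size_t Read(size_t n, char* out) {
    size_t copied = 0;
    while (copied < n) {
      if (buf_pos_ == buf_.size()) {
        if (!Refill()) break;  // underlying source exhausted
      }
      size_t take = std::min(n - copied, buf_.size() - buf_pos_);
      std::memcpy(out + copied, buf_.data() + buf_pos_, take);
      buf_pos_ += take;
      copied += take;
    }
    return copied;
  }

  // Number of underlying "filesystem scans" performed so far.
  int raw_reads() const { return raw_reads_; }

 private:
  // One coalesced scan: pull up to buffer_size_ bytes from the source.
  bool Refill() {
    if (src_pos_ >= source_.size()) return false;
    size_t take = std::min(buffer_size_, source_.size() - src_pos_);
    buf_ = source_.substr(src_pos_, take);
    src_pos_ += take;
    buf_pos_ = 0;
    ++raw_reads_;
    return true;
  }

  std::string source_;
  size_t buffer_size_;
  std::string buf_;
  size_t buf_pos_ = 0;
  size_t src_pos_ = 0;
  int raw_reads_ = 0;
};
```

With a 32-byte buffer over a 64-byte source, 64 one-byte reads trigger only 2 underlying scans instead of 64, which is the saving the thread is after for remote filesystems with high per-request latency.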