Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3088687770 Run Gluten Clickhouse CI on x86 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2209052237
##
cpp/velox/compute/iceberg/IcebergWriter.cc:
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "IcebergWriter.h"
+
+#include "IcebergPartitionSpec.pb.h"
+#include "compute/ProtobufUtils.h"
+#include "compute/iceberg/IcebergFormat.h"
+#include "utils/ConfigExtractor.h"
+#include "velox/connectors/hive/iceberg/IcebergDataSink.h"
+#include "velox/connectors/hive/iceberg/IcebergDeleteFile.h"
+
+using namespace facebook::velox;
+using namespace facebook::velox::connector::hive;
+using namespace facebook::velox::connector::hive::iceberg;
+namespace {
+
+std::shared_ptr makeLocationHandle(
+std::string targetDirectory,
+std::optional writeDirectory = std::nullopt,
+connector::hive::LocationHandle::TableType tableType =
connector::hive::LocationHandle::TableType::kExisting) {
+ return;
Review Comment:
A simple refactoring commit broke compilation. The PR itself has been tested;
it is waiting for the corresponding PR to be merged on the native side.
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
avevad commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2200827686
##
cpp/velox/compute/iceberg/IcebergWriter.cc:
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "IcebergWriter.h"
+
+#include "IcebergPartitionSpec.pb.h"
+#include "compute/ProtobufUtils.h"
+#include "compute/iceberg/IcebergFormat.h"
+#include "utils/ConfigExtractor.h"
+#include "velox/connectors/hive/iceberg/IcebergDataSink.h"
+#include "velox/connectors/hive/iceberg/IcebergDeleteFile.h"
+
+using namespace facebook::velox;
+using namespace facebook::velox::connector::hive;
+using namespace facebook::velox::connector::hive::iceberg;
+namespace {
+
+std::shared_ptr makeLocationHandle(
+std::string targetDirectory,
+std::optional writeDirectory = std::nullopt,
+connector::hive::LocationHandle::TableType tableType =
connector::hive::LocationHandle::TableType::kExisting) {
+ return;
Review Comment:
Return what? Does this PR even compile? I keep seeing syntax errors here and
there, and I don't understand how one can write this much code without
compiling it.
##
cpp/velox/compute/iceberg/IcebergWriter.cc:
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "IcebergWriter.h"
+
+#include "IcebergPartitionSpec.pb.h"
+#include "compute/ProtobufUtils.h"
+#include "compute/iceberg/IcebergFormat.h"
+#include "utils/ConfigExtractor.h"
+#include "velox/connectors/hive/iceberg/IcebergDataSink.h"
+#include "velox/connectors/hive/iceberg/IcebergDeleteFile.h"
+
+using namespace facebook::velox;
+using namespace facebook::velox::connector::hive;
+using namespace facebook::velox::connector::hive::iceberg;
+namespace {
+
+std::shared_ptr makeLocationHandle(
+std::string targetDirectory,
+std::optional writeDirectory = std::nullopt,
+connector::hive::LocationHandle::TableType tableType =
connector::hive::LocationHandle::TableType::kExisting) {
+ return;
+}
+
+std::shared_ptr createIcebergInsertTableHandle(
+const RowTypePtr& outputRowType,
+const std::string& outputDirectoryPath,
+dwio::common::FileFormat fileFormat,
+facebook::velox::common::CompressionKind compressionKind,
+std::shared_ptr spec) {
+ std::cout <<"output directory" << outputDirectoryPath << std::endl;
+ std::vector>
columnHandles;
+
+ std::vector columnNames = outputRowType->names();
+ std::vector columnTypes = outputRowType->children();
+ std::vector partitionColumns;
+ partitionColumns.reserve(spec->fields.size());
+ for (const auto& field : spec->fields) {
+partitionColumns.push_back(field.name);
+ }
+ for (auto i = 0; i < columnNames.size(); ++i) {
+if (std::find(partitionColumns.begin(), partitionColumns.end(),
columnNames[i]) != partitionColumns.end()) {
+ columnHandles.push_back(
+std::make_shared(
+columnNames.at(i),
+connector::hive::HiveColumnHandle::ColumnType::kPartitionKey,
+columnTypes.at(i),
+columnTypes.at(i)));
+} else {
+ columnHandles.push_back(
+std::make_shared(
+columnNames.at(i),
+connector::hive::HiveColumnHandle::ColumnType::kRegular,
+columnTypes.at(i),
+columnTypes.at(i)));
+ }
Review Comment:
It looks like the closing '}' of the else clause is missing.
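For reference, the loop under review marks each output column as a partition key or a regular column depending on whether its name appears in the partition spec. A minimal, self-contained sketch of that classification with balanced braces follows. `ColumnKind`, `ColumnInfo`, and `classifyColumns` are hypothetical stand-ins, not the Velox `HiveColumnHandle` API, and the hash-set lookup is a substitution for the `std::find` scan in the diff.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for HiveColumnHandle::ColumnType.
enum class ColumnKind { kPartitionKey, kRegular };

// Hypothetical stand-in for a column handle: just a name and its kind.
struct ColumnInfo {
  std::string name;
  ColumnKind kind;
};

// Marks every column whose name appears in partitionColumns as a partition
// key; all other columns are regular.
std::vector<ColumnInfo> classifyColumns(
    const std::vector<std::string>& columnNames,
    const std::vector<std::string>& partitionColumns) {
  const std::unordered_set<std::string> partitionSet(
      partitionColumns.begin(), partitionColumns.end());

  std::vector<ColumnInfo> columns;
  columns.reserve(columnNames.size());
  for (std::size_t i = 0; i < columnNames.size(); ++i) {
    if (partitionSet.count(columnNames[i]) > 0) {
      columns.push_back({columnNames[i], ColumnKind::kPartitionKey});
    } else {
      columns.push_back({columnNames[i], ColumnKind::kRegular});
    }
  }
  return columns;
}
```

Building the set once avoids a linear scan per column, and the unsigned `std::size_t` index avoids the signed/unsigned comparison that `auto i = 0` produces against `size()`.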
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2176517207
##
.github/workflows/velox_backend_enhanced_features.yml:
##
@@ -0,0 +1,170 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: Velox backend Github Runner (Enhanced Features)
+
+on:
+ pull_request:
+paths:
+ - '.github/workflows/velox_backend_enhanced_features.yml'
+ - 'pom.xml'
+ - 'backends-velox/**'
+ - 'gluten-uniffle/**'
+ - 'gluten-celeborn/**'
+ - 'gluten-ras/**'
+ - 'gluten-core/**'
+ - 'gluten-substrait/**'
+ - 'gluten-arrow/**'
+ - 'gluten-delta/**'
+ - 'gluten-iceberg/**'
+ - 'gluten-hudi/**'
+ - 'gluten-ut/**'
+ - 'shims/**'
+ - 'tools/gluten-it/**'
+ - 'ep/build-velox/**'
+ - 'cpp/**'
+ - 'dev/**'
+
+env:
+ ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true
+ MVN_CMD: 'mvn -ntp'
+ WGET_CMD: 'wget -nv'
+ SETUP: 'bash .github/workflows/util/setup_helper.sh'
+ CCACHE_DIR: "${{ github.workspace }}/.ccache"
+ # for JDK17 unit tests
+ EXTRA_FLAGS: "-XX:+IgnoreUnrecognizedVMOptions
+--add-opens=java.base/java.lang=ALL-UNNAMED
+--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
+--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
+--add-opens=java.base/java.io=ALL-UNNAMED
+--add-opens=java.base/java.net=ALL-UNNAMED
+--add-opens=java.base/java.nio=ALL-UNNAMED
+--add-opens=java.base/java.util=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
+--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
+--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
+--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
+--add-opens=java.base/sun.security.action=ALL-UNNAMED
+--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
+-Djdk.reflect.useDirectMethodHandle=false
+-Dio.netty.tryReflectionSetAccessible=true"
+
+concurrency:
+ group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{
github.workflow }}
+ cancel-in-progress: true
+
+jobs:
+ build-native-lib-centos-7:
+runs-on: ubuntu-22.04
+steps:
+ - uses: actions/checkout@v4
+ - name: Get Ccache
+uses: actions/cache/restore@v4
+with:
+ path: '${{ env.CCACHE_DIR }}'
+ key: ccache-enhanced-centos7-release-default-${{github.sha}}
+ restore-keys: |
+ccache-enhanced-centos7-release-default
+ - name: Build Gluten native libraries
+run: |
+ docker pull apache/gluten:vcpkg-centos-7
+ docker run -v $GITHUB_WORKSPACE:/work -w /work
apache/gluten:vcpkg-centos-7 bash -c "
+set -e
+yum install tzdata -y
+df -a
+cd /work
+export CCACHE_DIR=/work/.ccache
+mkdir -p /work/.ccache
+bash dev/ci-velox-buildstatic-centos-7-enhanced-features.sh
+ccache -s
+mkdir -p /work/.m2/repository/org/apache/arrow/
+cp -r /root/.m2/repository/org/apache/arrow/*
/work/.m2/repository/org/apache/arrow/
+ "
+
+ - name: "Save ccache"
+uses: actions/cache/save@v4
+id: ccache
+with:
+ path: '${{ env.CCACHE_DIR }}'
+ key: ccache-enhanced-centos7-release-default-${{github.sha}}
+ - uses: actions/upload-artifact@v4
+with:
+ name: velox-native-lib-enhanced-centos-7-${{github.sha}}
+ path: ./cpp/build/releases/
+ if-no-files-found: error
+ - uses: actions/upload-artifact@v4
+with:
+ name: arrow-jars-enhanced-centos-7-${{github.sha}}
+ path: .m2/repository/org/apache/arrow/
+ if-no-files-found: error
+
+ spark-test-spark34:
Review Comment:
Spark 3.4 uses Iceberg 1.7.1, while spark3 uses Iceberg 1.5.0. Because of
this version mismatch, the reflection that gets writeProperty from SparkWrite
will fail in the test.
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2176406267
##
ep/build-velox/src/get_velox.sh:
##
@@ -20,8 +20,8 @@ VELOX_REPO=https://github.com/oap-project/velox.git
VELOX_BRANCH=2025_06_25
VELOX_HOME=""
RUN_SETUP_SCRIPT=ON
-VELOX_ENHANCED_REPO=https://github.com/oap-project/velox.git
-VELOX_ENHANCED_BRANCH=2025_06_24
+VELOX_ENHANCED_REPO=https://github.com/jinchengchenghh/velox.git
+VELOX_ENHANCED_BRANCH=2025_06_25
Review Comment:
No, I will merge this PR after the Velox PR is merged to ibm/velox, which
will be soon.
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
zhztheplayer commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2176295082
##
gluten-core/src/main/resources/org/apache/gluten/proto/IcebergPartitionSpec.proto:
##
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: Apache-2.0
+syntax = "proto3";
+
+package gluten;
+
+option java_package = "org.apache.gluten.proto";
+option java_multiple_files = true;
+
+enum TransformType {
+ IDENTITY = 0;
+ YEAR = 1;
+ MONTH = 2;
+ DAY = 3;
+ HOUR = 4;
+ BUCKET = 5;
+ TRUNCATE = 6;
+}
+
+message IcebergPartitionField {
+ string name = 1;
+ TransformType transform = 2;
+ optional int32 parameter = 3; // Optional parameter for transform config
+}
+
+message IcebergPartitionSpec {
+ int32 spec_id = 1; // Field name uses snake_case per protobuf conventions
+ repeated IcebergPartitionField fields = 2;
+}
Review Comment:
Would it be possible to move this to gluten-substrait or backends-velox? Thanks!
I understand it doesn't belong with the Substrait protos, but it still doesn't
seem general enough to be placed here.
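For context on what the message carries: in Iceberg, the `bucket` and `truncate` transforms take a parameter (bucket count or width), while `identity` and the date/time transforms do not, which is presumably why `parameter` is declared `optional`. A rough C++ sketch of that rule follows. `PartitionField` and `isValidField` are hypothetical names that mirror the proto by hand, not the generated protobuf classes, and rejecting a parameter on parameterless transforms is an assumption made for illustration.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hand-written mirror of the TransformType enum in the proto above.
enum class TransformType {
  kIdentity,
  kYear,
  kMonth,
  kDay,
  kHour,
  kBucket,
  kTruncate
};

// Hand-written mirror of IcebergPartitionField (not the generated class).
struct PartitionField {
  std::string name;
  TransformType transform;
  std::optional<int32_t> parameter;
};

// Bucket and truncate need a positive parameter. Treating a parameter on any
// other transform as invalid is an assumption of this sketch.
bool isValidField(const PartitionField& field) {
  const bool needsParameter = field.transform == TransformType::kBucket ||
      field.transform == TransformType::kTruncate;
  if (needsParameter) {
    return field.parameter.has_value() && *field.parameter > 0;
  }
  return !field.parameter.has_value();
}
```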
##
dev/builddeps-veloxbe.sh:
##
@@ -7,7 +7,7 @@
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
-#http://www.apache.org/licenses/LICENSE-2.0
+#http://www.apache.org/licenses/LICENSE-2.0C
Review Comment:
Is this intentional?
##
ep/build-velox/src/get_velox.sh:
##
@@ -20,8 +20,8 @@ VELOX_REPO=https://github.com/oap-project/velox.git
VELOX_BRANCH=2025_06_25
VELOX_HOME=""
RUN_SETUP_SCRIPT=ON
-VELOX_ENHANCED_REPO=https://github.com/oap-project/velox.git
-VELOX_ENHANCED_BRANCH=2025_06_24
+VELOX_ENHANCED_REPO=https://github.com/jinchengchenghh/velox.git
+VELOX_ENHANCED_BRANCH=2025_06_25
Review Comment:
I have no strong opinion on this, as long as nobody objects.
Are you going to change the repo to `ibm/velox` in the next PR?
##
.github/workflows/velox_backend_enhanced_features.yml:
##
@@ -0,0 +1,170 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: Velox backend Github Runner (Enhanced Features)
+
+on:
+ pull_request:
+paths:
+ - '.github/workflows/velox_backend_enhanced_features.yml'
+ - 'pom.xml'
+ - 'backends-velox/**'
+ - 'gluten-uniffle/**'
+ - 'gluten-celeborn/**'
+ - 'gluten-ras/**'
+ - 'gluten-core/**'
+ - 'gluten-substrait/**'
+ - 'gluten-arrow/**'
+ - 'gluten-delta/**'
+ - 'gluten-iceberg/**'
+ - 'gluten-hudi/**'
+ - 'gluten-ut/**'
+ - 'shims/**'
+ - 'tools/gluten-it/**'
+ - 'ep/build-velox/**'
+ - 'cpp/**'
+ - 'dev/**'
+
+env:
+ ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true
+ MVN_CMD: 'mvn -ntp'
+ WGET_CMD: 'wget -nv'
+ SETUP: 'bash .github/workflows/util/setup_helper.sh'
+ CCACHE_DIR: "${{ github.workspace }}/.ccache"
+ # for JDK17 unit tests
+ EXTRA_FLAGS: "-XX:+IgnoreUnrecognizedVMOptions
+--add-opens=java.base/java.lang=ALL-UNNAMED
+--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
+--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
+--add-opens=java.base/java.io=ALL-UNNAMED
+--add-opens=java.base/java.net=ALL-UNNAMED
+--add-opens=java.base/java.nio=ALL-UNNAMED
+--add-opens=java.base/java.util=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
+--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
+--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
+--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
+--add-opens=java.base/sun.security.action=ALL-UNNAMED
+--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
+-Djdk.reflect.useDirectMethodHandle=false
+-Dio.netty.tryReflectionSetAccessible=true"
+
+concurrency:
+ group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{
github.workflow }}
+ cancel-in-progress: true
+
+jobs:
+ build-native-lib-centos-7:
+runs-on: ubuntu-22.0
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3018163650
Could you help review again? Thanks! The first Velox PR will be merged to ibm/velox once CI is ready. After that, I will update the enhanced feature branch to ibm/velox main. @zhztheplayer
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2134919351
##
cpp/velox/compute/iceberg/IcebergWriter.h:
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#pragma once
+
+#include "compute/iceberg/IcebergFormat.h"
+#include "memory/VeloxColumnarBatch.h"
+#include "velox/connectors/hive/iceberg/IcebergDataSink.h"
+
+namespace gluten {
+
+class IcebergWriter {
Review Comment:
The APIs differ: `commit` and `close` have different return types, and the
`initialize` arguments differ. The Iceberg data source needs a partition spec,
which is not included in this PR but will be supported later. VeloxDataSource
only supports Parquet, so it has no format argument, and the compressionKind
comes from the Iceberg TableProperty rather than sparkConfs. With this many
differences, it is a separate class.
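For illustration only, the API differences described above can be sketched in
plain Java. Every name here (`ParquetOnlyWriter`, `IcebergStyleWriter`, the
`sketch.*` property keys) is hypothetical and chosen for this sketch; none of
it is the actual Gluten or Velox API, and the partition spec is left out, as
in this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Parquet-only writer: no format argument,
// the codec comes from Spark-style confs, and close() returns nothing.
final class ParquetOnlyWriter {
  private String codec = "none";
  private boolean closed = false;

  void init(String path, Map<String, String> sparkConfs) {
    // "sketch.parquet.codec" is a made-up key used only in this sketch.
    codec = sparkConfs.getOrDefault("sketch.parquet.codec", "snappy");
  }

  void close() {
    closed = true;
  }

  String codec() {
    return codec;
  }

  boolean isClosed() {
    return closed;
  }
}

// Hypothetical stand-in for an Iceberg-style writer: it takes an explicit
// format, reads the codec from table properties, and commit() returns the
// names of the files it would hand back as commit information.
final class IcebergStyleWriter {
  private final String format;
  private final String directory;
  private final String codec;
  private final List<String> writtenFiles = new ArrayList<>();

  IcebergStyleWriter(String format, String directory, Map<String, String> tableProperties) {
    this.format = format;
    this.directory = directory;
    // "sketch.table.codec" is a made-up key used only in this sketch.
    this.codec = tableProperties.getOrDefault("sketch.table.codec", "zstd");
  }

  // Records the file name a batch would be written to; no I/O happens here.
  void write(int batchId) {
    writtenFiles.add(directory + "/part-" + batchId + "." + format);
  }

  List<String> commit() {
    return new ArrayList<>(writtenFiles);
  }

  String codec() {
    return codec;
  }
}

final class WriterApiDemo {
  public static void main(String[] args) {
    Map<String, String> tableProps = new HashMap<>();
    tableProps.put("sketch.table.codec", "gzip");
    IcebergStyleWriter writer = new IcebergStyleWriter("parquet", "/tmp/table", tableProps);
    writer.write(0);
    System.out.println(writer.commit() + " codec=" + writer.codec());
  }
}
```

The two classes share almost no method signatures, which is the argument for
keeping them separate rather than forcing a common base class.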
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2134909382
##
gluten-substrait/src/main/scala/org/apache/spark/sql/datasources/v2/AppendColumnarBatchDataExec.scala:
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.datasources.v2
+
+import org.apache.gluten.connector.write.ColumnarBatchDataWriterFactory
+
+import org.apache.spark.TaskContext
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.connector.write._
+import org.apache.spark.sql.execution.datasources.v2.StreamWriterCommitProgress
+import org.apache.spark.sql.execution.metric.{CustomMetrics, SQLMetric}
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.Utils
+
+case class DataWritingColumnarBatchSparkTaskResult(
+numRows: Long,
Review Comment:
In Spark, the corresponding file is WriteToDataSourceV2Exec.scala, so we may
need to rename `AppendColumnarBatchDataExec.scala` to
`ColumnarWriteToDataSourceV2Exec.scala`.
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2134903421
##
gluten-substrait/src/main/scala/org/apache/gluten/backendsapi/BackendSettingsApi.scala:
##
@@ -158,4 +158,6 @@ trait BackendSettingsApi {
def supportIcebergEqualityDeleteRead(): Boolean = true
+ def supportAppendDataExec(): Boolean = false
Review Comment:
I suppose the CH backend does not support the AppendDataExec operator.
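A minimal Java sketch of the opt-in pattern behind this flag, assuming a
simplified settings interface: the default is `false`, so only a backend that
overrides it gets the columnar operator. `BackendSettings`,
`VeloxLikeSettings`, `ClickHouseLikeSettings` and `AppendOffloadRule` are
hypothetical names for this sketch, not Gluten classes.

```java
// Hypothetical, simplified mirror of a backend-settings interface.
interface BackendSettings {
  // Defaults to "unsupported", so each backend has to opt in explicitly.
  default boolean supportAppendDataExec() {
    return false;
  }
}

// A backend that opts in to the columnar append operator.
final class VeloxLikeSettings implements BackendSettings {
  @Override
  public boolean supportAppendDataExec() {
    return true;
  }
}

// A backend that keeps the default and therefore stays on the vanilla path.
final class ClickHouseLikeSettings implements BackendSettings {}

final class AppendOffloadRule {
  // Returns the name of the operator that would be planned for this backend.
  static String plan(BackendSettings settings) {
    return settings.supportAppendDataExec() ? "ColumnarAppendDataExec" : "AppendDataExec";
  }
}

final class AppendOffloadDemo {
  public static void main(String[] args) {
    System.out.println(AppendOffloadRule.plan(new VeloxLikeSettings()));
    System.out.println(AppendOffloadRule.plan(new ClickHouseLikeSettings()));
  }
}
```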
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
zhztheplayer commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2128389033
##
gluten-substrait/src/main/java/org/apache/gluten/connector/write/ColumnarBatchDataWriterFactory.java:
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.write.BatchWrite;
+import org.apache.spark.sql.connector.write.DataWriter;
+import org.apache.spark.sql.connector.write.PhysicalWriteInfo;
+import org.apache.spark.sql.vectorized.ColumnarBatch;
+
+import java.io.Serializable;
+
+/**
+ * A factory of {@link DataWriter} returned by {@link
+ * BatchWrite#createBatchWriterFactory(PhysicalWriteInfo)}, which is responsible for creating and
+ * initializing the actual data writer at executor side.
+ *
+ * Note that, the writer factory will be serialized and sent to executors, then the data writer
+ * will be created on executors and do the actual writing. So this interface must be serializable
+ * and {@link DataWriter} doesn't need to be.
+ *
+ * @since 3.0.0
+ */
Review Comment:
Let's rephrase this comment. Maybe we just convey that it is a companion
interface to Spark's row-based version.
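As a rough illustration of what "companion interface" means here, the Java
sketch below pairs a row-based factory with a batch-based one that has the
same shape. All types (`RowWriter`, `BatchWriter`, `RowWriterFactory`,
`BatchWriterFactory`, `CountingBatchWriter`) are hypothetical stand-ins and a
batch is modelled as a plain list of rows; only the
`createWriter(int partitionId, long taskId)` shape follows Spark's
`DataWriterFactory`.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical row-at-a-time writer.
interface RowWriter {
  void write(Object[] row);
}

// Hypothetical batch-at-a-time writer; a batch is modelled as a list of rows.
interface BatchWriter {
  void write(List<Object[]> batch);
}

// Row-based factory: serialized on the driver, used on executors.
interface RowWriterFactory extends Serializable {
  RowWriter createWriter(int partitionId, long taskId);
}

/** Companion of {@link RowWriterFactory} whose writers consume whole batches. */
interface BatchWriterFactory extends Serializable {
  BatchWriter createWriter(int partitionId, long taskId);
}

// Toy writer that only counts the rows it receives.
final class CountingBatchWriter implements BatchWriter {
  private long rows = 0;

  @Override
  public void write(List<Object[]> batch) {
    rows += batch.size();
  }

  long rows() {
    return rows;
  }
}

final class CompanionFactoryDemo {
  public static void main(String[] args) {
    BatchWriterFactory factory = (partitionId, taskId) -> new CountingBatchWriter();
    CountingBatchWriter writer = (CountingBatchWriter) factory.createWriter(0, 1L);
    List<Object[]> batch = new ArrayList<>();
    batch.add(new Object[] {1, "a"});
    batch.add(new Object[] {2, "b"});
    writer.write(batch);
    System.out.println("rows written: " + writer.rows());
  }
}
```

With this shape, the doc comment on the columnar factory can simply point at
the row-based one instead of repeating its full description.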
##
gluten-iceberg/src/main/java/org/apache/gluten/connector/write/ColumnarBatchWrite.java:
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write;
+
+import org.apache.spark.sql.connector.write.BatchWrite;
+import org.apache.spark.sql.connector.write.DataWriterFactory;
+import org.apache.spark.sql.connector.write.PhysicalWriteInfo;
+
+public abstract class ColumnarBatchWrite implements BatchWrite {
+  @Override
+  public DataWriterFactory createBatchWriterFactory(PhysicalWriteInfo info) {
+    throw new UnsupportedOperationException();
+  }
+
+  public ColumnarDataWriterFactory createColumnarBatchWriterFactory(PhysicalWriteInfo info) {
+    throw new UnsupportedOperationException();
+  }
+}
Review Comment:
Could we move this API to `gluten-substrait`?
##
gluten-substrait/src/main/scala/org/apache/gluten/execution/ColumnarAppendDataExec.scala:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.connector.write.ColumnarBatchDataWriterFactory
+import org.apache.gluten.extension.columnar.transition.Convention
+import org.apache.gluten.extension.columnar.transition.Convention.RowType
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2845156876 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
zhztheplayer commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2070411767
##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+ format: Integer, directory: String, codec: String)
+ extends ColumnarBatchDataWriterFactory {
+
+ /**
+ * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+ * object instance when sending data to the data writer, for better performance. Data writers
+ * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+ * list.
+ *
+ * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+ * and get retried until hitting the maximum retry times.
+ *
+ */
+ override def createWriter(): DataWriter[ColumnarBatch] = {
+val(writerHandle, jniWrapper) = getJniWrapper(schema, format, directory, codec)
Review Comment:
@jinchengchenghh I have also opened
https://github.com/apache/incubator-gluten/pull/9478. The reason is exactly
the one you found.
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2070401492
##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+ format: Integer, directory: String, codec: String)
+ extends ColumnarBatchDataWriterFactory {
+
+ /**
+ * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+ * object instance when sending data to the data writer, for better performance. Data writers
+ * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+ * list.
+ *
+ * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+ * and get retried until hitting the maximum retry times.
+ *
+ */
+ override def createWriter(): DataWriter[ColumnarBatch] = {
+val(writerHandle, jniWrapper) = getJniWrapper(schema, format, directory, codec)
Review Comment:
That is because the src-iceberg directory is not included in the plugin
configuration. I will try to fix it.
https://github.com/diffplug/spotless/blob/main/plugin-maven/README.md#scala
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2070391038
##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+ format: Integer, directory: String, codec: String)
+ extends ColumnarBatchDataWriterFactory {
+
+ /**
+ * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+ * object instance when sending data to the data writer, for better performance. Data writers
+ * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+ * list.
+ *
+ * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+ * and get retried until hitting the maximum retry times.
+ *
+ */
+ override def createWriter(): DataWriter[ColumnarBatch] = {
+val(writerHandle, jniWrapper) = getJniWrapper(schema, format, directory, codec)
Review Comment:
com.diffplug.spotless:spotless-maven-plugin:2.27.2:check can check the
gluten-iceberg module, but it cannot check backends-velox/src-iceberg, which
is added as a source directory by `build-helper-maven-plugin`. The Scala style
plugin works well with src-iceberg.
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2842292139 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068782744
##
cpp/CMakeLists.txt:
##
@@ -57,6 +57,7 @@ option(ENABLE_HDFS "Enable HDFS" OFF)
option(ENABLE_ORC "Enable ORC" OFF)
option(ENABLE_ABFS "Enable ABFS" OFF)
option(ENABLE_GPU "Enable GPU" OFF)
+option(ENABLE_ICEBERG_WRITE "Enable iceberg write" OFF)
Review Comment:
The compile will fail
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068782144
##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+ format: Integer, directory: String, codec: String)
+ extends ColumnarBatchDataWriterFactory {
+
+ /**
+ * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+ * object instance when sending data to the data writer, for better performance. Data writers
+ * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+ * list.
+ *
+ * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+ * and get retried until hitting the maximum retry times.
+ *
+ */
+ override def createWriter(): DataWriter[ColumnarBatch] = {
+val(writerHandle, jniWrapper) = getJniWrapper(schema, format, directory, codec)
Review Comment:
The code style check has no effect on the iceberg code; maybe we should fix
that.
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068780818
##
gluten-substrait/src/main/scala/org/apache/gluten/execution/ColumnarAppendDataExec.scala:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.connector.write.ColumnarBatchDataWriterFactory
+import org.apache.gluten.extension.columnar.transition.Convention
+import org.apache.gluten.extension.columnar.transition.Convention.RowType
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.connector.write.{BatchWrite, Write, WriterCommitMessage}
+import org.apache.spark.sql.datasources.v2.{DataWritingColumnarBatchSparkTask, DataWritingColumnarBatchSparkTaskResult, StreamWriterCommitProgressUtil, WritingColumnarBatchSparkTask}
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2._
+import org.apache.spark.sql.execution.metric.SQLMetric
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.LongAccumulator
+
+abstract class ColumnarAppendDataExec(query: SparkPlan, refreshCache: () => Unit, write: Write)
+ extends V2ExistingTableWriteExec
+ with ValidatablePlan {
+
+ def writingTaskBatch: WritingColumnarBatchSparkTask[_] = DataWritingColumnarBatchSparkTask
+
+ override def doExecute(): RDD[InternalRow] = {
+result
+sparkContext.parallelize(Nil, 1)
+ }
+
+ def createFactory(schema: StructType): ColumnarBatchDataWriterFactory
+
+ protected def writeColumnarBatchWithV2(batchWrite: BatchWrite): Unit = {
+val rdd: RDD[ColumnarBatch] = {
+ val tempRdd = query.executeColumnar()
+ // SPARK-23271 If we are attempting to write a zero partition rdd, create a dummy single
+ // partition rdd to make sure we at least set up one write task to write the metadata.
+ if (tempRdd.partitions.length == 0) {
+sparkContext.parallelize(Array.empty[ColumnarBatch], 1)
+ } else {
+tempRdd
+ }
+}
+// introduce a local var to avoid serializing the whole class
+val task = writingTaskBatch
+val messages = new Array[WriterCommitMessage](rdd.partitions.length)
+val totalNumRowsAccumulator = new LongAccumulator()
+
+logInfo(
+ s"Start processing data source write support: $batchWrite. " +
+s"The input RDD has ${messages.length} partitions.")
+
+// Avoid object not serializable issue.
+val writeMetrics: Map[String, SQLMetric] = customMetrics
+val factory = createFactory(query.schema)
+try {
+ sparkContext.runJob(
+rdd,
+(context: TaskContext, iter: Iterator[ColumnarBatch]) =>
+ task.run(factory, context, iter, writeMetrics),
+rdd.partitions.indices,
+(index, result: DataWritingColumnarBatchSparkTaskResult) => {
+ val commitMessage = result.writerCommitMessage
+ messages(index) = commitMessage
+ totalNumRowsAccumulator.add(result.numRows)
+ batchWrite.onDataWriterCommit(commitMessage)
+}
+ )
+
+ logInfo(s"Data source write support $batchWrite is committing.")
+ batchWrite.commit(messages)
+ logInfo(s"Data source write support $batchWrite committed.")
+ commitProgress = Some(
+
StreamWriterCommitProgressUtil.getStreamWriterCommitProgress(totalNumRowsAccumulator.value))
+} catch {
+ case cause: Throwable =>
Review Comment:
This is copied from Spark
##
gluten-substrait/src/main/scala/org/apache/gluten/execution/ColumnarAppendDataExec.scala:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068779830
##
backends-velox/src-iceberg/main/scala/org/apache/gluten/execution/OffloadIcebergWrite.scala:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.extension.columnar.enumerated.RasOffload
+import org.apache.gluten.extension.columnar.heuristic.HeuristicTransform
+import org.apache.gluten.extension.columnar.offload.OffloadSingleNode
+import org.apache.gluten.extension.columnar.validator.Validators
+import org.apache.gluten.extension.injector.Injector
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2.AppendDataExec
+
+case class OffloadIcebergWrite() extends OffloadSingleNode {
+  override def offload(plan: SparkPlan): SparkPlan = plan match {
+    case a: AppendDataExec =>
+      VeloxIcebergAppendDataExec(a)
Review Comment:
`VeloxIcebergAppendDataExec` uses the JNI wrapper to write Iceberg. For Delta,
the JNI wrapper might be different; for example, Delta Lake may go through a
Rust JNI call.
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
zhztheplayer commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068390354
##
backends-velox/src-iceberg/main/scala/org/apache/gluten/execution/OffloadIcebergWrite.scala:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.extension.columnar.enumerated.RasOffload
+import org.apache.gluten.extension.columnar.heuristic.HeuristicTransform
+import org.apache.gluten.extension.columnar.offload.OffloadSingleNode
+import org.apache.gluten.extension.columnar.validator.Validators
+import org.apache.gluten.extension.injector.Injector
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2.AppendDataExec
+
+case class OffloadIcebergWrite() extends OffloadSingleNode {
+  override def offload(plan: SparkPlan): SparkPlan = plan match {
+    case a: AppendDataExec =>
+      VeloxIcebergAppendDataExec(a)
Review Comment:
What happens if the `AppendDataExec` is for hudi or delta?
##
gluten-substrait/src/main/scala/org/apache/gluten/execution/ColumnarAppendDataExec.scala:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.connector.write.ColumnarBatchDataWriterFactory
+import org.apache.gluten.extension.columnar.transition.Convention
+import org.apache.gluten.extension.columnar.transition.Convention.RowType
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.connector.write.{BatchWrite, Write, WriterCommitMessage}
+import org.apache.spark.sql.datasources.v2.{DataWritingColumnarBatchSparkTask, DataWritingColumnarBatchSparkTaskResult, StreamWriterCommitProgressUtil, WritingColumnarBatchSparkTask}
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2._
+import org.apache.spark.sql.execution.metric.SQLMetric
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.LongAccumulator
+
+abstract class ColumnarAppendDataExec(query: SparkPlan, refreshCache: () => Unit, write: Write)
+  extends V2ExistingTableWriteExec
+  with ValidatablePlan {
+
+  def writingTaskBatch: WritingColumnarBatchSparkTask[_] = DataWritingColumnarBatchSparkTask
+
+  override def doExecute(): RDD[InternalRow] = {
+    result
+    sparkContext.parallelize(Nil, 1)
+  }
+
+  def createFactory(schema: StructType): ColumnarBatchDataWriterFactory
+
+  protected def writeColumnarBatchWithV2(batchWrite: BatchWrite): Unit = {
+    val rdd: RDD[ColumnarBatch] = {
+      val tempRdd = query.executeColumnar()
+      // SPARK-23271 If we are attempting to write a zero partition rdd, create a dummy single
+      // partition rdd to make sure we at least set up one write task to write the metadata.
+      if (tempRdd.partitions.length == 0) {
+        sparkContext.parallelize(Array.empty[ColumnarBatch], 1)
+      } else {
+        tempRdd
+      }
+    }
+    // introduce a local var to avoid serializing the whole class
+    val task = writingTaskBatch
+    val messages = new Array[WriterCommitMessage](rdd.partitions.length)
+    val totalNumRowsAccumulator = new LongAccumulator()
+
+    logInfo(
+      s"Start processing data
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2838743393 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2828737371 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2828585072 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2827679516 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2827677605 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2827145540 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2827035280 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2824241263 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2824267789 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2824217543 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
Copilot commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2055999858
##
cpp/velox/compute/iceberg/IcebergFormat.cc:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "IcebergFormat.h"
+
+namespace gluten {
+using namespace facebook::velox::dwio::common;
+// static
+FileFormat icebergFormatToVelox(int32_t format) {
+  auto icebergFormat = static_cast<IcebergFileFormat>(format);
+  switch (icebergFormat) {
+    case IcebergFileFormat::ORC:
+      return FileFormat::ORC;
+    case IcebergFileFormat::PARQUET:
+      return FileFormat::PARQUET;
+    default:
+      throw std::invalid_argument("Not suppport file format " + std::to_string(format));
Review Comment:
The error message contains a typographical error ('suppport'). Please
correct it to 'support'.
```suggestion
      throw std::invalid_argument("Not support file format " + std::to_string(format));
```
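For reference, the mapping under review can be exercised in isolation with the minimal sketch below. The two enums are hypothetical stand-ins: the real `IcebergFileFormat` and Velox `FileFormat` definitions and their numeric values are not shown in the quoted diff, so the values here are assumptions. The function name `icebergFormatToVeloxSketch` and the "Unsupported file format" wording are also illustrative and are not taken from the PR.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the Gluten and Velox enums; the numeric values
// are assumed for illustration only.
enum class IcebergFileFormat : int32_t { ORC = 0, PARQUET = 1, AVRO = 2 };
enum class FileFormat { ORC, PARQUET };

// Maps an integer format code (as it might arrive over JNI) to the writer's
// file format, rejecting anything that has no counterpart.
FileFormat icebergFormatToVeloxSketch(int32_t format) {
  switch (static_cast<IcebergFileFormat>(format)) {
    case IcebergFileFormat::ORC:
      return FileFormat::ORC;
    case IcebergFileFormat::PARQUET:
      return FileFormat::PARQUET;
    default:
      // Include the raw code so the caller can see which value was rejected.
      throw std::invalid_argument("Unsupported file format: " + std::to_string(format));
  }
}
```

Throwing from the `default` branch means an unmapped code fails loudly instead of falling through to some arbitrary format.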
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-282379 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2823758429 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821672211 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821659064 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821648890 Run Gluten Clickhouse CI on x86
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397: URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821648754 https://github.com/apache/incubator-gluten/issues/9335
Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]
github-actions[bot] commented on PR #9397:
URL:
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821648102
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/apache/incubator-gluten/issues
Then could you also rename ***commit message*** and ***pull request title***
in the following format?
[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
See also:
* [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
