Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-18 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3088687770

   Run Gluten Clickhouse CI on x86


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-18 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3088494835

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-18 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3088437997

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-16 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3082252947

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-16 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3082234794

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-16 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3082116424

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-16 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3082108963

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-15 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3076882721

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-15 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3076478852

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-15 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2209052237


##
cpp/velox/compute/iceberg/IcebergWriter.cc:
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "IcebergWriter.h"
+
+#include "IcebergPartitionSpec.pb.h"
+#include "compute/ProtobufUtils.h"
+#include "compute/iceberg/IcebergFormat.h"
+#include "utils/ConfigExtractor.h"
+#include "velox/connectors/hive/iceberg/IcebergDataSink.h"
+#include "velox/connectors/hive/iceberg/IcebergDeleteFile.h"
+
+using namespace facebook::velox;
+using namespace facebook::velox::connector::hive;
+using namespace facebook::velox::connector::hive::iceberg;
+namespace {
+
+std::shared_ptr makeLocationHandle(
+std::string targetDirectory,
+std::optional writeDirectory = std::nullopt,
+connector::hive::LocationHandle::TableType tableType = 
connector::hive::LocationHandle::TableType::kExisting) {
+  return;

Review Comment:
   A simple refactoring commit broke the build. The PR has been tested and is 
waiting for the PR on the native side to be merged.






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-15 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3076474847

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-11 Thread via GitHub


avevad commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2200827686


##
cpp/velox/compute/iceberg/IcebergWriter.cc:
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "IcebergWriter.h"
+
+#include "IcebergPartitionSpec.pb.h"
+#include "compute/ProtobufUtils.h"
+#include "compute/iceberg/IcebergFormat.h"
+#include "utils/ConfigExtractor.h"
+#include "velox/connectors/hive/iceberg/IcebergDataSink.h"
+#include "velox/connectors/hive/iceberg/IcebergDeleteFile.h"
+
+using namespace facebook::velox;
+using namespace facebook::velox::connector::hive;
+using namespace facebook::velox::connector::hive::iceberg;
+namespace {
+
+std::shared_ptr makeLocationHandle(
+std::string targetDirectory,
+std::optional writeDirectory = std::nullopt,
+connector::hive::LocationHandle::TableType tableType = 
connector::hive::LocationHandle::TableType::kExisting) {
+  return;

Review Comment:
   Return what? Does this PR even compile? I keep seeing syntax errors here and 
there, and I don't really understand how one can write this much code 
without compiling it.
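
   For illustration only, a minimal sketch of a helper with this signature that 
returns a value on every path. The `LocationHandle` struct below is a 
hypothetical stand-in, not the Velox class, and falling back to the target 
directory when no write directory is given is an assumption rather than 
something confirmed by the PR.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for connector::hive::LocationHandle (not the Velox type).
struct LocationHandle {
  enum class TableType { kNew, kExisting };
  std::string targetPath;
  std::string writePath;
  TableType tableType;
};

// A function declared to return std::shared_ptr must return one on every path;
// a bare `return;` is ill-formed here.
std::shared_ptr<LocationHandle> makeLocationHandle(
    std::string targetDirectory,
    std::optional<std::string> writeDirectory = std::nullopt,
    LocationHandle::TableType tableType = LocationHandle::TableType::kExisting) {
  // Assumption: use the target directory when no write directory is supplied.
  std::string writePath = writeDirectory.value_or(targetDirectory);
  return std::make_shared<LocationHandle>(LocationHandle{
      std::move(targetDirectory), std::move(writePath), tableType});
}
```

   Calling `makeLocationHandle("/tmp/t")` in this sketch yields a handle whose 
target and write paths are both `/tmp/t`.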



##
cpp/velox/compute/iceberg/IcebergWriter.cc:
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "IcebergWriter.h"
+
+#include "IcebergPartitionSpec.pb.h"
+#include "compute/ProtobufUtils.h"
+#include "compute/iceberg/IcebergFormat.h"
+#include "utils/ConfigExtractor.h"
+#include "velox/connectors/hive/iceberg/IcebergDataSink.h"
+#include "velox/connectors/hive/iceberg/IcebergDeleteFile.h"
+
+using namespace facebook::velox;
+using namespace facebook::velox::connector::hive;
+using namespace facebook::velox::connector::hive::iceberg;
+namespace {
+
+std::shared_ptr makeLocationHandle(
+std::string targetDirectory,
+std::optional writeDirectory = std::nullopt,
+connector::hive::LocationHandle::TableType tableType = 
connector::hive::LocationHandle::TableType::kExisting) {
+  return;
+}
+
+std::shared_ptr createIcebergInsertTableHandle(
+const RowTypePtr& outputRowType,
+const std::string& outputDirectoryPath,
+dwio::common::FileFormat fileFormat,
+facebook::velox::common::CompressionKind compressionKind,
+std::shared_ptr spec) {
+  std::cout <<"output directory" << outputDirectoryPath << std::endl;
+  std::vector> 
columnHandles;
+
+  std::vector columnNames = outputRowType->names();
+  std::vector columnTypes = outputRowType->children();
+  std::vector partitionColumns;
+  partitionColumns.reserve(spec->fields.size());
+  for (const auto& field : spec->fields) {
+partitionColumns.push_back(field.name);
+  }
+  for (auto i = 0; i < columnNames.size(); ++i) {
+if (std::find(partitionColumns.begin(), partitionColumns.end(), 
columnNames[i]) != partitionColumns.end()) {
+  columnHandles.push_back(
+std::make_shared(
+columnNames.at(i),
+connector::hive::HiveColumnHandle::ColumnType::kPartitionKey,
+columnTypes.at(i),
+columnTypes.at(i)));
+} else {
+  columnHandles.push_back(
+std::make_shared(
+columnNames.at(i),
+connector::hive::HiveColumnHandle::ColumnType::kRegular,
+columnTypes.at(i),
+columnTypes.at(i)));
+  }

Review Comment:
   It looks like the closing '}' of the else clause is missing.
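
   As a rough sketch of the intended shape, with every brace closed, the loop 
could look like the following. `ColumnHandle`, `ColumnType`, and 
`makeColumnHandles` are hypothetical stand-ins used only for illustration; 
they are not the Velox `HiveColumnHandle` API.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for HiveColumnHandle and its ColumnType enum.
enum class ColumnType { kPartitionKey, kRegular };

struct ColumnHandle {
  std::string name;
  ColumnType type;
};

// Marks each column as a partition key if its name appears in
// partitionColumns, and as a regular column otherwise.
std::vector<ColumnHandle> makeColumnHandles(
    const std::vector<std::string>& columnNames,
    const std::vector<std::string>& partitionColumns) {
  std::vector<ColumnHandle> handles;
  handles.reserve(columnNames.size());
  for (std::size_t i = 0; i < columnNames.size(); ++i) {
    const bool isPartition =
        std::find(partitionColumns.begin(), partitionColumns.end(), columnNames[i]) !=
        partitionColumns.end();
    if (isPartition) {
      handles.push_back({columnNames[i], ColumnType::kPartitionKey});
    } else {
      handles.push_back({columnNames[i], ColumnType::kRegular});
    } // closes the else clause
  } // closes the for loop
  return handles;
}
```

   Using `std::size_t` for the index also avoids the signed/unsigned comparison 
that `auto i = 0` would introduce against `columnNames.size()`.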




Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-07 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3045827380

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-07 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3045060451

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-06 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3043406248

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-06 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3043404550

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-06 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3043301911

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-07-01 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3022214636

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-30 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3022108354

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-30 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2176517207


##
.github/workflows/velox_backend_enhanced_features.yml:
##
@@ -0,0 +1,170 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: Velox backend Github Runner (Enhanced Features)
+
+on:
+  pull_request:
+paths:
+  - '.github/workflows/velox_backend_enhanced_features.yml'
+  - 'pom.xml'
+  - 'backends-velox/**'
+  - 'gluten-uniffle/**'
+  - 'gluten-celeborn/**'
+  - 'gluten-ras/**'
+  - 'gluten-core/**'
+  - 'gluten-substrait/**'
+  - 'gluten-arrow/**'
+  - 'gluten-delta/**'
+  - 'gluten-iceberg/**'
+  - 'gluten-hudi/**'
+  - 'gluten-ut/**'
+  - 'shims/**'
+  - 'tools/gluten-it/**'
+  - 'ep/build-velox/**'
+  - 'cpp/**'
+  - 'dev/**'
+
+env:
+  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true
+  MVN_CMD: 'mvn -ntp'
+  WGET_CMD: 'wget -nv'
+  SETUP: 'bash .github/workflows/util/setup_helper.sh'
+  CCACHE_DIR: "${{ github.workspace }}/.ccache"
+  # for JDK17 unit tests
+  EXTRA_FLAGS:  "-XX:+IgnoreUnrecognizedVMOptions 
+--add-opens=java.base/java.lang=ALL-UNNAMED
+--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
+--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
+--add-opens=java.base/java.io=ALL-UNNAMED
+--add-opens=java.base/java.net=ALL-UNNAMED
+--add-opens=java.base/java.nio=ALL-UNNAMED
+--add-opens=java.base/java.util=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
+--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
+--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
+--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
+--add-opens=java.base/sun.security.action=ALL-UNNAMED
+--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
+-Djdk.reflect.useDirectMethodHandle=false
+-Dio.netty.tryReflectionSetAccessible=true"
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ 
github.workflow }}
+  cancel-in-progress: true
+
+jobs:
+  build-native-lib-centos-7:
+runs-on: ubuntu-22.04
+steps:
+  - uses: actions/checkout@v4
+  - name: Get Ccache
+uses: actions/cache/restore@v4
+with:
+  path: '${{ env.CCACHE_DIR }}'
+  key: ccache-enhanced-centos7-release-default-${{github.sha}}
+  restore-keys: |
+ccache-enhanced-centos7-release-default
+  - name: Build Gluten native libraries
+run: |
+  docker pull apache/gluten:vcpkg-centos-7
+  docker run -v $GITHUB_WORKSPACE:/work -w /work 
apache/gluten:vcpkg-centos-7 bash -c "
+set -e
+yum install tzdata -y
+df -a
+cd /work
+export CCACHE_DIR=/work/.ccache
+mkdir -p /work/.ccache
+bash dev/ci-velox-buildstatic-centos-7-enhanced-features.sh
+ccache -s
+mkdir -p /work/.m2/repository/org/apache/arrow/
+cp -r /root/.m2/repository/org/apache/arrow/* 
/work/.m2/repository/org/apache/arrow/
+  "
+
+  - name: "Save ccache"
+uses: actions/cache/save@v4
+id: ccache
+with:
+  path: '${{ env.CCACHE_DIR }}'
+  key: ccache-enhanced-centos7-release-default-${{github.sha}}
+  - uses: actions/upload-artifact@v4
+with:
+  name: velox-native-lib-enhanced-centos-7-${{github.sha}}
+  path: ./cpp/build/releases/
+  if-no-files-found: error
+  - uses: actions/upload-artifact@v4
+with:
+  name: arrow-jars-enhanced-centos-7-${{github.sha}}
+  path: .m2/repository/org/apache/arrow/
+  if-no-files-found: error
+
+  spark-test-spark34:

Review Comment:
   Spark 3.4 uses Iceberg 1.7.1, while spark3 uses Iceberg 1.5.0. Because of 
this version mismatch, the reflection that gets writeProperty from SparkWrite 
will fail in the test.




Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-30 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2176406267


##
ep/build-velox/src/get_velox.sh:
##
@@ -20,8 +20,8 @@ VELOX_REPO=https://github.com/oap-project/velox.git
 VELOX_BRANCH=2025_06_25
 VELOX_HOME=""
 RUN_SETUP_SCRIPT=ON
-VELOX_ENHANCED_REPO=https://github.com/oap-project/velox.git
-VELOX_ENHANCED_BRANCH=2025_06_24
+VELOX_ENHANCED_REPO=https://github.com/jinchengchenghh/velox.git
+VELOX_ENHANCED_BRANCH=2025_06_25

Review Comment:
   No. I will merge this PR after the Velox PR is merged to ibm/velox, which 
will be soon.






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-30 Thread via GitHub


zhztheplayer commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2176295082


##
gluten-core/src/main/resources/org/apache/gluten/proto/IcebergPartitionSpec.proto:
##
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: Apache-2.0
+syntax = "proto3";
+
+package gluten;
+
+option java_package = "org.apache.gluten.proto";
+option java_multiple_files = true;
+
+enum TransformType {
+  IDENTITY = 0;
+  YEAR = 1;
+  MONTH = 2;
+  DAY = 3;
+  HOUR = 4;
+  BUCKET = 5;
+  TRUNCATE = 6;
+}
+
+message IcebergPartitionField {
+  string name = 1;
+  TransformType transform = 2;
+  optional int32 parameter = 3;  // Optional parameter for transform config
+}
+
+message IcebergPartitionSpec {
+  int32 spec_id = 1;  // Field name uses snake_case per protobuf conventions
+  repeated IcebergPartitionField fields = 2;
+}

Review Comment:
   Is it possible to move this to gluten-substrait or backend-velox? Thanks!
   
   I understand it doesn't belong to the substrait protos, but it still doesn't 
seem general enough to be placed here.
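
   For readers unfamiliar with the message layout, here is a rough C++ mirror 
of the proto above. The struct names, the `describe` helper, and its output 
format are assumptions made for illustration; they are not the generated 
protobuf classes or any Gluten API.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Mirrors the TransformType enum values from the proto (same numbering).
enum class TransformType { kIdentity = 0, kYear, kMonth, kDay, kHour, kBucket, kTruncate };

// Hypothetical plain-struct mirror of IcebergPartitionField.
struct PartitionField {
  std::string name;
  TransformType transform;
  std::optional<int32_t> parameter; // e.g. bucket count or truncate width
};

// Hypothetical plain-struct mirror of IcebergPartitionSpec.
struct PartitionSpec {
  int32_t specId;
  std::vector<PartitionField> fields;
};

// Renders a field as "transform[param](column)"; the format is illustrative only.
std::string describe(const PartitionField& f) {
  const std::string arg =
      f.parameter ? "[" + std::to_string(*f.parameter) + "]" : std::string();
  switch (f.transform) {
    case TransformType::kIdentity: return "identity(" + f.name + ")";
    case TransformType::kYear:     return "year(" + f.name + ")";
    case TransformType::kMonth:    return "month(" + f.name + ")";
    case TransformType::kDay:      return "day(" + f.name + ")";
    case TransformType::kHour:     return "hour(" + f.name + ")";
    case TransformType::kBucket:   return "bucket" + arg + "(" + f.name + ")";
    case TransformType::kTruncate: return "truncate" + arg + "(" + f.name + ")";
  }
  return "unknown(" + f.name + ")";
}
```

   Only `BUCKET` and `TRUNCATE` use the optional `parameter` in this sketch, 
which is why the field is optional in the message.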



##
dev/builddeps-veloxbe.sh:
##
@@ -7,7 +7,7 @@
 # (the "License"); you may not use this file except in compliance with
 # the License.  You may obtain a copy of the License at
 #
-#http://www.apache.org/licenses/LICENSE-2.0
+#http://www.apache.org/licenses/LICENSE-2.0C

Review Comment:
   Is this intentional?



##
ep/build-velox/src/get_velox.sh:
##
@@ -20,8 +20,8 @@ VELOX_REPO=https://github.com/oap-project/velox.git
 VELOX_BRANCH=2025_06_25
 VELOX_HOME=""
 RUN_SETUP_SCRIPT=ON
-VELOX_ENHANCED_REPO=https://github.com/oap-project/velox.git
-VELOX_ENHANCED_BRANCH=2025_06_24
+VELOX_ENHANCED_REPO=https://github.com/jinchengchenghh/velox.git
+VELOX_ENHANCED_BRANCH=2025_06_25

Review Comment:
   I have no strong opinion on this, as long as nobody objects.
   
   Are you going to change the repo to `ibm/velox` right in the next PR?



##
.github/workflows/velox_backend_enhanced_features.yml:
##
@@ -0,0 +1,170 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: Velox backend Github Runner (Enhanced Features)
+
+on:
+  pull_request:
+paths:
+  - '.github/workflows/velox_backend_enhanced_features.yml'
+  - 'pom.xml'
+  - 'backends-velox/**'
+  - 'gluten-uniffle/**'
+  - 'gluten-celeborn/**'
+  - 'gluten-ras/**'
+  - 'gluten-core/**'
+  - 'gluten-substrait/**'
+  - 'gluten-arrow/**'
+  - 'gluten-delta/**'
+  - 'gluten-iceberg/**'
+  - 'gluten-hudi/**'
+  - 'gluten-ut/**'
+  - 'shims/**'
+  - 'tools/gluten-it/**'
+  - 'ep/build-velox/**'
+  - 'cpp/**'
+  - 'dev/**'
+
+env:
+  ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true
+  MVN_CMD: 'mvn -ntp'
+  WGET_CMD: 'wget -nv'
+  SETUP: 'bash .github/workflows/util/setup_helper.sh'
+  CCACHE_DIR: "${{ github.workspace }}/.ccache"
+  # for JDK17 unit tests
+  EXTRA_FLAGS:  "-XX:+IgnoreUnrecognizedVMOptions 
+--add-opens=java.base/java.lang=ALL-UNNAMED
+--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
+--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
+--add-opens=java.base/java.io=ALL-UNNAMED
+--add-opens=java.base/java.net=ALL-UNNAMED
+--add-opens=java.base/java.nio=ALL-UNNAMED
+--add-opens=java.base/java.util=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
+--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
+--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
+--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
+--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
+--add-opens=java.base/sun.security.action=ALL-UNNAMED
+--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
+-Djdk.reflect.useDirectMethodHandle=false
+-Dio.netty.tryReflectionSetAccessible=true"
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ 
github.workflow }}
+  cancel-in-progress: true
+
+jobs:
+  build-native-lib-centos-7:
+runs-on: ubuntu-22.0

Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-30 Thread via GitHub


jinchengchenghh commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3018163650

   Could you help review again? Thanks! The first Velox PR will be merged to 
ibm/velox once CI is ready. After that, I will update the enhanced feature 
branch to ibm/velox main. @zhztheplayer 





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-25 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3006957045

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-25 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3006849656

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-25 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-3003691205

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-19 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2986863248

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-18 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2986741821

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-08 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2134919351


##
cpp/velox/compute/iceberg/IcebergWriter.h:
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#pragma once
+
+#include "compute/iceberg/IcebergFormat.h"
+#include "memory/VeloxColumnarBatch.h"
+#include "velox/connectors/hive/iceberg/IcebergDataSink.h"
+
+namespace gluten {
+
+class IcebergWriter {

Review Comment:
   The APIs differ in the return type of commit versus close and in the 
initialize arguments. The Iceberg data source needs a partition spec, which is 
not included in this PR but will be supported later. VeloxDataSource only 
supports Parquet, so it has no format argument, and the compressionKind comes 
from the Iceberg TableProperty rather than sparkConfs. With this many 
differences, it makes sense to keep it a separate class.






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-08 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2134909382


##
gluten-substrait/src/main/scala/org/apache/spark/sql/datasources/v2/AppendColumnarBatchDataExec.scala:
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.datasources.v2
+
+import org.apache.gluten.connector.write.ColumnarBatchDataWriterFactory
+
+import org.apache.spark.TaskContext
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.connector.write._
+import org.apache.spark.sql.execution.datasources.v2.StreamWriterCommitProgress
+import org.apache.spark.sql.execution.metric.{CustomMetrics, SQLMetric}
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.Utils
+
+case class DataWritingColumnarBatchSparkTaskResult(
+    numRows: Long,

Review Comment:
   In Spark, the corresponding file is WriteToDataSourceV2Exec.scala, so we may 
need to rename `AppendColumnarBatchDataExec.scala` to 
`ColumnarWriteToDataSourceV2Exec.scala`.






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-08 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2134903421


##
gluten-substrait/src/main/scala/org/apache/gluten/backendsapi/BackendSettingsApi.scala:
##
@@ -158,4 +158,6 @@ trait BackendSettingsApi {
 
   def supportIcebergEqualityDeleteRead(): Boolean = true
 
+  def supportAppendDataExec(): Boolean = false

Review Comment:
   I suppose the CH backend does not support the AppendDataExec operator.
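
   For illustration only, a minimal Java sketch of the opt-in pattern that this 
default implies. The interface and class names below are hypothetical stand-ins, 
not Gluten classes, and it is an assumption that the Velox backend is the one 
that overrides the flag:

```java
// Hypothetical stand-in for the backend settings trait; not the Gluten API.
interface BackendSettingsSketch {
  // Backends are assumed not to support the columnar append operator by default.
  default boolean supportAppendDataExec() {
    return false;
  }
}

// Assumed: a backend that implements the columnar write path opts in explicitly.
final class VeloxLikeSettings implements BackendSettingsSketch {
  @Override
  public boolean supportAppendDataExec() {
    return true;
  }
}

// A backend that does not override the method inherits the default (false).
final class ClickHouseLikeSettings implements BackendSettingsSketch {}

final class AppendDataExecSupport {
  // Planner-side check: only offload the append when the backend opts in.
  static String choosePlan(BackendSettingsSketch settings) {
    return settings.supportAppendDataExec() ? "columnar" : "fallback";
  }
}
```

   With a default of false, a backend that never mentions the method keeps the 
vanilla Spark write path, so only the backend that implements the writer has 
to change.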






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-06-05 Thread via GitHub


zhztheplayer commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2128389033


##
gluten-substrait/src/main/java/org/apache/gluten/connector/write/ColumnarBatchDataWriterFactory.java:
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.write.BatchWrite;
+import org.apache.spark.sql.connector.write.DataWriter;
+import org.apache.spark.sql.connector.write.PhysicalWriteInfo;
+import org.apache.spark.sql.vectorized.ColumnarBatch;
+
+import java.io.Serializable;
+
+/**
+ * A factory of {@link DataWriter} returned by {@link
+ * BatchWrite#createBatchWriterFactory(PhysicalWriteInfo)}, which is responsible for creating and
+ * initializing the actual data writer at executor side.
+ *
+ * Note that, the writer factory will be serialized and sent to executors, then the data writer
+ * will be created on executors and do the actual writing. So this interface must be serializable
+ * and {@link DataWriter} doesn't need to be.
+ *
+ * @since 3.0.0
+ */

Review Comment:
   Let's rephrase the comment. Maybe just convey that it's a companion 
interface to Spark's row-based version.



##
gluten-iceberg/src/main/java/org/apache/gluten/connector/write/ColumnarBatchWrite.java:
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write;
+
+import org.apache.spark.sql.connector.write.BatchWrite;
+import org.apache.spark.sql.connector.write.DataWriterFactory;
+import org.apache.spark.sql.connector.write.PhysicalWriteInfo;
+
+public abstract class ColumnarBatchWrite implements BatchWrite {
+  @Override
+  public DataWriterFactory createBatchWriterFactory(PhysicalWriteInfo info) {
+    throw new UnsupportedOperationException();
+  }
+
+  public ColumnarDataWriterFactory createColumnarBatchWriterFactory(PhysicalWriteInfo info) {
+    throw new UnsupportedOperationException();
+  }
+}

Review Comment:
   Could we move the API to `gluten-substrait`?



##
gluten-substrait/src/main/scala/org/apache/gluten/execution/ColumnarAppendDataExec.scala:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.connector.write.ColumnarBatchDataWriterFactory
+import org.apache.gluten.extension.columnar.transition.Convention
+import org.apache.gluten.extension.columnar.transition.Convention.RowType
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark

Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-05-01 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2845156876

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-05-01 Thread via GitHub


zhztheplayer commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2070411767


##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+                                   format: Integer, directory: String, codec: String)
+  extends ColumnarBatchDataWriterFactory {
+
+  /**
+   * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+   * object instance when sending data to the data writer, for better performance. Data writers
+   * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+   * list.
+   *
+   * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+   * and get retried until hitting the maximum retry times.
+   *
+   */
+  override def createWriter(): DataWriter[ColumnarBatch] = {
+    val(writerHandle, jniWrapper) = getJniWrapper(schema, format, directory, codec)

Review Comment:
   @jinchengchenghh I have also opened 
https://github.com/apache/incubator-gluten/pull/9478. The reason is exactly 
the one you found.



##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+                                   format: Integer, directory: String, codec: String)
+  extends ColumnarBatchDataWriterFactory {
+
+  /**
+   * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+   * object instance when sending data to the data writer, for better performance. Data writers
+   * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+   * list.
+   *
+   * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+   * and get retried until hitting the maximum retry times.
+   *
+   */
+  override def createWriter():

Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-05-01 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2070401492


##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+                                   format: Integer, directory: String, codec: String)
+  extends ColumnarBatchDataWriterFactory {
+
+  /**
+   * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+   * object instance when sending data to the data writer, for better performance. Data writers
+   * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+   * list.
+   *
+   * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+   * and get retried until hitting the maximum retry times.
+   *
+   */
+  override def createWriter(): DataWriter[ColumnarBatch] = {
+    val(writerHandle, jniWrapper) = getJniWrapper(schema, format, directory, codec)

Review Comment:
   That is because the src-iceberg directory is not included in the plugin's 
configuration. I will try to fix it. 
https://github.com/diffplug/spotless/blob/main/plugin-maven/README.md#scala
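
   As a rough sketch of what such a fix could look like, assuming the module's 
pom.xml configures Spotless directly: the `<includes>` element is documented in 
the Spotless README linked above, while the `src-iceberg/...` patterns are an 
assumption about this repository's layout and have not been verified against 
the build.

```xml
<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <configuration>
    <scala>
      <includes>
        <!-- Spotless defaults for Scala sources -->
        <include>src/main/scala/**/*.scala</include>
        <include>src/test/scala/**/*.scala</include>
        <!-- Assumed extra source roots that are not under src/ -->
        <include>src-iceberg/main/scala/**/*.scala</include>
        <include>src-iceberg/test/scala/**/*.scala</include>
      </includes>
      <!-- Keep the project's existing scalafmt settings here -->
      <scalafmt/>
    </scala>
  </configuration>
</plugin>
```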






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-05-01 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2070391038


##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+                                   format: Integer, directory: String, codec: String)
+  extends ColumnarBatchDataWriterFactory {
+
+  /**
+   * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+   * object instance when sending data to the data writer, for better performance. Data writers
+   * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+   * list.
+   *
+   * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+   * and get retried until hitting the maximum retry times.
+   *
+   */
+  override def createWriter(): DataWriter[ColumnarBatch] = {
+    val(writerHandle, jniWrapper) = getJniWrapper(schema, format, directory, codec)

Review Comment:
   com.diffplug.spotless:spotless-maven-plugin:2.27.2:check can check the 
gluten-iceberg module, but it cannot check backends-velox/src/iceberg, which is 
introduced by `build-helper-maven-plugin`. The Scala style plugin works well 
with src-iceberg.






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-30 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: 
https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2842292139

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-30 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068782744


##
cpp/CMakeLists.txt:
##
@@ -57,6 +57,7 @@ option(ENABLE_HDFS "Enable HDFS" OFF)
 option(ENABLE_ORC "Enable ORC" OFF)
 option(ENABLE_ABFS "Enable ABFS" OFF)
 option(ENABLE_GPU "Enable GPU" OFF)
+option(ENABLE_ICEBERG_WRITE "Enable iceberg write" OFF)

Review Comment:
   The compilation will fail.






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-30 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068782144


##
backends-velox/src-iceberg/main/scala/org/apache/gluten/connector/write/IcebergDataWriteFactory.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.connector.write
+
+import org.apache.arrow.c.ArrowSchema
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.execution.IcebergWriteJniWrapper
+import org.apache.gluten.memory.arrow.alloc.ArrowBufferAllocators
+import org.apache.gluten.runtime.Runtimes
+import org.apache.gluten.utils.ArrowAbiUtil
+import org.apache.spark.sql.connector.write.DataWriter
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.utils.SparkArrowUtil
+import org.apache.spark.sql.vectorized.ColumnarBatch
+
+case class IcebergDataWriteFactory(schema: StructType,
+                                   format: Integer, directory: String, codec: String)
+  extends ColumnarBatchDataWriterFactory {
+
+  /**
+   * Returns a data writer to do the actual writing work. Note that, Spark will reuse the same data
+   * object instance when sending data to the data writer, for better performance. Data writers
+   * are responsible for defensive copies if necessary, e.g. copy the data before buffer it in a
+   * list.
+   *
+   * If this method fails (by throwing an exception), the corresponding Spark write task would fail
+   * and get retried until hitting the maximum retry times.
+   *
+   */
+  override def createWriter(): DataWriter[ColumnarBatch] = {
+    val(writerHandle, jniWrapper) = getJniWrapper(schema, format, directory, codec)

Review Comment:
   The code style check has no effect on the Iceberg code; maybe we should fix it.






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-30 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068780818


##
gluten-substrait/src/main/scala/org/apache/gluten/execution/ColumnarAppendDataExec.scala:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.connector.write.ColumnarBatchDataWriterFactory
+import org.apache.gluten.extension.columnar.transition.Convention
+import org.apache.gluten.extension.columnar.transition.Convention.RowType
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.connector.write.{BatchWrite, Write, WriterCommitMessage}
+import org.apache.spark.sql.datasources.v2.{DataWritingColumnarBatchSparkTask, DataWritingColumnarBatchSparkTaskResult, StreamWriterCommitProgressUtil, WritingColumnarBatchSparkTask}
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2._
+import org.apache.spark.sql.execution.metric.SQLMetric
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.LongAccumulator
+
+abstract class ColumnarAppendDataExec(query: SparkPlan, refreshCache: () => Unit, write: Write)
+  extends V2ExistingTableWriteExec
+  with ValidatablePlan {
+
+  def writingTaskBatch: WritingColumnarBatchSparkTask[_] = DataWritingColumnarBatchSparkTask
+
+  override def doExecute(): RDD[InternalRow] = {
+    result
+    sparkContext.parallelize(Nil, 1)
+  }
+
+  def createFactory(schema: StructType): ColumnarBatchDataWriterFactory
+
+  protected def writeColumnarBatchWithV2(batchWrite: BatchWrite): Unit = {
+    val rdd: RDD[ColumnarBatch] = {
+      val tempRdd = query.executeColumnar()
+      // SPARK-23271 If we are attempting to write a zero partition rdd, create a dummy single
+      // partition rdd to make sure we at least set up one write task to write the metadata.
+      if (tempRdd.partitions.length == 0) {
+        sparkContext.parallelize(Array.empty[ColumnarBatch], 1)
+      } else {
+        tempRdd
+      }
+    }
+    // introduce a local var to avoid serializing the whole class
+    val task = writingTaskBatch
+    val messages = new Array[WriterCommitMessage](rdd.partitions.length)
+    val totalNumRowsAccumulator = new LongAccumulator()
+
+    logInfo(
+      s"Start processing data source write support: $batchWrite. " +
+        s"The input RDD has ${messages.length} partitions.")
+
+    // Avoid object not serializable issue.
+    val writeMetrics: Map[String, SQLMetric] = customMetrics
+    val factory = createFactory(query.schema)
+    try {
+      sparkContext.runJob(
+        rdd,
+        (context: TaskContext, iter: Iterator[ColumnarBatch]) =>
+          task.run(factory, context, iter, writeMetrics),
+        rdd.partitions.indices,
+        (index, result: DataWritingColumnarBatchSparkTaskResult) => {
+          val commitMessage = result.writerCommitMessage
+          messages(index) = commitMessage
+          totalNumRowsAccumulator.add(result.numRows)
+          batchWrite.onDataWriterCommit(commitMessage)
+        }
+      )
+
+      logInfo(s"Data source write support $batchWrite is committing.")
+      batchWrite.commit(messages)
+      logInfo(s"Data source write support $batchWrite committed.")
+      commitProgress = Some(
+        StreamWriterCommitProgressUtil.getStreamWriterCommitProgress(totalNumRowsAccumulator.value))
+    } catch {
+      case cause: Throwable =>

Review Comment:
   This is copied from Spark



##
gluten-substrait/src/main/scala/org/apache/gluten/execution/ColumnarAppendDataExec.scala:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy

Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-30 Thread via GitHub


jinchengchenghh commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068779830


##
backends-velox/src-iceberg/main/scala/org/apache/gluten/execution/OffloadIcebergWrite.scala:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.extension.columnar.enumerated.RasOffload
+import org.apache.gluten.extension.columnar.heuristic.HeuristicTransform
+import org.apache.gluten.extension.columnar.offload.OffloadSingleNode
+import org.apache.gluten.extension.columnar.validator.Validators
+import org.apache.gluten.extension.injector.Injector
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2.AppendDataExec
+
+case class OffloadIcebergWrite() extends OffloadSingleNode {
+  override def offload(plan: SparkPlan): SparkPlan = plan match {
+    case a: AppendDataExec =>
+      VeloxIcebergAppendDataExec(a)

Review Comment:
   `VeloxIcebergAppendDataExec` uses the JNI wrapper to write Iceberg, but for Delta the JNI wrapper might be different; for example, Delta Lake may go through a Rust JNI call.






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-30 Thread via GitHub


zhztheplayer commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2068390354


##
backends-velox/src-iceberg/main/scala/org/apache/gluten/execution/OffloadIcebergWrite.scala:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.extension.columnar.enumerated.RasOffload
+import org.apache.gluten.extension.columnar.heuristic.HeuristicTransform
+import org.apache.gluten.extension.columnar.offload.OffloadSingleNode
+import org.apache.gluten.extension.columnar.validator.Validators
+import org.apache.gluten.extension.injector.Injector
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2.AppendDataExec
+
+case class OffloadIcebergWrite() extends OffloadSingleNode {
+  override def offload(plan: SparkPlan): SparkPlan = plan match {
+    case a: AppendDataExec =>
+      VeloxIcebergAppendDataExec(a)

Review Comment:
   What happens if the `AppendDataExec` is for Hudi or Delta?



##
gluten-substrait/src/main/scala/org/apache/gluten/execution/ColumnarAppendDataExec.scala:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.connector.write.ColumnarBatchDataWriterFactory
+import org.apache.gluten.extension.columnar.transition.Convention
+import org.apache.gluten.extension.columnar.transition.Convention.RowType
+
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.connector.write.{BatchWrite, Write, WriterCommitMessage}
+import org.apache.spark.sql.datasources.v2.{DataWritingColumnarBatchSparkTask, DataWritingColumnarBatchSparkTaskResult, StreamWriterCommitProgressUtil, WritingColumnarBatchSparkTask}
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.datasources.v2._
+import org.apache.spark.sql.execution.metric.SQLMetric
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.LongAccumulator
+
+abstract class ColumnarAppendDataExec(query: SparkPlan, refreshCache: () => Unit, write: Write)
+  extends V2ExistingTableWriteExec
+  with ValidatablePlan {
+
+  def writingTaskBatch: WritingColumnarBatchSparkTask[_] = DataWritingColumnarBatchSparkTask
+
+  override def doExecute(): RDD[InternalRow] = {
+    result
+    sparkContext.parallelize(Nil, 1)
+  }
+
+  def createFactory(schema: StructType): ColumnarBatchDataWriterFactory
+
+  protected def writeColumnarBatchWithV2(batchWrite: BatchWrite): Unit = {
+    val rdd: RDD[ColumnarBatch] = {
+      val tempRdd = query.executeColumnar()
+      // SPARK-23271 If we are attempting to write a zero partition rdd, create a dummy single
+      // partition rdd to make sure we at least set up one write task to write the metadata.
+      if (tempRdd.partitions.length == 0) {
+        sparkContext.parallelize(Array.empty[ColumnarBatch], 1)
+      } else {
+        tempRdd
+      }
+    }
+    // introduce a local var to avoid serializing the whole class
+    val task = writingTaskBatch
+    val messages = new Array[WriterCommitMessage](rdd.partitions.length)
+    val totalNumRowsAccumulator = new LongAccumulator()
+
+    logInfo(
+      s"Start processing data 

Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-29 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2838743393

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-24 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2828737371

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-24 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2828585072

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-24 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2827679516

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-24 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2827677605

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-24 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2827145540

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-24 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2827035280

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-23 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2824241263

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-23 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2824267789

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-23 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2824217543

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-23 Thread via GitHub


Copilot commented on code in PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#discussion_r2055999858


##
cpp/velox/compute/iceberg/IcebergFormat.cc:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "IcebergFormat.h"
+
+namespace gluten {
+using namespace facebook::velox::dwio::common;
+// static
+FileFormat icebergFormatToVelox(int32_t format) {
+  auto icebergFormat = static_cast<IcebergFileFormat>(format);
+  switch (icebergFormat) {
+    case IcebergFileFormat::ORC:
+      return FileFormat::ORC;
+    case IcebergFileFormat::PARQUET:
+      return FileFormat::PARQUET;
+    default:
+      throw std::invalid_argument("Not suppport file format " + std::to_string(format));

Review Comment:
   The error message contains a typographical error ('suppport'). Please correct it to 'support'.
   ```suggestion
         throw std::invalid_argument("Not support file format " + std::to_string(format));
   ```
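
For reference, a minimal self-contained sketch of the mapping under review. The two enums below are hypothetical stand-ins (the real `IcebergFileFormat` comes from `IcebergFormat.h` and the real `FileFormat` from Velox's `dwio::common`), their numeric values are assumed purely for illustration, and the "Unsupported file format" wording is only one possible phrasing, not the text in the PR.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the enum declared in IcebergFormat.h.
// The numeric values are assumptions made for this sketch only.
enum class IcebergFileFormat : int32_t { ORC = 0, PARQUET = 1, AVRO = 2 };

// Hypothetical stand-in for facebook::velox::dwio::common::FileFormat.
enum class FileFormat { ORC, PARQUET };

// Maps the integer format code received over JNI to the writer's file format.
// Any code without an explicit case falls through to `default` and is rejected.
FileFormat icebergFormatToVelox(int32_t format) {
  switch (static_cast<IcebergFileFormat>(format)) {
    case IcebergFileFormat::ORC:
      return FileFormat::ORC;
    case IcebergFileFormat::PARQUET:
      return FileFormat::PARQUET;
    default:
      throw std::invalid_argument("Unsupported file format " + std::to_string(format));
  }
}
```

Under these assumed values, `icebergFormatToVelox(1)` yields `FileFormat::PARQUET`, and a code with no matching case (including the assumed `AVRO`) throws `std::invalid_argument` with the offending number in the message.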






Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-23 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-282379

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-23 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2823758429

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-22 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821672211

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-22 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821659064

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-22 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821648890

   Run Gluten Clickhouse CI on x86





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-22 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821648754

   https://github.com/apache/incubator-gluten/issues/9335





Re: [PR] [GLUTEN-9335][VL] Support iceberg write [incubator-gluten]

2025-04-22 Thread via GitHub


github-actions[bot] commented on PR #9397:
URL: https://github.com/apache/incubator-gluten/pull/9397#issuecomment-2821648102

   
   
   Thanks for opening a pull request!

   Could you open an issue for this pull request on GitHub Issues?

   https://github.com/apache/incubator-gluten/issues

   Then could you also rename the ***commit message*** and ***pull request title*** in the following format?

   [GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

   See also:

   * [Other pull requests](https://github.com/apache/incubator-gluten/pulls/)
   
   

