Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-12-01 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1865183048


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfigOptions.java:
##
@@ -134,6 +134,12 @@ public class BaseSourceConfigOptions {
 .noDefaultValue()
 .withDescription("To be read sheet name,only valid for excel files");
 
+public static final Option EXCEL_ENGINE =
+Options.key("excel_engine")
+.objectType(ExcelEngine.class)

Review Comment:
   ```suggestion
   .enumType(ExcelEngine.class)
   ```



##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +160,19 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+
+
+
+org.dhatim
+fastexcel-reader
+${fastexcel-reader.version}
+

Review Comment:
   Please remove this dependency, since we don't use it for now.



##
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/excel/ExcelReaderListener.java:
##
@@ -0,0 +1,277 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.file.excel;
+
+import org.apache.seatunnel.shade.com.fasterxml.jackson.core.JsonProcessingException;
+import org.apache.seatunnel.shade.com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.seatunnel.shade.com.typesafe.config.Config;
+
+import org.apache.seatunnel.api.configuration.ReadonlyConfig;
+import org.apache.seatunnel.api.source.Collector;
+import org.apache.seatunnel.api.table.type.SeaTunnelDataType;
+import org.apache.seatunnel.api.table.type.SeaTunnelRow;
+import org.apache.seatunnel.api.table.type.SeaTunnelRowType;
+import org.apache.seatunnel.api.table.type.SqlType;
+import org.apache.seatunnel.common.exception.CommonErrorCodeDeprecated;
+import org.apache.seatunnel.common.utils.DateTimeUtils;
+import org.apache.seatunnel.common.utils.DateUtils;
+import org.apache.seatunnel.common.utils.TimeUtils;
+import org.apache.seatunnel.connectors.seatunnel.file.config.BaseSourceConfigOptions;
+import org.apache.seatunnel.connectors.seatunnel.file.exception.FileConnectorException;
+
+import org.apache.poi.ss.usermodel.DateUtil;
+
+import com.alibaba.excel.context.AnalysisContext;
+import com.alibaba.excel.enums.CellDataTypeEnum;
+import com.alibaba.excel.event.AnalysisEventListener;
+import com.alibaba.excel.exception.ExcelDataConvertException;
+import com.alibaba.excel.metadata.Cell;
+import com.alibaba.excel.metadata.data.ReadCellData;
+import lombok.SneakyThrows;
+import lombok.extern.slf4j.Slf4j;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.nio.charset.StandardCharsets;
+import java.time.LocalDate;
+import java.time.LocalDateTime;
+import java.time.LocalTime;
+import java.time.ZoneId;
+import java.time.format.DateTimeFormatter;
+import java.util.Date;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+
+@Slf4j
+public class ExcelReaderListener extends AnalysisEventListener>
+implements Serializable, Closeable {
+private final String tableId;
+private final Collector output;
+private int cellCount;
+
+private final ObjectMapper objectMapper = new ObjectMapper();
+
+private DateTimeFormatter dateFormatter;
+private DateTimeFormatter dateTimeFormatter;
+private DateTimeFormatter timeFormatter;
+
+protected Config pluginConfig;
+
+protected SeaTunnelRowType seaTunnelRowType;
+
+private SeaTunnelDataType[] fieldTypes;
+
+Map customHeaders = new HashMap<>();
+
+public ExcelReaderListener(
+String tableId,
+Collector output,
+Config pluginConfig,
+SeaTunnelRowType seaTunnelRowType) {
+this.tableId = tableId;
+this.output = output;
+this.p

Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-12-01 Thread via GitHub


corgy-w commented on PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#issuecomment-2509770603

   > I participated in the attempt, but it will take a while for both surnames 
to be present
   
   @dwave Thank you for your contribution. If you want to provide engine 
selection for reading from the source first, you should update the 
documentation to remove the sink capability and revise all related 
documentation about Excel support. Thank you again for your contribution. The 
sink engine selection capability can be provided in a future update. :>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-12-01 Thread via GitHub


dwave commented on PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#issuecomment-2509761338

   > Thanks @dwave! Currently we only support changing the engine when reading Excel; I 
think we should also support it when writing Excel. 
![image](https://private-user-images.githubusercontent.com/32387433/390662181-e4b33b71-cfd1-4d3e-b4bc-33675ee6fe5e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzMwNTg2NjgsIm5iZiI6MTczMzA1ODM2OCwicGF0aCI6Ii8zMjM4NzQzMy8zOTA2NjIxODEtZTRiMzNiNzEtY2ZkMS00ZDNlLWI0YmMtMzM2NzVlZTZmZTVlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDEyMDElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQxMjAxVDEzMDYwOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTUwN2M4MDY0MzJmNThlOTZhNGNjMTNkZGQ1NjM4ZjUwZjlmZGZkMWE1MjFmN2UxZDNhYzQ0NDdmODZjZWE3MjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0._dBv_x4opSCp4IYWY60BkWlxlCo0rMIen4jZ7I9rshk)
   
   I participated in the attempt, but it will take a while for both surnames to 
be present





Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-26 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1860054913


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfigOptions.java:
##
@@ -134,6 +134,13 @@ public class BaseSourceConfigOptions {
 .noDefaultValue()
 .withDescription("To be read sheet name,only valid for excel files");
 
+public static final Option EXCEL_ENGINE =
+Options.key("excel_engine")
+.stringType()
+.noDefaultValue()

Review Comment:
   OK, I'll edit it.






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-26 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1860020963


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfigOptions.java:
##
@@ -134,6 +134,13 @@ public class BaseSourceConfigOptions {
 .noDefaultValue()
 .withDescription("To be read sheet name,only valid for excel files");
 
+public static final Option EXCEL_ENGINE =
+Options.key("excel_engine")
+.stringType()
+.noDefaultValue()

Review Comment:
   +1, we can use enum type.






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-26 Thread via GitHub


corgy-w commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1859982060


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/config/BaseSourceConfigOptions.java:
##
@@ -134,6 +134,13 @@ public class BaseSourceConfigOptions {
 .noDefaultValue()
 .withDescription("To be read sheet name,only valid for excel files");
 
+public static final Option EXCEL_ENGINE =
+Options.key("excel_engine")
+.stringType()
+.noDefaultValue()

Review Comment:
   I recommend setting POI as the default value here for convenience, and 
listing the supported engine types in the .md documentation.
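
   To illustrate the point about a default value and a fixed list of supported types, here is a minimal, self-contained sketch in plain Java. It does not use the SeaTunnel `Options` API; the class name `ExcelEngineOptionSketch`, the nested `Engine` enum, its constant names, and the `resolve` helper are all hypothetical and only show how a typed value with a POI default might behave.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch: not the PR's code and not the SeaTunnel Options API.
final class ExcelEngineOptionSketch {

    // Assumed constant names; the PR may name its engines differently.
    enum Engine {
        POI,
        EASY_EXCEL
    }

    // Default engine, following the "default POI" recommendation above.
    static final Engine DEFAULT_ENGINE = Engine.POI;

    // Resolves a raw config string to an engine; unset or blank values fall back to the default.
    static Engine resolve(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            return DEFAULT_ENGINE;
        }
        try {
            return Engine.valueOf(rawValue.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Listing the supported values makes a misconfiguration easy to diagnose.
            throw new IllegalArgumentException(
                    "Unsupported excel_engine '" + rawValue + "', supported values: "
                            + Arrays.toString(Engine.values()),
                    e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));          // falls back to the default engine
        System.out.println(resolve("easy_excel"));  // case-insensitive match
    }
}
```

   With an enum, an unknown value fails fast with the list of supported engines, which a free-form string option would not give without extra validation.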






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-26 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1859932001


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   Finally succeeded






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-26 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1859698729


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   
![image](https://github.com/user-attachments/assets/276e7836-bebd-46b3-8e14-704a9ec20d5e)
   In the CI phase, test cases in other modules are failing, which causes the 
CI run to fail.






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-26 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1857869414


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   I added a configuration item, `excel_engine`, for selecting the engine used 
to read Excel files. POI is used by default; EasyExcel can be enabled through 
this option.






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-24 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1855878026


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   We can set the POI version in 
https://github.com/apache/seatunnel/pull/8064/files#diff-1fb7018c76d4a71faced6c343bcbff3a7d20464c8c9f80338fa2a50e57fae254R142
 to be consistent with the one used by easyexcel.






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-24 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1855869201


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   > We can use the POI version that easyexcel uses for now.
   
   What does that mean?






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-24 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1855720725


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   We can use the POI version that easyexcel uses for now.






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-22 Thread via GitHub


corgy-w commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1853626094


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   > Oh, let's add an option to configure the Excel parsing engine, defaulting 
to POI and supporting POI and easyexcel for now, so that we can implement other 
engines in the future.
   
   Will there be any conflict between POI versions?






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-21 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1853348202


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   Okay






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-21 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1853325381


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   Oh, let's add an option to configure the Excel parsing engine, defaulting 
to POI and supporting POI and easyexcel for now, so that we can implement other 
engines in the future.
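
   One way such a pluggable engine option could be wired up is sketched below in plain Java. This is an illustration under assumptions, not the connector's actual design: `ExcelEngineDispatchSketch`, `Engine`, `SheetReader`, `PoiReader`, and `EasyExcelReader` are hypothetical names, and the readers only return a label instead of parsing a workbook.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of engine dispatch; none of these types come from the PR.
final class ExcelEngineDispatchSketch {

    // Assumed engine names.
    enum Engine {
        POI,
        EASY_EXCEL
    }

    // Stand-in for whatever read abstraction the connector exposes.
    interface SheetReader {
        String describe();
    }

    static final class PoiReader implements SheetReader {
        @Override
        public String describe() {
            return "POI";
        }
    }

    static final class EasyExcelReader implements SheetReader {
        @Override
        public String describe() {
            return "EasyExcel";
        }
    }

    // Registry of factories; supporting a new engine means adding one enum constant and one entry.
    private static final Map<Engine, Supplier<SheetReader>> FACTORIES = new EnumMap<>(Engine.class);

    static {
        FACTORIES.put(Engine.POI, PoiReader::new);
        FACTORIES.put(Engine.EASY_EXCEL, EasyExcelReader::new);
    }

    // Creates the reader for the configured engine; a missing setting falls back to POI.
    static SheetReader create(Engine configured) {
        Engine engine = configured == null ? Engine.POI : configured;
        Supplier<SheetReader> factory = FACTORIES.get(engine);
        if (factory == null) {
            throw new IllegalStateException("No reader registered for engine " + engine);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        System.out.println(create(null).describe());
        System.out.println(create(Engine.EASY_EXCEL).describe());
    }
}
```

   Keeping the mapping in one registry means existing jobs that never set the option keep using POI, while a later engine only touches the enum and the registry.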






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-21 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1853255425


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   > As we all know, easyexcel is no longer maintained. It doesn't seem good to 
introduce it at this time. We can try other alternatives, such as 
[fastexcel](https://github.com/dhatim/fastexcel). There are also 
[reports](https://stackoverflow.com/questions/69914445/apache-poi-large-excel-export-is-slow/69961916#69961916)
 online that it is faster than easyexcel. What do you think? cc @hailin0
   
   I tried fastexcel, but it has a problem with the xls format 
(Excel 97-2003):
   
   https://github.com/dhatim/fastexcel/issues/287
   
![image](https://github.com/user-attachments/assets/90892ece-3323-4c9d-a606-204f19fa4dbd)
   
   
![image](https://github.com/user-attachments/assets/66f02391-df27-47bf-a786-f1faffe1ce8e)
   









Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-20 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1851263200


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   > or easyexcel-plus?
   
   easyexcel-plus only appeared on GitHub last night, and I haven't seen it in 
the Maven repository yet.






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-20 Thread via GitHub


corgy-w commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1851258441


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/main/java/org/apache/seatunnel/connectors/seatunnel/file/excel/ExcelReaderListener.java:
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.file.excel;
+
+import org.apache.seatunnel.shade.com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.seatunnel.shade.com.typesafe.config.Config;
+
+import org.apache.seatunnel.api.configuration.ReadonlyConfig;
+import org.apache.seatunnel.api.source.Collector;
+import org.apache.seatunnel.api.table.type.SeaTunnelDataType;
+import org.apache.seatunnel.api.table.type.SeaTunnelRow;
+import org.apache.seatunnel.api.table.type.SeaTunnelRowType;
+import org.apache.seatunnel.api.table.type.SqlType;
+import org.apache.seatunnel.common.exception.CommonErrorCodeDeprecated;
+import org.apache.seatunnel.common.utils.DateTimeUtils;
+import org.apache.seatunnel.common.utils.DateUtils;
+import org.apache.seatunnel.common.utils.TimeUtils;
+import org.apache.seatunnel.connectors.seatunnel.file.config.BaseSourceConfigOptions;
+import org.apache.seatunnel.connectors.seatunnel.file.exception.FileConnectorException;
+
+import org.apache.poi.ss.usermodel.DateUtil;
+
+import com.alibaba.excel.context.AnalysisContext;
+import com.alibaba.excel.event.AnalysisEventListener;
+import com.alibaba.excel.exception.ExcelDataConvertException;
+import com.alibaba.excel.metadata.Cell;
+import com.alibaba.excel.metadata.data.ReadCellData;
+import lombok.SneakyThrows;
+import lombok.extern.slf4j.Slf4j;
+
+import java.math.BigDecimal;
+import java.nio.charset.StandardCharsets;
+import java.time.LocalDate;
+import java.time.LocalDateTime;
+import java.time.LocalTime;
+import java.time.ZoneId;
+import java.time.format.DateTimeFormatter;
+import java.util.Date;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+
+@Slf4j
+public class ExcelReaderListener extends AnalysisEventListener> {
+private final String tableId;
+private final Collector output;
+private int cellCount;
+
+private final ObjectMapper objectMapper = new ObjectMapper();
+
+private DateTimeFormatter dateFormatter;
+private DateTimeFormatter dateTimeFormatter;
+private DateTimeFormatter timeFormatter;
+
+protected Config pluginConfig;
+
+protected SeaTunnelRowType seaTunnelRowType;
+
+private SeaTunnelDataType[] fieldTypes;
+
+Map customHeaders = new HashMap<>();
+
+public ExcelReaderListener(
+String tableId,
+Collector output,
+Config pluginConfig,
+SeaTunnelRowType seaTunnelRowType) {
+this.tableId = tableId;
+this.output = output;
+this.pluginConfig = pluginConfig;
+this.seaTunnelRowType = seaTunnelRowType;
+
+fieldTypes = seaTunnelRowType.getFieldTypes();
+
+if (pluginConfig.hasPath(BaseSourceConfigOptions.DATE_FORMAT.key())) {
+String dateFormatString =
+pluginConfig.getString(BaseSourceConfigOptions.DATE_FORMAT.key());
+dateFormatter = DateTimeFormatter.ofPattern(dateFormatString);
+}
+if (pluginConfig.hasPath(BaseSourceConfigOptions.DATETIME_FORMAT.key())) {
+String datetimeFormatString =
+pluginConfig.getString(BaseSourceConfigOptions.DATETIME_FORMAT.key());
+dateTimeFormatter = DateTimeFormatter.ofPattern(datetimeFormatString);
+}
+if (pluginConfig.hasPath(BaseSourceConfigOptions.TIME_FORMAT.key())) {
+String timeFormatString =
+pluginConfig.getString(BaseSourceConfigOptions.TIME_FORMAT.key());
+timeFormatter = DateTimeFormatter.ofPattern(timeFormatString);
+}
+
+@Override
+public void invokeHead(Map<Integer, ReadCellData<?>> headMap, AnalysisContext context) {
+for (int i = 0; i < headMap.size(); i++) {
+String header = headMap.get(i).getStringValue();
+if (!"null".equals(header)) {
+customHeaders.put(i, header);
+

Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-20 Thread via GitHub


dwave commented on PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#issuecomment-2487281888

   > https://github.com/apache/seatunnel/runs/33188901598 @dwave Please open ci 
workflow
   
   Okay, it's already opened





Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-20 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1850435501


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   I will give it a try






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-20 Thread via GitHub


corgy-w commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1850205759


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   or easyexcel-plus?






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-20 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1850194350


##
seatunnel-connectors-v2/connector-file/connector-file-base/pom.xml:
##
@@ -158,6 +159,13 @@
 jaxen
 ${jaxen.version}
 
+
+
+com.alibaba
+easyexcel
+${easyexcel.version}
+

Review Comment:
   As we all know, easyexcel is no longer maintained. It doesn't seem good to 
introduce it at this time. We can try other alternatives, such as 
[fastexcel](https://github.com/dhatim/fastexcel). There are also 
[reports](https://stackoverflow.com/questions/69914445/apache-poi-large-excel-export-is-slow/69961916#69961916)
 online that it is faster than easyexcel. What do you think? cc @hailin0 






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-19 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1849697779


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/resources/excel/test_read_excel.conf:
##
@@ -18,6 +18,7 @@
 {
   sheet_name = "Sheet1"
   skip_header_row_number = 1
+  date_format = "/M/d"

Review Comment:
   Please revert it.






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-19 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1849708741


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/resources/excel/test_read_excel.conf:
##
@@ -18,6 +18,7 @@
 {
   sheet_name = "Sheet1"
   skip_header_row_number = 1
+  date_format = "/M/d"

Review Comment:
   ok






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-19 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1849692697


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/resources/excel/test_read_excel.conf:
##
@@ -18,6 +18,7 @@
 {
   sheet_name = "Sheet1"
   skip_header_row_number = 1
+  date_format = "/M/d"

Review Comment:
   > So CI can also pass with this config reverted?
   
   yes






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-19 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1849689612


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/resources/excel/test_read_excel.conf:
##
@@ -18,6 +18,7 @@
 {
   sheet_name = "Sheet1"
   skip_header_row_number = 1
+  date_format = "/M/d"

Review Comment:
   So CI can also pass with this config reverted?






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-19 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1849684165


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/resources/excel/test_read_excel.conf:
##
@@ -18,6 +18,7 @@
 {
   sheet_name = "Sheet1"
   skip_header_row_number = 1
+  date_format = "/M/d"

Review Comment:
   > Do we have to add the `date_format` config to pass CI?

   No, it is not necessary to set date_format, datetime_format, or time_format. 
These three parameters already exist in the LocalFile options, so users can set 
them if they want to:
   
   https://seatunnel.apache.org/docs/2.3.8/connector-v2/source/LocalFile
   
   In the previous version, I did not see these configurations being used. In 
this modification, they are used:
   
   - If date_format, datetime_format, or time_format is configured, date and 
time values are parsed with the configured format.
   - If none is configured, the cell format is used to obtain the original 
value, which is stored as a double.
   - If the cell format is not a time type, 
DateTimeUtils.matchDateTimeFormatter / DateUtils.matchDateFormatter is used to 
match a formatter against the date/time string and parse it.
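
The fallback order described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the class `DateCellFallbackSketch`, the method `resolve`, and the candidate pattern list are all hypothetical, and the numeric branch assumes Excel's 1900 date system.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the three-step fallback; none of these names come from the PR.
public class DateCellFallbackSketch {
    // Excel's 1900 date system counts days from this date (valid for serials >= 61).
    private static final LocalDate EXCEL_EPOCH = LocalDate.of(1899, 12, 30);

    // Illustrative candidate patterns, tried in order when nothing is configured.
    private static final List<DateTimeFormatter> CANDIDATES =
            Arrays.asList(
                    DateTimeFormatter.ofPattern("yyyy-MM-dd"),
                    DateTimeFormatter.ofPattern("yyyy/MM/dd"),
                    DateTimeFormatter.ofPattern("yyyy/M/d"),
                    DateTimeFormatter.ofPattern("yyyy-M-d"));

    static LocalDate resolve(String configuredPattern, Object cellValue) {
        // 1. A configured pattern takes priority (this sketch assumes a string cell here).
        if (configuredPattern != null) {
            return LocalDate.parse(
                    cellValue.toString(), DateTimeFormatter.ofPattern(configuredPattern));
        }
        // 2. A date-formatted cell exposes the stored double; convert the whole days.
        if (cellValue instanceof Double) {
            return EXCEL_EPOCH.plusDays((long) Math.floor((Double) cellValue));
        }
        // 3. Otherwise try each candidate pattern until one matches the string.
        String text = cellValue.toString();
        for (DateTimeFormatter formatter : CANDIDATES) {
            try {
                return LocalDate.parse(text, formatter);
            } catch (DateTimeParseException ignored) {
                // Try the next candidate pattern.
            }
        }
        throw new IllegalArgumentException("Unrecognized date value: " + text);
    }
}
```

For example, `resolve(null, "2024-1-31")` falls through to step 3 and is matched by the last candidate pattern.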
   
   






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-19 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1849660373


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/resources/excel/test_read_excel.conf:
##
@@ -18,6 +18,7 @@
 {
   sheet_name = "Sheet1"
   skip_header_row_number = 1
+  date_format = "/M/d"

Review Comment:
   Do we have to add the `date_format` config to pass CI?






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-19 Thread via GitHub


corgy-w commented on PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#issuecomment-2485409248

   https://github.com/apache/seatunnel/runs/33188901598 @dwave Please enable 
the CI workflow.





Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-19 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1847995238


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/java/org/apache/seatunnel/connectors/seatunnel/file/Reader/ExcelReadStrategyTest.java:
##
@@ -15,8 +15,9 @@
  * limitations under the License.
  */
 
-package org.apache.seatunnel.connectors.seatunnel.file.writer;
+package org.apache.seatunnel.connectors.seatunnel.file.Reader;

Review Comment:
   
![image](https://github.com/user-attachments/assets/7e10f6fe-7202-45dd-bc40-15bfe9820adf)
   Under the hood, Excel stores date and time values as doubles, so the reader 
now converts the double back to the Date and DateTime types, and all test cases 
pass.
   I also added /M/d and -M-d as string-matching date format options in 
DateTimeUtils and DateUtils.
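
As a rough illustration of the double-to-date conversion mentioned above (not the code in this PR): the class and method names below are hypothetical, and the sketch assumes Excel's 1900 date system and serial values of 61 or more, so the phantom 1900-02-29 does not matter.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical sketch: converts an Excel serial number to java.time types.
public class ExcelSerialDateSketch {
    // For serials >= 61 in the 1900 date system, day 0 corresponds to 1899-12-30.
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);
    private static final long SECONDS_PER_DAY = 86_400L;

    static LocalDateTime toLocalDateTime(double serial) {
        // The integer part is the number of days, the fraction is the time of day.
        long days = (long) Math.floor(serial);
        long seconds = Math.round((serial - days) * SECONDS_PER_DAY);
        return EPOCH.atStartOfDay().plusDays(days).plusSeconds(seconds);
    }

    static LocalDate toLocalDate(double serial) {
        return toLocalDateTime(serial).toLocalDate();
    }
}
```

With this sketch, the serial 45322 maps to 2024-01-31, the date used in the test file, and 45322.5 maps to noon on the same day.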






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-18 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1847504915


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/java/org/apache/seatunnel/connectors/seatunnel/file/Reader/ExcelReadStrategyTest.java:
##
@@ -54,7 +55,7 @@ public class ExcelReadStrategyTest {
 
 @Test
 public void testExcelRead() throws IOException, URISyntaxException {
-testExcelRead("/excel/test_read_excel.xlsx");
+//testExcelRead("/excel/test_read_excel.xlsx");

Review Comment:
   Okay, I'll find a way to deal with it






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-18 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1846281321


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/java/org/apache/seatunnel/connectors/seatunnel/file/Reader/ExcelReadStrategyTest.java:
##
@@ -54,7 +55,7 @@ public class ExcelReadStrategyTest {
 
 @Test
 public void testExcelRead() throws IOException, URISyntaxException {
-testExcelRead("/excel/test_read_excel.xlsx");
+//testExcelRead("/excel/test_read_excel.xlsx");

Review Comment:
   I think we should find some way to make sure the old behavior is not 
changed, or add an option that lets users choose between POI and EasyExcel.
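
A minimal sketch of what such an option could look like, assuming an enum with values `POI` and `EASY_EXCEL` and a default of `POI` so that existing jobs keep the old behavior. The enum values, the class name, and `parseEngine` are assumptions for illustration, not the API in this PR.

```java
import java.util.Locale;

// Hypothetical sketch of an engine switch that defaults to the old reader.
public class ExcelEngineSketch {
    // Assumed values; the real enum in the PR may differ.
    enum ExcelEngine {
        POI,
        EASY_EXCEL
    }

    static ExcelEngine parseEngine(String configured) {
        // No explicit setting: fall back to POI so the old behavior is unchanged.
        if (configured == null || configured.trim().isEmpty()) {
            return ExcelEngine.POI;
        }
        // Accept any casing, e.g. "easy_excel" or "EASY_EXCEL".
        return ExcelEngine.valueOf(configured.trim().toUpperCase(Locale.ROOT));
    }
}
```

An unknown value makes `ExcelEngine.valueOf` throw `IllegalArgumentException`, which surfaces a typo in the config early instead of silently picking an engine.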






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-15 Thread via GitHub


dwave commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1843657279


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/java/org/apache/seatunnel/connectors/seatunnel/file/Reader/ExcelReadStrategyTest.java:
##
@@ -54,7 +55,7 @@ public class ExcelReadStrategyTest {
 
 @Test
 public void testExcelRead() throws IOException, URISyntaxException {
-testExcelRead("/excel/test_read_excel.xlsx");
+//testExcelRead("/excel/test_read_excel.xlsx");

Review Comment:
   This is the test Excel file used by the commented-out code. The date string 
that needs to be converted is 2024/1/31, and the cell format is
{mso-generic-font-family:auto;
mso-font-charset:134;
mso-number-format:"/m/d"; }
   
   In POI, we can get the correct data type from the cell format, but in 
EasyExcel we only get the string. That string does not conform to the defined 
Y/MM/dd format, so converting it to the Date type fails and the test case 
fails. That is why I commented out this one test case.
   
   
![image](https://github.com/user-attachments/assets/950ce24f-d9b8-4ce2-aa98-94bfbe9b92f6)
   
   
![image](https://github.com/user-attachments/assets/30bb4f1f-1bff-4ead-b3e6-49cb6132fda8)
   
   
![image](https://github.com/user-attachments/assets/47b4065d-1f6c-4005-a6d2-eaa6307a39f3)
   






Re: [PR] [Improve][Connector-V2] Change read excel util from POI to EasyExcel [seatunnel]

2024-11-15 Thread via GitHub


Hisoka-X commented on code in PR #8064:
URL: https://github.com/apache/seatunnel/pull/8064#discussion_r1843609823


##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/java/org/apache/seatunnel/connectors/seatunnel/file/Reader/ExcelReadStrategyTest.java:
##
@@ -54,7 +55,7 @@ public class ExcelReadStrategyTest {
 
 @Test
 public void testExcelRead() throws IOException, URISyntaxException {
-testExcelRead("/excel/test_read_excel.xlsx");
+//testExcelRead("/excel/test_read_excel.xlsx");

Review Comment:
   Why is this test disabled?



##
seatunnel-connectors-v2/connector-file/connector-file-base/src/test/java/org/apache/seatunnel/connectors/seatunnel/file/Reader/ExcelReadStrategyTest.java:
##
@@ -15,8 +15,9 @@
  * limitations under the License.
  */
 
-package org.apache.seatunnel.connectors.seatunnel.file.writer;
+package org.apache.seatunnel.connectors.seatunnel.file.Reader;

Review Comment:
   ```suggestion
   package org.apache.seatunnel.connectors.seatunnel.file.reader;
   ```


