Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-22 Thread via GitHub


hailin0 merged PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-21 Thread via GitHub


corgy-w commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2490875669

   > When testing .xls locally, it reports that the format is not supported
   
   Got it. I will check it out when I have time. Thanks, @zhdech.





Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-21 Thread via GitHub


zhdech commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2490607304

   > Forgot to add: although .xlsx files cannot be read after being compressed with gz, .xls files can. It can be added to the tests later. cc @Hisoka-X @zhdech
   
   When testing .xls locally, it reports that the format is not supported:
   
![image](https://github.com/user-attachments/assets/e40d3da6-4e6a-4d96-8a37-ac3d5a1b03ce)
   `env {
     parallelism = 1
     job.mode = "BATCH"
     # You can set spark configuration here
     spark.app.name = "SeaTunnel"
     spark.executor.instances = 2
     spark.executor.cores = 1
     spark.executor.memory = "1g"
     spark.master = local
     job.mode = "BATCH"
   }
   
   source {
     LocalFile {
       path = "/seatunnel/read/gz/excel/single/e2e-xls-gz.xls.gz"
       result_table_name = "fake"
       file_format_type = excel
       archive_compress_codec = "gz"
       field_delimiter = ;
       skip_header_row_number = 1
       schema = {
         fields {
           c_map = "map"
           c_array = "array"
           c_string = string
           c_boolean = boolean
           c_tinyint = tinyint
           c_smallint = smallint
           c_int = int
           c_bigint = bigint
           c_float = float
           c_double = double
           c_bytes = bytes
           c_date = date
           c_decimal = "decimal(38, 18)"
           c_timestamp = timestamp
           c_row = {
             c_map = "map"
             c_array = "array"
             c_string = string
             c_boolean = boolean
             c_tinyint = tinyint
             c_smallint = smallint
             c_int = int
             c_bigint = bigint
             c_float = float
             c_double = double
             c_bytes = bytes
             c_date = date
             c_decimal = "decimal(38, 18)"
             c_timestamp = timestamp
           }
         }
       }
     }
   }
   
   sink {
     Assert {
       rules {
         row_rules = [
           {
             rule_type = MAX_ROW
             rule_value = 5
           },
           {
             rule_type = MIN_ROW
             rule_value = 5
           }
         ],
         field_rules = [
           {
             field_name = c_string
             field_type = string
             field_value = [
               {
                 rule_type = NOT_NULL
               }
             ]
           },
           {
             field_name = c_boolean
             field_type = boolean
             field_value = [
               {
                 rule_type = NOT_NULL
               }
             ]
           },
           {
             field_name = c_double
             field_type = double
             field_value = [
               {
                 rule_type = NOT_NULL
               }
             ]
           }
         ]
       }
     }
   }
   `





Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-21 Thread via GitHub


corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1851567714


##
docs/en/connector-v2/source/S3File.md:
##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   
   
   > > Fix the doc; reading .xlsx files compressed with gz doesn't seem to work directly
   > 
   > The code supports .xls. Can we add Excel support to the doc? @corgy-w
   
   Let's merge this. I will take a look at it later, @zhdech.
   
   






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-21 Thread via GitHub


zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1851551611


##
docs/en/connector-v2/source/S3File.md:
##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   > Fix the doc; reading .xlsx files compressed with gz doesn't seem to work directly
   
   The code supports .xls. Can we add Excel support to the doc? @corgy-w






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-21 Thread via GitHub


zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1851551611


##
docs/en/connector-v2/source/S3File.md:
##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   > Fix the doc; reading .xlsx files compressed with gz doesn't seem to work directly
   
   Can we add Excel support to the doc?






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-20 Thread via GitHub


corgy-w commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2490190787

   Forgot to add: although .xlsx files cannot be read after being compressed with gz, .xls files can. It can be added to the tests later. cc @Hisoka-X @zhdech
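
   To confirm which Excel flavor a given `.gz` file really contains, one option is to decompress the first few bytes and compare them with the well-known file signatures: a legacy `.xls` file is an OLE2 compound file starting with `D0 CF 11 E0 A1 B1 1A E1`, while an `.xlsx` file is a ZIP package starting with `50 4B 03 04`. The following is a minimal diagnostic sketch, not SeaTunnel code. The class `ExcelSignatureSketch` and its methods are hypothetical names, and the sketch does not explain why the reader accepts one format and rejects the other.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: sniffs the container format inside a gzip stream. */
final class ExcelSignatureSketch {
    // OLE2 compound file header, used by legacy .xls workbooks.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header, used by .xlsx (and by any other ZIP archive).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    private ExcelSignatureSketch() {}

    /** Returns "xls", "zip" (possibly .xlsx), or "unknown". */
    static String detect(InputStream gzipped) throws IOException {
        byte[] head = new byte[OLE2.length];
        int filled = 0;
        try (InputStream in = new GZIPInputStream(gzipped)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int read = in.read(head, filled, head.length - filled);
                if (read == -1) {
                    break;
                }
                filled += read;
            }
        }
        if (startsWith(head, filled, OLE2)) {
            return "xls";
        }
        if (startsWith(head, filled, ZIP)) {
            return "zip";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Demo-only helper: gzip a byte array in memory. */
    static byte[] gzipBytes(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Fake payload that only carries the ZIP signature; not a real workbook.
        byte[] fakeZip = {0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0};
        System.out.println(detect(new ByteArrayInputStream(gzipBytes(fakeZip))));
    }
}
```

   A "zip" result only says the payload is a ZIP container; it does not prove the content is a valid `.xlsx` workbook.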





Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-20 Thread via GitHub


zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1851253930


##
docs/en/connector-v2/source/S3File.md:
##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   Adjusted






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-20 Thread via GitHub


Hisoka-X commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1848213029


##
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java:
##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
 container);
 
+Path xlsxGz =
+convertToGzFile(
+Lists.newArrayList(
+ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+"e2e-xlsx-gz");

Review Comment:
   We CAN NOT remove .xlsx support.






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-19 Thread via GitHub


zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1849642421


##
docs/en/connector-v2/source/S3File.md:
##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   OK, let me verify the Excel problem






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-19 Thread via GitHub


corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1849613125


##
docs/en/connector-v2/source/HdfsFile.md:
##
@@ -144,6 +144,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   ditto



##
docs/en/connector-v2/source/LocalFile.md:
##
@@ -322,6 +322,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   ditto



##
docs/en/connector-v2/source/FtpFile.md:
##
@@ -328,6 +328,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   ditto



##
docs/en/connector-v2/source/OssJindoFile.md:
##
@@ -335,6 +335,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   ditto



##
docs/en/connector-v2/source/CosFile.md:
##
@@ -343,6 +343,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   ditto






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-19 Thread via GitHub


corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1849608744


##
docs/en/connector-v2/source/S3File.md:
##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that 
supported as the follow
 | ZIP| txt,json,excel,xml | .zip|
 | TAR| txt,json,excel,xml | .tar|
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ | txt,json,excel,xml | .gz |

Review Comment:
   Fix the doc; reading .xlsx files compressed with gz doesn't seem to work directly






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-19 Thread via GitHub


zhdech commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2487251431

   @Hisoka-X Sir, please help me check it.





Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-19 Thread via GitHub


zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1849339649


##
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java:
##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
 container);
 
+Path xlsxGz =
+convertToGzFile(
+Lists.newArrayList(
+ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+"e2e-xlsx-gz");

Review Comment:
   Even after removing the Excel case, the test still does not pass.
   I suspect there may be an issue with the original test case.






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-19 Thread via GitHub


Hisoka-X commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1848213029


##
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java:
##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
 container);
 
+Path xlsxGz =
+convertToGzFile(
+Lists.newArrayList(
+ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+"e2e-xlsx-gz");

Review Comment:
   ~~We CAN NOT remove .xlsx support.~~
   
   My mistake, if we didn't support it before.






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-19 Thread via GitHub


corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1848139842


##
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java:
##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
 container);
 
+Path xlsxGz =
+convertToGzFile(
+Lists.newArrayList(
+ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+"e2e-xlsx-gz");

Review Comment:
   @zhdech remember to adjust the doc if you remove .xlsx support






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-18 Thread via GitHub


corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1847631780


##
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java:
##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
 container);
 
+Path xlsxGz =
+convertToGzFile(
+Lists.newArrayList(
+ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+"e2e-xlsx-gz");

Review Comment:
   After testing, this appears to be a compression problem. Can an .xlsx file not be converted after compression?
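
   One way to separate the two suspects here (the compression step versus the Excel reader) is to round-trip a file through gzip outside SeaTunnel and compare the bytes. The sketch below is a minimal, standalone illustration that uses only `java.util.zip`. The class `GzRoundTripSketch` and its methods are hypothetical names; this is not the `convertToGzFile` helper from this PR, whose implementation is not shown in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip a file, gunzip it, and compare with the source. */
final class GzRoundTripSketch {
    private GzRoundTripSketch() {}

    /** Compresses source into target as a single-member gzip file. */
    static void gzip(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
                OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            copy(in, out);
        }
    }

    /** Decompresses a gzip file fully into memory. */
    static byte[] gunzip(Path gzFile) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            copy(in, buffer);
        }
        return buffer.toByteArray();
    }

    /** Returns true when gzip followed by gunzip reproduces the source bytes. */
    static boolean roundTripMatches(Path source) throws IOException {
        Path target = Files.createTempFile("round-trip", ".gz");
        try {
            gzip(source, target);
            return Arrays.equals(Files.readAllBytes(source), gunzip(target));
        } finally {
            Files.deleteIfExists(target);
        }
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            out.write(chunk, 0, read);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: GzRoundTripSketch <file>");
            return;
        }
        System.out.println("round trip ok: " + roundTripMatches(Paths.get(args[0])));
    }
}
```

   If the round trip matches for the .xlsx fixture, the compression step itself is unlikely to be the cause, and the reader's handling of the decompressed stream would be the next thing to inspect.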






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-17 Thread via GitHub


corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1845909320


##
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java:
##
@@ -149,6 +151,13 @@ public class LocalFileIT extends TestSuiteBase {
 
"/seatunnel/read/tar_gz/txt/multifile/multiTarGz.tar.gz",
 container);
 
+Path txtGz =

Review Comment:
   Tonight or at noon tomorrow






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-17 Thread via GitHub


zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1845765567


##
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java:
##
@@ -149,6 +151,13 @@ public class LocalFileIT extends TestSuiteBase {
 
"/seatunnel/read/tar_gz/txt/multifile/multiTarGz.tar.gz",
 container);
 
+Path txtGz =

Review Comment:
   The file path in the container has been adjusted, but the test case still does not pass. Please help me take a look.






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-17 Thread via GitHub


corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1845443668


##
seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java:
##
@@ -149,6 +151,13 @@ public class LocalFileIT extends TestSuiteBase {
 
"/seatunnel/read/tar_gz/txt/multifile/multiTarGz.tar.gz",
 container);
 
+Path txtGz =

Review Comment:
   I briefly looked at the error report. It may be because the files in the 
container are messed up. You can adjust the path of the files in the container.






Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-12 Thread via GitHub


Hisoka-X commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2472174529

   > May I ask how to resolve the following build errors? What do you need me to do?
   
   Try re-triggering the failed CI run; it is unstable. cc @zhangshenghang





Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]

2024-11-12 Thread via GitHub


zhdech commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2472053151

   > Thanks @zhdech ! Could you add a test case for this feature?
   
   OK. May I ask how to resolve the build error mentioned above? What do you need me to do?

