Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
hailin0 merged PR #8025: URL: https://github.com/apache/seatunnel/pull/8025 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2490875669

> When testing .xls locally, it reports that the format is not supported.

Got it. I will check it out when I have time. Thanks @zhdech
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2490607304

> Forgot to add: although .xlsx files cannot be read after being compressed with gz, .xls files can. It can be added later for testing. cc @Hisoka-X @zhdech

When testing .xls locally, it reports that the format is not supported.

![image](https://github.com/user-attachments/assets/e40d3da6-4e6a-4d96-8a37-ac3d5a1b03ce)

`env {
  parallelism = 1
  job.mode = "BATCH"
  # You can set spark configuration here
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
  job.mode = "BATCH"
}
source {
  LocalFile {
    path = "/seatunnel/read/gz/excel/single/e2e-xls-gz.xls.gz"
    result_table_name = "fake"
    file_format_type = excel
    archive_compress_codec = "gz"
    field_delimiter = ;
    skip_header_row_number = 1
    schema = {
      fields {
        c_map = "map"
        c_array = "array"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_bytes = bytes
        c_date = date
        c_decimal = "decimal(38, 18)"
        c_timestamp = timestamp
        c_row = {
          c_map = "map"
          c_array = "array"
          c_string = string
          c_boolean = boolean
          c_tinyint = tinyint
          c_smallint = smallint
          c_int = int
          c_bigint = bigint
          c_float = float
          c_double = double
          c_bytes = bytes
          c_date = date
          c_decimal = "decimal(38, 18)"
          c_timestamp = timestamp
        }
      }
    }
  }
}
sink {
  Assert {
    rules {
      row_rules = [
        {
          rule_type = MAX_ROW
          rule_value = 5
        },
        {
          rule_type = MIN_ROW
          rule_value = 5
        }
      ],
      field_rules = [
        {
          field_name = c_string
          field_type = string
          field_value = [
            { rule_type = NOT_NULL }
          ]
        },
        {
          field_name = c_boolean
          field_type = boolean
          field_value = [
            { rule_type = NOT_NULL }
          ]
        },
        {
          field_name = c_double
          field_type = double
          field_value = [
            { rule_type = NOT_NULL }
          ]
        }
      ]
    }
  }
}
`
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1851567714

## docs/en/connector-v2/source/S3File.md: ##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
> > Fix the doc: the gz format doesn't seem to support reading compressed .xlsx files directly.
>
> The code supports .xls. Can we add Excel support to the doc? @corgy-w

Let's merge this. I will take a look at it later. @zhdech
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1851551611

## docs/en/connector-v2/source/S3File.md: ##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
> Fix the doc: the gz format doesn't seem to support reading compressed .xlsx files directly.

The code supports .xls. Can we add Excel support to the doc? @corgy-w
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1851551611

## docs/en/connector-v2/source/S3File.md: ##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
> Fix the doc: the gz format doesn't seem to support reading compressed .xlsx files directly.

Can we add Excel support to the doc?
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2490190787

Forgot to add: although .xlsx files cannot be read after being compressed with gz, .xls files can. It can be added later for testing. cc @Hisoka-X @zhdech
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1851253930

## docs/en/connector-v2/source/S3File.md: ##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
Adjusted
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
Hisoka-X commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1848213029

## seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java: ##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
                 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
                 container);

+        Path xlsxGz =
+                convertToGzFile(
+                        Lists.newArrayList(
+                                ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+                        "e2e-xlsx-gz");

Review Comment:
We CAN NOT remove .xlsx support.
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1849642421

## docs/en/connector-v2/source/S3File.md: ##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
OK, let me verify the Excel problem
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1849613125

## docs/en/connector-v2/source/HdfsFile.md: ##
@@ -144,6 +144,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
ditto

## docs/en/connector-v2/source/LocalFile.md: ##
@@ -322,6 +322,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
ditto

## docs/en/connector-v2/source/FtpFile.md: ##
@@ -328,6 +328,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
ditto

## docs/en/connector-v2/source/OssJindoFile.md: ##
@@ -335,6 +335,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
ditto

## docs/en/connector-v2/source/CosFile.md: ##
@@ -343,6 +343,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
ditto
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1849608744

## docs/en/connector-v2/source/S3File.md: ##
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that supported as the follow
 | ZIP    | txt,json,excel,xml | .zip    |
 | TAR    | txt,json,excel,xml | .tar    |
 | TAR_GZ | txt,json,excel,xml | .tar.gz |
+| GZ     | txt,json,excel,xml | .gz     |

Review Comment:
Fix the doc: the gz format doesn't seem to support reading compressed .xlsx files directly.
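The table in the hunk above pairs each archive codec with one file suffix. One practical detail when matching on suffixes is that `.tar.gz` also ends in `.gz`, so a lookup has to prefer the longest match. The following is a minimal sketch of that idea only; `ArchiveSuffixDemo`, `Codec` and `detect` are hypothetical names and are not taken from SeaTunnel. The job config quoted in this thread sets the codec explicitly through `archive_compress_codec` rather than guessing it from the file name.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration, not SeaTunnel code.
final class ArchiveSuffixDemo {

    // Codec names and suffixes copied from the doc table in the hunk above.
    enum Codec {
        ZIP(".zip"),
        TAR(".tar"),
        TAR_GZ(".tar.gz"),
        GZ(".gz");

        final String suffix;

        Codec(String suffix) {
            this.suffix = suffix;
        }
    }

    // Returns the codec whose suffix is the longest one matching the file name,
    // so "x.tar.gz" resolves to TAR_GZ rather than GZ.
    static Optional<Codec> detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        Codec best = null;
        for (Codec codec : Codec.values()) {
            boolean matches = lower.endsWith(codec.suffix);
            boolean longer = best == null || codec.suffix.length() > best.suffix.length();
            if (matches && longer) {
                best = codec;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(detect("multiTarGz.tar.gz")); // Optional[TAR_GZ]
        System.out.println(detect("e2e-xls-gz.xls.gz")); // Optional[GZ]
        System.out.println(detect("e2e.xlsx"));          // Optional.empty
    }
}
```

The file names in `main` are the ones that appear in this thread; a name with no archive suffix yields an empty result instead of a default codec.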
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2487251431

@Hisoka-X Sir, please help me check it.
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1849339649

## seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java: ##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
                 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
                 container);

+        Path xlsxGz =
+                convertToGzFile(
+                        Lists.newArrayList(
+                                ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+                        "e2e-xlsx-gz");

Review Comment:
Even after removing Excel, it still cannot pass. I suspect there may be some issues with the original test case.
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
Hisoka-X commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1848213029

## seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java: ##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
                 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
                 container);

+        Path xlsxGz =
+                convertToGzFile(
+                        Lists.newArrayList(
+                                ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+                        "e2e-xlsx-gz");

Review Comment:
~~We CAN NOT remove .xlsx support.~~ My mistake, if we didn't support it before.
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1848139842

## seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java: ##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
                 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
                 container);

+        Path xlsxGz =
+                convertToGzFile(
+                        Lists.newArrayList(
+                                ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+                        "e2e-xlsx-gz");

Review Comment:
@zhdech remember to adjust the doc if you remove .xlsx support
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1847631780

## seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java: ##
@@ -224,6 +248,14 @@ public class LocalFileIT extends TestSuiteBase {
                 "/seatunnel/read/zip/excel/multifile/multiZip.zip",
                 container);

+        Path xlsxGz =
+                convertToGzFile(
+                        Lists.newArrayList(
+                                ContainerUtil.getResourcesFile("/excel/e2e.xlsx")),
+                        "e2e-xlsx-gz");

Review Comment:
After testing, it seems to be a compression problem. Can an .xlsx file not be converted after compression?
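The body of `convertToGzFile` is not shown in the quoted hunk, so the following is only a minimal sketch of what gzipping a single test resource and reading it back could look like with the JDK's `java.util.zip` classes. The class `GzRoundTrip` and the methods `gzipFile` and `gunzip` are hypothetical names for illustration and are not SeaTunnel code. A byte-for-byte round trip like this can help separate a problem in the compression step from a problem in the format reader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration, not the helper used in LocalFileIT.
final class GzRoundTrip {

    // Copies all bytes from in to out using a fixed-size buffer.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    // Gzips a single source file into target and returns target.
    static Path gzipFile(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
                OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            copy(in, out);
        }
        return target;
    }

    // Reads a .gz file and returns the decompressed bytes.
    static byte[] gunzip(Path gz) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz));
                ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("e2e", ".txt");
        Path target = Files.createTempFile("e2e-txt-gz", ".txt.gz");
        try {
            byte[] original = "a;b;c\n1;2;3\n".getBytes(StandardCharsets.UTF_8);
            Files.write(source, original);
            gzipFile(source, target);
            boolean same = Arrays.equals(original, gunzip(target));
            System.out.println("round trip ok: " + same);
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }
}
```

Note that gzip wraps exactly one byte stream, unlike zip or tar, so the decompressed bytes are the original file and any format-specific parsing still has to happen afterwards.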
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1845909320

## seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java: ##
@@ -149,6 +151,13 @@ public class LocalFileIT extends TestSuiteBase {
                 "/seatunnel/read/tar_gz/txt/multifile/multiTarGz.tar.gz",
                 container);

+        Path txtGz =

Review Comment:
At night or at noon tomorrow
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1845765567

## seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java: ##
@@ -149,6 +151,13 @@ public class LocalFileIT extends TestSuiteBase {
                 "/seatunnel/read/tar_gz/txt/multifile/multiTarGz.tar.gz",
                 container);

+        Path txtGz =

Review Comment:
The file path in the container has been adjusted, but the test case still cannot pass. Please help me take a look.
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
corgy-w commented on code in PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#discussion_r1845443668

## seatunnel-e2e/seatunnel-connector-v2-e2e/connector-file-local-e2e/src/test/java/org/apache/seatunnel/e2e/connector/file/local/LocalFileIT.java: ##
@@ -149,6 +151,13 @@ public class LocalFileIT extends TestSuiteBase {
                 "/seatunnel/read/tar_gz/txt/multifile/multiTarGz.tar.gz",
                 container);

+        Path txtGz =

Review Comment:
I briefly looked at the error report. It may be because the files in the container are messed up. You can adjust the path of the files in the container.
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
Hisoka-X commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2472174529

> May I ask how to resolve the following build errors? What do you need me to do?

Try to retrigger the failed CI. It is unstable. cc @zhangshenghang
Re: [PR] [Feature][Connectors] LocalFile Support reading gz [seatunnel]
zhdech commented on PR #8025:
URL: https://github.com/apache/seatunnel/pull/8025#issuecomment-2472053151

> Thanks @zhdech ! Could you add a test case for this feature?

OK. May I ask how to resolve the build error mentioned above? What do you need me to do?