[GitHub] [incubator-iceberg] waterlx commented on issue #457: DataFrame generated by Seq() might have schema conflict with Iceberg
waterlx commented on issue #457: DataFrame generated by Seq() might have schema conflict with Iceberg URL: https://github.com/apache/incubator-iceberg/issues/457#issuecomment-534483177 @sujithjay, thanks for taking care of this! I see your point about changing the Iceberg schema, but what if I do not want to make that field "optional" and would rather keep it "required"? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] waterlx removed a comment on issue #492: Should issue an error/warning message when no data file to delete
waterlx removed a comment on issue #492: Should issue an error/warning message when no data file to delete URL: https://github.com/apache/incubator-iceberg/issues/492#issuecomment-534425688 I used the following API to try to delete a data file: ``` table.newDelete().deleteFile("file:///path1") ``` and got the following message: INFO SnapshotProducer: `Committed snapshot x (StreamingDelete)` I thought the deletion had succeeded, but it had not, because the data file is recorded as "file:/path1", not "file:///path1". Although the [file URI scheme](https://en.wikipedia.org/wiki/File_URI_scheme) is a separate topic, it would be better to issue an error or warning here when the files to delete are not among the table's data files.
[GitHub] [incubator-iceberg] waterlx commented on issue #492: Should issue an error message when no data file to delete
waterlx commented on issue #492: Should issue an error message when no data file to delete URL: https://github.com/apache/incubator-iceberg/issues/492#issuecomment-534425688 I used the following API to try to delete a data file: ``` table.newDelete().deleteFile("file:///path1") ``` and got the following message: INFO SnapshotProducer: `Committed snapshot x (StreamingDelete)` I thought the deletion had succeeded, but it had not, because the data file is recorded as "file:/path1", not "file:///path1". Although the [file URI scheme](https://en.wikipedia.org/wiki/File_URI_scheme) is a separate topic, it would be better to issue an error or warning here when the files to delete are not among the table's data files.
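The mismatch described above can be illustrated with a small caller-side check. This is only a sketch using the JDK: `DataFilePathCheck`, `isTracked`, and `samePath` are hypothetical names, not part of the Iceberg API, and the set of tracked locations is assumed to have been collected by the caller beforehand. It shows that an exact string comparison treats the two spellings as different files even though both URIs carry the same path component.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not an Iceberg class.
final class DataFilePathCheck {
    private DataFilePathCheck() {}

    // True only when the requested location matches a tracked location
    // character for character.
    static boolean isTracked(Set<String> trackedLocations, String requested) {
        return trackedLocations.contains(requested);
    }

    // True when both strings parse as URIs that have the same path component.
    static boolean samePath(String a, String b) {
        return URI.create(a).getPath().equals(URI.create(b).getPath());
    }

    public static void main(String[] args) {
        // Assumption: the table records the file under this exact string.
        Set<String> tracked = new HashSet<>(Arrays.asList("file:/path1"));
        String requested = "file:///path1";

        if (!isTracked(tracked, requested)) {
            System.err.println("WARN: " + requested
                + " is not a tracked data file; a delete would remove nothing");
        }
        System.out.println("same URI path: " + samePath("file:/path1", requested));
    }
}
```

A check like this only warns the caller; it does not change what the delete operation itself does with a location it cannot match.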
[GitHub] [incubator-iceberg] waterlx opened a new issue #493: concurrent delete
waterlx opened a new issue #493: concurrent delete URL: https://github.com/apache/incubator-iceberg/issues/493
[GitHub] [incubator-iceberg] waterlx commented on issue #493: Concurrently delete a data file
waterlx commented on issue #493: Concurrently delete a data file URL: https://github.com/apache/incubator-iceberg/issues/493#issuecomment-534543516 @rdblue Could you please review my findings at your earliest convenience?
[GitHub] [incubator-iceberg] waterlx commented on issue #492: Should issue an error/warning message when no data file to delete
waterlx commented on issue #492: Should issue an error/warning message when no data file to delete URL: https://github.com/apache/incubator-iceberg/issues/492#issuecomment-534545600 @rdblue, does this make sense to you?
[GitHub] [incubator-iceberg] rdblue commented on issue #493: Concurrently delete a data file
rdblue commented on issue #493: Concurrently delete a data file URL: https://github.com/apache/incubator-iceberg/issues/493#issuecomment-534648500 This doesn't fail because deletes are idempotent. You tell Iceberg to delete with a filter or to delete a specific file, and that data will not be in the table after the delete commits. That's also why you see multiple delete operations. You ran 2 successful deletes on the table. The two metadata files you posted are two attempts to delete at the same time. That's why they use the same parent snapshot ID. The metastore must guarantee an atomic swap that is used to get a linear history of changes. So only one of these deletes will have been successfully committed. The other would have had to retry.
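The behavior described above (idempotent deletes, an atomic swap, and a retry for the writer that loses the race) can be sketched with a toy model. This is not Iceberg code: `ToyTable` and `Snapshot` are hypothetical, and a JDK `AtomicReference` stands in for the metastore's atomic swap. Each commit attempt builds a new snapshot from the current one and publishes it only if no other writer got there first.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a table whose current snapshot is replaced by an atomic swap.
final class ToyTable {
    // Immutable snapshot: its ID, its parent's ID, and the live data files.
    static final class Snapshot {
        final long id;
        final long parentId;
        final Set<String> files;

        Snapshot(long id, long parentId, Set<String> files) {
            this.id = id;
            this.parentId = parentId;
            this.files = Collections.unmodifiableSet(new HashSet<>(files));
        }
    }

    // Stand-in for the metastore pointer to the current table metadata.
    private final AtomicReference<Snapshot> current;

    ToyTable(Set<String> initialFiles) {
        this.current = new AtomicReference<>(new Snapshot(0L, -1L, initialFiles));
    }

    Snapshot current() {
        return current.get();
    }

    // Idempotent delete: succeeds whether or not the file is still present.
    Snapshot deleteFile(String location) {
        while (true) {
            Snapshot base = current.get();
            Set<String> remaining = new HashSet<>(base.files);
            remaining.remove(location); // no-op when the file is already gone
            Snapshot next = new Snapshot(base.id + 1, base.id, remaining);
            if (current.compareAndSet(base, next)) {
                return next; // this attempt won the swap
            }
            // Another writer committed first; rebuild on the new base and retry.
        }
    }
}
```

In this model, two writers that start from the same base both produce a candidate with the same parent ID, but only one candidate is swapped in. The other writer retries on top of the winner, so the committed history stays linear and both deletes eventually report success.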
[GitHub] [incubator-iceberg] rdblue commented on issue #492: Should issue an error/warning message when no data file to delete
rdblue commented on issue #492: Should issue an error/warning message when no data file to delete URL: https://github.com/apache/incubator-iceberg/issues/492#issuecomment-534652798 We could add an option to StreamingDelete to validate that some data is deleted, if you wanted a failure instead of an idempotent delete. Other operations configure validations that are exposed by the base class, MergingSnapshotProducer. We could add a validation method here as well.
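One possible shape for such an opt-in validation is sketched below. Everything here is hypothetical: `ToyDelete` and `validateSomethingDeleted` are illustrative names, not methods of StreamingDelete or MergingSnapshotProducer, and the live file set is a plain in-memory set. The default stays idempotent; the failure is raised only when the caller asks for it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical delete operation with an opt-in "something was deleted" check.
final class ToyDelete {
    private final Set<String> toDelete = new HashSet<>();
    private boolean failIfNothingDeleted = false;

    ToyDelete deleteFile(String location) {
        toDelete.add(location);
        return this;
    }

    // Assumed option name; by default a delete that matches nothing still succeeds.
    ToyDelete validateSomethingDeleted() {
        this.failIfNothingDeleted = true;
        return this;
    }

    // Removes matching locations from liveFiles and returns how many were removed.
    int commit(Set<String> liveFiles) {
        int removed = 0;
        for (String location : toDelete) {
            if (liveFiles.remove(location)) {
                removed++;
            }
        }
        if (failIfNothingDeleted && removed == 0) {
            throw new IllegalStateException(
                "No data files matched the requested deletes: " + toDelete);
        }
        return removed;
    }
}
```

With this shape, `new ToyDelete().deleteFile("file:///path1").commit(files)` keeps the idempotent behavior, while adding `.validateSomethingDeleted()` before `commit` turns a delete that matches nothing into an error.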
[GitHub] [incubator-iceberg] rdblue commented on issue #430: Support bucket table for Iceberg
rdblue commented on issue #430: Support bucket table for Iceberg URL: https://github.com/apache/incubator-iceberg/issues/430#issuecomment-534654396 @jerryshao, yes that's correct. That's why we need to expose the transformation functions to Spark via FunctionCatalog, and add the ability for DSv2 sources to set distribution and ordering requirements with those functions.
[GitHub] [incubator-iceberg] rdblue commented on issue #93: [MINOR] Throw an exception early if there is no dict available instead of handling dictionary-encoded pages for absent dict
rdblue commented on issue #93: [MINOR] Throw an exception early if there is no dict available instead of handling dictionary-encoded pages for absent dict URL: https://github.com/apache/incubator-iceberg/pull/93#issuecomment-534786457 Looks good to me. If `dict` returned null, further uses would fail with a null pointer exception. So this should be safe.
[GitHub] [incubator-iceberg] rdblue commented on issue #432: Allow writers to control size of files generated
rdblue commented on issue #432: Allow writers to control size of files generated URL: https://github.com/apache/incubator-iceberg/pull/432#issuecomment-534785869 Thanks for fixing this, @xabriel! I'll merge it.
[GitHub] [incubator-iceberg] rdblue merged pull request #432: Allow writers to control size of files generated
rdblue merged pull request #432: Allow writers to control size of files generated URL: https://github.com/apache/incubator-iceberg/pull/432
[GitHub] [incubator-iceberg] holdenk opened a new issue #494: TestScanSummary appears to be flaky
holdenk opened a new issue #494: TestScanSummary appears to be flaky URL: https://github.com/apache/incubator-iceberg/issues/494 I was building Apache Iceberg for the first time and ran into a test failure in `TestScanSummary`. When I re-ran the test with debugging information on, or by itself, it passed. Looking at the fix, which waits for the system clock to change by at least 1 tick, I think the test may need more separation between timestamps to stop being flaky. More generally, I'm finding confusing and conflicting information online about whether the way we use currentTimeMillis is safe: it seems it may not be monotonically increasing, although that might only apply to really old kernels.
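The "wait for the clock to change by at least 1 tick" idea can be sketched as follows. This is an illustration only, not the code in `TestScanSummary`; `ClockWait` and `waitUntilAfter` are hypothetical names. The loop returns only once `System.currentTimeMillis()` reports a value strictly greater than the given timestamp, so two events separated by this call cannot share a millisecond value.

```java
// Hypothetical test helper, not taken from the Iceberg test suite.
final class ClockWait {
    private ClockWait() {}

    // Spins until the wall clock reads strictly later than timestampMillis,
    // then returns the first later reading that was observed.
    static long waitUntilAfter(long timestampMillis) {
        long now = System.currentTimeMillis();
        while (now <= timestampMillis) {
            now = System.currentTimeMillis();
        }
        return now;
    }

    public static void main(String[] args) {
        long first = System.currentTimeMillis();
        long second = waitUntilAfter(first);
        System.out.println("clock advanced by " + (second - first) + " ms");
    }
}
```

Because `currentTimeMillis` is wall-clock time, the system clock can be adjusted while the loop runs; a backward adjustment would only make the wait longer. `System.nanoTime()` is the JDK call intended for measuring elapsed time, but its values are not wall-clock timestamps, so it cannot replace `currentTimeMillis` where a stored timestamp is needed.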
[GitHub] [incubator-iceberg] aokolnychyi commented on issue #430: Support bucket table for Iceberg
aokolnychyi commented on issue #430: Support bucket table for Iceberg URL: https://github.com/apache/incubator-iceberg/issues/430#issuecomment-534852783 @dbtsai, FYI