[GitHub] [incubator-iceberg] rdblue commented on issue #280: Add persistent IDs to partition fields
rdblue commented on issue #280: Add persistent IDs to partition fields URL: https://github.com/apache/incubator-iceberg/issues/280#issuecomment-526704926 @manishmalhotrawork, one strange thing about your test is that `data_bucket` has a different ID. It should continue to use id 1000 because it hasn't changed. Either the assignment logic or the evolution logic (like how `SchemaUpdate` works) should detect that the column has not changed and not assign a different ID. Without seeing more of what's happening, I'm not really able to tell why you're getting that error. Can you open a WIP pull request? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
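The reuse behaviour described in this comment, where an unchanged field such as `data_bucket` keeps id 1000 after the spec evolves, can be sketched as follows. This is only an illustration under stated assumptions: `PartitionFieldIds` and `idFor` are hypothetical names, not Iceberg classes or methods, and treating a partition field as "unchanged" when its source column ID and transform string match is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Iceberg code base.
final class PartitionFieldIds {
    // Maps an assumed field identity (source column ID + transform) to its assigned ID.
    private final Map<String, Integer> assigned = new HashMap<>();
    // Assumed initial value so that the first ID handed out is 1000.
    private int lastAssignedId = 999;

    /** Returns the existing ID for a field seen before, otherwise assigns the next ID. */
    int idFor(int sourceId, String transform) {
        String key = sourceId + ":" + transform;
        Integer existing = assigned.get(key);
        if (existing != null) {
            return existing; // unchanged field: keep its ID
        }
        lastAssignedId += 1;
        assigned.put(key, lastAssignedId);
        return lastAssignedId;
    }

    public static void main(String[] args) {
        PartitionFieldIds ids = new PartitionFieldIds();
        // Original spec: a bucket field on source column 2.
        System.out.println(ids.idFor(2, "bucket[16]")); // 1000
        // Evolved spec: the same bucket field plus a new day field on source column 3.
        System.out.println(ids.idFor(2, "bucket[16]")); // 1000
        System.out.println(ids.idFor(3, "day"));        // 1001
    }
}
```

With a lookup like this, only fields that are new to the table receive a new ID, whether the check lives in the assignment logic or in the evolution logic.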
[GitHub] [incubator-iceberg] rdblue commented on issue #280: Add persistent IDs to partition fields
rdblue commented on issue #280: Add persistent IDs to partition fields URL: https://github.com/apache/incubator-iceberg/issues/280#issuecomment-525451795
> And also as TableMetadata knows how many fields are in partition, so can maintain the nextIDValue as well.
The next partition field ID is the highest field ID in all of the table's partition specs, plus 1. Once a partition spec is removed, we can reuse the ID. Alternatively, we can keep track of the last assigned ID, like we do for the table schema.
> Also the TableMetadata#updatePartitionSpec should also use nextIDValue to pass to PartitionSpec.
I think the spec's IDs will be assigned by the time that method is called because the partition spec passed in is already created.
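The "highest field ID plus 1" rule from this comment can be sketched as below. The class and method names are hypothetical and each spec is reduced to a plain list of its partition field IDs. Returning 1000 for a table with no partition fields is an assumption, based on the suggestion elsewhere in this thread that IDs should keep starting at 1000.

```java
import java.util.List;

// Hypothetical sketch; these names are not part of the Iceberg API.
final class NextPartitionFieldId {
    // Assumed starting point for partition field IDs.
    static final int PARTITION_ID_START = 1000;

    /** Each inner list holds the partition field IDs of one spec still tracked by the table. */
    static int nextId(List<List<Integer>> specFieldIds) {
        int highest = PARTITION_ID_START - 1;
        for (List<Integer> fieldIds : specFieldIds) {
            for (int id : fieldIds) {
                highest = Math.max(highest, id);
            }
        }
        return highest + 1;
    }

    public static void main(String[] args) {
        List<List<Integer>> specs = List.of(List.of(1000, 1001), List.of(1000, 1002));
        System.out.println(nextId(specs));     // 1003
        System.out.println(nextId(List.of())); // 1000
    }
}
```

Because the result is derived from the specs that are still present, dropping the spec that holds the highest ID makes that ID available again. Storing a last-assigned counter instead would never hand an ID out twice.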
[GitHub] [incubator-iceberg] rdblue commented on issue #280: Add persistent IDs to partition fields
rdblue commented on issue #280: Add persistent IDs to partition fields URL: https://github.com/apache/incubator-iceberg/issues/280#issuecomment-524938283 @manishmalhotrawork, those IDs have different contexts. The source ID in a partition field is the ID of the source data column in the table schema. The ID added by partitionType is the ID in the manifest file schema.
[GitHub] [incubator-iceberg] rdblue commented on issue #280: Add persistent IDs to partition fields
rdblue commented on issue #280: Add persistent IDs to partition fields URL: https://github.com/apache/incubator-iceberg/issues/280#issuecomment-524579846 @manishmalhotrawork, we need to keep track of IDs that have been assigned to partition fields in a table and reuse them when partition specs change. They should probably continue to start at 1000. @timmylicheng, Schema field IDs are the integers passed in when creating struct fields, maps, and lists. See http://iceberg.apache.org/api/#nested-types
[GitHub] [incubator-iceberg] rdblue commented on issue #280: Add persistent IDs to partition fields
rdblue commented on issue #280: Add persistent IDs to partition fields URL: https://github.com/apache/incubator-iceberg/issues/280#issuecomment-515096818 @timmylicheng, sorry for the confusion. The partition spec ID and the IDs I'm talking about here aren't the same thing. Partition specs are assigned IDs so that we can write manifest files that reference those specs; I think you're talking about those IDs. The IDs I'm talking about here are schema field IDs that are used to write the record of partition data in the manifest file. Right now, those IDs are assigned each time a manifest file is created, starting at 1000. Instead, we should use persistent IDs for each unique partition field and keep track of the last assigned ID for partition evolution.
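To illustrate why assigning IDs afresh for each manifest is a problem, here is a minimal sketch. It assumes the fresh assignment is positional, handing out 1000, 1001, and so on in field order. `PositionalPartitionIds` and the field name `ts_day` are hypothetical and used only for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-manifest assignment; not actual Iceberg code.
final class PositionalPartitionIds {
    /** Assigns IDs by position, restarting at 1000 for every spec (assumed behaviour). */
    static Map<String, Integer> assign(List<String> fieldNames) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        int next = 1000;
        for (String name : fieldNames) {
            ids.put(name, next++);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Manifest written with a spec that has two partition fields.
        System.out.println(assign(List.of("data_bucket", "ts_day"))); // {data_bucket=1000, ts_day=1001}
        // Manifest written after the spec evolved to drop data_bucket.
        System.out.println(assign(List.of("ts_day")));                // {ts_day=1000}
    }
}
```

In this sketch `ts_day` is written as 1001 in one manifest and as 1000 in another, and 1000 refers to two different fields. Persistent per-field IDs are meant to avoid that kind of mismatch.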
[GitHub] [incubator-iceberg] rdblue commented on issue #280: Add persistent IDs to partition fields
rdblue commented on issue #280: Add persistent IDs to partition fields URL: https://github.com/apache/incubator-iceberg/issues/280#issuecomment-514712278 Yes. If a table has multiple partition specs, it will probably have multiple manifests, each written with one of those specs.