To get around this, my team expires snapshots based on the number of snapshots
rather than by time. For example, if the reader job is scheduled to consume 2k
snapshot increments, we have a cron job to retain the last 10k snapshots.
That gives enough time to unclog the pipeline if the read job gets
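Count-based expiration like this maps to ExpireSnapshots#retainLast in Iceberg's Java API; the selection logic itself is simple to sketch in isolation. A minimal illustration (hypothetical helper, not Iceberg code):

```python
def snapshots_to_expire(snapshot_ids, retain_last=10_000):
    """Given snapshot IDs ordered oldest -> newest (as in a table's
    snapshot log), return the IDs a cleanup cron should expire,
    keeping only the newest `retain_last`."""
    if retain_last <= 0:
        raise ValueError("retain_last must be positive")
    if len(snapshot_ids) <= retain_last:
        return []  # nothing old enough to expire yet
    return snapshot_ids[:len(snapshot_ids) - retain_last]
```

With retain_last=10k and a reader consuming 2k-snapshot increments, the reader has roughly five scheduling windows to catch up before the snapshots it needs are expired.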
Hi,
My team has been using the custom catalog along with atomic metadata
updates, but we never migrated existing Iceberg tables onto it. We also
haven't turned on integration with the Hive catalog, so I'm not sure how
easy it is to plug in there (I think there was some recent work on
that?).

Hi,
I was wondering if someone could give me some pointers on this line:
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/avro/AvroIO.java#L53
Some threads keep getting stuck trying to read the manifest list on
commit. From debug logs, it looks like we're re
Update: I think I'm wrong about the listing part. I think it will only
do the HEAD request. Also it seems like the consistency issue is
probably not something my team would encounter with our current jobs.

On 2020/11/12 02:17:10, John Clara wrote:
> (Not sure if this is actually rep
> ration, I would say a lot of
> it comes down to how to configure the AWS S3 Client that you provide
> to the S3FileIO implementation, but a lot of the defaults are
> reasonable (you might want to tweak a few like max connections and
> maybe the retry policy).
ng for this info:
* https://github.com/apache/iceberg/issues/761 (issue for getting
started guide)
* https://iceberg.apache.org/spec/#file-system-operations
Thanks everyone,
John Clara