Re: An edge-case on snapshot expiration, incremental reads and very slow consecutive writes

2021-01-12 Thread John Clara
To get around this, my team expires snapshots based on the number of snapshots rather than by time. For example, if the reader job is scheduled to consume 2k snapshot increments, we have a cron job that retains the last 10k snapshots. That gives enough time to unclog the pipeline if the read job gets
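
A minimal sketch of the count-based retention idea described above, in plain Python and independent of any Iceberg API. The function names `split_by_count` and `reader_headroom` and the integer snapshot IDs are hypothetical, chosen only for illustration; the message does not show how the cron job is implemented.

```python
from typing import List, Tuple


def split_by_count(snapshot_ids: List[int], retain_last: int) -> Tuple[List[int], List[int]]:
    """Split snapshots (ordered oldest to newest) into (to_expire, to_keep).

    Hypothetical helper: keeps the newest `retain_last` snapshots no matter
    how old they are, instead of expiring by timestamp.
    """
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    if len(snapshot_ids) <= retain_last:
        # Nothing to expire yet: the table has fewer snapshots than the limit.
        return [], list(snapshot_ids)
    cut = len(snapshot_ids) - retain_last
    return list(snapshot_ids[:cut]), list(snapshot_ids[cut:])


def reader_headroom(retain_last: int, increment: int) -> int:
    """Number of full reader increments that fit in the retained window."""
    if increment < 1:
        raise ValueError("increment must be at least 1")
    return retain_last // increment


if __name__ == "__main__":
    # Assumed numbers taken from the message: 2k-snapshot reads, 10k retained.
    to_expire, to_keep = split_by_count(list(range(12_000)), retain_last=10_000)
    print(len(to_expire), len(to_keep))
    print(reader_headroom(10_000, 2_000))
```

With the numbers from the message, the retained window covers roughly five reader increments, which is the slack available to a stalled read job before the snapshots it needs are expired.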

Re: Integrating Existing Iceberg Tables with a Metastore

2020-11-19 Thread John Clara
Hi, My team has been using the custom catalog along with atomic metadata updates, but we never migrated existing Iceberg tables onto it. We also haven't turned on integration with the Hive catalog, so I'm not sure how easy it is to plug in there (I think there was some recent work on that?).

AvroIO Reflection Question

2020-11-17 Thread John Clara
Hi, I was wondering if someone could give me some pointers on this line: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/avro/AvroIO.java#L53 Some threads keep getting stuck trying to read the manifest list on commit. From debug logs, it looks like we're re

Re: Suggested S3 FileIO/Getting Started

2020-11-11 Thread John Clara
Update: I think I'm wrong about the listing part. I think it will only do the HEAD request. Also it seems like the consistency issue is probably not something my team would encounter with our current jobs. On 2020/11/12 02:17:10, John Clara wrote: > (Not sure if this is actually rep

Re: Suggested S3 FileIO/Getting Started

2020-11-11 Thread John Clara
ration, I would say a lot of it comes down to how to configure the AWS S3 Client that you provide to the S3FileIO implementation, but a lot of the defaults are reasonable (you might want to tweak a few like max connections and maybe the retry policy).

Suggested S3 FileIO/Getting Started

2020-11-11 Thread John Clara
ng for this info:
* https://github.com/apache/iceberg/issues/761 (issue for getting started guide)
* https://iceberg.apache.org/spec/#file-system-operations

Thanks everyone,
John Clara