Re: Consistency problems with Iceberg + EMRFS

2021-06-29 Thread Ryan Blue
Greg, the ExpireSnapshots code has a really simple commit that just removes
references to snapshots. The code that actually deletes files runs after
the commit finishes, when the files are no longer referenced by the current
metadata. I don't think this would be where the issue is.
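
For reference, a minimal sketch of the expire flow described above, using the
public Iceberg table API; the table location and the 7-day retention cutoff are
illustrative assumptions, not values taken from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;

public class ExpireSketch {
  public static void main(String[] args) {
    // Hypothetical table location; HadoopTables matches the setup discussed here.
    Table table = new HadoopTables(new Configuration())
        .load("s3://example-bucket/path/to/table");

    // Assumed retention window of 7 days.
    long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;

    // The commit itself only removes snapshot references from table metadata;
    // files that become unreachable are deleted after the commit succeeds.
    table.expireSnapshots()
        .expireOlderThan(cutoff)
        .commit();
  }
}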

On Tue, Jun 29, 2021 at 12:17 PM Greg Hill 
wrote:

> Thanks. I will definitely take a stab at it if we end up going that route.
>
>
>
> We were also looking at using an external locking service to lock the
> table around both the expireSnapshots and the other table overwrite/append
> operations. This didn’t solve the issue for us, which we found confusing. I
> was digging through the expireSnapshots code and noticed it does attempt to
> do the file deletion in parallel. Is it possible that the expireSnapshots
> process itself is causing the conflict if there is more than one snapshot
> to expire? It does look like it only removes the expired snapshots from
> each metadata file once, but I’m not 100% sure I’m reading the code
> correctly.
>
>
>
> Greg
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Monday, June 28, 2021 at 11:44 AM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Greg, I don't think that there's a tool that does this for you. It is
> probably a good idea to build something that takes a properties or CSV
> file with table name to metadata file mappings and uses it to import
> tables. That's basically all you'd need.
>
>
>
> If you want to build that, I'll help review it and get it in! Just ping me
> on a PR.
>
>
>
> On Mon, Jun 28, 2021 at 6:59 AM Greg Hill 
> wrote:
>
> I’m Scott’s coworker and I’m looking into migrating to the Glue catalog
> option. I’m a lot newer to Iceberg than Scott so I’m still ramping up, but
> at a glance I’m not seeing any easy way to migrate from the
> filesystem-based metadata to using the glue catalog. Is there any tooling
> to assist with such a migration?
>
>
>
> Greg
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Friday, June 11, 2021 at 11:14 AM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> That makes sense to me, then. Is it possible to coordinate externally
> between the maintenance service and your commits? This would normally be
> handled by your metastore, but if you're using an unsafe path you could
> solve it by making sure that both don't run at once or something. Of
> course, that may be more work than using a metastore instead of Hadoop
> tables.
>
>
>
> On Fri, Jun 11, 2021 at 8:08 AM Scott Kruger 
> wrote:
>
> Actually someone on my team pointed out that we have a separate process
> that runs `Actions.expireSnapshots`; this could definitely be a source of
> concurrent writes.
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Thursday, June 10, 2021 at 12:05 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Yeah, Dan is right. If you want to use HDFS tables then you have to use a
> Hadoop FileSystem directly since the FileIO interface doesn't include
> rename (because we don't want to use it).
>
>
>
> On Thu, Jun 10, 2021 at 8:52 AM Scott Kruger 
> wrote:
>
> The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s
> got to be some way to manage tables in 0.11.1 with S3FileIO though, right?
> We’re using spark 3; perhaps we can use `SparkCatalog` instead of
> `HadoopTables`?
>
>
>
> *From: *Daniel Weeks 
> *Reply-To: *"[email protected]" 
> *Date: *Thursday, June 10, 2021 at 10:36 AM
> *To: *Iceberg Dev List 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Scott, I don't think you can use S3FileIO with HadoopTables because
> HadoopTables requires file system support for operations like rename and
> the FileIO is not intended to support those features.
>
>
>
> I think a really promising alternative is the DynamoDB Catalog
> <https://github.com/apache/iceberg/pull/2688/files>

Re: Consistency problems with Iceberg + EMRFS

2021-06-29 Thread Greg Hill
Thanks. I will definitely take a stab at it if we end up going that route.

We were also looking at using an external locking service to lock the table 
around both the expireSnapshots and the other table overwrite/append 
operations. This didn’t solve the issue for us, which we found confusing. I was 
digging through the expireSnapshots code and noticed it does attempt to do the 
file deletion in parallel. Is it possible that the expireSnapshots process 
itself is causing the conflict if there is more than one snapshot to expire? It 
does look like it only removes the expired snapshots from each metadata file 
once, but I’m not 100% sure I’m reading the code correctly.

Greg

From: Ryan Blue 
Reply-To: "[email protected]" 
Date: Monday, June 28, 2021 at 11:44 AM
To: "[email protected]" 
Subject: Re: Consistency problems with Iceberg + EMRFS

Greg, I don't think that there's a tool that does this for you. It is probably 
a good idea to build something that takes a properties or CSV file with table 
name to metadata file mappings and uses it to import tables. That's basically 
all you'd need.

If you want to build that, I'll help review it and get it in! Just ping me on a 
PR.

On Mon, Jun 28, 2021 at 6:59 AM Greg Hill  wrote:
I’m Scott’s coworker and I’m looking into migrating to the Glue catalog
option. I’m a lot newer to Iceberg than Scott so I’m still ramping up, but at a 
glance I’m not seeing any easy way to migrate from the filesystem-based 
metadata to using the glue catalog. Is there any tooling to assist with such a 
migration?

Greg

From: Ryan Blue
Reply-To: "[email protected]"
Date: Friday, June 11, 2021 at 11:14 AM
To: "[email protected]"
Subject: Re: Consistency problems with Iceberg + EMRFS

That makes sense to me, then. Is it possible to coordinate externally between 
the maintenance service and your commits? This would normally be handled by 
your metastore, but if you're using an unsafe path you could solve it by making 
sure that both don't run at once or something. Of course, that may be more work 
than using a metastore instead of Hadoop tables.

On Fri, Jun 11, 2021 at 8:08 AM Scott Kruger  
wrote:
Actually someone on my team pointed out that we have a separate process that 
runs `Actions.expireSnapshots`; this could definitely be a source of concurrent 
writes.

From: Ryan Blue
Reply-To: "[email protected]"
Date: Thursday, June 10, 2021 at 12:05 PM
To: "[email protected]"
Subject: Re: Consistency problems with Iceberg + EMRFS

Yeah, Dan is right. If you want to use HDFS tables then you have to use a 
Hadoop FileSystem directly since the FileIO interface doesn't include rename 
(because we don't want to use it).

On Thu, Jun 10, 2021 at 8:52 AM Scott Kruger  
wrote:
The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s got 
to be some way to manage tables in 0.11.1 with S3FileIO though, right? We’re 
using spark 3; perhaps we can use `SparkCatalog` instead of `HadoopTables`?

From: Daniel Weeks
Reply-To: "[email protected]"
Date: Thursday, June 10, 2021 at 10:36 AM
To: Iceberg Dev List
Subject: Re: Consistency problems with Iceberg + EMRFS

Scott, I don't think you can use S3FileIO with HadoopTables because 
HadoopTables requires file system support for operations like rename and the 
FileIO is not intended to support those features.

I think a really promising alternative is the DynamoDB Catalog
<https://github.com/apache/iceberg/pull/2688/files> implementation that Jack Ye
just submitted (still under review).

-Dan

On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger  
wrote:
Going to bump this question then:

> Not usin

Re: Consistency problems with Iceberg + EMRFS

2021-06-28 Thread Ryan Blue
Greg, I don't think that there's a tool that does this for you. It is
probably a good idea to build something that takes a properties or CSV file
with table name to metadata file mappings and uses it to import tables.
That's basically all you'd need.

If you want to build that, I'll help review it and get it in! Just ping me
on a PR.
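
A rough sketch of what such an import tool could look like, assuming a catalog
implementation that exposes registerTable(TableIdentifier, metadataFileLocation)
(GlueCatalog is used here only as an example); the CSV format, warehouse
location, and catalog name are assumptions, not an existing Iceberg utility:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import org.apache.iceberg.aws.glue.GlueCatalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class RegisterTablesFromCsv {
  public static void main(String[] args) throws Exception {
    // Each line: db.table_name,s3://bucket/path/metadata/v123.metadata.json
    List<String> lines = Files.readAllLines(Paths.get(args[0]));

    GlueCatalog catalog = new GlueCatalog();
    catalog.initialize("glue", Map.of("warehouse", "s3://example-bucket/warehouse"));

    for (String line : lines) {
      String[] parts = line.split(",", 2);
      TableIdentifier id = TableIdentifier.parse(parts[0].trim());
      // Point the catalog entry at the table's existing metadata file.
      catalog.registerTable(id, parts[1].trim());
    }
  }
}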

On Mon, Jun 28, 2021 at 6:59 AM Greg Hill  wrote:

> I’m Scott’s coworker and I’m looking into migrating to the Glue catalog
> option. I’m a lot newer to Iceberg than Scott so I’m still ramping up, but
> at a glance I’m not seeing any easy way to migrate from the
> filesystem-based metadata to using the glue catalog. Is there any tooling
> to assist with such a migration?
>
>
>
> Greg
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Friday, June 11, 2021 at 11:14 AM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> That makes sense to me, then. Is it possible to coordinate externally
> between the maintenance service and your commits? This would normally be
> handled by your metastore, but if you're using an unsafe path you could
> solve it by making sure that both don't run at once or something. Of
> course, that may be more work than using a metastore instead of Hadoop
> tables.
>
>
>
> On Fri, Jun 11, 2021 at 8:08 AM Scott Kruger 
> wrote:
>
> Actually someone on my team pointed out that we have a separate process
> that runs `Actions.expireSnapshots`; this could definitely be a source of
> concurrent writes.
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Thursday, June 10, 2021 at 12:05 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Yeah, Dan is right. If you want to use HDFS tables then you have to use a
> Hadoop FileSystem directly since the FileIO interface doesn't include
> rename (because we don't want to use it).
>
>
>
> On Thu, Jun 10, 2021 at 8:52 AM Scott Kruger 
> wrote:
>
> The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s
> got to be some way to manage tables in 0.11.1 with S3FileIO though, right?
> We’re using spark 3; perhaps we can use `SparkCatalog` instead of
> `HadoopTables`?
>
>
>
> *From: *Daniel Weeks 
> *Reply-To: *"[email protected]" 
> *Date: *Thursday, June 10, 2021 at 10:36 AM
> *To: *Iceberg Dev List 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Scott, I don't think you can use S3FileIO with HadoopTables because
> HadoopTables requires file system support for operations like rename and
> the FileIO is not intended to support those features.
>
>
>
> I think a really promising alternative is the DynamoDB Catalog
> <https://github.com/apache/iceberg/pull/2688/files>
> implementation that Jack Ye just submitted (still under review).
>
>
>
> -Dan
>
>
>
> On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger 
> wrote:
>
> Going to bump this question then:
>
> > Not using EMRFS for the metadata is an interesting possibility. We’re
> using HadoopTables currently; is there a Tables implementation that uses
> S3FileIO that we can use, or can I somehow tell HadoopTables to use
> S3FileIO?
>
>
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Wednesday, June 9, 2021 at 4:10 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Thanks for the additional detail. If you're not writing concurrently, then
> that eliminates the explanations that I had. I also don't think that
> Iceberg retries would be a problem because Iceberg will only retry if the
> commit fails. But there is no reason for a commit to fail and retry because
> nothing else is trying to modify the table. To make sure, you can check for
> "Ret

Re: Consistency problems with Iceberg + EMRFS

2021-06-28 Thread Greg Hill
I’m Scott’s coworker and I’m looking into migrating to the Glue catalog
option. I’m a lot newer to Iceberg than Scott so I’m still ramping up, but at a 
glance I’m not seeing any easy way to migrate from the filesystem-based 
metadata to using the glue catalog. Is there any tooling to assist with such a 
migration?

Greg

From: Ryan Blue 
Reply-To: "[email protected]" 
Date: Friday, June 11, 2021 at 11:14 AM
To: "[email protected]" 
Subject: Re: Consistency problems with Iceberg + EMRFS

That makes sense to me, then. Is it possible to coordinate externally between 
the maintenance service and your commits? This would normally be handled by 
your metastore, but if you're using an unsafe path you could solve it by making 
sure that both don't run at once or something. Of course, that may be more work 
than using a metastore instead of Hadoop tables.

On Fri, Jun 11, 2021 at 8:08 AM Scott Kruger  
wrote:
Actually someone on my team pointed out that we have a separate process that 
runs `Actions.expireSnapshots`; this could definitely be a source of concurrent 
writes.

From: Ryan Blue
Reply-To: "[email protected]"
Date: Thursday, June 10, 2021 at 12:05 PM
To: "[email protected]"
Subject: Re: Consistency problems with Iceberg + EMRFS

Yeah, Dan is right. If you want to use HDFS tables then you have to use a 
Hadoop FileSystem directly since the FileIO interface doesn't include rename 
(because we don't want to use it).

On Thu, Jun 10, 2021 at 8:52 AM Scott Kruger  
wrote:
The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s got 
to be some way to manage tables in 0.11.1 with S3FileIO though, right? We’re 
using spark 3; perhaps we can use `SparkCatalog` instead of `HadoopTables`?

From: Daniel Weeks
Reply-To: "[email protected]"
Date: Thursday, June 10, 2021 at 10:36 AM
To: Iceberg Dev List
Subject: Re: Consistency problems with Iceberg + EMRFS

Scott, I don't think you can use S3FileIO with HadoopTables because 
HadoopTables requires file system support for operations like rename and the 
FileIO is not intended to support those features.

I think a really promising alternative is the DynamoDB Catalog
<https://github.com/apache/iceberg/pull/2688/files> implementation that Jack Ye
just submitted (still under review).

-Dan

On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger  
wrote:
Going to bump this question then:

> Not using EMRFS for the metadata is an interesting possibility. We’re using 
> HadoopTables currently; is there a Tables implementation that uses S3FileIO 
> that we can use, or can I somehow tell HadoopTables to use S3FileIO?


From: Ryan Blue
Reply-To: "[email protected]"
Date: Wednesday, June 9, 2021 at 4:10 PM
To: "[email protected]"
Subject: Re: Consistency problems with Iceberg + EMRFS

Thanks for the additional detail. If you're not writing concurrently, then that 
eliminates the explanations that I had. I also don't think that Iceberg retries 
would be a problem because Iceberg will only retry if the commit fails. But 
there is no reason for a commit to fail and retry because nothing else is 
trying to modify the table. To make sure, you can check for "Retrying" logs 
from Iceberg.

Now that I'm looking more closely at the second error, I see that it is also 
caused by the eTag mismatch. I wonder if this might be a different level of 
retry. Maybe EMRFS has a transient error and that causes an internal retry on 
the write that is the source of the consistency error?

What you may be able to do to solve this is to use the S3FileIO instead of 
EMRFS.

Ryan

On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger  wrote:
Here’s a little more detail on our use case tha

Re: Consistency problems with Iceberg + EMRFS

2021-06-11 Thread Ryan Blue
That makes sense to me, then. Is it possible to coordinate externally
between the maintenance service and your commits? This would normally be
handled by your metastore, but if you're using an unsafe path you could
solve it by making sure that both don't run at once or something. Of
course, that may be more work than using a metastore instead of Hadoop
tables.

On Fri, Jun 11, 2021 at 8:08 AM Scott Kruger 
wrote:

> Actually someone on my team pointed out that we have a separate process
> that runs `Actions.expireSnapshots`; this could definitely be a source of
> concurrent writes.
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Thursday, June 10, 2021 at 12:05 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Yeah, Dan is right. If you want to use HDFS tables then you have to use a
> Hadoop FileSystem directly since the FileIO interface doesn't include
> rename (because we don't want to use it).
>
>
>
> On Thu, Jun 10, 2021 at 8:52 AM Scott Kruger 
> wrote:
>
> The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s
> got to be some way to manage tables in 0.11.1 with S3FileIO though, right?
> We’re using spark 3; perhaps we can use `SparkCatalog` instead of
> `HadoopTables`?
>
>
>
> *From: *Daniel Weeks 
> *Reply-To: *"[email protected]" 
> *Date: *Thursday, June 10, 2021 at 10:36 AM
> *To: *Iceberg Dev List 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Scott, I don't think you can use S3FileIO with HadoopTables because
> HadoopTables requires file system support for operations like rename and
> the FileIO is not intended to support those features.
>
>
>
> I think a really promising alternative is the DynamoDB Catalog
> <https://github.com/apache/iceberg/pull/2688/files>
> implementation that Jack Ye just submitted (still under review).
>
>
>
> -Dan
>
>
>
> On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger 
> wrote:
>
> Going to bump this question then:
>
> > Not using EMRFS for the metadata is an interesting possibility. We’re
> using HadoopTables currently; is there a Tables implementation that uses
> S3FileIO that we can use, or can I somehow tell HadoopTables to use
> S3FileIO?
>
>
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Wednesday, June 9, 2021 at 4:10 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Thanks for the additional detail. If you're not writing concurrently, then
> that eliminates the explanations that I had. I also don't think that
> Iceberg retries would be a problem because Iceberg will only retry if the
> commit fails. But there is no reason for a commit to fail and retry because
> nothing else is trying to modify the table. To make sure, you can check for
> "Retrying" logs from Iceberg.
>
>
>
> Now that I'm looking more closely at the second error, I see that it is
> also caused by the eTag mismatch. I wonder if this might be a different
> level of retry. Maybe EMRFS has a transient error and that causes an
> internal retry on the write that is the source of the consistency error?
>
>
>
> What you may be able to do to solve this is to use the S3FileIO instead of
> EMRFS.
>
>
>
> Ryan
>
>
>
> On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger 
> wrote:
>
> Here’s a little more detail on our use case that might be helpful. We’re
> running a batch process to apply CDC to several hundred tables every few
> hours; we use iceberg (via HadoopTables) on top of a traditional Hive
> external table model (EMRFS + parquet + glue metastore) to track the
> commits (that is, changes to the list of files) to these tables. There are
> a number of technical and “political” reasons for this that don’t really
> bear going into; all we really needed was a way to track files belonging to a
> table that are managed via some process external to iceberg.

Re: Consistency problems with Iceberg + EMRFS

2021-06-11 Thread Scott Kruger
Actually someone on my team pointed out that we have a separate process that 
runs `Actions.expireSnapshots`; this could definitely be a source of concurrent 
writes.

From: Ryan Blue 
Reply-To: "[email protected]" 
Date: Thursday, June 10, 2021 at 12:05 PM
To: "[email protected]" 
Subject: Re: Consistency problems with Iceberg + EMRFS

Yeah, Dan is right. If you want to use HDFS tables then you have to use a 
Hadoop FileSystem directly since the FileIO interface doesn't include rename 
(because we don't want to use it).

On Thu, Jun 10, 2021 at 8:52 AM Scott Kruger  
wrote:
The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s got 
to be some way to manage tables in 0.11.1 with S3FileIO though, right? We’re 
using spark 3; perhaps we can use `SparkCatalog` instead of `HadoopTables`?

From: Daniel Weeks
Reply-To: "[email protected]"
Date: Thursday, June 10, 2021 at 10:36 AM
To: Iceberg Dev List
Subject: Re: Consistency problems with Iceberg + EMRFS

Scott, I don't think you can use S3FileIO with HadoopTables because 
HadoopTables requires file system support for operations like rename and the 
FileIO is not intended to support those features.

I think a really promising alternative is the DynamoDB Catalog
<https://github.com/apache/iceberg/pull/2688/files> implementation that Jack Ye
just submitted (still under review).

-Dan

On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger  
wrote:
Going to bump this question then:

> Not using EMRFS for the metadata is an interesting possibility. We’re using 
> HadoopTables currently; is there a Tables implementation that uses S3FileIO 
> that we can use, or can I somehow tell HadoopTables to use S3FileIO?


From: Ryan Blue
Reply-To: "[email protected]"
Date: Wednesday, June 9, 2021 at 4:10 PM
To: "[email protected]"
Subject: Re: Consistency problems with Iceberg + EMRFS

Thanks for the additional detail. If you're not writing concurrently, then that 
eliminates the explanations that I had. I also don't think that Iceberg retries 
would be a problem because Iceberg will only retry if the commit fails. But 
there is no reason for a commit to fail and retry because nothing else is 
trying to modify the table. To make sure, you can check for "Retrying" logs 
from Iceberg.

Now that I'm looking more closely at the second error, I see that it is also 
caused by the eTag mismatch. I wonder if this might be a different level of 
retry. Maybe EMRFS has a transient error and that causes an internal retry on 
the write that is the source of the consistency error?

What you may be able to do to solve this is to use the S3FileIO instead of 
EMRFS.

Ryan

On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger  wrote:
Here’s a little more detail on our use case that might be helpful. We’re 
running a batch process to apply CDC to several hundred tables every few hours; 
we use iceberg (via HadoopTables) on top of a traditional Hive external table 
model (EMRFS + parquet + glue metastore) to track the commits (that is,
changes to the list of files) to these tables. There are a number of technical
and “political” reasons for this that don’t really bear going into; all we
really needed was a way to track files belonging to a table that are managed via
some process external to iceberg. We have a few guarantees:


  *   Tables never, ever see concurrent writes; only one application writes to 
these tables, and only one instance of this application ever exists at any time
  *   Our application rewrites entire partitions to new directories, so we 
don’t need iceberg to help us read a handful of files from directories with 
files from multiple commits
  *   Our interaction with the iceberg API is extremely limited

overwrite = table.newOverwrite()
for each updated partition
 for each file in old partition directory
   overwrite.deleteFile(file)
 for each file in new partition directory
   overwrite.addFile(file)
overwrite.commit()

So, all

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Ryan Blue
Yeah, Dan is right. If you want to use HDFS tables then you have to use a
Hadoop FileSystem directly since the FileIO interface doesn't include
rename (because we don't want to use it).

On Thu, Jun 10, 2021 at 8:52 AM Scott Kruger 
wrote:

> The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s
> got to be some way to manage tables in 0.11.1 with S3FileIO though, right?
> We’re using spark 3; perhaps we can use `SparkCatalog` instead of
> `HadoopTables`?
>
>
>
> *From: *Daniel Weeks 
> *Reply-To: *"[email protected]" 
> *Date: *Thursday, June 10, 2021 at 10:36 AM
> *To: *Iceberg Dev List 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Scott, I don't think you can use S3FileIO with HadoopTables because
> HadoopTables requires file system support for operations like rename and
> the FileIO is not intended to support those features.
>
>
>
> I think a really promising alternative is the DynamoDB Catalog
> <https://github.com/apache/iceberg/pull/2688/files>
> implementation that Jack Ye just submitted (still under review).
>
>
>
> -Dan
>
>
>
> On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger 
> wrote:
>
> Going to bump this question then:
>
> > Not using EMRFS for the metadata is an interesting possibility. We’re
> using HadoopTables currently; is there a Tables implementation that uses
> S3FileIO that we can use, or can I somehow tell HadoopTables to use
> S3FileIO?
>
>
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Wednesday, June 9, 2021 at 4:10 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Thanks for the additional detail. If you're not writing concurrently, then
> that eliminates the explanations that I had. I also don't think that
> Iceberg retries would be a problem because Iceberg will only retry if the
> commit fails. But there is no reason for a commit to fail and retry because
> nothing else is trying to modify the table. To make sure, you can check for
> "Retrying" logs from Iceberg.
>
>
>
> Now that I'm looking more closely at the second error, I see that it is
> also caused by the eTag mismatch. I wonder if this might be a different
> level of retry. Maybe EMRFS has a transient error and that causes an
> internal retry on the write that is the source of the consistency error?
>
>
>
> What you may be able to do to solve this is to use the S3FileIO instead of
> EMRFS.
>
>
>
> Ryan
>
>
>
> On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger 
> wrote:
>
> Here’s a little more detail on our use case that might be helpful. We’re
> running a batch process to apply CDC to several hundred tables every few
> hours; we use iceberg (via HadoopTables) on top of a traditional Hive
> external table model (EMRFS + parquet + glue metastore) to track the
> commits (that is, changes to the list of files) to these tables. There are
> a number of technical and “political” reasons for this that don’t really
> bear going into; all we really needed was a way to track files belonging to a
> table that are managed via some process external to iceberg. We have a few
> guarantees:
>
>
>
>- Tables never, *ever* see concurrent writes; only one application
>writes to these tables, and only one instance of this application ever
>exists at any time
>- Our application rewrites entire partitions to new directories, so we
>don’t need iceberg to help us read a handful of files from directories with
>files from multiple commits
>- Our interaction with the iceberg API is *extremely* limited
>
>
>
> overwrite = table.newOverwrite()
>
> for each updated partition
>
>  for each file in old partition directory
>
>overwrite.deleteFile(file)
>
>  for each file in new partition directory
>
>overwrite.addFile(file)
>
> overwrite.commit()
>
>
>
> So, all that being said, now to address your comments. We don’t have
> concurrent processes writing commits, so the problem has to be contained in
> that pseudocode block a

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Jack Ye
Hi Scott,

You mentioned that the DynamoDB catalog could potentially be used, so why was
HadoopTables chosen instead of HadoopCatalog? With HadoopCatalog you can
then use S3FileIO.

-Jack

On Thu, Jun 10, 2021 at 8:52 AM Scott Kruger 
wrote:

> The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s
> got to be some way to manage tables in 0.11.1 with S3FileIO though, right?
> We’re using spark 3; perhaps we can use `SparkCatalog` instead of
> `HadoopTables`?
>
>
>
> *From: *Daniel Weeks 
> *Reply-To: *"[email protected]" 
> *Date: *Thursday, June 10, 2021 at 10:36 AM
> *To: *Iceberg Dev List 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Scott, I don't think you can use S3FileIO with HadoopTables because
> HadoopTables requires file system support for operations like rename and
> the FileIO is not intended to support those features.
>
>
>
> I think a really promising alternative is the DynamoDB Catalog
> <https://github.com/apache/iceberg/pull/2688/files>
> implementation that Jack Ye just submitted (still under review).
>
>
>
> -Dan
>
>
>
> On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger 
> wrote:
>
> Going to bump this question then:
>
> > Not using EMRFS for the metadata is an interesting possibility. We’re
> using HadoopTables currently; is there a Tables implementation that uses
> S3FileIO that we can use, or can I somehow tell HadoopTables to use
> S3FileIO?
>
>
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Wednesday, June 9, 2021 at 4:10 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Thanks for the additional detail. If you're not writing concurrently, then
> that eliminates the explanations that I had. I also don't think that
> Iceberg retries would be a problem because Iceberg will only retry if the
> commit fails. But there is no reason for a commit to fail and retry because
> nothing else is trying to modify the table. To make sure, you can check for
> "Retrying" logs from Iceberg.
>
>
>
> Now that I'm looking more closely at the second error, I see that it is
> also caused by the eTag mismatch. I wonder if this might be a different
> level of retry. Maybe EMRFS has a transient error and that causes an
> internal retry on the write that is the source of the consistency error?
>
>
>
> What you may be able to do to solve this is to use the S3FileIO instead of
> EMRFS.
>
>
>
> Ryan
>
>
>
> On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger 
> wrote:
>
> Here’s a little more detail on our use case that might be helpful. We’re
> running a batch process to apply CDC to several hundred tables every few
> hours; we use iceberg (via HadoopTables) on top of a traditional Hive
> external table model (EMRFS + parquet + glue metastore) to track the
> commits (that is, changes to the list of files) to these tables. There are
> a number of technical and “political” reasons for this that don’t really
> bear going into; all we really needed was a way to track files belonging to a
> table that are managed via some process external to iceberg. We have a few
> guarantees:
>
>
>
>- Tables never, *ever* see concurrent writes; only one application
>writes to these tables, and only one instance of this application ever
>exists at any time
>- Our application rewrites entire partitions to new directories, so we
>don’t need iceberg to help us read a handful of files from directories with
>files from multiple commits
>- Our interaction with the iceberg API is *extremely* limited
>
>
>
> overwrite = table.newOverwrite()
>
> for each updated partition
>
>  for each file in old partition directory
>
>overwrite.deleteFile(file)
>
>  for each file in new partition directory
>
>overwrite.addFile(file)
>
> overwrite.commit()
>
>
>
> So, all that being said, now to address your comments. We don’t have
> concurrent processes writing commits, so the problem has to be contained in
> that pseudocode block above. We d

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Scott Kruger
The dynamodb catalog sounds interesting; I’ll keep my eye on that. There’s got 
to be some way to manage tables in 0.11.1 with S3FileIO though, right? We’re 
using spark 3; perhaps we can use `SparkCatalog` instead of `HadoopTables`?
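
For reference, the catalog-based route being asked about is normally wired up
through Spark 3 catalog properties; the sketch below is only an illustration
(the catalog name, warehouse path, and the Glue + S3FileIO pairing are assumed
examples, not something prescribed in this thread):

import org.apache.spark.sql.SparkSession;

public class SparkCatalogSketch {
  public static void main(String[] args) {
    // An Iceberg SparkCatalog backed by a metastore catalog and S3FileIO,
    // instead of HadoopTables over EMRFS. All names below are illustrative.
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-catalog-sketch")
        .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.my_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.my_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.catalog.my_catalog.warehouse", "s3://example-bucket/warehouse")
        .getOrCreate();

    spark.sql("SELECT * FROM my_catalog.db.example_table LIMIT 10").show();
  }
}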

From: Daniel Weeks 
Reply-To: "[email protected]" 
Date: Thursday, June 10, 2021 at 10:36 AM
To: Iceberg Dev List 
Subject: Re: Consistency problems with Iceberg + EMRFS

Scott, I don't think you can use S3FileIO with HadoopTables because 
HadoopTables requires file system support for operations like rename and the 
FileIO is not intended to support those features.

I think a really promising alternative is the DynamoDB Catalog
<https://github.com/apache/iceberg/pull/2688/files> implementation that Jack Ye
just submitted (still under review).

-Dan

On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger  
wrote:
Going to bump this question then:

> Not using EMRFS for the metadata is an interesting possibility. We’re using 
> HadoopTables currently; is there a Tables implementation that uses S3FileIO 
> that we can use, or can I somehow tell HadoopTables to use S3FileIO?


From: Ryan Blue
Reply-To: "[email protected]"
Date: Wednesday, June 9, 2021 at 4:10 PM
To: "[email protected]"
Subject: Re: Consistency problems with Iceberg + EMRFS

Thanks for the additional detail. If you're not writing concurrently, then that 
eliminates the explanations that I had. I also don't think that Iceberg retries 
would be a problem because Iceberg will only retry if the commit fails. But 
there is no reason for a commit to fail and retry because nothing else is 
trying to modify the table. To make sure, you can check for "Retrying" logs 
from Iceberg.

Now that I'm looking more closely at the second error, I see that it is also 
caused by the eTag mismatch. I wonder if this might be a different level of 
retry. Maybe EMRFS has a transient error and that causes an internal retry on 
the write that is the source of the consistency error?

What you may be able to do to solve this is to use the S3FileIO instead of 
EMRFS.

Ryan

On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger  wrote:
Here’s a little more detail on our use case that might be helpful. We’re 
running a batch process to apply CDC to several hundred tables every few hours; 
we use iceberg (via HadoopTables) on top of a traditional Hive external table 
model (EMRFS + parquet + glue metastore) to track the commits (that is,
changes to the list of files) to these tables. There are a number of technical
and “political” reasons for this that don’t really bear going into; all we
really needed was a way to track files belonging to a table that are managed via
some process external to iceberg. We have a few guarantees:


  *   Tables never, ever see concurrent writes; only one application writes to 
these tables, and only one instance of this application ever exists at any time
  *   Our application rewrites entire partitions to new directories, so we 
don’t need iceberg to help us read a handful of files from directories with 
files from multiple commits
  *   Our interaction with the iceberg API is extremely limited

overwrite = table.newOverwrite()
for each updated partition
 for each file in old partition directory
   overwrite.deleteFile(file)
 for each file in new partition directory
   overwrite.addFile(file)
overwrite.commit()

So, all that being said, now to address your comments. We don’t have concurrent 
processes writing commits, so the problem has to be contained in that 
pseudocode block above. We don’t ever have any consistency issues with the 
actual data files we write (using plain spark DataFrameWriter.parquet), so 
there has to be something going on with how iceberg is writing metadata over 
EMRFS. It feels like retry logic is a likely culprit, as this only happens once 
daily for something like 1 commits. Using the metastore is unfortunately a 
non-starter for us, but given that we don’t need to support concurrent writes, 
I don’t think this is a problem.

Not using EMRFS for the metadata is an interesting possibility. We’re using 
HadoopTables currently; is there a Tables implementation that uses S3FileIO 
that we can use, or can I somehow tell H

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Daniel Weeks
Scott, I don't think you can use S3FileIO with HadoopTables because
HadoopTables requires file system support for operations like rename and
the FileIO is not intended to support those features.

I think a really promising alternative is the DynamoDB Catalog
<https://github.com/apache/iceberg/pull/2688/files> implementation that
Jack Ye just submitted (still under review).

-Dan

On Thu, Jun 10, 2021 at 7:26 AM Scott Kruger 
wrote:

> Going to bump this question then:
>
> > Not using EMRFS for the metadata is an interesting possibility. We’re
> using HadoopTables currently; is there a Tables implementation that uses
> S3FileIO that we can use, or can I somehow tell HadoopTables to use
> S3FileIO?
>
>
>
>
>
> *From: *Ryan Blue 
> *Reply-To: *"[email protected]" 
> *Date: *Wednesday, June 9, 2021 at 4:10 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> Thanks for the additional detail. If you're not writing concurrently, then
> that eliminates the explanations that I had. I also don't think that
> Iceberg retries would be a problem because Iceberg will only retry if the
> commit fails. But there is no reason for a commit to fail and retry because
> nothing else is trying to modify the table. To make sure, you can check for
> "Retrying" logs from Iceberg.
>
>
>
> Now that I'm looking more closely at the second error, I see that it is
> also caused by the eTag mismatch. I wonder if this might be a different
> level of retry. Maybe EMRFS has a transient error and that causes an
> internal retry on the write that is the source of the consistency error?
>
>
>
> What you may be able to do to solve this is to use the S3FileIO instead of
> EMRFS.
>
>
>
> Ryan
>
>
>
> On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger 
> wrote:
>
> Here’s a little more detail on our use case that might be helpful. We’re
> running a batch process to apply CDC to several hundred tables every few
> hours; we use iceberg (via HadoopTables) on top of a traditional Hive
> external table model (EMRFS + parquet + glue metastore) to track the
> commits (that is, changes to the list of files) to these tables. There are
> a number of technical and “political” reasons for this that don’t really
> bear going into; all we really needed was a way to track files belonging to a
> table that are managed via some process external to iceberg. We have a few
> guarantees:
>
>
>
>- Tables never, *ever* see concurrent writes; only one application
>writes to these tables, and only one instance of this application ever
>exists at any time
>- Our application rewrites entire partitions to new directories, so we
>don’t need iceberg to help us read a handful of files from directories with
>files from multiple commits
>- Our interaction with the iceberg API is *extremely* limited
>
>
>
> overwrite = table.newOverwrite()
>
> for each updated partition
>
>  for each file in old partition directory
>
>overwrite.deleteFile(file)
>
>  for each file in new partition directory
>
>overwrite.addFile(file)
>
> overwrite.commit()
>
>
>
> So, all that being said, now to address your comments. We don’t have
> concurrent processes writing commits, so the problem has to be contained in
> that pseudocode block above. We don’t ever have any consistency issues with
> the actual data files we write (using plain spark DataFrameWriter.parquet),
> so there has to be something going on with how iceberg is writing metadata
> over EMRFS. It feels like retry logic is a likely culprit, as this only
> happens once daily for something like 1 commits. Using the metastore is
> unfortunately a non-starter for us, but given that we don’t need to support
> concurrent writes, I don’t think this is a problem.
>
>
>
> Not using EMRFS for the metadata is an interesting possibility. We’re
> using HadoopTables currently; is there a Tables implementation that uses
> S3FileIO that we can use, or can I somehow tell HadoopTables to use
> S3FileIO?
>
>
>
> *From: *Jack Ye 
> *Reply-To: *"[email protected]" 
> *Date: *Tuesday, June 8, 2021 at 7:49 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> There are 2 potential root causes I see:
>
> 1. you might be using EMRFS with DynamoDB enabled to check consistency,
> which can leave the DynamoDB table and S3 out of sync. The quick solution is to
> just delete the Dynamo

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Scott Kruger
Going to bump this question then:

> Not using EMRFS for the metadata is an interesting possibility. We’re using 
> HadoopTables currently; is there a Tables implementation that uses S3FileIO 
> that we can use, or can I somehow tell HadoopTables to use S3FileIO?


From: Ryan Blue 
Reply-To: "[email protected]" 
Date: Wednesday, June 9, 2021 at 4:10 PM
To: "[email protected]" 
Subject: Re: Consistency problems with Iceberg + EMRFS

Thanks for the additional detail. If you're not writing concurrently, then that 
eliminates the explanations that I had. I also don't think that Iceberg retries 
would be a problem because Iceberg will only retry if the commit fails. But 
there is no reason for a commit to fail and retry because nothing else is 
trying to modify the table. To make sure, you can check for "Retrying" logs 
from Iceberg.

Now that I'm looking more closely at the second error, I see that it is also 
caused by the eTag mismatch. I wonder if this might be a different level of 
retry. Maybe EMRFS has a transient error and that causes an internal retry on 
the write that is the source of the consistency error?

What you may be able to do to solve this is to use the S3FileIO instead of 
EMRFS.

Ryan

On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger  wrote:
Here’s a little more detail on our use case that might be helpful. We’re 
running a batch process to apply CDC to several hundred tables every few hours; 
we use iceberg (via HadoopTables) on top of a traditional Hive external table 
model (EMRFS + parquet + glue metastore) to track the commits (that is,
changes to the list of files) to these tables. There are a number of technical
and “political” reasons for this that don’t really bear going into; all we
really needed was a way to track files belonging to a table that are managed via
some process external to iceberg. We have a few guarantees:


  *   Tables never, ever see concurrent writes; only one application writes to 
these tables, and only one instance of this application ever exists at any time
  *   Our application rewrites entire partitions to new directories, so we 
don’t need iceberg to help us read a handful of files from directories with 
files from multiple commits
  *   Our interaction with the iceberg API is extremely limited

overwrite = table.newOverwrite()
for each updated partition
 for each file in old partition directory
   overwrite.deleteFile(file)
 for each file in new partition directory
   overwrite.addFile(file)
overwrite.commit()

So, all that being said, now to address your comments. We don’t have concurrent 
processes writing commits, so the problem has to be contained in that 
pseudocode block above. We don’t ever have any consistency issues with the 
actual data files we write (using plain spark DataFrameWriter.parquet), so 
there has to be something going on with how iceberg is writing metadata over 
EMRFS. It feels like retry logic is a likely culprit, as this only happens once 
daily for something like 1 commits. Using the metastore is unfortunately a 
non-starter for us, but given that we don’t need to support concurrent writes, 
I don’t think this is a problem.

Not using EMRFS for the metadata is an interesting possibility. We’re using 
HadoopTables currently; is there a Tables implementation that uses S3FileIO 
that we can use, or can I somehow tell HadoopTables to use S3FileIO?

From: Jack Ye
Reply-To: "[email protected]"
Date: Tuesday, June 8, 2021 at 7:49 PM
To: "[email protected]"
Subject: Re: Consistency problems with Iceberg + EMRFS

There are 2 potential root causes I see:
1. you might be using EMRFS with DynamoDB enabled to check consistency, which
can leave the DynamoDB table and S3 out of sync. The quick solution is to just delete
the DynamoDB consistency table, and the next read/write will recreate and 
resync it. After all, EMRFS only provides read-after-write consistency for S3, 
but S3 is now already strongly consistent so there is really no need to use 
EMRFS anymore.
2. HadoopCatalog on S3 always has the possibility for one process to clobber 
the other one when writing the version-hint.txt file. So as Ryan suggested, it 
is always better to use a metastore to perform consistency checks instead of 
delegating it to the file system.

-Jack

On Tue, Jun 8, 2021 at 5:41 PM Ryan Blue wrote:
Hi Scott,

I'm not quite sure what's happening here, but I should at least note that we 
didn't intend for HDFS tables to be used with S3. HDFS tables use an atomic
rename in the file system to ensure that only one committer "wi

Re: Consistency problems with Iceberg + EMRFS

2021-06-09 Thread Ryan Blue
Thanks for the additional detail. If you're not writing concurrently, then
that eliminates the explanations that I had. I also don't think that
Iceberg retries would be a problem because Iceberg will only retry if the
commit fails. But there is no reason for a commit to fail and retry because
nothing else is trying to modify the table. To make sure, you can check for
"Retrying" logs from Iceberg.

Now that I'm looking more closely at the second error, I see that it is
also caused by the eTag mismatch. I wonder if this might be a different
level of retry. Maybe EMRFS has a transient error and that causes an
internal retry on the write that is the source of the consistency error?

What you may be able to do to solve this is to use the S3FileIO instead of
EMRFS.

Ryan

On Wed, Jun 9, 2021 at 9:02 AM Scott Kruger 
wrote:

> Here’s a little more detail on our use case that might be helpful. We’re
> running a batch process to apply CDC to several hundred tables every few
> hours; we use iceberg (via HadoopTables) on top of a traditional Hive
> external table model (EMRFS + parquet + glue metastore) to track the
> commits (that is, changes to the list of files) to these tables. There are
> a number of technical and “political” reasons for this that don’t really
> bear going into; all we really needed was a way to track files belonging to a
> table that are managed via some process external to iceberg. We have a few
> guarantees:
>
>
>
>- Tables never, *ever* see concurrent writes; only one application
>writes to these tables, and only one instance of this application ever
>exists at any time
>- Our application rewrites entire partitions to new directories, so we
>don’t need iceberg to help us read a handful of files from directories with
>files from multiple commits
>- Our interaction with the iceberg API is *extremely* limited
>
>
>
> overwrite = table.newOverwrite()
>
> for each updated partition
>
>  for each file in old partition directory
>
>overwrite.deleteFile(file)
>
>  for each file in new partition directory
>
>overwrite.addFile(file)
>
> overwrite.commit()
>
>
>
> So, all that being said, now to address your comments. We don’t have
> concurrent processes writing commits, so the problem has to be contained in
> that pseudocode block above. We don’t ever have any consistency issues with
> the actual data files we write (using plain spark DataFrameWriter.parquet),
> so there has to be something going on with how iceberg is writing metadata
> over EMRFS. It feels like retry logic is a likely culprit, as this only
> happens once daily for something like 1 commits. Using the metastore is
> unfortunately a non-starter for us, but given that we don’t need to support
> concurrent writes, I don’t think this is a problem.
>
>
>
> Not using EMRFS for the metadata is an interesting possibility. We’re
> using HadoopTables currently; is there a Tables implementation that uses
> S3FileIO that we can use, or can I somehow tell HadoopTables to use
> S3FileIO?
>
>
>
> *From: *Jack Ye 
> *Reply-To: *"[email protected]" 
> *Date: *Tuesday, June 8, 2021 at 7:49 PM
> *To: *"[email protected]" 
> *Subject: *Re: Consistency problems with Iceberg + EMRFS
>
>
>
>
> There are 2 potential root causes I see:
>
> 1. you might be using EMRFS with DynamoDB enabled to check consistency,
> which can leave the DynamoDB table and S3 out of sync. The quick solution is to
> just delete the DynamoDB consistency table, and the next read/write will
> recreate and resync it. After all, EMRFS only provides read-after-write
> consistency for S3, but S3 is now already strongly consistent so there is
> really no need to use EMRFS anymore.
>
> 2. HadoopCatalog on S3 always has the possibility for one process to
> clobber the other one when writing the version-hint.txt file. So as Ryan
> suggested, it is always better to use a metastore to perform consistency
> checks instead of delegating it to the file system.
>
>
>
> -Jack
>
>
>
> On Tue, Jun 8, 2021 at 5:41 PM Ryan Blue  wrote:
>
> Hi Scott,
>
>
>
> I'm not quite sure what's happening here, but I should at least note that
> we didn't intend for HDFS tables to be used with S3. HDFS tables use an
> atomic rename in the file system to ensure that only one committer "wins"
> to produce a given version of the table metadata. In S3, renames are not
> atomic so you can get into trouble if there are two concurrent processes
> trying to rename to the same target version. That's probably what's causing
> the f

Re: Consistency problems with Iceberg + EMRFS

2021-06-09 Thread Scott Kruger
Here’s a little more detail on our use case that might be helpful. We’re 
running a batch process to apply CDC to several hundred tables every few hours; 
we use iceberg (via HadoopTables) on top of a traditional Hive external table 
model (EMRFS + parquet + glue metastore) to track the commits (that is,
changes to the list of files) to these tables. There are a number of technical
and “political” reasons for this that don’t really bear going into; all we
really needed was a way to track files belonging to a table that are managed via
some process external to iceberg. We have a few guarantees:


  *   Tables never, ever see concurrent writes; only one application writes to 
these tables, and only one instance of this application ever exists at any time
  *   Our application rewrites entire partitions to new directories, so we 
don’t need iceberg to help us read a handful of files from directories with 
files from multiple commits
  *   Our interaction with the iceberg API is extremely limited

overwrite = table.newOverwrite()
for each updated partition
    for each file in old partition directory
        overwrite.deleteFile(file)
    for each file in new partition directory
        overwrite.addFile(file)
overwrite.commit()
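
In concrete Java terms, that flow looks roughly like the sketch below. The 
paths, sizes, and row counts are made-up placeholders, it assumes the table is 
partitioned by a single dt column, and it is only meant to show which API calls 
we use, not our actual code.

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.OverwriteFiles;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;

public class OverwritePartitionSketch {
  public static void main(String[] args) {
    // Load the table by location, no catalog involved (HadoopTables).
    Table table = new HadoopTables(new Configuration()).load("s3://mybucket/db/table");

    // One DataFile entry per file the batch replaced or produced; values are
    // placeholders for what our job reads back from the files Spark wrote.
    DataFile oldFile = DataFiles.builder(table.spec())
        .withPath("s3://mybucket/db/table/data/dt=2021-06-08/run-1/part-00000.parquet")
        .withFormat(FileFormat.PARQUET)
        .withPartitionPath("dt=2021-06-08")
        .withFileSizeInBytes(64L * 1024 * 1024)
        .withRecordCount(1_000_000L)
        .build();
    DataFile newFile = DataFiles.builder(table.spec())
        .withPath("s3://mybucket/db/table/data/dt=2021-06-08/run-2/part-00000.parquet")
        .withFormat(FileFormat.PARQUET)
        .withPartitionPath("dt=2021-06-08")
        .withFileSizeInBytes(64L * 1024 * 1024)
        .withRecordCount(1_000_000L)
        .build();

    // One overwrite per batch: drop the partition's old files, add the new
    // ones, commit once. The commit is the only step that touches metadata.
    OverwriteFiles overwrite = table.newOverwrite();
    overwrite.deleteFile(oldFile);
    overwrite.addFile(newFile);
    overwrite.commit();
  }
}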

So, all that being said, now to address your comments. We don’t have concurrent 
processes writing commits, so the problem has to be contained in that 
pseudocode block above. We don’t ever have any consistency issues with the 
actual data files we write (using plain spark DataFrameWriter.parquet), so 
there has to be something going on with how iceberg is writing metadata over 
EMRFS. It feels like retry logic is a likely culprit, as this only happens once 
daily for something like 1 commits. Using the metastore is unfortunately a 
non-starter for us, but given that we don’t need to support concurrent writes, 
I don’t think this is a problem.

Not using EMRFS for the metadata is an interesting possibility. We’re using 
HadoopTables currently; is there a Tables implementation that uses S3FileIO 
that we can use, or can I somehow tell HadoopTables to use S3FileIO?

From: Jack Ye 
Reply-To: "[email protected]" 
Date: Tuesday, June 8, 2021 at 7:49 PM
To: "[email protected]" 
Subject: Re: Consistency problems with Iceberg + EMRFS

There are 2 potential root causes I see:
1. you might be using EMRFS with DynamoDB enabled to check consistency, which 
can leave the DynamoDB table and S3 out of sync. The quick solution is to just 
delete the DynamoDB consistency table, and the next read/write will recreate 
and resync it. After all, EMRFS only provides read-after-write consistency for 
S3, but S3 is now already strongly consistent, so there is really no need to 
use EMRFS anymore.
2. HadoopCatalog on S3 always has the possibility for one process to clobber 
the other one when writing the version-hint.txt file. So as Ryan suggested, it 
is always better to use a metastore to perform consistency checks instead of 
delegating it to the file system.

-Jack

On Tue, Jun 8, 2021 at 5:41 PM Ryan Blue  wrote:
Hi Scott,

I'm not quite sure what's happening here, but I should at least note that we 
didn't intend for HDFS tables to be used with S3. HDFS tables use an atomic 
rename in the file system to ensure that only one committer "wins" to produce a 
given version of the table metadata. In S3, renames are not atomic so you can 
get into trouble if there are two concurrent processes trying to rename to the 
same target version. That's probably what's causing the first issue, where the 
eTag for a file doesn't match the expected one.

As for the second issue, it looks like the version hint file is not valid. We 
did some work to correct these issues in HDFS that was released in 0.11.0, so 
I'm surprised to see this. Now, the version hint file is written and then 
renamed to avoid issues with reads while the file is being written.

I'm not sure how you had the second issue on S3, but the solution is probably 
the same as for the eTag issue: I recommend moving to a metastore to track the 
current table metadata rather than using the HDFS implementation.

Ryan

On Tue, Jun 8, 2021 at 5:27 PM Scott Kruger  wrote:
We’re using the Iceberg API (0.11.1) over raw parquet data in S3/EMRFS, 
basically just using the table API to issue overwrites/appends. Everything 
works great for the most part, but we’ve recently started to have problems with 
the iceberg metadata directory going out of sync. See the following stacktrace:

org.apache.iceberg.exceptions.RuntimeIOException: Failed to read file: 
s3://mybucket/db/table/metadata/v2504.metadata.json
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:241)
at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:233)
at 
org.apache.iceberg.hadoop.Hado

Re: Consistency problems with Iceberg + EMRFS

2021-06-08 Thread Jack Ye
There are 2 potential root causes I see:
1. you might be using EMRFS with DynamoDB enabled to check consistency,
which can leave the DynamoDB table and S3 out of sync. The quick solution
is to just delete the DynamoDB consistency table, and the next read/write
will recreate and resync it. After all, EMRFS only provides read-after-write
consistency for S3, but S3 is now already strongly consistent, so there is
really no need to use EMRFS anymore.
2. HadoopCatalog on S3 always has the possibility for one process to
clobber the other one when writing the version-hint.txt file. So as Ryan
suggested, it is always better to use a metastore to perform consistency
checks instead of delegating it to the file system.

-Jack

On Tue, Jun 8, 2021 at 5:41 PM Ryan Blue  wrote:

> Hi Scott,
>
> I'm not quite sure what's happening here, but I should at least note that
> we didn't intend for HDFS tables to be used with S3. HDFS tables use an
> atomic rename in the file system to ensure that only one committer "wins"
> to produce a given version of the table metadata. In S3, renames are not
> atomic so you can get into trouble if there are two concurrent processes
> trying to rename to the same target version. That's probably what's causing
> the first issue, where the eTag for a file doesn't match the expected one.
>
> As for the second issue, it looks like the version hint file is not valid.
> We did some work to correct these issues in HDFS that was released in
> 0.11.0, so I'm surprised to see this. Now, the version hint file is written
> and then renamed to avoid issues with reads while the file is being written.
>
> I'm not sure how you had the second issue on S3, but the solution is
> probably the same as for the eTag issue: I recommend moving to a metastore
> to track the current table metadata rather than using the HDFS
> implementation.
>
> Ryan
>
> On Tue, Jun 8, 2021 at 5:27 PM Scott Kruger 
> wrote:
>
>> We’re using the Iceberg API (0.11.1) over raw parquet data in S3/EMRFS,
>> basically just using the table API to issue overwrites/appends. Everything
>> works great for the most part, but we’ve recently started to have problems
>> with the iceberg metadata directory going out of sync. See the following
>> stacktrace:
>>
>>
>>
>> org.apache.iceberg.exceptions.RuntimeIOException: Failed to read file:
>> s3://mybucket/db/table/metadata/v2504.metadata.json
>>
>> at
>> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:241)
>>
>> at
>> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:233)
>>
>> at
>> org.apache.iceberg.hadoop.HadoopTableOperations.updateVersionAndMetadata(HadoopTableOperations.java:93)
>>
>> at
>> org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:116)
>>
>> at
>> org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:80)
>>
>> at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:86)
>>
>> at
>> com.braintree.data.common.snapshot.iceberg.IcebergUtils$Builder.load(IcebergUtils.java:639)
>>
>> at
>> com.braintree.data.snapshot.actions.UpdateTableMetadata.run(UpdateTableMetadata.java:53)
>>
>> at
>> com.braintree.data.snapshot.actions.UpdateMetastore.lambda$run$0(UpdateMetastore.java:104)
>>
>> at
>> com.braintree.data.base.util.StreamUtilities.lambda$null$7(StreamUtilities.java:306)
>>
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>
>> at java.lang.Thread.run(Thread.java:748)
>>
>> Caused by: java.io.IOException: Unexpected end of stream pos=0,
>> contentLength=214601
>>
>> at
>> com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.read(S3FSInputStream.java:297)
>>
>> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>
>> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>>
>> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>>
>> at java.io.DataInputStream.read(DataInputStream.java:149)
>>
>> at
>> org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.read(HadoopStreams.java:113)
>>
>> at
>> org.apache.iceberg.shaded.com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:524)
>>
>> at
>> org.apache.iceberg.shaded.com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:129)
>>
>> at
>> org.apache.iceberg.shaded.com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:247)
>>
>> at
>> org.apache.iceberg.shaded.com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1481)
>>
>> at
>> org.apache.iceberg.shaded.com.fasterxml.jackson.core.JsonFacto

Re: Consistency problems with Iceberg + EMRFS

2021-06-08 Thread Ryan Blue
Hi Scott,

I'm not quite sure what's happening here, but I should at least note that
we didn't intend for HDFS tables to be used with S3. HDFS tables use an
atomic rename in the file system to ensure that only one committer "wins"
to produce a given version of the table metadata. In S3, renames are not
atomic so you can get into trouble if there are two concurrent processes
trying to rename to the same target version. That's probably what's causing
the first issue, where the eTag for a file doesn't match the expected one.
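
To make the failure mode concrete, here is a rough, simplified sketch of the
rename-based commit (not the actual HadoopTableOperations code; the version
number and paths are only illustrative):

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameCommitSketch {
  public static void main(String[] args) throws Exception {
    // Write the next metadata version to a temporary object first...
    Path temp = new Path("s3://mybucket/db/table/metadata/.v2505.metadata.json.tmp");
    Path committed = new Path("s3://mybucket/db/table/metadata/v2505.metadata.json");
    FileSystem fs = committed.getFileSystem(new Configuration());

    try (OutputStream out = fs.create(temp, true /* overwrite */)) {
      out.write("{ \"format-version\": 1 }".getBytes(StandardCharsets.UTF_8)); // placeholder JSON
    }

    // ...then rename it into place. The rename is the commit point: on HDFS it
    // is atomic and fails if the target already exists, so only one writer can
    // produce v2505. On S3 the "rename" is a copy followed by a delete, so two
    // writers (or a retried request) can both appear to succeed.
    if (!fs.rename(temp, committed)) {
      throw new IllegalStateException("Lost the commit race for " + committed);
    }
  }
}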

As for the second issue, it looks like the version hint file is not valid.
We did some work to correct these issues in HDFS that was released in
0.11.0, so I'm surprised to see this. Now, the version hint file is written
and then renamed to avoid issues with reads while the file is being written.

I'm not sure how you had the second issue on S3, but the solution is
probably the same as for the eTag issue: I recommend moving to a metastore
to track the current table metadata rather than using the HDFS
implementation.
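
As a rough example of what that could look like, here is a minimal, untested
sketch that loads the same table through the Glue catalog from the iceberg-aws
module with S3FileIO. The warehouse path and property values are assumptions
you would need to adapt, and it assumes the AWS module and SDK bundle are on
the classpath:

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public class LoadViaGlueSketch {
  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("warehouse", "s3://mybucket/warehouse");           // assumed warehouse root
    props.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");  // S3FileIO instead of EMRFS

    // The catalog, not the file system, arbitrates which commit wins.
    Catalog glue = CatalogUtil.loadCatalog(
        "org.apache.iceberg.aws.glue.GlueCatalog", "glue", props, new Configuration());

    Table table = glue.loadTable(TableIdentifier.of("db", "table"));
    // table.newOverwrite() / commit() work the same as with HadoopTables.
  }
}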

Ryan

On Tue, Jun 8, 2021 at 5:27 PM Scott Kruger 
wrote:

> We’re using the Iceberg API (0.11.1) over raw parquet data in S3/EMRFS,
> basically just using the table API to issue overwrites/appends. Everything
> works great for the most part, but we’ve recently started to have problems
> with the iceberg metadata directory going out of sync. See the following
> stacktrace:
>
>
>
> org.apache.iceberg.exceptions.RuntimeIOException: Failed to read file:
> s3://mybucket/db/table/metadata/v2504.metadata.json
>
> at
> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:241)
>
> at
> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:233)
>
> at
> org.apache.iceberg.hadoop.HadoopTableOperations.updateVersionAndMetadata(HadoopTableOperations.java:93)
>
> at
> org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:116)
>
> at
> org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:80)
>
> at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:86)
>
> at
> com.braintree.data.common.snapshot.iceberg.IcebergUtils$Builder.load(IcebergUtils.java:639)
>
> at
> com.braintree.data.snapshot.actions.UpdateTableMetadata.run(UpdateTableMetadata.java:53)
>
> at
> com.braintree.data.snapshot.actions.UpdateMetastore.lambda$run$0(UpdateMetastore.java:104)
>
> at
> com.braintree.data.base.util.StreamUtilities.lambda$null$7(StreamUtilities.java:306)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
> at java.lang.Thread.run(Thread.java:748)
>
> Caused by: java.io.IOException: Unexpected end of stream pos=0,
> contentLength=214601
>
> at
> com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.read(S3FSInputStream.java:297)
>
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>
> at java.io.DataInputStream.read(DataInputStream.java:149)
>
> at
> org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.read(HadoopStreams.java:113)
>
> at
> org.apache.iceberg.shaded.com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:524)
>
> at
> org.apache.iceberg.shaded.com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:129)
>
> at
> org.apache.iceberg.shaded.com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:247)
>
> at
> org.apache.iceberg.shaded.com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1481)
>
> at
> org.apache.iceberg.shaded.com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:972)
>
> at
> org.apache.iceberg.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3242)
>
> at
> org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:239)
>
> ... 15 more
>
> Caused by:
> com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException:
> eTag in metadata for File mybucket/db/table/metadata/v2504.metadata.json'
> does not match eTag from S3!
>
> at
> com.amazon.ws.emr.hadoop.fs.s3.GetObjectInputStreamWithInfoFactory.create(GetObjectInputStreamWithInfoFactory.java:69)
>
> at
> com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.open(S3FSInputStream.java:200)
>
> at
> com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.retrieveInputStreamWithInfo(S3FSInputStream.java:391)
>
> at
> com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.reopenStream(S3FSInputStream.java:378)
>
> at
> com.amazon.ws.emr.ha