Re: [DISCUSS] June board report

2024-06-15 Thread Owen O'Malley
Ryan, It looks good. Thanks for including the notice about Tabular/Databricks. .. Owen On Wed, Jun 12, 2024 at 9:52 PM Ryan Blue wrote: > Hi everyone, > > Here's my current draft board report for June. If you have anything to add > or update, please reply and I'll amend the report. > >

Re: Call for Ryan Blue to Step Down as PMC Chair

2024-06-05 Thread Owen O'Malley
I strongly disagree with asking Ryan to step down. For those who don't know me, I'm an Iceberg PMC member, Apache member, and was a mentor and champion for Iceberg when it entered the Apache Incubator . I've never worked at

Re: [PROPOSAL] Preparing first Apache Iceberg Summit

2023-09-22 Thread Owen O'Malley
It is also important to consider who is on the program committee and their affiliations. It also helps if the PC discourages sales talks (especially with proprietary extensions!). They should encourage technical ones about development and usage of the Apache project. .. Owen On Sep 22, 2023, at

ApacheCon Iceberg BOF

2022-10-05 Thread Owen O'Malley
All, There is an Iceberg Birds of a Feather meet up at ApacheCon in an hour (5:50pm CDT). Please come by and join us, if you are attending. Thanks, Owen

Re: [DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-04 Thread Owen O'Malley
At the stripe boundaries, the bytes-on-disk statistics are accurate. A stripe that is in flight is going to be an estimate, because the dictionaries can't be compressed until the stripe is flushed. The memory usage will be a significant overestimate, because it includes buffers that are

Re: Hive table compatibility for Iceberg readers

2022-01-31 Thread Owen O'Malley
On Thu, Jan 27, 2022 at 10:26 PM Walaa Eldin Moustafa wrote: > *2. Iceberg schema lower casing:* Before Iceberg, when users read Hive > tables from Spark, the returned schema is lowercase since Hive stores all > metadata in lowercase mode. If users move to Iceberg, such readers could > break

Re: [CWS] Re: Subject: [VOTE] Release Apache Iceberg 0.12.0 RC3

2021-08-16 Thread Owen O'Malley
Ok, after the vote, but I did: * verified tag is same as the tarball * verified checksums and signatures * built and ran the tests My one complaint is that I get test failures that look like they are timezone related. ORC and Parquet tests failing with timestamps 7 or 8 hours off. .. Owen On

Re: Default TimeZone for unit tests

2021-03-01 Thread Owen O'Malley
In ORC, the timezone tests vary the default timezone through multiple values using the Java APIs. (They do restore the initial value when the test exits.) :) .. Owen On Mon, Mar 1, 2021 at 9:25 PM Edgar Rodriguez wrote: > Hi folks, > > Thanks Peter for the quick fix! > > I do think it'd be a

Type attributes

2021-01-04 Thread Owen O'Malley
One of the challenges that we have at LinkedIn is that we have a *lot* of Avro schemas. I'd like to be able to represent those Avro schemas using Iceberg's types and there are a few challenges: - unions - enums - default values One way out of those problems without extending the Iceberg

Re: Iceberg - Hive schema synchronization

2020-11-24 Thread Owen O'Malley
You left the complex types off of your list (struct, map, array, uniontype). All of them have natural mappings in Iceberg, except for uniontype. Interval is supported on output, but not as a column type. Unfortunately, we have some tables with uniontype, so we'll need a solution for how to deal

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
dicates, having a translation on that side seems less error-prone. .. Owen On Fri, Sep 18, 2020 at 10:54 PM Ryan Blue wrote: > Are you saying that we can't fix this by rewriting expressions to > translate from SQL to more natural semantics? > > On Fri, Sep 18, 2020 at 3:28 PM Owen

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
ests for these cases and rewrite expressions > to account for the difference. Iceberg should push notEqual("col", "x") > to ORC as SQL (col != 'x' or col is null). Presto can similarly translate col > != 'x' to and(notEqual("col", "x"), notN
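The rewrite quoted in this message can be sketched in a few lines. This is only an illustration, not Iceberg, ORC, or Presto code: every function name below is hypothetical, and the assumption that Iceberg's notEqual also matches null values is inferred from the rewrite described in the (truncated) message rather than stated in it.

```python
from typing import Callable, List, Optional

Value = Optional[str]  # one nullable string column value; None stands for SQL NULL


def sql_not_equal(value: Value, literal: str) -> Optional[bool]:
    """SQL three-valued logic: NULL != 'x' is NULL (unknown), not TRUE."""
    if value is None:
        return None
    return value != literal


def sql_is_null(value: Value) -> bool:
    """SQL IS NULL never returns unknown."""
    return value is None


def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """SQL OR: TRUE wins, then unknown, then FALSE."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def null_matching_not_equal(value: Value, literal: str) -> bool:
    """ASSUMPTION: a two-valued notEqual where a null value counts as 'not equal'."""
    return value != literal


def keep(rows: List[Value], predicate: Callable[[Value], Optional[bool]]) -> List[Value]:
    """A WHERE clause keeps a row only when the predicate is exactly TRUE."""
    return [r for r in rows if predicate(r) is True]


ROWS: List[Value] = ["x", "y", None]

# Plain SQL `col != 'x'` drops the NULL row.
sql_result = keep(ROWS, lambda v: sql_not_equal(v, "x"))

# The assumed two-valued notEqual keeps it.
null_matching_result = keep(ROWS, lambda v: null_matching_not_equal(v, "x"))

# Pushing the two-valued notEqual to a SQL-semantics reader as
# (col != 'x' OR col IS NULL) reproduces the two-valued result.
pushed_down = keep(ROWS, lambda v: sql_or(sql_not_equal(v, "x"), sql_is_null(v)))

# A SQL engine translating `col != 'x'` to and(notEqual, notNull)
# reproduces the SQL result.
engine_side = keep(ROWS, lambda v: null_matching_not_equal(v, "x") and v is not None)
```

Under these assumptions, each rewrite makes one side's semantics match the other's for rows with nulls, which is the difference the thread is discussing.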

Re: SQL compatibility of Iceberg Expressions

2020-09-18 Thread Owen O'Malley
I think that we should follow the SQL semantics to prevent surprises when SQL engines integrate with Iceberg. .. Owen On Thu, Sep 17, 2020 at 9:08 PM Shardul Mahadik wrote: > Hi all, > > I noticed that Iceberg's predicates are not compatible with SQL predicates > when it comes to handling NULL

Re: Iceberg sync notes - 9 September 2020

2020-09-14 Thread Owen O'Malley
As I mentioned in the meetup, ORC 1.6.4 was pending and has been released. It should be available on Maven central tomorrow. .. Owen On Mon, Sep 14, 2020 at 10:38 PM Ryan Blue wrote: > Hi everyone, > > I just update the Iceberg sync doc >

Re: [DISCUSS] August board report

2020-08-12 Thread Owen O'Malley
+1 looks good. On Wed, Aug 12, 2020 at 4:41 PM Ryan Blue wrote: > Hi everyone, > > Here's a draft of the board report for this month. Please reply with > anything that you'd like to see added or that I've missed. Thanks! > > rb > > ## Description: > Apache Iceberg is a table format for huge

Re: [VOTE] Release Apache Iceberg 0.9.0 RC5

2020-07-13 Thread Owen O'Malley
On Mon, Jul 13, 2020 at 4:28 PM Anton Okolnychyi wrote: > I think the issue that was brought up by Dongjoon is valid and we should > document the current caching behavior. > The problem is also more generic and does not apply only to views as > operations that are happening through the source

Re: [VOTE] Release Apache Iceberg 0.9.0 RC5

2020-07-13 Thread Owen O'Malley
+1 (binding) - Verified signatures - Verified checksum - Built src from tarball and ran tests. - Looked at JMH dependency to make sure it wasn't leaking into the published artifacts. .. Owen On Mon, Jul 13, 2020 at 11:00 AM RD wrote: > +1 > - verified signatures and checksum >

Re: [VOTE] Graduate to a top-level project

2020-05-12 Thread Owen O'Malley
serve as the initial members of the Apache Iceberg Project: > > * Anton Okolnychyi > * Carl Steinbach > * Daniel C. Weeks > * James R. Taylor > * Julien Le Dem > * Owen O'Malley > * Parth Brahmbhatt > * Ratandeep Ratti > * Ryan Blue

Re: [DISCUSS] Graduating from the Apache Incubator

2020-05-11 Thread Owen O'Malley
of responsibility of the Apache Iceberg > Project; and be it further > > RESOLVED, that the persons listed immediately below be and hereby are > appointed to serve as the initial members of the Apache Iceberg Project: > > * Anton Okolnychyi > * Carl Steinbach >

Re: [VOTE] Release Apache Iceberg 0.8.0-incubating RC2

2020-04-30 Thread Owen O'Malley
+1 1. Checked signature and checksum 2. Built and ran unit tests. 3. Checked ORC version :) On Monday, ORC released 1.6.3, so we should grab those fixes soon. .. Owen On Thu, Apr 30, 2020 at 12:34 PM Dongjoon Hyun wrote: > +1. > > 1. Verified checksum, sig, and license > 3. Build

Re: [DISCUSS] September report

2019-09-06 Thread Owen O'Malley
On Fri, Sep 6, 2019 at 12:19 AM Justin Mclean wrote: > So why does the project think it's ready to graduate? Mentors do you think > the project is ready to graduate? > It has to make a release or two, but I agree with Ryan that it is approaching graduation. The project entered Apache with five

Re: [DISCUSS] September report

2019-09-06 Thread Owen O'Malley
On Wed, Sep 4, 2019 at 4:55 PM Ryan Blue wrote: > Hi everyone, > > Here's a draft of this month's report to the IPMC. Please reply with > comments if you'd like to add anything! > > rb > > ## Iceberg > > Iceberg is a table format for large, slow-moving tabular data. > > Iceberg has been

Re: [DISCUSS] Implementation strategies for supporting Iceberg tables in Hive

2019-08-07 Thread Owen O'Malley
> On Jul 24, 2019, at 22:52, Adrien Guillo > wrote: > > Hi Iceberg folks, > > In the last few months, we (the data infrastructure team at Airbnb) have been > closely following the project. We are currently evaluating potential > strategies to migrate our data warehouse to Iceberg.

Re: Sort Spec

2019-07-18 Thread Owen O'Malley
d say yes >>>> 2) Should Iceberg allow users to define a sort spec only if the table >>>> is bucketed? >>>> - I would say no, as it seems valid to have partitioned and sorted >>>> tables. >>>> 3) How should Iceberg encode sort specs? >>>

Re: Updates/Deletes/Upserts in Iceberg

2019-07-03 Thread Owen O'Malley
rote: >>> How about 9AM PDT on Friday, 5 July then? >>> >>>> On Wed, Jul 3, 2019 at 10:55 AM Owen O'Malley >>>> wrote: >>>> I'd like to call in, but I'm out Thursday. Friday would work except 11am >>>> to 1pm pdt. >>>> >>

Re: IPMC report draft for July 2019

2019-07-03 Thread Owen O'Malley
> Have your mentors been helpful and responsive or are things falling > through the cracks? In the latter case, please list any open issues > that need to be addressed. > > Yes. > > Signed-off-by: > > [X](iceberg) Ryan Blue > Comments: I wrote the first

Re: Updates/Deletes/Upserts in Iceberg

2019-06-12 Thread Owen O'Malley
> On May 21, 2019, at 1:31 PM, Jacques Nadeau wrote: > > The main thing I'm talking about is how you target a deletion across time. If > you have a file A, and you want to delete record X in A, you define delete > A.X. At the same time, another process may be compacting A into A'. In so >

Re: Updates/Deletes/Upserts in Iceberg

2019-06-12 Thread Owen O'Malley
> On May 15, 2019, at 12:54 PM, Ryan Blue wrote: > > 2. Iceberg diff files should use synthetic keys > > A lot of the discussion on the doc is about whether natural keys are > practical or what assumptions we can make or trade about them. In my opinion, > Iceberg tables will absolutely need

Re: Approaching Vectorized Reading in Iceberg ..

2019-05-28 Thread Owen O'Malley
On Fri, May 24, 2019 at 8:28 PM Ryan Blue wrote: > if Iceberg Reader was to wrap Arrow or ColumnarBatch behind an > Iterator[InternalRow] interface, it would still not work right? Coz it > seems to me there is a lot more going on upstream in the operator execution > path that would be needed to

Re: [DISCUSS] Draft report for January 2019

2019-01-07 Thread Owen O'Malley
acks? In the latter case, please list any open issues > that need to be addressed. > > Last month was December, so traffic has been low and both PPMC members > and > mentors were slow to respond. This is not abnormal, but the PPMC missed > the > deadline to file this report. We

Re: project report

2018-12-04 Thread Owen O'Malley
ther to do a source-only first release or to go through the > pain of publishing convenience binaries with their own LICENSE and NOTICE > content. > > rb > > On Tue, Dec 4, 2018 at 3:37 PM Owen O'Malley > wrote: > > > I wrote a first pass of the report for the Apache

Re: merge-on-read?

2018-11-28 Thread Owen O'Malley
st of those >takes effect). > > Obviously readers would need to be updated to correctly interpret this > data. And there is all kinds of supporting work that would be required in > order to maintain these (periodically collapsing diffs into the base, > etc.). > > Is this some

Re: merge-on-read?

2018-11-28 Thread Owen O'Malley
I’m not sure what use case Erik is looking for, but I’ve had users that want to do the equivalent of HBase’s column families. They want some of the columns to be stored separately and then merged together on read. The requirements would be that there is a 1:1 mapping between rows in the matching

Issue list?

2018-11-27 Thread Owen O'Malley
All, As we move over to Apache infrastructure, we need to decide what works for the community. The dev list is getting a lot of traffic and is probably intimidating to newcomers. Currently the notices are: Pull Requests and issue creation/comment/close -> dev@ Git commit -> commits@ One