Re: [DISCUSS] UUID type

2021-09-13 Thread Piotr Findeisen
then we need to >>> make it easy to get the expected string representation in and out, just >>> like we do with date/time types. I don't think that's specific to any >>> engine. >>> >>> On Thu, Jul 29, 2021 at 9:00 AM Jacques Nadeau >>> wrote: >>

Re: [DISCUSS] UUID type

2021-09-17 Thread Piotr Findeisen
here. >> >> Have we converged? I think most people would assume that silence is a >> vote for the status-quo. >> >> On Mon, Sep 13, 2021 at 7:30 AM Piotr Findeisen >> wrote: >> >>> Hi, >>> >>> It seems we converged here that UU

Re: [DISCUSS] UUID type

2021-07-29 Thread Piotr Findeisen
Hi, I agree with Ryan that it takes some precautions before one can assume uniqueness of UUID values, and that this shouldn't be anything special to UUIDs at all. After all, this is just a primitive type, which is commonly used for certain things, but "commonly" doesn't mean "always". The

Re: [DISCUSS] Moving to apache-iceberg Slack workspace

2021-07-29 Thread Piotr Findeisen
>>>>> edu...@dremio.com> wrote: >>>>> >>>>>> Could we just update the slack link to >>>>>> https://join.slack.com/t/apache-iceberg/ on the website (see PR#2882 >>>>>> <https://github.com/apache/iceberg/pull/2882>

Re: Proposal: Support for views in Iceberg

2021-07-29 Thread Piotr Findeisen
- @Jacques Table references in the views can be arbitrary objects such >as tables from other catalogs or elasticsearch tables etc. I will clarify >it in the spec. > > I will work on incorporating all the comments in the spec and make the > next revision available for r

Re: [DISCUSS] Moving to apache-iceberg Slack workspace

2021-07-29 Thread Piotr Findeisen
] Best PF On Thu, Jul 29, 2021 at 12:51 PM Eduard Tudenhoefner wrote: > The default invite link expires after 30 days, that's why I was looking > for alternatives. Maybe a slack admin can check if the invite link can be > configured to not expire. > > On Thu, Jul 29, 2021, 11:51

Re: [DISCUSS] Moving to apache-iceberg Slack workspace

2021-07-29 Thread Piotr Findeisen
Hi, I was told the screenshot in my previous email doesn't show up, so I'm sharing it as a link instead https://gist.github.com/findepi/68e6a141d6ea06049c33e85c5ccd5835#gistcomment-3835460 Best PF On Thu, Jul 29, 2021 at 4:13 PM Piotr Findeisen wrote: > Hi, > > @Ryan Blue , where can

Re: High memory usage with highly concurrent committers

2021-12-06 Thread Piotr Findeisen
Hi Igor, does fs.gs.outputstream.upload.chunk.size affect the file size I can upload? Can I upload e.g. a 1GB Parquet file, while also setting fs.gs.outputstream.upload.chunk.size=8388608 (8 MiB)? Best PF On Fri, Dec 3, 2021 at 5:33 PM Igor Dvorzhak wrote: > No, right now this is a global

Re: Handling pandas.Timestamps in nanos

2021-12-03 Thread Piotr Findeisen
Hi, I don't know about Pandas, but the question about timestamp precision is interesting to me nonetheless. At Starburst, we've had customers asking for nanosecond timestamp precision, and this drove adding that capability to Trino. (Actually, picosecond timestamp precision was implemented, but I

Re: Drop table behavior

2021-11-23 Thread Piotr Findeisen
Hi, When you come from a storage perspective, the current design of 'not owning' the location makes sense. However, if you come from a SQL perspective, all this is an impractical limitation. Analysts and other SQL users want to be able to delete their data and must have confidence that all the

Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-01 Thread Piotr Findeisen
Hi, Just curious. S3URI seems AWS S3-specific. What would be the goal of using S3URI with Google Cloud Storage URLs? What problem are we solving? PF On Wed, Dec 1, 2021 at 4:56 PM Russell Spitzer wrote: > Sounds reasonable to me if they are compatible > > On Wed, Dec 1, 2021 at 8:27 AM Mayur

Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-01 Thread Piotr Findeisen
S URIs are compatible > with the AWS S3 SDKs and if they are added to the list of supported > prefixes, they work with S3FileIO. > > > > Thanks, > > Mayur > > > > *From:* Piotr Findeisen > *Sent:* Wednesday, December 1, 2021 10:58 AM > *To:* Iceberg Dev List > *

Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-02 Thread Piotr Findeisen
ures to be not supported in S3FileIO, so I >>> think a specific GCS FileIO would likely be better for GCS support in the >>> long term. >>> >>> >>> >>> Could you describe how you configure S3FileIO to talk to GCS? Do you >>> need to override th

Re: Proposal: Support for views in Iceberg

2021-07-20 Thread Piotr Findeisen
Hi, FWIW, in Trino we just added Trino views support. https://github.com/trinodb/trino/pull/8540 Of course, this is by no means usable by other query engines. Anjali, your document does not talk much about compatibility between query engines. How do you plan to address that? For example, I am

Re: string bucketing compatibility issue

2021-07-19 Thread Piotr Findeisen
Hi, I've filed https://github.com/apache/iceberg/issues/2837 for this as well. Best PF On Sat, Jul 17, 2021 at 12:48 AM Piotr Findeisen wrote: > Hi, > > It was discovered by @Mateusz Gajewski > that Iceberg bucketing > transformation for string isn't regular Murmur3 32-bit

Re: Proposal: Support for views in Iceberg

2021-07-22 Thread Piotr Findeisen
ed language of our own. (What additional >> functionality does it provide over Calcite?) >> >> Given that the view metadata is json, it is easily extendable to >> incorporate any new fields needed to make the SQL truly compatible across >> engines. >> >>

Re: Proposal: Z-Ordering in Iceberg

2021-07-22 Thread Piotr Findeisen
Hi Bhavyam, Has this been discussed on the sync? Ryan, will it be making it into the table metadata spec? Best, PF On Wed, Jul 21, 2021 at 1:50 PM Bhavyam Kamal wrote: > Hi Everyone, > > I would like to discuss and get feedback on the following proposal for > Z-Ordering in the Iceberg Sync

string bucketing compatibility issue

2021-07-16 Thread Piotr Findeisen
Hi, It was discovered by @Mateusz Gajewski that Iceberg bucketing transformation for string isn't regular Murmur3 32-bit hash. Upon closer investigation we found out that the code:

Re: [DISCUSS] Distinct count map

2021-07-23 Thread Piotr Findeisen
Hi, File-level distinct count (a number) has limited applicability in Trino. It's useful for pointed queries, where we can prune all the other files away, but in other cases, the Trino optimizer wouldn't be able to make an educated use of that. Internally, Łukasz and I were talking about sketches

Re: Proposal: Support for views in Iceberg

2021-07-27 Thread Piotr Findeisen
T during a create view query. >> >> 3. With these considerations, I think the "sql" field can potentially be >> a map (maybe called "engine-sqls"?), where key is the engine type and >> version like "Spark 3.1", and value is the view SQL

Re: [DISCUSS] Moving to apache-iceberg Slack workspace

2021-07-27 Thread Piotr Findeisen
Hi, I don't have an opinion on which Slack workspace this is in, as long as it's easy to join. A manual joining process is not healthy for sure. Btw, the apache-iceberg workspace is currently limited to @apache.org emails, which some people do not have (e.g. I do not). Will you be sharing an invite link or

Re: [External] Re: Continuing the Secondary Index Discussion

2022-03-07 Thread Piotr Findeisen
Hi Zaicheng, thanks for following up on this. I'm certainly interested. The proposed time doesn't work for me though, I'm in the CET time zone. Best, PF On Sat, Mar 5, 2022 at 9:33 AM Zaicheng Wang wrote: > Hi dev folks, > > As discussed in the sync >

Iceberg NDV stats

2022-03-16 Thread Piotr Findeisen
Hi, We at Starburst are looking into adding number of distinct values (NDV) statistics to Iceberg tables, to let e.g. the Trino cost-based query optimizer produce better plans when working with Iceberg tables. The initial approach is for table-level statistics, and may be improved in the future. I

Re: [VOTE] Adopt Puffin format as a file format for statistics and indexes

2022-06-22 Thread Piotr Findeisen
>> +1, it's an exciting step for Iceberg, look forward to all the new >> statistics and secondary indices it will allow. >> >> >> >> Had a few questions of what the reference to Puffin file(s) will be in >> the Iceberg spec, but it's orthogonal to Puffin file

[VOTE] Adopt Puffin format as a file format for statistics and indexes

2022-06-09 Thread Piotr Findeisen
Hi Everyone, I propose that we adopt Puffin file format as a file format for statistics and indexes in Iceberg tables. Puffin file format specification: https://github.com/apache/iceberg/blob/master/format/puffin-spec.md (previous discussions: https://github.com/apache/iceberg/pull/4944,

Re: Positional delete with vs without the delete row values

2022-05-09 Thread Piotr Findeisen
Hi Peter, FWIW, Trino Iceberg connector writes deletion files with just positions, without row data. cc @Alexander Jo > For the 1st point we just need to collect the statistics during the delete, but we do not have to actually persist the data. I would be wary of creating ORC/Parquet files

Re: Change default format-version of our forked Iceberg to v2

2023-01-11 Thread Piotr Findeisen
Hi, FWIW Trino already creates v2 tables by default. Thought it's worth sharing for context. Best PF On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang wrote: > Hi all, > > We've maintained a forked Iceberg internally and all our use cases involve > v2 tables with row-level updates and deletes.

Re: [Proposal] Partition stats in Iceberg

2022-11-23 Thread Piotr Findeisen
Hi Ajantha, this is a very interesting document, thank you for your work on this! I've added a few comments there. I have one high-level design comment so I thought it would be nicer to everyone if I re-post it here: is "partition" the right level of keeping the stats? > We do this in Hive, but

Re: [VOTE] Release Apache Iceberg 1.1.0 RC4

2022-11-28 Thread Piotr Findeisen
Hi, https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-core/1.1.0/ is already published (on Nov 22, so before voting was concluded). Is it "the" release, or will a new tag be pushed to Maven Central? best, PF On Mon, Nov 28, 2022 at 9:18 AM Gabor Kaszab wrote: > > Hey All, >

Re: [DISCUSS] Hive locks removal

2023-01-20 Thread Piotr Findeisen
Hi Peter, Thanks for bringing this issue up and thanks for working on it as well. I haven't experienced these problems first-hand, so I don't have an opinion yet on what the solution should or shouldn't be. With this new approach, is there any plan to address situations where more than one application