then we need to
>>> make it easy to get the expected string representation in and out, just
>>> like we do with date/time types. I don't think that's specific to any
>>> engine.
>>>
>>> On Thu, Jul 29, 2021 at 9:00 AM Jacques Nadeau
>>> wrote:
>>
here.
>>
>> Have we converged? I think most people would assume that silence is a
>> vote for the status quo.
>>
>> On Mon, Sep 13, 2021 at 7:30 AM Piotr Findeisen
>> wrote:
>>
>>> Hi,
>>>
>>> It seems we converged here that UU
Hi,
I agree with Ryan that it takes some precautions before one can assume
uniqueness of UUID values, and that this shouldn't be anything special for
UUIDs at all.
After all, this is just a primitive type, which is commonly used for
certain things, but "commonly" doesn't mean "always".
The
>>>> edu...@dremio.com> wrote:
>>>>>
>>>>>> Could we just update the slack link to
>>>>>> https://join.slack.com/t/apache-iceberg/ on the website (see PR#2882
>>>>>> <https://github.com/apache/iceberg/pull/2882>
- @Jacques Table references in the views can be arbitrary objects such
>as tables from other catalogs or Elasticsearch tables, etc. I will clarify
>it in the spec.
>
> I will work on incorporating all the comments in the spec and make the
> next revision available for r
Best
PF
On Thu, Jul 29, 2021 at 12:51 PM Eduard Tudenhoefner
wrote:
> The default invite link expires after 30 days, that's why I was looking
> for alternatives. Maybe a slack admin can check if the invite link can be
> configured to not expire.
>
> On Thu, Jul 29, 2021, 11:51
Hi,
I was told the screenshot in my previous email doesn't show up, so I'm sharing
it as a link instead:
https://gist.github.com/findepi/68e6a141d6ea06049c33e85c5ccd5835#gistcomment-3835460
Best
PF
On Thu, Jul 29, 2021 at 4:13 PM Piotr Findeisen
wrote:
> Hi,
>
> @Ryan Blue, where can
Hi Igor,
Does fs.gs.outputstream.upload.chunk.size affect the size of the file I can
upload?
Can I upload, e.g., a 1GB Parquet file while also setting
fs.gs.outputstream.upload.chunk.size=8388608 (8 MiB)?
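For reference on the units: 8388608 bytes is exactly 8 MiB (8 * 1024 * 1024), or about 8.39 MB in decimal units. The Python arithmetic below only illustrates how many chunks of that size a 1 GiB (or 1 GB) file would span, assuming the upload is simply split into fixed-size chunks; it says nothing about whether the GCS connector caps the file size.

```python
MIB = 1024 * 1024
CHUNK_SIZE = 8388608  # value from the question above


def chunks_needed(total_bytes: int, chunk_bytes: int) -> int:
    """Ceiling division: how many fixed-size chunks cover total_bytes."""
    return -(-total_bytes // chunk_bytes)


print(CHUNK_SIZE / MIB)                          # 8.0 (MiB)
print(CHUNK_SIZE / 1_000_000)                    # 8.388608 (decimal MB)
print(chunks_needed(1024 * MIB, CHUNK_SIZE))     # 1 GiB file -> 128 chunks
print(chunks_needed(1_000_000_000, CHUNK_SIZE))  # 1 GB file  -> 120 chunks
```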
Best
PF
On Fri, Dec 3, 2021 at 5:33 PM Igor Dvorzhak wrote:
> No, right now this is a global
Hi,
I don't know about Pandas, but the question about timestamp precision is
interesting to me nonetheless.
At Starburst, we've had customers asking for nanosecond timestamp precision,
and this drove adding that capability to Trino.
(Actually, picosecond timestamp precision was implemented, but I
Hi,
When you come from a storage perspective, the current design of 'not
owning' the location makes sense.
However, if you come from a SQL perspective, all of this is an impractical
limitation. Analysts and other SQL users want to be able to delete their
data and must have confidence that all the
Hi
Just curious. S3URI seems AWS S3-specific. What would be the goal of using
S3URI with Google Cloud Storage URLs?
What problem are we solving?
PF
On Wed, Dec 1, 2021 at 4:56 PM Russell Spitzer
wrote:
> Sounds reasonable to me if they are compatible
>
> On Wed, Dec 1, 2021 at 8:27 AM Mayur
S URIs are compatible
> with the AWS S3 SDKs and if they are added to the list of supported
> prefixes, they work with S3FileIO.
>
>
>
> Thanks,
>
> Mayur
>
>
>
> *From:* Piotr Findeisen
> *Sent:* Wednesday, December 1, 2021 10:58 AM
> *To:* Iceberg Dev List
> *
ures to be not supported in S3FileIO, so I
>>> think a specific GCS FileIO would likely be better for GCS support in the
>>> long term.
>>>
>>>
>>>
>>> Could you describe how you configure S3FileIO to talk to GCS? Do you
>>> need to override th
Hi,
FWIW, in Trino we just added Trino views support.
https://github.com/trinodb/trino/pull/8540
Of course, this is by no means usable by other query engines.
Anjali, your document does not talk much about compatibility between query
engines.
How do you plan to address that?
For example, I am
Hi,
I've filed https://github.com/apache/iceberg/issues/2837 for this as well.
Best
PF
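For comparison when checking such reports, the Iceberg table spec defines the bucket transform as `(murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N`, where the hash is 32-bit Murmur3 (x86 variant, seed 0) and strings are hashed over their UTF-8 bytes. The pure-Python sketch below implements that definition only; it is not Iceberg's Java implementation and not the code discussed in the quoted report, and `bucket_string` is a hypothetical helper name. The spec's hash test vectors list 1210000089 for the string "iceberg", which makes a convenient check.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant; returns a signed 32-bit int like Java's int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    nblocks = n // 4

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

    # Body: process 4-byte little-endian blocks.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = rotl((k * c1) & 0xFFFFFFFF, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: remaining 1-3 bytes.
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = rotl((k * c1) & 0xFFFFFFFF, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Finalization (fmix32).
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def bucket_string(value: str, num_buckets: int) -> int:
    """Bucket transform for strings as described in the spec: hash the UTF-8 bytes."""
    h = murmur3_x86_32(value.encode("utf-8"))
    return (h & 0x7FFFFFFF) % num_buckets


print(murmur3_x86_32("iceberg".encode("utf-8")))  # spec test vector: 1210000089
print(bucket_string("iceberg", 16))
```

Because the hash is defined over UTF-8 bytes, strings with characters outside the Basic Multilingual Plane (e.g. emoji) may be worth including as extra test cases when comparing engines.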
On Sat, Jul 17, 2021 at 12:48 AM Piotr Findeisen
wrote:
> Hi,
>
> It was discovered by @Mateusz Gajewski
> that Iceberg bucketing
> transformation for string isn't regular Murmur3 32-bit
ed language of our own. (What additional
>> functionality does it provide over Calcite?)
>>
>> Given that the view metadata is json, it is easily extendable to
>> incorporate any new fields needed to make the SQL truly compatible across
>> engines.
>>
>>
Hi Bhavyam,
Has this been discussed on the sync?
Ryan, will it be making it into the table metadata spec?
Best,
PF
On Wed, Jul 21, 2021 at 1:50 PM Bhavyam Kamal
wrote:
> Hi Everyone,
>
> I would like to discuss and get feedback on the following proposal for
> Z-Ordering in the Iceberg Sync
Hi,
It was discovered by @Mateusz Gajewski that the
Iceberg bucketing transformation for strings isn't the regular Murmur3 32-bit
hash.
Upon closer investigation we found out that the code:
Hi,
File-level distinct count (a single number) has limited applicability in Trino.
It's useful for point queries, where we can prune all the other files
away, but in other cases the Trino optimizer wouldn't be able to make
educated use of it.
Internally, Łukasz and I were talking about sketches
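To illustrate why a single per-file number does not compose: files holding {1, 2, 3} and {3, 4, 5} each have an NDV of 3, yet the table has 5 distinct values, which neither the sum (6) nor the max (3) recovers. Mergeable sketches avoid this. The Python below is a simplified k-minimum-values (KMV) sketch chosen only for illustration; it is not the Apache DataSketches Theta implementation or anything Trino or Iceberg ships, and `KMVSketch` and `unit_hash` are hypothetical names.

```python
import hashlib
import heapq


def unit_hash(value) -> float:
    """Map a value to a pseudo-random float in (0, 1] using a 64-bit BLAKE2b digest."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return (int.from_bytes(digest, "big") + 1) / 2.0 ** 64


class KMVSketch:
    """Simplified k-minimum-values distinct-count sketch (illustration only)."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._hashes = set()

    def add(self, value) -> None:
        self._hashes.add(unit_hash(value))
        # Trim lazily so adds stay cheap; hashes above the k-th smallest seen
        # so far can never re-enter the k smallest.
        if len(self._hashes) > 2 * self.k:
            self._trim()

    def _trim(self) -> None:
        self._hashes = set(heapq.nsmallest(self.k, self._hashes))

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        """Union of two sketches, e.g. per-file sketches combined into a table-level one."""
        merged = KMVSketch(min(self.k, other.k))
        merged._hashes = self._hashes | other._hashes
        merged._trim()
        return merged

    def estimate(self) -> float:
        self._trim()
        if len(self._hashes) < self.k:
            return float(len(self._hashes))  # fewer than k distinct hashes seen: exact
        return (self.k - 1) / max(self._hashes)


file_a, file_b = KMVSketch(), KMVSketch()
for v in (1, 2, 3):
    file_a.add(v)
for v in (3, 4, 5):
    file_b.add(v)
print(file_a.estimate(), file_b.estimate(), file_a.merge(file_b).estimate())
```

Unlike the per-file counts, the merged sketch yields the table-level distinct count (exact here, approximate once more than k distinct values are seen).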
T during a create view query.
>>
>> 3. With these considerations, I think the "sql" field can potentially be
>> a map (maybe called "engine-sqls"?), where key is the engine type and
>> version like "Spark 3.1", and value is the view SQL
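The map shape proposed in point 3 could look like the Python sketch below. Only the "engine-sqls" name and the "Spark 3.1"-style key come from the proposal; the view SQL, the "Trino 360" key, and the `sql_for` helper are hypothetical illustrations.

```python
import json

# Hypothetical view-version fragment following the quoted proposal:
# "engine-sqls" maps "<engine> <version>" to that engine's view SQL.
view_version = {
    "engine-sqls": {
        "Spark 3.1": "SELECT id, name FROM db.events WHERE ts > current_date() - 7",
        "Trino 360": "SELECT id, name FROM db.events WHERE ts > current_date - INTERVAL '7' DAY",
    }
}


def sql_for(metadata: dict, engine: str, version: str):
    """Return the SQL stored for exactly this engine and version, or None."""
    return metadata.get("engine-sqls", {}).get(f"{engine} {version}")


print(json.dumps(view_version, indent=2))
print(sql_for(view_version, "Spark", "3.1"))
```

An exact-match lookup is the simplest policy; how an engine should treat a version with no entry (fall back to an older version, or refuse) is left open here.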
Hi,
I don't have an opinion on which Slack workspace this is in, as long as it's
easy to join.
A manual joining process is certainly not healthy.
Btw, the apache-iceberg workspace is currently limited to @apache.org emails,
which some people do not have (e.g., I do not).
Will you be sharing an invite link or
Hi Zaicheng,
thanks for following up on this. I'm certainly interested.
The proposed time doesn't work for me, though; I'm in the CET time zone.
Best,
PF
On Sat, Mar 5, 2022 at 9:33 AM Zaicheng Wang
wrote:
> Hi dev folks,
>
> As discussed in the sync
>
Hi,
We at Starburst are looking into adding number of distinct values (NDV)
statistics to Iceberg tables, to let, e.g., the Trino cost-based query
optimizer produce better plans when working with Iceberg tables.
The initial approach is for table-level statistics, and may be improved in
the future.
I
>> +1, it's an exciting step for Iceberg, look forward to all the new
>> statistics and secondary indices it will allow.
>>
>>
>>
>> Had a few questions of what the reference to Puffin file(s) will be in
>> the Iceberg spec, but it's orthogonal to Puffin file
Hi Everyone,
I propose that we adopt the Puffin file format as a file format for statistics
and indexes in Iceberg tables.
Puffin file format specification:
https://github.com/apache/iceberg/blob/master/format/puffin-spec.md
(previous discussions: https://github.com/apache/iceberg/pull/4944,
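For readers new to the format: per the linked spec, a Puffin file is the 4-byte magic `PFA1`, followed by the blobs, followed by a footer laid out as magic, a JSON footer payload, a 4-byte little-endian payload size, 4 bytes of flags, and a final magic. The Python sketch below writes and reads such a file under simplifying assumptions: no compression of blobs or footer, placeholder snapshot-id and sequence-number values, and a hypothetical blob type. Field names follow the spec linked above and should be checked against it before relying on them.

```python
import json
import struct

MAGIC = b"PFA1"  # 0x50 0x46 0x41 0x31


def write_puffin(blobs, properties=None) -> bytes:
    """Write a minimal Puffin file with uncompressed blobs and an uncompressed footer.

    `blobs` is a list of (blob_type, field_ids, payload_bytes) tuples.
    """
    out = bytearray(MAGIC)
    blob_meta = []
    for blob_type, fields, payload in blobs:
        blob_meta.append({
            "type": blob_type,
            "fields": fields,
            "snapshot-id": 1,      # placeholder value
            "sequence-number": 1,  # placeholder value
            "offset": len(out),
            "length": len(payload),
        })
        out += payload
    footer_payload = json.dumps(
        {"blobs": blob_meta, "properties": properties or {}}
    ).encode("utf-8")
    out += MAGIC
    out += footer_payload
    out += struct.pack("<i", len(footer_payload))  # FooterPayloadSize, little-endian
    out += bytes(4)                                 # Flags: all zero, footer not compressed
    out += MAGIC
    return bytes(out)


def read_puffin_footer(data: bytes) -> dict:
    """Parse the footer of a Puffin file whose footer payload is not compressed."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Puffin file")
    (payload_size,) = struct.unpack("<i", data[-12:-8])
    flags = data[-8:-4]
    if flags[0] & 0x01:
        raise NotImplementedError("compressed footer payload is not handled in this sketch")
    footer_start = len(data) - 12 - payload_size - 4
    if data[footer_start:footer_start + 4] != MAGIC:
        raise ValueError("footer magic not found")
    return json.loads(data[footer_start + 4:footer_start + 4 + payload_size])


data = write_puffin([("hypothetical-blob-v1", [1], b"hello")])
footer = read_puffin_footer(data)
blob = footer["blobs"][0]
print(blob["type"], data[blob["offset"]:blob["offset"] + blob["length"]])
```

Reading the footer first and then seeking to each blob's `offset` is what lets a reader fetch only the statistics or index blobs it needs.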
Hi Peter,
FWIW, the Trino Iceberg connector writes position delete files with just positions,
without row data. cc @Alexander Jo
> For the 1st point we just need to collect the statistics during the
> delete, but we do not have to actually persist the data.
I would be wary of creating ORC/Parquet files
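For context, Iceberg v2 position delete files identify deleted rows by `file_path` and `pos` (the 0-based row position in that data file), and the `row` column is optional, which is why positions alone are enough to apply them. The Python sketch below shows that application step on in-memory data; the file paths, row dicts, and helper names are hypothetical, and a real reader would stream delete files and data files rather than hold lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def load_position_deletes(deletes: Iterable[Tuple[str, int]]) -> Dict[str, Set[int]]:
    """Group (file_path, pos) delete records by the data file they target."""
    by_file: Dict[str, Set[int]] = defaultdict(set)
    for file_path, pos in deletes:
        by_file[file_path].add(pos)
    return by_file


def live_rows(file_path: str, rows: Iterable[dict], deleted: Dict[str, Set[int]]) -> Iterator[dict]:
    """Yield rows of one data file whose 0-based position is not marked deleted."""
    positions = deleted.get(file_path, set())
    for pos, row in enumerate(rows):
        if pos not in positions:
            yield row


deleted = load_position_deletes([
    ("s3://bucket/data/a.parquet", 1),
    ("s3://bucket/data/a.parquet", 3),
    ("s3://bucket/data/b.parquet", 0),
])
rows = [{"id": i} for i in range(5)]
print([r["id"] for r in live_rows("s3://bucket/data/a.parquet", rows, deleted)])
```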
Hi,
FWIW Trino already creates v2 tables by default.
I thought it was worth sharing for context.
Best
PF
On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang wrote:
> Hi all,
>
> We've maintained a forked Iceberg internally and all our use cases involve
> v2 tables with row-level updates and deletes.
Hi Ajantha,
this is a very interesting document, thank you for your work on this!
I've added a few comments there.
I have one high-level design comment, so I thought it would be nicer to
everyone if I re-posted it here:
is "partition" the right level at which to keep the stats?
> We do this in Hive, but
Hi,
https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-core/1.1.0/
is already published (on Nov 22, so before voting was concluded).
Is it "the" release, or will a new tag be pushed to Maven Central?
best,
PF
On Mon, Nov 28, 2022 at 9:18 AM Gabor Kaszab wrote:
>
> Hey All,
>
Hi Peter
Thanks for bringing this issue up and thanks for working on it as well.
I haven't experienced these problems first-hand, so I don't have an opinion yet
on what the solution should or shouldn't be.
With this new approach, is there any plan to address situations where more
than one application