Github user tigerquoll commented on the issue:
https://github.com/apache/spark/pull/21306
Sure,
I am looking at this from the point of view of supporting Kudu. Check out
https://kudu.apache.org/docs/schema_design.html#partitioning for some of the
details, in particular
https://kudu.apache.org/2016/08/23/new-range-partitioning-features.html.
As Kudu is a column store, each column also has attributes associated with
it, such as encoding and compression codecs.
I really think that partitions should be considered part of the table
schema. They have an existence above and beyond the definition of a filter
that matches a record: adding an empty partition changes the state of many
underlying systems. Many systems that support partitions also have APIs for
adding and removing partition definitions, and some require partition
information to be specified during table creation. Those systems that support
changing partitions after creation usually have specific operations for
adding and removing partitions.
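To make that concrete, here is a minimal sketch (entirely hypothetical names, not the Spark or Kudu API) of what it means for range partition definitions to live in the table schema rather than be mere filters: partitions are declared state with explicit add/drop operations, and those operations succeed or fail independently of whether any rows match them:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RangePartition:
    # A declared range partition: [lower, upper) on one column.
    column: str
    lower: int   # inclusive lower bound
    upper: int   # exclusive upper bound

@dataclass
class TableSchema:
    columns: dict                              # column name -> type name
    partitions: list = field(default_factory=list)

    def add_partition(self, p: RangePartition) -> None:
        # Adding an (initially empty) partition mutates table state,
        # even though no record matches it yet.
        if any(p.column == q.column and p.lower < q.upper and q.lower < p.upper
               for q in self.partitions):
            raise ValueError("overlapping range partition")
        self.partitions.append(p)

    def drop_partition(self, p: RangePartition) -> None:
        self.partitions.remove(p)

# Partition bounds declared up front at "table creation"...
schema = TableSchema(columns={"id": "int64", "ts": "int64"})
schema.add_partition(RangePartition("ts", 0, 100))
# ...and added or removed later through dedicated operations.
schema.add_partition(RangePartition("ts", 100, 200))
```

The point of the sketch is only that partition definitions here are catalog metadata with their own lifecycle, which is the behavior Kudu's range partitioning exposes and which a pure filter-expression model cannot represent.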
Dale,
________________________________
From: Ryan Blue <[email protected]>
Sent: Tuesday, 4 September 2018 4:20 PM
To: apache/spark
Cc: tigerquoll; Comment
Subject: Re: [apache/spark] [SPARK-24252][SQL] Add catalog registration and
table catalog APIs. (#21306)
Can we support column range partition predicates please?
This has an "apply" transform for passing other functions directly through,
so that may help if you have additional transforms that aren't committed to
Spark yet.
As for range partitioning, can you be more specific about what you mean?
What does that transform function look like? Part of the rationale for the
existing proposal is that these are all widely used and understood. I want to
make sure that as we expand the set of validated transforms, we aren't
introducing confusion.
Also, could you share the use case you intend for this? It would be great
to hear about uses other than just Iceberg tables.