On Wed, Jun 21, 2023 at 12:25 AM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > > > On 6/20/23 16:56, Joel Jacobson wrote: > > On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote: > >> On 6/20/23 12:59, Joel Jacobson wrote: > >>> On Mon, Jun 19, 2023, at 02:00, jian he wrote: > >>>> select hashset_contains('{1,2}'::int4hashset,NULL::int); > >>>> should return null? > >>> > >>> I agree, it should. > >>> > >>> I've now changed all functions except int4hashset() (the init function) > >>> and the aggregate functions to be STRICT. > >> > >> I don't think this is correct / consistent with what we do elsewhere. > >> IMHO it's perfectly fine to have a hashset containing a NULL value, > > > > The reference to consistency with what we do elsewhere might not be entirely > > applicable in this context, since the set feature we're designing is a new > > beast > > in the SQL landscape. > > > > I don't see how it's new, considering relational algebra is pretty much > based on (multi)sets, and the three-valued logic with NULL values is > pretty well established part of that. > > > I think adhering to the theoretical purity of sets by excluding NULLs > > aligns us > > with set theory, simplifies our code, and parallels set implementations in > > other > > languages. > > > > I don't see how that would be more theoretically pure, really. The > three-valued logic is a well established part of relational algebra, so > not respecting that is more a violation of the purity. > > > I think we have an opportunity here to innovate and potentially influence a > > future set concept in the SQL standard. > > > > I doubt this going to influence what the SQL standard says, especially > because it already defined the behavior for MULTISETS (of which the sets > are a special case, pretty much). So this has 0% chance of success. > > > However, I see how one could argue against this reasoning, on the basis that > > PostgreSQL users might be more familiar with and expect NULLs can exist > > everywhere in all data structures. > > > > Right, it's what we already do for similar cases, and if you have NULLS > in the data, you better be aware of the behavior. Granted, some people > are surprised by three-valued logic, but using a different behavior for > some new features would just increase the confusion. > > > A different perspective is to look at what use-cases we can foresee. > > > > I've been trying hard, but I can't find compelling use-cases where a NULL > > element > > in a set would offer a more natural SQL query than handling NULLs within > > SQL and > > keeping the set NULL-free. > > > > IMO if you have NULL values in the data, you better be aware of it and > handle the case accordingly (e.g. by filtering them out when building > the set). If you don't have NULLs in the data, there's no issue. > > And in the graph case, I don't see why you'd have any NULLs, considering > we're dealing with adjacent nodes, and if there's adjacent node, it's ID > is not NULL. > > > Does anyone else have a strong realistic example where including NULLs in > > the > > set would simplify the SQL query? > > > > I'm sure there are cases where you have NULLs in the dat aand need to > filter them out, but that's just natural consequence of having NULLs. If > you have them you better know what NULLs do ... > > It's too early to make any strong statements, but it's going to be hard > to convince me we should handle NULLs differently from what we already > do elsewhere. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
> http://www.wiscorp.com/sqlmultisets.zip > Conceptually, a multiset is an unordered collection of elements, all of the > same type, with dupli- > cates permitted. Unlike arrays, a multiset is an unbounded collection, with > no declared maximum > cardinality. This does not mean that the user can insert elements in a > multiset without limit, just > that the standard does not mandate that there should be a limit. This is > analogous to tables, which > have no declared maximum number of rows. Postgres arrays don't have size limits. unordered means no need to use subscript? So multiset is a more limited array type? null is fine. but personally I feel like so far the hashset main feature is the quickly aggregate unique value using hashset. I found using hashset count distinct (non null values) is quite faster.