Re: Long text and complex data types support

Mauricio Aristizabal Wed, 11 Sep 2019 17:05:30 -0700

It would be good if Kudu supported storing and querying nested data the way
Impala does in HDFS/Parquet, so that querying nested data would be (at least
mostly) transparent across either storage engine. We recently had a use for
this (basically storing N order-item details along with each order record)
but decided against it because we know we'll be moving that table from
Parquet to Kudu soon.
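To make the two layouts concrete, here is a minimal Python sketch. It is not a Kudu or Impala API; the `Order`/`OrderItem` types and their fields are hypothetical. It contrasts a flattened table (one row per order item, with the order columns repeated) against a nested layout (one row per order, holding its items as a list). The genomics case quoted below has the same trade-off, where the flat layout needs N = S * V rows.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record types for illustration only; not a Kudu or Impala schema.
@dataclass
class OrderItem:
    sku: str
    quantity: int

@dataclass
class Order:
    order_id: int
    customer: str
    items: List[OrderItem] = field(default_factory=list)

def flatten(orders: List[Order]) -> List[Tuple[int, str, str, int]]:
    """Denormalized layout: one row per (order, item), order columns repeated."""
    return [
        (o.order_id, o.customer, item.sku, item.quantity)
        for o in orders
        for item in o.items
    ]

orders = [
    Order(1, "acme", [OrderItem("A", 2), OrderItem("B", 1), OrderItem("C", 5)]),
    Order(2, "globex", [OrderItem("A", 1)]),
]

flat_rows = flatten(orders)
print(len(flat_rows))  # flattened layout: 4 rows (3 + 1 items)
print(len(orders))     # nested layout: 2 rows, one per order
```

In this sketch, the flattened form also silently drops orders that have no items unless a placeholder row is added for them, whereas the nested form keeps them naturally.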


On Wed, Sep 11, 2019 at 1:49 PM Dmitry Degrave <dmee...@gmail.com> wrote:

> Hi Grant,
>
> An example from genomics: our current schema [1] is simple (denormalized
> for performance) but requires N = S * V rows in the genotype table, where
> S is the number of samples and V is the average number of variants per
> sample. For WGS a typical value is V = 5*10^6, and we deal with tens of
> thousands of samples, so N is at least 5*10^10 rows. A more efficient
> schema would keep all variants of a sample in a single row, which is
> currently impossible.
>
> Supporting nested data structures, similar to those implemented in
> ClickHouse [2], would also be useful.
>
> Supporting serialized objects (e.g. Java hashtables, with the ability to
> select only rows whose hashtables contain specific keys) would make Kudu
> super-special ;)
>
> ~dmitry
>
> [1] https://gist.github.com/dnafault/e55ea987c55d2960c738d94e4811d043
> [2] https://clickhouse-docs.readthedocs.io/en/latest/data_types/nested_data_structures/nested.html
>
> On Mon, 9 Sep 2019 at 08:18, Grant Henke <ghe...@cloudera.com> wrote:
> >
> > Hi Boris,
> >
> > Can you describe in more detail what exactly you are looking for in a
> > long text type? Is there another database with an equivalent type we
> > could use as a reference?
> >
> > I have started looking at complex type support and plan to put up a
> > design document soon. There are no estimates yet of when it would be
> > complete or how much work is required. Do you have any sample schemas
> > with complex types you could send me to help inform the design and
> > trade-offs?
> >
> > Thank you,
> > Grant
> >
> > On Sat, Sep 7, 2019 at 11:43 AM Boris Tyukin <bo...@boristyukin.com> wrote:
> >>
> >> Hi guys,
> >>
> >> Are there any plans to support a long text type in Kudu? We would love
> >> to use Kudu with other projects, but long text data is pretty common in
> >> the healthcare industry, so we have to use Hive/Impala/HDFS instead. That
> >> is quite painful since we cannot do in-place updates and deletes.
> >>
> >> Same question about complex types (arrays, maps, etc.).
> >>
> >> Thanks
> >
> >
> >
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


-- 
Mauricio Aristizabal
Architect - Data Pipeline
mauri...@impact.com | 323 309 4260
https://impact.com
<https://www.linkedin.com/company/impact-martech/>
<https://www.facebook.com/ImpactParTech/>
<https://twitter.com/impactpartech>
<https://www.youtube.com/c/impactmartech>
