Yes. My understanding is that a data lakehouse is a "format/engine" that improves processing over Hadoop or an object store by replacing plain data files (Avro/Parquet) with a new table format. The lakehouse formats I am looking at are Delta, Apache Hudi, and Apache Iceberg. As I understand it, they provide a combination of data files + transaction log + indexes, and I believe compaction and indexing are handled by a set of background processes. The idea is to get a "data warehouse" at the price of a "data lake" (hence the term "lakehouse").
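To make the "data file + transaction log" idea concrete, here is a toy sketch in Python. It is only an illustration of the general pattern as I understand it, not the on-disk format of Delta, Hudi, or Iceberg; the class `ToyTable` and all of its method names are made up for this example, and in-memory lists stand in for Parquet files.

```python
import json


class ToyTable:
    """Toy table: immutable data files plus an append-only commit log."""

    def __init__(self):
        self.log = []    # one JSON string per commit (the "transaction log")
        self.files = {}  # file name -> list of rows (stand-in for Parquet files)

    def commit(self, add=None, remove=()):
        """Record one commit that adds new data files and/or retires old ones."""
        add = add or {}
        self.files.update(add)
        self.log.append(json.dumps({"add": sorted(add), "remove": sorted(remove)}))
        return len(self.log) - 1  # version number of this commit

    def live_files(self, version=None):
        """Replay the log up to `version` to find the files that make up the table."""
        last = len(self.log) - 1 if version is None else version
        live = set()
        for entry in self.log[: last + 1]:
            action = json.loads(entry)
            live |= set(action["add"])
            live -= set(action["remove"])
        return sorted(live)

    def scan(self, version=None):
        """Read all rows visible at `version` (older versions stay readable)."""
        return [row for name in self.live_files(version) for row in self.files[name]]

    def compact(self, target):
        """Compaction sketch: rewrite all live files into one new file.

        Assumes `target` is a new file name, not one that is already live.
        """
        old = self.live_files()
        return self.commit(add={target: self.scan()}, remove=old)
```

A reader only ever looks at the files the log says are live, which is why a background compaction can rewrite small files into larger ones without disturbing queries against an earlier version.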
I was wondering whether putting Apache Ignite on top of the data lakehouse could further improve its performance, and whether anyone has already tried this and is running such a configuration successfully.

On Mon, Sep 26, 2022 at 4:40 AM Stephen Darlington <[email protected]> wrote:

> Similar. The original question was about using the Cache Store (with
> read-through). The architecture described in the Hadoop Acceleration page
> is probably better for most purposes.
>
> On 25 Sep 2022, at 23:25, John Smith <[email protected]> wrote:
>
> Something like this?
>
> https://ignite.apache.org/use-cases/hadoop-acceleration.html
>
> On Thu., Sep. 22, 2022, 3:44 a.m. Stephen Darlington,
> <[email protected]> wrote:
>
>> I don’t know of anyone doing this, however it looks like it should be
>> possible.
>>
>> According to a quick skim of the docs, to read/write to Hudi you need
>> Flink or Spark. To use the Cache Store (read/write-through) you’d need to
>> embed one of those inside Ignite, so plenty of opportunity for “dependency
>> hell.” I do know of one project where they embedded Spark.
>>
>> On 22 Sep 2022, at 03:58, Tecno Brain <[email protected]> wrote:
>>
>> I have heard of a tool called Alluxio used between Hudi and Spark/Presto.
>> (https://www.alluxio.io/blog/building-high-performance-data-lake-using-apache-hudi-and-alluxio-at-t3go/)
>> I was wondering if Apache Ignite could serve the same purpose, allowing
>> queries to be processed faster.
>>
>> On Thu, Sep 15, 2022 at 10:29 AM Jeremy McMillan
>> <[email protected]> wrote:
>>
>>> I just read this, about Hudi, and I can't see a use case for putting
>>> Hudi behind an Ignite write-through cache.
>>>
>>> https://www.xenonstack.com/insights/what-is-hudi
>>>
>>> Hudi seems to be a write accelerator for Spark on HDFS, primarily.
>>>
>>> What would the expected outcome be if we assume the magic integration
>>> was present and working as you intend? What's the difference between that
>>> and not using Ignite with Hudi?
>>>
>>> On Wed, Sep 14, 2022, 22:50 Tecno Brain <[email protected]> wrote:
>>>
>>>> In particular I am looking if anyone has used Apache Ignite as a
>>>> write-through cache to Hudi.
>>>> Does that make sense?
>>>>
>>>> On Wed, Sep 14, 2022 at 10:50 PM Tecno Brain
>>>> <[email protected]> wrote:
>>>>
>>>>> I was wondering if anybody has used Hudi + Ignite?
>>>>> Any references to articles, conferences are greatly appreciated.
>>>>>
>>>>> Thanks
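
For anyone following the thread, the read-through / write-through pattern discussed above can be sketched in a few lines of Python. This is a generic illustration only: it is not Ignite's Cache Store API, the class `ReadWriteThroughCache` and its `load`/`write` callbacks are hypothetical names, and a plain dict stands in for the Hudi table (which in practice would be reached through Spark or Flink).

```python
class ReadWriteThroughCache:
    """In-memory cache in front of a slower backing store."""

    def __init__(self, load, write):
        self._load = load    # callable: key -> value, reads the backing store
        self._write = write  # callable: (key, value) -> None, writes the backing store
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Read-through: on a miss, fetch from the backing store and keep a copy."""
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = self._load(key)
        self._data[key] = value
        return value

    def put(self, key, value):
        """Write-through: update the backing store first, then the cache."""
        self._write(key, value)
        self._data[key] = value


# Usage with a dict standing in for the lakehouse table:
backing_store = {"a": 1}
cache = ReadWriteThroughCache(backing_store.get, backing_store.__setitem__)
cache.get("a")     # miss: loaded from the backing store
cache.get("a")     # hit: served from memory
cache.put("b", 2)  # written to the backing store and cached
```

The speed-up only comes from repeated reads of the same keys being served from memory; every write still pays the full cost of the backing store, which is the point Jeremy raises above.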
