Hey all,
I'm looking to add readiness checks to coordinator and overlord nodes to
improve automatic deployment of clusters. More specifically, I'm planning
to add an endpoint to the coordinator and overlord that returns 200 OK when
the node is ready to process lookup/tiering
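(A minimal sketch of what such a readiness endpoint could look like, assuming
a Jersey/JAX-RS resource; the path and the readiness condition below are
placeholders, not the proposed API.)

import java.util.function.BooleanSupplier;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

// Illustrative resource: 200 when the node considers itself ready, 503
// otherwise so orchestrators (e.g. Kubernetes readiness probes) keep polling.
@Path("/status/ready")
public class ReadinessResource
{
  private final BooleanSupplier readinessCheck;

  public ReadinessResource(BooleanSupplier readinessCheck)
  {
    // e.g. "leadership acquired and metadata loaded" for the coordinator
    this.readinessCheck = readinessCheck;
  }

  @GET
  public Response isReady()
  {
    return readinessCheck.getAsBoolean()
           ? Response.ok().build()
           : Response.status(Response.Status.SERVICE_UNAVAILABLE).build();
  }
}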
data from deep storage instead of sending an
HTTP request to the Druid cluster and waiting for the response.
On Sat, Feb 9, 2019 at 5:02 PM Rajiv Mordani
wrote:
> Thanks Julian,
> See some questions in-line:
>
> On 2/6/19, 3:01 PM, "Julian Jaffe" wrote:
>
I think this question is going the other way (e.g. how to read data into
Spark, as opposed to into Druid). For that, the quickest and dirtiest
approach is probably to use Spark's json support to parse a Druid response.
You may also be able to repurpose some code from
On Fri, 5 Apr 2019 at 23:23, Julian Jaffe
> wrote:
>
> > Hey all,
> >
> > I'm looking to add readiness checks to coordinator and overlord nodes to
> > improve automatic deployment of clusters. More specifically, I'm planning
> > to add an endpoint to the
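(To make the "quick and dirty" approach above concrete, here is a sketch that
POSTs a native scan query to a broker and hands the JSON response body to
Spark; the host, datasource, and query are made up, and a real reader would
stream or page through results rather than collect a single response string.)

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DruidJsonRead
{
  public static void main(String[] args) throws Exception
  {
    // Placeholder native query against a placeholder datasource
    String query = "{\"queryType\":\"scan\",\"dataSource\":\"my_datasource\","
                   + "\"intervals\":[\"2019-01-01/2019-01-02\"],"
                   + "\"resultFormat\":\"compactedList\"}";

    // POST to a broker (host/port are placeholders)
    HttpResponse<String> resp = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create("http://broker:8082/druid/v2"))
                   .header("Content-Type", "application/json")
                   .POST(HttpRequest.BodyPublishers.ofString(query))
                   .build(),
        HttpResponse.BodyHandlers.ofString());

    SparkSession spark =
        SparkSession.builder().appName("druid-json-read").master("local[*]").getOrCreate();

    // Let Spark's JSON support infer a schema from the response body
    Dataset<Row> df = spark.read().json(
        spark.createDataset(Collections.singletonList(resp.body()), Encoders.STRING()));
    df.show();
  }
}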
Hey all,
Druid currently executes UNION ALL queries sequentially (
https://github.com/apache/incubator-druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidUnionRel.java#L98).
There's a comment in that method that restates this, but does not explain
why. Is there a reason why
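(For concreteness, a sketch of the alternative implied by the question:
dispatch the UNION ALL branches concurrently instead of one at a time. This
uses a bare ExecutorService and ignores Druid's actual Sequence/QueryRunner
machinery, so it shows the shape, not a patch.)

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelUnion
{
  // Run every branch query concurrently; concatenate results in branch order.
  public static <Q, R> List<R> runAll(List<Q> branchQueries, Function<Q, R> runOne)
  {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, branchQueries.size()));
    try {
      List<CompletableFuture<R>> futures = branchQueries.stream()
          .map(q -> CompletableFuture.supplyAsync(() -> runOne.apply(q), pool))
          .collect(Collectors.toList());
      // join() preserves branch order; UNION ALL semantics don't require an
      // order, but keeping it makes the change behavior-compatible.
      return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
    finally {
      pool.shutdown();
    }
  }
}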
I've submitted https://github.com/apache/druid/pull/9454 today to add an
`OnHeapMemorySegmentWriteOutMediumFactory`.
On Mon, Mar 2, 2020 at 8:57 AM Oğuzhan Mangır
wrote:
>
>
> On 2020/02/26 13:26:13, itai yaffe wrote:
> > Hey,
> > Per Gian's proposal, and following this thread in Druid user
I think for whatever approach we take, we'll need to expose an
OnHeapMemorySegmentWriteOutMediumFactory for OnHeapMemorySegmentWriteOutMedium
that parallels OffHeapMemorySegmentWriteOutMediumFactory. Although off-heap
index building will be faster, it's very difficult to get most schedulers
to
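(Roughly the shape of that factory; see PR #9454 for the real version. This
sketch assumes the SegmentWriteOutMediumFactory interface is a single
makeSegmentWriteOutMedium(File) method mirroring the off-heap factory, so
treat the signatures as approximate.)

import java.io.File;
import java.io.IOException;

import org.apache.druid.segment.writeout.OnHeapMemorySegmentWriteOutMedium;
import org.apache.druid.segment.writeout.SegmentWriteOutMedium;
import org.apache.druid.segment.writeout.SegmentWriteOutMediumFactory;

public class OnHeapMemorySegmentWriteOutMediumFactory implements SegmentWriteOutMediumFactory
{
  private static final OnHeapMemorySegmentWriteOutMediumFactory INSTANCE =
      new OnHeapMemorySegmentWriteOutMediumFactory();

  public static OnHeapMemorySegmentWriteOutMediumFactory instance()
  {
    return INSTANCE;
  }

  private OnHeapMemorySegmentWriteOutMediumFactory() {}

  @Override
  public SegmentWriteOutMedium makeSegmentWriteOutMedium(File outDir) throws IOException
  {
    // The on-heap medium buffers in heap ByteBuffers, so outDir is unused here.
    return new OnHeapMemorySegmentWriteOutMedium();
  }
}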
dataQueries, to allow
> batch-oriented queries to work against the deep storage :)
>
> Anyway, as I said, I think we can focus on write capabilities for now, and
> worry about read capabilities later (if that's OK).
>
> On 2020/03/05 18:29:09, Julian Jaffe
> wrote:
> >
The spark-druid-connector you shared brings up another design decision we
should probably talk through. That connector effectively wraps an HTTP
query client with Spark plumbing. An alternative approach (and the one I
ended up building due to our business requirements) is to build a reader
that
op
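(A rough sketch of the second approach's shape: one Spark partition per
segment, each read straight from deep storage without querying the cluster.
loadSegmentRows is a hypothetical placeholder for the segment download/decode
work a real connector would implement, and the paths are invented.)

import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class DirectSegmentRead
{
  // Hypothetical stand-in: decode one segment file into rows (the hard part,
  // e.g. via druid-processing classes on the executor classpath).
  static Iterator<String> loadSegmentRows(String deepStoragePath)
  {
    return Collections.emptyIterator(); // placeholder
  }

  public static void main(String[] args)
  {
    SparkSession spark =
        SparkSession.builder().appName("direct-segment-read").master("local[*]").getOrCreate();
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

    // In a real reader these would come from the metadata store, not a list
    List<String> segmentPaths = Arrays.asList(
        "s3://bucket/ds/2019-01-01/0/index.zip",
        "s3://bucket/ds/2019-01-02/0/index.zip");

    // One partition per segment; executors never talk to Druid itself
    JavaRDD<String> rows = jsc.parallelize(segmentPaths, segmentPaths.size())
                              .flatMap(DirectSegmentRead::loadSegmentRows);
    System.out.println(rows.count());
  }
}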
> and Kafka?
>
> Julian
>
> > On Mar 9, 2020, at 4:48 PM, Julian Jaffe
> wrote:
> >
> > Hey all,
> >
> > I recently wrote a proposal <https://github.com/apache/druid/issues/9463>
> > to add namespacing to Druid segments to a
t of the requirements please include querying / reading from
> Spark
> > > as well. This is a high priority for us.
> > >
> > > - Rajiv
> > >
> > > On 3/10/20, 1:26 AM, "Oguzhan Mangir"
> > > wrote:
> > >
> > > What we wil
Hey all,
There have been ongoing discussions on this list and in Slack about
improving interoperability between Spark and Druid by creating Spark
connectors that can read from and write to Druid clusters. As these
discussions have begun to converge on a potential solution, I've opened a
GitHub proposal <https://github.com/apache/druid/issues/9780>. I'll send a
separate email to the dev list in the morning as well.
On Thu, Apr 2, 2020 at 11:04 AM Julian Jaffe wrote:
> I had a few hours last night, so I worked up a rough cut of a Spark reader
> <htt
Bimonthly ping for reviews :) I’m perfectly willing to hop on Slack or a video
call to walk through the code and design as well if potential reviewers would
find that helpful.
> On Apr 14, 2021, at 10:06 AM, Julian Jaffe wrote:
>
>
> Hey Samarth,
>
> I’m overjoyed to
Hey Jagannatha,
Please see the Druid Schema Design Tips[1] for more information, but for SCD2
usually the easiest and most performant solution is to denormalize your data.
If you store the current value at ingestion time, when your dimensions change,
new rows will be written with the new
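(A hypothetical illustration: if user U1's plan dimension is "basic" in
January and becomes "pro" in February, the denormalized table simply holds
January rows with plan=basic and February rows with plan=pro, so queries see
the value as of each event without any join against a dimension table.)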
Hey Benedict,
Have you tried creating indices on your segments table? I’ve managed Druid
clusters with orders of magnitude more segments without this issue by indexing
key filter columns. (The coordinator is still a painful bottleneck, just not
due to query times to the metadata server.)
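(Illustrative only: something along these lines, assuming MySQL and Druid's
default druid_segments table name and columns; which columns are worth
indexing depends on the coordinator's actual filter patterns.)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IndexSegmentsTable
{
  public static void main(String[] args) throws Exception
  {
    // Connection details are placeholders for your metadata store
    try (Connection conn = DriverManager.getConnection(
             "jdbc:mysql://metadata-host:3306/druid", "druid", "password");
         Statement stmt = conn.createStatement()) {
      // Index the columns the coordinator filters on when polling segments
      stmt.execute("CREATE INDEX idx_segments_datasource_used "
                   + "ON druid_segments (dataSource, used)");
    }
  }
}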
Hey Gian,
I’d be overjoyed to be proven wrong! For what it’s worth, my pessimism was not
driven by a lack of faith in the Druid community or the Druid committers but by
the fact that these connectors may be an awkward fit in the Druid code base
without more buy-in from the community writ
Hey Druids,
Last April, there was some discussion on this mailing list, Slack, and GitHub
around building Spark-Druid connectors. After working up a rough cut, the
effort was dormant until a few weeks ago when I returned to it. I’ve opened a
pull request for the connectors, but I don’t
our Spark-Druid connector PRs. Ingesting data
> into Druid using Spark SQL and DataFrame API is something we are very keen
> to onboard.
> Could you point me to them or alternatively add me as a reviewer?
>
> - Samarth
>
>> On Tue, Apr 13, 2021 at 11:51 PM Julian Jaffe
>
Thu, Feb 25, 2021 at 12:03 AM Julian Jaffe
>> wrote:
>>
>> Hey Gian,
>>
>> I’d be overjoyed to be proven wrong! For what it’s worth, my pessimism was
>> not driven by a lack of faith in the Druid community or the Druid
>> committers but by the fact that
For Spark support, the connector I wrote remains functional but I haven’t
updated the PR for six months or so since it didn’t seem like there was an
appetite for review. If that’s changing I could migrate back some more recent
changes to the OSS PR. Even with an up-to-date patch though I see
Hey all,
There was talk earlier this year about resurrecting the effort to add direct
Spark readers and writers to Druid. Rather than repeat the previous attempt and
parachute in with updated connectors, I’d like to start by building a little
more consensus around what the Druid dev community