Re: A list of issues for new committers

2019-10-09 Thread Gian Merlino
That is definitely a good property for starter issues (not needing a production cluster to validate). On Fri, Oct 4, 2019 at 12:01 AM Roman Leventov wrote: > I don't use "Contributions Welcome" exclusively for starter issues. I do > _not_ put this label though on issues that are impossible or

Re: Looking out of the Druid's development bubble at modern Java testing practices

2019-10-09 Thread Gian Merlino
I like the "Complete Vertical Slide" recommendation. It goes against the wisdom of having focused unit tests, but I think in my experience, the tests that shake out the most bugs (and are most robust to refactoring) have been ones that wrap together a lot of layers. One thing I didn't see in the

Re: A list of issues for new committers

2019-10-09 Thread Gian Merlino
Vadim, the idea of removing the Difficulty labels and repurposing "Easy" for intro issues sounds good to me. It sounds like "Starter" as you envision it is a subset of "Contributions Welcome" as Roman envisions it. I wonder if there is some way we can align these better. It looks like _most_ of

Graduation 

2019-12-20 Thread Gian Merlino
Hey Druids, It is official: Druid has graduated to a top level project! Now, we need to conduct various post-graduation tasks. The first is to raise an infra ticket to migrate the appropriate resources. I started it off here: https://issues.apache.org/jira/browse/INFRA-19609. Please take a look

Travis emails being send to dev list

2020-03-04 Thread Gian Merlino
Recently, Travis CI emails started being sent from bui...@travis-ci.org to dev@druid.apache.org. Did someone change something recently to make this happen? Also, do people enjoy that they show up here? I'm asking because currently they end up in a spam moderation queue and need to be manually

Re: Travis emails being send to dev list

2020-03-05 Thread Gian Merlino
ss (but keep the > notification for success to failure). > > Thanks, > Chi > > > On Mar 4, 2020, at 11:47 AM, Gian Merlino wrote: > > > > Recently, Travis CI emails started being sent from bui...@travis-ci.org > to > > dev@druid.apache.org. Did someone

Re: ByteBuffer / Memory / Unsafe et al

2020-02-05 Thread Gian Merlino
This does not exclude or mean that we should not use the Memory API for other stuff > like sketches et al.; in fact, I think even for a project like Sketches it > makes more sense to move to the newer API offered by the JDK rather than do it > yourself. > > > On Tue, Feb 4, 2020 at 10:12 P

Re: ByteBuffer / Memory / Unsafe et al

2020-02-05 Thread Gian Merlino
straction. > > On Wed, Feb 5, 2020 at 9:43 AM Gian Merlino wrote: > > > The thing that worries me about JEP 370 is that if historical Java user > > migration patterns hold up, we will need to support Java 11 for a while > > (probably another 2–3 years), and we would therefore nee

Re: ByteBuffer / Memory / Unsafe et al

2020-02-06 Thread Gian Merlino
later, once we drop support for Java pre-14. Separately, I think if we do build an abstraction layer here, we need to make sure the performance overhead is zero — it's important that the jvm be able to inline the underlying calls. > @Gian Merlino I think i am not 100% sure about the scope

Re: ByteBuffer / Memory / Unsafe et al

2020-02-06 Thread Gian Merlino
ar, and is workable, we will be strongly > motivated to replace our current use of Unsafe inside Memory with the newer > API, and all of that could be behind the current Memory API. > > > > > On 2020/02/06 20:33:18, Gian Merlino wrote: > > By the way, I just did a q

Re: mocking frameworks for tests in druid

2020-02-04 Thread Gian Merlino
I've never really liked EasyMock. I agree that its design tends to make test code too tightly coupled with the specific implementation being tested. > What would it take for us to try another framework like Mockito? IMO, for me all it'd take is a PR changing one of our EasyMock tests to a

ByteBuffer / Memory / Unsafe et al

2020-02-04 Thread Gian Merlino
Hey Druids, There has generally been a lot of talk about moving away from ByteBuffer and towards the DataSketches Memory package ( https://datasketches.apache.org/docs/Memory/MemoryPackage.html) or even using Unsafe directly. Much of that discussion happened on

Re: draft ASF Board Report Feb 2020

2020-02-16 Thread Gian Merlino
Thanks for taking a look, Julian. I added this to the agenda via Whimsy Friday and fixed the spelling error. On Mon, Feb 17, 2020 at 11:04 AM Julian Hyde wrote: > Looks good. > > Maybe mention that we are working with Sally on a press release to > announce graduation? > > Spelling: New Dehli

Re: draft ASF Board Report Feb 2020

2020-02-14 Thread Gian Merlino
Thanks Clint! I could not have written it better myself. I just added this to the Board agenda for next week. On Fri, Feb 14, 2020 at 5:28 PM Clint Wylie wrote: > Hey all, > > I've put together our ASF board report for Feb 2020, and while I haven't > yet determined how to actually submit it, I

Re: Pull Requests need a review

2020-01-15 Thread Gian Merlino
Hey Serge, Thanks for the patches! I took a look at https://github.com/apache/druid/pull/8881 and posted a review. If anyone else could help review the other two, I'd be grateful. Gian On Mon, Jan 13, 2020 at 9:06 AM Serge Bespalov wrote: > Hello Druid developers. > I have following opened

Re: Publish staging docs for release earlier?

2020-01-15 Thread Gian Merlino
I love the idea. Even better if we can publish the docs from master somewhere (in addition to the current release branch). Both are useful to see. On Tue, Jan 14, 2020 at 7:14 PM Jonathan Wei wrote: > Hi all, > > We currently publish a staging version of the Druid docs for an upcoming >

Re: Druid Summit 2020: Call for speakers!

2020-01-14 Thread Gian Merlino
Gian Merlino wrote: > Hey Druids, > > I am excited to announce Druid Summit <https://druidsummit.org/>, an > event being held in the San Francisco Bay Area next April 13–15, 2020. The > entire Apache Druid community is welcome. > > It would be great to see a bunch of peopl

New committer: Chi Cao Minh

2020-01-21 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Chi Cao Minh (@ccaominh on GitHub) to become a committer and we are pleased to announce that he has accepted. Chi has done work in a variety of areas, including adding range partitioning to native batch ingestion, quality-of-life work on CI and dependency

January 2020 Druid report

2020-01-09 Thread Gian Merlino
Hey Druids, Now that we're a top level project we're required to report periodically to the board. I just sent the following report (our first one). I'm posting it here in case anyone has any feedback, and so anyone interested can read it. Next time we'll post a draft here before submitting the

New committer: Samarth Jain

2020-01-02 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Samarth Jain (@samarthjain on GitHub) to become a committer and we are pleased to announce that he has accepted. Samarth has contributed a variety of improvements to Druid over the past year and has also given back to the

Re: Empty Data Source in Druid

2019-12-23 Thread Gian Merlino
In Druid it's not possible to have a datasource without any segments. But it is possible, in theory, to have an empty datasource: you would need a single segment that has no rows (the important part is that the segment exists, not that it actually has rows in it). But there are two problems with

Re: Test naming in Druid

2019-12-23 Thread Gian Merlino
Suneet, Sometimes it's hard to understand how things would improve without an example. Could you point to a test file that you think would be improved by this change? Also, there are some test files that I would struggle to fit into this framework. It seems best suited to simple single-method

Druid Summit 2020: Call for speakers!

2019-12-24 Thread Gian Merlino
Hey Druids, I am excited to announce Druid Summit (https://druidsummit.org/), an event being held in the San Francisco Bay Area next April 13–15, 2020. The entire Apache Druid community is welcome. It would be great to see a bunch of people from the community giving talks about Druid. The call

Re: Nulls vs Optional

2019-12-30 Thread Gian Merlino
asant to work > with > > than the JDK's Optional. > > > > On Thu, Oct 10, 2019 at 5:46 PM Gian Merlino wrote: > > > >> For reference, a (brief) earlier conversation about this: > >> https://github.com/apache/incubator-druid/issues/4275, which links to > >

Re: Test naming in Druid

2019-12-30 Thread Gian Merlino
s/expects is a good idea (that would be enough info in my view). > > > > Since I don't think the proposed format applies universally, I would > prefer > > starting it off as a suggestion/best practice instead of as a hard > > requirement and seeing how that goes.

Re: Graduation 

2019-12-30 Thread Gian Merlino
of the incubation references (download > links and the ASF release process guide aren't updated): > https://github.com/apache/druid/pull/9108 > > On Sat, Dec 21, 2019 at 4:10 PM Vadim Ogievetsky > wrote: > > > Huzzah! > > > > Thank you for all the hard work Gian. > >

Re: 8399 Migrating Guava to Caffeine

2020-01-05 Thread Gian Merlino
Hey JJ, I think your idea of adding a new option and deprecating "guava" is a good way forward. Gian On Fri, Dec 27, 2019 at 7:50 AM JJ Meyer wrote: > Hello all, > > I'm planning on contributing for the first time. I'm working on > https://github.com/apache/druid/issues/8399. No issues seem

Re: 8399 Migrating Guava to Caffeine

2020-01-06 Thread Gian Merlino
in this, Caffeine's concurrency is practically > "elastic" and doesn't demand concurrencyLevel. > > On Mon, 6 Jan 2020 at 01:13, Gian Merlino wrote: > > > Hey JJ, > > > > I think your idea of adding a new option and deprecating "guava" is a > good

New committer: Alexander Saydakov

2020-01-07 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Alexander Saydakov (@AlexanderSaydakov on GitHub) to become a committer and we are pleased to announce that he has accepted. Alexander has contributed extensively to Druid's DataSketches extension, and is also a committer and PPMC member on the Apache

Re: Draft April ASF Board Report

2020-04-09 Thread Gian Merlino
thly reports and wasn't > sure > > if it is necessary to go over it again. I think it is probably fine > without > > it since the information was included in previous reports? > > > > On Wed, Apr 8, 2020 at 1:44 PM Gian Merlino wrote: > > > > > It does matter! But, we mentio

Re: Draft April ASF Board Report

2020-04-08 Thread Gian Merlino
It does matter! But, we mentioned those in a previous report (our last one was just a month ago — so this one covers the last month). After this report they'll start being quarterly and covering 3 months. On Wed, Apr 8, 2020 at 1:18 PM itai yaffe wrote: > Hey, > Not sure it matters, but we

Re: Draft April ASF Board Report

2020-04-08 Thread Gian Merlino
Looks good to me. Thank you for drafting the report this month. On Tue, Apr 7, 2020 at 6:05 PM Clint Wylie wrote: > Hey all, > > I put together a draft for the quarterly ASF board report due tomorrow, > sorry for the short notice. Let me know if I missed anything or should make > any changes.

Re: Cross-platform discrepancies

2020-04-29 Thread Gian Merlino
is will still require some changes to our library to support memory >>> >> allocation like this, but it seems to be less challenging than the >>> current >>> >> Direct memory mode we have. >>> >> >>> >> There are some trade-offs here, as us

Re: Moving Average error

2020-04-29 Thread Gian Merlino
Hey Damiano, This is a contrib extension so you might get limited support for it here. That being said, at first, I suggest looking through the logs mentioned by the supervise errors (like /home/damiano/druid/0.17.1/var/sv/coordinator-overlord.log). IIRC you might need to disable the extension

Streaming updates and deletes

2020-04-30 Thread Gian Merlino
Hey Druids, Now that a join operator exists and is well on its way to being useful, I started thinking about some other pie in the sky ideas. In particular one that seems very useful is supporting updates and deletes. Of course, we support updates and deletes today, but only on a

New committer: Atul Mohan

2020-09-02 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Atul Mohan (@a2l007 on github) to become a committer and we are pleased to announce that he has accepted. Atul has been actively working on various parts of Druid, including indexing from SQL sources and result-level caching.

Code reviews, UX, and tests

2020-10-15 Thread Gian Merlino
Hey Druids, I am writing to you all to ask for your help. In particular, your help in ensuring that potential code contributions are reviewed in a timely fashion. Right now we have 72 open PRs, which, due to stalebot, were mostly opened pretty recently. That's a lot of people that want to

Re: [CRON] Broken: apache/druid#28120 (master - c72f96a)

2020-08-19 Thread Gian Merlino
There's a lot of these with messages like: > [ERROR] Failed to execute goal org.owasp:dependency-check-maven:5.3.2:check (default-cli) on project druid: Fatal exception(s) analyzing Druid: One or more exceptions occurred during analysis: > [ERROR] Unable to connect to the dependency-check

Re: Help in Configuring data retention

2020-09-21 Thread Gian Merlino
Hey Satish, Are you asking if Druid can write a log of load/drop rule changes to a Kafka topic? If so, no, it cannot. But it does write them to the metadata store, and perhaps you could use a tool to copy them from the metadata store into Kafka. On Mon, Sep 21, 2020 at 6:46 AM Satish Embadi <

Re: PRs awaiting review

2020-05-27 Thread Gian Merlino
Hey Samarth, It looks like the last PR has been merged already — great! I just wrote up a review for your first PR, about round robin data types. I haven't had a chance to check out the unknown-complex-types PR yet; apologies. I'm now subscribed to them all, though. On Fri, May 15, 2020 at

Re: Study On Rejected Refactorings

2020-08-12 Thread Gian Merlino
Hey Jevgenija, I recently filled out the survey — hope the response is helpful! On Tue, Aug 11, 2020 at 1:05 PM Jevgenija Pantiuchina < jevgenija.pantiuch...@usi.ch> wrote: > Dear contributors, > > As part of a research team from Università della Svizzera italiana > (Switzerland) and University

Re: SQL Support for Tuple Sketches

2020-08-12 Thread Gian Merlino
Hey Mithal, I'm not aware of anyone currently working on it, so you certainly are welcome to! On Mon, Aug 10, 2020 at 11:56 AM Mithal Kothari wrote: > Hi Druid Dev team, > > I just wanted to follow up with you'll and find out if there is a > plan/possibility to introduce sql support for tuple

Re: Druid not listed in Apache project list by category?

2020-07-31 Thread Gian Merlino
That's a good point. We must be missing some metadata. I'm not sure how this page works — does anyone else know? On Fri, Jul 31, 2020 at 11:49 AM Will Lauer wrote: > I was browsing the list of Apache projects today looking for something, and > while I was there, I noticed that Druid was

New committer: Lucas Capistrant

2020-07-07 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Lucas Capistrant (@capistrant on github) to become a committer and we are pleased to announce that he has accepted. Lucas has been active throughout the past year, contributing various enhancements and fixes. Congratulations

New committer: Suneet Saldanha

2020-07-07 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Suneet Saldanha (@suneet-s on github) to become a committer and we are pleased to announce that he has accepted. Suneet has contributed to areas including the new join functionality, documentation, and general code quality. He

New committer: Maytas Monsereenusorn

2020-07-07 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Maytas Monsereenusorn (@maytasm on github) to become a committer and we are pleased to announce that he has accepted. Maytas has contributed to various areas including automated testing improvements and bug fixes. He has also been

New committer: Maggie Brewster

2020-07-07 Thread Gian Merlino
Hey Druids, The Druid PMC has invited Maggie Brewster (@mcbrewster on github) to become a committer and we are pleased to announce that she has accepted. Maggie has made dozens of contributions to Druid, especially to the (relatively) new web console.

Druid + Presto?

2020-07-09 Thread Gian Merlino
Hey Druids, I was wondering, is anyone on this list using Druid + Presto together? If so, what does your architecture look like and which edition / flavor of Presto and Druid connector are you using? What's your experience been like? I'm asking since I'm starting to think about whether it makes

Re: Druid + Presto?

2020-07-09 Thread Gian Merlino
By the way, I see that the other Presto has a Druid connector too: https://prestodb.io/docs/current/connector/druid.html. From the docs it looks like it has different lineage and might even work differently. On Thu, Jul 9, 2020 at 12:22 PM Gian Merlino wrote: > I was thinking of exploring id

Re: Druid + Presto?

2020-07-09 Thread Gian Merlino
o be improved over the next few > releases. We are currently evaluating using the presto-druid connector in > our Tableau setup. It would be interesting to see what changes in Druid > would be needed to support that integration. > > Thanks, > Samarth > > On Thu, Jul 9, 2020 at 10

Re: Druid + Presto?

2020-07-10 Thread Gian Merlino
iao > > > > On Thu, Jul 9, 2020 at 12:40 PM Mainak Ghosh wrote: > > > > > Hello Gian, > > > > > > We are currently testing the (other) Presto Druid connector at our end. > > It > > > has aggregation push down support. Adding Zhenxiao to thi

Re: Any benchmarks for druid iingesting, querying (min, max, topn avg etc)

2020-07-10 Thread Gian Merlino
Hey Rajiv, I'm not aware of one for ingestion. For querying, two recent results using the Star Schema Benchmark are this paper comparing Druid, Hive, and Presto: https://www.researchgate.net/publication/333831332_Challenging_SQL-on-Hadoop_Performance_with_Apache_Druid, and this blog post

Re: Druid + Presto?

2020-07-10 Thread Gian Merlino
n push down support. Adding Zhenxiao to this thread since > he > > is the primary developer of the connector. He can provide the kind of > > details you are looking for. > > > > Thanks, > > Mainak > > > > > On Jul 9, 2020, at 12:25 PM, Gian Merlino wrot

Re: Druid + Presto?

2020-07-10 Thread Gian Merlino
the last year or so — does it work the same way in each one? I'm wondering how much work can be shared between these different efforts and perhaps between these efforts and the Druid project itself. On Thu, Jul 9, 2020 at 11:24 PM Gian Merlino wrote: > Hey Samarth, > > Thanks fo

Re: Feature lifecycle for Druid features

2020-06-15 Thread Gian Merlino
IMO the alpha / beta / GA terminology makes sense, and makes things clearer to users, which is good. Some thoughts on the specifics of your proposal: - You're suggesting we commit to a specific number of releases that a GA feature will be forward / backward compatible for. IMO, our current

Re: Druid 0.19.0

2020-06-15 Thread Gian Merlino
I commented on https://github.com/apache/druid/issues/10011; it looks like a SQL planner problem to me. I also logged a review of https://github.com/apache/druid/pull/10027. I don't think either of these needs to be a release blocker, though. 10027 in particular I am sure has been around for a

Re: Non JSON-query API clients

2020-11-13 Thread Gian Merlino
I'm not aware of plans to build out official clients for those other APIs; when I've written python programs to integrate with them I've usually called them through http directly. I'm not familiar with OpenAPI, but looking at it briefly, it seems like an interesting concept and a potential way to

Re: [E] Re: Removing Druid support for JDK 8 and adding support for JDK 11

2020-11-13 Thread Gian Merlino
Seconding (thirding?) the idea that keeping JDK 8 for integration with Hadoop is important. Druid's Hadoop integration is built against Hadoop 2.x and that version only supports JDK 8: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions. We shouldn't drop JDK 8 support until we

Re: Forbidding forced git push

2021-01-15 Thread Gian Merlino
Will this help for the (common) case where PR branches are in people's forks? On Fri, Jan 15, 2021 at 1:00 PM Jihoon Son wrote: > Hi all, > > The forced git push is usually used to make the commit history clean, which > I understand its importance. However, one of its downsides is, because it >

Re: Deprecate support for ZooKeeper 3.4.x

2021-01-19 Thread Gian Merlino
About time, I suppose. I replied to the issue on GitHub. I think the trickiest part is figuring out what migration will look like for users so we can write up some useful release notes. On Tue, Jan 19, 2021 at 5:43 PM Xavier Léauté wrote: > Hi everyone, I wrote up a short issue on deprecating

Re: Enabling dependabot in our github repository

2021-06-07 Thread Gian Merlino
There's been some extra discussion on this PR: https://github.com/apache/druid/pull/11079 I just +1'ed it, but I wanted to come back here to say that IMO, we should avoid getting in the habit of blindly applying these updates without testing. There have been lots of situations in the past where a

Re: A question about a potential bug in Druid Joins

2021-06-24 Thread Gian Merlino
_id = DIM.api_client_id > > So the “api_client_id” field is `long` type in both > “inline_data” and “inline_dimension_api_clients_1” datasources. However, > when doing a join, the makeLongProcessor method will be called, and > throw an “UnsupportedOperationException" because

Re: Enabling dependabot in our github repository

2021-06-08 Thread Gian Merlino
Here's a running list of PRs opened by the dependabot: https://github.com/apache/druid/pulls?q=is%3Apr+author%3Aapp%2Fdependabot On Mon, Jun 7, 2021 at 12:22 PM Gian Merlino wrote: > There's been some extra discussion on this PR: > https://github.com/apache/druid/pull/11079 > >

Re: Push-down of operations for SystemSchema tables

2021-05-19 Thread Gian Merlino
Hey Frank, These notes are really interesting. Thanks for writing them down. I agree that the three things you laid out are all important. With regard to SQL clauses from the web console, I did notice one recent change went in that changed the SQL clauses to only query sys.segments for columns

Re: FlattenSpec for Nested Data With Unknown Array Length

2021-05-20 Thread Gian Merlino
Hey Evan, Druid's data model doesn't currently have a good way of storing arrays of objects like this. And you're right that even though joins exist, to get peak performance you want to avoid them at query time. In similar situations I have stored data models like this as 3 tables (entries,

Re: Push-down of operations for SystemSchema tables

2021-05-19 Thread Gian Merlino
Hey Jason, It sounds like we have two different, but related goals: 1) Your goal is to improve the performance of system tables. 2) My goal with the branch Clint linked is to enable using Druid's native query engine for system tables, in order to achieve consistency in how SQL queries are

Re: Subject: [CVE-2021-26919] Authenticated users can execute arbitrary code from malicious MySQL database systems

2021-04-01 Thread Gian Merlino
I wanted to add a few more details about this advisory, in the hopes that it will be helpful to people that are upgrading. Here's a link to the relevant docs about the new properties: https://druid.apache.org/docs/latest/configuration/index.html#ingestion-security-configuration And the most

Re: SpringBoot +MyBatis +Apache Druid

2021-03-10 Thread Gian Merlino
Hey Shamriya, It would help to know some more about what kind of integration you're trying to do, and which kind of driver class isn't being recognized. On Wed, Mar 10, 2021 at 11:36 AM nandalapadu shamriyashaik < nshamr...@gmail.com> wrote: > Hi, > > I am new to Druid and struggling to

Re: Spark-Druid Connectors

2021-03-02 Thread Gian Merlino
Thank you! On Thu, Feb 25, 2021 at 12:03 AM Julian Jaffe wrote: > Hey Gian, > > I’d be overjoyed to be proven wrong! For what it’s worth, my pessimism was > not driven by a lack of faith in the Druid community or the Druid > committers but by the fact that these connectors may be an awkward fit

Re: Contribute a new Community extensions : Launch Peon Pods Based on K8s

2021-03-02 Thread Gian Merlino
Hey Yue, Very interesting idea. I am not a kubernetes expert, but this seems like a neat concept. I guess the idea is only one MM would be needed? (Or maybe a handful, if one can't manage every pod?) If so, great. Hopefully someone that is more of a kubernetes expert will be able to chime in on

Re: L1 (caffeine) cache hits/misses metrics not emitted

2021-02-24 Thread Gian Merlino
Hey Vadim, According to https://druid.apache.org/docs/latest/operations/metrics.html#cache, today, the number of hits and misses for the hybrid cache are both emitted, but there isn't differentiation between L1 hits and L2 hits. Is that what you mean? If so, I think the main issue is there just

Re: Spark-Druid Connectors

2021-02-23 Thread Gian Merlino
Hey Julian, Your pessimism in this matter is understandable but regrettable! It would be great to see this effort become part of mainline Druid. It is a more maintainable approach than a separate repo, because it gets rid of the risk of interface drift, and it makes sure that all the tests are

Re: Adding support to Kafka events keys

2021-04-21 Thread Gian Merlino
Hey Noam, I think this would certainly be useful, and thank you for your interest in contributing! I think the toughest part will be designing a good API (meaning: what would users specify in the kafka supervisor json spec in order to activate and configure this feature?). So a good way to

Re: Get Druid Service details in runtime (via extension)

2021-08-22 Thread Gian Merlino
Does the "getNodeRole()" method on DiscoveryDruidNode do what you want? On Fri, Aug 20, 2021 at 3:07 PM Jeet Patel wrote: > Hi all, > > Is there a way to know what druid services are running in a DruidNode > (Not > talking about the HTTP APIs)? > I went through druid-server module, class >

Re: Apache Druid Project Structure

2021-08-18 Thread Gian Merlino
rs who are looking to contribute to the > project and make them feel more confident knowing the project layout. > > Thank you, > Jeet > > On 2021/08/17 17:12:33, Gian Merlino wrote: > > Hey Jeet, > > > > I think it is a case of "it seemed like a good idea at

Re: Apache Druid Project Structure

2021-08-17 Thread Gian Merlino
Hey Jeet, I think it is a case of "it seemed like a good idea at the time". Some things about the current layout do work well: one is that there is actually a lot of common query engine code between anything that handles queries. That's historical, broker, peon, and indexer. That common query

Re: Get Druid Service details in runtime (via extension)

2021-08-23 Thread Gian Merlino
ating org.apache.druid.discovery.DiscoveryDruidNode > for the 3rd parameter of > com.custom.MyEmitterModule.getEmitter(MyEmitterModule.java:39) > > According to the error, it looks like I cannot add DiscoveryDruidNode > because it does not have @Inject or a zero-argument constructor. But I'm > ab

Re: [Proposal] - Kafka Input Format for headers, key and payload parsing

2021-09-16 Thread Gian Merlino
to get all your replies. On Tue, Sep 14, 2021 at 10:10 PM Gian Merlino wrote: > Hey Lokesh, > > The concept and API looks solid to me! Thank you for writing this up. I > agree with Ben's comment. This will be really useful functionality. > > I have a few questions about how it

Re: compression strategy concurrency

2021-09-14 Thread Gian Merlino
Hey Rahul, What kind of errors are you seeing? I ran the test a few times with a bumped up number of threads, and I did see a few problems but they were in the Closer. It looks like a single Closer is used for every thread, which is bad because Closers are not thread-safe (they are built around

Re: [Proposal] - Kafka Input Format for headers, key and payload parsing

2021-09-14 Thread Gian Merlino
Hey Lokesh, The concept and API looks solid to me! Thank you for writing this up. I agree with Ben's comment. This will be really useful functionality. I have a few questions about how it would work: 1) How is the timestamp exposed exactly? I see there is a recordTimestampLabelPrefix, but what

Re: Interested in contributing an article to your site

2021-07-30 Thread Gian Merlino
Hi Angela, There are a couple of places on the Druid website where we include content from the community. 1) If Sisu Data uses Druid internally, or produces Druid-based products, it would be appropriate to describe Sisu's usage of Druid on our Powered By page:

Re: Question about merging groupby v2 spill files

2021-08-10 Thread Gian Merlino
Hey Will, The sorting that happens on the data servers is really useful, because it means the Broker can do its part of the query fully streaming instead of buffering things up. At one point we had a similar problem in ingestion (you could have a ton of spill files if you had a lot of sketches)

Re: [Proposal] - Kafka Input Format for headers, key and payload parsing

2021-09-21 Thread Gian Merlino
ill know > how to use this feature. And it'll help us better understand how it's > supposed to work. (Perhaps it could have answered the two questions above) > > >>> Absolutely agree with you, I will do that along with other review > comments from the code. > > Thanks aga

Druid Summit 2021

2021-09-28 Thread Gian Merlino
Hey Druids, I am excited to write to you about Druid Summit (https://druidsummit.org/), an event being held virtually on November 9–10, 2021. The entire Apache Druid community is welcome, and registration is free. It would also be great to see a bunch of people from the community giving talks

Re: [E] [DISCUSS] Patch to fix new vulnerabilities in log4j

2021-12-20 Thread Gian Merlino
I think doing a 0.22.2 would be worth it for users' peace of mind, even if Druid isn't vulnerable by default. Just because people are on edge about log4j-related stuff right now. In case other people agree, I created an 0.22.2 branch just now. Is anyone able to release-manage this one? Btw, John

Druid-specific Calcite keywords

2021-11-04 Thread Gian Merlino
Hey Druids, I'm looking into how to add keywords to Druid's SQL dialect, and I wanted to ask if anyone has enough familiarity with Calcite to point at some info about how to do that without needing to modify Calcite itself?

Re: Druid-specific Calcite keywords

2021-11-05 Thread Gian Merlino
> a location that expects an identifier (e.g. after FROM), BERNOULLI > will be converted into an identifier. Thus you can use BERNOULLI as a > table name. > > Julian > > On Thu, Nov 4, 2021 at 2:18 PM Gian Merlino wrote: > > > > Hey Druids, > > > > I'm looking into ho

Re: Push-down of operations for SystemSchema tables

2021-11-29 Thread Gian Merlino
on the same pathway with ordered scan > query, so I could rebase on top of that and break into a smaller set of > PRs, nonetheless the conceptual approach and direction is something that I > think will work. > > Thanks! > Jason > > > > > > > On Wed, May 19, 2021

Re: Log4j vulnerability - hotfix?

2021-12-10 Thread Gian Merlino
lenging than for > projects on the slightly newer versions of log4j2, perhaps it would be > appropriate to put out one or two more patch releases, against 0.21 > and/or 0.20? I know our installation is still on 0.21, which is less > than 2 months old. > > On Fri, Dec 10, 2021 a

Re: Log4j vulnerability - hotfix?

2021-12-10 Thread Gian Merlino
We're working on this right now and will be getting a vote / release for 0.22.1 out asap. Btw, the log4j announcement mentions a mitigation that does work for our current version (2.8.2). It's part (b) here, specifying "%m{nolookups}" in the PatternLayout configuration:

Re: [VOTE] Release Apache Druid 0.22.1 [RC1]

2021-12-10 Thread Gian Merlino
My vote is 0 on this release. I verified the usual things, and compared the src and bin packages against 0.22.0 to make sure there were no unexpected changes. That all looks OK to me. But there is an issue with weird errors at the end of logfiles for processes that exit normally. It's especially

Re: [RESULT][VOTE] Release Apache Druid 0.22.1 [RC2]

2021-12-11 Thread Gian Merlino
Thank you for running this release! On Sat, Dec 11, 2021 at 12:28 AM Jihoon Son wrote: > Thanks to everyone who participated in the vote! The vote has passed > with 3 binding +1s. > > Gian Merlino: +1 (binding) > Clint Wylie: +1 (binding) > Jonatha

Re: [VOTE] Release Apache Druid 0.22.1 [RC2]

2021-12-10 Thread Gian Merlino
+1 on releasing 0.22.1-rc2 I verified: - hashes / gpg - unit tests - compared the src and bin packages against 0.22.0 to make sure there were no unexpected changes - attempted to trigger the jndi lookup functionality; it triggered on 0.22.0 but not 0.22.1-rc2 - verified that task logs look

Re: Need Help Benchmarking Druid

2021-12-11 Thread Gian Merlino
Hey Abdel, Feel free to DM me on ASF Slack. The info to join is here: https://druid.apache.org/community/ On Fri, Dec 3, 2021 at 9:11 AM Abdelouahab Khelifati wrote: > Hello, > > I am Abdel, a researcher of Computer Science and I am working on a > benchmarking paper on time series database

Re: Need help in understanding real-time ingestion task pause behavior during checkpointing

2021-12-02 Thread Gian Merlino
Harini, those are interesting findings. I'm not sure if the two pauses are necessary, but my thought is that it ideally shouldn't matter because the supervisor shouldn't be taking that long to handle its notices. A couple things come to mind about that: 1) Did you see what specifically the

Re: Apache Druid security advisory: critical vulnerability CVE-2021-44228 in Apache Log4j

2021-12-13 Thread Gian Merlino
To clarify about the mitigations: the "-Dlog4j2.formatMsgNoLookups=true" mitigation that has been floating around the Internet is *not effective* for log4j 2.8.2, which was used by Druid 0.22.0 and other recent versions. If you are going to stay on an older version of Druid, do not use this

Re: druid can't parse string

2021-07-16 Thread Gian Merlino
Druid stores strings as UTF-8 and from a storage and query basis, it should work fine with any language. The "wikiticker-2015-09-12-sampled.json.gz" dataset used for the tutorial has strings in a variety of languages (check the "page" field):

Re: druid can't parse string

2021-07-16 Thread Gian Merlino
Including the original poster in case they are not on the dev list themselves (hello!). On Fri, Jul 16, 2021 at 9:44 AM Gian Merlino wrote: > Druid stores strings as UTF-8 and from a storage and query basis, it > should work fine with any language. The > "wikiticker-2015-09-12-s

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-07-23 Thread Gian Merlino
Hey Michael, Very cool! To answer your question: it is critical to have a BufferAggregator. Some context: there are 3 kinds of aggregators: - Aggregator: stores intermediate state on heap; is used during ingestion and by the non-vectorized timeseries query engine. Required, or else some queries

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-07-23 Thread Gian Merlino
of using too much heap memory. The only advantage (2) has is that you don't need a Direct version of the ItemsSketch for it to work. On Fri, Jul 23, 2021 at 1:35 PM Gian Merlino wrote: > Hey Michael, > > Very cool! > > To answer your question: it is critical to have a BufferAggregator.
