Re: First Apache release of Druid

2018-09-19 Thread Gian Merlino
https://lists.apache.org/thread.html/e6a378201f7e7ab6da2493fe6ee4ae276768c461ea5c676a953d8139@%3Cdev.druid.apache.org%3E

Re: Towards 0.13 (Apache release)

2018-09-19 Thread Gian Merlino
they are regressions from 0.12, in which case we must fix them for 0.13.0). See also the thread "First Apache release of Druid" for motivation on why we want to get this done soon. On Tue, Sep 4, 2018 at 11:43 AM Gian Merlino wrote: > Hi Qiu, > > It's in master, so that means it will be

Re: Unique Sketch aggregations and bias correction

2018-09-24 Thread Gian Merlino
I have not. The original HLL paper does have some points in it about bias corrections for small cardinalities, and I am not sure if those are implemented in Druid's HLL implementation. On Mon, Sep 24, 2018 at 8:49 AM Charles Allen wrote: > https://github.com/apache/incubator-druid/pull/5712

Re: subscribe to druid

2018-09-28 Thread Gian Merlino
Hey Dayue, You can subscribe to the list by emailing "dev-subscr...@druid.apache.org". On Thu, Sep 27, 2018 at 7:14 PM Dayue Gao wrote: > >

Re: Off list major development

2019-01-02 Thread Gian Merlino
In this particular case: please consider the PR as a proposal. Don't feel like just because there is code there that takes a certain approach, that the approach is somehow sacred. I had to implement something to crystallize my own thinking about how the problem could be approached. I won't be

Re: Writing a Druid extension

2019-01-02 Thread Gian Merlino
Some other comments, For 3) it is not safe to assume that QueryLifecycleFactory::runSimple always returns org.apache.druid.data.input.Row. It does for groupBy queries but not for other query types. The SQL layer has a bunch of code to adapt the various query types' return types into the uniform

Re: Off list major development

2019-01-02 Thread Gian Merlino
> the case. And so forth. At any point in the proceedings, people can chime > in with their opinions. > > In my opinion, a formal “design review” process is not necessary. Just > build consensus iteratively, by starting the conversation early in the > process. > > Julian

Re: Batch Ingestion

2019-01-02 Thread Gian Merlino
Hey Satya, The easiest way to ingest data is to ask Druid to pull it from Kafka ( http://druid.io/docs/latest/tutorials/tutorial-kafka.html) or Hadoop ( http://druid.io/docs/latest/tutorials/tutorial-batch-hadoop.html). Tranquility Server (

Re: Experimental feature 'graduation' in 0.14

2019-01-15 Thread Gian Merlino
ost-apache and if there's > consensus on deprecating it it'd be good opportunity to collate what work > needs done to get there. > > > > On Tue, 15 Jan 2019 at 11:31, 邱明明 wrote: > > > +1. > > > > Gian Merlino wrote on Tue, Jan 15, 2019 at 6:18 AM: > > > > >

Re: Sync up this week

2019-01-16 Thread Gian Merlino
Thanks, Jihoon! On Wed, Jan 16, 2019 at 3:37 PM Jihoon Son wrote: > Sorry, a bit late note for the last dev sync. > > Attendees: Charles Allen, Jon Wei, Jihoon Son, Atul Mohan, Dylan Wylie, > Clint Wylie > > - Charles pre-reported a bug in HLLCollector ( >

Re: Include Empty Buckets at Granularity Defined Queries

2019-01-17 Thread Gian Merlino
Hey Furkan, For timeseries there's a "skipEmptyBuckets" parameter that you can make true or false. For other query types, empty buckets are always skipped. On Wed, Jan 16, 2019 at 10:58 PM Furkan KAMACI wrote: > Hi, > > As I know that the granularity field determines how data gets bucketed >
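A minimal sketch of the timeseries case described above, assuming the flag is passed through the query context. The datasource name `events`, the interval, and the `count` aggregator are hypothetical placeholders, not taken from the thread.

```python
import json


def timeseries_query(data_source, interval, skip_empty_buckets):
    """Build a native timeseries query as a plain dict (illustrative only)."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,          # hypothetical datasource name
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [{"type": "count", "name": "rows"}],
        # Assumption: skipEmptyBuckets is supplied via the query context.
        "context": {"skipEmptyBuckets": skip_empty_buckets},
    }


# False asks for empty buckets to be included in the timeseries result.
query = timeseries_query("events", "2019-01-01/2019-01-02", False)
print(json.dumps(query, indent=2))
```

The resulting JSON would be posted to a broker; per the reply above, other query types offer no equivalent switch.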

Re: Experimental feature 'graduation' in 0.14

2019-01-15 Thread Gian Merlino
d still have this issue by default. > > Before marking SQL as non-experimental, I'd suggest either fixing the root > cause, or making HTTP segment discovery the default and then explicitly > deprecating ZK segment discovery. > > > On Mon, Jan 14, 2019 at 2:18 PM Gian Merlino wrote: >

Re: Bug report!

2018-12-20 Thread Gian Merlino
Hey Mike, I would look to Hive to fix this - it should be able to handle either a 0 or 0.0 in the response equally well. I suppose I wouldn't consider it to be a bug in Druid. On Mon, Dec 17, 2018 at 10:15 AM mike wrote: > Hello, Could anybody give me a hand? > > Recently, I upgraded my druid

Re: [ANNOUNCE] Apache Druid (incubating) 0.13.0 released

2018-12-18 Thread Gian Merlino
A big milestone!! Thanks Dave for capably wearing the release manager hat, and thanks to everyone else that contributed!! On Tue, Dec 18, 2018 at 12:37 PM David Lim wrote: > We're happy to announce the release of Apache Druid (incubating) 0.13.0! > > Druid 0.13.0-incubating contains over 400

Re: Drop 0. from the version

2018-12-21 Thread Gian Merlino
icial apache release would make a lot of sense. > > Cheers, > Charles Allen > > > On Thu, Dec 20, 2018 at 11:07 PM Gian Merlino wrote: > > > I think it's a good point. Culturally we have been willing to break > > extension APIs for relatively small benefits. But we ha

Re: Podling Report Reminder - December 2018

2018-11-29 Thread Gian Merlino
Is anyone willing to volunteer to pick up December's report? On Thu, Nov 29, 2018 at 1:12 PM wrote: > Dear podling, > > This email was sent by an automated system on behalf of the Apache > Incubator PMC. It is an initial reminder to give you plenty of time to > prepare your quarterly board

Re: [VOTE] Release Apache Druid (incubating) 0.13.0 [RC4]

2018-12-04 Thread Gian Merlino
+1 - Verified signatures and checksums of both src and bin packages. - Source tarball matches tag. - Source tarball builds and tests pass. - Ran through quickstart on binary tarball. On Mon, Dec 3, 2018 at 5:57 PM benedictjin2...@gmail.com < benedictjin2...@gmail.com> wrote: > > > On 2018/12/02

Re: Resume making timely releases

2018-12-04 Thread Gian Merlino
Agree. I think the only reason we stopped was the ASF migration. On Tue, Dec 4, 2018 at 5:15 AM Roman Leventov wrote: > I suggest resuming making quarterly releases. 0.13 branch was created > between October 12 and 19, as far as I can see. Then 0.14 branch should be > created between 12 and 19

Re: Sync up this week

2018-11-20 Thread Gian Merlino
IMO, minutes would be good. We did recordings in the past and I thought they were a bit of a struggle (it was sometimes tough to get them working right, since different people hosted from week to week, we didn't always use a consistent hosting platform, and we would end up having conversations

Re: Weekly dev sync minutes (2018-11-27)

2018-11-27 Thread Gian Merlino
Thanks, Dave! On Tue, Nov 27, 2018 at 10:51 AM David Lim wrote: > Attendees: David Lim, Jihoon Son, Atul, Clint Wylie, Eyal Yurman, Roman > Leventov > > Following the discussion after last week's sync, we will be taking meeting > minutes of the weekly dev sync going forward so that everyone in

Re: [VOTE] Release Apache Druid (incubating) 0.13.0 [RC3]

2018-11-27 Thread Gian Merlino
+1 Source release: - GPG signature and SHA512 are ok - Tarball name is ok - git.version file looks ok (references tag druid-0.13.0-incubating-rc3) - LICENSE, NOTICE, and DISCLAIMER are present - Tarball contents match git tag druid-0.13.0-incubating-rc3 (no unexpected extra files, no critical
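The SHA512 part of a check like the one listed above can be sketched as follows. This is only an illustration: the layout of the published checksum file is not shown in the thread, so the expected digest is taken here as a plain hex string, and the GPG signature check is not covered.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Return the SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case and whitespace."""
    return sha512_hex(path) == expected_hex.strip().lower()
```

A voter would call `checksum_matches` with the downloaded tarball and the digest published alongside it (both paths are up to the caller).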

Re: [VOTE] Release Apache Druid (incubating) 0.13.0 [RC3]

2018-11-20 Thread Gian Merlino
When voting please mention what you did to verify the release (see http://www.apache.org/legal/release-policy.html#release-approval, search on page for "Before casting +1 binding votes, individuals are required to"). On Tue, Nov 20, 2018 at 1:03 AM Fangjin Yang wrote: > +1 > > On Fri, Nov 16,

Re: Pointers on implementing a new ShardSpec

2019-01-08 Thread Gian Merlino
Hey Julian, There aren't any gotchas that I can think of other than the fact that they are not super well documented, and you might miss some features if you're just skimming the code. A couple points that might matter, 1) PartitionChunk is what allows a shard spec to contribute to the

Re: Off list major development

2019-01-08 Thread Gian Merlino
dea what > work will turn out to be “major”, so they get a little more leeway.) > > Julian > > > > On Jan 7, 2019, at 12:10 PM, Gian Merlino wrote: > > > > I don't think there's a need to raise issues for every change: a small > bug > > fix or doc fix should ju

Re: PR Milestone policy

2019-01-07 Thread Gian Merlino
My feeling is that setting a milestone on PRs before they're merged is a way of making their authors feel more included. I don't necessarily see a problem with setting milestones optimistically and then, when a release branch is about to be cut (based on the timed release date), we bulk-update

Re: Off list major development

2019-01-07 Thread Gian Merlino
sh a primitive form of a CODE FREE Abstract > > > Proposal > > > > containing at least the following bullet points. > > > > - The problem description and motivation > > > > - Overview of the proposed change > > > > - Operational impact (compatibility/ plans to u

Re: Off list major development

2019-01-07 Thread Gian Merlino
Julian > > > > On Jan 7, 2019, at 11:24 AM, Gian Merlino wrote: > > > > It sounds like splitting design from code review is a common theme in a > few > > of the posts here. How does everyone feel about making a point of > > encouraging design reviews to be done

Re: Watermarks!

2019-01-07 Thread Gian Merlino
For Kafka, maybe something that tells you if all committed data is actually loaded, & what offset has been committed up to? Would there be any problems caused by the fact that only the most recent commit is saved in the DB? Is this feature connected at all to an ask I have heard from a few

Druid 0.14 timing

2019-01-04 Thread Gian Merlino
It feels like 0.13.0 was just recently released, but it was branched off back in October, and it has almost been 3 months since then. How do we feel about doing an 0.14 branch cut at the end of January (Thu Jan 31) - going back to the every 3 months cycle? For this release, based on the feedback

Re: First Apache release of Druid

2018-09-17 Thread Gian Merlino
Hi Julian, I am surprised to read that you feel the project hasn't come up with a plan for an Apache release yet. I feel like we do have a plan. I wonder if your message means that our plan is no good, or just that it isn't clear. From my perspective, as a community, we have decided that our

Re: Druid 0.12.3 release vote

2018-09-17 Thread Gian Merlino
+1, thanks Jon! On Tue, Sep 11, 2018 at 11:11 AM Jonathan Wei wrote: > Hi all, > > I'm going ahead and opening the vote for the 0.12.3 release. > > Please chime in with your vote once you've had a chance to test the release > candidate. > > Thanks, > Jon >

Re: The etiquette of pocking people on Github and the policy when people stop responding

2019-01-24 Thread Gian Merlino
The timelines you outlined seem quite slow. Especially "if there are enough approvals, a PR could be merged not sooner than in two weeks since they left the last review comment". IMO, rather than delaying patches by so long, a better way to be courteous of a reviewer being too busy to review in a

Re: script, GPL, container question

2019-01-25 Thread Gian Merlino
PMC (email gene...@incubator.apache.org) or Apache Legal (raise a LEGAL ticket in JIRA) for advice. I would probably go for the Incubator PMC first, since the audience is a bit larger and this may have come up before. On Fri, Jan 25, 2019 at 1:24 PM Don Bowman wrote: > On Fri, 25 Jan 2019 at 16:07

Re: The etiquette of pocking people on Github and the policy when people stop responding

2019-01-28 Thread Gian Merlino
e. On Mon, Jan 28, 2019 at 8:43 AM Roman Leventov wrote: > On Fri, 25 Jan 2019 at 23:12, Gian Merlino wrote: > > > If enough other committers have already reviewed and accepted a patch, I > > don't think it's fair to the author or to those other reviewers for > > com

Re: [VOTE] Release Apache Druid (incubating) 0.14.0 [RC3]

2019-04-02 Thread Gian Merlino
+1 - NOTICE, LICENSE, DISCLAIMER are present in both src and bin packages - Verified signatures and checksums of both src and bin packages. - Source tarball matches tag. - Source tarball builds and tests pass. - git.version file is present and correct. - Ran through quickstart on binary tarball.

Re: Building Druid

2019-03-25 Thread Gian Merlino
You could also do "export AWS_REGION=us-east-1" before running the tests. This is what we do when running them in Travis (see .travis.yml). On Sun, Mar 24, 2019 at 1:51 PM Surekha Saharan wrote: > Hi Rajiv, > > You can skip the tests with -DskipTests option. See here for Druid build >
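For anyone driving the build from a script instead of a shell, the same idea can be sketched like this. The `mvn test` command line is an assumption for illustration; only the `AWS_REGION=us-east-1` setting comes from the message above.

```python
import os
import subprocess


def build_test_env(base_env=None, region="us-east-1"):
    """Return a copy of the environment with AWS_REGION set for the tests."""
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_REGION"] = region
    return env


def run_tests(command=("mvn", "test")):
    """Run the test command (assumed to be Maven) with AWS_REGION exported."""
    return subprocess.run(list(command), env=build_test_env(), check=False).returncode


if __name__ == "__main__":
    print(build_test_env({})["AWS_REGION"])
```

Copying the environment, rather than mutating `os.environ`, keeps the setting scoped to the test run.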

Re: JDK 11 support

2019-04-03 Thread Gian Merlino
I think we should keep supporting JDK 8, probably for at least another year and maybe more. Despite the fact that Oracle has decided to abandon OpenJDK 8 there are still other vendors (Azul, Amazon) that have expressed a commitment to maintaining JDK 8 releases and backporting security, etc

Re: historical druid.segmentCache error

2019-03-25 Thread Gian Merlino
Nothing comes to mind - is it possible there is some slight difference between the working and nonworking files, maybe in the whitespace? On Wed, Mar 20, 2019 at 10:29 AM Don Bowman wrote: > One of my historical nodes crashes on startup, yielding this error (below). > Can someone suggest how to

Re: Use add support for using dropwizard metrics

2019-02-18 Thread Gian Merlino
The only caveat I could think of is whether/how it would integrate with the existing metrics emitter system (Emitter, ServiceEmitter, LoggingEmitter, & friends). I am not too familiar with Dropwizard so I don't have much to say about how the integration could work. On Thu, Feb 14, 2019 at 1:15 PM

Re: Knowledge sharing between Druid developers via technical talks

2019-02-18 Thread Gian Merlino
I am interested especially if the format is something live. An in-person meetup with a recording distributed afterwards would be my preference, if people are into that. Maybe something at one of the Druid meetups? On Wed, Feb 13, 2019 at 8:38 PM Eyal Yurman wrote: > Hi, > > This is something

Re: docker build

2019-02-18 Thread Gian Merlino
at 1:59 PM Don Bowman wrote: > i can just remove the mysql, the postgres works, i was just assuming folks > wanted it. > > > On Mon, 18 Feb 2019 at 16:58, Gian Merlino wrote: > > > A discussion is progressing on > > https://issues.apache.org/jira/browse/LEGAL-437. I

Re: docker build

2019-02-18 Thread Gian Merlino
A discussion is progressing on https://issues.apache.org/jira/browse/LEGAL-437. It doesn't seem to have got anywhere firm yet. On Fri, Feb 8, 2019 at 12:23 PM Gian Merlino wrote: > I don't think anything is strictly needed from you at this point, but > things happen when people driv

Re: TeamCity having problems

2019-02-25 Thread Gian Merlino
> > > > > [1] > > > > > > https://hub.jetbrains.com/auth/login?response_type=code_id=7b63a7f6-a4ca-4ad2-99d6-ecede5a769a5_uri=https:%2F%2Fteamcity.jetbrains.com%2FhubPlugin%2Flogin.html=0-0-0-0-0%207b63a7f6-a4ca-4ad2-99d6-ecede5a769a5=%2FviewLog.html%3FbuildId%3D1

Re: Datasketches

2019-02-25 Thread Gian Merlino
What scope would you suggest for the label or github project? There seem to be discussions going on around making DataSketches HLL and/or Quantiles more 'default' options for their respective areas -- are you thinking that kind of thing? On Mon, Feb 25, 2019 at 9:57 AM Charles Allen wrote: >

Re: Druid metadata query

2019-02-26 Thread Gian Merlino
Hmm. I think you're talking about the SegmentMetadata queries that DruidSchema runs. The intent is that they include an empty analysisTypes list, so they only use cached metadata and don't actually read segments, and are pretty resource-light on historicals. But if you implemented some sort of
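As a rough illustration of the shape of such a query, here is a segment metadata query with an empty `analysisTypes` list. The datasource name and interval are hypothetical, and the exact fields DruidSchema sends are not shown in the thread, so treat this as a sketch only.

```python
import json


def light_metadata_query(data_source, intervals):
    """Build a segmentMetadata query that requests no extra analysis."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": data_source,   # hypothetical datasource name
        "intervals": intervals,
        # An empty list requests no additional analyses, which is what keeps
        # the query resource-light according to the message above.
        "analysisTypes": [],
    }


metadata_query = light_metadata_query("events", ["2019-01-01/2019-02-01"])
print(json.dumps(metadata_query, indent=2))
```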

Re: TeamCity having problems

2019-02-27 Thread Gian Merlino
> > > https://teamcity.jetbrains.com/admin/editRequirements.html?id=buildType:OpenSourceProjects_Druid_Inspections > > and > > added here: > > > > > https://teamcity.jetbrains.com/admin/editRequirements.html?id=buildType:OpenSourceProjects_Druid_InspectionsP

Re: Auto-closing old PRs

2019-02-28 Thread Gian Merlino
ils the following search string > > should take return all those mails in GMail for bulk operations > > > > "from:(stale[bot]) apache/incubator-druid" > > > > On Mon, 11 Feb 2019 at 22:15, Gian Merlino wrote: > > > > > IMO it makes se

Re: Uninterruptibles and Futures.getUnchecked()

2019-02-28 Thread Gian Merlino
Have you got sections in mind in Druid code that would be improved by using these? On Tue, Feb 26, 2019 at 3:04 PM Roman Leventov wrote: > I've recently discovered two utilities in Guava that are very useful in > combating InterruptedExceptions that contaminate business logic of code: > > -

Re: Downloading binaries of previous Druid versions

2019-03-01 Thread Gian Merlino
They are available at URLs like http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz (not listed anywhere, but you should be able to guess them). They aren't available from the Apache mirrors, since they are not Apache releases. On Thu, Feb 28, 2019 at 9:33 PM Eyal Yurman wrote: >
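A small helper can make the "guessable" pattern explicit. Only the 0.12.3 URL is confirmed by the message above; whether other versions follow exactly the same pattern is an assumption.

```python
def legacy_binary_url(version):
    """Guess the download URL of a pre-Apache Druid binary tarball."""
    # Pattern inferred from the single 0.12.3 example; other versions unverified.
    return f"http://static.druid.io/artifacts/releases/druid-{version}-bin.tar.gz"


print(legacy_binary_url("0.12.3"))
```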

Re: Namespacing segments, or preventing unknown segments from wreaking havoc

2019-03-01 Thread Gian Merlino
To me this seems like a lot of effort to go through just to detect cases where servers from two different clusters are misconfigured to read each others' files or talk to each other by accident. I wonder if there's an easier way to do it. Maybe keep the cluster name idea, but write it to a marker

Re: Make a regular issue template to de-emphasize "Proposals"

2019-02-20 Thread Gian Merlino
Druid behavior or operation: I feel like this type of issue > is better handled on the mailing lists. > > On Mon, Feb 18, 2019 at 11:09 AM Gian Merlino wrote: > > > Sounds good to me. IMO it also would make sense to remove the license > > header from the templates, and ad

Re: Upcoming podling report

2019-03-04 Thread Gian Merlino
I think it's fair to say that the community is strong already, and we are "Nearing graduation". On Mon, Mar 4, 2019 at 2:58 PM Jonathan Wei wrote: > I'm planning on submitting the following report, please take a look and let > me know if you have any comments. > > === > >

TeamCity having problems

2019-02-22 Thread Gian Merlino
It looks like TeamCity is having problems since a few days ago - builds of master and PRs are flaky. Here's the link to recent builds of master:

Re: The compatibility rules for alert data

2019-02-22 Thread Gian Merlino
My feeling is that the alerts are more like log messages or exceptions (where, in general, there isn't a particular contract around what exceptions get thrown when, except in specific cases like query errors: http://druid.io/docs/latest/querying/querying#query-errors). So reasonable changes could

Re: Knowledge sharing between Druid developers via technical talks

2019-02-22 Thread Gian Merlino
"track" which is > open to all but is dev-focused? This could be before/after the main event. > > I promise that once I get enough experience with the code base, I'd > volunteer to present, but hopefully, there are much better candidates at > the moment :) > > On Mon, Feb 1

Re: Incubator report

2019-03-05 Thread Gian Merlino
Jon Wei posted a draft in another thread: https://lists.apache.org/thread.html/4e84f8802ee1f66fd0d2d6c04a5a9df26c803dfc02eee56877e15ed4@%3Cdev.druid.apache.org%3E On Tue, Mar 5, 2019 at 2:24 AM Julian Hyde wrote: > Hi Druid PPMC, > > The March incubator report is due tomorrow. Is someone

Re: About [tags] in issue and PR headers

2019-03-16 Thread Gian Merlino
eel we should keep, just to > disambiguate between the backport and the original PR. > >> On Fri, Mar 15, 2019 at 11:26 AM Gian Merlino wrote: >> >> I do think that other than a couple key ones like proposal and backport, we >> should encourage people to not put tags (o

Re: Proposed website migration plan

2019-03-12 Thread Gian Merlino
OpenOffice and Groovy both chose to sort of "meld" their classic and Apache sites together: https://www.openoffice.org/, http://groovy-lang.org/. Note how when you click around, you get shuttled between the classic domain and the Apache domain. Some pages are available on both sites, like

Re: TeamCity CI - new IntelliJ version

2019-03-08 Thread Gian Merlino
On Fri, Mar 8, 2019 at 2:58 PM Gian Merlino wrote: > > > Please help fix anything that breaks. Hopefully this also _improves_ > things > > -- I recall an inspection bug we hit that was fixed in some version later > > than 2017.2.4. > > > > On Fri, Mar 8, 2019 a

Re: TeamCity CI - new IntelliJ version

2019-03-08 Thread Gian Merlino
Please help fix anything that breaks. Hopefully this also _improves_ things -- I recall an inspection bug we hit that was fixed in some version later than 2017.2.4. On Fri, Mar 8, 2019 at 12:21 PM Roman Leventov wrote: > I've updated inspections build step in TeamCity CI to use IntelliJ

Re: TeamCity CI - new IntelliJ version

2019-03-08 Thread Gian Merlino
Or at least do some spot checks to verify that the TC errors are not related to the patch in question. On Fri, Mar 8, 2019 at 5:50 PM Gian Merlino wrote: > That sounds fine to me (ignoring TC for now on other PRs while any new > issues since the upgrade is fixed separately). If no-one i

Re: T-Digest backed sketch aggregator

2019-03-19 Thread Gian Merlino
(The template is on https://github.com/apache/incubator-druid/issues/new/choose) It sounds cool to me too! On Tue, Mar 19, 2019 at 5:19 PM Jihoon Son wrote: > Sounds great! > Would you mind writing a proposal about this? > > Jihoon > > On Tue, Mar 19, 2019 at 3:54 PM Samarth Jain wrote: > > >

Re: [VOTE] Release Apache Druid (incubating) 0.14.0 [RC1]

2019-03-18 Thread Gian Merlino
I think we can do the Docker image separately. The way other projects seem to do it is that it is not an official release artifact that gets voted on, but is instead something that is created after the fact by an automated job. Let's continue the discussion in the 'docker build' thread. Gian On

Re: docker build

2019-03-18 Thread Gian Merlino
> > > > the release artifacts and on dockerhub to foster adoption. > > > > if the only issue is the mysql connector i can remove it in favour of > > the > > > > postgres connector. > > > > > > > > > > > > On Mon, 18 Feb 2019

Re: About [tags] in issue and PR headers

2019-03-15 Thread Gian Merlino
I do think that other than a couple key ones like proposal and backport, we should encourage people to not put tags (or issue #s) in PR titles. On Thu, Mar 14, 2019 at 7:56 PM Gian Merlino wrote: > Personally I use email as my main interface to see what's new in github, > and it doesn'

Re: About [tags] in issue and PR headers

2019-03-14 Thread Gian Merlino
Personally I use email as my main interface to see what's new in github, and it doesn't show labels, and I think being able to pick out proposals and backports easily is useful. So I like the tags. But, not so much that I would fight to keep them if consensus is going the other direction. On Thu,

Re: docker build

2019-02-08 Thread Gian Merlino
> > I'm not clear if anything further is needed of me, i'm hoping to get an > automated build going into dockerhub, and tagged w/ each release. i think > this will help adoption. > > > > On Fri, 8 Feb 2019 at 14:22, Gian Merlino wrote: > > > First off thanks a lot

Re: Off list major development

2019-02-12 Thread Gian Merlino
Does anyone have thoughts on the above suggestions? On Fri, Feb 1, 2019 at 2:16 PM Gian Merlino wrote: > I think we should clarify the process too. Might I suggest, > > 1) Add a GitHub issue template with proposal headers and some description > of what each section should be, so peo

Re: Dev sync

2019-02-13 Thread Gian Merlino
I personally join about 1 in 10 of them so, from that perspective, I feel that I am getting what I need in terms of communication out of the lists and github and don't need extra utility from the dev syncs. Even if we stop doing them, meeting face to face is still nice, and I always like to see

Re: Spark batch with Druid

2019-02-13 Thread Gian Merlino
ection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSparklineData%2Fspark-druid-olapdata=02%7C01%7Crmordani%40vmware.com%7C4b7f159a82db4dc4fdc008d690647969%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636855158547887488sdata=9Uq3ox5hhes60fxfqMOxmjfQPZdwFrfSs7glVLTafs0%3Dreserved=0 > , > > but I don't think >

Re: Auto-closing old PRs

2019-02-11 Thread Gian Merlino
IMO it makes sense to keep PRs open if they have a milestone or have a Security or Bug label. 60 days with no activity as a threshold sounds good to me - it's a pretty long time. On Mon, Feb 11, 2019 at 11:22 AM Jihoon Son wrote: > Hi Dylan, thank you for starting a discussion. > > I think this
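The rule suggested above can be written down as a small predicate. This is only a sketch of the stated policy (60 days of inactivity, with exemptions for a milestone or a Security or Bug label); the `PullRequest` type is hypothetical and it is not how any particular bot is configured.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set

STALE_AFTER = timedelta(days=60)
EXEMPT_LABELS = {"Security", "Bug"}


@dataclass
class PullRequest:
    """Hypothetical, minimal view of a PR for illustrating the policy."""
    last_activity: datetime
    milestone: Optional[str] = None
    labels: Set[str] = field(default_factory=set)


def should_auto_close(pr, now):
    """Return True when a PR is stale and has no exempting milestone or label."""
    if pr.milestone is not None or pr.labels & EXEMPT_LABELS:
        return False
    return now - pr.last_activity >= STALE_AFTER
```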

Re: Druid Auto Field Type Detection

2019-02-11 Thread Gian Merlino
eger. > > I think that Solr could be an example for us such a schemaless mode. > What do you think? > > Kind Regards, > Furkan KAMACI > > On Fri, Jan 25, 2019 at 8:56 PM Gian Merlino wrote: > > > Hey Furkan, > > > > Right now when Druid detects dimensions

Re: Segment files for ITTwitterQueryTest and ITWikipediaQueryTest

2019-02-11 Thread Gian Merlino
The keys should be in the repo. I think the ones in "integration-tests/docker/historical.conf" will work. On Mon, Feb 11, 2019 at 2:19 PM Jihoon Son wrote: > Good question. I has been always curious about this too. > Does anyone know about it? > > Jihoon > > On Mon, Feb 11, 2019 at 2:15 PM Atul

Re: Segment files for ITTwitterQueryTest and ITWikipediaQueryTest

2019-02-11 Thread Gian Merlino
> Atul > > On Mon, Feb 11, 2019 at 4:25 PM Gian Merlino wrote: > > > The keys should be in the repo. I think the ones in > > "integration-tests/docker/historical.conf" will work. > > > > On Mon, Feb 11, 2019 at 2:19 PM Jihoon Son wrote: > > > > >

Re: Spark batch with Druid

2019-02-06 Thread Gian Merlino
Hey Rajiv, There's an unofficial Druid/Spark adapter at: https://github.com/metamx/druid-spark-batch. If you want to stick to official things, then the best approach would be to use Spark to write data to HDFS or S3 and then ingest it into Druid using Druid's Hadoop-based or native batch

Re: Spark batch with Druid

2019-02-06 Thread Gian Merlino
approach is probably to use Spark's json support to parse a Druid response. > You may also be able to repurpose some code from > https://github.com/SparklineData/spark-druid-olap, but I don't think > there's any official guidance on this. > > On Wed, Feb 6, 2019 at 2:21 PM Gian Merlino

Re: Topic regex for kafka-indexing-service

2019-02-19 Thread Gian Merlino
Hey Prabhat, We wrote up a blog post a couple years back discussing the design: https://imply.io/post/exactly-once-streaming-ingestion. A few of the key PRs are: - https://github.com/apache/incubator-druid/pull/2220 (original PR adding the KafkaIndexTask) -

Re: Slow download of segments from deep storage

2019-02-19 Thread Gian Merlino
ach which prevents us from parallelizing the > segment load/drop workload. Also have raised a PR > https://github.com/apache/incubator-druid/pull/7088 to help address it. > > On Wed, Jan 30, 2019 at 4:40 PM Gian Merlino wrote: > > > I believe today, if you use the (experi

Re: Proposal to shade Guava manually in Druid

2019-01-29 Thread Gian Merlino
Interesting proposal - I commented on the issue. It sounds like a good idea. On Tue, Jan 29, 2019 at 7:22 AM Roman Leventov wrote: > https://github.com/apache/incubator-druid/issues/6942 >

Re: The etiquette of pocking people on Github and the policy when people stop responding

2019-01-29 Thread Gian Merlino
d > trust the committers approving the PR and move forward. > > On Mon, Jan 28, 2019 at 9:28 AM Gian Merlino wrote: > > > I don't think it's irresponsible to start a review and not be able to > > finish it promptly. But drawing the process out can feel frustrating to >

Re: Contributing an extension

2019-01-29 Thread Gian Merlino
Hi Eyal, I'll take a look too. For some reason I missed this when you first posted it, but it is very interesting work, and looks like it could be part of a path to supporting generic windowed aggregations in Druid SQL. (Moving average, cumulative sum, and so on) On Tue, Jan 29, 2019 at 7:07 PM

Re: Indexing Arbitrary Key/Value Data

2019-01-25 Thread Gian Merlino
Hey Furkan, There isn't currently an out of the box parser in Druid that can do what you are describing. But it is an interesting feature to think about. Today you could implement this using a custom parser (instead of using the builtin json/avro/etc parsers, write an extension that implements an

Re: The etiquette of pocking people on Github and the policy when people stop responding

2019-01-25 Thread Gian Merlino
n't be rushed into the codebase. > > On Fri, 25 Jan 2019 at 04:05, Gian Merlino wrote: > > > The timelines you outlined seem quite slow. Especially "if there are > enough > > approvals, a PR could be merged not sooner than in two weeks since they > > left the

Re: HAS ISSUE

2019-01-25 Thread Gian Merlino
Hey Mingwen, This looks like it's related to the Hive/Druid integration, so it might be a better question for the Hive mailing list. (The code for that integration lives in the Hive project.) On Tue, Jan 22, 2019 at 11:29 PM mingwen@analyticservice.net < mingwen@analyticservice.net>

Re: script, GPL, container question

2019-01-25 Thread Gian Merlino
For Q1 the legal guidance as I understand it is that we can provide users with instructions for how to get optional (L)GPL dependencies, but we can't distribute them ourselves. Putting the mysql-connector in a Docker image does feel like distribution… Q2 is an interesting question. I wonder if

Re: script, GPL, container question

2019-01-25 Thread Gian Merlino
the question of GPLed components that come from the base image. On Fri, Jan 25, 2019 at 10:17 AM Gian Merlino wrote: > For Q1 the legal guidance as I understand it is that we can provide users > with instructions for how to get optional (L)GPL dependencies, but we can't > distribute them

Re: script, GPL, container question

2019-01-25 Thread Gian Merlino
reviewed the PR enough to have an opinion on that.) On Fri, Jan 25, 2019 at 12:53 PM Don Bowman wrote: > On Fri, 25 Jan 2019 at 13:17, Gian Merlino wrote: > > > For Q1 the legal guidance as I understand it is that we can provide users > > with instructions for how to get optional (

Re: Off list major development

2019-02-01 Thread Gian Merlino
ested template looks good to me. > > > > Jihoon > > > > On Thu, Jan 31, 2019 at 9:27 AM Gian Merlino wrote: > > > > > If it's not clear - I am agreeing with Jihoon and Slim that a separate > > > "Rationale" section makes sense in addition

Re: Off list major development

2019-02-01 Thread Gian Merlino
cool, if not, that's cool too, but perhaps having it present in the > > template would encourage ppl to think about testing strategies early on > if > > they aren't already) > > > > > > On Thu, Jan 31, 2019 at 2:17 PM Jihoon Son wrote: > > > > > Th

Re: Off list major development

2019-01-30 Thread Gian Merlino
I think it'd also be nice to tweak a couple parts of the KIP template (Motivation; Public Interfaces; Proposed Changes; Compatibility, Deprecation, and Migration Plan; Test Plan; Rejected Alternatives). A couple people have suggested adding a "Rationale" section, how about adding that and removing

Re: Slow download of segments from deep storage

2019-01-30 Thread Gian Merlino
I believe today, if you use the (experimental) HTTP-based load queues, they will parallelize segment downloads. Adding similar functionality for the ZK-based load queues would definitely be useful though, since at this time nobody seems to be actively driving a migration to HTTP-based load queues

Re: Forbiddenapis Plugin

2019-01-31 Thread Gian Merlino
I get those sometimes with generated sources -- typically doing a "mvn clean" beforehand clears it up. We might be able to add exclusions for the generated source directories in order to avoid the need to do this. On Thu, Jan 31, 2019 at 5:15 AM Furkan KAMACI wrote: > I try to run forbiddenapis

Re: Off list major development

2019-01-31 Thread Gian Merlino
If it's not clear - I am agreeing with Jihoon and Slim that a separate "Rationale" section makes sense in addition to a couple other suggested tweaks. On Wed, Jan 30, 2019 at 3:46 PM Gian Merlino wrote: > I think it'd also be nice to tweak a couple parts of the KIP template

Re: Forbiddenapis Plugin

2019-01-31 Thread Gian Merlino
Good question. I'm not sure. They are at least doing String.format on _something_ with no default locale. On Thu, Jan 31, 2019 at 9:36 AM Charles Allen wrote: > Is this indicative of latent bugs the generated sources have? > > On Thu, Jan 31, 2019 at 8:55 AM Gian Merlino wrote: >

Re: Domain-driven Observability

2019-04-16 Thread Gian Merlino
Hmm, interesting read. Some stuff like this seems to have evolved organically - I'd say TaskRealtimeMetricsMonitor is an example of this technique. It seems like a reasonable pattern when you're doing multiple different kinds of instrumentation (like, both logs and metrics), if the thing you're

Re: codecov, automatic PR unit test coverage report

2019-04-17 Thread Gian Merlino
ed earlier. > > On Wed, Apr 17, 2019 at 8:23 PM Gian Merlino wrote: > >> I'm not seeing the attached png (it shows up as a broken image). Seeing >> code coverage sounds interesting as an FYI kind of thing. I wouldn't want >> to use it as a gating factor, but seeing it c

Graduation

2019-06-07 Thread Gian Merlino
Hey Druids, Druid has been in the incubator for a while, and we have done 4 releases so far (0.13.0, 0.14.0, 0.14.1, and 0.14.2) with a 5th on the way. There has been some discussion off-list recently about pushing for graduation and it was pointed out that it is way past time to have a

Re: Proposed website migration plan

2019-06-04 Thread Gian Merlino
used for], it's #2 on Google, and not ranked on the first page on Bing & DDG. Will monitor this over the next few days. On Mon, May 6, 2019 at 5:43 PM Gian Merlino wrote: > Hi all, > > It sounds like we will need a redirect server that issues 301s from each > druid.io page

Re: Proposed website migration plan

2019-06-11 Thread Gian Merlino
!) On Mon, Jun 10, 2019 at 9:00 AM David Lim wrote: > No objections from me - thank you for testing this out. > > On Mon, Jun 10, 2019 at 7:48 AM Gian Merlino wrote: > > > It looks like Google has picked up the 301 and [druid use cases] #1 > result > > is https://druid.apache.o

Re: Graduation

2019-06-18 Thread Gian Merlino
es/graduation.html < > https://incubator.apache.org/guides/graduation.html>. E.g. you need a > charter, agree the text of the resolution for the board, and decide who > will be on the PMC, and who will be PMC chair. > > > On Jun 13, 2019, at 2:24 PM, Gian Merlino wrote: >
