Hi,

As a side note, I had to fork Drill to update the Netty jars as well, so that's done in my fork to the best of my understanding. There is already a ticket about that issue: https://issues.apache.org/jira/browse/DRILL-7546. It seems to be the same problem described there.
Thanks. I hope I will be able to open source it once it's more production ready and the company is able to do it.

On Tue, Mar 24, 2020 at 2:41 PM Paul Rogers <[email protected]> wrote:

> Hi Benjamin,
>
> Thanks much for the update. Congrats on getting everything working. Glad
> you did not end up in jar-conflict hell.
>
> If you can, please file a JIRA ticket about the endpoint map issue. The
> situation you describe seems wrong. Perhaps the bug crept in as people
> added information to the endpoint. In any event, we should fix the issue.
>
> I look forward to the link to see how you did the integration.
>
> Thanks,
> - Paul
>
> On Tuesday, March 24, 2020, 6:17:41 AM PDT, Benjamin Schaff <[email protected]> wrote:
>
> Hi,
>
> Just wanted to give some feedback; I finally got a chance to work on this
> part of our database. I successfully integrated Drill as the SQL engine
> directly inside the database, with partition placement, and it gave me
> the lift I expected over the old version that used an external Spark
> cluster. Right now the "storage" nodes and drillbits run in the same JVM
> and can read the data directly. I am not seeing any issues so far, but I
> am continuing the benchmarks and trying to get better test coverage.
>
> One thing that was a bit of a struggle, and that I only partially fixed
> in Drill, is that DrillbitEndpoints are used as HashMap keys and their
> hashcode includes the node's state field. An endpoint registered in the
> STARTUP state and the same endpoint in the ONLINE state should be one and
> the same, but they get mixed up, which sometimes duplicates endpoints and
> gave me trouble with the "hard" affinity mode and the "required" endpoint
> flag. Unfortunately, since I don't know Drill's internals, my patch only
> covers my own use case, but if there is any interest I could contribute a
> proper fix.
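For illustration, the failure mode described above comes down to a mutable lifecycle field participating in equals() and hashCode(). The class below is a simplified stand-in for Drill's protobuf-generated DrillbitEndpoint, not the actual code:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    // Simplified stand-in for Drill's generated DrillbitEndpoint.
    final class Endpoint {
      enum State { STARTUP, ONLINE }

      final String address;
      final int controlPort;
      final State state;

      Endpoint(String address, int controlPort, State state) {
        this.address = address;
        this.controlPort = controlPort;
        this.state = state;
      }

      @Override public boolean equals(Object o) {
        if (!(o instanceof Endpoint)) return false;
        Endpoint e = (Endpoint) o;
        return controlPort == e.controlPort
            && address.equals(e.address)
            && state == e.state;               // state in equals(): the bug
      }

      @Override public int hashCode() {
        return Objects.hash(address, controlPort, state);
      }
    }

    public class EndpointDupDemo {
      public static void main(String[] args) {
        Map<Endpoint, Integer> workPerNode = new HashMap<>();
        // The same physical drillbit, seen once while starting up and once
        // after it reports ONLINE, becomes two distinct map keys.
        workPerNode.put(new Endpoint("node1", 31010, Endpoint.State.STARTUP), 1);
        workPerNode.put(new Endpoint("node1", 31010, Endpoint.State.ONLINE), 2);
        System.out.println(workPerNode.size()); // prints 2; one node expected
      }
    }

A proper fix would key such maps on the identity fields only (address and ports), or normalize endpoints to a single state before using them as keys.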
> My company is considering open-sourcing our product. If anyone is
> interested, I will share the link whenever it's available, as an example
> of how to do this kind of integration.
>
> Thanks everyone for your help and suggestions.
>
> On Wed, Jan 22, 2020 at 9:28 AM Benjamin Schaff <[email protected]> wrote:
>
> > Hi, thanks everyone for the feedback.
> >
> > The current database query API supports pushdowns (filtering and
> > projections), but when dealing with billions of rows it's still a lot
> > of data to move over the network. The RPC API itself is not the
> > performance bottleneck; we have our own binary format, similar to
> > FlatBuffers, with query-time codegen readers and writers, so that part
> > is fine.
> >
> > As to why the Spark integration is slow: I do batch (usually around 50k
> > rows at a time), but my guess is that going from our binary format to
> > Spark's internal row format, which Spark then converts to UnsafeRow, is
> > a lot of transformation for "nothing". We have a codegen parser that
> > goes from our internal format to Spark's row format, but speaking
> > UnsafeRow directly is much more involved, so I put that aside for now.
> >
> > Here is what I am going to try, based on all the feedback you gave me:
> > 1) Since premature optimization is the root of all evil, and my Spark
> > assumption might not hold true for Drill, I will try a "remote"
> > integration first.
> > 2) I will see whether I can use Drill's internal format to ship data
> > over the network; if anybody could be kind enough to give me a pointer
> > on where to look, that would be awesome.
> > 3) I will then upgrade my integration to merge the "remote" mode with
> > the "local" one.
> >
> > I will keep you all updated and publish my results so that I can give
> > back some of what I learn from these experiments.
> >
> > On a separate note, I was wondering whether Drill can push the
> > per-table parts of join filters down to the scans (probably by hooking
> > somewhere into the Calcite plan), or whether that is done
> > automatically.
> >
> > Again, any idea or comment is welcome.
> >
> > Thanks.
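On the join-filter question above: Calcite's FilterJoinRule family, which Drill applies during logical planning, already splits a join condition and pushes the single-table parts below the join, so each scan ends up with a plain filter sitting on top of it. What a storage plugin adds is an optimizer rule that pushes that filter into its own scan. The sketch below is modeled on the pattern used by Drill's contrib plugins (Mongo, Kafka); MyGroupScan and its cloneWithFilter method are hypothetical placeholders for the plugin's own classes, and ScanPrel construction details vary across Drill versions:

    import org.apache.calcite.plan.RelOptRuleCall;
    import org.apache.drill.common.expression.LogicalExpression;
    import org.apache.drill.exec.planner.logical.DrillOptiq;
    import org.apache.drill.exec.planner.logical.DrillParseContext;
    import org.apache.drill.exec.planner.logical.RelOptHelper;
    import org.apache.drill.exec.planner.physical.FilterPrel;
    import org.apache.drill.exec.planner.physical.PrelUtil;
    import org.apache.drill.exec.planner.physical.ScanPrel;
    import org.apache.drill.exec.store.StoragePluginOptimizerRule;

    public class MyPushFilterIntoScan extends StoragePluginOptimizerRule {
      public static final MyPushFilterIntoScan INSTANCE = new MyPushFilterIntoScan();

      private MyPushFilterIntoScan() {
        super(RelOptHelper.some(FilterPrel.class, RelOptHelper.any(ScanPrel.class)),
            "MyPushFilterIntoScan");
      }

      @Override
      public boolean matches(RelOptRuleCall call) {
        final ScanPrel scan = call.rel(1);
        return scan.getGroupScan() instanceof MyGroupScan;  // only our scans
      }

      @Override
      public void onMatch(RelOptRuleCall call) {
        final FilterPrel filter = call.rel(0);
        final ScanPrel scan = call.rel(1);

        // Convert the Calcite predicate into a Drill LogicalExpression, then
        // hand it to the group scan to translate into the database's own
        // filter representation (cloneWithFilter is a hypothetical helper).
        LogicalExpression condition = DrillOptiq.toDrill(
            new DrillParseContext(PrelUtil.getPlannerSettings(call.getPlanner())),
            scan, filter.getCondition());
        MyGroupScan newScan =
            ((MyGroupScan) scan.getGroupScan()).cloneWithFilter(condition);

        // If the scan absorbed the whole predicate, the Filter can be
        // dropped; a real rule keeps it on top for any non-pushable
        // remainder. (ScanPrel constructors differ across Drill versions.)
        call.transformTo(new ScanPrel(scan.getCluster(), filter.getTraitSet(),
            newScan, scan.getRowType(), scan.getTable()));
      }
    }

Such a rule is handed to the planner through the storage plugin's getPhysicalOptimizerRules() hook.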
> > On Wed, Jan 22, 2020 at 1:28 AM Ted Dunning <[email protected]> wrote:
> >
> >> Hmmm....
> >>
> >> I disagree with a lot of what Paul says.
> >>
> >> Here is where I agree fully:
> >>
> >> 1) Collocating processes in the same JVM increases the blast radius
> >> of failures. If either the DB or the Drill threads go south, they will
> >> take the other out. This is a relatively low-probability event, but
> >> increasing the probability, or, worse, coupling the probabilities,
> >> isn't necessary. On a very closely related note, the blast radius of
> >> GC is also coupled between the two processes.
> >>
> >> 2) Lack of control over either process's CPU or memory will affect
> >> the other. That would be bad. See (1).
> >>
> >> 3) Coupled scaling is sub-optimal, but that might be compensated for
> >> by the close coupling of within-process communication.
> >>
> >> Where I disagree is on how serious these considerations are. Drill is
> >> fairly well disciplined in terms of heap and off-heap space.
> >> Presumably the DB is as well, which would mean the likely impact of
> >> (2) is very small. And the ease of communication between threads
> >> within the same process is dramatically better than communication
> >> between processes, even (especially?) with shared memory.
> >>
> >> My own recommendation would be to *allow* collocation but not assume
> >> it. Allow for non-collocated drillbits as well. That lets you pivot
> >> at any point.
> >>
> >> On the other hand ...
> >>
> >> On Tue, Jan 21, 2020 at 5:10 PM Paul Rogers <[email protected]> wrote:
> >>
> >> > Hi Benjamin,
> >> >
> >> > Very cool project! Drill works well on top of custom data sources.
> >> >
> >> > That said, I suspect that actually running Drill inside your
> >> > process will lead to a large amount of complexity. Your comment
> >> > focuses on code issues; however, there are larger concerns.
> >> > Although we think of Drill as a simple single-threaded, single-node
> >> > tool (when run in SqlLine or on a Mac), Drill is designed to be
> >> > fully distributed.
> >> >
> >> > As queries get larger, you will find that Drill itself uses large
> >> > amounts of memory and CPU to run a query quickly. (Imagine a join
> >> > or sort over billions of rows from several tables.) Drill has its
> >> > own memory management system to handle the large blocks of memory
> >> > it needs, and your DB also needs memory. You'd need a way to unify
> >> > Drill's memory management with your own, which is a daunting task.
> >> >
> >> > Grinding through billions of rows is also CPU intensive. Drill
> >> > manages its own threads and makes very liberal use of CPU. Your DB
> >> > engine likely has a threading model of its own. Again, integrating
> >> > the two is difficult. We could go on.
> >> >
> >> > In short, although Drill works well as a query engine on top of a
> >> > custom data source, Drill itself is not designed to be a library
> >> > included in your app process; it is designed to run as its own
> >> > distributed set of processes alongside yours.
> >> >
> >> > We could, of course, change the design, but that would be a big
> >> > project because of the issues above. It might be interesting to
> >> > think about how you'd embed a distributed framework as a library in
> >> > some host process. I'm not sure I've ever seen this done for any
> >> > tool. (If anyone knows of an example, please let us know.)
> >> >
> >> > I wonder if there is a better solution: run Drill alongside your DB
> >> > on the same nodes and have Drill obtain data from your DB via an
> >> > API. The quick-and-dirty solution is an RPC API; you can get fancy
> >> > and use shared memory. A side benefit is that other tools can also
> >> > use the API. For example, if you find you need Spark integration,
> >> > it is easier to provide. (You can't, of course, run Spark in your
> >> > DB process.)
> >> >
> >> > In this case, an "embedded solution" means that Drill is embedded
> >> > in your app cluster (like ZK), not that it is embedded in your app
> >> > process.
> >> >
> >> > This way, you can tune Drill's memory and CPU usage separately from
> >> > your engine's, making the problem tractable. This model is, in
> >> > fact, very similar to the traditional HDFS model, in which Drill
> >> > and HDFS run on the same nodes. It is also similar to what MapR did
> >> > with the MapR-DB integration.
> >> >
> >> > Further, by separating the two, you can run Drill on its own nodes
> >> > if you find your queries are getting larger and more expensive.
> >> > That is, you can scale out by separating compute (Drill) from
> >> > storage (your DB), allowing each to scale independently.
> >> >
> >> > And, of course, a failure in one engine (Drill or DB) won't take
> >> > down the other if the two are in separate processes.
> >> >
> >> > In either case, your storage plugin needs to compute data locality.
> >> > If your DB is distributed, then presumably it has some scheme for
> >> > distributing data: hash partitioning, range partitioning, or
> >> > whatever. Somehow, if I have key 'x', I know to go to node Y to get
> >> > that value. For example, in HDFS, Drill can distribute block scans
> >> > to the node(s) holding the blocks.
> >> >
> >> > Or maybe data is randomly distributed, so that every scan must run
> >> > against every DB node; in that case, if you have N nodes, you'll
> >> > run N scans and each will find whatever its node happens to
> >> > contain.
> >> >
> >> > If your DB has N nodes, then you distribute work to those nodes by
> >> > telling Drill that the max parallelization (reported by the group
> >> > scan) is N. Drill will then ask you for the SubScan for each of the
> >> > N scans, and you can allocate work to the nodes, either by
> >> > subsetting the scan (as in HDFS) or by running the same scan
> >> > everywhere.
> >> >
> >> > If you go with the two-process model, then your storage plugin can
> >> > use soft affinity: run the scan on the node that hosts your DB when
> >> > possible, else run it on any node and use an RPC to obtain the
> >> > data. This is how Drill works when it runs on a subset of the HDFS
> >> > nodes.
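To make the group-scan mechanics Paul describes concrete, here is a rough sketch of the distribution-related overrides, with one minor fragment per database node. It is a sketch under assumptions, not working plugin code: MyDbNodeInfo, MySubScan, and sliceFor() are hypothetical, and the usual GroupScan boilerplate (JSON serialization, clone(), getNewWithChildren(), getDigest(), and so on) is omitted.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.drill.exec.physical.EndpointAffinity;
    import org.apache.drill.exec.physical.base.AbstractGroupScan;
    import org.apache.drill.exec.physical.base.SubScan;
    import org.apache.drill.exec.planner.fragment.DistributionAffinity;
    import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;

    public class MyGroupScan extends AbstractGroupScan {
      private final List<MyDbNodeInfo> dbNodes;    // one entry per storage node
      private List<DrillbitEndpoint> assignments;  // filled in by the planner

      public MyGroupScan(String userName, List<MyDbNodeInfo> dbNodes) {
        super(userName);
        this.dbNodes = dbNodes;
      }

      // One scan fragment per database node.
      @Override
      public int getMaxParallelizationWidth() {
        return dbNodes.size();
      }

      // Tell the planner which drillbits are close to the data.
      @Override
      public List<EndpointAffinity> getOperatorAffinity() {
        List<EndpointAffinity> affinities = new ArrayList<>();
        for (MyDbNodeInfo node : dbNodes) {
          DrillbitEndpoint ep = DrillbitEndpoint.newBuilder()
              .setAddress(node.getHostname())
              .setControlPort(node.getDrillControlPort())
              .build();
          affinities.add(new EndpointAffinity(ep, 1.0));
        }
        return affinities;
      }

      @Override
      public DistributionAffinity getDistributionAffinity() {
        return DistributionAffinity.SOFT;  // or HARD for strict colocation
      }

      // The planner hands back the endpoints it actually chose, indexed by
      // minor fragment id.
      @Override
      public void applyAssignments(List<DrillbitEndpoint> endpoints) {
        this.assignments = endpoints;
      }

      // Build the per-fragment scan: either the node-local slice of the data
      // (as with HDFS blocks) or the full scan if data placement is opaque.
      @Override
      public SubScan getSpecificScan(int minorFragmentId) {
        DrillbitEndpoint ep = assignments.get(minorFragmentId);
        return new MySubScan(getUserName(), sliceFor(ep));
      }

      // ... plugin-specific helpers and GroupScan boilerplate omitted ...
    }

With SOFT affinity the planner treats the listed endpoints as preferences and may still schedule fragments elsewhere, which matches Paul's fall-back-to-RPC suggestion; HARD makes them requirements.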
> >> > You also asked about the Foreman. At present, Drill assumes nodes
> >> > are homogeneous: all nodes share work evenly, including the work of
> >> > the Foreman. Impala, for example, has added a feature to dedicate
> >> > some nodes to be only coordinators (the equivalent of Drill's
> >> > Foreman). Drill does not yet have that feature.
> >> >
> >> > Without the homogeneity assumption, Drill would need some kind of
> >> > work scheduler that knows to give less work to a
> >> > Foreman-plus-Drillbit node and more work to the Drillbit-only
> >> > nodes. Having Foreman-only nodes would keep things simpler. In your
> >> > case, such a Foreman would have to reside on a node other than one
> >> > of your DB nodes, to keep the DB nodes symmetrical.
> >> >
> >> > The above is a high-level survey of the challenges. We'd be happy
> >> > to discuss specific issues as you refine your design.
> >> >
> >> > Thanks,
> >> > - Paul
> >> >
> >> > On Tuesday, January 21, 2020, 3:00:21 PM PST, Benjamin Schaff <[email protected]> wrote:
> >> >
> >> > Hi everyone,
> >> >
> >> > I would like to see if you could provide some recommendations or
> >> > help with integrating Apache Drill as a distributed SQL engine in a
> >> > custom database. Maybe I am going about it the wrong way, so any
> >> > feedback is appreciated.
> >> >
> >> > What I would like to achieve is to embed drillbits into my database
> >> > nodes. It's a distributed database written mostly in Scala, so it's
> >> > running inside the JVM. As you would expect, each storage node
> >> > holds a partition of the data, and I would like each SubScan to be
> >> > routed to the drillbit instance embedded within that database node.
> >> >
> >> > At this point, drillbits are running and communicating properly
> >> > with ZK (I am using ZooKeeper for the database as well). I can
> >> > connect to the plugin I created using sqlline, and I can list
> >> > schemas and tables. So basically, all the metadata part is done and
> >> > working.
> >> >
> >> > I managed to build up the partition work and affinity using the
> >> > distributed metadata of the database, and I am stuck in the
> >> > following situation.
> >> >
> >> > If I override the "DistributionAffinity getDistributionAffinity()"
> >> > method to return "HARD", then I end up with the following error:
> >> > "IllegalArgumentException: Sender fragment endpoint list should not
> >> > be empty", and the "applyAssignments" method of the GroupScan
> >> > receives an empty list of endpoints.
> >> >
> >> > If I don't override it, then nodes without "local access" get work
> >> > scheduled on them.
> >> >
> >> > I was also wondering if there is a way to exclude drillbits from
> >> > becoming a Foreman.
> >> >
> >> > Thanks in advance for any guidance.
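One plausible connection, in light of Benjamin's later finding about endpoint equality: with HARD affinity every required endpoint has to match a registered drillbit exactly, so two copies of the same endpoint that differ only in the state field can fail to line up, leaving the assignment list empty. A hedged sketch of the defensive normalization a plugin can apply before handing endpoints to the planner (toBuilder() and clearState() are standard protobuf-generated methods on Drill's DrillbitEndpoint):

    import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;

    // Strip the mutable lifecycle state so that a drillbit seen in STARTUP
    // and the same drillbit seen ONLINE compare equal as map keys.
    public final class Endpoints {
      private Endpoints() {}

      public static DrillbitEndpoint normalize(DrillbitEndpoint ep) {
        return ep.toBuilder().clearState().build();
      }
    }

The real fix would live in Drill's planner and cluster-coordination code rather than in each plugin, which is presumably what contributing it "properly" would mean.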
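Finally, for the embedding itself (the first goal stated in the original message), a minimal sketch of starting a drillbit inside a host JVM. Drillbit.start() and DrillConfig.create() are real Drill APIs; the surrounding wiring is illustrative:

    import org.apache.drill.common.config.DrillConfig;
    import org.apache.drill.exec.server.Drillbit;

    public class EmbeddedDrillbit {
      public static void main(String[] args) throws Exception {
        // DrillConfig.create() loads drill-override.conf from the classpath;
        // point drill.exec.zk.connect there at the database's ZooKeeper
        // quorum so the embedded drillbit registers alongside its peers.
        DrillConfig config = DrillConfig.create();
        Drillbit drillbit = Drillbit.start(config);
        // Shut the drillbit down cleanly when the host process exits.
        Runtime.getRuntime().addShutdownHook(new Thread(drillbit::close));
      }
    }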
