jnturton commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1076404891
> Drill 2.0 is an opportunity to reorient Drill away from the fading big data
space and toward the data science use cases that most PRs now seem to support.
(It's not that big data itself is gone, it's just that most folks who need that
kind of scale now run in the cloud where Drill is not common.) As one of many
examples, REST APIs make no sense at scale, but do make sense for a "small
data" tool.
> Or, have two Drill editions, the old-school "distributed systems" edition
and the newer "data science edition". Those who still need Drill to work
distributed can keep that edition going (along with the big data CSV quirks),
while the data science folks can fork the data science edition, chuck the
distributed systems stuff that gets in the way, and focus on things that data
scientists do (such as reading Excel and PDF files).
@paul-rogers, supporting two editions of Drill would be even harder for our
small band of developers than supporting one, surely? Also, I think it would
be a major loss to unpick all of the MPP work done in Drill to make big data
queryable, in any notional edition. Indeed, for a small-data-only query
engine, I doubt that there would be any sense in starting from Drill at all. A
fresh start based on Calcite, Pandas or Julia would be simpler and cleaner.
Many people do their big data processing in the cloud, but not all of them
want the vendor lock-in of SaaS products, so they prefer to deploy open source
software in the cloud. Others remain on-prem. In addition, I still contend that the
worlds of small and big data are not disjoint, and that a single system that
can query over many storages, formats and data sizes is valuable to a viable
audience. If the contrib/ plugins can be sufficiently separated away from the
rest of Drill then the variability in their scalability and behaviour is
quarantined away from core Drill.