[GitHub] [drill] jnturton commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-24 Thread GitBox


jnturton commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1077330375


   > Rebased on master. It is now failing on the C++ code check, which is a bit 
odd because I don't recall seeing much C++ code in Drill...
   
   What are these new CodeQL actions here, a rebranding of LGTM?  And what 
prompted this new analysis of our C++?  We previously only analysed Java, JS 
and Python, IIRC.  
   
   (These are more questions for us, @cgivre @luocooong @vdiravka, than for you 
@paul-rogers)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] jnturton commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

2022-03-23 Thread GitBox


jnturton commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1076404891


   > Drill 2.0 is an opportunity to reorient Drill away from the fading big data 
space and toward the data science use cases that most PRs now seem to support. 
(It's not that big data itself is gone, it's just that most folks who need that 
kind of scale now run in the cloud where Drill is not common.) As one of many 
examples, REST APIs make no sense at scale, but do make sense for a "small 
data" tool.
   > Or, have two Drill editions, the old-school "distributed systems" edition 
and the newer "data science edition". Those who still need Drill to work 
distributed can keep that edition going (along with the big data CSV quirks), 
while the data science folks can fork the data science edition, chuck the 
distributed systems stuff that gets in the way, and focus on things that data 
scientists do (such as reading Excel and PDF files.)
   
   @paul-rogers supporting two editions of Drill would be even harder for our 
small band of developers than supporting one, surely?  Also, I think it would 
be a major loss to unpick all of the MPP work done in Drill to make big data 
queryable, in any notional edition.  Indeed, for a small-data-only query 
engine, I doubt that there would be any sense in starting from Drill at all.  A 
fresh start based on Calcite, Pandas or Julia would be simpler and cleaner.
   
   Many people do their big data processing in the cloud, but not all of them 
want the vendor lock-in of SaaS products, so some prefer to deploy open source 
software in the cloud.  Others remain on-prem.  In addition, I still contend that the 
worlds of small and big data are not disjoint, and that a single system that 
can query over many storages, formats and data sizes is valuable to a viable 
audience.  If the contrib/ plugins can be sufficiently separated away from the 
rest of Drill then the variability in their scalability and behaviour is 
quarantined away from core Drill.
   
   

