Re: Scan vs TableInputFormat to process data

2019-06-03 Thread Jean-Marc Spaggiari
Also, keep in mind that by bypassing the RegionServer you also bypass the security rules... JMS On Sat, 1 June 2019 at 21:43, Josh Elser wrote: > Hi Guillermo, > > Yes, you are missing something. > > TableInputFormat uses the Scan API just like Spark would. > > Bypassing the RegionServer

Re: Scan vs TableInputFormat to process data

2019-06-01 Thread Josh Elser
Hi Guillermo, Yes, you are missing something. TableInputFormat uses the Scan API just like Spark would. Bypassing the RegionServer and reading from HFiles directly is accomplished by using the TableSnapshotInputFormat. You can only read from HFiles directly when you are using a Snapshot, as

Scan vs TableInputFormat to process data

2019-05-29 Thread Guillermo Ortiz Fernández
Just to be sure: if I execute a Scan inside Spark, the execution goes through the RegionServers and I get all the features of HBase/Scan (filters and so on), and all the parallelization is handled by the RegionServers (even though I'm running the program with Spark). If I use TableInputFormat I read all the