Re: SplitNormalizationPlan SplitPoint

2019-05-29 Thread Stack
Austin: Looks like an oversight to me. As you suggest, you'd think the split point -- if present -- would be passed to the admin split call. Mind putting up a patch, sir? Thanks, S On Mon, May 27, 2019 at 2:36 PM aheyne wrote: > Hey all, > > We've been having good fun working with custom

Re: Reading the whole table with MapReduce and Spark.

2019-05-29 Thread James Kebinger
TableInputFormat doesn't read the filesystem directly; it essentially issues a scan over the whole table (or the specified range), so it'll read the same data you'd get from a scan issued by any client. There is a TableSnapshotInputFormat that bypasses the hbase server itself
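To make the distinction concrete, here is a minimal sketch of wiring a MapReduce job to scan a table via TableInputFormat, using the HBase 2.x `TableMapReduceUtil` API. The table name `mytable`, the snapshot name `mysnapshot`, and `MyMapper` are placeholders, not names from the thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class ScanJob {
  // Placeholder mapper; receives (rowkey, Result) pairs from the scan.
  static class MyMapper extends TableMapper<NullWritable, NullWritable> { /* ... */ }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "full-table-scan");

    Scan scan = new Scan();       // whole table; set start/stop rows to limit the range
    scan.setCaching(500);         // rows fetched per RPC
    scan.setCacheBlocks(false);   // don't churn the block cache with a one-off MR scan

    // Goes through the RegionServers, exactly like a client-side scan:
    TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
        NullWritable.class, NullWritable.class, job);

    // The TableSnapshotInputFormat alternative instead reads snapshot files
    // straight from HDFS, bypassing the RegionServers:
    // TableMapReduceUtil.initTableSnapshotMapperJob("mysnapshot", scan, MyMapper.class,
    //     NullWritable.class, NullWritable.class, job, true, new Path("/tmp/restore"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The snapshot path is faster and puts no load on the cluster, but it only sees data as of the snapshot, not the live memstore.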

Re: [DISCUSS] Publishing hbase binaries with different hadoop release lines

2019-05-29 Thread Artem Ervits
I can't comment on whether we need every Hadoop release, but at a minimum 2.8.5, as we recently switched to it from 2.7.7. I ran into issues with 2.8.5 and 2.1.5rc0 and used the workaround in https://issues.apache.org/jira/browse/HBASE-22052 to overcome it. I guess if 2.1 will not live past 2.1.6 then

[DISCUSS] Publishing hbase binaries with different hadoop release lines

2019-05-29 Thread Duo Zhang
See the comments here: https://issues.apache.org/jira/browse/HBASE-22394 Although we claim that hbase 2.1.x can work together with hadoop 3.1.x, we actually require users to build the hbase binary against hadoop 3.1.x on their own if they really want to use hbase together with hadoop 3.1.x clients.
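For reference, building your own binary against a Hadoop 3 line looks roughly like the following. The `-Dhadoop.profile=3.0` and `-Dhadoop-three.version` properties are the ones assumed from the HBase 2.x pom; verify them against the pom.xml of your branch, and substitute whichever 3.1.x release you need:

```shell
# Build and test-skip against a Hadoop 3 release line:
mvn clean install -DskipTests \
    -Dhadoop.profile=3.0 \
    -Dhadoop-three.version=3.1.2

# Then assemble the binary tarball with the same profile settings:
mvn install -DskipTests assembly:single \
    -Dhadoop.profile=3.0 -Dhadoop-three.version=3.1.2
```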

Re: Disk hot swap for data node while hbase use short-circuit

2019-05-29 Thread Wei-Chiu Chuang
Do you have a list of the files that were being opened? I'd like to know whether those files were opened for writes or for reads. If you are on a more recent version of Hadoop (2.8.0 and above), there's an HDFS command to interrupt ongoing writes to DataNodes (HDFS-9945
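Assuming the command referred to is the one HDFS-9945 added in Hadoop 2.8.0, a sketch of the sequence before hot-swapping a disk would be:

```shell
# List files currently open for write, from the NameNode's view:
hdfs fsck / -openforwrite

# Ask a specific DataNode to evict (interrupt) its current writers,
# so clients fail over to other replicas before the disk is pulled:
hdfs dfsadmin -evictWriters <datanode_host:ipc_port>
```

Writers are redirected via the normal pipeline-recovery path, so this should be safe for clients with replication > 1.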

Scan vs TableInputFormat to process data

2019-05-29 Thread Guillermo Ortiz Fernández
Just to be sure: if I execute a Scan inside Spark, the execution goes through the RegionServers and I get all the features of HBase/Scan (filters and so on), and all the parallelization is handled by the RegionServers (even though I'm running the program with Spark). If I use TableInputFormat I read all the
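The two paths are less different than the question suggests: reading HBase from Spark via TableInputFormat still executes the scan (filters included) on the RegionServers, while Spark's parallelism follows the table's region splits. A minimal sketch using the Java Spark API, with `mytable` and the `row-` prefix as placeholder names:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "mytable");

    Scan scan = new Scan();
    // The filter is serialized into the job conf and evaluated server-side
    // on the RegionServers, same as a client-issued scan:
    scan.setFilter(new PrefixFilter(Bytes.toBytes("row-")));
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));

    JavaSparkContext sc = new JavaSparkContext("local[*]", "hbase-scan");
    // One Spark partition per region (per input split):
    JavaPairRDD<ImmutableBytesWritable, Result> rows =
        sc.newAPIHadoopRDD(conf, TableInputFormat.class,
            ImmutableBytesWritable.class, Result.class);
    System.out.println("rows: " + rows.count());
    sc.stop();
  }
}
```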

Re: Reading the whole table with MapReduce and Spark.

2019-05-29 Thread Guillermo Ortiz Fernández
Another small doubt: if I use the class TableInputFormat to read an HBase table, am I going to read the whole table? Or is data that hasn't been flushed to storefiles not going to be read? On Wed, May 29, 2019 at 0:14, Guillermo Ortiz Fernández (< guillermo.ortiz.f...@gmail.com>)