I can submit a MapReduce job reading that table. Its processing rate is also a
little slower than I expected, but not as slow as the Spark job.
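
For reference, a bare-bones MapReduce job that exercises TableInputFormat does
not need more than a TableMapper that bumps a counter. The sketch below is only
illustrative (class names and scan settings are my own assumptions; the bundled
org.apache.hadoop.hbase.mapreduce.RowCounter tool does essentially the same
thing):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableMapReduceUtil, TableMapper}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat

// Mapper that only increments a counter for every row TableInputFormat hands it.
class RowCountMapper extends TableMapper[NullWritable, NullWritable] {
  override def map(key: ImmutableBytesWritable, row: Result,
      ctx: Mapper[ImmutableBytesWritable, Result, NullWritable, NullWritable]#Context): Unit = {
    ctx.getCounter("scan", "rows").increment(1L)
  }
}

object TableScanJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(HBaseConfiguration.create(), "TableInputFormat scan test")
    job.setJarByClass(classOf[RowCountMapper])

    val scan = new Scan()
    scan.setCaching(500)        // fewer RPC round trips during a full scan
    scan.setCacheBlocks(false)  // don't churn the block cache

    // One map task is created per region of the table.
    TableMapReduceUtil.initTableMapperJob("C_CONS", scan, classOf[RowCountMapper],
      classOf[NullWritable], classOf[NullWritable], job)
    job.setNumReduceTasks(0)
    job.setOutputFormatClass(classOf[NullOutputFormat[NullWritable, NullWritable]])
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}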

2014-10-01 12:04 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

> Can you launch a job which exercises TableInputFormat on the same table
> without using Spark?
>
> This would show whether the slowdown is in HBase code or somewhere else.
>
> Cheers
>
> On Mon, Sep 29, 2014 at 11:40 PM, Tao Xiao <xiaotao.cs....@gmail.com>
> wrote:
>
>> I checked the HBase UI. This table is not spread completely evenly across
>> the nodes, but I think it can be regarded as nearly evenly spread - at
>> least no single node holds too many regions.  Here is a screenshot of the
>> HBase UI <http://imgbin.org/index.php?page=image&id=19539>.
>>
>> Besides, I checked the size of each region of this table on HDFS as
>> follows:
>>
>>
>> -bash-4.1$ hadoop dfs -du -h /hbase/data/default/C_CONS
>> DEPRECATED: Use of this script to execute hdfs command is deprecated.
>> Instead use the hdfs command for it.
>>
>> 288      /hbase/data/default/C_CONS/.tabledesc
>> 0        /hbase/data/default/C_CONS/.tmp
>> 159.6 M  /hbase/data/default/C_CONS/0008c2494a5399d68495d9c8ae147821
>> 76.7 M   /hbase/data/default/C_CONS/021d7d21d7faeb7b2a77835d6f86747e
>> 81.3 M   /hbase/data/default/C_CONS/02a39a316ac6d2bda89e72e74aa18a6e
>> 155.3 M  /hbase/data/default/C_CONS/02fe51bc077290febc85651d8ee31abc
>> 173.4 M  /hbase/data/default/C_CONS/045859bcc70e36eb4d33f8ca3b7d9633
>> 82.6 M   /hbase/data/default/C_CONS/05c868b6036cc4f1836f70be6215c851
>> 74.1 M   /hbase/data/default/C_CONS/0816378c837f1f3b84f4d4060d22beb3
>> 84.7 M   /hbase/data/default/C_CONS/083da8f5eb8a5b1cca76376449f357ca
>> 346.6 M  /hbase/data/default/C_CONS/0ac70fcb1baea0896ea069a6bcc30898
>> 333.8 M  /hbase/data/default/C_CONS/0b3be845bd4f5e958e8c9a18c8eaab21
>> 72.7 M   /hbase/data/default/C_CONS/12c13610c50dbc8ab27f20b0ebf2bfc4
>> 76.1 M   /hbase/data/default/C_CONS/1341966315d7e53be719d948d595bee0
>> 72.4 M   /hbase/data/default/C_CONS/1acdbc05c502b11da4852a1f21228f44
>> 70.0 M   /hbase/data/default/C_CONS/1b8f57d65f6c0e4de721e4c8f1944829
>> 183.9 M  /hbase/data/default/C_CONS/1f1ae7ca9f725fcf9639a4d52086fa50
>> 65.5 M   /hbase/data/default/C_CONS/20c10b96e2b9c40684aaeb6d0cfbf7c0
>> 76.0 M   /hbase/data/default/C_CONS/22515194fe09adcd4cbb2f5307303c73
>> 78.4 M   /hbase/data/default/C_CONS/236cd80393cb5b7c526bd2c45ce53a0a
>> 150.0 M  /hbase/data/default/C_CONS/23bd80852f47b97b4122709ec844d4ed
>> 81.6 M   /hbase/data/default/C_CONS/241b8bc415029dedf94c4a84e6c4ad3b
>> 77.9 M   /hbase/data/default/C_CONS/27f1e59bde75ef3096a5bdd3eb402cd7
>> 160.8 M  /hbase/data/default/C_CONS/30c2ae3be38b8cdf3b337054a7d61478
>> 372.2 M  /hbase/data/default/C_CONS/31d606da71b35844d0cdc8a195c97d2e
>> 182.6 M  /hbase/data/default/C_CONS/3274a022bc7419d426cf63caa1cc88e1
>> 92.1 M   /hbase/data/default/C_CONS/344faae7971d87b51edf23f75a7c3746
>> 154.7 M  /hbase/data/default/C_CONS/3b3f0c839bdb32ed2104f67c8a02da41
>> 77.4 M   /hbase/data/default/C_CONS/3cf6b2bd0cfe85f3111d0ba1b84a60b4
>> 71.5 M   /hbase/data/default/C_CONS/3f466db078d07e2ddddbfb11c681e0e3
>> 77.8 M   /hbase/data/default/C_CONS/3f8c1b7dec05118eb9894bb591e32b2f
>> 83.6 M   /hbase/data/default/C_CONS/45e105856fcb54748c48bd45e973a3b9
>> 185.2 M  /hbase/data/default/C_CONS/4becd90d46a2d4a6bd8ecbe02b60892c
>> 165.6 M  /hbase/data/default/C_CONS/4dcebd58c7013062c4a8583012a11b5a
>> 67.3 M   /hbase/data/default/C_CONS/51f845d842605dda66b1ae01ad8a17e8
>> 148.2 M  /hbase/data/default/C_CONS/532189155ab78dbd1e36aac3ab4878a8
>> 172.6 M  /hbase/data/default/C_CONS/5401d9cb19adb9bd78718ea047e6d9d7
>> 139.4 M  /hbase/data/default/C_CONS/547d2a8c54aae73e8f12b4570efd984c
>> 89.5 M   /hbase/data/default/C_CONS/54cbac1f71c7781697052bb2aa1c5a18
>> 101.3 M  /hbase/data/default/C_CONS/55263ce293327683b9c6e6098ec3e89a
>> 85.2 M   /hbase/data/default/C_CONS/55f8c278e35de6bca5083c7a66e355fb
>> 85.8 M   /hbase/data/default/C_CONS/57112558912e1de016327e115bc84f11
>> 171.8 M  /hbase/data/default/C_CONS/572b886cbfe92ddcb97502f041953fb8
>> 51       /hbase/data/default/C_CONS/6bd64d8cf6b38806731f7693bdd673c9
>> 86.6 M   /hbase/data/default/C_CONS/7695703b7b527afc5f3524eee9b5d806
>> 74.8 M   /hbase/data/default/C_CONS/7bb7567685f5e16a4379d7cf79de2ecc
>> 120.1 M  /hbase/data/default/C_CONS/7c144bef991bb3c959d7ef6e2fa5036a
>> 166.0 M  /hbase/data/default/C_CONS/7c7817eb3e531d5bda88b5f0de6a20de
>> 173.5 M  /hbase/data/default/C_CONS/7d07c139575d007ecbb23fa946e39130
>> 139.2 M  /hbase/data/default/C_CONS/8295aa701110ddf4055e8c3ca5bd9cad
>> 91.7 M   /hbase/data/default/C_CONS/84b340d22471580ed8100d6614668eb1
>> 81.2 M   /hbase/data/default/C_CONS/8605f4470498a01a5ec4c88e7ea8a458
>> 78.3 M   /hbase/data/default/C_CONS/897da8e33275b80926ef38200132f819
>> 234.4 M  /hbase/data/default/C_CONS/93f5ce30ed8e54cc282cb5b88fa28d76
>> 126.3 M  /hbase/data/default/C_CONS/96dd1decd62e35c394bb8e7f6095f054
>> 80.9 M   /hbase/data/default/C_CONS/998364405e57a7eedae094bca76a419e
>> 184.8 M  /hbase/data/default/C_CONS/9df3b62b1bff59b67b75ad86d694b8c8
>> 126.6 M  /hbase/data/default/C_CONS/a4531e06f3440349e7e6776b8bfedaf0
>> 79.3 M   /hbase/data/default/C_CONS/aa0b8341d3ca925ed24309f46e0ab845
>> 79.9 M   /hbase/data/default/C_CONS/aa45bfa549a439ded2a8b159a5c9caaa
>> 84.9 M   /hbase/data/default/C_CONS/abae60b33de2999698a7452ff62dad08
>> 87.0 M   /hbase/data/default/C_CONS/ac5ff05785bc6e07637106450c74d02a
>> 80.7 M   /hbase/data/default/C_CONS/aca765b578b236978b11ec26c167a958
>> 68.0 M   /hbase/data/default/C_CONS/b03614566cc8d521a9c983d418b57866
>> 77.4 M   /hbase/data/default/C_CONS/b1ae0451f592b28eed8a58908f91293a
>> 91.5 M   /hbase/data/default/C_CONS/b8396049e2b742108add1485c0eb4aeb
>> 81.2 M   /hbase/data/default/C_CONS/b8d25b3e536b4fea5ee4ee2b21885c76
>> 87.8 M   /hbase/data/default/C_CONS/bbfbe319705df23a23a89b40e52d89a8
>> 81.3 M   /hbase/data/default/C_CONS/bccaeedc65d9295289f78aaec588cc3d
>> 95.8 M   /hbase/data/default/C_CONS/c229d583958802571dfaa9a39453df0d
>> 88.5 M   /hbase/data/default/C_CONS/c9d7a038243d1b3e2448a48007f1f9e0
>> 158.8 M  /hbase/data/default/C_CONS/cca1bf1f013724af25d71ad4310e5d4a
>> 212.8 M  /hbase/data/default/C_CONS/ccabf798734aa8e05798c43c132ad565
>> 85.1 M   /hbase/data/default/C_CONS/d1cb54346e109b1ba76fd95aa4540161
>> 84.4 M   /hbase/data/default/C_CONS/d4dd8c3fa81b751892689cc92a96aa99
>> 139.5 M  /hbase/data/default/C_CONS/dc15ceeed21474b51086f3103cbd0074
>> 97.7 M   /hbase/data/default/C_CONS/df20e2077f22e83ecd8e55550d52dea1
>> 221.0 M  /hbase/data/default/C_CONS/e30d0d55e0887a676c8b79e03771ad23
>> 75.7 M   /hbase/data/default/C_CONS/e6ed24ce0b3e1e903bd9757d28380f3a
>> 74.9 M   /hbase/data/default/C_CONS/e9732d9905f5373fb0fd7a1ce033e17b
>> 101.2 M  /hbase/data/default/C_CONS/f2a49dbaf018f0e45bbd7a758f123418
>> 172.6 M  /hbase/data/default/C_CONS/f34645de36d3c1413ce83177e2118947
>> 89.2 M   /hbase/data/default/C_CONS/f3db2bf3b7ffb7b4c0029eac5d631bdb
>> 81.6 M   /hbase/data/default/C_CONS/f43b49c4f384853266e9ee45a98104a6
>> 68.9 M   /hbase/data/default/C_CONS/fa4fb0047ec98fb10bf84fd72937f415
>> 86.7 M   /hbase/data/default/C_CONS/fc69f349655676e046c9110550825f5a
>> 155.0 M  /hbase/data/default/C_CONS/feb0835bdf73c257de11c65f18b1330d
>> 75.2 M   /hbase/data/default/C_CONS/fff9fbe56af8b9e0e00826f8936e7a56
>>
>>
>>
>> From the output above we can see that the largest region is 372.2 M, while
>> most of the other regions are roughly comparable in size.
>>
>> So what might be the real reason?
>>
>> 2014-09-30 12:17 GMT+08:00 Vladimir Rodionov <vrodio...@splicemachine.com
>> >:
>>
>>> HBase TableInputFormat creates one input split per region, so you cannot
>>> achieve a high level of parallelism unless you have at least 5-10 regions
>>> per RS. What does that mean? You probably have too few regions. You can
>>> verify that in the HBase Web UI.
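
A quick way to double-check the region count, besides the Web UI, is to ask the
HBase admin API for the table's region list. A small sketch, with the table
name assumed from this thread:

import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.HBaseAdmin

object RegionCount {
  def main(args: Array[String]): Unit = {
    // With TableInputFormat, the number of regions is also the number of
    // input splits, i.e. the number of scan tasks.
    val admin = new HBaseAdmin(HBaseConfiguration.create())
    try {
      val regions = admin.getTableRegions(TableName.valueOf("C_CONS")).asScala
      println(s"regions (= input splits): ${regions.size}")
    } finally {
      admin.close()
    }
  }
}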
>>>
>>> -Vladimir Rodionov
>>>
>>> On Mon, Sep 29, 2014 at 7:21 PM, Tao Xiao <xiaotao.cs....@gmail.com>
>>> wrote:
>>>
>>>> I submitted a job in yarn-client mode, which simply reads an HBase table
>>>> containing tens of millions of records and then does a *count* action.
>>>> The job ran for much longer than I expected, so I wonder whether that was
>>>> because there was too much data to read. Actually, there are 20 nodes in
>>>> my Hadoop cluster, so the HBase table does not seem that big (tens of
>>>> millions of records).
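
For reference, a minimal version of such a count job on Spark 0.9 / HBase 0.96
looks roughly like the sketch below; the table name and the scanner-caching
setting are assumptions, not details of the actual job:

import org.apache.spark.SparkContext
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

object HBaseCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("yarn-client", "hbase-count")

    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "C_CONS")
    // Raise scanner caching so each task makes fewer RPC round trips.
    hbaseConf.set(TableInputFormat.SCAN_CACHEDROWS, "500")

    // TableInputFormat creates one split per region, so this RDD gets one
    // partition - and hence one task - per region of the table.
    val rdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    println("partitions: " + rdd.partitions.length)
    println("rows: " + rdd.count())
    sc.stop()
  }
}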
>>>>
>>>> I'm using CDH 5.0.0 (Spark 0.9 and HBase 0.96).
>>>>
>>>> BTW, while the job was running I could see logs on the console;
>>>> specifically, I'd like to know what the following log lines mean:
>>>>
>>>> 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Starting task 0.0:20
>>>> as TID 20 on executor 2: b04.jsepc.com (PROCESS_LOCAL)
>>>> 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Serialized task 0.0:20
>>>> as 13454 bytes in 0 ms
>>>> 14/09/30 09:45:20 INFO scheduler.TaskSetManager: Finished TID 19 in
>>>> 16426 ms on b04.jsepc.com (progress: 18/86)
>>>> 14/09/30 09:45:20 INFO scheduler.DAGScheduler: Completed ResultTask(0,
>>>> 19)
>>>>
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>
