Re: HBase Timeout on queries

2018-02-06 Thread Flavio Pompermaier
Hi Pedro,
I was running the COUNT just as a first dumb query to test whether everything
was OK... indeed, I had to increase 4 timeouts to get that query answered
without errors.
By the way, I think the row count is very useful to know about a table and,
IMHO, it should always be available as table metadata.
I don't know why HBase doesn't care that much about that info...

Best,
Flavio


Re: HBase Timeout on queries

2018-02-05 Thread Pedro Boado
Flavio, I get the same behaviour: a count(*) over 180M records needs a couple
of minutes to complete for a table with 10 regions and 4 region servers (RS)
serving it.

Why are you evaluating robustness in terms of full scans? As Anil said, I
wouldn't expect a NoSQL database to run quick counts on hundreds of
millions or even billions of records.

In terms of usage, we have a production Phoenix cluster with 12 RS serving
a table with ~100 billion records (6 TB). Queries always scan by the first
column of our primary key, so no more than a few thousand records are
pulled, with response times well under a second.
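
A minimal sketch of why a scan on the leading primary-key column stays cheap,
assuming (as in HBase) that rows are stored sorted by key. The in-memory list,
the range_scan helper and the sample keys are hypothetical stand-ins for
illustration, not Phoenix or HBase APIs.

```python
from bisect import bisect_left


def range_scan(sorted_keys, first_col_value):
    """Return the keys whose first component equals first_col_value.

    Because the keys are sorted, a binary search finds the start of the
    matching run and the loop stops at the first non-matching key, so only
    the matching slice is visited instead of the whole key space.
    """
    # A 1-tuple sorts before every longer tuple sharing the same first item.
    i = bisect_left(sorted_keys, (first_col_value,))
    result = []
    while i < len(sorted_keys) and sorted_keys[i][0] == first_col_value:
        result.append(sorted_keys[i])
        i += 1
    return result


if __name__ == "__main__":
    # Hypothetical composite keys: (first primary-key column, second column).
    keys = sorted([("a", 1), ("b", 1), ("b", 2), ("c", 1)])
    print(range_scan(keys, "b"))
```

A count(*) with no key constraint has no such shortcut: it has to visit every
key, which is why it scales with the size of the table.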


Re: HBase Timeout on queries

2018-02-01 Thread James Taylor
I don’t think the HBase row_counter job is going to be faster than a
count(*) query. Both require a full table scan, so neither will be
particularly fast.

A couple of alternatives if you’re OK with an approximate count: 1) enable
stats collection (but you can leave off usage to parallelize queries) and
then do a SUM over the size column for the table using the stats table
directly, or 2) do a count(*) using the TABLESAMPLE clause (again enabling
stats as described above) to prevent a full scan.
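
A rough sketch of option 2, assuming Phoenix's TABLESAMPLE clause takes a
percentage between 0 and 100. The table name MY_TABLE and both helper
functions are hypothetical; the snippet only builds the SQL text and
extrapolates the sampled count, it does not connect to Phoenix.

```python
def sampled_count_sql(table, sample_pct):
    """Build a COUNT(*) query that reads only a sample of the table."""
    if not 0 < sample_pct <= 100:
        raise ValueError("sample_pct must be in (0, 100]")
    return f"SELECT COUNT(*) FROM {table} TABLESAMPLE({sample_pct})"


def scale_to_full_table(sampled_rows, sample_pct):
    """Extrapolate a sampled row count to an estimate for the whole table."""
    return round(sampled_rows * 100 / sample_pct)


if __name__ == "__main__":
    # MY_TABLE is a placeholder name; 10 means "sample roughly 10%".
    print(sampled_count_sql("MY_TABLE", 10))
    # If the sampled query reported 17M rows, the estimate is about 170M.
    print(scale_to_full_table(17_000_000, 10))
```

The result is only an estimate: how closely it tracks the real count depends
on how evenly the sample covers the table.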


Re: HBase Timeout on queries

2018-02-01 Thread Flavio Pompermaier
Hi Anil,
Obviously I'm not using HBase just for the count query. Most of the time I
do INSERTs and selective queries; I was just trying to figure out whether my
HBase + Phoenix installation is robust enough to deal with a huge amount of
data.

On Thu, Feb 1, 2018 at 5:07 PM, anil gupta  wrote:

> Hey Flavio,
>
> IMHO, if most of your app is just doing full table scans, then I am not
> really sure HBase (or any other NoSQL store) will be a good fit for your
> solution (are you building an OLAP system?). If you have point lookups and
> short range scans, then HBase/Phoenix will work well.
> Also, if you want to do a select count(*), the HBase row_counter job will
> be much faster than Phoenix queries.
>
> Thanks,
> Anil Gupta


Re: HBase Timeout on queries

2018-02-01 Thread Flavio Pompermaier
I was able to make it work by changing the following params (on both the
server and the client side, then restarting HBase), and now the query answers
in about 6 minutes:

hbase.rpc.timeout (to 60)
phoenix.query.timeoutMs (to 60)
hbase.client.scanner.timeout.period (from 1 m to 10m)
hbase.regionserver.lease.period (from 1 m to 10m)

However, I'd like to know whether that performance could easily be improved
or not. Any ideas?
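
A minimal sketch of how the four settings above could be rendered as
hbase-site.xml property entries. The 10-minute value (600000 ms) is only an
illustrative assumption, applied to every key for simplicity, and the helper
name is hypothetical; adjust the values to your own cluster.

```python
import xml.etree.ElementTree as ET

# Assumed value for illustration only: 10 minutes expressed in milliseconds.
TEN_MINUTES_MS = 10 * 60 * 1000

# The four keys named in the message above.
TIMEOUT_KEYS = [
    "hbase.rpc.timeout",
    "phoenix.query.timeoutMs",
    "hbase.client.scanner.timeout.period",
    "hbase.regionserver.lease.period",
]


def render_timeout_properties(timeout_ms=TEN_MINUTES_MS):
    """Return a <configuration> XML string with one <property> per key."""
    root = ET.Element("configuration")
    for key in TIMEOUT_KEYS:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = str(timeout_ms)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_timeout_properties())
```

As described above, the same entries would go into the configuration on both
the server and the client side, followed by an HBase restart.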


Re: HBase Timeout on queries

2018-02-01 Thread Vaghawan Ojha
I have the same problem: even after I increased hbase.rpc.timeout, the
result is the same. The difference is that I use 4.12.


HBase Timeout on queries

2018-02-01 Thread Flavio Pompermaier
Hi to all,
I'm trying to use the brand-new Phoenix 4.13.2-cdh5.11.2 over HBase and
everything was fine as long as the data was quite small (a few million rows).
Now that I have inserted 170M rows into my table, I cannot get the row count
anymore (using SELECT COUNT) because of
org.apache.hbase.ipc.CallTimeoutException (operationTimeout 6 expired).
How can I fix this problem? I could increase the hbase.rpc.timeout
parameter, but I suspect I could improve the HBase performance a little bit
first. The problem is that I don't know how.

Thanks in advance,
Flavio