Re: HBase Timeout on queries
Hi Pedro,
I was running the COUNT just as a first, dumb query to test whether everything was OK... indeed, I had to increase four timeouts before that query would complete without errors.
By the way, I think the row count is something very useful to know about a table and, IMHO, it should always be available as table metadata. I don't know why HBase doesn't care that much about that info...

Best,
Flavio

On Mon, Feb 5, 2018 at 7:10 PM, Pedro Boado wrote:
> [...]
Re: HBase Timeout on queries
Flavio, I see the same behaviour: a count(*) over 180M records needs a couple of minutes to complete on a table with 10 regions served by 4 region servers (RS).

Why are you evaluating robustness in terms of full scans? As Anil said, I wouldn't expect a NoSQL database to run quick counts over hundreds of millions, or even billions, of records.

In terms of usage, we have a production Phoenix cluster with 12 RS serving a table with ~100 billion records (6 TB). Queries always scan by the first column of our primary key, so no more than a few thousand records are pulled per query, with response times well under a second.

On 1 Feb 2018 16:38, "James Taylor" wrote:
> [...]
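To illustrate Pedro's point about always scanning by the first primary-key column: HBase keeps rows sorted by row key, so every row sharing a leading key value sits in one contiguous range that a scan can seek to directly, while COUNT(*) has to walk every row. The Python sketch below models this with an in-memory sorted list and `bisect`. The key layout is a hypothetical example: a VARCHAR first column, then a 0x00 separator (which I believe is what Phoenix writes after variable-length key columns), then a 4-byte event number. All names are made up for illustration.

```python
import bisect


def make_row_key(customer: str, event: int) -> bytes:
    # Hypothetical composite row key: VARCHAR first PK column, 0x00 separator,
    # then a fixed-width 4-byte big-endian event number.
    return customer.encode() + b"\x00" + event.to_bytes(4, "big")


def prefix_range(sorted_keys, prefix: bytes):
    """Return (start, end) indexes of the keys that begin with `prefix`."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # Smallest key just past the prefix range: bump the prefix's last byte
    # (assumes that byte is not 0xFF).
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    end = bisect.bisect_left(sorted_keys, stop)
    return start, end


# 100 hypothetical customers x 50 events each, kept in row-key order as HBase does.
keys = sorted(make_row_key(f"cust{c:03d}", e) for c in range(100) for e in range(50))

start, end = prefix_range(keys, b"cust042\x00")
rows_touched_by_prefix_scan = end - start  # only cust042's rows
rows_touched_by_full_scan = len(keys)      # what COUNT(*) has to read
print(rows_touched_by_prefix_scan, rows_touched_by_full_scan)  # 50 5000
```

The prefix scan's cost grows roughly with the number of matching rows rather than with the table size, which is consistent with a ~100-billion-row table still answering such queries in well under a second.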
Re: HBase Timeout on queries
I don’t think the HBase row_counter job is going to be faster than a count(*) query. Both require a full table scan, so neither will be particularly fast.

A couple of alternatives if you’re OK with an approximate count:
1) enable stats collection (you can leave off their use for parallelizing queries), then do a SUM over the size column for the table, querying the stats table directly; or
2) do a count(*) using the TABLESAMPLE clause (again with stats enabled as described above) to avoid a full scan.

On Thu, Feb 1, 2018 at 8:11 AM Flavio Pompermaier wrote:
> [...]
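James's two approximate-count options might look like this. The SQL strings are meant to be run from sqlline.py or any Phoenix JDBC client; the Python part only shows how a sampled count would be extrapolated to the whole table. Assumptions to check against your Phoenix version: the table name MY_TABLE is hypothetical, `UPDATE STATISTICS` collects the stats, the per-guidepost row-count column in SYSTEM.STATS is GUIDE_POSTS_ROW_COUNT, the table has a single column family (with several, summing across families could over-count), and TABLESAMPLE (Phoenix 4.12+) takes a sampling percentage. For collecting stats without using them to parallelize queries, I believe the relevant switch is `phoenix.use.stats.parallelization`, but treat that name as an assumption too.

```python
TABLE = "MY_TABLE"  # hypothetical; Phoenix upper-cases unquoted identifiers

# Phoenix SQL, to be run from sqlline.py or another Phoenix JDBC client.
COLLECT_STATS_SQL = f"UPDATE STATISTICS {TABLE}"

# Option 1: add up the per-guidepost row counts kept in SYSTEM.STATS.
# Assumes the GUIDE_POSTS_ROW_COUNT column exists and a single column family.
STATS_COUNT_SQL = (
    "SELECT SUM(GUIDE_POSTS_ROW_COUNT) FROM SYSTEM.STATS "
    f"WHERE PHYSICAL_NAME = '{TABLE}'"
)


def sampled_count_sql(table: str, percent: float) -> str:
    """Option 2: count only a sample of the table (percent in (0, 100])."""
    return f"SELECT COUNT(*) FROM {table} TABLESAMPLE({percent})"


def scale_sampled_count(sampled_rows: int, percent: float) -> int:
    """Extrapolate a TABLESAMPLE count to an approximate full-table count."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return round(sampled_rows * 100 / percent)


print(sampled_count_sql(TABLE, 1))
# If that query returned 1,700,000 rows, the whole table holds roughly:
print(scale_sampled_count(1_700_000, 1))  # 170000000
```

As far as I understand, sampling works at guidepost granularity, so the fraction actually read is only approximately the requested percentage; the extrapolated figure is an estimate, not a count.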
Re: HBase Timeout on queries
Hi Anil,
obviously I'm not using HBase just for the count query. Most of the time I do INSERTs and selective queries; I was just trying to figure out whether my HBase + Phoenix installation is robust enough to deal with a huge amount of data.

On Thu, Feb 1, 2018 at 5:07 PM, anil gupta wrote:
> Hey Flavio,
>
> IMHO, if most of your app is just doing full table scans, then I am not really sure HBase (or any other NoSQL store) will be a good fit for your solution (are you building an OLAP system?). If you have point lookups and short range scans, then HBase/Phoenix will work well.
> Also, if you want to do a select count(*), the HBase row_counter job will be much faster than Phoenix queries.
>
> Thanks,
> Anil Gupta
>
> On Thu, Feb 1, 2018 at 7:35 AM, Flavio Pompermaier wrote:
>> [...]
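Anil's row_counter suggestion refers to the RowCounter MapReduce job that ships with HBase (org.apache.hadoop.hbase.mapreduce.RowCounter). As James points out, it still reads every row, so it should not be expected to beat COUNT(*). A minimal Python wrapper, assuming the `hbase` launcher is on PATH and that the Phoenix table maps to an HBase table with the same upper-case name (with namespace mapping enabled it would be SCHEMA:TABLE instead); MY_TABLE is a hypothetical name:

```python
import shutil
import subprocess
from typing import List


def row_counter_argv(hbase_table: str) -> List[str]:
    # `hbase <class> <args>` runs a Java class on the HBase classpath;
    # RowCounter submits a MapReduce job that scans the whole table.
    return ["hbase", "org.apache.hadoop.hbase.mapreduce.RowCounter", hbase_table]


def run_row_counter(hbase_table: str) -> int:
    """Launch RowCounter and return the launcher's exit code."""
    if shutil.which("hbase") is None:
        raise RuntimeError("the hbase launcher is not on PATH")
    # The row count shows up in the job counters (ROWS=...) in the console
    # output; this function does not parse or return it.
    return subprocess.run(row_counter_argv(hbase_table)).returncode


if __name__ == "__main__":
    print(" ".join(row_counter_argv("MY_TABLE")))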
Re: HBase Timeout on queries
I was able to make it work by changing the following parameters (on both the server and client side, then restarting HBase), and now the query answers in about 6 minutes:

hbase.rpc.timeout (to 60)
phoenix.query.timeoutMs (to 60)
hbase.client.scanner.timeout.period (from 1m to 10m)
hbase.regionserver.lease.period (from 1m to 10m)

However, I'd like to know if that performance could be easily improved or not. Any ideas?

On Thu, Feb 1, 2018 at 4:30 PM, Vaghawan Ojha wrote:
> [...]
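Flavio's four settings can be expressed as an hbase-site.xml fragment; the sketch below generates one with Python's standard library so the values live in one place. The archived values for the first two parameters look truncated ("to 60"), so the 10-minute (600000 ms) figures here are an assumption chosen to match the other two settings, not his actual numbers. As he notes, the same properties have to be visible both to the client (the Phoenix/JDBC side) and to the region servers, which then need a restart.

```python
import xml.etree.ElementTree as ET

TEN_MINUTES_MS = 10 * 60 * 1000  # 600000 ms

# Assumed values: 10 minutes for every setting, expressed in milliseconds.
TIMEOUTS_MS = {
    "hbase.rpc.timeout": TEN_MINUTES_MS,
    "phoenix.query.timeoutMs": TEN_MINUTES_MS,
    "hbase.client.scanner.timeout.period": TEN_MINUTES_MS,
    "hbase.regionserver.lease.period": TEN_MINUTES_MS,
}


def hbase_site_xml(props: dict) -> str:
    """Render properties in the <configuration><property>... layout of hbase-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


print(hbase_site_xml(TIMEOUTS_MS))
```

The printed `<property>` elements would be merged into the existing `<configuration>` of each hbase-site.xml rather than replacing the file.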
Re: HBase Timeout on queries
I have the same problem: even after increasing hbase.rpc.timeout, the result is the same. The difference is that I use Phoenix 4.12.

On Thu, Feb 1, 2018 at 8:23 PM, Flavio Pompermaier wrote:
> [...]
HBase Timeout on queries
Hi to all,
I'm trying to use the brand-new Phoenix 4.13.2-cdh5.11.2 over HBase, and everything was fine while the data was quite small (a few million rows). Since I inserted 170M rows into my table, I can no longer get the row count (using SELECT COUNT) because of org.apache.hadoop.hbase.ipc.CallTimeoutException (operationTimeout 6 expired).
How can I fix this problem? I could increase the hbase.rpc.timeout parameter, but I suspect I could first improve HBase performance a bit... the problem is that I don't know how.

Thanks in advance,
Flavio
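A back-of-the-envelope way to judge whether raising a timeout is a lasting fix: divide the row count by an observed full-scan throughput and compare the result with the query's time budget. The throughput below comes from Flavio's follow-up in this thread (~170M rows counted in about 6 minutes). The 10-minute whole-query budget is an assumption, modeled on phoenix.query.timeoutMs, which as far as I know bounds the whole query on the client side. The 500M-row size is hypothetical. The sketch assumes throughput stays constant as the table grows, so it is only a rough sanity check.

```python
def estimated_scan_seconds(rows: int, rows_per_second: float) -> float:
    """Rough full-scan duration, assuming throughput stays constant."""
    return rows / rows_per_second


def fits_in_timeout(rows: int, rows_per_second: float, timeout_ms: int) -> bool:
    """True if the estimated full scan finishes within the timeout."""
    return estimated_scan_seconds(rows, rows_per_second) * 1000 <= timeout_ms


# Throughput from this thread: ~170M rows counted in about 6 minutes.
OBSERVED_ROWS_PER_SECOND = 170_000_000 / (6 * 60)  # roughly 472,000 rows/s
QUERY_TIMEOUT_MS = 10 * 60 * 1000  # assumed 10-minute whole-query budget

for rows in (170_000_000, 500_000_000):  # 500M is a hypothetical future size
    secs = estimated_scan_seconds(rows, OBSERVED_ROWS_PER_SECOND)
    ok = fits_in_timeout(rows, OBSERVED_ROWS_PER_SECOND, QUERY_TIMEOUT_MS)
    print(f"{rows:,} rows: ~{secs / 60:.1f} min, fits: {ok}")
```

Because full-scan time grows linearly with the row count, any fixed timeout will eventually be outgrown, which is why the replies in this thread point toward approximate counts or key-prefix queries instead.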