Apache Phoenix + Solr integration?

2018-02-01 Thread Pedro Boado
Hi all,

Do you know of any integration approach to stream documents from Phoenix to
Solr in a similar way to what Lily HBase Indexer does?

Thanks!


Re: High CPU usage on Hbase region Server with GlobalMemoryManager warnings

2018-02-01 Thread Sergey Soldatov
That kind of message may appear when queries that use the memory manager
(usually joins and GROUP BY) have timed out or failed for some reason. So
the message itself is hardly related to CPU usage or GC.
BUT it may mean that your region servers are unable to handle this kind of
workload properly.
Since you say that this issue started after the Yarn work, I would suggest
checking swappiness and huge pages (there are plenty of resources on the
Internet about how they affect HBase). It may be that you have simply run
out of hardware resources.
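
As a rough aid for the swappiness and huge pages check suggested above, here
is a minimal Python sketch that reads both settings on a Linux host. The two
file paths are the usual Linux locations; the helper names (parse_swappiness,
parse_thp_mode, report) are made up for this illustration. HBase tuning guides
commonly suggest a low vm.swappiness and disabling transparent huge pages, but
check the guidance for your own distribution before changing anything.

```python
from pathlib import Path

# Usual Linux locations for the two settings.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_swappiness(text: str) -> int:
    """Parse the contents of the swappiness file (a single integer)."""
    return int(text.strip())


def parse_thp_mode(text: str) -> str:
    """Return the active mode from a line such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def report() -> None:
    """Print both settings, skipping files that do not exist on this host."""
    if SWAPPINESS_PATH.exists():
        print("vm.swappiness =", parse_swappiness(SWAPPINESS_PATH.read_text()))
    if THP_PATH.exists():
        print("transparent huge pages =", parse_thp_mode(THP_PATH.read_text()))


if __name__ == "__main__":
    report()
```

Running this on each region server host makes it easy to compare the two
affected nodes with the healthy one.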

Thanks,
Sergey

On Wed, Jan 31, 2018 at 6:40 PM, Jins George wrote:

> Hi,
>
> While analyzing a production issue of high CPU usage on an HBase region
> server, I came across warning messages in the region server logs
> complaining about orphaned chunks of memory.
>
> 2018-01-30 19:16:31,565 WARN org.apache.phoenix.memory.GlobalMemoryManager: 
> Orphaned chunk of 104000 bytes found during finalize
> 2018-01-30 19:16:31,565 WARN org.apache.phoenix.memory.GlobalMemoryManager: 
> Orphaned chunk of 104000 bytes found during finalize
>
>
> The high CPU usage appears to be due to garbage collection, and it lasted
> for almost 6 hours. Throughout those 6 hours, the region server logs kept
> recording these warning messages.
>
> Cluster Details:
> 4-node (1 master + 3 slaves) CDH cluster
> HBase version 1.2
> Phoenix version 4.7
> Region Server Heap : 4G
> Total Regions: ~135
> Total tables : ~35
>
> Out of 3 region servers, 2 had the warning logs and both suffered from
> high CPU. The third region server had neither high CPU nor the warning
> logs. Any idea why these messages are logged, and can they trigger
> continuous GC?
>
> Before this issue started (or around the same time), huge application log
> files were copied to HDFS by Yarn, but I can't think how that would cause
> an issue on the HBase region server.
>
> Any help is appreciated.
>
> Thanks,
> Jins George
>


Re: HBase Timeout on queries

2018-02-01 Thread James Taylor
I don’t think the HBase row_counter job is going to be faster than a
count(*) query. Both require a full table scan, so neither will be
particularly fast.

A couple of alternatives if you’re OK with an approximate count: 1) enable
stats collection (you can leave its use for parallelizing queries turned
off) and then do a SUM over the size column for the table, querying the
stats table directly, or 2) do a count(*) using the TABLESAMPLE clause
(again enabling stats as described above) to avoid a full scan.
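
To make the two approximate-count options above more concrete, here is a
small Python sketch that only builds the SQL text for each one. It rests on
several assumptions that should be checked against your Phoenix version: that
the stats table is SYSTEM.STATS with PHYSICAL_NAME and GUIDE_POSTS_ROW_COUNT
columns, that TABLESAMPLE takes a percentage in parentheses, and that scaling
the sampled count by 100/percent gives a usable estimate. The function names
and the table name MY_TABLE are hypothetical.

```python
import re

# Conservative check so a table name can be placed into SQL text safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def _check_table(table: str) -> str:
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return table


def stats_row_estimate_sql(table: str) -> str:
    """Option 1: sum the per-guidepost row counts (assumed stats schema)."""
    return (
        "SELECT SUM(GUIDE_POSTS_ROW_COUNT) FROM SYSTEM.STATS "
        f"WHERE PHYSICAL_NAME = '{_check_table(table)}'"
    )


def sampled_count_sql(table: str, percent: float) -> str:
    """Option 2: count only a sample of the table (percent in (0, 100])."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"SELECT COUNT(*) FROM {_check_table(table)} TABLESAMPLE ({percent})"


def scale_sampled_count(sampled_rows: int, percent: float) -> int:
    """Scale a sampled count up to an estimate for the whole table."""
    return round(sampled_rows * 100 / percent)


if __name__ == "__main__":
    print(stats_row_estimate_sql("MY_TABLE"))
    print(sampled_count_sql("MY_TABLE", 10))
    print(scale_sampled_count(17_000_000, 10))
```

If the table has more than one column family, the stats sum may need to be
restricted to a single family to avoid counting rows more than once; treat
that as something to verify rather than a given.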

On Thu, Feb 1, 2018 at 8:11 AM Flavio Pompermaier wrote:

> Hi Anil,
> Obviously I'm not using HBase just for the count query. Most of the time I
> do INSERTs and selective queries; I was just trying to figure out whether
> my HBase + Phoenix installation is robust enough to deal with a huge
> amount of data.
>
> On Thu, Feb 1, 2018 at 5:07 PM, anil gupta wrote:
>
>> Hey Flavio,
>>
>> IMHO, if most of your app is just doing full table scans, then I am not
>> really sure HBase (or any other NoSQL store) will be a good fit for your
>> solution (building an OLAP system?). If you have point lookups and short
>> range scans, then HBase/Phoenix will work well.
>> Also, if you want to do SELECT COUNT(*), the HBase row_counter job will
>> be much faster than Phoenix queries.
>>
>> Thanks,
>> Anil Gupta
>>
>> On Thu, Feb 1, 2018 at 7:35 AM, Flavio Pompermaier wrote:
>>
>>> I was able to make it work by changing the following parameters (on both
>>> the server and client side, then restarting HBase), and now the query
>>> returns in about 6 minutes:
>>>
>>> hbase.rpc.timeout (to 60)
>>> phoenix.query.timeoutMs (to 60)
>>> hbase.client.scanner.timeout.period (from 1 m to 10m)
>>> hbase.regionserver.lease.period (from 1 m to 10m)
>>>
>>> However, I'd like to know whether that performance could easily be
>>> improved. Any ideas?
>>>
>>> On Thu, Feb 1, 2018 at 4:30 PM, Vaghawan Ojha wrote:
>>>
>>>> I have the same problem: even after I increased hbase.rpc.timeout, the
>>>> result is the same. The difference is that I use 4.12.
>>>>
>>>>
>>>> On Thu, Feb 1, 2018 at 8:23 PM, Flavio Pompermaier <
>>>> pomperma...@okkam.it> wrote:
>>>>
>>>>> Hi all,
>>>>> I'm trying to use the brand-new Phoenix 4.13.2-cdh5.11.2 over HBase, and
>>>>> everything was fine while the data was fairly small (a few million rows).
>>>>> Now that I have inserted 170 M rows into my table, I can no longer get
>>>>> the row count (using SELECT COUNT) because of
>>>>> org.apache.hbase.ipc.CallTimeoutException (operationTimeout 6 expired).
>>>>> How can I fix this problem? I could increase the hbase.rpc.timeout
>>>>> parameter, but I suspect I could improve the HBase performance a little
>>>>> first. The problem is that I don't know how.
>>>>>
>>>>> Thanks in advance,
>>>>> Flavio
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>


Re: HBase Timeout on queries

2018-02-01 Thread Flavio Pompermaier
Hi Anil,
Obviously I'm not using HBase just for the count query. Most of the time I
do INSERTs and selective queries; I was just trying to figure out whether my
HBase + Phoenix installation is robust enough to deal with a huge amount of
data.

On Thu, Feb 1, 2018 at 5:07 PM, anil gupta wrote:

> Hey Flavio,
>
> IMHO, if most of your app is just doing full table scans, then I am not
> really sure HBase (or any other NoSQL store) will be a good fit for your
> solution (building an OLAP system?). If you have point lookups and short
> range scans, then HBase/Phoenix will work well.
> Also, if you want to do SELECT COUNT(*), the HBase row_counter job will be
> much faster than Phoenix queries.
>
> Thanks,
> Anil Gupta
>
> On Thu, Feb 1, 2018 at 7:35 AM, Flavio Pompermaier wrote:
>
>> I was able to make it work by changing the following parameters (on both
>> the server and client side, then restarting HBase), and now the query
>> returns in about 6 minutes:
>>
>> hbase.rpc.timeout (to 60)
>> phoenix.query.timeoutMs (to 60)
>> hbase.client.scanner.timeout.period (from 1 m to 10m)
>> hbase.regionserver.lease.period (from 1 m to 10m)
>>
>> However, I'd like to know whether that performance could easily be
>> improved. Any ideas?
>>
>> On Thu, Feb 1, 2018 at 4:30 PM, Vaghawan Ojha wrote:
>>
>>> I have the same problem: even after I increased hbase.rpc.timeout, the
>>> result is the same. The difference is that I use 4.12.
>>>
>>>
>>> On Thu, Feb 1, 2018 at 8:23 PM, Flavio Pompermaier wrote:
>>>
>>>> Hi all,
>>>> I'm trying to use the brand-new Phoenix 4.13.2-cdh5.11.2 over HBase, and
>>>> everything was fine while the data was fairly small (a few million rows).
>>>> Now that I have inserted 170 M rows into my table, I can no longer get
>>>> the row count (using SELECT COUNT) because of
>>>> org.apache.hbase.ipc.CallTimeoutException (operationTimeout 6 expired).
>>>> How can I fix this problem? I could increase the hbase.rpc.timeout
>>>> parameter, but I suspect I could improve the HBase performance a little
>>>> first. The problem is that I don't know how.
>>>>
>>>> Thanks in advance,
>>>> Flavio
>>>>
>>>
>>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>


Re: HBase Timeout on queries

2018-02-01 Thread Flavio Pompermaier
I was able to make it work by changing the following parameters (on both the
server and client side, then restarting HBase), and now the query returns in
about 6 minutes:

hbase.rpc.timeout (to 60)
phoenix.query.timeoutMs (to 60)
hbase.client.scanner.timeout.period (from 1 m to 10m)
hbase.regionserver.lease.period (from 1 m to 10m)

However, I'd like to know whether that performance could easily be improved.
Any ideas?
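
For anyone wanting to reproduce the settings listed above, the following
Python sketch renders them as hbase-site.xml property elements. All four
properties take values in milliseconds. The 10-minute value used for every
entry is an assumption for illustration only (the message gives "10m" for two
of them and an unclear "60" for the other two), and the names MINUTE_MS,
TIMEOUTS_MS and render_properties are invented for this example.

```python
import xml.etree.ElementTree as ET

MINUTE_MS = 60_000

# Illustrative values only: 10 minutes for every timeout (an assumption).
TIMEOUTS_MS = {
    "hbase.rpc.timeout": 10 * MINUTE_MS,
    "phoenix.query.timeoutMs": 10 * MINUTE_MS,
    "hbase.client.scanner.timeout.period": 10 * MINUTE_MS,
    "hbase.regionserver.lease.period": 10 * MINUTE_MS,
}


def render_properties(timeouts: dict) -> str:
    """Render name/value pairs as an hbase-site.xml style <configuration>."""
    config = ET.Element("configuration")
    for name, value_ms in timeouts.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value_ms)
    ET.indent(config)  # pretty-print; requires Python 3.9+
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(TIMEOUTS_MS))
```

The generated properties would be merged into the existing hbase-site.xml on
both the client and the region servers, followed by a restart, as described
in the message.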

On Thu, Feb 1, 2018 at 4:30 PM, Vaghawan Ojha wrote:

> I have the same problem: even after I increased hbase.rpc.timeout, the
> result is the same. The difference is that I use 4.12.
>
>
> On Thu, Feb 1, 2018 at 8:23 PM, Flavio Pompermaier wrote:
>
>> Hi all,
>> I'm trying to use the brand-new Phoenix 4.13.2-cdh5.11.2 over HBase, and
>> everything was fine while the data was fairly small (a few million rows).
>> Now that I have inserted 170 M rows into my table, I can no longer get
>> the row count (using SELECT COUNT) because of
>> org.apache.hbase.ipc.CallTimeoutException (operationTimeout 6 expired).
>> How can I fix this problem? I could increase the hbase.rpc.timeout
>> parameter, but I suspect I could improve the HBase performance a little
>> first. The problem is that I don't know how.
>>
>> Thanks in advance,
>> Flavio
>>
>
>


Re: HBase Timeout on queries

2018-02-01 Thread Vaghawan Ojha
I have the same problem: even after I increased hbase.rpc.timeout, the
result is the same. The difference is that I use 4.12.


On Thu, Feb 1, 2018 at 8:23 PM, Flavio Pompermaier wrote:

> Hi all,
> I'm trying to use the brand-new Phoenix 4.13.2-cdh5.11.2 over HBase, and
> everything was fine while the data was fairly small (a few million rows).
> Now that I have inserted 170 M rows into my table, I can no longer get
> the row count (using SELECT COUNT) because of
> org.apache.hbase.ipc.CallTimeoutException (operationTimeout 6 expired).
> How can I fix this problem? I could increase the hbase.rpc.timeout
> parameter, but I suspect I could improve the HBase performance a little
> first. The problem is that I don't know how.
>
> Thanks in advance,
> Flavio
>


HBase Timeout on queries

2018-02-01 Thread Flavio Pompermaier
Hi all,
I'm trying to use the brand-new Phoenix 4.13.2-cdh5.11.2 over HBase, and
everything was fine while the data was fairly small (a few million rows).
Now that I have inserted 170 M rows into my table, I can no longer get
the row count (using SELECT COUNT) because of
org.apache.hbase.ipc.CallTimeoutException (operationTimeout 6 expired).
How can I fix this problem? I could increase the hbase.rpc.timeout
parameter, but I suspect I could improve the HBase performance a little
first. The problem is that I don't know how.

Thanks in advance,
Flavio