Re: How to tell when an insertion has "finished"

2016-07-28 Thread Heather, James (ELS)
I don't really know enough about the low-level details to know which
replication I was referring to...

Let me ask the higher level question:

1. Am I right in thinking that after you insert a large number of rows, the 
performance of the cluster (and maybe of those rows in particular) will be 
initially slow while some stuff is still happening at a lower level in the 
background?

2. If so, how do you tell when that stuff has finished, and when your query 
performance will reach a steady state?

James
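For the benchmarking concern in question 2, one practical answer is to measure rather than guess, taking the cold-cache explanation given later in the thread as the main cause of slow first runs. Run the same query repeatedly and only start recording once consecutive timings agree. This is a generic warm-up heuristic, not a Phoenix or HBase feature. `run_query` stands for whatever executes your benchmark query (for example through the Phoenix JDBC driver), and the `window`/`tolerance` defaults are arbitrary assumptions to tune.

```python
import statistics
import time
from typing import Callable, List


def is_steady(timings: List[float], window: int, tolerance: float) -> bool:
    """True if the last `window` timings differ by at most `tolerance` x median."""
    if len(timings) < window:
        return False
    recent = timings[-window:]
    median = statistics.median(recent)
    return median > 0 and (max(recent) - min(recent)) <= tolerance * median


def warm_until_steady(run_query: Callable[[], None],
                      window: int = 5,
                      tolerance: float = 0.10,
                      max_runs: int = 50,
                      clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Repeat run_query until timings settle; return every timing in seconds.

    The early (cold-cache) timings are warm-up; the last `window` entries are
    the ones worth reporting as steady-state numbers.
    """
    timings: List[float] = []
    for _ in range(max_runs):
        start = clock()
        run_query()
        timings.append(clock() - start)
        if is_steady(timings, window, tolerance):
            return timings
    raise TimeoutError(f"timings did not settle within {max_runs} runs")
```

The `clock` parameter is only there so the loop can be tested with a fake clock; in a real benchmark leave it as `time.perf_counter`.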

On 29 July 2016 12:05:30 a.m. James Taylor wrote:

That's a good point, Mujtaba. Not sure which replication he meant either.

On Thu, Jul 28, 2016 at 4:02 PM, Mujtaba Chohan wrote:
Oh, sorry, I thought the OP was referring to HDFS-level replication.

On Thu, Jul 28, 2016 at 3:48 PM, James Taylor wrote:
I believe you can also measure the depth of the replication queue to know 
what's pending. HBase replication is asynchronous, so you're right that Phoenix 
would return while replication may still be occurring.

On Thu, Jul 28, 2016 at 12:06 PM, Mujtaba Chohan wrote:
A query running for the first time would be slower because the data is not
yet in the HBase cache, rather than because things have not settled.
Replication shouldn't be putting load on the cluster, which you can check by
turning replication off. On the HBase side, the way to force things into an
optimal state before running performance queries is to do a major compaction
and wait for the compaction to complete.

- mujtaba

On Thu, Jul 28, 2016 at 8:09 AM, Heather, James (ELS) wrote:

If you upsert lots of rows into a table, presumably Phoenix will return as soon 
as HBase has received the data, but before the data has been replicated?


Is there a way to tell when everything has "settled", i.e., when everything has 
finished replicating or whatever it needs to do?


The reason I ask is that this might affect our benchmarking. If we add lots of 
rows, and then run some sample queries straight away, they might return more 
slowly initially, if the replication is still taking place.


(Does this make sense? I'm not completely clear on how HBase replication works 
anyway.)


James



Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington,
Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in
England and Wales.

Re: How to tell when an insertion has "finished"

2016-07-28 Thread James Taylor
That's a good point, Mujtaba. Not sure which replication he meant either.

Re: How to tell when an insertion has "finished"

2016-07-28 Thread Mujtaba Chohan
Oh, sorry, I thought the OP was referring to HDFS-level replication.
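If the replication in question is HDFS block replication, `hdfs fsck <path>` prints a summary that includes an `Under-replicated blocks:` count. The sketch below runs fsck on HBase's data directory and extracts that number. The `/hbase` path is an assumption (a common choice for HBase's root directory, but check `hbase.rootdir` in your configuration), and the parsing relies on the summary line format, which may vary between Hadoop versions. A count of zero means HDFS has not flagged any blocks under that path as needing more replicas.

```python
import re
import subprocess

# Matches the summary line printed by `hdfs fsck`, e.g.
# " Under-replicated blocks:\t0 (0.0 %)". The exact format is an assumption.
_UNDER_REPLICATED = re.compile(r"Under-replicated blocks:\s*(\d+)")


def under_replicated_blocks(fsck_output: str) -> int:
    """Extract the under-replicated block count from fsck's summary."""
    match = _UNDER_REPLICATED.search(fsck_output)
    if match is None:
        raise ValueError("no 'Under-replicated blocks' line in fsck output")
    return int(match.group(1))


def run_fsck(path: str = "/hbase") -> str:
    """Run `hdfs fsck` on path and return its stdout.

    Assumes the `hdfs` CLI is on PATH. check=False because fsck may exit
    non-zero when it finds problems, and the summary is still worth reading.
    """
    result = subprocess.run(["hdfs", "fsck", path],
                            capture_output=True, text=True, check=False)
    return result.stdout
```

Usage would be along the lines of `under_replicated_blocks(run_fsck()) == 0`, polled until it holds.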

Re: How to tell when an insertion has "finished"

2016-07-28 Thread James Taylor
I believe you can also measure the depth of the replication queue to know
what's pending. HBase replication is asynchronous, so you're right that
Phoenix would return while replication may still be occurring.
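To follow up on measuring the replication queue depth: HBase region servers publish replication source metrics through JMX, which can be read as JSON from each server's `/jmx` servlet. This sketch assumes the metric is named `source.sizeOfLogQueue` under the `RegionServer,sub=Replication` bean and that the region server info port is 16030; all three are assumptions to verify against your HBase version. It is safest to record what an idle cluster reports first and compare against that baseline, rather than assume the settled value is exactly zero.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

# Assumed metric and bean names; check them against your HBase version's
# replication metrics before relying on them.
QUEUE_METRIC = "source.sizeOfLogQueue"
REPLICATION_BEAN = "Hadoop:service=HBase,name=RegionServer,sub=Replication"


def queue_depth(jmx: Dict[str, Any], metric: str = QUEUE_METRIC) -> int:
    """Sum `metric` over every bean in a /jmx JSON response that reports it."""
    total = 0
    for bean in jmx.get("beans", []):
        value = bean.get(metric)
        if isinstance(value, (int, float)):
            total += int(value)
    return total


def fetch_replication_jmx(host: str, port: int = 16030) -> Dict[str, Any]:
    """Fetch the replication bean from one region server's /jmx servlet.

    16030 is assumed to be the region server info port; adjust for your setup.
    """
    query = urllib.parse.urlencode({"qry": REPLICATION_BEAN})
    url = f"http://{host}:{port}/jmx?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def cluster_queue_depth(hosts: List[str]) -> int:
    """Total queued WAL files across all listed region servers."""
    return sum(queue_depth(fetch_replication_jmx(h)) for h in hosts)
```

The parsing is kept separate from the HTTP call so it can be checked against a saved `/jmx` response, and the same helper can read other numeric replication metrics by passing a different `metric` name.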

Re: How to tell when an insertion has "finished"

2016-07-28 Thread Mujtaba Chohan
A query running for the first time would be slower because the data is not
yet in the HBase cache, rather than because things have not settled.
Replication shouldn't be putting load on the cluster, which you can check by
turning replication off. On the HBase side, the way to force things into an
optimal state before running performance queries is to do a major compaction
and wait for the compaction to complete.

- mujtaba
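The major-compaction step can be scripted. You would trigger it with `major_compact 'TABLE'` in the HBase shell or `Admin.majorCompact(TableName)` in the Java client, then poll the table's compaction state (the Java `Admin.getCompactionState(TableName)` reports `NONE`, `MINOR`, `MAJOR` or `MAJOR_AND_MINOR`) until it returns to `NONE`. This Python sketch leaves the lookup as a hypothetical `get_state` callable wrapping whichever of those you use. Because the compaction request is asynchronous, the state may still read `NONE` right after the request, so the loop only accepts `NONE` once it has seen a compaction in progress or a grace period has passed.

```python
import time
from typing import Callable


def wait_for_compaction(get_state: Callable[[], str],
                        poll_seconds: float = 10.0,
                        grace_seconds: float = 60.0,
                        timeout_seconds: float = 4 * 3600.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> int:
    """Poll get_state() until the table's compaction state settles at 'NONE'.

    'NONE' is accepted only after a compaction has been seen in progress, or
    once grace_seconds have passed (in case it finished between polls).
    Returns the number of polls made; raises TimeoutError after
    timeout_seconds.
    """
    start = clock()
    saw_active = False
    polls = 0
    while True:
        state = get_state()  # hypothetical wrapper around getCompactionState
        polls += 1
        elapsed = clock() - start
        if state != "NONE":
            saw_active = True
        elif saw_active or elapsed >= grace_seconds:
            return polls
        if elapsed >= timeout_seconds:
            raise TimeoutError(
                f"compaction still in state {state!r} after {elapsed:.0f}s")
        sleep(poll_seconds)
```

`sleep` and `clock` are injectable only so the loop can be tested without real waiting; the grace period is a judgment call and should be longer than the time it typically takes your cluster to pick up a compaction request.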
