Re: How to tell when an insertion has "finished"
I don't really know enough about the low-level details to know which replication I was referring to... Let me ask the higher-level question:

1. Am I right in thinking that after you insert a large number of rows, the performance of the cluster (and maybe of those rows in particular) will initially be slow while some stuff is still happening at a lower level in the background?
2. If so, how do you tell when that stuff has finished, and when your query performance will reach a steady state?

James

On 29 July 2016 12:05:30 a.m. James Taylor wrote:
> That's a good point, Mujtaba. Not sure which replication he meant either.

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.
Re: How to tell when an insertion has "finished"
That's a good point, Mujtaba. Not sure which replication he meant either.

On Thu, Jul 28, 2016 at 4:02 PM, Mujtaba Chohan wrote:
> Oh sorry, I thought the OP was referring to HDFS-level replication.
Re: How to tell when an insertion has "finished"
Oh sorry, I thought the OP was referring to HDFS-level replication.

On Thu, Jul 28, 2016 at 3:48 PM, James Taylor wrote:
> I believe you can also measure the depth of the replication queue to know
> what's pending. HBase replication is asynchronous, so you're right that
> Phoenix would return while replication may still be occurring.
Re: How to tell when an insertion has "finished"
I believe you can also measure the depth of the replication queue to know what's pending. HBase replication is asynchronous, so you're right that Phoenix would return while replication may still be occurring.

On Thu, Jul 28, 2016 at 12:06 PM, Mujtaba Chohan wrote:
> A query running for the first time would be slower because the data is not
> yet in the HBase cache, rather than because things have not settled.
> Replication shouldn't be putting load on the cluster, which you can check by
> turning replication off. On the HBase side, the way to force things into an
> optimal state before running perf queries is to do a major compaction and
> wait for the compaction to complete.
>
> - mujtaba
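For anyone wanting to script the queue-depth check, here is a rough Python sketch. It assumes you have already fetched and parsed the JSON from a RegionServer's JMX endpoint into a dict of the form {"beans": [...]}. The bean suffix `sub=Replication` and the metric name `source.sizeOfLogQueue` are my assumptions about what that endpoint exposes, so please verify them against your HBase version. The function name and the sample dict are made up for illustration.

```python
def pending_replication_depth(jmx, bean_suffix="sub=Replication",
                              metric="source.sizeOfLogQueue"):
    """Sum `metric` over every JMX bean whose name ends with `bean_suffix`.

    `jmx` is an already-parsed JMX JSON document: {"beans": [ {...}, ... ]}.
    The default bean suffix and metric name are assumptions, not confirmed
    by this thread; adjust them to match what your cluster reports.
    """
    total = 0
    for bean in jmx.get("beans", []):
        if bean.get("name", "").endswith(bean_suffix):
            # A bean without the metric contributes nothing.
            total += int(bean.get(metric, 0))
    return total


# Hypothetical sample, hand-written for illustration only.
sample = {
    "beans": [
        {"name": "Hadoop:service=HBase,name=RegionServer,sub=Replication",
         "source.sizeOfLogQueue": 3},
        {"name": "Hadoop:service=HBase,name=RegionServer,sub=Server",
         "source.sizeOfLogQueue": 99},  # ignored: not a replication bean
    ]
}

if __name__ == "__main__":
    print(pending_replication_depth(sample))
```

You would call this once per RegionServer and treat a total of zero across the cluster as "nothing pending", bearing in mind that this only tells you about HBase replication to a peer cluster, not about query performance on the source cluster.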
Re: How to tell when an insertion has "finished"
A query running for the first time would be slower because the data is not yet in the HBase cache, rather than because things have not settled. Replication shouldn't be putting load on the cluster, which you can check by turning replication off. On the HBase side, the way to force things into an optimal state before running perf queries is to do a major compaction and wait for the compaction to complete.

- mujtaba

On Thu, Jul 28, 2016 at 8:09 AM, Heather, James (ELS) <james.heat...@elsevier.com> wrote:
> If you upsert lots of rows into a table, presumably Phoenix will return as
> soon as HBase has received the data, but before the data has been
> replicated?
>
> Is there a way to tell when everything has "settled", i.e., when
> everything has finished replicating or whatever it needs to do?
>
> The reason I ask is that this might affect our benchmarking. If we add
> lots of rows, and then run some sample queries straight away, they might
> return more slowly initially, if the replication is still taking place.
>
> (Does this make sense? I'm not completely clear on how HBase replication
> works anyway.)
>
> James
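Since the first run is slow mainly because the cache is cold, one way to handle this in a benchmark is to repeat the query and only start recording once the timings stop moving. The sketch below is one possible approach, not something Phoenix or HBase provides: `has_settled`, `run_until_settled`, the window of 5 runs, and the 10% tolerance are all hypothetical choices, and `run_query` stands in for whatever callable executes your query.

```python
import statistics
import time


def has_settled(latencies, window=5, tolerance=0.10):
    """Return True when the last `window` latencies lie within
    `tolerance` (as a fraction of their median) of each other."""
    if len(latencies) < window:
        return False
    recent = latencies[-window:]
    spread = max(recent) - min(recent)
    return spread <= tolerance * statistics.median(recent)


def run_until_settled(run_query, window=5, tolerance=0.10, max_runs=50):
    """Call `run_query` repeatedly, timing each call, and stop as soon as
    the timings have settled or `max_runs` is reached.
    Returns the list of measured latencies in seconds."""
    latencies = []
    for _ in range(max_runs):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)
        if has_settled(latencies, window, tolerance):
            break
    return latencies
```

The runs before the settled window are the warm-up and can be discarded; the last `window` values are the ones to report. If you also want the storage layer in its best state, do the major compaction first and wait for it to complete, as described above, before starting the loop.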