[jira] [Commented] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372460#comment-16372460 ]

Vikas Vishwakarma commented on PHOENIX-4625:
--------------------------------------------

Tested locally; attached the GC graphs with the issue (GC_Leak.png) and after the fix (GC_After_fix.png).

> memory leak in PhoenixConnection if scanner renew lease thread is not enabled
> -----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4625
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4625
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0
>            Reporter: Vikas Vishwakarma
>            Priority: Major
>             Fix For: 4.14.0
>
>         Attachments: GC_After_fix.png, GC_Leak.png, PHOENIX-4625.patch, QS.png
>
>
> We have two different code paths:
> # In ConnectionQueryServicesImpl, the RenewLeaseTask is scheduled only after checking both that the renew lease feature is supported and that the renew lease config is enabled:
> supportsFeature(ConnectionQueryServices.Feature.RENEW_LEASE) && renewLeaseEnabled
> # In PhoenixConnection, the iterator for every scan is added to a queue for lease renewal based only on the check that the renew lease feature is supported:
> services.supportsFeature(Feature.RENEW_LEASE)
> In PhoenixConnection we thus miss the check on whether the renew lease config (phoenix.scanner.lease.renew.enabled) is enabled.
>
> Now consider a situation where the renew lease feature is supported but phoenix.scanner.lease.renew.enabled is set to false in hbase-site.xml. PhoenixConnection will keep adding the iterator for every scan into scannerQueue for renewal (the feature check passes), but the renewal task is not running (the config is false), so scannerQueue keeps growing for as long as the PhoenixConnection is alive and scan requests keep arriving on this connection.
>
> We have a use case that uses a single, perpetual PhoenixConnection and performs billions of scans on it. In this case scannerQueue grows to several GBs, ultimately leading to consecutive full GCs/OOM.
>
> Adding iterators for lease renewal in PhoenixConnection
> =======================================================
> {code:java}
> public void addIteratorForLeaseRenewal(@Nonnull TableResultIterator itr) {
>     if (services.supportsFeature(Feature.RENEW_LEASE)) {
>         checkNotNull(itr);
>         scannerQueue.add(new WeakReference<TableResultIterator>(itr));
>     }
> }
> {code}
>
> Starting the RenewLeaseTask
> ===========================
> Checks that Feature.RENEW_LEASE is supported and that phoenix.scanner.lease.renew.enabled is true, then starts the RenewLeaseTask:
> {code:java}
> ConnectionQueryServicesImpl {
>     this.renewLeaseEnabled = config.getBoolean(RENEW_LEASE_ENABLED, DEFAULT_RENEW_LEASE_ENABLED);
>     ...
>     @Override
>     public boolean isRenewingLeasesEnabled() {
>         return supportsFeature(ConnectionQueryServices.Feature.RENEW_LEASE) && renewLeaseEnabled;
>     }
>
>     private void scheduleRenewLeaseTasks() {
>         if (isRenewingLeasesEnabled()) {
>             renewLeaseExecutor = Executors.newScheduledThreadPool(renewLeasePoolSize, renewLeaseThreadFactory);
>             for (LinkedBlockingQueue<WeakReference<TableResultIterator>> q : connectionQueues) {
>                 renewLeaseExecutor.scheduleAtFixedRate(new RenewLeaseTask(q), 0, renewLeaseTaskFrequency, TimeUnit.MILLISECONDS);
>             }
>         }
>     }
>     ...
> }
> {code}
>
> To solve this, we must apply both checks in PhoenixConnection (feature supported and config enabled) before adding iterators to scannerQueue:
> ConnectionQueryServices.Feature.RENEW_LEASE is supported && phoenix.scanner.lease.renew.enabled is true
> instead of only checking that the feature ConnectionQueryServices.Feature.RENEW_LEASE is supported.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
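The leak mechanism described above can be sketched with a toy model. All names here (ToyConnection, the boolean fields) are hypothetical stand-ins, not Phoenix code: when the feature check passes but the config flag is off, nothing ever drains the queue, and even the WeakReference wrappers accumulate because the queue holds them strongly.

```java
import java.lang.ref.WeakReference;
import java.util.LinkedList;
import java.util.Queue;

// Toy model of the PHOENIX-4625 leak (hypothetical stand-in for PhoenixConnection).
class ToyConnection {
    // Mirrors Feature.RENEW_LEASE being supported by the cluster.
    final boolean supportsRenewLease = true;
    // Mirrors phoenix.scanner.lease.renew.enabled=false in hbase-site.xml.
    final boolean renewLeaseEnabled = false;
    // The queue strongly holds the WeakReference wrappers, so the wrappers
    // pile up even after their referents are garbage collected.
    final Queue<WeakReference<Object>> scannerQueue = new LinkedList<>();

    // Buggy guard: checks only the feature, so iterators are queued even
    // though no RenewLeaseTask was ever scheduled to drain the queue.
    void addIteratorBuggy(Object itr) {
        if (supportsRenewLease) {
            scannerQueue.add(new WeakReference<>(itr));
        }
    }

    // Fixed guard: mirrors isRenewingLeasesEnabled(), i.e. feature AND config.
    void addIteratorFixed(Object itr) {
        if (supportsRenewLease && renewLeaseEnabled) {
            scannerQueue.add(new WeakReference<>(itr));
        }
    }
}

public class Phoenix4625Demo {
    public static void main(String[] args) {
        ToyConnection buggy = new ToyConnection();
        ToyConnection fixed = new ToyConnection();
        // One long-lived connection, many scans.
        for (int i = 0; i < 100_000; i++) {
            Object scanIterator = new Object();
            buggy.addIteratorBuggy(scanIterator);
            fixed.addIteratorFixed(scanIterator);
        }
        System.out.println("buggy queue size: " + buggy.scannerQueue.size()); // 100000
        System.out.println("fixed queue size: " + fixed.scannerQueue.size()); // 0
    }
}
```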
[jira] [Updated] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma updated PHOENIX-4625:
---------------------------------------
    Attachment: GC_Leak.png
[jira] [Updated] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma updated PHOENIX-4625:
---------------------------------------
    Attachment: GC_After_fix.png
[jira] [Commented] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372438#comment-16372438 ]

Vikas Vishwakarma commented on PHOENIX-4625:
--------------------------------------------

[~jamestaylor] please review. Just replaced services.supportsFeature(Feature.RENEW_LEASE) in PhoenixConnection with services.isRenewingLeasesEnabled().
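The one-line change described in the comment above can be sketched as follows. ToyResultIterator, PatchedConnection, and the boolean flags are hypothetical stand-ins; the actual patch changes the guard inside PhoenixConnection.addIteratorForLeaseRenewal.

```java
import java.lang.ref.WeakReference;
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-in for TableResultIterator.
class ToyResultIterator {}

// Illustrative sketch of the patched guard, not the committed Phoenix code.
class PatchedConnection {
    private final boolean supportsRenewLease;
    private final boolean renewLeaseEnabled;
    private final Queue<WeakReference<ToyResultIterator>> scannerQueue =
            new LinkedBlockingQueue<>();

    PatchedConnection(boolean supportsRenewLease, boolean renewLeaseEnabled) {
        this.supportsRenewLease = supportsRenewLease;
        this.renewLeaseEnabled = renewLeaseEnabled;
    }

    // Mirrors ConnectionQueryServicesImpl.isRenewingLeasesEnabled():
    // both the feature check and the config flag must pass.
    boolean isRenewingLeasesEnabled() {
        return supportsRenewLease && renewLeaseEnabled;
    }

    // After the fix, iterators are queued only when a RenewLeaseTask is
    // actually running to drain the queue.
    void addIteratorForLeaseRenewal(ToyResultIterator itr) {
        if (isRenewingLeasesEnabled()) {
            Objects.requireNonNull(itr);
            scannerQueue.add(new WeakReference<>(itr));
        }
    }

    int queueSize() {
        return scannerQueue.size();
    }
}
```

With feature supported but the config off, addIteratorForLeaseRenewal now becomes a no-op, so the queue stays empty instead of growing without bound.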
[jira] [Updated] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma updated PHOENIX-4625:
---------------------------------------
    Attachment: PHOENIX-4625.patch
[jira] [Updated] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma updated PHOENIX-4625:
---------------------------------------
    Affects Version/s: (was: 4.13.0)
                       4.14.0
[jira] [Updated] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma updated PHOENIX-4625:
---------------------------------------
    Description: (edited: the code snippets are now marked as {code:java}; the full text otherwise duplicates the description quoted in the first message above)
[jira] [Updated] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma updated PHOENIX-4625:
---------------------------------------
    Description: (edited: the code snippets are now wrapped in {code} blocks; the full text otherwise duplicates the description quoted in the first message above)
[jira] [Updated] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma updated PHOENIX-4625:
---------------------------------------
    Attachment: QS.png
[jira] [Commented] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372409#comment-16372409 ] Vikas Vishwakarma commented on PHOENIX-4625: [~larsh] [~giacomotaylor] this is one of the memory leak case observed with QueryServer
[jira] [Created] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
Vikas Vishwakarma created PHOENIX-4625:
--
Summary: memory leak in PhoenixConnection if scanner renew lease thread is not enabled
Key: PHOENIX-4625
URL: https://issues.apache.org/jira/browse/PHOENIX-4625
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.13.0
Reporter: Vikas Vishwakarma
[jira] [Updated] (PHOENIX-4044) Orphaned chunk bytes found warnings during finalize in GlobalMemoryManager
[ https://issues.apache.org/jira/browse/PHOENIX-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikas Vishwakarma updated PHOENIX-4044: --- Summary: Orphaned chunk bytes found warnings during finalize in GlobalMemoryManager (was: Orphaned chunk bytes found during finalize in )

> Orphaned chunk bytes found warnings during finalize in GlobalMemoryManager
> --
>
> Key: PHOENIX-4044
> URL: https://issues.apache.org/jira/browse/PHOENIX-4044
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.10.0
> Reporter: Vikas Vishwakarma
> Priority: Minor
>
> Observing these WARNINGs in the RegionServer logs, similar to what is discussed in PHOENIX-1011:
> 2017-07-20 10:10:00,421 WARN [Finalizer] memory.GlobalMemoryManager - Orphaned chunk of 1565 bytes found during finalize
> 2017-07-20 10:10:00,421 WARN [Finalizer] memory.GlobalMemoryManager - Orphaned chunk of 1565 bytes found during finalize
> 2017-07-20 10:10:21,153 WARN [Finalizer] memory.GlobalMemoryManager - Orphaned chunk of 1565 bytes found during finalize
> I am not able to identify the exact sequence of events that leads to these warnings, but the scenario we were testing is as follows:
> We were trying to simulate index update failures for one of our tests to check the index rebuild scenario.
> For this we did a close_region on one of the index table regions so that all updates on this region would fail with NSRE.
> With this we start seeing index update failures, scanner lease expiries, etc.
> After some time we re-assign the closed region to trigger the index rebuild.
> At this point we start seeing these WARNING messages related to orphaned chunks in the RegionServer logs.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
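The WARN lines above come from finalize-based leak detection: when a memory chunk is garbage-collected without having been freed, its finalizer reclaims the bytes and reports the orphaned size. A reduced sketch of that pattern, in the spirit of GlobalMemoryManager (this is not Phoenix's actual implementation; class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

// Reduced sketch of finalize-based leak detection; not Phoenix's actual code.
class MemoryManagerSketch {
    private final AtomicLong used = new AtomicLong();
    private final AtomicLong orphaned = new AtomicLong();

    MemoryChunk allocate(long bytes) {
        used.addAndGet(bytes);
        return new MemoryChunk(bytes);
    }

    long getUsed() { return used.get(); }
    long getOrphaned() { return orphaned.get(); }

    class MemoryChunk {
        private long size;

        MemoryChunk(long size) { this.size = size; }

        // Well-behaved path: the holder releases the chunk explicitly.
        synchronized void free() {
            used.addAndGet(-size);
            size = 0;
        }

        // Safety net run by the GC's Finalizer thread: if free() was never
        // called, reclaim the bytes and count the orphan. This is the kind of
        // path that produces "Orphaned chunk of N bytes found during finalize".
        @Override
        public synchronized void finalize() {
            if (size > 0) {
                orphaned.addAndGet(size);
                free();
            }
        }
    }
}
```

The warnings are therefore a symptom, not the leak itself: some code path dropped a chunk reference without calling free(), and the finalizer caught it.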
[jira] [Updated] (PHOENIX-4044) Orphaned chunk bytes found during finalize in
[ https://issues.apache.org/jira/browse/PHOENIX-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikas Vishwakarma updated PHOENIX-4044: --- Summary: Orphaned chunk bytes found during finalize in (was: Orphaned chunk of 1565 bytes found during finalize)
[jira] [Created] (PHOENIX-4044) Orphaned chunk of 15650000 bytes found during finalize
Vikas Vishwakarma created PHOENIX-4044:
--
Summary: Orphaned chunk of 1565 bytes found during finalize
Key: PHOENIX-4044
URL: https://issues.apache.org/jira/browse/PHOENIX-4044
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.10.0
Reporter: Vikas Vishwakarma
Priority: Minor
[jira] [Resolved] (PHOENIX-1215) some phoenix transactions are hanging indefinitely on ps.execute() call
[ https://issues.apache.org/jira/browse/PHOENIX-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikas Vishwakarma resolved PHOENIX-1215. Resolution: Invalid This turned out to be an issue in our client implementation. Closing.

some phoenix transactions are hanging indefinitely on ps.execute() call
---

Key: PHOENIX-1215
URL: https://issues.apache.org/jira/browse/PHOENIX-1215
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.0.0
Environment: Num Cores: 24, Model: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz, MemTotal: 65801264 kB, 1 Gbps Network, Red Hat Enterprise Linux Server release 6.2
Reporter: Vikas Vishwakarma

Test Setup:
==
4 load clients with 16 threads each running against an HBase cluster with 6 RegionServers. Each load client spawns a thread and runs an upsert query in batch mode.

Pseudo code:
==
SQL_UPSERT_HBASE_DATA = UPSERT INTO TABLE_NAME (ROW_KEY_ID, ROW_VAL) VALUES (?, ?)
for (int i = 0; i < 6400; i++) {
    upsertSql.append(SQL_UPSERT_HBASE_DATA);
    ps = conn.prepareStatement(preparedSqlTableStmt.toString());
    ps.setString(1, rowKey);
    ps.setString(2, rowVal);
    ps.execute();
}
conn.commit();
tryClose(conn);

Observation:
==
Observing some issues where a small number of transactions (approx 40 out of 80,000) hang indefinitely on the preparedStatement.execute() call. I tried to put a timeout on the transaction, but it looks like phoenix still does not support preparedStatement.setQueryTimeout().

Log Analysis:
=
On the client side there are no errors or exceptions, just some threads in a hung state. On the server side, after some log analysis, it looks like all the batches that got stuck started at the beginning of the test. From the RegionServer logs, phoenix coprocessor loading and index updates were in progress during this time (which involved some failures and retries). Once this phase was over, the remaining write batches that started later succeeded.
RegionServer log snapshots during the time period this issue was observed on the client side:
-
2014-08-27 03:28:00,114 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 successfully.
2014-08-27 03:28:00,114 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 successfully.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,931 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {timeRange:[0,1409110080691],batch:-1,startRow:\\x0E,stopRow:\\x0F,loadColumnFamiliesOnDemand:true,totalColumns:1,cacheBlocks:true,families:{0:[ALL]},maxResultSize:-1,maxVersions:1,caching:-1}
2014-08-27 03:28:00,931 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {timeRange:[0,1409110080691],batch:-1,startRow:\\x07,stopRow:\\x08,loadColumnFamiliesOnDemand:true,totalColumns:1,cacheBlocks:true,families:{0:[ALL]},maxResultSize:-1,maxVersions:1,caching:-1}
2014-08-27 03:28:06,507 DEBUG org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class org.apache.phoenix.coprocessor.ServerCachingEndpointImpl with path null and priority 1
2014-08-27 03:28:06,507 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ServerCachingEndpointImpl from HTD of HT_44 successfully.
2014-08-27 03:28:06,508 DEBUG org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver with path null and priority 1
2014-08-27 03:28:06,508 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver from HTD of HT_44 successfully.
...
2014-08-27 03:28:19,525 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
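Since preparedStatement.setQueryTimeout() was not supported at the time, one common client-side workaround for the hung execute() calls described above is to run the blocking call on a worker thread and bound the wait. This is a hedged sketch under the assumption that abandoning or interrupting the hung thread is acceptable for the workload; BoundedCall and callWithTimeout are illustrative names, not a Phoenix API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative client-side timeout wrapper: bounds any blocking call,
// e.g. () -> ps.execute(), without driver support for setQueryTimeout().
class BoundedCall {
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);  // don't keep the JVM alive for abandoned calls
        return t;
    });

    static <T> T callWithTimeout(Callable<T> work, long timeoutMillis) throws Exception {
        Future<T> f = POOL.submit(work);
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);  // interrupt; a truly hung RPC may ignore this
            throw e;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            throw cause instanceof Exception ? (Exception) cause : e;
        }
    }
}
```

In the pseudo code above, ps.execute() would become BoundedCall.callWithTimeout(() -> ps.execute(), 30_000), with the caller deciding whether to retry or fail the batch on TimeoutException instead of hanging indefinitely.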
[jira] [Closed] (PHOENIX-1215) some phoenix transactions are hanging indefinitely on ps.execute() call
[ https://issues.apache.org/jira/browse/PHOENIX-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikas Vishwakarma closed PHOENIX-1215. -- This turned out to be an issue in our client implementation. Closing.
[jira] [Created] (PHOENIX-1215) some transactions are hanging indefinitely on
Vikas Vishwakarma created PHOENIX-1215:
--
Summary: some transactions are hanging indefinitely on
Key: PHOENIX-1215
URL: https://issues.apache.org/jira/browse/PHOENIX-1215
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.0.0
Environment: Num Cores: 24, Model: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz, MemTotal: 65801264 kB, 1 Gbps Network, Red Hat Enterprise Linux Server release 6.2
Reporter: Vikas Vishwakarma
[jira] [Updated] (PHOENIX-1215) some phoenix transactions are hanging indefinitely on ps.execute() call
[ https://issues.apache.org/jira/browse/PHOENIX-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikas Vishwakarma updated PHOENIX-1215: --- Summary: some phoenix transactions are hanging indefinitely on ps.execute() call (was: some phoenix transactions are hanging indefinitely on )
[jira] [Updated] (PHOENIX-1215) some phoenix transactions are hanging indefinitely on ps.execute() call
[ https://issues.apache.org/jira/browse/PHOENIX-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikas Vishwakarma updated PHOENIX-1215:
---
Description:

Test Setup:
==
4 load clients with 16 threads each, running against an HBase cluster with 6 RegionServers. Each load client spawns a thread and runs an upsert query in batch mode.

Pseudo code:
==
SQL_UPSERT_HBASE_DATA = UPSERT INTO TABLE_NAME (ROW_KEY_ID, ROW_VAL) VALUES (?, ?)
for (int i = 0; i < 6400; i++) {
    upsertSql.append(SQL_UPSERT_HBASE_DATA);
    ps = conn.prepareStatement(upsertSql.toString());
    ps.setString(1, rowKey);
    ps.setString(2, rowVal);
    ps.execute();
}
conn.commit();
tryClose(conn);

Observation:
==
A small number of transactions (approx 40 out of 80,000) are hanging indefinitely on the preparedStatement.execute() call. I tried to put a timeout on the transaction, but it looks like Phoenix still does not support preparedStatement.setQueryTimeout().

Log Analysis:
=
On the client side there are no errors or exceptions, except for some threads in a hung state. On the server side, the observation is as follows: after some log analysis, it looks like all the batches that got stuck did so during the start of the test. From the RegionServer logs, Phoenix coprocessor loading and index updates were in progress during this time (which involves some failures and retries). Once this phase was over, the write batches that started later succeeded.

RegionServer log snapshots during the time period this issue was observed on the client side:
-
2014-08-27 03:28:00,114 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 successfully.
2014-08-27 03:28:00,114 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ScanRegionObserver from HTD of HT_42 successfully.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,252 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:00,931 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {timeRange:[0,1409110080691],batch:-1,startRow:\\x0E,stopRow:\\x0F,loadColumnFamiliesOnDemand:true,totalColumns:1,cacheBlocks:true,families:{0:[ALL]},maxResultSize:-1,maxVersions:1,caching:-1}
2014-08-27 03:28:00,931 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {timeRange:[0,1409110080691],batch:-1,startRow:\\x07,stopRow:\\x08,loadColumnFamiliesOnDemand:true,totalColumns:1,cacheBlocks:true,families:{0:[ALL]},maxResultSize:-1,maxVersions:1,caching:-1}
2014-08-27 03:28:06,507 DEBUG org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class org.apache.phoenix.coprocessor.ServerCachingEndpointImpl with path null and priority 1
2014-08-27 03:28:06,507 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.ServerCachingEndpointImpl from HTD of HT_44 successfully.
2014-08-27 03:28:06,508 DEBUG org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Loading coprocessor class org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver with path null and priority 1
2014-08-27 03:28:06,508 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Loaded coprocessor org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver from HTD of HT_44 successfully.
...
2014-08-27 03:28:19,525 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
2014-08-27 03:28:19,547 INFO org.apache.phoenix.hbase.index.Indexer: Found some outstanding index updates that didn't succeed during WAL replay - attempting to replay now.
..
2014-08-27 03:28:20,660 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Starting ungrouped coprocessor scan {timeRange:[0,1409110100554],batch:-1,startRow:\\x0C,stopRow:\\x0D,loadColumnFamiliesOnDemand:true,totalColumns:1,cacheBlocks:true,families:{0:[ALL]},maxResultSize:-1,maxVersions:1,caching:-1}
2014-08-27 03:28:20,660 INFO org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver: Finished scanning 0 rows for ungrouped coprocessor scan
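Since the description notes that preparedStatement.setQueryTimeout() is not supported, a hung execute() blocks the client thread forever. A hedged client-side sketch (not Phoenix API; the helper name, timeout value, and simulated task below are our own assumptions) is to push the call onto a worker thread and bound the wait with Future.get():

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedExecuteSketch {

    // Runs a task (e.g. () -> ps.execute()) and gives up after timeoutMs.
    // Note: abandoning the call does not cancel the in-flight RPC or free
    // any server-side resources it holds.
    public static <T> T runWithTimeout(Callable<T> task, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> f = pool.submit(task);
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            throw (Exception) e.getCause(); // unwrap the task's own exception
        } finally {
            pool.shutdownNow(); // interrupts the worker thread
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a hung execute() with a sleep longer than the timeout.
        boolean timedOut = false;
        try {
            runWithTimeout(() -> { Thread.sleep(5000); return true; }, 200);
        } catch (TimeoutException e) {
            timedOut = true;
        }
        System.out.println("timedOut=" + timedOut); // prints timedOut=true
    }
}
```

This only unblocks the client; the stuck batch on the server still has to be diagnosed separately.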
[jira] [Commented] (PHOENIX-998) SocketTimeoutException under high concurrent write access to phoenix indexed table
[ https://issues.apache.org/jira/browse/PHOENIX-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072996#comment-14072996 ] Vikas Vishwakarma commented on PHOENIX-998:
---
So this is my complete analysis: in the scanner, I was setting the cache size to 6400 rows, which comes to ~1.5 MB, and I was just iterating over the results once (as in the code snapshot above) and discarding them, since the goal of my loader client was to create high scan load and I don't really care about the results. With the hbase-0.94 build this was working fine, which means each scan was able to complete within 60 seconds, but with hbase-0.98 some of the scans are probably taking more than 60 seconds, leading to lease expiry and the resulting failures. I reduced the cache size from 6400 to 3200 rows and there were fewer failures; then I reduced it to 1/4th, i.e. 1600 rows (~0.4 - 0.5 MB), and the test passed with 0 failures.

SocketTimeoutException under high concurrent write access to phoenix indexed table
--
Key: PHOENIX-998
URL: https://issues.apache.org/jira/browse/PHOENIX-998
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.0.0
Environment: HBase 0.98.1-SNAPSHOT, Hadoop 2.3.0-cdh5.0.0
Reporter: wangxianbin
Priority: Critical

We have a small HBase cluster with one master and six slaves. We test Phoenix index concurrent write access performance with four write clients; each client has 100 threads, and each thread has one Phoenix JDBC connection. We encounter the SocketTimeoutException below, and it retries for a very long time. How can I deal with such an issue?

2014-05-22 17:22:58,490 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t10] client.AsyncProcess: #16016, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
2014-05-22 17:23:00,436 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t6] client.AsyncProcess: #16027, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
2014-05-22 17:23:00,440 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t1] client.AsyncProcess: #16013, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
2014-05-22 17:23:00,449 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t7] client.AsyncProcess: #16028, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
2014-05-22 17:23:00,473 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t8] client.AsyncProcess: #16020, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
2014-05-22 17:23:00,494 INFO [htable-pool20-t13] client.AsyncProcess: #16016, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.2.24:52017 remote=storm3.org/172.16.2.23:60020] on storm3.org,60020,1400750242156, tracking started Thu May 22 17:21:32 CST 2014, retrying after 20189 ms, replay 1 ops.
2014-05-22 17:23:02,439 INFO [storm4.org,60020,1400750242045-index-writer--pool3-t4] client.AsyncProcess: #16022, waiting for some tasks to finish. Expected max=0, tasksSent=13, tasksDone=12, currentTasksDone=12, retries=11 hasError=false, tableName=IPHOENIX10M
2014-05-22 17:23:02,496 INFO [htable-pool20-t3] client.AsyncProcess: #16013, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.2.24:52017 remote=storm3.org/172.16.2.23:60020] on storm3.org,60020,1400750242156, tracking started Thu May 22 17:21:32 CST 2014, retrying after 20001 ms, replay 1 ops.
2014-05-22 17:23:02,496 INFO [htable-pool20-t16] client.AsyncProcess: #16028, table=IPHOENIX10M, attempt=12/350 failed 1 ops, last exception: java.net.SocketTimeoutException: Call to storm3.org/172.16.2.23:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch :
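To sanity-check the numbers in the comment above (the per-row size is inferred from the reported figures, not measured): if 6400 cached rows weigh ~1.5 MB, a row is roughly 245 bytes, so caching=1600 puts each scanner RPC near 0.4 MB, matching the observed ~0.4 - 0.5 MB:

```java
// Back-of-envelope check relating scanner caching (rows per RPC)
// to the payload each next() call must move within the 60 s lease.
public class ScanCachingMath {
    public static void main(String[] args) {
        double mbFor6400 = 1.5;  // observed payload at caching=6400 (from the comment)
        double bytesPerRow = mbFor6400 * 1024 * 1024 / 6400;
        double mbFor1600 = bytesPerRow * 1600 / (1024 * 1024);
        System.out.printf("~%.0f bytes/row, caching=1600 -> ~%.2f MB per RPC%n",
                bytesPerRow, mbFor1600);
        // prints: ~246 bytes/row, caching=1600 -> ~0.38 MB per RPC
    }
}
```

The point is that lowering caching does not reduce total work; it only shortens each individual round trip so no single call outlives the scanner lease.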
[jira] [Commented] (PHOENIX-998) SocketTimeoutException under high concurrent write access to phoenix indexed table
[ https://issues.apache.org/jira/browse/PHOENIX-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069897#comment-14069897 ] Vikas Vishwakarma commented on PHOENIX-998:
---
The client does scan all the records:

ResultScanner scanner = table.getScanner(scan);
Iterator<Result> iterator = scanner.iterator();
while (iterator.hasNext()) {
    Result next = iterator.next();
    next.getRow();
    next.getValue(Bytes.toBytes(historyColumnFamily), Bytes.toBytes(historyColumnQualifier));
    scancounter++;
}

Also, I don't see this issue when running the same test against the hbase-0.94 build. From the service logs in DEBUG mode I am not getting any more information on this. I will try to put some trace logs in the client and check further.
[jira] [Commented] (PHOENIX-998) SocketTimeoutException under high concurrent write access to phoenix indexed table
[ https://issues.apache.org/jira/browse/PHOENIX-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067447#comment-14067447 ] Vikas Vishwakarma commented on PHOENIX-998:
---
I was able to reproduce and identify this issue. It is related to lease expiry in the RegionServers. You will see something like this in the RegionServer logs when the issue occurs. There has been some refactoring around lease expiry in hbase-0.98 that may have introduced a race condition (refer to https://issues.apache.org/jira/browse/HBASE-8449):

2014-07-19 05:20:03,696 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 120 lease expired on region PB4_1,EXCHG71:TOPIC79:,1405745722171.631771d92412b9744e342aeffa61b880.
2014-07-19 05:20:03,907 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 117 lease expired on region PB4_1,EXCHG41:TOPIC79:,1405745722171.dd30b2115b87f2c1f8aadcefa52d28ee.
2014-07-19 05:20:04,008 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 122 lease expired on region ...
2014-07-19 05:20:04,008 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 119 lease expired on region PB4_1,EXCHG53:TOPIC79:,1405745722171.a55923b2b74ed6a61d4cad2c6afb4dd0.
2014-07-19 05:20:21,420 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 146 lease expired on region PB4_1,EXCHG27:TOPIC79:,1405745722170.89dcec9f29c6c36664dcca0b51bb477a.
2014-07-19 05:20:21,420 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 145 lease expired on region PB4_1,EXCHG25:TOPIC79:,1405745722170.cda36ed3c880f0cdc95b0729306fdd56.
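Besides shrinking the scan batch, another possible mitigation, which was not tested in this thread (the property names and value below are an assumption from general HBase configuration and should be checked against your HBase version's documentation), is raising the scanner lease beyond its 60 s default in hbase-site.xml on both client and server:

```xml
<!-- Sketch only: raise the scanner lease from the 60 s default to 120 s.
     hbase.regionserver.lease.period is the pre-0.98 name; 0.98 introduces
     hbase.client.scanner.timeout.period for the same purpose. -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value>
</property>
```

A longer lease only masks slow scans; the caching reduction described earlier addresses the cause directly.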