RE: Performance degradation on query analysis

2019-09-27 Thread Stepan Migunov
Thanks Josh, you are right - we had actually disabled automatic major
compaction. We have now added SYSTEM.STATS to the weekly compaction and I
hope this resolves the issue.


-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Tuesday, September 24, 2019 6:39 PM
To: user@phoenix.apache.org
Subject: Re: Performance degradation on query analysis

Did you change your configuration to prevent compactions from regularly
happening, Stepan?

By default, you should have a major compaction run weekly which would have
fixed this for you, although minor compactions would have run automatically
as well to rewrite small hfiles as you are creating new ones (generating new
stats).

On 9/19/19 4:50 PM, Ankit Singhal wrote:
> Please schedule a compaction on the SYSTEM.STATS table to clear the old entries.
>
> On Thu, Sep 19, 2019 at 1:48 PM Stepan Migunov
> <stepan.migu...@firstlinesoftware.com> wrote:
>
> Thanks, Josh. The problem was indeed related to reading the SYSTEM.STATS
> table.
> There were only 8,000 rows in the table, but COUNT took more than 10
> minutes. I noticed that the storage files (34) had a total size of 10 GB.
>
> DELETE FROM SYSTEM.STATS did not help - the storage files were still 10 GB,
> and COUNT took a long time.
> Then I truncated the table from the hbase shell. And this fixed the
> problem - after UPDATE STATS for each table, everything works fine.
>
> Are there any known issues with the SYSTEM.STATS table? Apache Phoenix
> 4.13.1 with 15 Region Servers.
>
> -Original Message-
> From: Josh Elser [mailto:els...@apache.org]
> Sent: Tuesday, September 17, 2019 5:16 PM
> To: user@phoenix.apache.org
> Subject: Re: Performance degradation on query analysis
>
> Can you share the output you see from the EXPLAIN? Does it differ between
> times it's "fast" and times it's "slow"?
>
> Sharing the table(s) DDL statements would also help, along with the shape
> and version of your cluster (e.g. Apache Phoenix 4.14.2 with 8
> RegionServers).
>
> Spit-balling ideas:
>
> Could be reads over the SYSTEM.CATALOG table or the SYSTEM.STATS table.
>
> Have you looked more coarsely at the RegionServer logs/metrics? Any obvious
> saturation issues (e.g. handlers consumed, JVM GC pauses, host CPU
> saturation)?
>
> Turn on DEBUG log4j client side (beware of chatty ZK logging) and see if
> there's something obvious from when the EXPLAIN is slow.
>
> On 9/17/19 3:58 AM, Stepan Migunov wrote:
>  > Hi
>  > We have an issue with our production environment - from time to time we
>  > notice a significant performance degradation for some queries. The
>  > strange thing is that the EXPLAIN operator for these queries takes the
>  > same time as query execution (5 minutes or more). So, I guess, the issue
>  > is related to query analysis rather than data extraction. Is it possible
>  > that the issue is related to a SYSTEM.STATS access problem? Any other
>  > ideas?
>  >
>
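
A minimal sketch related to the compaction advice in this thread: triggering
a major compaction of SYSTEM.STATS with the HBase Java client instead of
waiting for the weekly schedule. The ZooKeeper quorum is a placeholder, and
the table name assumes namespace mapping is disabled (with it enabled the
physical table is typically SYSTEM:STATS).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class CompactStats {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Placeholder quorum; normally read from hbase-site.xml on the classpath.
            conf.set("hbase.zookeeper.quorum", "zk-host:2181");

            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                // Ask HBase to major-compact the Phoenix stats table so old cells
                // and delete markers are rewritten out of the HFiles.
                // The request is asynchronous; HBase schedules the compaction.
                admin.majorCompact(TableName.valueOf("SYSTEM.STATS"));
            }
        }
    }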


RE: Performance degradation on query analysis

2019-09-19 Thread Stepan Migunov
Thanks, Josh. The problem was indeed related to reading the SYSTEM.STATS
table.
There were only 8,000 rows in the table, but COUNT took more than 10
minutes. I noticed that the storage files (34) had a total size of 10 GB.

DELETE FROM SYSTEM.STATS did not help - the storage files were still 10 GB,
and COUNT took a long time.
Then I truncated the table from the hbase shell. And this fixed the
problem - after UPDATE STATS for each table, everything works fine.

Are there any known issues with the SYSTEM.STATS table? Apache Phoenix 4.13.1
with 15 Region Servers.

-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Tuesday, September 17, 2019 5:16 PM
To: user@phoenix.apache.org
Subject: Re: Performance degradation on query analysis

Can you share the output you see from the EXPLAIN? Does it differ between
times it's "fast" and times it's "slow"?

Sharing the table(s) DDL statements would also help, along with the shape
and version of your cluster (e.g. Apache Phoenix 4.14.2 with 8
RegionServers).

Spit-balling ideas:

Could be reads over the SYSTEM.CATALOG table or the SYSTEM.STATS table.

Have you looked more coarsely at the RegionServer logs/metrics? Any obvious
saturation issues (e.g. handlers consumed, JVM GC pauses, host CPU
saturation)?

Turn on DEBUG log4j client side (beware of chatty ZK logging) and see if
there's something obvious from when the EXPLAIN is slow.

On 9/17/19 3:58 AM, Stepan Migunov wrote:
> Hi
> We have an issue with our production environment - from time to time we
> notice a significant performance degradation for some queries. The strange
> thing is that the EXPLAIN operator for these queries takes the same time
> as query execution (5 minutes or more). So, I guess, the issue is
> related to query analysis rather than data extraction. Is it possible that
> the issue is related to a SYSTEM.STATS access problem? Any other ideas?
>
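
For reference, a hedged Java/JDBC sketch of the recovery step described
above (re-collecting statistics per table after SYSTEM.STATS has been
truncated); the JDBC URL and table name are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RebuildStats {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper quorum in the thick-driver URL.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
                 Statement stmt = conn.createStatement()) {
                // Re-collect guideposts for one table; repeat for each user table.
                stmt.execute("UPDATE STATISTICS MY_SCHEMA.MY_TABLE");
            }
        }
    }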


Performance degradation on query analysis

2019-09-17 Thread Stepan Migunov
Hi
We have an issue with our production environment - from time to time we notice
a significant performance degradation for some queries. The strange thing is
that the EXPLAIN operator for these queries takes the same time as query
execution (5 minutes or more). So, I guess, the issue is related to query
analysis rather than data extraction. Is it possible that the issue is related
to a SYSTEM.STATS access problem? Any other ideas?
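
A small sketch for reproducing the symptom: timing EXPLAIN separately from
query execution over plain JDBC, on the assumption that a slow EXPLAIN points
at plan compilation (reads of SYSTEM.CATALOG/SYSTEM.STATS) rather than data
scanning. The URL and query are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TimeExplain {
        public static void main(String[] args) throws Exception {
            String query = "SELECT COL1, COL2 FROM MY_SCHEMA.MY_TABLE WHERE COL1 = 'X'"; // placeholder
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
                 Statement stmt = conn.createStatement()) {
                long start = System.currentTimeMillis();
                // EXPLAIN only compiles the plan, so its runtime isolates query analysis.
                try (ResultSet rs = stmt.executeQuery("EXPLAIN " + query)) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
                System.out.println("EXPLAIN took " + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }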


RE: Phoenix ODBC driver limitations

2018-05-24 Thread Stepan Migunov
Yes, I read this. But the document says "This needs to be set at client and
server both". I was confused - what is the "client" in the case of an ODBC
connection? I assumed it was the driver, but it seems to be the query server.

-Original Message-
From: Francis Chuang [mailto:francischu...@apache.org]
Sent: Thursday, May 24, 2018 1:35 AM
To: user@phoenix.apache.org
Subject: Re: Phoenix ODBC driver limitations

Namespace mapping is something you need to enable on the server (it's off by
default).

See documentation for enabling it here:
http://phoenix.apache.org/namspace_mapping.html

Francis

On 24/05/2018 5:23 AM, Stepan Migunov wrote:
> Thank you for the response, Josh!
>
> I got something like "Inconsistent namespace mapping properties" and
> thought it was because it is impossible to set
> "isNamespaceMappingEnabled" for the ODBC driver (client side). After
> your explanation I understood that the "client" in this case is the
> query server, not the ODBC driver. And now I need to check why the query
> server doesn't apply this property.
>
> -Original Message-
> From: Josh Elser [mailto:els...@apache.org]
> Sent: Wednesday, May 23, 2018 6:52 PM
> To: user@phoenix.apache.org
> Subject: Re: Phoenix ODBC driver limitations
>
> I'd be surprised to hear that the ODBC driver would need to know
> anything about namespace-mapping.
>
> Do you have an error? Steps to reproduce an issue which you see?
>
> The reason I am surprised is that namespace mapping is an
> implementation detail of the JDBC driver which lives inside of PQS --
> *not* the ODBC driver. The trivial thing you can check would be to
> validate that the hbase-site.xml which PQS references is up to date
> and that PQS was restarted to pick up the newest version of
> hbase-site.xml
>
> On 5/22/18 4:16 AM, Stepan Migunov wrote:
>> Hi,
>>
>> Is the ODBC driver from Hortonworks the only way to access Phoenix
>> from .NET code now?
>> The problem is that the driver has some critical limitations - it seems
>> the driver doesn't support Namespace Mapping (it is not able to
>> connect to Phoenix if phoenix.schema.isNamespaceMappingEnabled=true)
>> and doesn't support query hints.
>>
>> Regards,
>> Stepan.
>>
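
To illustrate the client/server distinction discussed in this thread, a
sketch of going through the Phoenix Query Server with the thin JDBC driver.
The host and port are placeholders, and it assumes
phoenix.schema.isNamespaceMappingEnabled is already set in the hbase-site.xml
used by both PQS and the HBase servers (it cannot be switched on from the
ODBC/thin side).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ThinClientExample {
        public static void main(String[] args) throws Exception {
            // Thin-driver URL pointing at PQS (placeholder host/port); the thin client
            // jar registers org.apache.phoenix.queryserver.client.Driver automatically.
            String url = "jdbc:phoenix:thin:url=http://pqs-host:8765;serialization=PROTOBUF";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT TABLE_SCHEM, TABLE_NAME FROM SYSTEM.CATALOG LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "." + rs.getString(2));
                }
            }
        }
    }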


RE: Phoenix ODBC driver limitations

2018-05-23 Thread Stepan Migunov
Thank you for the response, Josh!

I got something like "Inconsistent namespace mapping properties" and thought
it was because it is impossible to set "isNamespaceMappingEnabled" for the
ODBC driver (client side). After your explanation I understood that the
"client" in this case is the query server, not the ODBC driver. And now I need
to check why the query server doesn't apply this property.

-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Wednesday, May 23, 2018 6:52 PM
To: user@phoenix.apache.org
Subject: Re: Phoenix ODBC driver limitations

I'd be surprised to hear that the ODBC driver would need to know anything
about namespace-mapping.

Do you have an error? Steps to reproduce an issue which you see?

The reason I am surprised is that namespace mapping is an implementation
detail of the JDBC driver which lives inside of PQS -- *not* the ODBC
driver. The trivial thing you can check would be to validate that the
hbase-site.xml which PQS references is up to date and that PQS was restarted
to pick up the newest version of hbase-site.xml

On 5/22/18 4:16 AM, Stepan Migunov wrote:
> Hi,
>
> Is the ODBC driver from Hortonworks the only way to access Phoenix from
> .NET code now?
> The problem is that the driver has some critical limitations - it seems
> the driver doesn't support Namespace Mapping (it is not able to connect
> to Phoenix if phoenix.schema.isNamespaceMappingEnabled=true) and doesn't
> support query hints.
>
> Regards,
> Stepan.
>


Phoenix ODBC driver limitations

2018-05-22 Thread Stepan Migunov
Hi,

Is the ODBC driver from Hortonworks the only way to access Phoenix from .NET 
code now? 
The problem is that the driver has some critical limitations - it seems the
driver doesn't support Namespace Mapping (it is not able to connect to Phoenix
if phoenix.schema.isNamespaceMappingEnabled=true) and doesn't support query
hints.

Regards,
Stepan.


RE: UPSERT null values

2018-04-28 Thread Stepan Migunov
Thank you, James - it was "immutable". I didn't know that it had this effect.


From: James Taylor [mailto:jamestay...@apache.org]
Sent: Friday, April 27, 2018 5:37 PM
To: user@phoenix.apache.org
Subject: Re: UPSERT null values

Hi Stepan,

Please post your complete DDL and indicate the version of Phoenix and HBase
you're using. Your example should work as expected barring declaration of
the table as immutable or COL2 being part of the primary key.

Thanks,
James

On Fri, Apr 27, 2018 at 6:13 AM Stepan Migunov <
stepan.migu...@firstlinesoftware.com> wrote:

Hi,
Could you please clarify how I can set a value to NULL?

After UPSERT INTO temp.table (ROWKEY, COL1, COL2) VALUES (100, 'ABC',
null); the value of COL2 still has its previous value (COL1 has 'ABC' as
expected).

Or is the only way to set STORE_NULLS = true?

Thanks,
Stepan.
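
A hedged sketch of the scenario over JDBC, assuming a mutable table with
hypothetical names: on a mutable table, binding NULL in the UPSERT should
overwrite the previous cell, while an immutable table does not support
overwriting existing values at all.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Types;

    public class UpsertNullExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
                // Hypothetical mutable table and columns.
                String sql = "UPSERT INTO TEMP.MY_TABLE (ROWKEY, COL1, COL2) VALUES (?, ?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setInt(1, 100);
                    ps.setString(2, "ABC");
                    ps.setNull(3, Types.VARCHAR); // explicitly null out COL2
                    ps.executeUpdate();
                }
                conn.commit(); // Phoenix connections default to autoCommit=false
            }
        }
    }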


UPSERT null values

2018-04-27 Thread Stepan Migunov
Hi,
Could you please clarify how I can set a value to NULL?

After UPSERT INTO temp.table (ROWKEY, COL1, COL2) VALUES (100, 'ABC', null);
the value of COL2 still has its previous value (COL1 has 'ABC' as expected).

Or is the only way to set STORE_NULLS = true?

Thanks,
Stepan.


RE: Storage Handler for Apache Hive

2018-03-27 Thread Stepan Migunov
Thank you, Artem!

Does this mean that the Phoenix-Hive integration works only with Hadoop >= 2.7?


From: Artem Ervits [mailto:artemerv...@gmail.com]
Sent: Tuesday, March 27, 2018 12:10 PM
To: user@phoenix.apache.org
Subject: Re: Storage Handler for Apache Hive

Stepan, you're using a version of Hadoop where the StopWatch class is not
defined:

https://github.com/apache/hadoop/tree/release-2.6.4-RC0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StopWatch.java

If you go to at least Hadoop 2.7, this error will disappear:

https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StopWatch.java

On Tue, Mar 27, 2018, 4:47 AM Stepan Migunov <
stepan.migu...@firstlinesoftware.com> wrote:

Hi,

Phoenix 4.12.0-HBase-1.1, Hadoop 2.6.4, Hive 2.1.1

I have set up Hive to use external Phoenix tables. But after phoenix-hive.jar
was included in hive-site.xml, the Hive console gives an exception on some
operations (e.g. show databases, or a query with an ORDER BY clause):

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/StopWatch
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:314)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2098)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.StopWatch
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 19 more

Any suggestions are welcome.
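
A quick diagnostic sketch (not from the thread) to check which Hadoop jar
Hive's classpath resolves and whether it contains the StopWatch class named
in the stack trace above:

    public class CheckStopWatch {
        public static void main(String[] args) {
            try {
                Class<?> clazz = Class.forName("org.apache.hadoop.util.StopWatch");
                // Print the jar the class was loaded from.
                System.out.println("Found in: "
                        + clazz.getProtectionDomain().getCodeSource().getLocation());
            } catch (ClassNotFoundException e) {
                // With hadoop-common older than 2.7 this lookup fails.
                System.out.println("org.apache.hadoop.util.StopWatch not on the classpath");
            }
        }
    }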


RE: Phoenix as a source for Spark processing

2018-03-15 Thread Stepan Migunov
The table is about 300 GB in HBase.
I've done some more research and now my test is very simple - I'm trying to
calculate the record count of the table. No "distincts" etc., just
phoenixTableAsDataFrame(...).count().

And now I see the issue - Spark creates about 400 tasks (14 executors) and
starts the calculation; speed is pretty good and HBase shows about 1000
requests per second. But then Spark starts marking tasks as completed: I can
see that Spark has read only 20% of the records but has completed 50% of the
tasks, and HBase shows only 100 requests per second. When Spark "thinks" it is
99% complete (only 5 tasks left), it has actually read only 70% of the records.
The rest of the work is then done by 5 tasks at 1-2 requests per second...

Is there any way to force Spark to distribute the workload evenly? I have
tried to pre-split my Phoenix table (it now has about 1200 regions), but it
didn't help.

-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Friday, March 9, 2018 2:17 AM
To: user@phoenix.apache.org
Subject: Re: Phoenix as a source for Spark processing

How large is each row in this case? Or, better yet, how large is the table
in HBase?

You're spreading out approximately 7 "clients" to each Regionserver fetching
results (100/14). So, you should have pretty decent saturation from Spark
into HBase.

I'd be taking a look at the EXPLAIN plan for your SELECT DISTINCT to really
understand what Phoenix is doing. For example, are you getting ample
saturation of the resources that your servers have available (32core/128Gb
memory is pretty good). Validating how busy Spark is actually keeping HBase,
and how much time is spent transforming the data would be good. Or, another
point, are you excessively scanning data in the system which you could
otherwise preclude by a different rowkey structure via logic such as a
skip-scan (which would be shown in the EXPLAIN plan).

You may actually find that using the built-in UPSERT SELECT logic may
out-perform the Spark integration since you aren't actually doing any
transformation logic inside of Spark.


On 3/5/18 3:14 PM, Stepan Migunov wrote:
> Hi Josh, thank you for the response!
>
> Our cluster has 14 nodes (32 cores each/128 GB memory). The source
> Phoenix table contains about 1 billion records (100 columns). We start
> a Spark job with about 100 executors. Spark executes a SELECT from the
> source table (selecting 6 columns with DISTINCT) and writes the output
> to another Phoenix table. The target table is expected to contain
> about 100 million records.
> HBase has 14 region servers, and both tables are salted with SALT_BUCKETS=42.
> The Spark job runs via Yarn.
>
>
> -Original Message-
> From: Josh Elser [mailto:els...@apache.org]
> Sent: Monday, March 5, 2018 9:14 PM
> To: user@phoenix.apache.org
> Subject: Re: Phoenix as a source for Spark processing
>
> Hi Stepan,
>
> Can you better ballpark the Phoenix-Spark performance you've seen (e.g.
> how much hardware do you have, how many spark executors did you use,
> how many region servers)? Also, what versions of software are you using?
>
> I don't think there are any firm guidelines on how you can solve this
> problem, but you've found the tools available for you.
>
> * You can try Phoenix+Spark to run over the Phoenix tables in place
> * You can use Phoenix+Hive to offload the data into Hive for queries
>
> If Phoenix-Spark wasn't fast enough, I'd imagine using the
> Phoenix-Hive integration to query the data would be similarly not fast
> enough.
>
> It's possible that the bottleneck is something we could fix in the
> integration, or fix configuration of Spark and/or Phoenix. We'd need
> you to help quantify this better :)
>
> On 3/4/18 6:08 AM, Stepan Migunov wrote:
>> In our software we need to combine fast interactive access to the
>> data with quite complex data processing. I know that Phoenix is intended
>> for fast access, but I hoped that I would also be able to use Phoenix
>> as a source for complex processing with Spark. Unfortunately,
>> Phoenix + Spark shows very poor performance. E.g., querying a big
>> (about a billion records) table with DISTINCT takes about 2 hours. At
>> the same time the same task with a Hive source takes a few minutes. Is
>> this expected? Does it mean that Phoenix is absolutely not suitable for
>> batch processing with Spark and I should duplicate the data to Hive and
>> process it with Hive?
>>
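
A minimal Java sketch of the test described above (counting a Phoenix table
from Spark via the phoenix-spark DataFrame source); the table name and zkUrl
are placeholders. The input splits correspond to Phoenix scans per
region/guidepost, which is why unevenly sized regions show up as unevenly
sized tasks.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class PhoenixCountJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("phoenix-count").getOrCreate();

            // Phoenix 4.x Spark integration exposes the DataFrame source
            // "org.apache.phoenix.spark" with "table" and "zkUrl" options.
            Dataset<Row> df = spark.read()
                    .format("org.apache.phoenix.spark")
                    .option("table", "MY_SCHEMA.MY_TABLE") // placeholder
                    .option("zkUrl", "zk-host:2181")       // placeholder
                    .load();

            System.out.println("count = " + df.count());
            spark.stop();
        }
    }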


Re: Phoenix as a source for Spark processing

2018-03-07 Thread Stepan Migunov
Some more details... We have done some simple tests to compare the read/write
performance of Spark+Hive and Spark+Phoenix, and now we have the following
results:

Copying a table (with no transformations, about 800 million records):
Hive (TEZ) - 752 sec

Spark:
From Hive to Hive - 2463 sec
From Phoenix to Hive - 13310 sec
From Hive to Phoenix - more than 30240 sec

We use Spark 2.2.1, HBase 1.1.2, Phoenix 4.13, Hive 2.1.1.

So it seems that Spark + Phoenix leads to a great performance degradation. Any
thoughts?

On 2018/03/04 11:08:56, Stepan Migunov <stepan.migu...@firstlinesoftware.com> 
wrote: 
> In our software we need to combine fast interactive access to the data with
> quite complex data processing. I know that Phoenix is intended for fast
> access, but I hoped that I would also be able to use Phoenix as a source for
> complex processing with Spark. Unfortunately, Phoenix + Spark shows very poor
> performance. E.g., querying a big (about a billion records) table with
> DISTINCT takes about 2 hours. At the same time the same task with a Hive
> source takes a few minutes. Is this expected? Does it mean that Phoenix is
> absolutely not suitable for batch processing with Spark and I should
> duplicate the data to Hive and process it with Hive?
> 


RE: Phoenix as a source for Spark processing

2018-03-05 Thread Stepan Migunov
Hi Josh, thank you for the response!

Our cluster has 14 nodes (32 cores each/128 GB memory). The source Phoenix
table contains about 1 billion records (100 columns). We start a Spark job
with about 100 executors. Spark executes a SELECT from the source table
(selecting 6 columns with DISTINCT) and writes the output to another Phoenix
table. The target table is expected to contain about 100 million records.
HBase has 14 region servers, and both tables are salted with SALT_BUCKETS=42.
The Spark job runs via Yarn.


-Original Message-
From: Josh Elser [mailto:els...@apache.org]
Sent: Monday, March 5, 2018 9:14 PM
To: user@phoenix.apache.org
Subject: Re: Phoenix as a source for Spark processing

Hi Stepan,

Can you better ballpark the Phoenix-Spark performance you've seen (e.g.
how much hardware do you have, how many spark executors did you use, how
many region servers)? Also, what versions of software are you using?

I don't think there are any firm guidelines on how you can solve this
problem, but you've found the tools available for you.

* You can try Phoenix+Spark to run over the Phoenix tables in place
* You can use Phoenix+Hive to offload the data into Hive for queries

If Phoenix-Spark wasn't fast enough, I'd imagine using the Phoenix-Hive
integration to query the data would be similarly not fast enough.

It's possible that the bottleneck is something we could fix in the
integration, or fix configuration of Spark and/or Phoenix. We'd need you to
help quantify this better :)

On 3/4/18 6:08 AM, Stepan Migunov wrote:
> In our software we need to combine fast interactive access to the data
> with quite complex data processing. I know that Phoenix is intended for fast
> access, but I hoped that I would also be able to use Phoenix as a source for
> complex processing with Spark. Unfortunately, Phoenix + Spark shows
> very poor performance. E.g., querying a big (about a billion records) table
> with DISTINCT takes about 2 hours. At the same time the same task with a
> Hive source takes a few minutes. Is this expected? Does it mean that Phoenix
> is absolutely not suitable for batch processing with Spark and I should
> duplicate the data to Hive and process it with Hive?
>


Phoenix as a source for Spark processing

2018-03-04 Thread Stepan Migunov
In our software we need to combine fast interactive access to the data with
quite complex data processing. I know that Phoenix is intended for fast access,
but I hoped that I would also be able to use Phoenix as a source for complex
processing with Spark. Unfortunately, Phoenix + Spark shows very poor
performance. E.g., querying a big (about a billion records) table with DISTINCT
takes about 2 hours. At the same time the same task with a Hive source takes a
few minutes. Is this expected? Does it mean that Phoenix is absolutely not
suitable for batch processing with Spark and I should duplicate the data to
Hive and process it with Hive?


Pool size / queue size with thin client

2017-11-14 Thread Stepan Migunov
Hi,

Could you please suggest how I can change the pool size / queue size when using
the thin client? I have added the following options to hbase-site.xml:

<property>
  <name>phoenix.query.threadPoolSize</name>
  <value>2000</value>
</property>

<property>
  <name>phoenix.query.queueSize</name>
  <value>10</value>
</property>

I restarted HBase (master and region servers), but still receive the following
response (via the JDBC thin client):

Remote driver error: RuntimeException: 
org.apache.phoenix.exception.PhoenixIOException: Task 
org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask@69529e2 rejected 
from org.apache.phoenix.job.JobManager$1@48b8311c[Running, pool size = 128, 
active threads = 128, queued tasks = 5000, completed tasks = 0] 

My guess is that the settings are not applied and the default values (128/5000)
are still being used. What's wrong?

Thanks,
Stepan.
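
phoenix.query.threadPoolSize and phoenix.query.queueSize are client-side
properties, and with the thin client the relevant "client" is the Phoenix
Query Server, so they need to be in the hbase-site.xml on the PQS classpath
(followed by a PQS restart); restarting the HBase master and region servers
alone would not be expected to pick them up. A small diagnostic sketch, run
with the same classpath the client/PQS uses, to check whether the values are
actually visible (the 128/5000 defaults are the ones shown in the error
above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    import java.util.Arrays;

    public class CheckClientConfig {
        public static void main(String[] args) {
            // Loads hbase-default.xml/hbase-site.xml from the classpath, the same way
            // the Phoenix driver embedded in the Query Server does.
            Configuration conf = HBaseConfiguration.create();
            System.out.println("phoenix.query.threadPoolSize = "
                    + conf.get("phoenix.query.threadPoolSize", "unset (default 128)"));
            System.out.println("phoenix.query.queueSize = "
                    + conf.get("phoenix.query.queueSize", "unset (default 5000)"));
            System.out.println("sources: "
                    + Arrays.toString(conf.getPropertySources("phoenix.query.threadPoolSize")));
        }
    }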




Spark & UpgradeInProgressException: Cluster is being concurrently upgraded from 4.11.x to 4.12.x

2017-11-10 Thread Stepan Migunov
Hi,

I have just upgraded my cluster to Phoenix 4.12 and got an issue with tasks
running on Spark 2.2 (yarn cluster mode). Any attempt to use the
phoenixTableAsDataFrame method to load data from an existing database causes
an exception (see below).

The tasks worked fine on version 4.11. I have checked the connection with
sqlline - it works now and shows that the version is 4.12. Moreover, I have
noticed that if I limit the number of executors to one, the Spark task
executes successfully too!

It looks like the executors running in parallel "interfere" with each other
and cannot acquire the version upgrade mutex.

Any suggestions please?

final Connection connection =
    ConnectionUtil.getInputConnection(configuration, overridingProps);

User class threw exception: org.apache.spark.SparkException: Job aborted
due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
failure: Lost task 0.3 in stage 0.0 (TID 36, n7701-hdp005, executor 26):
java.lang.RuntimeException:
org.apache.phoenix.exception.UpgradeInProgressException: Cluster is being
concurrently upgraded from 4.11.x to 4.12.x. Please retry establishing
connection.
    at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201)
    at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:76)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:180)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:179)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
    at org.apache.phoenix.spark.PhoenixRDD.compute(PhoenixRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.phoenix.exception.UpgradeInProgressException: Cluster is
being concurrently upgraded from 4.11.x to 4.12.x. Please retry establishing
connection.
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.acquireUpgradeMutex(ConnectionQueryServicesImpl.java:3173)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.upgradeSystemTables(ConnectionQueryServicesImpl.java:2567)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2440)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2360)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2360)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:150)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
    at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:176)
    ... 30 more
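
A hedged workaround sketch, consistent with the observation above that a
single executor succeeds: open one throwaway Phoenix connection from the
Spark driver first, so the one-time 4.11 to 4.12 system-table upgrade
finishes before the executors connect in parallel. The URL is a placeholder
and this is only an illustration, not something confirmed in the thread.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class WarmUpPhoenix {
        // Call once on the Spark driver before building the Phoenix RDD/DataFrame.
        public static void warmUp(String jdbcUrl) throws Exception {
            // The first connection after an upgrade runs the SYSTEM table migration
            // under a mutex; doing it here keeps executors from racing for it.
            try (Connection ignored = DriverManager.getConnection(jdbcUrl)) {
                // Establishing the connection is enough; nothing else to do.
            }
        }

        public static void main(String[] args) throws Exception {
            warmUp("jdbc:phoenix:zk-host:2181"); // placeholder quorum
            // ... then submit the job that calls phoenixTableAsDataFrame(...)
        }
    }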