(Moving this over to the user list as that's the appropriate list for
this question)
Do you get an error? We can't help you with only an "it didn't work" :)
I'd suggest that you try to narrow down the scope of the problem: is it
unique to Hive external tables? Can you use a different Hive
StorageHandler successfully (e.g. the HBaseStorageHandler)?
Finally, as you're using HDP, please also consider reaching out to their
customer support.
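Also, one thing jumps out from the spark-submit invocation you quoted
below: spark-submit treats everything after the application file as
arguments to the application itself, so a --jars flag that comes after
test3.py is silently ignored and those jars never reach the classpath.
I can't say this is the cause without seeing the actual error, but it's
worth ruling out. A sketch of the corrected ordering (jar paths taken
from your own command; trim or extend the list as needed):

```shell
# All spark-submit options must come BEFORE the application file;
# anything after test3.py is passed to the script as its own argv.
spark-submit \
  --jars /usr/hdp/current/phoenix-client/lib/phoenix-hive-5.0.0.3.0.0.0-1634.jar,\
/usr/hdp/current/phoenix-client/lib/phoenix-core-5.0.0.3.0.0.0-1634.jar,\
/usr/hdp/current/phoenix-client/lib/phoenix-spark-5.0.0.3.0.0.0-1634.jar \
  test3.py
```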
On 7/11/19 2:18 AM, 马士成 wrote:
Hello All,
The Apache Phoenix homepage lists two additional integrations: Apache
Spark Integration and the Phoenix Storage Handler for Apache Hive.
Following the guidance, I can query a Phoenix table from the Beeline CLI,
and I can load a Phoenix table as a DataFrame using Spark SQL.
So my question is: does Phoenix support querying, via Spark SQL, a Hive
external table mapped from Phoenix?
I am working on HDP 3.0 (Phoenix 5.0, HBase 2.0, Hive 3.1.0, Spark 2.3.1)
and am facing the issue mentioned in the subject.
I tried to solve this problem but failed; I found some similar questions
on the internet, but the answers didn't work for me.
My submit command :
spark-submit test3.py --jars \
/usr/hdp/current/phoenix-client/lib/phoenix-hive-5.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/phoenix-client/lib/hadoop-mapreduce-client-core.jar\
,/usr/hdp/current/phoenix-client/lib/phoenix-core-5.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/phoenix-client/lib/phoenix-spark-5.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hive-metastore-3.1.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hive-common-3.1.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hbase-client-2.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hbase-mapreduce-2.0.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hive-serde-3.1.0.3.0.0.0-1634.jar\
,/usr/hdp/current/hive-client/lib/hive-shims-3.1.0.3.0.0.0-1634.jar
The log is attached, and the demo code is below:
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession.builder \
        .appName("test") \
        .enableHiveSupport() \
        .getOrCreate()
    df = spark.sql("select count(*) from ajmide_dw.part_device")
    df.show()
Similar Issues:
https://community.hortonworks.com/questions/140097/facing-issue-from-spark-sql.html
https://stackoverflow.com/questions/51501044/unable-to-access-hive-external-tables-from-spark-shell
Any comment or suggestion is appreciated!
Thanks,
Shi-Cheng, Ma