Re: error trying to save to database (Phoenix)
If you look at the dependencies of the 5.0.0-HBase-2.0 artifact
(https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark/5.0.0-HBase-2.0),
it was built against Spark 2.3.0 and Scala 2.11.8. You may need to check
with the Phoenix community whether your setup with Spark 3.4.1 is supported
by something like
https://github.com/apache/phoenix-connectors/tree/master/phoenix5-spark3

On Mon, Aug 21, 2023 at 6:12 PM Kal Stevens wrote:

> Sorry for being so dense, and thank you for your help.
>
> I was using this version: phoenix-spark-5.0.0-HBase-2.0.jar
>
> because it was the latest in this repo:
> https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark
>
> On Mon, Aug 21, 2023 at 5:07 PM Sean Owen wrote:
>
>> It is. But you have a third-party library in here which seems to require
>> a different version.
>>
>> On Mon, Aug 21, 2023, 7:04 PM Kal Stevens wrote:
>>
>>> OK, it was my impression that Scala was packaged with Spark to avoid a
>>> mismatch: https://spark.apache.org/downloads.html
>>>
>>> It looks like Spark 3.4.1 (my version) uses Scala 2.12.
>>> How do I specify the Scala version?
>>>
>>> On Mon, Aug 21, 2023 at 4:47 PM Sean Owen wrote:
>>>
>>>> That's a mismatch in the version of Scala that your library uses vs
>>>> what Spark uses.
>>>>
>>>> On Mon, Aug 21, 2023, 6:46 PM Kal Stevens wrote:
>>>>
>>>>> I am having a hard time figuring out what I am doing wrong here.
>>>>> I am not sure if I have an incompatible version of something
>>>>> installed, or something else.
>>>>> I cannot find anything relevant on Google to figure out what I am
>>>>> doing wrong.
>>>>> I am using Spark 3.4.1 and Python 3.10.
>>>>>
>>>>> This is my code to save my dataframe:
>>>>>
>>>>> urls = []
>>>>> pull_sitemap_xml(robot, urls)
>>>>> df = spark.createDataFrame(data=urls, schema=schema)
>>>>> df.write.format("org.apache.phoenix.spark") \
>>>>>     .mode("overwrite") \
>>>>>     .option("table", "property") \
>>>>>     .option("zkUrl", "192.168.1.162:2181") \
>>>>>     .save()
>>>>>
>>>>> urls is an array of maps, each containing a "url" and a "last_mod"
>>>>> field.
>>>>>
>>>>> Here is the error that I am getting:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/home/kal/real-estate/pullhttp/pull_properties.py", line 65, in main
>>>>>     .save()
>>>>>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1396, in save
>>>>>     self._jwrite.save()
>>>>>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>>>>>     return_value = get_return_value(
>>>>>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
>>>>>     return f(*a, **kw)
>>>>>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
>>>>>     raise Py4JJavaError(
>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling o636.save.
>>>>> : java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
>>>>>   at org.apache.phoenix.spark.DataFrameFunctions.getFieldArray(DataFrameFunctions.scala:76)
>>>>>   at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:35)
>>>>>   at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>>>>>   at org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>>>>>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>>>>>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
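[Editor's note] A NoSuchMethodError like the one above is the classic symptom of a Scala binary mismatch: the jar was compiled against the Scala 2.11 standard library, but the 2.12 library on Spark 3.4's classpath exposes a different signature for `Predef.refArrayOps`. Many Scala artifacts encode the Scala binary version in the artifact name (e.g. `spark-sql_2.12-3.4.1.jar`), so one cheap sanity check is to scan your jars directory for suffixes that disagree with your Spark build's Scala version. The helper below is a hypothetical sketch, not part of any tool discussed in the thread; note that the jar in this thread (phoenix-spark-5.0.0-HBase-2.0.jar) carries no such suffix, which is part of why the mismatch was easy to miss.

```python
import re
from pathlib import Path


def scala_suffix(jar_name):
    """Return the Scala binary version embedded in a jar name, or None."""
    m = re.search(r"_(2\.1[0-3])", jar_name)
    return m.group(1) if m else None


def mismatched_jars(jar_dir, spark_scala="2.12"):
    """List jars whose embedded Scala version differs from spark_scala.

    Jars without a suffix (like the Phoenix jar here) are not flagged,
    so this is a first-pass filter, not a guarantee of compatibility.
    """
    return sorted(
        p.name
        for p in Path(jar_dir).glob("*.jar")
        if scala_suffix(p.name) not in (None, spark_scala)
    )
```

For jars with no suffix, the authoritative answer is the artifact's POM (as the reply above does for phoenix-spark 5.0.0-HBase-2.0, which declares Spark 2.3.0 and Scala 2.11.8).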