Re: error trying to save to database (Phoenix)

2023-08-22 Thread Gera Shegalov
If you look at the dependencies of the 5.0.0-HBase-2.0 artifact
https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark/5.0.0-HBase-2.0
you'll see it was built against Spark 2.3.0 and Scala 2.11.8. Scala 2.11
bytecode is not binary-compatible with the Scala 2.12 runtime that ships
with Spark 3.4.1, which is exactly what the NoSuchMethodError below is
telling you.

You may need to check with the Phoenix community whether your setup with
Spark 3.4.1 is supported by something like
https://github.com/apache/phoenix-connectors/tree/master/phoenix5-spark3
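
With the Spark-3 connector the write itself changes only slightly. This is
an untested sketch: the jar path is a placeholder, and you should pick the
phoenix5-spark3 release built for your Spark/Scala combination (3.4.x /
2.12) rather than the one named here.

# start PySpark with the connector on the classpath, e.g.:
#   pyspark --jars /path/to/phoenix5-spark3-shaded.jar
# the Spark-3 connector registers the short data source name "phoenix"
df.write.format("phoenix") \
    .mode("overwrite") \
    .option("table", "property") \
    .option("zkUrl", "192.168.1.162:2181") \
    .save()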



On Mon, Aug 21, 2023 at 6:12 PM Kal Stevens  wrote:

> Sorry for being so dense, and thank you for your help.
>
> I was using this version
> phoenix-spark-5.0.0-HBase-2.0.jar
>
> Because it was the latest in this repo
> https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark
>
>
> On Mon, Aug 21, 2023 at 5:07 PM Sean Owen  wrote:
>
>> It is. But you have a third party library in here which seems to require
>> a different version.
>>
>> On Mon, Aug 21, 2023, 7:04 PM Kal Stevens  wrote:
>>
>>> OK, it was my impression that Scala was packaged with Spark to avoid a
>>> mismatch:
>>> https://spark.apache.org/downloads.html
>>>
>>> It looks like Spark 3.4.1 (my version) uses Scala 2.12.
>>> How do I specify the Scala version?
>>>
>>> On Mon, Aug 21, 2023 at 4:47 PM Sean Owen  wrote:
>>>
 That's a mismatch between the Scala version your library was built
 against and the Scala version Spark uses.
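
 A quick way to confirm which Scala your Spark runtime actually uses, right
 from PySpark (a debugging sketch only; "_jvm" is an internal py4j handle,
 not a stable API):

 print(spark.sparkContext._jvm.scala.util.Properties.versionString())
 # prints something like "version 2.12.17" on a stock Spark 3.4.1 build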

 On Mon, Aug 21, 2023, 6:46 PM Kal Stevens 
 wrote:

> I am having a hard time figuring out what I am doing wrong here.
> I am not sure if I have an incompatible version of something installed
> or something else, and I cannot find anything relevant on Google.
> I am using *Spark 3.4.1* and *Python 3.10*.
>
> This is my code to save my dataframe:
>
> # collect the rows, then write them to the "property" table via Phoenix
> urls = []
> pull_sitemap_xml(robot, urls)
> df = spark.createDataFrame(data=urls, schema=schema)
> df.write.format("org.apache.phoenix.spark") \
>     .mode("overwrite") \
>     .option("table", "property") \
>     .option("zkUrl", "192.168.1.162:2181") \
>     .save()
>
> urls is a list of dicts, each containing a "url" and a "last_mod" field.
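>
> For illustration only (the real schema isn't shown, so the field types
> and values below are assumptions), the inputs could look like:
>
> # hypothetical sample matching the structure described above
> urls = [{"url": "https://example.com/a", "last_mod": "2023-08-01"}]
> schema = "url STRING, last_mod STRING"
> df = spark.createDataFrame(data=urls, schema=schema)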
>
> Here is the error that I am getting:
>
> Traceback (most recent call last):
>   File "/home/kal/real-estate/pullhttp/pull_properties.py", line 65, in main
>     .save()
>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1396, in save
>     self._jwrite.save()
>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>     return_value = get_return_value(
>   File "/hadoop/spark/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
>     return f(*a, **kw)
>   File "/hadoop/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o636.save.
> : java.lang.NoSuchMethodError: 'scala.collection.mutable.ArrayOps scala.Predef$.refArrayOps(java.lang.Object[])'
>   at org.apache.phoenix.spark.DataFrameFunctions.getFieldArray(DataFrameFunctions.scala:76)
>   at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:35)
>   at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>   at org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>


