Re: Using Avro file format with SparkSQL
Please try these two corrections:

1. --packages isn't the right command-line argument for spark-submit. Please use --conf spark.jars.packages=your-package to specify Maven packages, or define your configuration parameters in the spark-defaults.conf file.

2. Please check the version number of your spark-avro jar file on Maven Central and see if that version is indeed available and compatible with Spark 3.2. The version we are currently using for Spark 3.2 is spark-avro_2.12-3.1.1.jar, not 3.2.0.

BTW, you do have to include the spark-avro lib as a custom jar file. The Spark 3.2 distribution includes only the avro libs, not the spark-avro lib.

Hope this helps...

-- ND

On 2/9/22 10:25 PM, Karanika, Anna wrote:
> Hello,
>
> I have been trying to use Spark SQL's operations that are related to the Avro file format,
> e.g., stored as, save, load, in a Java class, but they keep failing with the following stack trace:
>
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
>         at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1032)
>         at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
>         at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
>         at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852)
>         at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
>         at xsys.fileformats.SparkSQLvsAvro.main(SparkSQLvsAvro.java:57)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> For context, I am invoking spark-submit and adding arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
> Yet, Spark responds as if the dependency was not added.
> I am running spark-v3.2.0 (Scala 2.12).
>
> On the other hand, everything works great with spark-shell or spark-sql.
>
> I would appreciate any advice or feedback to get this running.
>
> Thank you,
> Anna
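A sketch of the configuration-based suggestion above (the application class and jar names are placeholders, and the 3.1.1 version is the one ND mentions; note that later replies in this thread confirm --packages is valid as well):

```shell
# Sketch only: class name, jar name, and version are placeholders.
spark-submit \
  --conf spark.jars.packages=org.apache.spark:spark-avro_2.12:3.1.1 \
  --class your.main.Class \
  your-application.jar

# Or set it once for all jobs in conf/spark-defaults.conf:
#   spark.jars.packages   org.apache.spark:spark-avro_2.12:3.1.1
```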
Re: Using Avro file format with SparkSQL
Hi Steve,

You're correct about the '--packages' option, seems my memory does not serve me well :)

On 2022/02/15 07:04:27 Stephen Coy wrote:
> Hi Morven,
>
> We use --packages for all of our spark jobs. Spark downloads the specified jar and all of its dependencies from a Maven repository.
>
> This means we never have to build fat or uber jars.
>
> It does mean that the Apache Ivy configuration has to be set up correctly though.
>
> Cheers,
>
> Steve C
>
> > On 15 Feb 2022, at 5:58 pm, Morven Huang wrote:
> >
> > I wrote a toy spark job and ran it within my IDE, same error if I don't add spark-avro to my pom.xml. After putting the spark-avro dependency in my pom.xml, everything works fine.
> >
> > Another thing is, if my memory serves me right, the spark-submit option for extra jars is '--jars', not '--packages'.
> >
> > Regards,
> >
> > Morven Huang
> >
> > On 2022/02/10 03:25:28 "Karanika, Anna" wrote:
> >> Hello,
> >>
> >> I have been trying to use Spark SQL's operations that are related to the Avro file format, e.g., stored as, save, load, in a Java class, but they keep failing with the following stack trace:
> >>
> >> Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
> >> [stack trace identical to the one in the original message snipped]
> >>
> >> For context, I am invoking spark-submit and adding arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
> >> Yet, Spark responds as if the dependency was not added.
> >> I am running spark-v3.2.0 (Scala 2.12).
> >>
> >> On the other hand, everything works great with spark-shell or spark-sql.
> >>
> >> I would appreciate any advice or feedback to get this running.
> >>
> >> Thank you,
> >> Anna
Re: Using Avro file format with SparkSQL
Hi Morven,

We use --packages for all of our spark jobs. Spark downloads the specified jar and all of its dependencies from a Maven repository.

This means we never have to build fat or uber jars.

It does mean that the Apache Ivy configuration has to be set up correctly though.

Cheers,

Steve C

> On 15 Feb 2022, at 5:58 pm, Morven Huang wrote:
>
> I wrote a toy spark job and ran it within my IDE, same error if I don't add spark-avro to my pom.xml. After putting the spark-avro dependency in my pom.xml, everything works fine.
>
> Another thing is, if my memory serves me right, the spark-submit option for extra jars is '--jars', not '--packages'.
>
> Regards,
>
> Morven Huang
>
> On 2022/02/10 03:25:28 "Karanika, Anna" wrote:
>> Hello,
>>
>> I have been trying to use Spark SQL's operations that are related to the Avro file format, e.g., stored as, save, load, in a Java class, but they keep failing with the following stack trace:
>>
>> Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
>> [stack trace identical to the one in the original message snipped]
>>
>> For context, I am invoking spark-submit and adding arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
>> Yet, Spark responds as if the dependency was not added.
>> I am running spark-v3.2.0 (Scala 2.12).
>>
>> On the other hand, everything works great with spark-shell or spark-sql.
>>
>> I would appreciate any advice or feedback to get this running.
>>
>> Thank you,
>> Anna
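The --packages approach described above could look like the following sketch (the application class and jar names are placeholders):

```shell
# Sketch only: class and jar names are placeholders.
# Spark resolves the Maven coordinates (and their transitive dependencies)
# through its Apache Ivy configuration at submit time.
spark-submit \
  --packages org.apache.spark:spark-avro_2.12:3.2.0 \
  --class your.main.Class \
  your-application.jar
```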
RE: Using Avro file format with SparkSQL
I wrote a toy spark job and ran it within my IDE, same error if I don't add spark-avro to my pom.xml. After putting the spark-avro dependency in my pom.xml, everything works fine.

Another thing is, if my memory serves me right, the spark-submit option for extra jars is '--jars', not '--packages'.

Regards,

Morven Huang

On 2022/02/10 03:25:28 "Karanika, Anna" wrote:
> Hello,
>
> I have been trying to use Spark SQL's operations that are related to the Avro file format, e.g., stored as, save, load, in a Java class, but they keep failing with the following stack trace:
>
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
> [stack trace identical to the one in the original message snipped]
>
> For context, I am invoking spark-submit and adding arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
> Yet, Spark responds as if the dependency was not added.
> I am running spark-v3.2.0 (Scala 2.12).
>
> On the other hand, everything works great with spark-shell or spark-sql.
>
> I would appreciate any advice or feedback to get this running.
>
> Thank you,
> Anna
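For the '--jars' route mentioned above, a sketch (the local path and class/jar names are placeholders; the spark-avro jar would first have to be downloaded, e.g. from Maven Central):

```shell
# Sketch only: paths and names are placeholders.
# Unlike --packages, --jars takes local or otherwise reachable jar files
# and does not resolve transitive dependencies for you.
spark-submit \
  --jars /path/to/spark-avro_2.12-3.2.0.jar \
  --class your.main.Class \
  your-application.jar
```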
Re: Using Avro file format with SparkSQL
Hi Anna,

Avro libraries should be built into SPARK, in case I am not wrong. Any particular reason why you are using a deprecated or soon-to-be-deprecated version of SPARK? SPARK 3.2.1 is fantastic.

Please do let us know about your set up if possible.

Regards,
Gourav Sengupta

On Thu, Feb 10, 2022 at 3:35 AM Karanika, Anna wrote:
> Hello,
>
> I have been trying to use Spark SQL's operations that are related to the Avro file format, e.g., stored as, save, load, in a Java class, but they keep failing with the following stack trace:
>
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
> [stack trace identical to the one in the original message snipped]
>
> For context, I am invoking spark-submit and adding arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
> Yet, Spark responds as if the dependency was not added.
> I am running spark-v3.2.0 (Scala 2.12).
>
> On the other hand, everything works great with spark-shell or spark-sql.
>
> I would appreciate any advice or feedback to get this running.
>
> Thank you,
> Anna
Re: Using Avro file format with SparkSQL
Have you added the dependency in the build.sbt? Can you 'sbt package' the source successfully?

regards
frakass

On 2022/2/10 11:25, Karanika, Anna wrote:
> For context, I am invoking spark-submit and adding arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
Using Avro file format with SparkSQL
Hello,

I have been trying to use Spark SQL's operations that are related to the Avro file format, e.g., stored as, save, load, in a Java class, but they keep failing with the following stack trace:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".
        at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1032)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
        at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852)
        at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
        at xsys.fileformats.SparkSQLvsAvro.main(SparkSQLvsAvro.java:57)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

For context, I am invoking spark-submit and adding arguments --packages org.apache.spark:spark-avro_2.12:3.2.0.
Yet, Spark responds as if the dependency was not added.
I am running spark-v3.2.0 (Scala 2.12).

On the other hand, everything works great with spark-shell or spark-sql.

I would appreciate any advice or feedback to get this running.

Thank you,
Anna