[Spark-SQL] Dataframe write saveAsTable failed

2023-06-26 Thread Anil Dasari
Hi, we recently upgraded Spark from 2.4.x to 3.3.1, and managed table creation while writing a DataFrame with saveAsTable now fails with the error below. Can not create the managed table(``) The associated location('hdfs:') already exists. At a high level, our code does the following before writing the DataFrame as

Accelerating Spark SQL / Dataframe using GPUs & Alluxio

2021-04-23 Thread Bin Fan
Hi Spark users, We have been working on GPU acceleration for Apache Spark SQL / Dataframe using the RAPIDS Accelerator for Apache Spark <https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/apache-spark-3/> and open source project Alluxio <https://github.com/Alluxi

[Spark SQL]: Dataframe group by potential bug (Scala)

2019-10-31 Thread ludwiggj
This is using Spark Scala 2.4.4. I'm getting some very strange behaviour after reading in a dataframe from a json file, using sparkSession.read in permissive mode. I've included the error column when reading in the data, as I want to log details of any errors in the input json file. My suspicion

[Spark SQL]: DataFrame schema resulting in NullPointerException

2017-11-19 Thread Chitral Verma
Hey, I'm working on this use case that involves converting DStreams to Dataframes after some transformations. I've simplified my code into the following snippet so as to reproduce the error. Also, I've mentioned below my environment settings. *Environment:* Spark Version: 2.2.0 Java: 1.8

RE: Spark SQL DataFrame to Kafka Topic

2017-05-16 Thread Revin Chalil
Thanks Michael, that worked, appreciate your help. From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Monday, May 15, 2017 11:45 AM To: Revin Chalil <rcha...@expedia.com> Cc: User <user@spark.apache.org> Subject: Re: Spark SQL DataFrame to Kafka Topic The foreach sink fr

Re: Spark SQL DataFrame to Kafka Topic

2017-05-15 Thread Michael Armbrust
elp is much appreciated. Thank you. > > > > > > *From:* Tathagata Das [mailto:tathagata.das1...@gmail.com] > *Sent:* Friday, January 13, 2017 3:31 PM > *To:* Koert Kuipers <ko...@tresata.com> > *Cc:* Peyman Mohajerian <mohaj...@gmail.com>; Senthil Kumar < >

RE: Spark SQL DataFrame to Kafka Topic

2017-05-15 Thread Revin Chalil
rs <ko...@tresata.com>; silvio.fior...@granturing.com Subject: RE: Spark SQL DataFrame to Kafka Topic Hi TD / Michael, I am trying to use the foreach sink to write to Kafka and followed this<https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-

RE: Spark SQL DataFrame to Kafka Topic

2017-05-14 Thread Revin Chalil
athagata.das1...@gmail.com] Sent: Friday, January 13, 2017 3:31 PM To: Koert Kuipers <ko...@tresata.com> Cc: Peyman Mohajerian <mohaj...@gmail.com>; Senthil Kumar <senthilec...@gmail.com>; User <user@spark.apache.org>; senthilec...@apache.org Subject: Re: Spark SQL DataFrame to Kafka

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread vaquar khan
will shuffle, and following join COULD cause another shuffle. >> So I am not sure if it is a smart way. >> >> Yong >> >> -- >> *From:* shyla deshpande <deshpandesh...@gmail.com> >> *Sent:* Wednesday, March 29, 2017 12:33 PM

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Vidya Sujeet
it is a smart way. > > Yong > > -- > *From:* shyla deshpande <deshpandesh...@gmail.com> > *Sent:* Wednesday, March 29, 2017 12:33 PM > *To:* user > *Subject:* Re: Spark SQL, dataframe join questions. > > > > On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande <deshpa

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread Yong Zhang
join COULD cause another shuffle. So I am not sure if it is a smart way. Yong From: shyla deshpande <deshpandesh...@gmail.com> Sent: Wednesday, March 29, 2017 12:33 PM To: user Subject: Re: Spark SQL, dataframe join questions. On Tue, Mar 28, 2017 at 2

Re: Spark SQL, dataframe join questions.

2017-03-29 Thread shyla deshpande
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande wrote: > Following are my questions. Thank you. > > 1. When joining dataframes is it a good idea to repartition on the key column > that is used in the join or > the optimizer is too smart so forget it. > > 2. In RDD

Re: Spark SQL DataFrame to Kafka Topic

2017-01-24 Thread ayan guha
Kumar <senthilec...@gmail.com> > wrote: > > Hi Team , > > Sorry if this question already asked in this forum.. > > Can we ingest data to Apache Kafka Topic from Spark SQL DataFrame ?? > > Here is my Code which Reads Parquet File : > > *val sqlContext = new org.apache.s

Re: Spark SQL DataFrame to Kafka Topic

2017-01-24 Thread Koert Kuipers
reaming-kafka.html >>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html >>> >>> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar <senthilec...@gmail.com> >>> wrote: >>> >>>> Hi Team , >>>

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Tathagata Das
> >>> Hi Team , >>> >>> Sorry if this question already asked in this forum.. >>> >>> Can we ingest data to Apache Kafka Topic from Spark SQL DataFrame ?? >>> >>> Here is my Code which Reads Parquet File : >>> >>>

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Koert Kuipers
.html > http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html > > On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar <senthilec...@gmail.com> > wrote: > >> Hi Team , >> >> Sorry if this question already asked in this forum.. >>

Re: Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Peyman Mohajerian
> Hi Team , > > Sorry if this question already asked in this forum.. > > Can we ingest data to Apache Kafka Topic from Spark SQL DataFrame ?? > > Here is my Code which Reads Parquet File : > > *val sqlContext = new org.apache.spark.sql.SQLContext(sc);* > > *val df = sqlCo

Spark SQL DataFrame to Kafka Topic

2017-01-13 Thread Senthil Kumar
Hi Team , Sorry if this question already asked in this forum.. Can we ingest data to Apache Kafka Topic from Spark SQL DataFrame ?? Here is my Code which Reads Parquet File : *val sqlContext = new org.apache.spark.sql.SQLContext(sc);* *val df = sqlContext.read.parquet("
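For readers landing on this thread: Spark releases after the one discussed here (2.2 and later, to my knowledge) include a batch Kafka sink for DataFrames. The sketch below is one possible way to use it, not the approach settled on in the thread. It assumes the spark-sql-kafka-0-10 package is on the classpath. The function name, broker address and topic are hypothetical.

```python
def write_dataframe_to_kafka(df, bootstrap_servers, topic):
    """Write every row of a Spark DataFrame to a Kafka topic as a JSON string.

    Assumption: Spark 2.2+ with the spark-sql-kafka-0-10 package available.
    `df` is any Spark DataFrame, e.g. the result of sqlContext.read.parquet(...).
    """
    (
        # The Kafka sink expects a string or binary column named "value";
        # here each row is serialized to a single JSON document.
        df.selectExpr("to_json(struct(*)) AS value")
        .write
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", topic)
        .save()
    )
```

A call would look like `write_dataframe_to_kafka(df, "localhost:9092", "events")`, where both the broker address and the topic name are placeholders.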

Spark sql dataframe

2016-06-29 Thread pooja mehta
Hi, I want to add a metadata field to the StructField case class in Spark: case class StructField(name: String). How can the metadata be carried over during query execution?

Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
Hi Spark Developers, I just ran some very simple operations on a dataset. I was surprised by the execution plan of take(1), head() or first(). For your reference, this is what I did in pyspark 1.5: df=sqlContext.read.parquet("someparquetfiles") df.head() The above lines take over 15 minutes. I

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
Seems 1.4 has the same issue. On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote: > btw, does 1.4 have the same problem? > > On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote: > >> Hi Jerry, >> >> Looks like it is a Python-specific issue. Can you create

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
Hi Yin, You are right! I just tried the scala version with the above lines, it works as expected. I'm not sure if it happens also in 1.4 for pyspark but I thought the pyspark code just calls the scala code via py4j. I didn't expect that this bug is pyspark specific. That surprises me actually a

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Jerry Lam
I just noticed you found 1.4 has the same issue. I added that as well in the ticket. On Mon, Sep 21, 2015 at 1:43 PM, Jerry Lam wrote: > Hi Yin, > > You are right! I just tried the scala version with the above lines, it > works as expected. > I'm not sure if it happens

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
Looks like the problem is df.rdd does not work very well with limit. In scala, df.limit(1).rdd will also trigger the issue you observed. I will add this in the jira. On Mon, Sep 21, 2015 at 10:44 AM, Jerry Lam wrote: > I just noticed you found 1.4 has the same issue. I

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
Hi Jerry, Looks like it is a Python-specific issue. Can you create a JIRA? Thanks, Yin On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote: > Hi Spark Developers, > > I just ran some very simple operations on a dataset. I was surprised by the > execution plan of take(1),

Re: Spark SQL DataFrame 1.5.0 is extremely slow for take(1) or head() or first()

2015-09-21 Thread Yin Huai
btw, does 1.4 have the same problem? On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote: > Hi Jerry, > > Looks like it is a Python-specific issue. Can you create a JIRA? > > Thanks, > > Yin > > On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote: > >> Hi

Re:Re: Possible issue for Spark SQL/DataFrame

2015-08-12 Thread Netwaver
On Mon, Aug 10, 2015 at 12:06 PM, Netwaver wanglong_...@163.com wrote: Hi Spark experts, I am now using Spark 1.4.1 and trying Spark SQL/DataFrame API with text file in below format id gender height 1 M 180 2

Re: Possible issue for Spark SQL/DataFrame

2015-08-12 Thread Eugene Morozov
*/ On 10 Aug 2015, at 09:36, Netwaver wanglong_...@163.com wrote: Hi Spark experts, I am now using Spark 1.4.1 and trying Spark SQL/DataFrame API with text file in below format id gender height 1 M 180 2 F

Re: Possible issue for Spark SQL/DataFrame

2015-08-10 Thread Akhil Das
Isn't it space-separated data? It is neither comma (,) separated nor pipe (|) separated. Thanks Best Regards On Mon, Aug 10, 2015 at 12:06 PM, Netwaver wanglong_...@163.com wrote: Hi Spark experts, I am now using Spark 1.4.1 and trying Spark SQL/DataFrame API with text
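The delimiter point in this reply can be illustrated without Spark. The plain-Python sketch below uses a hypothetical sample that mirrors the id / gender / height layout quoted in the thread. It splits each line on runs of whitespace, and shows that splitting on a comma or pipe leaves the line as a single field. The function and variable names are made up for illustration.

```python
# Hypothetical sample mirroring the layout quoted in the thread.
SAMPLE = """id gender height
1 M 180
2 F 167"""


def parse_whitespace_table(text):
    """Turn a whitespace-delimited table into a list of row dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # str.split() with no argument splits on any run of whitespace,
    # so single spaces, multiple spaces and tabs are all handled.
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


rows = parse_whitespace_table(SAMPLE)

# Splitting on the wrong delimiter leaves the whole line as one field.
assert "1 M 180".split(",") == ["1 M 180"]
assert "1 M 180".split("|") == ["1 M 180"]
```

Here every value stays a string; converting `id` and `height` to integers would be a separate step.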

Possible issue for Spark SQL/DataFrame

2015-08-10 Thread Netwaver
Hi Spark experts, I am now using Spark 1.4.1 and trying Spark SQL/DataFrame API with text file in below format id gender height 1 M 180 2 F 167 ... ... But I meet

Re: Spark SQL DataFrame: Nullable column and filtering

2015-08-01 Thread Martin Senne
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-DataFrame-Nullable-column-and-filtering-tp24087.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-31 Thread Martin Senne

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Michael Armbrust
) case y: StructField => y }) df.sqlContext.createDataFrame( df.rdd, newSchema) } Is there a cheaper solution? 3. *Any comments?* Cheers and thx in advance, Martin

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Martin Senne
}) df.sqlContext.createDataFrame( df.rdd, newSchema) } Is there a cheaper solution? 3. *Any comments?* Cheers and thx in advance, Martin

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Martin Senne
}) df.sqlContext.createDataFrame( df.rdd, newSchema) } Is there a cheaper solution? 3. *Any comments?* Cheers and thx in advance, Martin

Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread martinibus77
in advance, Martin

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Michael Armbrust
solution? 3. *Any comments?* Cheers and thx in advance, Martin