Hi,
We recently upgraded Spark from 2.4.x to 3.3.1, and managed table
creation while writing a dataframe with saveAsTable fails with the error below:
Can not create the managed table(``). The associated
location('hdfs:') already exists.
At a high level, our code does the following before writing the dataframe as
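For context, Spark 3.x no longer silently reuses a non-empty location when creating a managed table (the 2.4-era legacy flag that allowed this was removed), so a leftover directory from an earlier failed write now fails the job. One workaround is to clean up the stale table and path first. A minimal sketch, assuming the leftover directory is safe to delete; `spark`, `df`, `tableName`, and the path are placeholders, not from the original post:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

val tableName = "db.my_table"                        // hypothetical
spark.sql(s"DROP TABLE IF EXISTS $tableName")

// DROP TABLE on a managed table normally deletes its directory, but if a
// previous job died mid-write the location can survive; remove it explicitly.
val location = new Path("hdfs://...")                // the leftover location
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
if (fs.exists(location)) fs.delete(location, true)

df.write.mode("overwrite").saveAsTable(tableName)
```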
Hi Spark users,
We have been working on GPU acceleration for Apache Spark SQL / DataFrame
using the RAPIDS Accelerator for Apache Spark
<https://www.nvidia.com/en-us/deep-learning-ai/solutions/data-science/apache-spark-3/>
and the open source project Alluxio <https://github.com/Alluxi
This is with Spark 2.4.4 (Scala). I'm getting some very strange behaviour
after reading a dataframe from a JSON file using sparkSession.read in
permissive mode. I've included the error column when reading in the data, as
I want to log details of any errors in the input JSON file.
My suspicion
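For readers hitting the same thing: the usual shape of a permissive read with an error column looks roughly like this (a sketch; the schema, field names, and file path are made up). Note the documented caveat that, since Spark 2.3, a query that touches only the corrupt-record column requires caching the DataFrame first, which is a common source of surprising behaviour in this setup:

```scala
import org.apache.spark.sql.types._
import spark.implicits._

val schema = new StructType()
  .add("id", LongType)
  .add("name", StringType)
  .add("_corrupt_record", StringType)   // must be declared to be retained

val df = spark.read
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .schema(schema)
  .json("input.json")

// Cache before filtering on only the corrupt column (Spark 2.3+ requirement).
df.cache()
val errors = df.filter($"_corrupt_record".isNotNull)
```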
Hey,
I'm working on a use case that involves converting DStreams to
DataFrames after some transformations. I've simplified my code into the
following snippet so as to reproduce the error. I've also listed
my environment settings below.
*Environment:*
Spark Version: 2.2.0
Java: 1.8
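For reference, the usual DStream-to-DataFrame pattern in the 2.2-era API goes through foreachRDD; a minimal sketch, assuming `stream` is a `DStream[String]` and with made-up field names:

```scala
import org.apache.spark.sql.SparkSession

case class Event(word: String)

stream.foreachRDD { rdd =>
  // Obtain the session lazily inside the closure so it also works on restart.
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  import spark.implicits._

  val df = rdd.map(Event(_)).toDF()
  df.createOrReplaceTempView("events")
  spark.sql("SELECT word, count(*) AS c FROM events GROUP BY word").show()
}
```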
Thanks Michael, that worked, appreciate your help.
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Monday, May 15, 2017 11:45 AM
To: Revin Chalil <rcha...@expedia.com>
Cc: User <user@spark.apache.org>
Subject: Re: Spark SQL DataFrame to Kafka Topic
The foreach sink fr
Help is much appreciated. Thank you.
> *From:* Tathagata Das [mailto:tathagata.das1...@gmail.com]
> *Sent:* Friday, January 13, 2017 3:31 PM
> *To:* Koert Kuipers <ko...@tresata.com>
> *Cc:* Peyman Mohajerian <mohaj...@gmail.com>; Senthil Kumar <
<ko...@tresata.com>; silvio.fior...@granturing.com
Subject: RE: Spark SQL DataFrame to Kafka Topic
Hi TD / Michael,
I am trying to use the foreach sink to write to Kafka and followed
this<https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-
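For context, a foreach-sink Kafka writer in the spirit of that blog post looks roughly like this (a sketch; the class name, topic, and broker address are assumptions, not from the original message):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.{ForeachWriter, Row}

class KafkaSink(topic: String, servers: String) extends ForeachWriter[Row] {
  @transient private var producer: KafkaProducer[String, String] = _

  // Called once per partition per epoch; set up the producer here.
  def open(partitionId: Long, version: Long): Boolean = {
    val props = new Properties()
    props.put("bootstrap.servers", servers)
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    producer = new KafkaProducer[String, String](props)
    true
  }

  def process(row: Row): Unit =
    producer.send(new ProducerRecord(topic, row.mkString(",")))

  def close(errorOrNull: Throwable): Unit = producer.close()
}

// usage sketch:
// df.writeStream.foreach(new KafkaSink("myTopic", "broker:9092")).start()
```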
From: Tathagata Das [mailto:tathagata.das1...@gmail.com]
Sent: Friday, January 13, 2017 3:31 PM
To: Koert Kuipers <ko...@tresata.com>
Cc: Peyman Mohajerian <mohaj...@gmail.com>; Senthil Kumar
<senthilec...@gmail.com>; User <user@spark.apache.org>; senthilec...@apache.org
Subject: Re: Spark SQL DataFrame to Kafka
will shuffle, and the following join COULD cause another shuffle.
>> So I am not sure if it is a smart way.
>>
>> Yong
>>
>> --
>> *From:* shyla deshpande <deshpandesh...@gmail.com>
>> *Sent:* Wednesday, March 29, 2017 12:33 PM
On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande
wrote:
> Following are my questions. Thank you.
>
> 1. When joining dataframes, is it a good idea to repartition on the key column
> that is used in the join, or is the optimizer smart enough that we can
> forget about it?
>
> 2. In RDD
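For what it's worth, the pattern being asked about in question 1 looks like this (a sketch; `dfA`, `dfB`, and the column name are illustrative):

```scala
import spark.implicits._

// Pre-partitioning both sides on the join key; whether this helps depends
// on the plan, since Catalyst already inserts a shuffle (exchange) on the
// join key for a sort-merge join. An explicit repartition mainly pays off
// when the repartitioned result is cached and reused across several joins
// on the same key.
val left   = dfA.repartition($"user_id")
val right  = dfB.repartition($"user_id")
val joined = left.join(right, Seq("user_id"))

joined.explain()   // inspect whether extra exchanges were added
```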
reaming-kafka.html
>>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
>>>
>>> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar <senthilec...@gmail.com>
>>> wrote:
>>>
>>>> Hi Team ,
>>&g
t;>
>>> Hi Team ,
>>>
>>> Sorry if this question already asked in this forum..
>>>
>>> Can we ingest data to Apache Kafka Topic from Spark SQL DataFrame ??
>>>
>>> Here is my Code which Reads Parquet File :
>>>
>>>
.html
> http://spark.apache.org/docs/latest/structured-streaming-
> programming-guide.html
>
> On Fri, Jan 13, 2017 at 3:32 AM, Senthil Kumar <senthilec...@gmail.com>
> wrote:
>
>> Hi Team ,
>>
>> Sorry if this question already asked in this forum..
>>
; Hi Team ,
>
> Sorry if this question already asked in this forum..
>
> Can we ingest data to Apache Kafka Topic from Spark SQL DataFrame ??
>
> Here is my Code which Reads Parquet File :
>
> *val sqlContext = new org.apache.spark.sql.SQLContext(sc);*
>
> *val df = sqlCo
Hi Team,
Sorry if this question was already asked in this forum.
Can we ingest data to an Apache Kafka topic from a Spark SQL DataFrame?
Here is my code, which reads a Parquet file:
val sqlContext = new org.apache.spark.sql.SQLContext(sc);
val df = sqlContext.read.parquet("
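For readers finding this thread later: since Spark 2.2, a DataFrame can be written to Kafka directly through the built-in Kafka data source, without a foreach sink. A sketch (the Parquet path, topic, and broker address are placeholders):

```scala
val df = spark.read.parquet("/path/to/parquet")   // placeholder path

// Kafka expects string/binary key and value columns; pack the row as JSON.
df.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("topic", "myTopic")
  .save()
```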
Hi,
I want to add a metadata field to the StructField case class in Spark:
case class StructField(name: String)
Also, how can the metadata be carried over during query execution?
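In current Spark versions, StructField already carries a `metadata` field as its fourth case-class parameter, and `Column.as(alias, metadata)` propagates it through a query. A sketch (column names and the metadata key are illustrative):

```scala
import org.apache.spark.sql.types.{Metadata, MetadataBuilder, StringType, StructField}
import spark.implicits._

// StructField(name, dataType, nullable, metadata) -- metadata is built-in.
val meta: Metadata = new MetadataBuilder()
  .putString("comment", "user-visible name")
  .build()
val field = StructField("name", StringType, nullable = true, meta)

// Metadata survives execution when attached via Column.as:
val df2 = df.select($"name".as("name", meta))
df2.schema("name").metadata.getString("comment")   // "user-visible name"
```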
Hi Spark Developers,
I just ran some very simple operations on a dataset and was surprised by the
execution plan of take(1), head(), and first().
For your reference, this is what I did in pyspark 1.5:
df = sqlContext.read.parquet("someparquetfiles")
df.head()
The above lines take over 15 minutes. I
Seems 1.4 has the same issue.
On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
> btw, does 1.4 has the same problem?
>
> On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
>
>> Hi Jerry,
>>
>> Looks like it is a Python-specific issue. Can you create
Hi Yin,
You are right! I just tried the Scala version with the above lines, and it
works as expected.
I'm not sure if it also happens in 1.4 for pyspark, but I thought the
pyspark code just calls the Scala code via py4j. I didn't expect this
bug to be pyspark-specific. That surprises me actually a
I just noticed you found 1.4 has the same issue. I added that as well in
the ticket.
On Mon, Sep 21, 2015 at 1:43 PM, Jerry Lam wrote:
> Hi Yin,
>
> You are right! I just tried the scala version with the above lines, it
> works as expected.
> I'm not sure if it happens
Looks like the problem is that df.rdd does not work very well with limit. In
Scala, df.limit(1).rdd will also trigger the issue you observed. I will add
this to the JIRA.
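The difference described above can be sketched like this (against a hypothetical `df`; exact operator names vary by Spark version):

```scala
// take(1) / head() goes through the limit operator's incremental collect
// path, reading as few partitions as needed.
val fast = df.take(1)

// Converting to an RDD first materializes the limit as a full RDD job,
// which is what made the pyspark case above scan far more data.
val slow = df.limit(1).rdd.count()

df.limit(1).explain()   // compare the physical plans
```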
On Mon, Sep 21, 2015 at 10:44 AM, Jerry Lam wrote:
> I just noticed you found 1.4 has the same issue. I
Hi Jerry,
Looks like it is a Python-specific issue. Can you create a JIRA?
Thanks,
Yin
On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote:
> Hi Spark Developers,
>
> I just ran some very simple operations on a dataset. I was surprise by the
> execution plan of take(1),
btw, does 1.4 has the same problem?
On Mon, Sep 21, 2015 at 10:01 AM, Yin Huai wrote:
> Hi Jerry,
>
> Looks like it is a Python-specific issue. Can you create a JIRA?
>
> Thanks,
>
> Yin
>
> On Mon, Sep 21, 2015 at 8:56 AM, Jerry Lam wrote:
>
>> Hi
Isn't it space-separated data? It is neither comma (,) separated nor
pipe (|) separated.
Thanks
Best Regards
On Mon, Aug 10, 2015 at 12:06 PM, Netwaver wanglong_...@163.com wrote:
Hi Spark experts,
I am now using Spark 1.4.1 and trying the Spark SQL/DataFrame API
with a text file in the format below:
id gender height
1 M 180
2 F 167
... ...
But I meet
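With a modern Spark (2.x+) reader, a space-delimited file like the one above can be loaded directly; in 1.4.x you would use the external spark-csv package or map over sc.textFile yourself. A sketch (the file path is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("whitespace-demo").getOrCreate()
import spark.implicits._

val df = spark.read
  .option("header", "true")
  .option("delimiter", " ")      // space-separated, not comma or pipe
  .option("inferSchema", "true")
  .csv("people.txt")             // hypothetical path

df.filter($"height" > 170).show()
```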
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-DataFrame-Nullable-column-and-filtering-tp24087.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com
)
case y: StructField => y
})
df.sqlContext.createDataFrame(df.rdd, newSchema)
}
Is there a cheaper solution?
3. *Any comments?*
Cheers and thx in advance,
Martin
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-DataFrame-Nullable-column-and-filtering-tp24087.html
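The truncated helper in this thread appears to rebuild the schema so every column is nullable and then re-create the DataFrame over the same RDD. A complete sketch of that trick (the helper name `setNullable` is assumed, not from the original post):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

// Rebuild the schema with every column marked nullable, then wrap the same
// underlying RDD in a new DataFrame (no data is copied eagerly).
def setNullable(df: DataFrame): DataFrame = {
  val newSchema = StructType(df.schema.map {
    case StructField(name, dataType, _, metadata) =>
      StructField(name, dataType, nullable = true, metadata)
  })
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}
```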