Hi,
which is more efficient?
This is already defined since 2.4.0:

def isEmpty: Boolean = withAction("isEmpty",
    limit(1).groupBy().count().queryExecution) { plan =>
  plan.executeCollect().head.getLong(0) == 0
}

or

df.head(1).isEmpty

I am checking whether a DF is empty, and it is taking forever.
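For reference, a minimal sketch of the two checks side by side (assuming an existing SparkSession; the DataFrame name is hypothetical):

// built into Dataset since Spark 2.4.0: runs a tiny limit(1) + count job
val empty1 = df.isEmpty

// pre-2.4 idiom: fetch at most one row to the driver and test the result
val empty2 = df.head(1).isEmpty

Both variants evaluate at most one row, so when either "takes forever" the time is usually spent computing the upstream plan (joins, shuffles, UDFs) before the first row can be produced; caching or checkpointing the DataFrame before the check may help more than the choice between the two.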
Hi,
I am trying to write a generic method which will return Datasets of custom
types as well as of spark.sql.Row.
def read[T](params: Map[String, Any])(implicit encoder: Encoder[T]): Dataset[T]
is my method signature. It is working fine for custom types, but when I
am trying to obtain a Dataset[Row]
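(The message is cut off here. A sketch of one way this can work on Spark 2.x, assuming a file-based reader; the parameter keys, schema, and paths below are all hypothetical:)

import org.apache.spark.sql.{Dataset, Encoder, Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

def read[T](params: Map[String, Any])(implicit encoder: Encoder[T]): Dataset[T] =
  spark.read
    .format(params("format").toString)   // e.g. "parquet"
    .load(params("path").toString)
    .as[T]

// custom types: the encoder comes from spark.implicits._
case class User(id: Long, name: String)  // hypothetical custom type
val users: Dataset[User] = read[User](Map("format" -> "parquet", "path" -> "/data/users"))

// Dataset[Row]: there is no implicit Encoder[Row], so bind a RowEncoder
// for the expected schema explicitly before the call
val schema = StructType(Seq(StructField("id", LongType), StructField("name", StringType)))
implicit val rowEncoder: Encoder[Row] = RowEncoder(schema)
val rows: Dataset[Row] = read[Row](Map("format" -> "parquet", "path" -> "/data/users"))

The usual failure here is that no implicit Encoder[Row] exists, so read[Row](...) does not compile; binding an explicit RowEncoder as above is one way around it.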
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/DataFrameNaFunctions.html
>>
>> On Mon, Apr 29, 2019 at 4:57 PM Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>>
>>> Hey Snehasish,
>>>
>>> Do you have a reproducer for this?
Hi,
While writing to Kafka using Spark structured streaming, if all the values
in a certain column are null, the column gets dropped.
Is there any way to override this, other than using the na.fill functions?
Regards,
Snehasish
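A sketch of the na.fill route mentioned above, assuming Spark 2.x and that the stream is serialized with to_json (which silently drops null fields); the broker, topic, and checkpoint path are hypothetical:

import org.apache.spark.sql.functions.{col, struct, to_json}

// fill nulls with a sentinel so to_json keeps the columns
val filled = streamingDf.na.fill("")   // or na.fill(Map("colA" -> "NULL")) per column

val query = filled
  .select(to_json(struct(filled.columns.map(col): _*)).alias("value"))
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // hypothetical broker
  .option("topic", "out-topic")                      // hypothetical topic
  .option("checkpointLocation", "/tmp/chk")          // hypothetical path
  .start()

If memory serves, Spark 3.0+ also lets to_json take Map("ignoreNullFields" -> "false") to keep null fields without the fill, but on 2.x the fill-then-serialize pattern above is the usual workaround.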
Hi,
I am currently facing an issue while performing a union on three data frames,
say df1, df2, df3. Once the operation is performed and I am trying to save the
data, the data is getting shuffled, so the ordering of data in df1, df2, df3
is not maintained.
When I save the data as text/csv file the
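(The message is cut off here. A common workaround is to tag each frame with a source index before the union, then sort on it before writing; a sketch, with hypothetical frame names and output path:)

import org.apache.spark.sql.functions.lit

// tag every row with the index of the frame it came from
val tagged = Seq(df1, df2, df3).zipWithIndex.map { case (df, i) =>
  df.withColumn("_src", lit(i))
}

tagged.reduce(_ union _)
  .orderBy("_src")   // restores df1-then-df2-then-df3 order
  .drop("_src")
  .coalesce(1)       // single output file so the order is visible on disk
  .write.mode("overwrite").csv("/tmp/out")

union itself does not reorder rows, but any downstream shuffle (and the split across multiple part-files) can; the explicit sort pins the order back down before the save.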
Hi,
I am using Spark 2.2, and a table fetched from a database contains a dot (.)
in one of the column names.
Whenever I try to select that particular column I get a query
analysis exception.
I have tried creating a temporary table using createOrReplaceTempView()
and fetching the column's
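(Cut off here. In Spark 2.x the usual fix is to escape the dotted name in backticks so the analyzer treats it as a single identifier rather than struct field access; a sketch, with hypothetical table and column names:)

val df = spark.table("my_table")   // hypothetical table with a "price.usd" column

// backticks make the dotted name one identifier
df.select("`price.usd`").show()
spark.sql("SELECT `price.usd` FROM my_table").show()

// renaming the column once also side-steps the issue downstream
val clean = df.withColumnRenamed("price.usd", "price_usd")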
Hi,
I am using the Spark 2.2 CSV reader.
I have data in the following format:
123|123|"abc"||""|"xyz"
where || is null
and "" is one blank character, as per the requirement.
I was using option sep as pipe
and option quote as "".
Parsed the data, and using regex I was able to fulfill all the mentioned
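(Cut off here. A sketch of the reader setup being described, assuming Spark 2.x; note that option("quote", "") disables quote processing entirely, which is typically why the literal quotes then need regex clean-up. The path is hypothetical:)

val df = spark.read
  .option("sep", "|")
  .option("quote", "\"")    // keep standard quote handling instead of ""
  .option("nullValue", "")  // unquoted empty field (||) becomes null
  .csv("/tmp/input.psv")    // hypothetical path

If I recall correctly, Spark 2.4 added option("emptyValue", ...), which makes it possible to tell a quoted empty string "" apart from a null field without post-processing; on 2.2 the regex pass is a reasonable fallback.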
Hi Mina,
Even text may not work. You may try this:
df.coalesce(1).write.option("header","true").mode("overwrite").save("output", format="text")
Else convert to an RDD and use saveAsTextFile.
Regards,
Snehasish
On Wed, Feb 21, 2018 at 3:38 AM, Mina Aslani wrote:
> df.coalesce(1).write.option("header","true").mode("overwrite").csv("output") throws
>
> java.lang.UnsupportedOperationException: CSV data source does not support
> struct<...> data type.
>
>
> Regards,
> Mina
Hi Mina,
This might help
df.coalesce(1).write.option("header","true").mode("overwrite").csv("output")
Regards,
Snehasish
On Wed, Feb 21, 2018 at 1:53 AM, Mina Aslani wrote:
> Hi,
>
> I would like to serialize a dataframe with vector values into a text/csv
> in pyspark.
>
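The root cause in this thread is that the vector column is a struct the CSV writer cannot serialize. One hedged workaround is to stringify the vector before writing; a sketch assuming an MLlib Vector column (the column name is hypothetical):

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// render the vector as a plain string so CSV can write the row
val vecToString = udf((v: Vector) => v.toArray.mkString("[", ",", "]"))

df.withColumn("features", vecToString(col("features")))  // hypothetical column
  .coalesce(1)
  .write.option("header", "true").mode("overwrite").csv("output")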
Hi Lian,
This could be the solution
case class Symbol(symbol: String, sector: String)
case class Tick(symbol: String, sector: String, open: Double, close: Double)
// symbolDs is Dataset[Symbol], pullSymbolFromYahoo returns Dataset[Tick]
symbolDs.map { k =>
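(The message ends mid-expression. A hedged guess at the intended shape, with a hypothetical stub in place of the real quote lookup, since map must return one Tick per Symbol rather than a Dataset:)

import org.apache.spark.sql.Dataset

// hypothetical stub standing in for a real per-symbol quote lookup
def fetchQuote(symbol: String): (Double, Double) = (0.0, 0.0)

import spark.implicits._   // provides the Encoder[Tick] that map needs
val ticks: Dataset[Tick] = symbolDs.map { k =>
  val (open, close) = fetchQuote(k.symbol)
  Tick(k.symbol, k.sector, open, close)
}

If pullSymbolFromYahoo really returns a whole Dataset[Tick] per symbol, it cannot be called inside map (Datasets cannot be nested inside executor code); collecting the symbols to the driver and unioning the per-symbol Datasets would be the alternative.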