Re: Spark DataFrame/DataSet Wide Transformations

2019-02-06 Thread Faiz Chachiya
Hi Hemant - Well, it is pretty clear to me that conceptually the transformations would behave in a similar way. My question is how to identify the parent dependencies as you would typically do with an RDD. Thanks, Faiz On Thu, Feb 7, 2019 at 10:22 AM hemant singh wrote: > Same concept applies to
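A minimal sketch (Scala, assuming a Spark 2.x spark-shell; the example DataFrame is hypothetical) of one way to surface those parent dependencies from a DataFrame, via the RDD behind its physical plan:

    // In spark-shell, spark.implicits._ is already in scope.
    val df = Seq((1, "a"), (2, "b"), (1, "c")).toDF("id", "value")
    val grouped = df.groupBy("id").count()

    // The RDD produced by the physical plan exposes the same dependency API
    // as hand-written RDD code; a ShuffleDependency marks a wide transformation.
    grouped.queryExecution.toRdd.dependencies.foreach(println)

    // The lineage string shows the shuffle boundary as a new indentation level.
    println(grouped.rdd.toDebugString)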

Re: Spark DataFrame/DataSet Wide Transformations

2019-02-06 Thread hemant singh
The same concept applies to DataFrame as it does to RDD with respect to transformations. Both are distributed data sets. Thanks On Thu, Feb 7, 2019 at 8:51 AM Faiz Chachiya wrote: > Hello Team, > > With RDDs it is pretty clear which operations would result in wide > transformations and there are
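As an illustration of that equivalence (a sketch assuming a Spark 2.x spark-shell), the same aggregation shuffles whether it is written against the RDD API or the DataFrame API:

    // RDD API: reduceByKey is a classic wide transformation.
    val pairs = spark.sparkContext.parallelize(Seq((1, 1), (2, 1), (1, 1)))
    println(pairs.reduceByKey(_ + _).toDebugString)  // lineage shows a ShuffledRDD

    // DataFrame API: the equivalent groupBy/sum also introduces a shuffle,
    // visible as an Exchange node in the physical plan.
    pairs.toDF("id", "n").groupBy("id").sum("n").explain()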

Spark DataFrame/DataSet Wide Transformations

2019-02-06 Thread Faiz Chachiya
Hello Team, With RDDs it is pretty clear which operations would result in wide transformations, and there are also options available to find out parent dependencies. I have been struggling to do the same with DataFrame/DataSet. I need your help in finding out which operations may lead to wide
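One way to answer this empirically (a sketch, assuming a Spark 2.x spark-shell): run explain() and look for Exchange operators, which mark shuffle boundaries; aggregations, non-broadcast joins, distinct, orderBy and repartition typically introduce them:

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Narrow: no Exchange appears in the physical plan.
    df.filter($"id" > 0).select("id").explain()

    // Wide: Exchange hashpartitioning(id, ...) appears in the plan.
    df.groupBy("id").count().explain()

    // Wide: a sort-merge join shuffles both sides (unless one is broadcast).
    df.join(df.withColumnRenamed("value", "v2"), "id").explain()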

RE : 3 equalTo "3.15" = true

2019-02-06 Thread Denis DEBARBIEUX
I am confused since the two columns have the same name. From: Artur Sukhenko [artur.sukhe...@gmail.com] Sent: Wednesday, February 6, 2019 17:32 To: Russell Spitzer Cc: user@spark.apache.org Subject: Re: 3 equalTo "3.15" = true scala>

Re: 3 equalTo "3.15" = true

2019-02-06 Thread Artur Sukhenko
Probably it is wrong to compare StringType and ShortType. I'll use something like this: df.select(colString, colShort, colShort.equalTo(colString.cast(DecimalType(38,15)))).show On Wed, Feb 6, 2019 at 6:32 PM Artur Sukhenko wrote: > scala> df.select(colString, colShort,
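Spelled out as a runnable sketch (assuming a spark-shell and a string column named tier_id, as in the explain output below in this thread):

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.{DecimalType, ShortType}

    val df = Seq("3.15").toDF("tier_id")
    val colString = col("tier_id")
    val colShort  = col("tier_id").cast(ShortType)

    // Casting the string side explicitly to a wide decimal keeps the fractional
    // part, avoiding the lossy implicit cast this thread reports.
    df.select(colString, colShort,
      colShort.equalTo(colString.cast(DecimalType(38, 15)))).show()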

Re: 3 equalTo "3.15" = true

2019-02-06 Thread Artur Sukhenko
scala> df.select(colString, colShort, colShort.equalTo(colString)).explain == Physical Plan == LocalTableScan [tier_id#3, tier_id#56, (CAST(tier_id AS SMALLINT) = tier_id)#50] On Wed, Feb 6, 2019 at 6:19 PM Russell Spitzer wrote: > Run an "explain" instead of show, i'm betting it's casting

Re: 3 equalTo "3.15" = true

2019-02-06 Thread Russell Spitzer
Run an "explain" instead of show, i'm betting it's casting tier_id to a small_int to do the comparison On Wed, Feb 6, 2019 at 9:31 AM Artur Sukhenko wrote: > Hello guys, > I am migrating from Spark 1.6 to 2.2 and have this issue: > I am casting string to short and comparing them with equal . >

3 equalTo "3.15" = true

2019-02-06 Thread Artur Sukhenko
Hello guys, I am migrating from Spark 1.6 to 2.2 and have this issue: I am casting a string to short and comparing them with equalTo. Original code is: ... when(col(fieldName).equalTo(castedValueCol), castedValueCol). otherwise(defaultErrorValueCol) Reproduce (version 2.3.0.cloudera4): scala> val
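A minimal reproduction sketch (assuming a Spark 2.x spark-shell; the column name tier_id is taken from the explain output earlier in this thread):

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.ShortType

    val df = Seq("3.15").toDF("tier_id")
    val colString = col("tier_id")
    val colShort  = col("tier_id").cast(ShortType)

    // The short/string comparison makes the analyzer insert an implicit cast;
    // in the reporter's environment that is what made 3 equalTo "3.15" true.
    df.select(colString, colShort, colShort.equalTo(colString)).show()

    // The extended explain shows the analyzed plan, i.e. which side got cast.
    df.select(colShort.equalTo(colString)).explain(true)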

Re: DataSourceV2 producing wrong date value in Custom Data Writer

2019-02-06 Thread Shubham Chaurasia
Thanks Ryan On Tue, Feb 5, 2019 at 10:28 PM Ryan Blue wrote: > Shubham, > > DataSourceV2 passes Spark's internal representation to your source and > expects Spark's internal representation back from the source. That's why > you consume and produce InternalRow: "internal" indicates that Spark >
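A small sketch of what "internal representation" means for a DateType value (assumptions: Spark 2.x and the catalyst DateTimeUtils helper; dates travel through InternalRow as an Int counting days since the Unix epoch):

    import java.sql.Date
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.catalyst.util.DateTimeUtils

    // External value, as a user sees it in a DataFrame.
    val external: Date = Date.valueOf("2019-02-06")

    // Internal value: days since 1970-01-01.
    val internalDays: Int = DateTimeUtils.fromJavaDate(external)

    // A DataWriter[InternalRow] receives rows holding the Int form, so a custom
    // sink reads it back with getInt and converts, rather than expecting a Date.
    val row = InternalRow(internalDays)
    val roundTripped: Date = DateTimeUtils.toJavaDate(row.getInt(0))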