Access Array StructField inside StructType.

2017-12-12 Thread satyajit vegesna
Hi All,

How do I iterate over the StructFields nested inside *after* in the
following schema?

StructType(StructField(after, StructType(
    StructField(Alarmed, LongType, true),
    StructField(CallDollarLimit, StringType, true),
    StructField(CallRecordWav, StringType, true),
    StructField(CallTimeLimit, LongType, true),
    StructField(Signature, StringType, true)), true))
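
For what it's worth, a minimal sketch of one way to reach the nested
fields, assuming the schema above lives on a DataFrame `df` (a
hypothetical name used only for illustration):

import org.apache.spark.sql.types.StructType

// StructType(fieldName) returns the StructField; match on its dataType to
// get the nested StructType and iterate over its fields.
df.schema("after").dataType match {
  case nested: StructType =>
    nested.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))
  case other =>
    println(s"'after' is not a struct: $other")
}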

Regards,
Satyajit.


Re: OutputMetrics empty for DF writes - any hints?

2017-12-12 Thread Jason White
It should be in the first email in this chain.

On Tue, Dec 12, 2017, 7:10 PM Ryan Blue  wrote:

> Great. What's the JIRA issue?
>
> On Mon, Dec 11, 2017 at 8:12 PM, Jason White 
> wrote:
>
>> Yes, the fix has been merged and should make it into the 2.3 release.
>>
>> On Mon, Dec 11, 2017, 5:50 PM Ryan Blue  wrote:
>>
>>> Is anyone currently working on this? I just fixed it in our Spark build
>>> and can contribute the fix if there isn't already a PR for it.
>>>
>>> On Mon, Nov 27, 2017 at 12:59 PM, Jason White 
>>> wrote:
>>>
>>>> It doesn't look like the insert command has any metrics in it. I don't
>>>> see any commands with metrics, but I could be missing something.





>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: OutputMetrics empty for DF writes - any hints?

2017-12-12 Thread Ryan Blue
Great. What's the JIRA issue?

On Mon, Dec 11, 2017 at 8:12 PM, Jason White 
wrote:

> Yes, the fix has been merged and should make it into the 2.3 release.
>
> On Mon, Dec 11, 2017, 5:50 PM Ryan Blue  wrote:
>
>> Is anyone currently working on this? I just fixed it in our Spark build
>> and can contribute the fix if there isn't already a PR for it.
>>
>> On Mon, Nov 27, 2017 at 12:59 PM, Jason White 
>> wrote:
>>
>>> It doesn't look like the insert command has any metrics in it. I don't
>>> see any commands with metrics, but I could be missing something.
>>
>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>


-- 
Ryan Blue
Software Engineer
Netflix


Re: RDD[internalRow] -> DataSet

2017-12-12 Thread Vadim Semenov
Not possible directly, but you can add your own object in your project to
Spark's `org.apache.spark.sql` package, which gives you access to its
package-private methods:

package org.apache.spark.sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.StructType

object DataFrameUtil {
  /**
   * Creates a DataFrame out of the RDD[InternalRow] that you can get
   * using `df.queryExecution.toRdd`.
   */
  def createFromInternalRows(
      sparkSession: SparkSession,
      schema: StructType,
      rdd: RDD[InternalRow]): DataFrame = {
    val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession)
    Dataset.ofRows(sparkSession, logicalPlan)
  }
}
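
For example, a round trip could look like this (a sketch, assuming an
existing DataFrame `df` and the object above compiled into your project):

val internalRdd = df.queryExecution.toRdd
val df2 = DataFrameUtil.createFromInternalRows(df.sparkSession, df.schema, internalRdd)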


Decimals

2017-12-12 Thread Marco Gaido
Hi all,

In recent weeks I have seen a lot of problems related to decimal values
(SPARK-22036 and SPARK-22755, for instance). Some are related to historical
choices which I don't know about, so please excuse me if I am saying dumb
things:

 - why are we interpreting literal constants in queries as Decimal and not
as Double? I think it is very unlikely that a user would enter a number
beyond Double precision.
 - why are we returning null in case of precision loss? Is this approach
better than just giving a result which might lose some accuracy? (See the
sketch below.)
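
For reference, a minimal sketch of both cases, assuming a Spark 2.2
session available as `spark` (the exact behavior depends on the version):

// 1. A fractional literal is typed as an exact decimal, not a double:
spark.sql("SELECT 1.5 AS x").schema
// -> StructType(StructField(x, DecimalType(2,1), false))

// 2. decimal(38,18) * decimal(38,18) is typed as decimal(38,36) once
// precision is capped at 38, so a product with three or more integer
// digits cannot be represented and, as of Spark 2.2, comes back as NULL
// rather than a rounded value:
spark.sql("SELECT CAST(10 AS DECIMAL(38,18)) * CAST(10 AS DECIMAL(38,18))").show()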

Thanks,
Marco


Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-12 Thread Jacek Laskowski
Hi,

It appears that there's already a discussion about why the GenerateExec
operator has the flag turned off.

1. https://issues.apache.org/jira/browse/SPARK-21657 "Spark has exponential
time complexity to explode(array of structs)", which is in progress.
2. More importantly, @rxin turned it off because "Disable generate codegen
since it fails my workload." I wish he had included the workload to
showcase the issue :( (A toy model of the combination is sketched below.)
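
To make the combination concrete, here is a self-contained toy model
(names borrowed from Spark, but this is not Spark source) of an operator
that mixes in the codegen trait yet keeps the flag off:

// The trait supplies codegen plumbing; the flag tells the planner
// (CollapseCodegenStages in Spark) whether to fuse the operator into a
// whole-stage-generated pipeline.
trait CodegenSupport {
  def name: String
  def supportCodegen: Boolean = true
  def produce(): String = s"/* generated code for $name */"
}

case class ProjectLike(name: String) extends CodegenSupport

case class GenerateLike(name: String) extends CodegenSupport {
  // Opted out, like GenerateExec: the trait's API stays available to
  // neighboring operators, but the planner falls back to the interpreted path.
  override def supportCodegen: Boolean = false
}

object Demo extends App {
  Seq(ProjectLike("project"), GenerateLike("generate")).foreach { op =>
    if (op.supportCodegen) println(op.produce())
    else println(s"/* ${op.name}: interpreted fallback */")
  }
}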

Looks like there are a bunch of wise people already on it so I'll just
listen...

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Mon, Dec 11, 2017 at 10:15 PM, Jacek Laskowski  wrote:

> Hi,
>
> After another day trying to get my head around WholeStageCodegenExec,
> InputAdapter, and the CollapseCodegenStages optimization rule, I came to
> the conclusion that it may have something to do with UnsafeRow vs
> GenericInternalRow/InternalRow: when a physical operator wants to
> _somehow_ participate in whole-stage codegen, it can extend the
> CodegenSupport trait and still access GenericInternalRow by turning the
> supportCodegen flag off.
>
> I can understand how badly that can read, but without help from Spark SQL
> devs that's all I can figure out myself. Any help appreciated.
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Sun, Dec 10, 2017 at 10:34 PM, Stephen Boesch 
> wrote:
>
>> A relevant observation: a JIRA was closed last year that removed the
>> option to disable the codegen flag (and the unsafe flag as well):
>> https://issues.apache.org/jira/browse/SPARK-11644
>>
>> 2017-12-10 13:16 GMT-08:00 Jacek Laskowski :
>>
>>> Hi,
>>>
>>> I'm wondering why a physical operator like GenerateExec would extend
>>> CodegenSupport [1] but have the supportCodegen flag turned off [2]?
>>>
>>> What's the meaning of such a combination -- being a CodegenSupport with
>>> supportCodegen off?
>>>
>>> [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L58-L64
>>>
>>> [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L125
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://about.me/JacekLaskowski
>>> Spark Structured Streaming https://bit.ly/spark-structured-streaming
>>> Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>
>>
>