Access Array StructField inside StructType.
Hi All, how do I iterate over the StructFields nested inside the *after* field of a StructType? The schema looks like: StructType(StructField(after, StructType(StructField(Alarmed,LongType,true), StructField(CallDollarLimit,StringType,true), StructField(CallRecordWav,StringType,true), StructField(CallTimeLimit,LongType,true), StructField(Signature,StringType,true)), true)) Regards, Satyajit.
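A nested schema like the one above can be walked recursively. Here is a minimal, self-contained sketch using simplified stand-ins for Spark's `StructType`/`StructField` (the real classes live in `org.apache.spark.sql.types` and have the same `fields`, `name`, and `dataType` members, so the traversal pattern carries over directly):

```scala
// Simplified stand-ins for org.apache.spark.sql.types, so this sketch
// runs without a Spark dependency. With Spark on the classpath you
// would `import org.apache.spark.sql.types._` instead.
sealed trait DataType
case object LongType extends DataType
case object StringType extends DataType
case class StructField(name: String, dataType: DataType, nullable: Boolean = true)
case class StructType(fields: Seq[StructField]) extends DataType

object StructWalker {
  // Recursively collect (dotted path, type) pairs for every leaf field,
  // descending into nested StructTypes.
  def leafFields(st: StructType, prefix: String = ""): Seq[(String, DataType)] =
    st.fields.flatMap { f =>
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      f.dataType match {
        case nested: StructType => leafFields(nested, path)
        case leaf               => Seq(path -> leaf)
      }
    }
}

// The schema from the question: an outer `after` field holding a struct.
val schema = StructType(Seq(
  StructField("after", StructType(Seq(
    StructField("Alarmed", LongType),
    StructField("CallDollarLimit", StringType),
    StructField("CallRecordWav", StringType),
    StructField("CallTimeLimit", LongType),
    StructField("Signature", StringType)
  )))
))

// Prints each leaf as "after.<name>: <type>"
StructWalker.leafFields(schema).foreach { case (p, t) => println(s"$p: $t") }
```

With the real Spark classes, the same recursion works because `StructType.fields` is an `Array[StructField]` and `StructField.dataType` can itself be a `StructType`.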
Re: OutputMetrics empty for DF writes - any hints?
It should be in the first email in this chain.

On Tue, Dec 12, 2017, 7:10 PM Ryan Blue wrote:
> Great. What's the JIRA issue?
>
> On Mon, Dec 11, 2017 at 8:12 PM, Jason White wrote:
>> Yes, the fix has been merged and should make it into the 2.3 release.
>>
>> On Mon, Dec 11, 2017, 5:50 PM Ryan Blue wrote:
>>> Is anyone currently working on this? I just fixed it in our Spark build
>>> and can contribute the fix if there isn't already a PR for it.
>>>
>>> On Mon, Nov 27, 2017 at 12:59 PM, Jason White wrote:
>>>> It doesn't look like the insert command has any metrics in it. I don't
>>>> see any commands with metrics, but I could be missing something.
>>>>
>>>> --
>>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
Re: OutputMetrics empty for DF writes - any hints?
Great. What's the JIRA issue?

On Mon, Dec 11, 2017 at 8:12 PM, Jason White wrote:
> Yes, the fix has been merged and should make it into the 2.3 release.
>
> On Mon, Dec 11, 2017, 5:50 PM Ryan Blue wrote:
>> Is anyone currently working on this? I just fixed it in our Spark build
>> and can contribute the fix if there isn't already a PR for it.
>>
>> On Mon, Nov 27, 2017 at 12:59 PM, Jason White wrote:
>>> It doesn't look like the insert command has any metrics in it. I don't
>>> see any commands with metrics, but I could be missing something.

--
Ryan Blue
Software Engineer
Netflix
Re: RDD[internalRow] -> DataSet
Not possible directly, but you can add your own object in your project under Spark's package, which gives you access to its private[sql] members:

package org.apache.spark.sql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.StructType

object DataFrameUtil {
  /**
   * Creates a DataFrame out of an RDD[InternalRow], which you can get
   * via `df.queryExecution.toRdd`.
   */
  def createFromInternalRows(
      sparkSession: SparkSession,
      schema: StructType,
      rdd: RDD[InternalRow]): DataFrame = {
    val logicalPlan = LogicalRDD(schema.toAttributes, rdd)(sparkSession)
    Dataset.ofRows(sparkSession, logicalPlan)
  }
}
Decimals
Hi all, I have seen in recent weeks that there are a lot of problems related to decimal values (SPARK-22036 and SPARK-22755, for instance). Some are related to historical choices which I don't know about, so please excuse me if I am saying dumb things:
- why are we interpreting literal constants in queries as Decimal and not as Double? I think it is very unlikely that a user would enter a number that is beyond Double precision.
- why are we returning null in case of precision loss? Is this approach better than just giving a result which might lose some accuracy?
Thanks, Marco
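To illustrate the trade-off behind both questions, here is a small sketch using plain `java.math`/`scala.math.BigDecimal` semantics (Spark's Decimal type wraps these; the exact Spark behavior in the linked JIRAs may differ, this only shows why the choice is not obvious):

```scala
import java.math.MathContext

// 1) Literal interpretation: as a Decimal, 0.1 is exact; as a Double it is not.
val asDecimal = BigDecimal("0.1")
val asDouble  = 0.1
println(asDecimal + asDecimal + asDecimal) // prints 0.3
println(asDouble + asDouble + asDouble)    // prints 0.30000000000000004

// 2) Precision loss: a product can need more digits than a fixed-precision
// result type (e.g. Decimal with max precision 38) can represent.
val a = BigDecimal("12345678901234567890.123456789")
val b = BigDecimal("98765432109876543210.987654321")
val exact   = a * b                              // far more than 38 significant digits
val rounded = (a * b).round(new MathContext(38)) // what "losing some accuracy" looks like
println(exact.precision > 38)                    // prints true
println(rounded)
```

So interpreting literals as Decimal preserves exactness for values like 0.1, while returning null instead of a rounded result makes precision loss explicit rather than silent; the question is which default serves users better.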
Re: GenerateExec, CodegenSupport and supportCodegen flag off?!
Hi,

It appears that there's already a discussion about why the GenerateExec operator has the flag off:

1. https://issues.apache.org/jira/browse/SPARK-21657 "Spark has exponential time complexity to explode(array of structs)", which is in progress.
2. More importantly, @rxin turned it off because "Disable generate codegen since it fails my workload." I wish he had included the workload to showcase the issue :(

Looks like there are a bunch of wise people already on it, so I'll just listen...

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Mon, Dec 11, 2017 at 10:15 PM, Jacek Laskowski wrote:
> Hi,
>
> After another day trying to get my head around WholeStageCodegenExec,
> InputAdapter, and the CollapseCodegenStages optimization rule, I came to
> the conclusion that it may have something to do with UnsafeRow vs
> GenericInternalRow/InternalRow: when a physical operator wants to
> _somehow_ participate in whole-stage codegen, it can extend the
> CodegenSupport trait and enable accessing GenericInternalRow by turning
> the supportCodegen flag off.
>
> I can understand how badly that can read, but without help from Spark SQL
> devs that's all I can figure out myself. Any help appreciated.
> Pozdrawiam,
> Jacek Laskowski
>
> On Sun, Dec 10, 2017 at 10:34 PM, Stephen Boesch wrote:
>> A relevant observation: there was a closed/executed JIRA last year to
>> remove the option to disable the codegen flag (and the unsafe flag as well):
>> https://issues.apache.org/jira/browse/SPARK-11644
>>
>> 2017-12-10 13:16 GMT-08:00 Jacek Laskowski:
>>> Hi,
>>>
>>> I'm wondering why a physical operator like GenerateExec would
>>> extend CodegenSupport [1], but have the supportCodegen flag turned off [2]?
>>>
>>> What's the meaning of such a combination -- being a CodegenSupport with
>>> supportCodegen off?
>>>
>>> [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L58-L64
>>> [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L125
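The combination being discussed (extend the codegen trait, but opt out via a flag) can be sketched with made-up stand-in types, not Spark's actual classes: the interface carries a default-on flag, an operator like GenerateExec overrides it to false, and a planner rule like CollapseCodegenStages consults the flag when deciding what to fuse.

```scala
// Stand-in sketch, not Spark's real classes: an operator can implement a
// codegen interface yet opt out via a flag, and a planner rule checks it.
trait SparkPlanLike { def name: String; def children: Seq[SparkPlanLike] }

trait CodegenSupportLike extends SparkPlanLike {
  // Mirrors the idea of CodegenSupport.supportCodegen: default on, overridable.
  def supportCodegen: Boolean = true
}

case class ScanLike() extends SparkPlanLike {
  val name = "Scan"; def children = Nil
}

case class FilterLike(child: SparkPlanLike) extends CodegenSupportLike {
  val name = "Filter"; def children = Seq(child)
}

// Like GenerateExec: implements the trait but turns the flag off.
case class GenerateLike(child: SparkPlanLike) extends CodegenSupportLike {
  val name = "Generate"; def children = Seq(child)
  override def supportCodegen: Boolean = false
}

// A collapse-style rule only fuses operators whose flag is on; everything
// else falls back to the row-by-row (interpreted) path.
def codegenEligible(p: SparkPlanLike): Boolean = p match {
  case c: CodegenSupportLike => c.supportCodegen
  case _                     => false
}
```

Under this reading, extending the trait with the flag off lets the operator keep the trait's produce/consume plumbing for its neighbors while excluding its own body from whole-stage fusion.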