Re: [SparkSQL, SparkUI, RESTAPI] How to extract the WholeStageCodeGen ids from SparkUI

2023-04-11 Thread Chitral Verma
Try explain codegen on your DF and then parse the string.

On Fri, 7 Apr, 2023, 3:53 pm Chenghao Lyu, wrote:
> Hi,
>
> The detailed stage page shows the involved WholeStageCodegen Ids in its
> DAG visualization from the Spark UI when running a SparkSQL. (e.g., under
> the link
> node:18088/histor
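A minimal sketch of the "parse the string" step suggested in the reply above, in plain Python. It assumes the plan text has already been captured from Spark as a string, and that operators belonging to a WholeStageCodegen stage are prefixed with `*(<id>)`, as in Spark's printed physical plans. The sample plan and the helper name `extract_codegen_ids` are hypothetical and only for illustration.

```python
import re

# Hypothetical plan text, shaped like a physical plan printed by Spark.
# Operators inside a WholeStageCodegen stage carry a "*(<id>)" prefix.
SAMPLE_PLAN = """\
*(2) HashAggregate(keys=[country#1], functions=[count(1)])
+- Exchange hashpartitioning(country#1, 200)
   +- *(1) HashAggregate(keys=[country#1], functions=[partial_count(1)])
      +- *(1) Project [country#1]
"""

# Matches the "*(N)" prefix and captures N.
CODEGEN_ID = re.compile(r"\*\((\d+)\)")


def extract_codegen_ids(plan_text):
    """Return the distinct WholeStageCodegen ids in plan_text, sorted ascending."""
    return sorted({int(match) for match in CODEGEN_ID.findall(plan_text)})


codegen_ids = extract_codegen_ids(SAMPLE_PLAN)
print(codegen_ids)
```

Operators without the prefix, such as the Exchange above, are skipped, so only code-generated stages contribute ids.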

Re: Non string type partitions

2023-04-11 Thread Chitral Verma
Because the name of a directory cannot be an object; it has to be a string in order to create partitioned dirs like "date=2023-04-10".

On Tue, 11 Apr, 2023, 8:27 pm Charles vinodh, wrote:
>
> Hi Team,
>
> We are running into the below error when we are trying to run a simple
> query a partitioned table
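To illustrate the point in the reply above, here is a small Python sketch of how a typed partition value ends up as text in a Hive-style `column=value` directory name, and comes back as a plain string when the name is read. The helper names `partition_dir` and `parse_partition_dir` are hypothetical, not part of any Spark API.

```python
from datetime import date


def partition_dir(column, value):
    """Build a Hive-style partition directory name; the value is rendered as text."""
    return f"{column}={value}"


def parse_partition_dir(name):
    """Split a partition directory name back into (column, value); both are strings."""
    column, value = name.split("=", 1)
    return column, value


dir_name = partition_dir("date", date(2023, 4, 10))
print(dir_name)
print(parse_partition_dir(dir_name))
```

The original `date` object is gone after the round trip: whatever reads the directory has to decide how to interpret the string again.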

Fwd: [New Project] sparksql-ml : Distributed Machine Learning using SparkSQL.

2023-02-27 Thread Chitral Verma
inputCol='raw', outputCol='filtered') AND WRITE AT LOCATION '/path/to/test-transformer'

But a lot more can be done with this library. I was wondering if any of you find this interesting and would like to contribute to the project here: https://github.com/chitralverma/sparksql-ml

Regards,
Chitral Verma

Re: Profiling data quality with Spark

2022-12-29 Thread Chitral Verma
Hi Rajat,

I have worked for years on democratizing data quality for some of the top organizations, and I'm also an Apache Griffin contributor and PMC member, so I know a lot about this space. :)

Coming back to your original question, there are a lot of data quality options available in the market today a

[Spark SQL]: DataFrame schema resulting in NullPointerException

2017-11-19 Thread Chitral Verma
", "india"))
      .toDF("name", "country")

    val sc = df.schema
    df.rdd
      .map(x => x.toSeq)
      .map(x => new GenericRowWithSchema(x.toArray, sc))
      .foreach(println)
  }
}

I wonder why this is happening, as *df.rdd* is not an action and there is no visible change in the state of the DataFrame just yet. What are your thoughts on this?

Regards,
Chitral Verma