
Re: pyspark dataframe join with two different data type

2024-05-10 Thread Damien Hawes

Right now, with the structure of your data, it isn't possible. The rows aren't duplicates of each other. "a" and "b" both exist in the array. So Spark is correctly performing the join. It looks like you need to find another way to model this data to get what you want to achieve. Are the values of
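The reply argues that the output rows are not duplicates, because each left row matches once for every array element that satisfies the join condition. Below is a minimal plain-Python model of that behaviour, not Spark code. It assumes the join condition is array membership, similar in spirit to Spark SQL's `array_contains`. The column names `id`, `tags`, `key`, `value` and the sample rows are hypothetical, because the original poster's schema is not shown in the snippet.

```python
from typing import Dict, List, Tuple

# Hypothetical data: one left row whose array column holds both "a" and "b",
# and two right rows keyed by "a" and "b".
left: List[Dict] = [{"id": 1, "tags": ["a", "b"]}]
right: List[Dict] = [{"key": "a", "value": 10}, {"key": "b", "value": 20}]


def array_membership_join(left_rows: List[Dict], right_rows: List[Dict]) -> List[Tuple]:
    """Inner join: a (left, right) pair matches when right['key'] is an element of left['tags']."""
    out = []
    for l_row in left_rows:
        for r_row in right_rows:
            if r_row["key"] in l_row["tags"]:
                # Each matching array element produces its own output row.
                out.append((l_row["id"], tuple(l_row["tags"]), r_row["key"], r_row["value"]))
    return out


result = array_membership_join(left, right)
for row in result:
    print(row)
```

Left row `id=1` appears twice: once for the match on `"a"` and once for the match on `"b"`. The join is behaving correctly here. Getting a single row per `id` means deciding which match to keep, and that is a data-modelling choice rather than a join setting.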

[SparkListener] Accessing classes loaded via the '--packages' option

2024-04-26 Thread Damien Hawes

Hi folks, I'm contributing to the OpenLineage project, specifically the Apache Spark integration. My current focus is on extending the project to support data lineage extraction for Spark Streaming, beginning with Apache Kafka sources and sinks. I've encountered an obstacle when attempting to acc