Efficiently updating running sums only on new data

2022-10-11 Thread Greg Kopff
I'm new to Spark and would like to seek some advice on how to approach a problem. I have a large dataset that has dated observations. There are also columns that are running sums of some of the other columns. date | thing | foo | bar | foo_sum | bar_sum |
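
One way to picture the problem in the snippet above is to carry the last known totals forward and compute running sums only for the newly arrived rows. What follows is a minimal plain-Python sketch, not Spark code and not the approach discussed in the thread. It assumes the sums are kept per `thing` (the snippet does not say so) and that dates are ISO-formatted strings. The function name `extend_running_sums` and the `last_totals` mapping are hypothetical.

```python
def extend_running_sums(last_totals, new_rows):
    """Compute running sums for new rows only, starting from stored totals.

    last_totals: dict mapping thing -> (foo_sum, bar_sum) as of the last
                 processed date (hypothetical carry-over state).
    new_rows:    iterable of dicts with keys date, thing, foo, bar.
    Returns (rows_with_sums, updated_totals); last_totals is not mutated.
    """
    totals = dict(last_totals)  # copy so the caller's state is untouched
    out = []
    # Assumption: sums are per `thing`, ordered by ISO date string.
    for row in sorted(new_rows, key=lambda r: (r["thing"], r["date"])):
        foo_sum, bar_sum = totals.get(row["thing"], (0, 0))
        foo_sum += row["foo"]
        bar_sum += row["bar"]
        totals[row["thing"]] = (foo_sum, bar_sum)
        out.append({**row, "foo_sum": foo_sum, "bar_sum": bar_sum})
    return out, totals


if __name__ == "__main__":
    previous = {"a": (10, 100)}
    arrived = [
        {"date": "2022-10-02", "thing": "a", "foo": 2, "bar": 20},
        {"date": "2022-10-01", "thing": "a", "foo": 1, "bar": 10},
        {"date": "2022-10-01", "thing": "b", "foo": 5, "bar": 50},
    ]
    rows, updated = extend_running_sums(previous, arrived)
    for r in rows:
        print(r)
```

In a Spark job the same idea would presumably mean reading only the latest totals per key plus the new rows, rather than re-running a window over the whole dataset; that mapping is an assumption here, not something the snippet confirms.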

Reading parquet strips non-nullability from schema

2022-07-06 Thread Greg Kopff
Hi. I’ve spent the last couple of hours trying to chase down an issue with writing/reading parquet files. I was trying to save (and then read in) a parquet file with a schema that sets my non-nullability details correctly. After having no success for some time, I posted to Stack Overflow

Re: [Java 17] --add-exports required?

2022-06-23 Thread Greg Kopff
Hi. > So the above issue occurs at build and test a maven project with Spark 3.3.0 > and Java 17, rather than test spark-3.3 source code? Yes, that’s correct — it’s my project’s unit test that fails, not a Spark source code unit test. Sorry for the confusion - and thanks for the info about

Re: [Java 17] --add-exports required?

2022-06-23 Thread Greg Kopff
.com> wrote: Hi, Greg. "--add-exports java.base/sun.nio.ch=ALL-UNNAMED" does not need to be added when SPARK-33772 is completed, so in order to answer your question, I need more details for testing: 1. Where can I download Java 17 (Temurin-17+35)? 2. What test commands do you u

[Java 17] --add-exports required?

2022-06-22 Thread Greg Kopff
Hi. According to the release notes[1], and specifically the ticket Build and Run Spark on Java 17 (SPARK-33772)[2], Spark now supports running on Java 17. However, using Java 17 (Temurin-17+35) with Maven (3.8.6) and maven-surefire-plugin (3.0.0-M7), when running a unit test that uses Spark