Re: Help needed to locate the csv parser (for Spark bug reporting/fixing)

2022-02-10 Thread Marnix van den Broek
Thanks, Sean! It was actually on the Catalyst side of things, but I found where column pruning pushdown is delegated to univocity, see [1]. I've tried setting the Spark configuration *spark.sql.csv.parser.columnPruning.enabled* to *False*, and this prevents the bug from happening. I am unfamiliar

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Martin Grigorov
I've found the problem! It was indeed a local thingy! $ cat ~/.mavenrc MAVEN_OPTS='-XX:+TieredCompilation -XX:TieredStopAtLevel=1' I've added this some time ago. It optimizes the build time. But it seems it also overrides the env var MAVEN_OPTS... Now it fails with: [INFO] ---
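
The override described in this message can be reproduced without running Maven at all. The `mvn` launcher script sources `~/.mavenrc` before starting the JVM, so a plain assignment in that file replaces any `MAVEN_OPTS` exported in the environment. The sketch below simulates that step; the temporary file standing in for `~/.mavenrc` and the `-Xss4m` value used as the pre-existing setting are assumptions for illustration, not details from the message.

```shell
#!/bin/sh
# Hypothetical stand-in for ~/.mavenrc, so the real file is not touched.
rc_file="$(mktemp)"

# Value assumed to be exported in the environment before the build (illustrative).
env_opts='-Xss4m'

# Variant 1: plain assignment, as quoted in the message.
# Sourcing the file replaces the exported value.
cat > "$rc_file" <<'EOF'
MAVEN_OPTS='-XX:+TieredCompilation -XX:TieredStopAtLevel=1'
EOF
overridden="$(MAVEN_OPTS="$env_opts" sh -c '. "$1"; printf "%s" "$MAVEN_OPTS"' sh "$rc_file")"

# Variant 2: append to whatever is already set, so both sets of flags survive.
cat > "$rc_file" <<'EOF'
MAVEN_OPTS="$MAVEN_OPTS -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
EOF
merged="$(MAVEN_OPTS="$env_opts" sh -c '. "$1"; printf "%s" "$MAVEN_OPTS"' sh "$rc_file")"

rm -f "$rc_file"
echo "overridden: $overridden"
echo "merged:     $merged"
```

With the appending form, flags set in the environment and flags set in the rc file are both passed on; whether that is enough to fix the build reported in this thread is not something the sketch can show.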

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Sean Owen
I think it's another occurrence that I had to change or had to set MAVEN_OPTS. I think this occurs in a way that this setting doesn't affect, though I don't quite understand it. Try the stack size in test runner configs. On Thu, Feb 10, 2022, 2:02 PM Martin Grigorov wrote: > Hi Sean, > > On Thu,

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Martin Grigorov
Hi Sean, On Thu, Feb 10, 2022 at 5:37 PM Sean Owen wrote: > Yes I've seen this; the JVM stack size needs to be increased. I'm not sure > if it's env specific (though you and I at least have hit it, I think > others), or whether we need to change our build script. > In the pom.xml file, find

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-10 Thread John Zhuge
The vote is now closed and the vote passes. Thank you to everyone who took the time to review and vote on this SPIP. I’m looking forward to adding this feature to the next Spark release. The tracking JIRA is https://issues.apache.org/jira/browse/SPARK-31357. The tally is: +1s: Walaa Eldin

Re: Help needed to locate the csv parser (for Spark bug reporting/fixing)

2022-02-10 Thread Sean Owen
It starts in org.apache.spark.sql.execution.datasources.csv.CSVDataSource. Yes, univocity is used for much of the parsing. I am not sure of the cause of the bug, but it does look like one indeed. In one case the parser is asked to read all fields, in the other, to skip one. The pushdown helps

Help needed to locate the csv parser (for Spark bug reporting/fixing)

2022-02-10 Thread Marnix van den Broek
Hi all, Yesterday I filed a CSV parsing bug [1] for Spark that leads to incorrect data when the data contains sequences similar to the one in the report. I wanted to take a look at the parsing logic to see if I could spot the error to update the issue with more information and to possibly

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Sean Owen
Yes I've seen this; the JVM stack size needs to be increased. I'm not sure if it's env specific (though you and I at least have hit it, I think others), or whether we need to change our build script. In the pom.xml file, find "-Xss..." settings and make them something like "-Xss4m", see if that

Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Martin Grigorov
Hi, I am not able to build Spark due to the following error: [ERROR] ## Exception when compiling 543 sources to /home/martin/git/apache/spark/sql/catalyst/target/scala-2.12/classes java.lang.BootstrapMethodError: call site initialization exception