Re: [DISCUSS] SPIP: XML data source support

2023-07-18 Thread Hyukjin Kwon
Yeah, I support this. XML is a fairly dated format, to be honest, but it is
still used in many legacy systems; the Wikipedia dump is one example.

Even if you look at usage stats for CSV vs. XML vs. JSON, some show that XML
is used more than CSV.

On Wed, Jul 19, 2023 at 12:58 AM Sandip Agarwala <
sandip.agarw...@databricks.com> wrote:

> Dear Spark community,
>
> I would like to start a discussion on "XML data source support".
>
> XML is a widely used data format. An external spark-xml package (
> https://github.com/databricks/spark-xml) is available to read and write
> XML data in spark. Making spark-xml built-in will provide a better user
> experience for Spark SQL and structured streaming. The proposal is to
> inline code from the spark-xml package.
> I am collaborating with Hyukjin Kwon, who is the original author of
> spark-xml, for this effort.
>
> SPIP link:
>
> https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing
>
> JIRA:
> https://issues.apache.org/jira/browse/SPARK-44265
>
> Looking forward to your feedback.
> Thanks, Sandip
>


Re: Spark Scala SBT Local build fails

2023-07-18 Thread Varun Shah
++ DEV community


On Mon, Jul 17, 2023 at 4:14 PM Varun Shah 
wrote:

> Resending this message with a proper Subject line
>
> Hi Spark Community,
>
> I am trying to set up my forked apache/spark project locally for my first
> open-source contribution, by building and creating a package as described
> in the Spark developer docs under "Running Individual Tests".
> Here are the steps I have followed:
> >> ./build/sbt  # this opens an sbt console
> >> test  # to execute all tests
>
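As a side note for anyone following along: running the bare `test` task builds and tests every module, which is slow and surfaces unrelated style failures. A sketch of a narrower invocation, based on the Spark developer docs (the module and suite names below are illustrative, not from this thread):

```shell
# From the Spark repo root; note the leading "./" on build/sbt.
# Scope the run to one module and one suite instead of the whole test task
# (module "core" and suite "*DAGSchedulerSuite" are example names).
./build/sbt "core/testOnly *DAGSchedulerSuite"

# Run only the scalastyle checks, which are what is failing below
# (assumes the scalastyle sbt task provided by Spark's build).
./build/sbt scalastyle
```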
> I am getting the following errors and the tests are failing. Even the sbt
> compile / package commands fail with the same errors.
>
>>
>> [info] compiling 19 Java sources to
>> forked/spark/common/network-shuffle/target/scala-2.12/test-classes ...
>> [info] compiling 330 Scala sources and 29 Java sources to
>> forked/spark/core/target/scala-2.12/test-classes ...
>> [error]
>> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:21:0:
>> There should at least one a single empty line separating groups 3rdParty
>> and spark.
>> [error]
>> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:32:0:
>> org.json4s.JsonAST.JValue should be in group 3rdParty, not spark.
>> [error]
>> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:33:0:
>> org.json4s.JsonDSL._ should be in group 3rdParty, not spark.
>> [error]
>> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:34:0:
>> org.json4s._ should be in group 3rdParty, not spark.
>> [error]
>> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:35:0:
>> org.json4s.jackson.JsonMethods._ should be in group 3rdParty, not spark.
>> [error]
>> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:37:0:
>> java.util.Locale should be in group java, not spark.
>> [error]
>> forked/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala:38:0:
>> scala.util.control.NonFatal should be in group scala, not spark.
>> [error]
>> forked/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala:226:
>> File line length exceeds 100 characters
>> [error] stack trace is suppressed; run last catalyst /
>> scalaStyleOnCompile for the full output
>> [error] stack trace is suppressed; run last scalaStyleOnTest for the full
>> output
>> [error] (catalyst / scalaStyleOnCompile) Failing because of negative
>> scalastyle result
>> [error] (scalaStyleOnTest) Failing because of negative scalastyle result
>>
>
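For context, the scalastyle failures above are all about import ordering: Spark's scalastyle configuration expects imports grouped as java, scala, third-party, then org.apache.spark, with a blank line between groups. A sketch of the expected layout at the top of DataType.scala, using the imports named in the errors (the spark-group import shown is illustrative, not taken from the file):

```scala
// Expected import layout per Spark's scalastyle-config.xml:
// groups ordered java, scala, 3rdParty, spark, separated by blank lines.

import java.util.Locale                    // "java" group

import scala.util.control.NonFatal         // "scala" group

import org.json4s._                        // "3rdParty" group
import org.json4s.JsonAST.JValue
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._

import org.apache.spark.annotation.Stable  // "spark" group (example import)
```

In practice this error usually means an import was added to the wrong block, or the blank line separating two blocks was dropped; if the checked-out branch already fails style checks, rebasing on the latest master before building often resolves it.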
> Could you please guide me if I am doing something wrong?
>
> Regards,
> Varun Shah
>