Re: DataSourceWriter V2 Api questions

2019-12-05 Thread Wenchen Fan
I also share the concerns of "writing twice", which hurts performance a lot. What's worse, the final write may not be scalable, like writing the staging table to the final table. If the sink itself doesn't support global transaction, but only local transaction (e.g. Kafka), using staging tables
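The "writing twice" cost can be illustrated with a minimal sketch (plain Python, not Spark's actual DataSourceWriter code; all names are hypothetical): the commit phase of a staging-table approach has to move every byte again unless the sink supports a metadata-only rename.

```python
import os
import shutil
import tempfile

def write_with_staging(records, final_dir):
    """Illustrative two-phase write: stage everything, then copy into the
    final location. The copy step rewrites every byte, which is the
    'writing twice' cost discussed for sinks without global transactions."""
    staging_dir = tempfile.mkdtemp(prefix="staging-")
    try:
        # Phase 1: write all records to the staging area.
        with open(os.path.join(staging_dir, "part-0"), "w") as f:
            for r in records:
                f.write(r + "\n")
        # Phase 2 (commit): publish staged data to the final table directory.
        # A plain copy doubles the I/O; an atomic rename would avoid it, but
        # not every sink supports one.
        os.makedirs(final_dir, exist_ok=True)
        shutil.copy(os.path.join(staging_dir, "part-0"),
                    os.path.join(final_dir, "part-0"))
    finally:
        # Abort/cleanup: the staging area is always removed.
        shutil.rmtree(staging_dir)
```
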

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Ping Liu
Thanks Deepak! I'll try it. On Thu, Dec 5, 2019 at 4:13 PM Deepak Vohra wrote: > The Guava issue could be fixed in one of two ways: > > - Use Hadoop v3 > - Create an Uber jar, refer > > https://gite.lirmm.fr/yagoubi/spark/commit/c9f743957fa963bc1dbed7a44a346ffce1a45cf2 > Managing Java
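One way to build such an uber jar — a sketch under the usual Maven conventions, not the configuration from the linked commit — is to relocate Guava with the Maven Shade plugin so the application's copy cannot clash with Hadoop's older one (the `shadedPattern` prefix is a placeholder):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava's packages inside the uber jar. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```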

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Ping Liu
Hi Deepak, For Spark, I am using master branch and just have code updated yesterday. For Guava, I actually deleted my old versions from the local Maven repo. The build process of Spark automatically downloaded a few versions. The oldest version is 14.0.1. But even in 14.0.1 (

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Ping Liu
Hi Sean, Oh, sorry. I just came back to Spark home. However, the same error came out. D:\apache\spark\bin>cd .. D:\apache\spark>bin\spark-shell Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
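A `NoSuchMethodError` on that `Preconditions.checkArgument` signature typically means an older Guava (one that predates that overload, as Hadoop 2.x ships) is winning on the classpath. A quick way to see which versions Maven resolves — commands shown as a sketch, run from the Spark source root:

```shell
# List every resolved Guava artifact and which module pulled it in.
mvn dependency:tree -Dincludes=com.google.guava:guava

# At runtime, spark-shell can report which jar the class was loaded from:
#   classOf[com.google.common.base.Preconditions]
#     .getProtectionDomain.getCodeSource.getLocation
```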

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Ping Liu
Hi Deepak, Yes, I did use Maven. I even had the build pass successfully when setting the Hadoop version to 3.2. Please see my response to Sean's email. Unfortunately, I only have Docker Toolbox as my Windows doesn't have Microsoft Hyper-V. So I want to avoid using Docker to do major work if

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Sean Owen
No, the build works fine, at least certainly on test machines. As I say, try running from the actual Spark home, not bin/. You are still running spark-shell there. On Thu, Dec 5, 2019 at 4:37 PM Ping Liu wrote: > > Hi Sean, > > Thanks for your response! > > Sorry, I didn't mention that

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Ping Liu
Hi Sean, Thanks for your response! Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to Spark home directory and ran mvn from there. Following is my build and running result. The source code was just updated yesterday. I guess the POM should specify newer Guava library

Re: Enabling fully disaggregated shuffle on Spark

2019-12-05 Thread Imran Rashid
> Anyway, there were a *lot* of people on the call today and we didn't get a chance to dig into the nitty-gritty details of these points. I would like to know what others think of these (not-fleshed-out) proposals, how they do (or do not) work with disaggregated shuffle implementations in the

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Sean Owen
What was the build error? You didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But

Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Ping Liu
Hello, I understand Spark is preferably built on Linux. But I have a Windows machine with a slow Virtual Box for Linux. So I wish I were able to build and run Spark code in a Windows environment. Unfortunately, # Apache Hadoop 2.6.X ./build/mvn -Pyarn -DskipTests clean package # Apache Hadoop
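Putting the thread's suggestions together — a sketch, assuming Maven and a JDK are on PATH in a Windows shell — the build-and-run sequence that sidesteps the older Hadoop/Guava combination looks like:

```shell
# Build with the Hadoop 3.2 profile, which pulls in a newer Guava.
# Run from the Spark source root.
mvn -Pyarn -Phadoop-3.2 -DskipTests clean package

# Then launch the shell from the Spark home directory, not from bin\:
bin\spark-shell
```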

Closing stale PRs with a GitHub Action

2019-12-05 Thread Nicholas Chammas
It’s that topic again.  We have almost 500 open PRs. A good chunk of them are more than a year old. The oldest open PR dates to summer 2015. https://github.com/apache/spark/pulls?q=is%3Apr+is%3Aopen+sort%3Acreated-asc GitHub has an Action for closing stale PRs.
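A workflow along these lines — an illustrative sketch using the stock `actions/stale` action; the thresholds and message are placeholders, not a proposed policy — would be:

```yaml
name: Close stale PRs
on:
  schedule:
    - cron: "0 0 * * *"   # run once a day
jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v1
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          days-before-stale: 365
          days-before-close: 30
          stale-pr-message: >
            This PR has had no activity for a year and will be closed
            soon unless it is updated.
```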

Re: [DISCUSS] Consistent relation resolution behavior in SparkSQL

2019-12-05 Thread Ryan Blue
+1 for the proposal. The current behavior is confusing. We also came up with another case that we should consider while implementing a ViewCatalog: an unresolved relation in a permanent view (from a view catalog) should never resolve a temporary table. If I have a view `pview` defined as `select
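The shadowing hazard can be sketched in SQL (hypothetical names; `pview`'s real definition is truncated above):

```sql
-- A permanent view in a catalog captures a reference to table t:
CREATE TABLE db.t AS SELECT 1 AS c;
CREATE VIEW db.pview AS SELECT * FROM t;

-- Later, a session defines a temporary view with the same name:
CREATE TEMPORARY VIEW t AS SELECT 2 AS c;

-- Under the proposal, this must still read db.t, never the
-- session-local temporary t:
SELECT * FROM db.pview;
```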

[PROPOSAL] Support ANSI type real/numeric as synonyms for float/decimal

2019-12-05 Thread Dr. Kent Yao
Hi all, For better SQL standard support, I recently opened a pull request https://github.com/apache/spark/pull/26537 to support real type as float and numeric as decimal. We have researched a bit and discussed it among several contributors/committers. Sending the email to the dev list to
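As a sketch of what the proposal enables (hypothetical table and column names), the ANSI names would simply be accepted as synonyms for the existing Spark types:

```sql
CREATE TABLE prices (
  weight REAL,           -- treated as FLOAT
  amount NUMERIC(10, 2)  -- treated as DECIMAL(10, 2)
);
```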