Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-07-26 Thread Ryan Blue
I don’t think that we want to block this work until we have a public and stable Expression. Like our decision to expose InternalRow, I think that while this option isn’t great, it at least allows us to move forward. We can hopefully replace it later. Also note that the use of Expression is in the

Re: [DISCUSS][SQL] Control the number of output files

2018-07-26 Thread John Zhuge
Filed https://issues.apache.org/jira/browse/SPARK-24940. Will upload a patch shortly. SPARK-20857 introduced a generic SQL Hint Framework in 2.2.0. On Thu, Jul 26, 2018 at 4:25 PM Reynold Xin wrote: > John, > > You want to create a ticket and submit a patch for this? If there is a >

Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-07-26 Thread Reynold Xin
Seems reasonable at a high level. I don't think we can use Expressions and SortOrders in public APIs though. Those are not meant to be public and can break easily across versions. On Tue, Jul 24, 2018 at 9:26 AM Ryan Blue wrote: > The recently adopted SPIP to standardize logical plans requires

Re: [DISCUSS][SQL] Control the number of output files

2018-07-26 Thread Reynold Xin
John, You want to create a ticket and submit a patch for this? If there is a coalesce hint, inject a coalesce logical node. Pretty simple. On Wed, Jul 25, 2018 at 2:48 PM John Zhuge wrote: > Thanks for the comment, Forest. What I am asking is to make whatever DF > repartition/coalesce

offheap memory usage & netty configuration

2018-07-26 Thread Imran Rashid
I’ve been looking at where untracked memory is getting used in Spark, especially offheap memory, and I’ve discovered some things I’d like to share with the community. Most of what I’ve learned has been about the way Spark is using netty -- I’ll go into some more detail about that below. I’m also

Qs on Dataset API -- groups of createXXXTempViews and XXXcheckpoint methods

2018-07-26 Thread Jacek Laskowski
Hi, I'd appreciate your help on the following two questions about Dataset API: 1. Why do Dataset methods: createTempView, createOrReplaceTempView, createGlobalTempView and createOrReplaceGlobalTempView not return a DataFrame? They seem to be neither actions nor transformations (and probably the

Re: Asking for reviewing PRs regarding structured streaming

2018-07-26 Thread Jungtaek Lim
I'd like to bump this again, since only one of the 6 pull requests has been merged (5 remaining), and the others have not received substantive (non-code-style) reviews from committers. https://github.com/apache/spark/pulls/HeartSaVioR All of the pull requests are related to Structured Streaming, and most of them are already reviewed by