Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Michael Shtelma
Which alternatives to ThriftServer do we really have? If ThriftServer is not there anymore, there is no other way to connect to Spark SQL over JDBC, and that is the primary way of connecting BI tools to Spark SQL. Am I missing something? The question is whether Spark would like to be the tool used
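For reference, a minimal sketch of how a client typically reaches the Thrift server, assuming the third-party PyHive package is available; host, port, and table name are placeholders, not part of the original message:

```python
# Illustrative only: connect to a running Spark Thrift server over the Hive protocol.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)  # 10000 is the default Thrift server port
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM some_table")  # placeholder table
print(cursor.fetchall())
```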

Re: Helper methods for PySpark discussion

2018-10-26 Thread Holden Karau
OK, so let's say you made a Spark DataFrame and you call length -- what do you expect to happen? Personally, I expect Spark to evaluate the DataFrame; this is what happens with collections and even iterables. The interplay with cache is a bit strange, but presumably if you've marked your DataFrame
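For concreteness, a minimal sketch of the kind of helper under discussion, assuming len(df) is implemented by simply delegating to count(); the function name and the monkey-patch are illustrative, not the actual proposal:

```python
from pyspark.sql import DataFrame

def _dataframe_len(self):
    # Delegating to count() means len(df) forces a full evaluation of the plan.
    return self.count()

# Hypothetical patch for illustration; the real change would live inside PySpark itself.
DataFrame.__len__ = _dataframe_len
```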

Re: Helper methods for PySpark discussion

2018-10-26 Thread Li Jin
> (2) If the method forces evaluation and this matches the most obvious way it would be implemented, then we should add it with a note in the docstring. I am not sure about this, because forcing evaluation could be something that has side effects. For example, df.count() can realize a cache, and if we implement
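A small example of the side effect in question, assuming an existing SparkSession named spark:

```python
df = spark.range(10).cache()  # cache() only marks the DataFrame; nothing is computed yet
n = df.count()                # forces evaluation and materializes the cache as a side effect
```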

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Reynold Xin
People do use it, and the maintenance cost is pretty low, so I don't think we should just drop it. We can be explicit that there is not a lot of development going on and that we are unlikely to add many new features to it, and users are also welcome to use other JDBC/ODBC endpoint implementations

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Sean Owen
Maybe that's what I really mean (you can tell I don't follow the Hive part closely). In my travels, the Thrift server has indeed been viewed as an older solution to a problem probably better met by others. From my perspective it's worth dropping, but that's just anecdotal. Any other arguments for

Re: DataSourceV2 hangouts sync

2018-10-26 Thread Ryan Blue
Looks like the majority opinion is for Wednesday. I've sent out an invite to everyone who replied and will add more people as I hear more responses. Thanks, everyone! On Fri, Oct 26, 2018 at 3:23 AM Gengliang Wang wrote: > +1 > > On Oct 26, 2018, at 8:45 AM, Hyukjin Kwon wrote: > > I didn't

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Marco Gaido
Hi all, one big problem with getting rid of the Hive fork is the thriftserver, which relies on the HiveServer from the Hive fork. We might migrate to an apache/hive dependency, but I'm not sure this would help that much. I think a broader topic would be the actual opportunity of having a

Re: Helper methods for PySpark discussion

2018-10-26 Thread Leif Walsh
That all sounds reasonable, but I think in the case of 4, and maybe also 3, I would rather see it implemented to raise an error that explains what's going on and suggests the explicit operation that would do the most equivalent thing. And perhaps raise a warning (using the warnings module)
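A rough sketch of that alternative, reusing the hypothetical __len__ helper from the earlier example; the warning text is illustrative only:

```python
import warnings

def _dataframe_len(self):
    # Warn (or alternatively raise) so the cost of forcing evaluation is not silent,
    # and point the user at the explicit equivalent operation.
    warnings.warn(
        "len(df) forces a full evaluation of the DataFrame; "
        "call df.count() directly if that is what you intend.",
        UserWarning,
    )
    return self.count()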

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Sean Owen
OK, let's keep this about Hive. Right, good point: this is really about supporting metastore versions, and there is a good argument for retaining backwards compatibility with older metastores. I don't know how far back, but I guess as far as is practical? Isn't there still a lot of Hive 0.x test

Re: Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Dongjoon Hyun
Hi, Sean and all. For the first question, we support only Hive Metastore 1.x ~ 2.x, and we can support Hive Metastore 3.0 simultaneously; Spark is designed that way. I don't think we need to drop old Hive Metastore support. Is it for avoiding Hive Metastore sharing between Spark 2 and
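For reference, the metastore version Spark talks to is configurable per session; a minimal sketch, where the version string and jar source are examples only:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .enableHiveSupport()
    # Point Spark at a specific Hive metastore version; jars can come from Maven
    # or a local path (the values here are examples, not a recommendation).
    .config("spark.sql.hive.metastore.version", "2.3.3")
    .config("spark.sql.hive.metastore.jars", "maven")
    .getOrCreate()
)
```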

Helper methods for PySpark discussion

2018-10-26 Thread Holden Karau
Coming out of https://github.com/apache/spark/pull/21654 it was agreed that the helper methods in question made sense, but there was some desire for a plan as to which helper methods we should use. I'd like to propose a lightweight solution to start with for helper methods that match either Pandas or

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-26 Thread Sean Owen
This is all merged to master/2.4. AFAIK there aren't any items I'm monitoring that are needed for 2.4. On Thu, Oct 25, 2018 at 6:54 PM Sean Owen wrote: > Yep, we're going to merge a change to separate the k8s tests into a > separate profile, and fix up the Scala 2.12 thing. While non-critical

Drop support for old Hive in Spark 3.0?

2018-10-26 Thread Sean Owen
Here's another thread to start considering, and I know it's been raised before. What version(s) of Hive should Spark 3 support? If we at least know it won't include Hive 0.x, could we go ahead and remove those tests from master? It might significantly reduce the run time and flakiness. It seems

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-26 Thread Stavros Kontopoulos
Sean, yes, I updated the PR and re-ran it. On Fri, Oct 26, 2018 at 2:54 AM, Sean Owen wrote: > Yep, we're going to merge a change to separate the k8s tests into a > separate profile, and fix up the Scala 2.12 thing. While non-critical those > are pretty nice to have for 2.4. I think that's

Re: DataSourceV2 hangouts sync

2018-10-26 Thread Gengliang Wang
+1 > On Oct 26, 2018, at 8:45 AM, Hyukjin Kwon wrote: > > I didn't know I live in the same timezone as you, Wenchen :D. > Monday or Wednesday at 5PM PDT sounds good to me too FWIW. > > On Fri, Oct 26, 2018 at 8:29 AM, Ryan Blue wrote: > Good point. How about Monday or Wednesday at 5PM PDT then? >