Re: MLlib mission and goals

2017-01-23 Thread Stephen Boesch
Along the lines of #1: the Spark packages seemed to have had a good start about two years ago, but now no more than a handful are in general use, e.g. Databricks CSV. When the available packages are browsed, the majority are incomplete, empty, unmaintained, or unclear. Any ideas on how to

MLlib mission and goals

2017-01-23 Thread Joseph Bradley
This thread is split off from the "Feedback on MLlib roadmap process proposal" thread for discussing the high-level mission and goals for MLlib. I hope this thread will collect feedback and ideas, not necessarily lead to huge decisions. Copying from the previous thread: *Seth:* """ I would love

Re: Feedback on MLlib roadmap process proposal

2017-01-23 Thread Joseph Bradley
Hi Seth, The proposal is geared towards exactly the issue you're describing: providing more visibility into the capacity and intentions of committers. If there are things you'd add to it or change to improve further, it would be great to hear ideas! The past roadmap JIRA has some more background

Re: [VOTE] Release Apache Parquet 1.8.2 RC1

2017-01-23 Thread Julien Le Dem
Thank you Cheng! On Mon, Jan 23, 2017 at 12:02 PM, Cheng Lian wrote: > Sorry for being late, I'm building a Spark branch based on the most recent > master to test out 1.8.2-rc1, will post my result here ASAP. > > Cheng > > On 1/23/17 11:43 AM, Julien Le Dem wrote: > > Hi

Re: [VOTE] Release Apache Parquet 1.8.2 RC1

2017-01-23 Thread Cheng Lian
Sorry for being late, I'm building a Spark branch based on the most recent master to test out 1.8.2-rc1, will post my result here ASAP. Cheng On 1/23/17 11:43 AM, Julien Le Dem wrote: Hi Spark dev, Here is the voting thread for parquet 1.8.2 release. Cheng or someone else we would appreciate

Re: [VOTE] Release Apache Parquet 1.8.2 RC1

2017-01-23 Thread Julien Le Dem
Hi Spark dev, Here is the voting thread for the Parquet 1.8.2 release. Cheng or someone else, we would appreciate it if you could verify it as well and reply to the thread. On Mon, Jan 23, 2017 at 11:40 AM, Julien Le Dem wrote: > +1 > Followed:

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-01-23 Thread Michael Allman
Hi Stan, What OS/version are you using? Michael > On Jan 22, 2017, at 11:36 PM, StanZhai wrote: > > I'm using Parallel GC. > rxin wrote >> Are you using G1 GC? G1 sometimes uses a lot more memory than the size >> allocated. >> >> >> On Sun, Jan 22, 2017 at 12:58 AM

Re: A question about creating persistent table when in-memory catalog is used

2017-01-23 Thread Xiao Li
Reynold mentioned the direction we are heading. You can see that many of the PRs the community has submitted are aimed at this target. To achieve it, we still have a lot of work to do. For example, for some serdes, the Hive metastore will infer the schema when the schema is not provided, but our InMemoryCatalog does not have