Re: Questions regarding memory usage

2014-09-12 Thread Sean Owen
On Thu, Sep 11, 2014 at 10:17 PM, Tom thubregt...@gmail.com wrote: If I set SPARK_DRIVER_MEMORY to x GB, Spark reports /14/09/11 15:36:41 INFO MemoryStore: MemoryStore started with capacity ~0.55*x GB/ *Question:* Does this relate to spark.storage.memoryFraction (default 0.6), and is the
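For readers following along, the ratio in that log line can be checked with simple arithmetic. This is only a sketch: it assumes the MemoryStore capacity is the JVM heap multiplied by spark.storage.memoryFraction (default 0.6, as in the question) and by a further safety fraction. The 0.9 safety value and the function name are assumptions of this example, not something stated in the thread.

```python
def memory_store_capacity_gb(max_heap_gb, memory_fraction=0.6, safety_fraction=0.9):
    """Estimate storage capacity as heap * memoryFraction * safetyFraction.

    safety_fraction=0.9 is an assumed default; check your Spark version.
    """
    return max_heap_gb * memory_fraction * safety_fraction


# With a 10 GB driver heap the estimate is about 0.54 * x,
# in the same ballpark as the "~0.55*x" reported in the log.
estimate = memory_store_capacity_gb(10)
```

The heap size the JVM actually reports can differ somewhat from the configured value, so an exact match with the log should not be expected.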

Re: PSA: SI-8835 (Iterator 'drop' method has a complexity bug causing quadratic behavior)

2014-09-12 Thread Reynold Xin
Thanks for the email, Erik. The Scala collection library implementation is a complicated beast ... On Sat, Sep 6, 2014 at 8:27 AM, Erik Erlandson e...@redhat.com wrote: I tripped over this recently while preparing a solution for SPARK-3250 (efficient sampling): Iterator 'drop' method has a
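One way this kind of quadratic behavior can arise is when every drop call wraps the iterator in another layer, so that each later next() has to pass through all earlier layers. The toy below is a Python illustration of that pattern only; DropWrapper and both sampling functions are hypothetical names, and this is not the Scala library's actual code.

```python
class DropWrapper:
    """Iterator that wraps another iterator and lazily skips its first n items."""

    def __init__(self, inner, n, counter):
        self.inner = inner
        self.to_skip = n
        self.counter = counter  # one-element list used as a shared hop counter

    def __iter__(self):
        return self

    def __next__(self):
        # Skip pending items, then return the next one. Every call into
        # the wrapped iterator counts as one hop.
        while self.to_skip > 0:
            self.to_skip -= 1
            self.counter[0] += 1
            next(self.inner)
        self.counter[0] += 1
        return next(self.inner)


def sample_with_wrappers(data, gap, rounds):
    """Gap sampling that stacks one more wrapper per round."""
    counter = [0]
    it = iter(data)
    out = []
    for _round in range(rounds):
        it = DropWrapper(it, gap, counter)  # nesting depth grows by one
        out.append(next(it))
    return out, counter[0]


def sample_in_place(data, gap, rounds):
    """Same sampling, advancing a single iterator directly."""
    hops = 0
    it = iter(data)
    out = []
    for _round in range(rounds):
        for _skip in range(gap):
            next(it)
            hops += 1
        out.append(next(it))
        hops += 1
    return out, hops
```

Both functions return the same samples, but the hop count of the wrapped version grows with the square of the number of rounds, while the in-place version grows linearly.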

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Dibyendu Bhattacharya
Dear all, I am sorry. This was a false alarm. There was an issue in the RDD processing logic which led to a large backlog. Once I fixed the issues in my processing logic, I could see all messages being pulled nicely without any Block Removed error. I need to tune certain configurations in my

Adding abstraction in MLlib

2014-09-12 Thread Egor Pahomov
Here at Yandex, while implementing gradient boosting in Spark and building our ML tool for internal use, we found the following serious problems in MLlib: - There is no Regression/Classification model abstraction. We were building abstract data processing pipelines, which should work just

Re: Reporting serialized task size after task broadcast change?

2014-09-12 Thread Guru Medasani
I thought we could see this on the Spark Web UI storage tab. Maybe I was looking at something else too. On Sep 11, 2014, at 8:47 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hmm, well I can't find it now, must have been hallucinating. Do you know off the top of your head where I'd be able

Re: Adding abstraction in MLlib

2014-09-12 Thread Egor Pahomov
Some architectural suggestions on this matter - https://github.com/apache/spark/pull/2371 2014-09-12 16:38 GMT+04:00 Egor Pahomov pahomov.e...@gmail.com: Sorry, I miswrote - I meant the learners part of the framework - models already exist. 2014-09-12 15:53 GMT+04:00 Christoph Sawade

Re: Junit spark tests

2014-09-12 Thread Rajiv Abraham
Hi Sudershan, That's interesting. I don't have an answer to your question, but considering the functional nature of Spark, I have hardly had to use mock objects (maybe you could tell us about your use case). Mock object 'expectations' are in 'most' cases an implementation of the 'Tell, Don't Ask' principle

Re: Use Case of mutable RDD - any ideas around will help.

2014-09-12 Thread Patrick Wendell
[moving to user@] This would typically be accomplished with a union() operation. You can't mutate an RDD in-place, but you can create a new RDD with a union() which is an inexpensive operator. On Fri, Sep 12, 2014 at 5:28 AM, Archit Thakur archit279tha...@gmail.com wrote: Hi, We have a use
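The idea of building a new dataset instead of mutating the old one can be shown with a small toy model. This is a plain-Python sketch, not Spark code: ToyDataset and its methods are hypothetical stand-ins that only mirror the shape of an immutable collection with a union() operation.

```python
class ToyDataset:
    """Immutable dataset made of partitions; a toy stand-in, not Spark's RDD."""

    def __init__(self, partitions):
        # Store partitions as tuples so they cannot be changed afterwards.
        self._partitions = tuple(tuple(p) for p in partitions)

    def union(self, other):
        # Build a new dataset listing the partitions of both inputs.
        # Neither input is modified.
        return ToyDataset(self._partitions + other._partitions)

    def collect(self):
        # Flatten all partitions into one list, in partition order.
        return [x for part in self._partitions for x in part]


base = ToyDataset([[1, 2], [3]])
new_rows = ToyDataset([[4]])
merged = base.union(new_rows)  # base is unchanged; merged sees all rows
```

Appending new records then amounts to rebinding a name to the merged dataset while the earlier one stays valid for anyone still holding it.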

Re: parquet predicate / projection pushdown into unionAll

2014-09-12 Thread Cody Koeninger
Cool, thanks for your help on this. Any chance of adding it to the 1.1.1 point release, assuming there ends up being one? On Wed, Sep 10, 2014 at 11:39 AM, Michael Armbrust mich...@databricks.com wrote: Hey Cody, Thanks for doing this! Will look at your PR later today. Michael On Wed,

Re: Adding abstraction in MLlib

2014-09-12 Thread Reynold Xin
Xiangrui can comment more, but I believe he and Joseph are actually working on a standardized interface and pipeline feature for the 1.2 release. On Fri, Sep 12, 2014 at 8:20 AM, Egor Pahomov pahomov.e...@gmail.com wrote: Some architectural suggestions on this matter -

Re: Spark authenticate enablement

2014-09-12 Thread Sandy Ryza
Hi Jun, I believe that's correct: Spark authentication only works against YARN. -Sandy On Thu, Sep 11, 2014 at 2:14 AM, Jun Feng Liu liuj...@cn.ibm.com wrote: Hi there, I am trying to enable authentication for Spark in standalone mode. Seems like only SparkSubmit loads the

Re: Adding abstraction in MLlib

2014-09-12 Thread Xiangrui Meng
Hi Egor, Thanks for the feedback! We are aware of some of the issues you mentioned and there are JIRAs created for them. Specifically, I'm pushing out the design on pipeline features and algorithm/model parameters this week. We can move our discussion to

Re: parquet predicate / projection pushdown into unionAll

2014-09-12 Thread Michael Armbrust
Yeah, thanks for implementing it! Since Spark SQL is an alpha component and moving quickly the plan is to backport all of master into the next point release in the 1.1 series. On Fri, Sep 12, 2014 at 9:27 AM, Cody Koeninger c...@koeninger.org wrote: Cool, thanks for your help on this. Any

Re: Adding abstraction in MLlib

2014-09-12 Thread Erik Erlandson
Are interface designs being captured anywhere as documents that the community can follow along with as the proposals evolve? I've worked on other open source projects where design docs were published as living documents (e.g. on google docs, or etherpad, but the particular mechanism isn't

don't trigger tests when only .md files are changed

2014-09-12 Thread Nicholas Chammas
Would it make sense to have Jenkins *not* trigger tests when the only files that have changed are .md files (example https://github.com/apache/spark/pull/2367)? Those don’t even need RAT checks, right? I can make this change if it makes sense. Nick

Re: don't trigger tests when only .md files are changed

2014-09-12 Thread Nicholas Chammas
We could still have Jenkins post a message to the effect of “this patch only modifies .md files; no tests will be run”. On Fri, Sep 12, 2014 at 3:48 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Would it make sense to have Jenkins *not* trigger tests when the only files that have
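The check being proposed could look roughly like the sketch below. It assumes the build script already has the list of changed file paths for the pull request; the function names and the "running tests" string are hypothetical, and nothing here reflects the project's real Jenkins scripts.

```python
def only_markdown_changed(changed_files):
    """True when at least one file changed and every changed file ends in .md."""
    return bool(changed_files) and all(
        path.lower().endswith(".md") for path in changed_files
    )


def jenkins_decision(changed_files):
    """Return the message to post for this set of changed files."""
    if only_markdown_changed(changed_files):
        return "this patch only modifies .md files; no tests will be run"
    return "running tests"
```

An empty change list is treated as "run tests" here, on the assumption that skipping should only happen when the script is sure about what changed.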

Re: Adding abstraction in MLlib

2014-09-12 Thread Patrick Wendell
We typically post design docs on JIRAs before major work starts. For instance, pretty sure SPARK-1856 will have a design doc posted shortly. On Fri, Sep 12, 2014 at 12:10 PM, Erik Erlandson e...@redhat.com wrote: Are interface designs being captured anywhere as documents that the community

Re: don't trigger tests when only .md files are changed

2014-09-12 Thread Reynold Xin
I like that idea, but the load on Jenkins isn't very high. The more complexity we add to the test script, the easier it is to screw it up (at some point we would need to add unit tests for the build scripts). Maybe we can just add the message part, so it becomes clear that a pull request does not

Response to archived question 'Spark and Scala Worksheet'

2014-09-12 Thread Rajiv Abraham
Hi, This is a response to an archived email about how to run Spark in a Scala worksheet in the Scala IDE. http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3ccaauywg8a+mjqwhtgytz0lumntlgfwa-noxtopeyadeq+gws...@mail.gmail.com%3E I know it's a bit late :) but here is how I do it.

NullWritable not serializable

2014-09-12 Thread Du Li
Hi, I was trying the following on spark-shell (built with apache master and hadoop 2.4.0). Both calling rdd2.collect and calling rdd3.collect threw java.io.NotSerializableException: org.apache.hadoop.io.NullWritable. I got the same problem in similar code of my app which uses the newly

Re: NullWritable not serializable

2014-09-12 Thread Matei Zaharia
Hi Du, I don't think NullWritable has ever been serializable, so you must be doing something differently from your previous program. In this case though, just use a map() to turn your Writables to serializable types (e.g. null and String). Matei On September 12, 2014 at 8:48:36 PM, Du Li
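The map() suggestion can be illustrated with a plain-Python analogy, using pickle in place of Java serialization. NullWritableLike and TextLike are hypothetical stand-ins for Hadoop Writable types, and to_serializable is an illustrative helper; none of this is Spark or Hadoop API.

```python
import pickle


class NullWritableLike:
    """Hypothetical stand-in for a key type that refuses to be serialized."""

    def __reduce__(self):
        raise TypeError("NullWritableLike is not serializable")


class TextLike:
    """Hypothetical stand-in for a text wrapper that is also not serializable."""

    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        raise TypeError("TextLike is not serializable")


def to_serializable(record):
    # Replace the wrapper objects with plain values (None and str).
    _key, value = record
    return (None, value.value)


records = [(NullWritableLike(), TextLike("a")), (NullWritableLike(), TextLike("b"))]

# Convert first, then serialize; serializing `records` directly would fail.
plain = list(map(to_serializable, records))
round_tripped = pickle.loads(pickle.dumps(plain))
```

The point is only the ordering: convert wrapper objects to plain values before any step that needs to serialize the records.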