Re: Mail to u...@spark.apache.org failing

2015-02-09 Thread Patrick Wendell
Ah - we should update it to suggest mailing the dev@ list (and if
there is enough traffic maybe do something else).

I'm happy to add you if you can give an organization name, URL, a list
of which Spark components you are using, and a short description of
your use case.

On Mon, Feb 9, 2015 at 9:00 PM, Meethu Mathew meethu.mat...@flytxt.com wrote:
 Hi,

 The mail id given in
 https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark seems to
 be failing. Can anyone tell me how to get added to Powered By Spark list?

 --

 Regards,

 *Meethu*

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Patrick Wendell
Hi Judy,

If you have added source files in the sink/ source folder, they should
appear in the assembly jar when you build. One thing I noticed is that you
are looking inside the /dist folder. That only gets populated if you run
make-distribution. The normal development process is just to do mvn
package and then look at the assembly jar that is contained in core/target.
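For readers following along, below is a minimal sketch of what such a sink can look like. It is not the patch attached to SPARK-5708; the Sink trait (start/stop/report) and the reflective (Properties, MetricRegistry, SecurityManager) constructor are assumptions about Spark's internal metrics plumbing, and the Slf4jReporter calls are standard Codahale metrics.

~~~
package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{MetricRegistry, Slf4jReporter}

import org.apache.spark.SecurityManager

// Sketch only: intended to mirror the shape of the existing sinks (e.g. ConsoleSink).
private[spark] class Slf4jSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager)
  extends Sink {

  // Poll period/unit come from the metrics.properties entry for this sink.
  val pollPeriod = Option(property.getProperty("period")).map(_.toInt).getOrElse(10)
  val pollUnit = Option(property.getProperty("unit"))
    .map(u => TimeUnit.valueOf(u.toUpperCase)).getOrElse(TimeUnit.SECONDS)

  val reporter: Slf4jReporter = Slf4jReporter.forRegistry(registry)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .convertRatesTo(TimeUnit.SECONDS)
    .build()

  override def start(): Unit = reporter.start(pollPeriod, pollUnit)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}
~~~

A file like this placed under core/src/main/scala/org/apache/spark/metrics/sink/ gets compiled and packaged by mvn package with no extra build config.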

- Patrick

On Mon, Feb 9, 2015 at 10:02 PM, Judy Nash judyn...@exchange.microsoft.com
wrote:

  Hello,



 Working on SPARK-5708 https://issues.apache.org/jira/browse/SPARK-5708
 - Add Slf4jSink to Spark Metrics Sink.



 Wrote a new Slf4jSink class (see patch attached), but the new class is not
 packaged as part of spark-assembly jar.



 Do I need to update build config somewhere to have this packaged?



 Current packaged class:



 Thought I must have missed something basic but can't figure out why.



 Thanks!

 Judy


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Judy Nash
Hello,

Working on SPARK-5708 (https://issues.apache.org/jira/browse/SPARK-5708) - Add 
Slf4jSink to Spark Metrics Sink.

Wrote a new Slf4jSink class (see patch attached), but the new class is not 
packaged as part of spark-assembly jar.

Do I need to update build config somewhere to have this packaged?

Current packaged class:
[inline screenshot of the packaged classes; image not preserved in the archive]

Thought I must have missed something basic but can't figure out why.

Thanks!
Judy

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

RE: New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Judy Nash
Thanks Patrick! That was the issue.
Built the jars on a Windows env with mvn and forgot to run make-distributions.ps1 
afterward, so I was looking at old jars.

From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Monday, February 9, 2015 10:43 PM
To: Judy Nash
Cc: dev@spark.apache.org
Subject: Re: New Metrics Sink class not packaged in spark-assembly jar

Actually, to correct myself, the assembly jar is in assembly/target/scala-2.11 
(I think).

On Mon, Feb 9, 2015 at 10:42 PM, Patrick Wendell pwend...@gmail.com wrote:
Hi Judy,

If you have added source files in the sink/ source folder, they should appear 
in the assembly jar when you build. One thing I noticed is that you are looking 
inside the /dist folder. That only gets populated if you run 
make-distribution. The normal development process is just to do mvn package 
and then look at the assembly jar that is contained in core/target.

- Patrick

On Mon, Feb 9, 2015 at 10:02 PM, Judy Nash judyn...@exchange.microsoft.com wrote:
Hello,

Working on SPARK-5708 (https://issues.apache.org/jira/browse/SPARK-5708) - Add 
Slf4jSink to Spark Metrics Sink.

Wrote a new Slf4jSink class (see patch attached), but the new class is not 
packaged as part of spark-assembly jar.

Do I need to update build config somewhere to have this packaged?

Current packaged class:
[inline screenshot of the packaged classes; image not preserved in the archive]

Thought I must have missed something basic but can't figure out why.

Thanks!
Judy

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org




Re: Using CUDA within Spark / boosting linear algebra

2015-02-09 Thread Evan R. Sparks
Great - perhaps we can move this discussion off-list and onto a JIRA
ticket? (Here's one: https://issues.apache.org/jira/browse/SPARK-5705)

It seems like this is going to be somewhat exploratory for a while (and
there's probably only a handful of us who really care about fast linear
algebra!)

- Evan

On Mon, Feb 9, 2015 at 4:48 PM, Ulanov, Alexander alexander.ula...@hp.com
wrote:

  Hi Evan,



 Thank you for explanation and useful link. I am going to build OpenBLAS,
 link it with Netlib-java and perform benchmark again.



 Do I understand correctly that BIDMat binaries contain statically linked
 Intel MKL BLAS? It might be the reason why I am able to run BIDMat not
 having MKL BLAS installed on my server. If it is true, I wonder if it is OK
 because Intel sells this library. Nevertheless, it seems that in my case
 precompiled MKL BLAS performs better than precompiled OpenBLAS given that
 BIDMat and Netlib-java are supposed to be on par with JNI overheads.



 Though, it might be interesting to link Netlib-java with Intel MKL, as you
 suggested. I wonder, are John Canny (BIDMat) and Sam Halliday (Netlib-java)
 interested to compare their libraries.



 Best regards, Alexander



 *From:* Evan R. Sparks [mailto:evan.spa...@gmail.com]
 *Sent:* Friday, February 06, 2015 5:58 PM

 *To:* Ulanov, Alexander
 *Cc:* Joseph Bradley; dev@spark.apache.org
 *Subject:* Re: Using CUDA within Spark / boosting linear algebra



 I would build OpenBLAS yourself, since good BLAS performance comes from
 getting cache sizes, etc. set up correctly for your particular hardware -
 this is often a very tricky process (see, e.g. ATLAS), but we found that on
 relatively modern Xeon chips, OpenBLAS builds quickly and yields
 performance competitive with MKL.



 To make sure the right library is getting used, you have to make sure it's
 first on the search path - export LD_LIBRARY_PATH=/path/to/blas/library.so
 will do the trick here.



 For some examples of getting netlib-java setup on an ec2 node and some
 example benchmarking code we ran a while back, see:
 https://github.com/shivaram/matrix-bench



 In particular - build-openblas-ec2.sh shows you how to build the library
 and set up symlinks correctly, and scala/run-netlib.sh shows you how to get
 the path setup and get that library picked up by netlib-java.



 In this way - you could probably get cuBLAS set up to be used by
 netlib-java as well.
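To double-check which implementation netlib-java actually resolved at runtime, here is a quick sketch (assuming netlib-java is on the classpath, e.g. run from spark-shell):

~~~
// NativeSystemBLAS  => a system BLAS (e.g. your OpenBLAS build) was picked up
// NativeRefBLAS     => the bundled reference implementation
// F2jBLAS           => fallback to the pure-Java implementation
println(com.github.fommil.netlib.BLAS.getInstance().getClass.getName)
println(com.github.fommil.netlib.LAPACK.getInstance().getClass.getName)
~~~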



 - Evan



 On Fri, Feb 6, 2015 at 5:43 PM, Ulanov, Alexander alexander.ula...@hp.com
 wrote:

  Evan, could you elaborate on how to force BIDMat and netlib-java to
 force loading the right blas? For netlib, I think there are a few JVM flags, such
 as -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS, so I
 can force it to use the Java implementation. Not sure I understand how to force
 it to use a specific blas (not a specific wrapper for blas).



 Btw. I have installed openblas (yum install openblas), so I suppose that
 netlib is using it.



 *From:* Evan R. Sparks [mailto:evan.spa...@gmail.com]
 *Sent:* Friday, February 06, 2015 5:19 PM
 *To:* Ulanov, Alexander
 *Cc:* Joseph Bradley; dev@spark.apache.org


 *Subject:* Re: Using CUDA within Spark / boosting linear algebra



 Getting breeze to pick up the right blas library is critical for
 performance. I recommend using OpenBLAS (or MKL, if you already have it).
 It might make sense to force BIDMat to use the same underlying BLAS library
 as well.



 On Fri, Feb 6, 2015 at 4:42 PM, Ulanov, Alexander alexander.ula...@hp.com
 wrote:

 Hi Evan, Joseph

 I did a few matrix multiplication tests and BIDMat seems to be ~10x faster
 than netlib-java+breeze (sorry for the weird table formatting):

 |A*B  size | BIDMat MKL | Breeze+Netlib-java native_system_linux_x86-64|
 Breeze+Netlib-java f2jblas |
 +---+
 |100x100*100x100 | 0,00205596 | 0,03810324 | 0,002556 |
 |1000x1000*1000x1000 | 0,018320947 | 0,51803557 |1,638475459 |
 |1x1*1x1 | 23,78046632 | 445,0935211 | 1569,233228 |

 Configuration: Intel(R) Xeon(R) CPU E31240 3.3 GHz, 6GB RAM, Fedora 19
 Linux, Scala 2.11.
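For anyone who wants to reproduce the breeze + netlib-java column, a minimal timing sketch follows (the sizes and warm-up count are illustrative, not the exact harness behind the numbers above):

~~~
import breeze.linalg._

object MatMulBench {
  // Multiplies two random n x n matrices once, after a couple of warm-up runs,
  // and prints the elapsed wall-clock time in seconds.
  def time(n: Int): Unit = {
    val a = DenseMatrix.rand(n, n)
    val b = DenseMatrix.rand(n, n)
    (1 to 2).foreach(_ => a * b)   // warm up the JIT and the native library
    val start = System.nanoTime()
    val c = a * b
    val seconds = (System.nanoTime() - start) / 1e9
    println(s"${n}x$n * ${n}x$n took $seconds s (spot check: ${c(0, 0)})")
  }

  def main(args: Array[String]): Unit = Seq(100, 1000).foreach(time)
}
~~~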

 Later I will make tests with Cuda. I need to install new Cuda version for
 this purpose.

 Do you have any ideas why breeze-netlib with native blas is so much slower
 than BIDMat MKL?

 Best regards, Alexander

 From: Joseph Bradley [mailto:jos...@databricks.com]
 Sent: Thursday, February 05, 2015 5:29 PM
 To: Ulanov, Alexander
 Cc: Evan R. Sparks; dev@spark.apache.org

 Subject: Re: Using CUDA within Spark / boosting linear algebra

 Hi Alexander,

 Using GPUs with Spark would be very exciting.  Small comment: Concerning
 your question earlier about keeping data stored on the GPU rather than
 having to move it between main memory and GPU memory on each iteration, I
 would guess this would be critical to getting good performance.  If you
 could do multiple local iterations before aggregating results, then the
 cost of data movement to the GPU 

Re: Powered by Spark: Concur

2015-02-09 Thread Matei Zaharia
Thanks Denny; added you.

Matei

 On Feb 9, 2015, at 10:11 PM, Denny Lee denny.g@gmail.com wrote:
 
 Forgot to add Concur to the Powered by Spark wiki:
 
 Concur
 https://www.concur.com
 Spark SQL, MLLib
 Using Spark for travel and expenses analytics and personalization
 
 Thanks!
 Denny


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Powered by Spark: Concur

2015-02-09 Thread Denny Lee
Thanks Matei - much appreciated!

On Mon Feb 09 2015 at 10:23:57 PM Matei Zaharia matei.zaha...@gmail.com
wrote:

 Thanks Denny; added you.

 Matei

  On Feb 9, 2015, at 10:11 PM, Denny Lee denny.g@gmail.com wrote:
 
  Forgot to add Concur to the Powered by Spark wiki:
 
  Concur
  https://www.concur.com
  Spark SQL, MLLib
  Using Spark for travel and expenses analytics and personalization
 
  Thanks!
  Denny




Re: Using CUDA within Spark / boosting linear algebra

2015-02-09 Thread Chester @work
Maybe you can ask prof John Canny himself :-) as I invited him to give a talk 
at Alpine Data Labs in March's meetup (SF Big Analytics & SF Machine Learning 
joint meetup), 3/11. To be announced in the next day or so. 

Chester

Sent from my iPhone

 On Feb 9, 2015, at 4:48 PM, Ulanov, Alexander alexander.ula...@hp.com 
 wrote:
 
 Hi Evan,
 
 Thank you for explanation and useful link. I am going to build OpenBLAS, link 
 it with Netlib-java and perform benchmark again.
 
 Do I understand correctly that BIDMat binaries contain statically linked 
 Intel MKL BLAS? It might be the reason why I am able to run BIDMat not having 
 MKL BLAS installed on my server. If it is true, I wonder if it is OK because 
 Intel sells this library. Nevertheless, it seems that in my case precompiled 
 MKL BLAS performs better than precompiled OpenBLAS given that BIDMat and 
 Netlib-java are supposed to be on par with JNI overheads.
 
 Though, it might be interesting to link Netlib-java with Intel MKL, as you 
 suggested. I wonder, are John Canny (BIDMat) and Sam Halliday (Netlib-java) 
 interested to compare their libraries.
 
 Best regards, Alexander
 
 From: Evan R. Sparks [mailto:evan.spa...@gmail.com]
 Sent: Friday, February 06, 2015 5:58 PM
 To: Ulanov, Alexander
 Cc: Joseph Bradley; dev@spark.apache.org
 Subject: Re: Using CUDA within Spark / boosting linear algebra
 
 I would build OpenBLAS yourself, since good BLAS performance comes from 
 getting cache sizes, etc. set up correctly for your particular hardware - 
 this is often a very tricky process (see, e.g. ATLAS), but we found that on 
 relatively modern Xeon chips, OpenBLAS builds quickly and yields performance 
 competitive with MKL.
 
 To make sure the right library is getting used, you have to make sure it's 
 first on the search path - export LD_LIBRARY_PATH=/path/to/blas/library.so 
 will do the trick here.
 
 For some examples of getting netlib-java setup on an ec2 node and some 
 example benchmarking code we ran a while back, see: 
 https://github.com/shivaram/matrix-bench
 
 In particular - build-openblas-ec2.sh shows you how to build the library and 
 set up symlinks correctly, and scala/run-netlib.sh shows you how to get the 
 path setup and get that library picked up by netlib-java.
 
 In this way - you could probably get cuBLAS set up to be used by netlib-java 
 as well.
 
 - Evan
 
 On Fri, Feb 6, 2015 at 5:43 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:
 Evan, could you elaborate on how to force BIDMat and netlib-java to force 
 loading the right blas? For netlib, I think there are a few JVM flags, such as 
 -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS, so I can 
 force it to use the Java implementation. Not sure I understand how to force it to use a 
 specific blas (not a specific wrapper for blas).
 
 Btw. I have installed openblas (yum install openblas), so I suppose that 
 netlib is using it.
 
 From: Evan R. Sparks 
 [mailto:evan.spa...@gmail.commailto:evan.spa...@gmail.com]
 Sent: Friday, February 06, 2015 5:19 PM
 To: Ulanov, Alexander
 Cc: Joseph Bradley; dev@spark.apache.orgmailto:dev@spark.apache.org
 
 Subject: Re: Using CUDA within Spark / boosting linear algebra
 
 Getting breeze to pick up the right blas library is critical for performance. 
 I recommend using OpenBLAS (or MKL, if you already have it). It might make 
 sense to force BIDMat to use the same underlying BLAS library as well.
 
 On Fri, Feb 6, 2015 at 4:42 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:
 Hi Evan, Joseph
 
 I did a few matrix multiplication tests and BIDMat seems to be ~10x faster than 
 netlib-java+breeze (sorry for the weird table formatting):
 
 |A*B  size | BIDMat MKL | Breeze+Netlib-java native_system_linux_x86-64| 
 Breeze+Netlib-java f2jblas |
 +---+
 |100x100*100x100 | 0,00205596 | 0,03810324 | 0,002556 |
 |1000x1000*1000x1000 | 0,018320947 | 0,51803557 |1,638475459 |
 |1x1*1x1 | 23,78046632 | 445,0935211 | 1569,233228 |
 
 Configuration: Intel(R) Xeon(R) CPU E31240 3.3 GHz, 6GB RAM, Fedora 19 Linux, 
 Scala 2.11.
 
 Later I will make tests with Cuda. I need to install new Cuda version for 
 this purpose.
 
 Do you have any ideas why breeze-netlib with native blas is so much slower 
 than BIDMat MKL?
 
 Best regards, Alexander
 
 From: Joseph Bradley 
 [mailto:jos...@databricks.commailto:jos...@databricks.com]
 Sent: Thursday, February 05, 2015 5:29 PM
 To: Ulanov, Alexander
 Cc: Evan R. Sparks; dev@spark.apache.orgmailto:dev@spark.apache.org
 Subject: Re: Using CUDA within Spark / boosting linear algebra
 
 Hi Alexander,
 
 Using GPUs with Spark would be very exciting.  Small comment: Concerning your 
 question earlier about keeping data stored on the GPU rather than having to 
 move it between main memory and GPU memory on each iteration, I would guess 
 this would be critical to getting good 

Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Mark Hamstra

 it sounds like nobody intends these to be used to actually deploy Spark


I wouldn't go quite that far.  What we have now can serve as useful input
to a deployment tool like Chef, but the user is then going to need to add
some customization or configuration within the context of that tooling to
get Spark installed just the way they want.  So it is not so much that the
current Debian packaging can't be used as that it has never really been
intended to be a completely finished product that a newcomer could, for
example, use to install Spark completely and quickly to Ubuntu and have a
fully-functional environment in which they could then run all of the
examples, tutorials, etc.

Getting to that level of packaging (and maintenance) is something that I'm
not sure we want to do since that is a better fit with Bigtop and the
efforts of Cloudera, Horton Works, MapR, etc. to distribute Spark.

On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen so...@cloudera.com wrote:

 This is a straw poll to assess whether there is support to keep and
 fix, or remove, the Debian packaging-related config in Spark.

 I see several oldish outstanding JIRAs relating to problems in the
 packaging:

 https://issues.apache.org/jira/browse/SPARK-1799
 https://issues.apache.org/jira/browse/SPARK-2614
 https://issues.apache.org/jira/browse/SPARK-3624
 https://issues.apache.org/jira/browse/SPARK-4436
 (and a similar idea about making RPMs)
 https://issues.apache.org/jira/browse/SPARK-665

 The original motivation seems related to Chef:


 https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908

 Mark's recent comments cast some doubt on whether it is essential:

 https://github.com/apache/spark/pull/4277#issuecomment-72114226

 and in recent conversations I didn't hear dissent to the idea of removing
 this.

 Is this still useful enough to fix up? All else equal I'd like to
 start to walk back some of the complexity of the build, but I don't
 know how all-else-equal it is. Certainly, it sounds like nobody
 intends these to be used to actually deploy Spark.

 I don't doubt it's useful to someone, but can they maintain the
 packaging logic elsewhere?

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: multi-line comment style

2015-02-09 Thread Xiangrui Meng
I like the `/* .. */` style more. Because it is easier for IDEs to
recognize it as a block comment. If you press enter in the comment
block with the `//` style, IDEs won't add `//` for you. -Xiangrui

On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin r...@databricks.com wrote:
 We should update the style doc to reflect what we have in most places
 (which I think is //).



 On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 FWIW I like the multi-line // over /* */ from a purely style standpoint.
 The Google Java style guide[1] has some comment about code formatting tools
 working better with /* */ but there doesn't seem to be any strong arguments
 for one over the other I can find

 Thanks
 Shivaram

 [1]

 https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style

 On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell pwend...@gmail.com
 wrote:

  Personally I have no opinion, but agree it would be nice to standardize.
 
  - Patrick
 
  On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com wrote:
   One thing Marcelo pointed out to me is that the // style does not
   interfere with commenting out blocks of code with /* */, which is a
   small good thing. I am also accustomed to // style for multiline, and
   reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */ style
   inline always looks a little funny to me.
  
   On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout 
 kayousterh...@gmail.com
  wrote:
   Hi all,
  
   The Spark Style Guide
   
  https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
 
    says multi-line comments should be formatted as:
  
   /*
* This is a
* very
* long comment.
*/
  
   But in my experience, we almost always use // for multi-line
 comments:
  
   // This is a
   // very
   // long comment.
  
   Here are some examples:
  
  - Recent commit by Reynold, king of style:
  
 
 https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
  - RDD.scala:
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
  - DAGScheduler.scala:
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
  
  
   Any objections to me updating the style guide to reflect this?  As
 with
   other style issues, I think consistency here is helpful (and
 formatting
   multi-line comments as // does nicely visually distinguish code
  comments
   from doc comments).
  
   -Kay
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



R: Powered by Spark: Concur

2015-02-09 Thread Paolo Platter
Hi,

I checked the powered by wiki too and Agile Labs should be Agile Lab. The link 
is wrong too, it should be www.agilelab.it.
The description is correct.

Thanks a lot

Paolo

Sent from my Windows Phone

From: Denny Lee denny.g@gmail.com
Sent: 10/02/2015 07:41
To: Matei Zaharia matei.zaha...@gmail.com
Cc: dev@spark.apache.org
Subject: Re: Powered by Spark: Concur

Thanks Matei - much appreciated!

On Mon Feb 09 2015 at 10:23:57 PM Matei Zaharia matei.zaha...@gmail.com
wrote:

 Thanks Denny; added you.

 Matei

  On Feb 9, 2015, at 10:11 PM, Denny Lee denny.g@gmail.com wrote:
 
  Forgot to add Concur to the Powered by Spark wiki:
 
  Concur
  https://www.concur.com
  Spark SQL, MLLib
  Using Spark for travel and expenses analytics and personalization
 
  Thanks!
  Denny




Re: Pull Requests on github

2015-02-09 Thread fommil
Cool, thanks! Let me know if there are any more core numerical libraries
that you'd like to see supporting Spark with optimised natives, using a
similar packaging model to netlib-java.

I'm interested in fast random number generation next, and I keep wondering
if anybody would be interested in paying for FPGA or GPU / APU backends for
netlib-java. It would be a *lot* of work but I'd be very interested to talk
to an organisation with such a requirement and I'd be able to do it in less
time than they would internally.
On 10 Feb 2015 04:12, Andrew Ash [via Apache Spark Developers List] 
ml-node+s1001551n10546...@n3.nabble.com wrote:

 Sam, I see your PR was merged -- many thanks for sending it in and getting
 it merged!

 In general for future reference, the most effective way to contribute is
 outlined on this wiki page:
 https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

 On Mon, Feb 9, 2015 at 1:04 AM, Akhil Das [hidden email] wrote:

  You can open a Jira issue pointing this PR to get it processed faster.
 :)
 
  Thanks
  Best Regards
 
  On Sat, Feb 7, 2015 at 7:07 AM, fommil [hidden email] wrote:
 
   Hi all,
  
   I'm the author of netlib-java and I noticed that the documentation in
  MLlib
   was out of date and misleading, so I submitted a pull request on
 github
   which will hopefully make things easier for everybody to understand
 the
   benefits of system optimised natives and how to use them :-)
  
 https://github.com/apache/spark/pull/4448
  
   However, it looks like there are a *lot* of outstanding PRs and that
 this
   is
   just a mirror repository.
  
   Will somebody please look at my PR and merge into the canonical source
  (and
   let me know)?
  
   Best regards,
   Sam
  
  
  
   --
   View this message in context:
  
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Pull-Requests-on-github-tp10502.html
   Sent from the Apache Spark Developers List mailing list archive at
   Nabble.com.
  
   -
    To unsubscribe, e-mail: [hidden email]
    For additional commands, e-mail: [hidden email]
  
  
 







--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Pull-Requests-on-github-tp10502p10558.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Unit tests

2015-02-09 Thread Josh Rosen
Hi Iulian,

I think the AkkaUtilsSuite failure that you observed has been fixed in 
https://issues.apache.org/jira/browse/SPARK-5548 / 
https://github.com/apache/spark/pull/4343
On February 9, 2015 at 5:47:59 AM, Iulian Dragoș (iulian.dra...@typesafe.com) 
wrote:

Hi Patrick,  

Thanks for the heads up. I was trying to set up our own infrastructure for  
testing Spark (essentially, running `run-tests` every night) on EC2. I  
stumbled upon a number of flaky tests, but none of them look similar to  
anything in Jira with the flaky-test tag. I wonder if there's something  
wrong with our infrastructure, or I should simply open Jira tickets with  
the failures I find. For example, one that appears fairly often on our  
setup is in AkkaUtilsSuite remote fetch ssl on - untrusted server  
(exception `ActorNotFound`, instead of `TimeoutException`).  

thanks,  
iulian  


On Fri, Feb 6, 2015 at 9:55 PM, Patrick Wendell pwend...@gmail.com wrote:  

 Hey All,  
  
 The tests are in a not-amazing state right now due to a few compounding  
 factors:  
  
 1. We've merged a large volume of patches recently.  
 2. The load on jenkins has been relatively high, exposing races and  
 other behavior not seen at lower load.  
  
 For those not familiar, the main issue is flaky (non deterministic)  
 test failures. Right now I'm trying to prioritize keeping the  
 PullReqeustBuilder in good shape since it will block development if it  
 is down.  
  
 For other tests, let's try to keep filing JIRA's when we see issues  
 and use the flaky-test label (see http://bit.ly/1yRif9S):  
  
 I may contact people regarding specific tests. This is a very high  
 priority to get in good shape. This kind of thing is no one's fault  
 but just the result of a lot of concurrent development, and everyone  
 needs to pitch in to get back in a good place.  
  
 - Patrick  
  
 -  
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org  
 For additional commands, e-mail: dev-h...@spark.apache.org  
  
  


--  

--  
Iulian Dragos  

--  
Reactive Apps on the JVM  
www.typesafe.com  


Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Patrick Wendell
I have wondered whether we should sort of deprecate it more
officially, since otherwise I think people have the reasonable
expectation based on the current code that Spark intends to support
complete Debian packaging as part of the upstream build. Having
something that's sort-of maintained but no one is helping review and
merge patches on it or make it fully functional, IMO that doesn't
benefit us or our users. There are a bunch of other projects that are
specifically devoted to packaging, so it seems like there is a clear
separation of concerns here.

On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra m...@clearstorydata.com wrote:

 it sounds like nobody intends these to be used to actually deploy Spark


 I wouldn't go quite that far.  What we have now can serve as useful input
 to a deployment tool like Chef, but the user is then going to need to add
 some customization or configuration within the context of that tooling to
 get Spark installed just the way they want.  So it is not so much that the
 current Debian packaging can't be used as that it has never really been
 intended to be a completely finished product that a newcomer could, for
 example, use to install Spark completely and quickly to Ubuntu and have a
 fully-functional environment in which they could then run all of the
 examples, tutorials, etc.

 Getting to that level of packaging (and maintenance) is something that I'm
 not sure we want to do since that is a better fit with Bigtop and the
 efforts of Cloudera, Horton Works, MapR, etc. to distribute Spark.

 On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen so...@cloudera.com wrote:

 This is a straw poll to assess whether there is support to keep and
 fix, or remove, the Debian packaging-related config in Spark.

 I see several oldish outstanding JIRAs relating to problems in the
 packaging:

 https://issues.apache.org/jira/browse/SPARK-1799
 https://issues.apache.org/jira/browse/SPARK-2614
 https://issues.apache.org/jira/browse/SPARK-3624
 https://issues.apache.org/jira/browse/SPARK-4436
 (and a similar idea about making RPMs)
 https://issues.apache.org/jira/browse/SPARK-665

 The original motivation seems related to Chef:


 https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908

 Mark's recent comments cast some doubt on whether it is essential:

 https://github.com/apache/spark/pull/4277#issuecomment-72114226

 and in recent conversations I didn't hear dissent to the idea of removing
 this.

 Is this still useful enough to fix up? All else equal I'd like to
 start to walk back some of the complexity of the build, but I don't
 know how all-else-equal it is. Certainly, it sounds like nobody
 intends these to be used to actually deploy Spark.

 I don't doubt it's useful to someone, but can they maintain the
 packaging logic elsewhere?

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Nicholas Chammas
+1 to an official deprecation + redirecting users to some other project
that will or already is taking this on.

Nate?


On Mon Feb 09 2015 at 10:08:27 AM Patrick Wendell pwend...@gmail.com
wrote:

  I have wondered whether we should sort of deprecate it more
 officially, since otherwise I think people have the reasonable
 expectation based on the current code that Spark intends to support
 complete Debian packaging as part of the upstream build. Having
 something that's sort-of maintained but no one is helping review and
 merge patches on it or make it fully functional, IMO that doesn't
 benefit us or our users. There are a bunch of other projects that are
 specifically devoted to packaging, so it seems like there is a clear
 separation of concerns here.

 On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra m...@clearstorydata.com
 wrote:
 
  it sounds like nobody intends these to be used to actually deploy Spark
 
 
  I wouldn't go quite that far.  What we have now can serve as useful input
  to a deployment tool like Chef, but the user is then going to need to add
  some customization or configuration within the context of that tooling to
  get Spark installed just the way they want.  So it is not so much that
 the
  current Debian packaging can't be used as that it has never really been
  intended to be a completely finished product that a newcomer could, for
  example, use to install Spark completely and quickly to Ubuntu and have a
  fully-functional environment in which they could then run all of the
  examples, tutorials, etc.
 
  Getting to that level of packaging (and maintenance) is something that
 I'm
  not sure we want to do since that is a better fit with Bigtop and the
  efforts of Cloudera, Horton Works, MapR, etc. to distribute Spark.
 
  On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen so...@cloudera.com wrote:
 
  This is a straw poll to assess whether there is support to keep and
  fix, or remove, the Debian packaging-related config in Spark.
 
  I see several oldish outstanding JIRAs relating to problems in the
  packaging:
 
  https://issues.apache.org/jira/browse/SPARK-1799
  https://issues.apache.org/jira/browse/SPARK-2614
  https://issues.apache.org/jira/browse/SPARK-3624
  https://issues.apache.org/jira/browse/SPARK-4436
  (and a similar idea about making RPMs)
  https://issues.apache.org/jira/browse/SPARK-665
 
  The original motivation seems related to Chef:
 
 
  https://issues.apache.org/jira/browse/SPARK-2614?focusedComm
 entId=14070908page=com.atlassian.jira.plugin.system.issuetabpanels:
 comment-tabpanel#comment-14070908
 
  Mark's recent comments cast some doubt on whether it is essential:
 
  https://github.com/apache/spark/pull/4277#issuecomment-72114226
 
  and in recent conversations I didn't hear dissent to the idea of
 removing
  this.
 
  Is this still useful enough to fix up? All else equal I'd like to
  start to walk back some of the complexity of the build, but I don't
  know how all-else-equal it is. Certainly, it sounds like nobody
  intends these to be used to actually deploy Spark.
 
  I don't doubt it's useful to someone, but can they maintain the
  packaging logic elsewhere?
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




RE: Using CUDA within Spark / boosting linear algebra

2015-02-09 Thread Ulanov, Alexander
Hi Evan,

Thank you for explanation and useful link. I am going to build OpenBLAS, link 
it with Netlib-java and perform benchmark again.

Do I understand correctly that BIDMat binaries contain statically linked Intel 
MKL BLAS? It might be the reason why I am able to run BIDMat not having MKL 
BLAS installed on my server. If it is true, I wonder if it is OK because Intel 
sells this library. Nevertheless, it seems that in my case precompiled MKL BLAS 
performs better than precompiled OpenBLAS given that BIDMat and Netlib-java are 
supposed to be on par with JNI overheads.

Though, it might be interesting to link Netlib-java with Intel MKL, as you 
suggested. I wonder, are John Canny (BIDMat) and Sam Halliday (Netlib-java) 
interested to compare their libraries.

Best regards, Alexander

From: Evan R. Sparks [mailto:evan.spa...@gmail.com]
Sent: Friday, February 06, 2015 5:58 PM
To: Ulanov, Alexander
Cc: Joseph Bradley; dev@spark.apache.org
Subject: Re: Using CUDA within Spark / boosting linear algebra

I would build OpenBLAS yourself, since good BLAS performance comes from getting 
cache sizes, etc. set up correctly for your particular hardware - this is often 
a very tricky process (see, e.g. ATLAS), but we found that on relatively modern 
Xeon chips, OpenBLAS builds quickly and yields performance competitive with MKL.

To make sure the right library is getting used, you have to make sure it's 
first on the search path - export LD_LIBRARY_PATH=/path/to/blas/library.so will 
do the trick here.

For some examples of getting netlib-java setup on an ec2 node and some example 
benchmarking code we ran a while back, see: 
https://github.com/shivaram/matrix-bench

In particular - build-openblas-ec2.sh shows you how to build the library and 
set up symlinks correctly, and scala/run-netlib.sh shows you how to get the 
path setup and get that library picked up by netlib-java.

In this way - you could probably get cuBLAS set up to be used by netlib-java as 
well.

- Evan

On Fri, Feb 6, 2015 at 5:43 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:
Evan, could you elaborate on how to force BIDMat and netlib-java to force 
loading the right blas? For netlib, I think there are a few JVM flags, such as 
-Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS, so I can 
force it to use the Java implementation. Not sure I understand how to force it to use a 
specific blas (not a specific wrapper for blas).

Btw. I have installed openblas (yum install openblas), so I suppose that netlib 
is using it.

From: Evan R. Sparks [mailto:evan.spa...@gmail.com]
Sent: Friday, February 06, 2015 5:19 PM
To: Ulanov, Alexander
Cc: Joseph Bradley; dev@spark.apache.org

Subject: Re: Using CUDA within Spark / boosting linear algebra

Getting breeze to pick up the right blas library is critical for performance. I 
recommend using OpenBLAS (or MKL, if you already have it). It might make sense 
to force BIDMat to use the same underlying BLAS library as well.

On Fri, Feb 6, 2015 at 4:42 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:
Hi Evan, Joseph

I did a few matrix multiplication tests and BIDMat seems to be ~10x faster than 
netlib-java+breeze (sorry for the weird table formatting):

|A*B  size | BIDMat MKL | Breeze+Netlib-java native_system_linux_x86-64| 
Breeze+Netlib-java f2jblas |
+---+
|100x100*100x100 | 0,00205596 | 0,03810324 | 0,002556 |
|1000x1000*1000x1000 | 0,018320947 | 0,51803557 |1,638475459 |
|1x1*1x1 | 23,78046632 | 445,0935211 | 1569,233228 |

Configuration: Intel(R) Xeon(R) CPU E31240 3.3 GHz, 6GB RAM, Fedora 19 Linux, 
Scala 2.11.

Later I will make tests with Cuda. I need to install new Cuda version for this 
purpose.

Do you have any ideas why breeze-netlib with native blas is so much slower than 
BIDMat MKL?

Best regards, Alexander

From: Joseph Bradley [mailto:jos...@databricks.com]
Sent: Thursday, February 05, 2015 5:29 PM
To: Ulanov, Alexander
Cc: Evan R. Sparks; dev@spark.apache.org
Subject: Re: Using CUDA within Spark / boosting linear algebra

Hi Alexander,

Using GPUs with Spark would be very exciting.  Small comment: Concerning your 
question earlier about keeping data stored on the GPU rather than having to 
move it between main memory and GPU memory on each iteration, I would guess 
this would be critical to getting good performance.  If you could do multiple 
local iterations before aggregating results, then the cost of data movement to 
the GPU could be amortized (and I believe that is done in practice).  Having 
Spark be aware of the GPU and using it as another part of memory sounds like a 
much bigger undertaking.

Joseph

On Thu, Feb 5, 2015 at 4:59 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:
Thank you for 

adding some temporary jenkins worker nodes...

2015-02-09 Thread shane knapp
...to help w/the build backlog.  let's all welcome
amp-jenkins-slave-{01..03} back to the fray!


pyspark.daemon issues?

2015-02-09 Thread mkhaitman
I've noticed a couple of oddities with the pyspark.daemons which are causing us
some memory problems within some of our heavy Spark jobs, especially
when they run at the same time...

It seems that there is typically a 1-to-1 ratio of pyspark.daemons to cores
per executor during aggregations. By default the spark.python.worker.memory
is left at the default of 512MB, after which, the remainder of the
aggregations are supposed to spill to disk. 

However:
  *1)* I'm not entirely sure what cases would result in random
numbers of pyspark daemons which do not respect the python worker memory
limit. I've seen some go up to as far as 2GB each (well over the 512MB
limit) which is when we run into some crazy memory problems for jobs making
use of many cores on each executor. To be clear here, they ARE spilling to
disk as well, but also blowing past the memory limits at the same time
somehow. 

  *2)* Another scenario specifically relates to when we want to join
RDDs, where for example, say there are 4 cores per executor, and therefore 4
pyspark daemons during most aggregations. It seems that if a Join occurs, it
will spawn up 4 additional pyspark daemons as opposed to simply re-using the
ones that were already present during the aggregation stage that occurred
before it. This, combined with the case where the python worker memory limit
is not strictly respected, can pose problems for using way more memory per
node. 

The fact that the python worker memory appears to use memory *outside* of
the executor memory is what poses the biggest challenge for preventing
memory depletion on a node. Is there something obvious, or some environment
variable I may have missed that could potentially help with one/both of the
above memory concerns? Alternatively, any suggestions would be greatly
appreciated! :)
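For context, the knobs involved can be set like this (a Scala sketch with illustrative values; in PySpark the same keys go on the SparkConf or the spark-submit --conf flags):

~~~
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("pyspark-memory-budget")       // hypothetical app name
  .set("spark.executor.memory", "4g")
  // Threshold at which Python-side aggregation is supposed to spill to disk.
  // This memory lives in the pyspark.daemon processes, outside the executor
  // JVM heap, so a rough per-node budget is:
  //   executor memory + (cores per executor) x spark.python.worker.memory
  .set("spark.python.worker.memory", "512m")

val sc = new SparkContext(conf)
~~~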

Thanks,
Mark.






--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/pyspark-daemon-issues-tp10533.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: multi-line comment style

2015-02-09 Thread Xiangrui Meng
Btw, I think allowing `/* ... */` without the leading `*` in lines is
also useful. Check this line:
https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55,
where we put the R commands that can reproduce the test result. It is
easier if we write in the following style:

~~~
/*
 Using the following R code to load the data and train the model using
glmnet package.

 library(glmnet)
 data <- read.csv(path, header=FALSE, stringsAsFactors=FALSE)
 features <- as.matrix(data.frame(as.numeric(data$V2), as.numeric(data$V3)))
 label <- as.numeric(data$V1)
 weights <- coef(glmnet(features, label, family=gaussian, alpha = 0,
lambda = 0))
 */
~~~

So people can copy  paste the R commands directly.

Xiangrui

On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng men...@gmail.com wrote:
 I like the `/* .. */` style more. Because it is easier for IDEs to
 recognize it as a block comment. If you press enter in the comment
 block with the `//` style, IDEs won't add `//` for you. -Xiangrui

 On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin r...@databricks.com wrote:
 We should update the style doc to reflect what we have in most places
 (which I think is //).



 On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 FWIW I like the multi-line // over /* */ from a purely style standpoint.
 The Google Java style guide[1] has some comment about code formatting tools
 working better with /* */ but there doesn't seem to be any strong arguments
 for one over the other I can find

 Thanks
 Shivaram

 [1]

 https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style

 On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell pwend...@gmail.com
 wrote:

  Personally I have no opinion, but agree it would be nice to standardize.
 
  - Patrick
 
  On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com wrote:
   One thing Marcelo pointed out to me is that the // style does not
   interfere with commenting out blocks of code with /* */, which is a
   small good thing. I am also accustomed to // style for multiline, and
   reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */ style
   inline always looks a little funny to me.
  
   On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout 
 kayousterh...@gmail.com
  wrote:
   Hi all,
  
   The Spark Style Guide
   
  https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
 
    says multi-line comments should be formatted as:
  
   /*
* This is a
* very
* long comment.
*/
  
   But in my experience, we almost always use // for multi-line
 comments:
  
   // This is a
   // very
   // long comment.
  
   Here are some examples:
  
  - Recent commit by Reynold, king of style:
  
 
 https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
  - RDD.scala:
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
  - DAGScheduler.scala:
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
  
  
   Any objections to me updating the style guide to reflect this?  As
 with
   other style issues, I think consistency here is helpful (and
 formatting
   multi-line comments as // does nicely visually distinguish code
  comments
   from doc comments).
  
   -Kay
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-02-09 Thread Sean Owen
Old releases can't be changed, but new ones can. This was merged into
the 1.3 branch for the upcoming 1.3.0 release.

If you really had to, you could do some surgery on existing
distributions to swap in/out Jackson.
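A quick way to see which Jackson actually wins on a given distribution's classpath is to ask the classloader from spark-shell. A sketch; the class names are the ones from the reported NoClassDefFoundError and the Jackson mapper:

~~~
import scala.util.Try

// Which jar provides the Jackson mapper? (getCodeSource can be null for
// bootstrap classes, hence the Option.)
val mapper = Class.forName("org.codehaus.jackson.map.ObjectMapper")
println(Option(mapper.getProtectionDomain.getCodeSource).map(_.getLocation))

// Is the pre-1.9 JsonClass annotation from the reported exception present at all?
println(Try(Class.forName("org.codehaus.jackson.annotate.JsonClass")).isSuccess)
~~~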

On Mon, Feb 9, 2015 at 11:22 AM, Gil Vernik g...@il.ibm.com wrote:
 Hi All,

 I understand that https://github.com/apache/spark/pull/3938 was closed and
 merged into Spark? And this is supposed to fix this Jackson issue.
 If so, is there any way to update the binary distributions of Spark so that they
 will contain this fix? The current binary versions of Spark available for
 download were built with jackson 1.8.8, which makes them impossible to use
 with Hadoop 2.6.0 jars.

 Thanks
 Gil Vernik.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-02-09 Thread Gil Vernik
Hi All,

I understand that https://github.com/apache/spark/pull/3938 was closed and 
merged into Spark? And this is supposed to fix this Jackson issue.
If so, is there any way to update the binary distributions of Spark so that they 
will contain this fix? The current binary versions of Spark available for 
download were built with jackson 1.8.8, which makes them impossible to use 
with Hadoop 2.6.0 jars.

Thanks
Gil Vernik.






From:   Sean Owen so...@cloudera.com
To: Ted Yu yuzhih...@gmail.com
Cc: Gil Vernik/Haifa/IBM@IBMIL, dev dev@spark.apache.org
Date:   18/01/2015 08:23 PM
Subject:Re: run time exceptions in Spark 1.2.0 manual build 
together with OpenStack hadoop driver



Agree, I think this can / should be fixed with a slightly more
conservative version of https://github.com/apache/spark/pull/3938
related to SPARK-5108.

On Sun, Jan 18, 2015 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 Please take a look at SPARK-4048 and SPARK-5108

 Cheers

 On Sat, Jan 17, 2015 at 10:26 PM, Gil Vernik g...@il.ibm.com wrote:

 Hi,

  I took the source code of Spark 1.2.0 and tried to build it together with
  hadoop-openstack.jar (to allow Spark access to OpenStack Swift).
  I used Hadoop 2.6.0.

  The build was fine without problems; however, at run time, while trying to
  access the swift:// namespace I got an exception:
  java.lang.NoClassDefFoundError: org/codehaus/jackson/annotate/JsonClass
   at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findDeserializationType(JacksonAnnotationIntrospector.java:524)
   at org.codehaus.jackson.map.deser.BasicDeserializerFactory.modifyTypeByAnnotation(BasicDeserializerFactory.java:732)
  ...and the long stack trace goes here

  Digging into the problem I saw the following:
  Jackson versions 1.9.x are not backward compatible; in particular they
  removed the JsonClass annotation.
  Hadoop 2.6.0 uses jackson-asl version 1.9.13, while Spark references an
  older version of jackson.

 This is the main  pom.xml of Spark 1.2.0 :

    <dependency>
      <!-- Matches the version of jackson-core-asl pulled in by avro -->
      <groupId>org.codehaus.jackson</groupId>
      <artifactId>jackson-mapper-asl</artifactId>
      <version>1.8.8</version>
    </dependency>

  It references version 1.8.8, which is not compatible with Hadoop 2.6.0.
  If we change the version to 1.9.13, then all will work fine and there will be
  no run time exceptions while accessing Swift. The following change will
  solve the problem:

    <dependency>
      <!-- Matches the version of jackson-core-asl pulled in by avro -->
      <groupId>org.codehaus.jackson</groupId>
      <artifactId>jackson-mapper-asl</artifactId>
      <version>1.9.13</version>
    </dependency>

 I am trying to resolve this somehow so people will not get into this
 issue.
 Is there any particular need in Spark for jackson 1.8.8 and not 1.9.13?
 Can we remove 1.8.8 and put 1.9.13 for Avro?
  It looks to me that all works fine when Spark is built with jackson 1.9.13,
  but I am not an expert and not sure what should be tested.

 Thanks,
 Gil Vernik.


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org




Keep or remove Debian packaging in Spark?

2015-02-09 Thread Sean Owen
This is a straw poll to assess whether there is support to keep and
fix, or remove, the Debian packaging-related config in Spark.

I see several oldish outstanding JIRAs relating to problems in the packaging:

https://issues.apache.org/jira/browse/SPARK-1799
https://issues.apache.org/jira/browse/SPARK-2614
https://issues.apache.org/jira/browse/SPARK-3624
https://issues.apache.org/jira/browse/SPARK-4436
(and a similar idea about making RPMs)
https://issues.apache.org/jira/browse/SPARK-665

The original motivation seems related to Chef:

https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908

Mark's recent comments cast some doubt on whether it is essential:

https://github.com/apache/spark/pull/4277#issuecomment-72114226

and in recent conversations I didn't hear dissent to the idea of removing this.

Is this still useful enough to fix up? All else equal I'd like to
start to walk back some of the complexity of the build, but I don't
know how all-else-equal it is. Certainly, it sounds like nobody
intends these to be used to actually deploy Spark.

I don't doubt it's useful to someone, but can they maintain the
packaging logic elsewhere?

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: multi-line comment style

2015-02-09 Thread Patrick Wendell
Clearly there isn't a strictly optimal commenting format (pros and
cons for both '//' and '/*'). My thought is for consistency we should
just choose one and put it in the style guide.

On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng men...@gmail.com wrote:
 Btw, I think allowing `/* ... */` without the leading `*` in lines is
 also useful. Check this line:
 https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55,
 where we put the R commands that can reproduce the test result. It is
 easier if we write in the following style:

 ~~~
 /*
  Using the following R code to load the data and train the model using
 glmnet package.

  library(glmnet)
   data <- read.csv(path, header=FALSE, stringsAsFactors=FALSE)
   features <- as.matrix(data.frame(as.numeric(data$V2), as.numeric(data$V3)))
   label <- as.numeric(data$V1)
   weights <- coef(glmnet(features, label, family=gaussian, alpha = 0,
 lambda = 0))
  */
 ~~~

 So people can copy  paste the R commands directly.

 Xiangrui

 On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng men...@gmail.com wrote:
 I like the `/* .. */` style more. Because it is easier for IDEs to
 recognize it as a block comment. If you press enter in the comment
 block with the `//` style, IDEs won't add `//` for you. -Xiangrui

 On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin r...@databricks.com wrote:
 We should update the style doc to reflect what we have in most places
 (which I think is //).



 On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 FWIW I like the multi-line // over /* */ from a purely style standpoint.
 The Google Java style guide[1] has some comment about code formatting tools
 working better with /* */ but there doesn't seem to be any strong arguments
 for one over the other I can find

 Thanks
 Shivaram

 [1]

 https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style

 On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell pwend...@gmail.com
 wrote:

  Personally I have no opinion, but agree it would be nice to standardize.
 
  - Patrick
 
  On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com wrote:
   One thing Marcelo pointed out to me is that the // style does not
   interfere with commenting out blocks of code with /* */, which is a
   small good thing. I am also accustomed to // style for multiline, and
   reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */ style
   inline always looks a little funny to me.
  
   On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout 
 kayousterh...@gmail.com
  wrote:
   Hi all,
  
   The Spark Style Guide
   
  https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
 
    says multi-line comments should be formatted as:
  
   /*
* This is a
* very
* long comment.
*/
  
   But in my experience, we almost always use // for multi-line
 comments:
  
   // This is a
   // very
   // long comment.
  
   Here are some examples:
  
  - Recent commit by Reynold, king of style:
  
 
 https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
  - RDD.scala:
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
  - DAGScheduler.scala:
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
  
  
   Any objections to me updating the style guide to reflect this?  As
 with
   other style issues, I think consistency here is helpful (and
 formatting
   multi-line comments as // does nicely visually distinguish code
  comments
   from doc comments).
  
   -Kay
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[ANNOUNCE] Apache Spark 1.2.1 Released

2015-02-09 Thread Patrick Wendell
Hi All,

I've just posted the 1.2.1 maintenance release of Apache Spark. We
recommend all 1.2.0 users upgrade to this release, as this release
includes stability fixes across all components of Spark.

- Download this release: http://spark.apache.org/downloads.html
- View the release notes:
http://spark.apache.org/releases/spark-release-1-2-1.html
- Full list of JIRA issues resolved in this release: http://s.apache.org/Mpn

Thanks to everyone who helped work on this release!

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: multi-line comment style

2015-02-09 Thread Reynold Xin
Why don't we just pick // as the default (by encouraging it in the style
guide), since it is mostly used, and then do not disallow /* */? I don't
think it is that big of a deal to have slight deviations here since it is
dead simple to understand what's going on.


On Mon, Feb 9, 2015 at 1:33 PM, Patrick Wendell pwend...@gmail.com wrote:

  Clearly there isn't a strictly optimal commenting format (pros and
  cons for both '//' and '/*'). My thought is for consistency we should
  just choose one and put it in the style guide.

 On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng men...@gmail.com wrote:
  Btw, I think allowing `/* ... */` without the leading `*` in lines is
  also useful. Check this line:
 
 https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55
 ,
  where we put the R commands that can reproduce the test result. It is
  easier if we write in the following style:
 
  ~~~
  /*
   Using the following R code to load the data and train the model using
  glmnet package.
 
   library(glmnet)
    data <- read.csv(path, header=FALSE, stringsAsFactors=FALSE)
    features <- as.matrix(data.frame(as.numeric(data$V2),
  as.numeric(data$V3)))
    label <- as.numeric(data$V1)
    weights <- coef(glmnet(features, label, family=gaussian, alpha = 0,
  lambda = 0))
   */
  ~~~
 
  So people can copy  paste the R commands directly.
 
  Xiangrui
 
  On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng men...@gmail.com wrote:
  I like the `/* .. */` style more. Because it is easier for IDEs to
  recognize it as a block comment. If you press enter in the comment
  block with the `//` style, IDEs won't add `//` for you. -Xiangrui
 
  On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin r...@databricks.com
 wrote:
  We should update the style doc to reflect what we have in most places
  (which I think is //).
 
 
 
  On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman 
  shiva...@eecs.berkeley.edu wrote:
 
  FWIW I like the multi-line // over /* */ from a purely style
 standpoint.
  The Google Java style guide[1] has some comment about code formatting
 tools
  working better with /* */ but there doesn't seem to be any strong
 arguments
  for one over the other I can find
 
  Thanks
  Shivaram
 
  [1]
 
 
 https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style
 
  On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
   Personally I have no opinion, but agree it would be nice to
 standardize.
  
   - Patrick
  
   On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com
 wrote:
One thing Marcelo pointed out to me is that the // style does not
interfere with commenting out blocks of code with /* */, which is
 a
small good thing. I am also accustomed to // style for multiline,
 and
reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */
 style
inline always looks a little funny to me.
   
On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout 
  kayousterh...@gmail.com
   wrote:
Hi all,
   
The Spark Style Guide

  
 https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
  
says multi-line comments should formatted as:
   
/*
 * This is a
 * very
 * long comment.
 */
   
But in my experience, we almost always use // for multi-line
  comments:
   
// This is a
// very
// long comment.
   
Here are some examples:
   
   - Recent commit by Reynold, king of style:
   
  
 
 https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
   - RDD.scala:
   
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
   - DAGScheduler.scala:
   
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
   
   
Any objections to me updating the style guide to reflect this?
 As
  with
other style issues, I think consistency here is helpful (and
  formatting
multi-line comments as // does nicely visually distinguish code
   comments
from doc comments).
   
-Kay
   
   
 -
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
   
  
  
 -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
  
 



Re: multi-line comment style

2015-02-09 Thread Sandy Ryza
+1 to what Andrew said. I think both make sense in different situations, and
trusting developer discretion here is reasonable.

On Mon, Feb 9, 2015 at 1:48 PM, Andrew Or and...@databricks.com wrote:

 In my experience I find it much more natural to use // for short multi-line
 comments (2 or 3 lines), and /* */ for long multi-line comments involving
 one or more paragraphs. For short multi-line comments, there is no reason
 not to use // if it just so happens that your first line exceeded 100
 characters and you have to wrap it. For long multi-line comments, however,
 using // all the way looks really awkward especially if you have multiple
 paragraphs.

 Thus, I would actually suggest that we don't try to pick a favorite and
 document that both are acceptable. I don't expect developers to follow my
 exact usage (i.e. with a tipping point of 2-3 lines) so I wouldn't enforce
 anything specific either.

 2015-02-09 13:36 GMT-08:00 Reynold Xin r...@databricks.com:

  Why don't we just pick // as the default (by encouraging it in the style
  guide), since it is mostly used, and then do not disallow /* */? I don't
  think it is that big of a deal to have slightly deviations here since it
 is
  dead simple to understand what's going on.
 
 
  On Mon, Feb 9, 2015 at 1:33 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
   Clearly there isn't a strictly optimal commenting format (pro's and
   cons for both '//' and '/*'). My thought is for consistency we should
   just chose one and put in the style guide.
  
   On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng men...@gmail.com
 wrote:
Btw, I think allowing `/* ... */` without the leading `*` in lines is
also useful. Check this line:
   
  
 
 https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55
   ,
where we put the R commands that can reproduce the test result. It is
easier if we write in the following style:
   
~~~
/*
 Using the following R code to load the data and train the model
 using
glmnet package.
   
 library(glmnet)
 data - read.csv(path, header=FALSE, stringsAsFactors=FALSE)
 features - as.matrix(data.frame(as.numeric(data$V2),
   as.numeric(data$V3)))
 label - as.numeric(data$V1)
 weights - coef(glmnet(features, label, family=gaussian, alpha =
 0,
lambda = 0))
 */
~~~
   
So people can copy  paste the R commands directly.
   
Xiangrui
   
On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng men...@gmail.com
  wrote:
I like the `/* .. */` style more. Because it is easier for IDEs to
recognize it as a block comment. If you press enter in the comment
block with the `//` style, IDEs won't add `//` for you. -Xiangrui
   
On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin r...@databricks.com
   wrote:
We should update the style doc to reflect what we have in most
 places
(which I think is //).
   
   
   
On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman 
shiva...@eecs.berkeley.edu wrote:
   
FWIW I like the multi-line // over /* */ from a purely style
   standpoint.
The Google Java style guide[1] has some comment about code
  formatting
   tools
working better with /* */ but there doesn't seem to be any strong
   arguments
for one over the other I can find
   
Thanks
Shivaram
   
[1]
   
   
  
 
 https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style
   
On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell 
 pwend...@gmail.com
  
wrote:
   
 Personally I have no opinion, but agree it would be nice to
   standardize.

 - Patrick

 On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com
   wrote:
  One thing Marcelo pointed out to me is that the // style does
  not
  interfere with commenting out blocks of code with /* */, which
  is
   a
  small good thing. I am also accustomed to // style for
  multiline,
   and
  reserve /** */ for javadoc / scaladoc. Meaning, seeing the /*
 */
   style
  inline always looks a little funny to me.
 
  On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout 
kayousterh...@gmail.com
 wrote:
  Hi all,
 
  The Spark Style Guide
  

  
 https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

  says multi-line comments should formatted as:
 
  /*
   * This is a
   * very
   * long comment.
   */
 
  But in my experience, we almost always use // for
 multi-line
comments:
 
  // This is a
  // very
  // long comment.
 
  Here are some examples:
 
 - Recent commit by Reynold, king of style:
 

   
  
 
 https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
 - RDD.scala:
 

   
  
 
 

Re: multi-line comment style

2015-02-09 Thread Andrew Or
In my experience I find it much more natural to use // for short multi-line
comments (2 or 3 lines), and /* */ for long multi-line comments involving
one or more paragraphs. For short multi-line comments, there is no reason
not to use // if it just so happens that your first line exceeds 100
characters and you have to wrap it. For long multi-line comments, however,
using // all the way looks really awkward, especially if you have multiple
paragraphs.

Thus, I would actually suggest that we don't try to pick a favorite, and
instead document that both are acceptable. I don't expect developers to follow
my exact usage (i.e. with a tipping point of 2-3 lines), so I wouldn't enforce
anything specific either.
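
To illustrate the tipping point I have in mind, here is a contrived example
(the methods and comments are made up):

~~~
object TippingPointExample {
  // A short note that happens to wrap onto a second line because the first
  // line ran past 100 characters; // still reads fine here.
  def isIndexFile(name: String): Boolean = name.endsWith(".index")

  /*
   * A longer, multi-paragraph explanation is where // starts to look awkward.
   *
   * This block would walk through the background in detail: why we key on
   * the file suffix rather than the directory layout, what the alternatives
   * were, and what would have to change if the naming scheme ever did.
   */
  def isDataFile(name: String): Boolean = name.endsWith(".data")
}
~~~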

2015-02-09 13:36 GMT-08:00 Reynold Xin r...@databricks.com:

 Why don't we just pick // as the default (by encouraging it in the style
 guide), since it is mostly used, and then do not disallow /* */? I don't
 think it is that big of a deal to have slightly deviations here since it is
 dead simple to understand what's going on.


 On Mon, Feb 9, 2015 at 1:33 PM, Patrick Wendell pwend...@gmail.com
 wrote:

  Clearly there isn't a strictly optimal commenting format (pro's and
  cons for both '//' and '/*'). My thought is for consistency we should
  just chose one and put in the style guide.
 
  On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng men...@gmail.com wrote:
   Btw, I think allowing `/* ... */` without the leading `*` in lines is
   also useful. Check this line:
  
 
 https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55
  ,
   where we put the R commands that can reproduce the test result. It is
   easier if we write in the following style:
  
   ~~~
   /*
Using the following R code to load the data and train the model using
   glmnet package.
  
library(glmnet)
data - read.csv(path, header=FALSE, stringsAsFactors=FALSE)
features - as.matrix(data.frame(as.numeric(data$V2),
  as.numeric(data$V3)))
label - as.numeric(data$V1)
weights - coef(glmnet(features, label, family=gaussian, alpha = 0,
   lambda = 0))
*/
   ~~~
  
   So people can copy  paste the R commands directly.
  
   Xiangrui
  
   On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng men...@gmail.com
 wrote:
   I like the `/* .. */` style more. Because it is easier for IDEs to
   recognize it as a block comment. If you press enter in the comment
   block with the `//` style, IDEs won't add `//` for you. -Xiangrui
  
   On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin r...@databricks.com
  wrote:
   We should update the style doc to reflect what we have in most places
   (which I think is //).
  
  
  
   On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman 
   shiva...@eecs.berkeley.edu wrote:
  
   FWIW I like the multi-line // over /* */ from a purely style
  standpoint.
   The Google Java style guide[1] has some comment about code
 formatting
  tools
   working better with /* */ but there doesn't seem to be any strong
  arguments
   for one over the other I can find
  
   Thanks
   Shivaram
  
   [1]
  
  
 
 https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style
  
   On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell pwend...@gmail.com
 
   wrote:
  
Personally I have no opinion, but agree it would be nice to
  standardize.
   
- Patrick
   
On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com
  wrote:
 One thing Marcelo pointed out to me is that the // style does
 not
 interfere with commenting out blocks of code with /* */, which
 is
  a
 small good thing. I am also accustomed to // style for
 multiline,
  and
 reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */
  style
 inline always looks a little funny to me.

 On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout 
   kayousterh...@gmail.com
wrote:
 Hi all,

 The Spark Style Guide
 
   
  https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
   
 says multi-line comments should formatted as:

 /*
  * This is a
  * very
  * long comment.
  */

 But in my experience, we almost always use // for multi-line
   comments:

 // This is a
 // very
 // long comment.

 Here are some examples:

- Recent commit by Reynold, king of style:

   
  
 
 https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
- RDD.scala:

   
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
- DAGScheduler.scala:

   
  
 
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281


 Any objections to me updating the style guide to reflect this?
  As
   with
 other style issues, I think consistency here is helpful (and
   formatting
 multi-line 

Re: Unit tests

2015-02-09 Thread Iulian Dragoș
Hi Patrick,

Thanks for the heads up. I have been trying to set up our own infrastructure for
testing Spark (essentially, running `run-tests` every night) on EC2. I
stumbled upon a number of flaky tests, but none of them look similar to
anything in JIRA with the flaky-test label. I wonder whether there's something
wrong with our infrastructure, or whether I should simply open JIRA tickets for
the failures I find. For example, one that appears fairly often on our
setup is AkkaUtilsSuite's "remote fetch ssl on - untrusted server"
(it throws `ActorNotFound` instead of the expected `TimeoutException`).
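
In case it is useful, here is a rough sketch of the kind of retry wrapper one
could put around a suspect test body when reproducing locally, just to check
whether a failure is deterministic or merely flaky (this is not part of Spark;
all names here are made up):

~~~
import scala.util.control.NonFatal

object FlakyTestTriage {
  /** Runs `body` up to `maxAttempts` times, returning the first successful
    * result and rethrowing the last failure otherwise. */
  def retry[T](maxAttempts: Int)(body: => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    var lastError: Throwable = null
    var attempt = 1
    while (attempt <= maxAttempts) {
      try {
        return body
      } catch {
        case NonFatal(e) =>
          lastError = e
          println(s"Attempt $attempt/$maxAttempts failed: " +
            s"${e.getClass.getSimpleName}: ${e.getMessage}")
      }
      attempt += 1
    }
    throw lastError
  }
}

// Usage while reproducing: wrap the body of the suspect test, e.g.
// FlakyTestTriage.retry(5) { runUntrustedServerFetch() }  // hypothetical helper
~~~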

thanks,
iulian


On Fri, Feb 6, 2015 at 9:55 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey All,

 The tests are in a not-amazing state right now due to a few compounding
 factors:

 1. We've merged a large volume of patches recently.
 2. The load on jenkins has been relatively high, exposing races and
 other behavior not seen at lower load.

 For those not familiar, the main issue is flaky (non deterministic)
 test failures. Right now I'm trying to prioritize keeping the
 PullReqeustBuilder in good shape since it will block development if it
 is down.

 For other tests, let's try to keep filing JIRA's when we see issues
 and use the flaky-test label (see http://bit.ly/1yRif9S):

 I may contact people regarding specific tests. This is a very high
 priority to get in good shape. This kind of thing is no one's fault
 but just the result of a lot of concurrent development, and everyone
 needs to pitch in to get back in a good place.

 - Patrick

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Sean Owen
What about this straw man proposal: deprecate in 1.3 with some kind of
message in the build, and remove for 1.4? And add a pointer to any
third-party packaging that might provide similar functionality?

On Mon, Feb 9, 2015 at 6:47 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 +1 to an official deprecation + redirecting users to some other project
 that will or already is taking this on.

 Nate?



 On Mon Feb 09 2015 at 10:08:27 AM Patrick Wendell pwend...@gmail.com
 wrote:

 I have wondered whether we should sort of deprecated it more
 officially, since otherwise I think people have the reasonable
 expectation based on the current code that Spark intends to support
 complete Debian packaging as part of the upstream build. Having
 something that's sort-of maintained but no one is helping review and
 merge patches on it or make it fully functional, IMO that doesn't
 benefit us or our users. There are a bunch of other projects that are
 specifically devoted to packaging, so it seems like there is a clear
 separation of concerns here.

 On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra m...@clearstorydata.com
 wrote:
 
  it sounds like nobody intends these to be used to actually deploy Spark
 
 
  I wouldn't go quite that far.  What we have now can serve as useful
  input
  to a deployment tool like Chef, but the user is then going to need to
  add
  some customization or configuration within the context of that tooling
  to
  get Spark installed just the way they want.  So it is not so much that
  the
  current Debian packaging can't be used as that it has never really been
  intended to be a completely finished product that a newcomer could, for
  example, use to install Spark completely and quickly to Ubuntu and have
  a
  fully-functional environment in which they could then run all of the
  examples, tutorials, etc.
 
  Getting to that level of packaging (and maintenance) is something that
  I'm
  not sure we want to do since that is a better fit with Bigtop and the
  efforts of Cloudera, Horton Works, MapR, etc. to distribute Spark.
 
  On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen so...@cloudera.com wrote:
 
  This is a straw poll to assess whether there is support to keep and
  fix, or remove, the Debian packaging-related config in Spark.
 
  I see several oldish outstanding JIRAs relating to problems in the
  packaging:
 
  https://issues.apache.org/jira/browse/SPARK-1799
  https://issues.apache.org/jira/browse/SPARK-2614
  https://issues.apache.org/jira/browse/SPARK-3624
  https://issues.apache.org/jira/browse/SPARK-4436
  (and a similar idea about making RPMs)
  https://issues.apache.org/jira/browse/SPARK-665
 
  The original motivation seems related to Chef:
 
 
 
  https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908
 
  Mark's recent comments cast some doubt on whether it is essential:
 
  https://github.com/apache/spark/pull/4277#issuecomment-72114226
 
  and in recent conversations I didn't hear dissent to the idea of
  removing
  this.
 
  Is this still useful enough to fix up? All else equal I'd like to
  start to walk back some of the complexity of the build, but I don't
  know how all-else-equal it is. Certainly, it sounds like nobody
  intends these to be used to actually deploy Spark.
 
  I don't doubt it's useful to someone, but can they maintain the
  packaging logic elsewhere?
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark-ec2 licensing clarification

2015-02-09 Thread Shivaram Venkataraman
+spark dev list

Yes, we should add an Apache license to it -- feel free to open a PR for
it. BTW, though it is part of the mesos GitHub account, it is almost
exclusively used by the Spark project AFAIK.

Longer term, it may make sense to move it to a more appropriate GitHub
account (we could move it to amplab/, for instance, as the AMPLab provides
the Jenkins support etc. too).
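
For reference, this is the standard ASF header that would need to go at the
top of each file (shown here as a block comment; the spark-ec2 scripts would
carry the same text behind # comments):

~~~
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
~~~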

Thanks
Shivaram

On Mon, Feb 9, 2015 at 3:26 PM, Florian Verhein flor...@arkig.com wrote:

 Hi guys,

 Are there any plans to add licensing information to the mesos/spark-ec2
 repo?
 I'd assumed it would be Apache 2.0 but then noticed there's no info in the
 repo.

 Background:
 https://issues.apache.org/jira/browse/SPARK-5676

 Regards,
Florian





RE: Keep or remove Debian packaging in Spark?

2015-02-09 Thread nate
If the Spark community doesn't want to maintain debs/rpms directly in the
project, it could direct interested efforts towards Apache Bigtop. Right now
debs/rpms of Bigtop components, as well as the related tests, are a focus there.

Something that would be great is if at least one Spark committer with an
interest in config/packaging/testing could act as liaison and point of contact
for the Bigtop efforts.

The current focus is Bigtop 0.9, which includes Spark 1.2. The JIRA for items
included in 0.9 can be found here:

https://issues.apache.org/jira/browse/BIGTOP-1480



-Original Message-
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Monday, February 9, 2015 3:52 PM
To: Nicholas Chammas
Cc: Patrick Wendell; Mark Hamstra; dev
Subject: Re: Keep or remove Debian packaging in Spark?

What about this straw man proposal: deprecate in 1.3 with some kind of message 
in the build, and remove for 1.4? And add a pointer to any third-party 
packaging that might provide similar functionality?

On Mon, Feb 9, 2015 at 6:47 PM, Nicholas Chammas nicholas.cham...@gmail.com 
wrote:
 +1 to an official deprecation + redirecting users to some other 
 +project
 that will or already is taking this on.

 Nate?



 On Mon Feb 09 2015 at 10:08:27 AM Patrick Wendell pwend...@gmail.com
 wrote:

 I have wondered whether we should sort of deprecated it more 
 officially, since otherwise I think people have the reasonable 
 expectation based on the current code that Spark intends to support 
 complete Debian packaging as part of the upstream build. Having 
 something that's sort-of maintained but no one is helping review and 
 merge patches on it or make it fully functional, IMO that doesn't 
 benefit us or our users. There are a bunch of other projects that are 
 specifically devoted to packaging, so it seems like there is a clear 
 separation of concerns here.

 On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra 
 m...@clearstorydata.com
 wrote:
 
  it sounds like nobody intends these to be used to actually deploy 
  Spark
 
 
  I wouldn't go quite that far.  What we have now can serve as useful 
  input to a deployment tool like Chef, but the user is then going to 
  need to add some customization or configuration within the context 
  of that tooling to get Spark installed just the way they want.  So 
  it is not so much that the current Debian packaging can't be used 
  as that it has never really been intended to be a completely 
  finished product that a newcomer could, for example, use to install 
  Spark completely and quickly to Ubuntu and have a fully-functional 
  environment in which they could then run all of the examples, 
  tutorials, etc.
 
  Getting to that level of packaging (and maintenance) is something 
  that I'm not sure we want to do since that is a better fit with 
  Bigtop and the efforts of Cloudera, Horton Works, MapR, etc. to 
  distribute Spark.
 
  On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen so...@cloudera.com wrote:
 
  This is a straw poll to assess whether there is support to keep 
  and fix, or remove, the Debian packaging-related config in Spark.
 
  I see several oldish outstanding JIRAs relating to problems in the
  packaging:
 
  https://issues.apache.org/jira/browse/SPARK-1799
  https://issues.apache.org/jira/browse/SPARK-2614
  https://issues.apache.org/jira/browse/SPARK-3624
  https://issues.apache.org/jira/browse/SPARK-4436
  (and a similar idea about making RPMs)
  https://issues.apache.org/jira/browse/SPARK-665
 
  The original motivation seems related to Chef:
 
 
 
  https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=
  14070908page=com.atlassian.jira.plugin.system.issuetabpanels:comm
  ent-tabpanel#comment-14070908
 
  Mark's recent comments cast some doubt on whether it is essential:
 
  https://github.com/apache/spark/pull/4277#issuecomment-72114226
 
  and in recent conversations I didn't hear dissent to the idea of 
  removing this.
 
  Is this still useful enough to fix up? All else equal I'd like to 
  start to walk back some of the complexity of the build, but I 
  don't know how all-else-equal it is. Certainly, it sounds like 
  nobody intends these to be used to actually deploy Spark.
 
  I don't doubt it's useful to someone, but can they maintain the 
  packaging logic elsewhere?
 
  --
  --- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For 
  additional commands, e-mail: dev-h...@spark.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For 
 additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional 
commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: 

Mail to u...@spark.apache.org failing

2015-02-09 Thread Meethu Mathew

Hi,

The mail id given in 
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark seems 
to be failing. Can anyone tell me how to get added to Powered By Spark list?


--

Regards,

*Meethu*