Re: Pull Requests on github

2015-02-09 Thread fommil
Cool, thanks! Let me know if there are any more core numerical libraries
that you'd like to see supporting Spark with optimised natives, using a
similar packaging model to netlib-java.

I'm interested in fast random number generation next, and I keep wondering
if anybody would be interested in paying for FPGA or GPU / APU backends for
netlib-java. It would be a *lot* of work but I'd be very interested to talk
to an organisation with such a requirement and I'd be able to do it in less
time than they would internally.
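
(For anyone wanting to try the optimised natives being discussed: the usual way to pull them into an sbt-based Spark/Scala build is netlib-java's "all" aggregator artifact. The coordinates below are quoted from memory and the version is an assumption, so double-check against the netlib-java README.)

~~~
// build.sbt -- a minimal sketch; the version number is an assumption.
// The "all" artifact is a POM aggregator that brings in netlib-java plus the
// pre-built reference and system native wrappers for common platforms.
libraryDependencies += ("com.github.fommil.netlib" % "all" % "1.1.2").pomOnly()
~~~
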
On 10 Feb 2015 04:12, "Andrew Ash [via Apache Spark Developers List]" <
ml-node+s1001551n10546...@n3.nabble.com> wrote:

> Sam, I see your PR was merged -- many thanks for sending it in and getting
> it merged!
>
> In general for future reference, the most effective way to contribute is
> outlined on this wiki page:
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
>
> On Mon, Feb 9, 2015 at 1:04 AM, Akhil Das <[hidden email]
> >
> wrote:
>
> > You can open a Jira issue pointing this PR to get it processed faster.
> :)
> >
> > Thanks
> > Best Regards
> >
> > On Sat, Feb 7, 2015 at 7:07 AM, fommil <[hidden email]
> > wrote:
> >
> > > Hi all,
> > >
> > > I'm the author of netlib-java and I noticed that the documentation in
> > MLlib
> > > was out of date and misleading, so I submitted a pull request on
> github
> > > which will hopefully make things easier for everybody to understand
> the
> > > benefits of system optimised natives and how to use them :-)
> > >
> > >   https://github.com/apache/spark/pull/4448
> > >
> > > However, it looks like there are a *lot* of outstanding PRs and that
> this
> > > is
> > > just a mirror repository.
> > >
> > > Will somebody please look at my PR and merge into the canonical source
> > (and
> > > let me know)?
> > >
> > > Best regards,
> > > Sam
> > >
> > >
> > >





R: Powered by Spark: Concur

2015-02-09 Thread Paolo Platter
Hi,

I checked the Powered by Spark wiki too, and "Agile Labs" should be "Agile Lab". The
link is wrong as well; it should be www.agilelab.it.
The description is correct.

Thanks a lot

Paolo

Sent from my Windows Phone

From: Denny Lee
Sent: 10/02/2015 07:41
To: Matei Zaharia
Cc: dev@spark.apache.org
Subject: Re: Powered by Spark: Concur

Thanks Matei - much appreciated!

On Mon Feb 09 2015 at 10:23:57 PM Matei Zaharia 
wrote:

> Thanks Denny; added you.
>
> Matei
>
> > On Feb 9, 2015, at 10:11 PM, Denny Lee  wrote:
> >
> > Forgot to add Concur to the "Powered by Spark" wiki:
> >
> > Concur
> > https://www.concur.com
> > Spark SQL, MLLib
> > Using Spark for travel and expenses analytics and personalization
> >
> > Thanks!
> > Denny
>
>


RE: New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Judy Nash
Thanks Patrick! That was the issue.
I built the jars in a Windows environment with mvn and forgot to run make-distributions.ps1
afterward, so I was looking at old jars.

From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Monday, February 9, 2015 10:43 PM
To: Judy Nash
Cc: dev@spark.apache.org
Subject: Re: New Metrics Sink class not packaged in spark-assembly jar

Actually, to correct myself, the assembly jar is in assembly/target/scala-2.11 
(I think).

On Mon, Feb 9, 2015 at 10:42 PM, Patrick Wendell <pwend...@gmail.com> wrote:
Hi Judy,

If you have added source files in the sink/ source folder, they should appear 
in the assembly jar when you build. One thing I noticed is that you are looking 
inside the "/dist" folder. That only gets populated if you run 
"make-distribution". The normal development process is just to do "mvn package" 
and then look at the assembly jar that is contained in core/target.

- Patrick

On Mon, Feb 9, 2015 at 10:02 PM, Judy Nash <judyn...@exchange.microsoft.com> wrote:
Hello,

Working on SPARK-5708 - Add 
Slf4jSink to Spark Metrics Sink.

Wrote a new Slf4jSink class (see patch attached), but the new class is not 
packaged as part of spark-assembly jar.

Do I need to update build config somewhere to have this packaged?

Current packaged class:
[screenshot attachment omitted]

Thought I must have missed something basic but can't figure out why.

Thanks!
Judy

-
To unsubscribe, e-mail: 
dev-unsubscr...@spark.apache.org
For additional commands, e-mail: 
dev-h...@spark.apache.org




Re: New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Patrick Wendell
Hi Judy,

If you have added source files in the sink/ source folder, they should
appear in the assembly jar when you build. One thing I noticed is that you
are looking inside the "/dist" folder. That only gets populated if you run
"make-distribution". The normal development process is just to do "mvn
package" and then look at the assembly jar that is contained in core/target.

- Patrick

On Mon, Feb 9, 2015 at 10:02 PM, Judy Nash 
wrote:

>  Hello,
>
>
>
> Working on SPARK-5708 
> - Add Slf4jSink to Spark Metrics Sink.
>
>
>
> Wrote a new Slf4jSink class (see patch attached), but the new class is not
> packaged as part of spark-assembly jar.
>
>
>
> Do I need to update build config somewhere to have this packaged?
>
>
>
> Current packaged class:
>
>
>
> Thought I must have missed something basic but can't figure out why.
>
>
>
> Thanks!
>
> Judy
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>


Re: New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Patrick Wendell
Actually, to correct myself, the assembly jar is in
assembly/target/scala-2.11 (I think).

On Mon, Feb 9, 2015 at 10:42 PM, Patrick Wendell  wrote:

> Hi Judy,
>
> If you have added source files in the sink/ source folder, they should
> appear in the assembly jar when you build. One thing I noticed is that you
> are looking inside the "/dist" folder. That only gets populated if you run
> "make-distribution". The normal development process is just to do "mvn
> package" and then look at the assembly jar that is contained in core/target.
>
> - Patrick
>
> On Mon, Feb 9, 2015 at 10:02 PM, Judy Nash <
> judyn...@exchange.microsoft.com> wrote:
>
>>  Hello,
>>
>>
>>
>> Working on SPARK-5708 
>> - Add Slf4jSink to Spark Metrics Sink.
>>
>>
>>
>> Wrote a new Slf4jSink class (see patch attached), but the new class is
>> not packaged as part of spark-assembly jar.
>>
>>
>>
>> Do I need to update build config somewhere to have this packaged?
>>
>>
>>
>> Current packaged class:
>>
>>
>>
>> Thought I must have missed something basic but can't figure out why.
>>
>>
>>
>> Thanks!
>>
>> Judy
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>
>


Re: Powered by Spark: Concur

2015-02-09 Thread Denny Lee
Thanks Matei - much appreciated!

On Mon Feb 09 2015 at 10:23:57 PM Matei Zaharia 
wrote:

> Thanks Denny; added you.
>
> Matei
>
> > On Feb 9, 2015, at 10:11 PM, Denny Lee  wrote:
> >
> > Forgot to add Concur to the "Powered by Spark" wiki:
> >
> > Concur
> > https://www.concur.com
> > Spark SQL, MLLib
> > Using Spark for travel and expenses analytics and personalization
> >
> > Thanks!
> > Denny
>
>


Re: Powered by Spark: Concur

2015-02-09 Thread Matei Zaharia
Thanks Denny; added you.

Matei

> On Feb 9, 2015, at 10:11 PM, Denny Lee  wrote:
> 
> Forgot to add Concur to the "Powered by Spark" wiki:
> 
> Concur
> https://www.concur.com
> Spark SQL, MLLib
> Using Spark for travel and expenses analytics and personalization
> 
> Thanks!
> Denny


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Powered by Spark: Concur

2015-02-09 Thread Denny Lee
Forgot to add Concur to the "Powered by Spark" wiki:

Concur
https://www.concur.com
Spark SQL, MLLib
Using Spark for travel and expenses analytics and personalization

Thanks!
Denny


Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Patrick Wendell
Mark was involved in adding this code (IIRC) and has also been the
most active in maintaining it. So I'd be interested in hearing his
thoughts on that proposal. Mark - would you be okay deprecating this
and having Spark instead work with the upstream projects that focus on
packaging?

My feeling is that it's better to just have nothing than to have
something not usable out-of-the-box (which to your point, is a lot
more work).

On Mon, Feb 9, 2015 at 4:10 PM,   wrote:
> This could be something if the spark community wanted to not maintain 
> debs/rpms directly via the project could direct interested efforts towards 
> apache bigtop.  Right now debs/rpms of bigtop components, as well as related 
> tests is a focus.
>
> Something that would be great is if at least one spark committer with 
> interests in config/pkg/testing could be liason and pt for bigtop efforts.
>
> Right now focus on bigtop 0.9, which currently includes spark 1.2.  Jira for 
> items included in 0.9 can be found here:
>
> https://issues.apache.org/jira/browse/BIGTOP-1480
>
>
>
> -Original Message-
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Monday, February 9, 2015 3:52 PM
> To: Nicholas Chammas
> Cc: Patrick Wendell; Mark Hamstra; dev
> Subject: Re: Keep or remove Debian packaging in Spark?
>
> What about this straw man proposal: deprecate in 1.3 with some kind of 
> message in the build, and remove for 1.4? And add a pointer to any 
> third-party packaging that might provide similar functionality?
>
> On Mon, Feb 9, 2015 at 6:47 PM, Nicholas Chammas  
> wrote:
>> +1 to an "official" deprecation + redirecting users to some other
>> +project
>> that will or already is taking this on.
>>
>> Nate?
>>
>>
>>
>> On Mon Feb 09 2015 at 10:08:27 AM Patrick Wendell 
>> wrote:
>>>
>>> I have wondered whether we should sort of deprecated it more
>>> officially, since otherwise I think people have the reasonable
>>> expectation based on the current code that Spark intends to support
>>> "complete" Debian packaging as part of the upstream build. Having
>>> something that's sort-of maintained but no one is helping review and
>>> merge patches on it or make it fully functional, IMO that doesn't
>>> benefit us or our users. There are a bunch of other projects that are
>>> specifically devoted to packaging, so it seems like there is a clear
>>> separation of concerns here.
>>>
>>> On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra
>>> 
>>> wrote:
>>> >>
>>> >> it sounds like nobody intends these to be used to actually deploy
>>> >> Spark
>>> >
>>> >
>>> > I wouldn't go quite that far.  What we have now can serve as useful
>>> > input to a deployment tool like Chef, but the user is then going to
>>> > need to add some customization or configuration within the context
>>> > of that tooling to get Spark installed just the way they want.  So
>>> > it is not so much that the current Debian packaging can't be used
>>> > as that it has never really been intended to be a completely
>>> > finished product that a newcomer could, for example, use to install
>>> > Spark completely and quickly to Ubuntu and have a fully-functional
>>> > environment in which they could then run all of the examples,
>>> > tutorials, etc.
>>> >
>>> > Getting to that level of packaging (and maintenance) is something
>>> > that I'm not sure we want to do since that is a better fit with
>>> > Bigtop and the efforts of Cloudera, Horton Works, MapR, etc. to
>>> > distribute Spark.
>>> >
>>> > On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen  wrote:
>>> >
>>> >> This is a straw poll to assess whether there is support to keep
>>> >> and fix, or remove, the Debian packaging-related config in Spark.
>>> >>
>>> >> I see several oldish outstanding JIRAs relating to problems in the
>>> >> packaging:
>>> >>
>>> >> https://issues.apache.org/jira/browse/SPARK-1799
>>> >> https://issues.apache.org/jira/browse/SPARK-2614
>>> >> https://issues.apache.org/jira/browse/SPARK-3624
>>> >> https://issues.apache.org/jira/browse/SPARK-4436
>>> >> (and a similar idea about making RPMs)
>>> >> https://issues.apache.org/jira/browse/SPARK-665
>>> >>
>>> >> The original motivation seems related to Chef:
>>> >>
>>> >>
>>> >>
>>> >> https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=
>>> >> 14070908&page=com.atlassian.jira.plugin.system.issuetabpanels:comm
>>> >> ent-tabpanel#comment-14070908
>>> >>
>>> >> Mark's recent comments cast some doubt on whether it is essential:
>>> >>
>>> >> https://github.com/apache/spark/pull/4277#issuecomment-72114226
>>> >>
>>> >> and in recent conversations I didn't hear dissent to the idea of
>>> >> removing this.
>>> >>
>>> >> Is this still useful enough to fix up? All else equal I'd like to
>>> >> start to walk back some of the complexity of the build, but I
>>> >> don't know how all-else-equal it is. Certainly, it sounds like
>>> >> nobody intends these to be used to actually deploy Spark.
>>> >>
>>> >> I don't doubt it's useful to someone, but can they maintain the
>>> >> packaging logic elsewhere?

New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Judy Nash
Hello,

Working on SPARK-5708 - Add 
Slf4jSink to Spark Metrics Sink.

Wrote a new Slf4jSink class (see patch attached), but the new class is not 
packaged as part of spark-assembly jar.

Do I need to update build config somewhere to have this packaged?

Current packaged class:
[screenshot attachment omitted]

Thought I must have missed something basic but can't figure out why.

Thanks!
Judy
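
(The patch itself isn't inlined in this archive. Purely as an illustration of the kind of sink being described -- assuming Spark's Sink trait exposes start/stop/report and that the Dropwizard Slf4jReporter is on the classpath; the names, property key, and constructor shape below are guesses, not the actual SPARK-5708 patch.)

~~~
package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{MetricRegistry, Slf4jReporter}

import org.apache.spark.SecurityManager

// Illustrative sketch only: periodically dumps all registered metrics to SLF4J.
private[spark] class Slf4jSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  // Poll period, e.g. "*.sink.slf4j.period=10" in conf/metrics.properties (assumed key).
  private val pollPeriod =
    Option(property.getProperty("period")).map(_.toInt).getOrElse(10)

  private val reporter = Slf4jReporter.forRegistry(registry)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .convertRatesTo(TimeUnit.SECONDS)
    .build()

  override def start(): Unit = reporter.start(pollPeriod, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}
~~~

If a class like this lives under core's metrics sink package, a plain mvn package should put it into the assembly jar, as noted elsewhere in this thread.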

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

Re: Mail to u...@spark.apache.org failing

2015-02-09 Thread Patrick Wendell
Ah - we should update it to suggest mailing the dev@ list (and if
there is enough traffic maybe do something else).

I'm happy to add you if you can give an organization name, URL, a list
of which Spark components you are using, and a short description of
your use case.

On Mon, Feb 9, 2015 at 9:00 PM, Meethu Mathew  wrote:
> Hi,
>
> The mail id given in
> https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark seems to
> be failing. Can anyone tell me how to get added to Powered By Spark list?
>
> --
>
> Regards,
>
> *Meethu*

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Mail to u...@spark.apache.org failing

2015-02-09 Thread Meethu Mathew

Hi,

The mail id given in 
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark seems 
to be failing. Can anyone tell me how to get added to Powered By Spark list?


--

Regards,

*Meethu*


Re: Pull Requests on github

2015-02-09 Thread Andrew Ash
Sam, I see your PR was merged -- many thanks for sending it in and getting
it merged!

In general for future reference, the most effective way to contribute is
outlined on this wiki page:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

On Mon, Feb 9, 2015 at 1:04 AM, Akhil Das 
wrote:

> You can open a Jira issue pointing this PR to get it processed faster. :)
>
> Thanks
> Best Regards
>
> On Sat, Feb 7, 2015 at 7:07 AM, fommil  wrote:
>
> > Hi all,
> >
> > I'm the author of netlib-java and I noticed that the documentation in
> MLlib
> > was out of date and misleading, so I submitted a pull request on github
> > which will hopefully make things easier for everybody to understand the
> > benefits of system optimised natives and how to use them :-)
> >
> >   https://github.com/apache/spark/pull/4448
> >
> > However, it looks like there are a *lot* of outstanding PRs and that this
> > is
> > just a mirror repository.
> >
> > Will somebody please look at my PR and merge into the canonical source
> (and
> > let me know)?
> >
> > Best regards,
> > Sam
> >
> >
> >
> > --
> > View this message in context:
> >
> http://apache-spark-developers-list.1001551.n3.nabble.com/Pull-Requests-on-github-tp10502.html
> > Sent from the Apache Spark Developers List mailing list archive at
> > Nabble.com.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
> >
>


Re: Using CUDA within Spark / boosting linear algebra

2015-02-09 Thread Evan R. Sparks
Great - perhaps we can move this discussion off-list and onto a JIRA
ticket? (Here's one: https://issues.apache.org/jira/browse/SPARK-5705)

It seems like this is going to be somewhat exploratory for a while (and
there's probably only a handful of us who really care about fast linear
algebra!)

- Evan

On Mon, Feb 9, 2015 at 4:48 PM, Ulanov, Alexander 
wrote:

>  Hi Evan,
>
>
>
> Thank you for explanation and useful link. I am going to build OpenBLAS,
> link it with Netlib-java and perform benchmark again.
>
>
>
> Do I understand correctly that BIDMat binaries contain statically linked
> Intel MKL BLAS? It might be the reason why I am able to run BIDMat not
> having MKL BLAS installed on my server. If it is true, I wonder if it is OK
> because Intel sells this library. Nevertheless, it seems that in my case
> precompiled MKL BLAS performs better than precompiled OpenBLAS given that
> BIDMat and Netlib-java are supposed to be on par with JNI overheads.
>
>
>
> Though, it might be interesting to link Netlib-java with Intel MKL, as you
> suggested. I wonder, are John Canny (BIDMat) and Sam Halliday (Netlib-java)
> interested to compare their libraries.
>
>
>
> Best regards, Alexander
>
>
>
> *From:* Evan R. Sparks [mailto:evan.spa...@gmail.com]
> *Sent:* Friday, February 06, 2015 5:58 PM
>
> *To:* Ulanov, Alexander
> *Cc:* Joseph Bradley; dev@spark.apache.org
> *Subject:* Re: Using CUDA within Spark / boosting linear algebra
>
>
>
> I would build OpenBLAS yourself, since good BLAS performance comes from
> getting cache sizes, etc. set up correctly for your particular hardware -
> this is often a very tricky process (see, e.g. ATLAS), but we found that on
> relatively modern Xeon chips, OpenBLAS builds quickly and yields
> performance competitive with MKL.
>
>
>
> To make sure the right library is getting used, you have to make sure it's
> first on the search path - export LD_LIBRARY_PATH=/path/to/blas/library.so
> will do the trick here.
>
>
>
> For some examples of getting netlib-java setup on an ec2 node and some
> example benchmarking code we ran a while back, see:
> https://github.com/shivaram/matrix-bench
>
>
>
> In particular - build-openblas-ec2.sh shows you how to build the library
> and set up symlinks correctly, and scala/run-netlib.sh shows you how to get
> the path setup and get that library picked up by netlib-java.
>
>
>
> In this way - you could probably get cuBLAS set up to be used by
> netlib-java as well.
>
>
>
> - Evan
>
>
>
> On Fri, Feb 6, 2015 at 5:43 PM, Ulanov, Alexander 
> wrote:
>
>  Evan, could you elaborate on how to force BIDMat and netlib-java to
> force loading the right blas? For netlib, I there are few JVM flags, such
> as -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS, so I
> can force it to use Java implementation. Not sure I understand how to force
> use a specific blas (not specific wrapper for blas).
>
>
>
> Btw. I have installed openblas (yum install openblas), so I suppose that
> netlib is using it.
>
>
>
> *From:* Evan R. Sparks [mailto:evan.spa...@gmail.com]
> *Sent:* Friday, February 06, 2015 5:19 PM
> *To:* Ulanov, Alexander
> *Cc:* Joseph Bradley; dev@spark.apache.org
>
>
> *Subject:* Re: Using CUDA within Spark / boosting linear algebra
>
>
>
> Getting breeze to pick up the right blas library is critical for
> performance. I recommend using OpenBLAS (or MKL, if you already have it).
> It might make sense to force BIDMat to use the same underlying BLAS library
> as well.
>
>
>
> On Fri, Feb 6, 2015 at 4:42 PM, Ulanov, Alexander 
> wrote:
>
> Hi Evan, Joseph
>
> I did few matrix multiplication test and BIDMat seems to be ~10x faster
> than netlib-java+breeze (sorry for weird table formatting):
>
> |A*B  size | BIDMat MKL | Breeze+Netlib-java native_system_linux_x86-64|
> Breeze+Netlib-java f2jblas |
> +---+
> |100x100*100x100 | 0,00205596 | 0,03810324 | 0,002556 |
> |1000x1000*1000x1000 | 0,018320947 | 0,51803557 |1,638475459 |
> |1x1*1x1 | 23,78046632 | 445,0935211 | 1569,233228 |
>
> Configuration: Intel(R) Xeon(R) CPU E31240 3.3 GHz, 6GB RAM, Fedora 19
> Linux, Scala 2.11.
>
> Later I will make tests with Cuda. I need to install new Cuda version for
> this purpose.
>
> Do you have any ideas why breeze-netlib with native blas is so much slower
> than BIDMat MKL?
>
> Best regards, Alexander
>
> From: Joseph Bradley [mailto:jos...@databricks.com]
> Sent: Thursday, February 05, 2015 5:29 PM
> To: Ulanov, Alexander
> Cc: Evan R. Sparks; dev@spark.apache.org
>
> Subject: Re: Using CUDA within Spark / boosting linear algebra
>
> Hi Alexander,
>
> Using GPUs with Spark would be very exciting.  Small comment: Concerning
> your question earlier about keeping data stored on the GPU rather than
> having to move it between main memory and GPU memory on each iteration, I
> would guess this would be critical to getting good performance.  If you
> could do multiple local iterations before aggregating results, then the cost
> of data movement to the GPU could be amortized (and I believe that is done in
> practice). Having Spark be aware of the GPU and using it as another part of
> memory sounds like a much bigger undertaking.
>
> Joseph

Re: Using CUDA within Spark / boosting linear algebra

2015-02-09 Thread Chester @work
Maybe you can ask Prof. John Canny himself :-) as I invited him to give a talk
at Alpine Data Labs at March's meetup (SF Big Analytics & SF Machine Learning
joint meetup), 3/11. To be announced in the next day or so.

Chester

Sent from my iPhone

> On Feb 9, 2015, at 4:48 PM, "Ulanov, Alexander"  
> wrote:
> 
> Hi Evan,
> 
> Thank you for explanation and useful link. I am going to build OpenBLAS, link 
> it with Netlib-java and perform benchmark again.
> 
> Do I understand correctly that BIDMat binaries contain statically linked 
> Intel MKL BLAS? It might be the reason why I am able to run BIDMat not having 
> MKL BLAS installed on my server. If it is true, I wonder if it is OK because 
> Intel sells this library. Nevertheless, it seems that in my case precompiled 
> MKL BLAS performs better than precompiled OpenBLAS given that BIDMat and 
> Netlib-java are supposed to be on par with JNI overheads.
> 
> Though, it might be interesting to link Netlib-java with Intel MKL, as you 
> suggested. I wonder, are John Canny (BIDMat) and Sam Halliday (Netlib-java) 
> interested to compare their libraries.
> 
> Best regards, Alexander
> 
> From: Evan R. Sparks [mailto:evan.spa...@gmail.com]
> Sent: Friday, February 06, 2015 5:58 PM
> To: Ulanov, Alexander
> Cc: Joseph Bradley; dev@spark.apache.org
> Subject: Re: Using CUDA within Spark / boosting linear algebra
> 
> I would build OpenBLAS yourself, since good BLAS performance comes from 
> getting cache sizes, etc. set up correctly for your particular hardware - 
> this is often a very tricky process (see, e.g. ATLAS), but we found that on 
> relatively modern Xeon chips, OpenBLAS builds quickly and yields performance 
> competitive with MKL.
> 
> To make sure the right library is getting used, you have to make sure it's 
> first on the search path - export LD_LIBRARY_PATH=/path/to/blas/library.so 
> will do the trick here.
> 
> For some examples of getting netlib-java setup on an ec2 node and some 
> example benchmarking code we ran a while back, see: 
> https://github.com/shivaram/matrix-bench
> 
> In particular - build-openblas-ec2.sh shows you how to build the library and 
> set up symlinks correctly, and scala/run-netlib.sh shows you how to get the 
> path setup and get that library picked up by netlib-java.
> 
> In this way - you could probably get cuBLAS set up to be used by netlib-java 
> as well.
> 
> - Evan
> 
> On Fri, Feb 6, 2015 at 5:43 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
> Evan, could you elaborate on how to force BIDMat and netlib-java to force 
> loading the right blas? For netlib, I there are few JVM flags, such as 
> -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS, so I can 
> force it to use Java implementation. Not sure I understand how to force use a 
> specific blas (not specific wrapper for blas).
> 
> Btw. I have installed openblas (yum install openblas), so I suppose that 
> netlib is using it.
> 
> From: Evan R. Sparks 
> [mailto:evan.spa...@gmail.com]
> Sent: Friday, February 06, 2015 5:19 PM
> To: Ulanov, Alexander
> Cc: Joseph Bradley; dev@spark.apache.org
> 
> Subject: Re: Using CUDA within Spark / boosting linear algebra
> 
> Getting breeze to pick up the right blas library is critical for performance. 
> I recommend using OpenBLAS (or MKL, if you already have it). It might make 
> sense to force BIDMat to use the same underlying BLAS library as well.
> 
> On Fri, Feb 6, 2015 at 4:42 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
> Hi Evan, Joseph
> 
> I did few matrix multiplication test and BIDMat seems to be ~10x faster than 
> netlib-java+breeze (sorry for weird table formatting):
> 
> |A*B  size | BIDMat MKL | Breeze+Netlib-java native_system_linux_x86-64| 
> Breeze+Netlib-java f2jblas |
> +---+
> |100x100*100x100 | 0,00205596 | 0,03810324 | 0,002556 |
> |1000x1000*1000x1000 | 0,018320947 | 0,51803557 |1,638475459 |
> |1x1*1x1 | 23,78046632 | 445,0935211 | 1569,233228 |
> 
> Configuration: Intel(R) Xeon(R) CPU E31240 3.3 GHz, 6GB RAM, Fedora 19 Linux, 
> Scala 2.11.
> 
> Later I will make tests with Cuda. I need to install new Cuda version for 
> this purpose.
> 
> Do you have any ideas why breeze-netlib with native blas is so much slower 
> than BIDMat MKL?
> 
> Best regards, Alexander
> 
> From: Joseph Bradley 
> [mailto:jos...@databricks.com]
> Sent: Thursday, February 05, 2015 5:29 PM
> To: Ulanov, Alexander
> Cc: Evan R. Sparks; dev@spark.apache.org
> Subject: Re: Using CUDA within Spark / boosting linear algebra
> 
> Hi Alexander,
> 
> Using GPUs with Spark would be very exciting.  Small comment: Concerning your 
> question earlier about keeping data stored on the GPU rather than having to 
> move it between main memory and GPU memory on each iteration, I would guess
> this would be critical to getting good performance. If you could do multiple
> local iterations before aggregating results, then the cost of data movement to
> the GPU could be amortized (and I believe that is done in practice). Having
> Spark be aware of the GPU and using it as another part of memory sounds like a
> much bigger undertaking.
>
> Joseph

adding some temporary jenkins worker nodes...

2015-02-09 Thread shane knapp
...to help w/the build backlog.  let's all welcome
amp-jenkins-slave-{01..03} back to the fray!


RE: Using CUDA within Spark / boosting linear algebra

2015-02-09 Thread Ulanov, Alexander
Hi Evan,

Thank you for explanation and useful link. I am going to build OpenBLAS, link 
it with Netlib-java and perform benchmark again.

Do I understand correctly that BIDMat binaries contain statically linked Intel 
MKL BLAS? It might be the reason why I am able to run BIDMat not having MKL 
BLAS installed on my server. If it is true, I wonder if it is OK because Intel 
sells this library. Nevertheless, it seems that in my case precompiled MKL BLAS 
performs better than precompiled OpenBLAS given that BIDMat and Netlib-java are 
supposed to be on par with JNI overheads.

Though, it might be interesting to link Netlib-java with Intel MKL, as you 
suggested. I wonder, are John Canny (BIDMat) and Sam Halliday (Netlib-java) 
interested to compare their libraries.

Best regards, Alexander

From: Evan R. Sparks [mailto:evan.spa...@gmail.com]
Sent: Friday, February 06, 2015 5:58 PM
To: Ulanov, Alexander
Cc: Joseph Bradley; dev@spark.apache.org
Subject: Re: Using CUDA within Spark / boosting linear algebra

I would build OpenBLAS yourself, since good BLAS performance comes from getting 
cache sizes, etc. set up correctly for your particular hardware - this is often 
a very tricky process (see, e.g. ATLAS), but we found that on relatively modern 
Xeon chips, OpenBLAS builds quickly and yields performance competitive with MKL.

To make sure the right library is getting used, you have to make sure it's 
first on the search path - export LD_LIBRARY_PATH=/path/to/blas/library.so will 
do the trick here.

For some examples of getting netlib-java setup on an ec2 node and some example 
benchmarking code we ran a while back, see: 
https://github.com/shivaram/matrix-bench

In particular - build-openblas-ec2.sh shows you how to build the library and 
set up symlinks correctly, and scala/run-netlib.sh shows you how to get the 
path setup and get that library picked up by netlib-java.

In this way - you could probably get cuBLAS set up to be used by netlib-java as 
well.

- Evan

On Fri, Feb 6, 2015 at 5:43 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
Evan, could you elaborate on how to force BIDMat and netlib-java to load
the right BLAS? For netlib, there are a few JVM flags, such as
-Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS, so I can
force it to use the Java implementation. I am not sure I understand how to force
the use of a specific BLAS (as opposed to a specific wrapper for BLAS).
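
(As a quick runtime sanity check -- not from the original thread, just a sketch using netlib-java's standard entry points -- something like this prints which implementation actually got loaded, so the effect of those -D flags and of LD_LIBRARY_PATH is visible.)

~~~
import com.github.fommil.netlib.{BLAS, LAPACK}

object WhichBlas {
  def main(args: Array[String]): Unit = {
    // Prints e.g. ...NativeSystemBLAS, ...NativeRefBLAS or ...F2jBLAS,
    // depending on what netlib-java managed to load on this machine.
    println("BLAS:   " + BLAS.getInstance().getClass.getName)
    println("LAPACK: " + LAPACK.getInstance().getClass.getName)
  }
}
~~~
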

Btw. I have installed openblas (yum install openblas), so I suppose that netlib 
is using it.

From: Evan R. Sparks 
[mailto:evan.spa...@gmail.com]
Sent: Friday, February 06, 2015 5:19 PM
To: Ulanov, Alexander
Cc: Joseph Bradley; dev@spark.apache.org

Subject: Re: Using CUDA within Spark / boosting linear algebra

Getting breeze to pick up the right blas library is critical for performance. I 
recommend using OpenBLAS (or MKL, if you already have it). It might make sense 
to force BIDMat to use the same underlying BLAS library as well.

On Fri, Feb 6, 2015 at 4:42 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
Hi Evan, Joseph

I did a few matrix multiplication tests and BIDMat seems to be ~10x faster than
netlib-java+breeze (sorry for weird table formatting):

| A*B size             | BIDMat MKL  | Breeze+Netlib-java native_system_linux_x86-64 | Breeze+Netlib-java f2jblas |
|-----------------------|-------------|------------------------------------------------|----------------------------|
| 100x100*100x100       | 0,00205596  | 0,03810324                                     | 0,002556                   |
| 1000x1000*1000x1000   | 0,018320947 | 0,51803557                                     | 1,638475459                |
| 1x1*1x1               | 23,78046632 | 445,0935211                                    | 1569,233228                |

Configuration: Intel(R) Xeon(R) CPU E31240 3.3 GHz, 6GB RAM, Fedora 19 Linux, 
Scala 2.11.
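
(For anyone wanting to reproduce numbers of this shape, a minimal harness along these lines is enough -- the object and helper names here are invented, and timings will of course depend on which BLAS netlib-java picked up.)

~~~
import breeze.linalg._

object MatMulBench {
  // Crude single-shot timer; a real benchmark should warm up the JIT first.
  def time[A](label: String)(block: => A): A = {
    val start = System.nanoTime()
    val result = block
    println(f"$label: ${(System.nanoTime() - start) / 1e9}%.6f s")
    result
  }

  def main(args: Array[String]): Unit = {
    val n = if (args.nonEmpty) args(0).toInt else 1000
    val a = DenseMatrix.rand(n, n)
    val b = DenseMatrix.rand(n, n)
    // Breeze dispatches this multiply to whatever BLAS netlib-java loaded.
    time(s"${n}x$n * ${n}x$n")(a * b)
  }
}
~~~
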

Later I will make tests with Cuda. I need to install new Cuda version for this 
purpose.

Do you have any ideas why breeze-netlib with native blas is so much slower than 
BIDMat MKL?

Best regards, Alexander

From: Joseph Bradley 
[mailto:jos...@databricks.com]
Sent: Thursday, February 05, 2015 5:29 PM
To: Ulanov, Alexander
Cc: Evan R. Sparks; dev@spark.apache.org
Subject: Re: Using CUDA within Spark / boosting linear algebra

Hi Alexander,

Using GPUs with Spark would be very exciting.  Small comment: Concerning your 
question earlier about keeping data stored on the GPU rather than having to 
move it between main memory and GPU memory on each iteration, I would guess 
this would be critical to getting good performance.  If you could do multiple 
local iterations before aggregating results, then the cost of data movement to 
the GPU could be amortized (and I believe that is done in practice).  Having 
Spark be aware of the GPU and using it as another part of memory sounds like a 
much bigger undertaking.

Joseph
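
(A back-of-envelope way to see the amortization point above, with purely illustrative numbers.)

~~~
// With host<->GPU transfer cost t and per-iteration GPU compute cost c, doing
// k local iterations per transfer costs t + k*c instead of k*(t + c), so the
// transfer overhead per iteration shrinks like t/k.
def amortizedCostPerIteration(t: Double, c: Double, k: Int): Double = (t + k * c) / k

// e.g. t = 50 ms, c = 5 ms: k = 1 gives 55.0 ms/iteration, k = 10 gives 10.0 ms/iteration.
~~~
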

On Thu, Feb 5, 2015 at 4:59 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
Thank you for explanation! I’ve watched the BIDMach presentation by John Canny 

RE: Keep or remove Debian packaging in Spark?

2015-02-09 Thread nate
This could be something: if the Spark community wanted not to maintain debs/rpms
directly via the project, it could direct interested efforts towards Apache Bigtop.
Right now debs/rpms of Bigtop components, as well as related tests, are a focus.

Something that would be great is if at least one Spark committer with interests
in config/pkg/testing could be a liaison and point of contact for Bigtop efforts.

Right now the focus is on Bigtop 0.9, which currently includes Spark 1.2. The JIRA for
items included in 0.9 can be found here:

https://issues.apache.org/jira/browse/BIGTOP-1480



-Original Message-
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Monday, February 9, 2015 3:52 PM
To: Nicholas Chammas
Cc: Patrick Wendell; Mark Hamstra; dev
Subject: Re: Keep or remove Debian packaging in Spark?

What about this straw man proposal: deprecate in 1.3 with some kind of message 
in the build, and remove for 1.4? And add a pointer to any third-party 
packaging that might provide similar functionality?

On Mon, Feb 9, 2015 at 6:47 PM, Nicholas Chammas  
wrote:
> +1 to an "official" deprecation + redirecting users to some other 
> +project
> that will or already is taking this on.
>
> Nate?
>
>
>
> On Mon Feb 09 2015 at 10:08:27 AM Patrick Wendell 
> wrote:
>>
>> I have wondered whether we should sort of deprecated it more 
>> officially, since otherwise I think people have the reasonable 
>> expectation based on the current code that Spark intends to support 
>> "complete" Debian packaging as part of the upstream build. Having 
>> something that's sort-of maintained but no one is helping review and 
>> merge patches on it or make it fully functional, IMO that doesn't 
>> benefit us or our users. There are a bunch of other projects that are 
>> specifically devoted to packaging, so it seems like there is a clear 
>> separation of concerns here.
>>
>> On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra 
>> 
>> wrote:
>> >>
>> >> it sounds like nobody intends these to be used to actually deploy 
>> >> Spark
>> >
>> >
>> > I wouldn't go quite that far.  What we have now can serve as useful 
>> > input to a deployment tool like Chef, but the user is then going to 
>> > need to add some customization or configuration within the context 
>> > of that tooling to get Spark installed just the way they want.  So 
>> > it is not so much that the current Debian packaging can't be used 
>> > as that it has never really been intended to be a completely 
>> > finished product that a newcomer could, for example, use to install 
>> > Spark completely and quickly to Ubuntu and have a fully-functional 
>> > environment in which they could then run all of the examples, 
>> > tutorials, etc.
>> >
>> > Getting to that level of packaging (and maintenance) is something 
>> > that I'm not sure we want to do since that is a better fit with 
>> > Bigtop and the efforts of Cloudera, Horton Works, MapR, etc. to 
>> > distribute Spark.
>> >
>> > On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen  wrote:
>> >
>> >> This is a straw poll to assess whether there is support to keep 
>> >> and fix, or remove, the Debian packaging-related config in Spark.
>> >>
>> >> I see several oldish outstanding JIRAs relating to problems in the
>> >> packaging:
>> >>
>> >> https://issues.apache.org/jira/browse/SPARK-1799
>> >> https://issues.apache.org/jira/browse/SPARK-2614
>> >> https://issues.apache.org/jira/browse/SPARK-3624
>> >> https://issues.apache.org/jira/browse/SPARK-4436
>> >> (and a similar idea about making RPMs)
>> >> https://issues.apache.org/jira/browse/SPARK-665
>> >>
>> >> The original motivation seems related to Chef:
>> >>
>> >>
>> >>
>> >> https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=
>> >> 14070908&page=com.atlassian.jira.plugin.system.issuetabpanels:comm
>> >> ent-tabpanel#comment-14070908
>> >>
>> >> Mark's recent comments cast some doubt on whether it is essential:
>> >>
>> >> https://github.com/apache/spark/pull/4277#issuecomment-72114226
>> >>
>> >> and in recent conversations I didn't hear dissent to the idea of 
>> >> removing this.
>> >>
>> >> Is this still useful enough to fix up? All else equal I'd like to 
>> >> start to walk back some of the complexity of the build, but I 
>> >> don't know how all-else-equal it is. Certainly, it sounds like 
>> >> nobody intends these to be used to actually deploy Spark.
>> >>
>> >> I don't doubt it's useful to someone, but can they maintain the 
>> >> packaging logic elsewhere?
>> >>
>> >> --
>> >> --- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For 
>> >> additional commands, e-mail: dev-h...@spark.apache.org
>> >>
>> >>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For 
>> additional commands, e-mail: dev-h...@spark.apache.org
>>
>


Re: spark-ec2 licensing clarification

2015-02-09 Thread Shivaram Venkataraman
+spark dev list

Yes, we should add an Apache license to it -- Feel free to open a PR for
it. BTW though it is a part of the mesos github account, it is almost
exclusively used by the Spark Project AFAIK.

Longer term it may make sense to move it to a more appropriate github
account (we could move it to amplab/ for instance as the AMPLab provides
Jenkins support etc. too)

Thanks
Shivaram

On Mon, Feb 9, 2015 at 3:26 PM, Florian Verhein  wrote:

> Hi guys,
>
> Are there any plans to add licensing information to the mesos/spark-ec2
> repo?
> I'd assumed it would be Apache 2.0 but then noticed there's no info in the
> repo.
>
> Background:
> https://issues.apache.org/jira/browse/SPARK-5676
>
> Regards,
>Florian
>
>
>


Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Sean Owen
What about this straw man proposal: deprecate in 1.3 with some kind of
message in the build, and remove for 1.4? And add a pointer to any
third-party packaging that might provide similar functionality?

On Mon, Feb 9, 2015 at 6:47 PM, Nicholas Chammas
 wrote:
> +1 to an "official" deprecation + redirecting users to some other project
> that will or already is taking this on.
>
> Nate?
>
>
>
> On Mon Feb 09 2015 at 10:08:27 AM Patrick Wendell 
> wrote:
>>
>> I have wondered whether we should sort of deprecated it more
>> officially, since otherwise I think people have the reasonable
>> expectation based on the current code that Spark intends to support
>> "complete" Debian packaging as part of the upstream build. Having
>> something that's sort-of maintained but no one is helping review and
>> merge patches on it or make it fully functional, IMO that doesn't
>> benefit us or our users. There are a bunch of other projects that are
>> specifically devoted to packaging, so it seems like there is a clear
>> separation of concerns here.
>>
>> On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra 
>> wrote:
>> >>
>> >> it sounds like nobody intends these to be used to actually deploy Spark
>> >
>> >
>> > I wouldn't go quite that far.  What we have now can serve as useful
>> > input
>> > to a deployment tool like Chef, but the user is then going to need to
>> > add
>> > some customization or configuration within the context of that tooling
>> > to
>> > get Spark installed just the way they want.  So it is not so much that
>> > the
>> > current Debian packaging can't be used as that it has never really been
>> > intended to be a completely finished product that a newcomer could, for
>> > example, use to install Spark completely and quickly to Ubuntu and have
>> > a
>> > fully-functional environment in which they could then run all of the
>> > examples, tutorials, etc.
>> >
>> > Getting to that level of packaging (and maintenance) is something that
>> > I'm
>> > not sure we want to do since that is a better fit with Bigtop and the
>> > efforts of Cloudera, Horton Works, MapR, etc. to distribute Spark.
>> >
>> > On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen  wrote:
>> >
>> >> This is a straw poll to assess whether there is support to keep and
>> >> fix, or remove, the Debian packaging-related config in Spark.
>> >>
>> >> I see several oldish outstanding JIRAs relating to problems in the
>> >> packaging:
>> >>
>> >> https://issues.apache.org/jira/browse/SPARK-1799
>> >> https://issues.apache.org/jira/browse/SPARK-2614
>> >> https://issues.apache.org/jira/browse/SPARK-3624
>> >> https://issues.apache.org/jira/browse/SPARK-4436
>> >> (and a similar idea about making RPMs)
>> >> https://issues.apache.org/jira/browse/SPARK-665
>> >>
>> >> The original motivation seems related to Chef:
>> >>
>> >>
>> >>
>> >> https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908
>> >>
>> >> Mark's recent comments cast some doubt on whether it is essential:
>> >>
>> >> https://github.com/apache/spark/pull/4277#issuecomment-72114226
>> >>
>> >> and in recent conversations I didn't hear dissent to the idea of
>> >> removing
>> >> this.
>> >>
>> >> Is this still useful enough to fix up? All else equal I'd like to
>> >> start to walk back some of the complexity of the build, but I don't
>> >> know how all-else-equal it is. Certainly, it sounds like nobody
>> >> intends these to be used to actually deploy Spark.
>> >>
>> >> I don't doubt it's useful to someone, but can they maintain the
>> >> packaging logic elsewhere?
>> >>
>> >> -
>> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >> For additional commands, e-mail: dev-h...@spark.apache.org
>> >>
>> >>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: multi-line comment style

2015-02-09 Thread Sandy Ryza
+1 to what Andrew said, I think both make sense in different situations and
trusting developer discretion here is reasonable.

On Mon, Feb 9, 2015 at 1:48 PM, Andrew Or  wrote:

> In my experience I find it much more natural to use // for short multi-line
> comments (2 or 3 lines), and /* */ for long multi-line comments involving
> one or more paragraphs. For short multi-line comments, there is no reason
> not to use // if it just so happens that your first line exceeded 100
> characters and you have to wrap it. For long multi-line comments, however,
> using // all the way looks really awkward especially if you have multiple
> paragraphs.
>
> Thus, I would actually suggest that we don't try to pick a favorite and
> document that both are acceptable. I don't expect developers to follow my
> exact usage (i.e. with a tipping point of 2-3 lines) so I wouldn't enforce
> anything specific either.
>
> 2015-02-09 13:36 GMT-08:00 Reynold Xin :
>
> > Why don't we just pick // as the default (by encouraging it in the style
> > guide), since it is mostly used, and then do not disallow /* */? I don't
> > think it is that big of a deal to have slightly deviations here since it
> is
> > dead simple to understand what's going on.
> >
> >
> > On Mon, Feb 9, 2015 at 1:33 PM, Patrick Wendell 
> > wrote:
> >
> > > Clearly there isn't a strictly optimal commenting format (pro's and
> > > cons for both '//' and '/*'). My thought is for consistency we should
> > > just chose one and put in the style guide.
> > >
> > > On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng 
> wrote:
> > > > Btw, I think allowing `/* ... */` without the leading `*` in lines is
> > > > also useful. Check this line:
> > > >
> > >
> >
> https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55
> > > ,
> > > > where we put the R commands that can reproduce the test result. It is
> > > > easier if we write in the following style:
> > > >
> > > > ~~~
> > > > /*
> > > >  Using the following R code to load the data and train the model
> using
> > > > glmnet package.
> > > >
> > > >  library("glmnet")
> > > >  data <- read.csv("path", header=FALSE, stringsAsFactors=FALSE)
> > > >  features <- as.matrix(data.frame(as.numeric(data$V2),
> > > as.numeric(data$V3)))
> > > >  label <- as.numeric(data$V1)
> > > >  weights <- coef(glmnet(features, label, family="gaussian", alpha =
> 0,
> > > > lambda = 0))
> > > >  */
> > > > ~~~
> > > >
> > > > So people can copy & paste the R commands directly.
> > > >
> > > > Xiangrui
> > > >
> > > > On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng 
> > wrote:
> > > >> I like the `/* .. */` style more. Because it is easier for IDEs to
> > > >> recognize it as a block comment. If you press enter in the comment
> > > >> block with the `//` style, IDEs won't add `//` for you. -Xiangrui
> > > >>
> > > >> On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin 
> > > wrote:
> > > >>> We should update the style doc to reflect what we have in most
> places
> > > >>> (which I think is //).
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman <
> > > >>> shiva...@eecs.berkeley.edu> wrote:
> > > >>>
> > >  FWIW I like the multi-line // over /* */ from a purely style
> > > standpoint.
> > >  The Google Java style guide[1] has some comment about code
> > formatting
> > > tools
> > >  working better with /* */ but there doesn't seem to be any strong
> > > arguments
> > >  for one over the other I can find
> > > 
> > >  Thanks
> > >  Shivaram
> > > 
> > >  [1]
> > > 
> > > 
> > >
> >
> https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style
> > > 
> > >  On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell <
> pwend...@gmail.com
> > >
> > >  wrote:
> > > 
> > >  > Personally I have no opinion, but agree it would be nice to
> > > standardize.
> > >  >
> > >  > - Patrick
> > >  >
> > >  > On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen 
> > > wrote:
> > >  > > One thing Marcelo pointed out to me is that the // style does
> > not
> > >  > > interfere with commenting out blocks of code with /* */, which
> > is
> > > a
> > >  > > small good thing. I am also accustomed to // style for
> > multiline,
> > > and
> > >  > > reserve /** */ for javadoc / scaladoc. Meaning, seeing the /*
> */
> > > style
> > >  > > inline always looks a little funny to me.
> > >  > >
> > >  > > On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout <
> > >  kayousterh...@gmail.com>
> > >  > wrote:
> > >  > >> Hi all,
> > >  > >>
> > >  > >> The Spark Style Guide
> > >  > >> <
> > >  >
> > >
> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
> > >  >
> > >  > >> says multi-line comments should formatted as:
> > >  > >>
> > >  > >> /*
> > >  > >>  * This is a
> > >  > >>  * very
> > >  > >>  * long comment.
> > >  >

Re: multi-line comment style

2015-02-09 Thread Andrew Or
In my experience I find it much more natural to use // for short multi-line
comments (2 or 3 lines), and /* */ for long multi-line comments involving
one or more paragraphs. For short multi-line comments, there is no reason
not to use // if it just so happens that your first line exceeded 100
characters and you have to wrap it. For long multi-line comments, however,
using // all the way looks really awkward especially if you have multiple
paragraphs.

Thus, I would actually suggest that we don't try to pick a favorite and
document that both are acceptable. I don't expect developers to follow my
exact usage (i.e. with a tipping point of 2-3 lines) so I wouldn't enforce
anything specific either.
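
To put the two shapes being compared side by side (contents invented, only the formatting matters):

~~~
// A short note that happens to wrap: the first line ran past 100 characters,
// so it simply continues on a second // line.

/*
 * A longer, multi-paragraph discussion reads better as a block comment, since
 * the asterisk gutter keeps the prose visually separate from surrounding code.
 *
 * Second paragraph here.
 */
~~~
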

2015-02-09 13:36 GMT-08:00 Reynold Xin :

> Why don't we just pick // as the default (by encouraging it in the style
> guide), since it is mostly used, and then do not disallow /* */? I don't
> think it is that big of a deal to have slightly deviations here since it is
> dead simple to understand what's going on.
>
>
> On Mon, Feb 9, 2015 at 1:33 PM, Patrick Wendell 
> wrote:
>
> > Clearly there isn't a strictly optimal commenting format (pro's and
> > cons for both '//' and '/*'). My thought is for consistency we should
> > just chose one and put in the style guide.
> >
> > On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng  wrote:
> > > Btw, I think allowing `/* ... */` without the leading `*` in lines is
> > > also useful. Check this line:
> > >
> >
> https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55
> > ,
> > > where we put the R commands that can reproduce the test result. It is
> > > easier if we write in the following style:
> > >
> > > ~~~
> > > /*
> > >  Using the following R code to load the data and train the model using
> > > glmnet package.
> > >
> > >  library("glmnet")
> > >  data <- read.csv("path", header=FALSE, stringsAsFactors=FALSE)
> > >  features <- as.matrix(data.frame(as.numeric(data$V2),
> > as.numeric(data$V3)))
> > >  label <- as.numeric(data$V1)
> > >  weights <- coef(glmnet(features, label, family="gaussian", alpha = 0,
> > > lambda = 0))
> > >  */
> > > ~~~
> > >
> > > So people can copy & paste the R commands directly.
> > >
> > > Xiangrui
> > >
> > > On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng 
> wrote:
> > >> I like the `/* .. */` style more. Because it is easier for IDEs to
> > >> recognize it as a block comment. If you press enter in the comment
> > >> block with the `//` style, IDEs won't add `//` for you. -Xiangrui
> > >>
> > >> On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin 
> > wrote:
> > >>> We should update the style doc to reflect what we have in most places
> > >>> (which I think is //).
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman <
> > >>> shiva...@eecs.berkeley.edu> wrote:
> > >>>
> >  FWIW I like the multi-line // over /* */ from a purely style
> > standpoint.
> >  The Google Java style guide[1] has some comment about code
> formatting
> > tools
> >  working better with /* */ but there doesn't seem to be any strong
> > arguments
> >  for one over the other I can find
> > 
> >  Thanks
> >  Shivaram
> > 
> >  [1]
> > 
> > 
> >
> https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style
> > 
> >  On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell  >
> >  wrote:
> > 
> >  > Personally I have no opinion, but agree it would be nice to
> > standardize.
> >  >
> >  > - Patrick
> >  >
> >  > On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen 
> > wrote:
> >  > > One thing Marcelo pointed out to me is that the // style does
> not
> >  > > interfere with commenting out blocks of code with /* */, which
> is
> > a
> >  > > small good thing. I am also accustomed to // style for
> multiline,
> > and
> >  > > reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */
> > style
> >  > > inline always looks a little funny to me.
> >  > >
> >  > > On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout <
> >  kayousterh...@gmail.com>
> >  > wrote:
> >  > >> Hi all,
> >  > >>
> >  > >> The Spark Style Guide
> >  > >> <
> >  >
> > https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
> >  >
> >  > >> says multi-line comments should formatted as:
> >  > >>
> >  > >> /*
> >  > >>  * This is a
> >  > >>  * very
> >  > >>  * long comment.
> >  > >>  */
> >  > >>
> >  > >> But in my experience, we almost always use "//" for multi-line
> >  comments:
> >  > >>
> >  > >> // This is a
> >  > >> // very
> >  > >> // long comment.
> >  > >>
> >  > >> Here are some examples:
> >  > >>
> >  > >>- Recent commit by Reynold, king of style:
> >  > >>
> >  >
> > 
> >
> https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f64

Re: multi-line comment style

2015-02-09 Thread Reynold Xin
Why don't we just pick // as the default (by encouraging it in the style
guide), since it is mostly used, and then do not disallow /* */? I don't
think it is that big of a deal to have slight deviations here since it is
dead simple to understand what's going on.


On Mon, Feb 9, 2015 at 1:33 PM, Patrick Wendell  wrote:

> Clearly there isn't a strictly optimal commenting format (pro's and
> cons for both '//' and '/*'). My thought is for consistency we should
> just chose one and put in the style guide.
>
> On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng  wrote:
> > Btw, I think allowing `/* ... */` without the leading `*` in lines is
> > also useful. Check this line:
> >
> https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55
> ,
> > where we put the R commands that can reproduce the test result. It is
> > easier if we write in the following style:
> >
> > ~~~
> > /*
> >  Using the following R code to load the data and train the model using
> > glmnet package.
> >
> >  library("glmnet")
> >  data <- read.csv("path", header=FALSE, stringsAsFactors=FALSE)
> >  features <- as.matrix(data.frame(as.numeric(data$V2),
> as.numeric(data$V3)))
> >  label <- as.numeric(data$V1)
> >  weights <- coef(glmnet(features, label, family="gaussian", alpha = 0,
> > lambda = 0))
> >  */
> > ~~~
> >
> > So people can copy & paste the R commands directly.
> >
> > Xiangrui
> >
> > On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng  wrote:
> >> I like the `/* .. */` style more. Because it is easier for IDEs to
> >> recognize it as a block comment. If you press enter in the comment
> >> block with the `//` style, IDEs won't add `//` for you. -Xiangrui
> >>
> >> On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin 
> wrote:
> >>> We should update the style doc to reflect what we have in most places
> >>> (which I think is //).
> >>>
> >>>
> >>>
> >>> On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman <
> >>> shiva...@eecs.berkeley.edu> wrote:
> >>>
>  FWIW I like the multi-line // over /* */ from a purely style
> standpoint.
>  The Google Java style guide[1] has some comment about code formatting
> tools
>  working better with /* */ but there doesn't seem to be any strong
> arguments
>  for one over the other I can find
> 
>  Thanks
>  Shivaram
> 
>  [1]
> 
> 
> https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style
> 
>  On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell 
>  wrote:
> 
>  > Personally I have no opinion, but agree it would be nice to
> standardize.
>  >
>  > - Patrick
>  >
>  > On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen 
> wrote:
>  > > One thing Marcelo pointed out to me is that the // style does not
>  > > interfere with commenting out blocks of code with /* */, which is
> a
>  > > small good thing. I am also accustomed to // style for multiline,
> and
>  > > reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */
> style
>  > > inline always looks a little funny to me.
>  > >
>  > > On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout <
>  kayousterh...@gmail.com>
>  > wrote:
>  > >> Hi all,
>  > >>
>  > >> The Spark Style Guide
>  > >> <
>  >
> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
>  >
>  > >> says multi-line comments should formatted as:
>  > >>
>  > >> /*
>  > >>  * This is a
>  > >>  * very
>  > >>  * long comment.
>  > >>  */
>  > >>
>  > >> But in my experience, we almost always use "//" for multi-line
>  comments:
>  > >>
>  > >> // This is a
>  > >> // very
>  > >> // long comment.
>  > >>
>  > >> Here are some examples:
>  > >>
>  > >>- Recent commit by Reynold, king of style:
>  > >>
>  >
> 
> https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
>  > >>- RDD.scala:
>  > >>
>  >
> 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
>  > >>- DAGScheduler.scala:
>  > >>
>  >
> 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
>  > >>
>  > >>
>  > >> Any objections to me updating the style guide to reflect this?
> As
>  with
>  > >> other style issues, I think consistency here is helpful (and
>  formatting
>  > >> multi-line comments as "//" does nicely visually distinguish code
>  > comments
>  > >> from doc comments).
>  > >>
>  > >> -Kay
>  > >
>  > >
> -
>  > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>  > > For additional commands, e-mail: dev-h...@spark.apache.org
>  > >
>  >
>  >
> 

Re: multi-line comment style

2015-02-09 Thread Patrick Wendell
Clearly there isn't a strictly optimal commenting format (pros and
cons for both '//' and '/*'). My thought is that for consistency we should
just choose one and put it in the style guide.

On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng  wrote:
> Btw, I think allowing `/* ... */` without the leading `*` in lines is
> also useful. Check this line:
> https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55,
> where we put the R commands that can reproduce the test result. It is
> easier if we write in the following style:
>
> ~~~
> /*
>  Using the following R code to load the data and train the model using
> glmnet package.
>
>  library("glmnet")
>  data <- read.csv("path", header=FALSE, stringsAsFactors=FALSE)
>  features <- as.matrix(data.frame(as.numeric(data$V2), as.numeric(data$V3)))
>  label <- as.numeric(data$V1)
>  weights <- coef(glmnet(features, label, family="gaussian", alpha = 0,
> lambda = 0))
>  */
> ~~~
>
> So people can copy & paste the R commands directly.
>
> Xiangrui
>
> On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng  wrote:
>> I like the `/* .. */` style more, because it is easier for IDEs to
>> recognize it as a block comment. If you press Enter in a comment
>> block with the `//` style, IDEs won't add `//` for you. -Xiangrui
>>
>> On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin  wrote:
>>> We should update the style doc to reflect what we have in most places
>>> (which I think is //).
>>>
>>>
>>>
>>> On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman <
>>> shiva...@eecs.berkeley.edu> wrote:
>>>
 FWIW I like the multi-line // over /* */ from a purely style standpoint.
 The Google Java style guide[1] has some comment about code formatting tools
 working better with /* */ but there doesn't seem to be any strong arguments
 for one over the other I can find

 Thanks
 Shivaram

 [1]

 https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style

 On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell 
 wrote:

 > Personally I have no opinion, but agree it would be nice to standardize.
 >
 > - Patrick
 >
 > On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen  wrote:
 > > One thing Marcelo pointed out to me is that the // style does not
 > > interfere with commenting out blocks of code with /* */, which is a
 > > small good thing. I am also accustomed to // style for multiline, and
 > > reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */ style
 > > inline always looks a little funny to me.
 > >
 > > On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout <
 kayousterh...@gmail.com>
 > wrote:
 > >> Hi all,
 > >>
 > >> The Spark Style Guide
 > >> <
 > https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
 >
 > >> says multi-line comments should be formatted as:
 > >>
 > >> /*
 > >>  * This is a
 > >>  * very
 > >>  * long comment.
 > >>  */
 > >>
 > >> But in my experience, we almost always use "//" for multi-line
 comments:
 > >>
 > >> // This is a
 > >> // very
 > >> // long comment.
 > >>
 > >> Here are some examples:
 > >>
 > >>- Recent commit by Reynold, king of style:
 > >>
 >
 https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
 > >>- RDD.scala:
 > >>
 >
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
 > >>- DAGScheduler.scala:
 > >>
 >
 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
 > >>
 > >>
 > >> Any objections to me updating the style guide to reflect this?  As
 with
 > >> other style issues, I think consistency here is helpful (and
 formatting
 > >> multi-line comments as "//" does nicely visually distinguish code
 > comments
 > >> from doc comments).
 > >>
 > >> -Kay
 > >
 > > -
 > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 > > For additional commands, e-mail: dev-h...@spark.apache.org
 > >
 >
 > -
 > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 > For additional commands, e-mail: dev-h...@spark.apache.org
 >
 >


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[ANNOUNCE] Apache Spark 1.2.1 Released

2015-02-09 Thread Patrick Wendell
Hi All,

I've just posted the 1.2.1 maintenance release of Apache Spark. We
recommend all 1.2.0 users upgrade to this release, as this release
includes stability fixes across all components of Spark.

- Download this release: http://spark.apache.org/downloads.html
- View the release notes:
http://spark.apache.org/releases/spark-release-1-2-1.html
- Full list of JIRA issues resolved in this release: http://s.apache.org/Mpn

Thanks to everyone who helped work on this release!

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



pyspark.daemon issues?

2015-02-09 Thread mkhaitman
I've noticed a couple of oddities with the pyspark.daemon processes which are
causing us some memory problems within our heavier Spark jobs, especially
when they run at the same time...

It seems that there is typically a 1-to-1 ratio of pyspark.daemon processes to
cores per executor during aggregations. We leave spark.python.worker.memory at
its default of 512MB, after which the remainder of the aggregation is supposed
to spill to disk.

However:
  *1)* I'm not entirely sure what cases would result in varying numbers of
pyspark daemons which do not respect the Python worker memory limit. I've seen
some grow to as much as 2GB each (well over the 512MB limit), which is when we
run into serious memory problems for jobs using many cores on each executor.
To be clear, they ARE spilling to disk as well, but are somehow blowing past
the memory limit at the same time.

  *2)* Another scenario specifically relates to joining RDDs. For example, say
there are 4 cores per executor, and therefore 4 pyspark daemons during most
aggregations. It seems that if a join occurs, it will spawn 4 additional
pyspark daemons rather than simply reusing the ones that were already present
during the preceding aggregation stage. This, combined with the memory limit
not being strictly respected, can lead to using far more memory per node than
expected.

The fact that the Python worker memory is allocated *outside* of the executor
memory is what poses the biggest challenge for preventing memory exhaustion on
a node. Is there something obvious, or some setting or environment variable I
may have missed, that could help with one or both of the above memory
concerns? Any other suggestions would also be greatly appreciated! :)

Thanks,
Mark.
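
For context, a minimal sketch of the configuration knobs discussed above,
assuming PySpark 1.2.x. The values are illustrative only, and
spark.python.worker.reuse (on by default in 1.2) may or may not help with the
join scenario described:

~~~
# Sketch only: values are placeholders, not recommendations.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("aggregation-job")
        # Per-Python-worker memory used before aggregations spill to disk.
        # Note: this memory lives *outside* spark.executor.memory.
        .set("spark.python.worker.memory", "512m")
        # Reuse Python workers between tasks instead of forking new daemons.
        .set("spark.python.worker.reuse", "true")
        # JVM heap per executor, budgeted separately from the Python workers.
        .set("spark.executor.memory", "4g"))

sc = SparkContext(conf=conf)
~~~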






--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/pyspark-daemon-issues-tp10533.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: multi-line comment style

2015-02-09 Thread Xiangrui Meng
Btw, I think allowing `/* ... */` without the leading `*` on each line is
also useful. Check this line:
https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55,
where we put the R commands that can reproduce the test result. It is
easier if we write in the following style:

~~~
/*
 Using the following R code to load the data and train the model using
glmnet package.

 library("glmnet")
 data <- read.csv("path", header=FALSE, stringsAsFactors=FALSE)
 features <- as.matrix(data.frame(as.numeric(data$V2), as.numeric(data$V3)))
 label <- as.numeric(data$V1)
 weights <- coef(glmnet(features, label, family="gaussian", alpha = 0,
lambda = 0))
 */
~~~

So people can copy & paste the R commands directly.

Xiangrui

On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng  wrote:
> I like the `/* .. */` style more, because it is easier for IDEs to
> recognize it as a block comment. If you press Enter in a comment
> block with the `//` style, IDEs won't add `//` for you. -Xiangrui
>
> On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin  wrote:
>> We should update the style doc to reflect what we have in most places
>> (which I think is //).
>>
>>
>>
>> On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman <
>> shiva...@eecs.berkeley.edu> wrote:
>>
>>> FWIW I like the multi-line // over /* */ from a purely style standpoint.
>>> The Google Java style guide[1] has some comment about code formatting tools
>>> working better with /* */ but there doesn't seem to be any strong arguments
>>> for one over the other I can find
>>>
>>> Thanks
>>> Shivaram
>>>
>>> [1]
>>>
>>> https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style
>>>
>>> On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell 
>>> wrote:
>>>
>>> > Personally I have no opinion, but agree it would be nice to standardize.
>>> >
>>> > - Patrick
>>> >
>>> > On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen  wrote:
>>> > > One thing Marcelo pointed out to me is that the // style does not
>>> > > interfere with commenting out blocks of code with /* */, which is a
>>> > > small good thing. I am also accustomed to // style for multiline, and
>>> > > reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */ style
>>> > > inline always looks a little funny to me.
>>> > >
>>> > > On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout <
>>> kayousterh...@gmail.com>
>>> > wrote:
>>> > >> Hi all,
>>> > >>
>>> > >> The Spark Style Guide
>>> > >> <
>>> > https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
>>> >
>>> > >> says multi-line comments should be formatted as:
>>> > >>
>>> > >> /*
>>> > >>  * This is a
>>> > >>  * very
>>> > >>  * long comment.
>>> > >>  */
>>> > >>
>>> > >> But in my experience, we almost always use "//" for multi-line
>>> comments:
>>> > >>
>>> > >> // This is a
>>> > >> // very
>>> > >> // long comment.
>>> > >>
>>> > >> Here are some examples:
>>> > >>
>>> > >>- Recent commit by Reynold, king of style:
>>> > >>
>>> >
>>> https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
>>> > >>- RDD.scala:
>>> > >>
>>> >
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
>>> > >>- DAGScheduler.scala:
>>> > >>
>>> >
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
>>> > >>
>>> > >>
>>> > >> Any objections to me updating the style guide to reflect this?  As
>>> with
>>> > >> other style issues, I think consistency here is helpful (and
>>> formatting
>>> > >> multi-line comments as "//" does nicely visually distinguish code
>>> > comments
>>> > >> from doc comments).
>>> > >>
>>> > >> -Kay
>>> > >
>>> > > -
>>> > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> > > For additional commands, e-mail: dev-h...@spark.apache.org
>>> > >
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> > For additional commands, e-mail: dev-h...@spark.apache.org
>>> >
>>> >
>>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: multi-line comment style

2015-02-09 Thread Xiangrui Meng
I like the `/* .. */` style more, because it is easier for IDEs to
recognize it as a block comment. If you press Enter in a comment
block with the `//` style, IDEs won't add `//` for you. -Xiangrui

On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin  wrote:
> We should update the style doc to reflect what we have in most places
> (which I think is //).
>
>
>
> On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>> FWIW I like the multi-line // over /* */ from a purely style standpoint.
>> The Google Java style guide[1] has some comment about code formatting tools
>> working better with /* */ but there doesn't seem to be any strong arguments
>> for one over the other I can find
>>
>> Thanks
>> Shivaram
>>
>> [1]
>>
>> https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style
>>
>> On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell 
>> wrote:
>>
>> > Personally I have no opinion, but agree it would be nice to standardize.
>> >
>> > - Patrick
>> >
>> > On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen  wrote:
>> > > One thing Marcelo pointed out to me is that the // style does not
>> > > interfere with commenting out blocks of code with /* */, which is a
>> > > small good thing. I am also accustomed to // style for multiline, and
>> > > reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */ style
>> > > inline always looks a little funny to me.
>> > >
>> > > On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout <
>> kayousterh...@gmail.com>
>> > wrote:
>> > >> Hi all,
>> > >>
>> > >> The Spark Style Guide
>> > >> <
>> > https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
>> >
>> > >> says multi-line comments should be formatted as:
>> > >>
>> > >> /*
>> > >>  * This is a
>> > >>  * very
>> > >>  * long comment.
>> > >>  */
>> > >>
>> > >> But in my experience, we almost always use "//" for multi-line
>> comments:
>> > >>
>> > >> // This is a
>> > >> // very
>> > >> // long comment.
>> > >>
>> > >> Here are some examples:
>> > >>
>> > >>- Recent commit by Reynold, king of style:
>> > >>
>> >
>> https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
>> > >>- RDD.scala:
>> > >>
>> >
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
>> > >>- DAGScheduler.scala:
>> > >>
>> >
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
>> > >>
>> > >>
>> > >> Any objections to me updating the style guide to reflect this?  As
>> with
>> > >> other style issues, I think consistency here is helpful (and
>> formatting
>> > >> multi-line comments as "//" does nicely visually distinguish code
>> > comments
>> > >> from doc comments).
>> > >>
>> > >> -Kay
>> > >
>> > > -
>> > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> > > For additional commands, e-mail: dev-h...@spark.apache.org
>> > >
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: dev-h...@spark.apache.org
>> >
>> >
>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Nicholas Chammas
+1 to an "official" deprecation + redirecting users to some other project
that will or already is taking this on.

Nate?


On Mon Feb 09 2015 at 10:08:27 AM Patrick Wendell 
wrote:

> I have wondered whether we should sort of deprecate it more
> officially, since otherwise I think people have the reasonable
> expectation based on the current code that Spark intends to support
> "complete" Debian packaging as part of the upstream build. Having
> something that's sort-of maintained but no one is helping review and
> merge patches on it or make it fully functional, IMO that doesn't
> benefit us or our users. There are a bunch of other projects that are
> specifically devoted to packaging, so it seems like there is a clear
> separation of concerns here.
>
> On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra 
> wrote:
> >>
> >> it sounds like nobody intends these to be used to actually deploy Spark
> >
> >
> > I wouldn't go quite that far.  What we have now can serve as useful input
> > to a deployment tool like Chef, but the user is then going to need to add
> > some customization or configuration within the context of that tooling to
> > get Spark installed just the way they want.  So it is not so much that
> the
> > current Debian packaging can't be used as that it has never really been
> > intended to be a completely finished product that a newcomer could, for
> > example, use to install Spark completely and quickly to Ubuntu and have a
> > fully-functional environment in which they could then run all of the
> > examples, tutorials, etc.
> >
> > Getting to that level of packaging (and maintenance) is something that
> I'm
> > not sure we want to do since that is a better fit with Bigtop and the
> > efforts of Cloudera, Hortonworks, MapR, etc. to distribute Spark.
> >
> > On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen  wrote:
> >
> >> This is a straw poll to assess whether there is support to keep and
> >> fix, or remove, the Debian packaging-related config in Spark.
> >>
> >> I see several oldish outstanding JIRAs relating to problems in the
> >> packaging:
> >>
> >> https://issues.apache.org/jira/browse/SPARK-1799
> >> https://issues.apache.org/jira/browse/SPARK-2614
> >> https://issues.apache.org/jira/browse/SPARK-3624
> >> https://issues.apache.org/jira/browse/SPARK-4436
> >> (and a similar idea about making RPMs)
> >> https://issues.apache.org/jira/browse/SPARK-665
> >>
> >> The original motivation seems related to Chef:
> >>
> >>
> >> https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908
> >>
> >> Mark's recent comments cast some doubt on whether it is essential:
> >>
> >> https://github.com/apache/spark/pull/4277#issuecomment-72114226
> >>
> >> and in recent conversations I didn't hear dissent to the idea of
> removing
> >> this.
> >>
> >> Is this still useful enough to fix up? All else equal I'd like to
> >> start to walk back some of the complexity of the build, but I don't
> >> know how all-else-equal it is. Certainly, it sounds like nobody
> >> intends these to be used to actually deploy Spark.
> >>
> >> I don't doubt it's useful to someone, but can they maintain the
> >> packaging logic elsewhere?
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
> >>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Unit tests

2015-02-09 Thread Josh Rosen
Hi Iulian,

I think the AkkaUtilsSuite failure that you observed has been fixed in 
https://issues.apache.org/jira/browse/SPARK-5548 / 
https://github.com/apache/spark/pull/4343
On February 9, 2015 at 5:47:59 AM, Iulian Dragoș (iulian.dra...@typesafe.com) 
wrote:

Hi Patrick,  

Thanks for the heads up. I was trying to set up our own infrastructure for  
testing Spark (essentially, running `run-tests` every night) on EC2. I  
stumbled upon a number of flaky tests, but none of them look similar to  
anything in Jira with the flaky-test tag. I wonder if there's something  
wrong with our infrastructure, or I should simply open Jira tickets with  
the failures I find. For example, one that appears fairly often on our  
setup is in AkkaUtilsSuite "remote fetch ssl on - untrusted server"  
(exception `ActorNotFound`, instead of `TimeoutException`).  

thanks,  
iulian  


On Fri, Feb 6, 2015 at 9:55 PM, Patrick Wendell  wrote:  

> Hey All,  
>  
> The tests are in a not-amazing state right now due to a few compounding  
> factors:  
>  
> 1. We've merged a large volume of patches recently.  
> 2. The load on jenkins has been relatively high, exposing races and  
> other behavior not seen at lower load.  
>  
> For those not familiar, the main issue is flaky (non-deterministic)
> test failures. Right now I'm trying to prioritize keeping the
> PullRequestBuilder in good shape since it will block development if it
> is down.  
>  
> For other tests, let's try to keep filing JIRA's when we see issues  
> and use the flaky-test label (see http://bit.ly/1yRif9S):  
>  
> I may contact people regarding specific tests. This is a very high  
> priority to get in good shape. This kind of thing is no one's "fault"  
> but just the result of a lot of concurrent development, and everyone  
> needs to pitch in to get back in a good place.  
>  
> - Patrick  
>  
> -  
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org  
> For additional commands, e-mail: dev-h...@spark.apache.org  
>  
>  


--  

--  
Iulian Dragos  

--  
Reactive Apps on the JVM  
www.typesafe.com  


Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Patrick Wendell
I have wondered whether we should sort of deprecate it more
officially, since otherwise I think people have the reasonable
expectation based on the current code that Spark intends to support
"complete" Debian packaging as part of the upstream build. Having
something that's sort-of maintained but no one is helping review and
merge patches on it or make it fully functional, IMO that doesn't
benefit us or our users. There are a bunch of other projects that are
specifically devoted to packaging, so it seems like there is a clear
separation of concerns here.

On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra  wrote:
>>
>> it sounds like nobody intends these to be used to actually deploy Spark
>
>
> I wouldn't go quite that far.  What we have now can serve as useful input
> to a deployment tool like Chef, but the user is then going to need to add
> some customization or configuration within the context of that tooling to
> get Spark installed just the way they want.  So it is not so much that the
> current Debian packaging can't be used as that it has never really been
> intended to be a completely finished product that a newcomer could, for
> example, use to install Spark completely and quickly to Ubuntu and have a
> fully-functional environment in which they could then run all of the
> examples, tutorials, etc.
>
> Getting to that level of packaging (and maintenance) is something that I'm
> not sure we want to do since that is a better fit with Bigtop and the
> efforts of Cloudera, Hortonworks, MapR, etc. to distribute Spark.
>
> On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen  wrote:
>
>> This is a straw poll to assess whether there is support to keep and
>> fix, or remove, the Debian packaging-related config in Spark.
>>
>> I see several oldish outstanding JIRAs relating to problems in the
>> packaging:
>>
>> https://issues.apache.org/jira/browse/SPARK-1799
>> https://issues.apache.org/jira/browse/SPARK-2614
>> https://issues.apache.org/jira/browse/SPARK-3624
>> https://issues.apache.org/jira/browse/SPARK-4436
>> (and a similar idea about making RPMs)
>> https://issues.apache.org/jira/browse/SPARK-665
>>
>> The original motivation seems related to Chef:
>>
>>
>> https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908
>>
>> Mark's recent comments cast some doubt on whether it is essential:
>>
>> https://github.com/apache/spark/pull/4277#issuecomment-72114226
>>
>> and in recent conversations I didn't hear dissent to the idea of removing
>> this.
>>
>> Is this still useful enough to fix up? All else equal I'd like to
>> start to walk back some of the complexity of the build, but I don't
>> know how all-else-equal it is. Certainly, it sounds like nobody
>> intends these to be used to actually deploy Spark.
>>
>> I don't doubt it's useful to someone, but can they maintain the
>> packaging logic elsewhere?
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Mark Hamstra
>
> it sounds like nobody intends these to be used to actually deploy Spark


I wouldn't go quite that far.  What we have now can serve as useful input
to a deployment tool like Chef, but the user is then going to need to add
some customization or configuration within the context of that tooling to
get Spark installed just the way they want.  So it is not so much that the
current Debian packaging can't be used as that it has never really been
intended to be a completely finished product that a newcomer could, for
example, use to install Spark completely and quickly to Ubuntu and have a
fully-functional environment in which they could then run all of the
examples, tutorials, etc.

Getting to that level of packaging (and maintenance) is something that I'm
not sure we want to do since that is a better fit with Bigtop and the
efforts of Cloudera, Hortonworks, MapR, etc. to distribute Spark.

On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen  wrote:

> This is a straw poll to assess whether there is support to keep and
> fix, or remove, the Debian packaging-related config in Spark.
>
> I see several oldish outstanding JIRAs relating to problems in the
> packaging:
>
> https://issues.apache.org/jira/browse/SPARK-1799
> https://issues.apache.org/jira/browse/SPARK-2614
> https://issues.apache.org/jira/browse/SPARK-3624
> https://issues.apache.org/jira/browse/SPARK-4436
> (and a similar idea about making RPMs)
> https://issues.apache.org/jira/browse/SPARK-665
>
> The original motivation seems related to Chef:
>
>
> https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908
>
> Mark's recent comments cast some doubt on whether it is essential:
>
> https://github.com/apache/spark/pull/4277#issuecomment-72114226
>
> and in recent conversations I didn't hear dissent to the idea of removing
> this.
>
> Is this still useful enough to fix up? All else equal I'd like to
> start to walk back some of the complexity of the build, but I don't
> know how all-else-equal it is. Certainly, it sounds like nobody
> intends these to be used to actually deploy Spark.
>
> I don't doubt it's useful to someone, but can they maintain the
> packaging logic elsewhere?
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Unit tests

2015-02-09 Thread Iulian Dragoș
Hi Patrick,

Thanks for the heads up. I was trying to set up our own infrastructure for
testing Spark (essentially, running `run-tests` every night) on EC2. I
stumbled upon a number of flaky tests, but none of them look similar to
anything in Jira with the flaky-test tag. I wonder if there's something
wrong with our infrastructure, or I should simply open Jira tickets with
the failures I find. For example, one that appears fairly often on our
setup is in AkkaUtilsSuite "remote fetch ssl on - untrusted server"
(exception `ActorNotFound`, instead of `TimeoutException`).

thanks,
iulian
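
For anyone setting up something similar, a rough sketch of what such a nightly
run could look like (the checkout path, schedule and log location below are
assumptions, not a description of our actual setup):

~~~
# Hypothetical crontab entry: pull the latest code and run the Spark test
# suite every night at 02:00, logging to a placeholder path.
0 2 * * * cd /home/ec2-user/spark && git pull --quiet && ./dev/run-tests > /tmp/spark-nightly-tests.log 2>&1
~~~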


On Fri, Feb 6, 2015 at 9:55 PM, Patrick Wendell  wrote:

> Hey All,
>
> The tests are in a not-amazing state right now due to a few compounding
> factors:
>
> 1. We've merged a large volume of patches recently.
> 2. The load on jenkins has been relatively high, exposing races and
> other behavior not seen at lower load.
>
> For those not familiar, the main issue is flaky (non-deterministic)
> test failures. Right now I'm trying to prioritize keeping the
> PullRequestBuilder in good shape since it will block development if it
> is down.
>
> For other tests, let's try to keep filing JIRA's when we see issues
> and use the flaky-test label (see http://bit.ly/1yRif9S):
>
> I may contact people regarding specific tests. This is a very high
> priority to get in good shape. This kind of thing is no one's "fault"
> but just the result of a lot of concurrent development, and everyone
> needs to pitch in to get back in a good place.
>
> - Patrick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-02-09 Thread Sean Owen
Old releases can't be changed, but new ones can. This was merged into
the 1.3 branch for the upcoming 1.3.0 release.

If you really had to, you could do some surgery on existing
distributions to swap in/out Jackson.
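
For anyone who needs the fix before 1.3.0 ships, a rough sketch of rebuilding
from source instead of patching a binary distribution (the tag, profile and
version flags below are assumptions; check the build docs for your setup):

~~~
# Sketch only: rebuild Spark against Hadoop 2.6 with the Jackson version bumped.
git clone https://github.com/apache/spark.git && cd spark
git checkout v1.2.1
# Edit the root pom.xml to bump jackson-mapper-asl from 1.8.8 to 1.9.13
# (the change Gil describes further down this thread), then build:
mvn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests clean package
~~~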

On Mon, Feb 9, 2015 at 11:22 AM, Gil Vernik  wrote:
> Hi All,
>
> I understand that https://github.com/apache/spark/pull/3938 was closed and
> merged into Spark? And this is supposed to fix the Jackson issue.
> If so, is there any way to update binary distributions of Spark so that it
> will contain this fix? Current binary versions of Spark available for
> download were built with Jackson 1.8.8, which makes them impossible to use
> with the Hadoop 2.6.0 jars.
>
> Thanks
> Gil Vernik.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-02-09 Thread Gil Vernik
Hi All,

I understand that https://github.com/apache/spark/pull/3938 was closed and 
merged into Spark? And this is supposed to fix the Jackson issue.
If so, is there any way to update binary distributions of Spark so that it
will contain this fix? Current binary versions of Spark available for
download were built with Jackson 1.8.8, which makes them impossible to use
with the Hadoop 2.6.0 jars.

Thanks
Gil Vernik.






From:   Sean Owen 
To: Ted Yu 
Cc: Gil Vernik/Haifa/IBM@IBMIL, dev 
Date:   18/01/2015 08:23 PM
Subject:    Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver



Agree, I think this can / should be fixed with a slightly more
conservative version of https://github.com/apache/spark/pull/3938
related to SPARK-5108.

On Sun, Jan 18, 2015 at 3:41 PM, Ted Yu  wrote:
> Please tale a look at SPARK-4048 and SPARK-5108
>
> Cheers
>
> On Sat, Jan 17, 2015 at 10:26 PM, Gil Vernik  wrote:
>
>> Hi,
>>
>> I took the source code of Spark 1.2.0 and tried to build it together with
>> hadoop-openstack.jar (to allow Spark access to OpenStack Swift).
>> I used Hadoop 2.6.0.
>>
>> The build went fine without problems; however, at run time, while trying to
>> access the "swift://" namespace, I got an exception:
>> java.lang.NoClassDefFoundError: org/codehaus/jackson/annotate/JsonClass
>>  at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findDeserializationType(JacksonAnnotationIntrospector.java:524)
>>  at org.codehaus.jackson.map.deser.BasicDeserializerFactory.modifyTypeByAnnotation(BasicDeserializerFactory.java:732)
>> ...and the long stack trace goes here
>>
>> Digging into the problem, I saw the following:
>> Jackson versions 1.9.x are not backward compatible; in particular, they
>> removed the JsonClass annotation.
>> Hadoop 2.6.0 uses jackson-asl version 1.9.13, while Spark references an
>> older version of Jackson.
>>
>> This is from the main pom.xml of Spark 1.2.0:
>>
>>     <dependency>
>>       <groupId>org.codehaus.jackson</groupId>
>>       <artifactId>jackson-mapper-asl</artifactId>
>>       <version>1.8.8</version>
>>     </dependency>
>>
>> It references version 1.8.8, which is not compatible with Hadoop 2.6.0.
>> If we change the version to 1.9.13, then all will work fine and there will be
>> no run time exceptions while accessing Swift. The following change will
>> solve the problem:
>>
>>     <dependency>
>>       <groupId>org.codehaus.jackson</groupId>
>>       <artifactId>jackson-mapper-asl</artifactId>
>>       <version>1.9.13</version>
>>     </dependency>
>>
>> I am trying to resolve this somehow so people will not get into this
>> issue.
>> Is there any particular need in Spark for jackson 1.8.8 and not 1.9.13?
>> Can we remove 1.8.8 and put 1.9.13 for Avro?
>> It looks to me that all works fine when Spark is built with Jackson 1.9.13,
>> but I am not an expert and not sure what should be tested.
>>
>> Thanks,
>> Gil Vernik.
>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org




Keep or remove Debian packaging in Spark?

2015-02-09 Thread Sean Owen
This is a straw poll to assess whether there is support to keep and
fix, or remove, the Debian packaging-related config in Spark.

I see several oldish outstanding JIRAs relating to problems in the packaging:

https://issues.apache.org/jira/browse/SPARK-1799
https://issues.apache.org/jira/browse/SPARK-2614
https://issues.apache.org/jira/browse/SPARK-3624
https://issues.apache.org/jira/browse/SPARK-4436
(and a similar idea about making RPMs)
https://issues.apache.org/jira/browse/SPARK-665

The original motivation seems related to Chef:

https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908

Mark's recent comments cast some doubt on whether it is essential:

https://github.com/apache/spark/pull/4277#issuecomment-72114226

and in recent conversations I didn't hear dissent to the idea of removing this.

Is this still useful enough to fix up? All else equal I'd like to
start to walk back some of the complexity of the build, but I don't
know how all-else-equal it is. Certainly, it sounds like nobody
intends these to be used to actually deploy Spark.

I don't doubt it's useful to someone, but can they maintain the
packaging logic elsewhere?

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org