Re: Speeding up Spark build during development

2015-05-01 Thread York, Brennon
Following what Ted said, if you leverage the `mvn` from within the
`build/` directory of Spark, you'll get Zinc for free, which should help
speed up build times.
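
A minimal sketch of that, run from the top of a Spark checkout (the launcher
should download and start a Zinc server for you):

    ./build/mvn -DskipTests package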

On 5/1/15, 9:45 AM, "Ted Yu"  wrote:

>Pramod:
>Please remember to run Zinc so that the build is faster.
>
>Cheers
>
>On Fri, May 1, 2015 at 9:36 AM, Ulanov, Alexander wrote:
>
>> Hi Pramod,
>>
>> For cluster-like tests you might want to use the same code as in mllib's
>> LocalClusterSparkContext. You can rebuild only the package that you change
>> and then run this main class.
>>
>> Best regards, Alexander
>>
>> -Original Message-
>> From: Pramod Biligiri [mailto:pramodbilig...@gmail.com]
>> Sent: Friday, May 01, 2015 1:46 AM
>> To: dev@spark.apache.org
>> Subject: Speeding up Spark build during development
>>
>> Hi,
>> I'm making some small changes to the Spark codebase and trying it out on a
>> cluster. I was wondering if there's a faster way to build than running the
>> package target each time.
>> Currently I'm using: mvn -DskipTests package
>>
>> All the nodes have the same filesystem mounted at the same mount point.
>>
>> Pramod
>>



Re: Speeding up Spark build during development

2015-05-01 Thread Ted Yu
Pramod:
Please remember to run Zinc so that the build is faster.
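
(If you have the standalone Zinc distribution installed, a sketch of that
workflow is to start the compile server once and then build as usual; the
exact invocation may vary by Zinc version:

    zinc -start
    mvn -DskipTests package

Newer Spark checkouts also bundle a launcher under build/ that handles Zinc
automatically.)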

Cheers

On Fri, May 1, 2015 at 9:36 AM, Ulanov, Alexander wrote:

> Hi Pramod,
>
> For cluster-like tests you might want to use the same code as in mllib's
> LocalClusterSparkContext. You can rebuild only the package that you change
> and then run this main class.
>
> Best regards, Alexander
>
> -Original Message-
> From: Pramod Biligiri [mailto:pramodbilig...@gmail.com]
> Sent: Friday, May 01, 2015 1:46 AM
> To: dev@spark.apache.org
> Subject: Speeding up Spark build during development
>
> Hi,
> I'm making some small changes to the Spark codebase and trying it out on a
> cluster. I was wondering if there's a faster way to build than running the
> package target each time.
> Currently I'm using: mvn -DskipTests package
>
> All the nodes have the same filesystem mounted at the same mount point.
>
> Pramod
>


RE: Speeding up Spark build during development

2015-05-01 Thread Ulanov, Alexander
Hi Pramod,

For cluster-like tests you might want to use the same code as in mllib's 
LocalClusterSparkContext. You can rebuild only the package that you change and 
then run this main class.

Best regards, Alexander

-Original Message-
From: Pramod Biligiri [mailto:pramodbilig...@gmail.com] 
Sent: Friday, May 01, 2015 1:46 AM
To: dev@spark.apache.org
Subject: Speeding up Spark build during development

Hi,
I'm making some small changes to the Spark codebase and trying it out on a 
cluster. I was wondering if there's a faster way to build than running the 
package target each time.
Currently I'm using: mvn -DskipTests package

All the nodes have the same filesystem mounted at the same mount point.

Pramod
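
For anyone finding this thread later, a concrete sketch of Alexander's
single-module suggestion (the module directory name is illustrative; -pl is
a standard Maven flag):

    # one full build that installs the sibling modules into your local repo:
    ./build/mvn -DskipTests install
    # then rebuild only the module you changed:
    ./build/mvn -pl mllib -DskipTests package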


Re: Tungsten + Flink

2015-05-01 Thread Ewan Higgs
I don't think it's useful to combine them since they are different
projects. But I do think that a lot of work went into Flink's paged
memory system built on byte buffers, and if collaboration can take place
to pull that out into a memory-subsystem library that both Spark and
Flink can use, then it should raise both ships. If the usage patterns are
too different then sure, don't use their work. But it looks pretty generic:


https://github.com/apache/flink/tree/master/flink-core/src/main/java/org/apache/flink/core/memory
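
To make the idea concrete, here is a minimal sketch of a byte-buffer-backed
page (illustrative Scala only; this is not Flink's actual MemorySegment API):

    import java.nio.ByteBuffer

    // A fixed-size page backed by a ByteBuffer, so records live as raw
    // bytes instead of boxed JVM objects scattered across the heap.
    final class Page(size: Int, offHeap: Boolean) {
      private val buf: ByteBuffer =
        if (offHeap) ByteBuffer.allocateDirect(size) else ByteBuffer.allocate(size)
      def putLong(offset: Int, value: Long): Unit = buf.putLong(offset, value)
      def getLong(offset: Int): Long = buf.getLong(offset)
    }

    // usage (e.g. in the REPL):
    val page = new Page(32 * 1024, offHeap = true)
    page.putLong(0, 42L)
    assert(page.getLong(0) == 42L)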

To bring this back into other threads: Flink's memory system uses 
java.nio - so it requires Java 1.7 afaik. :)


-Ewan

On 05/01/2015 03:54 PM, Stephen Carman wrote:

I think as long as the two frameworks follow the same paradigm for how their 
interfaces work it’s fine to have 2 competing frameworks. This way the 
frameworks have some motivation
to be the best at what they do rather than being the only choice whether you 
like it or not. They also seem to have some differing opinions about how to do 
certain things leaving me to believe
that the 2 projects exist mostly because of disagreements on fundamentals about 
how a system such as this should be built and scaled out.

I think spark should definitely take what it can from these projects, but 
otherwise they should remain separate projects going their own way.

Steve



On Apr 29, 2015, at 8:01 PM, Sree V  wrote:

I agree, Ewan.
We should also look into combining Flink and Spark into one. This would ease
industry adoption instead.

Thanking you.

With Regards
Sree


On Wednesday, April 29, 2015 3:21 AM, Ewan Higgs wrote:


Hi all,
A quick question about Tungsten. The announcement of the Tungsten
project is on the back of Hadoop Summit in Brussels where some of the
Flink devs were giving talks [1] on how Flink manages memory using byte
arrays and the like to avoid the overhead of all the Java types[2]. Is
there an opportunity for code reuse here? Spark and Flink may have
different needs in some respects, but they work fundamentally towards
the same goal, so I imagine there could be some worthwhile collaboration.

-Ewan

[1] http://2015.hadoopsummit.org/brussels/speaker/?speaker=MrtonBalassi
http://2015.hadoopsummit.org/brussels/speaker/?speaker=AljoschaKrettek

[2]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html




Re: Tungsten + Flink

2015-05-01 Thread Stephen Carman
I think as long as the two frameworks follow the same paradigm for how their 
interfaces work it’s fine to have 2 competing frameworks. This way the 
frameworks have some motivation
to be the best at what they do rather than being the only choice whether you 
like it or not. They also seem to have some differing opinions about how to do 
certain things leaving me to believe
that the 2 projects exist mostly because of disagreements on fundamentals about 
how a system such as this should be built and scaled out.

I think spark should definitely take what it can from these projects, but 
otherwise they should remain separate projects going their own way.

Steve


> On Apr 29, 2015, at 8:01 PM, Sree V  wrote:
>
> I agree, Ewan.
> We should also look into combining Flink and Spark into one. This would
> ease industry adoption instead.
>
> Thanking you.
>
> With Regards
> Sree
>
>
> On Wednesday, April 29, 2015 3:21 AM, Ewan Higgs wrote:
>
>
> Hi all,
> A quick question about Tungsten. The announcement of the Tungsten
> project is on the back of Hadoop Summit in Brussels where some of the
> Flink devs were giving talks [1] on how Flink manages memory using byte
> arrays and the like to avoid the overhead of all the Java types[2]. Is
> there an opportunity for code reuse here? Spark and Flink may have
> different needs in some respects, but they work fundamentally towards
> the same goal, so I imagine there could be some worthwhile collaboration.
>
> -Ewan
>
> [1] http://2015.hadoopsummit.org/brussels/speaker/?speaker=MrtonBalassi
> http://2015.hadoopsummit.org/brussels/speaker/?speaker=AljoschaKrettek
>
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
> https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
>




Re: [discuss] ending support for Java 6?

2015-05-01 Thread Koert Kuipers
it seems spark is happy to upgrade scala, drop older java versions, and
upgrade incompatible library versions (akka), all of this within spark 1.x.
does the 1.x mean anything in terms of compatibility of dependencies, or is
that limited to its own api? what are the rules?
 On May 1, 2015 9:04 AM, "Steven Shaw"  wrote:

> On 1 May 2015 at 21:26, Dean Wampler  wrote:
>
>> FWIW, another reason to start planning for deprecation of Java 7, too, is
>> that Scala 2.12 will require Java 8. Scala 2.12 will be released early
>> next
>> year.
>>
>
> Will 2.12 be the release that's based on dotty?
>
> Cheers,
> Steve.
>


Re: [discuss] ending support for Java 6?

2015-05-01 Thread DW @ Gmail
No. That will be "3.0" some day

Sent from my rotary phone. 


> On May 1, 2015, at 9:04 AM, Steven Shaw  wrote:
> 
>> On 1 May 2015 at 21:26, Dean Wampler  wrote:
> 
>> FWIW, another reason to start planning for deprecation of Java 7, too, is
>> that Scala 2.12 will require Java 8. Scala 2.12 will be released early next
>> year.
> 
> Will 2.12 be the release that's based on dotty?
> 
> Cheers,
> Steve.


Re: [discuss] ending support for Java 6?

2015-05-01 Thread Steven Shaw
On 1 May 2015 at 21:26, Dean Wampler  wrote:

> FWIW, another reason to start planning for deprecation of Java 7, too, is
> that Scala 2.12 will require Java 8. Scala 2.12 will be released early next
> year.
>

Will 2.12 be the release that's based on dotty?

Cheers,
Steve.


Re: [discuss] ending support for Java 6?

2015-05-01 Thread Dean Wampler
FWIW, another reason to start planning for deprecation of Java 7, too, is
that Scala 2.12 will require Java 8. Scala 2.12 will be released early next
year.


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly)
Typesafe
@deanwampler
http://polyglotprogramming.com

On Thu, Apr 30, 2015 at 3:37 PM, Ted Yu  wrote:

> +1 on ending support for Java 6.
>
> BTW from https://www.java.com/en/download/faq/java_7.xml :
> After April 2015, Oracle will no longer post updates of Java SE 7 to its
> public download sites.
>
> On Thu, Apr 30, 2015 at 1:34 PM, Punyashloka Biswal <punya.bis...@gmail.com> wrote:
>
> > I'm in favor of ending support for Java 6. We should also articulate a
> > policy on how long we want to support current and future versions of Java
> > after Oracle declares them EOL (Java 7 will be in that bucket in a matter
> > of days).
> >
> > Punya
> > On Thu, Apr 30, 2015 at 1:18 PM shane knapp  wrote:
> >
> > > something to keep in mind:  we can easily support java 6 for the build
> > > environment, particularly if there's a definite EOL.
> > >
> > > i'd like to fix our java versioning 'problem', and this could be a big
> > > instigator...  right now we're hackily setting java_home in test
> > > invocation on jenkins, which really isn't the best.  if i decide, within
> > > jenkins, to reconfigure every build to 'do the right thing' WRT java
> > > version, then i will clean up the old mess and pay down on some
> > > technical debt.
> > >
> > > or i can just install java 6 and we use that as JAVA_HOME on a
> > > build-by-build basis.
> > >
> > > this will be a few days of prep and another morning-long downtime if i
> > > do the right thing (within jenkins), and only a couple of hours the
> > > hacky way (system level).
> > >
> > > either way, we can test on java 6.  :)
> > >
> > > On Thu, Apr 30, 2015 at 1:00 PM, Koert Kuipers wrote:
> > >
> > > > nicholas started it! :)
> > > >
> > > > for java 6 i would have said the same thing about 1 year ago: it is
> > > > foolish to drop it. but i think the time is right about now.
> > > > about half our clients are on java 7 and the other half have active
> > > > plans to migrate to it within 6 months.
> > > >
> > > > > On Thu, Apr 30, 2015 at 3:57 PM, Reynold Xin wrote:
> > > >
> > > > > Guys thanks for chiming in, but please focus on Java here. Python is
> > > > > an entirely separate issue.
> > > > >
> > > > >
> > > > > On Thu, Apr 30, 2015 at 12:53 PM, Koert Kuipers wrote:
> > > > >
> > > > >> i am not sure eol means much if it is still actively used. we have a
> > > > >> lot of clients with centos 5 (for which we still support python 2.4
> > > > >> in some form or another, fun!). most of them are on centos 6, which
> > > > >> means python 2.6. by cutting out python 2.6 you would cut out the
> > > > >> majority of the actual clusters i am aware of. unless your intention
> > > > >> is to truly make something academic i dont think that is wise.
> > > > >>
> > > > >> On Thu, Apr 30, 2015 at 3:48 PM, Nicholas Chammas <
> > > > >> nicholas.cham...@gmail.com> wrote:
> > > > >>
> > > > >>> (On that note, I think Python 2.6 should be next on the chopping
> > > > >>> block sometime later this year, but that’s for another thread.)
> > > > >>>
> > > > >>> (To continue the parenthetical, Python 2.6 was in fact EOL-ed in
> > > > >>> October of 2013.)
> > > > >>>
> > > > >>> On Thu, Apr 30, 2015 at 3:18 PM Nicholas Chammas <
> > > > >>> nicholas.cham...@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> > I understand the concern about cutting out users who still use
> > > > >>> > Java 6, and I don't have numbers about how many people are still
> > > > >>> > using Java 6.
> > > > >>> >
> > > > >>> > But I want to say at a high level that I support deprecating
> > > > >>> > older versions of stuff to reduce our maintenance burden and let
> > > > >>> > us use more modern patterns in our code.
> > > > >>> >
> > > > >>> > Maintenance always costs way more than initial development over
> > > > >>> > the lifetime of a project, and for that reason "anti-support" is
> > > > >>> > just as important as support.
> > > > >>> >
> > > > >>> > (On that note, I think Python 2.6 should be next on the chopping
> > > > >>> > block sometime later this year, but that's for another thread.)
> > > > >>> >
> > > > >>> > Nick
> > > > >>> >
> > > > >>> >
> > > > >>> > On Thu, Apr 30, 2015 at 3:03 PM Reynold Xin <r...@databricks.com>
> > > > >>> > wrote:
> > > > >>> >
> > > > >>> >> This has been discussed a few times in the past, but now Oracle
> > > > >>> >> has ended support for Java 6 for over a year, I wonder if we
> > > > >>> >> should just

Re: Speeding up Spark build during development

2015-05-01 Thread Prashant Sharma
Hi Pramod,

If you are using sbt as your build, you need to run sbt assembly once and
then use sbt ~compile. Also export SPARK_PREPEND_CLASSES=1 in your shell on
all nodes.
Maybe you can try this out?
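
A minimal sketch of that workflow, assuming the sbt launcher under build/:

    build/sbt assembly              # once, to produce the full assembly
    export SPARK_PREPEND_CLASSES=1  # run with freshly compiled classes prepended
    build/sbt ~compile              # recompile incrementally on every change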

Thanks,

Prashant Sharma



On Fri, May 1, 2015 at 2:16 PM, Pramod Biligiri wrote:

> Hi,
> I'm making some small changes to the Spark codebase and trying it out on a
> cluster. I was wondering if there's a faster way to build than running the
> package target each time.
> Currently I'm using: mvn -DskipTests package
>
> All the nodes have the same filesystem mounted at the same mount point.
>
> Pramod
>


Re: [discuss] ending support for Java 6?

2015-05-01 Thread Steve Loughran

> On 30 Apr 2015, at 21:40, Marcelo Vanzin  wrote:
> 
> As for the idea, I'm +1. Spark is the only reason I still have jdk6
> around - exactly because I don't want to cause the issue that started
> this discussion (inadvertently using JDK7 APIs). And as has been
> pointed out, even J7 is about to go EOL real soon.

+1, perhaps with a roadmap for people to plan for

> 
> Even Hadoop is moving away (I think 2.7 will be j7-only). Hive 1.1 is
> already j7-only. And when Hadoop moves away from something, it's an
> event worthy of headlines.

The constraint here was that there were too many people "stuck" on Java 6, and
Java 7 wasn't compelling enough to pull people off a JVM they trusted to be
stable at large scale. One problem with production Hadoop is that across 5000
14-core servers, all race conditions will surface, leading to a reluctance to
upgrade JVMs or even OSes. There was also the fact that for a long time Hadoop
wouldn't build on OSX on Java 7 (HADOOP-9350). Even today, OS X's JDK has
better rendering than Java 7+, leaving it nice to have around for the IDEs.


After Hadoop 2.5 shipped, an announcement was made that 2.6 would be the last
Java 1.6 release, with the switch taking place in November. Moving ASF Jenkins
up was probably the hardest bit (HADOOP-10530).

Switching to JDK7 has enabled moving Kerberos support to Java 8 (HADOOP-10786;
some changes in the internal Kerberos classes used directly were needed for
Kerberos to work properly). See HADOOP-11090 for the JDK8 migration; Hadoop
trunk will be switching to Java 8 before long.

> They're still on Jetty 6!

Hadoop is moving off Jetty entirely wherever possible, while leaving Jetty 6
on the transitive Maven classpath in the hope of not breaking code that
expects it to be there. It's not that the project likes Jetty 6 (there are
threads whose sole aim is to detect Jetty startup failures), but moving off
it is felt to be better than upgrading.

> 
> As for pyspark, https://github.com/apache/spark/pull/5580 should get
> rid of the last incompatibility with large assemblies, by keeping the
> python files in separate archives. If we remove support for Java 6,
> then we don't need to worry about the size of the assembly anymore.

zzhang's patch drops back to Java 6 just to rebuild the assembly JAR; you can
still build Java 7-only classes. So it will work even before the pyspark patch
goes in.

Speeding up Spark build during development

2015-05-01 Thread Pramod Biligiri
Hi,
I'm making some small changes to the Spark codebase and trying it out on a
cluster. I was wondering if there's a faster way to build than running the
package target each time.
Currently I'm using: mvn -DskipTests package

All the nodes have the same filesystem mounted at the same mount point.

Pramod


Re: Custom PersistanceEngine and LeaderAgent implementation in Java

2015-05-01 Thread Niranda Perera
Hi Reynold,

Pls find the PR here [1]

[1] https://github.com/apache/spark/pull/5832

On Thu, Apr 30, 2015 at 11:34 AM, Reynold Xin  wrote:

> We should change the trait to abstract class, and then your problem will
> go away.
>
> Do you want to submit a pull request?
>
>
> On Wed, Apr 29, 2015 at 11:02 PM, Niranda Perera wrote:
>
>> Hi,
>>
>> This follows on from the feature in [1].
>>
>> I'm trying to implement a custom persistence engine and a leader agent in
>> the Java environment.
>>
>> Vis-a-vis Scala: when I implement the PersistenceEngine trait in Java, I
>> would have to implement methods such as readPersistedData, removeDriver,
>> etc. together with the read, persist and unpersist methods.
>>
>> But the issue here is that methods such as readPersistedData are 'final
>> def's, and hence cannot be overridden from the Java environment.
>>
>> I am new to Scala, but is there any workaround to implement the above
>> traits in Java?
>>
>> I look forward to hearing from you.
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-1830
>>
>> --
>> Niranda
>>
>
>


-- 
Niranda
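
A simplified sketch of the shape of the problem and the fix (the trait name
echoes the real one, but the members and bodies here are invented for
illustration):

    // In Scala 2.x a trait with concrete members compiles to a JVM
    // interface (plus a static impl class), so a Java implementor must
    // supply a body for every method -- including ones Scala declares as
    // concrete `final def`s, which implementors are not meant to override.
    trait PersistenceEngine {
      def persist(name: String, obj: AnyRef): Unit
      def unpersist(name: String): Unit
      final def readPersistedData(): Seq[AnyRef] = Seq.empty
    }

    // An abstract class compiles to a plain JVM class: Java subclasses
    // inherit the final methods and override only the abstract ones.
    abstract class PersistenceEngineBase {
      def persist(name: String, obj: AnyRef): Unit
      def unpersist(name: String): Unit
      final def readPersistedData(): Seq[AnyRef] = Seq.empty
    }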


Fwd: Event generator for SPARK-Streaming from csv

2015-05-01 Thread anshu shukla
I have the real DEBS taxi data in a CSV file. In order to operate over it,
how can I simulate a "spout"-like event generator, using the timestamps in
the CSV file?




-- 
Thanks & Regards,
Anshu Shukla
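
One way to approach this (a hedged sketch, not a tested solution): replay the
CSV rows over a socket, pacing emission by the gap between timestamps, and
consume the socket from Spark Streaming. The file name, port, and timestamp
column below are assumptions:

    import java.io.PrintWriter
    import java.net.ServerSocket
    import scala.io.Source

    object CsvReplay {
      def main(args: Array[String]): Unit = {
        // The streaming job connects with ssc.socketTextStream("localhost", 9999)
        val server = new ServerSocket(9999)
        val sock = server.accept()                      // wait for the consumer
        val out = new PrintWriter(sock.getOutputStream, true)
        var prev = -1L
        for (line <- Source.fromFile("taxi.csv").getLines()) {
          val ts = line.split(",")(0).toLong            // epoch millis in column 0
          if (prev >= 0) Thread.sleep(math.max(0L, ts - prev))  // pace by timestamp gap
          out.println(line)
          prev = ts
        }
        out.close(); sock.close(); server.close()
      }
    }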