ALS memory limits

2014-03-25 Thread Debasish Das
Hi,

For our usecases we are looking into 20 x 1M matrices which comes in the
similar ranges as outlined by the paper over here:

http://sandeeptata.blogspot.com/2012/12/sparkler-large-scale-matrix.html

Does the exponential runtime growth in Spark ALS, as outlined by the blog,
still exist in recommendation.ALS?

I am running a Spark cluster of 10 nodes with around 1 TB of total memory
and 80 cores.

With rank = 50, the memory requirement for ALS should be 20M x 50 doubles on
every worker, which is around 8 GB.

Even if both factor matrices are cached in memory I should be bounded
by ~9 GB, but even with 32 GB per worker I see GC errors...
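A back-of-envelope version of that estimate, assuming dense factor matrices of doubles and ignoring JVM object overhead, shuffle buffers, and serialization copies (the numbers are the ones from this thread, not measured values):

```scala
// Rough ALS factor-matrix memory estimate: entities x rank doubles.
object AlsMemoryEstimate {
  // 8 bytes per double; ignores array/object headers and copies.
  def factorBytes(numEntities: Long, rank: Int): Long =
    numEntities * rank * 8L

  def main(args: Array[String]): Unit = {
    val bytes = factorBytes(20000000L, 50) // 20M rows at rank 50
    println(s"factor matrix: $bytes bytes") // 8,000,000,000 bytes, ~8 GB
  }
}
```

In practice ALS also holds shuffled ratings and per-block normal equations, so the real working set can be several times this naive bound, which may explain GC pressure well below 32 GB.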

I am debugging the scalability and memory requirements of the algorithm
further but any insights will be very helpful...

Also there are two other issues:

1. If GC errors are hit, that worker JVM goes down and I have to restart it
manually. Is this expected ?

2. When I try to make use of all 80 cores on the cluster I get
java.io.FileNotFoundException errors on /tmp/. Is there some OS limit on
how many cores can simultaneously access /tmp from a process?
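For reference, Spark 0.9 writes its shuffle scratch files under "spark.local.dir" (default /tmp), and at high parallelism a FileNotFoundException there is more often the per-process open-file limit ("ulimit -n") than any OS cap on cores touching /tmp. A hedged sketch of redirecting the local dirs — the paths are illustrative placeholders, not from this thread:

```scala
// Sketch: point Spark 0.9's shuffle scratch space at larger per-disk
// directories. Must run before the SparkContext is created. The paths
// here are placeholders for real local disks.
object LocalDirConfig {
  def configure(): String = {
    System.setProperty("spark.local.dir", "/data1/spark,/data2/spark")
    System.getProperty("spark.local.dir")
  }

  def main(args: Array[String]): Unit =
    println(configure())
}
```

Raising the worker's "ulimit -n" is usually also needed, since many map/reduce pairs can hold shuffle file handles at once.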

Thanks.
Deb

On Sun, Mar 16, 2014 at 2:20 PM, Sean Owen  wrote:

> Good point -- there's been another optimization for ALS in HEAD (
> https://github.com/apache/spark/pull/131), but yes the better place to
> pick up just essential changes since 0.9.0 including the previous one is
> the 0.9 branch.
>
> --
> Sean Owen | Director, Data Science | London
>
>
> On Sun, Mar 16, 2014 at 2:18 PM, Patrick Wendell wrote:
>
>> Sean - was this merged into the 0.9 branch as well (it seems so based
>> on the message from rxin). If so it might make sense to try out the
>> head of branch-0.9 as well. Unless there are *also* other changes
>> relevant to this in master.
>>
>> - Patrick
>>
>> On Sun, Mar 16, 2014 at 12:24 PM, Sean Owen  wrote:
>> > You should simply use a snapshot built from HEAD of
>> github.com/apache/spark
>> > if you can. The key change is in MLlib and with any luck you can just
>> > replace that bit. See the PR I referenced.
>> >
>> > Sure, with enough memory you can get it to run even with the memory
>> > issue, but it could be hundreds of GB at your scale. Not sure I take the
>> > point about the JVM; you can give it 64GB of heap and executors can use
>> > that much, sure.
>> >
>> > You could reduce the number of features a lot to work around it too, or
>> > reduce the input size. (If anyone saw my blog post about StackOverflow
>> and
>> > ALS -- that's why I snuck in a relatively paltry 40 features and pruned
>> > questions with <4 tags :) )
>> >
>> > I don't think jblas has anything to do with it per se, and the
>> allocation
>> > fails in Java code, not native code. This should be exactly what that
>> PR I
>> > mentioned fixes.
>> >
>> > --
>> > Sean Owen | Director, Data Science | London
>> >
>> >
>> > On Sun, Mar 16, 2014 at 11:48 AM, Debasish Das <
>> debasish.da...@gmail.com>
>> > wrote:
>> >>
>> >> Thanks Sean...let me get the latest code..do you know which PR was it ?
>> >>
>> >> But will the executors run fine with, say, 32 GB or 64 GB of memory?
>> >> Doesn't the JVM show issues when the max memory goes beyond a certain
>> >> limit...
>> >>
>> >> Also the failure is due to GC limits from jblas... and I was thinking
>> >> that jblas is going to call native malloc, right? Maybe 64 GB is not a
>> >> big deal then... I will try increasing to 32 and then 64...
>> >>
>> >> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead
>> limit
>> >> exceeded)
>> >>
>> >>
>> >> org.jblas.DoubleMatrix.<init>(DoubleMatrix.java:323)
>> >> org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:471)
>> >> org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:476)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
>> >> scala.Array$.fill(Array.scala:267)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR.updateBlock(ALSQR.scala:366)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:346)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:345)
>> >> org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
>> >> org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
>> >> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>> >> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:149)
>> >> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:147)
>> >> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> >> org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:147)
>> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
>> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:23
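On the native-malloc question quoted above: jblas's DoubleMatrix keeps its elements in an ordinary JVM double[] (native BLAS is used only for the arithmetic), so zeros() allocates on the Java heap — consistent with a GC overhead error rather than a native allocation failure. An illustrative model, not the jblas source:

```scala
// Illustrative stand-in for what DoubleMatrix.zeros(rows, cols) costs on
// the JVM heap: one dense double[] of rows*cols elements, GC-managed.
object HeapMatrixCost {
  def zeros(rows: Int, cols: Int): Array[Double] =
    new Array[Double](rows * cols) // lives on the Java heap

  def bytes(rows: Int, cols: Int): Long = rows.toLong * cols * 8L
}
```

The Array.fill call in the trace builds many such blocks at once, so the heap cost is this figure multiplied by the number of blocks materialized together.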

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
On Wed, Mar 26, 2014 at 10:53 AM, Tathagata Das
 wrote:
> PR 159 seems like a fairly big patch to me. And quite recent, so its impact
> on the scheduling is not clear. It may also depend on other changes that
> may have gotten into the DAGScheduler but not pulled into branch 0.9. I am
> not sure it is a good idea to pull that in. We can pull those changes later
> for 0.9.2 if required.


There is no impact on scheduling: it only affects error handling. It
ensures that you can actually use Spark on YARN in multi-tenant
clusters more reliably.
Currently, any reasonably long-running job (30 mins+) working on a
non-trivial dataset will fail due to accumulated failures in Spark.


Regards,
Mridul


>
> TD
>
>
>
>
> On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan wrote:
>
>> Forgot to mention this in the earlier request for PR's.
>> If there is another RC being cut, please add
>> https://github.com/apache/spark/pull/159 to it too (if not done
>> already !).
>>
>> Thanks,
>> Mridul
>>
>> On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
>>  wrote:
>> >  Hello everyone,
>> >
>> > Since the release of Spark 0.9, we have received a number of important
>> bug
>> > fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
>> > going to cut a release candidate soon and we would love it if people test
>> > it out. We have backported several bug fixes into the 0.9 and updated
>> JIRA
>> > accordingly<
>> https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)
>> >.
>> > Please let me know if there are fixes that were not backported but you
>> > would like to see them in 0.9.1.
>> >
>> > Thanks!
>> >
>> > TD
>>


Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
On Wed, Mar 26, 2014 at 11:04 AM, Kay Ousterhout  wrote:
> I don't think the blacklisting is a priority and the CPUS_PER_TASK issue
> was still broken after this patch (so broken that I'm convinced no one
> actually uses this feature!!), so agree with TD's sentiment that this
> shouldn't go into 0.9.1.


I am not sure I follow what exactly was broken.
Note that the PR does not change the behavior of CPUS_PER_TASK: that
behavior exists in 0.6 (probably earlier).
Is the behavior of CPUS_PER_TASK broken? Yes, but that is not an
artifact of this PR.



Regards,
Mridul

>
>
> On Tue, Mar 25, 2014 at 10:23 PM, Tathagata Das wrote:
>
>> PR 159 seems like a fairly big patch to me. And quite recent, so its impact
>> on the scheduling is not clear. It may also depend on other changes that
>> may have gotten into the DAGScheduler but not pulled into branch 0.9. I am
>> not sure it is a good idea to pull that in. We can pull those changes later
>> for 0.9.2 if required.
>>
>> TD
>>
>>
>>
>>
>> On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan wrote:
>>
>> > Forgot to mention this in the earlier request for PR's.
>> > If there is another RC being cut, please add
>> > https://github.com/apache/spark/pull/159 to it too (if not done
>> > already !).
>> >
>> > Thanks,
>> > Mridul
>> >
>> > On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
>> >  wrote:
>> > >  Hello everyone,
>> > >
>> > > Since the release of Spark 0.9, we have received a number of important
>> > bug
>> > > fixes and we would like to make a bug-fix release of Spark 0.9.1. We
>> are
>> > > going to cut a release candidate soon and we would love it if people
>> test
>> > > it out. We have backported several bug fixes into the 0.9 and updated
>> > JIRA
>> > > accordingly<
>> >
>> https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)
>> > >.
>> > > Please let me know if there are fixes that were not backported but you
>> > > would like to see them in 0.9.1.
>> > >
>> > > Thanks!
>> > >
>> > > TD
>> >
>>


Re: [VOTE] Release Apache Spark 0.9.1 (rc1)

2014-03-25 Thread Matei Zaharia
Actually I found one minor issue, which is that the support for Tachyon in 
make-distribution.sh seems to rely on GNU sed flags and doesn’t work on Mac OS 
X. But I’d be okay pushing that to a later release since this is a packaging 
operation that you do only once, and presumably you’d do it on a Linux cluster. 
I opened https://spark-project.atlassian.net/browse/SPARK-1326 to track it. We 
can put it in another RC if we find bigger issues.

Matei

On Mar 25, 2014, at 10:31 PM, Matei Zaharia  wrote:

> +1 looks good to me. I tried both the source and CDH4 versions and looked at 
> the new streaming docs.
> 
> The release notes seem slightly incomplete, but I guess you’re still working 
> on them? Anyway those don’t go into the release tarball so it’s okay.
> 
> Matei
> 
> On Mar 24, 2014, at 2:01 PM, Tathagata Das  
> wrote:
> 
>> Please vote on releasing the following candidate as Apache Spark version 
>> 0.9.1
>> A draft of the release notes along with the CHANGES.txt file is attached to 
>> this e-mail.
>> 
>> The tag to be voted on is v0.9.1 (commit 81c6a06c):
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=81c6a06c796a87aaeb5f129f36e4c3396e27d652
>> 
>> 
>> 
>> The release files, including signatures, digests, etc can be found at:
>> http://people.apache.org/~tdas/spark-0.9.1-rc1/
>> 
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/tdas.asc
>> 
>> 
>> 
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1007/
>> 
>> 
>> 
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~tdas/spark-0.9.1-rc1-docs/
>> 
>> Please vote on releasing this package as Apache Spark 0.9.1!
>> 
>> The vote is open until Thursday, March 27, at 22:00 UTC and passes if
>> a majority of at least 3 +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Spark 0.9.1
>> 
>> 
>> [ ] -1 Do not release this package because ...
>> 
>> 
>> 
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>> 
> 



Re: Spark 0.9.1 release

2014-03-25 Thread Kay Ousterhout
I don't think the blacklisting is a priority and the CPUS_PER_TASK issue
was still broken after this patch (so broken that I'm convinced no one
actually uses this feature!!), so agree with TD's sentiment that this
shouldn't go into 0.9.1.


On Tue, Mar 25, 2014 at 10:23 PM, Tathagata Das  wrote:

> PR 159 seems like a fairly big patch to me. And quite recent, so its impact
> on the scheduling is not clear. It may also depend on other changes that
> may have gotten into the DAGScheduler but not pulled into branch 0.9. I am
> not sure it is a good idea to pull that in. We can pull those changes later
> for 0.9.2 if required.
>
> TD
>
>
>
>
> On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan wrote:
>
> > Forgot to mention this in the earlier request for PR's.
> > If there is another RC being cut, please add
> > https://github.com/apache/spark/pull/159 to it too (if not done
> > already !).
> >
> > Thanks,
> > Mridul
> >
> > On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
> >  wrote:
> > >  Hello everyone,
> > >
> > > Since the release of Spark 0.9, we have received a number of important
> > bug
> > > fixes and we would like to make a bug-fix release of Spark 0.9.1. We
> are
> > > going to cut a release candidate soon and we would love it if people
> test
> > > it out. We have backported several bug fixes into the 0.9 and updated
> > JIRA
> > > accordingly<
> >
> https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)
> > >.
> > > Please let me know if there are fixes that were not backported but you
> > > would like to see them in 0.9.1.
> > >
> > > Thanks!
> > >
> > > TD
> >
>


Re: [VOTE] Release Apache Spark 0.9.1 (rc1)

2014-03-25 Thread Matei Zaharia
+1 looks good to me. I tried both the source and CDH4 versions and looked at 
the new streaming docs.

The release notes seem slightly incomplete, but I guess you’re still working on 
them? Anyway those don’t go into the release tarball so it’s okay.

Matei

On Mar 24, 2014, at 2:01 PM, Tathagata Das  wrote:

> Please vote on releasing the following candidate as Apache Spark version 0.9.1
> A draft of the release notes along with the CHANGES.txt file is attached to 
> this e-mail.
> 
> The tag to be voted on is v0.9.1 (commit 81c6a06c):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=81c6a06c796a87aaeb5f129f36e4c3396e27d652
> 
> 
> 
> The release files, including signatures, digests, etc can be found at:
> http://people.apache.org/~tdas/spark-0.9.1-rc1/
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/tdas.asc
> 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1007/
> 
> 
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~tdas/spark-0.9.1-rc1-docs/
> 
> Please vote on releasing this package as Apache Spark 0.9.1!
> 
> The vote is open until Thursday, March 27, at 22:00 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 0.9.1
> 
> 
> [ ] -1 Do not release this package because ...
> 
> 
> 
> To learn more about Apache Spark, please see
> http://spark.apache.org/
> 



Re: Spark 0.9.1 release

2014-03-25 Thread Tathagata Das
PR 159 seems like a fairly big patch to me. And quite recent, so its impact
on the scheduling is not clear. It may also depend on other changes that
may have gotten into the DAGScheduler but not pulled into branch 0.9. I am
not sure it is a good idea to pull that in. We can pull those changes later
for 0.9.2 if required.

TD




On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan wrote:

> Forgot to mention this in the earlier request for PR's.
> If there is another RC being cut, please add
> https://github.com/apache/spark/pull/159 to it too (if not done
> already !).
>
> Thanks,
> Mridul
>
> On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
>  wrote:
> >  Hello everyone,
> >
> > Since the release of Spark 0.9, we have received a number of important
> bug
> > fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
> > going to cut a release candidate soon and we would love it if people test
> > it out. We have backported several bug fixes into the 0.9 and updated
> JIRA
> > accordingly<
> https://spark-project.atlassian.net/browse/SPARK-1275?jql=project%20in%20(SPARK%2C%20BLINKDB%2C%20MLI%2C%20MLLIB%2C%20SHARK%2C%20STREAMING%2C%20GRAPH%2C%20TACHYON)%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20(Resolved%2C%20Closed)
> >.
> > Please let me know if there are fixes that were not backported but you
> > would like to see them in 0.9.1.
> >
> > Thanks!
> >
> > TD
>


Re: Suggest to workaround the org.eclipse.jetty.orbit problem with SBT 0.13.2-RC1

2014-03-25 Thread Prashant Sharma
I think we should upgrade sbt; I have been using sbt since 0.13.2-M1 and have
not spotted any issues. So RC1 should be good, plus it has the fast
incremental compilation.

Prashant Sharma


On Wed, Mar 26, 2014 at 10:41 AM, Will Benton  wrote:

> - Original Message -
>
> > At last, I worked around this issue by updating my local SBT to
> 0.13.2-RC1.
> > If any of you are experiencing similar problem, I suggest you upgrade
> your
> > local SBT version.
>
> If this issue is causing grief for anyone on Fedora 20, know that you can
> install sbt via yum and get a sbt 0.13.1 that has been patched to use Ivy
> 2.3.0 instead of Ivy 2.3.0-rc1.  Obviously, this isn't a solution for
> everyone, but it's certainly cleaner than building your own sbt locally if
> you're using Fedora.  (If you try this out and run in to any trouble,
> please let me know off-list and I'll help out.)
>
>
>
> best,
> wb
>


Re: Suggest to workaround the org.eclipse.jetty.orbit problem with SBT 0.13.2-RC1

2014-03-25 Thread Will Benton
- Original Message -

> At last, I worked around this issue by updating my local SBT to 0.13.2-RC1.
> If any of you are experiencing similar problem, I suggest you upgrade your
> local SBT version.

If this issue is causing grief for anyone on Fedora 20, know that you can 
install sbt via yum and get a sbt 0.13.1 that has been patched to use Ivy 2.3.0 
instead of Ivy 2.3.0-rc1.  Obviously, this isn't a solution for everyone, but 
it's certainly cleaner than building your own sbt locally if you're using 
Fedora.  (If you try this out and run in to any trouble, please let me know 
off-list and I'll help out.)



best,
wb


Suggest to workaround the org.eclipse.jetty.orbit problem with SBT 0.13.2-RC1

2014-03-25 Thread Cheng Lian
Hi all,

Due to a bug in Ivy, SBT tries to download .orbit files instead of .jar
files, causing problems. This bug has been fixed in Ivy 2.3.0, but SBT
0.13.1 still ships Ivy 2.3.0-rc1. Aaron had kindly provided a workaround in
PR #183, but I'm afraid explicitly depending on javax.servlet alone is not
enough. I'm not entirely sure about this because I'm facing both this issue
and a ridiculously unstable network environment, which makes reproducing the
bug extremely time consuming (sbt gen-idea takes at least half an hour to
complete, and the generated result is broken; most of the time is spent in
dependency resolution).

At last, I worked around this issue by updating my local SBT to 0.13.2-RC1.
If any of you are experiencing a similar problem, I suggest you upgrade your
local SBT version. Since SBT 0.13.2-RC1 is not an official release, we have
to build it from scratch:

$ git clone git@github.com:sbt/sbt.git <sbt-repo-dir>
$ cd <sbt-repo-dir>
$ git checkout v0.13.2-RC1

Now ensure you have SBT 0.13.1 installed, as the latest stable version is
required for bootstrapping:

$ sbt publishLocal
$ mv ~/.sbt/boot /tmp
$ cd <sbt-install-dir>/bin
$ mv sbt-launch.jar sbt-launch-0.13.1.jar
$ ln -sf <sbt-repo-dir>/target/sbt-launch-0.13.2-RC1.jar sbt-launch.jar

Now you should be able to build Spark without worrying about .orbit files.
Hope it helps.

Cheng


Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
Forgot to mention this in the earlier request for PR's.
If there is another RC being cut, please add
https://github.com/apache/spark/pull/159 to it too (if not done
already !).

Thanks,
Mridul

On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das
 wrote:
>  Hello everyone,
>
> Since the release of Spark 0.9, we have received a number of important bug
> fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
> going to cut a release candidate soon and we would love it if people test
> it out. We have backported several bug fixes into the 0.9 and updated JIRA
> accordingly.
> Please let me know if there are fixes that were not backported but you
> would like to see them in 0.9.1.
>
> Thanks!
>
> TD


Re: Travis CI

2014-03-25 Thread Patrick Wendell
Yeah, it's been a little slow lately because of a high error rate in
interactions with the GitHub API. Unfortunately we are pretty slammed
for the release and haven't had a ton of time to do further debugging.

- Patrick

On Tue, Mar 25, 2014 at 7:13 PM, Nan Zhu  wrote:
> I just found that the Jenkins is not working from this afternoon
>
> for one PR, the first time build failed after 90 minutes, the second time it
> has run for more than 2 hours, no result is returned
>
> Best,
>
> --
> Nan Zhu
>
>
> On Tuesday, March 25, 2014 at 10:06 PM, Patrick Wendell wrote:
>
> That's not correct - like Michael said the Jenkins build remains the
> reference build for now.
>
> On Tue, Mar 25, 2014 at 7:03 PM, Nan Zhu  wrote:
>
> I assume the Jenkins is not working now?
>
> Best,
>
> --
> Nan Zhu
>
>
> On Tuesday, March 25, 2014 at 6:42 PM, Michael Armbrust wrote:
>
> Just a quick note to everyone that Patrick and I are playing around with
> Travis CI on the Spark github repository. For now, travis does not run all
> of the test cases, so will only be turned on experimentally. Long term it
> looks like Travis might give better integration with github, so we are
> going to see if it is feasible to get all of our tests running on it.
>
> *Jenkins remains the reference CI and should be consulted before merging
> pull requests, independent of what Travis says.*
>
> If you have any questions or want to help out with the investigation, let
> me know!
>
> Michael
>
>


Re: Travis CI

2014-03-25 Thread Nan Zhu
I just found that the Jenkins is not working from this afternoon

for one PR, the first time build failed after 90 minutes, the second time it 
has run for more than 2 hours, no result is returned

Best, 

-- 
Nan Zhu



On Tuesday, March 25, 2014 at 10:06 PM, Patrick Wendell wrote:

> That's not correct - like Michael said the Jenkins build remains the
> reference build for now.
> 
> On Tue, Mar 25, 2014 at 7:03 PM, Nan Zhu  (mailto:zhunanmcg...@gmail.com)> wrote:
> > I assume the Jenkins is not working now?
> > 
> > Best,
> > 
> > --
> > Nan Zhu
> > 
> > 
> > On Tuesday, March 25, 2014 at 6:42 PM, Michael Armbrust wrote:
> > 
> > Just a quick note to everyone that Patrick and I are playing around with
> > Travis CI on the Spark github repository. For now, travis does not run all
> > of the test cases, so will only be turned on experimentally. Long term it
> > looks like Travis might give better integration with github, so we are
> > going to see if it is feasible to get all of our tests running on it.
> > 
> > *Jenkins remains the reference CI and should be consulted before merging
> > pull requests, independent of what Travis says.*
> > 
> > If you have any questions or want to help out with the investigation, let
> > me know!
> > 
> > Michael 



Re: Travis CI

2014-03-25 Thread Patrick Wendell
That's not correct - like Michael said the Jenkins build remains the
reference build for now.

On Tue, Mar 25, 2014 at 7:03 PM, Nan Zhu  wrote:
> I assume the Jenkins is not working now?
>
> Best,
>
> --
> Nan Zhu
>
>
> On Tuesday, March 25, 2014 at 6:42 PM, Michael Armbrust wrote:
>
> Just a quick note to everyone that Patrick and I are playing around with
> Travis CI on the Spark github repository. For now, travis does not run all
> of the test cases, so will only be turned on experimentally. Long term it
> looks like Travis might give better integration with github, so we are
> going to see if it is feasible to get all of our tests running on it.
>
> *Jenkins remains the reference CI and should be consulted before merging
> pull requests, independent of what Travis says.*
>
> If you have any questions or want to help out with the investigation, let
> me know!
>
> Michael
>
>


Re: Travis CI

2014-03-25 Thread Nan Zhu
I assume the Jenkins is not working now? 

Best, 

-- 
Nan Zhu



On Tuesday, March 25, 2014 at 6:42 PM, Michael Armbrust wrote:

> Just a quick note to everyone that Patrick and I are playing around with
> Travis CI on the Spark github repository. For now, travis does not run all
> of the test cases, so will only be turned on experimentally. Long term it
> looks like Travis might give better integration with github, so we are
> going to see if it is feasible to get all of our tests running on it.
> 
> *Jenkins remains the reference CI and should be consulted before merging
> pull requests, independent of what Travis says.*
> 
> If you have any questions or want to help out with the investigation, let
> me know!
> 
> Michael 



Travis CI

2014-03-25 Thread Michael Armbrust
Just a quick note to everyone that Patrick and I are playing around with
Travis CI on the Spark github repository.  For now, travis does not run all
of the test cases, so will only be turned on experimentally.  Long term it
looks like Travis might give better integration with github, so we are
going to see if it is feasible to get all of our tests running on it.

*Jenkins remains the reference CI and should be consulted before merging
pull requests, independent of what Travis says.*

If you have any questions or want to help out with the investigation, let
me know!

Michael


Re: Spark 0.9.1 release

2014-03-25 Thread Tathagata Das
@evan
From the discussion in the JIRA, it seems that we still don't have a clear
solution for SPARK-1138, nor do we have a sense of whether the solution will
be small enough for a maintenance release. So I don't think we should block
the release of Spark 0.9.1 on this. We can make another Spark 0.9.2 release
once the correct solution has been figured out.

@kevin
I understand the problem. I will try to port the solution for master in
this PR into branch 0.9. Let's see if it works out.


On Tue, Mar 25, 2014 at 10:19 AM, Kevin Markey wrote:

> TD:
>
> A correct shading of ASM should only affect Spark code unless someone is
> relying on ASM 4.0 in unrelated project code, in which case they can add
> org.ow2.asm:asm:4.x as a dependency.
>
> Our short term solution has been to repackage other libraries with a 3.2
> dependency or to exclude ASM when our use of a dependent library really
> doesn't need it.  As you probably know, the real problem arises in
> ClassVisitor, which is an Interface in 3.x and before, but in 4.x it is an
> abstract class that takes a version constant as its constructor.  The ASM
> folks of course had our best interests in mind when they did this,
> attempting to deal with the Java-version dependent  changes from one ASM
> release to the next.  Unfortunately, they didn't change the names or
> locations of their classes and interfaces, which would have helped.
>
> In our particular case, the only library from which we couldn't exclude
> ASM was org.glassfish.jersey.containers:jersey-container-servlet:jar:2.5.1.
> I added a new module to our project, including some dummy source code,
> because we needed the library to be self contained, made the servlet --
> minus some unrelated transitive dependencies -- the only module dependency,
> then used the Maven shade plugin to relocate "org.objectweb.asm" to an
> arbitrary target.  We added the new shaded module as a new project
> dependency, plus the unrelated transitive dependencies excluded above.
> This solved the problem. At least until we added WADL to the project.  Then
> we needed to deal with it on its own terms.
>
> As you can see, we left Spark alone in all its ASM 4.0 glory.  Why? Spark
> is more volatile than the other libraries.  Also, the way in which we
> needed to deploy Spark and other resources on our (Yarn) clusters suggested
> that it would be easier to shade the other libraries.  I wanted to avoid
> having to install a locally patched Spark library into our build, updating
> the cluster and individual developers whenever there's a new patch.
>  Individual developers such as me who are testing the impact of patches can
> handle it, but the main build goes to Maven Central via our corporate
> Artifactory mirror.
>
> If suddenly we had a Spark 0.9.1 with a shaded ASM, it would have no
> negative impact on us.  Only a positive impact.
>
> I just wish that all users of ASM would read FAQ entry 15!!!
>
> Thanks
> Kevin
>
>
>
> On 03/24/2014 06:30 PM, Tathagata Das wrote:
>
>> Hello Kevin,
>>
>> A fix for SPARK-782 would definitely simplify building against Spark.
>> However, its possible that a fix for this issue in 0.9.1 will break
>> the builds (that reference spark) of existing 0.9 users, either due to
>> a change in the ASM version, or for being incompatible with their
>> current workarounds for this issue. That is not a good idea for a
>> maintenance release, especially when 1.0 is not too far away.
>>
>> Can you (and others) elaborate more on the current workarounds that
>> you have for this issue? Its best to understand all the implications
>> of this fix.
>>
>> Note that in branch 0.9, it is not fixed, neither in SBT nor in Maven.
>>
>> TD
>>
>> On Mon, Mar 24, 2014 at 4:38 PM, Kevin Markey 
>> wrote:
>>
>>> Is there any way that [SPARK-782] (Shade ASM) can be included?  I see
>>> that
>>> it is not currently backported to 0.9.  But there is no single issue that
>>> has caused us more grief as we integrate spark-core with other project
>>> dependencies.  There are way too many libraries out there in addition to
>>> Spark 0.9 and before that are not well-behaved (ASM FAQ recommends
>>> shading),
>>> including some Hive and Hadoop libraries and a number of servlet
>>> libraries.
>>> We can't control those, but if Spark were well behaved in this regard, it
>>> would help.  Even for a maintenance release, and even if 1.0 is only 6
>>> weeks
>>> away!
>>>
>>> (For those not following 782, according to Jira comments, the SBT build
>>> shades it, but it is the Maven build that ends up in Maven Central.)
>>>
>>> Thanks
>>> Kevin Markey
>>>
>>>
>>>
>>>
>>> On 03/19/2014 06:07 PM, Tathagata Das wrote:
>>>
Hello everyone,

 Since the release of Spark 0.9, we have received a number of important
 bug
 fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
 going to cut a release candidate soon and we would love it if people
 test
 it out.

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-25 Thread Gary Malouf
Can anyone verify the claims from Aureliano regarding the Akka dependency
protobuf collision?  Our team has a major need to upgrade to protobuf 2.5.0
up the pipe and Spark seems to be the blocker here.


On Fri, Mar 21, 2014 at 6:49 PM, Aureliano Buendia wrote:

>
>
>
> On Tue, Mar 18, 2014 at 12:56 PM, Ognen Duzlevski <
> og...@plainvanillagames.com> wrote:
>
>>
>> On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote:
>>
>>> On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia  wrote:
>>>
 Is there a reason for spark using the older akka?




 On Sun, Mar 2, 2014 at 1:53 PM, 1esha  wrote:

 The problem is in akka remote. It contains files compiled with 2.4.*.
 When

 you run it with 2.5.* in classpath it fails like above.



 Looks like moving to akka 2.3 will solve this issue. Check this issue -

 https://www.assembla.com/spaces/akka/tickets/3154-use-protobuf-version-2-5-0#/activity/ticket:


 Is the solution to exclude the 2.4.* dependency on protobuf, or will
 this produce more complications?

>>> I am not sure I remember what the context was around this but I run
>> 0.9.0 with hadoop 2.2.0 just fine.
>>
>
> The problem is that spark depends on an older version of akka, which
> depends on an older version of protobuf (2.4).
>
> This means people cannot use protobuf 2.5 with spark.
>
>
>> Ognen
>>
>
>
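For what it's worth, the usual sbt-side shape of such a workaround looks like the sketch below. The coordinates and versions are illustrative only, not verified against Spark's actual build:

```scala
// build.sbt sketch -- hypothetical coordinates; the point is the shape of
// the exclusion, not the exact artifact names.
libraryDependencies += ("com.typesafe.akka" %% "akka-remote" % "2.2.3")
  .exclude("com.google.protobuf", "protobuf-java")

// Then pin the protobuf version the rest of the pipeline needs:
libraryDependencies += "com.google.protobuf" % "protobuf-java" % "2.5.0"
```

Note that excluding the jar only fixes the classpath; whether akka-remote is actually binary-compatible with protobuf 2.5 is the open question in this thread.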


Re: Spark 0.9.1 release

2014-03-25 Thread Kevin Markey

TD:

A correct shading of ASM should only affect Spark code unless someone is 
relying on ASM 4.0 in unrelated project code, in which case they can add 
org.ow2.asm:asm:4.x as a dependency.


Our short term solution has been to repackage other libraries with a 3.2 
dependency or to exclude ASM when our use of a dependent library really 
doesn't need it.  As you probably know, the real problem arises in 
ClassVisitor, which is an interface in 3.x and before, but in 4.x is an 
abstract class whose constructor takes a version constant.  The ASM 
folks of course had our best interests in mind when they did this, 
attempting to deal with the Java-version-dependent changes from one ASM 
release to the next.  Unfortunately, they didn't change the names or 
locations of their classes and interfaces, which would have helped.
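
The 3.x-to-4.x break described above can be sketched with stand-in types. To be clear, these mini-classes only mimic the shape of org.objectweb.asm.ClassVisitor; they are not the real ASM API, and the version constant is a placeholder for Opcodes.ASM4:

```java
// Stand-in sketch of the ASM 3.x -> 4.x breaking change described above.
// These mini-classes only mimic org.objectweb.asm.ClassVisitor; they are
// NOT the real ASM API.

// 3.x style: ClassVisitor was an interface, implemented directly.
interface ClassVisitor3 {
    void visit(String name);
}

// 4.x style: ClassVisitor is an abstract class whose constructor takes an
// API-version constant (Opcodes.ASM4 in the real library), so every
// subclass must pass it up via super(api).
abstract class ClassVisitor4 {
    final int api;
    protected ClassVisitor4(int api) { this.api = api; }
    public void visit(String name) { /* no-op by default */ }
}

public class AsmChangeDemo {
    static final int ASM4 = 4 << 16; // stand-in for Opcodes.ASM4

    public static void main(String[] args) {
        ClassVisitor3 old = name -> { };                  // 3.x: just implement
        ClassVisitor4 neu = new ClassVisitor4(ASM4) { };  // 4.x: version required
        old.visit("Foo");
        neu.visit("Foo");
        System.out.println("ok");
    }
}
```

Because an interface silently became a class with the same name and package, code compiled against the 3.x interface typically fails at runtime (an IncompatibleClassChangeError) when 4.x is on the classpath.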


In our particular case, the only library from which we couldn't exclude 
ASM was 
org.glassfish.jersey.containers:jersey-container-servlet:jar:2.5.1. I 
added a new module to our project, including some dummy source code, 
because we needed the library to be self-contained, made the servlet -- 
minus some unrelated transitive dependencies -- the only module 
dependency, then used the Maven shade plugin to relocate 
"org.objectweb.asm" to an arbitrary target.  We added the new shaded 
module as a new project dependency, plus the unrelated transitive 
dependencies excluded above.   This solved the problem. At least until 
we added WADL to the project.  Then we needed to deal with it on its own 
terms.
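
A minimal sketch of the shade-plugin relocation described above; the wrapper-module coordinates and the target package are placeholders, not taken from the thread:

```xml
<!-- Hypothetical sketch: inside the wrapper module's pom.xml, relocate
     org.objectweb.asm to an arbitrary package so it cannot collide. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.objectweb.asm</pattern>
            <!-- arbitrary target, as in the approach described above -->
            <shadedPattern>shaded.org.objectweb.asm</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Downstream modules then depend on the shaded wrapper instead of the raw servlet artifact, so the ASM 4.x classes it drags in are invisible to the rest of the build.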


As you can see, we left Spark alone in all its ASM 4.0 glory.  Why? 
Spark is more volatile than the other libraries.  Also, the way in which 
we needed to deploy Spark and other resources on our (Yarn) clusters 
suggested that it would be easier to shade the other libraries.  I 
wanted to avoid having to install a locally patched Spark library into 
our build, updating the cluster and individual developers whenever 
there's a new patch.  Individual developers such as me who are testing 
the impact of patches can handle it, but the main build goes to Maven 
Central via our corporate Artifactory mirror.


If suddenly we had a Spark 0.9.1 with a shaded ASM, it would have no 
negative impact on us.  Only a positive impact.


I just wish that all users of ASM would read FAQ entry 15!!!

Thanks
Kevin


On 03/24/2014 06:30 PM, Tathagata Das wrote:

Hello Kevin,

A fix for SPARK-782 would definitely simplify building against Spark.
However, it's possible that a fix for this issue in 0.9.1 will break
the builds (that reference spark) of existing 0.9 users, either due to
a change in the ASM version, or for being incompatible with their
current workarounds for this issue. That is not a good idea for a
maintenance release, especially when 1.0 is not too far away.

Can you (and others) elaborate more on the current workarounds that
you have for this issue? It's best to understand all the implications
of this fix.

Note that in branch 0.9, it is not fixed, neither in SBT nor in Maven.

TD

On Mon, Mar 24, 2014 at 4:38 PM, Kevin Markey  wrote:

Is there any way that [SPARK-782] (Shade ASM) can be included?  I see that
it is not currently backported to 0.9.  But there is no single issue that
has caused us more grief as we integrate spark-core with other project
dependencies.  There are way too many libraries out there in addition to
Spark 0.9 and before that are not well-behaved (ASM FAQ recommends shading),
including some Hive and Hadoop libraries and a number of servlet libraries.
We can't control those, but if Spark were well behaved in this regard, it
would help.  Even for a maintenance release, and even if 1.0 is only 6 weeks
away!

(For those not following 782, according to Jira comments, the SBT build
shades it, but it is the Maven build that ends up in Maven Central.)

Thanks
Kevin Markey




On 03/19/2014 06:07 PM, Tathagata Das wrote:

   Hello everyone,

Since the release of Spark 0.9, we have received a number of important bug
fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
going to cut a release candidate soon and we would love it if people test
it out. We have backported several bug fixes into the 0.9 branch and updated JIRA

accordingly.

Please let me know if there are fixes that were not backported but you
would like to see them in 0.9.1.

Thanks!

TD





Re: new Catalyst/SQL component merged into master

2014-03-25 Thread Evan Chan
HI Michael,

It's not publicly available right now, though we can probably chat
about it offline.   It's not a super novel concept or anything, in
fact I had proposed it a long time ago on the mailing lists.

-Evan

On Mon, Mar 24, 2014 at 1:34 PM, Michael Armbrust
 wrote:
> Hi Evan,
>
> Index support is definitely something we would like to add, and it is
> possible that adding support for your custom indexing solution would not be
> too difficult.
>
> We already push predicates into hive table scan operators when the
> predicates are over partition keys.  You can see an example of how we
> collect filters and decide which can be pushed into the scan using the
> HiveTableScan query planning strategy.
>
> I'd like to know more about your indexing solution.  Is this something
> publicly available?  One concern here is that the query planning code is not
> considered a public API and so is likely to change quite a bit as we improve
> the optimizer.  It's not currently something that we plan to expose for
> external components to modify.
>
> Michael
>
>
> On Sun, Mar 23, 2014 at 11:49 PM, Evan Chan  wrote:
>>
>> Hi Michael,
>>
>> Congrats, this is really neat!
>>
>> What thoughts do you have regarding adding indexing support and
>> predicate pushdown to this SQL framework?Right now we have custom
>> bitmap indexing to speed up queries, so we're really curious about
>> the architectural direction.
>>
>> -Evan
>>
>>
>> On Fri, Mar 21, 2014 at 11:09 AM, Michael Armbrust
>>  wrote:
>> >>
>> >> It will be great if there are any examples or usecases to look at ?
>> >>
>> > There are examples in the Spark documentation.  Patrick posted an
>> > updated copy here so people can see them before 1.0 is released:
>> >
>> > http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
>> >
>> >> Does this feature have different use cases than Shark, or is it cleaner
>> >> now that the Hive dependency is gone?
>> >>
>> > Depending on how you use this, there is still a dependency on Hive (By
>> > default this is not the case.  See the above documentation for more
>> > details).  However, the dependency is on a stock version of Hive instead
>> > of
>> > one modified by the AMPLab.  Furthermore, Spark SQL has its own
>> > optimizer,
>> > instead of relying on the Hive optimizer.  Long term, this is going to
>> > give
>> > us a lot more flexibility to optimize queries specifically for the Spark
>> > execution engine.  We are actively porting over the best parts of shark
>> > (specifically the in-memory columnar representation).
>> >
>> > Shark still has some features that are missing in Spark SQL, including
>> > SharkServer (and years of testing).  Once SparkSQL graduates from Alpha
>> > status, it'll likely become the new backend for Shark.
>>
>>
>>
>> --
>> --
>> Evan Chan
>> Staff Engineer
>> e...@ooyala.com  |
>
>



-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |
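
For readers trying out the guide linked in this thread, here is a minimal sketch of the alpha API as documented around this time. Names such as registerAsTable changed in later releases, so treat this as illustrative rather than definitive:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object SqlSketch {
  // The schema is inferred from the case class fields.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "sql-sketch")
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // brings in the implicit RDD-to-SchemaRDD conversion

    val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 12)))
    people.registerAsTable("people")  // 1.0-era name; later registerTempTable

    // sql(...) returns a SchemaRDD; ordinary RDD operations apply to its rows.
    val adults = sql("SELECT name FROM people WHERE age >= 18")
    adults.collect().foreach(println)
    sc.stop()
  }
}
```

Note that no Hive installation is involved here; the optional Hive support mentioned above lives in a separate HiveContext.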


Re: Spark 0.9.1 release

2014-03-25 Thread Evan Chan
Hey guys,

I think SPARK-1138 should be resolved before releasing Spark 0.9.1.
It's affecting multiple users' ability to use Spark 0.9 with various
versions of Hadoop.
I have one fix but not sure if it works for others.

-Evan


On Mon, Mar 24, 2014 at 5:30 PM, Tathagata Das
 wrote:
> Hello Kevin,
>
> A fix for SPARK-782 would definitely simplify building against Spark.
> However, it's possible that a fix for this issue in 0.9.1 will break
> the builds (that reference spark) of existing 0.9 users, either due to
> a change in the ASM version, or for being incompatible with their
> current workarounds for this issue. That is not a good idea for a
> maintenance release, especially when 1.0 is not too far away.
>
> Can you (and others) elaborate more on the current workarounds that
> you have for this issue? It's best to understand all the implications
> of this fix.
>
> Note that in branch 0.9, it is not fixed, neither in SBT nor in Maven.
>
> TD
>
> On Mon, Mar 24, 2014 at 4:38 PM, Kevin Markey  wrote:
>> Is there any way that [SPARK-782] (Shade ASM) can be included?  I see that
>> it is not currently backported to 0.9.  But there is no single issue that
>> has caused us more grief as we integrate spark-core with other project
>> dependencies.  There are way too many libraries out there in addition to
>> Spark 0.9 and before that are not well-behaved (ASM FAQ recommends shading),
>> including some Hive and Hadoop libraries and a number of servlet libraries.
>> We can't control those, but if Spark were well behaved in this regard, it
>> would help.  Even for a maintenance release, and even if 1.0 is only 6 weeks
>> away!
>>
>> (For those not following 782, according to Jira comments, the SBT build
>> shades it, but it is the Maven build that ends up in Maven Central.)
>>
>> Thanks
>> Kevin Markey
>>
>>
>>
>>
>> On 03/19/2014 06:07 PM, Tathagata Das wrote:
>>>
>>>   Hello everyone,
>>>
>>> Since the release of Spark 0.9, we have received a number of important bug
>>> fixes and we would like to make a bug-fix release of Spark 0.9.1. We are
>>> going to cut a release candidate soon and we would love it if people test
>>> it out. We have backported several bug fixes into the 0.9 branch and updated JIRA
>>>
>>> accordingly.
>>>
>>> Please let me know if there are fixes that were not backported but you
>>> would like to see them in 0.9.1.
>>>
>>> Thanks!
>>>
>>> TD
>>>
>>



-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |


Re: Shark does not give any results with SELECT count(*) command

2014-03-25 Thread qingyang li
spark is deployed on bigdata001 bigdata002 bigdata003 bigdata004
 bigdata001 is master
i have also copied shark's files on the four machines.
when i run "select count(*) from b" in bigdata003's shark shell
"bin/shark", i can get the result,
but when i run "select count(*) from b" in the other nodes' shark shells
"bin/shark", i cannot get the result.

it seems the result has been sent to bigdata003,
i have found such log on bigdata003:

14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata001/192.168.1.101]
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata002/192.168.1.102]
14/03/25 17:08:49 INFO network.ConnectionManager: Accepted connection
from [bigdata004/192.168.1.104]

and also found such log on bigdata004 002 001:

09/01/13 09:32:29 INFO network.ConnectionManager: Accepted connection
from [bigdata003/192.168.1.103]
09/01/13 09:32:29 INFO network.SendingConnection: Initiating
connection to [bigdata003/192.168.1.103:39848]
09/01/13 09:32:29 INFO network.SendingConnection: Connected to
[bigdata003/192.168.1.103:39848], 1 messages pending





2014-03-25 16:19 GMT+08:00 qingyang li :

> reopen this thread because i encounter this problem again.
> Here is my env:
> scala 2.10.3
> spark 0.9.0, standalone mode
> shark 0.9.0, downloaded the source code and built it myself
> hive hive-shark-0.11
> I have copied hive-site.xml from my hadoop cluster, whose hive version is
> 0.12; after copying, i deleted some attributes from hive-site.xml
>
> When running select count(*) from xxx, there is no result and no error output.
>
> Can someone give me some suggestions to debug ?
>
>
>
>
>
> 2014-03-20 11:27 GMT+08:00 qingyang li :
>
> have found the cause. my problem is:
>> the format of the slaves file is not correct, so the tasks only run on
>> the master.
>>
>> explaining here to help other guys who also encounter a similar problem.
>>
>>
>> 2014-03-20 9:57 GMT+08:00 qingyang li :
>>
>> Hi, i installed spark 0.9.0 and shark 0.9 on 3 nodes. when i run select *
>>> from src, i can get a result, but when i run select count(*) from src or
>>> select * from src limit 1, there is no result output.
>>>
>>> i have found a similar problem on google groups:
>>>
>>> https://groups.google.com/forum/#!searchin/spark-users/Shark$20does$20not$20give$20any$20results$20with$20SELECT$20command/spark-users/oKMBPBWim0U/_hbDCi4m-xUJ
>>> but , there is no solution on it.
>>>
>>> Does anyone encounter such problem?
>>>
>>
>>
>
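
The root cause reported above (a malformed slaves file, so tasks only ran on the master) can be sketched as a quick check. The hostnames are the ones from this thread, the conf/ path is the standard Spark layout, and whether the master hostname belongs in the file depends on whether it also runs a worker:

```shell
# Sketch: Spark's conf/slaves must list one worker hostname per line.
# Stray spaces or Windows line endings (\r) can make the master misread
# the file, so every task ends up running on the master only.
mkdir -p conf
cat > conf/slaves <<'EOF'
bigdata001
bigdata002
bigdata003
bigdata004
EOF

# Strip carriage returns a Windows editor may have introduced:
sed -i 's/\r$//' conf/slaves

# Sanity check: expect exactly 4 worker entries.
wc -l < conf/slaves
```

After fixing the file, restarting the cluster (sbin/stop-all.sh, then sbin/start-all.sh) makes the master pick up the corrected worker list.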