Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Sean Owen
For any Hadoop 2.4 distro, yes, set hadoop.version but also set -Phadoop-2.4. http://spark.apache.org/docs/latest/building-with-maven.html On Mon, Aug 4, 2014 at 9:15 AM, Patrick Wendell pwend...@gmail.com wrote: For hortonworks, I believe it should work to just link against the corresponding
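
A build command along these lines, per the linked Maven docs, would be roughly the following (the version string is whatever your distro ships; plain 2.4.0 shown here):

  $ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package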

Re: Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-04 Thread Larry Xiao
I guessed ./sbt/sbt clean and it works fine now. On 8/4/14, 11:48 AM, Larry Xiao wrote: On the latest pull today (6ba6c3ebfe9a47351a50e45271e241140b09bf10) I met an assembly problem. $ ./sbt/sbt assembly Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME. Note, this will be overridden by

Re: Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-04 Thread Larry Xiao
Sorry, I meant: I tried the command ./sbt/sbt clean and now it works. Is it because cached components were not recompiled? On 8/4/14, 4:44 PM, Larry Xiao wrote: I guessed ./sbt/sbt clean and it works fine now. On 8/4/14, 11:48 AM, Larry Xiao wrote: On the latest pull today
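
A clean rebuild that discards stale build products before reassembling, as suggested above, is simply:

  $ ./sbt/sbt clean assembly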

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
Can you try building without any of the special `hadoop.version` flags and just building only with -Phadoop-2.4? In the past users have reported issues trying to build random spot versions... I think HW is supposed to be compatible with the normal 2.4.0 build. On Mon, Aug 4, 2014 at 8:35 AM,
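
That is, a profile-only build with no explicit version property, something like:

  $ mvn -Phadoop-2.4 -DskipTests clean package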

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
Ah I see, yeah you might need to set hadoop.version and yarn.version. I thought the profile set this automatically. On Mon, Aug 4, 2014 at 10:02 AM, Ron's Yahoo! zlgonza...@yahoo.com wrote: I meant yarn and hadoop defaulted to 1.0.4 so the yarn build fails since 1.0.4 doesn't exist for yarn...
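
Passing both properties explicitly for a YARN build would look something like this (the HDP version string from the thread subject is used here purely for illustration):

  $ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0.2.1.3.0-563 -Dyarn.version=2.4.0.2.1.3.0-563 -DskipTests clean package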

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Steve Nunez
I don’t think there is an hwx profile, but there probably should be. - Steve From: Patrick Wendell pwend...@gmail.com Date: Monday, August 4, 2014 at 10:08 To: Ron's Yahoo! zlgonza...@yahoo.com Cc: Ron's Yahoo! zlgonza...@yahoo.com.invalid, Steve Nunez snu...@hortonworks.com,

Problems running modified spark version on ec2 cluster

2014-08-04 Thread Matt Forbes
I'm trying to run a forked version of mllib where I am experimenting with a boosted trees implementation. Here is what I've tried, but can't seem to get working properly: *Directory layout:* src/spark-dev (spark github fork) pom.xml - I've tried changing the version to 1.2 arbitrarily in core

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Sean Owen
What would such a profile do though? In general building for a specific vendor version means setting hadoop.version and/or yarn.version. Any hard-coded value is unlikely to match what a particular user needs. Setting protobuf versions and so on is already done by the generic profiles. In a

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Steve Nunez
Hmm. Fair enough. I hadn't given that answer much thought and on reflection think you're right in that a profile would just be a bad hack. On 8/4/14, 10:35, Sean Owen so...@cloudera.com wrote: What would such a profile do though? In general building for a specific vendor version means setting

Re: Problems running modified spark version on ec2 cluster

2014-08-04 Thread Matt Forbes
After rummaging through the worker instances I noticed they were using the assembly jar (which I hadn't noticed before). Now instead of using the core and mllib jars individually, I'm just overwriting the assembly jar in the master and using spark-ec2/copy-dir. For posterity, my run script is:
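
Matt's actual run script is cut off in the archive; a rough sketch of that workflow, with purely illustrative paths and hostname (assuming a standard spark-ec2 layout where the assembly lives under /root/spark/lib and copy-dir sits in /root/spark-ec2), might be:

  # copy the locally built assembly jar onto the EC2 master, then rsync it out to the slaves
  $ scp assembly/target/scala-2.10/spark-assembly-*.jar root@ec2-master:/root/spark/lib/
  $ ssh root@ec2-master /root/spark-ec2/copy-dir /root/spark/lib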

Re: Scala 2.11 external dependencies

2014-08-04 Thread Anand Avati
On Sun, Aug 3, 2014 at 9:09 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Anand, Thanks for looking into this - it's great to see momentum towards Scala 2.11 and I'd love it if this lands in Spark 1.2. For the external dependencies, it would be good to create a sub-task of SPARK-1812 to

Re: Interested in contributing to GraphX in Python

2014-08-04 Thread Reynold Xin
Thanks for your interest. I think the main challenge is if we have to call Python functions per record, it can be pretty expensive to serialize/deserialize across boundaries of the Python process and JVM process. I don't know if there is a good way to solve this problem yet. On Fri, Aug 1,

Re: Low Level Kafka Consumer for Spark

2014-08-04 Thread Yan Fang
Another suggestion that may help is that you can consider using Kafka to store the latest offset instead of Zookeeper. There are at least two benefits: 1) lower the workload of ZK 2) support replay from a certain offset. This is how Samza http://samza.incubator.apache.org/ deals with the Kafka

log overloaded in SparkContext/ Spark 1.0.x

2014-08-04 Thread Dmitriy Lyubimov
It would seem that code like `import o.a.spark.SparkContext._; import math._; val a = log(b)` does not compile anymore with Spark 1.0.x, since SparkContext._ also exposes a `log` function. Which happens a lot to a guy like me. The obvious workaround is to use something like import
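
The clash and the rename-on-import workaround hinted at above can be sketched like this (a minimal illustration, not taken from the original message):

  import org.apache.spark.SparkContext._
  // rename scala.math.log on import so it no longer collides with the log brought in above
  import scala.math.{log => mathLog}

  val a = mathLog(10.0)          // unambiguous
  val b = scala.math.log(10.0)   // or just fully qualify the call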

Re: log overloaded in SparkContext/ Spark 1.0.x

2014-08-04 Thread Matei Zaharia
Hah, weird. log should be protected actually (look at trait Logging). Is your class extending SparkContext or somehow being placed in the org.apache.spark package? Or maybe the Scala compiler looks at it anyway.. in that case we can rename it. Please open a JIRA for it if that's the case. On