Re: Problem building Spark

2015-10-19 Thread Ted Yu
See this thread
http://search-hadoop.com/m/q3RTtV3VFNdgNri2&subj=Re+Build+spark+1+5+1+branch+fails

> On Oct 19, 2015, at 6:59 PM, Annabel Melongo 
>  wrote:
> 
> I tried to build Spark according to the build directions and it failed 
> due to the following error: 
> 
> [Building Spark - Spark 1.5.1 Documentation, spark.apache.org]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-assembly-plugin:2.5.5:single
> (test-jar-with-dependencies) on project spark-streaming-mqtt_2.10: Failed to
> create assembly: Error creating assembly archive test-jar-with-dependencies:
> Problem creating jar: Execution exception (and the archive is probably corrupt
> but I could not delete it): Java heap space -> [Help 1]
> 
> Any help? I have a 64-bit Windows 8 machine.


streaming test failure

2015-10-18 Thread Ted Yu
When I ran the following command on Linux with latest master branch:
~/apache-maven-3.3.3/bin/mvn clean -Phive -Phive-thriftserver -Pyarn
-Phadoop-2.4 -Dhadoop.version=2.7.0 package

I saw some test failures:
http://pastebin.com/1VYZYy5K

Has anyone seen a similar test failure before?

Thanks


test failed due to OOME

2015-10-18 Thread Ted Yu
From
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/3846/console
:

SparkListenerSuite:
- basic creation and shutdown of LiveListenerBus
- bus.stop() waits for the event queue to completely drain
- basic creation of StageInfo
- basic creation of StageInfo with shuffle
- StageInfo with fewer tasks than partitions
- local metrics
- onTaskGettingResult() called when result fetched remotely *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
  stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0
  (TID 0, localhost): java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1852)
    at java.io.ObjectOutputStream.write(ObjectOutputStream.java:708)
    at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:182)
    at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:52)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1160)
    at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:49)
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


Should more heap be given to the test suite?
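
For reference, when running the suites locally through sbt, one way to raise the test
heap is in the build definition. This is only a hedged sketch with illustrative values,
not the project's actual settings:

fork in Test := true
javaOptions in Test ++= Seq("-Xmx3g", "-XX:MaxPermSize=512m")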


Cheers


Re: Build spark 1.5.1 branch fails

2015-10-17 Thread Ted Yu
Have you set MAVEN_OPTS with the following ?
-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m

Cheers

On Sat, Oct 17, 2015 at 2:35 PM, Chester Chen  wrote:

> I was using jdk 1.7 and maven version is the same as pom file.
>
> ᚛ |(v1.5.1)|$ java -version
> java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>
> Using build/sbt still fail the same with -Denforcer.skip, with mvn build,
> it fails with
>
>
> [ERROR] PermGen space -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging
>
> I am giving up on this. Just using 1.5.2-SNAPSHOT for now.
>
> Chester
>
>
> On Mon, Oct 12, 2015 at 12:05 AM, Xiao Li  wrote:
>
>> Hi, Chester,
>>
>> Please check your pom.xml. Your java.version and maven.version might not
>> match your build environment.
>>
>> Or using -Denforcer.skip=true from the command line to skip it.
>>
>> Good luck,
>>
>> Xiao Li
>>
>> 2015-10-08 10:35 GMT-07:00 Chester Chen :
>>
>>> Question regarding branch-1.5  build.
>>>
>>> Noticed that the Spark project no longer publishes the spark-assembly. We
>>> have to build it ourselves (until we find a way to not depend on the assembly
>>> jar).
>>>
>>>
>>> I checked out the v1.5.1 release tag and, using sbt to build it, I get the
>>> following error:
>>>
>>> build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive
>>> -Phive-thriftserver -DskipTests clean package assembly
>>>
>>>
>>> [warn] ::
>>> [warn] ::  UNRESOLVED DEPENDENCIES ::
>>> [warn] ::
>>> [warn] :: org.apache.spark#spark-network-common_2.10;1.5.1:
>>> configuration not public in
>>> org.apache.spark#spark-network-common_2.10;1.5.1: 'test'. It was required
>>> from org.apache.spark#spark-network-shuffle_2.10;1.5.1 test
>>> [warn] ::
>>> [warn]
>>> [warn] Note: Unresolved dependencies path:
>>> [warn] org.apache.spark:spark-network-common_2.10:1.5.1
>>> ((com.typesafe.sbt.pom.MavenHelper) MavenHelper.scala#L76)
>>> [warn]  +- org.apache.spark:spark-network-shuffle_2.10:1.5.1
>>> [info] Packaging
>>> /Users/chester/projects/alpine/apache/spark/launcher/target/scala-2.10/spark-launcher_2.10-1.5.1.jar
>>> ...
>>> [info] Done packaging.
>>> [warn] four warnings found
>>> [warn] Note: Some input files use unchecked or unsafe operations.
>>> [warn] Note: Recompile with -Xlint:unchecked for details.
>>> [warn] No main class detected
>>> [info] Packaging
>>> /Users/chester/projects/alpine/apache/spark/external/flume-sink/target/scala-2.10/spark-streaming-flume-sink_2.10-1.5.1.jar
>>> ...
>>> [info] Done packaging.
>>> sbt.ResolveException: unresolved dependency:
>>> org.apache.spark#spark-network-common_2.10;1.5.1: configuration not public
>>> in org.apache.spark#spark-network-common_2.10;1.5.1: 'test'. It was
>>> required from org.apache.spark#spark-network-shuffle_2.10;1.5.1 test
>>>
>>>
>>> Somehow the network-shuffle module can't find the test jar it needs (not sure
>>> why the test configuration is still needed, even though -DskipTests is specified).
>>>
>>> I tried the maven command; the build failed as well (without assembly):
>>>
>>> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive
>>> -Phive-thriftserver -DskipTests clean package
>>>
>>> [ERROR] Failed to execute goal
>>> org.apache.maven.plugins:maven-enforcer-plugin:1.4:enforce
>>> (enforce-versions) on project spark-parent_2.10: Some Enforcer rules have
>>> failed. Look above for specific messages explaining why the rule failed. ->
>>> [Help 1]
>>> [ERROR]
>>> [ERROR] To see the full stack trace of the errors, re-run Maven with the
>>> -e switch.
>>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>> [ERROR]
>>> [ERROR] For more information about the errors and possible solutions,
>>> please read the following articles:
>>> [ERROR] [Help 1]
>>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>>>
>>>
>>>
>>> I checked out branch-1.5 and replaced "1.5.2-SNAPSHOT" with "1.5.1", and
>>> build/sbt still fails (same error as above for sbt).
>>>
>>> But if I keep the version string as "1.5.2-SNAPSHOT", the build/sbt
>>> works fine.
>>>
>>>
>>> Any ideas ?
>>>
>>> Chester
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>


Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-16 Thread Ted Yu
if [ "$SPARK_MASTER_IP" = "" ]; then
  SPARK_MASTER_IP=`hostname`
  --ip $SPARK_MASTER_IP --port $SPARK_MASTER_PORT --webui-port
$SPARK_MASTER_WEBUI_PORT \
  "$sbin"/../tachyon/bin/tachyon bootstrap-conf $SPARK_MASTER_IP
./sbin/start-master.sh

if [ "$SPARK_MASTER_IP" = "" ]; then
  SPARK_MASTER_IP="`hostname`"
  "$sbin/slaves.sh" cd "$SPARK_HOME" \; "$sbin"/../tachyon/bin/tachyon
bootstrap-conf "$SPARK_MASTER_IP"
"$sbin/slaves.sh" cd "$SPARK_HOME" \; "$sbin/start-slave.sh"
"spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT"
./sbin/start-slaves.sh

On Fri, Oct 16, 2015 at 9:01 AM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> I'd look into tracing a possible bug here, but I'm not sure where to look.
> Searching the codebase for `SPARK_MASTER_IP`, amazingly, does not show it
> being used in any place directly by Spark
> <https://github.com/apache/spark/search?utf8=%E2%9C%93&q=SPARK_MASTER_IP>.
>
> Clearly, Spark is using this environment variable (otherwise I wouldn't
> see the behavior described in my first email), but I can't see where.
>
> Can someone give me a pointer?
>
> Nick
>
> On Thu, Oct 15, 2015 at 12:37 AM Ted Yu  wrote:
>
>> Some old bits:
>>
>>
>> http://stackoverflow.com/questions/28162991/cant-run-spark-1-2-in-standalone-mode-on-mac
>> http://stackoverflow.com/questions/29412157/passing-hostname-to-netty
>>
>> FYI
>>
>> On Wed, Oct 14, 2015 at 7:10 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> I’m setting the Spark master address via the SPARK_MASTER_IP
>>> environment variable in spark-env.sh, like spark-ec2 does
>>> <https://github.com/amplab/spark-ec2/blob/a990752575cd8b0ab25731d7820a55c714798ec3/templates/root/spark/conf/spark-env.sh#L13>
>>> .
>>>
>>> The funny thing is that Spark seems to accept this only if the value of
>>> SPARK_MASTER_IP is a DNS name and not an IP address.
>>>
>>> When I provide an IP address, I get errors in the log when starting the
>>> master:
>>>
>>> 15/10/15 01:47:31 ERROR NettyTransport: failed to bind to 
>>> /54.210.XX.XX:7077, shutting down Netty transport
>>>
>>> (XX is my redaction of the full IP address.)
>>>
>>> Am I misunderstanding something about how to use this environment
>>> variable?
>>>
>>> The spark-env.sh template indicates that either an IP address or a
>>> hostname should work
>>> <https://github.com/apache/spark/blob/4ace4f8a9c91beb21a0077e12b75637a4560a542/conf/spark-env.sh.template#L49>,
>>> but my testing shows that only hostnames work.
>>>
>>> Nick
>>> ​
>>>
>>
>>


Re: Building Spark

2015-10-15 Thread Ted Yu
bq. Access is denied

Please check permission of the path mentioned.

On Thu, Oct 15, 2015 at 3:45 PM, Annabel Melongo <
melongo_anna...@yahoo.com.invalid> wrote:

> I was trying to build a cloned version of Spark on my local machine using
> the command:
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
> package
> However I got the error:
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-shade-plugin:2.4.1:shade (default) on project
> spark-network-common_2.10: Error creating shaded jar:
> C:\Users\Annabel\git\spark\network\common\dependency-reduced-pom.xml
> (Access is denied) -> [Help 1]
>
> Any idea? I'm running a 64-bit Windows 8 machine.
>
> Thanks
>
>


Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-14 Thread Ted Yu
Some old bits:

http://stackoverflow.com/questions/28162991/cant-run-spark-1-2-in-standalone-mode-on-mac
http://stackoverflow.com/questions/29412157/passing-hostname-to-netty

FYI

On Wed, Oct 14, 2015 at 7:10 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> I’m setting the Spark master address via the SPARK_MASTER_IP environment
> variable in spark-env.sh, like spark-ec2 does
> 
> .
>
> The funny thing is that Spark seems to accept this only if the value of
> SPARK_MASTER_IP is a DNS name and not an IP address.
>
> When I provide an IP address, I get errors in the log when starting the
> master:
>
> 15/10/15 01:47:31 ERROR NettyTransport: failed to bind to /54.210.XX.XX:7077, 
> shutting down Netty transport
>
> (XX is my redaction of the full IP address.)
>
> Am I misunderstanding something about how to use this environment variable?
>
> The spark-env.sh template indicates that either an IP address or a
> hostname should work
> ,
> but my testing shows that only hostnames work.
>
> Nick
> ​
>


Re: Getting started

2015-10-13 Thread Ted Yu
Please see
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

On Tue, Oct 13, 2015 at 5:49 AM, _abhishek 
wrote:

> Hello
> I am interested in contributing to Apache Spark. I am new to open source. Can
> someone please help me with how to get started, beginner-level bugs, etc.
> Thanks
> Abhishek Kumar
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Getting-started-tp14588.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
Josh:
We're on the same page.

I used the term 're-submit your PR' which was different from opening new PR.

On Mon, Oct 12, 2015 at 2:47 PM, Personal  wrote:

> Just ask Jenkins to retest; no need to open a new PR just to re-trigger
> the build.
>
>
> On October 12, 2015 at 2:45:13 PM, Ted Yu (yuzhih...@gmail.com) wrote:
>
> Can you re-submit your PR to trigger a new build - assuming the tests are
> flaky ?
>
> If any test fails again, consider contacting the owner of the module for
> expert opinion.
>
> Cheers
>
> On Mon, Oct 12, 2015 at 2:07 PM, Meihua Wu 
> wrote:
>
>> Hi Ted,
>>
>> Thanks for the info. I have checked but I did not find the failures
>> though.
>>
>> In my cases, I have seen
>>
>> 1) spilling in ExternalAppendOnlyMapSuite failed due to timeout.
>> [
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43531/console
>> ]
>>
>> 2) pySpark failure
>> [
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43553/console
>> ]
>>
>> Traceback (most recent call last):
>>   File
>> "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>> line 316, in _get_connection
>> IndexError: pop from an empty deque
>>
>>
>>
>> On Mon, Oct 12, 2015 at 1:36 PM, Ted Yu  wrote:
>> > You can go to:
>> > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN
>> >
>> > and see if the test failure(s) you encountered appeared there.
>> >
>> > FYI
>> >
>> > On Mon, Oct 12, 2015 at 1:24 PM, Meihua Wu <
>> rotationsymmetr...@gmail.com>
>> > wrote:
>> >>
>> >> Hi Spark Devs,
>> >>
>> >> I recently encountered several cases where Jenkins failed tests that
>> >> are supposed to be unrelated to my patch. For example, I made a patch
>> >> to Spark ML Scala API but some Scala RDD tests failed due to timeout,
>> >> or the java_gateway in PySpark fails. Just wondering if these are
>> >> isolated cases?
>> >>
>> >> Thanks,
>> >>
>> >> -
>> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >> For additional commands, e-mail: dev-h...@spark.apache.org
>> >>
>> >
>>
>
>


Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
Can you re-submit your PR to trigger a new build - assuming the tests are
flaky ?

If any test fails again, consider contacting the owner of the module for
expert opinion.

Cheers

On Mon, Oct 12, 2015 at 2:07 PM, Meihua Wu 
wrote:

> Hi Ted,
>
> Thanks for the info. I have checked but I did not find the failures though.
>
> In my cases, I have seen
>
> 1) spilling in ExternalAppendOnlyMapSuite failed due to timeout.
> [
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43531/console
> ]
>
> 2) pySpark failure
> [
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43553/console
> ]
>
> Traceback (most recent call last):
>   File
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
> line 316, in _get_connection
> IndexError: pop from an empty deque
>
>
>
> On Mon, Oct 12, 2015 at 1:36 PM, Ted Yu  wrote:
> > You can go to:
> > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN
> >
> > and see if the test failure(s) you encountered appeared there.
> >
> > FYI
> >
> > On Mon, Oct 12, 2015 at 1:24 PM, Meihua Wu  >
> > wrote:
> >>
> >> Hi Spark Devs,
> >>
> >> I recently encountered several cases where Jenkins failed tests that
> >> are supposed to be unrelated to my patch. For example, I made a patch
> >> to Spark ML Scala API but some Scala RDD tests failed due to timeout,
> >> or the java_gateway in PySpark fails. Just wondering if these are
> >> isolated cases?
> >>
> >> Thanks,
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
> >
>


Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
You can go to:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN

and see if the test failure(s) you encountered appeared there.

FYI

On Mon, Oct 12, 2015 at 1:24 PM, Meihua Wu 
wrote:

> Hi Spark Devs,
>
> I recently encountered several cases where Jenkins failed tests that
> are supposed to be unrelated to my patch. For example, I made a patch
> to Spark ML Scala API but some Scala RDD tests failed due to timeout,
> or the java_gateway in PySpark fails. Just wondering if these are
> isolated cases?
>
> Thanks,
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: taking the heap dump when an executor goes OOM

2015-10-12 Thread Ted Yu
http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss
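
For the executor JVMs specifically, here is a minimal sketch of one way to pass the
standard HotSpot heap-dump flags through Spark's configuration (the dump path is a
placeholder, and the master is assumed to be supplied by spark-submit):

import org.apache.spark.{SparkConf, SparkContext}

// Ask each executor JVM to write an .hprof file when it hits an OutOfMemoryError.
val conf = new SparkConf()
  .setAppName("heap-dump-example")
  .set("spark.executor.extraJavaOptions",
    "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps")
val sc = new SparkContext(conf)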

> On Oct 11, 2015, at 10:45 PM, Niranda Perera  wrote:
> 
> Hi all, 
> 
> is there a way for me to get the heap-dump hprof of an executor jvm, when it 
> goes out of memory? 
> 
> is this currently supported or do I have to change some configurations? 
> 
> cheers 
> 
> -- 
> Niranda 
> @n1r44
> +94-71-554-8430
> https://pythagoreanscript.wordpress.com/


Re: Compiling Spark with a local hadoop profile

2015-10-08 Thread Ted Yu
In root pom.xml :
<hadoop.version>2.2.0</hadoop.version>

You can override the version of hadoop with command similar to:
-Phadoop-2.4 -Dhadoop.version=2.7.0

Cheers

On Thu, Oct 8, 2015 at 11:22 AM, sbiookag  wrote:

> I'm modifying the hdfs module inside hadoop, and would like to see the change
> reflected while I'm running Spark on top of it, but I still see the native
> hadoop behaviour. I've checked and saw Spark is building a really fat jar
> file, which contains all hadoop classes (using hadoop profile defined in
> maven), and deploy it over all workers. I also tried bigtop-dist, to
> exclude
> hadoop classes but see no effect.
>
> Is it possible to do such a thing easily, for example by small
> modifications
> inside the maven file?
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Compiling-Spark-with-a-local-hadoop-profile-tp14517.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Ted Yu
I tried building with Scala 2.11 on Linux with latest master branch :

[INFO] Spark Project External MQTT  SUCCESS [
19.188 s]
[INFO] Spark Project External MQTT Assembly ... SUCCESS [
 7.081 s]
[INFO] Spark Project External ZeroMQ .. SUCCESS [
 8.790 s]
[INFO] Spark Project External Kafka ... SUCCESS [
14.764 s]
[INFO] Spark Project Examples . SUCCESS [02:22
min]
[INFO] Spark Project External Kafka Assembly .. SUCCESS [
10.286 s]
[INFO]

[INFO] BUILD SUCCESS
[INFO]

[INFO] Total time: 17:49 min

FYI

On Thu, Oct 8, 2015 at 6:50 AM, Ted Yu  wrote:

> Interesting
>
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/
> shows green builds.
>
>
> On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș 
> wrote:
>
>> Since Oct. 4 the build fails on 2.11 with the dreaded
>>
>> [error] /home/ubuntu/workspace/Apache Spark (master) on 
>> 2.11/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:310: 
>> no valid targets for annotation on value conf - it is discarded unused. You 
>> may specify targets with meta-annotations, e.g. @(transient @param)
>> [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
>>
>> Can we have the pull request builder at least build with 2.11? This makes
>> #8433 <https://github.com/apache/spark/pull/8433> pretty much useless,
>> since people will continue to add useless @transient annotations.
>> ​
>> --
>>
>> --
>> Iulian Dragos
>>
>> --
>> Reactive Apps on the JVM
>> www.typesafe.com
>>
>>
>


Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Ted Yu
Interesting

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/
shows green builds.


On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș 
wrote:

> Since Oct. 4 the build fails on 2.11 with the dreaded
>
> [error] /home/ubuntu/workspace/Apache Spark (master) on 
> 2.11/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:310: no 
> valid targets for annotation on value conf - it is discarded unused. You may 
> specify targets with meta-annotations, e.g. @(transient @param)
> [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
>
> Can we have the pull request builder at least build with 2.11? This makes
> #8433  pretty much useless,
> since people will continue to add useless @transient annotations.
> ​
> --
>
> --
> Iulian Dragos
>
> --
> Reactive Apps on the JVM
> www.typesafe.com
>
>


Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Ted Yu
As a workaround, can you set the number of partitions higher in the
sc.textFile method ?
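
A minimal sketch of that workaround, assuming the data comes from textFile (the path
and partition count are placeholders); asking for more input partitions up front keeps
each cached partition well under the 2GB memory-mapping limit:

// minPartitions is the second argument of textFile
val rdd1 = sc.textFile("s3n://bucket/path/*", 5000)
rdd1.persist(org.apache.spark.storage.StorageLevel.DISK_ONLY_2)
rdd1.foreachPartition { _ => }  // one pass over the data to populate the disk cache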

Cheers

On Mon, Oct 5, 2015 at 3:31 PM, Jegan  wrote:

> Hi All,
>
> I am facing the below exception when the size of the file being read in a
> partition is above 2GB. This is apparently because of Java's limitation on
> memory-mapped files: it supports mapping only up to 2GB per file.
>
> Caused by: java.lang.IllegalArgumentException: Size exceeds
> Integer.MAX_VALUE
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)
> at
> org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
> at
> org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1207)
> at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)
> at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)
> at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:102)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:791)
> at
> org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
> at
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:153)
> at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> My use case is to read the files from S3 and do some processing. I am
> caching the data like below in order to avoid SocketTimeoutExceptions from
> another library I am using for the processing.
>
> val rdd1 = sc.textFile("***").coalesce(1000)
> rdd1.persist(DISK_ONLY_2) // replication factor 2
> rdd1.foreachPartition { iter => } // one pass over the data to download
>
> The 3rd line fails with the above error when a partition contains a file
> of size more than 2GB file.
>
> Do you think this needs to be fixed in Spark? One idea may be to use a
> wrapper class (something called BigByteBuffer) which keeps an array of
> ByteBuffers and keeps the index of the current buffer being read etc. Below
> is the modified DiskStore.scala.
>
> private def getBytes(file: File, offset: Long, length: Long): 
> Option[ByteBuffer] = {
>   val channel = new RandomAccessFile(file, "r").getChannel
>   Utils.tryWithSafeFinally {
> // For small files, directly read rather than memory map
> if (length < minMemoryMapBytes) {
>   // Map small file in Memory
> } else {
>   // TODO Create a BigByteBuffer
>
> }
>   } {
> channel.close()
>   }
> }
>
> class BigByteBuffer extends ByteBuffer {
>   val buffers: Array[ByteBuffer]
>   var currentIndex = 0
>
>   ... // Other methods
> }
>
> Please let me know if there is any other work-around for the same. Thanks for 
> your time.
>
> Regards,
> Jegan
>


Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-04 Thread Ted Yu
hadoop1 package for Scala 2.10 wasn't in RC1 either:
http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/

On Sun, Oct 4, 2015 at 5:17 PM, Nicholas Chammas  wrote:

> I’m looking here:
>
> https://s3.amazonaws.com/spark-related-packages/
>
> I believe this is where one set of official packages is published. Please
> correct me if this is not the case.
>
> It appears that almost every version of Spark up to and including 1.5.0
> has included a --bin-hadoop1.tgz release (e.g. spark-1.5.0-bin-hadoop1.tgz
> ).
>
> However, 1.5.1 has no such package. There is a
> spark-1.5.1-bin-hadoop1-scala2.11.tgz package, but this is a separate
> thing. (1.5.0 also has a hadoop1-scala2.11 package.)
>
> Was this intentional?
>
> More importantly, is there some rough specification for what packages we
> should be able to expect in this S3 bucket with every release?
>
> This is important for those of us who depend on this publishing venue
> (e.g. spark-ec2 and related tools).
>
> Nick
> ​
>


Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
Andy:
1.5.1 has many critical bug fixes on top of 1.5.0

http://search-hadoop.com/m/q3RTtGrXP31BVt4l1

Please consider using 1.5.1

Cheers

On Fri, Oct 2, 2015 at 11:19 AM, andy petrella 
wrote:

> it's an option but not a solution, indeed
>
> Le ven. 2 oct. 2015 20:08, Ted Yu  a écrit :
>
>> Andy:
>> 1.5.1 has been released.
>>
>> Maybe you can use this:
>>
>> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.1/spark-streaming_2.10-1.5.1.pom
>>
>> I can access the above.
>>
>> On Fri, Oct 2, 2015 at 11:06 AM, Marcelo Vanzin 
>> wrote:
>>
>>> Hmm, now I get that too (did not get it before). Maybe the servers are
>>> having issues.
>>>
>>> On Fri, Oct 2, 2015 at 11:05 AM, Ted Yu  wrote:
>>> > I tried to access
>>> >
>>> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom
>>> > on Chrome and Firefox (on Mac)
>>> > I got 404
>>> >
>>> > FYI
>>> >
>>> > On Fri, Oct 2, 2015 at 10:49 AM, andy petrella <
>>> andy.petre...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Yup folks,
>>> >>
>>> >> Someone building the Spark-Notebook reported to me that repo1 is
>>> >> apparently broken for scala 2.10 and spark 1.5.0.
>>> >>
>>> >> Check this
>>> >>
>>> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom
>>> >>
>>> >> The URL is correct since
>>> >>
>>> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/
>>> >> is ok...
>>> >>
>>> >> scala 2.11 is fine btw
>>> >>
>>> >>
>>> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.11/1.5.0/spark-streaming_2.11-1.5.0.pom
>>> >>
>>> >> Any idea?
>>> >>
>>> >> ps: this happens for streaming too at least
>>> >>
>>> >> --
>>> >> andy
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>
>> --
> andy
>


Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
Andy:
1.5.1 has been released.

Maybe you can use this:
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.1/spark-streaming_2.10-1.5.1.pom

I can access the above.

On Fri, Oct 2, 2015 at 11:06 AM, Marcelo Vanzin  wrote:

> Hmm, now I get that too (did not get it before). Maybe the servers are
> having issues.
>
> On Fri, Oct 2, 2015 at 11:05 AM, Ted Yu  wrote:
> > I tried to access
> >
> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom
> > on Chrome and Firefox (on Mac)
> > I got 404
> >
> > FYI
> >
> > On Fri, Oct 2, 2015 at 10:49 AM, andy petrella 
> > wrote:
> >>
> >> Yup folks,
> >>
> >> Someone building the Spark-Notebook reported to me that repo1 is
> >> apparently broken for scala 2.10 and spark 1.5.0.
> >>
> >> Check this
> >>
> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom
> >>
> >> The URL is correct since
> >>
> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/
> >> is ok...
> >>
> >> scala 2.11 is fine btw
> >>
> >>
> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.11/1.5.0/spark-streaming_2.11-1.5.0.pom
> >>
> >> Any idea?
> >>
> >> ps: this happens for streaming too at least
> >>
> >> --
> >> andy
> >
> >
>
>
>
> --
> Marcelo
>


Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
I tried to access
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom
on
Chrome and Firefox (on Mac)
I got 404

FYI

On Fri, Oct 2, 2015 at 10:49 AM, andy petrella 
wrote:

> Yup folks,
>
> Someone building the Spark-Notebook reported to me that repo1 is
> apparently broken for scala 2.10 and spark 1.5.0.
>
> Check this
> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom
>
> The URL is correct since
> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/
> is ok...
>
> scala 2.11 is fine btw
>
> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.11/1.5.0/spark-streaming_2.11-1.5.0.pom
> 
>
> Any idea?
>
> ps: this happens for streaming too at least
>
> --
> andy
>


Re: failed to run spark sample on windows

2015-09-28 Thread Ted Yu
What version of hadoop are you using ?

Is that version consistent with the one which was used to build Spark 1.4.0
?

Cheers

On Mon, Sep 28, 2015 at 4:36 PM, Renyi Xiong  wrote:

> I tried to run HdfsTest sample on windows spark-1.4.0
>
> bin\run-sample org.apache.spark.examples.HdfsTest 
>
> but got the exception below. Anybody have an idea what went wrong here?
>
> 15/09/28 16:33:56.565 ERROR SparkContext: Error initializing SparkContext.
> java.lang.NullPointerException
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
> at org.apache.hadoop.util.Shell.run(Shell.java:418)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
> at
> org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
> at
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:130)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:515)
> at org.apache.spark.examples.HdfsTest$.main(HdfsTest.scala:32)
> at org.apache.spark.examples.HdfsTest.main(HdfsTest.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
> at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
> at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>


Re: Derby version in Spark

2015-09-22 Thread Ted Yu
I cloned Hive 1.2 code base and saw:

<derby.version>10.10.2.0</derby.version>

So the version used by Spark is quite close to what Hive uses.

On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu  wrote:

> I see.
> I use maven to build so I observe different contents under lib_managed
> directory.
>
> Here is snippet of dependency tree:
>
> [INFO] |  +- org.spark-project.hive:hive-metastore:jar:1.2.1.spark:compile
> [INFO] |  |  +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile
> [INFO] |  |  +- org.apache.derby:derby:jar:10.10.1.1:compile
>
> On Tue, Sep 22, 2015 at 3:21 PM, Richard Hillegas 
> wrote:
>
>> Thanks, Ted. I'm working on my master branch. The lib_managed/jars
>> directory has a lot of jarballs, including hadoop and hive. Maybe these
>> were faulted in when I built with the following command?
>>
>>   sbt/sbt -Phive assembly/assembly
>>
>> The Derby jars seem to be used in order to manage the metastore_db
>> database. Maybe my question should be directed to the Hive community?
>>
>> Thanks,
>> -Rick
>>
>> Here are the gory details:
>>
>> bash-3.2$ ls lib_managed/jars
>> FastInfoset-1.2.12.jar curator-test-2.4.0.jar
>> jersey-test-framework-grizzly2-1.9.jar parquet-format-2.3.0-incubating.jar
>> JavaEWAH-0.3.2.jar datanucleus-api-jdo-3.2.6.jar jets3t-0.7.1.jar
>> parquet-generator-1.7.0.jar
>> ST4-4.0.4.jar datanucleus-core-3.2.10.jar
>> jetty-continuation-8.1.14.v20131031.jar parquet-hadoop-1.7.0.jar
>> activation-1.1.jar datanucleus-rdbms-3.2.9.jar
>> jetty-http-8.1.14.v20131031.jar parquet-hadoop-bundle-1.6.0.jar
>> akka-actor_2.10-2.3.11.jar derby-10.10.1.1.jar
>> jetty-io-8.1.14.v20131031.jar parquet-jackson-1.7.0.jar
>> akka-remote_2.10-2.3.11.jar derby-10.10.2.0.jar
>> jetty-jndi-8.1.14.v20131031.jar platform-3.4.0.jar
>> akka-slf4j_2.10-2.3.11.jar genjavadoc-plugin_2.10.4-0.9-spark0.jar
>> jetty-plus-8.1.14.v20131031.jar pmml-agent-1.1.15.jar
>> akka-testkit_2.10-2.3.11.jar groovy-all-2.1.6.jar
>> jetty-security-8.1.14.v20131031.jar pmml-model-1.1.15.jar
>> antlr-2.7.7.jar guava-11.0.2.jar jetty-server-8.1.14.v20131031.jar
>> pmml-schema-1.1.15.jar
>> antlr-runtime-3.4.jar guice-3.0.jar jetty-servlet-8.1.14.v20131031.jar
>> postgresql-9.3-1102-jdbc41.jar
>> aopalliance-1.0.jar h2-1.4.183.jar jetty-util-6.1.26.jar py4j-0.8.2.1.jar
>> arpack_combined_all-0.1-javadoc.jar hadoop-annotations-2.2.0.jar
>> jetty-util-8.1.14.v20131031.jar pyrolite-4.4.jar
>> arpack_combined_all-0.1.jar hadoop-auth-2.2.0.jar
>> jetty-webapp-8.1.14.v20131031.jar quasiquotes_2.10-2.0.0.jar
>> asm-3.2.jar hadoop-client-2.2.0.jar jetty-websocket-8.1.14.v20131031.jar
>> reflectasm-1.07-shaded.jar
>> avro-1.7.4.jar hadoop-common-2.2.0.jar jetty-xml-8.1.14.v20131031.jar
>> sac-1.3.jar
>> avro-1.7.7.jar hadoop-hdfs-2.2.0.jar jline-0.9.94.jar
>> scala-compiler-2.10.0.jar
>> avro-ipc-1.7.7-tests.jar hadoop-mapreduce-client-app-2.2.0.jar
>> jline-2.10.4.jar scala-compiler-2.10.4.jar
>> avro-ipc-1.7.7.jar hadoop-mapreduce-client-common-2.2.0.jar
>> jline-2.12.jar scala-library-2.10.4.jar
>> avro-mapred-1.7.7-hadoop2.jar hadoop-mapreduce-client-core-2.2.0.jar
>> jna-3.4.0.jar scala-reflect-2.10.4.jar
>> breeze-macros_2.10-0.11.2.jar hadoop-mapreduce-client-jobclient-2.2.0.jar
>> joda-time-2.5.jar scalacheck_2.10-1.11.3.jar
>> breeze_2.10-0.11.2.jar hadoop-mapreduce-client-shuffle-2.2.0.jar
>> jodd-core-3.5.2.jar scalap-2.10.0.jar
>> calcite-avatica-1.2.0-incubating.jar hadoop-yarn-api-2.2.0.jar
>> json-20080701.jar selenium-api-2.42.2.jar
>> calcite-core-1.2.0-incubating.jar hadoop-yarn-client-2.2.0.jar
>> json-20090211.jar selenium-chrome-driver-2.42.2.jar
>> calcite-linq4j-1.2.0-incubating.jar hadoop-yarn-common-2.2.0.jar
>> json4s-ast_2.10-3.2.10.jar selenium-firefox-driver-2.42.2.jar
>> cglib-2.2.1-v20090111.jar hadoop-yarn-server-common-2.2.0.jar
>> json4s-core_2.10-3.2.10.jar selenium-htmlunit-driver-2.42.2.jar
>> cglib-nodep-2.1_3.jar hadoop-yarn-server-nodemanager-2.2.0.jar
>> json4s-jackson_2.10-3.2.10.jar selenium-ie-driver-2.42.2.jar
>> chill-java-0.5.0.jar hamcrest-core-1.1.jar jsr173_api-1.0.jar
>> selenium-java-2.42.2.jar
>> chill_2.10-0.5.0.jar hamcrest-core-1.3.jar jsr305-1.3.9.jar
>> selenium-remote-driver-2.42.2.jar
>> commons-beanutils-1.7.0.jar hamcrest-library-1.3.jar jsr305-2.0.1.jar
>> selenium-safari-driver-2.42.2.jar
>> commons-beanutils-core-1.8.0.jar hive-exec-1.2.1.spark.jar jta-1.1.jar
>> selenium-support-2.42.2.jar
>> commons-cli-1.2.jar hive-metastore-1.2.1.spark.jar jtransforms-2.4.0.jar
>> serializer-2.7.1.jar
>> commons-codec-1.1

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
nterface-0.9.jar stax-api-1.0.1.jar
> commons-compress-1.4.1.jar ivy-2.4.0.jar libfb303-0.9.2.jar
> stream-2.7.0.jar
> commons-configuration-1.6.jar jackson-core-asl-1.8.8.jar
> libthrift-0.9.2.jar stringtemplate-3.2.1.jar
> commons-dbcp-1.4.jar jackson-core-asl-1.9.13.jar lz4-1.3.0.jar
> tachyon-client-0.7.1.jar
> commons-digester-1.8.jar jackson-jaxrs-1.8.8.jar
> mesos-0.21.1-shaded-protobuf.jar tachyon-underfs-hdfs-0.7.1.jar
> commons-exec-1.1.jar jackson-mapper-asl-1.9.13.jar minlog-1.2.jar
> tachyon-underfs-local-0.7.1.jar
> commons-httpclient-3.1.jar jackson-xc-1.8.8.jar mockito-core-1.9.5.jar
> test-interface-0.5.jar
> commons-io-2.1.jar janino-2.7.8.jar mysql-connector-java-5.1.34.jar
> test-interface-1.0.jar
> commons-io-2.4.jar jansi-1.4.jar nekohtml-1.9.20.jar
> uncommons-maths-1.2.2a.jar
> commons-lang-2.5.jar javassist-3.15.0-GA.jar netty-all-4.0.29.Final.jar
> unused-1.0.0.jar
> commons-lang-2.6.jar javax.inject-1.jar objenesis-1.0.jar webbit-0.4.14.jar
> commons-lang3-3.3.2.jar jaxb-api-2.2.2.jar objenesis-1.2.jar
> xalan-2.7.1.jar
> commons-logging-1.1.3.jar jaxb-api-2.2.7.jar opencsv-2.3.jar
> xercesImpl-2.11.0.jar
> commons-math-2.1.jar jaxb-core-2.2.7.jar oro-2.0.8.jar xml-apis-1.4.01.jar
> commons-math-2.2.jar jaxb-impl-2.2.3-1.jar paranamer-2.3.jar
> xmlenc-0.52.jar
> commons-math3-3.4.1.jar jaxb-impl-2.2.7.jar paranamer-2.6.jar xz-1.0.jar
> commons-net-3.1.jar jblas-1.2.4.jar parquet-avro-1.7.0.jar
> zookeeper-3.4.5.jar
> commons-pool-1.5.4.jar jcl-over-slf4j-1.7.10.jar parquet-column-1.7.0.jar
> core-1.1.2.jar jdo-api-3.0.1.jar parquet-common-1.7.0.jar
> cssparser-0.9.13.jar jersey-guice-1.9.jar parquet-encoding-1.7.0.jar
>
> Ted Yu  wrote on 09/22/2015 01:32:39 PM:
>
> > From: Ted Yu 
> > To: Richard Hillegas/San Francisco/IBM@IBMUS
> > Cc: Dev 
> > Date: 09/22/2015 01:33 PM
> > Subject: Re: Derby version in Spark
>
> >
> > Which Spark release are you building ?
> >
> > For master branch, I get the following:
> >
> > lib_managed/jars/datanucleus-api-jdo-3.2.6.jar  lib_managed/jars/
> > datanucleus-core-3.2.10.jar  lib_managed/jars/datanucleus-rdbms-3.2.9.jar
> >
> > FYI
> >
> > On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas 
> wrote:
> > I see that lib_managed/jars holds these old Derby versions:
> >
> >   lib_managed/jars/derby-10.10.1.1.jar
> >   lib_managed/jars/derby-10.10.2.0.jar
> >
> > The Derby 10.10 release family supports some ancient JVMs: Java SE 5
> > and Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone
> > running Spark on the resource-constrained Java ME platform. Is Spark
> > really deployed on Java SE 5? Is there some other reason that Spark
> > uses the 10.10 Derby family?
> >
> > If no-one needs those ancient JVMs, maybe we could consider changing
> > the Derby version to 10.11.1.1 or even to the upcoming 10.12.1.1
> > release (both run on Java 6 and up).
> >
> > Thanks,
> > -Rick
>
>


Re: Derby version in Spark

2015-09-22 Thread Ted Yu
Which Spark release are you building ?

For master branch, I get the following:

lib_managed/jars/datanucleus-api-jdo-3.2.6.jar
 lib_managed/jars/datanucleus-core-3.2.10.jar
 lib_managed/jars/datanucleus-rdbms-3.2.9.jar

FYI

On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas 
wrote:

> I see that lib_managed/jars holds these old Derby versions:
>
>   lib_managed/jars/derby-10.10.1.1.jar
>   lib_managed/jars/derby-10.10.2.0.jar
>
> The Derby 10.10 release family supports some ancient JVMs: Java SE 5 and
> Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone running
> Spark on the resource-constrained Java ME platform. Is Spark really
> deployed on Java SE 5? Is there some other reason that Spark uses the 10.10
> Derby family?
>
> If no-one needs those ancient JVMs, maybe we could consider changing the
> Derby version to 10.11.1.1 or even to the upcoming 10.12.1.1 release (both
> run on Java 6 and up).
>
> Thanks,
> -Rick
>


Re: How to modify Hadoop APIs used by Spark?

2015-09-21 Thread Ted Yu
Can you clarify what you want to do?
If you modify an existing hadoop InputFormat, etc., it would be a matter of
rebuilding hadoop and building Spark with the custom-built hadoop as a
dependency.

Do you introduce new InputFormat ?

Cheers

On Mon, Sep 21, 2015 at 1:20 PM, Dogtail Ray  wrote:

> Hi all,
>
> I find that Spark uses some Hadoop APIs such as InputFormat, InputSplit,
> etc., and I want to modify these Hadoop APIs. Do you know how can I
> integrate my modified Hadoop code into Spark? Great thanks!
>
>


Re: passing SparkContext as parameter

2015-09-21 Thread Ted Yu
You can use a broadcast variable for passing connection information.
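
A minimal sketch of that pattern, assuming a spark-shell style sc; the connection
details are hypothetical placeholders and the driver-specific Cassandra call is left
as a comment:

case class DbConfig(host: String, port: Int, keyspace: String)

// Broadcast only the small, serializable connection details, not the SparkContext itself.
val dbConf = sc.broadcast(DbConfig("10.0.0.1", 9042, "my_keyspace"))

val ids = sc.parallelize(1 to 1000)
val enriched = ids.mapPartitions { iter =>
  val cfg = dbConf.value
  // Open one session per partition here using cfg.host and cfg.port
  // (the driver-specific connect call is omitted).
  iter.map(id => (id, cfg.keyspace))
}
enriched.count()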

Cheers

> On Sep 21, 2015, at 4:27 AM, Priya Ch  wrote:
> 
> Can I use this sparkContext on executors?
> In my application, I have a scenario of reading from a DB for certain records in an
> rdd, hence I need the sparkContext to read from the DB (Cassandra in our case).
> 
> If the sparkContext can't be sent to executors, what is the workaround for
> this?
> 
>> On Mon, Sep 21, 2015 at 3:06 PM, Petr Novak  wrote:
>> add @transient?
>> 
>>> On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch  
>>> wrote:
>>> Hello All,
>>> 
>>> How can i pass sparkContext as a parameter to a method in an object. 
>>> Because passing sparkContext is giving me TaskNotSerializable Exception.
>>> 
>>> How can i achieve this ?
>>> 
>>> Thanks,
>>> Padma Ch
> 


Re: Using scala-2.11 when making changes to spark source

2015-09-20 Thread Ted Yu
Maybe the following can be used for changing Scala version:
http://maven.apache.org/archetype/maven-archetype-plugin/

I played with it a little bit but didn't get far.

FYI

On Sun, Sep 20, 2015 at 6:18 AM, Stephen Boesch  wrote:

>
> The dev/change-scala-version.sh [2.11]  script modifies in-place  the
> pom.xml files across all of the modules.  This is a git-visible change.  So
> if we wish to make changes to spark source in our own fork's - while
> developing with scala 2.11 - we would end up conflating those updates with
> our own.
>
> A possible scenario would be to update .gitignore - by adding pom.xml.
> However I can not get that to work: .gitignore is tricky.
>
> Suggestions appreciated.
>


Re: SparkR installation not working

2015-09-19 Thread Ted Yu
Looks like you didn't specify sparkr profile when building.

Cheers

On Sat, Sep 19, 2015 at 12:30 PM, Devl Devel 
wrote:

> Hi All,
>
> I've built spark 1.5.0 with hadoop 2.6 with a fresh download :
>
> build/mvn  -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
>
> I try to run SparkR it launches the normal R without the spark addons:
>
> ./bin/sparkR --master local[*]
> Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
>
> R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
> Copyright (C) 2014 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>   Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> >
>
> With no "Welcome to SparkR"
>
> also
>
> > sc <- sparkR.init()
> Error: could not find function "sparkR.init"
> > sqlContext <- sparkRSQL.init(sc)
> Error: could not find function "sparkRSQL.init"
> >
>
> Spark-shell and other components are fine. Using scala 2.10.6 and Java
> 1.8_45, Ubuntu 15.0.4. Please can anyone give me any pointers? Is there a
> spark maven profile I need to enable?
>
> Thanks
> Devl
>


Re: (send this email to subscribe)

2015-09-13 Thread Ted Yu
See first section of http://spark.apache.org/community.html

Cheers



> On Sep 13, 2015, at 6:43 PM, 蒋林  wrote:
> 
> Hi, I need to subscribe to the email list. Please add me. Thank you.
> 
> 
>  


Re: spark dataframe transform JSON to ORC meet “column ambigous exception”

2015-09-12 Thread Ted Yu
Can you take a look at SPARK-5278 where ambiguity is shown between field
names which differ only by case ?
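
A small sketch of one way to spot the offending columns after reading the JSON: group
the top-level field names case-insensitively and look for groups with more than one
entry (variable names are illustrative):

val json = sqlContext.read.json(input)

// Any group with more than one field name will trip the
// "Reference ... is ambiguous" error during analysis.
val collisions = json.schema.fieldNames
  .groupBy(_.toLowerCase)
  .filter { case (_, names) => names.length > 1 }

collisions.foreach { case (key, names) =>
  println(s"ambiguous column '$key': ${names.mkString(", ")}")
}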

Cheers

On Sat, Sep 12, 2015 at 3:40 AM, Fengdong Yu 
wrote:

> Hi Ted,
> I checked the JSON, there aren't duplicated key in JSON.
>
>
> Azuryy Yu
> Sr. Infrastructure Engineer
>
> cel: 158-0164-9103
> wetchat: azuryy
>
>
> On Sat, Sep 12, 2015 at 5:52 PM, Ted Yu  wrote:
>
>> Is it possible that Canonical_URL occurs more than once in your json ?
>>
>> Can you check your json input ?
>>
>> Thanks
>>
>> On Sat, Sep 12, 2015 at 2:05 AM, Fengdong Yu 
>> wrote:
>>
>>> Hi,
>>>
>>> I am using the Spark 1.4.1 DataFrame API to read JSON data, then save it to ORC.
>>> The code is very simple:
>>>
>>> DataFrame json = sqlContext.read().json(input);
>>>
>>> json.write().format("orc").save(output);
>>>
>>> the job failed. what's wrong with this exception? Thanks.
>>>
>>> Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>> Reference 'Canonical_URL' is ambiguous, could be: Canonical_URL#960,
>>> Canonical_URL#1010.; at
>>> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:279)
>>> at
>>> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:116)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:350)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:341)
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
>>> at
>>> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
>>> at
>>> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
>>> at 
>>> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108)
>>> at
>>> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:123)
>>> at
>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>> at
>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>> at scala.collection.immutable.List.foreach(List.scala:318) at
>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at
>>> scala.collection.AbstractTraversable.map(Traversable.scala:105) at
>>> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:122)
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at
>>> scala.collection.Iterator$class.foreach(Iterator.scala:727) at
>>> scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at
>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>> at scala.collection.AbstractIterator.to(Iterator.scala:1157) at
>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at
>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at
>>> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:127)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8.applyOrElse(Analyzer.scala:341)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8.applyOrElse(An

Re: spark dataframe transform JSON to ORC meet “column ambigous exception”

2015-09-12 Thread Ted Yu
Is it possible that Canonical_URL occurs more than once in your json ?

Can you check your json input ?

Thanks

On Sat, Sep 12, 2015 at 2:05 AM, Fengdong Yu 
wrote:

> Hi,
>
> I am using the Spark 1.4.1 DataFrame API to read JSON data, then save it to ORC.
> The code is very simple:
>
> DataFrame json = sqlContext.read().json(input);
>
> json.write().format("orc").save(output);
>
> the job failed. what's wrong with this exception? Thanks.
>
> Exception in thread "main" org.apache.spark.sql.AnalysisException:
> Reference 'Canonical_URL' is ambiguous, could be: Canonical_URL#960,
> Canonical_URL#1010.; at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:279)
> at
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:116)
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350)
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$16.apply(Analyzer.scala:350)
> at
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:350)
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:341)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
> at
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108)
> at
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:123)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318) at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at
> scala.collection.AbstractTraversable.map(Traversable.scala:105) at
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:122)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at
> scala.collection.Iterator$class.foreach(Iterator.scala:727) at
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> at scala.collection.AbstractIterator.to(Iterator.scala:1157) at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:127)
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8.applyOrElse(Analyzer.scala:341)
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8.applyOrElse(Analyzer.scala:243)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
> at
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:243)
> at
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:242)
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:61)
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:59)
> at
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
> at scala.collection.immutable.List.foldLeft(List.scala:84) at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:59)
> at
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:51)
> at scala.collection.immutable.List.foreach(List.scal

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-11 Thread Ted Yu
This is related:
https://issues.apache.org/jira/browse/SPARK-10557

On Fri, Sep 11, 2015 at 10:21 AM, Ryan Williams <
ryan.blake.willi...@gmail.com> wrote:

> Any idea why 1.5.0 is not in Maven central yet
> <http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22>?
> Is that a separate release process?
>
>
> On Wed, Sep 9, 2015 at 12:40 PM andy petrella 
> wrote:
>
>> You can try it out really quickly by "building" a Spark Notebook from
>> http://spark-notebook.io/.
>>
>> Just choose the master branch and 1.5.0, a correct hadoop version
>> (default to 2.2.0 though) and there you go :-)
>>
>>
>> On Wed, Sep 9, 2015 at 6:39 PM Ted Yu  wrote:
>>
>>> Jerry:
>>> I just tried building hbase-spark module with 1.5.0 and I see:
>>>
>>> ls -l ~/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0
>>> total 21712
>>> -rw-r--r--  1 tyu  staff   196 Sep  9 09:37 _maven.repositories
>>> -rw-r--r--  1 tyu  staff  11081542 Sep  9 09:37 spark-core_2.10-1.5.0.jar
>>> -rw-r--r--  1 tyu  staff41 Sep  9 09:37
>>> spark-core_2.10-1.5.0.jar.sha1
>>> -rw-r--r--  1 tyu  staff 19816 Sep  9 09:37 spark-core_2.10-1.5.0.pom
>>> -rw-r--r--  1 tyu  staff41 Sep  9 09:37
>>> spark-core_2.10-1.5.0.pom.sha1
>>>
>>> FYI
>>>
>>> On Wed, Sep 9, 2015 at 9:35 AM, Jerry Lam  wrote:
>>>
>>>> Hi Spark Developers,
>>>>
>>>> I'm eager to try it out! However, I got problems in resolving
>>>> dependencies:
>>>> [warn] [NOT FOUND  ]
>>>> org.apache.spark#spark-core_2.10;1.5.0!spark-core_2.10.jar (0ms)
>>>> [warn]  jcenter: tried
>>>>
>>>> When will the package be available?
>>>>
>>>> Best Regards,
>>>>
>>>> Jerry
>>>>
>>>>
>>>> On Wed, Sep 9, 2015 at 9:30 AM, Dimitris Kouzis - Loukas <
>>>> look...@gmail.com> wrote:
>>>>
>>>>> Yeii!
>>>>>
>>>>> On Wed, Sep 9, 2015 at 2:25 PM, Yu Ishikawa <
>>>>> yuu.ishikawa+sp...@gmail.com> wrote:
>>>>>
>>>>>> Great work, everyone!
>>>>>>
>>>>>>
>>>>>>
>>>>>> -
>>>>>> -- Yu Ishikawa
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/ANNOUNCE-Announcing-Spark-1-5-0-tp14013p14015.html
>>>>>> Sent from the Apache Spark Developers List mailing list archive at
>>>>>> Nabble.com.
>>>>>>
>>>>>> -
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>
>>> --
>> andy
>>
>


Re: Spark 1.5: How to trigger expression execution through UnsafeRow/TungstenProject

2015-09-09 Thread Ted Yu
Here is the example from Reynold (
http://search-hadoop.com/m/q3RTtfvs1P1YDK8d) :

scala> val data = sc.parallelize(1 to size, 5).map(x =>
(util.Random.nextInt(size /
repetitions),util.Random.nextDouble)).toDF("key", "value")
data: org.apache.spark.sql.DataFrame = [key: int, value: double]

scala> data.explain
== Physical Plan ==
TungstenProject [_1#0 AS key#2,_2#1 AS value#3]
 Scan PhysicalRDD[_1#0,_2#1]

...
scala> val res = df.groupBy("key").agg(sum("value"))
res: org.apache.spark.sql.DataFrame = [key: int, sum(value): double]

scala> res.explain
15/09/09 14:17:26 INFO MemoryStore: ensureFreeSpace(88456) called with
curMem=84037, maxMem=556038881
15/09/09 14:17:26 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 86.4 KB, free 530.1 MB)
15/09/09 14:17:26 INFO MemoryStore: ensureFreeSpace(19788) called with
curMem=172493, maxMem=556038881
15/09/09 14:17:26 INFO MemoryStore: Block broadcast_2_piece0 stored as
bytes in memory (estimated size 19.3 KB, free 530.1 MB)
15/09/09 14:17:26 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on localhost:42098 (size: 19.3 KB, free: 530.2 MB)
15/09/09 14:17:26 INFO SparkContext: Created broadcast 2 from explain at
:27
== Physical Plan ==
TungstenAggregate(key=[key#19],
functions=[(sum(value#20),mode=Final,isDistinct=false)],
output=[key#19,sum(value)#21])
 TungstenExchange hashpartitioning(key#19)
  TungstenAggregate(key=[key#19],
functions=[(sum(value#20),mode=Partial,isDistinct=false)],
output=[key#19,currentSum#25])
   Scan ParquetRelation[file:/tmp/data][key#19,value#20]

FYI
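
For a quick local check, here is a minimal spark-shell sketch along the same lines (Spark 1.5 with the default Tungsten settings; the values of size and repetitions below are arbitrary placeholders):

scala> import sqlContext.implicits._
scala> import org.apache.spark.sql.functions.sum
scala> val size = 100000
scala> val repetitions = 10
scala> // primitive Int/Double columns are supported by Unsafe, so the plan should use TungstenProject
scala> val data = sc.parallelize(1 to size, 5).map(x =>
     |   (util.Random.nextInt(size / repetitions), util.Random.nextDouble)).toDF("key", "value")
scala> data.explain                                    // expect TungstenProject over Scan PhysicalRDD
scala> data.groupBy("key").agg(sum("value")).explain   // expect TungstenAggregate / TungstenExchange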

On Wed, Sep 9, 2015 at 12:31 PM, lonikar  wrote:

> The tungsten, codegen etc options are enabled by default. But I am not able
> to get the execution through the UnsafeRow/TungstenProject. It still
> executes using InternalRow/Project.
>
> I see this in the SparkStrategies.scala: If unsafe mode is enabled and we
> support these data types in Unsafe, use the tungsten project. Otherwise use
> the normal project.
>
> Can someone give an example of code that can trigger this? I tried some of
> the primitive types but it did not work.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-1-5-How-to-trigger-expression-execution-through-UnsafeRow-TungstenProject-tp14026.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread Ted Yu
Jerry:
I just tried building hbase-spark module with 1.5.0 and I see:

ls -l ~/.m2/repository/org/apache/spark/spark-core_2.10/1.5.0
total 21712
-rw-r--r--  1 tyu  staff       196 Sep  9 09:37 _maven.repositories
-rw-r--r--  1 tyu  staff  11081542 Sep  9 09:37 spark-core_2.10-1.5.0.jar
-rw-r--r--  1 tyu  staff        41 Sep  9 09:37 spark-core_2.10-1.5.0.jar.sha1
-rw-r--r--  1 tyu  staff     19816 Sep  9 09:37 spark-core_2.10-1.5.0.pom
-rw-r--r--  1 tyu  staff        41 Sep  9 09:37 spark-core_2.10-1.5.0.pom.sha1

FYI

On Wed, Sep 9, 2015 at 9:35 AM, Jerry Lam  wrote:

> Hi Spark Developers,
>
> I'm eager to try it out! However, I got problems in resolving dependencies:
> [warn] [NOT FOUND  ]
> org.apache.spark#spark-core_2.10;1.5.0!spark-core_2.10.jar (0ms)
> [warn]  jcenter: tried
>
> When will the package be available?
>
> Best Regards,
>
> Jerry
>
>
> On Wed, Sep 9, 2015 at 9:30 AM, Dimitris Kouzis - Loukas <
> look...@gmail.com> wrote:
>
>> Yeii!
>>
>> On Wed, Sep 9, 2015 at 2:25 PM, Yu Ishikawa > > wrote:
>>
>>> Great work, everyone!
>>>
>>>
>>>
>>> -
>>> -- Yu Ishikawa
>>> --
>>> View this message in context:
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/ANNOUNCE-Announcing-Spark-1-5-0-tp14013p14015.html
>>> Sent from the Apache Spark Developers List mailing list archive at
>>> Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>
>


Re: Spark Cannot Connect to HBaseClusterSingleton

2015-08-26 Thread Ted Yu
My understanding is that people on this mailing list who are interested in
helping can log comments on the GORA JIRA.
HBase integration with Spark is proven to work, so the intricacies should
be on the Gora side.

On Wed, Aug 26, 2015 at 8:08 AM, Furkan KAMACI 
wrote:

> Btw, here is the source code of GoraInputFormat.java :
>
>
> https://github.com/kamaci/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/GoraInputFormat.java
> On 26 Aug 2015 at 18:05, "Furkan KAMACI" 
> wrote:
>
> I'll send an e-mail to the Gora dev list too and also attach my patch to the
>> GSoC Jira issue you mentioned, and then we can continue there.
>>
>> Before I do that, I wanted to get the Spark dev community's ideas on how to
>> solve my problem, since you may have faced this kind of problem before.
>> On 26 Aug 2015 at 17:13, "Ted Yu"  wrote:
>>
>>> I found GORA-386 Gora Spark Backend Support
>>>
>>> Should the discussion be continued there ?
>>>
>>> Cheers
>>>
>>> On Wed, Aug 26, 2015 at 7:02 AM, Ted Malaska 
>>> wrote:
>>>
>>>> Where is the input format class?  Whenever I use the search on your
>>>> github it says "We couldn’t find any issues matching 'GoraInputFormat'"
>>>>
>>>>
>>>>
>>>> On Wed, Aug 26, 2015 at 9:48 AM, Furkan KAMACI 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Here is the MapReduceTestUtils.testSparkWordCount()
>>>>>
>>>>>
>>>>> https://github.com/kamaci/gora/blob/master/gora-core/src/test/java/org/apache/gora/mapreduce/MapReduceTestUtils.java#L108
>>>>>
>>>>> Here is SparkWordCount
>>>>>
>>>>>
>>>>> https://github.com/kamaci/gora/blob/8f1acc6d4ef6c192e8fc06287558b7bc7c39b040/gora-core/src/examples/java/org/apache/gora/examples/spark/SparkWordCount.java
>>>>>
>>>>> Lastly, here is GoraSparkEngine:
>>>>>
>>>>>
>>>>> https://github.com/kamaci/gora/blob/master/gora-core/src/main/java/org/apache/gora/spark/GoraSparkEngine.java
>>>>>
>>>>> Kind Regards,
>>>>> Furkan KAMACI
>>>>>
>>>>> On Wed, Aug 26, 2015 at 4:40 PM, Ted Malaska >>>> > wrote:
>>>>>
>>>>>> Where can I find the code for MapReduceTestUtils.testSparkWordCount?
>>>>>>
>>>>>> On Wed, Aug 26, 2015 at 9:29 AM, Furkan KAMACI <
>>>>>> furkankam...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Here is the test method I've ignored due to the Connection Refused
>>>>>>> failure:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/kamaci/gora/blob/master/gora-hbase/src/test/java/org/apache/gora/hbase/mapreduce/TestHBaseStoreWordCount.java#L65
>>>>>>>
>>>>>>> I've implemented a Spark backend for Apache Gora as a GSoC project and
>>>>>>> this is the last obstacle I need to solve. Any help would be
>>>>>>> welcome.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Furkan KAMACI
>>>>>>>
>>>>>>> On Wed, Aug 26, 2015 at 3:45 PM, Ted Malaska <
>>>>>>> ted.mala...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> I've always used HBaseTestingUtility and never really had much
>>>>>>>> trouble. I use that for all my unit testing between Spark and HBase.
>>>>>>>>
>>>>>>>> Here are some code examples if you're interested
>>>>>>>>
>>>>>>>> --Main HBase-Spark Module
>>>>>>>> https://github.com/apache/hbase/tree/master/hbase-spark
>>>>>>>>
>>>>>>>> --Unit tests that cover all basic connections
>>>>>>>>
>>>>>>>> https://github.com/apache/hbase/blob/master/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala
>>>>>>>>
>>>>>>>> --If you want to look at the old stuff before it went into HBase
>>>>>>>> https://github.com/cloudera-labs/SparkOnHBase
>>>>>>>>
>>>>>>>> Let me k

Re: Spark Cannot Connect to HBaseClusterSingleton

2015-08-26 Thread Ted Yu
I found GORA-386 Gora Spark Backend Support

Should the discussion be continued there ?

Cheers

On Wed, Aug 26, 2015 at 7:02 AM, Ted Malaska 
wrote:

> Where is the input format class?  Whenever I use the search on your
> github it says "We couldn’t find any issues matching 'GoraInputFormat'"
>
>
>
> On Wed, Aug 26, 2015 at 9:48 AM, Furkan KAMACI 
> wrote:
>
>> Hi,
>>
>> Here is the MapReduceTestUtils.testSparkWordCount()
>>
>>
>> https://github.com/kamaci/gora/blob/master/gora-core/src/test/java/org/apache/gora/mapreduce/MapReduceTestUtils.java#L108
>>
>> Here is SparkWordCount
>>
>>
>> https://github.com/kamaci/gora/blob/8f1acc6d4ef6c192e8fc06287558b7bc7c39b040/gora-core/src/examples/java/org/apache/gora/examples/spark/SparkWordCount.java
>>
>> Lastly, here is GoraSparkEngine:
>>
>>
>> https://github.com/kamaci/gora/blob/master/gora-core/src/main/java/org/apache/gora/spark/GoraSparkEngine.java
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>> On Wed, Aug 26, 2015 at 4:40 PM, Ted Malaska 
>> wrote:
>>
>>> Where can I find the code for MapReduceTestUtils.testSparkWordCount?
>>>
>>> On Wed, Aug 26, 2015 at 9:29 AM, Furkan KAMACI 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Here is the test method I've ignored due to the Connection Refused
>>>> failure:
>>>>
>>>>
>>>> https://github.com/kamaci/gora/blob/master/gora-hbase/src/test/java/org/apache/gora/hbase/mapreduce/TestHBaseStoreWordCount.java#L65
>>>>
>>>> I've implemented a Spark backend for Apache Gora as a GSoC project and
>>>> this is the last obstacle I need to solve. Any help would be
>>>> welcome.
>>>>
>>>> Kind Regards,
>>>> Furkan KAMACI
>>>>
>>>> On Wed, Aug 26, 2015 at 3:45 PM, Ted Malaska 
>>>> wrote:
>>>>
>>>>> I've always used HBaseTestingUtility and never really had much
>>>>> trouble. I use that for all my unit testing between Spark and HBase.
>>>>>
>>>>> Here are some code examples if you're interested
>>>>>
>>>>> --Main HBase-Spark Module
>>>>> https://github.com/apache/hbase/tree/master/hbase-spark
>>>>>
>>>>> --Unit tests that cover all basic connections
>>>>>
>>>>> https://github.com/apache/hbase/blob/master/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala
>>>>>
>>>>> --If you want to look at the old stuff before it went into HBase
>>>>> https://github.com/cloudera-labs/SparkOnHBase
>>>>>
>>>>> Let me know if that helps
>>>>>
>>>>> On Wed, Aug 26, 2015 at 5:40 AM, Ted Yu  wrote:
>>>>>
>>>>>> Can you log the contents of the Configuration you pass from Spark ?
>>>>>> The output would give you some clue.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Aug 26, 2015, at 2:30 AM, Furkan KAMACI 
>>>>>> wrote:
>>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> I'll check Zookeeper connection but another test method which runs on
>>>>>> hbase without Spark works without any error. Hbase version is
>>>>>> 0.98.8-hadoop2 and I use Spark 1.3.1
>>>>>>
>>>>>> Kind Regards,
>>>>>> Furkan KAMACI
>>>>>> On 26 Aug 2015 at 12:08, "Ted Yu"  wrote:
>>>>>>
>>>>>>> The connection failure was to zookeeper.
>>>>>>>
>>>>>>> Have you verified that localhost:2181 can serve requests ?
>>>>>>> What version of hbase was Gora built against ?
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Aug 26, 2015, at 1:50 AM, Furkan KAMACI 
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I start an Hbase cluster for my test class. I use that helper class:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/apache/gora/blob/mast

Re: Spark Cannot Connect to HBaseClusterSingleton

2015-08-26 Thread Ted Yu
Can you log the contents of the Configuration you pass from Spark ?
The output would give you some clue. 

Cheers
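
One simple way to do that, sketched in Scala (assuming conf is the Hadoop Configuration returned by cluster.getConf()), is to dump every entry before handing it to newAPIHadoopRDD:

import scala.collection.JavaConverters._

// Hadoop's Configuration is an Iterable of Map.Entry[String, String]
conf.iterator().asScala.foreach { entry =>
  println(s"${entry.getKey} = ${entry.getValue}")
}

In particular, check what hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort resolve to, since the failing connection in the log below goes to localhost:2181.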



> On Aug 26, 2015, at 2:30 AM, Furkan KAMACI  wrote:
> 
> Hi Ted,
> 
> I'll check Zookeeper connection but another test method which runs on hbase 
> without Spark works without any error. Hbase version is 0.98.8-hadoop2 and I 
> use Spark 1.3.1
> 
> Kind Regards,
> Furkan KAMACI
> 
> On 26 Aug 2015 at 12:08, "Ted Yu"  wrote:
>> The connection failure was to zookeeper. 
>> 
>> Have you verified that localhost:2181 can serve requests ?
>> What version of hbase was Gora built against ?
>> 
>> Cheers
>> 
>> 
>> 
>>> On Aug 26, 2015, at 1:50 AM, Furkan KAMACI  wrote:
>>> 
>>> Hi,
>>> 
>>> I start an Hbase cluster for my test class. I use that helper class: 
>>> 
>>> https://github.com/apache/gora/blob/master/gora-hbase/src/test/java/org/apache/gora/hbase/util/HBaseClusterSingleton.java
>>> 
>>> and use it like this:
>>> 
>>> private static final HBaseClusterSingleton cluster = 
>>> HBaseClusterSingleton.build(1);
>>> 
>>> I retrieve configuration object as follows:
>>> 
>>> cluster.getConf()
>>> 
>>> and I use it at Spark as follows:
>>> 
>>> sparkContext.newAPIHadoopRDD(conf, MyInputFormat.class, clazzK,
>>> clazzV);
>>> 
>>> When I run my test there is no need to start up an Hbase cluster because 
>>> Spark will connect to my dummy cluster. However, when I run my test method 
>>> it throws an error:
>>> 
>>> 2015-08-26 01:19:59,558 INFO [Executor task launch 
>>> worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn 
>>> (ClientCnxn.java:logStartConnect(966)) - Opening socket connection to 
>>> server localhost/127.0.0.1:2181. Will not attempt to authenticate using 
>>> SASL (unknown error)
>>> 
>>> 2015-08-26 01:19:59,559 WARN [Executor task launch 
>>> worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn 
>>> (ClientCnxn.java:run(1089)) - Session 0x0 for server null, unexpected 
>>> error, closing socket connection and attempting reconnect 
>>> java.net.ConnectException: Connection refused at 
>>> sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>>>  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>> Hbase tests, which do not run on Spark, work well. When I check the logs I 
>>> see that the cluster and Spark are started up correctly:
>>> 
>>> 2015-08-26 01:35:21,791 INFO [main] hdfs.MiniDFSCluster 
>>> (MiniDFSCluster.java:waitActive(2055)) - Cluster is active
>>> 
>>> 2015-08-26 01:35:40,334 INFO [main] util.Utils (Logging.scala:logInfo(59)) 
>>> - Successfully started service 'sparkDriver' on port 56941.
>>> I realized that when I start up an hbase from command line my test method 
>>> for Spark connects to it!
>>> 
>>> So, does it mean that it doesn't care about the conf I passed to it? Any 
>>> ideas about how to solve it?


Re: Spark Cannot Connect to HBaseClusterSingleton

2015-08-26 Thread Ted Yu
The connection failure was to zookeeper. 

Have you verified that localhost:2181 can serve requests ?
What version of hbase was Gora built against ?

Cheers



> On Aug 26, 2015, at 1:50 AM, Furkan KAMACI  wrote:
> 
> Hi,
> 
> I start an Hbase cluster for my test class. I use that helper class: 
> 
> https://github.com/apache/gora/blob/master/gora-hbase/src/test/java/org/apache/gora/hbase/util/HBaseClusterSingleton.java
> 
> and use it like this:
> 
> private static final HBaseClusterSingleton cluster = 
> HBaseClusterSingleton.build(1);
> 
> I retrieve configuration object as follows:
> 
> cluster.getConf()
> 
> and I use it at Spark as follows:
> 
> sparkContext.newAPIHadoopRDD(conf, MyInputFormat.class, clazzK,
> clazzV);
> 
> When I run my test there is no need to start up an Hbase cluster because Spark 
> will connect to my dummy cluster. However, when I run my test method it throws 
> an error:
> 
> 2015-08-26 01:19:59,558 INFO [Executor task launch 
> worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(966)) - Opening socket connection to server 
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> 
> 2015-08-26 01:19:59,559 WARN [Executor task launch 
> worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1089)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect java.net.ConnectException: 
> Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native 
> Method) at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> Hbase tests, which do not run on Spark, work well. When I check the logs I 
> see that the cluster and Spark are started up correctly:
> 
> 2015-08-26 01:35:21,791 INFO [main] hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:waitActive(2055)) - Cluster is active
> 
> 2015-08-26 01:35:40,334 INFO [main] util.Utils (Logging.scala:logInfo(59)) - 
> Successfully started service 'sparkDriver' on port 56941.
> I realized that when I start up an hbase from command line my test method for 
> Spark connects to it!
> 
> So, does it mean that it doesn't care about the conf I passed to it? Any 
> ideas about how to solve it?


Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-21 Thread Ted Yu
I pointed hbase-spark module (in HBase project) to 1.5.0-rc1 and was able
to build the module (with proper maven repo).

FYI

On Fri, Aug 21, 2015 at 2:17 PM, mkhaitman  wrote:

> Just a heads up that this RC1 release is still appearing as
> "1.5.0-SNAPSHOT"
> (Not just me right..?)
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-0-RC1-tp13780p13792.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread Ted Yu
See this thread:

http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+module&subj=Building+Spark+Building+just+one+module+



> On Aug 19, 2015, at 1:44 AM, canan chen  wrote:
> 
> I want to work on one jira, but it is not easy to unit test because it 
> involves different components, especially the UI. The spark build is pretty slow, and I 
> don't want to build it each time to test my code change. I am wondering how 
> other people do this ? Is there any experience you can share ? Thanks
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Ted Yu
I tried accessing just now.
It took several seconds before the page showed up.

FYI

On Thu, Aug 13, 2015 at 7:56 PM, Cheng, Hao  wrote:

> I found that https://spark-prs.appspot.com/ is super slow when opening it in
> a new window recently; not sure if it is just me or everybody experiences the
> same. Is there any way to speed it up?
>
>
>
> *From:* Josh Rosen [mailto:rosenvi...@gmail.com]
> *Sent:* Friday, August 14, 2015 10:21 AM
> *To:* dev
> *Subject:* Re: Automatically deleting pull request comments left by
> AmplabJenkins
>
>
>
> Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59
>
>
>
> On Wed, Aug 12, 2015 at 7:51 PM, Josh Rosen  wrote:
>
> *TL;DR*: would anyone object if I wrote a script to auto-delete pull
> request comments from AmplabJenkins?
>
>
>
> Currently there are two bots which post Jenkins test result comments to
> GitHub, AmplabJenkins and SparkQA.
>
>
>
> SparkQA is the account which posts the detailed Jenkins start and finish
> messages that contain information on which commit is being tested and which
> tests have failed. This bot is controlled via the dev/run-tests-jenkins
> script.
>
>
>
> AmplabJenkins is controlled by the Jenkins GitHub Pull Request Builder
> plugin. This bot posts relatively uninformative comments ("Merge build
> triggered", "Merge build started", "Merge build failed") that do not
> contain any links or details specific to the tests being run.
>
>
>
> It is technically non-trivial to prevent these AmplabJenkins comments from
> being posted in the first place (see
> https://issues.apache.org/jira/browse/SPARK-4216).
>
>
>
> However, as a short-term hack I'd like to deploy a script which
> automatically deletes these comments as soon as they're posted, with an
> exemption carved out for the "Can an admin approve this patch for testing?"
> messages. This will help to significantly de-clutter pull request
> discussions in the GitHub UI.
>
>
>
> If nobody objects, I'd like to deploy this script sometime in the next few
> days.
>
>
>
> (From a technical perspective, my script uses the GitHub REST API and
> AmplabJenkins' own OAuth token to delete the comments.  The final
> deployment environment will most likely be the backend of
> http://spark-prs.appspot.com).
>
>
>
> - Josh
>
>
>


Re: Automatically deleting pull request comments left by AmplabJenkins

2015-08-13 Thread Ted Yu
Thanks Josh for the initiative.

I think reducing the redundancy in QA bot posts would make discussion on the
GitHub UI more focused.

Cheers
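
For context, the deletion Josh describes (quoted below) boils down to one authenticated call per comment against the GitHub issues API. A rough Scala sketch of the shape of that call, with a hypothetical comment id and a token taken from the environment rather than the actual deployment code:

import java.net.{HttpURLConnection, URL}

val token = sys.env("GITHUB_TOKEN")   // placeholder; the real script uses AmplabJenkins' own OAuth token
val commentId = 123456789L            // hypothetical comment id
val url = new URL(s"https://api.github.com/repos/apache/spark/issues/comments/$commentId")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("DELETE")
conn.setRequestProperty("Authorization", s"token $token")
println(conn.getResponseCode)         // 204 No Content means the comment was deleted
conn.disconnect()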

On Thu, Aug 13, 2015 at 7:21 PM, Josh Rosen  wrote:

> Prototype is at https://github.com/databricks/spark-pr-dashboard/pull/59
>
> On Wed, Aug 12, 2015 at 7:51 PM, Josh Rosen  wrote:
>
>> *TL;DR*: would anyone object if I wrote a script to auto-delete pull
>> request comments from AmplabJenkins?
>>
>> Currently there are two bots which post Jenkins test result comments to
>> GitHub, AmplabJenkins and SparkQA.
>>
>> SparkQA is the account which posts the detailed Jenkins start and finish
>> messages that contain information on which commit is being tested and which
>> tests have failed. This bot is controlled via the dev/run-tests-jenkins
>> script.
>>
>> AmplabJenkins is controlled by the Jenkins GitHub Pull Request Builder
>> plugin. This bot posts relatively uninformative comments ("Merge build
>> triggered", "Merge build started", "Merge build failed") that do not
>> contain any links or details specific to the tests being run.
>>
>> It is technically non-trivial to prevent these AmplabJenkins comments from
>> being posted in the first place (see
>> https://issues.apache.org/jira/browse/SPARK-4216).
>>
>> However, as a short-term hack I'd like to deploy a script which
>> automatically deletes these comments as soon as they're posted, with an
>> exemption carved out for the "Can an admin approve this patch for testing?"
>> messages. This will help to significantly de-clutter pull request
>> discussions in the GitHub UI.
>>
>> If nobody objects, I'd like to deploy this script sometime in the next
>> few days.
>>
>> (From a technical perspective, my script uses the GitHub REST API and
>> AmplabJenkins' own OAuth token to delete the comments.  The final
>> deployment environment will most likely be the backend of
>> http://spark-prs.appspot.com).
>>
>> - Josh
>>
>
>


Re: subscribe

2015-08-13 Thread Ted Yu
See first section on https://spark.apache.org/community

On Thu, Aug 13, 2015 at 9:44 AM, Naga Vij  wrote:

> subscribe
>


Re: Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-11 Thread Ted Yu
Yan:
Where can I find performance numbers for Astro (it's close to the middle of
August) ?

Cheers

On Tue, Aug 11, 2015 at 3:58 PM, Yan Zhou.sc  wrote:

> Finally I can take a look at HBASE-14181 now. Unfortunately there is no
> design doc mentioned. Superficially it is very similar to Astro, with the
> difference being that
>
> this is part of the HBase client library; while Astro works as a Spark
> package and so will evolve and function more closely with Spark SQL/DataFrame
> instead of HBase.
>
>
>
> In terms of architecture, my take is loosely-coupled query engines on top
> of KV store vs. an array of query engines supported by, and packaged as
> part of, a KV store.
>
>
>
> Functionality-wise the two could be close but Astro also supports Python
> as a result of tight integration with Spark.
>
> It will be interesting to see performance comparisons when HBase-14181 is
> ready.
>
>
>
> Thanks,
>
>
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* Tuesday, August 11, 2015 3:28 PM
> *To:* Yan Zhou.sc
> *Cc:* Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apache.org
> *Subject:* Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"
>
>
>
> HBase will not have a query engine.
>
>
>
> It will provide better support to query engines.
>
>
>
> Cheers
>
>
> On Aug 10, 2015, at 11:11 PM, Yan Zhou.sc  wrote:
>
> Ted,
>
>
>
> I’m in China now, and seem to have difficulty accessing Apache Jira.
> Anyway, it appears to me that HBASE-14181
> <https://issues.apache.org/jira/browse/HBASE-14181> attempts to support
> Spark DataFrame inside HBase.
>
> If true, one question to me is whether HBase is intended to have a
> built-in query engine or not. Or will it stick with the current way:
>
> a k-v store with some built-in processing capabilities in the form of
> coprocessors, custom filters, etc., which allows for loosely-coupled query
> engines
>
> built on top of it.
>
>
>
> Thanks,
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com ]
> *Sent:* August 11, 2015 8:54
> *To:* Bing Xiao (Bing)
> *Cc:* dev@spark.apache.org; u...@spark.apache.org; Yan Zhou.sc
> *Subject:* Re: Package Release Announcement: Spark SQL on HBase "Astro"
>
>
>
> Yan / Bing:
>
> Mind taking a look at HBASE-14181
> <https://issues.apache.org/jira/browse/HBASE-14181> 'Add Spark DataFrame
> DataSource to HBase-Spark Module' ?
>
>
>
> Thanks
>
>
>
> On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing) 
> wrote:
>
> We are happy to announce the availability of the Spark SQL on HBase 1.0.0
> release.
> http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
>
> The main features in this package, dubbed “Astro”, include:
>
> · Systematic and powerful handling of data pruning and
> intelligent scan, based on partial evaluation technique
>
> · HBase pushdown capabilities like custom filters and coprocessor
> to support ultra low latency processing
>
> · SQL, Data Frame support
>
> · More SQL capabilities made possible (Secondary index, bloom
> filter, Primary Key, Bulk load, Update)
>
> · Joins with data from other sources
>
> · Python/Java/Scala support
>
> · Support latest Spark 1.4.0 release
>
>
>
> The tests by Huawei team and community contributors covered the areas:
> bulk load; projection pruning; partition pruning; partial evaluation; code
> generation; coprocessor; customer filtering; DML; complex filtering on keys
> and non-keys; Join/union with non-Hbase data; Data Frame; multi-column
> family test.  We will post the test results including performance tests the
> middle of August.
>
> You are very welcomed to try out or deploy the package, and help improve
> the integration tests with various combinations of the settings, extensive
> Data Frame tests, complex join/union test and extensive performance tests.
> Please use the “Issues” “Pull Requests” links at this package homepage, if
> you want to report bugs, improvement or feature requests.
>
> Special thanks to project owner and technical leader Yan Zhou, Huawei
> global team, community contributors and Databricks.   Databricks has been
> providing great assistance from the design to the release.
>
> “Astro”, the Spark SQL on HBase package will be useful for ultra low
> latency query and analytics of large scale data sets in vertical
> enterprises. We will continue to work with the community to develop
> new features and improve code base.  Your comments and suggestions are
> greatly appreciated.
>
>
>
> Yan Zhou / Bing Xiao
>
> Huawei Big Data team
>
>
>
>
>
>


Re: Sources/pom for org.spark-project.hive

2015-08-11 Thread Ted Yu
Have you looked at
https://github.com/pwendell/hive/tree/0.13.1-shaded-protobuf ?

Cheers

On Tue, Aug 11, 2015 at 12:25 PM, Pala M Muthaia <
mchett...@rocketfuelinc.com> wrote:

> Hi,
>
> I am trying to make Spark SQL 1.4 work with our internal fork of Hive. We
> have some customizations in Hive (custom authorization, various hooks etc)
> that are all part of hive-exec.
>
> Given that Spark's hive dependency is through the org.spark-project.hive groupId,
> it looks like I need to modify the definition of the hive-exec artifact there to
> take a dependency on our internal hive (vs org.apache.hive), and then
> everything else would flow through.
>
> However, I am unable to find sources for org.spark-project.hive to make
> this change. Is it available? Otherwise, how can I proceed in this
> situation?
>
>
> Thanks,
> pala
>


Re: Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-11 Thread Ted Yu
HBase will not have a query engine. 

It will provide better support to query engines. 

Cheers



> On Aug 10, 2015, at 11:11 PM, Yan Zhou.sc  wrote:
> 
> Ted,
>  
> I’m in China now, and seem to have difficulty accessing Apache Jira. 
> Anyway, it appears to me that HBASE-14181 attempts to support Spark 
> DataFrame inside HBase.
> If true, one question to me is whether HBase is intended to have a built-in 
> query engine or not. Or will it stick with the current way:
> a k-v store with some built-in processing capabilities in the form of 
> coprocessors, custom filters, etc., which allows for loosely-coupled query 
> engines
> built on top of it.
>  
> Thanks,
>  
> From: Ted Yu [mailto:yuzhih...@gmail.com] 
> Sent: August 11, 2015 8:54
> To: Bing Xiao (Bing)
> Cc: dev@spark.apache.org; u...@spark.apache.org; Yan Zhou.sc
> Subject: Re: Package Release Announcement: Spark SQL on HBase "Astro"
>  
> Yan / Bing:
> Mind taking a look at HBASE-14181 'Add Spark DataFrame DataSource to 
> HBase-Spark Module' ?
>  
> Thanks
>  
> On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing)  
> wrote:
> We are happy to announce the availability of the Spark SQL on HBase 1.0.0 
> release.  http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
> The main features in this package, dubbed “Astro”, include:
> · Systematic and powerful handling of data pruning and intelligent 
> scan, based on partial evaluation technique
> 
> · HBase pushdown capabilities like custom filters and coprocessor to 
> support ultra low latency processing
> 
> · SQL, Data Frame support
> 
> · More SQL capabilities made possible (Secondary index, bloom filter, 
> Primary Key, Bulk load, Update)
> 
> · Joins with data from other sources
> 
> · Python/Java/Scala support
> 
> · Support latest Spark 1.4.0 release
> 
>  
> 
> The tests by Huawei team and community contributors covered the areas: bulk 
> load; projection pruning; partition pruning; partial evaluation; code 
> generation; coprocessor; customer filtering; DML; complex filtering on keys 
> and non-keys; Join/union with non-Hbase data; Data Frame; multi-column family 
> test.  We will post the test results including performance tests the middle 
> of August.
> You are very welcomed to try out or deploy the package, and help improve the 
> integration tests with various combinations of the settings, extensive Data 
> Frame tests, complex join/union test and extensive performance tests.  Please 
> use the “Issues” “Pull Requests” links at this package homepage, if you want 
> to report bugs, improvement or feature requests.
> Special thanks to project owner and technical leader Yan Zhou, Huawei global 
> team, community contributors and Databricks.   Databricks has been providing 
> great assistance from the design to the release.
> “Astro”, the Spark SQL on HBase package will be useful for ultra low latency 
> query and analytics of large scale data sets in vertical enterprises. We will 
> continue to work with the community to develop new features and improve code 
> base.  Your comments and suggestions are greatly appreciated.
>  
> Yan Zhou / Bing Xiao
> Huawei Big Data team
>  
>  


Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-10 Thread Ted Yu
Yan / Bing:
Mind taking a look at HBASE-14181
 'Add Spark DataFrame
DataSource to HBase-Spark Module' ?

Thanks

On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing) 
wrote:

> We are happy to announce the availability of the Spark SQL on HBase 1.0.0
> release.
> http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
>
> The main features in this package, dubbed “Astro”, include:
>
> · Systematic and powerful handling of data pruning and
> intelligent scan, based on partial evaluation technique
>
> · HBase pushdown capabilities like custom filters and coprocessor
> to support ultra low latency processing
>
> · SQL, Data Frame support
>
> · More SQL capabilities made possible (Secondary index, bloom
> filter, Primary Key, Bulk load, Update)
>
> · Joins with data from other sources
>
> · Python/Java/Scala support
>
> · Support latest Spark 1.4.0 release
>
>
>
> The tests by Huawei team and community contributors covered the areas:
> bulk load; projection pruning; partition pruning; partial evaluation; code
> generation; coprocessor; customer filtering; DML; complex filtering on keys
> and non-keys; Join/union with non-Hbase data; Data Frame; multi-column
> family test.  We will post the test results including performance tests the
> middle of August.
>
> You are very welcomed to try out or deploy the package, and help improve
> the integration tests with various combinations of the settings, extensive
> Data Frame tests, complex join/union test and extensive performance tests.
> Please use the “Issues” “Pull Requests” links at this package homepage, if
> you want to report bugs, improvement or feature requests.
>
> Special thanks to project owner and technical leader Yan Zhou, Huawei
> global team, community contributors and Databricks.   Databricks has been
> providing great assistance from the design to the release.
>
> “Astro”, the Spark SQL on HBase package will be useful for ultra low
> latency query and analytics of large scale data sets in vertical
> enterprises. We will continue to work with the community to develop
> new features and improve code base.  Your comments and suggestions are
> greatly appreciated.
>
>
>
> Yan Zhou / Bing Xiao
>
> Huawei Big Data team
>
>
>


Re: Is there any way to support multiple users executing SQL on thrift server?

2015-08-06 Thread Ted Yu
What is the JIRA number if a JIRA has been logged for this ?

Thanks



> On Jan 20, 2015, at 11:30 AM, Cheng Lian  wrote:
> 
> Hey Yi,
> 
> I'm quite unfamiliar with Hadoop/HDFS auth mechanisms for now, but would like 
> to investigate this issue later. Would you please open a JIRA for it? Thanks!
> 
> Cheng
> 
>> On 1/19/15 1:00 AM, Yi Tian wrote:
>> Is there any way to support multiple users executing SQL on one thrift 
>> server?
>> 
>> I think there are some problems for spark 1.2.0, for example:
>> 
>> Start thrift server with user A
>> Connect to thrift server via beeline with user B
>> Execute “insert into table dest select … from table src”
>> then we found these items on hdfs:
>> 
>> drwxr-xr-x   - B supergroup  0 2015-01-16 16:42 
>> /tmp/hadoop/hive_2015-01-16_16-42-48_923_1860943684064616152-3/-ext-1
>> drwxr-xr-x   - B supergroup  0 2015-01-16 16:42 
>> /tmp/hadoop/hive_2015-01-16_16-42-48_923_1860943684064616152-3/-ext-1/_temporary
>> drwxr-xr-x   - B supergroup  0 2015-01-16 16:42 
>> /tmp/hadoop/hive_2015-01-16_16-42-48_923_1860943684064616152-3/-ext-1/_temporary/0
>> drwxr-xr-x   - A supergroup  0 2015-01-16 16:42 
>> /tmp/hadoop/hive_2015-01-16_16-42-48_923_1860943684064616152-3/-ext-1/_temporary/0/_temporary
>> drwxr-xr-x   - A supergroup  0 2015-01-16 16:42 
>> /tmp/hadoop/hive_2015-01-16_16-42-48_923_1860943684064616152-3/-ext-1/_temporary/0/task_201501161642_0022_m_00
>> -rw-r--r--   3 A supergroup   2671 2015-01-16 16:42 
>> /tmp/hadoop/hive_2015-01-16_16-42-48_923_1860943684064616152-3/-ext-1/_temporary/0/task_201501161642_0022_m_00/part-0
>> You can see all the temporary paths created on the driver side (thrift server 
>> side) are owned by user B (which is what we expected).
>> 
>> But all the output data created on the executor side is owned by user A (which 
>> is NOT what we expected).
>> The wrong owner of the output data causes an 
>> org.apache.hadoop.security.AccessControlException while the driver side is 
>> moving the output data into the dest table.
>> 
>> Does anyone know how to resolve this problem?
>> 
> 


Re: Package Release Announcement: Spark SQL on HBase "Astro"

2015-08-03 Thread Ted Yu
When I tried to compile against hbase 1.1.1, I got:

[ERROR]
/home/hbase/ssoh/src/main/scala/org/apache/spark/sql/hbase/SparkSqlRegionObserver.scala:124:
overloaded method next needs result type
[ERROR]   override def next(result: java.util.List[Cell], limit: Int) =
next(result)

Is there a plan to support hbase 1.x ?

Thanks
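
In case it helps narrow things down: "overloaded method next needs result type" is just Scala refusing to infer a return type when a method delegates to an overload of itself, so giving that line an explicit result type makes it compile, e.g. (assuming the delegate returns Boolean, as InternalScanner.next does):

  override def next(result: java.util.List[Cell], limit: Int): Boolean = next(result)

Whether that alone is enough for hbase 1.1.1 is a separate question, since the scanner API also changed between 0.98 and 1.x.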

On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing) 
wrote:

> We are happy to announce the availability of the Spark SQL on HBase 1.0.0
> release.
> http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase
>
> The main features in this package, dubbed “Astro”, include:
>
> · Systematic and powerful handling of data pruning and
> intelligent scan, based on partial evaluation technique
>
> · HBase pushdown capabilities like custom filters and coprocessor
> to support ultra low latency processing
>
> · SQL, Data Frame support
>
> · More SQL capabilities made possible (Secondary index, bloom
> filter, Primary Key, Bulk load, Update)
>
> · Joins with data from other sources
>
> · Python/Java/Scala support
>
> · Support latest Spark 1.4.0 release
>
>
>
> The tests by Huawei team and community contributors covered the areas:
> bulk load; projection pruning; partition pruning; partial evaluation; code
> generation; coprocessor; customer filtering; DML; complex filtering on keys
> and non-keys; Join/union with non-Hbase data; Data Frame; multi-column
> family test.  We will post the test results including performance tests the
> middle of August.
>
> You are very welcomed to try out or deploy the package, and help improve
> the integration tests with various combinations of the settings, extensive
> Data Frame tests, complex join/union test and extensive performance tests.
> Please use the “Issues” “Pull Requests” links at this package homepage, if
> you want to report bugs, improvement or feature requests.
>
> Special thanks to project owner and technical leader Yan Zhou, Huawei
> global team, community contributors and Databricks.   Databricks has been
> providing great assistance from the design to the release.
>
> “Astro”, the Spark SQL on HBase package will be useful for ultra low
> latency query and analytics of large scale data sets in vertical
> enterprises. We will continue to work with the community to develop
> new features and improve code base.  Your comments and suggestions are
> greatly appreciated.
>
>
>
> Yan Zhou / Bing Xiao
>
> Huawei Big Data team
>
>
>


Re: add to user list

2015-07-30 Thread Ted Yu
Please take a look at the first section of:
https://spark.apache.org/community

On Thu, Jul 30, 2015 at 9:23 PM, Sachin Aggarwal  wrote:

>
>
> --
>
> Thanks & Regards
>
> Sachin Aggarwal
> 7760502772
>


Re: High availability with zookeeper: worker discovery

2015-07-30 Thread Ted Yu
zookeeper is not a direct dependency of Spark.

Can you give a bit more detail on how the election / discovery of master
works ?

Cheers
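
For what it's worth, the standalone-HA documentation has workers and applications register against the full list of masters rather than only the local one; a small sketch of what that typically looks like (host names and ports are placeholders):

// Application (or worker) side: pass the whole master list, comma separated
val conf = new org.apache.spark.SparkConf()
  .setAppName("ha-example")
  .setMaster("spark://node1:7077,node2:7077,node3:7077")

// Master side: recovery settings go into spark-env.sh on each master, e.g.
// SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
//                         -Dspark.deploy.zookeeper.url=node1:2181,node2:2181,node3:2181"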

On Thu, Jul 30, 2015 at 7:41 PM, Christophe Schmitz 
wrote:

> Hi there,
>
> I am trying to run a 3 node spark cluster where each node contains a
> spark worker and a spark master. Election of the master happens via
> zookeeper.
>
> The way I am configuring it is by (on each node) giving the IP:PORT of the
> local master to the local worker, and I wish the worker could autodiscover
> the elected master automatically.
>
> But unfortunately, only the local worker of the elected master registered
> to the elected master. Why aren't the other workers getting to connect to
> the elected master?
>
> The interesting thing is that if I kill the elected master and wait a bit,
> then the new elected master sees all the workers!
>
> I am wondering if I am missing something to make this happen without
> having to kill the elected master.
>
> Thanks!
>
>
> PS: I am on spark 1.2.2
>
>


Re: Generalised Spark-HBase integration

2015-07-28 Thread Ted Yu
I got a compilation error:

[INFO] /home/hbase/s-on-hbase/src/main/scala:-1: info: compiling
[INFO] Compiling 18 source files to /home/hbase/s-on-hbase/target/classes
at 1438099569598
[ERROR]
/home/hbase/s-on-hbase/src/main/scala/org/apache/spark/hbase/examples/simple/HBaseTableSimple.scala:36:
error: type mismatch;
[INFO]  found   : Int
[INFO]  required: Short
[INFO]   while (scanner.advance) numCells += 1
[INFO]^
[ERROR] one error found

FYI
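
For what it's worth, that error usually just means the counter is declared as a Short, because numCells += 1 produces an Int. A tiny self-contained sketch of the mismatch and two ways around it (variable names are illustrative, not the actual code in HBaseTableSimple.scala):

var counted: Short = 0
// counted += 1                  // fails to compile: found Int, required Short
counted = (counted + 1).toShort  // compiles: narrow the Int result back to Short

var countedAsInt: Int = 0
countedAsInt += 1                // or simply widen the counter to Int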

On Tue, Jul 28, 2015 at 8:59 AM, Michal Haris 
wrote:

> Hi all, for the last couple of months I've been working on large graph analytics
> and along the way have written from scratch an HBase-Spark integration, as
> none of the ones out there worked either in terms of scale or in the way
> they integrated with the RDD interface. This week I have generalised it
> into an (almost) spark module, which works with the latest spark and the
> new hbase api, so... sharing! :
> https://github.com/michal-harish/spark-on-hbase
>
>
> --
> Michal Haris
> Technical Architect
> direct line: +44 (0) 207 749 0229
> www.visualdna.com | t: +44 (0) 207 734 7033
> 31 Old Nichol Street
> London
> E2 7HR
>


ReceiverTrackerSuite failing in master build

2015-07-28 Thread Ted Yu
Hi,
I noticed that ReceiverTrackerSuite is failing in master Jenkins build for
both hadoop profiles.

The failure seems to start with:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3104/

FYI


Re: Asked to remove non-existent executor exception

2015-07-26 Thread Ted Yu
If I read the code correctly, that error message came
from CoarseGrainedSchedulerBackend.

There may be existing / future error messages, other than the one cited
below, which are useful. Maybe change the log level of this message  to
DEBUG ?

Cheers
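
If the goal is just to silence that one logger without touching Spark code, a log-level override is enough. A minimal Scala sketch, assuming the logger name matches Spark 1.3's org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend (the same thing can be done with one line in log4j.properties):

import org.apache.log4j.{Level, Logger}

// raise the threshold for just this class so its ERROR lines are dropped
Logger.getLogger("org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend")
  .setLevel(Level.FATAL)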

On Sun, Jul 26, 2015 at 3:28 PM, Mridul Muralidharan 
wrote:

> Simply customize your log4j config instead of modifying code if you don't
> want messages from that class.
>
>
> Regards
> Mridul
>
> On Sunday, July 26, 2015, Sea <261810...@qq.com> wrote:
>
>> This exception is so ugly!!!  The screen is full of this information
>> when the program runs a long time, and it will not fail the job.
>>
>> I commented it out in the source code. I think this information is useless
>> because the executor is already removed and I don't know what the
>> executor id means.
>>
>> Should we remove this information forever?
>>
>>
>>
>>  15/07/23 13:26:41 ERROR SparkDeploySchedulerBackend: Asked to remove
>> non-existent executor 2...
>>
>> 15/07/23 13:26:41 ERROR SparkDeploySchedulerBackend: Asked to remove
>> non-existent executor 2...
>>
>>
>>
>>
>>
>>
>>
>> -- Original Message --
>>  *From:* "Ted Yu";;
>> *Sent:* Sunday, July 26, 2015, 10:51 PM
>> *To:* "Pa Rö";
>> *Cc:* "user";
>> *Subject:* Re: Asked to remove non-existent executor exception
>>
>> You can list the files in tmpfs in reverse chronological order and remove
>> the oldest until you have enough space.
>>
>> Cheers
>>
>> On Sun, Jul 26, 2015 at 12:43 AM, Pa Rö 
>> wrote:
>>
>>> I have seen that the "tmpfs" is full; how can I clear this?
>>>
>>> 2015-07-23 13:41 GMT+02:00 Pa Rö :
>>>
>>>>   Hello spark community,
>>>>
>>>> I have built an application with geomesa, accumulo and spark.
>>>> It works in spark local mode, but not on a spark
>>>> cluster. In short it says: No space left on device. Asked to remove
>>>> non-existent executor XY.
>>>> I'm confused, because there were many GBs of free space. Do I need to
>>>> change my configuration, or what else can I do? Thanks in advance.
>>>>
>>>> here is the complete exception:
>>>>
>>>> og4j:WARN No appenders could be found for logger
>>>> (org.apache.accumulo.fate.zookeeper.ZooSession).
>>>> log4j:WARN Please initialize the log4j system properly.
>>>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
>>>> for more info.
>>>> Using Spark's default log4j profile:
>>>> org/apache/spark/log4j-defaults.properties
>>>> 15/07/23 13:26:39 INFO SparkContext: Running Spark version 1.3.0
>>>> 15/07/23 13:26:39 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> 15/07/23 13:26:39 INFO SecurityManager: Changing view acls to: marcel
>>>> 15/07/23 13:26:39 INFO SecurityManager: Changing modify acls to: marcel
>>>> 15/07/23 13:26:39 INFO SecurityManager: SecurityManager: authentication
>>>> disabled; ui acls disabled; users with view permissions: Set(marcel); users
>>>> with modify permissions: Set(marcel)
>>>> 15/07/23 13:26:39 INFO Slf4jLogger: Slf4jLogger started
>>>> 15/07/23 13:26:40 INFO Remoting: Starting remoting
>>>> 15/07/23 13:26:40 INFO Remoting: Remoting started; listening on
>>>> addresses :[akka.tcp://sparkDriver@node1-scads02:52478]
>>>> 15/07/23 13:26:40 INFO Utils: Successfully started service
>>>> 'sparkDriver' on port 52478.
>>>> 15/07/23 13:26:40 INFO SparkEnv: Registering MapOutputTracker
>>>> 15/07/23 13:26:40 INFO SparkEnv: Registering BlockManagerMaster
>>>> 15/07/23 13:26:40 INFO DiskBlockManager: Created local directory at
>>>> /tmp/spark-ca9319d4-68a2-4add-a21a-48b13ae9cf81/blockmgr-cbf8af23-e113-4732-8c2c-7413ad237b3b
>>>> 15/07/23 13:26:40 INFO MemoryStore: MemoryStore started with capacity
>>>> 1916.2 MB
>>>> 15/07/23 13:26:40 INFO HttpFileServer: HTTP File server directory is
>>>> /tmp/spark-9d4a04d5-3535-49e0-a859-d278a0cc7bf8/httpd-1882aafc-45fe-4490-803d-c04fc67510a2
>>>> 15/07/23 13:26:40 INFO HttpServer: Starting HTTP Server
>>>> 15/07/23 13:26:40 INFO Server: jetty-8.y.z-SNAPSHOT
>>>> 15/07/23 13:26:40 INFO AbstractConnector: Started
>>>> Sock

Re: KinesisStreamSuite failing in master branch

2015-07-20 Thread Ted Yu
TD:
Thanks for getting the builds back to green.

On Sun, Jul 19, 2015 at 7:21 PM, Tathagata Das  wrote:

> The PR to fix this is out.
> https://github.com/apache/spark/pull/7519
>
>
> On Sun, Jul 19, 2015 at 6:41 PM, Tathagata Das 
> wrote:
>
>> I am taking care of this right now.
>>
>> On Sun, Jul 19, 2015 at 6:08 PM, Patrick Wendell 
>> wrote:
>>
>>> I think we should just revert this patch on all affected branches. No
>>> reason to leave the builds broken until a fix is in place.
>>>
>>> - Patrick
>>>
>>> On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen 
>>> wrote:
>>> > Yep, I emailed TD about it; I think that we may need to make a change
>>> to the
>>> > pull request builder to fix this.  Pending that, we could just revert
>>> the
>>> > commit that added this.
>>> >
>>> > On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu  wrote:
>>> >>
>>> >> Hi,
>>> >> I noticed that KinesisStreamSuite fails for both hadoop profiles in
>>> master
>>> >> Jenkins builds.
>>> >>
>>> >> From
>>> >>
>>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console
>>> >> :
>>> >>
>>> >> KinesisStreamSuite:
>>> >> *** RUN ABORTED ***
>>> >>   java.lang.AssertionError: assertion failed: Kinesis test not
>>> enabled,
>>> >> should not attempt to get AWS credentials
>>> >>   at scala.Predef$.assert(Predef.scala:179)
>>> >>   at
>>> >>
>>> org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:189)
>>> >>   at
>>> >> org.apache.spark.streaming.kinesis.KinesisTestUtils.org
>>> $apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient$lzycompute(KinesisTestUtils.scala:59)
>>> >>   at
>>> >> org.apache.spark.streaming.kinesis.KinesisTestUtils.org
>>> $apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient(KinesisTestUtils.scala:58)
>>> >>   at
>>> >>
>>> org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:121)
>>> >>   at
>>> >>
>>> org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:157)
>>> >>   at
>>> >>
>>> org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:78)
>>> >>   at
>>> >>
>>> org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:45)
>>> >>   at
>>> >>
>>> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
>>> >>   at
>>> >>
>>> org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:33)
>>> >>
>>> >>
>>> >> FYI
>>> >
>>> >
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>
>


KinesisStreamSuite failing in master branch

2015-07-19 Thread Ted Yu
Hi,
I noticed that KinesisStreamSuite fails for both hadoop profiles in master
Jenkins builds.

From
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console
:

KinesisStreamSuite:*** RUN ABORTED ***  java.lang.AssertionError:
assertion failed: Kinesis test not enabled, should not attempt to get
AWS credentials  at scala.Predef$.assert(Predef.scala:179)  at
org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:189)
 at 
org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient$lzycompute(KinesisTestUtils.scala:59)
 at 
org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient(KinesisTestUtils.scala:58)
 at 
org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:121)
 at 
org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:157)
 at 
org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:78)
 at 
org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:45)
 at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
 at 
org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:33)


FYI


Re: If gmail, check sparm

2015-07-18 Thread Ted Yu
Interesting read. 

I did find a lot of Spark mails in the Spam folder. 

Thanks Mridul 



> On Jul 18, 2015, at 10:25 AM, Mridul Muralidharan  wrote:
> 
> https://plus.google.com/+LinusTorvalds/posts/DiG9qANf5PA
> 
> I have noticed a bunch of mails from dev@ and github going to spam -
> including the spark mailing list.
> Might be a good idea for dev, committers to check if they are missing
> things in their spam folder if on gmail.
> 
> Regards,
> Mridul
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Expression.resolved unmatched with the correct values in catalyst?

2015-07-18 Thread Ted Yu
What if you move your addition to before line 64 (in the master branch there is
a case for if e.checkInputDataTypes().isFailure):

  case c: Cast if !c.resolved =>

Cheers

On Wed, Jul 15, 2015 at 12:47 AM, Takeshi Yamamuro 
wrote:

> Hi, devs
>
> I found that the case of 'Expression.resolved !=
> (Expression.childrenResolved && checkInputDataTypes().isSuccess)'
> occurs in the output of Analyzer.
> That is, some tests in o.a.s.sql.* fail if the codes below are added in
> CheckAnalysis:
>
>
> https://github.com/maropu/spark/commit/a488eee8351f5ec49854eef0266e4445269d5867
>
> Is this the correct behaviour in catalyst?
> If so, can anyone explain in which case this happens?
>
> Thanks,
> takeshi
>
> --
> ---
> Takeshi Yamamuro (maropu)
>


Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-18 Thread Ted Yu
+1 to removing commit messages. 



> On Jul 18, 2015, at 1:35 AM, Sean Owen  wrote:
> 
> +1 to removing them. Sometimes there are 50+ commits because people
> have been merging from master into their branch rather than rebasing.
> 
>> On Sat, Jul 18, 2015 at 8:48 AM, Reynold Xin  wrote:
>> I took a look at the commit messages in git log -- it looks like the
>> individual commit messages are not that useful to include, but do make the
>> commit messages more verbose. They are usually just a bunch of extremely
>> concise descriptions of "bug fixes", "merges", etc:
>> 
>>cb3f12d [xxx] add whitespace
>>6d874a6 [xxx] support pyspark for yarn-client
>> 
>>89b01f5 [yyy] Update the unit test to add more cases
>>275d252 [yyy] Address the comments
>>7cc146d [yyy] Address the comments
>>2624723 [yyy] Fix rebase conflict
>>45befaa [yyy] Update the unit test
>>bbc1c9c [yyy] Fix checkpointing doesn't retain driver port issue
>> 
>> 
>> Anybody against removing those from the merge script so the log looks
>> cleaner? If nobody feels strongly about this, we can just create a JIRA to
>> remove them, and only keep the author names.
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Apache gives exception when running groupby on df temp table

2015-07-16 Thread Ted Yu
Can you provide a bit more information such as:

release of Spark you use
snippet of your SparkSQL query

Thanks

On Thu, Jul 16, 2015 at 5:31 AM, nipun  wrote:

> I have a dataframe. I register it as a temp table and run a spark sql query
> on it to get another dataframe. Now when I run groupBy on it, it gives me
> this exception
>
> e: Lost task 1.3 in stage 21.0 (TID 579, 172.28.0.162):
> java.lang.ClassCastException: java.lang.String cannot be cast to
> org.apache.spark.sql.types.UTF8String
> at
>
> org.apache.spark.sql.execution.SparkSqlSerializer2$$anonfun$createSerializationFunction$1.apply(SparkSqlSerializer2.scala:319)
> at
>
> org.apache.spark.sql.execution.SparkSqlSerializer2$$anonfun$createSerializationFunction$1.apply(SparkSqlSerializer2.scala:212)
> at
>
> org.apache.spark.sql.execution.Serializer2SerializationStream.writeKey(SparkSqlSerializer2.scala:65)
> at
>
> org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:206)
> at
>
> org.apache.spark.util.collection.WritablePartitionedIterator$$anon$3.writeNext(WritablePartitionedPairCollection.scala:104)
> at
>
> org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:375)
> at
>
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:208)
> at
>
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-gives-exception-when-running-groupby-on-df-temp-table-tp13275.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: problems with build of latest the master

2015-07-15 Thread Ted Yu
If I understand correctly, hadoop-openstack is not currently a dependency of 
Spark. 



> On Jul 15, 2015, at 8:21 AM, Josh Rosen  wrote:
> 
> We may be able to fix this from the Spark side by adding appropriate 
> exclusions in our Hadoop dependencies, right?  If possible, I think that we 
> should do this.
> 
>> On Wed, Jul 15, 2015 at 7:10 AM, Ted Yu  wrote:
>> I attached a patch for HADOOP-12235
>> 
>> BTW openstack was not mentioned in the first email from Gil.
>> My email and Gil's second email were sent around the same moment.
>> 
>> Cheers
>> 
>>> On Wed, Jul 15, 2015 at 2:06 AM, Steve Loughran  
>>> wrote:
>>> 
>>>> On 14 Jul 2015, at 12:22, Ted Yu  wrote:
>>>> 
>>>> Looking at Jenkins, master branch compiles.
>>>> 
>>>> Can you try the following command ?
>>>> 
>>>> mvn -Phive -Phadoop-2.6 -DskipTests clean package
>>>> 
>>>> What version of Java are you using ?
>>> 
>>> Ted, Giles has stuck in hadoop-openstack, it's that which is creating the 
>>> problem
>>> 
>>> Giles, I don't know why hadoop-openstack has a mockito dependency as  it 
>>> should be test time only 
>>> 
>>> Looking at the POM, its dependency tag
>>> in hadoop-2.7 is scoped to compile:
>>>
>>> <dependency>
>>>   <groupId>org.mockito</groupId>
>>>   <artifactId>mockito-all</artifactId>
>>>   <scope>compile</scope>
>>> </dependency>
>>>
>>> it should be "provided", shouldn't it?
>>> 
>>> Created https://issues.apache.org/jira/browse/HADOOP-12235 : if someone 
>>> supplies a patch I'll get it in.
>>> 
>>> -steve
> 


Re: problems with build of latest the master

2015-07-15 Thread Ted Yu
I attached a patch for HADOOP-12235

BTW openstack was not mentioned in the first email from Gil.
My email and Gil's second email were sent around the same moment.

Cheers

On Wed, Jul 15, 2015 at 2:06 AM, Steve Loughran 
wrote:

>
>  On 14 Jul 2015, at 12:22, Ted Yu  wrote:
>
>  Looking at Jenkins, master branch compiles.
>
>  Can you try the following command ?
>
> mvn -Phive -Phadoop-2.6 -DskipTests clean package
>
>  What version of Java are you using ?
>
>
>  Ted, Giles has stuck in hadoop-openstack, it's that which is creating
> the problem
>
>  Giles, I don't know why hadoop-openstack has a mockito dependency as  it
> should be test time only
>
>  Looking at the POM, its dependency tag
>  in hadoop-2.7 is scoped to compile:
>
>  <dependency>
>    <groupId>org.mockito</groupId>
>    <artifactId>mockito-all</artifactId>
>    <scope>compile</scope>
>  </dependency>
>
>  it should be "provided", shouldn't it?
>
>  Created https://issues.apache.org/jira/browse/HADOOP-12235 : if someone
> supplies a patch I'll get it in.
>
>  -steve
>


Re: problems with build of latest the master

2015-07-14 Thread Ted Yu
Looking at Jenkins, master branch compiles.

Can you try the following command ?

mvn -Phive -Phadoop-2.6 -DskipTests clean package

What version of Java are you using ?

Cheers

On Tue, Jul 14, 2015 at 2:23 AM, Gil Vernik  wrote:

> I just did checkout of the master and tried to build it with
>
> mvn -Dhadoop.version=2.6.0 -DskipTests clean package
>
> Got:
>
> [ERROR]
> /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java:117:
> error: cannot find symbol
> [ERROR]
> when(shuffleMemoryManager.tryToAcquire(anyLong())).then(returnsFirstArg());
> [ERROR]   ^
> [ERROR]   symbol:   method then(Answer)
> [ERROR]   location: interface OngoingStubbing
> [ERROR]
> /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java:408:
> error: cannot find symbol
> [ERROR]   .then(returnsFirstArg()) // Allocate initial sort buffer
> [ERROR]   ^
> [ERROR]   symbol:   method then(Answer)
> [ERROR]   location: interface OngoingStubbing
> [ERROR]
> /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/shuffle/unsafe/UnsafeShuffleWriterSuite.java:435:
> error: cannot find symbol
> [ERROR]   .then(returnsFirstArg()) // Allocate initial sort buffer
> [ERROR]   ^
> [ERROR]   symbol:   method then(Answer)
> [ERROR]   location: interface OngoingStubbing
> [ERROR]
> /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java:98:
> error: cannot find symbol
> [ERROR]
> when(shuffleMemoryManager.tryToAcquire(anyLong())).then(returnsFirstArg());
> [ERROR]   ^
> [ERROR]   symbol:   method then(Answer)
> [ERROR]   location: interface OngoingStubbing
> [ERROR]
> /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java:130:
> error: cannot find symbol
> [ERROR]   .then(returnsSecondArg());
> [ERROR]   ^
> [ERROR]   symbol:   method then(Answer)
> [ERROR]   location: interface OngoingStubbing
> [ERROR] Note:
> /Users/gilv/Dev/Spark/spark/core/src/test/java/org/apache/spark/JavaAPISuite.java
> uses or overrides a deprecated API.
> [ERROR] Note: Recompile with -Xlint:deprecation for details.
> [ERROR] 5 errors
> [INFO]
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Spark Project Parent POM ... SUCCESS [
>  3.183 s]
> [INFO] Spark Project Launcher . SUCCESS [
>  7.681 s]
> [INFO] Spark Project Networking ... SUCCESS [
>  7.178 s]
> [INFO] Spark Project Shuffle Streaming Service  SUCCESS [
>  4.125 s]
> [INFO] Spark Project Unsafe ... SUCCESS [
>  3.734 s]
> [INFO] Spark Project Core . FAILURE [02:16
> min]
> [INFO] Spark Project Bagel  SKIPPED


Re: ./dev/run-tests fail on master

2015-07-12 Thread Ted Yu
When I ran dev/run-tests , I got :

File "./dev/run-tests.py", line 68, in
__main__.identify_changed_files_from_git_commits
Failed example:
'root' in [x.name for x in determine_modules_for_files(
 identify_changed_files_from_git_commits("50a0496a43",
target_ref="6765ef9"))]
Exception raised:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/doctest.py", line 1253, in __run
compileflags, 1) in test.globs
  File "",
line 1, in 
'root' in [x.name for x in determine_modules_for_files(
 identify_changed_files_from_git_commits("50a0496a43",
target_ref="6765ef9"))]
  File "./dev/run-tests.py", line 82, in
identify_changed_files_from_git_commits
raw_output = subprocess.check_output(['git', 'diff', '--name-only',
patch_sha, diff_target],
AttributeError: 'module' object has no attribute 'check_output'

I was using python 2.6.6

Xiaoyu:
In the interim, can you use maven to run test suite ?

Cheers

On Sun, Jul 12, 2015 at 8:26 PM, Xiaoyu Ma 
wrote:

> Hi guys,
> I was trying to rerun test using run-tests on master but I got below
> errors. I was able to build using maven though. Any advice?
>
> [error]^
> [error]
> /Users/ilovesoup1/workspace/eclipseWS/spark/network/common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:24:
> error: package org.slf4j does not exist
> [error] import org.slf4j.Logger;
> [error] ^
> [error]
> /Users/ilovesoup1/workspace/eclipseWS/spark/network/common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java:25:
> error: package org.slf4j does not exist
> [error] import org.slf4j.LoggerFactory;
> [error] ^
> [error]
> /Users/ilovesoup1/workspace/eclipseWS/spark/network/common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java:75:
> error: cannot find symbol
> [error]   public void exceptionCaught(ChannelHandlerContext ctx, Throwable
> cause) throws Exception {
> [error]   ^
> [error]   symbol:   class ChannelHandlerContext
> [error]   location: class TransportChannelHandler
> [error] 100 errors
> [info] Done updating.
> [info] Done updating.
> [warn] There may be incompatibilities among your library dependencies.
> [warn] Here are some of the libraries that were evicted:
> [warn] * com.google.code.findbugs:jsr305:1.3.9 -> 2.0.1
> [warn] * com.google.guava:guava:11.0.2 -> 14.0.1
> [warn] * io.netty:netty-all:4.0.23.Final -> 4.0.28.Final
> [warn] * commons-net:commons-net:2.2 -> 3.1
> [warn] Run 'evicted' to see detailed eviction warnings
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}graphx...
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}catalyst...
> [info] Updating {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}bagel...
> [info] Updating {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}yarn...
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}streaming...
> [info] Done updating.
> [warn] There may be incompatibilities among your library dependencies.
> [warn] Here are some of the libraries that were evicted:
> [warn] * com.google.guava:guava:11.0.2 -> 14.0.1
> [warn] Run 'evicted' to see detailed eviction warnings
> [info] Done updating.
> [warn] There may be incompatibilities among your library dependencies.
> [warn] Here are some of the libraries that were evicted:
> [warn] * com.google.guava:guava:11.0.2 -> 14.0.1
> [warn] Run 'evicted' to see detailed eviction warnings
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}streaming-twitter...
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}streaming-kafka...
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}streaming-flume...
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}streaming-zeromq...
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}kinesis-asl...
> [info] Done updating.
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}streaming-mqtt...
> [info] Done updating.
> [info] Updating {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}tools...
> [info] Done updating.
> [warn] There may be incompatibilities among your library dependencies.
> [warn] Here are some of the libraries that were evicted:
> [warn] * com.google.guava:guava:11.0.2 -> 14.0.1
> [warn] Run 'evicted' to see detailed eviction warnings
> [info] Updating {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}sql...
> [info] Done updating.
> [info] Done updating.
> [info] Done updating.
> [info] Done updating.
> [info] Done updating.
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}streaming-flume-assembly...
> [info] Done updating.
> [info] Updating
> {file:/Users/ilovesoup1/workspace/eclipseWS/spark/}streaming-kafka-assembly...
> [info] Done updating.
> [info] Done updating.
> [info] Done updating.
> [warn] There may be incompatibilitie

Re: Spark master broken?

2015-07-12 Thread Ted Yu
Jenkins shows green builds.

What Java version did you use ?

Cheers

On Sun, Jul 12, 2015 at 3:49 AM, René Treffer  wrote:

> Hi *,
>
> I'm currently trying to build master but it fails with
>
>  [error] Picked up JAVA_TOOL_OPTIONS:
>> -javaagent:/usr/share/java/jayatanaag.jar
>> [error]
>> /home/rtreffer/work/spark-master/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java:135:
>>
>> error: > org.apache.spark.sql.execution.UnsafeExternalRowSorter$1> is not abstract
>> and does not override abstract method 
>> minBy(Function1,Ordering) in TraversableOnce
>> [error]   return new AbstractScalaRowIterator() {
>> [error] ^
>> [error]   where B,A are type-variables:
>> [error] B extends Object declared in method
>> minBy(Function1,Ordering)
>> [error] A extends Object declared in interface TraversableOnce
>> [error] 1 error
>> [error] Compile failed at Jul 12, 2015 12:17:56 PM [20.565s]
>> [INFO]
>> 
>> [INFO] Reactor Summary:
>> [INFO]
>> [INFO] Spark Project Parent POM .. SUCCESS
>> [6.094s]
>> [INFO] Spark Project Core  SUCCESS
>> [2:52.035s]
>> [INFO] Spark Project Bagel ... SUCCESS
>> [22.506s]
>> [INFO] Spark Project GraphX .. SUCCESS
>> [19.076s]
>> [INFO] Spark Project ML Library .. SUCCESS
>> [1:15.520s]
>> [INFO] Spark Project Tools ... SUCCESS
>> [2.041s]
>> [INFO] Spark Project Networking .. SUCCESS
>> [8.741s]
>> [INFO] Spark Project Shuffle Streaming Service ... SUCCESS
>> [7.298s]
>> [INFO] Spark Project Streaming ... SUCCESS
>> [29.154s]
>> [INFO] Spark Project Catalyst  FAILURE
>> [21.048s]
>>
>>
>  I've tried to build for 2.11 and 2.10 without success. Is there a known
> issue on master?
>
> Regards,
>   Rene Treffer
>


Re: The latest master branch didn't compile with -Phive?

2015-07-10 Thread Ted Yu
Compilation on master branch has been fixed.

Thanks to Cheng Lian.

On Thu, Jul 9, 2015 at 8:50 AM, Josh Rosen  wrote:

> Jenkins runs compile-only builds for Maven as an early warning system for
> this type of issue; you can see from
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ that the
> Maven compilation is now broken in master.
>
> On Thu, Jul 9, 2015 at 8:48 AM, Ted Yu  wrote:
>
>> I guess the compilation issue didn't surface in QA run because sbt was
>> used:
>>
>> [info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:  
>> -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive-thriftserver 
>> -Phive package assembly/assembly streaming-kafka-assembly/assembly 
>> streaming-flume-assembly/assembly
>>
>>
>> Cheers
>>
>>
>> On Thu, Jul 9, 2015 at 7:58 AM, Ted Yu  wrote:
>>
>>> From
>>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull
>>> :
>>>
>>> + build/mvn -DzincPort=3439 -DskipTests -Phadoop-2.4 -Pyarn -Phive 
>>> -Phive-thriftserver -Pkinesis-asl clean package
>>>
>>>
>>> FYI
>>>
>>>
>>> On Thu, Jul 9, 2015 at 7:51 AM, Sean Owen  wrote:
>>>
>>>> This is an error from scalac and not Spark. I find it happens
>>>> frequently for me but goes away on a clean build. *shrug*
>>>>
>>>>
>>>> On Thu, Jul 9, 2015 at 3:45 PM, Ted Yu  wrote:
>>>> > Looking at
>>>> >
>>>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull
>>>> > :
>>>> >
>>>> > [error]
>>>> > [error]  while compiling:
>>>> >
>>>> /home/jenkins/workspace/Spark-Master-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
>>>> > [error] during phase: typer
>>>> > [error]  library version: version 2.10.4
>>>> > [error] compiler version: version 2.10.4
>>>> >
>>>> >
>>>> > I traced back to build #2869 and the error was there - didn't go back
>>>> > further.
>>>> >
>>>> >
>>>> > FYI
>>>> >
>>>> >
>>>> > On Thu, Jul 9, 2015 at 7:24 AM, Yijie Shen >>> >
>>>> > wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> I use the clean version just clone from the master branch, build
>>>> with:
>>>> >>
>>>> >> build/mvn -Phive -Phadoop-2.4 -DskipTests package
>>>> >>
>>>> >> And BUILD FAILURE at last, due to:
>>>> >>
>>>> >> [error]  while compiling:
>>>> >>
>>>> /Users/yijie/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
>>>> >> [error] during phase: typer
>>>> >> [error]  library version: version 2.10.4
>>>> >> [error] compiler version: version 2.10.4
>>>> >> ...
>>>> >> [error]
>>>> >> [error]   last tree to typer: Ident(Warehouse)
>>>> >> [error]   symbol:  (flags: )
>>>> >> [error]symbol definition: 
>>>> >> [error]symbol owners:
>>>> >> [error]   context owners: lazy value hiveWarehouse -> class
>>>> >> HiveMetastoreCatalog -> package hive
>>>> >> [error]
>>>> >> [error] == Enclosing template or block ==
>>>> >> [error]
>>>> >> [error] Template( // val :  in
>>>> class
>>>> >> HiveMetastoreCatalog
>>>> >> [error]   "Catalog", "Logging" // parents
>>>> >> [error]   ValDef(
>>>> >> [error] private
>>>> >> [error] "_"
>>>> >> [error] 
>>>> >> [error] 
>>>> >> [error]   )
>>>> >> [error]   // 24 statements
>>>> >> [error]   ValDef( // private[this] val client:
>>>> >> org.apache.spark.sql.hive.client.ClientInterface in class
>>>> >> HiveMetastoreCatalog
>>>> >> [error] private  
>>>> >> [error] "client"
>>>> >> [error] "ClientInterface"
>>>> >> [error] 
>>>> >> …
>>>> >>
>>>> >>
>>>> https://gist.github.com/yijieshen/e0925e2227a312ae4c64#file-build_failure
>>>> >>
>>>> >> Did I make a silly mistake?
>>>> >>
>>>> >> Thanks, Yijie
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>


Re: The latest master branch didn't compile with -Phive?

2015-07-09 Thread Ted Yu
I guess the compilation issue didn't surface in QA run because sbt was used:

[info] Building Spark (w/Hive 0.13.1) using SBT with these arguments:
-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl
-Phive-thriftserver -Phive package assembly/assembly
streaming-kafka-assembly/assembly streaming-flume-assembly/assembly


Cheers


On Thu, Jul 9, 2015 at 7:58 AM, Ted Yu  wrote:

> From
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull
> :
>
> + build/mvn -DzincPort=3439 -DskipTests -Phadoop-2.4 -Pyarn -Phive 
> -Phive-thriftserver -Pkinesis-asl clean package
>
>
> FYI
>
>
> On Thu, Jul 9, 2015 at 7:51 AM, Sean Owen  wrote:
>
>> This is an error from scalac and not Spark. I find it happens
>> frequently for me but goes away on a clean build. *shrug*
>>
>>
>> On Thu, Jul 9, 2015 at 3:45 PM, Ted Yu  wrote:
>> > Looking at
>> >
>> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull
>> > :
>> >
>> > [error]
>> > [error]  while compiling:
>> >
>> /home/jenkins/workspace/Spark-Master-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
>> > [error] during phase: typer
>> > [error]  library version: version 2.10.4
>> > [error] compiler version: version 2.10.4
>> >
>> >
>> > I traced back to build #2869 and the error was there - didn't go back
>> > further.
>> >
>> >
>> > FYI
>> >
>> >
>> > On Thu, Jul 9, 2015 at 7:24 AM, Yijie Shen 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I use the clean version just clone from the master branch, build with:
>> >>
>> >> build/mvn -Phive -Phadoop-2.4 -DskipTests package
>> >>
>> >> And BUILD FAILURE at last, due to:
>> >>
>> >> [error]  while compiling:
>> >>
>> /Users/yijie/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
>> >> [error] during phase: typer
>> >> [error]  library version: version 2.10.4
>> >> [error] compiler version: version 2.10.4
>> >> ...
>> >> [error]
>> >> [error]   last tree to typer: Ident(Warehouse)
>> >> [error]   symbol:  (flags: )
>> >> [error]symbol definition: 
>> >> [error]symbol owners:
>> >> [error]   context owners: lazy value hiveWarehouse -> class
>> >> HiveMetastoreCatalog -> package hive
>> >> [error]
>> >> [error] == Enclosing template or block ==
>> >> [error]
>> >> [error] Template( // val :  in
>> class
>> >> HiveMetastoreCatalog
>> >> [error]   "Catalog", "Logging" // parents
>> >> [error]   ValDef(
>> >> [error] private
>> >> [error] "_"
>> >> [error] 
>> >> [error] 
>> >> [error]   )
>> >> [error]   // 24 statements
>> >> [error]   ValDef( // private[this] val client:
>> >> org.apache.spark.sql.hive.client.ClientInterface in class
>> >> HiveMetastoreCatalog
>> >> [error] private  
>> >> [error] "client"
>> >> [error] "ClientInterface"
>> >> [error] 
>> >> …
>> >>
>> >>
>> https://gist.github.com/yijieshen/e0925e2227a312ae4c64#file-build_failure
>> >>
>> >> Did I make a silly mistake?
>> >>
>> >> Thanks, Yijie
>> >
>> >
>>
>
>


Re: The latest master branch didn't compile with -Phive?

2015-07-09 Thread Ted Yu
From
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull
:

+ build/mvn -DzincPort=3439 -DskipTests -Phadoop-2.4 -Pyarn -Phive
-Phive-thriftserver -Pkinesis-asl clean package


FYI


On Thu, Jul 9, 2015 at 7:51 AM, Sean Owen  wrote:

> This is an error from scalac and not Spark. I find it happens
> frequently for me but goes away on a clean build. *shrug*
>
> On Thu, Jul 9, 2015 at 3:45 PM, Ted Yu  wrote:
> > Looking at
> >
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull
> > :
> >
> > [error]
> > [error]  while compiling:
> >
> /home/jenkins/workspace/Spark-Master-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
> > [error] during phase: typer
> > [error]  library version: version 2.10.4
> > [error] compiler version: version 2.10.4
> >
> >
> > I traced back to build #2869 and the error was there - didn't go back
> > further.
> >
> >
> > FYI
> >
> >
> > On Thu, Jul 9, 2015 at 7:24 AM, Yijie Shen 
> > wrote:
> >>
> >> Hi,
> >>
> >> I use the clean version just clone from the master branch, build with:
> >>
> >> build/mvn -Phive -Phadoop-2.4 -DskipTests package
> >>
> >> And BUILD FAILURE at last, due to:
> >>
> >> [error]  while compiling:
> >>
> /Users/yijie/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
> >> [error] during phase: typer
> >> [error]  library version: version 2.10.4
> >> [error] compiler version: version 2.10.4
> >> ...
> >> [error]
> >> [error]   last tree to typer: Ident(Warehouse)
> >> [error]   symbol:  (flags: )
> >> [error]symbol definition: 
> >> [error]symbol owners:
> >> [error]   context owners: lazy value hiveWarehouse -> class
> >> HiveMetastoreCatalog -> package hive
> >> [error]
> >> [error] == Enclosing template or block ==
> >> [error]
> >> [error] Template( // val :  in class
> >> HiveMetastoreCatalog
> >> [error]   "Catalog", "Logging" // parents
> >> [error]   ValDef(
> >> [error] private
> >> [error] "_"
> >> [error] 
> >> [error] 
> >> [error]   )
> >> [error]   // 24 statements
> >> [error]   ValDef( // private[this] val client:
> >> org.apache.spark.sql.hive.client.ClientInterface in class
> >> HiveMetastoreCatalog
> >> [error] private  
> >> [error] "client"
> >> [error] "ClientInterface"
> >> [error] 
> >> …
> >>
> >>
> https://gist.github.com/yijieshen/e0925e2227a312ae4c64#file-build_failure
> >>
> >> Did I make a silly mistake?
> >>
> >> Thanks, Yijie
> >
> >
>


Re: The latest master branch didn't compile with -Phive?

2015-07-09 Thread Ted Yu
Looking at
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/2875/consoleFull
:

[error]
[error]  while compiling:
/home/jenkins/workspace/Spark-Master-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
[error] during phase: typer
[error]  library version: version 2.10.4
[error] compiler version: version 2.10.4


I traced back to build #2869 and the error was there - didn't go back further.


FYI


On Thu, Jul 9, 2015 at 7:24 AM, Yijie Shen 
wrote:

> Hi,
>
> I use the clean version just clone from the master branch, build with:
>
> build/mvn -Phive -Phadoop-2.4 -DskipTests package
>
> And BUILD FAILURE at last, due to:
>
> [error]  while compiling: 
> /Users/yijie/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
> [error] during phase: typer
> [error]  library version: version 2.10.4
> [error] compiler version: version 2.10.4
> ...
> [error]
> [error]   last tree to typer: Ident(Warehouse)
> [error]   symbol:  (flags: )
> [error]symbol definition: 
> [error]symbol owners:
> [error]   context owners: lazy value hiveWarehouse -> class 
> HiveMetastoreCatalog -> package hive
> [error]
> [error] == Enclosing template or block ==
> [error]
> [error] Template( // val :  in class 
> HiveMetastoreCatalog
> [error]   "Catalog", "Logging" // parents
> [error]   ValDef(
> [error] private
> [error] "_"
> [error] 
> [error] 
> [error]   )
> [error]   // 24 statements
> [error]   ValDef( // private[this] val client: 
> org.apache.spark.sql.hive.client.ClientInterface in class HiveMetastoreCatalog
> [error] private  
> [error] "client"
> [error] "ClientInterface"
> [error] 
> …
>
>
> https://gist.github.com/yijieshen/e0925e2227a312ae4c64#file-build_failure
>
> Did I make a silly mistake?
>
> Thanks, Yijie
>


Re: Can not build master

2015-07-03 Thread Ted Yu
Here is mine:

Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c;
2015-03-13T13:10:27-07:00)
Maven home: /home/hbase/apache-maven-3.3.1
Java version: 1.8.0_45, vendor: Oracle Corporation
Java home: /home/hbase/jdk1.8.0_45/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family:
"unix"

On Fri, Jul 3, 2015 at 6:05 PM, Andrew Or  wrote:

> @Tarek and Ted, what maven versions are you using?
>
> 2015-07-03 17:35 GMT-07:00 Krishna Sankar :
>
>> Patrick,
>>I assume an RC3 will be out for folks like me to test the
>> distribution. As usual, I will run the tests when you have a new
>> distribution.
>> Cheers
>> 
>>
>> On Fri, Jul 3, 2015 at 4:38 PM, Patrick Wendell 
>> wrote:
>>
>>> Patch that added test-jar dependencies:
>>> https://github.com/apache/spark/commit/bfe74b34
>>>
>>> Patch that originally disabled dependency reduced poms:
>>>
>>> https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724
>>>
>>> Patch that reverted the disabling of dependency reduced poms:
>>>
>>> https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e
>>>
>>> On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell 
>>> wrote:
>>> > Okay I did some forensics with Sean Owen. Some things about this bug:
>>> >
>>> > 1. The underlying cause is that we added some code to make the tests
>>> > of sub modules depend on the core tests. For unknown reasons this
>>> > causes Spark to hit MSHADE-148 for *some* combinations of build
>>> > profiles.
>>> >
>>> > 2. MSHADE-148 can be worked around by disabling building of
>>> > "dependency reduced poms" because then the buggy code path is
>>> > circumvented. Andrew Or did this in a patch on the 1.4 branch.
>>> > However, that is not a tenable option for us because our *published*
>>> > pom files require dependency reduction to substitute in the scala
>>> > version correctly for the poms published to maven central.
>>> >
>>> > 3. As a result, Andrew Or reverted his patch recently, causing some
>>> > package builds to start failing again (but publishing works now).
>>> >
>>> > 4. The reason this is not detected in our test harness or release
>>> > build is that it is sensitive to the profiles enabled. The combination
>>> > of profiles we enable in the test harness and release builds do not
>>> > trigger this bug.
>>> >
>>> > The best path I see forward right now is to do the following:
>>> >
>>> > 1. Disable creation of dependency reduced poms by default (this
>>> > doesn't matter for people doing a package build) so typical users
>>> > won't have this bug.
>>> >
>>> > 2. Add a profile that re-enables that setting.
>>> >
>>> > 3. Use the above profile when publishing release artifacts to maven
>>> central.
>>> >
>>> > 4. Hope that we don't hit this bug for publishing.
>>> >
>>> > - Patrick
>>> >
>>> > On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel 
>>> wrote:
>>> >> Doesn't change anything for me.
>>> >>
>>> >> On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell 
>>> wrote:
>>> >>>
>>> >>> Can you try using the built in maven "build/mvn..."? All of our
>>> builds
>>> >>> are passing on Jenkins so I wonder if it's a maven version issue:
>>> >>>
>>> >>> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/
>>> >>>
>>> >>> - Patrick
>>> >>>
>>> >>> On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu  wrote:
>>> >>> > Please take a look at SPARK-8781
>>> >>> > (https://github.com/apache/spark/pull/7193)
>>> >>> >
>>> >>> > Cheers
>>> >>> >
>>> >>> > On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel 
>>> wrote:
>>> >>> >>
>>> >>> >> I found a solution, there might be a better one.
>>> >>> >>
>>> >>> >> https://github.com/apache/spark/pull/7217
>>> >>> >>
>>> >>> >> On Fri, Jul 3, 2015 

Re: Can not build master

2015-07-03 Thread Ted Yu
Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193)

Cheers

On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel  wrote:

> I found a solution, there might be a better one.
>
> https://github.com/apache/spark/pull/7217
>
> On Fri, Jul 3, 2015 at 2:28 PM Robin East  wrote:
>
>> Yes me too
>>
>> On 3 Jul 2015, at 22:21, Ted Yu  wrote:
>>
>> This is what I got (the last line was repeated non-stop):
>>
>> [INFO] Replacing original artifact with shaded artifact.
>> [INFO] Replacing
>> /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with
>> /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
>> [INFO] Dependency-reduced POM written at:
>> /home/hbase/spark/bagel/dependency-reduced-pom.xml
>> [INFO] Dependency-reduced POM written at:
>> /home/hbase/spark/bagel/dependency-reduced-pom.xml
>>
>> On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel  wrote:
>>
>>> Hi all,
>>>
>>> I am trying to build the master, but it stucks and prints
>>>
>>> [INFO] Dependency-reduced POM written at:
>>> /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml
>>>
>>> build command:  mvn -DskipTests clean package
>>>
>>> Do others have the same issue?
>>>
>>> Regards,
>>> Tarek
>>>
>>
>>
>>


Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Ted Yu
Patrick:
I used the following command:
~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean
package

The build doesn't seem to stop.
Here is tail of build output:

[INFO] Dependency-reduced POM written at:
/home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
[INFO] Dependency-reduced POM written at:
/home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml

Here is part of the stack trace for the build process:

http://pastebin.com/xL2Y0QMU

FYI

On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.4.1!
>
> This release fixes a handful of known issues in Spark 1.4.0, listed here:
> http://s.apache.org/spark-1.4.1
>
> The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
> 07b95c7adf88f0662b7ab1c47e302ff5e6859606
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> [published as version: 1.4.1]
> https://repository.apache.org/content/repositories/orgapachespark-1120/
> [published as version: 1.4.1-rc2]
> https://repository.apache.org/content/repositories/orgapachespark-1121/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/
>
> Please vote on releasing this package as Apache Spark 1.4.1!
>
> The vote is open until Monday, July 06, at 22:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.4.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
>
>


Re: Can not build master

2015-07-03 Thread Ted Yu
This is what I got (the last line was repeated non-stop):

[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing
/home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with
/home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
[INFO] Dependency-reduced POM written at:
/home/hbase/spark/bagel/dependency-reduced-pom.xml
[INFO] Dependency-reduced POM written at:
/home/hbase/spark/bagel/dependency-reduced-pom.xml

On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel  wrote:

> Hi all,
>
> I am trying to build the master, but it stucks and prints
>
> [INFO] Dependency-reduced POM written at:
> /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml
>
> build command:  mvn -DskipTests clean package
>
> Do others have the same issue?
>
> Regards,
> Tarek
>


Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Ted Yu
Andrew:
I agree with your assessment.

Cheers

On Mon, Jun 29, 2015 at 3:33 PM, Andrew Or  wrote:

> Hi Ted,
>
> We haven't observed a StreamingContextSuite failure on our test
> infrastructure recently. Given that we cannot reproduce it even locally it
> is unlikely that this uncovers a real bug. Even if it does I would not
> block the release on it because many in the community are waiting for a few
> important fixes. In general, there will always be outstanding issues in
> Spark that we cannot address in every release.
>
> -Andrew
>
> 2015-06-29 14:29 GMT-07:00 Ted Yu :
>
>> The test passes when run alone on my machine as well.
>>
>> Please run test suite.
>>
>> Thanks
>>
>> On Mon, Jun 29, 2015 at 2:01 PM, Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> @Ted, I ran the following two commands.
>>>
>>> mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive -DskipTests clean
>>> package
>>> mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive
>>> -DwildcardSuites=org.apache.spark.streaming.StreamingContextSuite test
>>>
>>> Using Java version "1.7.0_51", the tests passed normally.
>>>
>>>
>>>
>>> On Mon, Jun 29, 2015 at 1:05 PM, Krishna Sankar 
>>> wrote:
>>>
>>>> +1 (non-binding, of course)
>>>>
>>>> 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:26 min
>>>>  mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
>>>> 2. Tested pyspark, mllib
>>>> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
>>>> 2.2. Linear/Ridge/Laso Regression OK
>>>> 2.3. Decision Tree, Naive Bayes OK
>>>> 2.4. KMeans OK
>>>>Center And Scale OK
>>>> 2.5. RDD operations OK
>>>>   State of the Union Texts - MapReduce, Filter,sortByKey (word
>>>> count)
>>>> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>>>Model evaluation/optimization (rank, numIter, lambda) with
>>>> itertools OK
>>>> 3. Scala - MLlib
>>>> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
>>>> 3.2. LinearRegressionWithSGD OK
>>>> 3.3. Decision Tree OK
>>>> 3.4. KMeans OK
>>>> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>>> 3.6. saveAsParquetFile OK
>>>> 3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
>>>> registerTempTable, sql OK
>>>> 3.8. result = sqlContext.sql("SELECT
>>>> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
>>>> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
>>>> 4.0. Spark SQL from Python OK
>>>> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'")
>>>> OK
>>>> 5.0. Packages
>>>> 5.1. com.databricks.spark.csv - read/write OK
>>>>
>>>> Cheers
>>>> 
>>>>
>>>> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
>>>> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 1.4.1!
>>>>>
>>>>> This release fixes a handful of known issues in Spark 1.4.0, listed
>>>>> here:
>>>>> http://s.apache.org/spark-1.4.1
>>>>>
>>>>> The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
>>>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>>>>> 60e08e50751fe3929156de956d62faea79f5b801
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>>>>>
>>>>> Release artifacts are signed with the following key:
>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> [published as version: 1.4.1]
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1118/
>>>>> [published as version: 1.4.1-rc1]
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1119/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>>>>>
>>>>> Please vote on releasing this package as Apache Spark 1.4.1!
>>>>>
>>>>> The vote is open until Saturday, June 27, at 06:32 UTC and passes
>>>>> if a majority of at least 3 +1 PMC votes are cast.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 1.4.1
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see
>>>>> http://spark.apache.org/
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Ted Yu
The test passes when run alone on my machine as well.

Please run test suite.

Thanks

On Mon, Jun 29, 2015 at 2:01 PM, Tathagata Das 
wrote:

> @Ted, I ran the following two commands.
>
> mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive -DskipTests clean
> package
> mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive
> -DwildcardSuites=org.apache.spark.streaming.StreamingContextSuite test
>
> Using Java version "1.7.0_51", the tests passed normally.
>
>
>
> On Mon, Jun 29, 2015 at 1:05 PM, Krishna Sankar 
> wrote:
>
>> +1 (non-binding, of course)
>>
>> 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:26 min
>>  mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
>> 2. Tested pyspark, mllib
>> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
>> 2.2. Linear/Ridge/Laso Regression OK
>> 2.3. Decision Tree, Naive Bayes OK
>> 2.4. KMeans OK
>>Center And Scale OK
>> 2.5. RDD operations OK
>>   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
>> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>Model evaluation/optimization (rank, numIter, lambda) with
>> itertools OK
>> 3. Scala - MLlib
>> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
>> 3.2. LinearRegressionWithSGD OK
>> 3.3. Decision Tree OK
>> 3.4. KMeans OK
>> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
>> 3.6. saveAsParquetFile OK
>> 3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
>> registerTempTable, sql OK
>> 3.8. result = sqlContext.sql("SELECT
>> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
>> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
>> 4.0. Spark SQL from Python OK
>> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
>> 5.0. Packages
>> 5.1. com.databricks.spark.csv - read/write OK
>>
>> Cheers
>> 
>>
>> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 1.4.1!
>>>
>>> This release fixes a handful of known issues in Spark 1.4.0, listed here:
>>> http://s.apache.org/spark-1.4.1
>>>
>>> The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>>> 60e08e50751fe3929156de956d62faea79f5b801
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> [published as version: 1.4.1]
>>> https://repository.apache.org/content/repositories/orgapachespark-1118/
>>> [published as version: 1.4.1-rc1]
>>> https://repository.apache.org/content/repositories/orgapachespark-1119/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>>>
>>> Please vote on releasing this package as Apache Spark 1.4.1!
>>>
>>> The vote is open until Saturday, June 27, at 06:32 UTC and passes
>>> if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.4.1
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see
>>> http://spark.apache.org/
>>>
>>>
>>>
>>
>


Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Ted Yu
Here is the command I used:
mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package

Java: 1.8.0_45

OS:
Linux x.com 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux

Cheers

On Mon, Jun 29, 2015 at 12:04 AM, Tathagata Das  wrote:

> @Ted, could you elaborate more on what was the test command that you ran?
> What profiles, using SBT or Maven?
>
> TD
>
> On Sun, Jun 28, 2015 at 12:21 PM, Patrick Wendell 
> wrote:
>
>> Hey Krishna - this is still the current release candidate.
>>
>> - Patrick
>>
>> On Sun, Jun 28, 2015 at 12:14 PM, Krishna Sankar 
>> wrote:
>> > Patrick,
>> >Haven't seen any replies on test results. I will byte ;o) - Should I
>> test
>> > this version or is another one in the wings ?
>> > Cheers
>> > 
>> >
>> > On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
>> > wrote:
>> >>
>> >> Please vote on releasing the following candidate as Apache Spark
>> version
>> >> 1.4.1!
>> >>
>> >> This release fixes a handful of known issues in Spark 1.4.0, listed
>> here:
>> >> http://s.apache.org/spark-1.4.1
>> >>
>> >> The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
>> >> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>> >> 60e08e50751fe3929156de956d62faea79f5b801
>> >>
>> >> The release files, including signatures, digests, etc. can be found at:
>> >> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>> >>
>> >> Release artifacts are signed with the following key:
>> >> https://people.apache.org/keys/committer/pwendell.asc
>> >>
>> >> The staging repository for this release can be found at:
>> >> [published as version: 1.4.1]
>> >>
>> https://repository.apache.org/content/repositories/orgapachespark-1118/
>> >> [published as version: 1.4.1-rc1]
>> >>
>> https://repository.apache.org/content/repositories/orgapachespark-1119/
>> >>
>> >> The documentation corresponding to this release can be found at:
>> >>
>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>> >>
>> >> Please vote on releasing this package as Apache Spark 1.4.1!
>> >>
>> >> The vote is open until Saturday, June 27, at 06:32 UTC and passes
>> >> if a majority of at least 3 +1 PMC votes are cast.
>> >>
>> >> [ ] +1 Release this package as Apache Spark 1.4.1
>> >> [ ] -1 Do not release this package because ...
>> >>
>> >> To learn more about Apache Spark, please see
>> >> http://spark.apache.org/
>> >>
>> >>
>> >
>>
>>
>>
>


Re: Spark 1.5.0-SNAPSHOT broken with Scala 2.11

2015-06-28 Thread Ted Yu
Spark-Master-Scala211-Compile build is green.

However it is not clear what the actual command is:

[EnvInject] - Variables injected successfully.
[Spark-Master-Scala211-Compile] $ /bin/bash /tmp/hudson8945334776362889961.sh


FYI


On Sun, Jun 28, 2015 at 6:02 PM, Alessandro Baretta 
wrote:

> I am building the current master branch with Scala 2.11 following these
> instructions:
>
> Building for Scala 2.11
>
> To produce a Spark package compiled with Scala 2.11, use the -Dscala-2.11
>  property:
>
> dev/change-version-to-2.11.sh
> mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
>
>
> Here's what I'm seeing:
>
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.security.Groups).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> Using Spark's repl log4j profile:
> org/apache/spark/log4j-defaults-repl.properties
> To adjust logging level use sc.setLogLevel("INFO")
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 1.5.0-SNAPSHOT
>   /_/
>
> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_79)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/06/29 00:42:20 ERROR ActorSystemImpl: Uncaught fatal error from thread
> [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down
> ActorSystem [sparkDriver]
> java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage
> overrides final method
> getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at
> akka.remote.transport.AkkaPduProtobufCodec$.constructControlMessagePdu(AkkaPduCodec.scala:231)
> at
> akka.remote.transport.AkkaPduProtobufCodec$.(AkkaPduCodec.scala:153)
> at akka.remote.transport.AkkaPduProtobufCodec$.(AkkaPduCodec.scala)
> at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:733)
> at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:703)
>
> What am I doing wrong?
>
>


Re: [VOTE] Release Apache Spark 1.4.1

2015-06-26 Thread Ted Yu
Pardon.
During earlier test run, I got:

StreamingContextSuite:
- from no conf constructor
- from no conf + spark home
- from no conf + spark home + env
- from conf with settings
- from existing SparkContext
- from existing SparkContext with settings
*** RUN ABORTED ***
  java.lang.NoSuchMethodError:
org.apache.spark.ui.JettyUtils$.createStaticHandler(Ljava/lang/String;Ljava/lang/String;)Lorg/eclipse/jetty/servlet/ServletContextHandler;
  at org.apache.spark.streaming.ui.StreamingTab.attach(StreamingTab.scala:49)
  at org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:601)
  at org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:601)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:601)
  at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply$mcV$sp(StreamingContextSuite.scala:101)
  at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96)
  at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)

The error from the previous email was due to the absence of
StreamingContextSuite.scala

On Fri, Jun 26, 2015 at 1:27 PM, Ted Yu  wrote:

> I got the following when running test suite:
>
> [INFO] compiler plugin:
> BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
> [info] Compiling 2 Scala sources and 1 Java source to
> /home/hbase/spark-1.4.1/streaming/target/scala-2.10/test-classes...
> [error]
> /home/hbase/spark-1.4.1/streaming/src/test/scala/org/apache/spark/streaming/DStreamClosureSuite.scala:82:
> not found: type TestException
> [error]     throw new TestException(
> [error]               ^
> [error]
> /home/hbase/spark-1.4.1/streaming/src/test/scala/org/apache/spark/streaming/scheduler/JobGeneratorSuite.scala:73:
> not found: type TestReceiver
> [error]       val inputStream = ssc.receiverStream(new TestReceiver)
> [error]                                                ^
> [error] two errors found
> [error] Compile failed at Jun 25, 2015 5:12:24 PM [1.492s]
>
> Has anyone else seen similar error ?
>
> Thanks
>
> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.4.1!
>>
>> This release fixes a handful of known issues in Spark 1.4.0, listed here:
>> http://s.apache.org/spark-1.4.1
>>
>> The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>> 60e08e50751fe3929156de956d62faea79f5b801
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> [published as version: 1.4.1]
>> https://repository.apache.org/content/repositories/orgapachespark-1118/
>> [published as version: 1.4.1-rc1]
>> https://repository.apache.org/content/repositories/orgapachespark-1119/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>>
>> Please vote on releasing this package as Apache Spark 1.4.1!
>>
>> The vote is open until Saturday, June 27, at 06:32 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.4.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>>
>>
>


Re: [VOTE] Release Apache Spark 1.4.1

2015-06-26 Thread Ted Yu
I got the following when running the test suite:

[INFO] compiler plugin:
BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[info] Compiling 2 Scala sources and 1 Java source to
/home/hbase/spark-1.4.1/streaming/target/scala-2.10/test-classes...
[error]
/home/hbase/spark-1.4.1/streaming/src/test/scala/org/apache/spark/streaming/DStreamClosureSuite.scala:82:
not found: type TestException
[error]     throw new TestException(
[error]               ^
[error]
/home/hbase/spark-1.4.1/streaming/src/test/scala/org/apache/spark/streaming/scheduler/JobGeneratorSuite.scala:73:
not found: type TestReceiver
[error]       val inputStream = ssc.receiverStream(new TestReceiver)
[error]                                                ^
[error] two errors found
[error] Compile failed at Jun 25, 2015 5:12:24 PM [1.492s]

Has anyone else seen a similar error?

Thanks

On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.4.1!
>
> This release fixes a handful of known issues in Spark 1.4.0, listed here:
> http://s.apache.org/spark-1.4.1
>
> The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
> 60e08e50751fe3929156de956d62faea79f5b801
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> [published as version: 1.4.1]
> https://repository.apache.org/content/repositories/orgapachespark-1118/
> [published as version: 1.4.1-rc1]
> https://repository.apache.org/content/repositories/orgapachespark-1119/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/
>
> Please vote on releasing this package as Apache Spark 1.4.1!
>
> The vote is open until Saturday, June 27, at 06:32 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.4.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
>
>


Re: Anyone facing problem in incremental building of individual project

2015-06-04 Thread Ted Yu
Andrew Or put in this workaround :

diff --git a/pom.xml b/pom.xml
index 0b1aaad..d03d33b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1438,6 +1438,8 @@
         <version>2.3</version>
         <configuration>
           ...
+          <!-- work around MSHADE-148 -->
+          <createDependencyReducedPom>false</createDependencyReducedPom>
           ...

FYI

On Thu, Jun 4, 2015 at 6:25 AM, Steve Loughran 
wrote:

>
>  On 4 Jun 2015, at 11:16, Meethu Mathew  wrote:
>
>  Hi all,
>
>  I added some new code to MLlib. When I am trying to build only the
> mllib project using  *mvn --projects mllib/ -DskipTests clean install*
> after setting export SPARK_PREPEND_CLASSES=true, the build is getting
> stuck with the following message.
>
>
>
>>  Excluding org.jpmml:pmml-schema:jar:1.1.15 from the shaded jar.
>> [INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.7 from the shaded jar.
>> [INFO] Excluding com.sun.xml.bind:jaxb-core:jar:2.2.7 from the shaded jar.
>> [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.7 from the shaded jar.
>> [INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded
>> jar.
>> [INFO] Excluding org.scala-lang:scala-reflect:jar:2.10.4 from the shaded
>> jar.
>> [INFO] Replacing original artifact with shaded artifact.
>> [INFO] Replacing
>> /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT.jar
>> with
>> /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT-shaded.jar
>> [INFO] Dependency-reduced POM written at:
>> /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
>> [INFO] Dependency-reduced POM written at:
>> /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
>> [INFO] Dependency-reduced POM written at:
>> /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
>> [INFO] Dependency-reduced POM written at:
>> /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml
>
>.
>
>
>
>  I've seen something similar in a different build,
>
>  It looks like MSHADE-148:
> https://issues.apache.org/jira/browse/MSHADE-148
> if you apply Tom White's patch, does your problem go away?
>


Re: problem with using mapPartitions

2015-05-30 Thread Ted Yu
bq. val result = fDB.mappartitions(testMP).collect

Not sure if you pasted the above code verbatim - there was a typo: the
method name should be mapPartitions.

Cheers
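
Beyond the method name, here is a minimal runnable sketch of per-partition
counting, assuming the goal is one (value -> count) map per partition as in
your example. It uses a plain mutable.HashMap instead of LongMap just to
keep the sketch self-contained:

import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

object PerPartitionCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("per-partition-count").setMaster("local[3]"))

    val fDB = sc.parallelize(List(1, 2, 1, 2, 1, 2, 5, 5, 2), 3)

    // Note the capital P: mapPartitions runs the function once per partition.
    val result = fDB.mapPartitions { iter =>
      val counts = mutable.HashMap.empty[Long, Int]
      iter.foreach { v =>
        val k = v.toLong
        counts(k) = counts.getOrElse(k, 0) + 1
      }
      counts.iterator // (value -> count) pairs for this partition only
    }.collect()

    println(result.mkString(", "))
    sc.stop()
  }
}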

On Sat, May 30, 2015 at 9:44 AM, unioah  wrote:

> Hi,
>
> I try to aggregate the value in each partition internally.
> For example,
>
> Before:
> worker 1:worker 2:
> 1, 2, 1 2, 1, 2
>
> After:
> worker 1:  worker 2:
> (1->2), (2->1)   (1->1), (2->2)
>
> I try to use mappartitions,
> object MyTest {
>   def main(args: Array[String]) {
> val conf = new SparkConf().setAppName("This is a test")
> val sc = new SparkContext(conf)
>
> val fDB = sc.parallelize(List(1, 2, 1, 2, 1, 2, 5, 5, 2), 3)
> val result = fDB.mappartitions(testMP).collect
> println(result.mkString)
> sc.stop
>   }
>
>   def testMP(iter: Iterator[Int]): Iterator[(Long, Int)] = {
> var result = new LongMap[Int]()
> var cur = 0l
>
> while (iter.hasNext) {
>   cur = iter.next.toLong
>   if (result.contains(cur)) {
> result(cur) += 1
>   } else {
> result += (cur, 1)
>   }
> }
> result.toList.iterator
>   }
> }
>
> But I got the error message no matter how I tried.
>
> Driver stacktrace:
> at
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependent
> Stages(DAGScheduler.scala:1204)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
> at
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
> at scala.Option.foreach(Option.scala:236)
> at
>
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
> at
>
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
> at
>
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 15/05/30 10:41:21 ERROR SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 1
>
> Anybody can help me? Thx
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/problem-with-using-mapPartitions-tp12514.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
>
>


Re: StreamingContextSuite fails with NoSuchMethodError

2015-05-30 Thread Ted Yu
I downloaded the source tarball and ran a command similar to the following:
clean package -DskipTests

Then I ran the following command. 

Fyi 



> On May 30, 2015, at 12:42 AM, Tathagata Das  wrote:
> 
> Was it a clean compilation?
> 
> TD
> 
>> On Fri, May 29, 2015 at 10:48 PM, Ted Yu  wrote:
>> Hi,
>> I ran the following command on 1.4.0 RC3:
>> 
>> mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package
>> 
>> I saw the following failure:
>> 
>> ^[[32mStreamingContextSuite:^[[0m
>> ^[[32m- from no conf constructor^[[0m
>> ^[[32m- from no conf + spark home^[[0m
>> ^[[32m- from no conf + spark home + env^[[0m
>> ^[[32m- from conf with settings^[[0m
>> ^[[32m- from existing SparkContext^[[0m
>> ^[[32m- from existing SparkContext with settings^[[0m
>> ^[[31m*** RUN ABORTED ***^[[0m
>> ^[[31m  java.lang.NoSuchMethodError: 
>> org.apache.spark.ui.JettyUtils$.createStaticHandler(Ljava/lang/String;Ljava/lang/String;)Lorg/eclipse/jetty/servlet/ServletContextHandler;^[[0m
>> ^[[31m  at 
>> org.apache.spark.streaming.ui.StreamingTab.attach(StreamingTab.scala:49)^[[0m
>> ^[[31m  at 
>> org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:585)^[[0m
>> ^[[31m  at 
>> org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:585)^[[0m
>> ^[[31m  at scala.Option.foreach(Option.scala:236)^[[0m
>> ^[[31m  at 
>> org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:585)^[[0m
>> ^[[31m  at 
>> org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply$mcV$sp(StreamingContextSuite.scala:101)^[[0m
>> ^[[31m  at 
>> org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96)^[[0m
>> ^[[31m  at 
>> org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96)^[[0m
>> ^[[31m  at 
>> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)^[[0m
>> ^[[31m  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)^[[0m
>> 
>> Did anyone else encounter similar error ?
>> 
>> Cheers
> 


StreamingContextSuite fails with NoSuchMethodError

2015-05-29 Thread Ted Yu
Hi,
I ran the following command on 1.4.0 RC3:

mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package

I saw the following failure:

^[[32mStreamingContextSuite:^[[0m
^[[32m- from no conf constructor^[[0m
^[[32m- from no conf + spark home^[[0m
^[[32m- from no conf + spark home + env^[[0m
^[[32m- from conf with settings^[[0m
^[[32m- from existing SparkContext^[[0m
^[[32m- from existing SparkContext with settings^[[0m
^[[31m*** RUN ABORTED ***^[[0m
^[[31m  java.lang.NoSuchMethodError:
org.apache.spark.ui.JettyUtils$.createStaticHandler(Ljava/lang/String;Ljava/lang/String;)Lorg/eclipse/jetty/servlet/ServletContextHandler;^[[0m
^[[31m  at
org.apache.spark.streaming.ui.StreamingTab.attach(StreamingTab.scala:49)^[[0m
^[[31m  at
org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:585)^[[0m
^[[31m  at
org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:585)^[[0m
^[[31m  at scala.Option.foreach(Option.scala:236)^[[0m
^[[31m  at
org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:585)^[[0m
^[[31m  at
org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply$mcV$sp(StreamingContextSuite.scala:101)^[[0m
^[[31m  at
org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96)^[[0m
^[[31m  at
org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96)^[[0m
^[[31m  at
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)^[[0m
^[[31m  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)^[[0m

Did anyone else encounter similar error ?

Cheers


Re: ClosureCleaner slowing down Spark SQL queries

2015-05-27 Thread Ted Yu
Can you try your query using Spark 1.4.0 RC2 ?

There have been some fixes since 1.2.0
e.g.
SPARK-7233 ClosureCleaner#clean blocks concurrent job submitter threads

Cheers

On Wed, May 27, 2015 at 10:38 AM, Nitin Goyal  wrote:

> Hi All,
>
> I am running a SQL query (spark version 1.2) on a table created from
> unionAll of 3 schema RDDs which gets executed in roughly 400ms (200ms at
> driver and roughly 200ms at executors).
>
> If I run same query on a table created from unionAll of 27 schema RDDS, I
> see that executors time is same(because of concurrency and nature of my
> query) but driver time shoots to 600ms (and total query time being = 600 +
> 200 = 800ms).
>
> I attached JProfiler and found that ClosureCleaner clean method is taking
> time at driver(some issue related to URLClassLoader) and it linearly
> increases with number of RDDs being union-ed on which query is getting
> fired. This is causing my query to take a huge amount of time where I
> expect
> the query to be executed within 400ms irrespective of number of RDDs (since
> I have executors available to cater my need). PFB the links of screenshots
> from Jprofiler :-
>
> http://pasteboard.co/MnQtB4o.png
>
> http://pasteboard.co/MnrzHwJ.png
>
> Any help/suggestion to fix this will be highly appreciated since this needs
> to be fixed for production
>
> Thanks in Advance,
> Nitin
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tp12466.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Kryo option changed

2015-05-24 Thread Ted Yu
The original PR from Liye didn't include test which exercises Kryo buffer
size configured in mb which is below 2GB.

In my PR, I added such a test and it passed on Jenkins:
https://github.com/apache/spark/pull/6390

FYI
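
For anyone landing here on an affected build, a minimal configuration sketch
(property names and values are the ones discussed in this thread; the app name
is illustrative):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kryo-buffer-example") // illustrative
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // "8192k" works around the parsing bug; "8m" is accepted once SPARK-7392 (commit c2f0821) is in your build
  .set("spark.kryoserializer.buffer", "8192k")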

On Sun, May 24, 2015 at 8:08 AM, Ted Yu  wrote:

> Please update to the following:
>
> commit c2f0821aad3b82dcd327e914c9b297e92526649d
> Author: Zhang, Liye 
> Date:   Fri May 8 09:10:58 2015 +0100
>
> [SPARK-7392] [CORE] bugfix: Kryo buffer size cannot be larger than 2M
>
> On Sun, May 24, 2015 at 8:04 AM, Debasish Das 
> wrote:
>
>> I am May 3rd commit:
>>
>> commit 49549d5a1a867c3ba25f5e4aec351d4102444bc0
>>
>> Author: WangTaoTheTonic 
>>
>> Date:   Sun May 3 00:47:47 2015 +0100
>>
>>
>> [SPARK-7031] [THRIFTSERVER] let thrift server take
>> SPARK_DAEMON_MEMORY and SPARK_DAEMON_JAVA_OPTS
>>
>> On Sat, May 23, 2015 at 7:54 PM, Josh Rosen  wrote:
>>
>>> Which commit of master are you building off?  It looks like there was a
>>> bugfix for an issue related to KryoSerializer buffer configuration:
>>> https://github.com/apache/spark/pull/5934
>>>
>>> That patch was committed two weeks ago, but you mentioned that you're
>>> building off a newer version of master.  Could you confirm the commit that
>>> you're running?  If this used to work but now throws an error, then this is
>>> a regression that should be fixed; we shouldn't require you to perform a mb
>>> -> kb conversion to work around this.
>>>
>>> On Sat, May 23, 2015 at 6:37 PM, Ted Yu  wrote:
>>>
>>>> Pardon me.
>>>>
>>>> Please use '8192k'
>>>>
>>>> Cheers
>>>>
>>>>> On Sat, May 23, 2015 at 6:24 PM, Debasish Das  wrote:
>>>>
>>>>> Tried "8mb"...still I am failing on the same error...
>>>>>
>>>>> On Sat, May 23, 2015 at 6:10 PM, Ted Yu  wrote:
>>>>>
>>>>>> bq. it shuld be "8mb"
>>>>>>
>>>>>> Please use the above syntax.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Sat, May 23, 2015 at 6:04 PM, Debasish Das <
>>>>>> debasish.da...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am on last week's master but all the examples that set up the
>>>>>>> following
>>>>>>>
>>>>>>> .set("spark.kryoserializer.buffer", "8m")
>>>>>>>
>>>>>>> are failing with the following error:
>>>>>>>
>>>>>>> Exception in thread "main" java.lang.IllegalArgumentException:
>>>>>>> spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb.
>>>>>>> looks like buffer.mb is deprecated...Is "8m" is not the right syntax
>>>>>>> to get 8mb kryo buffer or it shuld be "8mb"
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Deb
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Kryo option changed

2015-05-24 Thread Ted Yu
Please update to the following:

commit c2f0821aad3b82dcd327e914c9b297e92526649d
Author: Zhang, Liye 
Date:   Fri May 8 09:10:58 2015 +0100

[SPARK-7392] [CORE] bugfix: Kryo buffer size cannot be larger than 2M

On Sun, May 24, 2015 at 8:04 AM, Debasish Das 
wrote:

> I am May 3rd commit:
>
> commit 49549d5a1a867c3ba25f5e4aec351d4102444bc0
>
> Author: WangTaoTheTonic 
>
> Date:   Sun May 3 00:47:47 2015 +0100
>
>
> [SPARK-7031] [THRIFTSERVER] let thrift server take SPARK_DAEMON_MEMORY
> and SPARK_DAEMON_JAVA_OPTS
>
> On Sat, May 23, 2015 at 7:54 PM, Josh Rosen  wrote:
>
>> Which commit of master are you building off?  It looks like there was a
>> bugfix for an issue related to KryoSerializer buffer configuration:
>> https://github.com/apache/spark/pull/5934
>>
>> That patch was committed two weeks ago, but you mentioned that you're
>> building off a newer version of master.  Could you confirm the commit that
>> you're running?  If this used to work but now throws an error, then this is
>> a regression that should be fixed; we shouldn't require you to perform a mb
>> -> kb conversion to work around this.
>>
>> On Sat, May 23, 2015 at 6:37 PM, Ted Yu  wrote:
>>
>>> Pardon me.
>>>
>>> Please use '8192k'
>>>
>>> Cheers
>>>
>>> On Sat, May 23, 2015 at 6:24 PM, Debasish Das 
>>> wrote:
>>>
>>>> Tried "8mb"...still I am failing on the same error...
>>>>
>>>> On Sat, May 23, 2015 at 6:10 PM, Ted Yu  wrote:
>>>>
>>>>> bq. it shuld be "8mb"
>>>>>
>>>>> Please use the above syntax.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Sat, May 23, 2015 at 6:04 PM, Debasish Das <
>>>>> debasish.da...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am on last week's master but all the examples that set up the
>>>>>> following
>>>>>>
>>>>>> .set("spark.kryoserializer.buffer", "8m")
>>>>>>
>>>>>> are failing with the following error:
>>>>>>
>>>>>> Exception in thread "main" java.lang.IllegalArgumentException:
>>>>>> spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb.
>>>>>> looks like buffer.mb is deprecated...Is "8m" is not the right syntax
>>>>>> to get 8mb kryo buffer or it shuld be "8mb"
>>>>>>
>>>>>> Thanks.
>>>>>> Deb
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Kryo option changed

2015-05-23 Thread Ted Yu
Pardon me.

Please use '8192k'

Cheers

On Sat, May 23, 2015 at 6:24 PM, Debasish Das 
wrote:

> Tried "8mb"...still I am failing on the same error...
>
> On Sat, May 23, 2015 at 6:10 PM, Ted Yu  wrote:
>
>> bq. it shuld be "8mb"
>>
>> Please use the above syntax.
>>
>> Cheers
>>
>> On Sat, May 23, 2015 at 6:04 PM, Debasish Das 
>> wrote:
>>
>>> Hi,
>>>
>>> I am on last week's master but all the examples that set up the following
>>>
>>> .set("spark.kryoserializer.buffer", "8m")
>>>
>>> are failing with the following error:
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException:
>>> spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb.
>>> looks like buffer.mb is deprecated...Is "8m" is not the right syntax to
>>> get 8mb kryo buffer or it shuld be "8mb"
>>>
>>> Thanks.
>>> Deb
>>>
>>
>>
>


Re: Kryo option changed

2015-05-23 Thread Ted Yu
bq. it shuld be "8mb"

Please use the above syntax.

Cheers

On Sat, May 23, 2015 at 6:04 PM, Debasish Das 
wrote:

> Hi,
>
> I am on last week's master but all the examples that set up the following
>
> .set("spark.kryoserializer.buffer", "8m")
>
> are failing with the following error:
>
> Exception in thread "main" java.lang.IllegalArgumentException:
> spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb.
> looks like buffer.mb is deprecated...Is "8m" is not the right syntax to
> get 8mb kryo buffer or it shuld be "8mb"
>
> Thanks.
> Deb
>


Re: [IMPORTANT] Committers please update merge script

2015-05-23 Thread Ted Yu
INFRA-9646 has been resolved.

FYI

On Wed, May 13, 2015 at 6:00 PM, Patrick Wendell  wrote:

> Hi All - unfortunately the fix introduced another bug, which is that
> fixVersion was not updated properly. I've updated the script and had
> one other person test it.
>
> So committers please pull from master again thanks!
>
> - Patrick
>
> On Tue, May 12, 2015 at 6:25 PM, Patrick Wendell 
> wrote:
> > Due to an ASF infrastructure change (bug?) [1] the default JIRA
> > resolution status has switched to "Pending Closed". I've made a change
> > to our merge script to coerce the correct status of "Fixed" when
> > resolving [2]. Please upgrade the merge script to master.
> >
> > I've manually corrected JIRA's that were closed with the incorrect
> > status. Let me know if you have any issues.
> >
> > [1] https://issues.apache.org/jira/browse/INFRA-9646
> >
> > [2]
> https://github.com/apache/spark/commit/1b9e434b6c19f23a01e9875a3c1966cd03ce8e2d
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Unable to build from assembly

2015-05-22 Thread Ted Yu
What version of Java do you use ?

Can you run this command first ?
build/sbt clean

BTW please see [SPARK-7498] [MLLIB] add varargs back to setDefault
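
e.g., to check whether that fix is already in your checkout before rebuilding
(the grep is only a quick heuristic):

build/sbt clean
git log --oneline | grep -i SPARK-7498   # should list the varargs fix if it has been merged locally
build/sbt assembly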

Cheers

On Fri, May 22, 2015 at 7:34 AM, Manoj Kumar  wrote:

> Hello,
>
> I updated my master from upstream recently, and on running
>
> build/sbt assembly
>
> it gives me this error
>
> [error]
> /home/manoj/spark/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java:106:
> error: MyJavaLogisticRegression is not abstract and does not override
> abstract method setDefault(ParamPair...) in Params
> [error] class MyJavaLogisticRegression
> [error] ^
> [error]
> /home/manoj/spark/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java:168:
> error: MyJavaLogisticRegressionModel is not abstract and does not override
> abstract method setDefault(ParamPair...) in Params
> [error] class MyJavaLogisticRegressionModel
> [error] ^
> [error] 2 errors
> [error] (examples/compile:compile) javac returned nonzero exit code
>
> It was working fine before this.
>
> Could someone please guide me on what could be wrong?
>
>
>
> --
> Godspeed,
> Manoj Kumar,
> http://manojbits.wordpress.com
> 
> http://github.com/MechCoder
>


Re: Recent Spark test failures

2015-05-15 Thread Ted Yu
bq. would be prohibitive to build all configurations for every push

Agreed.

Can PR builder rotate testing against hadoop 2.3, 2.4, 2.6 and 2.7 (each
test run still uses one hadoop profile) ?

This way we would have some coverage for each of the major hadoop releases.

Cheers

On Fri, May 15, 2015 at 10:30 AM, Sean Owen  wrote:

> You all are looking only at the pull request builder. It just does one
> build to sanity-check a pull request, since that already takes 2 hours and
> would be prohibitive to build all configurations for every push. There is a
> different set of Jenkins jobs that periodically tests master against a lot
> more configurations, including Hadoop 2.4.
>
> On Fri, May 15, 2015 at 6:02 PM, Frederick R Reiss 
> wrote:
>
>> The PR builder seems to be building against Hadoop 2.3. In the log for
>> the most recent successful build (
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32805/consoleFull
>> ) I see:
>>
>> =
>> Building Spark
>> =
>> [info] Compile with Hive 0.13.1
>> [info] Building Spark with these arguments: -Pyarn -Phadoop-2.3
>> -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver
>> ...
>> =
>> Running Spark unit tests
>> =
>> [info] Running Spark tests with these arguments: -Pyarn -Phadoop-2.3
>> -Dhadoop.version=2.3.0 -Pkinesis-asl test
>>
>> Is anyone testing individual pull requests against Hadoop 2.4 or 2.6
>> before the code is declared "clean"?
>>
>> Fred
>>
>>
>> From: Ted Yu 
>> To: Andrew Or 
>> Cc: "dev@spark.apache.org" 
>> Date: 05/15/2015 09:29 AM
>> Subject: Re: Recent Spark test failures
>> --
>>
>>
>>
>> Jenkins build against hadoop 2.4 has been unstable recently:
>>
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/
>>
>> I haven't found the test which hung / failed in recent Jenkins builds.
>>
>> But PR builder has several green builds lately:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
>>
>> Maybe PR builder doesn't build against hadoop 2.4 ?
>>
>> Cheers
>>
>> On Mon, May 11, 2015 at 1:11 PM, Ted Yu <*yuzhih...@gmail.com*
>> > wrote:
>>
>>Makes sense.
>>
>>Having high determinism in these tests would make Jenkins build
>>stable.
>>
>>
>>On Mon, May 11, 2015 at 1:08 PM, Andrew Or <*and...@databricks.com*
>>> wrote:
>>   Hi Ted,
>>
>>   Yes, those two options can be useful, but in general I think the
>>   standard to set is that tests should never fail. It's actually the 
>> worst if
>>   tests fail sometimes but not others, because we can't reproduce them
>>   deterministically. Using -M and -A actually tolerates flaky tests to a
>>   certain extent, and I would prefer to instead increase the determinism 
>> in
>>   these tests.
>>
>>   -Andrew
>>
>>   2015-05-08 17:56 GMT-07:00 Ted Yu <*yuzhih...@gmail.com*
>>   >:
>>   Andrew:
>>  Do you think the -M and -A options described here can be used
>>  in test runs ?
>>  http://scalatest.org/user_guide/using_the_runner
>>
>>  Cheers
>>
>>  On Wed, May 6, 2015 at 5:41 PM, Andrew Or <
>>  *and...@databricks.com* > wrote:
>> Dear all,
>>
>> I'm sure you have all noticed that the Spark tests have been
>> fairly
>> unstable recently. I wanted to share a tool that I use to
>> track which

Re: Recent Spark test failures

2015-05-15 Thread Ted Yu
From
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32831/consoleFull :

[info] Building Spark with these arguments: -Pyarn -Phadoop-2.3
-Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver


Should PR builder cover hadoop 2.4 as well ?


Thanks


On Fri, May 15, 2015 at 9:23 AM, Ted Yu  wrote:

> Jenkins build against hadoop 2.4 has been unstable recently:
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/
>
> I haven't found the test which hung / failed in recent Jenkins builds.
>
> But PR builder has several green builds lately:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
>
> Maybe PR builder doesn't build against hadoop 2.4 ?
>
> Cheers
>
> On Mon, May 11, 2015 at 1:11 PM, Ted Yu  wrote:
>
>> Makes sense.
>>
>> Having high determinism in these tests would make Jenkins build stable.
>>
>> On Mon, May 11, 2015 at 1:08 PM, Andrew Or  wrote:
>>
>>> Hi Ted,
>>>
>>> Yes, those two options can be useful, but in general I think the
>>> standard to set is that tests should never fail. It's actually the worst if
>>> tests fail sometimes but not others, because we can't reproduce them
>>> deterministically. Using -M and -A actually tolerates flaky tests to a
>>> certain extent, and I would prefer to instead increase the determinism in
>>> these tests.
>>>
>>> -Andrew
>>>
>>> 2015-05-08 17:56 GMT-07:00 Ted Yu :
>>>
>>> Andrew:
>>>> Do you think the -M and -A options described here can be used in test
>>>> runs ?
>>>> http://scalatest.org/user_guide/using_the_runner
>>>>
>>>> Cheers
>>>>
>>>> On Wed, May 6, 2015 at 5:41 PM, Andrew Or 
>>>> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I'm sure you have all noticed that the Spark tests have been fairly
>>>>> unstable recently. I wanted to share a tool that I use to track which
>>>>> tests
>>>>> have been failing most often in order to prioritize fixing these flaky
>>>>> tests.
>>>>>
>>>>> Here is an output of the tool. This spreadsheet reports the top 10
>>>>> failed
>>>>> tests this week (ending yesterday 5/5):
>>>>>
>>>>> https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4
>>>>>
>>>>> It is produced by a small project:
>>>>> https://github.com/andrewor14/spark-test-failures
>>>>>
>>>>> I have been filing JIRAs on flaky tests based on this tool. Hopefully
>>>>> we
>>>>> can collectively stabilize the build a little more as we near the
>>>>> release
>>>>> for Spark 1.4.
>>>>>
>>>>> -Andrew
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Recent Spark test failures

2015-05-15 Thread Ted Yu
Jenkins build against hadoop 2.4 has been unstable recently:
https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/

I haven't found the test which hung / failed in recent Jenkins builds.

But PR builder has several green builds lately:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/

Maybe PR builder doesn't build against hadoop 2.4 ?

Cheers

On Mon, May 11, 2015 at 1:11 PM, Ted Yu  wrote:

> Makes sense.
>
> Having high determinism in these tests would make Jenkins build stable.
>
> On Mon, May 11, 2015 at 1:08 PM, Andrew Or  wrote:
>
>> Hi Ted,
>>
>> Yes, those two options can be useful, but in general I think the standard
>> to set is that tests should never fail. It's actually the worst if tests
>> fail sometimes but not others, because we can't reproduce them
>> deterministically. Using -M and -A actually tolerates flaky tests to a
>> certain extent, and I would prefer to instead increase the determinism in
>> these tests.
>>
>> -Andrew
>>
>> 2015-05-08 17:56 GMT-07:00 Ted Yu :
>>
>> Andrew:
>>> Do you think the -M and -A options described here can be used in test
>>> runs ?
>>> http://scalatest.org/user_guide/using_the_runner
>>>
>>> Cheers
>>>
>>> On Wed, May 6, 2015 at 5:41 PM, Andrew Or  wrote:
>>>
>>>> Dear all,
>>>>
>>>> I'm sure you have all noticed that the Spark tests have been fairly
>>>> unstable recently. I wanted to share a tool that I use to track which
>>>> tests
>>>> have been failing most often in order to prioritize fixing these flaky
>>>> tests.
>>>>
>>>> Here is an output of the tool. This spreadsheet reports the top 10
>>>> failed
>>>> tests this week (ending yesterday 5/5):
>>>>
>>>> https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4
>>>>
>>>> It is produced by a small project:
>>>> https://github.com/andrewor14/spark-test-failures
>>>>
>>>> I have been filing JIRAs on flaky tests based on this tool. Hopefully we
>>>> can collectively stabilize the build a little more as we near the
>>>> release
>>>> for Spark 1.4.
>>>>
>>>> -Andrew
>>>>
>>>
>>>
>>
>


Re: How to link code pull request with JIRA ID?

2015-05-13 Thread Ted Yu
Subproject tag should follow SPARK JIRA number.
e.g.

[SPARK-5277][SQL] ...

Cheers

On Wed, May 13, 2015 at 11:50 AM, Stephen Boesch  wrote:

> following up from Nicholas, it is
>
> [SPARK-12345] Your PR description
>
> where 12345 is the jira number.
>
>
> One thing I tend to forget is when/where to include the subproject tag e.g.
>  [MLLIB]
>
>
> 2015-05-13 11:11 GMT-07:00 Nicholas Chammas :
>
> > That happens automatically when you open a PR with the JIRA key in the PR
> > title.
> >
> > On Wed, May 13, 2015 at 2:10 PM Chandrashekhar Kotekar <
> > shekhar.kote...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am new to open source contribution and trying to understand the
> process
> > > starting from pulling code to uploading patch.
> > >
> > > I have managed to pull code from GitHub. In JIRA I saw that each JIRA
> > issue
> > > is connected with pull request. I would like to know how do people
> attach
> > > pull request details to JIRA issue?
> > >
> > > Thanks,
> > > Chandrash3khar Kotekar
> > > Mobile - +91 8600011455
> > >
> >
>


Re: [PySpark DataFrame] When a Row is not a Row

2015-05-11 Thread Ted Yu
In Row#equals():

  while (i < len) {
if (apply(i) != that.apply(i)) {

'!=' should be !apply(i).equals(that.apply(i)) ?

Cheers

On Mon, May 11, 2015 at 1:49 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> This is really strange.
>
> >>> # Spark 1.3.1
> >>> print type(results)
> 
>
> >>> a = results.take(1)[0]
>
> >>> print type(a)
> 
>
> >>> print pyspark.sql.types.Row
> 
>
> >>> print type(a) == pyspark.sql.types.Row
> False
> >>> print isinstance(a, pyspark.sql.types.Row)
> False
>
> If I set a as follows, then the type checks pass fine.
>
> a = pyspark.sql.types.Row('name')('Nick')
>
> Is this a bug? What can I do to narrow down the source?
>
> results is a massive DataFrame of spark-perf results.
>
> Nick
> ​
>


Re: Recent Spark test failures

2015-05-11 Thread Ted Yu
Makes sense.

Having high determinism in these tests would make Jenkins build stable.

On Mon, May 11, 2015 at 1:08 PM, Andrew Or  wrote:

> Hi Ted,
>
> Yes, those two options can be useful, but in general I think the standard
> to set is that tests should never fail. It's actually the worst if tests
> fail sometimes but not others, because we can't reproduce them
> deterministically. Using -M and -A actually tolerates flaky tests to a
> certain extent, and I would prefer to instead increase the determinism in
> these tests.
>
> -Andrew
>
> 2015-05-08 17:56 GMT-07:00 Ted Yu :
>
> Andrew:
>> Do you think the -M and -A options described here can be used in test
>> runs ?
>> http://scalatest.org/user_guide/using_the_runner
>>
>> Cheers
>>
>> On Wed, May 6, 2015 at 5:41 PM, Andrew Or  wrote:
>>
>>> Dear all,
>>>
>>> I'm sure you have all noticed that the Spark tests have been fairly
>>> unstable recently. I wanted to share a tool that I use to track which
>>> tests
>>> have been failing most often in order to prioritize fixing these flaky
>>> tests.
>>>
>>> Here is an output of the tool. This spreadsheet reports the top 10 failed
>>> tests this week (ending yesterday 5/5):
>>>
>>> https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4
>>>
>>> It is produced by a small project:
>>> https://github.com/andrewor14/spark-test-failures
>>>
>>> I have been filing JIRAs on flaky tests based on this tool. Hopefully we
>>> can collectively stabilize the build a little more as we near the release
>>> for Spark 1.4.
>>>
>>> -Andrew
>>>
>>
>>
>


Re: Build fail...

2015-05-08 Thread Ted Yu
Looks like you're right:

https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/427/console

[error] 
/home/jenkins/workspace/Spark-1.3-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:370:
value tryWithSafeFinally is not a member of object
org.apache.spark.util.Utils
[error] Utils.tryWithSafeFinally {
[error]   ^


FYI
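
To keep working locally until branch-1.3 compiles again, something like this
should do (the commit hash is the one quoted in the report below; the build
flags are illustrative):

git checkout branch-1.3
git checkout 7fd212b575b6227df5068844416e51f11740e771   # last commit before the breakage, per the report
build/mvn -DskipTests clean package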


On Fri, May 8, 2015 at 6:53 PM, rtimp  wrote:

> Hi,
>
> From what I myself noticed a few minutes ago, I think branch-1.3 might be
> failing to compile due to the most recent commit. I tried reverting to
> commit 7fd212b575b6227df5068844416e51f11740e771 (the commit prior to the
> head) on that branch and recompiling, and was successful. As Ferris would
> say, it is so choice.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Build-fail-tp12170p12171.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Recent Spark test failures

2015-05-08 Thread Ted Yu
Andrew:
Do you think the -M and -A options described here can be used in test runs ?
http://scalatest.org/user_guide/using_the_runner
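
Roughly, with the standalone ScalaTest runner those options look like this
(classpath and runpath below are placeholders, not the actual Spark test setup):

java -cp <scalatest-jars>:<test-classes> org.scalatest.tools.Runner -R <test-classes> -M failed-tests.txt   # memorize failed/canceled suites
java -cp <scalatest-jars>:<test-classes> org.scalatest.tools.Runner -R <test-classes> -A failed-tests.txt   # re-run only the memorized ones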

Cheers

On Wed, May 6, 2015 at 5:41 PM, Andrew Or  wrote:

> Dear all,
>
> I'm sure you have all noticed that the Spark tests have been fairly
> unstable recently. I wanted to share a tool that I use to track which tests
> have been failing most often in order to prioritize fixing these flaky
> tests.
>
> Here is an output of the tool. This spreadsheet reports the top 10 failed
> tests this week (ending yesterday 5/5):
>
> https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4
>
> It is produced by a small project:
> https://github.com/andrewor14/spark-test-failures
>
> I have been filing JIRAs on flaky tests based on this tool. Hopefully we
> can collectively stabilize the build a little more as we near the release
> for Spark 1.4.
>
> -Andrew
>


Re: unable to extract tgz files downloaded from spark

2015-05-06 Thread Ted Yu
>From which site did you download the tar ball ?

Which package type did you choose (pre-built for which distro) ?
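
In the meantime, a quick sanity check on the file itself (standard tools; the
filename is taken from your message):

file spark-1.3.1.tgz            # should report "gzip compressed data"; HTML/ASCII text means an error page was saved instead
tar -tzf spark-1.3.1.tgz > /dev/null && echo "archive is readable"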

Thanks

On Wed, May 6, 2015 at 7:16 PM, Praveen Kumar Muthuswamy <
muthusamy...@gmail.com> wrote:

> Hi
> I have been trying to install latest spark verison and downloaded the .tgz
> files(ex spark-1.3.1.tgz). But, I could not extract them. It complains of
> invalid tar format.
> Has any seen this issue ?
>
> Thanks
> Praveen
>


Re: jackson.databind exception in RDDOperationScope.jsonMapper.writeValueAsString(this)

2015-05-06 Thread Ted Yu
Looks like a mismatch of Jackson versions on the classpath.
Spark uses Jackson 2.4.4.

FYI
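
One way to confirm what ends up on the classpath (run from the Spark root; the
-Dincludes patterns are just a guess at the relevant group ids):

build/mvn dependency:tree -Dincludes=com.fasterxml.jackson.core,com.fasterxml.jackson.module
# jackson-databind and jackson-module-scala should both resolve to the 2.4.x line (Spark pins 2.4.4)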

On Wed, May 6, 2015 at 8:00 AM, A.M.Chan  wrote:

> Hey, guys. I meet this exception while testing SQL/Columns.
> I didn't change the pom or the core project.
> In the morning, it's fine to test my PR.
> I don't know what happed.
>
>
> An exception or error caused a run to abort:
> com.fasterxml.jackson.databind.introspect.POJOPropertyBuilder.addField(Lcom/fasterxml/jackson/databind/introspect/AnnotatedField;Lcom/fasterxml/jackson/databind/PropertyName;ZZZ)V
> java.lang.NoSuchMethodError:
> com.fasterxml.jackson.databind.introspect.POJOPropertyBuilder.addField(Lcom/fasterxml/jackson/databind/introspect/AnnotatedField;Lcom/fasterxml/jackson/databind/PropertyName;ZZZ)V
> at
> com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector.com$fasterxml$jackson$module$scala$introspect$ScalaPropertiesCollector$$_addField(ScalaPropertiesCollector.scala:109)
> at
> com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2$$anonfun$apply$11.apply(ScalaPropertiesCollector.scala:100)
> at
> com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2$$anonfun$apply$11.apply(ScalaPropertiesCollector.scala:99)
> at scala.Option.foreach(Option.scala:236)
> at
> com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2.apply(ScalaPropertiesCollector.scala:99)
> at
> com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2.apply(ScalaPropertiesCollector.scala:93)
> at
> scala.collection.GenTraversableViewLike$Filtered$$anonfun$foreach$4.apply(GenTraversableViewLike.scala:109)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.SeqLike$$anon$2.foreach(SeqLike.scala:635)
> at
> scala.collection.GenTraversableViewLike$Filtered$class.foreach(GenTraversableViewLike.scala:108)
> at scala.collection.SeqViewLike$$anon$5.foreach(SeqViewLike.scala:80)
> at
> com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector._addFields(ScalaPropertiesCollector.scala:93)
> at
> com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collect(POJOPropertiesCollector.java:233)
> at
> com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.collectProperties(BasicClassIntrospector.java:142)
> at
> com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.forSerialization(BasicClassIntrospector.java:68)
> at
> com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.forSerialization(BasicClassIntrospector.java:11)
> at
> com.fasterxml.jackson.databind.SerializationConfig.introspect(SerializationConfig.java:530)
> at
> com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:133)
> at
> com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1077)
> at
> com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1037)
> at
> com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:445)
> at
> com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:599)
> at
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:93)
> at
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2811)
> at
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2268)
> at
> org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:51)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:124)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:99)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:671)
> at org.apache.spark.SparkContext.parallelize(SparkContext.scala:685)
>
>
>
>
>
> --
>
> A.M.Chan

