Re: Scala 2.11 builds broken / Can the PR build also run 2.11?

2015-10-08 Thread Ted Yu
Interesting

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/
shows green builds.


On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș 
wrote:

> Since Oct. 4 the build fails on 2.11 with the dreaded
>
> [error] /home/ubuntu/workspace/Apache Spark (master) on 
> 2.11/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:310: no 
> valid targets for annotation on value conf - it is discarded unused. You may 
> specify targets with meta-annotations, e.g. @(transient @param)
> [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
>
> Can we have the pull request builder at least build with 2.11? This makes
> #8433  pretty much useless,
> since people will continue to add useless @transient annotations.
> --
>
> --
> Iulian Dragos
>
> --
> Reactive Apps on the JVM
> www.typesafe.com
>
>


Scala 2.11 builds broken / Can the PR build also run 2.11?

2015-10-08 Thread Iulian Dragoș
Since Oct. 4 the build fails on 2.11 with the dreaded

[error] /home/ubuntu/workspace/Apache Spark (master) on
2.11/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:310:
no valid targets for annotation on value conf - it is discarded
unused. You may specify targets with meta-annotations, e.g.
@(transient @param)
[error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
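
For reference, the meta-annotation form the compiler hints at would look roughly like the sketch below (illustrative only -- whether the annotation should be kept at all is a separate question):

    import scala.annotation.meta.param
    import org.apache.spark.SparkConf

    // Sketch only (access modifiers elided): the compiler's suggested
    // "@(transient @param)" form applied to the offending declaration,
    // scoping the annotation explicitly to the constructor parameter.
    class NettyRpcEndpointRef(@(transient @param) conf: SparkConf)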

Can we have the pull request builder at least build with 2.11? This makes
#8433  pretty much useless,
since people will continue to add useless @transient annotations.
-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Scala 2.11 builds broken / Can the PR build also run 2.11?

2015-10-08 Thread Ted Yu
I tried building with Scala 2.11 on Linux with the latest master branch:

[INFO] Spark Project External MQTT ........................ SUCCESS [ 19.188 s]
[INFO] Spark Project External MQTT Assembly ............... SUCCESS [  7.081 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [  8.790 s]
[INFO] Spark Project External Kafka ....................... SUCCESS [ 14.764 s]
[INFO] Spark Project Examples ............................. SUCCESS [02:22 min]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 10.286 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:49 min

FYI

On Thu, Oct 8, 2015 at 6:50 AM, Ted Yu  wrote:

> Interesting
>
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/
> shows green builds.
>
>
> On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș 
> wrote:
>
>> Since Oct. 4 the build fails on 2.11 with the dreaded
>>
>> [error] /home/ubuntu/workspace/Apache Spark (master) on 
>> 2.11/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:310: 
>> no valid targets for annotation on value conf - it is discarded unused. You 
>> may specify targets with meta-annotations, e.g. @(transient @param)
>> [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
>>
>> Can we have the pull request builder at least build with 2.11? This makes
>> #8433  pretty much useless,
>> since people will continue to add useless @transient annotations.
>> --
>>
>> --
>> Iulian Dragos
>>
>> --
>> Reactive Apps on the JVM
>> www.typesafe.com
>>
>>
>


RE: RowNumber in HiveContext returns null, negative numbers or huge

2015-10-08 Thread Saif.A.Ellafi
Hi,

I have figured out that this only happens in cluster mode; it works properly in local[32].

From: saif.a.ell...@wellsfargo.com [mailto:saif.a.ell...@wellsfargo.com]
Sent: Thursday, October 08, 2015 10:23 AM
To: dev@spark.apache.org
Subject: RowNumber in HiveContext returns null, negative numbers or huge

Hi all, would this be a bug??

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber

val ws = Window.
  partitionBy("clrty_id").
  orderBy("filemonth_dtt")

val nm = "repeatMe"
val stacked_data = df.select(df.col("*"), rowNumber().over(ws).cast("int").as(nm))

stacked_data.filter(stacked_data("repeatMe").isNotNull).orderBy("repeatMe").take(50).foreach(println(_))

--->

Long, DateType, Int
[2003,2006-06-01,-1863462909]
[2003,2006-09-01,-1863462909]
[2003,2007-01-01,-1863462909]
[2003,2007-08-01,-1863462909]
[2003,2007-07-01,-1863462909]
[2138,2007-07-01,-1863462774]
[2138,2007-02-01,-1863462774]
[2138,2006-11-01,-1863462774]
[2138,2006-08-01,-1863462774]
[2138,2007-08-01,-1863462774]
[2138,2006-09-01,-1863462774]
[2138,2007-03-01,-1863462774]
[2138,2006-10-01,-1863462774]
[2138,2007-05-01,-1863462774]
[2138,2006-06-01,-1863462774]
[2138,2006-12-01,-1863462774]


Thanks,
Saif



Re: Scala 2.11 builds broken / Can the PR build also run 2.11?

2015-10-08 Thread Reynold Xin
The problem only applies to the sbt build because it treats warnings as
errors.

@Iulian - how about we disable warnings -> errors for 2.11? That would seem
better until we switch 2.11 to be the default build.
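
A rough sketch of what that could look like in the sbt build (assuming the warnings-as-errors behaviour comes from -Xfatal-warnings; where exactly Spark sets it may differ):

    // Keep -Xfatal-warnings for the default build, but drop it when
    // compiling against Scala 2.11 so 2.11-only warnings don't fail the build.
    scalacOptions := {
      val opts = scalacOptions.value
      if (scalaBinaryVersion.value == "2.11") opts.filterNot(_ == "-Xfatal-warnings")
      else opts
    }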


On Thu, Oct 8, 2015 at 7:55 AM, Ted Yu  wrote:

> I tried building with Scala 2.11 on Linux with latest master branch :
>
> [INFO] Spark Project External MQTT  SUCCESS [
> 19.188 s]
> [INFO] Spark Project External MQTT Assembly ... SUCCESS [
>  7.081 s]
> [INFO] Spark Project External ZeroMQ .. SUCCESS [
>  8.790 s]
> [INFO] Spark Project External Kafka ... SUCCESS [
> 14.764 s]
> [INFO] Spark Project Examples . SUCCESS [02:22
> min]
> [INFO] Spark Project External Kafka Assembly .. SUCCESS [
> 10.286 s]
> [INFO]
> 
> [INFO] BUILD SUCCESS
> [INFO]
> 
> [INFO] Total time: 17:49 min
>
> FYI
>
> On Thu, Oct 8, 2015 at 6:50 AM, Ted Yu  wrote:
>
>> Interesting
>>
>>
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/
>> shows green builds.
>>
>>
>> On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș wrote:
>>
>>> Since Oct. 4 the build fails on 2.11 with the dreaded
>>>
>>> [error] /home/ubuntu/workspace/Apache Spark (master) on 
>>> 2.11/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala:310: 
>>> no valid targets for annotation on value conf - it is discarded unused. You 
>>> may specify targets with meta-annotations, e.g. @(transient @param)
>>> [error] private[netty] class NettyRpcEndpointRef(@transient conf: SparkConf)
>>>
>>> Can we have the pull request builder at least build with 2.11? This
>>> makes #8433  pretty much
>>> useless, since people will continue to add useless @transient annotations.
>>> --
>>>
>>> --
>>> Iulian Dragos
>>>
>>> --
>>> Reactive Apps on the JVM
>>> www.typesafe.com
>>>
>>>
>>
>


Re: Understanding code/closure shipment to Spark workers‏

2015-10-08 Thread Xiao Li
Hi, Arijit,

The code flow of spark-submit is simple.

Enter the main function of SparkSubmit.scala
--> case SparkSubmitAction.SUBMIT => submit(appArgs)
--> doRunMain() in function submit() in the same file
--> runMain(childArgs,...) in the same file
--> mainMethod.invoke(null, childArgs.toArray)  in the same file

The invoke() call uses Java reflection to invoke the main method of your
application jar's main class.
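
If it helps, here is a minimal standalone sketch of that reflection step (the class name and arguments are made up):

    // Load a "main" class and invoke its main(Array[String]) via Java reflection,
    // roughly what runMain() does with childMainClass and childArgs.
    val mainClass  = Class.forName("com.example.YourSparkApp")   // hypothetical application class
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, Array("input.txt", "output"))        // null receiver: main is static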

Hopefully, it can help you understand the problem.

Thanks,

Xiao Li


2015-10-07 16:47 GMT-07:00 Arijit :

>  Hi,
>
> I want to understand the code flow starting from the Spark jar that I
> submit through spark-submit: how does Spark identify and extract the
> closures, clean and serialize them, and ship them to workers to execute as
> tasks? Can someone point me to any documentation, or to the relevant source
> code paths, to help me understand this?
>
> Thanks, Arijit
>


Compiling Spark with a local hadoop profile

2015-10-08 Thread sbiookag
I'm modifying the HDFS module inside Hadoop, and would like to see my changes
reflected while I'm running Spark on top of it, but I still see the native
Hadoop behaviour. I've checked and seen that Spark builds a really fat jar
file, which contains all the Hadoop classes (using the hadoop profile defined
in Maven), and deploys it to all workers. I also tried bigtop-dist to exclude
the Hadoop classes, but saw no effect.

Is it possible to do such a thing easily, for example with small modifications
to the Maven build files?



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Compiling-Spark-with-a-local-hadoop-profile-tp14517.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Compiling Spark with a local hadoop profile

2015-10-08 Thread Ted Yu
In the root pom.xml:
<hadoop.version>2.2.0</hadoop.version>

You can override the version of Hadoop with a command similar to:
-Phadoop-2.4 -Dhadoop.version=2.7.0
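
For example, a full build invocation along those lines might look like this (illustrative; adjust the profile and version to your setup):

    build/mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -DskipTests clean package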

Cheers

On Thu, Oct 8, 2015 at 11:22 AM, sbiookag  wrote:

> I'm modifying hdfs module inside hadoop, and would like the see the
> reflection while i'm running spark on top of it, but I still see the native
> hadoop behaviour. I've checked and saw Spark is building a really fat jar
> file, which contains all hadoop classes (using hadoop profile defined in
> maven), and deploy it over all workers. I also tried bigtop-dist, to
> exclude
> hadoop classes but see no effect.
>
> Is it possible to do such a thing easily, for example by small
> modifications
> inside the maven file?
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Compiling-Spark-with-a-local-hadoop-profile-tp14517.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


spark over drill

2015-10-08 Thread Pranay Tonpay
Hi,

Is Spark-Drill integration already done? If yes, which Spark version
supports it? I had read somewhere that it was on the "upcoming list for 2015".


Re: Compiling Spark with a local hadoop profile

2015-10-08 Thread sbiookag
Thanks Ted for the reply.

But this is not what I want. That would tell Spark to read the Hadoop
dependency from the Maven repository, which is the original version of Hadoop.
I am modifying the Hadoop code myself, and want to include my modified classes
inside the Spark fat jar. "spark-class" runs the slaves with the fat jar
created in the assembly folder, and that jar does not contain my modified
classes.

Something that confuses me is: why does Spark include the Hadoop classes in
its built jar output? Isn't it supposed to read them from the Hadoop folder on
each worker node?
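
One untested idea (not verified, just how I understand the build options): build Spark with the hadoop-provided profile so the assembly does not bundle Hadoop at all, and point the workers at the modified Hadoop instead, e.g.:

    build/mvn -Phadoop-provided -Phadoop-2.4 -Dhadoop.version=2.7.0 -DskipTests clean package
    # then, on each worker, point Spark at the modified Hadoop jars:
    export SPARK_DIST_CLASSPATH=$(/path/to/modified/hadoop/bin/hadoop classpath)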



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Compiling-Spark-with-a-local-hadoop-profile-tp14517p14519.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark over drill

2015-10-08 Thread Reynold Xin
You probably saw that in a presentation given by the Drill team. You should
check with them on that.

On Thu, Oct 8, 2015 at 11:51 AM, Pranay Tonpay  wrote:

> hi ,,
> Is spark-drill integration already done ? if yes, which spark version
> supports it ... it was in the "upcming list for 2015" is what i had read
> somewhere
>