RE: Anyone know how to build and run Spark on JDK9?

2017-10-26 Thread Zhang, Liyun
Thanks for your suggestion; it seems that Scala 2.12.4 supports JDK9.


Scala 2.12.4<https://github.com/scala/scala/releases/tag/v2.12.4> is now 
available.

Our 
benchmarks<https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?var-branch=2.12.x&from=1501580691158&to=1507711932006> 
show a further reduction in compile times since 2.12.3 of 5-10%.

Improved Java 9 friendliness, with more to come!

Best Regards
Kelly Zhang/Zhang,Liyun





From: Reynold Xin [mailto:r...@databricks.com]
Sent: Friday, October 27, 2017 10:26 AM
To: Zhang, Liyun <liyun.zh...@intel.com>; d...@spark.apache.org; 
user@spark.apache.org
Subject: Re: Anyone know how to build and run Spark on JDK9?

It probably depends on the Scala version we use in Spark supporting Java 9 
first.

On Thu, Oct 26, 2017 at 7:22 PM Zhang, Liyun 
<liyun.zh...@intel.com<mailto:liyun.zh...@intel.com>> wrote:
Hi all:
1.   I want to build Spark on JDK9 and test it with Hadoop in a JDK9 
environment. I searched for JIRAs related to JDK9 and only found 
SPARK-13278<https://issues.apache.org/jira/browse/SPARK-13278>. Does this mean 
Spark can now build and run successfully on JDK9?


Best Regards
Kelly Zhang/Zhang,Liyun



Anyone know how to build and run Spark on JDK9?

2017-10-26 Thread Zhang, Liyun
Hi all:
1.   I want to build Spark on JDK9 and test it with Hadoop in a JDK9 
environment. I searched for JIRAs related to JDK9 and only found 
SPARK-13278<https://issues.apache.org/jira/browse/SPARK-13278>. Does this mean 
Spark can now build and run successfully on JDK9?
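For reference, building Spark against a particular JDK generally comes down to pointing the build at that toolchain. The sketch below uses Spark's bundled Maven wrapper (`build/mvn`); the `JAVA_HOME` path is an assumption for illustration and will differ per system:

```shell
# Hedged sketch: build Spark from a source checkout with a chosen JDK.
# The JDK install path below is an assumption -- adjust for your machine.
export JAVA_HOME=/usr/lib/jvm/java-9-openjdk
export PATH="$JAVA_HOME/bin:$PATH"

java -version                          # confirm the intended JDK is picked up
./build/mvn -DskipTests clean package  # Spark's bundled Maven wrapper
```

Whether the build then succeeds end to end on JDK9 depends on the Scala version Spark is compiled with, as noted in the reply above.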


Best Regards
Kelly Zhang/Zhang,Liyun



How to clean the cache when I do a performance test in Spark

2016-12-07 Thread Zhang, Liyun
Hi all:
   When I test my Spark application, I found that the second 
round (application_1481153226569_0002) is much faster than the first 
round (application_1481153226569_0001), even though the configuration is the 
same. I guess the second round is sped up a lot by caching. So how can I clear 
the cache?
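If the speedup comes from the OS page cache (data files already resident in memory after the first run) rather than from Spark itself, a common way to get cold-cache numbers between rounds is to drop the Linux page cache. A sketch, assuming a Linux host and root access; note this does not touch Spark's own RDD cache, which you would clear with `rdd.unpersist()` or by restarting the application:

```shell
# Drop the Linux page cache between benchmark rounds (requires root).
sync                                      # flush dirty pages to disk first
if [ "$(id -u)" -eq 0 ]; then
  # 3 = drop pagecache + dentries + inodes
  echo 3 > /proc/sys/vm/drop_caches
else
  echo "re-run as root to drop the OS cache"
fi
```

Run this on every node that hosts HDFS data or executors, otherwise some reads will still be served warm.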





Best Regards
Kelly Zhang/Zhang,Liyun



How to make the result of sortByKey distributed evenly?

2016-09-06 Thread Zhang, Liyun
Hi all:
  I have a question about RDD.sortByKey

val n = 20000  // e.g. 20000; any large n shows the skew described below
val sorted = sc.parallelize(2 to n).map(x => (x / n, x)).sortByKey()
sorted.saveAsTextFile("hdfs://bdpe42:8020/SkewedGroupByTest")

sc.parallelize(2 to n).map(x => (x / n, x)) generates pairs like 
[(0,2),(0,3),...,(0,n-1),(1,n)], so the key is heavily skewed.
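The skew is easy to see in plain Scala, without Spark. Assuming a hypothetical n = 20000 (chosen to roughly match the part-file sizes below), integer division `x / n` sends every element but the last to key 0:

```scala
// Key distribution for x => x / n with a hypothetical n = 20000.
val n = 20000
val keys = (2 to n).map(x => x / n)
// x / n is 0 for every x < n and 1 only for x == n:
println(keys.count(_ == 0))  // 19998
println(keys.count(_ == 1))  // 1
```

Since `sortByKey` range-partitions on the key, almost all records land in the partition holding key 0.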

I expected the result of sortByKey to be distributed evenly, but when I viewed 
the output I found that part-0 is large and part-1 is small.

 hadoop fs -ls /SkewedGroupByTest/
16/09/06 03:24:55 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r-- 1 root supergroup 0 2016-09-06 03:21 /SkewedGroupByTest/_SUCCESS
-rw-r--r-- 1 root supergroup 188878 2016-09-06 03:21 
/SkewedGroupByTest/part-0
-rw-r--r-- 1 root supergroup 10 2016-09-06 03:21 /SkewedGroupByTest/part-1

How can I get the result distributed evenly? I don't need all keys within a 
given part-x to be the same; I only need to guarantee that the data across 
part-0 ~ part-x is sorted.
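Since every pair's value x is unique, one option (assuming a global sort of the data is all that's needed) is to sort by the value rather than the skewed key; in Spark that would be `sortBy(_._2)`, whose range partitioner samples the distinct values and splits them evenly. A plain-Scala sketch of the idea, simulating two output partitions:

```scala
// Sort by the unique value x instead of the skewed key x / n.
val n = 20000  // hypothetical size, matching the earlier example
val pairs = (2 to n).map(x => (x / n, x))
val sorted = pairs.sortBy(_._2)  // globally ordered by value

// Splitting the value range gives nearly equal, still-ordered parts:
val half = sorted.length / 2
val parts = List(sorted.take(half), sorted.drop(half))
println(parts.map(_.length))  // two nearly equal halves
```

The output stays globally sorted across parts, which satisfies the stated requirement without forcing equal keys into one partition.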


Thanks for any help!


Kelly Zhang/Zhang,Liyun
Best Regards