Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-24 Thread Ted Yu
If I am not mistaken, the binaries for Scala 2.11 were generated against
Hadoop 1.

What about binaries for Scala 2.11 against Hadoop 2.x?
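
In the meantime, building such a package from branch-1.6 should look roughly
like this (an untested sketch; the profile names are from the 1.6 build docs,
and the Hadoop profile/version should be adjusted to the target cluster). I've
left out the thrift-server profile since, if I recall correctly, the JDBC
component is not yet supported on Scala 2.11:

  ./dev/change-scala-version.sh 2.11
  ./make-distribution.sh --name hadoop2.6-scala2.11 --tgz \
    -Phadoop-2.6 -Pyarn -Phive -Dscala-2.11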

Cheers

On Sun, Nov 22, 2015 at 2:21 PM, Michael Armbrust wrote:

> In order to facilitate community testing of Spark 1.6.0, I'm excited to
> announce the availability of an early preview of the release. This is not a
> release candidate, so there is no voting involved. However, it'd be awesome
> if community members could start testing with this preview package and report
> any problems they encounter.
>
> This preview package contains all the commits to branch-1.6 up to commit
> 308381420f51b6da1007ea09a02d740613a226e0.
>
> The staging maven repository for this preview build can be found here:
> https://repository.apache.org/content/repositories/orgapachespark-1162
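>
> For sbt users, a minimal (untested) sketch for resolving the preview from the
> staging repository; the 1.6.0 version string below is an assumption, so
> adjust it to whatever the repository actually publishes:
>
>   // build.sbt
>   resolvers += "spark-1.6.0-preview-staging" at
>     "https://repository.apache.org/content/repositories/orgapachespark-1162"
>   libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"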
>
> Binaries for this preview build can be found here:
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-bin/
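>
> For example, to fetch and unpack one of the packages (this particular package
> name appears later in this thread; the other Hadoop/Scala combinations should
> follow the same naming pattern):
>
>   $ wget http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-bin/spark-1.6.0-bin-hadoop1-scala2.11.tgz
>   $ tar xzf spark-1.6.0-bin-hadoop1-scala2.11.tgz
>   $ cd spark-1.6.0-bin-hadoop1-scala2.11 && ./bin/spark-shell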
>
> A build of the docs can also be found here:
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-docs/
>
> The full change log for this release can be found on JIRA.
>
> *== How can you help? ==*
>
> If you are a Spark user, you can help us test this release by taking a
> Spark workload and running it on this preview release, then reporting any
> regressions.
>
> *== Major Features ==*
>
> When testing, we'd appreciate it if users could focus on areas that have
> changed in this release.  Some notable new features include:
>
> SPARK-11787 *Parquet Performance* - Improve Parquet scan performance when
> using flat schemas.
> SPARK-10810 *Session Management* - Multiple users of the thrift (JDBC/ODBC)
> server now have isolated sessions, including their own default database
> (i.e., USE mydb), even on shared clusters; see the first sketch after this
> list.
> SPARK- *Dataset API* - A new, experimental type-safe API (similar to RDDs)
> that performs many operations on serialized binary data and uses code
> generation (i.e., Project Tungsten); see the second sketch after this list.
> SPARK-1 *Unified Memory Management* - Shared memory for execution and
> caching, instead of an exclusive division of the regions.
> SPARK-10978 *Datasource API Avoid Double Filter* - When implementing a
> datasource with filter pushdown, developers can now tell Spark SQL to avoid
> double-evaluating a pushed-down filter.
> SPARK-2629 *New improved state management* - trackStateByKey, a DStream
> transformation for stateful stream processing, supersedes updateStateByKey
> in functionality and performance.
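>
> A minimal sketch of the session isolation above, assuming a running thrift
> server on localhost:10000 (the user and database names are made up):
>
>   $ ./bin/beeline -u jdbc:hive2://localhost:10000 -n alice
>   0: jdbc:hive2://localhost:10000> USE mydb;   -- affects only alice's session
>
> And a minimal sketch of the new Dataset API, which should run as-is in the
> preview's spark-shell (the case class and data are made up; sqlContext is
> provided by the shell):
>
>   import sqlContext.implicits._
>   case class Person(name: String, age: Long)
>   val ds = Seq(Person("Ann", 34), Person("Bob", 20)).toDS()
>   ds.filter(_.age > 30).map(_.name).collect()   // Array(Ann)
>
> As with RDDs, filter and map here are lazy and type-checked at compile time;
> the work runs against Tungsten's serialized binary format when collect() is
> called.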
>
> Happy testing!
>
> Michael
>
>


Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-23 Thread mkhaitman
Nice! Built and testing on CentOS 7 on a Hadoop 2.7.1 cluster.

One thing I've noticed is that KeyboardInterrupts now seem to be ignored. Is
that intended? I started typing a line out and then changed my mind and wanted
to issue the good old ctrl+c to interrupt, but that didn't work.

Otherwise haven't seen any major issues yet!

Mark.






Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-23 Thread Dean Wampler
I'm seeing an RPC timeout with the 2.11 build, but not with the Hadoop 1,
Scala 2.10 build. The following session with two uses of sc.parallelize
triggers it almost every time. Occasionally I don't see the stack trace, and I
don't see it with just a single sc.parallelize, even the bigger, second one.
When the error occurs, it pauses for about two minutes with no output before
the stack trace appears. I elided some output; why all the non-log4j warnings
occur at startup is another question:


$ pwd
/Users/deanwampler/projects/spark/spark-1.6.0-bin-hadoop1-scala2.11
$ ./bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.security.Groups).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Spark context available as sc.
15/11/23 13:01:45 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/23 13:01:45 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/23 13:01:49 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/23 13:01:49 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/23 13:01:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/23 13:01:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/23 13:01:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize((1 to 10), 1).count()

[Stage 0:>  (0 + 0) / 1]
[Stage 0:==>  (420 + 4) / 1]
[Stage 0:===> (683 + 4) / 1]
... elided ...
[Stage 0:==> (8264 + 4) / 1]
[Stage 0:==> (8902 + 6) / 1]
[Stage 0:=>  (9477 + 4) / 1]

res0: Long = 10

scala> sc.parallelize((1 to 100), 10).count()

[Stage 1:> (0 + 0) / 10]
[Stage 1:> (0 + 0) / 10]
[Stage 1:> (0 + 0) / 10]
15/11/23 13:04:09 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@7f9d659c,BlockManagerId(driver, localhost, 55188))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
  at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
  at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
  at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
  at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:452)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:472)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472)
  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1708)
  at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:472)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at