Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-24 Thread Ted Yu
If I am not mistaken, the binaries for Scala 2.11 were generated against
Hadoop 1.

What about binaries for Scala 2.11 against Hadoop 2.x?

Cheers

On Sun, Nov 22, 2015 at 2:21 PM, Michael Armbrust wrote:

> In order to facilitate community testing of Spark 1.6.0, I'm excited to
> announce the availability of an early preview of the release. This is not a
> release candidate, so there is no voting involved. However, it'd be awesome
> if community members could start testing with this preview package and report
> any problems they encounter.
>
> This preview package contains all the commits to branch-1.6 up to commit
> 308381420f51b6da1007ea09a02d740613a226e0.
>
> The staging maven repository for this preview build can be found here:
> https://repository.apache.org/content/repositories/orgapachespark-1162
>
> Binaries for this preview build can be found here:
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-bin/
>
> A build of the docs can also be found here:
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-docs/
>
> The full change log for this release can be found on JIRA.
>
> *== How can you help? ==*
>
> If you are a Spark user, you can help us test this release by taking a
> Spark workload and running it on this preview release, then reporting any
> regressions.
>
> *== Major Features ==*
>
> When testing, we'd appreciate it if users could focus on areas that have
> changed in this release.  Some notable new features include:
>
> SPARK-11787  *Parquet
> Performance* - Improve Parquet scan performance when using flat schemas.
> SPARK-10810  *Session *
> *Management* - Multiple users of the thrift (JDBC/ODBC) server now have
> isolated sessions, including their own default database (e.g., USE mydb)
> even on shared clusters.
> SPARK-   *Dataset
> API* - A new, experimental type-safe API (similar to RDDs) that performs
> many operations directly on serialized binary data and uses code generation
> (i.e., Project Tungsten).
> SPARK-1  *Unified
> Memory Management* - Execution and caching now share a single pool of
> memory instead of being given fixed, exclusive regions.
> SPARK-10978  *Datasource
> API Avoid Double Filter* - When implementing a datasource with filter
> pushdown, developers can now tell Spark SQL to avoid double evaluating a
> pushed-down filter.
> SPARK-2629   *New
> improved state management* - trackStateByKey, a DStream transformation
> for stateful stream processing that supersedes updateStateByKey in both
> functionality and performance.
>
> Happy testing!
>
> Michael
>
>


Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-23 Thread mkhaitman
Nice! Built and testing on CentOS 7 on a Hadoop 2.7.1 cluster.

One thing I've noticed is that KeyboardInterrupts now seem to be ignored - is
that intended? I started typing a line out, changed my mind, and wanted to
issue the good old ctrl+c to interrupt, but that didn't work.

Otherwise haven't seen any major issues yet!

Mark.






Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-23 Thread Dean Wampler
I'm seeing an RPC timeout with the Scala 2.11 build, but not the Hadoop 1 /
Scala 2.10 build. The following session with two uses of sc.parallelize causes
it almost every time. Occasionally I don't see the stack trace, and I don't see
it with just a single sc.parallelize, even the bigger, second one. When the
error occurs, it pauses for about two minutes with no output before the stack
trace. I elided some output; why all the non-log4j warnings appear at startup
is another question:


$ pwd
/Users/deanwampler/projects/spark/spark-1.6.0-bin-hadoop1-scala2.11
$ ./bin/spark-shell
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.security.Groups).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
Using Spark's repl log4j profile:
org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Spark context available as sc.
15/11/23 13:01:45 WARN Connection: BoneCP specified but not present in
CLASSPATH (or one of dependencies)
15/11/23 13:01:45 WARN Connection: BoneCP specified but not present in
CLASSPATH (or one of dependencies)
15/11/23 13:01:49 WARN ObjectStore: Version information not found in
metastore. hive.metastore.schema.verification is not enabled so recording
the schema version 1.2.0
15/11/23 13:01:49 WARN ObjectStore: Failed to get database default,
returning NoSuchObjectException
15/11/23 13:01:49 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/11/23 13:01:50 WARN Connection: BoneCP specified but not present in
CLASSPATH (or one of dependencies)
15/11/23 13:01:50 WARN Connection: BoneCP specified but not present in
CLASSPATH (or one of dependencies)
SQL context available as sqlContext.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0-SNAPSHOT
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.parallelize((1 to 10), 1).count()

[Stage 0:>  (0 + 0) /
1]
[Stage 0:==>  (420 + 4) /
1]
[Stage 0:===> (683 + 4) /
1]
... elided ...
[Stage 0:==> (8264 + 4) /
1]
[Stage 0:==> (8902 + 6) /
1]
[Stage 0:=>  (9477 + 4) /
1]

res0: Long = 10

scala> sc.parallelize((1 to 100), 10).count()

[Stage 1:> (0 + 0) /
10]
[Stage 1:> (0 + 0) /
10]
[Stage 1:> (0 + 0) /
10]15/11/23 13:04:09 WARN NettyRpcEndpointRef: Error sending message
[message = Heartbeat(driver,[Lscala.Tuple2;@7f9d659c,BlockManagerId(driver,
localhost, 55188))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120
seconds]. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org
$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
  at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
  at
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
  at
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
  at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
  at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
  at org.apache.spark.executor.Executor.org
$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:452)
  at
org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:472)
  at
org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472)
  at
org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472)
  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1708)
  at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:472)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
  at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
  at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at
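
For anyone digging into this: the timeout named in the trace,
spark.rpc.askTimeout, defaults to 120 seconds, which matches the roughly
two-minute pause before the stack trace appears. Raising it when launching the
shell can help distinguish a slow heartbeat reply from a hung executor; this is
only a diagnostic step, not a fix, and the 600s value below is an arbitrary
choice:

  ./bin/spark-shell --conf spark.rpc.askTimeout=600s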

[ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-22 Thread Michael Armbrust
In order to facilitate community testing of Spark 1.6.0, I'm excited to
announce the availability of an early preview of the release. This is not a
release candidate, so there is no voting involved. However, it'd be awesome
if community members could start testing with this preview package and report
any problems they encounter.

This preview package contains all the commits to branch-1.6 up to commit
308381420f51b6da1007ea09a02d740613a226e0.

The staging maven repository for this preview build can be found here:
https://repository.apache.org/content/repositories/orgapachespark-1162

Binaries for this preview build can be found here:
http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-bin/

A build of the docs can also be found here:
http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-docs/

The full change log for this release can be found on JIRA.

*== How can you help? ==*

If you are a Spark user, you can help us test this release by taking a
Spark workload and running it on this preview release, then reporting any
regressions.

*== Major Features ==*

When testing, we'd appreciate it if users could focus on areas that have
changed in this release.  Some notable new features include:

SPARK-11787  *Parquet
Performance* - Improve Parquet scan performance when using flat schemas.
SPARK-10810  *Session *
*Management* - Multiple users of the thrift (JDBC/ODBC) server now have
isolated sessions, including their own default database (e.g., USE mydb), even
on shared clusters.
SPARK-   *Dataset API* -
A new, experimental type-safe API (similar to RDDs) that performs many
operations directly on serialized binary data and uses code generation
(i.e., Project Tungsten). A short usage sketch follows this list.
SPARK-1  *Unified
Memory Management* - Execution and caching now share a single pool of memory
instead of being given fixed, exclusive regions.
SPARK-10978  *Datasource
API Avoid Double Filter* - When implementing a datasource with filter
pushdown, developers can now tell Spark SQL to avoid double-evaluating a
pushed-down filter (see the second sketch after this list).
SPARK-2629   *New
improved state management* - trackStateByKey, a DStream transformation for
stateful stream processing that supersedes updateStateByKey in both
functionality and performance.
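
To make the testing ask concrete, here is a minimal Dataset sketch. The Person
case class and its values are made up for illustration, and since the API is
experimental the exact surface may still shift before the final release:

  import sqlContext.implicits._

  case class Person(name: String, age: Long)

  // Datasets are typed like RDDs but hold rows in Tungsten's binary format.
  val ds = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()
  ds.filter(_.age > 26).map(_.name).collect()   // Array("Ann")

And a rough sketch of the SPARK-10978 hook, assuming the new unhandledFilters
method on BaseRelation; the NamesRelation class and its data are hypothetical,
so treat this as an illustration rather than a verified example against the
preview build:

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.{Row, SQLContext}
  import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, PrunedFilteredScan}
  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  class NamesRelation(override val sqlContext: SQLContext) extends BaseRelation
      with PrunedFilteredScan {

    override def schema: StructType =
      StructType(StructField("name", StringType) :: Nil)

    // Report which pushed filters this source does NOT fully handle; Spark SQL
    // re-evaluates only those, so filters handled here are not evaluated twice.
    override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
      filters.filterNot(_.isInstanceOf[EqualTo])

    override def buildScan(requiredColumns: Array[String],
                           filters: Array[Filter]): RDD[Row] = {
      // Apply the EqualTo filters we claimed to handle in unhandledFilters.
      // (Single column, so requiredColumns needs no further pruning here.)
      val rows = Seq("alice", "bob").filter { n =>
        filters.forall { case EqualTo("name", v) => n == v; case _ => true }
      }
      sqlContext.sparkContext.parallelize(rows.map(Row(_)))
    }
  }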

Happy testing!

Michael