[jira] [Commented] (SPARK-5649) Throw exception when can not apply datatype cast
[ https://issues.apache.org/jira/browse/SPARK-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319724#comment-14319724 ] Michael Armbrust commented on SPARK-5649:

https://github.com/apache/spark/pull/4558

Throw exception when can not apply datatype cast
Key: SPARK-5649
URL: https://issues.apache.org/jira/browse/SPARK-5649
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei
Fix For: 1.3.0

Throw an exception when a datatype cast cannot be applied, to inform the user of the cast issue in their SQL.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5649) Throw exception when can not apply datatype cast
[ https://issues.apache.org/jira/browse/SPARK-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5649.
Resolution: Fixed
Fix Version/s: 1.3.0
Assignee: wangfei

Throw exception when can not apply datatype cast
Key: SPARK-5649
URL: https://issues.apache.org/jira/browse/SPARK-5649
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei
Assignee: wangfei
Fix For: 1.3.0

Throw an exception when a datatype cast cannot be applied, to inform the user of the cast issue in their SQL.
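The behavior the ticket asks for, failing fast instead of leaving an unresolvable cast in the query plan, can be illustrated with a toy type checker. This is only a hypothetical sketch: the enum and the castability table below are invented for illustration and are not Spark SQL's actual type lattice or Cast implementation.

```java
// Toy model of "throw when the cast cannot be applied" (hypothetical types,
// not Spark SQL's real ones).
public class CastCheck {
    enum DataType { INT, STRING, BINARY, MAP }

    // Assumed castability rules, for illustration only.
    static boolean canCast(DataType from, DataType to) {
        if (from == to) return true;
        switch (from) {
            case INT:    return to == DataType.STRING;
            case STRING: return to == DataType.INT || to == DataType.BINARY;
            default:     return false; // e.g. MAP cannot be cast to anything else
        }
    }

    // Analysis-time check: reject the plan with an exception rather than
    // silently keeping an inapplicable cast.
    static void resolveCast(DataType from, DataType to) {
        if (!canCast(from, to)) {
            throw new IllegalArgumentException("cannot cast " + from + " to " + to);
        }
    }
}
```

The point is only where the error surfaces: at analysis time with an explicit message, instead of later (or never) with a confusing result.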
[jira] [Created] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
Littlestar created SPARK-5795:
Summary: api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
Key: SPARK-5795
URL: https://issues.apache.org/jira/browse/SPARK-5795
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar

import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

The following code can't compile in Java:

JavaPairDStream<Integer, Integer> rs = ...
rs.saveAsNewAPIHadoopFiles(prefix, "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf);

but similar code with JavaPairRDD works OK:

JavaPairRDD<String, String> counts = ...
counts.saveAsNewAPIHadoopFile(out, Text.class, Text.class, TextOutputFormat.class, jobConf);

Maybe change

def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}

to

def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}
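The conflict here is between a wildcard-typed `Class` parameter and a method-level type variable. Below is a minimal, self-contained Java model of the two signature styles; the classes are hypothetical stand-ins, not the real Hadoop/Spark ones. The type-variable form (as in `JavaPairRDD.saveAsNewAPIHadoopFile`) lets javac infer `F` from a raw class literal such as `TextOutputFormat.class`; the wildcard form mirrors the `JavaPairDStream` signature the ticket reports as unusable from Java.

```java
// Toy stand-ins: Format plays the role of NewOutputFormat, TextFormat of
// TextOutputFormat (note it is itself generic, like the real class).
interface Format<K, V> {}
class TextFormat<K, V> implements Format<K, V> {}

public class SaveSignatures {
    // Mirrors Class[F] with F <: NewOutputFormat[_, _] - callable from Java:
    // F is inferred as the raw TextFormat (with an unchecked warning).
    static <F extends Format<?, ?>> String saveTypeVar(Class<F> cls) {
        return "saving with " + cls.getSimpleName();
    }

    // Mirrors Class[_ <: NewOutputFormat[_, _]] - the shape the ticket
    // reports as not compiling from Java when the format class is generic.
    static String saveWildcard(Class<? extends Format<?, ?>> cls) {
        return "saving with " + cls.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(saveTypeVar(TextFormat.class)); // compiles
        // saveWildcard(TextFormat.class) is the problematic call shape.
    }
}
```

The change the reporter suggests is exactly this move: lift the bound out of the parameter's wildcard into a method type parameter `F`.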
[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319769#comment-14319769 ] Littlestar commented on SPARK-5795:

org.apache.spark.api.java.JavaPairRDD[K, V]
{noformat}
/** Output the RDD to any Hadoop-supported file system. */
def saveAsHadoopFile[F <: OutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: JobConf) {
  rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass, conf)
}

/** Output the RDD to any Hadoop-supported file system. */
def saveAsHadoopFile[F <: OutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F]) {
  rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass)
}

/** Output the RDD to any Hadoop-supported file system, compressing with the supplied codec. */
def saveAsHadoopFile[F <: OutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    codec: Class[_ <: CompressionCodec]) {
  rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass, codec)
}

/** Output the RDD to any Hadoop-supported file system. */
def saveAsNewAPIHadoopFile[F <: NewOutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: Configuration) {
  rdd.saveAsNewAPIHadoopFile(path, keyClass, valueClass, outputFormatClass, conf)
}

/**
 * Output the RDD to any Hadoop-supported storage system, using
 * a Configuration object for that storage system.
 */
def saveAsNewAPIHadoopDataset(conf: Configuration) {
  rdd.saveAsNewAPIHadoopDataset(conf)
}

/** Output the RDD to any Hadoop-supported file system. */
def saveAsNewAPIHadoopFile[F <: NewOutputFormat[_, _]](
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F]) {
  rdd.saveAsNewAPIHadoopFile(path, keyClass, valueClass, outputFormatClass)
}
{noformat}

org.apache.spark.streaming.api.java.JavaPairDStream[K, V]
{noformat}
/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
 */
def saveAsHadoopFiles[F <: OutputFormat[K, V]](prefix: String, suffix: String) {
  dstream.saveAsHadoopFiles(prefix, suffix)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
 */
def saveAsHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: OutputFormat[_, _]]) {
  dstream.saveAsHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
 */
def saveAsHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: OutputFormat[_, _]],
    conf: JobConf) {
  dstream.saveAsHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
 */
def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[K, V]](prefix: String, suffix: String) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
 */
def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]]) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass)
}

/**
 * Save each RDD in `this` DStream as a Hadoop file. The file name at each batch interval is
 * generated based on `prefix` and `suffix`: prefix-TIME_IN_MS.suffix.
 */
def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}
{noformat}

api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[jira] [Commented] (SPARK-3785) Support off-loading computations to a GPU
[ https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319777#comment-14319777 ] Sam Halliday commented on SPARK-3785:

Hi all, just joining the thread :-) I'm the author of netlib-java. I recommend watching my ScalaX talk http://fommil.github.io/scalax14/#/ for anybody who hasn't seen it yet. I talk about beyond-CPU acceleration in the last few slides (just after the Breeze examples).

In my decade of industrial experience with these things, the GPU is a *lot* faster than the CPU for large matrix operations, but slower for smaller ones (1000 elements or less). Typically, operations that are highly parallelisable, such as matrix multiplication, have a constant time cost rather than one linear in the number of elements.

However, the big problem with GPUs is memory management. If you have a problem that you're happy to solve entirely on the GPU, you're going to get great performance at the cost of less portability... a major consideration for a JVM based application. The trick is minimising how much data you need to transmit between the traditional CPU memory space and the GPU memory space, and further optimisations can be obtained by using the GPU profilers that come with the card. It is for this reason that GPU-backed implementations of BLAS/LAPACK can only match, but not surpass, the performance of Intel MKL.

There exist BLAS-like and LAPACK-like implementations for GPUs (e.g. cuBLAS, clBLAS) but they can only be used when you hold pointers to the GPU memory regions, and they are not good for use from Java/Scala (unless you are using macros/code generators to really generate native code). I have links with FPGA companies and I'd love to see a full BLAS implementation using that custom hardware... but it's such a mammoth task that the FPGA implementors (not me) would need to be funded to do it.

I am very hopeful about the cutting edge commodity tech coming from Intel/AMD (e.g. APUs) which allows the CPU and GPU to share the memory region. I would love to buy one of these machines and write a minimal BLAS implementation to do some benchmarks and see if we can get GPU performance without the memory transfer overhead. My project https://github.com/fommil/multiblas (which was abandoned until the tech caught up) would be a perfect place to do this, and it would involve only runtime changes for Spark users to benefit. But, to be honest, I'd probably need funding to turn my attention to this because I've got a few other personal priorities at the moment. I've heard the raspberry pi has such a shared region. It might be interesting to use it as a cheapo dev environment.

Support off-loading computations to a GPU
Key: SPARK-3785
URL: https://issues.apache.org/jira/browse/SPARK-3785
Project: Spark
Issue Type: Brainstorming
Components: MLlib
Reporter: Thomas Darimont
Priority: Minor

Are there any plans to add support for off-loading computations to the GPU, e.g. via an OpenCL binding?
http://www.jocl.org/
https://code.google.com/p/javacl/
http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL
[jira] [Updated] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5795:
Priority: Minor (was: Major)

When you say it doesn't compile, you should show the compilation error, although I think I know what it is. There's a workaround, but I agree we can look at fixing it. If it breaks binary compatibility, it would have to wait until later.

api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
Key: SPARK-5795
URL: https://issues.apache.org/jira/browse/SPARK-5795
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor

import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

The following code can't compile in Java:

JavaPairDStream<Integer, Integer> rs = ...
rs.saveAsNewAPIHadoopFiles(prefix, "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf);

but similar code with JavaPairRDD works OK:

JavaPairRDD<String, String> counts = ...
counts.saveAsNewAPIHadoopFile(out, Text.class, Text.class, TextOutputFormat.class, jobConf);

Maybe change

def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}

to

def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}
[jira] [Updated] (SPARK-5728) MQTTStreamSuite leaves behind ActiveMQ database files
[ https://issues.apache.org/jira/browse/SPARK-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5728:
Fix Version/s: 1.2.2

MQTTStreamSuite leaves behind ActiveMQ database files
Key: SPARK-5728
URL: https://issues.apache.org/jira/browse/SPARK-5728
Project: Spark
Issue Type: Bug
Components: Streaming, Tests
Affects Versions: 1.2.1
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Trivial
Fix For: 1.3.0, 1.2.2

I've seen this several times and finally wanted to fix it: {{MQTTStreamSuite}} uses a local ActiveMQ broker that creates a working dir for its database in the {{external/mqtt}} directory called {{activemq}}. This doesn't get cleaned up, at least often it does not for me. It's trivial to set it to use a temp directory, which the test harness does clean up.
[jira] [Resolved] (SPARK-4832) some other processes might take the daemon pid
[ https://issues.apache.org/jira/browse/SPARK-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4832.
Resolution: Fixed
Fix Version/s: 1.2.2, 1.3.0

Issue resolved by pull request 3683 [https://github.com/apache/spark/pull/3683]

some other processes might take the daemon pid
Key: SPARK-4832
URL: https://issues.apache.org/jira/browse/SPARK-4832
Project: Spark
Issue Type: Bug
Components: Deploy
Reporter: Tao Wang
Priority: Minor
Fix For: 1.3.0, 1.2.2

Some other processes might use the pid saved in the pid file. In that case we should ignore it and launch the daemons.
[jira] [Commented] (SPARK-5726) Hadamard Vector Product Transformer
[ https://issues.apache.org/jira/browse/SPARK-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319839#comment-14319839 ] Sean Owen commented on SPARK-5726:

Go ahead and change it; my guess is that Xiangrui is OK with that, but he can comment too.

Hadamard Vector Product Transformer
Key: SPARK-5726
URL: https://issues.apache.org/jira/browse/SPARK-5726
Project: Spark
Issue Type: Improvement
Components: ML, MLlib
Reporter: Octavian Geagla
Assignee: Octavian Geagla

I originally posted my idea here: http://apache-spark-developers-list.1001551.n3.nabble.com/Any-interest-in-weighting-VectorTransformer-which-does-component-wise-scaling-td10265.html

A draft of this feature is implemented, documented, and tested already. Code is on a branch on my fork here: https://github.com/ogeagla/spark/compare/spark-mllib-weighting

I'm curious if there is any interest in this feature, in which case I'd appreciate some feedback. One thing that might be useful is an example/test case using the transformer within the ML pipeline, since there are not any examples which use Vectors.
[jira] [Updated] (SPARK-4832) some other processes might take the daemon pid
[ https://issues.apache.org/jira/browse/SPARK-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4832:
Assignee: Tao Wang

some other processes might take the daemon pid
Key: SPARK-4832
URL: https://issues.apache.org/jira/browse/SPARK-4832
Project: Spark
Issue Type: Bug
Components: Deploy
Reporter: Tao Wang
Assignee: Tao Wang
Priority: Minor
Fix For: 1.3.0, 1.2.2

Some other processes might use the pid saved in the pid file. In that case we should ignore it and launch the daemons.
[jira] [Updated] (SPARK-4631) Add real unit test for MQTT
[ https://issues.apache.org/jira/browse/SPARK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4631:
Target Version/s: (was: 1.3.0)
Fix Version/s: 1.2.2

Add real unit test for MQTT
Key: SPARK-4631
URL: https://issues.apache.org/jira/browse/SPARK-4631
Project: Spark
Issue Type: Test
Components: Streaming
Reporter: Tathagata Das
Priority: Critical
Fix For: 1.3.0, 1.2.2

A real unit test that actually transfers data, to ensure that the MQTTUtil is functional.
[jira] [Commented] (SPARK-5081) Shuffle write increases
[ https://issues.apache.org/jira/browse/SPARK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319948#comment-14319948 ] Dr. Christian Betz commented on SPARK-5081:

From SPARK-5715: I see a *factor four performance loss* in my Spark jobs when migrating from Spark 1.1.0 to Spark 1.2.0 or 1.2.1. Also, I see an *increase in the size of shuffle writes* (which is also reported by Kevin Jung on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tt20894.html). Together with this I experience a *huge number of disk spills*.

I'm experiencing these with my job under the following circumstances:
* Spark 1.2.0 with Sort-based Shuffle
* Spark 1.2.0 with Hash-based Shuffle
* Spark 1.2.1 with Sort-based Shuffle

All three combinations show the same behavior, which contrasts with Spark 1.1.0. In Spark 1.1.0 my job runs for about an hour; in Spark 1.2.x it runs for almost four hours. Configuration is identical otherwise - I only added org.apache.spark.scheduler.CompressedMapStatus to the Kryo registrator for Spark 1.2.0 to cope with https://issues.apache.org/jira/browse/SPARK-5102. As a consequence (I think, but the causality might be different) I see lots and lots of disk spills. I cannot provide a small test case, but maybe the log entries for a single worker thread can help someone investigate this. (See below.) I will also open up an issue, if nobody stops me by providing an answer ;) Any help will be greatly appreciated, because otherwise I'm stuck with Spark 1.1.0, as quadrupling the runtime is not an option.
Sincerely, Chris

2015-02-09T14:06:06.328+01:00 INFO org.apache.spark.executor.Executor Running task 9.0 in stage 18.0 (TID 300) Executor task launch worker-18
2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.CacheManager Partition rdd_35_9 not found, computing it Executor task launch worker-18
2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18
2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
2015-02-09T14:06:07.396+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(2582904) called with curMem=300174944, maxMe... Executor task launch worker-18
2015-02-09T14:06:07.397+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_35_9 stored as bytes in memory (estimated size 2.5... Executor task launch worker-18
2015-02-09T14:06:07.398+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_35_9 Executor task launch worker-18
2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.CacheManager Partition rdd_38_9 not found, computing it Executor task launch worker-18
2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18
2015-02-09T14:06:07.400+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
2015-02-09T14:06:07.567+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(944848) called with curMem=302757848, maxMem... Executor task launch worker-18
2015-02-09T14:06:07.568+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_38_9 stored as values in memory (estimated size 92... Executor task launch worker-18
2015-02-09T14:06:07.569+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_38_9 Executor task launch worker-18
2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18
2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18
2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.CacheManager Partition rdd_41_9 not found, computing it Executor task launch worker-18
2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 3 non-empty blocks out of 10 blocks Executor task launch worker-18
2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(0) called with curMem=307529127, maxMem=9261... Executor task launch worker-18
2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_41_9 stored as bytes in memory (estimated size 0.0... Executor task launch worker-18
2015-02-09T14:06:38.946+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block
[jira] [Resolved] (SPARK-5285) Removed GroupExpression in catalyst
[ https://issues.apache.org/jira/browse/SPARK-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5285.
Resolution: Won't Fix

Removed GroupExpression in catalyst
Key: SPARK-5285
URL: https://issues.apache.org/jira/browse/SPARK-5285
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.2.0
Reporter: wangfei

Removed GroupExpression in catalyst
[jira] [Resolved] (SPARK-5518) Error messages for plans with invalid AttributeReferences
[ https://issues.apache.org/jira/browse/SPARK-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5518.
Resolution: Fixed
Fix Version/s: 1.3.0

Error messages for plans with invalid AttributeReferences
Key: SPARK-5518
URL: https://issues.apache.org/jira/browse/SPARK-5518
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Blocker
Fix For: 1.3.0

It is now possible for users to put invalid attribute references into query plans. We should check for this case at the end of analysis.
[jira] [Commented] (SPARK-5518) Error messages for plans with invalid AttributeReferences
[ https://issues.apache.org/jira/browse/SPARK-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319718#comment-14319718 ] Michael Armbrust commented on SPARK-5518:

https://github.com/apache/spark/pull/4558

Error messages for plans with invalid AttributeReferences
Key: SPARK-5518
URL: https://issues.apache.org/jira/browse/SPARK-5518
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Blocker
Fix For: 1.3.0

It is now possible for users to put invalid attribute references into query plans. We should check for this case at the end of analysis.
[jira] [Comment Edited] (SPARK-5265) Submitting applications on Standalone cluster controlled by Zookeeper forces to know active master
[ https://issues.apache.org/jira/browse/SPARK-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319965#comment-14319965 ] Wojciech Pituła edited comment on SPARK-5265 at 2/13/15 11:24 AM:

We have the same issue. Such a master URL works fine with --deploy-mode client but breaks with --deploy-mode cluster.

was (Author: krever): We have the same issue. Such master url works fine with --deploy-mode client but breaks with --deploy-mode cluster.

Submitting applications on Standalone cluster controlled by Zookeeper forces to know active master
Key: SPARK-5265
URL: https://issues.apache.org/jira/browse/SPARK-5265
Project: Spark
Issue Type: Bug
Components: Deploy
Reporter: Roque Vassal'lo
Labels: cluster, spark-submit, standalone, zookeeper

Hi, this is my first JIRA here, so I hope it is clear enough. I'm using Spark 1.2.0 and trying to submit an application on a Spark Standalone cluster in cluster deploy mode with supervise. The Standalone cluster is running in high availability mode, using Zookeeper to provide leader election between the three available Masters (named master1, master2 and master3).

As read in Spark's documentation, to register a Worker with the Standalone cluster, I provide the complete cluster info as the spark route, I mean spark://master1:7077,master2:7077,master3:7077, and that route is parsed so that three attempts are launched: the first one to master1:7077, the second one to master2:7077 and the third one to master3:7077. This works great!

But if I try to do the same while submitting applications, it fails. I mean, if I provide the complete cluster info as the --master option to the spark-submit script, it throws an exception because it tries to connect as if it were a single node.

Example:

spark-submit --class org.apache.spark.examples.SparkPi --master spark://master1:7077,master2:7077,master3:7077 --deploy-mode cluster --supervise examples.jar 100

This is the output I got:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/01/14 17:02:11 INFO SecurityManager: Changing view acls to: mytest
15/01/14 17:02:11 INFO SecurityManager: Changing modify acls to: mytest
15/01/14 17:02:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mytest); users with modify permissions: Set(mytest)
15/01/14 17:02:11 INFO Slf4jLogger: Slf4jLogger started
15/01/14 17:02:11 INFO Utils: Successfully started service 'driverClient' on port 53930.
15/01/14 17:02:11 ERROR OneForOneStrategy: Invalid master URL: spark://master1:7077,master2:7077,master3:7077
akka.actor.ActorInitializationException: exception during creation
	at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
	at akka.actor.ActorCell.create(ActorCell.scala:596)
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.spark.SparkException: Invalid master URL: spark://master1:7077,master2:7077,master3:7077
	at org.apache.spark.deploy.master.Master$.toAkkaUrl(Master.scala:830)
	at org.apache.spark.deploy.ClientActor.preStart(Client.scala:42)
	at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
	at org.apache.spark.deploy.ClientActor.aroundPreStart(Client.scala:35)
	at akka.actor.ActorCell.create(ActorCell.scala:580)
	... 9 more

Shouldn't it be parsed as on Worker registration? That would not force the client to know which is the current active Master of the Standalone cluster.
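The fix the reporter asks for amounts to a small parsing step before connecting. A minimal hypothetical sketch (this is not Spark's actual Master.toAkkaUrl code): split the comma-separated spark:// URL into one URL per master and try each in turn, the way Worker registration already treats it.

```java
import java.util.ArrayList;
import java.util.List;

public class MasterUrls {
    // Split "spark://h1:p1,h2:p2,..." into one "spark://host:port" URL per master.
    static List<String> split(String masterUrl) {
        String prefix = "spark://";
        if (!masterUrl.startsWith(prefix)) {
            throw new IllegalArgumentException("Invalid master URL: " + masterUrl);
        }
        List<String> urls = new ArrayList<>();
        for (String hostPort : masterUrl.substring(prefix.length()).split(",")) {
            urls.add(prefix + hostPort); // the client can then attempt each master
        }
        return urls;
    }
}
```

With this shape the submit client would attempt master1, then master2, then master3, and would not need to know which one is currently the active leader.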
[jira] [Updated] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Littlestar updated SPARK-5795:
Attachment: TestStreamCompile.java

My test case on Java 1.7 and Spark 1.3 trunk. Thanks.

api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
Key: SPARK-5795
URL: https://issues.apache.org/jira/browse/SPARK-5795
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.1
Reporter: Littlestar
Priority: Minor
Attachments: TestStreamCompile.java

import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

The following code can't compile in Java:

JavaPairDStream<Integer, Integer> rs = ...
rs.saveAsNewAPIHadoopFiles(prefix, "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf);

but similar code with JavaPairRDD works OK:

JavaPairRDD<String, String> counts = ...
counts.saveAsNewAPIHadoopFile(out, Text.class, Text.class, TextOutputFormat.class, jobConf);

Maybe change

def saveAsNewAPIHadoopFiles(
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}

to

def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
    prefix: String,
    suffix: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[F],
    conf: Configuration = new Configuration) {
  dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf)
}
[jira] [Commented] (SPARK-4766) ML Estimator Params should subclass Transformer Params
[ https://issues.apache.org/jira/browse/SPARK-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320038#comment-14320038 ] Peter Rudenko commented on SPARK-4766: -- Very important feature that could make pretty big speedup. Let me explain why. I have a pipeline with 4 transformers and 1 estimator model (LogisticRegression) with 3 folds for cross validation and 3 hyper parameters in grid search: {code} val paramGrid = new ParamGridBuilder() .addGrid(model.regParam, Array(0.1, 0.01, 0.001)) .build() crossval.setEstimatorParamMaps(paramGrid) crossval.setNumFolds(3) {code} Transformers don't have any parameters in grid search. Right now for every possible combination of hyperparam + crossvalidation fold it transforms a data (with the same transformers) thus creating new RDD with a new ID, but the same data. Thus i cannot cache it. What i come with is to use 2 pipelines: # Transformer pipeline - transforming once whole data # Model pipeline with just a model in it. I modified [Pipeline|https://issues.apache.org/jira/browse/SPARK-5796] and LogisticRegression class (commented instances.unpersist() because the same instances would be for each hyperparameter). This reduced the time of LogisticRegression Pipeline significantly. But would be cool to do it in Pipeline: if there's no parameters for Transformer stages - just construct a data once and for each hyperparameter in estimator pass the same data. Thus for 3 folds it would read and cache data 3 times ((1 to 3).combination(2)) and wouldn't depend on number of Hyperparameters to estimator (now it's doing 9 times 3 folds combination * 3 model parameters). ML Estimator Params should subclass Transformer Params -- Key: SPARK-4766 URL: https://issues.apache.org/jira/browse/SPARK-4766 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 1.2.0 Reporter: Joseph K. Bradley Currently, in spark.ml, both Transformers and Estimators extend the same Params classes. 
There should be one Params class for the Transformer and one for the Estimator, where the Estimator params class extends the Transformer one. E.g., it is weird to be able to do: {code} val model: LogisticRegressionModel = ... model.getMaxIter() {code} (This is the only case where this happens currently, but it is worth setting a precedent.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
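The proposed split can be sketched in plain Python (illustrative class and method names, not Spark's actual API): fit-only settings such as maxIter live in the Estimator params class, which extends the Transformer (model) params class, so a fitted model no longer exposes them.

```python
class LogisticRegressionModelParams:
    """Params meaningful on the fitted model (a Transformer)."""
    def __init__(self):
        self.threshold = 0.5

    def get_threshold(self):
        return self.threshold


class LogisticRegressionParams(LogisticRegressionModelParams):
    """Estimator params extend the model params, adding fit-only settings."""
    def __init__(self):
        super().__init__()
        self.max_iter = 100

    def get_max_iter(self):
        return self.max_iter


estimator_params = LogisticRegressionParams()
model_params = LogisticRegressionModelParams()
assert estimator_params.get_max_iter() == 100       # fit-time setting is available
assert estimator_params.get_threshold() == 0.5      # shared param is inherited
assert not hasattr(model_params, "get_max_iter")    # the model no longer exposes maxIter
```

With this hierarchy, `model.getMaxIter()` from the description would be a compile-time (or attribute) error rather than a confusing no-op.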
[jira] [Updated] (SPARK-4267) Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later
[ https://issues.apache.org/jira/browse/SPARK-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4267: - Target Version/s: (was: 1.3.0) Fix Version/s: 1.2.2 Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later -- Key: SPARK-4267 URL: https://issues.apache.org/jira/browse/SPARK-4267 Project: Spark Issue Type: Bug Components: YARN Reporter: Tsuyoshi OZAWA Assignee: Sean Owen Priority: Blocker Fix For: 1.3.0, 1.2.2 Currently we're trying Spark on YARN included in Hadoop 2.5.1. Hadoop 2.5 uses protobuf 2.5.0, so I compiled with protobuf 2.5.0 like this: {code} ./make-distribution.sh --name spark-1.1.1 --tgz -Pyarn -Dhadoop.version=2.5.1 -Dprotobuf.version=2.5.0 {code} Then Spark on YARN fails to launch jobs with an NPE. {code} $ bin/spark-shell --master yarn-client scala> sc.textFile("hdfs:///user/ozawa/wordcountInput20G").flatMap(line => line.split(" ")).map(word => (word, 1)).persist().reduceByKey((a, b) => a + b, 16).saveAsTextFile("hdfs:///user/ozawa/sparkWordcountOutNew2"); java.lang.NullPointerException at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1284) at org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1291) at org.apache.spark.SparkContext.textFile$default$2(SparkContext.scala:480) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:13) at $iwC$$iwC$$iwC.<init>(<console>:18) at $iwC$$iwC.<init>(<console>:20) at $iwC.<init>(<console>:22) at <init>(<console>:24) at .<init>(<console>:28) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062) at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320109#comment-14320109 ] Littlestar commented on SPARK-5795: --- Is this the same problem as SPARK-5297? Thanks. api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java - Key: SPARK-5795 URL: https://issues.apache.org/jira/browse/SPARK-5795 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor Attachments: TestStreamCompile.java import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; The following code can't compile in Java: {code} JavaPairDStream<Integer, Integer> rs = ...; rs.saveAsNewAPIHadoopFiles(prefix, "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf); {code} but similar code with JavaPairRDD works OK: {code} JavaPairRDD<String, String> counts = ...; counts.saveAsNewAPIHadoopFile(out, Text.class, Text.class, TextOutputFormat.class, jobConf); {code} Maybe {code} def saveAsNewAPIHadoopFiles( prefix: String, suffix: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: NewOutputFormat[_, _]], conf: Configuration = new Configuration) { dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf) } {code} should become {code} def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]]( prefix: String, suffix: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[F], conf: Configuration = new Configuration) { dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf) } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5252) Streaming StatefulNetworkWordCount example hangs
[ https://issues.apache.org/jira/browse/SPARK-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5252: - Component/s: PySpark Examples Looks like you have an environment problem: {code} java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set. {code} Can you resolve this and then see if you have this problem? Streaming StatefulNetworkWordCount example hangs Key: SPARK-5252 URL: https://issues.apache.org/jira/browse/SPARK-5252 Project: Spark Issue Type: Bug Components: Examples, PySpark, Streaming Affects Versions: 1.2.0 Environment: Ubuntu Linux Reporter: Lutz Buech Attachments: debug.txt Running the stateful network word count example in Python (on one local node): https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/stateful_network_wordcount.py At the beginning, when no data is streamed, empty status outputs are generated, only decorated by the current Time, e.g.: --- Time: 2015-01-14 17:58:20 --- --- Time: 2015-01-14 17:58:21 --- As soon as I stream some data via netcat, no new status updates will show. Instead, one line saying [Stage number: (2 + 0) / 3] where number is some integer number, e.g. 132. There is no further output on stdout. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
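The state-update semantics of that stateful wordcount example can be sketched in plain Python (illustrative function, not the pyspark API itself): updateStateByKey calls a user function with the values that arrived for a key in the current batch and that key's previous state, which is None the first time the key is seen.

```python
def update_count(new_values, last_sum):
    # updateStateByKey-style reducer: fold newly arrived counts into the
    # running total for a key; last_sum is None the first time a key is seen.
    return sum(new_values) + (last_sum or 0)

# Simulate three batches of counts for one word:
state = None
for batch in ([1, 1], [], [1]):
    state = update_count(batch, state)
print(state)  # -> 3, the running count after three batches
```

When no data arrives, the update function still runs (with an empty value list), which is why the example prints empty per-batch status blocks before any input is streamed.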
[jira] [Resolved] (SPARK-5756) Analyzer should not throw scala.NotImplementedError for illegitimate sql
[ https://issues.apache.org/jira/browse/SPARK-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangfei resolved SPARK-5756. Resolution: Fixed Analyzer should not throw scala.NotImplementedError for illegitimate sql Key: SPARK-5756 URL: https://issues.apache.org/jira/browse/SPARK-5756 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: wangfei {code}SELECT CAST(x AS STRING) FROM src{code} throws a NotImplementedError: CliDriver: scala.NotImplementedError: an implementation is missing at scala.Predef$.$qmark$qmark$qmark(Predef.scala:252) at org.apache.spark.sql.catalyst.expressions.PrettyAttribute.dataType(namedExpressions.scala:221) at org.apache.spark.sql.catalyst.expressions.Cast.resolved$lzycompute(Cast.scala:30) at org.apache.spark.sql.catalyst.expressions.Cast.resolved(Cast.scala:30) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:68) at org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:68) at scala.collection.LinearSeqOptimized$class.exists(LinearSeqOptimized.scala:80) at scala.collection.immutable.List.exists(List.scala:84) at org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:68) at org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:56) at org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:56) at org.apache.spark.sql.catalyst.expressions.NamedExpression.typeSuffix(namedExpressions.scala:62) at org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:124) at org.apache.spark.sql.catalyst.expressions.Expression.prettyString(Expression.scala:78) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1$$anonfun$7.apply(Analyzer.scala:83) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1$$anonfun$7.apply(Analyzer.scala:83) at 
scala.collection.immutable.Stream.map(Stream.scala:376) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:81) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:204) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:81) at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:79) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5795) api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java
[ https://issues.apache.org/jira/browse/SPARK-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320083#comment-14320083 ] Littlestar commented on SPARK-5795: --- Error info: {code} The method saveAsNewAPIHadoopFiles(String, String, Class<?>, Class<?>, Class<? extends OutputFormat<?,?>>) in the type JavaPairDStream<Integer,Integer> is not applicable for the arguments (String, String, Class<Integer>, Class<Integer>, Class<TextOutputFormat>) {code} api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not friendly to java - Key: SPARK-5795 URL: https://issues.apache.org/jira/browse/SPARK-5795 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.1 Reporter: Littlestar Priority: Minor import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; The following code can't compile in Java: {code} JavaPairDStream<Integer, Integer> rs = ...; rs.saveAsNewAPIHadoopFiles(prefix, "txt", Integer.class, Integer.class, TextOutputFormat.class, jobConf); {code} but similar code with JavaPairRDD works OK: {code} JavaPairRDD<String, String> counts = ...; counts.saveAsNewAPIHadoopFile(out, Text.class, Text.class, TextOutputFormat.class, jobConf); {code} Maybe {code} def saveAsNewAPIHadoopFiles( prefix: String, suffix: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: NewOutputFormat[_, _]], conf: Configuration = new Configuration) { dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf) } {code} should become {code} def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]]( prefix: String, suffix: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[F], conf: Configuration = new Configuration) { dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf) } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5799) Compute aggregation function on specified numeric columns
[ https://issues.apache.org/jira/browse/SPARK-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320352#comment-14320352 ] Apache Spark commented on SPARK-5799: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/4592 Compute aggregation function on specified numeric columns - Key: SPARK-5799 URL: https://issues.apache.org/jira/browse/SPARK-5799 Project: Spark Issue Type: Improvement Components: SQL Reporter: Liang-Chi Hsieh Priority: Minor Compute aggregation function on specified numeric columns. For example: {code} val df = Seq(("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d")).toDataFrame("key", "value1", "value2", "rest") df.groupBy("key").min("value2") {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5799) Compute aggregation function on specified numeric columns
Liang-Chi Hsieh created SPARK-5799: -- Summary: Compute aggregation function on specified numeric columns Key: SPARK-5799 URL: https://issues.apache.org/jira/browse/SPARK-5799 Project: Spark Issue Type: Improvement Components: SQL Reporter: Liang-Chi Hsieh Priority: Minor Compute aggregation function on specified numeric columns. For example: {code} val df = Seq(("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d")).toDataFrame("key", "value1", "value2", "rest") df.groupBy("key").min("value2") {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
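The intended semantics (aggregate only the named numeric column, per group, ignoring the other columns) can be illustrated in plain Python:

```python
rows = [("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d")]  # (key, value1, value2, rest)

def group_min(rows, key_idx, value_idx):
    # Group rows by the key column and take the minimum of one chosen
    # numeric column -- what df.groupBy("key").min("value2") computes.
    result = {}
    for row in rows:
        key, value = row[key_idx], row[value_idx]
        result[key] = value if key not in result else min(result[key], value)
    return result

print(group_min(rows, key_idx=0, value_idx=2))  # -> {'a': 0, 'b': 4}
```

Passing a different column index (e.g. value1) aggregates that column instead, which is the point of letting the caller name the columns.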
[jira] [Created] (SPARK-5798) Spark shell issue
DeepakVohra created SPARK-5798: -- Summary: Spark shell issue Key: SPARK-5798 URL: https://issues.apache.org/jira/browse/SPARK-5798 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 1.2.0 Environment: Spark 1.2 Scala 2.10.4 Reporter: DeepakVohra The Spark shell terminates when Spark code is run, indicating an issue with the Spark shell. The error comes from line 48 of the spark-shell script /apachespark/spark-1.2.0-bin-cdh4/bin/spark-shell: {code} "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "${SUBMISSION_OPTS[@]}" spark-shell "${APPLICATION_OPTS[@]}" {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5782) Python Worker / Pyspark Daemon Memory Issue
[ https://issues.apache.org/jira/browse/SPARK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320309#comment-14320309 ] Mark Khaitman commented on SPARK-5782: -- Would it make sense to instead make _next_limit return the MIN of the 2 values as opposed to the MAX? Python Worker / Pyspark Daemon Memory Issue --- Key: SPARK-5782 URL: https://issues.apache.org/jira/browse/SPARK-5782 Project: Spark Issue Type: Bug Components: PySpark, Shuffle Affects Versions: 1.3.0, 1.2.1, 1.2.2 Environment: CentOS 7, Spark Standalone Reporter: Mark Khaitman I'm including the Shuffle component on this, as a brief scan through the code (which I'm not 100% familiar with just yet) shows a large amount of memory handling in it. It appears that any type of join between two RDDs spawns twice as many pyspark.daemon workers compared to the default 1 task / 1 core configuration in our environment. This can become problematic when you build up a tree of RDD joins, since the pyspark.daemons do not cease to exist until the top-level join is completed (or so it seems)... This can lead to memory exhaustion by a single framework, even though it is set to have a 512MB Python worker memory limit and a few gigs of executor memory. Another related issue is that the individual Python workers are not supposed to exceed 512MB by much; beyond that, they're supposed to spill to disk. Some of our Python workers are somehow reaching 2GB each (which, when multiplied by the number of cores per executor * the number of joins occurring in some cases), causing the Out-of-Memory killer to step up to its unfortunate job! :( I think with the _next_limit method in shuffle.py, if the current memory usage is close to the memory limit, the 1.05 multiplier can endlessly cause more memory to be consumed by a single Python worker, since the max of (512 vs 511 * 1.05) would end up blowing up towards the latter of the two... 
Shouldn't the memory limit be the absolute cap in this case? I've only just started looking into the code, and would definitely love to contribute towards Spark, though I figured it might be quicker to resolve if someone already owns the code! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
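The runaway-limit concern can be sketched in plain Python (illustrative numbers and function names, not the actual shuffle.py code): growing the limit to max(used * 1.05, limit) whenever usage nears the cap means the cap itself keeps ratcheting upward, whereas min() would hold it at the configured value.

```python
MEMORY_LIMIT = 512  # MB, the configured worker cap

def next_limit_max(used):
    # Sketch of the reported behaviour: the limit ratchets upward with usage.
    return max(used * 1.05, MEMORY_LIMIT)

def next_limit_min(used):
    # Proposed alternative: never hand out more than the configured cap.
    return min(used * 1.05, MEMORY_LIMIT)

used = 511.0
for _ in range(50):          # worker repeatedly fills up to its current limit
    used = next_limit_max(used)
print(used > 2048)           # -> True: the max() variant blows far past the cap

used = 511.0
for _ in range(50):
    used = min(next_limit_min(used), MEMORY_LIMIT)
print(used <= MEMORY_LIMIT)  # -> True: the min() variant stays capped
```

This matches the "(512 vs 511 * 1.05)" observation above: once usage is near the limit, each round hands the worker roughly 5% more headroom with no upper bound.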
[jira] [Updated] (SPARK-5782) Python Worker / Pyspark Daemon Memory Issue
[ https://issues.apache.org/jira/browse/SPARK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Khaitman updated SPARK-5782: - Priority: Critical (was: Major) Python Worker / Pyspark Daemon Memory Issue --- Key: SPARK-5782 URL: https://issues.apache.org/jira/browse/SPARK-5782 Project: Spark Issue Type: Bug Components: PySpark, Shuffle Affects Versions: 1.3.0, 1.2.1, 1.2.2 Environment: CentOS 7, Spark Standalone Reporter: Mark Khaitman Priority: Critical I'm including the Shuffle component on this, as a brief scan through the code (which I'm not 100% familiar with just yet) shows a large amount of memory handling in it: It appears that any type of join between two RDDs spawns up twice as many pyspark.daemon workers compared to the default 1 task - 1 core configuration in our environment. This can become problematic in the cases where you build up a tree of RDD joins, since the pyspark.daemons do not cease to exist until the top level join is completed (or so it seems)... This can lead to memory exhaustion by a single framework, even though is set to have a 512MB python worker memory limit and few gigs of executor memory. Another related issue to this is that the individual python workers are not supposed to even exceed that far beyond 512MB, otherwise they're supposed to spill to disk. Some of our python workers are somehow reaching 2GB each (which when multiplied by the number of cores per executor * the number of joins occurring in some cases), causing the Out-of-Memory killer to step up to its unfortunate job! :( I think with the _next_limit method in shuffle.py, if the current memory usage is close to the memory limit, then a 1.05 multiplier can endlessly cause more memory to be consumed by the single python worker, since the max of (512 vs 511 * 1.05) would end up blowing up towards the latter of the two... Shouldn't the memory limit be the absolute cap in this case? 
I've only just started looking into the code, and would definitely love to contribute towards Spark, though I figured it might be quicker to resolve if someone already owns the code! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5726) Hadamard Vector Product Transformer
[ https://issues.apache.org/jira/browse/SPARK-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320514#comment-14320514 ] Octavian Geagla commented on SPARK-5726: Ok, I've made the change on the PR. Thanks, Sean! Hadamard Vector Product Transformer --- Key: SPARK-5726 URL: https://issues.apache.org/jira/browse/SPARK-5726 Project: Spark Issue Type: Improvement Components: ML, MLlib Reporter: Octavian Geagla Assignee: Octavian Geagla I originally posted my idea here: http://apache-spark-developers-list.1001551.n3.nabble.com/Any-interest-in-weighting-VectorTransformer-which-does-component-wise-scaling-td10265.html A draft of this feature is implemented, documented, and tested already. Code is on a branch on my fork here: https://github.com/ogeagla/spark/compare/spark-mllib-weighting I'm curious if there is any interest in this feature, in which case I'd appreciate some feedback. One thing that might be useful is an example/test case using the transformer within the ML pipeline, since there are not any examples which use Vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5345) Fix unstable test case in FsHistoryProviderSuite
[ https://issues.apache.org/jira/browse/SPARK-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-5345. --- Resolution: Fixed It looks like this has been fixed by SPARK-5600, so I'm going to resolve this for now. Let's re-open if the test becomes flaky again. Fix unstable test case in FsHistoryProviderSuite Key: SPARK-5345 URL: https://issues.apache.org/jira/browse/SPARK-5345 Project: Spark Issue Type: Bug Components: Deploy, Web UI Affects Versions: 1.3.0 Reporter: Kousuke Saruta Labels: flaky-test In FsHistoryProviderSuite, the test "Parse new and old application logs" sometimes fails and sometimes succeeds. It's unstable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5735) Replace uses of EasyMock with Mockito
[ https://issues.apache.org/jira/browse/SPARK-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5735. Resolution: Fixed Fix Version/s: 1.3.0 Replace uses of EasyMock with Mockito - Key: SPARK-5735 URL: https://issues.apache.org/jira/browse/SPARK-5735 Project: Spark Issue Type: Improvement Components: Tests Reporter: Patrick Wendell Assignee: Josh Rosen Fix For: 1.3.0 There are a few reasons we should drop EasyMock. First, we should have a single mocking framework in our tests in general to keep things consistent. Second, EasyMock has caused us some dependency pain in our tests due to objenesis. We aren't totally sure but suspect such conflicts might be causing non deterministic test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5802) Cache scaled data in GLM
Xiangrui Meng created SPARK-5802: Summary: Cache scaled data in GLM Key: SPARK-5802 URL: https://issues.apache.org/jira/browse/SPARK-5802 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng If we modify the input data (to append bias or to scale features), we should cache the output to avoid recomputing transformed vectors each time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader
[ https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320504#comment-14320504 ] Marcelo Vanzin commented on SPARK-5770: --- bq. but the classloader still load the old one. Could you clarify what that means? Due to the way class loading works, if you reference a class that has already been loaded, you won't get the new one, but the one already loaded. Which is one reason why this "addJar() can overwrite existing jars" functionality is a little sketchy. Use addJar() to upload a new jar file to executor, it can't be added to classloader --- Key: SPARK-5770 URL: https://issues.apache.org/jira/browse/SPARK-5770 Project: Spark Issue Type: Bug Components: Spark Core Reporter: meiyoula First use addJar() to upload a jar to the executor, then change the jar content and upload it again. We can see that the jar file on the local disk has been updated, but the classloader still loads the old one. The executor log has no error or exception pointing to it. I used spark-shell to test this, with spark.files.overwrite set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5785) Pyspark does not support narrow dependencies
[ https://issues.apache.org/jira/browse/SPARK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-5785: Description: joins ( cogroups etc.) are always considered to have wide dependencies in pyspark, they are never narrow. This can cause unnecessary shuffles. eg., this simple job should shuffle rddA rddB once each, but it also will do a third shuffle of the unioned data: {code} rddA = sc.parallelize(range(100)).map(lambda x: (x,x)).partitionBy(64) rddB = sc.parallelize(range(100)).map(lambda x: (x,x)).partitionBy(64) joined = rddA.join(rddB) joined.count() rddA._partitionFunc == rddB._partitionFunc True {code} (Or the docs should somewhere explain that this feature is missing from pyspark.) was: joins ( cogroups etc.) are always considered to have wide dependencies in pyspark, they are never narrow. This can cause unnecessary shuffles. eg., this simple job should shuffle rddA rddB once each, but it also will do a third shuffle of the unioned data: {code} rddA = sc.parallelize(range(100)).map(lambda x: (x,x)).partitionBy(64) rddB = sc.parallelize(range(100)).map(lambda x: (x,x)).partitionBy(64) joined = rddA.join(rddB) joined.count() rddA._partitionFunc == rddB._partitionFunc True {code} (Or the docs should somewhere explain that this feature is missing from spark.) Pyspark does not support narrow dependencies Key: SPARK-5785 URL: https://issues.apache.org/jira/browse/SPARK-5785 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Imran Rashid joins ( cogroups etc.) are always considered to have wide dependencies in pyspark, they are never narrow. This can cause unnecessary shuffles. 
eg., this simple job should shuffle rddA rddB once each, but it also will do a third shuffle of the unioned data: {code} rddA = sc.parallelize(range(100)).map(lambda x: (x,x)).partitionBy(64) rddB = sc.parallelize(range(100)).map(lambda x: (x,x)).partitionBy(64) joined = rddA.join(rddB) joined.count() rddA._partitionFunc == rddB._partitionFunc True {code} (Or the docs should somewhere explain that this feature is missing from pyspark.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5801) Shuffle creates too many nested directories
[ https://issues.apache.org/jira/browse/SPARK-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-5801: -- Component/s: Shuffle Shuffle creates too many nested directories --- Key: SPARK-5801 URL: https://issues.apache.org/jira/browse/SPARK-5801 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 1.2.1 Reporter: Kay Ousterhout When running Spark on EC2, there are 4 nested shuffle directories before the hashed directory names, for example: /mnt/spark/spark-5824d912-25af-4187-bc6a-29ae42cd78e5/spark-675133f0-b2c8-44a1-8775-5e394674609b/spark-69c1ea15-4e7f-454a-9f57-19763c7bdd17/spark-b036335c-60fa-48ab-a346-f1b420af2027/0c My understanding is that this should look like: /mnt/spark/spark-5824d912-25af-4187-bc6a-29ae42cd78e5/0c This happened when I was using the sort-based shuffle (all default configurations for Spark on EC2). This is not a correctness problem (the shuffle still works fine). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4903) RDD remains cached after DROP TABLE
[ https://issues.apache.org/jira/browse/SPARK-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320439#comment-14320439 ] Yin Huai commented on SPARK-4903: - I believe that it has been resolved in 1.3 ([see this|https://github.com/apache/spark/blob/v1.3.0-snapshot1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala#L61]). I tried the following snippet in "build/sbt -Phive sparkShell" and verified the cached RDD was unpersisted after I dropped the table. {code} sqlContext.jsonRDD(sc.parallelize("""{"a":1}""" :: Nil)).registerTempTable("test") sqlContext.sql("create table jt as select a from test") sqlContext.sql("cache table jt").collect sqlContext.sql("select * from jt").collect sqlContext.sql("drop table jt").collect {code} RDD remains cached after DROP TABLE - Key: SPARK-4903 URL: https://issues.apache.org/jira/browse/SPARK-4903 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Environment: Spark master @ Dec 17 (3cd516191baadf8496ccdae499771020e89acd7e) Reporter: Evert Lammerts Priority: Critical In beeline, when I run: {code:sql} CREATE TABLE test AS select col from table; CACHE TABLE test; DROP TABLE test; {code} The table is removed but the RDD is still cached. Running UNCACHE is not possible anymore (table not found in the metastore). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5732) Add an option to print the spark version in spark script
[ https://issues.apache.org/jira/browse/SPARK-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-5732. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: uncleGen Add an option to print the spark version in spark script Key: SPARK-5732 URL: https://issues.apache.org/jira/browse/SPARK-5732 Project: Spark Issue Type: Improvement Components: Spark Submit Affects Versions: 1.0.0 Reporter: uncleGen Assignee: uncleGen Priority: Minor Fix For: 1.3.0 Naturally, we may need to add an option to print the spark version in spark script. It is pretty common in many script tools -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-5782) Python Worker / Pyspark Daemon Memory Issue
[ https://issues.apache.org/jira/browse/SPARK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Khaitman updated SPARK-5782: - Comment: was deleted (was: Would it make sense to instead make the _next_limit return the MIN of the 2 values as opposed to the MAX?) Python Worker / Pyspark Daemon Memory Issue --- Key: SPARK-5782 URL: https://issues.apache.org/jira/browse/SPARK-5782 Project: Spark Issue Type: Bug Components: PySpark, Shuffle Affects Versions: 1.3.0, 1.2.1, 1.2.2 Environment: CentOS 7, Spark Standalone Reporter: Mark Khaitman I'm including the Shuffle component on this, as a brief scan through the code (which I'm not 100% familiar with just yet) shows a large amount of memory handling in it: It appears that any type of join between two RDDs spawns up twice as many pyspark.daemon workers compared to the default 1 task - 1 core configuration in our environment. This can become problematic in the cases where you build up a tree of RDD joins, since the pyspark.daemons do not cease to exist until the top level join is completed (or so it seems)... This can lead to memory exhaustion by a single framework, even though is set to have a 512MB python worker memory limit and few gigs of executor memory. Another related issue to this is that the individual python workers are not supposed to even exceed that far beyond 512MB, otherwise they're supposed to spill to disk. Some of our python workers are somehow reaching 2GB each (which when multiplied by the number of cores per executor * the number of joins occurring in some cases), causing the Out-of-Memory killer to step up to its unfortunate job! :( I think with the _next_limit method in shuffle.py, if the current memory usage is close to the memory limit, then a 1.05 multiplier can endlessly cause more memory to be consumed by the single python worker, since the max of (512 vs 511 * 1.05) would end up blowing up towards the latter of the two... 
Shouldn't the memory limit be the absolute cap in this case? I've only just started looking into the code, and would definitely love to contribute towards Spark, though I figured it might be quicker to resolve if someone already owns the code! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5529) Executor is still hold while BlockManager has been removed
[ https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5529: - Component/s: YARN Executor is still hold while BlockManager has been removed -- Key: SPARK-5529 URL: https://issues.apache.org/jira/browse/SPARK-5529 Project: Spark Issue Type: Bug Components: Spark Core, YARN Affects Versions: 1.2.0 Reporter: Hong Shen Attachments: SPARK-5529.patch When I run a Spark job, one executor hangs; after 120s its BlockManager is removed by the driver, but it takes another half hour before the executor itself is removed by the driver. Here is the log: {code} 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms exceeds 12ms 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 10.215.143.14: remote Akka client disassociated 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 10.215.143.14): ExecutorLostFailure (executor 1 lost) 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove non-existent executor 1 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0) 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 from BlockManagerMaster. 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5529) BlockManager heartbeat expiration does not kill executor
[ https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5529: - Assignee: Hong Shen BlockManager heartbeat expiration does not kill executor Key: SPARK-5529 URL: https://issues.apache.org/jira/browse/SPARK-5529 Project: Spark Issue Type: Bug Components: Spark Core, YARN Affects Versions: 1.2.0 Reporter: Hong Shen Assignee: Hong Shen Attachments: SPARK-5529.patch When I run a Spark job, one executor hangs; after 120s its BlockManager is removed by the driver, but it takes another half hour before the executor itself is removed by the driver. Here is the log: {code} 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms exceeds 12ms 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 10.215.143.14: remote Akka client disassociated 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 10.215.143.14): ExecutorLostFailure (executor 1 lost) 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove non-existent executor 1 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0) 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 from BlockManagerMaster. 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5296) Predicate Pushdown (BaseRelation) to have an interface that will accept OR filters
[ https://issues.apache.org/jira/browse/SPARK-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320499#comment-14320499 ] Michael Armbrust commented on SPARK-5296: - Oh, good point... We should pass down nested ANDs Predicate Pushdown (BaseRelation) to have an interface that will accept OR filters -- Key: SPARK-5296 URL: https://issues.apache.org/jira/browse/SPARK-5296 Project: Spark Issue Type: Improvement Components: SQL Reporter: Corey J. Nolet Assignee: Cheng Lian Priority: Critical Currently, the BaseRelation API allows a FilteredRelation to handle an Array[Filter] which represents filter expressions that are applied as an AND operator. We should support OR operations in a BaseRelation as well. I'm not sure what this would look like in terms of API changes, but it almost seems like a FilteredUnionedScan BaseRelation (the name stinks but you get the idea) would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
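The discussion above hinges on the fact that the `Array[Filter]` handed to a filtered scan is implicitly a conjunction: nested ANDs can be flattened into it, but an OR cannot be split and must be pushed down as a whole subtree. The sketch below illustrates that distinction with toy Python classes; these are stand-ins for, not copies of, Spark's `sources.Filter` hierarchy.

```python
# Toy filter tree (hypothetical classes, modeled loosely on Spark's
# data source Filter API) showing why the flat Array[Filter] contract
# is AND-only.

class And:
    def __init__(self, left, right):
        self.left, self.right = left, right

class Or:
    def __init__(self, left, right):
        self.left, self.right = left, right

class EqualTo:
    def __init__(self, attr, value):
        self.attr, self.value = attr, value


def split_conjuncts(f):
    """Flatten nested ANDs into the flat list a filtered scan receives.

    An OR cannot be split this way: it is returned intact, so pushing it
    down requires the relation to accept the whole subtree."""
    if isinstance(f, And):
        return split_conjuncts(f.left) + split_conjuncts(f.right)
    return [f]


expr = And(EqualTo("a", 1), And(EqualTo("b", 2), EqualTo("c", 3)))
print(len(split_conjuncts(expr)))  # 3: nested ANDs flatten into conjuncts

disj = Or(EqualTo("a", 1), EqualTo("b", 2))
print(len(split_conjuncts(disj)))  # 1: the whole Or stays intact
```

This is why supporting OR pushdown is an API question rather than a flattening tweak: the relation has to receive and evaluate tree-shaped predicates.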
[jira] [Commented] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader
[ https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320510#comment-14320510 ] Sean Owen commented on SPARK-5770: -- Yeah, I think that's the point: overwriting an existing JAR won't cause any classes to be reloaded, so should it be an error? or a warning? Use addJar() to upload a new jar file to executor, it can't be added to classloader --- Key: SPARK-5770 URL: https://issues.apache.org/jira/browse/SPARK-5770 Project: Spark Issue Type: Bug Components: Spark Core Reporter: meiyoula First use addJar() to upload a jar to the executor, then change the jar content and upload it again. We can see that the jar file on local disk has been updated, but the classloader still loads the old one. The executor log shows no error or exception pointing this out. I used spark-shell to test it, with spark.files.overwrite set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5626) Spurious test failures due to NullPointerException in EasyMock test code
[ https://issues.apache.org/jira/browse/SPARK-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-5626. --- Resolution: Fixed Spurious test failures due to NullPointerException in EasyMock test code Key: SPARK-5626 URL: https://issues.apache.org/jira/browse/SPARK-5626 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.3.0 Reporter: Josh Rosen Labels: flaky-test Attachments: consoleText.txt I've seen a few cases where a test failure will trigger a cascade of spurious failures when instantiating test suites that use EasyMock. Here's a sample symptom: {code} [info] CacheManagerSuite: [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.CacheManagerSuite *** ABORTED *** (137 milliseconds) [info] java.lang.NullPointerException: [info] at org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52) [info] at org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90) [info] at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) [info] at org.objenesis.ObjenesisHelper.newInstance(ObjenesisHelper.java:43) [info] at org.easymock.internal.ObjenesisClassInstantiator.newInstance(ObjenesisClassInstantiator.java:26) [info] at org.easymock.internal.ClassProxyFactory.createProxy(ClassProxyFactory.java:219) [info] at org.easymock.internal.MocksControl.createMock(MocksControl.java:59) [info] at org.easymock.EasyMock.createMock(EasyMock.java:103) [info] at org.scalatest.mock.EasyMockSugar$class.mock(EasyMockSugar.scala:267) [info] at org.apache.spark.CacheManagerSuite.mock(CacheManagerSuite.scala:28) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply$mcV$sp(CacheManagerSuite.scala:40) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38) [info] at 
org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:195) [info] at org.apache.spark.CacheManagerSuite.runTest(CacheManagerSuite.scala:28) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.apache.spark.CacheManagerSuite.org$scalatest$BeforeAndAfter$$super$run(CacheManagerSuite.scala:28) [info] at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) [info] at org.apache.spark.CacheManagerSuite.run(CacheManagerSuite.scala:28) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at 
java.util.concurrent.FutureTask.run(FutureTask.java:262) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [info] at java.lang.Thread.run(Thread.java:745) {code} This is from https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26852/consoleFull. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5803) Use ArrayBuilder instead of ArrayBuffer for primitive types
[ https://issues.apache.org/jira/browse/SPARK-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320528#comment-14320528 ] Apache Spark commented on SPARK-5803: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/4594 Use ArrayBuilder instead of ArrayBuffer for primitive types --- Key: SPARK-5803 URL: https://issues.apache.org/jira/browse/SPARK-5803 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng ArrayBuffer is not specialized and hence it boxes primitive-typed values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5798) Spark shell issue
[ https://issues.apache.org/jira/browse/SPARK-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320533#comment-14320533 ] DeepakVohra commented on SPARK-5798: Thanks Sean for testing. Not all Spark/Scala code generates an error in Spark Shell. For example, run all pre-requisite import, var, and method code and subsequently run the following code to test: model(sc, rawUserArtistData, rawArtistData, rawArtistAlias) from: https://github.com/sryza/aas/blob/master/ch03-recommender/src/main/scala/com/cloudera/datascience/recommender/RunRecommender.scala Data files are local to Spark/Scala and not in HDFS. Environment is different: Oracle Linux 6.5, but that shouldn't be a factor. If the preceding test also does not generate an error, I would agree it is some other factor and not a bug. Spark shell issue - Key: SPARK-5798 URL: https://issues.apache.org/jira/browse/SPARK-5798 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 1.2.0 Environment: Spark 1.2 Scala 2.10.4 Reporter: DeepakVohra The Spark shell terminates when Spark code is run, indicating an issue with the Spark shell. The error is coming from the spark shell file /apachespark/spark-1.2.0-bin-cdh4/bin/spark-shell: line 48 $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main ${SUBMISSION_OPTS[@]} spark-shell ${APPLICATION_OPTS[@]} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5503) Example code for Power Iteration Clustering
[ https://issues.apache.org/jira/browse/SPARK-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5503. -- Resolution: Fixed Fix Version/s: 1.3.0 Example code for Power Iteration Clustering --- Key: SPARK-5503 URL: https://issues.apache.org/jira/browse/SPARK-5503 Project: Spark Issue Type: Documentation Components: Documentation, Examples, MLlib Reporter: Xiangrui Meng Assignee: Stephen Boesch Fix For: 1.3.0 There are two places we need to put examples: 1. In the user guide, we should add a small example (as in the unit test). 2. Under examples/, we can have something fancy but still need to keep it minimal. 3. The user guide contains some out-of-date links, which need to be updated as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5801) Shuffle creates too many nested directories
Kay Ousterhout created SPARK-5801: - Summary: Shuffle creates too many nested directories Key: SPARK-5801 URL: https://issues.apache.org/jira/browse/SPARK-5801 Project: Spark Issue Type: Bug Affects Versions: 1.2.1 Reporter: Kay Ousterhout When running Spark on EC2, there are 4 nested shuffle directories before the hashed directory names, for example: /mnt/spark/spark-5824d912-25af-4187-bc6a-29ae42cd78e5/spark-675133f0-b2c8-44a1-8775-5e394674609b/spark-69c1ea15-4e7f-454a-9f57-19763c7bdd17/spark-b036335c-60fa-48ab-a346-f1b420af2027/0c My understanding is that this should look like: /mnt/spark/spark-5824d912-25af-4187-bc6a-29ae42cd78e5/0c This happened when I was using the sort-based shuffle (all default configurations for Spark on EC2). This is not a correctness problem (the shuffle still works fine). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5529) BlockManager heartbeat expiration does not kill executor
[ https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5529: - Summary: BlockManager heartbeat expiration does not kill executor (was: Executor is still hold while BlockManager has been removed) BlockManager heartbeat expiration does not kill executor Key: SPARK-5529 URL: https://issues.apache.org/jira/browse/SPARK-5529 Project: Spark Issue Type: Bug Components: Spark Core, YARN Affects Versions: 1.2.0 Reporter: Hong Shen Attachments: SPARK-5529.patch When I run a Spark job, one executor hangs; after 120s its BlockManager is removed by the driver, but it takes another half hour before the executor itself is removed by the driver. Here is the log: {code} 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms exceeds 12ms 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 10.215.143.14: remote Akka client disassociated 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 10.215.143.14): ExecutorLostFailure (executor 1 lost) 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove non-existent executor 1 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0) 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 from BlockManagerMaster. 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5626) Spurious test failures due to NullPointerException in EasyMock test code
[ https://issues.apache.org/jira/browse/SPARK-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320519#comment-14320519 ] Josh Rosen commented on SPARK-5626: --- This should hopefully be fixed now that I've merged SPARK-5735 to remove EasyMock. I'm going to resolve this issue for now, but let's re-open it if we observe this flakiness again. Spurious test failures due to NullPointerException in EasyMock test code Key: SPARK-5626 URL: https://issues.apache.org/jira/browse/SPARK-5626 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.3.0 Reporter: Josh Rosen Labels: flaky-test Attachments: consoleText.txt I've seen a few cases where a test failure will trigger a cascade of spurious failures when instantiating test suites that use EasyMock. Here's a sample symptom: {code} [info] CacheManagerSuite: [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.CacheManagerSuite *** ABORTED *** (137 milliseconds) [info] java.lang.NullPointerException: [info] at org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52) [info] at org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90) [info] at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) [info] at org.objenesis.ObjenesisHelper.newInstance(ObjenesisHelper.java:43) [info] at org.easymock.internal.ObjenesisClassInstantiator.newInstance(ObjenesisClassInstantiator.java:26) [info] at org.easymock.internal.ClassProxyFactory.createProxy(ClassProxyFactory.java:219) [info] at org.easymock.internal.MocksControl.createMock(MocksControl.java:59) [info] at org.easymock.EasyMock.createMock(EasyMock.java:103) [info] at org.scalatest.mock.EasyMockSugar$class.mock(EasyMockSugar.scala:267) [info] at org.apache.spark.CacheManagerSuite.mock(CacheManagerSuite.scala:28) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply$mcV$sp(CacheManagerSuite.scala:40) [info] at 
org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38) [info] at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:195) [info] at org.apache.spark.CacheManagerSuite.runTest(CacheManagerSuite.scala:28) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.apache.spark.CacheManagerSuite.org$scalatest$BeforeAndAfter$$super$run(CacheManagerSuite.scala:28) [info] at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) [info] at org.apache.spark.CacheManagerSuite.run(CacheManagerSuite.scala:28) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [info] at java.lang.Thread.run(Thread.java:745) {code} This is from https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26852/consoleFull. -- This message was sent by Atlassian JIRA
[jira] [Created] (SPARK-5803) Use ArrayBuilder instead of ArrayBuffer for primitive types
Xiangrui Meng created SPARK-5803: Summary: Use ArrayBuilder instead of ArrayBuffer for primitive types Key: SPARK-5803 URL: https://issues.apache.org/jira/browse/SPARK-5803 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng ArrayBuffer is not specialized and hence it boxes primitive-typed values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5805) Fix the type error in the final example given in MLlib - Clustering documentation
[ https://issues.apache.org/jira/browse/SPARK-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320680#comment-14320680 ] Apache Spark commented on SPARK-5805: - User 'emres' has created a pull request for this issue: https://github.com/apache/spark/pull/4596 Fix the type error in the final example given in MLlib - Clustering documentation - Key: SPARK-5805 URL: https://issues.apache.org/jira/browse/SPARK-5805 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Affects Versions: 1.2.0, 1.2.1 Reporter: Emre Sevinç Priority: Minor Labels: documentation, easyfix, newbie Original Estimate: 1h Remaining Estimate: 1h The final example in [MLlib - Clustering|http://spark.apache.org/docs/1.2.0/mllib-clustering.html] documentation has a code line that leads to a type error. The problematic line reads as: {code} model.predictOnValues(testData).print() {code} but it should be {code} model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5805) Fix the type error in the final example given in MLlib - Clustering documentation
[ https://issues.apache.org/jira/browse/SPARK-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5805: - Assignee: Emre Sevinç Fix the type error in the final example given in MLlib - Clustering documentation - Key: SPARK-5805 URL: https://issues.apache.org/jira/browse/SPARK-5805 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Affects Versions: 1.2.0, 1.2.1 Reporter: Emre Sevinç Assignee: Emre Sevinç Priority: Minor Labels: documentation, easyfix, newbie Original Estimate: 1h Remaining Estimate: 1h The final example in [MLlib - Clustering|http://spark.apache.org/docs/1.2.0/mllib-clustering.html] documentation has a code line that leads to a type error. The problematic line reads as: {code} model.predictOnValues(testData).print() {code} but it should be {code} model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5746) INSERT OVERWRITE throws FileNotFoundException when the source and destination point to the same table.
[ https://issues.apache.org/jira/browse/SPARK-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320562#comment-14320562 ] Yin Huai commented on SPARK-5746: - For now, we will throw an error when we find this case. INSERT OVERWRITE throws FileNotFoundException when the source and destination point to the same table. -- Key: SPARK-5746 URL: https://issues.apache.org/jira/browse/SPARK-5746 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Blocker With the newly introduced write support of the data source API, {{JSONRelation}} and {{ParquetRelation2}} both suffer from this bug. The root cause is that we remove the source table before insertion ([here|https://github.com/apache/spark/blob/1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c/sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala#L112-L121]). The correct solution is to first insert into a temporary folder, and then overwrite the source table. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5806) Organize sections in mllib-clustering.md
Xiangrui Meng created SPARK-5806: Summary: Organize sections in mllib-clustering.md Key: SPARK-5806 URL: https://issues.apache.org/jira/browse/SPARK-5806 Project: Spark Issue Type: Improvement Components: Documentation, MLlib Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5804) Explicitly manage cache in Crossvalidation k-fold loop
Peter Rudenko created SPARK-5804: Summary: Explicitly manage cache in Crossvalidation k-fold loop Key: SPARK-5804 URL: https://issues.apache.org/jira/browse/SPARK-5804 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 1.3.0 Reporter: Peter Rudenko Priority: Minor On a big dataset, explicitly unpersisting the train and validation folds allows more data to be loaded into memory in the next loop iteration. In my environment (single node, 8GB worker RAM, 2GB dataset file, 3 folds for cross validation), this saved more than 5 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
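The loop structure behind this improvement can be sketched as follows. This is a plain-Python illustration of the k-fold pattern, with the persist/unpersist placement shown as comments; it is not Spark's actual CrossValidator code, and the fold-splitting scheme here is a simple modulo split chosen for the example.

```python
# Sketch of the k-fold loop the issue describes: hold each fold's
# train/validation pair only for the duration of one iteration, then
# release it before building the next fold. In Spark, "release" would
# be RDD.unpersist(); here the data is a plain list for illustration.

def k_folds(data, k):
    """Yield (train, validation) splits using a modulo assignment.

    The caller releases each pair before requesting the next one, so
    only one fold pair needs to fit in memory at a time."""
    for i in range(k):
        validation = [x for j, x in enumerate(data) if j % k == i]
        train = [x for j, x in enumerate(data) if j % k != i]
        yield train, validation


data = list(range(9))
for train, validation in k_folds(data, 3):
    # In Spark this is where the explicit cache management would go:
    #   train.cache(); validation.cache()
    #   ... fit the model on train, compute the metric on validation ...
    #   train.unpersist(); validation.unpersist()
    assert len(train) + len(validation) == len(data)
```

The point of the issue is the bracketing: without the final unpersist calls, all k fold pairs stay cached simultaneously and evict each other (or spill), which is what cost the reported 5+ minutes.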
[jira] [Commented] (SPARK-5227) InputOutputMetricsSuite input metrics when reading text file with multiple splits test fails in branch-1.2 SBT Jenkins build w/hadoop1.0 and hadoop2.0 profiles
[ https://issues.apache.org/jira/browse/SPARK-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320653#comment-14320653 ] Josh Rosen commented on SPARK-5227: --- I think this might be caused by HADOOP-8490: the test code might be getting a cached FileSystem instance that was created by an earlier test run, causing the configuration from the earlier test to be re-used here. We could try to completely disable this caching, but this could have a large negative performance impact on Hadoop library code which assumes that FileSystem creation is cheap. I wonder if there's a way that we can clear this cache in between our test runs, which would at least address the test-flakiness issues. InputOutputMetricsSuite input metrics when reading text file with multiple splits test fails in branch-1.2 SBT Jenkins build w/hadoop1.0 and hadoop2.0 profiles - Key: SPARK-5227 URL: https://issues.apache.org/jira/browse/SPARK-5227 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.1 Reporter: Josh Rosen Priority: Blocker Labels: flaky-test The InputOutputMetricsSuite input metrics when reading text file with multiple splits test has been failing consistently in our new {{branch-1.2}} Jenkins SBT build: https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.2-SBT/14/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/junit/org.apache.spark.metrics/InputOutputMetricsSuite/input_metrics_when_reading_text_file_with_multiple_splits/ Here's the error message {code} ArrayBuffer(32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 
32, 32, 32, ... [the remainder of the message is a long run of identical 32-byte read counts; truncated in the archive] {code}
[jira] [Commented] (SPARK-5770) Use addJar() to upload a new jar file to executor, it can't be added to classloader
[ https://issues.apache.org/jira/browse/SPARK-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320552#comment-14320552 ] Marcelo Vanzin commented on SPARK-5770: --- It might be possible to fix the behavior, although even then the results might be sketchy. Basically, when overwriting jars, you'd have to replace the executor's class loader. That means you need to keep track of the jars added to the class loader, and when adding a new jar, you place it in front of the others and use Thread.currentThread().setContextClassLoader() to replace the class loader. But that's after like 5 seconds of thinking, so there may be a lot of corner cases in doing that. I think the best approach would be to say that overwriting jars is not allowed, even if that doesn't cover all cases. You could still add a different jar that tries to override already loaded classes, and that will have the same confusing effect of the old classes still being used. Use addJar() to upload a new jar file to executor, it can't be added to classloader --- Key: SPARK-5770 URL: https://issues.apache.org/jira/browse/SPARK-5770 Project: Spark Issue Type: Bug Components: Spark Core Reporter: meiyoula First use addJar() to upload a jar to the executor, then change the jar content and upload it again. We can see that the local jar file has been updated, but the classloader still loads the old one. The executor log has no error or exception pointing this out. I used spark-shell to test this, with spark.files.overwrite set to true.
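The workaround Marcelo sketches above can be illustrated in plain JDK terms. This is a hypothetical sketch, not Spark's actual executor code; the names JarTracker and addJar are invented for the example:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the approach from the comment above: track every
// jar added so far, put the newest jar first, and swap in a fresh context
// class loader so lookups for new classes hit the updated jar first.
public class JarTracker {
    private final Deque<URL> jarUrls = new ArrayDeque<>();

    // Rebuild the class loader with the new jar ahead of the older ones and
    // install it as the current thread's context class loader.
    public ClassLoader addJar(URL newJar, ClassLoader parent) {
        jarUrls.addFirst(newJar);
        URLClassLoader fresh =
            new URLClassLoader(jarUrls.toArray(new URL[0]), parent);
        Thread.currentThread().setContextClassLoader(fresh);
        return fresh;
    }
}
```

As the comment warns, classes already resolved through the old loader keep their old definitions; only lookups that go through the new context class loader see the updated jar, which is one of the corner cases that makes this sketchy.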
[jira] [Commented] (SPARK-5804) Explicitly manage cache in Crossvalidation k-fold loop
[ https://issues.apache.org/jira/browse/SPARK-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320607#comment-14320607 ] Apache Spark commented on SPARK-5804: - User 'petro-rudenko' has created a pull request for this issue: https://github.com/apache/spark/pull/4595 Explicitly manage cache in Crossvalidation k-fold loop -- Key: SPARK-5804 URL: https://issues.apache.org/jira/browse/SPARK-5804 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 1.3.0 Reporter: Peter Rudenko Priority: Minor On a big dataset, explicitly unpersisting the train and validation folds allows more data to be loaded into memory in the next loop iteration. On my environment (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross-validation), this saved more than 5 minutes.
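The pattern behind this improvement can be sketched without Spark. Fold below is a stub standing in for a cached RDD/DataFrame fold; only the explicit cache()/unpersist() lifecycle per iteration matters here, and all the names are illustrative:

```java
import java.util.List;

// Stand-in for an RDD/DataFrame fold: only the cache lifecycle matters here.
class Fold {
    boolean cached;
    void cache() { cached = true; }
    void unpersist() { cached = false; }
}

public class KFoldLoop {
    // Cache each train/validation pair while it is in use, then free it
    // explicitly before the next iteration so the next fold's data can
    // fit in memory instead of waiting for LRU eviction.
    public static int run(List<Fold[]> folds) {
        int evaluated = 0;
        for (Fold[] pair : folds) {
            Fold train = pair[0], validation = pair[1];
            train.cache();
            validation.cache();
            // ... fit the model on train and evaluate on validation here ...
            evaluated++;
            train.unpersist();
            validation.unpersist();
        }
        return evaluated;
    }
}
```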
[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320874#comment-14320874 ] Chris Love commented on SPARK-3821: --- I notice that the Packer-built AMI comes with Java 7; how would you recommend handling Java 8? Should both be installed? Also, which AWS Linux were the new AMIs built off of? Will this be in a 1.2.x branch or just 1.3? Thanks, Chris Develop an automated way of creating Spark images (AMI, Docker, and others) --- Key: SPARK-3821 URL: https://issues.apache.org/jira/browse/SPARK-3821 Project: Spark Issue Type: Improvement Components: Build, EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Attachments: packer-proposal.html Right now the creation of Spark AMIs or Docker containers is done manually. With tools like [Packer|http://www.packer.io/], we should be able to automate this work, and do so in such a way that multiple types of machine images can be created from a single template.
[jira] [Commented] (SPARK-5779) Python broadcast does not work with Kryo serializer
[ https://issues.apache.org/jira/browse/SPARK-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320897#comment-14320897 ] Davies Liu commented on SPARK-5779: --- Yes, I will close it. Python broadcast does not work with Kryo serializer --- Key: SPARK-5779 URL: https://issues.apache.org/jira/browse/SPARK-5779 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.3.0, 1.2.1 Reporter: Davies Liu Priority: Critical The PythonBroadcast introduced in 1.2 cannot be serialized by Kryo.
[jira] [Commented] (SPARK-5730) Group methods in the generated doc for spark.ml algorithms.
[ https://issues.apache.org/jira/browse/SPARK-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320908#comment-14320908 ] Apache Spark commented on SPARK-5730: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/4600 Group methods in the generated doc for spark.ml algorithms. --- Key: SPARK-5730 URL: https://issues.apache.org/jira/browse/SPARK-5730 Project: Spark Issue Type: Documentation Components: Documentation, ML Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng In spark.ml, we have params and their setters/getters. It is nice to group them in the generated docs. Params should be at the top, while setters/getters should be at the bottom.
[jira] [Created] (SPARK-5812) Potential flaky test JavaAPISuite.glom
Tathagata Das created SPARK-5812: Summary: Potential flaky test JavaAPISuite.glom Key: SPARK-5812 URL: https://issues.apache.org/jira/browse/SPARK-5812 Project: Spark Issue Type: Bug Components: Java API, Spark Core Affects Versions: 1.3.0 Reporter: Tathagata Das https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27455/
[jira] [Created] (SPARK-5805) Fix the type error in the final example given in MLlib - Clustering documentation
Emre Sevinç created SPARK-5805: -- Summary: Fix the type error in the final example given in MLlib - Clustering documentation Key: SPARK-5805 URL: https://issues.apache.org/jira/browse/SPARK-5805 Project: Spark Issue Type: Documentation Components: Documentation, MLlib Affects Versions: 1.2.1, 1.2.0 Reporter: Emre Sevinç Priority: Minor The final example in [MLlib - Clustering|http://spark.apache.org/docs/1.2.0/mllib-clustering.html] documentation has a code line that leads to a type error. The problematic line reads as: {code} model.predictOnValues(testData).print() {code} but it should be {code} model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print() {code}
[jira] [Updated] (SPARK-5806) Organize sections in mllib-clustering.md
[ https://issues.apache.org/jira/browse/SPARK-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5806: - Description: We separate code examples from algorithm descriptions. It would be better if we put the example code close to each algorithm description. Organize sections in mllib-clustering.md Key: SPARK-5806 URL: https://issues.apache.org/jira/browse/SPARK-5806 Project: Spark Issue Type: Improvement Components: Documentation, MLlib Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng We separate code examples from algorithm descriptions. It would be better if we put the example code close to each algorithm description.
[jira] [Updated] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5731: --- Priority: Blocker (was: Major) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Blocker Labels: flaky-test {code} sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 110 times over 20.070287525 seconds. Last failure message: 300 did not equal 48 didn't get all messages. at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply$mcV$sp(DirectKafkaStreamSuite.scala:110) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at 
org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$run(DirectKafkaStreamSuite.scala:38) at 
org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfterAll$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at
[jira] [Closed] (SPARK-5779) Python broadcast does not work with Kryo serializer
[ https://issues.apache.org/jira/browse/SPARK-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu closed SPARK-5779. - Resolution: Duplicate Python broadcast does not work with Kryo serializer --- Key: SPARK-5779 URL: https://issues.apache.org/jira/browse/SPARK-5779 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.3.0, 1.2.1 Reporter: Davies Liu Priority: Critical The PythonBroadcast introduced in 1.2 cannot be serialized by Kryo.
[jira] [Commented] (SPARK-5227) InputOutputMetricsSuite input metrics when reading text file with multiple splits test fails in branch-1.2 SBT Jenkins build w/hadoop1.0 and hadoop2.0 profiles
[ https://issues.apache.org/jira/browse/SPARK-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320903#comment-14320903 ] Apache Spark commented on SPARK-5227: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/4599 InputOutputMetricsSuite input metrics when reading text file with multiple splits test fails in branch-1.2 SBT Jenkins build w/hadoop1.0 and hadoop2.0 profiles - Key: SPARK-5227 URL: https://issues.apache.org/jira/browse/SPARK-5227 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.1 Reporter: Josh Rosen Priority: Blocker Labels: flaky-test The InputOutputMetricsSuite input metrics when reading text file with multiple splits test has been failing consistently in our new {{branch-1.2}} Jenkins SBT build: https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.2-SBT/14/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/junit/org.apache.spark.metrics/InputOutputMetricsSuite/input_metrics_when_reading_text_file_with_multiple_splits/ Here's the error message {code} ArrayBuffer(32, 32, 32, 32, 32, 32, 32, 32, ...) {code}
[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320905#comment-14320905 ] Nicholas Chammas commented on SPARK-3821: - If you want Java 8 alongside 7, you can install both to separate paths. For spark-ec2's purposes, we only need 7. The AMIs used as the base are [defined in the Packer template|https://github.com/nchammas/spark-ec2/blob/0f313de64ad9542d1a0f0d6f27131ca4bc01d8c3/image-build/spark-packer-template.json#L5-L6]. The generated AMIs do not include Spark itself, just its dependencies plus related tools for spark-ec2. Develop an automated way of creating Spark images (AMI, Docker, and others) --- Key: SPARK-3821 URL: https://issues.apache.org/jira/browse/SPARK-3821 Project: Spark Issue Type: Improvement Components: Build, EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Attachments: packer-proposal.html Right now the creation of Spark AMIs or Docker containers is done manually. With tools like [Packer|http://www.packer.io/], we should be able to automate this work, and do so in such a way that multiple types of machine images can be created from a single template.
[jira] [Commented] (SPARK-5679) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method
[ https://issues.apache.org/jira/browse/SPARK-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320904#comment-14320904 ] Apache Spark commented on SPARK-5679: - User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/4599 Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method -- Key: SPARK-5679 URL: https://issues.apache.org/jira/browse/SPARK-5679 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Kostas Sakellis Labels: flaky-test Please audit these and see if there are any assumptions with respect to File IO that might not hold in all cases. I'm happy to help if you can't find anything. These both failed in the same run: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-SBT/38/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/#showFailuresLink {code} org.apache.spark.metrics.InputOutputMetricsSuite.input metrics with mixed read method Failing for the past 13 builds (Since Failed#26 ) Took 48 sec. 
Error Message 2030 did not equal 6496 Stacktrace sbt.ForkMain$ForkError: 2030 did not equal 6496 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply$mcV$sp(InputOutputMetricsSuite.scala:135) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.metrics.InputOutputMetricsSuite.runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfterAll$$super$run(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at
[jira] [Updated] (SPARK-5779) Python broadcast does not work with Kryo serializer
[ https://issues.apache.org/jira/browse/SPARK-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-5779: -- Affects Version/s: (was: 1.2.1) (was: 1.3.0) 1.2.0 Python broadcast does not work with Kryo serializer --- Key: SPARK-5779 URL: https://issues.apache.org/jira/browse/SPARK-5779 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.2.0 Reporter: Davies Liu Priority: Critical The PythonBroadcast introduced in 1.2 cannot be serialized by Kryo.
[jira] [Updated] (SPARK-5812) Potential flaky test JavaAPISuite.glom
[ https://issues.apache.org/jira/browse/SPARK-5812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-5812: - Labels: flaky-test (was: ) Potential flaky test JavaAPISuite.glom -- Key: SPARK-5812 URL: https://issues.apache.org/jira/browse/SPARK-5812 Project: Spark Issue Type: Bug Components: Java API, Spark Core Affects Versions: 1.3.0 Reporter: Tathagata Das Labels: flaky-test https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27455/
[jira] [Commented] (SPARK-5016) GaussianMixtureEM should distribute matrix inverse for large numFeatures, k
[ https://issues.apache.org/jira/browse/SPARK-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320935#comment-14320935 ] Xiangrui Meng commented on SPARK-5016: -- I think we should compute the inverse in parallel. In https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala#L166, we don't collect to the driver; instead we use aggregateByKey to keep the sums on the reducers. Then on each reducer, we update the Gaussians, and finally collect them to the driver. GaussianMixtureEM should distribute matrix inverse for large numFeatures, k --- Key: SPARK-5016 URL: https://issues.apache.org/jira/browse/SPARK-5016 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: Joseph K. Bradley If numFeatures or k are large, GMM EM should distribute the matrix inverse computation for Gaussian initialization.
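The per-reducer combination described in the comment can be sketched without Spark. sumByComponent below is a hypothetical stand-in for what aggregateByKey would do with the per-component sums; component keys and contributions are simplified to primitives:

```java
import java.util.HashMap;
import java.util.Map;

public class PerKeySums {
    // Combine per-point partial sums by Gaussian component key, mimicking
    // what aggregateByKey does on each reducer instead of collecting every
    // contribution to the driver first.
    public static Map<Integer, Double> sumByComponent(int[] keys, double[] vals) {
        Map<Integer, Double> sums = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            sums.merge(keys[i], vals[i], Double::sum);
        }
        return sums;
    }
}
```

With the sums already grouped by component, each reducer can update (and invert the covariance of) its own Gaussians before only the finished parameters are collected to the driver.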
[jira] [Resolved] (SPARK-5806) Organize sections in mllib-clustering.md
[ https://issues.apache.org/jira/browse/SPARK-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5806. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4598 [https://github.com/apache/spark/pull/4598] Organize sections in mllib-clustering.md Key: SPARK-5806 URL: https://issues.apache.org/jira/browse/SPARK-5806 Project: Spark Issue Type: Improvement Components: Documentation, MLlib Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Fix For: 1.3.0 We separate code examples from algorithm descriptions. It would be better if we put the example code close to each algorithm description.
[jira] [Commented] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320739#comment-14320739 ] Patrick Wendell commented on SPARK-5731: [~c...@koeninger.org] [~tdas] FYI we've disabled this test because it's caused a huge productivity loss to ongoing development with frequent failures. Please try to get this test into good shape ASAP - otherwise this code will be untested. Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Blocker Labels: flaky-test {code} sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 110 times over 20.070287525 seconds. Last failure message: 300 did not equal 48 didn't get all messages. 
at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply$mcV$sp(DirectKafkaStreamSuite.scala:110) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.runTest(DirectKafkaStreamSuite.scala:38) at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at
[jira] [Updated] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5731: --- Labels: flaky-test (was: ) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Blocker Labels: flaky-test {code} sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 110 times over 20.070287525 seconds. Last failure message: 300 did not equal 48 didn't get all messages. at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307) at org.apache.spark.streaming.kafka.KafkaStreamSuiteBase.eventually(KafkaStreamSuite.scala:49) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply$mcV$sp(DirectKafkaStreamSuite.scala:110) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite$$anonfun$2.apply(DirectKafkaStreamSuite.scala:70) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at 
org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.runTest(DirectKafkaStreamSuite.scala:38) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfter$$super$run(DirectKafkaStreamSuite.scala:38) at 
org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.org$scalatest$BeforeAndAfterAll$$super$run(DirectKafkaStreamSuite.scala:38) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at
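For context, the failing assertion above runs inside ScalaTest's `eventually`, which retries a block until it passes or a patience window expires. A minimal stand-in sketch (not ScalaTest's actual implementation; the `timeoutMs`/`intervalMs` names are assumed) behaves like this:

```scala
object Retry {
  // Retry `check` until it returns normally or `timeoutMs` elapses,
  // sleeping `intervalMs` between attempts -- the shape of ScalaTest's eventually.
  def eventually[A](timeoutMs: Long, intervalMs: Long)(check: => A): A = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var last: Throwable = null
    while (System.currentTimeMillis() < deadline) {
      try return check
      catch { case e: AssertionError => last = e; Thread.sleep(intervalMs) }
    }
    throw new AssertionError("The code passed to eventually never returned normally", last)
  }
}
```

Note that retrying only helps when the condition can still become true; the failure quoted above ("300 did not equal 48") saw *more* messages than expected, so no amount of retrying within the 20-second window could make it pass.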
[jira] [Updated] (SPARK-5807) Parallel grid search
[ https://issues.apache.org/jira/browse/SPARK-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Rudenko updated SPARK-5807: - Description: Right now in CrossValidator, for each fold combination and ParamGrid hyperparameter pair, it searches for the best parameter sequentially. Assuming there's enough worker memory on the cluster to cache all training/validation folds, it's possible to parallelize execution. Here's a draft I came up with:
{code}
import scala.collection.immutable.{Vector => ScalaVec}

val metrics = ScalaVec.fill(numModels)(0.0) // Scala Vector is thread safe
val splits = MLUtils.kFold(dataset, map(numFolds), 0).zipWithIndex

def processFold(input: ((RDD[sql.Row], RDD[sql.Row]), Int)) = input match {
  case ((training, validation), splitIndex) => {
    val trainingDataset = sqlCtx.applySchema(training, schema).cache()
    val validationDataset = sqlCtx.applySchema(validation, schema).cache()
    // multi-model training
    logDebug(s"Train split $splitIndex with multiple sets of parameters.")
    val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
    var i = 0
    trainingDataset.unpersist()
    while (i < numModels) {
      val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)), map)
      logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
      metrics(i) += metric
      i += 1
    }
    validationDataset.unpersist()
  }
}

if (parallel) {
  splits.par.foreach(processFold)
} else {
  splits.foreach(processFold)
}
{code}
Assuming there are 3 folds, it would redundantly cache all the combinations (a lot of memory), so maybe it's possible to cache each fold separately. Parallel grid search - Key: SPARK-5807 URL: https://issues.apache.org/jira/browse/SPARK-5807 Project: Spark Issue Type: New Feature Components: ML Affects Versions: 1.3.0 Reporter: Peter Rudenko Priority: Minor
[jira] [Created] (SPARK-5807) Parallel grid search
Peter Rudenko created SPARK-5807: - Summary: Parallel grid search Key: SPARK-5807 URL: https://issues.apache.org/jira/browse/SPARK-5807 Project: Spark Issue Type: New Feature Components: ML Affects Versions: 1.3.0 Reporter: Peter Rudenko Priority: Minor Right now in CrossValidator, for each fold combination and ParamGrid hyperparameter pair, it searches for the best parameter sequentially. Assuming there's enough worker memory on the cluster to cache all training/validation folds, it's possible to parallelize execution. Here's a draft I came up with:
{code}
import scala.collection.immutable.{Vector => ScalaVec}

val metrics = ScalaVec.fill(numModels)(0.0) // Scala Vector is thread safe
val splits = MLUtils.kFold(dataset, map(numFolds), 0).zipWithIndex

def processFold(input: ((RDD[sql.Row], RDD[sql.Row]), Int)) = input match {
  case ((training, validation), splitIndex) => {
    val trainingDataset = sqlCtx.applySchema(training, schema).cache()
    val validationDataset = sqlCtx.applySchema(validation, schema).cache()
    // multi-model training
    logDebug(s"Train split $splitIndex with multiple sets of parameters.")
    val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
    var i = 0
    trainingDataset.unpersist()
    while (i < numModels) {
      val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)), map)
      logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
      metrics(i) += metric
      i += 1
    }
    validationDataset.unpersist()
  }
}

if (parallel) {
  splits.par.foreach(processFold)
} else {
  splits.foreach(processFold)
}
{code}
Assuming there are 3 folds, it would redundantly cache all the combinations (a lot of memory), so maybe it's possible to cache each fold separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
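The draft above hinges on `.par` plus a thread-safe metrics container. A self-contained sketch of that accumulation pattern (using Java parallel streams and a lock in place of the RDD and estimator machinery; `numFolds`, `numModels`, and the constant metric are illustrative stand-ins) looks like:

```scala
import java.util.stream.IntStream

object ParFoldSketch {
  // Process folds in parallel, accumulating one metric per model.
  // The synchronized block makes the += on the shared array thread safe.
  def run(numFolds: Int, numModels: Int): Array[Double] = {
    val metrics = Array.fill(numModels)(0.0)
    val lock = new Object
    IntStream.range(0, numFolds).parallel().forEach { fold =>
      var i = 0
      while (i < numModels) {
        lock.synchronized { metrics(i) += 1.0 } // stand-in for the evaluated metric
        i += 1
      }
    }
    metrics
  }
}
```

Each model's slot ends up holding the sum over all folds, mirroring `metrics(i) += metric` in the draft; the terminal `forEach` blocks until all folds are done, so reading `metrics` afterwards is safe.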
[jira] [Commented] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320754#comment-14320754 ] Tathagata Das commented on SPARK-5731: -- This is very weird. The stream is receiving more messages than it is supposed to. Let me try recreating it. Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Blocker Labels: flaky-test
[jira] [Updated] (SPARK-5730) Group methods in the generated doc for spark.ml algorithms.
[ https://issues.apache.org/jira/browse/SPARK-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-5730: - Assignee: Xiangrui Meng Group methods in the generated doc for spark.ml algorithms. --- Key: SPARK-5730 URL: https://issues.apache.org/jira/browse/SPARK-5730 Project: Spark Issue Type: Documentation Components: Documentation, ML Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng In spark.ml, we have params and their setters/getters. It would be nice to group them in the generated docs: params should be at the top, while setters/getters should be at the bottom. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
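Scaladoc supports exactly this kind of grouping through its `@group`/`@groupname`/`@groupprio` tags (rendered when the doc tool's grouping support is enabled). A hedged sketch with made-up member names, not the actual spark.ml classes:

```scala
/**
 * @groupname param Parameters
 * @groupprio param 0
 * @groupname setParam Parameter setters
 * @groupprio setParam 10
 */
class ExampleEstimator {
  /** Regularization parameter. @group param */
  val regParam: Double = 0.0

  /** Sets the regularization parameter. @group setParam */
  def setRegParam(value: Double): this.type = this // doc-grouping sketch; setter body elided
}
```

With the lower `@groupprio`, the "Parameters" group is listed before "Parameter setters" in the generated page, which is the ordering the issue asks for.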
[jira] [Created] (SPARK-5810) Maven Coordinate Inclusion failing in pySpark
Burak Yavuz created SPARK-5810: -- Summary: Maven Coordinate Inclusion failing in pySpark Key: SPARK-5810 URL: https://issues.apache.org/jira/browse/SPARK-5810 Project: Spark Issue Type: Bug Components: Deploy, PySpark Affects Versions: 1.3.0 Reporter: Burak Yavuz Priority: Blocker Fix For: 1.3.0 When Maven coordinates are included to download dependencies in PySpark, PySpark returns a GatewayError because it cannot read the proper port to communicate with the JVM. This is because PySpark relies on STDIN to read the port number, and in the meantime Ivy prints out a whole lot of logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
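The failure mode described (the launcher's port number drowned among Ivy's resolution logs on the same stream) is at heart a parsing problem: the reader must skip log noise until it finds the handshake value. A hedged sketch of that idea — the log lines and port below are invented, and this is not PySpark's actual handshake code:

```scala
object PortScan {
  // Return the first line of mixed launcher output that is purely an integer,
  // skipping any log noise printed before it.
  def findPort(lines: Iterator[String]): Option[Int] =
    lines.map(_.trim).collectFirst {
      case line if line.nonEmpty && line.forall(_.isDigit) => line.toInt
    }
}
```

A tolerant scan like this survives interleaved logs, whereas reading a single line blindly (the behavior the issue describes) fails as soon as Ivy writes first.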
[jira] [Commented] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320749#comment-14320749 ] Tathagata Das commented on SPARK-5731: -- Let me take a pass at it. Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Blocker Labels: flaky-test
[jira] [Commented] (SPARK-5798) Spark shell issue
[ https://issues.apache.org/jira/browse/SPARK-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320763#comment-14320763 ] DeepakVohra commented on SPARK-5798: Re-tested on local OS Oracle Linux 6.5 and did not get the Spark shell issue. The earlier test, which generated the Spark shell error, was on Amazon EC2. Issue may be closed. Spark shell issue - Key: SPARK-5798 URL: https://issues.apache.org/jira/browse/SPARK-5798 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 1.2.0 Environment: Spark 1.2 Scala 2.10.4 Reporter: DeepakVohra The Spark shell terminates when Spark code is run indicating an issue with Spark shell. The error is coming from the spark shell file /apachespark/spark-1.2.0-bin-cdh4/bin/spark-shell: line 48 $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main ${SUBMISSION_OPTS[@]} spark-shell ${APPLICATION_OPTS[@]} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5807) Parallel grid search
[ https://issues.apache.org/jira/browse/SPARK-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Rudenko updated SPARK-5807: - Description: Right now in CrossValidator, for each fold combination and ParamGrid hyperparameter pair, it searches for the best parameter sequentially. Assuming there's enough worker memory on the cluster to cache all training/validation folds, it's possible to parallelize execution. Here's a draft I came up with:
{code}
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

val metrics = new ArrayBuffer[Double](numModels) with mutable.SynchronizedBuffer[Double]
val splits = MLUtils.kFold(dataset, map(numFolds), 0).zipWithIndex

def processFold(input: ((RDD[sql.Row], RDD[sql.Row]), Int)) = input match {
  case ((training, validation), splitIndex) => {
    val trainingDataset = sqlCtx.applySchema(training, schema).cache()
    val validationDataset = sqlCtx.applySchema(validation, schema).cache()
    // multi-model training
    logDebug(s"Train split $splitIndex with multiple sets of parameters.")
    val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
    var i = 0
    trainingDataset.unpersist()
    while (i < numModels) {
      val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)), map)
      logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
      metrics(i) += metric
      i += 1
    }
    validationDataset.unpersist()
  }
}

if (parallel) {
  splits.par.foreach(processFold)
} else {
  splits.foreach(processFold)
}
{code}
Assuming there are 3 folds, it would redundantly cache all the combinations (a lot of memory), so maybe it's possible to cache each fold separately. Parallel grid search - Key: SPARK-5807 URL: https://issues.apache.org/jira/browse/SPARK-5807 Project: Spark Issue Type: New Feature Components: ML Affects Versions: 1.3.0 Reporter: Peter Rudenko Priority: Minor
[jira] [Commented] (SPARK-5731) Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset
[ https://issues.apache.org/jira/browse/SPARK-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320796#comment-14320796 ] Apache Spark commented on SPARK-5731: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/4597 Flaky Test: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite.basic stream receiving with multiple topics and smallest starting offset Key: SPARK-5731 URL: https://issues.apache.org/jira/browse/SPARK-5731 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Blocker Labels: flaky-test
[jira] [Commented] (SPARK-5779) Python broadcast does not work with Kryo serializer
[ https://issues.apache.org/jira/browse/SPARK-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320803#comment-14320803 ] Josh Rosen commented on SPARK-5779: --- I thought we fixed this in SPARK-4882: https://github.com/apache/spark/pull/3831. Have you observed a new version of this issue? Python broadcast does not work with Kryo serializer --- Key: SPARK-5779 URL: https://issues.apache.org/jira/browse/SPARK-5779 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.3.0, 1.2.1 Reporter: Davies Liu Priority: Critical The PythonBroadcast cannot be serialized by Kryo, which is introduced in 1.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4865) Include temporary tables in SHOW TABLES
[ https://issues.apache.org/jira/browse/SPARK-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320804#comment-14320804 ] Yin Huai commented on SPARK-4865: - I will start to work on it based on SPARK-3299. Include temporary tables in SHOW TABLES --- Key: SPARK-4865 URL: https://issues.apache.org/jira/browse/SPARK-4865 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Misha Chernetsov Priority: Blocker
[jira] [Updated] (SPARK-4865) Include temporary tables in SHOW TABLES
[ https://issues.apache.org/jira/browse/SPARK-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-4865: Priority: Blocker (was: Critical) Include temporary tables in SHOW TABLES --- Key: SPARK-4865 URL: https://issues.apache.org/jira/browse/SPARK-4865 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: Misha Chernetsov Priority: Blocker
[jira] [Created] (SPARK-5809) OutOfMemoryError in logDebug in RandomForest.scala
Devesh Parekh created SPARK-5809: Summary: OutOfMemoryError in logDebug in RandomForest.scala Key: SPARK-5809 URL: https://issues.apache.org/jira/browse/SPARK-5809 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.2.0 Reporter: Devesh Parekh When training a GBM on sparse vectors produced by HashingTF, I get the following OutOfMemoryError, where RandomForest is building a debug string to log. Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3326) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121 ) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) at java.lang.StringBuilder.append(StringBuilder.java:136) at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:197) at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:327 ) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:320) at scala.collection.AbstractTraversable.addString(Traversable.scala:105) at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:286) at scala.collection.AbstractTraversable.mkString(Traversable.scala:105) at scala.collection.TraversableOnce$class.mkString(TraversableOnce.scala:288) at scala.collection.AbstractTraversable.mkString(Traversable.scala:105) at org.apache.spark.mllib.tree.RandomForest$$anonfun$run$9.apply(RandomForest.scala:152) at org.apache.spark.mllib.tree.RandomForest$$anonfun$run$9.apply(RandomForest.scala:152) at org.apache.spark.Logging$class.logDebug(Logging.scala:63) at 
org.apache.spark.mllib.tree.RandomForest.logDebug(RandomForest.scala:67) at org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:150) at org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:64) at org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:150) at org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:63) at org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:96) A workaround until this is fixed is to modify log4j.properties in the conf directory to filter out debug logs in RandomForest. For example: log4j.logger.org.apache.spark.mllib.tree.RandomForest=WARN
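The stack trace above shows the OOM comes from `mkString` assembling a huge debug string inside `logDebug`. Beyond the log4j workaround quoted in the report, the defensive pattern can be sketched as follows (a standalone sketch with hypothetical names, not the actual RandomForest code): evaluate the message lazily and bound its size.

```scala
// Hypothetical sketch: guard expensive debug output so the giant
// string is only built when DEBUG is actually enabled, and cap it.
object LazyDebug {
  var debugEnabled: Boolean = false // stand-in for log4j's isDebugEnabled

  // `msg` is by-name: the string is never constructed unless needed.
  def logDebug(msg: => String): Unit = if (debugEnabled) println(msg)

  // Summarize instead of mkString-ing millions of elements.
  def describe(bins: Array[Int], maxElems: Int = 8): String = {
    val shown = bins.take(maxElems).mkString(", ")
    if (bins.length > maxElems) s"bins = [$shown, ...]" else s"bins = [$shown]"
  }
}
```

With `debugEnabled` left false, `LazyDebug.logDebug(LazyDebug.describe(hugeArray))` never materializes the string at all, which is the property the report is missing.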
[jira] [Resolved] (SPARK-5789) Throw a better error message if JsonRDD.parseJson encounters unrecoverable parsing errors.
[ https://issues.apache.org/jira/browse/SPARK-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-5789. - Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4582 [https://github.com/apache/spark/pull/4582] Throw a better error message if JsonRDD.parseJson encounters unrecoverable parsing errors. -- Key: SPARK-5789 URL: https://issues.apache.org/jira/browse/SPARK-5789 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Fix For: 1.3.0 For example {code} sqlContext.jsonRDD(sc.parallelize(a:1}::Nil)) {code} will throw {code} scala.MatchError: a (of class java.lang.String) at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:302) at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:300) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:879) at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:878) at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516) at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 15/02/12 15:08:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 26) in 10 ms on localhost (7/8) 15/02/12 15:08:55 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 4.0 (TID 33, localhost): scala.MatchError: a (of class java.lang.String) at 
org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:302) at org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$2.apply(JsonRDD.scala:300) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:879) at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:878) at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516) at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1516) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code}
[jira] [Created] (SPARK-5811) Documentation for --packages and --repositories on Spark Shell
Burak Yavuz created SPARK-5811: -- Summary: Documentation for --packages and --repositories on Spark Shell Key: SPARK-5811 URL: https://issues.apache.org/jira/browse/SPARK-5811 Project: Spark Issue Type: Documentation Components: Deploy, Spark Shell Affects Versions: 1.3.0 Reporter: Burak Yavuz Priority: Critical Fix For: 1.3.0 Documentation for the new dependency-management support via Maven coordinates, using --packages and --repositories
[jira] [Commented] (SPARK-5363) Spark 1.2 freeze without error notification
[ https://issues.apache.org/jira/browse/SPARK-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320986#comment-14320986 ] Apache Spark commented on SPARK-5363: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/4601 Spark 1.2 freeze without error notification --- Key: SPARK-5363 URL: https://issues.apache.org/jira/browse/SPARK-5363 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.2.0 Reporter: Tassilo Klein Assignee: Davies Liu Priority: Critical After a number of calls to a map().collect() statement, Spark freezes without reporting any error. Within the map, a large broadcast variable is used. The freeze can be avoided by setting 'spark.python.worker.reuse = false' (Spark 1.2) or by using an earlier version, however at the price of lower speed.
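The workaround quoted in the report can be applied cluster-wide by adding the setting to `conf/spark-defaults.conf` (a plain config fragment; the key and value are exactly those quoted above):

```
spark.python.worker.reuse   false
```

This disables Python worker reuse, trading the freeze for slower job startup, as the reporter notes.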
[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320995#comment-14320995 ] Florian Verhein commented on SPARK-3821: RE: Java, that reminds me... We should probably be using Oracle JDK rather than OpenJDK. But I think this should be a separate issue, so I just created SPARK-5813. Develop an automated way of creating Spark images (AMI, Docker, and others) --- Key: SPARK-3821 URL: https://issues.apache.org/jira/browse/SPARK-3821 Project: Spark Issue Type: Improvement Components: Build, EC2 Reporter: Nicholas Chammas Assignee: Nicholas Chammas Attachments: packer-proposal.html Right now the creation of Spark AMIs or Docker containers is done manually. With tools like [Packer|http://www.packer.io/], we should be able to automate this work, and do so in such a way that multiple types of machine images can be created from a single template.
[jira] [Resolved] (SPARK-5730) Group methods in the generated doc for spark.ml algorithms.
[ https://issues.apache.org/jira/browse/SPARK-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5730. -- Resolution: Fixed Fix Version/s: 1.3.0 Issue resolved by pull request 4600 [https://github.com/apache/spark/pull/4600] Group methods in the generated doc for spark.ml algorithms. --- Key: SPARK-5730 URL: https://issues.apache.org/jira/browse/SPARK-5730 Project: Spark Issue Type: Documentation Components: Documentation, ML Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Fix For: 1.3.0 In spark.ml, we have params and their setters/getters. It is nice to group them in the generated docs. Params should be at the top, while setters/getters should be at the bottom.
[jira] [Commented] (SPARK-5016) GaussianMixtureEM should distribute matrix inverse for large numFeatures, k
[ https://issues.apache.org/jira/browse/SPARK-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321099#comment-14321099 ] Travis Galoppo commented on SPARK-5016: --- Hmm. I'm having trouble conceptualizing how to use aggregateByKey here; the breezeData RDD is not keyed. We could have a keyed RDD of expectation sums (with a little rework), but each entry in the breezeData RDD would need to be operated on by each reducer (which seems awkward?)... or am I way off? GaussianMixtureEM should distribute matrix inverse for large numFeatures, k --- Key: SPARK-5016 URL: https://issues.apache.org/jira/browse/SPARK-5016 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: Joseph K. Bradley If numFeatures or k are large, GMM EM should distribute the matrix inverse computation for Gaussian initialization.
[jira] [Resolved] (SPARK-5803) Use ArrayBuilder instead of ArrayBuffer for primitive types
[ https://issues.apache.org/jira/browse/SPARK-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5803. -- Resolution: Fixed Fix Version/s: 1.3.0 Use ArrayBuilder instead of ArrayBuffer for primitive types --- Key: SPARK-5803 URL: https://issues.apache.org/jira/browse/SPARK-5803 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Fix For: 1.3.0 ArrayBuffer is not specialized and hence it boxes primitive-typed values.
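The boxing difference behind this change can be shown in a minimal standalone sketch (plain Scala, not the MLlib code itself): `ArrayBuffer[Double]` stores boxed `java.lang.Double` values, while `ArrayBuilder.make[Double]` resolves to a specialized builder backed by a raw `Array[Double]`.

```scala
import scala.collection.mutable.{ArrayBuffer, ArrayBuilder}

// ArrayBuffer is generic and unspecialized: each += boxes the Double.
val buffer = ArrayBuffer[Double]()
(1 to 3).foreach(i => buffer += i.toDouble)

// ArrayBuilder.make[Double] picks ArrayBuilder.ofDouble, which grows
// a primitive Array[Double] with no per-element boxing.
val builder = ArrayBuilder.make[Double]
(1 to 3).foreach(i => builder += i.toDouble)
val arr: Array[Double] = builder.result()
```

Both produce the same elements; the builder simply avoids one object allocation per appended value, which matters in MLlib's hot loops.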
[jira] [Commented] (SPARK-5363) Spark 1.2 freeze without error notification
[ https://issues.apache.org/jira/browse/SPARK-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320987#comment-14320987 ] Davies Liu commented on SPARK-5363: --- [~TJKlein] Could you try the patch in https://github.com/apache/spark/pull/4601 to see whether it fixes your problem? Spark 1.2 freeze without error notification --- Key: SPARK-5363 URL: https://issues.apache.org/jira/browse/SPARK-5363 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.2.0 Reporter: Tassilo Klein Assignee: Davies Liu Priority: Critical After a number of calls to a map().collect() statement, Spark freezes without reporting any error. Within the map, a large broadcast variable is used. The freeze can be avoided by setting 'spark.python.worker.reuse = false' (Spark 1.2) or by using an earlier version, however at the price of lower speed.
[jira] [Created] (SPARK-5813) Spark-ec2: Switch to OracleJDK
Florian Verhein created SPARK-5813: -- Summary: Spark-ec2: Switch to OracleJDK Key: SPARK-5813 URL: https://issues.apache.org/jira/browse/SPARK-5813 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Florian Verhein Priority: Minor Currently we use OpenJDK; however, Oracle JDK is generally recommended, especially for Hadoop deployments.
[jira] [Created] (SPARK-5814) Remove JBLAS from runtime dependencies
Xiangrui Meng created SPARK-5814: Summary: Remove JBLAS from runtime dependencies Key: SPARK-5814 URL: https://issues.apache.org/jira/browse/SPARK-5814 Project: Spark Issue Type: Dependency upgrade Components: GraphX, MLlib Reporter: Xiangrui Meng Assignee: Xiangrui Meng We are using mixed breeze/netlib-java and jblas code in MLlib. They take different approaches to utilizing native libraries, and we should keep only one of them. netlib-java has a clear separation between the Java implementation and the native JNI libraries, while JBLAS packs statically linked binaries that cause license issues (SPARK-5669). So we want to remove JBLAS from the Spark runtime. One issue with this approach is that we have JBLAS' DoubleMatrix exposed (by mistake) in SVDPlusPlus of GraphX. We should deprecate it and replace `DoubleMatrix` with `Array[Double]`.
[jira] [Created] (SPARK-5815) Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS
Xiangrui Meng created SPARK-5815: Summary: Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS Key: SPARK-5815 URL: https://issues.apache.org/jira/browse/SPARK-5815 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.3.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng It is generally bad to expose types defined in a 3rd-party package in Spark public APIs. We should deprecate those methods in SVDPlusPlus and replace them in the next release.
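The deprecate-and-replace pattern this issue calls for can be sketched as follows (all names hypothetical; `ThirdPartyMatrix` stands in for jblas' `DoubleMatrix`, and the placeholder result is not real SVD++ output): keep the old signature for compatibility but flag it, and add a variant returning plain `Array[Double]`.

```scala
// Stand-in for the third-party type leaked by the public API.
class ThirdPartyMatrix(val data: Array[Double])

object SvdPlusPlusLike {
  // Old method kept for source compatibility, but flagged for removal.
  @deprecated("Exposes a third-party type; use runArray instead", "1.3.0")
  def run(): ThirdPartyMatrix = new ThirdPartyMatrix(runArray())

  // Replacement returns a plain Array[Double], as SPARK-5814 suggests.
  def runArray(): Array[Double] = Array(1.0, 2.0) // placeholder result
}
```

Callers of the deprecated method get a compiler warning but keep working until the next release, which matches the migration plan in the description.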
[jira] [Updated] (SPARK-5124) Standardize internal RPC interface
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-5124: Attachment: Pluggable RPC - draft 2.pdf Compared to the first version, this doc adds an ActionScheduler interface and changes the fault tolerance to: Any error thrown by `onStart`, `receive` and `onStop` will be sent to `onError`. If `onError` throws an error, it will be ignored. Standardize internal RPC interface -- Key: SPARK-5124 URL: https://issues.apache.org/jira/browse/SPARK-5124 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Reynold Xin Assignee: Shixiong Zhu Attachments: Pluggable RPC - draft 1.pdf, Pluggable RPC - draft 2.pdf In Spark we use Akka as the RPC layer. It would be great if we could standardize the internal RPC interface to facilitate testing. This will also provide the foundation to try other RPC implementations in the future.
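The fault-tolerance rule quoted from draft 2 can be sketched as follows (hypothetical names, not the actual interface from the attached PDF): every error from the lifecycle callbacks is routed to `onError`, and an error thrown by `onError` itself is swallowed.

```scala
// Minimal endpoint with the lifecycle hooks named in the draft.
trait Endpoint {
  def onStart(): Unit = {}
  def receive(msg: Any): Unit
  def onStop(): Unit = {}
  def onError(cause: Throwable): Unit = {}
}

// Wrapper implementing the quoted rule: errors thrown by
// onStart/receive/onStop go to onError; anything onError throws is ignored.
def safeInvoke(endpoint: Endpoint)(body: => Unit): Unit =
  try body
  catch {
    case t: Throwable =>
      try endpoint.onError(t)
      catch { case _: Throwable => () } // errors from onError are ignored
  }
```

The point of the rule is that a misbehaving error handler can never crash the RPC dispatch loop itself.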