RE: (Spark SQL) partition-scoped UDF

2015-09-09 Thread Eron Wright
know of a solution compatible with Spark 1.4 or 1.5? Thanks again! From: Reynold Xin Date: Friday, September 4, 2015 at 5:19 PM To: Eron Wright Cc: "dev@spark.apache.org" Subject: Re: (Spark SQL) partition-scoped UDF Can you say more about your transformer? This is a good idea, and

(Spark SQL) partition-scoped UDF

2015-09-04 Thread Eron Wright
in batch for efficiency to amortize some overhead. How may I accomplish this? One option appears to be to invoke DataFrame::mapPartitions, yielding an RDD that is then converted back to a DataFrame. Unsure about the viability or consequences of that. Thanks! Eron Wright
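
The mapPartitions option mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the approach the thread settled on: `expensive_batch_transform`, `transform_partition`, `batched`, the batch size, and the choice of the first column as input are all hypothetical names and choices. The partition function is plain Python so it can be run without a cluster; the Spark wiring is shown in comments and assumes an existing DataFrame `df`, a SQLContext `sqlContext`, and an output schema `new_schema`.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of up to batch_size items from an iterator."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def expensive_batch_transform(values):
    # Hypothetical stand-in for a transformer whose per-call overhead
    # is worth amortizing over many rows; here it just doubles each value.
    return [v * 2 for v in values]


def transform_partition(rows, batch_size=1000):
    """Partition-scoped function: takes an iterator of rows, yields new rows."""
    for batch in batched(rows, batch_size):
        # Assumption: the first column of each row is the transformer input.
        outputs = expensive_batch_transform([row[0] for row in batch])
        for row, out in zip(batch, outputs):
            # Append the result as an extra column.
            yield tuple(row) + (out,)


# Spark wiring (not executed here; df, sqlContext and new_schema are assumed):
#   rdd = df.rdd.mapPartitions(transform_partition)
#   result = sqlContext.createDataFrame(rdd, new_schema)
```

The round trip through an RDD means the result has to be given a schema again when it is converted back to a DataFrame, which is part of the cost being asked about.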

Make ML Developer APIs public (post-1.4)

2015-08-03 Thread Eron Wright
but reiterating it here. Thanks, Eron Wright

[SPARK-8794] [SQL] PrunedScan problem

2015-07-02 Thread Eron Wright
I filed an issue about a problem I see with PrunedScan that causes sub-optimal performance in ML pipelines. Sorry if the issue is already known. Having tried a few approaches to working with large binary files with Spark ML, I prefer loading the data into a vector-type column from a relation

RE: Contribution

2015-06-13 Thread Eron Wright
The deeplearning4j project provides neural net algorithms for Spark ML. You may consider it sample code for extending Spark with new ML algorithms.
http://deeplearning4j.org/sparkml
https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-scaleout/spark/dl4j-spark-ml
-Eron

RE: Problem with pyspark on Docker talking to YARN cluster

2015-06-10 Thread Eron Wright
Options include:
- use the 'spark.driver.host' and 'spark.driver.port' settings to stabilize the driver-side endpoint. (ref)
- use host networking for your container, i.e. docker run --net=host ...
- use yarn-cluster mode (see SPARK-5162)
Hope this helps, Eron Date: Wed, 10 Jun 2015 13:43:04 -0700 Subject:

[sample code] deeplearning4j for Spark ML (@DeveloperAPI)

2015-06-08 Thread Eron Wright
have Spark working with multiple GPUs on AWS and we're looking forward to optimizations that will speed neural net training even more. Eron Wright Contributor | deeplearning4j.org

Re: Ivy support in Spark vs. sbt

2015-06-04 Thread Eron Wright
I saw something like this last night, with a similar message. Is this what you’re referring to? [error] org.deeplearning4j#dl4j-spark-ml;0.0.3.3.4.alpha1-SNAPSHOT!dl4j-spark-ml.jar origin location must be absolute:

[SPARK-7400] PortableDataStream UDT

2015-05-11 Thread Eron Wright
Hello, I'm working on SPARK-7400 for DataFrame support for PortableDataStream, i.e. the data type associated with the RDD from sc.binaryFiles(...). Assuming a patch is available soon, what is the likelihood of inclusion in Spark 1.4? Thanks