Re: Aggregating over sorted data

2016-12-19 Thread Robin East
This is also a feature we need for our time-series processing
> On 19 Dec 2016, at 04:07, Liang-Chi Hsieh wrote:
>
> Hi,
>
> As I know, Spark SQL doesn't provide native support for this feature now.
> After searching, I found only few database systems support it, e.g.,

Re: [GRAPHX] Graph Algorithms and Spark

2016-04-21 Thread Robin East
-synchronous parallel processing that is the foundation of most of the above algorithms. We cover other algorithms in our book, and if you search on Google you will find a number of other examples. --- Robin East Spark GraphX

Re: spark graphx storage RDD memory leak

2016-04-11 Thread Robin East
This looks like https://issues.apache.org/jira/browse/SPARK-12655, fixed in 2.0 --- Robin East Spark GraphX in Action Michael Malak and Robin East Manning Publicati

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
architectural sense. --- Robin East Spark GraphX in Action Michael Malak and Robin East Manning Publications Co. http://www.manning.com/books/spark-graphx-in-action

Re: query on SVD++

2015-12-07 Thread Robin East
ect%20=%20SPARK%20AND%20resolution%20=%20Unresolved%20AND%20component%20=%20GraphX%20ORDER%20BY%20updated%20DESC> for the latest. --- Robin East Spark GraphX in Action Michael Malak and Robin East Manning Publicati

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
more answers in the user mailing list)
>>>>
>>>> First up let me say that I don’t really know how this could be done - I’m
>>>> sure it would be possible with enough tinkering but it’s not clear what
>>>> you are trying to achieve. Spark is a di

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
not the case. If you didn’t mean that, then we are both in agreement. --- Robin East Spark GraphX in Action Michael Malak and Robin East Manning Publications Co. http://www.manning.com/books/spark-graphx-in-action

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
>>> list.
>>> Yes, we have a distributed C++ application, that will store data on each
>>> node in the cluster, and we hope to leverage Spark to do more fancy
>>> analytics on those data. But we need high performance, that’s why we want
>>> shared memory.
>

Re: Add a function to support Google's Word2Vec

2015-11-17 Thread Robin East (hotmail)
Have a look at SPARK-9484; the JIRA is already there. A pull request would be good.
Robin
> On 17 Nov 2015, at 12:10, yuming wang wrote:
>
> Hi:
>
> I have a function to load Google’s Word2Vec generated binary file and spark
> can use this model. If it is convenient, I'm

Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Robin East
I used the following build command:
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
This also gave the ‘Dependency-reduced POM’ loop.
Robin
On 3 Jul 2015, at 23:41, Patrick Wendell pwend...@gmail.com wrote:
What if you use the built-in maven (i.e. build/mvn).

Re: Can not build master

2015-07-03 Thread Robin East
Yes, me too.
On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote:
This is what I got (the last line was repeated non-stop):
[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with

Re: LDA and PageRank Using GraphX

2015-05-04 Thread Robin East
There is an LDA example in the MLlib examples. You can run it like this:
./bin/run-example mllib.LDAExample --stopwordFile stopwords input documents
The stop words argument is a file of stop words, one on each line. The input documents are the text of each document, one document per line. To see all the options

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-04 Thread Robin East
+1 (subject to comments on ec2 issues below)
machine 1: Macbook Air, OSX 10.10.2 (Yosemite), Java 8
machine 2: iMac, OSX 10.8.4, Java 7
1. mvn clean package -DskipTests (33min/13min)
2. ran SVM benchmark https://github.com/insidedctm/spark-mllib-benchmark
EC2 issues:
1) Unable to

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Robin East
Running ec2 launch scripts gives me the following error:
ssl.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Full stack trace at https://gist.github.com/insidedctm/4d41600bc22560540a26
I’m running OSX Mavericks 10.9.5
I’ll

Fwd: LinearRegressionWithSGD accuracy

2015-01-16 Thread Robin East
Sent from my iPhone
Begin forwarded message:
From: Robin East robin.e...@xense.co.uk
Date: 16 January 2015 11:35:23 GMT
To: Joseph Bradley jos...@databricks.com
Cc: Yana Kadiyska yana.kadiy...@gmail.com, Devl Devel devl.developm...@gmail.com
Subject: Re: LinearRegressionWithSGD accuracy

Re: LinearRegressionWithSGD accuracy

2015-01-15 Thread Robin East
-dev, +user
You’ll need to set the gradient descent step size to something small - a bit of trial and error shows that 0.0001 works. You’ll need to create a LinearRegressionWithSGD instance and set the step size explicitly:
val lr = new LinearRegressionWithSGD()
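
As a rough illustration of why the step size matters, here is a plain-Python sketch. It is not Spark code and not MLlib's implementation. With unscaled features, gradient descent only converges when the step is small relative to the size of the feature values. The dataset, the `fit_slope` helper and the two step sizes are made up for the example.

```python
def fit_slope(xs, ys, step_size, iterations=50):
    """Batch gradient descent for y ~ w * x under mean squared error.

    Hypothetical helper for illustration only; MLlib's optimizer differs.
    """
    w = 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of (1 / 2n) * sum((w * x - y)^2) with respect to w.
        grad = sum(x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= step_size * grad
    return w


# Toy data (an assumption): unscaled feature values 1..100, true slope 2.
xs = [float(i) for i in range(1, 101)]
ys = [2.0 * x for x in xs]

w_small = fit_slope(xs, ys, step_size=0.0001)  # small step: converges
w_large = fit_slope(xs, ys, step_size=0.01)    # larger step: overshoots and diverges

print("step 0.0001 ->", w_small)
print("step 0.01   ->", w_large)
```

On this toy data the small step ends up close to the true slope of 2, while the larger step overshoots on every iteration and blows up. Scaling the features first is the other usual way to make larger steps usable.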