date:20150116

Fwd: LinearRegressionWithSGD accuracy

2015-01-16 Thread Robin East

Sent from my iPhone Begin forwarded message: From: Robin East robin.e...@xense.co.uk Date: 16 January 2015 11:35:23 GMT To: Joseph Bradley jos...@databricks.com Cc: Yana Kadiyska yana.kadiy...@gmail.com, Devl Devel devl.developm...@gmail.com Subject: Re: LinearRegressionWithSGD accuracy

Optimize encoding/decoding strings when using Parquet

2015-01-16 Thread Mick Davies

Hi, A reasonably large proportion of query time using Spark SQL seems to be spent decoding Parquet Binary objects to produce Java Strings. Has anyone considered trying to optimize these conversions, as many are duplicated? Details are outlined in the conversation in the user

RDD order guarantees

2015-01-16 Thread Ewan Higgs

Hi all, Quick one: when reading files, is the order of partitions guaranteed to be preserved? I am finding some weird behaviour where I run sortByKey() on an RDD (which has 16-byte keys) and write it to disk. If I open a python shell and run the following: for part in range(29): print

Setting JVM options to Spark executors in Standalone mode

2015-01-16 Thread Michel Dufresne

Hi All, I'm trying to set some JVM options to the executor processes in a standalone cluster. Here's what I have in *spark-env.sh*: jmx_opt=-Dcom.sun.management.jmxremote jmx_opt=${jmx_opt} -Djava.net.preferIPv4Stack=true jmx_opt=${jmx_opt} -Dcom.sun.management.jmxremote.port=

Re: Setting JVM options to Spark executors in Standalone mode

2015-01-16 Thread Zhan Zhang

You can try to add it in conf/spark-defaults.conf # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three" Thanks. Zhan Zhang On Jan 16, 2015, at 9:56 AM, Michel Dufresne sparkhealthanalyt...@gmail.com wrote: Hi All, I'm trying to set some JVM

Re: Setting JVM options to Spark executors in Standalone mode

2015-01-16 Thread Marcelo Vanzin

On Fri, Jan 16, 2015 at 10:07 AM, Michel Dufresne sparkhealthanalyt...@gmail.com wrote: Thanks for your reply. I should have mentioned that spark-env.sh is the only option I found because: - I'm creating the SparkConf/SparkContext from a Play Application (therefore I'm not using

Re: RDD order guarantees

2015-01-16 Thread Reynold Xin

You are running on a local file system, right? HDFS orders the files based on names, but local file systems often don't. I think that explains the difference. We might be able to do a sort and order the partitions when we create an RDD to make this universal though. On Fri, Jan 16, 2015 at 8:26 AM,
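A minimal sketch of the "sort and order the partitions" idea from the reply above, written in plain Python rather than Spark. `os.listdir` makes no ordering promise on a local file system, so a reader that needs partition order can sort the part-file names first. The zero-padded `part-NNNNN` naming and the helper name `ordered_part_files` are assumptions for illustration; this is not Spark's implementation.

```python
import os
import tempfile


def ordered_part_files(directory):
    """Return part-file paths sorted by name (hypothetical helper)."""
    # os.listdir returns entries in arbitrary order, so sort explicitly.
    names = [n for n in os.listdir(directory) if n.startswith("part-")]
    return [os.path.join(directory, n) for n in sorted(names)]


def demo():
    """Write three part files out of order and list them back in order."""
    with tempfile.TemporaryDirectory() as d:
        for i in (2, 0, 1):
            with open(os.path.join(d, "part-%05d" % i), "w") as f:
                f.write("partition %d\n" % i)
        return [os.path.basename(p) for p in ordered_part_files(d)]


if __name__ == "__main__":
    print(demo())
```

Sorting by name only matches partition order because the assumed names are zero-padded to a fixed width; names without padding would need a numeric sort key instead.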

Re: Optimize encoding/decoding strings when using Parquet

2015-01-16 Thread Michael Armbrust

+1 to adding such an optimization to Parquet. The bytes are tagged specially as UTF8 in the Parquet schema, so it seems like it would be possible to add this. On Fri, Jan 16, 2015 at 8:17 AM, Mick Davies michael.belldav...@gmail.com wrote: Hi, It seems that a reasonably large proportion of
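The optimization under discussion, avoiding repeated decoding of identical byte values, can be illustrated with a small cache. This is a sketch in plain Python under stated assumptions: the class name `CachingDecoder` and its unbounded dictionary are hypothetical, and nothing here reflects Parquet's or Spark SQL's actual API.

```python
class CachingDecoder:
    """Decode UTF-8 byte strings, reusing the result for repeated values."""

    def __init__(self):
        self._cache = {}       # maps raw bytes to the decoded string
        self.decode_calls = 0  # counts real decodes, for illustration only

    def decode(self, raw):
        text = self._cache.get(raw)
        if text is None:
            # First time this byte sequence is seen: decode and remember it.
            text = raw.decode("utf-8")
            self.decode_calls += 1
            self._cache[raw] = text
        return text


def demo():
    """Decode a column with repeated values and report the decode count."""
    decoder = CachingDecoder()
    column = [b"spark", b"parquet", b"spark", b"spark", b"parquet"]
    values = [decoder.decode(raw) for raw in column]
    return values, decoder.decode_calls


if __name__ == "__main__":
    print(demo())
```

Five values are returned but only two distinct byte sequences are decoded. A real implementation would also need to bound the cache size, which this sketch leaves out.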

Spectral clustering

2015-01-16 Thread Andrew Musselman

Hi, thinking of picking up this Jira ticket: https://issues.apache.org/jira/browse/SPARK-4259 Anyone done any work on this to date? Any thoughts on it before we go too far in? Thanks! Best Andrew

Re: Implementing TinkerPop on top of GraphX

2015-01-16 Thread Kushal Datta

Hi David, Yes, we are still headed in that direction. Please take a look at the repo I sent earlier. I think that's a good starting point. Thanks, -Kushal. On Thu, Jan 15, 2015 at 8:31 AM, David Robinson drobin1...@gmail.com wrote: I am new to Spark and GraphX, however, I use Tinkerpop

Re: Join implementation in SparkSQL

2015-01-16 Thread Alessandro Baretta

Reynold, The source file you are directing me to is a little too terse for me to understand what exactly is going on. Let me tell you what I'm trying to do and what problems I'm encountering, so that you might be able to better direct my investigation of the SparkSQL codebase. I am computing the

Re: Implementing TinkerPop on top of GraphX

2015-01-16 Thread Kyle Ellrott

Looking at https://github.com/kdatta/tinkerpop3/compare/graphx-gremlin I only see a maven build file. Do you have some source code some place else? I've worked on a Spark-based implementation ( https://github.com/kellrott/spark-gremlin ), but it's not done and I've been tied up on other projects.

Re: Spark SQL API changes and stabilization

2015-01-16 Thread Alessandro Baretta

Reynold, Your clarification is much appreciated. One issue though, that I would strongly encourage you to work on, is to make sure that the Scaladoc CAN be generated manually if needed (a "Use at your own risk" clause would be perfectly legitimate here). The reason I say this is that currently even

Re: Implementing TinkerPop on top of GraphX

2015-01-16 Thread Kushal Datta

The source code is under a new module named 'graphx'. Let me double check. On Fri, Jan 16, 2015 at 2:11 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote: Looking at https://github.com/kdatta/tinkerpop3/compare/graphx-gremlin I only see a maven build file. Do you have some source code some place

Re: Join implementation in SparkSQL

2015-01-16 Thread Yin Huai

Hi Alex, Can you attach the output of sql("explain extended <your query>").collect.foreach(println)? Thanks, Yin On Fri, Jan 16, 2015 at 1:54 PM, Alessandro Baretta alexbare...@gmail.com wrote: Reynold, The source file you are directing me to is a little too terse for me to understand what

Re: Implementing TinkerPop on top of GraphX

2015-01-16 Thread Kushal Datta

Code updated. Sorry, wrong branch uploaded before. On Fri, Jan 16, 2015 at 2:13 PM, Kushal Datta kushal.da...@gmail.com wrote: The source code is under a new module named 'graphx'. let me double check. On Fri, Jan 16, 2015 at 2:11 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote: Looking at

Re: RDD order guarantees

2015-01-16 Thread Ewan Higgs

Yes, I am running on a local file system. Is there a bug open for this? Mingyu Kim reported the problem last April: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-reads-partitions-in-a-wrong-order-td4818.html -Ewan On 01/16/2015 07:41 PM, Reynold Xin wrote: You are running on a

Re: DBSCAN for MLlib

2015-01-16 Thread Muhammad Ali A'råby

Please find my answers on the JIRA page. Muhammad-Ali On Thursday, January 15, 2015 3:25 AM, Xiangrui Meng men...@gmail.com wrote: Please find my comments on the JIRA page. -Xiangrui On Tue, Jan 13, 2015 at 1:49 PM, Muhammad Ali A'råby angelland...@yahoo.com.invalid wrote: I have to

18 matches
