Begin forwarded message:
From: Robin East robin.e...@xense.co.uk
Date: 16 January 2015 11:35:23 GMT
To: Joseph Bradley jos...@databricks.com
Cc: Yana Kadiyska yana.kadiy...@gmail.com, Devl Devel
devl.developm...@gmail.com
Subject: Re: LinearRegressionWithSGD accuracy
Hi,
It seems that a reasonably large proportion of query time using Spark SQL
is spent decoding Parquet Binary objects to produce Java Strings.
Has anyone considered trying to optimize these conversions, given that many
of them are duplicated?
Details are outlined in the conversation in the user
Hi all,
Quick one: when reading files, is the order of partitions guaranteed
to be preserved? I am finding some weird behaviour where I run
sortByKey() on an RDD (which has 16-byte keys) and write it to disk. If
I open a Python shell and run the following:
for part in range(29):
    print
Hi All,
I'm trying to set some JVM options for the executor processes in a
standalone cluster. Here's what I have in *spark-env.sh*:
jmx_opt="-Dcom.sun.management.jmxremote"
jmx_opt="${jmx_opt} -Djava.net.preferIPv4Stack=true"
jmx_opt="${jmx_opt} -Dcom.sun.management.jmxremote.port=
You can try adding it to conf/spark-defaults.conf, for example:
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
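A minimal Python sketch of assembling such a value so that a property containing spaces stays quoted as one argument. It reuses the option values from the example line above and assumes, as that line suggests, that Spark honours double quotes when splitting spark.executor.extraJavaOptions. The helper names `java_opts` and `defaults_line` are hypothetical, not part of Spark.

```python
def java_opts(flags, props):
    """Join raw JVM flags and -D system properties into one option string.

    A property whose value is None becomes a bare -Dkey; a value containing
    whitespace is wrapped in double quotes so it is kept as one argument
    (assuming the consumer honours double quotes, as the Spark template implies).
    """
    parts = list(flags)
    for key, value in props.items():
        if value is None:
            parts.append(f"-D{key}")
        elif any(ch.isspace() for ch in value):
            parts.append(f'-D{key}="{value}"')
        else:
            parts.append(f"-D{key}={value}")
    return " ".join(parts)


def defaults_line(key, value):
    """Format one spark-defaults.conf entry: key, whitespace, value."""
    return f"{key}  {value}"


if __name__ == "__main__":
    opts = java_opts(
        ["-XX:+PrintGCDetails"],
        {"key": "value", "numbers": "one two three"},
    )
    print(defaults_line("spark.executor.extraJavaOptions", opts))
```

The same helper covers the JMX flags from the spark-env.sh snippet: `java_opts([], {"com.sun.management.jmxremote": None})` yields the bare flag `-Dcom.sun.management.jmxremote`.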
Thanks.
Zhan Zhang
On Jan 16, 2015, at 9:56 AM, Michel Dufresne sparkhealthanalyt...@gmail.com
wrote:
Hi All,
I'm trying to set some JVM
On Fri, Jan 16, 2015 at 10:07 AM, Michel Dufresne
sparkhealthanalyt...@gmail.com wrote:
Thanks for your reply. I should have mentioned that spark-env.sh is the
only option I found because:
- I'm creating the SparkConf/SparkContext from a Play application
(therefore I'm not using
You are running on a local file system, right? HDFS orders the files by
name, but local file systems often don't. I think that explains the difference.
We might be able to sort the partitions by name when we create an RDD
to make this behaviour universal, though.
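A small local sketch of the sort-by-name idea in the reply above: rather than trusting the order in which the file system lists files, sort the part files by name before reading them back. It assumes Hadoop-style, zero-padded `part-NNNNN` names (so lexicographic order equals partition order) and text output with one key per line; the function names are hypothetical, and this is not how Spark itself builds the RDD.

```python
import os
import tempfile


def ordered_part_files(directory):
    """Return the part-* files in `directory`, sorted by file name.

    os.listdir() returns entries in arbitrary order, so the listing alone
    cannot be trusted to reflect partition order on a local file system.
    """
    names = [n for n in os.listdir(directory) if n.startswith("part-")]
    return [os.path.join(directory, n) for n in sorted(names)]


def read_keys_in_order(directory):
    """Concatenate the lines of every part file, in partition order."""
    keys = []
    for path in ordered_part_files(directory):
        with open(path) as f:
            keys.extend(line.rstrip("\n") for line in f)
    return keys


def is_globally_sorted(keys):
    """True if the keys are in non-decreasing order across all part files."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create the part files in reverse order; listing order is not relied on.
        for index, lines in [(1, "c\nd\n"), (0, "a\nb\n")]:
            with open(os.path.join(tmp, "part-%05d" % index), "w") as f:
                f.write(lines)
        keys = read_keys_in_order(tmp)
        print(keys, is_globally_sorted(keys))
```

Zero padding matters here: without it, a plain string sort would place `part-10` before `part-2`.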
On Fri, Jan 16, 2015 at 8:26 AM,
+1 to adding such an optimization to Parquet. The bytes are tagged
specially as UTF8 in the Parquet schema, so it seems like it would be
possible to add this.
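The optimization discussed above can be sketched in Python, independently of the actual Parquet or Spark SQL APIs: memoize the UTF-8 decode so that repeated byte values are decoded once and then share one string object. The `CachingUtf8Decoder` name and the `max_entries` bound are hypothetical; a real implementation would sit inside the Parquet column reader.

```python
class CachingUtf8Decoder:
    """Decode UTF-8 byte strings, reusing results for repeated values.

    Duplicated values are decoded once; later lookups return the same
    str object from the cache instead of decoding again.
    """

    def __init__(self, max_entries=10_000):
        self.max_entries = max_entries
        self.cache = {}
        self.decodes = 0  # number of real bytes.decode() calls made

    def decode(self, raw: bytes) -> str:
        cached = self.cache.get(raw)
        if cached is not None:
            return cached
        text = raw.decode("utf-8")
        self.decodes += 1
        # Stop adding entries once full (no eviction), to keep the sketch simple.
        if len(self.cache) < self.max_entries:
            self.cache[raw] = text
        return text
```

The bound keeps memory in check for high-cardinality columns, where caching buys little; a production version might prefer an LRU policy instead of simply stopping.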
On Fri, Jan 16, 2015 at 8:17 AM, Mick Davies michael.belldav...@gmail.com
wrote:
Hi,
It seems that a reasonably large proportion of
Hi, I'm thinking of picking up this JIRA ticket:
https://issues.apache.org/jira/browse/SPARK-4259
Has anyone done any work on this to date? Any thoughts on it before we go too
far in?
Thanks!
Best
Andrew
Hi David,
Yes, we are still headed in that direction.
Please take a look at the repo I sent earlier.
I think that's a good starting point.
Thanks,
-Kushal.
On Thu, Jan 15, 2015 at 8:31 AM, David Robinson drobin1...@gmail.com
wrote:
I am new to Spark and GraphX; however, I use TinkerPop
Reynold,
The source file you are directing me to is a little too terse for me to
understand what exactly is going on. Let me tell you what I'm trying to do
and what problems I'm encountering, so that you might be able to better
direct my investigation of the Spark SQL codebase.
I am computing the
Looking at https://github.com/kdatta/tinkerpop3/compare/graphx-gremlin I
only see a Maven build file. Do you have the source code somewhere else?
I've worked on a Spark-based implementation (
https://github.com/kellrott/spark-gremlin ), but it's not done and I've been
tied up on other projects.
Reynold,
Your clarification is much appreciated. One issue, though, that I would
strongly encourage you to work on is making sure that the Scaladoc CAN be
generated manually if needed (a "use at your own risk" clause would be
perfectly legitimate here). The reason I say this is that currently even
The source code is under a new module named 'graphx'. Let me double-check.
On Fri, Jan 16, 2015 at 2:11 PM, Kyle Ellrott kellr...@soe.ucsc.edu wrote:
Looking at https://github.com/kdatta/tinkerpop3/compare/graphx-gremlin I
only see a maven build file. Do you have some source code some place
Hi Alex,
Can you attach the output of
sql("explain extended <your query>").collect.foreach(println)?
Thanks,
Yin
On Fri, Jan 16, 2015 at 1:54 PM, Alessandro Baretta alexbare...@gmail.com
wrote:
Reynold,
The source file you are directing me to is a little too terse for me to
understand what
Code updated. Sorry, I uploaded the wrong branch before.
On Fri, Jan 16, 2015 at 2:13 PM, Kushal Datta kushal.da...@gmail.com
wrote:
The source code is under a new module named 'graphx'. let me double check.
On Fri, Jan 16, 2015 at 2:11 PM, Kyle Ellrott kellr...@soe.ucsc.edu
wrote:
Looking at
Yes, I am running on a local file system.
Is there a bug open for this? Mingyu Kim reported the problem last April:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-reads-partitions-in-a-wrong-order-td4818.html
-Ewan
On 01/16/2015 07:41 PM, Reynold Xin wrote:
You are running on a
Please find my answers on the JIRA page.
Muhammad-Ali
On Thursday, January 15, 2015 3:25 AM, Xiangrui Meng men...@gmail.com
wrote:
Please find my comments on the JIRA page. -Xiangrui
On Tue, Jan 13, 2015 at 1:49 PM, Muhammad Ali A'råby
angelland...@yahoo.com.invalid wrote:
I have to