RE: Save GraphX to disk

2015-11-13 Thread Buttler, David
A graph is vertices and edges. What else are you expecting to save/load? You could save/load the triplets, but reconstructing the graph from triplets is actually more work than saving and loading the vertices and edges separately. Dave. From: Gaurav Kumar [mailto:gauravkuma...@gmail.com] Sent: Friday, November 13, 2015 …
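
A minimal sketch in Scala of the approach described above, assuming String vertex attributes and Int edge attributes (the thread does not specify the types): save the vertex and edge RDDs as object files, then rebuild the Graph on load.

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object GraphPersistence {
  // Persist the two RDDs that make up the graph as object files.
  def save(graph: Graph[String, Int], dir: String): Unit = {
    graph.vertices.saveAsObjectFile(s"$dir/vertices")
    graph.edges.saveAsObjectFile(s"$dir/edges")
  }

  // Read the two RDDs back and reassemble the Graph.
  def load(sc: SparkContext, dir: String): Graph[String, Int] = {
    val vertices = sc.objectFile[(VertexId, String)](s"$dir/vertices")
    val edges = sc.objectFile[Edge[Int]](s"$dir/edges")
    Graph(vertices, edges)
  }
}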

RE: hdfs-ha on mesos - odd bug

2015-11-11 Thread Buttler, David
I have verified that this error exists on my system as well, and the suggested workaround also works. Spark versions: 1.5.1, 1.5.2; Mesos version: 0.21.1; CDH version: 4.7. I have set up spark-env.sh to contain HADOOP_CONF_DIR pointing to the correct place, and I have also linked in the hdfs-si…
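
A minimal sketch of what the workaround enables, assuming HADOOP_CONF_DIR is exported in spark-env.sh on every Mesos agent so core-site.xml and hdfs-site.xml are picked up, and that the HA nameservice is named nameservice1 (a placeholder): the job can then address HDFS by its logical URI rather than a specific NameNode host.

import org.apache.spark.{SparkConf, SparkContext}

object HdfsHaCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hdfs-ha-check"))
    // With HA configured, the logical nameservice URI resolves to whichever
    // NameNode is currently active; no host name is hard-coded here.
    val lines = sc.textFile("hdfs://nameservice1/tmp/sample.txt")
    println(s"line count: ${lines.count()}")
    sc.stop()
  }
}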

RE: GraphX vs GraphLab

2015-01-13 Thread Buttler, David
… would be if the AMP Lab or Databricks maintained a set of benchmarks on the web that showed how much each successive version of Spark improved. Dave. From: Madabhattula Rajesh Kumar [mailto:mrajaf...@gmail.com] Sent: Monday, January 12, 2015 9:24 PM To: Buttler, David Subject: Re: GraphX vs GraphLab

inconsistent edge counts in GraphX

2014-11-10 Thread Buttler, David
Hi, I am building a graph from a large CSV file. Each record contains a couple of nodes and about 10 edges. When I try to load a large portion of the graph, using multiple partitions, I get inconsistent results in the number of edges between different runs. However, if I use a single partition…
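
A minimal sketch of that kind of pipeline, assuming a srcId,dstId,weight layout with one edge per line (the actual record format is not shown in the message); the final edge count is the figure reported as inconsistent across runs when multiple partitions are used.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object CsvGraphBuild {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("csv-graph"))
    // Parse each CSV line into a GraphX Edge; 64 is an arbitrary partition count.
    val edges = sc.textFile("hdfs:///data/edges.csv", 64).map { line =>
      val Array(src, dst, w) = line.split(",")
      Edge(src.toLong, dst.toLong, w.toDouble)
    }
    // Build the graph from the edge RDD, giving every vertex a default attribute.
    val graph = Graph.fromEdges(edges, defaultValue = 0.0)
    println(s"edge count: ${graph.edges.count()}")
    sc.stop()
  }
}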

RE: K-means with large K

2014-04-28 Thread Buttler, David
…@spark.apache.org Cc: user@spark.apache.org Subject: Re: K-means with large K. David, just curious to know what kind of use cases demand such large k clusters? Chester. Sent from my iPhone. On Apr 28, 2014, at 9:19 AM, "Buttler, David" [mailto:buttl...@llnl.gov] wrote: Hi, I am trying to…

K-means with large K

2014-04-28 Thread Buttler, David
Hi, I am trying to run the K-means code in MLlib, and it works very nicely with small K (less than 1000). However, when I try a larger K (I am looking for 2000-4000 clusters), it seems like the code gets part way through (perhaps just the initialization step) and freezes. The compute nodes…
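
A minimal sketch of the MLlib call in question, with an assumed input path and feature layout. Switching the initialization mode from the default k-means|| to random is one way to reduce the cost of the initialization step, which is where the job appears to stall; whether that avoids this particular freeze is not confirmed here.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object LargeKMeans {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("large-k-means"))
    // Space-separated numeric features, one point per line (an assumption).
    val data = sc.textFile("hdfs:///data/features.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()

    val model = new KMeans()
      .setK(3000)                           // in the 2000-4000 range discussed above
      .setMaxIterations(20)
      .setInitializationMode(KMeans.RANDOM) // default is KMeans.K_MEANS_PARALLEL
      .run(data)

    println(s"cluster centers: ${model.clusterCenters.length}")
    sc.stop()
  }
}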

RE:

2014-04-23 Thread Buttler, David
This sounds like a configuration issue. Either you have not set the MASTER correctly, or possibly another process is using up all of the cores. Dave. From: ge ko [mailto:koenig@gmail.com] Sent: Sunday, April 13, 2014 12:51 PM To: user@spark.apache.org Subject: Hi, I'm still going to start w…
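
A minimal sketch of the two settings in question, with placeholder values: point the application at the intended master URL and cap the cores it requests so that one job cannot monopolize the cluster.

import org.apache.spark.{SparkConf, SparkContext}

object MasterConfigExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("master-config-example")
      .setMaster("spark://master-host:7077") // or "local[*]" for local testing
      .set("spark.cores.max", "4")           // cap cores on a standalone/Mesos cluster
    val sc = new SparkContext(conf)
    println(s"default parallelism: ${sc.defaultParallelism}")
    sc.stop()
  }
}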