RE: GraphX partition problem

2014-05-25 Thread Zhicharevich, Alex
I’m not sure about 1.2TB, but I can give it a shot. Is there some way to persist intermediate results to disk? Does all of the graph have to be in memory? Alex From: Ankur Dave [mailto:ankurd...@gmail.com] Sent: Monday, May 26, 2014 12:23 AM To: user@spark.apache.org Subject: Re: GraphX partition p
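A reply to the question above might point at storage levels; a minimal sketch, assuming an edge list loaded from HDFS (the path and parsing are hypothetical, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Sketch: keep intermediate RDDs from living only in executor memory by
// persisting with a storage level that spills partitions to disk.
val sc = new SparkContext(
  new SparkConf().setAppName("disk-persist").setMaster("local[*]"))

val edges = sc.textFile("hdfs:///path/to/edges")   // hypothetical input path
  .map { line => val a = line.split("\t"); (a(0).toLong, a(1).toLong) }

edges.persist(StorageLevel.MEMORY_AND_DISK)  // spill to disk when memory is tight
// StorageLevel.DISK_ONLY keeps partitions entirely on disk if memory is the bottleneck.
```

So the whole graph does not have to fit in memory: any RDD in the pipeline can be persisted with a disk-backed level, at the cost of read latency.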

Re: how to set task number?

2014-05-25 Thread qingyang li
I am using "create table bigtable002 tblproperties('shark.cache'='tachyon') as select * from bigtable001" to create table bigtable002. bigtable001 is loaded from HDFS and its format is text file, so I think bigtable002's is text as well. 2014-05-26 11:14 GMT+08:00 Aaron Davidson : > What is the forma

Re: counting degrees graphx

2014-05-25 Thread dizzy5112
sorry directions of edges in this image -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/counting-degrees-graphx-tp6370p6384.html Sent from the Apache Spark User List mailing list ar

Re: counting degrees graphx

2014-05-25 Thread dizzy5112
Yes, that's correct: I want the vertex set for each source vertex in the graph. Which of course leads me on to my next question: how to add a level to each of these. For example, the image shows the in and out links of the g

Re: Fails: Spark sbt/sbt publish local

2014-05-25 Thread Aaron Davidson
Googling that error, I came across something that appears relevant: https://groups.google.com/forum/#!msg/spark-users/T1soH67C5M4/vihzNt92anYJ I'd try just doing sbt/sbt clean first, and if that fails, digging deeper into that thread. (By the way, "sbt/sbt publish-local" IS what you want, otherw

Re: Fails: Spark sbt/sbt publish local

2014-05-25 Thread ABHISHEK
Thanks for the reply, Aaron. I tried "sbt/sbt publish local" but got the error below. [error] /home/cloudera/at_Installation/spark-0.9.1-bin-cdh4/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:669: type mismatch; [error] found : org.apache.spark.streaming.dst

Re: Fails: Spark sbt/sbt publish local

2014-05-25 Thread Aaron Davidson
I suppose you actually ran "publish-local" and not "publish local" like your example showed. That being the case, could you show the compile error that occurs? It could be related to the hadoop version. On Sun, May 25, 2014 at 7:51 PM, ABHISHEK wrote: > Hi, > I'm trying to install Spark along w

Re: how to set task number?

2014-05-25 Thread Aaron Davidson
What is the format of your input data, prior to insertion into Tachyon? On Sun, May 25, 2014 at 7:52 PM, qingyang li wrote: > i tried "set mapred.map.tasks=30" , it does not work, it seems shark does > not support this setting. > i also tried "SET mapred.max.split.size=6400", it does not wor

Re: counting degrees graphx

2014-05-25 Thread ankurdave
Sorry, I missed vertex 6 in that example. It should be [{1}, {1}, {1}, {1}, {1, 6}, {6}, {7}, {7}]. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/counting-degrees-graphx-tp6370p6378.html Sent from the Apache Spark User List mailing list archive at Nabble.c

Re: counting degrees graphx

2014-05-25 Thread Ankur Dave
I'm not sure I understand what you're looking for. Could you provide some more examples to clarify? One interpretation is that you want to tag the source vertices in a graph (those with zero indegree) and find for each vertex the set of sources that lead to that vertex. For vertices 1-8 in the gra

Re: how to set task number?

2014-05-25 Thread qingyang li
I tried "set mapred.map.tasks=30"; it does not work, it seems Shark does not support this setting. I also tried "SET mapred.max.split.size=6400", and it does not work either. Is there another way to control task number in the Shark CLI? 2014-05-26 10:38 GMT+08:00 Aaron Davidson : > You can try setting

Fails: Spark sbt/sbt publish local

2014-05-25 Thread ABHISHEK
Hi, I'm trying to install Spark along with Shark. Here are the configuration details: Spark 0.9.1 Shark 0.9.1 Scala 2.10.3 The Spark assembly was successful, but running "sbt/sbt publish-local" failed. Please refer to the attached log for more details and advise. Thanks, Abhishek Sparkhome>SPARK_HADOOP_VERSION=2.0

Re: how to set task number?

2014-05-25 Thread Aaron Davidson
You can try setting "mapred.map.tasks" to get Hive to do the right thing. On Sun, May 25, 2014 at 7:27 PM, qingyang li wrote: > Hi, Aaron, thanks for sharing. > > I am using shark to execute query , and table is created on tachyon. I > think i can not using RDD#repartition() in shark CLI; > if s
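Aaron's suggestion, combined with the split-size property qingyang asks about elsewhere in the thread, would look roughly like this at the Shark CLI (the values are illustrative, not recommendations, and whether Shark honors each property is exactly what the thread is debating):

```
-- Illustrative Shark CLI session; property names follow Hive conventions.
SET mapred.map.tasks=30;             -- hint the number of map tasks
SET mapred.max.split.size=64000000;  -- cap input split size in bytes (~64 MB)
SELECT count(*) FROM bigtable002;
```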

Re: com.google.protobuf out of memory

2014-05-25 Thread Hao Wang
Hi, Zuhair In my experience, you could try the following steps to avoid a Spark OOM: 1. Increase JVM memory by adding export SPARK_JAVA_OPTS="-Xmx2g" 2. Use .persist(storage.StorageLevel.MEMORY_AND_DISK) instead of .cache() 3. Have you set the spark.executor.memory value? It's 512m by de
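The three suggestions above, sketched in one place (all values are illustrative; the input path is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// 1. SPARK_JAVA_OPTS="-Xmx2g" is an environment variable set before launch;
//    spark.executor.memory can also be set programmatically:
val conf = new SparkConf()
  .setAppName("oom-tuning")
  .setMaster("local[*]")
  .set("spark.executor.memory", "2g")  // 3. raise from the 512m default
val sc = new SparkContext(conf)

val data = sc.textFile("hdfs:///path/to/input")  // hypothetical input
// 2. MEMORY_AND_DISK spills partitions to disk under pressure,
//    where cache() (MEMORY_ONLY) would drop and recompute them.
data.persist(StorageLevel.MEMORY_AND_DISK)
```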

Re: how to set task number?

2014-05-25 Thread qingyang li
Hi Aaron, thanks for sharing. I am using Shark to execute queries, and the table is created on Tachyon. I think I cannot use RDD#repartition() in the Shark CLI. Does Shark support "SET mapred.max.split.size" to control file size? If yes, after I create the table I can control the file num, then I can contr

Re: how to set task number?

2014-05-25 Thread Aaron Davidson
How many partitions are in your input data set? A possibility is that your input data has 10 unsplittable files, so you end up with 10 partitions. You could improve this by using RDD#repartition(). Note that mapPartitionsWithIndex is sort of the "main processing loop" for many Spark functions. It
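Aaron's point about unsplittable files and RDD#repartition() can be sketched as follows (the input path and partition counts are invented for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("repartition-sketch").setMaster("local[*]"))

// Ten gzip files are unsplittable, so textFile yields ten partitions,
// and therefore at most ten parallel tasks downstream.
val raw = sc.textFile("hdfs:///path/to/10-gzip-files")  // hypothetical input

// repartition() reshuffles the data into more partitions, at the
// cost of a full shuffle, restoring parallelism for later stages.
val wide = raw.repartition(100)
println(wide.partitions.length)  // number of partitions after the reshuffle
```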

counting degrees graphx

2014-05-25 Thread dizzy5112
Hi, looking for a little help on counting the degrees in a graph. Currently my graph consists of 2 subgraphs and it looks like this: val vertexArray = Array( (1L,("101","x")), (2L,("102","y")), (3L,("103","y")), (4L,("104","y")), (5L,("105","y")), (6L,("106","x")), (7L,("107","x")), (8L,("108"
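A hedged sketch of counting degrees for a small graph like the one above; the edge array is invented for illustration, since the original snippet is truncated before the edges appear:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

val sc = new SparkContext(
  new SparkConf().setAppName("degrees").setMaster("local[*]"))

val vertexArray = Array(
  (1L, ("101", "x")), (2L, ("102", "y")), (3L, ("103", "y")), (4L, ("104", "y")),
  (5L, ("105", "y")), (6L, ("106", "x")), (7L, ("107", "x")), (8L, ("108", "y")))
// Invented edges forming two small subgraphs, as the post describes.
val edgeArray = Array(
  Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1),
  Edge(6L, 5L, 1), Edge(7L, 8L, 1))

val graph = Graph(sc.parallelize(vertexArray), sc.parallelize(edgeArray))

// inDegrees/outDegrees return (vertexId, degree) pairs; vertices with
// degree zero (e.g. pure sources in inDegrees) are simply absent.
graph.inDegrees.collect.foreach(println)
graph.outDegrees.collect.foreach(println)
```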

Re: how to set task number?

2014-05-25 Thread qingyang li
hi Mayur, thanks for replying. I know a Spark application should take all cores by default. My question is how to set the task number on each core. If one slice means one task, how can I set the slice file size? 2014-05-23 16:37 GMT+08:00 Mayur Rustagi : > How many cores do you see on your spark master (80

Re: can communication and computation be overlapped in spark?

2014-05-25 Thread wxhsdp
Has anyone seen my thread? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/can-communication-and-computation-be-overlapped-in-spark-tp6348p6368.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Comprehensive Port Configuration reference?

2014-05-25 Thread Andrew Ash
Hi Jacob, The config option spark.history.ui.port is new for 1.0 The problem that History server solves is that in non-Standalone cluster deployment modes (Mesos and YARN) there is no long-lived Spark Master that can store logs and statistics about an application after it finishes. History serve

Re: KryoSerializer Exception

2014-05-25 Thread Andrew Ash
Hi Andrea, What version of Spark are you using? There were some improvements in how Spark uses Kryo in 0.9.1 and to-be 1.0 that I would expect to improve this. Also, can you share your registrator's code? Another possibility is that Kryo can have some difficulty serializing very large objects.

Re: problem about broadcast variable in iteration

2014-05-25 Thread Andrew Ash
Hi Randy, In Spark 1.0 there was a lot of work done to allow unpersisting data that's no longer needed. See the below pull request. Try running kvGlobal.unpersist() on line 11 before the re-broadcast of the next variable to see if you can cut the dependency there. https://github.com/apache/spar
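The unpersist-before-rebroadcast suggestion can be sketched like this; the map contents and per-iteration update are stand-ins for whatever the original loop computes:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("bcast-loop").setMaster("local[*]"))

// Stand-in for the iteratively updated global state.
var kvGlobal = sc.broadcast(Map(1 -> 0.0))

for (i <- 1 to 10) {
  val updated = kvGlobal.value.mapValues(_ + 1.0).toMap  // stand-in update step
  kvGlobal.unpersist()     // drop the old broadcast blocks (Spark 1.0+)
  kvGlobal = sc.broadcast(updated)  // re-broadcast the new value
}
```

Unpersisting the previous broadcast before creating the next one keeps old blocks from accumulating on the executors across iterations.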

Re: Dead lock running multiple Spark jobs on Mesos

2014-05-25 Thread Andrew Ash
Hi Martin, Tim suggested that you pastebin the mesos logs -- can you share those for the list? Cheers, Andrew On Thu, May 15, 2014 at 5:02 PM, Martin Weindel wrote: > Andrew, > > thanks for your response. When using the coarse mode, the jobs run fine. > > My problem is the fine-grained mode.

Re: GraphX partition problem

2014-05-25 Thread Ankur Dave
Once the graph is built, edges are stored in parallel primitive arrays, so each edge should only take 20 bytes to store (srcId: Long, dstId: Long, attr: Int). Unfortunately, the current implementation in EdgePartitionBuilder uses an array of Edge objects as an intermediate representation for sortin

Re: PySpark & Mesos random crashes

2014-05-25 Thread Mark Hamstra
The end of your example is the same as SPARK-1749. When a Mesos job causes an exception to be thrown in the DAGScheduler, the DAGScheduler needs to shut down the system. As part of that shutdown procedure, the DAGScheduler tries to kill any running jobs; but Mesos doesn't support tha

PySpark & Mesos random crashes

2014-05-25 Thread Perttu Ranta-aho
Hi, We have a small Mesos (0.18.1) cluster with 4 nodes. Upgraded to Spark 1.0.0-rc9, to overcome some PySpark bugs. But now we are experiencing random crashes with almost every job. Local jobs run fine, but same code with same data set in Mesos cluster leads to errors like: 14/05/22 15:03:34 ERR

RE: GraphX partition problem

2014-05-25 Thread Zhicharevich, Alex
Thanks Ankur, Built it from git and it works great. I have another issue now. I am trying to process a huge graph with about 20 billion edges with GraphX. I only load the file, compute connected components and persist it right back to disk. When working with subgraphs

Re: Using Spark to analyze complex JSON

2014-05-25 Thread Michael Armbrust
On Sat, May 24, 2014 at 11:47 PM, Mayur Rustagi wrote: > > Is the in-memory columnar store planned as part of SparkSQL ? > This has already been ported from Shark, and is used when you run cacheTable. > Also will both HiveQL & SQLParser be kept updated? > Yeah, we need to figure out exactly wha
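The cacheTable mentioned above, which uses the in-memory columnar store ported from Shark, can be sketched with the 1.0-era API (the table name and data are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sc = new SparkContext(
  new SparkConf().setAppName("sql-cache").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext._  // brings in createSchemaRDD for case-class RDDs

val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))
people.registerAsTable("people")  // 1.0-era name; later renamed registerTempTable

// Subsequent scans of "people" read from the in-memory columnar store.
sqlContext.cacheTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 26").collect()
```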

com.google.protobuf out of memory

2014-05-25 Thread Zuhair Khayyat
Dear all, I am getting an OutOfMemoryError in class ByteString.java from package com.google.protobuf when processing very large data using Spark 0.9. Does increasing spark.shuffle.memoryFraction help, or should I add more memory to my workers? Below is the error I get during execution. 14/05/25 07:26