Try --executor-memory 5g, since you have 8 GB of RAM on each machine.
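For example (a sketch only; the class and jar names below are placeholders, and the right number depends on what else runs on the node):

```shell
# Illustrative: a 5g executor heap on an 8 GB node leaves headroom for
# the OS and the cluster manager's per-executor memory overhead.
spark-submit \
  --master yarn \
  --executor-memory 5g \
  --class com.example.MyApp \
  my-app.jar
```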
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Performance-on-Yarn-tp21729p22603.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
At the record-reader level you can pass the file name as the key or the value:

sc.newAPIHadoopRDD(job.getConfiguration,
  classOf[AvroKeyInputFormat[myObject]],
  classOf[AvroKey[myObject]],
  classOf[Text])  // the Text value can carry your file name

AvroKeyInputFormat extends InputFormat and implements createRecordReader, which is where the file name can be emitted.
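Concretely, one way to surface the file name as the value is to wrap Avro's stock reader. This is a sketch, assuming avro-mapred and Hadoop on the classpath; AvroKeyFileNameReader is a hypothetical name, and you would also need a matching InputFormat whose createRecordReader returns it:

```scala
// Sketch: a RecordReader that keeps Avro keys but replaces the value
// with the name of the file the split came from.
import org.apache.avro.Schema
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyRecordReader
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.FileSplit

class AvroKeyFileNameReader[T](readerSchema: Schema)
    extends RecordReader[AvroKey[T], Text] {

  private val delegate = new AvroKeyRecordReader[T](readerSchema)
  private var fileName: Text = new Text()

  override def initialize(split: InputSplit, ctx: TaskAttemptContext): Unit = {
    // FileSplit tells us which file this split was cut from.
    fileName = new Text(split.asInstanceOf[FileSplit].getPath.toString)
    delegate.initialize(split, ctx)
  }

  override def nextKeyValue(): Boolean = delegate.nextKeyValue()
  override def getCurrentKey: AvroKey[T] = delegate.getCurrentKey
  override def getCurrentValue: Text = fileName // value carries the file name
  override def getProgress: Float = delegate.getProgress
  override def close(): Unit = delegate.close()
}
```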
Hi Steve,
I did Spark 1.3.0 PageRank benchmarking on soc-LiveJournal1 on a 4-node
cluster with 16, 16, 8, and 8 GB of RAM respectively. The cluster has 4 workers,
including the master, with 4, 4, 2, and 2 CPUs.
I set executor memory to 3g and driver memory to 5g.
No. of iterations --> GraphX (mins)
1 --> 1
2
To Spark-admin,
I like the DataFrames in version 1.3. Is there any plan to integrate them
with GraphX in 1.4 or later?
Currently I have a lot of information in the vertex properties; if I could use
DataFrames to hold the properties instead of a VertexRDD, that would help me a lot.
Hi All,
I have a big physical machine with 16 CPUs, 256 GB RAM, and a 20 TB hard disk. I just
need to know what the best way to set up a Spark cluster on it would be.
If I need to process TBs of data, then:
1. Only one machine, which contains the driver, executor, job tracker and task
tracker -- everything.
2. crea
I am able to run it without any issue both standalone and on the cluster:
spark-submit --class org.graphx.test.GraphFromVerteXEdgeArray \
  --executor-memory 1g --driver-memory 6g --master spark://VM-Master:7077 \
  spark-graphx.jar
The code is exactly the same as above.
Instead of setting it in SparkConf, set it on the SparkContext's Hadoop configuration:
sc.hadoopConfiguration.set(key, value)
and extract the same key from the JobContext.
--Harihar
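That pattern can be sketched end to end as follows ("developer"/"MyName" are example values, and Spark plus Hadoop must be on the classpath):

```scala
// Sketch: ship a user variable to a custom InputFormat via the Hadoop
// configuration instead of SparkConf.
import org.apache.hadoop.mapreduce.JobContext
import org.apache.spark.{SparkConf, SparkContext}

object ConfigPassing {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo"))

    // Driver side: keys set on sc.hadoopConfiguration travel with the
    // configuration handed to newAPIHadoopRDD, so tasks can see them.
    sc.hadoopConfiguration.set("developer", "MyName")
    // ... build your newAPIHadoopRDD from sc.hadoopConfiguration ...
  }

  // InputFormat/RecordReader side: read it back from the JobContext.
  def readDeveloper(context: JobContext): String =
    context.getConfiguration.get("developer")
}
```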
I'm also facing the same issue. This is the third time: whenever I post anything
it is never accepted by the community, and at the same time I get a failure mail at
my registered mail id.
And when I click the "subscribe to this mailing list" link, I don't get any new
subscription mail in my inbox.
Please, anyone
Hi,
I have written a custom InputFormat and RecordReader for Spark, and I need to use
user variables from the Spark client program.
I added them in SparkConf:
val sparkConf = new
  SparkConf().setAppName(args(0)).set("developer", "MyName")
*and in the InputFormat class*
protected boolean isSplitable
Spark doesn't support it, but this connector is open source; you can get it
from GitHub.
The difference between these two DBs depends on what type of solution
you are looking for. Please refer to this link:
http://blog.nahurst.com/visual-guide-to-nosql-systems
FYI, from the list of NoSQL in
Do set executor memory as well. You have RAM in each node, and storage; set it
to 6 GB or more, and if required, increase driver memory from 10 GB.
--Harihar
Hi,
How do I set a preferred location for an InputSplit in Spark standalone?
I have data on a specific machine and I want to read it using splits
created for that node only, by assigning some property that helps Spark
create the split on that node only.
I have written a custom InputSplit and I want to pin it to the specific node
where my data is stored, but currently the split can start at any node and pick
up data from a different node in the cluster. Any suggestion on how to set the host in
Spark?
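One way to express this with Hadoop's newer API is to report the host from the split's getLocations. A sketch, with a hypothetical class name; note that Spark treats these locations as locality *preferences*, not guarantees, so the task can still run elsewhere once spark.locality.wait expires:

```scala
// Sketch: a split that advertises a single preferred host.
import org.apache.hadoop.mapreduce.InputSplit

class HostPinnedSplit(val host: String, val splitLength: Long)
    extends InputSplit with java.io.Serializable {
  override def getLength: Long = splitLength
  // Spark standalone reads this as a locality hint when scheduling
  // the task for this split.
  override def getLocations: Array[String] = Array(host)
}
```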