besides a Writable.
>
> On Mon, Jul 25, 2016, 18:50 Jia Zou <jacqueline...@gmail.com> wrote:
>
>>
>> My code is as follows:
>>
>> System.out.println("Initialize points...");
My code is as follows:
System.out.println("Initialize points...");
JavaPairRDD<IntWritable, DoubleArrayWritable> data =
sc.sequenceFile(inputFile, IntWritable.class,
DoubleArrayWritable.class);
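For context on what a value class like the DoubleArrayWritable above has to do: Hadoop's Writable contract is just `write(DataOutput)` / `readFields(DataInput)`. A minimal sketch of that serialization round-trip, using only plain `java.io` streams so it runs without Hadoop on the classpath (the class name here is illustrative, not Hadoop's API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class DoubleArraySerDe {

    // Mirrors Writable.write(DataOutput): a length prefix, then the raw doubles.
    static byte[] write(double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (double v : values) out.writeDouble(v);
        out.flush();
        return bytes.toByteArray();
    }

    // Mirrors Writable.readFields(DataInput): read the length, then each double.
    static double[] read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        double[] values = new double[in.readInt()];
        for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
        return values;
    }

    public static void main(String[] args) throws IOException {
        double[] point = {1.5, -2.0, 3.25};
        double[] roundTrip = read(write(point));
        System.out.println(Arrays.equals(point, roundTrip)); // true
    }
}
```

A real DoubleArrayWritable would implement `org.apache.hadoop.io.Writable` with exactly this read/write logic (and a no-arg constructor, which SequenceFile readers require).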
Divya,
According to my recent Spark tuning experience, the optimal executor-memory
size depends not only on your workload characteristics (e.g. the working-set
size at each job stage) and input data size, but also on your total
available memory and the memory requirements of other components like
Hi Calvin, I am running Spark KMeans on 24GB of data in a c3.2xlarge AWS
instance with 30GB of physical memory.
Spark caches data off-heap to Tachyon, and the input data is also stored in
Tachyon.
Tachyon is configured to use 15GB of memory, with tiered storage enabled.
Tachyon's underFS is /tmp.
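For reference, a setup like the one described (15GB memory tier, tiered store, /tmp as under file system) would be sketched in tachyon-site.properties roughly as below. The tieredstore property names follow the Tachyon 0.8.x naming and should be verified against your version; the paths and quota are illustrative:

```properties
# tachyon-site.properties -- sketch, verify keys against your Tachyon version
tachyon.worker.memory.size=15GB
tachyon.underfs.address=/tmp

# Two tiers: RAM on top, local disk below
tachyon.worker.tieredstore.level.max=2
tachyon.worker.tieredstore.level0.alias=MEM
tachyon.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
tachyon.worker.tieredstore.level1.alias=HDD
tachyon.worker.tieredstore.level1.dirs.path=/mnt/tachyon-hdd
tachyon.worker.tieredstore.level1.dirs.quota=100GB
```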
The only
Hi, dears, the problem has been solved.
I mistakenly used tachyon.user.block.size.bytes instead of
tachyon.user.block.size.bytes.default. It works now. Sorry for the
confusion, and thanks again to Gene!
Best Regards,
Jia
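For anyone hitting the same issue: the fix above amounts to using the `.default`-suffixed key for the client-side block size in tachyon-site.properties. The value below is only an example:

```properties
# tachyon-site.properties -- note the .default suffix
tachyon.user.block.size.bytes.default=128MB
```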
On Wed, Jan 27, 2016 at 4:59 AM, Jia Zou <jacqueline...@gmail.com>
Dears, I keep getting the exception below when using Spark 1.6.0 on top of
Tachyon 0.8.2. Tachyon is 93% used and configured as CACHE_THROUGH.
Any suggestions will be appreciated, thanks!
=
Exception in thread "main" org.apache.spark.SparkException: Job aborted
)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 15 more
On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou <jacqueline...@gmail.com> wrote:
> Dears, I keep getting below exception when using Sp
)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
On Wed, Jan 27, 2016 at 5:53 AM, Jia Zou <jacqueline...@gmail.com> wrote:
> BTW, the Tachyon worker log says
extraJavaOptions per job, or adding it to
> tachyon-site.properties.
>
> I hope that helps,
> Gene
>
> On Mon, Jan 25, 2016 at 8:13 PM, Jia Zou <jacqueline...@gmail.com> wrote:
>
>> Dear all,
>>
>> First to update that the local file system data p
-10-73-198-35:7077
/home/ubuntu/HiBench/src/sparkbench/target/sparkbench-5.0-SNAPSHOT-MR2-spark1.5-jar-with-dependencies.jar
tachyon://localhost:19998/Kmeans/Input/samples 10 5
On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou <jacqueline...@gmail.com> wrote:
> Dears, I keep getting below excep
thod can't work for
Tachyon data.
Do you have any suggestions? Thanks very much!
Best Regards,
Jia
---------- Forwarded message ----------
From: Jia Zou <jacqueline...@gmail.com>
Date: Thu, Jan 21, 2016 at 10:05 PM
Subject: Spark partition size tuning
To: "user @spark" <user
I configured HDFS to cache the file in HDFS's centralized cache, as follows:
hdfs cacheadmin -addPool hibench
hdfs cacheadmin -addDirective -path /HiBench/Kmeans/Input -pool hibench
But I didn't see much performance impact, no matter how I configured
dfs.datanode.max.locked.memory.
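One thing worth checking: for cache directives to actually pin blocks, dfs.datanode.max.locked.memory must be large enough for the cached data, and the DataNode process must also be permitted to lock that much memory (`ulimit -l`); otherwise the directive silently stays unfulfilled. A sketch, assuming a 4GB cache budget:

```xml
<!-- hdfs-site.xml on each DataNode; value is in bytes and must be <= ulimit -l -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>4294967296</value>
</property>
```

Running `hdfs cacheadmin -listDirectives -stats` shows BYTES_NEEDED vs BYTES_CACHED, which tells you whether the pool is actually caching anything.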
Is it possible that
Dear all!
When using Spark to read from the local file system, the default partition
size is 32MB. How can I increase the partition size to 128MB, to reduce the
number of tasks?
Thank you very much!
Best Regards,
Jia
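The task count falls linearly with the split size: for example, the 24GB input mentioned elsewhere in this thread goes from 768 tasks at 32MB splits to 192 at 128MB. The arithmetic sketch below shows only the split math; in Spark, the local-filesystem split size comes from the Hadoop `fs.local.block.size` setting (32MB by default), which can be raised through `sc.hadoopConfiguration()` — treat that knob as something to verify for your Spark/Hadoop version:

```java
public class SplitCount {

    // Number of input splits (and hence map tasks) = ceil(inputSize / splitSize)
    static long splits(long inputBytes, long splitBytes) {
        return (inputBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long mb = 1024L * 1024;
        System.out.println(splits(24 * gb, 32 * mb));  // 768 tasks at 32MB splits
        System.out.println(splits(24 * gb, 128 * mb)); // 192 tasks at 128MB splits
    }
}
```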
Dear all,
Can I configure Spark on multiple nodes without HDFS, so that output data
will be written to the local file system on each node?
I guess there is no such feature in Spark, but I just want to confirm.
Best Regards,
Jia
> That is a Hadoop MapReduce concept, not
> Spark.
>
> On Sun, Jan 17, 2016 at 7:29 AM, Jia Zou <jacqueline...@gmail.com> wrote:
>
>> Dear all,
>>
>> Is there a way to reuse executor JVM across different JobContexts? Thanks.
>>
>> Best Regards,
>> Jia
>>
>
>
Dear all,
Is there a way to reuse executor JVM across different JobContexts? Thanks.
Best Regards,
Jia
Dear all,
I am using Spark 1.5.2 and Tachyon 0.7.1 to run KMeans with
inputRDD.persist(StorageLevel.OFF_HEAP()).
I've set tiered storage for Tachyon. It is all right when the working set is
smaller than available memory. However, when the working set exceeds available
memory, I keep getting errors like
store the partitions that don't fit on disk and read them from there when
> they are needed.
> Actually, it's not necessary to set such a large driver memory in your case,
> because KMeans uses little driver memory if your k is not very large.
>
> Cheers
> Yanbo
>
> 2015-12-30 22:20
I am running Spark MLlib KMeans on one EC2 m3.2xlarge instance with 8 CPU
cores and 30GB of memory. Executor memory is set to 15GB, and driver memory is
set to 15GB.
The observation is that when the input data size is smaller than 15GB, the
performance is quite stable. However, when the input data becomes
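For reference, the memory settings described above correspond to the following spark-defaults.conf entries (they can equally be passed as --executor-memory / --driver-memory on spark-submit):

```properties
# spark-defaults.conf
spark.executor.memory  15g
spark.driver.memory    15g
```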
My goal is to use hprof to profile where the bottleneck is.
Is there any way to do this without modifying and rebuilding the Spark source
code?
I've tried to add "
-Xrunhprof:cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=/home/ubuntu/out.hprof"
to spark-class script, but it can only profile
Hi, Ted, it works, thanks a lot for your help!
--Jia
On Sat, Dec 12, 2015 at 3:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Have you tried adding the option below through
> spark.executor.extraJavaOptions ?
>
> Cheers
>
> > On Dec 13, 2015, at 3:36 AM, Jia Zou <
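Putting Ted's suggestion together with the hprof flags from the original message: the agent can be passed to the executor JVMs through configuration, with no Spark rebuild needed. Each executor writes its own hprof file; the output path below is the one from the original message and is illustrative:

```properties
# spark-defaults.conf (or --conf on spark-submit)
spark.executor.extraJavaOptions -Xrunhprof:cpu=samples,depth=100,interval=20,lineno=y,thread=y,file=/home/ubuntu/out.hprof
```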