
Re: Remove dependence on HDFS

2017-02-13 Thread Calvin Jia
which describes a similar architecture. Hope this helps, Calvin On Mon, Feb 13, 2017 at 12:46 AM, Saisai Shao wrote: > IIUC Spark doesn't strongly bind to HDFS, it uses a common FileSystem > layer which supports different FS implementations, HDFS is just one option. > You could also us
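The point in this thread — that Spark binds to Hadoop's common FileSystem layer rather than to HDFS specifically — can be sketched as follows. This is a minimal illustration, not from the thread itself; the hostnames, ports, and paths are hypothetical, and each scheme assumes the corresponding Hadoop-compatible client is on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Spark resolves storage from the URI scheme via Hadoop's FileSystem
// abstraction, so swapping filesystems is a matter of changing the path.
val sc = new SparkContext(new SparkConf().setAppName("fs-demo"))

val fromHdfs    = sc.textFile("hdfs://namenode:8020/data/input.txt")   // HDFS
val fromS3      = sc.textFile("s3a://my-bucket/data/input.txt")        // S3 via hadoop-aws
val fromLocal   = sc.textFile("file:///tmp/input.txt")                 // local filesystem
val fromAlluxio = sc.textFile("alluxio://master:19998/data/input.txt") // Alluxio client
```

Any filesystem exposing a `org.apache.hadoop.fs.FileSystem` implementation (GlusterFS, Ceph, Azure, etc.) plugs in the same way.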

Re: Question about Spark and filesystems

2016-12-19 Thread Calvin Jia
ng you flexibility to store your data in your preferred storage without performance penalties. Hope this helps, Calvin On Sun, Dec 18, 2016 at 11:23 PM, vincent gromakowski < vincent.gromakow...@gmail.com> wrote: > I am using gluster and i have decent performance with basic mainten

Re: About Spark Multiple Shared Context with Spark 2.0

2016-12-13 Thread Calvin Jia
summarizing the end-to-end workflow of using Alluxio to share RDDs <https://alluxio.com/blog/effective-spark-rdds-with-alluxio> or Dataframes <https://alluxio.com/blog/effective-spark-dataframes-with-alluxio> between Spark jobs. Hope this helps, Calvin On Tue, Dec 13, 2016 at 3:42 AM, C
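The workflow described in the linked posts — one job writing a Dataframe to Alluxio so another job can read it back without recomputation — can be sketched like this. This is a hedged illustration rather than code from the blog; the Alluxio master address and path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Job 1: materialize a DataFrame into Alluxio-managed storage once.
val spark = SparkSession.builder().appName("writer").getOrCreate()
val df = spark.range(0, 1000).toDF("id")
df.write.mode("overwrite").parquet("alluxio://master:19998/shared/ids.parquet")

// Job 2 (a separate application, possibly a different SparkContext)
// reads the same data back, typically at memory speed:
val shared = spark.read.parquet("alluxio://master:19998/shared/ids.parquet")
```

Because the data lives outside any single SparkContext, it survives application restarts, which is what makes it usable for sharing between jobs.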

Re: sandboxing spark executors

2016-11-04 Thread Calvin Jia
/en/Security.html> has more details. Hope this helps, Calvin On Fri, Nov 4, 2016 at 6:31 AM, Andrew Holway < andrew.hol...@otternetworks.de> wrote: > I think running it on a Mesos cluster could give you better control over > this kinda stuff. > > > On Fri, Nov 4, 2016 at 7:41 AM, blaze

Re: feasibility of ignite and alluxio for interfacing MPI and Spark

2016-09-19 Thread Calvin Jia
dds-with-alluxio/> which details the pros and cons of the two approaches. At a high level, Alluxio performs better with larger datasets or if you plan to use your dataset in more than one Spark job. Hope this helps, Calvin

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-28 Thread Calvin Jia
Hi, Thanks for the detailed information. How large is the dataset you are running against? Also did you change any Tachyon configurations? Thanks, Calvin

Re: spark 1.6.0 on ec2 doesn't work

2016-01-19 Thread Calvin Jia
Hi Oleg, The Tachyon related issue should be fixed. Hope this helps, Calvin On Mon, Jan 18, 2016 at 2:51 AM, Oleg Ruchovets wrote: > Hi , >I try to follow the spark 1.6.0 to install spark on EC2. > > It doesn't work properly - got exceptions and at the end standalon

Re: Re: Spark RDD cache persistence

2015-12-09 Thread Calvin Jia
://tachyon-project.org/documentation/Running-Spark-on-Tachyon.html Hope this helps, Calvin On Thu, Nov 5, 2015 at 10:29 PM, Deenar Toraskar wrote: > You can have a long running Spark context in several fashions. This will > ensure your data will be cached in memory. Clients will access t
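For the Tachyon route linked above, Spark 1.x exposed this through the `OFF_HEAP` storage level. A minimal sketch under stated assumptions: the Tachyon master address and input path are hypothetical, and the `spark.externalBlockStore.url` property shown is the Spark 1.5/1.6 name (earlier releases used `spark.tachyonStore.url`):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("offheap-cache")
  // Point Spark's external block store at the Tachyon master (hypothetical host).
  .set("spark.externalBlockStore.url", "tachyon://master:19998")
val sc = new SparkContext(conf)

val rdd = sc.textFile("hdfs://namenode:8020/data/input.txt")
rdd.persist(StorageLevel.OFF_HEAP) // blocks stored in Tachyon, off the JVM heap
rdd.count()                        // first action materializes the cache
```

Because the cached blocks live outside the executor JVMs, they are not lost to garbage collection or executor failure, at the cost of serialization on each access.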

Re: spark shared RDD

2015-12-09 Thread Calvin Jia
http://tachyon-project.org/documentation/Running-Spark-on-Tachyon.html Hope this helps, Calvin On Tue, Nov 10, 2015 at 2:24 AM, Ben wrote: > Hi, > After reading some documentations about spark and ignite, > I am wondering if shared RDD from ignite can be used to share data in > m

Re: Saving RDDs in Tachyon

2015-12-09 Thread Calvin Jia
Spark-on-Tachyon.html . Hope this helps, Calvin On Fri, Oct 30, 2015 at 7:04 AM, Akhil Das wrote: > I guess you can do a .saveAsObjectFiles and read it back as sc.objectFile > > Thanks > Best Regards > > On Fri, Oct 23, 2015 at 7:57 AM, mark wrote: > >> I have Avro re
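The `saveAsObjectFile`/`objectFile` round trip suggested in this thread can be sketched as follows (the Tachyon master address and path are hypothetical; the `tachyon://` scheme is resolved through the Hadoop FileSystem API):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("save-restore"))
val rdd = sc.parallelize(Seq("a", "b", "c"))

// saveAsObjectFile writes the RDD as SequenceFiles of serialized Java objects.
rdd.saveAsObjectFile("tachyon://master:19998/rdds/letters")

// Any later job can restore it; the element type must be supplied explicitly.
val restored = sc.objectFile[String]("tachyon://master:19998/rdds/letters")
```

Note the serialized-object format ties the saved data to compatible class definitions, so a columnar format like Parquet is usually preferable for long-lived data.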

Re: How does Spark coordinate with Tachyon wrt data locality

2015-10-23 Thread Calvin Jia
Hi Shane, Tachyon provides an api to get the block locations of the file which Spark uses when scheduling tasks. Hope this helps, Calvin On Fri, Oct 23, 2015 at 8:15 AM, Kinsella, Shane wrote: > Hi all, > > > > I am looking into how Spark handles data locality wrt Tachyon. My

Re: TTL for saveAsObjectFile()

2015-10-14 Thread Calvin Jia
with Spark in this meetup <http://www.meetup.com/Tachyon/events/226030510/>, which may be helpful to understanding different ways to integrate. Hope this helps, Calvin On Tue, Oct 13, 2015 at 1:07 PM, antoniosi wrote: > Hi, > > I am using RDD.saveAsObjectFile() to save the R

Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

2015-08-07 Thread Calvin Jia
case where Baidu runs Tachyon to get 30x performance improvement in their SparkSQL workload. Hope this helps, Calvin On Fri, Aug 7, 2015 at 9:42 AM, Muler wrote: > Spark is an in-memory engine and attempts to do computation in-memory. > Tachyon is memory-centeric distributed storage, OK,

Re: tachyon

2015-08-07 Thread Calvin Jia
Tachyon <http://tachyon-project.org> to manage more than 100 nodes in production resulting in a 30x performance improvement for their SparkSQL workload. They are also using the tiered storage feature in Tachyon giving them over 2PB of Tachyon managed space. Hope this helps, Calvin On Fri, Au

Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-28 Thread Calvin Jia
Hi, You can apply this patch <https://github.com/apache/spark/pull/5354> and recompile. Hope this helps, Calvin On Tue, Apr 28, 2015 at 1:19 PM, sara mustafa wrote: > Hi Zhang, > > How did you compile Spark 1.3.1 with Tachyon? when i changed Tachyon > version > to 0.6.3

pyspark/yarn and inconsistent number of executors

2014-08-19 Thread Calvin
it be preferable to have Spark stop requesting containers if the cluster is at capacity rather than kill the job or error out? Does anyone have any recommendations on how to tweak the number of executors in an automated manner? Thanks, Calvin
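Since this thread, dynamic allocation became the standard way to let Spark grow and shrink executor counts automatically on YARN instead of pinning a fixed number. A hedged `spark-defaults.conf` sketch; the bounds are illustrative, not recommendations:

```properties
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   2
spark.dynamicAllocation.maxExecutors   50
# Required with dynamic allocation so shuffle files survive executor removal
spark.shuffle.service.enabled          true
```

With these set, Spark requests executors as tasks queue up and releases idle ones, so a saturated cluster simply delays allocation rather than failing the job.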

Re: Spark working directories

2014-08-14 Thread Calvin
I've had this issue too running Spark 1.0.0 on YARN with HDFS: it defaults to a working directory located in hdfs:///user/$USERNAME and it's not clear how to set the working directory. In the case where HDFS has a non-standard directory structure (i.e., home directories located in hdfs:///users/)
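The directories involved here are controlled by separate settings. A hedged configuration sketch (paths are hypothetical): `spark.local.dir` covers local scratch space, while the HDFS staging directory YARN uses defaults to the submitting user's home directory and only became configurable later, via `spark.yarn.stagingDir` in Spark 2.0+:

```properties
# Scratch space for shuffle and spill data on each node's local disk
# (comma-separated list to spread I/O across disks)
spark.local.dir          /data1/spark-tmp,/data2/spark-tmp
# Spark 2.0+ only: override the HDFS staging directory used on YARN,
# useful when home directories live somewhere non-standard
spark.yarn.stagingDir    hdfs:///users/spark-staging
```

On Spark 1.0 (the version in this thread) the staging path was not configurable, so a non-standard HDFS home-directory layout had to exist before job submission.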