Re: Best practices for storing data in Parquet files

2016-08-29 Thread Mich Talebzadeh
…How should we store the data in HDFS (directory structure, …)? > You should partition the file into small pieces. > On Aug 28, 2016, at 9:43 PM, Kevin Tran wrote: > Hi, > Does anyone know what the best practices are for storing data in Parquet files? > Does…

Re: Best practices for storing data in Parquet files

2016-08-28 Thread Chanh Le
…delete a specific time without rebuilding them all. > How should we store the data in HDFS (directory structure, …)? You should partition the file into small pieces. > On Aug 28, 2016, at 9:43 PM, Kevin Tran wrote: > Hi, > Does anyone know what the best practices are for storing data in Parquet files…

Re: Best practices for storing data in Parquet files

2016-08-28 Thread Kevin Tran
…reference architecture that HBase is a part of? Please share with me any best practices you might know, or your favourite designs. Thanks, Kevin. On Mon, Aug 29, 2016 at 5:18 AM, Mich Talebzadeh wrote: > Hi, > Can you explain your particular stack? > For example, what i…

Re: Best practices for storing data in Parquet files

2016-08-28 Thread Mich Talebzadeh
…Does anyone know what the best practices are for storing data in Parquet files? > Does a Parquet file have a size limit (1 TB)? > Should we use SaveMode.Append for a long-running streaming app? > How should we store the data in HDFS (directory structure, …)? > Thanks, > Kevin.

Best practices for storing data in Parquet files

2016-08-28 Thread Kevin Tran
Hi, Does anyone know what the best practices are for storing data in Parquet files? Does a Parquet file have a size limit (1 TB)? Should we use SaveMode.Append for a long-running streaming app? How should we store the data in HDFS (directory structure, …)? Thanks, Kevin.
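
A minimal sketch of what the partitioning advice in the replies above could look like, assuming an existing DataFrame `df` with a date column; the column name event_date and the output path are hypothetical:

    import org.apache.spark.sql.SaveMode

    // Partition by a date column so each day lands in its own HDFS
    // subdirectory (.../event_date=2016-08-28/). A single day can then
    // be deleted or rebuilt without touching the rest of the dataset.
    df.write
      .mode(SaveMode.Append)        // append each streaming micro-batch
      .partitionBy("event_date")    // one directory per day
      .parquet("hdfs:///data/events")

Appending many small files per batch can still hurt read performance, so periodic compaction of older partitions is a common companion to this layout.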

Re: Best practices around spark-scala

2016-08-08 Thread Deepak Sharma
…Can anyone please share any documents there may be on spark-scala >> best practices? >> Thanks >> Deepak >> www.bigdatabig.com >> www.keosha.net

Re: Best practices around spark-scala

2016-08-08 Thread vaquar khan
…documents there may be on spark-scala > best practices? > Thanks > Deepak > www.bigdatabig.com > www.keosha.net

Best practices around spark-scala

2016-08-08 Thread Deepak Sharma
Hi All, Can anyone please share any documents there may be on spark-scala best practices? -- Thanks Deepak www.bigdatabig.com www.keosha.net

Re: Best practices for sharing a Spark cluster across a few applications

2016-02-14 Thread Alex Kozlov
Praveen, the mode in which you run Spark (standalone, YARN, Mesos) is determined when you create the SparkContext. You are right that spark-submit and spark-shell create different SparkContexts. In general, resour…
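
To make the point concrete, a minimal sketch (app name and master URL are illustrative) of how the cluster manager is fixed at context-creation time:

    import org.apache.spark.{SparkConf, SparkContext}

    // The cluster manager is baked in when the context is constructed;
    // spark-submit and spark-shell simply build such a context for you
    // from their command-line arguments.
    val conf = new SparkConf()
      .setAppName("my-web-service")
      .setMaster("yarn-client")   // or "spark://host:7077", "mesos://host:5050", "local[*]"
    val sc = new SparkContext(conf)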

Re: Best practices for sharing a Spark cluster across a few applications

2016-02-14 Thread praveen S
Even I was trying to launch Spark jobs from a web service, but I thought you could run Spark jobs in YARN mode only through spark-submit. Is my understanding not correct? Regards, Praveen On 15 Feb 2016 08:29, "Sabarish Sasidharan" wrote: > Yes, you can look at using the capacity scheduler or the…
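
For what it's worth, spark-submit is not the only way in: since Spark 1.4 the org.apache.spark.launcher.SparkLauncher API can submit a YARN application programmatically from inside a web service. A sketch, with the jar path and main class as placeholders:

    import org.apache.spark.launcher.SparkLauncher

    // Programmatic equivalent of spark-submit; launch() returns the
    // underlying submission process.
    val process = new SparkLauncher()
      .setAppResource("/path/to/my-spark-job.jar")   // placeholder path
      .setMainClass("com.example.MyJob")             // placeholder class
      .setMaster("yarn-cluster")
      .setConf("spark.executor.memory", "2g")
      .launch()

    process.waitFor()   // block until the submission process exits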

Re: Best practices for sharing a Spark cluster across a few applications

2016-02-14 Thread Sabarish Sasidharan
Yes, you can look at using the capacity scheduler or the fair scheduler with YARN. Both allow a job to use the full cluster when it is otherwise idle, and both can take CPU as well as memory into account when allocating resources, which is more or less necessary with Spark. Regards, Sab On 13-Feb-2016 10:11 pm, "Eugene Morozov" wrote: > Hi…
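
As a sketch of how each application would then target one of those queues (the queue name is hypothetical; the queues themselves are defined in YARN's capacity- or fair-scheduler configuration, not in Spark):

    import org.apache.spark.{SparkConf, SparkContext}

    // Submit this application into a dedicated YARN queue; the capacity
    // or fair scheduler then divides the cluster between queues and can
    // hand an idle cluster's full capacity to whichever queue is busy.
    val conf = new SparkConf()
      .setAppName("service-a")
      .setMaster("yarn-client")
      .set("spark.yarn.queue", "serviceA")   // hypothetical queue name
    val sc = new SparkContext(conf)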

Re: Best practices for sharing a Spark cluster across a few applications

2016-02-13 Thread Jörn Franke
This is possible with YARN. You also need to think about preemption, in case one web service starts doing something and, after a while, another web service also wants to do something. > On 13 Feb 2016, at 17:40, Eugene Morozov wrote: > Hi, > I have several instances of the same web service…

Best practices for sharing a Spark cluster across a few applications

2016-02-13 Thread Eugene Morozov
Hi, I have several instances of the same web service that run some ML algorithms on Spark (both training and prediction) and also do some Spark-unrelated work. Each web-service instance creates its own JavaSparkContext, so Spark sees them as separate applications, and thus they're configured with…
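
For context, a sketch of the per-instance setup being described, with each web-service instance building its own context pinned to a fixed resource slice (all values illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.api.java.JavaSparkContext

    // Each service instance holds its own context, so Spark sees N
    // separate applications, each keeping whatever resources it asked for.
    val conf = new SparkConf()
      .setAppName("ml-web-service-instance")
      .setMaster("spark://master:7077")    // illustrative standalone master
      .set("spark.cores.max", "4")         // fixed CPU slice per instance
      .set("spark.executor.memory", "4g")  // fixed memory per executor
    val jsc = new JavaSparkContext(conf)

Under YARN, the queue-per-service approach sketched earlier (with preemption, as Jörn notes above) is one way to let such fixed slices coexist more gracefully.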

Re: Best practices

2015-11-02 Thread Sushrut Ikhar
…>>> Hi All, >>> Yes, any such doc would be a great help! >>> On Fri, Oct 30, 2015 at 4:35 PM, huangzheng <1106944...@qq.com> wrote: >>>> I have the same question. Can anyone help us?

Re: Best practices

2015-11-02 Thread Denny Lee
…>> On Fri, Oct 30, 2015 at 4:35 PM, huangzheng <1106944...@qq.com> wrote: >>> I have the same question. Can anyone help us? >>> -- Original message -- >>> From: "Deepak Sharma"; >>> Sent:…

Re: Best practices

2015-11-02 Thread Stefano Baghino
…doc would be a great help! > On Fri, Oct 30, 2015 at 4:35 PM, huangzheng <1106944...@qq.com> wrote: >> I have the same question. Can anyone help us? >> -- Original message -- >> From: "Deepak Sharma"; >> Sent:…

Re: Best practices

2015-11-02 Thread satish chandra j
…(Friday) 7:23 PM > To: "user"; > Subject: Best practices > Hi, > I am looking for any blog or doc on developer best practices for using Spark. I have already looked at the tuning guide on spark.apache.org. > Please let me know if anyone is aware of any such resource. > Thanks > Deepak

Re: Best practices

2015-10-30 Thread huangzheng
I have the same question. Can anyone help us? -- Original message -- From: "Deepak Sharma"; Sent: 2015-10-30 (Friday) 7:23 PM To: "user"; Subject: Best practices Hi, I am looking for any blog or doc on developer best practice…

Best practices

2015-10-30 Thread Deepak Sharma
Hi, I am looking for any blog or doc on developer best practices for using Spark. I have already looked at the tuning guide on spark.apache.org. Please let me know if anyone is aware of any such resource. Thanks, Deepak

Best practices for cleaning up RDDs of old applications

2015-10-08 Thread Jens Rantil
…spark/rdd/spark-local-20150903112858-a72d 23M /var/lib/spark/rdd/spark-local-20150929141201-143f The applications (such as "20150903112858") aren't running anymore. What are the best practices for cleaning these up? A cron job? Enabling some kind of cleaner in Spark? I'm currently running…
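
One built-in option worth checking for this case, assuming the standalone deploy mode: the worker cleanup settings, enabled via SPARK_WORKER_OPTS in conf/spark-env.sh. Note that these sweep the worker's application work directories; leftover spark-local-* directories under spark.local.dir from crashed applications may still need a cron job that deletes directories older than some TTL.

    # conf/spark-env.sh on each worker (standalone mode only)
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"
    # sweep every 30 minutes, keep data of stopped apps for 7 days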