Re: mesos or kubernetes?

2016-08-14 Thread Gurvinder Singh
On 08/13/2016 08:24 PM, guyoh wrote:
> My company is trying to decide whether to use kubernetes or mesos. Since we
> are planning to use Spark in the near future, I was wondering what is the
> best choice for us.
> Thanks,
> Guy
Both Kubernetes and Mesos enable you to share your infrastructur…

spark master ui to proxy app and worker ui

2016-03-03 Thread Gurvinder Singh
Hi, I am wondering if it is possible for the Spark standalone master UI to proxy the app/driver UI and worker UI. The reason for this is that currently, if you want to access the UI of a driver or worker to see logs, you need to have access to their IP:port, which makes it harder to open up from a networking p…
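For reference, later Spark releases (2.1 and up) added exactly this reverse-proxy capability to the standalone master UI; a minimal spark-defaults.conf sketch (the external URL is hypothetical):

```
# spark-defaults.conf (Spark 2.1+)
spark.ui.reverseProxy      true
# Hypothetical public address of the master; worker and application UIs
# then become reachable through the master UI instead of their own IP:port.
spark.ui.reverseProxyUrl   http://spark-master.example.com
```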

Re: Tungsten and Spark Streaming

2015-09-10 Thread Gurvinder Singh
On 09/10/2015 07:42 AM, Tathagata Das wrote:
> Rewriting is necessary. You will have to convert RDD/DStream operations
> to DataFrame operations. So get the RDDs in DStream, using
> transform/foreachRDD, convert to DataFrames and then do DataFrame
> operations.
Are there any plans for 1.6 or later…
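The conversion Tathagata describes can be sketched roughly as follows. This is an untested sketch assuming a Spark 1.5-era deployment; the socket source on localhost:9999 and the word-count logic are illustrative, not from the thread:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="dstream-to-dataframe")
ssc = StreamingContext(sc, batchDuration=10)
sqlContext = SQLContext(sc)

# Hypothetical input source for the sketch.
lines = ssc.socketTextStream("localhost", 9999)

def process(rdd):
    if rdd.isEmpty():
        return
    # Convert each batch RDD into a DataFrame, then do DataFrame
    # operations, which go through the Tungsten-optimized execution path.
    df = sqlContext.createDataFrame(rdd.map(lambda line: Row(value=line)))
    df.groupBy("value").count().show()

lines.foreachRDD(process)
ssc.start()
ssc.awaitTermination()
```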

Re: How to avoid shuffle errors for a large join ?

2015-09-05 Thread Gurvinder Singh
On 09/05/2015 11:22 AM, Reynold Xin wrote:
> Try increasing the shuffle memory fraction (by default it is only 16%).
> Again, if you run Spark 1.5, this will probably run a lot faster,
> especially if you increase the shuffle memory fraction ...
Hi Reynold, Does 1.5 have better join/cogroup perfo…
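In Spark 1.x terms, Reynold's suggestion amounts to raising `spark.shuffle.memoryFraction` (the 16% figure is the 0.2 default times the 0.8 safety fraction). A sketch of the relevant spark-defaults.conf lines; the values are illustrative, and these knobs were superseded by unified memory management in 1.6:

```
# spark-defaults.conf (Spark <= 1.5 only)
spark.shuffle.memoryFraction   0.4   # default 0.2
spark.storage.memoryFraction   0.4   # default 0.6; lower it to make room
```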

Re: spark and mesos issue

2014-09-15 Thread Gurvinder Singh
On Tue, Jul 15, 2014 at 11:02 PM, Vinod Kone <vi...@twitter.com> wrote:
> On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh
> <gurvinder.si...@uninett.no> wrote:
>> ERROR storage.BlockManagerMasterActor: Got two different bl…

Re: Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-04 Thread Gurvinder Singh
I want to add that there is a regression when using pyspark to read data from HDFS: its performance during map tasks has roughly halved (1x -> 0.5x). I have tested 1.0.2 and the performance was fine, but the 1.1 release candidate has this issue. I tested by setting the following properties to make…

Re: hdfs read performance issue

2014-08-20 Thread Gurvinder Singh
this operation. Any suggestion/help in this regard will be helpful. - Gurvinder
On 08/14/2014 10:27 AM, Gurvinder Singh wrote:
> Hi,
>
> I am running spark from git directly. I recently compiled the newer
> Aug 13 version and it has a performance drop of 2-3x in reads from…

read performance issue

2014-08-14 Thread Gurvinder Singh
Hi, I am running spark from git directly. I recently compiled the newer Aug 13 version and it has a performance drop of 2-3x in reads from HDFS compared to the git version of Aug 1. So I am wondering which commit would have caused such an issue in read performance. The performance is almost sam…

Re: SQLCtx cacheTable

2014-08-04 Thread Gurvinder Singh
where to look for changing the mesos setting in this case. - Gurvinder
On Sun, Aug 3, 2014 at 11:35 PM, Gurvinder Singh
<gurvinder.si...@uninett.no> wrote:
> On 08/03/2014 02:33 AM, Michael Armbrust wrote:
>> I am not a mesos expert... but it sound…

Re: SQLCtx cacheTable

2014-08-03 Thread Gurvinder Singh
It has the exact size set in the -Xms/-Xmx params. Do you know if I can somehow find out which class or thread inside the Spark JVM process is using how much memory, and see what makes it reach the memory limit in the cacheTable case but not in the cached RDD case? - Gurvinder
On Fri, Aug 1, 2014 at 12:07 A…

Re: SQLCtx cacheTable

2014-07-31 Thread Gurvinder Singh
> …ed for SchemaRDDs. It is something similar to MEMORY_ONLY_SER
> but not quite. You can specify the persistence level on the
> SchemaRDD itself and register that as a temporary table; however, it
> is likely you will not get as good performance.
>
> On Thu, Jul 31, 2014 at…

SQLCtx cacheTable

2014-07-31 Thread Gurvinder Singh
Hi, I am wondering how I can specify the persistence level in cacheTable, as it takes only the table name as a parameter. It should be possible to specify the persistence level. - Gurvinder
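The workaround Michael Armbrust suggests later in the thread (persist the SchemaRDD at a chosen level, then register it as a table) can be sketched like this. Untested, using Spark 1.0-era pyspark APIs; the "people" data and table name are hypothetical:

```python
from pyspark import SparkContext, StorageLevel
from pyspark.sql import SQLContext

sc = SparkContext(appName="cache-with-level")
sqlCtx = SQLContext(sc)

# Hypothetical data; inferSchema() builds a SchemaRDD from an RDD of dicts.
rdd = sc.parallelize([{"name": "a", "age": 1}, {"name": "b", "age": 2}])
people = sqlCtx.inferSchema(rdd)

# cacheTable() takes no storage-level argument, but the SchemaRDD itself
# can be persisted at an explicit level before registering it as a table.
people.persist(StorageLevel.MEMORY_ONLY_SER)
people.registerAsTable("people")

sqlCtx.sql("SELECT name FROM people WHERE age > 1").collect()
```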

Re: reading compress lzo files

2014-07-05 Thread Gurvinder Singh
On 07/06/2014 05:19 AM, Nicholas Chammas wrote:
> On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
> <gurvinder.si...@uninett.no> wrote:
>> csv =
>> sc.newAPIHadoopFile(opts.input,"com.hadoop.mapreduce.LzoTextInputFormat",…

Re: reading compress lzo files

2014-07-04 Thread Gurvinder Singh
io.LongWritable","org.apache.hadoop.io.Text").count() - Gurvinder
On 07/03/2014 06:24 PM, Gurvinder Singh wrote:
> Hi all,
>
> I am trying to read the lzo files. It seems spark recognizes that the
> input file is compressed and got the decompressor as
>
> 14/07/…

Re: issue with running example code

2014-07-04 Thread Gurvinder Singh
is not set.
> Just to mention again, pyspark works fine, as does spark-shell;
> only when we are running the compiled jar does SPARK_HOME seem to cause
> some Java runtime issues where we get a class cast exception.
>
> Thanks,
> Gurvinder
On 07/01/2014 09:28 AM, Gurvinde…

spark and mesos issue

2014-07-04 Thread Gurvinder Singh
We are getting this issue when we are running jobs with close to 1000 workers. Spark is from the GitHub version and Mesos is 0.19.0.
ERROR storage.BlockManagerMasterActor: Got two different block manager registrations on 201407031041-1227224054-5050-24004-0
Googling about it, it seems that mesos is st…

Re: issue with running example code

2014-07-03 Thread Gurvinder Singh
fine, as does spark-shell; only when we are running the compiled jar does SPARK_HOME seem to cause some Java runtime issues where we get a class cast exception. Thanks, Gurvinder
On 07/01/2014 09:28 AM, Gurvinder Singh wrote:
> Hi,
>
> I am having an issue running the Scala example code. I have t…

reading compress lzo files

2014-07-03 Thread Gurvinder Singh
Hi all, I am trying to read the lzo files. It seems spark recognizes that the input file is compressed and got the decompressor as
14/07/03 18:11:01 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/07/03 18:11:01 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [h…
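The call that resolved this thread (quoted in the later replies) can be sketched as follows. Untested, and it assumes the hadoop-lzo jar and native library are on the executors' classpath; the input path is hypothetical:

```python
from pyspark import SparkContext

sc = SparkContext(appName="read-lzo")

# newAPIHadoopFile takes the input format and key/value classes as
# fully qualified Java class names; LzoTextInputFormat splits indexed
# LZO files instead of reading each file in a single task.
csv = sc.newAPIHadoopFile(
    "hdfs:///data/input.lzo",
    "com.hadoop.mapreduce.LzoTextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text")

print(csv.count())
```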

issue with running example code

2014-07-01 Thread Gurvinder Singh
Hi, I am having an issue running the Scala example code. I have tested and am able to run the Python example code successfully, but when I run the Scala code I get this error:
java.lang.ClassCastException: cannot assign instance of org.apache.spark.examples.SparkPi$$anonfun$1 to field org.apache.spark.rdd.Ma…