Re: rpc.RpcTimeoutException: Futures timed out after [120 seconds]

2016-05-20 Thread Sahil Sareen
...error with small volume files. -- Pradeep. On May 20, 2016, at 9:32 AM, Sahil Sareen <sareen...@gmail.com> wrote: > Hey all, I'm using Spark-1.6.1 and occasionally seeing executors lost, hurting my application performance due to these errors. Can someone please list all the possible problems that could cause this?

rpc.RpcTimeoutException: Futures timed out after [120 seconds]

2016-05-20 Thread Sahil Sareen
Hey all, I'm using Spark-1.6.1 and occasionally seeing executors being lost, which hurts my application performance. Can someone please list all the possible problems that could cause this? Full log: 16/05/19 02:17:54 ERROR ContextCleaner: Error cleaning broadcast 266685
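
[Not from the thread, but a commonly suggested mitigation for this timeout is to raise the network/RPC timeouts and keep the heartbeat interval well below them. A minimal sketch; the values are placeholders to tune for your cluster, not recommendations:]

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch: raise RPC/network timeouts so slow executors are not marked lost.
  // The 600s/60s values below are illustrative placeholders.
  val conf = new SparkConf()
    .setAppName("MyApp")
    .set("spark.network.timeout", "600s")            // default 120s; covers most RPC timeouts
    .set("spark.executor.heartbeatInterval", "60s")  // must stay well below spark.network.timeout
  val sc = new SparkContext(conf)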

Re: GraphX can show graph?

2016-01-28 Thread Sahil Sareen
Try Neo4j for visualization; GraphX does a pretty good job at distributed graph processing. On Thu, Jan 28, 2016 at 12:42 PM, Balachandar R.A. wrote: > Hi > I am new to GraphX. I have a simple csv file which I could load and compute a few graph statistics. However, I

Neo4j and Spark/GraphX

2016-01-27 Thread Sahil Sareen
Hey everyone! I'm using Spark and GraphX for graph processing and wish to export a subgraph to Neo4j (from the spark-submit console) for visualisation and the basic graph querying that Neo4j supports. I looked at the mazerunner project but it seems to be overkill. Any alternatives? -Sahil
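
[For context (not from the original mail), extracting the subgraph in GraphX before handing it to an external store is straightforward. A minimal sketch; the vertex/edge data and predicates are made-up assumptions, and sc is an existing SparkContext:]

  import org.apache.spark.graphx._
  import org.apache.spark.rdd.RDD

  // Sketch: build a small graph and take a subgraph to export elsewhere (e.g. Neo4j).
  val vertices: RDD[(VertexId, String)] =
    sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
  val edges: RDD[Edge[String]] =
    sc.parallelize(Seq(Edge(1L, 2L, "knows"), Edge(2L, 3L, "knows")))
  val graph = Graph(vertices, edges)

  // Keep only the part of the graph we want to visualise.
  val sub = graph.subgraph(
    epred = _.attr == "knows",
    vpred = (_, name) => name != "carol")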

Re: JSON to SQL

2016-01-27 Thread Sahil Sareen
Isn't this just about defining a case class and using parse(json).extract[CaseClassName] via Jackson? -Sahil On Wed, Jan 27, 2016 at 11:08 PM, Andrés Ivaldi wrote: > We don't have Domain Objects, it's a service like a pipeline, data is read from source and they are saved
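
[A minimal sketch of the approach Sahil is describing, using json4s with its Jackson backend; the case class and payload are made-up for illustration:]

  import org.json4s._
  import org.json4s.jackson.JsonMethods._

  // Hypothetical domain class and payload, just to illustrate parse(...).extract[...]
  case class Order(id: Long, product: String, quantity: Int)

  implicit val formats: Formats = DefaultFormats

  val json  = """{"id": 42, "product": "widget", "quantity": 3}"""
  val order = parse(json).extract[Order]   // Order(42, "widget", 3)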

Difference between DataFrame.cache() and hiveContext.cacheTable()?

2015-12-18 Thread Sahil Sareen
Is there any difference between the following snippets:

  val df = hiveContext.createDataFrame(rows, schema)
  df.registerTempTable("myTable")
  df.cache()

and

  val df = hiveContext.createDataFrame(rows, schema)
  df.registerTempTable("myTable")
  hiveContext.cacheTable("myTable")

-Sahil

Does calling sqlContext.cacheTable("oldTableName") remove the cached contents of the oldTable

2015-12-18 Thread Sahil Sareen
Spark 1.5.2

  dfOld.registerTempTable("oldTableName")
  sqlContext.cacheTable("oldTableName")
  // do something
  dfNew.registerTempTable("oldTableName")
  sqlContext.cacheTable("oldTableName")

Now when I use the "oldTableName" table I do get the latest contents from dfNew but do the
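
[Not stated in the thread, but one way to avoid keeping the stale cached copy around is to uncache explicitly before re-registering the name. A minimal sketch using the same table name as above:]

  // Sketch: explicitly drop the old cached contents before reusing the table name.
  sqlContext.uncacheTable("oldTableName")   // releases the cached data for the old plan
  dfNew.registerTempTable("oldTableName")
  sqlContext.cacheTable("oldTableName")     // caches the new contents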

Re: Does calling sqlContext.cacheTable("oldTableName") remove the cached contents of the oldTable

2015-12-18 Thread Sahil Sareen
... of the given [[Queryable]]. ... val planToCache = query.queryExecution.analyzed ... if (lookupCachedData(planToCache).nonEmpty) { ... Is the schema for dfNew different from that of dfOld? Cheers. On Fri, Dec 18, 2015 at 3:33 AM, Sahil Sareen <

Re: Does calling sqlContext.cacheTable("oldTableName") remove the cached contents of the oldTable

2015-12-18 Thread Sahil Sareen
...: def sameResult(plan: LogicalPlan): Boolean = { ... There is a detailed comment above this method which should give some idea. Cheers. On Fri, Dec 18, 2015 at 9:21 AM, Sahil Sareen <sareen...@gmail.com> wrote: > Thanks Ted! > Y

Re: Does calling sqlContext.cacheTable("oldTableName") remove the cached contents of the oldTable

2015-12-18 Thread Sahil Sareen
...[Mem]ory Deserialized 1x Replicated | 2 partitions | 100% cached | 728.2 KB in memory | 0.0 B | 0.0 B. This means it wasn't overwritten :( My question now is: if only the latest table is going to be used, why isn't the earlier version auto-cleared? On Fri, Dec 18, 2015 at 11:44 PM, Sahil Sareen <sareen...@gmail.com> wrote: > So I looked at the

Using TestHiveContext/HiveContext in unit tests

2015-12-11 Thread Sahil Sareen
I'm trying to do this in unit tests:

  val sConf = new SparkConf()
    .setAppName("RandomAppName")
    .setMaster("local")
  val sc = new SparkContext(sConf)
  val sqlContext = new TestHiveContext(sc) // tried new HiveContext(sc) as well

But I get this: [scalatest] Exception
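
[For reference (not from the original mail), a minimal sketch of how TestHiveContext is typically wired into a ScalaTest suite; it assumes the spark-hive test artifacts are on the test classpath, and the suite name and test body are illustrative:]

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.test.TestHiveContext
  import org.scalatest.{BeforeAndAfterAll, FunSuite}

  class HiveQuerySuite extends FunSuite with BeforeAndAfterAll {
    private var sc: SparkContext = _
    private var hiveContext: TestHiveContext = _

    override def beforeAll(): Unit = {
      val conf = new SparkConf().setAppName("RandomAppName").setMaster("local")
      sc = new SparkContext(conf)
      hiveContext = new TestHiveContext(sc)
    }

    override def afterAll(): Unit = {
      sc.stop()
    }

    test("simple query") {
      assert(hiveContext.sql("SELECT 1").collect().head.getInt(0) === 1)
    }
  }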

Re: Column Aliases are Ignored in callUDF while using struct()

2015-12-03 Thread Sahil Sareen
Attaching the JIRA as well for completeness: https://issues.apache.org/jira/browse/SPARK-12117 On Thu, Dec 3, 2015 at 4:13 PM, Sachin Aggarwal wrote: > Hi All, > I need a workaround for this situation. > Case where this works: > val TestDoc1

Re: spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread Sahil Sareen
"select 'uid',max(length(uid)),count(distinct(uid)),count(uid),sum(case when uid is null then 0 else 1 end),sum(case when uid is null then 1 else 0 end),sum(case when uid is null then 1 else 0 end)/count(uid) from tb" Is this as is, or did you use a UDF here? -Sahil On Thu, Dec 3, 2015 at 4:06

Re: spark sql cli query results written to file ?

2015-12-02 Thread Sahil Sareen
Yeah, that's the example from the link I just posted. -Sahil On Thu, Dec 3, 2015 at 11:41 AM, Akhil Das wrote: > Something like this? > val df = sqlContext.read.load("examples/src/main/resources/users.parquet") df.select("name",
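
[To complete the picture, since the quoted snippet is cut off in the archive: the programming-guide example being referenced looks roughly like this; the output path is a placeholder:]

  // Sketch based on the Spark SQL programming guide's load/save example.
  val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
  df.select("name", "favorite_color")
    .write
    .format("parquet")
    .save("namesAndFavColors.parquet")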

Re: Improve saveAsTextFile performance

2015-12-02 Thread Sahil Sareen
PTAL: http://stackoverflow.com/questions/29213404/how-to-split-an-rdd-into-multiple-smaller-rdds-given-a-max-number-of-rows-per -Sahil On Thu, Dec 3, 2015 at 9:18 AM, Ram VISWANADHA <ram.viswana...@dailymotion.com> wrote: > Yes. That did not help. > Best Regards, > Ram > From: Ted Yu
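
[A sketch of one common way to do what the linked question asks (split an RDD into chunks of at most maxRows rows), not necessarily the accepted answer there; the data and maxRows are illustrative and sc is an existing SparkContext:]

  // Sketch: split an RDD into smaller RDDs of at most maxRows rows each,
  // using zipWithIndex to assign a stable index to every row.
  val rdd     = sc.parallelize(1 to 1000)
  val maxRows = 250L
  val indexed = rdd.zipWithIndex()                        // (value, index)
  val numChunks = (indexed.count() + maxRows - 1) / maxRows
  val chunks = (0L until numChunks).map { i =>
    indexed.filter { case (_, idx) => idx >= i * maxRows && idx < (i + 1) * maxRows }
           .map(_._1)
  }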

Re: spark sql cli query results written to file ?

2015-12-02 Thread Sahil Sareen
Did you see: http://spark.apache.org/docs/latest/sql-programming-guide.html -Sahil On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com wrote: > Hi, > How could I save the Spark SQL CLI query results and write them to some local file? > Is there any

Re: Multiplication on decimals in a dataframe query

2015-12-02 Thread Sahil Sareen
+1, looks like a bug. I think referencing trades() twice in a multiplication is broken:

  scala> trades.select(trades("quantity") * trades("quantity")).show
  +---------------------+
  |(quantity * quantity)|
  +---------------------+
  |                 null|
  |                 null|

  scala>
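
[Not from the thread, but a possible workaround worth trying: cast the decimal column before multiplying. Whether this helps depends on the Spark version and the decimal precision; the column and alias names follow the example above:]

  // Sketch of a possible workaround, not a confirmed fix.
  trades.select(
    (trades("quantity").cast("double") * trades("quantity").cast("double")).as("qty_squared")
  ).show()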

Re: how to skip headers when reading multiple files

2015-12-02 Thread Sahil Sareen
You could use "filter" to eliminate headers from your text file RDD while going over each line. -Sahil On Thu, Dec 3, 2015 at 9:37 AM, Jeff Zhang wrote: > Are you reading a csv file? If so, you can use spark-csv which supports skipping the header
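
[A minimal sketch of the filter approach Sahil is suggesting; the path is a placeholder and it assumes every file starts with the same header line:]

  // Sketch: drop the header line from every file read into one RDD.
  val lines   = sc.textFile("hdfs:///data/*.csv")   // placeholder path
  val header  = lines.first()                       // header text from the first line
  val records = lines.filter(_ != header)           // drops the header wherever it re-appears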

java.io.FileNotFoundException: Job aborted due to stage failure

2015-11-26 Thread Sahil Sareen
I'm using Spark 1.4.2 with Hadoop 2.7. I tried increasing spark.shuffle.io.maxRetries to 10 but it didn't help. Any ideas on what could be causing this? This is the exception that I am getting: [MySparkApplication] WARN : Failed to execute SQL statement select * from TableS s join TableC c on
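
[For reference (not from the original mail), the retry knob being tuned here, plus the related wait interval, can be set on the SparkConf; the values below are placeholders:]

  import org.apache.spark.SparkConf

  // Sketch: shuffle-fetch retry settings mentioned in the mail, with placeholder values.
  val conf = new SparkConf()
    .set("spark.shuffle.io.maxRetries", "10")   // default is 3
    .set("spark.shuffle.io.retryWait", "30s")   // wait between retries, default 5s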

Spark 1.4.2- java.io.FileNotFoundException: Job aborted due to stage failure

2015-11-24 Thread Sahil Sareen
I tried increasing spark.shuffle.io.maxRetries to 10 but it didn't help. This is the exception that I am getting: [MySparkApplication] WARN : Failed to execute SQL statement select * from TableS s join TableC c on s.property = c.property from X YZ org.apache.spark.SparkException: Job