> ...error with small volume files
>
> Pradeep
>
> On May 20, 2016, at 9:32 AM, Sahil Sareen <sareen...@gmail.com> wrote:
> ...
Hey all
I'm using Spark-1.6.1 and occasionally seeing executors get lost, which hurts
my application performance.
Can someone please list all the possible problems that could cause this?
Full log:
16/05/19 02:17:54 ERROR ContextCleaner: Error cleaning broadcast 266685
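A hedged aside, since this question recurs in the archives: "Error cleaning broadcast" is often a cleanup timeout rather than data loss. A sketch of the usual knobs (illustrative values, not a confirmed fix):

import org.apache.spark.SparkConf

// Illustrative values only: raise the network timeout and let broadcast
// cleanup run non-blocking so one slow executor doesn't fail the cleaner.
val conf = new SparkConf()
  .set("spark.network.timeout", "300s")
  .set("spark.cleaner.referenceTracking.blocking", "false")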
Try Neo4j for visualization; GraphX does a pretty good job at distributed
graph processing.
On Thu, Jan 28, 2016 at 12:42 PM, Balachandar R.A. wrote:
> Hi
>
> I am new to GraphX. I have a simple csv file which I could load and
> compute a few graph statistics. However, I
Hey everyone!
I'm using Spark and GraphX for graph processing and wish to export a
subgraph to Neo4j (from the spark-submit console) for visualisation and
the basic graph querying that Neo4j supports.
I looked at the mazerunner project but it seems to be overkill. Any
alternatives?
-Sahil
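One lightweight alternative, sketched here under assumptions (the node/relationship labels and property names below are made up): walk the GraphX subgraph and emit Cypher CREATE statements that Neo4j can replay from its shell or browser.

import org.apache.spark.graphx.Graph

// Hypothetical helper: serialise a small subgraph as Cypher statements.
// Collecting to the driver assumes the subgraph is small enough to fit.
def toCypher(graph: Graph[String, String]): Array[String] = {
  val nodes = graph.vertices.map { case (id, label) =>
    s"""CREATE (:Node {vid: $id, label: "$label"})"""
  }.collect()
  val edges = graph.triplets.map { t =>
    s"""MATCH (a:Node {vid: ${t.srcId}}), (b:Node {vid: ${t.dstId}}) """ +
      s"""CREATE (a)-[:REL {label: "${t.attr}"}]->(b)"""
  }.collect()
  nodes ++ edges
}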
Isn't this just about defining a case class and calling
parse(json).extract[CaseClassName] with json4s's Jackson backend? (Sketch
below.)
-Sahil
On Wed, Jan 27, 2016 at 11:08 PM, Andrés Ivaldi wrote:
> We don't have domain objects; it's a service like a pipeline, data is read
> from source and they are saved
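For reference, a minimal sketch of that case-class approach with json4s (the Record class and its fields are made up for illustration):

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// json4s needs implicit Formats in scope for extract[T].
implicit val formats: Formats = DefaultFormats

// Hypothetical domain class, purely for illustration.
case class Record(name: String, value: Int)

val json = """{"name": "a", "value": 1}"""
val record = parse(json).extract[Record] // Record("a", 1)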
Is there any difference between the following snippets:
val df = hiveContext.createDataFrame(rows, schema)
df.registerTempTable("myTable")
df.cache()
and
val df = hiveContext.createDataFrame(rows, schema)
df.registerTempTable("myTable")
hiveContext.cacheTable("myTable")
-Sahil
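For what it's worth: in Spark 1.x both calls appear to go through the same CacheManager, keyed on the analyzed plan, so they should be interchangeable. A sketch of one way to sanity-check that (not a definitive answer):

// If df.cache() and cacheTable("myTable") share a cache entry, caching
// the DataFrame should also mark the temp table as cached.
val df = hiveContext.createDataFrame(rows, schema)
df.registerTempTable("myTable")
df.cache()
assert(hiveContext.isCached("myTable"))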
Spark 1.5.2
dfOld.registerTempTable("oldTableName")
sqlContext.cacheTable("oldTableName")
//
// do something
//
dfNew.registerTempTable("oldTableName")
sqlContext.cacheTable("oldTableName")
Now when I use the "oldTableName" table I do get the latest contents
from dfNew but do the
> ...of the given [[Queryable]].
> ...
> val planToCache = query.queryExecution.analyzed
> if (lookupCachedData(planToCache).nonEmpty) {
>
> Is the schema for dfNew different from that of dfOld?
>
> Cheers
>
> On Fri, Dec 18, 2015 at 3:33 AM, Sahil Sareen <sareen...@gmail.com> wrote:
>
> def sameResult(plan: LogicalPlan): Boolean = {
>
> There is a detailed comment above this method which should give some idea.
>
> Cheers
>
> On Fri, Dec 18, 2015 at 9:21 AM, Sahil Sareen <sareen...@gmail.com> wrote:
>
>> Thanks Ted!
>>
>> Y
[Spark UI Storage tab still shows the old entry: Memory Deserialized 1x
Replicated, 2 partitions, 100% cached, 728.2 KB in memory, 0.0 B external,
0.0 B on disk.] This means it wasn't overwritten :(
My question now is, if only the latest table is going to be used, why isn't
the earlier version auto cleared?
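A workaround sketch, assuming the goal is just to replace the cached contents under the same name: explicitly drop the stale entry before re-registering.

// Uncache the old plan's data first so it doesn't linger in the
// CacheManager, then rebind and re-cache the name.
sqlContext.uncacheTable("oldTableName")
dfNew.registerTempTable("oldTableName")
sqlContext.cacheTable("oldTableName")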
On Fri, Dec 18, 2015 at 11:44 PM, Sahil Sareen <sareen...@gmail.com> wrote:
> So I looked at the
I'm trying to do this in unit tests:
val sConf = new SparkConf()
.setAppName("RandomAppName")
.setMaster("local")
val sc = new SparkContext(sConf)
val sqlContext = new TestHiveContext(sc) // tried new HiveContext(sc) as well
But I get this:
[scalatest] Exception
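A sketch of a test harness that often avoids such initialization failures (assuming ScalaTest; the suite name is made up). A leftover SparkContext from an earlier test is a common culprit, so create one context per suite and stop it afterwards:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.test.TestHiveContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class MySparkSuite extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _
  private var sqlContext: TestHiveContext = _

  override def beforeAll(): Unit = {
    val sConf = new SparkConf().setAppName("RandomAppName").setMaster("local")
    sc = new SparkContext(sConf)
    sqlContext = new TestHiveContext(sc)
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop() // avoid leaking contexts across suites
  }

  test("trivial query") {
    assert(sqlContext.sql("SELECT 1").collect().length == 1)
  }
}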
Attaching the JIRA as well for completeness:
https://issues.apache.org/jira/browse/SPARK-12117
On Thu, Dec 3, 2015 at 4:13 PM, Sachin Aggarwal
wrote:
>
> Hi All,
>
> Need help, guys: I need a workaround for this situation.
>
> *case where this works:*
>
> val TestDoc1
"select 'uid',max(length(uid)),count(distinct(uid)),count(uid),sum(case
when uid is null then 0 else 1 end),sum(case when uid is null then 1 else 0
end),sum(case when uid is null then 1 else 0 end)/count(uid) from tb"
Is this as is, or did you use a UDF here?
-Sahil
On Thu, Dec 3, 2015 at 4:06
Yeah, that's the example from the link I just posted.
-Sahil
On Thu, Dec 3, 2015 at 11:41 AM, Akhil Das
wrote:
> Something like this?
>
> val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
> df.select("name",
>
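For reference, the generic load/save example that this appears to quote reads, in the Spark 1.x docs, roughly as follows:

val df = sqlContext.read.load("examples/src/main/resources/users.parquet")
df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")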
PTAL:
http://stackoverflow.com/questions/29213404/how-to-split-an-rdd-into-multiple-smaller-rdds-given-a-max-number-of-rows-per
-Sahil
On Thu, Dec 3, 2015 at 9:18 AM, Ram VISWANADHA <
ram.viswana...@dailymotion.com> wrote:
> Yes. That did not help.
>
> Best Regards,
> Ram
> From: Ted Yu
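One common approach from that Stack Overflow thread, sketched here (maxRows and the helper name are made up; chunks are defined by element index):

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Split an RDD into pieces of at most maxRows elements by indexing each
// element and filtering on the chunk its index falls into.
def splitByMaxRows[T: ClassTag](rdd: RDD[T], maxRows: Long): Seq[RDD[T]] = {
  val indexed = rdd.zipWithIndex().cache()
  val numChunks = ((indexed.count() + maxRows - 1) / maxRows).toInt
  (0 until numChunks).map { i =>
    indexed.filter { case (_, idx) => idx / maxRows == i }.map(_._1)
  }
}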
Did you see: http://spark.apache.org/docs/latest/sql-programming-guide.html
-Sahil
On Thu, Dec 3, 2015 at 11:35 AM, fightf...@163.com
wrote:
> Hi,
> How can I save the results of queries run from the Spark SQL CLI to a
> local file?
> Is there any
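Two sketches that generally work, hedged rather than verified against every version: redirect the CLI's stdout for a one-off query, or have SQL write the output itself.

# Shell: run a single query and capture its output locally
spark-sql -e "SELECT * FROM my_table" > /tmp/results.txt

-- SQL (via Hive support): write results to a local directory
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/results' SELECT * FROM my_table;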
+1, looks like a bug.
I think referencing trades("quantity") twice in a multiplication is broken:
scala> trades.select(trades("quantity")*trades("quantity")).show
+---------------------+
|(quantity * quantity)|
+---------------------+
|                 null|
|                 null|
+---------------------+
scala>
You could use "filter" to eliminate the header from your text file RDD while
going over each line (see the sketch after the quote below).
-Sahil
On Thu, Dec 3, 2015 at 9:37 AM, Jeff Zhang wrote:
> Are you reading a csv file? If so, you can use spark-csv, which supports
> skipping the header
>
>
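A minimal sketch of that filter approach (assuming the header is the first line and no data line repeats it verbatim):

// Grab the first line, then drop every line equal to it.
val lines = sc.textFile("data.csv")
val header = lines.first()
val rows = lines.filter(_ != header)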
I'm using Spark 1.4.2 with Hadoop 2.7. I tried increasing
spark.shuffle.io.maxRetries to 10 but it didn't help.
Any ideas on what could be causing this?
This is the exception that I am getting:
[MySparkApplication] WARN : Failed to execute SQL statement select *
from TableS s join TableC c on s.property = c.property from X YZ
org.apache.spark.SparkException: Job
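For reference, a sketch of the retry-related settings under discussion (values are illustrative, not recommendations):

import org.apache.spark.SparkConf

// Illustrative values only: more fetch retries, a longer wait between
// them, and a higher overall network timeout are the usual knobs for
// shuffle fetch failures.
val conf = new SparkConf()
  .set("spark.shuffle.io.maxRetries", "10")
  .set("spark.shuffle.io.retryWait", "30s")
  .set("spark.network.timeout", "300s")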