uld find a way to get this example working (for arbitrary values of
rowSize), I suspect that it would also give me a solution to the
custom-aggregation issue I outlined in my previous email. Any suggestions
would be much appreciated.
Thanks,
~ Andrew
On Mon, Aug 12, 2019 at 5:31 PM Andrew L
Hi All,
I'm attempting to clean up some Spark code which performs groupByKey /
mapGroups to compute custom aggregations, and I could use some help
understanding the Spark APIs necessary to make my code more modular and
maintainable.
In particular, my current approach is as follows:
- Start w
When training a RandomForest model, the Strategy class (in
mllib.tree.configuration) provides a subsamplingRate parameter. I was hoping
to use this to cut down on processing time for large datasets (more than 2MM
rows and 9K predictors), but I've found that the runtime stays approximately
cons
Sent: Thursday, April 23, 2015 4:46 PM
To: Andrew Leverentz
Cc: user@spark.apache.org
Subject: Re: Understanding Spark/MLlib failures
Hi Andrew,
I observed similar behavior under high GC pressure, when running ALS. What
happened to me was that there would be very long Full GC pauses (over 600
seconds
cryptic error messages along the
lines of “Missing an output location for shuffle.” Having some way to diagnose
what’s really going on here would be helpful.
~ Andrew
From: Reza Zadeh [mailto:r...@databricks.com]
Sent: Thursday, April 23, 2015 4:58 PM
To: Andrew Leverentz
Cc: user
Subject: Re