We want to migrate our data (approximately 20M rows) from Parquet to Postgres.
When we use the DataFrame writer's jdbc method, the execution time is very
long; when we tried the same with manual batch inserts, it was much more
effective. Is it intentionally implemented that way?
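For context, Spark 1.6 changed the JDBC writer to use batched inserts (that is what SPARK-10040, linked below, tracks), and on later versions the batch size is tunable. A minimal sketch, assuming a Postgres target; the URL, table name, and credentials are placeholders:

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch: bulk-writing a DataFrame to Postgres over JDBC.
// The connection URL, table name, and credentials are placeholders.
def writeToPostgres(df: DataFrame): Unit = {
  val props = new Properties()
  props.setProperty("user", "spark")
  props.setProperty("password", "secret")
  props.setProperty("driver", "org.postgresql.Driver")

  df.repartition(16)                  // one JDBC connection per partition
    .write
    .mode(SaveMode.Append)
    .option("batchsize", "10000")     // rows per JDBC batch insert
    .jdbc("jdbc:postgresql://db-host:5432/mydb", "events", props)
}
```

Raising `batchsize` above the default and writing from several partitions in parallel is usually what closes the gap with hand-rolled batch inserts.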
Hi,
hope this helps:

import org.apache.spark.sql.functions._
import sqlContext.implicits._
import java.sql.Timestamp
import org.joda.time.{DateTime, Days}

// date1 and date2 are java.sql.Timestamp values
val df = sc.parallelize(Array((date1, date2))).toDF("day1", "day2")
val dateDiff = udf[Long, Timestamp, Timestamp]((value1, value2) =>
  Days.daysBetween(new DateTime(value1), new DateTime(value2)).getDays.toLong)
val result = df.withColumn("diff", dateDiff(col("day1"), col("day2")))
It's good
Thanks for your reply, Michael.
On Thu, Aug 20, 2015 at 11:03 PM, Michael Armbrust
wrote:
> We will probably fix this in Spark 1.6
>
> https://issues.apache.org/jira/browse/SPARK-10040
>
> On Thu, Aug 20, 2015 at 5:18 AM, Aram Mkrtchyan <
> aram.mkrtchyan...
What are the best practices for submitting a Spark Streaming application on Mesos?
I would like to know about the scheduler mode.
Is `coarse-grained` mode the right solution?
Thanks
at 12:15 PM, Aram Mkrtchyan <
aram.mkrtchyan...@gmail.com> wrote:
> What are the best practices for submitting a Spark Streaming application
> on Mesos?
> I would like to know about the scheduler mode.
> Is `coarse-grained` mode the right solution?
>
> Thanks
>
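For what it's worth, coarse-grained mode is switched on through the `spark.mesos.coarse` property. A minimal sketch; the Mesos master URL and the core cap are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: enabling coarse-grained Mesos mode for a Spark application.
// The Mesos master URL below is a placeholder.
val conf = new SparkConf()
  .setAppName("streaming-app")
  .setMaster("mesos://zk://zk-host:2181/mesos")
  .set("spark.mesos.coarse", "true") // hold executors for the app's lifetime
  .set("spark.cores.max", "8")       // cap total cores in coarse-grained mode
val sc = new SparkContext(conf)
```

Coarse-grained mode keeps executors alive for the duration of the application, which suits long-running streaming jobs better than launching a Mesos task per Spark task.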
Trying to build a recommendation system using Spark MLlib's ALS.
Currently, we're trying to pre-build recommendations for all users on a
daily basis. We're using simple implicit feedback and ALS.
The problem is, we have 20M users and 30M products, and to call the main
predict() method, we need to ha
> ...subset of users that were or are
> likely to be active soon. (Or compute on the fly.) Is anything like
> that an option?
>
> On Wed, Mar 18, 2015 at 7:13 AM, Aram Mkrtchyan
> wrote:
> > Trying to build a recommendation system using Spark MLlib's ALS.
> >
> > Curr
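The subset idea suggested above can be sketched roughly as follows. `activeUsers`, `candidateProducts`, and the trained `model` are assumptions standing in for your own data, not anything from the thread:

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.rdd.RDD

// Pure helper: build (user, product) pairs only for users believed active,
// instead of scoring the full 20M x 30M cross product.
def candidatePairs(activeUsers: Seq[Int], products: Seq[Int]): Seq[(Int, Int)] =
  for (u <- activeUsers; p <- products) yield (u, p)

// Sketch: score only the active subset with a previously trained ALS model.
// `model` is an assumed MatrixFactorizationModel; `pairs` the candidate pairs.
def recommendForActive(model: MatrixFactorizationModel,
                       pairs: RDD[(Int, Int)],
                       topN: Int) =
  model.predict(pairs)
    .groupBy(_.user)
    .mapValues(_.toSeq.sortBy(-_.rating).take(topN))
```

Restricting prediction to recently active users keeps the daily batch proportional to actual traffic rather than to the full user base.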
> ...it's also something that needs building more code.
>
> I'm sure a couple people could chime in on that here but it's kind of
> a separate topic.
>
> On Wed, Mar 18, 2015 at 8:04 AM, Aram Mkrtchyan
> wrote:
> > Thanks much for your reply.
> >
> > By saying
Hi.
I'm trying to trigger DataFrame's save method in parallel from my driver.
For that purpose I use an ExecutorService and Futures; here's my code:
val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {
override def run(): Unit = {
val commons = events.filter(_._1 == t).map(_._2.common)
> ...something is getting caught in your closure, maybe
> unintentionally, that's not serializable. It's not directly related to
> the parallelism.
>
> On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
> wrote:
> > Hi.
> >
> > I'm trying to trigger DataFrame's sav
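On the parallelism itself, a closure-safe pattern is to capture only small serializable values (like the key `t`) inside each task. A pure sketch of that shape, where `saveForKey` is a placeholder for the real filter-and-save call:

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

// Sketch: running independent "save" jobs in parallel from the driver.
// `saveForKey` stands in for events.filter(_._1 == t)...save(...); capture
// only the small key `t` in the closure, never non-serializable driver state.
def saveForKey(t: Int): String = s"saved-$t"

val pool = Executors.newFixedThreadPool(3)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

val futures = Seq(1, 2, 3).map(t => Future(saveForKey(t)))
val results = Await.result(Future.sequence(futures), 1.minute)
pool.shutdown()
```

Using `scala.concurrent.Future` with an explicit `ExecutionContext` also avoids the raw `Runnable` boilerplate from the original snippet.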
Hi,
We want to have Marathon start and monitor Chronos, so that when a
Chronos-based Spark job fails, Marathon automatically restarts it within the
scope of Chronos. Will this approach work if we start Spark jobs as shell
scripts from Chronos or Marathon?