Right. Jd can run on commodity hardware; in fact, I tested Jd on a Raspberry Pi 3 and it passed the test suite.
On Dec 3, 2017 7:28 AM, "Scott Locklin" <[email protected]> wrote:

> On Nov 29, 2017, 12:17pm, Miles Wells wrote:
>
> > Did Scott ever write that blog post he mentioned in the linked post? I am
> > quite interested in reading more about his testing.
>
> No, I never did. I quit the company a few months later and attempted to
> found a TSDB company called Kerf, so I was a little busy with other
> things. Kerf didn't work out, and I'm back working with jd again.
>
> The Spark-beating incident was one of those "had a big company deadline"
> things, and the Spark solution not only lacked the functionality to
> solve the problem (despite having several Spark bigshots on the team),
> it was also obscenely slow, even using parquet files.
>
> Jd 2.0 had more than enough functionality to solve the problem
> (aggregating a 500gb data set into a smaller data set our machine
> learning tools could use), and was fast and memory efficient. It was a
> peculiar problem in that the data was relatively modest in row space but
> huge in column space (and it got even bigger in column space with the
> aggregate). I'm pretty sure jd could have handled a problem like this at
> 4x the size on the server I had access to. Remember: jd is single
> threaded; Spark was running on something like 250 CPUs (6-8 machines).
>
> Spark is not worth it unless you have a huge cluster and can solve the
> problem in no other way. Maybe it's gotten better since then, but a lot
> of the design decisions can be described in no other way than "bad."
> It's CPU, IO, and memory inefficient. Better than Hadoop, at least. A
> friend of mine who was an early Google employee needed a map-reduce
> framework for his new project. He fiddled with these open source ones,
> laughed at how bad they were, wrote his own in a week, and has been
> using it in production for a few years now.
>
> J/jd is capable of doing Spark-like things (i.e., sharding the problem
> across many machines) with some back end work. The Kx people have done
> this sort of thing. Maybe some day this will happen; APL languages are a
> natural fit for parallel compute. There are old papers on doing this,
> and I think one of J's ancestors (FP) was specifically designed for it.
>
> In the meantime, I think jd is a great tool for terascale problems which
> have some sort of time orientation (and probably for those without time
> orientation also). It is comparable in speed to K and it costs less.
> Plus I know J better, and J comes out of the box with more tools I need
> (including a large personal library of J-based prediction algorithms).
>
> I'm not yet grinding particularly big data on my present jd-related
> project, but it's reassuring to know that I can when the time comes.
>
> -SL
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
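The terascale job Scott describes (a table modest in rows but huge in columns, streamed down into a smaller summary) boils down to a single-threaded group-by-and-sum over a wide table. A minimal Python sketch of that shape of reduction follows; the column names, inline data, and the sum-per-key reduction are all illustrative assumptions, not anything from the original post (jd itself does this out-of-core on column files):

```python
import csv
import io
from collections import defaultdict

# Hypothetical wide table: one key column plus many value columns.
# A real jd job would stream column files from disk; this just shows
# the single-threaded group-by-and-sum reduction itself.
WIDE_CSV = """key,c1,c2,c3
a,1,2,3
b,4,5,6
a,7,8,9
"""

def aggregate(reader):
    """Sum every value column per key, processing one row at a time
    so memory use scales with the (smaller) output, not the input."""
    header = next(reader)
    value_cols = header[1:]
    acc = defaultdict(lambda: [0.0] * len(value_cols))
    for row in reader:
        key, vals = row[0], row[1:]
        slot = acc[key]
        for i, v in enumerate(vals):
            slot[i] += float(v)
    return header, acc

header, acc = aggregate(csv.reader(io.StringIO(WIDE_CSV)))
print(acc["a"])  # [8.0, 10.0, 12.0]
```

Because the accumulator holds only one list per distinct key, a dataset that is huge in columns but modest in rows stays cheap in memory even when the raw input is hundreds of gigabytes, which matches the single-threaded, memory-efficient behavior described above.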
