Right. Jd can run on commodity hardware; in fact, I tested Jd on a Raspberry Pi 3 and it passed the test suite.
On Dec 3, 2017 7:28 AM, "Scott Locklin" <[email protected]> wrote:

> On Nov 29, 2017, 12:17pm, Miles Wells wrote:
>
> > Did Scott ever write that blog post he mentioned in the linked post? I am
> > quite interested in reading more about his testing.
>
> No, I never did. I quit the company a few months later and attempted to
> found a TSDB company called Kerf, so I was a little busy with other
> things. Kerf didn't work out, and I'm back working with jd again.
>
> The Spark-beating incident was one of those "had a big company deadline"
> things, and the Spark solution not only lacked the functionality to
> solve the problem (despite having several Spark bigshots on the team),
> it was also obscenely slow, even using parquet files.
>
> Jd 2.0 had more than enough functionality to solve the problem
> (aggregating a 500gb data set into a smaller data set our machine
> learning tools could use), and was fast and memory efficient. It was a
> peculiar problem in that the data was relatively modest in row space but
> huge in column space (and it got even bigger in column space with the
> aggregate). I'm pretty sure jd could have handled a problem like this at
> 4x the size on the server I had access to. Remember: jd is single
> threaded; Spark was running on something like 250 CPUs (6-8 machines).
>
> Spark is not worth it unless you have a huge cluster and can solve the
> problem in no other way. Maybe it's gotten better since then, but a lot
> of the design decisions can be described in no other way than "bad."
> It's CPU, IO, and memory inefficient. Better than Hadoop, at least. A
> friend of mine who was an early Google employee needed a map-reduce
> framework for his new project. He fiddled with these open source ones,
> laughed at how bad they were, wrote his own in a week, and has been
> using it in production for a few years now.
>
> J/jd is capable of doing Spark-like things (i.e., sharding the problem
> across many machines) with some back end work. The Kx people have done
> this sort of thing. Maybe some day this will happen; APL languages are a
> natural fit for parallel compute. There are old papers on doing this,
> and I think one of J's ancestors (FP) was specifically designed for it.
>
> In the meantime, I think jd is a great tool for terascale problems which
> have some sort of time orientation (and probably for those without time
> orientation also). It is comparable in speed to K and it costs less.
> Plus I know J better, and J comes out of the box with more tools I need
> (including a large personal library of J-based prediction algorithms).
>
> I'm not yet grinding particularly big data on my present jd-related
> project, but it's reassuring to know that I can when the time comes.
>
> -SL
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
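The terascale job Scott describes (a table modest in rows but huge in columns, streamed down into a smaller summary) boils down to a single-threaded group-by-and-sum over a wide table. A minimal Python sketch of that shape of reduction follows; the column names, inline data, and the sum-per-key reduction are all illustrative assumptions, not anything from the original post (jd itself does this out-of-core on column files):

```python
import csv
import io
from collections import defaultdict

# Hypothetical wide table: one key column plus many value columns.
# A real jd job would stream column files from disk; this just shows
# the single-threaded group-by-and-sum reduction itself.
WIDE_CSV = """key,c1,c2,c3
a,1,2,3
b,4,5,6
a,7,8,9
"""

def aggregate(reader):
    """Sum every value column per key, processing one row at a time
    so memory use scales with the (smaller) output, not the input."""
    header = next(reader)
    value_cols = header[1:]
    acc = defaultdict(lambda: [0.0] * len(value_cols))
    for row in reader:
        key, vals = row[0], row[1:]
        slot = acc[key]
        for i, v in enumerate(vals):
            slot[i] += float(v)
    return header, acc

header, acc = aggregate(csv.reader(io.StringIO(WIDE_CSV)))
print(acc["a"])  # [8.0, 10.0, 12.0]
```

Because the accumulator holds only one list per distinct key, a dataset that is huge in columns but modest in rows stays cheap in memory even when the raw input is hundreds of gigabytes, which matches the single-threaded, memory-efficient behavior described above.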
