Hi Mattia - I was working on a longer answer to your solver question, since I have something written up on that, but the short answer is that any number of simple, elegant solvers can be implemented in J, and I have examples of using a couple of them. Unfortunately, searching the J wiki for "solver" turns up nothing of interest, though I'm sure there is material out there.
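To give you a taste in the meantime - this is just a sketch I'm adding here, not the write-up itself, and the example function is arbitrary - J's matrix divide %. solves linear systems directly, and a Newton-Raphson iteration is a one-liner once you supply the derivative:

   NB. Linear system: x %. m gives the (least-squares) solution z of m +/ .* z = x
   mat =. 3 3 $ 4 1 0  1 3 1  0 1 2
   rhs =. 5 10 7
   rhs %. mat

   NB. Newton-Raphson for a scalar equation, derivative supplied by hand
   f    =. _2 + *:         NB. f(x) = x^2 - 2
   f1   =. +:              NB. f'(x) = 2x
   step =. ] - f % f1      NB. one Newton step: x - f(x) % f'(x)
   step ^: _ ] 1           NB. iterate to a fixed point: ~1.41421 (sqrt 2)

The same step-and-iterate shape carries over to fancier solvers; only the step verb changes.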
As far as working with large datasets goes, I've been vainly attempting to meet
the Netflix challenge, though I have been able to work with its dataset easily,
if crudely. It's about 100 million small records - roughly 1% the size of yours
in terms of bytes - and I can do simple things to all of the records in about
five minutes on a 2 GHz XP machine with 1 GB of RAM. I don't know whether that
is good or bad from your perspective.

The crude technique I use with this large dataset is to break it up into about
100 files that I treat en masse by applying a J verb across the whole group, or
across selected parts of it. It works well enough for my purposes (I've appended
a rough sketch of the idea below, after my signature).

Anyway, with any luck we'll continue this dialog.

Good luck,

Devon

On 3/30/08, Mattia Landoni <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> this is a narrowed-down version of an email I just sent to the general list
> with the same subject.
>
> I am an economist and I discovered J a few days ago. I haven't been so
> excited since I was 13 and Santa brought me an 8-bit Nintendo Entertainment
> System. Yet before taking a week off from work to study J (just kidding),
> I would like to be sure it does everything I need. Here is what concerns me
> the most.
>
> - How does J deal with very large datasets? Currently I am dealing with a
> 65 GB dataset. So far the only software I can use is SAS. Performing an SQL
> query [SELECT, GROUP BY] in SAS on a dedicated server takes me six hours,
> of which a large part is network I/O (I guess SAS's computing time would be
> an hour, perhaps two). The data is divided into 7 chunks of 7 to 13 GB each.
> With the same amount of data on a good computer, would I be able to perform
> the same operations with J? Assuming plentiful RAM and a speedy processor,
> what's the order of magnitude of the time it would take?
> - I read something about memory mapping in past posts and I intuitively
> understand what it means, but I have never done it. What are the limits of
> memory mapping? In general, what are the techniques for dealing with large
> datasets?
>
> Any answer, hint, link, ... most welcome.
>
> Mattia
>
> --
> Mattia Landoni
> 1201 S Eads St Apt 417
> Arlington, VA 22202-2837
> USA
> Greenwich -5 hours
>
> Office: +1 202 62 35922
> Cell: +1 202 492 3404
> Home: +1 360 968 1684
>
> Govern a great country as you would fry a small fish: do not poke at it too
> much.
> -- Lao Tzu
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

--
Devon McCormick, CFA
^me^ at acm. org is my preferred e-mail
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
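Here is the rough sketch of the "one verb across many chunk files" idea
mentioned above. It is not the code I actually run on the Netflix data; the
file names (chunk0 .. chunk99) and the per-file verb are invented just to show
the shape of the approach, and each chunk is assumed to hold one number per
line:

   require 'files'                         NB. standard library: fread reads a whole file

   names  =. ('chunk' , ":) &.> i. 100     NB. boxed file names: chunk0 chunk1 .. chunk99
   persum =. +/ @: (". ;. _2) @: fread @: >   NB. open box, read file, parse lines, sum
   NB. grand =. +/ persum"0 names          NB. apply the verb across the whole group

Anything that maps one file name to one result can be slotted in for persum,
so the same pattern covers selections, tallies, and so on over the group or
any subset of it.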

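On the memory-mapping question: the usual route in J is the jmf script, which
lets you treat a file on disk as an ordinary J noun without reading it all into
memory. The sketch below is from memory and the file name is made up, so take
the exact verb names and arguments as assumptions to check against the jmf
script in your own installation:

   require 'jmf'                           NB. J memory-mapped file utilities

   NB. Map an existing flat file as a character array (file name hypothetical)
   JCHAR map_jmf_ 'dat' ; 'bigfile.txt'
   NB. ... use the noun dat like any other J array ...
   unmap_jmf_ 'dat'

The practical limits are mostly address space and the speed of your disk, which
is why I fell back on the many-small-files approach above for the Netflix data.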