Mattia,
Your enthusiasm is refreshing - more than 10 years ago, I presented a
paper about dealing with "large" data sets using J - you can read it
at
http://www.jsoftware.com/papers/tuttle.htm
and there are several other interesting papers, including some about
dealing with large collections of data at
http://www.jsoftware.com/jwiki/Articles
Of course, what were large data sets in 1996 can easily be handled in
memory these days... After my talk in 1996, I have continued to use J
to look at largish collections of data - but none of the magnitude you
speak of. Memory mapped files are a very powerful tool, and a 64-bit
system should allow you to work directly with your large data sets -
depending on how the files are organized, mapping can provide "J-like"
structures that are pleasant to work with. If your files are encoded
in a relational database's own format, they may be awkward to process
directly - but sometimes even such encoded files can be handled that
way.
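To give a flavor of what mapping looks like, here is a minimal sketch
using the jmf script that ships with J (if I remember its interface
correctly); the file name and noun names are only placeholders, and it
assumes a flat text file you want to treat as one big character array.

   require 'jmf'                    NB. memory mapped file script
   NB. map a large flat text file as a character array - mapping reads
   NB. nothing up front; the OS pages data in as you index it
   JCHAR map_jmf_ 'txt';'/data/export.csv'
   # txt                            NB. length in bytes
   +/ txt = LF                      NB. e.g. count the lines
   unmap_jmf_ 'txt'                 NB. release the mapping when done

On a 64-bit system the mapped file can be far larger than physical
memory; only the parts you actually touch are paged in.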
You are likely to get more opinions and answers if you give some
example data and the kinds of analysis you want to perform. I assume
you have discovered and experimented with things like the key adverb
/. for aggregation in J (a small sketch follows below). You would
probably have to use some chunking of data to process the 7-13 GB
collections, but given generous memory and fast IO you should get
good performance - guessing what the times might actually be is quite
beyond me, but again, if you give some example data and what you are
trying to extract, some forum members may have actual experience to
bring to bear on your performance questions.
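For instance, the SELECT ... GROUP BY you describe corresponds quite
directly to the key adverb. A toy example (the data is made up, of
course):

   keys =: 1 2 1 3 2                NB. grouping column
   vals =: 10 20 30 40 50           NB. column to aggregate
   (~. keys) ,. keys +//. vals      NB. distinct keys beside their sums
1 40
2 70
3 40

And for chunking, the indexed read form of fread (built on the 1!:11
foreign) lets you walk a file in pieces and combine partial results.
The chunk size, file name, and the trivial per-chunk "analysis" below
are only placeholders for whatever you actually need to compute:

   countLF =: 3 : 0
     NB. y: file name; process the file in pieces, accumulating a result
     chunk =. 100e6                 NB. 100 MB at a time
     size  =. fsize y
     total =. 0
     start =. 0
     while. start < size do.
       piece =. fread y ; start , chunk <. size - start
       total =. total + +/ piece = LF   NB. stand-in for real work
       start =. start + chunk
     end.
     total
   )
   countLF '/data/big.csv'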
One thing that is almost always true is that finding good algorithms
is important to avoid brute force (long/slow) solutions...
- joey
At 23:24 -0400 2008/03/30, Mattia Landoni wrote:
Hi all,
this is a narrowed-down version of an email I just sent to the general
list with the same subject.
I am an economist and I discovered J a few days ago. I haven't been so
excited since I was 13, when Santa brought me an 8-bit Nintendo
Entertainment System. Yet before taking a week off from work to study J
(just kidding), I would like to be sure it does everything I need. Here
is what concerns me the most.
- How does J deal with very large datasets? Currently I am dealing with
a 65-GB dataset. So far the only software I can use is SAS. Performing
an SQL query [SELECT, GROUP BY] in SAS on a dedicated server takes me
six hours, a large part of which is network I/O (I guess SAS's
computing time would be an hour, perhaps two). The data is divided into
7 chunks of 7 to 13 GB each. With the same amount of data on a good
computer, would I be able to perform the same operations with J? Assume
plentiful RAM and a speedy processor: what's the order of magnitude of
the time it would take?
- I read something about memory mapping in past posts and I intuitively
understand what it means, but I have never done it. What are the limits
of memory mapping? In general, what are the techniques for dealing with
large datasets?
Any answer, hint, link,... most welcome.
Mattia
--
Mattia Landoni
1201 S Eads St Apt 417
Arlington, VA 22202-2837
USA
Greenwich -5 hours
Office: +1 202 62 35922
Cell: +1 202 492 3404
Home: +1 360 968 1684
Govern a great country as you would fry a small fish: do not poke at it too
much.
-- Lao Tzu
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm