Hi guys,

Thanks for all the responses -- I really appreciate you all trying to help me out, and I think I owe you a better explanation of the problem. It is an interesting set of requirements and I don't know if something like this has been done with J before.

Here is what I am trying to achieve: I want to build a tool for Windows with a GUI-based interface that will be able to interface with:

1. Databases (SQL, Oracle, MySQL, etc.)
2. Excel and CSV files
3. Text files

These interfaces will be used to pull data into the tool. While importing the data, I want the user to be able to specify transformations on the incoming data, and also to add new user-defined columns to the dataset, derived from the existing columns.

Once the data is imported, I want to hold it in memory so that I can manipulate it easily. The tool should be restricted only by the amount of RAM on the PC (ideally with something like the MAXWS setting that Dyalog APL has). Ideally I would build an in-memory database so that I have easy access and manipulation capabilities. I would then run frequency reports, averages, means, variances and the like on the various columns, show reports, and produce output Excel/CSV/Word files.

I think J could be ideal for such a project, though I am a newbie to J programming. There are two approaches I have been considering before I dive in:

1. Use a C# front end for the GUI and the Excel/CSV/DB interfaces, and read the data into a SQLite database operated in in-memory mode. I am not sure how I would easily apply the column transformations and add the new columns (would the transformations be done in J and the new columns added using SQLite?), nor what the performance would be for large files (say a million records -- basically limited by the PC's RAM size).
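To make the first approach concrete, here is a minimal sketch of a temporary in-memory SQLite session with a user-defined derived column. It is written in Python purely for brevity; the table and column names (`trades`, `qty`, `price`, `notional`) are invented for illustration, and in the actual tool the same SQL would be issued from C# or from J via the SQLite addon:

```python
import sqlite3

# ":memory:" gives a temporary database that lives entirely in RAM and
# disappears when the connection closes -- nothing is persisted to disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trades (qty REAL, price REAL)")
con.executemany("INSERT INTO trades VALUES (?, ?)",
                [(10, 1.5), (20, 2.0), (30, 2.5)])

# A user-defined derived column: notional = qty * price.
# Adding it is just an ALTER TABLE plus an UPDATE.
con.execute("ALTER TABLE trades ADD COLUMN notional REAL")
con.execute("UPDATE trades SET notional = qty * price")

rows = con.execute("SELECT qty, price, notional FROM trades").fetchall()
print(rows)   # each notional is qty * price
```

Because the whole table lives in RAM, this matches the no-persistence requirement; the open question of whether the transformation expressions themselves are evaluated in SQL or in J remains a design choice.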
With this approach I would use J for processing the operations. Note that I don't need to persist the data, so I would effectively run a temporary in-memory database session with SQLite (and I understand that SQLite already has an addon for J).

2. Use J for the GUI together with an in-memory database such as kdb (I believe there is a J addon for kdb as well; I don't think anything is happening with the jdatabase project for extending J with database capabilities?). In this approach I would also use J for processing the operations.

Lastly, I want to give users the ability to apply operations on the data, with the set of operations restricted only by the rich functionality that J offers. I would basically map the user-defined operations onto J operations, almost as if the user-defined operations were disguised J operations.

I am still not sure whether J can handle all the above requirements, and all your input would be extremely valuable in helping me make my decision. All in all, I want to build an analytical database, with rich user-defined functionality that is restricted only by the features J offers, with optimum performance (on the order of seconds), without the need to persist the data, all on Windows with a GUI.

Thanks to everyone in advance.

-Amit

NOTE: If there are any J experts in NYC, I would like to sit down with them (if they are kind enough and interested enough to spend time on this) and work out more details.
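The "disguised J operations" idea is essentially a dispatch table: each operation name the user picks in the GUI maps straight onto a verb in the host language. A minimal sketch in Python (the operation names and the `apply_op` helper are invented for illustration; in a J version the table's values would be J verbs, e.g. the mean fork `(+/ % #)`):

```python
from collections import Counter
from statistics import mean, pvariance

# Dispatch table: user-visible operation name -> host-language function.
# These three cover the frequency/average/variance reports mentioned above.
OPS = {
    "mean":      mean,
    "variance":  pvariance,   # population variance
    "frequency": Counter,     # counts of each distinct value
}

def apply_op(name, column):
    """Apply a user-selected operation to one column of the dataset."""
    return OPS[name](column)

col = [1, 2, 2, 3, 3, 3]
print(apply_op("mean", col))        # about 2.33
print(apply_op("frequency", col))   # counts per distinct value
```

Extending the user's vocabulary is then just a matter of adding entries to the table, so the available operations are limited only by what the underlying language provides.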
On Mon, Apr 21, 2008 at 10:32 AM, Devon McCormick <[EMAIL PROTECTED]> wrote:
> FWIW - since I have Excel 2007 installed on one of my machines, I thought
> I'd mention that its row and column limits are 1048576 x 16384.
>
> I kept running out of memory trying to use Tara to write my 65536 by 256
> matrix of floating point numbers to a spreadsheet. I also had problems
> using OLE but was able to do it in pieces (about 16K rows at a time) in
> about 1 minute (on a 2.66 GHz machine with 2 GB RAM). I'm sure it could be
> done in Tara much the same way (possibly more quickly) but I'm not
> familiar with that package.
>
> On 4/21/08, Sherlock, Ric <[EMAIL PROTECTED]> wrote:
> >
> > Amit,
> > From the thread you started on comp.lang.apl it seems that your files
> > are only about 50,000 rows. As Richard suggested in that thread - that
> > isn't very big. You shouldn't have to resort to memory-mapped files for
> > something that big. If you just want to import the data from the Excel
> > spreadsheet and then work with it in J then I wouldn't imagine that will
> > present too much of a problem.
> >
> > I just used the Tara addon for J to create an Excel workbook with a
> > sheet containing 50,000 rows and 10 columns. I then used Tara to read
> > that data from the Excel worksheet back into J.
> >
> > Importing (reading) the file into J took a number of seconds using Tara,
> > but once the array was in J, I was able to operate on the whole (or
> > parts of the) array quickly and easily. For example getting the sum of
> > all 10 columns over the 50,000 rows of the resulting array took 0.0057
> > seconds in J.
> >
> ...
>
> --
> Devon McCormick, CFA
> ^me^ at acm.
> org is my preferred e-mail
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
