Re: [Haskell-cafe] looking for a good algorithm

2009-11-13 Thread Casey Hawthorne
Sorry, I forgot to ask an important question.

Is the table stored 
in a dense format as in complete rows and complete columns
or
in a sparse table format?


The question is more about algorithm than Haskell. But I am going to code in 
Haskell which I am still learning. 

Suppose I have a large table, with hundreds of columns and thousands of rows. 
But not every cell has a value (of String, or Int, or Double type). 

I want to shuffle the rows to maximize the number of columns whose first 100 
rows have at least one number, given a list of preferred column names since 
there is no guarantee that every number column will have at least one number 
in its first 100 rows after shuffling. 

Can someone provide a good algorithm for this problem? (I do not have any 
background in algorithms.) You can assume I already know which columns are of 
Int or Double type. 

--
Regards,
Casey
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] looking for a good algorithm

2009-11-13 Thread Hong Yang
Hi Casey,

Thanks very much for your zeal.

The table is a csv file. Actually the number of rows can be sixty thousand.

Hong

On Fri, Nov 13, 2009 at 5:08 PM, Casey Hawthorne cas...@istar.ca wrote:

 Sorry, I forgot to ask an important question.

 Is the table stored
 in a dense format as in complete rows and complete columns
 or
 in a sparse table format?


 The question is more about algorithm than Haskell. But I am going to code
 in Haskell which I am still learning.

 Suppose I have a large table, with hundreds of columns and thousands of
 rows. But not every cell has a value (of String, or Int, or Double type).

 I want to shuffle the rows to maximize the number of columns whose first
 100 rows have at least one number, given a list of preferred column names
 since there is no guarantee that every number column will have at least one
 number in its first 100 rows after shuffling.

 Can someone provide a good algorithm for this problem? (I do not have any
 background in algorithms.) You can assume I already know which columns are
 of Int or Double type.

 --
 Regards,
 Casey

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] looking for a good algorithm

2009-11-12 Thread Casey Hawthorne
To: Casey Hawthorne cas...@istar.ca
Subject: Re: [Haskell-cafe] looking for a good algorithm
From: Casey Hawthorne cas...@istar.ca
Date: Thu, 12 Nov 2009 11:14:02 -0800

On third thought, convert the table to a 2D array of bits (or a 1D
array of bits mapped to a 2D coordinate system).
The bit is true for either an Int or Double.

If space is a concern and one can tolerate a low rate of false
positives, than a 2D Bloom Filter might be appropriate, or a 1D Bloom
Filter with mapping to 2D coordinates.

http://en.wikipedia.org/wiki/Bloom_filter


Then on to simmulated annealing, possibly.


Note: Donald Knuth's new fascicle is out:
Volume 4 Fascicle 1, Bitwise Tricks  Techniques; Binary Decision
Diagrams (2009), xiii+261pp. ISBN 0-321-58050-8


On Wed, 11 Nov 2009 15:32:35 -0800, you wrote:

So, as I understand it, you have a very large sparse table, thousands
of rows and hundreds of columns, of which each cell within a column of
type String, Int, or Double can contain one of those types or nothing.

Then you to want to shuffle the rows to maximize the number of columns
whose first 100 rows have at least one number (Int or Double), given a
list of preferred column names since there is no guarantee that every
number column will have at least one number in its first 100 rows
after shuffling.


I'm wondering about hashing on the rows and hashing on the columns,
then the column hash has the number of Int's or Double's (don't need
the String's) in that column and the rows they are in.

The row hash would have the number of Int's and Double's in that row
and what column's they are in.

Then;

Then scan the row hash and sort into descending order, and by tagging
those rows, not by actually moving them.

Then I think your ready for simmulated annealing.

--
Regards,
Casey
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] looking for a good algorithm

2009-11-11 Thread Tillmann Vogt

Hong Yang schrieb:

The question is more about algorithm than Haskell. But I am going to code in
Haskell which I am still learning.

Suppose I have a large table, with hundreds of columns and thousands of
rows. But not every cell has a value (of String, or Int, or Double type).

I want to shuffle the rows to maximize the number of columns whose first 100
rows have at least one number, given a list of preferred column names since
there is no guarantee that every number column will have at least one number
in its first 100 rows after shuffling.

Can someone provide a good algorithm for this problem? (I do not have any
background in algorithms.) You can assume I already know which columns are
of Int or Double type.


I would say it depends on the distribution of values in the table. If 
there are rows with a lot of values and rows with few values, then I 
would first sort the rows after the number of cells with values. If you 
look at all the columns and the number of values for each row is unique 
then it would be perfectly solved. With a list of preferred columns and 
also a uniform distribution the problem might be hard (NP-complete?), 
but these hard problems can often be approximated, i.e with simulated 
annealing, which in short is switching two rows repeatedly as long as 
the result improves.




This is not a homework.
Thanks,

Hong




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] looking for a good algorithm

2009-11-11 Thread Hong Yang
Thanks. I actually prototyped in Perl the SA method intuitively (though I do
not know its name). But it is way slow. So I want to improve the speed by
means of both algorithm and language.

Hong

On Wed, Nov 11, 2009 at 9:36 AM, Tillmann Vogt tillmann.v...@rwth-aachen.de
 wrote:

 Hong Yang schrieb:

  The question is more about algorithm than Haskell. But I am going to code
 in
 Haskell which I am still learning.

 Suppose I have a large table, with hundreds of columns and thousands of
 rows. But not every cell has a value (of String, or Int, or Double type).

 I want to shuffle the rows to maximize the number of columns whose first
 100
 rows have at least one number, given a list of preferred column names
 since
 there is no guarantee that every number column will have at least one
 number
 in its first 100 rows after shuffling.

 Can someone provide a good algorithm for this problem? (I do not have any
 background in algorithms.) You can assume I already know which columns are
 of Int or Double type.


 I would say it depends on the distribution of values in the table. If there
 are rows with a lot of values and rows with few values, then I would first
 sort the rows after the number of cells with values. If you look at all the
 columns and the number of values for each row is unique then it would be
 perfectly solved. With a list of preferred columns and also a uniform
 distribution the problem might be hard (NP-complete?), but these hard
 problems can often be approximated, i.e with simulated annealing, which in
 short is switching two rows repeatedly as long as the result improves.



 This is not a homework.
 Thanks,

 Hong


 

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] looking for a good algorithm

2009-11-11 Thread Casey Hawthorne
So, as I understand it, you have a very large sparse table, thousands
of rows and hundreds of columns, of which each cell within a column of
type String, Int, or Double can contain one of those types or nothing.

Then you to want to shuffle the rows to maximize the number of columns
whose first 100 rows have at least one number (Int or Double), given a
list of preferred column names since there is no guarantee that every
number column will have at least one number in its first 100 rows
after shuffling.


I'm wondering about hashing on the rows and hashing on the columns,
then the column hash has the number of Int's or Double's (don't need
the String's) in that column and the rows they are in.

The row hash would have the number of Int's and Double's in that row
and what column's they are in.

Then;

Then scan the row hash and sort into descending order, and by tagging
those rows, not by actually moving them.

Then I think your ready for simmulated annealing.


--
Regards,
Casey
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe