Re: [Haskell-cafe] How to select n random words from a file ...

2012-06-11 Thread Aditya Manthramurthy
Pick the i-th word (replacing the previously chosen word, if any) with
probability 1/i? (numbering of words starts from 1 instead of 0).

On 11 June 2012 11:13, KC kc1...@gmail.com wrote:

 An interesting related problem is if you are only allowed one pass through
 the data how would you randomly choose one word.





 --
 --
 Regards,
 KC

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to select n random words from a file ...

2012-06-11 Thread Jerzy Karczmarczuk

KC:
An interesting related problem is if you are only allowed one pass 
through the data how would you randomly choose one word.


Let's choose  n items.

You must know the length of the sequence, of course, otherwise the 
'probability' loses its sense. So, for lists it might not be just one 
pass...


Suppose the length of the sequence be m.

Suppose you have a random generator  called rg, just a simple function 
which transforms: seed - seed' , between 0 and 1


Make n and m real to make the typechecker happy.
Then the most straightforward solution for lists is:

nran l n = nr l m n seed where
  m = fromIntegral(length(l))
  nr [] _ _ _ = []
  nr (x:q) m n seed =
let seed'=rg seed
in  ifseed'  n/m then x:nr q (m-1) (n-1) seed'
else  nr q (m-1) n seed'

-- =

Now, you may make it tail-recursive, use a different random generation 
protocol, or whatever. I believe that this solution is known for years...


Jerzy Karczmarczuk


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] How to select n random words from a file ...

2012-06-10 Thread Noon Silk
Hi,

 I'm clearly new to haskell, and I suppose those is a basic question,
but after a bit of searching I've been unable to figure out the best
way to do this. I was first trying to find out how to, say, get a
random element from a list, but I'm starting to think that may not be
the best way (because list access isn't constant time, among other
reasons). So, I wonder if there is a more direct way to just get a
random word from a wordfile (i.e. one word per line) ... can anyone
suggest a method?

 Ideally I'd like to be able to adapt this to get n random words from
several files (i.e. randomly select n words from all files; i.e. just
read them into one merged array and choose from there?)

 Thanks for any advice ...

-- 
Noon Silk

Fancy a quantum lunch? https://sites.google.com/site/quantumlunch/

Every morning when I wake up, I experience an exquisite joy — the joy
of being this signature.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to select n random words from a file ...

2012-06-10 Thread Eugene Kirpichov
Hi,

Look up reservoir sampling, it will most probably help.

On Sun, Jun 10, 2012 at 6:21 AM, Noon Silk noonsli...@gmail.com wrote:
 Hi,

  I'm clearly new to haskell, and I suppose those is a basic question,
 but after a bit of searching I've been unable to figure out the best
 way to do this. I was first trying to find out how to, say, get a
 random element from a list, but I'm starting to think that may not be
 the best way (because list access isn't constant time, among other
 reasons). So, I wonder if there is a more direct way to just get a
 random word from a wordfile (i.e. one word per line) ... can anyone
 suggest a method?

  Ideally I'd like to be able to adapt this to get n random words from
 several files (i.e. randomly select n words from all files; i.e. just
 read them into one merged array and choose from there?)

  Thanks for any advice ...

 --
 Noon Silk

 Fancy a quantum lunch? https://sites.google.com/site/quantumlunch/

 Every morning when I wake up, I experience an exquisite joy — the joy
 of being this signature.

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



-- 
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to select n random words from a file ...

2012-06-10 Thread Alexander Solla
On Sun, Jun 10, 2012 at 6:21 AM, Noon Silk noonsli...@gmail.com wrote:

 Hi,

  I'm clearly new to haskell, and I suppose those is a basic question,
 but after a bit of searching I've been unable to figure out the best
 way to do this. I was first trying to find out how to, say, get a
 random element from a list, but I'm starting to think that may not be
 the best way (because list access isn't constant time, among other
 reasons). So, I wonder if there is a more direct way to just get a
 random word from a wordfile (i.e. one word per line) ... can anyone
 suggest a method?


My preferred option, assuming the file sizes make it amenable, is to use
Iteratees to fold the wordlists into an IntMap of words.  You can then take
the size of the map and choose n unique Ints in the range.

Note that any algorithm is going to include these basic steps. (Get the
size, pick randoms, access the words at the key/line number).  The most
direct approach would basically be imperative and in the IO monad, using
things like openFile and hSeek.

This approach gets harder if you want to pull words out of multiple
wordlists.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to select n random words from a file ...

2012-06-10 Thread KC
An interesting related problem is if you are only allowed one pass through
the data how would you randomly choose one word.





-- 
--
Regards,
KC
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe