Re: [Haskell-cafe] Same compiled program behaving differently when called from ghci and shell

2010-11-21 Thread Manlio Perillo
Il 21/11/2010 06:49, Bruno Damour ha scritto:
 Hello,
 I have a very strange (for me) problem that I managed to reduce to this:
 I have a small program that reads a file containing only one character (è = 0xE8).
 The program is ftest2.hs :
 

 [...]

The only difference I can see is the codepage used.

The Windows console uses codepage 850:
http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using

The default Windows codepage for Western languages, instead, is 1252.


Now, as fate would have it (Python console):

>>> '\xe8'.decode('cp1252').encode('cp850')
'\x8a'
>>> '\xde'.decode('cp1252').encode('cp850')
'\xe8'


You can now see the possible cause of the problem.


Try to change the codepage of the console.
See also:
http://www.postgresql.org/docs/9.0/interactive/app-psql.html#AEN75686

 [...]



Regards   Manlio
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Same compiled program behaving differently when called from ghci and shell

2010-11-21 Thread Manlio Perillo
Il 21/11/2010 19:06, Bruno Damour ha scritto:
 Le 21/11/10 17:21, Manlio Perillo a écrit :
 Il 21/11/2010 06:49, Bruno Damour ha scritto:
 Hello,
 I have a very strange (for me) problem that I managed to reduce to this:
 I have a small program that reads a file containing only one character (è = 0xE8).
 The program is ftest2.hs :

 [...]
 Now, as fate would have it (Python console):
 >>> '\xe8'.decode('cp1252').encode('cp850')
 '\x8a'
 >>> '\xde'.decode('cp1252').encode('cp850')
 '\xe8'

 [...]

 yes I kind of began to figure that IO might use an environment setting.

Did you try to run the program again, after setting the console codepage
to 1252?

 That sounds a bit weird to me (newbie), that it should impact the result of a
 program depending on where it is launched... it's the same binary anyway?

This is only a guess, but recent versions of the GHC I/O library perform
low-level decoding when reading a file in text mode.

This is the correct way, since a Char is supposed to be a Unicode
character.

I assume that when reading a text file, the I/O library just checks the
system encoding and uses it.

In your case, you have a text file encoded with codepage 1252, but GHC
is trying to read it using codepage 850 instead.

So, as in the example I posted, you have (using, again, Python syntax):
- the character u'è', Unicode code point 0xE8
- a byte in the file with value 0xE8; this is the result of
  u'è'.encode('cp1252')
- a Haskell Char '\xde'; this is the result of
  '\xe8'.decode('cp850')


There are 3 solutions:
1) open the file in binary mode
2) set the console codepage to 1252.

   I do this by changing the Command Prompt shortcut destination to:
 `%SystemRoot%\system32\cmd.exe /k chcp 1252`
3) explicitly set the encoding when reading the file in text mode

   Unfortunately this is a rather low-level and GHC-specific
   operation:

http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Handle.html

   The Python API is, by the way:
   http://docs.python.org/dev/py3k/library/functions.html#open

   GHC API is quite different (if I understand it correctly).
   You can change the encoding only after the file has been opened, and
   you can change it again after having read some data (in Python,
   instead, the file encoding is immutable)
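A minimal sketch of solution 3 using the System.IO encoding API (assuming GHC 6.12 or later; latin1 stands in for CP1252 here since it is always available and agrees with CP1252 on this character — on a real Windows console you would use mkTextEncoding "CP1252"):

```haskell
import System.IO

main :: IO ()
main = do
  -- write a single 0xE8 byte in binary mode, mimicking the original ftest file
  hw <- openBinaryFile "ftest.txt" WriteMode
  hPutStr hw "\xE8"
  hClose hw
  -- read it back in text mode, forcing latin1 instead of localeEncoding
  h <- openFile "ftest.txt" ReadMode
  hSetEncoding h latin1
  s <- hGetContents h
  print s   -- "\232", i.e. the single character 'è'
  hClose h
```

Without the hSetEncoding call, the result would depend on the console codepage, which is exactly the behaviour observed in this thread.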



Regards   Manlio


Re: [Haskell-cafe] Same compiled program behaving differently when called from ghci and shell

2010-11-21 Thread Manlio Perillo
Il 21/11/2010 19:28, Bruno Damour ha scritto:
 [...]
 Of course you're right but that was a surprise to me...

 G:\CODE\rlib> chcp 1252
 Page de codes active : 1252

 G:\CODE\rlib> ftest3.exe
 è
 Just '1'

 G:\CODE\rlib> chcp 850
 Page de codes active : 850

 G:\CODE\rlib> ftest3.exe
 è
 Just '2'


 Quite treacherous IMHO ? Or what

It is not treacherous at all.

When you open a file, GHC uses localeEncoding, which, as the name
suggests, depends on the system's current codepage (on Windows).

In your example, you are simply changing the codepage, and thus the
program behaviour changes accordingly.


It is the same as a program that prints some environment variable (for
example, via System.Environment.getEnvironment).
Of course, if you change the OS environment from the console, the
program's behaviour will change.

And it is also the same when your program reads a file's content.
If you change the file's content from elsewhere, the program's behaviour
will change.



As for the original example, I just think that the GHC user guide
*should* clearly explain what it means to open a file in text mode
[1], and, if possible, add a note about the Windows console (as has been
done in the PostgreSQL documentation).


[1] right now I do not remember what the Haskell Report says



Regards   Manlio


Re: [Haskell-cafe] Same compiled program behaving differently when called from ghci and shell

2010-11-21 Thread Manlio Perillo
Il 21/11/2010 21:51, Manlio Perillo ha scritto:
 [...]
 There are 3 solutions:
 1) open the file in binary mode
 2) set the console codepage to 1252.
 
I do this by changing the Command Prompt shortcut destination to:
  `%SystemRoot%\system32\cmd.exe /k chcp 1252`
 3) explicitly set the encoding when reading the file in text mode
 
 Unfortunately this is a rather low-level and GHC-specific
 operation:
 
 http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Handle.html
 

Correction: encoding support is in System.IO (base 4.2 package), but it
is not documented in the Haskell 2010 Report.


By the way: what is the rationale for the TextEncoding data type not
containing the encoding name?


Regards   Manlio


Re: [Haskell-cafe] Re: dynamic loading of code on windows

2010-11-12 Thread Manlio Perillo
Il 12/11/2010 19:01, Kevin Jardine ha scritto:
 This isn't about the plugin functionality, it's about compiling code.

 As the message says :

 This requires a Unix compatibility toolchain such as MinGW+MSYS or
 Cygwin.


Is it really necessary to use autoconf?

I have read the autoconf.ac file and I'm not sure I understand why
autoconf is used.



Regards  Manlio


[Haskell-cafe] back doors into the IO monad

2010-10-23 Thread Manlio Perillo
Hi.

What are the available methods to execute IO actions from pure code?

I know only unsafePerformIO and foreign import (used to call a non-pure
foreign function).


Assuming I want to execute external untrusted code using plugins (via
the `plugins` package), is it possible to completely forbid execution of
IO actions from user code?
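For example, unsafePerformIO alone is enough to smuggle mutable state into apparently pure code (a well-known sketch; the NOINLINE pragma keeps GHC from duplicating the cell):

```haskell
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- a top-level mutable cell: it looks pure, but it is created by running IO
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}

main :: IO ()
main = do
  modifyIORef counter (+1)
  modifyIORef counter (+1)
  readIORef counter >>= print   -- 2
```

This is exactly why sandboxing plugin code requires more than hiding the IO monad: unsafePerformIO and foreign imports must also be made unavailable (e.g. by restricting the importable modules).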



Thanks   Manlio


[Haskell-cafe] surrogate code points in a Char

2009-11-18 Thread Manlio Perillo
Hi.

The Unicode Standard (version 4.0, section 3.9, D31, page 76) says:

Because surrogate code points are not included in the set of Unicode
scalar values, UTF-32 code units in the range D800 .. DFFF are
ill-formed

However, GHC does not reject these code units:

Prelude> print '\xD800'
'\55296'


Is this correct behaviour?
Note that Python, too (2.5.4, UCS4 build, Debian Linux), accepts these
code units.
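Indeed, GHC constructs the Char without complaint, even though Data.Char itself classifies the code point as a surrogate; a small check (nothing GHC-specific assumed):

```haskell
import Data.Char (generalCategory)

main :: IO ()
main = do
  print '\xD800'                    -- accepted by GHC, prints '\55296'
  print (generalCategory '\xD800')  -- Surrogate
```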



Thanks  Manlio


Re: [Haskell-cafe] attoparsec and parsec

2009-11-15 Thread Manlio Perillo
Jason Dusek ha scritto:
   To add to the confusion, I forked `bytestringparser` when I
   wrote the `json-b` package. The fork is here:
 
 http://hackage.haskell.org/package/bytestringparser-temporary/
 
   I have added a number of things to original as well as fixing
   some problems with it.
 
   The reason I went with the older package is that the new one
   depended on stuff that wouldn't build on Hackage 

I installed attoparsec yesterday without problems.

 so I was like
   whatever; however, I now consider that it might have been
   better to work off the newer package.
 
   A subtle error, corrected in my version, seems yet to be
   present in the `attoparsec-0.7.2`, in an operator used
   internally to build up the result set.
 
 {-# LINE 132 Data/Attoparsec/Internal.hs #-}
 -- | Turn our split representation back into a normal lazy ByteString.
  (+:) :: SB.ByteString -> LB.ByteString -> LB.ByteString
  sb +: lb | SB.null sb = lb
           | otherwise  = LB.Chunk sb lb
 
   Where this operator showed up in `bytestringparser`, I
   replaced `LB.Chunk` with the smart constructor, `LB.chunk`, to
   ensure that the no empty chunks invariant of lazy
   `ByteString`s was followed (I discovered this failing one
   evening when I was fleshing out the JSON parser).
 

Did you try to contact the author/maintainer?
This test can be added to the test suite.



Thanks  Manlio


[Haskell-cafe] Lexical Syntax and Unicode

2009-11-14 Thread Manlio Perillo
Hi.

Reading the Haskell 98 Report (section 9.2), I have found a possible
problem.

The lexical syntax supports Unicode; however, this is not true for
newline:

newline -> return linefeed | return | linefeed | formfeed


The Unicode standard adds two additional characters:

U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

The Unicode Character Database, also defines two general categories:
Zl = Separator, line
Zp = Separator, paragraph

The Zl category only contains the LINE SEPARATOR character and the Zp
category only contains the PARAGRAPH SEPARATOR character.


So, IMHO, the lexical syntax should be changed to:

newline -> return linefeed | return | linefeed | formfeed
         | uniLine | uniPara
uniLine -> any Unicode character defined as line separator
uniPara -> any Unicode character defined as paragraph separator

or, alternatively:

uniLine -> LINE SEPARATOR
uniPara -> PARAGRAPH SEPARATOR
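Data.Char's generalCategory already exposes the Zl and Zp categories, so such a rule would be straightforward to implement in a lexer; for example:

```haskell
import Data.Char (generalCategory)

main :: IO ()
main = do
  print (generalCategory '\x2028')  -- LineSeparator (category Zl)
  print (generalCategory '\x2029')  -- ParagraphSeparator (category Zp)
```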



Manlio Perillo


[Haskell-cafe] attoparsec and parsec

2009-11-14 Thread Manlio Perillo
Hi.

I'm writing a generic log-parsing package, and I'm searching for a
parser combinator library.

Parsing log files is a simple task, so I would like to use the most
efficient solution.

I have looked at the attoparsec source code, and it seems very
specialized for parsing lazy ByteStrings, so I assume it is very efficient.

How stable is the attoparsec package?
How much more efficient is attoparsec than standard packages like parsec?

By the way: there seem to be problems with the generated documentation at
http://hackage.haskell.org/package/attoparsec.

Moreover, there is a similar package:
http://hackage.haskell.org/package/bytestringparser

What is the status of this package?
It has the same API as attoparsec, but it's older.
However, there is no indication that this package is deprecated in favor
of attoparsec.



Thanks  Manlio Perillo


Re: [Haskell-cafe] Converting IO [XmlTree] to [XmlTree]

2009-04-21 Thread Manlio Perillo
Luke Palmer ha scritto:
 [...]
 Note that it is not always possible to separate IO from pure code.
 As an example, consider an HTTP 1.1 server that reads a request body
 containing a number on each line, and returns a response body containing
 the sum of the numbers.
 
 
 What?
 
 sumRequest :: String -> String
 -- strips header, sums numbers, returns response
 
 sumServer :: IO ()
 -- reads from socket, writes sumRequest to socket
 
 And many other permutations, with differing degrees of laziness and
 parametericity.
 

As long as you strictly read a string from the socket, this is OK.
But you cannot read the string lazily.



Regards  Manlio


Re: [Haskell-cafe] Converting IO [XmlTree] to [XmlTree]

2009-04-19 Thread Manlio Perillo
Henning Thielemann ha scritto:
 
 On Tue, 14 Apr 2009, rodrigo.bonifacio wrote:
 
 I guess this is a very simple question. How can I convert IO [XmlTree]
 to just a list of
 XmlTree?
 
 The old Wiki had:
   http://www.haskell.org/wikisnapshot/ThatAnnoyingIoType.html
 
 Should be ported to the new Wiki since it is a FAQ ...

Note that it is not always possible to separate IO from pure code.
As an example, consider an HTTP 1.1 server that reads a request body
containing a number on each line, and returns a response body containing
the sum of the numbers.

Here, you cannot really separate IO from the pure computation.
And you cannot use lazy IO if you want your server to support HTTP 1.1
pipelining.


Regards  Manlio Perillo


Re: [Haskell-cafe] Maybe off-topic -- Writing contracts or software specifications

2009-04-08 Thread Manlio Perillo

Maurício ha scritto:

Hi,
 [...]
I need, for instance, to write a contract with a programmer
we are hiring for a task. 


 [...]

The question is: how much do you trust the programmer?
And how much does the programmer trust you?

Much of the complication in contracts arises from the need to deal
with parties that don't trust each other.



A few pages should suffice.

Make sure that:
1) You explain accurately what the program must do, and how the
   programmer intends to write it.
   Do you need strong unit tests?
2) You write down the deadline for program completion.
   What happens if the deadline is not honoured?
3) You write down the estimated price for the work.
   Are price changes allowed?



Regards  Manlio


[Haskell-cafe] advice on space efficient data structure with efficient snoc operation

2009-04-07 Thread Manlio Perillo

Hi.

I'm still working on my Netflix Prize project.

For a function I wrote, I really need a data structure that is both
space-efficient (unboxed elements) and has an efficient snoc operation.


I have pasted a self-contained module with the definition of the
function I'm using:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453


The movie ratings are loaded from serialized data, and the result is 
serialized again, using the binary package:


transcodeIO :: IO ()
transcodeIO = do
  input <- L.hGetContents stdin
  let output = encodeZ $ transcode $ decodeZ input
  L.hPut stdout output

(here encodeZ and decodeZ are wrappers around Data.Binary.encode and
Data.Binary.decode, with support for gzip compression/decompression)



This function (transcodeIO, not transcode) takes, on my
Core2 CPU T7200  @ 2.00GHz:

real30m8.794s
user29m30.659s
sys 0m10.313s

1068 Mb total memory in use


The problem here is with snocU, which requires a lot of copying.

I rewrote the transcode function so that the input data set is split
into N parts:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453#a3456

The mapReduce function is the one defined in the Real World Haskell.


The new function takes (using only one thread):

real18m48.039s
user18m30.901s
sys 0m6.520s

1351 Mb total memory in use


The additional required memory is probably caused by unionsWith, which is
not strict.

The function takes less time, since array copying is optimized.
I still use snocU, but on small arrays.

GC time is very high: 54.4%


Unfortunately I cannot test with more than one thread, since I get
segmentation faults (probably a bug in the uvector package).


I also got two strange errors (but these may just be the result of the
segmentation fault; I'm not able to reproduce them):


tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio == -1 ||
(new_prio >= __sched_fifo_min_prio && new_prio <=
__sched_fifo_max_prio)' failed.



internal error: removeThreadFromQueue: not found
(GHC version 6.8.2 for i386_unknown_linux)
Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug



Now the question is: what data structure should I use to optimize the 
transcode function?


IMHO there are two solutions:

1) Use a lazy array.
   Something like ByteString.Lazy, or what is available in the
   storablevector package.

   Using this data structure, I can avoid the use of appendU.

2) Use an unboxed list.

   Something like:
  http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h

   That is: a linked list of unboxed arrays, but unlike the lazy array
   solution, a snoc operation avoids copying if there is space in the
   current array.

   I don't know if this is easy/efficient to implement in Haskell.
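The chunked-list idea (option 2) could be sketched roughly as follows. This is illustrative only: plain lists stand in for the unboxed chunk arrays, just to show the shape, and all names (Chunked, snocC, chunkSize) are made up:

```haskell
-- full chunks already built (newest first), the chunk being filled
-- (reversed), and the room left in it
data Chunked a = Chunked { done :: [[a]], cur :: [a], room :: Int }

chunkSize :: Int
chunkSize = 4  -- tiny, for demonstration; real code would use e.g. 1024

emptyC :: Chunked a
emptyC = Chunked [] [] chunkSize

-- snoc fills the current chunk, starting a fresh one only when it is
-- full; no existing chunk is ever copied
snocC :: Chunked a -> a -> Chunked a
snocC (Chunked ds c 0) x = Chunked (reverse c : ds) [x] (chunkSize - 1)
snocC (Chunked ds c r) x = Chunked ds (x : c) (r - 1)

toListC :: Chunked a -> [a]
toListC (Chunked ds c _) = concat (reverse (reverse c : ds))

main :: IO ()
main = print (toListC (foldl snocC emptyC [1 .. 10 :: Int]))
```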


Any other suggestions?


Thanks  Manlio Perillo


Re: [Haskell-cafe] advice on space efficient data structure with efficient snoc operation

2009-04-07 Thread Manlio Perillo

Edward Kmett ha scritto:
I'm in the process of adding a Data.Sequence.Unboxed to 
unboxed-containers. I hope to have it in hackage today or tomorrow, 
should its performance work out as well as Data.Set.Unboxed.
 


Looking at the data definition of Data.Sequence I suspect that it is not 
really space efficient.


Please note that I have 480189 arrays, where each array has, on average, 
209 (Word16 :*: Word8) elements.


Using a [(Word32, UArr (Word16 :*: Word8))] takes about 800 MB (but it's 
hard to measure exact memory usage).



 [...]


Thanks  Manlio


Re: [Haskell-cafe] binary package: memory problem decoding an IntMap

2009-04-05 Thread Manlio Perillo

Manlio Perillo ha scritto:

Hi.

I'm having memory problems decoding a big IntMap.

The data structure is:

IntMap (UArr (Word16 :*: Word8))


There are 480189 keys, and a total of 100480507 elements
(Netflix Prize).
The size of the encoded (and compressed) data is 184 MB.

When I load data from the Netflix Prize data set, total memory usage is
1030 Mb.



It seems there is a problem with tuples, too.

I have a:
[(Word16, UArr (Word32 :*: Word8))]

This eats more memory than it should, since tuples are decoded lazily.



Manlio


Re: [Haskell-cafe] binary package: memory problem decoding an IntMap

2009-04-05 Thread Manlio Perillo

Manlio Perillo ha scritto:

[...]

It seems there is a problem with tuples, too.

I have a:
[(Word16, UArr (Word32 :*: Word8))]

This eats more memory than it should, since tuples are decoded lazily.



My bad, sorry.

I solved it simply by using a strict consumer (foldl' instead of foldl).
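A minimal illustration of the difference: foldl' forces the accumulator at every step, so no chain of thunks builds up the way it does with foldl.

```haskell
import Data.List (foldl')

main :: IO ()
main =
  -- with plain foldl this would build a million-deep thunk chain first
  print (foldl' (+) 0 [1 .. 1000000 :: Int])   -- 500000500000
```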


Manlio


[Haskell-cafe] about import alias

2009-04-04 Thread Manlio Perillo

Hi.

Haskell 98 allows import aliases:

import qualified VeryLongModuleName as C

However, it does not allow aliasing of individual imported names:

import NormalModule (veryLongFunctionName as C)

What is the rationale?
IMHO this can be very useful in some cases.
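The usual workaround is to bind a short top-level name to the imported one (the alias `ic` below is purely illustrative):

```haskell
import Data.List (intercalate)

-- a short local alias for an imported name
ic :: [a] -> [[a]] -> [a]
ic = intercalate

main :: IO ()
main = putStrLn (ic ", " ["a", "b", "c"])   -- a, b, c
```

This achieves the same effect, at the cost of an extra declaration and a possibly duplicated type signature.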



Thanks  Manlio


[Haskell-cafe] binary package: memory problem decoding an IntMap

2009-04-02 Thread Manlio Perillo

Hi.

I'm having memory problems decoding a big IntMap.

The data structure is:

IntMap (UArr (Word16 :*: Word8))


There are 480189 keys, and a total of 100480507 elements
(Netflix Prize).
The size of the encoded (and compressed) data is 184 MB.

When I load data from the Netflix Prize data set, total memory usage is
1030 Mb.

However when I try to decode the data, memory usage grows too much (even 
using the -F1.1 option in the RTS).



The problem seems to be with `fromAscList` function, defined as:

fromList :: [(Key,a)] - IntMap a
fromList xs
  = foldlStrict ins empty xs
  where
ins t (k,x)  = insert k x t

(by the way, why doesn't the IntMap module use Data.List.foldl'?).

The `ins` function is not strict.



This seems a hard problem to solve.
First of all, IntMap should provide strict variants of its
functions.

And the binary package should choose whether to use the strict or the
lazy version.
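In the meantime, strict insertion can be forced from the caller's side; a sketch (fromListStrict is an illustrative name, not a containers function):

```haskell
import Data.List (foldl')
import qualified Data.IntMap as IM

-- a strict variant of fromList: force each value before inserting it
fromListStrict :: [(IM.Key, a)] -> IM.IntMap a
fromListStrict = foldl' ins IM.empty
  where ins m (k, v) = v `seq` IM.insert k v m

main :: IO ()
main = print (IM.toList (fromListStrict [(1, 'a'), (2, 'b')]))
```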


For me, the simplest solution is to serialize the association list
obtained from the `toAscList` function, instead of serializing the
IntMap directly.


The question is: can I reuse the already serialized data?
Are the binary formats of `IntMap a` and `[(Int, a)]` compatible?



Thanks  Manlio Perillo


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-04-01 Thread Manlio Perillo

wren ng thornton ha scritto:

Manlio Perillo wrote:
Since ratings for each customers are parsed at the same time, using 
a plain list would consume a lot of memory, since stream fusion can 
only be executed at the end of the parsing.


On the other hand, when I want to group ratings by movies, stream 
fusion seems to work fine.



[...]

For the problem as you've discussed it, I'd suggest a different 
approach: You can't fit all the data into memory at once, so you 
shouldn't try to. You should write one program that takes in the 
per-movie grouping of data and produces a per-user file as output. 


Well, creating 480189 files in a directory is not a very nice thing to 
do to a normal file system.


I could arrange the files into subdirectories, but then this starts to
become too complex.


The solution I'm using now just works.
It takes about 950 MB of memory and 35 minutes, but it's not a big 
problem since:

1) Once loaded, I can serialize the data in binary format
2) I think that the program can be parallelized, parsing
   subsets of the files in N threads, and then merging the maps.

   Using this method should also optimize array copying.

   The problem is that unionWith seems to be lazy, and there is no
   strict variant; I'm not sure.

Then 
have your second program read in the reorganized data and do fusion et al.


This reduces the problem to just writing the PerMovie - PerUser 
program. Since you still can't fit all the data into memory, that means 
you can't hope to write the per-user file in one go. 


The data *do* fit into memory, fortunately.

 [...]


Best of luck.




Thanks  Manlio


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-03-31 Thread Manlio Perillo

Claus Reinke ha scritto:
appendU is strict, insertWith just doesn't force it (follow the 
source link

in the haddocks to see why).


Ok, I see.
But, IMHO, this should be clearly documented.


There seems to be some agreement that strict variant operations should
also be provided, but it needs some more thinking, and whenever I look
at the code, I have the feeling that there are just too many operations
defined in it, so I look away again:-(



Well, it does not matter whether a strict variant of an operation is
provided or not. But at least it should be documented whether each
operation is strict or lazy.




However, reading the code now, I prefer my version using alter.


Yes, it is a more direct way to enforce strict evaluation. Btw, if your
update function 'f' is strict in 'old' and 'new', you might just want to 
use 'Just $! f old new' and save all those 'seq's.




Right, thanks.
And this seems to be a bit more memory-friendly, too.
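For instance, a strict counter update via alter, using 'Just $!' as suggested, might look like this (a sketch; `bump` is an illustrative name):

```haskell
import qualified Data.IntMap as IM

-- alter with 'Just $!' forces the new value before it is stored,
-- so no thunk chain accumulates inside the map
bump :: IM.Key -> IM.IntMap Int -> IM.IntMap Int
bump = IM.alter f
  where f Nothing  = Just 1
        f (Just n) = Just $! n + 1

main :: IO ()
main = print (IM.toList (bump 1 (bump 1 IM.empty)))   -- [(1,2)]
```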


By the way, about insertWith/alter; from IntMap documentation:

insertWithKey: O(min(n,W))
alter: O(log n)

So, alter is more efficient than insertWithKey?
And what is that `W` ?


'W' is a maximum bound (see comment at top of that haddock page).


Where?
At the top of that page there is only a link to the paper where the
algorithm is defined.



 'alter' has some shortcuts if the update function doesn't actually do
anything, but for reimplementing 'insertWithKey', the complexity should 
be the same as the code would be the same.




Ok.


But piling up appendUs is still not a good idea. For a moment,
I thought that the stream representation's (+++) was handling
runtime fusion gracefully, but a simple test case suggests otherwise
at least for the simpler case of consU (the attached appendU.hs
doesn't do any appendUs, as the consU case already demonstrates
the issue; be careful with large numbers here, it'll quickly eat your 
ram):


I'm not sure I fully understand the code.


[ Ok ]



But, again, IMHO it does not apply to my original problem.


It is a question of resource constraints, probably. Instead of maintaining
an 'IntMap (UArr Int)' all the way through buildup, you could use an
'IntMap [Int]' while parsing, and then compress into 'IntMap (UArr Int)'
when parsing is finished. 


No, as I have written, this is not possible for my problem, because it
would eat all available memory after a few minutes.

The data I'm parsing from Netflix Prize data set contains:
17770 movies
480189 customers (with customer ids ranging from 1 to 2649429)
100480507 total ratings


Ratings are grouped by movie, but I want to group them by customer.
This means that each customer has few ratings (200, on average), so
there are a lot of small arrays.


Since the ratings for each customer are parsed at the same time, using a
plain list would consume a lot of memory, since stream fusion can only
happen at the end of parsing.


On the other hand, when I want to group ratings by movies, stream fusion 
seems to work fine.


This is IMHO, of course.


If you can afford the larger memory profile
during parsing, this should give better overall performance, with the
same memory need for the final state, but larger memory needs up
to that state. I'd be interested in the differences, if you try it.



Unfortunately, I cannot afford this memory profile!
My PC has only 2 GB of RAM.

And I doubt anyone else could afford the required memory.
Using UArr, the total required memory is about 900 MB.



Thanks  Manlio


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-03-31 Thread Manlio Perillo

Claus Reinke ha scritto:

Can I close this ticket as not being to do with uvector?
-- Don


You did notice the suggestion that performance of uvector and bytestring 
could be improved drastically if compile-time fusion would be augmented

with runtime fusion?



The storablevector package implements Data.StorableVector.Lazy

Just as with Data.ByteString.Lazy, it contains a linked list of chunks.

I think that this can improve the performance of my implementation, since
it is much more efficient to append elements at the end of the vector
(it avoids a lot of copying).


In my draft implementation of the Netflix Prize in the D language, I
used a similar structure, based on:

http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h

Unfortunately, the D garbage collector is really bad when there are a lot
of allocations, so I gave up.




Manlio Perillo


Re: [Haskell-cafe] Haddock: inserting interactive sessions in the documentation

2009-03-31 Thread Manlio Perillo

Wouter Swierstra ha scritto:

What is the suggested (if any) convention for inserting an interactive
session in the documentation?


You may want to look at:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/DocTest



Ah, ok, thanks.
So the convention is to just use `-- >>>`.


Manlio


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-03-30 Thread Manlio Perillo

Claus Reinke ha scritto:
But Claus was right, appendU is lazy; this seems to be the cause of 
the problem.


appendU is strict, insertWith just doesn't force it (follow the source link
in the haddocks to see why).



Ok, I see.
But, IMHO, this should be clearly documented.

I have updated my test code:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3103

The interesting thing is that using appendU is *faster*.

Using appendU:
real0m38.736s
user0m38.638s
sys 0m0.048s

Using snocU:
real0m41.279s
user0m40.939s
sys 0m0.040s


Memory usage is the same.

However, now I don't really understand why the two implementations
differ in laziness.


Or, to ask a different question, how can I make the version using 
insertWith strict?


deja vu:-(
http://www.haskell.org/pipermail/haskell-cafe/2009-March/057032.html



You are right, sorry.
The problem is that at that time I was not able to fully understand the 
code!


However, reading the code now, I prefer my version using alter.


By the way, about insertWith/alter; from IntMap documentation:

insertWithKey: O(min(n,W))
alter: O(log n)

So, alter is more efficient than insertWithKey?
And what is that `W` ?


As you've noticed, alter also allows to enforce strictness.

But piling up appendUs is still not a good idea. For a moment,
I thought that the stream representation's (+++) was handling
runtime fusion gracefully, but a simple test case suggests otherwise
at least for the simpler case of consU (the attached appendU.hs
doesn't do any appendUs, as the consU case already demonstrates
the issue; be careful with large numbers here, it'll quickly eat your ram):



I'm not sure I fully understand the code.
But, again, IMHO it does not apply to my original problem.

 [...]


Thanks   Manlio


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-03-30 Thread Manlio Perillo

Don Stewart ha scritto:

Can I close this ticket as not being to do with uvector?



Yes, thanks; and sorry for the noise.

But it may be interesting to study the example I have pasted:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3103

I find it a bit surprising that using appendU is actually faster than 
using snocU.




The interesting thing is that using appendU is *faster*.

Using appendU:
real0m38.736s
user0m38.638s
sys 0m0.048s

Using snocU:
real0m41.279s
user0m40.939s
sys 0m0.040s




Manlio


[Haskell-cafe] Haddock: inserting interactive sessions in the documentation

2009-03-30 Thread Manlio Perillo

Hi.


What is the suggested (if any) convention for inserting an interactive
session in the documentation?

Right now I'm doing this (in the random-shuffle package):

-- |Convert a sequence @(e1...en)@ to a complete binary tree.
--
-- @
--   System.Random.Shuffle> buildTree ['a','b','c','d','e']
--   Node 5 (Node 4 (Node 2 (Leaf 'a') (Leaf 'b'))
--                  (Node 2 (Leaf 'c') (Leaf 'd')))
--          (Leaf 'e')
-- @
buildTree :: [a] -> Tree a


As an example, in Python we use:

>>> 3 + 2
5



One nice feature of this special syntax is that it can be parsed using
doctest.



Thanks  Manlio



[Haskell-cafe] uvector package appendU: memory leak?

2009-03-29 Thread Manlio Perillo

Hi.

As with a previous post, I think I have found a possible memory problem 
with the uvector package.


I have this data structure (for, again, my Netflix Prize project):

IntMap (UArr (Word16 :*: Word8))

I was adding elements to the map using something like:

v = map singletonU (a :*: b)

insertWith appendU k v m

However doing this eats a *lot* of memory.

Today I have rewritten my program to use `alter` and `snocU`:

append Nothing = Just $ singletonU (a :*: b)
append (Just u) = Just $ snocU u (a :*: b)

alter append k m


This, finally, works.
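The alter-based pattern above can be sketched with ordinary lists standing in for `UArr` (an assumption, since the long-deprecated `uvector` package may not be available; `addRating` is a hypothetical name for the step performed per input record):

```haskell
import qualified Data.IntMap as IM

-- One rating: (movie id, rating value), a stand-in for Word16 :*: Word8.
type Rating = (Int, Int)

-- Insert a rating for a customer, creating the entry on first sight;
-- this mirrors the `alter append` step from the message above.
addRating :: Int -> Rating -> IM.IntMap [Rating] -> IM.IntMap [Rating]
addRating customer r = IM.alter append customer
  where
    append Nothing   = Just [r]
    append (Just rs) = Just (r : rs)
```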

Unfortunately I'm still not able to load the entire Netflix Prize 
training data set, grouping ratings by customers, because my PC has only 
2 GB of RAM.
The required memory is about 2 GB, but when the system starts to swap, I 
have to kill the program.



So the question is: why appending an array of only one element to an 
existing array causes memory problems?


This should be practically the same as adding an element.



Thanks  Manlio


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-03-29 Thread Manlio Perillo

Don Stewart ha scritto:

[...]
So the question is: why appending an array of only one element to an  
existing array causes memory problems?



It must copy the entire array.



Isn't it the same with snocU?

And, since the final result is the same, what happens to the temporary 
memory used for array copying?


I have executed the program with:
   +RTS -A128M -s -c -F1.1 -RTS

The memory seems to leak.


-- Don




Thanks  Manlio


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-03-29 Thread Manlio Perillo

Manlio Perillo ha scritto:

Hi.

As with a previous post, I think I have found a possible memory problem 
with the uvector package.


I have this data structure (for, again, my Netflix Prize project):

IntMap (UArr (Word16 :*: Word8))

[...]

Today I have rewritten my program to use `alter` and `snocU`:

append Nothing = Just $ singletonU (a :*: b)
append (Just u) = Just $ snocU u (a :*: b)

alter append k m


 [...]

Unfortunately I'm still not able to load the entire Netflix Prize 
training data set, grouping ratings by customers, because my PC has only 
2 GB of RAM.
The required memory is about 2 GB, but when the system starts to swap, I 
have to kill the program.




Just another small strictness hint:

append (Just u) = u `seq` Just $ snocU u (a :*: b)

and now memory usage is 955 MB.


Regards  Manlio


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-03-29 Thread Manlio Perillo

Claus Reinke ha scritto:

IntMap (UArr (Word16 :*: Word8))

I was adding elements to the map using something like:

v = map singletonU (a :*: b)

insertWith appendU k v m

However doing this eats a *lot* of memory.


Since 'insertWith' doesn't actually do the 'appendU', the appends
will also be compressed in time, at point of use, rather than spread
out in time, over construction. Which might be inconvenient if large
volumes of data are involved.



Ah, ok; that may explain the problem, but I'm still not sure.


With this program:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3063

Memory usage seems ok.
955 MB, against 622 MB when ratings are grouped by movies
(  using [(MovieID, UArr (Word32 :*: Word8))]  )


The version using `insertWith appendU` is here:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3063#a3065


I still cannot explain such a big difference in memory usage.

So the question is: why appending an array of only one element to an  
existing array causes memory problems?


It must copy the entire array.


And doing this repeatedly, with arrays of increasing length, would
not be a good idea. 


I can't really see other solutions.


For single-threaded use, one might use fusion
to avoid the construction/recopying of the intermediate arrays.


Fusion is not possible, in my case, IMHO.
I'm parsing movie ratings, where we have customer ratings for each 
movie in separate files (17770 total movies).


Each file has the format:
movie1:
customer1,rating1,date1
customer2,rating2,date2
...
movie2:
customer3,rating3,date3
...


I want to group ratings by customers, instead of grouping them by movies.
Stream fusion is only possible in the latter case (but in this case I 
don't even need an IntMap).
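A sketch of that regrouping with plain containers (names are hypothetical; in the real program the per-movie data comes from 17770 parsed files, and lists stand in for the unboxed arrays):

```haskell
import qualified Data.IntMap as IM
import Data.List (foldl')

-- Input: one (movieId, [(customerId, rating)]) entry per movie file.
-- Output: the same ratings regrouped by customer as (movieId, rating).
-- Note: the lazy insertWith here is exactly the kind of call that leaks
-- thunks on large inputs, as discussed in this thread.
groupByCustomer :: [(Int, [(Int, Int)])] -> IM.IntMap [(Int, Int)]
groupByCustomer movies = foldl' step IM.empty flat
  where
    flat = [ (cust, (movie, rating))
           | (movie, ratings) <- movies
           , (cust, rating)   <- ratings ]
    step m (cust, r) = IM.insertWith (++) cust [r] m
```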



Am I missing something?

 [...]


Manlio


Re: [Haskell-cafe] uvector package appendU: memory leak?

2009-03-29 Thread Manlio Perillo

Don Stewart ha scritto:

manlio_perillo:

Don Stewart ha scritto:

[...]
So the question is: why appending an array of only one element to an  
existing array causes memory problems?


It must copy the entire array.


Isn't it the same with snocU?

And, since the final result is the same, what happens to the temporary  
memory used for array copying?


I have executed the program with:
   +RTS -A128M -s -c -F1.1 -RTS

The memory seems to leak.


Send me a test case.



http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3071

But Claus was right, appendU is lazy; this seems to be the cause of the 
problem.


However now I don't really understand why the two implementations 
differ in laziness.


Or, to ask a different question, how can I make the version using 
insertWith strict?
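One possible answer (a sketch, not from the thread): use the strict flavour of `insertWith`, which forces the combined value before storing it. In the `containers` of that era this was `Data.IntMap.insertWith'`; in modern `containers` it lives in `Data.IntMap.Strict`:

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- The strict insertWith forces the combined value to WHNF before it is
-- stored, so no chain of pending thunks builds up inside the map.  For
-- an unboxed array, forcing to WHNF makes each append copy happen
-- immediately instead of piling up as suspended appendU calls.
countUp :: [Int] -> IM.IntMap Int
countUp = foldl' (\m k -> IM.insertWith (+) k 1 m) IM.empty
```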




Thanks  Manlio


[Haskell-cafe] g++ std:map vs GHC IntMap

2009-03-26 Thread Manlio Perillo

Hi.

I have tried to compare the performance of the g++ std::map versus the 
GHC IntMap.


The test consists of adding 1000 elements to an empty map.

Haskell code is here:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2899

C++ code is here:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2900


The execution time and CPU usage are almost the same.
However the C++ version requires 305 MB, the GHC version 617 MB.

gcc 4.3.2
ghc 6.8.2
on Debian Linux Lenny, i386.


Is it really not possible to optimize memory usage in cases like this?


I also tried with Data.HashTable:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2902

but memory usage is 703 MB, and execution time is about 4.5 times slower!
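For reference, a minimal sketch of the insertion benchmark against the strict IntMap API (an assumption: `Data.IntMap.Strict` from modern `containers`, which postdates this thread; actual memory numbers are machine- and GHC-dependent):

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- Insert n key/value pairs with a strict left fold, so the map is
-- forced as it grows instead of accumulating a chain of insert thunks.
buildMap :: Int -> IM.IntMap Int
buildMap n = foldl' (\m k -> IM.insert k k m) IM.empty [1 .. n]
```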


Thanks  Manlio Perillo




Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-26 Thread Manlio Perillo

Claus Reinke ha scritto:

Continuing our adventures into stylistic and semantic differences:-)



Can you write this analysis on the wiki?

Thanks!

Manlio


Re: [Haskell-cafe] g++ std:map vs GHC IntMap

2009-03-26 Thread Manlio Perillo

Brandon S. Allbery KF8NH ha scritto:

On 2009 Mar 26, at 11:39, Manlio Perillo wrote:

The execution time and CPU usage is almost the same.
However the C++ version requires 305 MB, the GHC version 617 MB.


I wonder how much of that is due to lifting (i.e. laziness).



http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2911

Performance is the same, but I'm not sure if this removes all laziness.

I have changed
ins t (k,x)  = insert k x t
to
ins t (k,x)  = k `seq` x `seq` insert k x t


Manlio


Re: [Haskell-cafe] g++ std:map vs GHC IntMap

2009-03-26 Thread Manlio Perillo

Bulat Ziganshin ha scritto:

Hello Manlio,

Thursday, March 26, 2009, 6:39:12 PM, you wrote:


The test consists in adding 1000 elements to an empty map.


+RTS -c -F1.1

then read about garbage collection




It now requires 386 MB of memory, but is 4.7 times slower.

So now the required memory is about the same as for the C++ version, but how 
can I optimize memory usage without having to tweak the garbage collector?



Thanks  Manlio



[Haskell-cafe] Re: generalized shuffle

2009-03-25 Thread Manlio Perillo

o...@okmij.org ha scritto:

Hello!


The reason is that in a project I need to extract random elements
from a list (and I need to extract a lot of them), and using normal
methods [1] seems to be rather inefficient.


Note that I have used an incorrect term, sorry.
What I need, in detail, is to:
Given a population of 100480507 elements, extract 17770 random samples, 
where each sample has a size between 1 and 480189.


Yes, it is again my Netflix Prize project; here I'm trying to write a 
program to generate the entire training data set using random data.



[1] by normal method I mean: extract a random number, in the range
 [0, n] (where n is the length of the sequence), and get the element
 indexed by that number.


BTW, have you tried to use Data.IntMap? That is, put the sequence into
an IntMap (each element being indexed by its index) and then extract
elements at random; IntMap seems to be quite efficient...



This should be ok if I need independent random choices from a 
population, but I need random samples, instead, sorry for the confusion.


Do you think that it is possible to further optimize your tree 
structure? Or to use IntMap, instead?
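One Map-based alternative can be sketched like this (hypothetical helper name; deterministic indices stand in for the random draws r[i], each expected in [0 .. size-1-i]). `Data.Map` offers rank-based access, which `Data.IntMap` lacks:

```haskell
import qualified Data.Map as M

-- Sample without replacement: each index selects one of the remaining
-- elements by rank, then removes it (O(log n) per draw via elemAt and
-- deleteAt), so the structure is built only once.
sampleAt :: [Int] -> [a] -> [a]
sampleAt rs xs = go rs (M.fromDistinctAscList (zip [0 :: Int ..] xs))
  where
    go []         _ = []
    go (r : rest) m = let (_, x) = M.elemAt r m
                      in x : go rest (M.deleteAt r m)
```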



Cheers,
Oleg


Thanks   Manlio


Re: [Haskell-cafe] Re: about Haskell code written to be too smart

2009-03-25 Thread Manlio Perillo

Heinrich Apfelmus ha scritto:

Manlio Perillo wrote:

Conal Elliott ha scritto:

Manlio,

We live in the age of participation -- of co-education.  Don't worry
about text-books.  Contribute to some wiki pages  blogs today that
share these smart techniques with others.


When I started learning Haskell (on my own initiative), what I did was:

[steps 1) - 9), mostly internet tutorials ]


I think you'd have had a much easier time by starting with a proper book
right away, like Richard Bird's Introduction to Functional Programming
in Haskell, accompanied by Real World Haskell. 


Unfortunately, one year ago Real World Haskell was not available yet.
And note that I have no problems with basic functional programming concepts.
My problems are specific to Haskell.


You see, the reason that
books cost money is (should be) high quality content. :)




Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-25 Thread Manlio Perillo

wren ng thornton ha scritto:

Manlio Perillo wrote:
[...]
Following directly from the Rule of Least Power, if you can get away 
with foreach then that's what you should use. Why? Because the less 
power the construct has, the fewer corner cases and generalizations a 
reader of the code needs to consider. Now, just because iterators exist 
does not mean that one should never use the more general tool. If you're 
fighting to break out of your chosen straitjacket, then chances are it's 
the wrong one to use in the first place; it'd be clearer to use more 
power and have less fighting.




 [...]


Note that, as I have already written, I agree with you.
And this is one of the reasons i like Haskell.

The main problem, here, is that:
- recursion and pattern matching are explained in every tutorial about
  functional programming and Haskell.

  This is the reason why I find them more natural.

- high level, Haskell specific, abstractions, are *not* explained in
  normal tutorials or books.
  The libraries where these concepts are implemented, are not well
  documented.
  Most of the documentation is in research papers, and a normal
  programmer doesn't want to read these papers.

  Only in the recent Real World Haskell have all these high level
  abstractions been properly documented.


Manlio


[Haskell-cafe] a cabal package with both a library and executable programs

2009-03-25 Thread Manlio Perillo

Hi.

This example is taken from the Cabal documentation:

Name:    TestPackage
...

Library
  Build-Depends:   HUnit
  Exposed-Modules: A, B, C

Executable program1
  Main-Is: Main.hs
  Hs-Source-Dirs:  prog1
  Other-Modules:   A, B

Executable program2
  Main-Is: Main.hs
  Hs-Source-Dirs:  prog2
  Other-Modules:   A, C, Utils


Here I see a small problem.
Why should I explicitly declare modules from the library that are used 
in the executables?


I would like to do, instead

...

Executable program1
  Main-Is: Main.hs
  Hs-Source-Dirs:  prog1
  build-depends:   TestPackage

Executable program2
  Main-Is: Main.hs
  Hs-Source-Dirs:  prog2
  build-depends:   TestPackage
  Other-Modules:   Utils


I tried this, but configuration fails, since Cabal does not find the 
TestPackage.

It is true that it does not yet exist, but we are going to build it!
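For what it's worth, later Cabal releases (1.8 and newer, if I recall correctly) support exactly this: an executable may list its own package's library in `build-depends`. A hedged sketch of what that looks like:

```cabal
Name:           TestPackage
Version:        0.1
Cabal-Version:  >= 1.8
Build-Type:     Simple

Library
  Exposed-Modules: A, B, C

Executable program1
  Main-Is:         Main.hs
  Hs-Source-Dirs:  prog1
  Build-Depends:   TestPackage
```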


Should I file a feature request ticket, or is this how it is supposed to 
work?




Thanks  Manlio


[Haskell-cafe] Equations for `foo' have different numbers of arguments

2009-03-24 Thread Manlio Perillo

Hi.

There is a limitation, in Haskell, that I'm not sure to understand.
Here is an example:

module Main where

divide :: Float -> Float -> Float
divide _ 0 = error "division by 0"
divide = (/)

main = do
  print $ divide 1.0 0.0
  print $ divide 4.0 2.0



With GHC I get:
Equations for `divide' have different numbers of arguments

With HUGS:
Equations give different arities for divide


However the two equations really have the same number of arguments.

What's the problem?



Thanks  Manlio Perillo


[Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Hi.

In these days I'm discussing with some friends, that mainly use Python 
as programming language, but know well other languages like Scheme, 
Prolog, C, and so.


These friends are very interested in Haskell, but it seems that the main 
reason why they don't start to seriously learning it, is that when they 
start reading some code, they feel the Perl syndrome.


That is, code written to be too smart, and that end up being totally 
illegible by Haskell novice.


I too have this feeling, from time to time.


Since someone is starting to write the Haskell coding style, I really 
suggest him to take this problem into strong consideration.



Manlio


Re: [Haskell-cafe] Equations for `foo' have different numbers of arguments

2009-03-24 Thread Manlio Perillo

Bulat Ziganshin ha scritto:

Hello Manlio,

Tuesday, March 24, 2009, 5:44:14 PM, you wrote:


divide _ 0 = error "division by 0"
divide = (/)



Equations for `divide' have different numbers of arguments


you should write

divide a b = a/b



Right.
But my question was: why can't I write the function as I did?
Why can't I write an equation in the curried form?


Thanks (and thanks for the other responses, I will reply only here)
Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Tim Newsham ha scritto:
These friends are very interested in Haskell, but it seems that the 
main reason why they don't start to seriously learning it, is that 
when they start reading some code, they feel the Perl syndrome.


That is, code written to be too smart, and that end up being totally 
illegible by Haskell novice.


I too have this feeling, from time to time.

Since someone is starting to write the Haskell coding style, I really 
suggest him to take this problem into strong consideration.


When you think about it, what you are saying is that Haskell programmers 
shouldn't take advantage of the extra tools that Haskell provides. 


No, I'm not saying this.

But, as an example, when you read a function like:

buildPartitions xs ns = zipWith take ns . init $ scanl (flip drop) xs ns

that can be rewritten (argument reversed) as:

takeList :: [Int] -> [a] -> [[a]]
takeList [] _ =  []
takeList _ [] =  []
takeList (n : ns) xs  =  head : takeList ns tail
where (head, tail) = splitAt n xs

I think that there is a problem.

The buildPartition contains too many blocks.
And I have read code with even more blocks in one line.
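For reference, the two formulations can be checked to agree on typical inputs (a sketch; note they take their arguments in opposite order, and the bindings named `head`/`tail` in the message are renamed `hd`/`tl` here to avoid shadowing the Prelude):

```haskell
-- The dense one-liner from the thread.
buildPartitions :: [a] -> [Int] -> [[a]]
buildPartitions xs ns = zipWith take ns . init $ scanl (flip drop) xs ns

-- The explicit recursive version from the thread.
takeList :: [Int] -> [a] -> [[a]]
takeList []       _  = []
takeList _        [] = []
takeList (n : ns) xs = hd : takeList ns tl
  where (hd, tl) = splitAt n xs
```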

It may not be a problem for a seasoned Haskell programmer, but when 
you write some code, you should never forget that your code will be read 
by programmers who may not be at your level.


I think that many Haskell programmers forget this detail, and IMHO this 
is wrong.


Haskell provides the ability to abstract code beyond what many other 
programming systems allow.  This abstraction gives you the ability to 
express things much more tersely.  This makes the code a lot harder to 
read for people who are not familiar with the abstractions being used.  


The problem is that I still have problems reading and understanding 
code that is too terse...
Because I have to assemble each block in my mind, and if there are too 
many blocks I have problems.


 [...]


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Jake McArthur ha scritto:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Manlio Perillo wrote:
| These friends are very interested in Haskell, but it seems that the main
| reason why they don't start to seriously learning it, is that when they
| start reading some code, they feel the Perl syndrome.
|
| That is, code written to be too smart, and that end up being totally
| illegible by Haskell novice.
|
| I too have this feeling, from time to time.

I used to think this as well, but have since changed my mind about most
cases. 


The same for me.

 [...]

All these factors combined just means that you have to concentrate
just as hard to understand one line of Haskell as you might 10 or 20
lines of other languages. There is 10 or 20 times the amount of 
information.




This is right.
The problem is that often (IMHO) a function definition can be rewritten 
so that it is much more readable.


As an example, with the takeList function I posted.

In other cases, you can just break long lines, introducing intermediate 
functions that have a descriptive name *and* a type definition.


Doing this is an art, but a coding style for Haskell should try to 
document this.


 [...]


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Jake McArthur ha scritto:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Manlio Perillo wrote:
| This is right.
| The problem is that often (IMHO) a function definition can be rewritten
| so that it is much more readable.
|
| As an example, with the takeList function I posted.

I looked at it, found nothing wrong with the original, and absolutely
hated your fixed version. 


With the original version, you have to follow 3 separate operations:

Prelude> let xs = [1, 2, 3, 4] :: [Int]
Prelude> let ns = [3, 1] :: [Int]
Prelude> let _1 = scanl (flip drop) xs ns
Prelude> let _2 = init _1
Prelude> let _3 = zipWith take ns _2


With my function, instead, you only have to follow 1 operation:

Prelude> (head, tail) = splitAt n xs

 [...]


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Yitzchak Gale ha scritto:

[...]
So the bottom line is that Manlio is right, really. It's just
that Haskell is still very different than what most
programmers are used to. So it does take a while to
get a feeling for what is too smart.



Right, you centered the problem!

The problem is where to place the separation line between "normal" and 
"too smart".


Your function is readable, once I mentally separate each step.
For someone with more experience, this operation may be automatic, and 
the function may appear totally natural.


When writing these dense functions, it is important, IMHO, to help the 
reader using comments, or by introducing intermediate functions.



Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Jake McArthur ha scritto:

[...]
| With my function, instead, you only have to follow 1 operation:
|
| Prelude (head, tail) = splitAt n xs

I think you are way oversimplifying your own code.

takeList :: [Int] -> [a] -> [[a]]
takeList [] _ =  []
takeList _ [] =  []
takeList (n : ns) xs  =  head : takeList ns tail
  where (head, tail) = splitAt n xs

In order to understand this, I have to look at three different cases, an
uncons, a splitAt, a cons, *and* a recursive call. This is *seven*
different things I have to absorb.


These cases are, IMHO, more natural.

We have a set of equations, pattern matching and recursion.
These are some of the basic building blocks of Haskell.

The only foreign building block is the splitAt function.

But this may be really a question of personal taste or experience.
What is more natural?

1) pattern matching
2) recursion
or
1) function composition
2) high level functions

?

 [...]


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Conal Elliott ha scritto:
Another helpful strategy for the reader is to get smarter, i.e. to 
invest effort in rising to the level of the writer.   Or just choose a 
different book if s/he prefers.  - Conal




This strategy is doomed to failure, unfortunately.
We live in the real world, compromises are necessary.

 [...]


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Jonathan Cast ha scritto:

[...]

I think, in general, the best way to document the purpose of the
function is

-- | Split a list into a sequence of partitions of specified
-- length.
takeList :: [Int] -> [a] -> [[a]]



Note that I was not speaking about the best way to document a function.

I was speaking about the best way to write a function, so that it may 
help someone who is learning Haskell.


 [...]

Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Zachary Turner ha scritto:

[...]

 but I do understand that one of the primary use
cases and/or motivating factors for using Haskell is when you really 
just NEED that extra abstraction and power you get from being able to do 
these types of things.  Someone once said that simple problems should 
be simple and difficult problems should be possible.  That doesn't mean 
the difficult problems become EASY.  One of the best uses for haskell is 
solving difficult problems.  It's obviously still going to be difficult 
to solve, and as such the writer (and hence by extension the reader) is 
going to have to be smart as well. 



I agree with you, and in fact I'm still learning Haskell.
The reason I'm still learning Haskell is because I like its syntax.
And yes, I also like the ability to write efficient functions by 
composing other functions.


But there is a limit.
In C you have the ability to write assembler code, but one usually thinks 
twice before doing so, since it will become unreadable to most of the 
people.


If you think that writing low level assembler code is the best solution, 
you should at least document it well, instead of assuming that the 
reader is as smart as you.



As I have written at the beginning of the thread, there are people I know 
(*much* smarter than me) that keep themselves away from Haskell 
because they start to read some code, and they feel something is wrong.


They *think*: "ah, the author wrote code in this way just to show how 
smart he is; how can I learn a language if most of the available code is 
written in this way?"


Note the use of the verb think.
This is only a sensation, and it is wrong; but sensations are important.


 [...]


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Erik de Castro Lopo ha scritto:

Manlio Perillo wrote:

I was speaking about the best way to write a function, so that it may 
help someone who is learning Haskell.


I've been learning Haskell for about 3 months.

I think it's a mistake to write code so that it's easy for someone
learning Haskell to read it. Code should be written to be easily
read by other experienced users of the language.



Note that writing code so that it's easy to read does not mean rewriting 
the code as I did in the example.


It also means to add good comments, in the right places.


Erik


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Conal Elliott ha scritto:
I'd love to help newbies get the hang of Haskell without having to jump 
in the deep (and smart-infested) end first.  And I'd love for people to 
keep writing smart code for non-newbies to enjoy.


Perhaps a practical suggestion would be some wiki pages devoted to 
pointing out code with various learning qualities, to help haskellers of 
all levels of experience learn effectively.




Yes, this is a good start.

Advice to people learning Haskell about how to learn reading code.
And advice to experienced Haskell programmers about how to document 
their code so that it may help less experienced programmers.


IMHO, this should also go in the future Haskell coding style.

 [...]


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Dan Piponi ha scritto:

Miguel Mitrofanov wrote:

takeList = evalState . mapM (State . splitAt)



However, ironically, I stopped using them for pretty
much the same reason that Manlio is saying.


Are you saying there's a problem with this implementation? It's the
only one I could just read immediately. 


Yes, you understand it immediately once you know what a state monad is.
But how well is the state monad introduced, explained, and emphasized in 
current textbooks?


When I started learning Haskell, the first thing I learned was recursion 
and pattern matching.


So, this may be the reason why I find my takeList solution more readable.
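For reference, the state-monad one-liner from the thread, adapted to modern `transformers` (an assumption: the old `State` data constructor no longer exists and is replaced by the lowercase `state` function):

```haskell
import Control.Monad.Trans.State (evalState, state)

-- Each splitAt consumes a prefix of the list, which is threaded through
-- the computation as the state; mapM collects the prefixes.
takeListS :: [Int] -> [a] -> [[a]]
takeListS = evalState . mapM (state . splitAt)
```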


 [...]


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Conal Elliott ha scritto:

Manlio,

We live in the age of participation -- of co-education.  Don't worry 
about text-books.  Contribute to some wiki pages  blogs today that 
share these smart techniques with others.




When I started learning Haskell (on my own initiative), what I did was:

1) Quick reading of the first tutorial I found on the wiki.
   http://darcs.haskell.org/yaht/yaht.pdf, if I remember correctly

2) Quick reading the Haskell Report

3) Reading another tutorial:
   http://www.haskell.org/tutorial/

4) Reading again the Haskell Report

5) A lot of time spent finding good tutorials.
   Yet, I did not know what monads were; I just
   felt that monads were some strange and advanced feature.

... A period where I stopped looking at Haskell

6) Found some good tutorial about what monads are, but still I did not
   know anything about state monads, monad transformers, and so on.

... Another period where I stopped looking at Haskell

7) The Real World Haskell book.
   Finally, all the advanced concepts in one book.

   I read the book online.
   I found the book good, but I think it is too dispersive in some
   chapters.
   I already forgot some of the concepts I read, mostly because in some
   chapter I get annoyed, and started skipping things, or reading it
   quickly.

   I will buy a copy in May, at PyCon Italy
   (where there will be a stand by O'Reilly), so that I can read it
   again.

8) New impetus at learning Haskell.
   I read again the Haskell Report, and the
   A Gentle Introduction to Haskell.

   I finally started to understand how things work.

9) Started to write some real code.

   Now I'm able to understand much of the code I read,
   but for some kinds of code I still have problems.


Manlio


Re: [Haskell-cafe] about Haskell code written to be too smart

2009-03-24 Thread Manlio Perillo

Conal Elliott ha scritto:

And advices to experienced Haskell programmers about how to document
their code so that it may help less experienced programmers.


Manlio -- You may be missing the point of my suggestion, 


Ah, sorry.

which is to 
help people *find* code that suits them, rather than changing anyone's 
coding style.  Optimizing code for one segment of readers is pessimizing 
it for another.  Instead of dumbing down the smart code, I'd like to 
help your friends to help each other find dumber code, *and* to help 
others of us find smarter code.





This may be hard to do.

However, I already suggested starting with the Prelude code from the 
Haskell Report.



Manlio


Re: [Haskell-cafe] System.Random.Shuffle fix

2009-03-23 Thread Manlio Perillo

friggin friggin ha scritto:
I was looking for a shuffling algorithm to shuffle mp3 playlists, so I was 
very happy to see System.Random.Shuffle:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/random-shuffle-0.0.2

However I get errors ("non-exhaustive patterns in function shuffleTree or 
extractTree") depending on how I call it. Errors are at the bottom.




During the build you should only get warnings.
Non-exhaustive patterns are OK; you hit them only if the input data is 
incorrect.


Probably in these cases, error should be used.

I fixed it but I don't have the math skills to see if I perhaps broke it 
statistically ...


Here is my fix, someone (don't remember who, helped me a little):
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2789#a2789
the shuffle at the end is with the fix.


*Freet> S.shuffle [1..10] [1..3]


Your input is not correct.
If you read the source code (in a future version I'll add Haddock support):

-- Given a sequence (e1,...en) to shuffle, and a sequence
-- (r1,...r[n-1]) of numbers such that r[i] is an independent sample
-- from a uniform random distribution [0..n-i], compute the
-- corresponding permutation of the input sequence.
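That contract can be illustrated with a naive O(n^2) reference implementation (a sketch for illustration only; the package itself uses a balanced tree to get O(n log n)):

```haskell
-- Each r selects, by position, which of the remaining elements comes
-- next; one r is consumed per output element except the last.
shuffleNaive :: [a] -> [Int] -> [a]
shuffleNaive []  _        = []
shuffleNaive [x] _        = [x]
shuffleNaive xs  (r : rs) =
  case splitAt r xs of
    (before, x : after) -> x : shuffleNaive (before ++ after) rs
    _                   -> error "index out of range"
shuffleNaive _ []         = error "r-sequence too short"
```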

I have added a convenience function `shuffle'`, where you just need to 
supply a random number generator.


Note that the shuffle' function contains a bug;
it should return the new random generator:
shuffle' :: RandomGen gen => [a] -> Int -> gen -> ([a], gen)

I'm going to fix it in next version.

 [...]



Regards  Manlio


[Haskell-cafe] generalized shuffle

2009-03-23 Thread Manlio Perillo

Hi.

I have implemented a generalized shuffle function, for the 
random-shuffle package

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/random-shuffle

I have not yet committed the change; before that I would like to receive 
some feedback (especially from the original author of the shuffle 
function, in Cc)


The new function is defined in this example:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2808

Note that it makes use of functions not exported by the 
System.Random.Shuffle module.


I have generalized the original shuffle function in two ways:

1) It is now possible to obtain a random sample from a sequence,
   by doing, as an example:
   shuffles [1..10] [2, 3, 4, 2, 1]

   Note that this is equivalent to taking the first 5 elements of the
   fully shuffled sequence, and the performance is practically the same.

2) It is now possible to pass an infinite r-sequence to the shuffle
   function, as an example:
   shuffles [1..10] $ cycle [9, 8 .. 1]

   The result is an infinite list, with elements cyclically extracted
   from the r-sequence.

   When the r-sequence contains random numbers, we get an infinite
   sequence containing all possible random choices of elements in the
   original sequence.

   Why I'm doing this?
   The reason is that in a project I need to extract random elements
   from a list (and I need to extract a lot of them), and using normal
   methods [1] seems to be rather inefficient.

   Using the `shuffles` function should be much more efficient, since it
   uses a tree, and this tree is built only once.

[1] by normal method I mean: extract a random number in the range
[0, n) (where n is the length of the sequence), and get the element
indexed by that number.

Indexing into a list is very expensive.
And it is not very fast even using arrays (unless you use the
non-portable unsafeRead).
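To make the comparison concrete, the normal method described in [1] amounts to repeated random indexing, i.e. sampling with replacement. A hedged Python sketch (names are mine):

```python
import random

def sample_with_replacement(xs, k, rng=random):
    """Draw k elements by picking a random index in [0, n) each time,
    as described in note [1].  In Haskell each pick costs an O(n) list
    traversal, which is why the tree-based shuffles approach wins."""
    n = len(xs)
    return [xs[rng.randrange(n)] for _ in range(k)]

picks = sample_with_replacement(list(range(100)), 10)
```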




Thanks  Manlio Perillo


[Haskell-cafe] Haskell syntax highlighting support for Trac

2009-03-22 Thread Manlio Perillo

Hi.

I have noticed that, for Haskell projects that use Trac for issue 
tracking, it is not possible to have syntax highlighting for Haskell code.


http://hackage.haskell.org/trac/hackage/wiki/WikiProcessors


Is someone working on this?
I can try to write a Trac plugin or patch to solve this.


Manlio


Re: [Haskell-cafe] Haskell syntax highlighting support for Trac

2009-03-22 Thread Manlio Perillo

Patai Gergely ha scritto:
I'm not sure what your problem is. 


Yes, Haskell support is available in recent versions of Trac.

I was trying with the Trac instance for Hackage/Cabal.

 [...]


Manlio


[Haskell-cafe] Cabal properties with conditionals

2009-03-21 Thread Manlio Perillo

Hi.

Assuming this configuration fragment:

library xxx
cc-options: -Wall

if flag(HAVE_URANDOM)
cc-options:-DHAVE_URANDOM

In case the HAVE_URANDOM flag is defined, what will be the value of the 
used cc-options?

1) -DHAVE_URANDOM
2) -Wall -DHAVE_URANDOM


Thanks  Manlio


[Haskell-cafe] [ANN] random-stream 0.1.1

2009-03-21 Thread Manlio Perillo

I'm pleased to announce a new version of the random-stream package.

In this version I have rewritten the internal API.
Now the package exposes a new System.Random.URandom module, with the 
function:


urandom :: Int -> IO S.ByteString

This function is an interface to the system pseudo-random number generator.

The API of the module System.Random.Stream is left unchanged, but it now 
uses the urandom function, internally.


System.Random.Stream provides a pure interface over urandom, using a 
lazy ByteString of random bytes.

Since it provides an infinite stream of random data, things should be ok.
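The "infinite stream over urandom" design can be mimicked in Python with a generator that pulls fixed-size chunks on demand (the chunk size here is an arbitrary choice of mine, not the package's):

```python
import os
from itertools import islice

def urandom_stream(chunk_size=512):
    """Lazily yield random bytes, refilling from the system generator
    one chunk at a time -- the Python analogue of a lazy ByteString
    backed by the urandom function."""
    while True:
        yield from os.urandom(chunk_size)

first16 = bytes(islice(urandom_stream(), 16))
```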


I have also added support for Windows (older versions may not be 
supported, however [1]).



The package should be available on Hackage.
Mercurial repository is at:
http://hg.mperillo.ath.cx/haskell/random-stream/


Example usage: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2738


[1] It is possible to use OpenSSL as a fallback.
Just set the HAVE_SSL flag during package configuration.
The OpenSSL DLL should be placed where the linker can find it.
I tried to test this, but without success
(I did not put much effort into it).



Regards  Manlio Perillo


Re: [Haskell-cafe] Re: [ANN] ansi-terminal, ansi-wl-pprint - ANSI terminal support for Haskell

2009-03-21 Thread Manlio Perillo

Achim Schneider ha scritto:

Manlio Perillo manlio_peri...@libero.it wrote:


Do you plan to extend the package to any terminal, using terminfo
database?


Are there any non-ansi terminals left? I assumed they were extinct...
it'd come close to using EBCDIC.



http://groups.google.com/group/comp.unix.shell/browse_thread/thread/a1a088da1032f336/b3862f039688f0f8


Regards  Manlio


Re: [Haskell-cafe] [ANN] ansi-terminal, ansi-wl-pprint - ANSI terminal support for Haskell

2009-03-21 Thread Manlio Perillo

Max Bolingbroke ha scritto:

2009/3/21 Manlio Perillo manlio_peri...@libero.it:

Max Bolingbroke ha scritto:

These two packages allow Haskell programs to produce much richer
console output by allowing colorisation, emboldening and so on.


Do you plan to extend the package to any terminal, using terminfo database?


Manlio,

I don't plan on an extension like this myself, as assuming ANSI
support is enough for all the applications I'm interested in, and I
suspect dealing with terminfo stuff will be a headache. Sorry!



I think, instead, that it should not be that hard.
After all, the C API can be wrapped in pure functions.
And you can also write a pure Haskell interface to the terminfo database.



Cheers,
Max




Regards  Manlio


Re: [Haskell-cafe] Cabal properties with conditionals

2009-03-21 Thread Manlio Perillo

Duncan Coutts ha scritto:

On Sat, 2009-03-21 at 14:26 +0100, Manlio Perillo wrote:

Hi.

Assuming this configuration fragment:

library xxx
 cc-options: -Wall

 if flag(HAVE_URANDOM)
 cc-options:-DHAVE_URANDOM

In case the HAVE_URANDOM flag is defined, what will be the value of the 
used cc-options?

1) -DHAVE_URANDOM
2) -Wall -DHAVE_URANDOM


The latter. Try it.

In general all fields get `mappend`ed which for list-like fields means
appending. For single value fields like True/False then latter fields
win.



Ok, thanks.


P.S: I tried to send an email to cabal-devel some days ago, with a 
feature I would like to see in Cabal.

But the mail was never posted to the mailing list.
Is that list moderated?
Should I simply fill a ticket?



Duncan




Manlio


Re: [Haskell-cafe] Cabal properties with conditionals

2009-03-21 Thread Manlio Perillo

Duncan Coutts ha scritto:

On Sat, 2009-03-21 at 23:05 +0100, Manlio Perillo wrote:

P.S: I tried to send an email to cabal-devel some days ago, with a 
feature I would like to see in Cabal.

But the mail was never posted to the mailing list.
Is that list moderated?


It's subscriber only, like all the haskell.org mailing lists. We got too
much spam otherwise. :-(



But I did subscribe!


Should I simply fill a ticket?


Yes, thanks very much.



Ok.

I posted some details in this thread:
http://www.haskell.org/pipermail/haskell-cafe/2009-March/057923.html


Duncan




Manlio


Re: [Haskell-cafe] random shuffle and random list partition

2009-03-20 Thread Manlio Perillo

Yitzchak Gale ha scritto:

Hi Manlio,

Manlio Perillo wrote:

For my Netflix Prize project I have implemented two reusable modules.
The first module implements a random shuffle on immutable lists...
The second module implements a function used to partition a list into n
sublists of random length.


[...]



As you point out, your partition algorithm is not fair.
Using your Random.Shuffle and a well-know trick
from combinatorics, you can easily get a fair
partitions function:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485#a2495



Someone added an alternative implementation:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485#a2497


Regards  Manlio


Re: [Haskell-cafe] Announce: language-python

2009-03-19 Thread Manlio Perillo

Bernie Pope ha scritto:
Language-python provides a parser (and lexer) for Python written in 
Haskell. Currently it only supports version 3 of Python (the most recent 
version), but it will support version 2 in the future.




Great, thanks!
I was planning to write a Python parser by myself, since I would like to 
do some AST processing, and Haskell is IMHO a better choice.


 [...]


Manlio Perillo


[Haskell-cafe] [ANN] random-stream package

2009-03-19 Thread Manlio Perillo

Hi.

I'm pleased to announce the availability of my random-stream package.
The cabalized package is available at:
http://haskell.mperillo.ath.cx/random-stream-0.0.1.tar.gz

Note that I have not uploaded it to Hackage, and I do not plan to 
upload it in the near future, at least until I consider the package 
mature enough.


From package description:
Portable interface for the operating system source of pseudo
random data.

Supported sources are Unix /dev/urandom, Win32 CryptGenRandom and
OpenSSL pseudo random numbers generator.

This package is based on idea from os.urandom implementation, in
CPython.


The idea is to view the system pseudo-random generator as an infinite 
stream of random bytes.


So, I have used a lazy bytestring, and
unsafePerformIO/unsafeInterleaveIO.

The underlying data is:
newtype Stream = Stream L.ByteString

and the constructor *is exposed*.


The package configuration *must* be done using a flag to specify the 
source to use.


As an example:

runghc Setup.hs configure -O2 -fHAVE_URANDOM
runghc Setup.hs configure -O2 -fHAVE_SSL
runghc Setup.hs configure -O2 -fHAVE_WIN32_CRYPT


If the flag is not specified, the compilation will fail.
Note that Windows is not yet supported.

The stream generator implements tha RandomGen interface.



Some notes:
1) The system *pseudo* random generator is used.
   As an example, on Linux, /dev/random produces true random numbers,
   but it may block if there is not enough entropy in the system.
   More details on
   http://en.wikipedia.org/wiki//dev/random
2) When reading data, the lazy bytestring chunk size is used.
   The default chunk size, however, is fine when reading from a regular
   file, but may not be the best choice when reading pseudo random
   numbers.

   For SSL (and Windows) support, the chunk size can be easily changed;
   but for Unix support this seems to not be possible, unless I
   reimplement hGetContents.

   The Data.ByteString.Lazy module has an hGetContentsN function.
   Unfortunately it is not exposed.
   I find this annoying; is it possible to export the *N functions
   from Data.ByteString.Lazy.Internal?
3) I have tested the package on Debian Linux Etch, using GHC 6.8.2.


Feedback will be appreciated.
I'm not sure about some details.


Usage example:

module Main where

import System.Random
import System.Random.Stream


gen = mkStream
l :: [Int]
l = randoms gen

main = do
  print l




Manlio Perillo


[Haskell-cafe] doubts about runGetState in the binary package

2009-03-19 Thread Manlio Perillo

Hi.

I have some doubts about the runGetState function in the binary package.
The signature is:
runGetState :: Get a -> LBS -> Int64 -> (a, LBS, Int64)


However, the Int64 input parameter is not documented.
What value should I pass?
How will it be used?


Thanks  Manlio Perillo


Re: [Haskell-cafe] Re: doubts about runGetState in the binary package

2009-03-19 Thread Manlio Perillo

ChrisK ha scritto:

Manlio Perillo wrote:

Hi.

I have some doubts about the runGetState function in the binary package.
The signature is:
runGetState :: Get a -> LBS -> Int64 -> (a, LBS, Int64)


however the Int64 input parameter is not documented.
What value should I pass?
How will be used?


 [...]


hackage has the code at
http://hackage.haskell.org/packages/archive/binary/0.5.0.1/doc/html/src/Data-Binary-Get.html#runGetState 



Yes, and reading the code was the first thing I did.
And (after some testing) I figured out how it works.

However, I wanted to be sure I understood it since, as I have written, 
IMHO it is not clearly documented; and I can't see how it can be useful: 
there are no usage examples.
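For what it's worth, my reading of the source is that the Int64 is the number of bytes already consumed before this call: it is simply added to the offset returned in the result, so you pass 0 on the first call and thread the returned value through subsequent calls. A loose Python analogue (struct stands in for the Get monad; all names here are mine):

```python
import struct

def run_get_state(fmt, data, consumed):
    """Decode one value from the front of data and return
    (value, remaining bytes, running byte offset)."""
    size = struct.calcsize(fmt)
    (value,) = struct.unpack_from(fmt, data)
    return value, data[size:], consumed + size

# First call starts at offset 0; later calls thread the offset through.
v1, rest, off = run_get_state('<I', b'\x2a\x00\x00\x00\x07\x00', 0)
v2, rest, off = run_get_state('<H', rest, off)
```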


 [...]



Thanks  Manlio Perillo


Re: [Haskell-cafe] [ANN] random-stream package

2009-03-19 Thread Manlio Perillo

Brent Yorgey ha scritto:

On Thu, Mar 19, 2009 at 11:55:16AM +0100, Manlio Perillo wrote:
Note that I have not uploaded it to Hackage, and I do not plan to upload it
in the near future, at least until I consider the package mature enough.


I would encourage you to upload it to Hackage regardless of its
supposed maturity, especially if you want feedback.  I'm not sure why
you would be hesitant to upload your package; 


Because I'm very strict with my code.
I don't even release my Python packages on http://pypi.python.org/pypi, 
and I'm much more experienced with Python ;-).




Hackage hosts an
extremely wide range of packages in terms of maturity, and that's a
good thing.  It also provides a very convenient method (in combination
with cabal-install) of disseminating libraries.  I am several orders
of magnitude more likely to try something out if I can just 'cabal
install foo' than if I have to manually download and unpack a tarball
myself, and I doubt I am alone in this.



It should be possible to do:
cabal install http://haskell.mperillo.ath.cx/random-stream-0.0.1.tar.gz

It's unfortunate that this is not supported.


Thanks  Manlio


Re: [Haskell-cafe] [ANN] random-stream package

2009-03-19 Thread Manlio Perillo

Gökhan San ha scritto:

Manlio Perillo manlio_peri...@libero.it writes:


The stream generator implements the RandomGen interface.


This is really cool, though I think 'split' is a must. Maybe all
instances could share the same stream in the background, then it
wouldn't cause resource issues.



I have thought about this, but is it safe?
I feared that this would break referential transparency.


Also, IMHO mkStream should produce an IO Stream (like getStdGen), as
current implementation isn't referentially transparent; let the library
user decide whether to use unsafePerformIO.



The basic idea is that there is this system wide random number 
generator, that is always available.

That's the reason why mkIOStream is hidden.



3) I have tested the package on Debian Linux Etch, using GHC 6.8.2.


Tested with Gentoo Linux on a 64-bit machine using GHC 6.10.1. On my
system, it is less than 2 times slower than StdGen.




Thanks for the report.
I have tested against the pure Mersenne Twister on my 32-bit system, 
and it seems to have the same performance.

I should check against StdGen, too.



Manlio


Re: [Haskell-cafe] [ANN] random-stream package

2009-03-19 Thread Manlio Perillo

Andrew Wagner ha scritto:
Maybe the feature has been added, but not released, because somebody is 
strict about their code...
 


We should first agree on the meaning of release some code...


Manlio


Re: [Haskell-cafe] [ANN] random-shuffle package

2009-03-19 Thread Manlio Perillo

Max Rabkin ha scritto:

On Thu, Mar 19, 2009 at 4:41 PM, Manlio Perillo
manlio_peri...@libero.it wrote:

However, in this case, the package name should be changed.
I'm not sure it is a good idea to release a package that implements only one
function (but I may be wrong).


Personally, I think that there is little harm in releasing a package
if it does something useful in a not-totally-broken way. Especially if
you plan to extend it.



Ok, I will add the package on Hackage, thanks.


One last thing.

Yitzchak Gale kindly posted an example with the implementation of my 
`partition` function:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485
using shuffle'.

Now, I would like to implement a `sample` function, like the one in the 
Python random module:

http://docs.python.org/library/random.html#random.sample

Is an implementation based on shuffle good?
Or is a more efficient implementation possible?

What should the interface of this sample function be?
Should it return the sampled list *and* the original list without the 
sampled elements?

Or is it better to just return the sampled list?


If both partition and sample functions can be implemented efficiently 
using shuffle', then it may be appropriate to add these two functions 
inside the random-shuffle package.
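For the record, the shuffle-based sample I have in mind is just "take m of a full shuffle", returning both halves so the caller can choose either interface. A Python sketch using the standard library shuffle (the name and return shape are my own guesses, not a settled API):

```python
import random

def sample_via_shuffle(xs, m, rng=random):
    """A fair m-element sample: shuffle a copy, split at m.  The second
    component is the original list minus the sampled elements."""
    ys = list(xs)
    rng.shuffle(ys)
    return ys[:m], ys[m:]

picked, rest = sample_via_shuffle(range(10), 3)
```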




Thanks  Manlio


Re: [Haskell-cafe] random shuffle and random list partition

2009-03-17 Thread Manlio Perillo

Yitzchak Gale ha scritto:

[...]
While I think Oleg's tree method is beautiful, in practice
it may be re-inventing the wheel. I haven't tested it, but
I doubt that this implementation is much better than
using the classical shuffle algorithm on an IntMap.


Do you have a working implementation?


It's essentially the same tree inside. That's what I
usually use for this, and it works fine.



Oleg's implementation is rather efficient, but it requires a lot of 
memory for huge lists.


Here, as an example, are two programs, one in Python and one in Haskell.
The default generator in Python uses the Mersenne Twister, returning 
floating-point numbers in the range [0, 1).



# Python version
from random import shuffle

n = 1000
m = 10
l = range(1, n + 1)

shuffle(l)
print l[:m]


-- Haskell version
module Main where

import Random.Shuffle
import System.Random.Mersenne.Pure64 (newPureMT)

n = 1000
m = 10
l = [1 .. n]

main = do
  gen <- newPureMT
  print $ take m $ shuffle' l n gen



The Python version performances are:

real0m16.812s
user0m16.469s
sys 0m0.280s

150 MB memory usage


The Haskell version performances are:

real0m8.757s
user0m7.920s
sys 0m0.792s

800 MB memory usage



In the future I can add an implementation of the random
shuffle algorithm on mutable arrays in the ST monad.


I've tried that in the past. Surprisingly, it wasn't faster
than using trees. Perhaps I did something wrong. Or
perhaps the difference only becomes apparent for
huge lists.



Can you try it on the list I have posted above?



As you point out, your partition algorithm is not fair.
Using your Random.Shuffle and a well-know trick
from combinatorics, you can easily get a fair
partitions function:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485#a2495



Thanks, this is very nice.
I have to run some benchmarks to see if it is efficient.


Regards  Manlio


[Haskell-cafe] can GHC build an executable from a C source file?

2009-03-17 Thread Manlio Perillo

Hi.

I'm checking if it is possible to build an executable from C source files only.

As an example:

#include <stdio.h>

int main () {
    printf("hello world\n");
    return 0;
}


$ ghc --make foo.c


However this only produces the object file, foo.o; it does not build the 
executable file.



What is the reason for this behaviour?
I have tested with GHC 6.8.2.



Thanks  Manlio


Re: [Haskell-cafe] can GHC build an executable from a C source file?

2009-03-17 Thread Manlio Perillo

Anton Tayanovskyy ha scritto:

Works for me without the --make, as `ghc foo.c`



For me, too, thanks.



Manlio


Re: [Haskell-cafe] can GHC build an executable from a C source file?

2009-03-17 Thread Manlio Perillo

Joe Fredette ha scritto:
You know, I hear theres this brilliant program for compiling C code -- 
gcd? ccg? gcc, yah gcc... Anyone tried it?


In all seriousness though, why do you need to compile c with ghc? I'm 
curious, it seems a bit pointless...




It's for a possible extension I'm planning for Cabal.

The idea is to add support for configuration features.
A configuration feature is similar to a configuration flag, but with 
some important differences.



data Feature = Bool | String


A feature block:

Feature HaveURandom
action: execute
include-dirs: ...
c-sources: features/urandom.c
-- other possible properties, as listed in
-- Build information chapter, excluding `buildable` and
-- `other-modules`
-- The `action` property can have values `compile` (default)
-- or `execute`

This means that I'm testing for a feature named HaveURandom, and the 
test requires compiling/building and then executing some code.


In this case there is only a C source file that needs to be compiled; 
this is why I asked whether GHC supports compiling C source files 
only, and creating an executable.



Here I'm asking Cabal to execute the generated program.
If the executable returns 0, then HaveURandom will have value
`Feature True`; if it returns 1, then HaveURandom will have value 
`Feature False`.


If the executable writes something to stdout, then HaveURandom will have 
the value `Feature string`.


If compilation fails, HaveURandom will have the value `Feature False`.
If the `action` property is `compile`, then a successful compilation 
will result in HaveURandom having the value `Feature True`.
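The proposed semantics could be prototyped along these lines (a rough sketch; the function shape, names, and use of subprocess are all mine, not Cabal's API):

```python
import subprocess

def check_feature(build_cmds, run_cmd=None, action="execute"):
    """Evaluate a Feature: failed build -> False; action 'compile' ->
    True on a successful build; otherwise non-empty stdout wins as a
    string, else exit code 0 -> True and anything else -> False."""
    for cmd in build_cmds:
        if subprocess.call(cmd) != 0:
            return False                 # compilation failed
    if action == "compile":
        return True                      # building it was the whole test
    proc = subprocess.run(run_cmd, capture_output=True, text=True)
    out = proc.stdout.strip()
    if out:
        return out                       # Feature <string>
    return proc.returncode == 0          # Feature True / Feature False
```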




Regards  Manlio


Re: [Haskell-cafe] Visualising the Hierarchical Namespace

2009-03-16 Thread Manlio Perillo

Don Stewart ha scritto:

I've just finished a post (and quick tool) for graphing the complete
module namespace of Haskell, taken from the core libraries and all of
Hackage. 


It's quite large:

http://donsbot.wordpress.com/2009/03/16/visualising-the-haskell-universe/


Just a note: isn't it time to switch to SVG images?
All that PNG images are pretty useless, they are too small.



Manlio


Re: [Haskell-cafe] Visualising the Hierarchical Namespace

2009-03-16 Thread Manlio Perillo

Don Stewart ha scritto:

manlio_perillo:

Don Stewart ha scritto:

I've just finished a post (and quick tool) for graphing the complete
module namespace of Haskell, taken from the core libraries and all of
Hackage. 


It's quite large:

http://donsbot.wordpress.com/2009/03/16/visualising-the-haskell-universe/

Just a note: isn't it time to switch to SVG images?
All that PNG images are pretty useless, they are too small.



All the .svg's are linked at the bottom.

Note: they will break your browser



FF handles them quite well.

The problem is that there is too much information :).

By the way: in the SVG files there is a lot of redundant information.
Graphviz should really use CSS instead of SVG presentation attributes.

Using CSS to factor out common styles should reduce image size a lot.
I would like to see a Graphics package for Haskell based on the 
SVG/libVG/Cairo/PDF imaging model.





Manlio


[Haskell-cafe] random shuffle and random list partition

2009-03-16 Thread Manlio Perillo

Hi.

For my Netflix Prize project I have implemented two reusable modules.
The first module implements a random shuffle on immutable lists.
It uses http://okmij.org/ftp/Haskell/perfect-shuffle.txt, with an 
additional wrapper function that has a friendlier interface.


The second module implements a function used to partition a list into n 
sublists of random length.



I have pasted the modules here:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2483
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485

If someone is interested (and if Oleg gives me permission), I can release 
them as a package on Hackage.

I need to improve documentation, however.

In the future I can add an implementation of the random shuffle algorithm on 
mutable arrays in the ST monad.





Manlio


Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types

2009-03-14 Thread Manlio Perillo

Daniel Peebles ha scritto:

I have added UAProd-based Binary instances to my copy of the uvector
repo at http://patch-tag.com/publicrepos/pumpkin-uvector . I have some
extensive tests (added to the existing tests directory) of things in
there and they seem to be mostly sane.



Thanks for the work.

I have a question, however.
Isn't it possible to write binary serialization code for the generic 
Stream type?



Manlio


Re: [Haskell-cafe] Alternative to Data.Binary

2009-03-14 Thread Manlio Perillo

Svein Ove Aas ha scritto:

On Sat, Mar 14, 2009 at 1:37 PM, Grzegorz Chrupala
grzegorz.chrup...@computing.dcu.ie wrote:

Hi all,
Is there a serialization library other than the Data.Binary from hackage?

I am using Data.Binary in a couple of projects, but I have found its stack
and memory usage very hard to control. Its very common that decoding a map
or list of non-trivial size uses up all available RAM, or causes a stack
overflow.


That little problem appears to be an artifact of the particular Binary
instance for lists (the map instance starts by converting to/from a
list), and should be fixable in principle, for example by writing them
in chunks of 256 elements.



I can confirm that reading a serialized UArr from the uvector package 
does not cause memory problems.
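The chunked-list fix described earlier in the thread is easy to sketch: write the list as length-prefixed chunks with an empty chunk as terminator, so the decoder materialises at most one chunk of spine at a time (the chunk size of 256 comes from the thread; the encoding details are mine):

```python
def encode_chunked(xs, chunk=256):
    """Serialize a list as length-prefixed chunks (a flat Python sketch)."""
    out = []
    for i in range(0, len(xs), chunk):
        part = xs[i:i + chunk]
        out.append(len(part))            # chunk length prefix
        out.extend(part)
    out.append(0)                        # empty chunk terminates the list
    return out

def decode_chunked(stream):
    """Inverse of encode_chunked; never needs the total length upfront."""
    xs, i = [], 0
    while True:
        n = stream[i]
        i += 1
        if n == 0:
            return xs
        xs.extend(stream[i:i + n])
        i += n
```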



Manlio


Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types

2009-03-14 Thread Manlio Perillo

Daniel Peebles ha scritto:

I have added UAProd-based Binary instances to my copy of the uvector
repo at http://patch-tag.com/publicrepos/pumpkin-uvector . 



I can confirm that it works for me.

However I have now a memory problem with data decoding.

I need to serialize the Netflix Prize training dataset.
When I parse the data from the original data set, memory usage is 
about 640 MB [1].


But when I load the data serialized and compressed, (as [UArr (Word32 
*:* Word8)]), memory usage is about 840 MB...


The culprit is probably the decoding of the list (17770 elements).



[1] I have written a script in Python that parses the data, and it only
takes 491 MB
(using a list of tuples with two compact arrays from numpy).
So, GHC has memory problems here.




Thanks  Manlio Perillo


Re: [Haskell-cafe] Alternative to Data.Binary

2009-03-14 Thread Manlio Perillo

Don Stewart ha scritto:

grzegorz.chrupala:

Hi all,
Is there a serialization library other than the Data.Binary from hackage?

I am using Data.Binary in a couple of projects, but I have found its stack
and memory usage very hard to control. Its very common that decoding a map
or list of non-trivial size uses up all available RAM, or causes a stack
overflow.


[...]
Have you tried the latest release, which modified the Map and [a]
instances?



I tried it just now.

My [UArr (Word32 :*: Word8)], where the list length is 17770, now 
requires 660 MB of memory, when decoding, against 840 MB with the 
previous version of the binary package.


This is fantastic, thanks!



Regards  Manlio


Re: [Haskell-cafe] bytestring vs. uvector

2009-03-13 Thread Manlio Perillo

Don Stewart ha scritto:

[...]

I think uvector only works with certain types that can be
unboxed, while storablevector works with all types that
instantiate Foreign.Storable.Storable.  I don't know about
vector.  From the description of vector, I have the




One of the nice features of uvector is the support for UArr (a :*: b).

An UArr (a :*: b) can be easily (with fstU and sndU) transformed in
UArr a and UArr b.

The uvector package also supports Complex and Rational; however, the 
support for these types is hard-coded, using a UAProd class, and 
requires some boilerplate code (IMHO).


I find the StorableVector implementation much simpler; I would like 
to see it in the Haskell Platform.


As for Data.Parallel, uvector and vector, it seems there is some code 
duplication.


Both Data.Parallel and uvector make use of a strict pair type.
Such a type is also implemented in the strict package [1].

The authors are the same, so I don't understand the reason for the 
code duplication.



There is also replication in the definition of the Stream data type.



[1] there seems to be an error in the documentation:
http://hackage.haskell.org/packages/archive/strict/0.3.2/doc/html/Data-Strict-Tuple.html

In the description, there is:
Same as regular Haskell pairs, but (x :*: _|_) = (_|_ :*: y) = _|_

but in the synopsis, the data constructor is :!:, not :*:.



Regards   Manlio Perillo


Re: [Haskell-cafe] bytestring vs. uvector

2009-03-13 Thread Manlio Perillo

Don Stewart ha scritto:

bulat.ziganshin:

Hello Don,

Wednesday, March 11, 2009, 12:12:07 AM, you wrote:


Right, so my point stands: there's no difference now. If you can write a
Storable instance, you can write a UA et al instance.

yes, if there is some class provided for this and not just hard-coded
4 or so base types 


That's right. For example (supporting even pairs):

instance (RealFloat a, UA a) => UA (Complex a) where

  newtype UArr  (Complex a)   = UAComplex  (UArr (a :*: a))
  newtype MUArr (Complex a) s = MUAComplex (MUArr (a :*: a) s)



You also have to add an instance for UIO:

instance (RealFloat a, UIO a) => UIO (Complex a) where
    hPutU h (UAComplex arr) = hPutU h arr
    hGetU h = do arr <- hGetU h
                 return (UAComplex arr)


With Storable, this should not be required; you just have to write an 
instance for the Storable class.




Regards  Manlio


Re: [Haskell-cafe] bytestring vs. uvector

2009-03-13 Thread Manlio Perillo

Manlio Perillo ha scritto:

[...]
The uvector package also supports Complex and Rational, however the support 
for these types is hard-wired, using a UAProd class, and requires 
some boilerplate code (IMHO).




Correction: UAProd is not a class, sorry.
It is the UA constructor overloaded for a:*:b, and Complex and Rational 
just reuse this specialization.



 [...]


Manlio


Re: [Haskell-cafe] bytestring vs. uvector

2009-03-13 Thread Manlio Perillo

Don Stewart ha scritto:

[...]

You also have to add instance for UIO:

instance (RealFloat a, UIO a) => UIO (Complex a) where
    hPutU h (UAComplex arr) = hPutU h arr
    hGetU h = do arr <- hGetU h
                 return (UAComplex arr)


With Storable, this should not be required; you just have to write an  
instance for the Storable class.




Though you get no IO operations with Storable... UIO is entirely
separate.



Yes, but using Storable and Foreign.Ptr, IO is rather simple.
From the storablevector package:

-- | Outputs a 'Vector' to the specified 'Handle'.
hPut :: (Storable a) => Handle -> Vector a -> IO ()
hPut h v =
   if null v
     then return ()
     else
       let (fptr, s, l) = toForeignPtr v
       in  withForeignPtr fptr $ \ptr ->
              let ptrS = advancePtr ptr s
                  ptrE = advancePtr ptrS l
                  -- use advancePtr and minusPtr in order to respect
                  -- alignment
              in  hPutBuf h ptrS (minusPtr ptrE ptrS)

-- | Read a 'Vector' directly from the specified 'Handle'.  This
-- is far more efficient than reading the characters into a list
-- and then using 'pack'.
--
hGet :: (Storable a) => Handle -> Int -> IO (Vector a)
hGet _ 0 = return empty
hGet h i =
   createAndTrim i $ \p ->
      let elemType :: Ptr a -> a
          elemType _ = undefined
          sizeOfElem = sizeOf (elemType p)
      in  fmap (flip div sizeOfElem) $
          hGetBuf h p (i * sizeOfElem)


Regards  Manlio


Re: [Haskell-cafe] bytestring vs. uvector

2009-03-13 Thread Manlio Perillo

Bryan O'Sullivan ha scritto:

[...]
text is not mature, and is based on the same modern fusion framework as 
uvector and vector. It uses unpinned arrays, but provides functions for 
dealing with foreign code. 


Why did you decide to use unpinned arrays (ByteArray#) instead of pinned 
arrays (Foreign.Ptr)?



 [...]



Thanks  Manlio


[Haskell-cafe] broken IO support in uvector package, when using non primitive types

2009-03-13 Thread Manlio Perillo

Hi.

I'm sure this is a know bug, but I can't find details with Google.

The offending code is here:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2362

When I execute the program I get:
  uio: readChunkBU: can't read

What's the problem?


I'm using uvector from:
http://code.haskell.org/~dons/code/uvector/



Thanks   Manlio Perillo



Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types

2009-03-13 Thread Manlio Perillo

Daniel Peebles ha scritto:

As far as I know, the reason for this is that the UIO instance for
productions writes the two rows out sequentially to file, but
doesn't include any means to determine the length of the two halves
when it's loading up again. When you try to read the production back
in, it tries to read in two arrays, but the first array read consumes
all the input leaving the second with nothing.



Ok, thanks.

For now, I think the simplest solution is not to use UArr (a:*:b), but
(UArr a, UArr b).


Or I should just switch to Data.Array.Unboxed.
The operations I need with the array are only:
- sum of elements
- binary search
- serialization/deserialization
- sorting (but right now I sort the list before creating the array)

with a very big data set.

For my problem, is uvector the best solution?
What about storablevector or cvector, instead?
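For what it's worth, the two read-only operations are short with Data.Array.Unboxed; a minimal sketch (the names sumArr and bsearch are mine):

```haskell
import Data.Array.Unboxed

-- Sum of all elements of an unboxed array.
sumArr :: UArray Int Int -> Int
sumArr = sum . elems

-- Binary search on a sorted array: index of the first element >= x
-- (lower bound); returns one past the last index when every element
-- is smaller than x.
bsearch :: UArray Int Int -> Int -> Int
bsearch a x = go lo (hi + 1)
  where
    (lo, hi) = bounds a
    go l h
      | l >= h    = l
      | a ! m < x = go (m + 1) h
      | otherwise = go l m
      where m = (l + h) `div` 2
```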



Regards  Manlio


Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types

2009-03-13 Thread Manlio Perillo

Daniel Fischer ha scritto:

[...]
Worked with uvector-0.1.0.1:

 [...]
But not with uvector-0.2

 [...]

The main difference is that in uvector 0.2, hPutBU does not write the 
length of the array to the file; hGetBU simply uses the file size.


   let elemSize = sizeBU 1 (undefined :: e)
   n <- fmap ((`div` elemSize) . fromInteger) $ hFileSize h


So, the problem seems to be here.
This simply doesn't support having two arrays written in the same file; 
moreover, sizeBU belongs to the UAE class, whose instances are only 
declared for builtin types.



So, the patch is: just revert this change.
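To illustrate what the revert restores, here is a standalone sketch of length-prefixed array IO, which lets several arrays share one file (hPutArr/hGetArr are my own names, operating on plain lists rather than UArr):

```haskell
import Foreign
import System.IO

-- Write the element count first, then the raw elements, so a reader
-- never needs hFileSize and several arrays can live in one file.
hPutArr :: Handle -> [Double] -> IO ()
hPutArr h xs =
  withArrayLen xs $ \n p -> do
    with n $ \np -> hPutBuf h np (sizeOf n)          -- length prefix
    hPutBuf h p (n * sizeOf (undefined :: Double))   -- raw elements

-- Read the length prefix, then exactly that many elements.
hGetArr :: Handle -> IO [Double]
hGetArr h = do
  n <- alloca $ \np -> do
         _ <- hGetBuf h np (sizeOf (undefined :: Int))
         peek np
  allocaArray n $ \p -> do
    _ <- hGetBuf h p (n * sizeOf (undefined :: Double))
    peekArray n p
```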


Regards  Manlio


Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types

2009-03-13 Thread Manlio Perillo

Don Stewart ha scritto:

[...]


So, the patch is: just revert this change.


Or... use your own UIO instance. That's why it's a type class!



Why should I rewrite the UIO instance, if one already exists?


Anyway, for the background on this:

Tue Nov 18 08:44:46 PST 2008 Malcolm Wallace
  * Use hFileSize to determine arraysize, rather than encoding it in the
  file.

Here is a patch to the uvector library that fixes hGetBU and hPutBU to
use the filesize to determine arraysize, rather than encoding it within
the file.  I guess the disadvantages are that now only one array can
live in a file, and the given Handle must indeed be a file, not a socket
Handle.  But the advantage is that one can read packed raw datafiles
obtained externally.

Still, again, I'd point out that uvector is alpha, APIs can and will
change. 



Ok, I'm fine with API changes.
But they *should not* break normal package usage.

That patch simply introduced a bug in uvector.
If you think that the patch is useful, then the patch should also 
include a modification to the UIO instance for a:*:b.



Regards  Manlio




[Haskell-cafe] advice on a parsing function

2009-03-11 Thread Manlio Perillo

Hi.

I'm still working on the Netflix Prize; now I have managed to implement 
a parsing function for the qualifying data set (or quiz set).


The quiz set is in the format:

1:
10
20
30
2:
100
200
3:
1000
2000
3000
4000
5000


Where 1, 2, 3 are movie ids, and the others are user ids.

The parsing program is at:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2300


The program reads the file using lazy IO.
One of the features I want is for the quiz function to be a *good 
producer*.


I'm quite satisfied with the result (this is the first complex parsing 
function I have written in Haskell), and I have also managed to avoid 
the use of an accumulator.


However, I'm interested to know if there is a more efficient solution.


The qualifying.txt file is 51MB, 2834601 lines.

On my laptop, the program's performance is:

real1m14.117s
user0m2.496s
sys 0m0.136s

CPU usage is about 3%,
system load is about 0.20,
memory usage is 4956 KB.


What I'm worried about is:

quiz' ((id, ":") : l) = (id, quiz'' l) : quiz' l
quiz' ((id, _) : l)   = quiz' l


the problem here is that the same elements in the list are processed 
multiple times.



I have written alternate versions.
The first uses foldl:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2303
(this, however, builds the entire data structure in memory, and in 
reverse order)


The second uses foldr:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2304
(this, however, is incorrect, and I'm unable to fix it).


The performance of the foldr version is very similar to that of the 
first implementation (it uses, however, only 3704 KB, and is about 3 
seconds faster).




P.S:
the expected result for the sample quiz set I have posted is:
[(1,[10,20,30]),(2,[100,200]),(3,[1000,2000,3000,4000,5000])]

The foldl version produces:
[(3,[5000,4000,3000,2000,1000]),(2,[200,100]),(1,[30,20,10])]

The foldr version produces:
[(1,[]),(2,[10,20,30]),(3,[100,200]),(5000,[1000,2000,3000,4000])]
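For comparison, a foldr that keeps the user ids pending until their movie header is reached does produce the expected order while visiting each line once; a standalone sketch on plain Strings (the real code uses lazy ByteStrings):

```haskell
type MovieId = Int
type UserId  = Int

-- User ids are accumulated in 'pending' until their movie header
-- ("n:") is reached, then emitted as one group.  Assumes non-empty
-- input lines.
quiz :: [String] -> [(MovieId, [UserId])]
quiz = fst . foldr step ([], [])
  where
    step line (done, pending)
      | last line == ':' = ((read (init line), pending) : done, [])
      | otherwise        = (done, read line : pending)
```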




Thanks  Manlio Perillo


Re: [Haskell-cafe] ByteString in patterns

2009-03-11 Thread Manlio Perillo

Don Stewart ha scritto:

manlio_perillo:

Don Stewart ha scritto:

[...]
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as C

isMatch :: C.ByteString -> Bool
isMatch "match" = True
isMatch _       = False

main = print . map isMatch . C.lines =<< C.getContents

What is the reason why instance declarations for IsString class are not  
defined for available ByteStrings?


I need to define it by myself.


They're exported from Data.ByteString.Char8



Then there is something I'm missing.
Your code does not compile.



Thanks  Manlio Perillo



Re: [Haskell-cafe] advice on a parsing function

2009-03-11 Thread Manlio Perillo

minh thu ha scritto:

[...]
I suggest you try an alternative strategy.
That alternative strategy is twofold, just like you have
quiz' and quiz''.
This alternative strategy avoids pattern matching on strings
(which would be cumbersome for a bit more complex syntax).



But for this specific case it is very compact and elegant (IMHO).


[...]
Now, given those two functions, try to apply them
on your input string, feeding the next function application
with the resulting string of the current application.



So, I should not split the string into lines?

A useful feature of my program is that it parses both an input like:

1:
1046323,2005-12-19

and
1:
1046323


If I write a parser from scratch I need to implement two separate functions.

I will give it a try, just to check if it has better performance.


Thanks  Manlio


Re: [Haskell-cafe] advice on a parsing function

2009-03-11 Thread Manlio Perillo

minh thu ha scritto:

[...]
The approach I suggested is a bit overkill. You can indeed use L.lines
to split the input into lines then work on that.

But still, avoid the pair (Int, Bytestring). Instead, you can basically map
on each line the unsafeReadInt modified to :
- return the id
- return if it is one kind of id or the other kind.

so :
type UserId = Int
type MovieId = Int
unsafeReadInt :: Line -> Either MovieId UserId

Now you have a nice list [Either MovieId UserId] that
you need to transform into (MovieId, [UserId]).



Thanks, this seems a much better solution.
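A standalone sketch of that suggestion (classify and group are my names, on plain Strings rather than lazy ByteStrings):

```haskell
type UserId  = Int
type MovieId = Int

-- Classify each line: "n:" is a movie id, anything else a user id.
-- Assumes non-empty lines.
classify :: String -> Either MovieId UserId
classify line
  | last line == ':' = Left (read (init line))
  | otherwise        = Right (read line)

-- Group each run of Rights under the Left that precedes it.
group :: [Either MovieId UserId] -> [(MovieId, [UserId])]
group (Left m : rest) = (m, [u | Right u <- us]) : group rest'
  where (us, rest') = break isLeft rest
        isLeft (Left _) = True
        isLeft _        = False
group (Right _ : rest) = group rest   -- stray user id before any header
group []               = []
```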


Manlio


Re: [Haskell-cafe] ByteString in patterns

2009-03-11 Thread Manlio Perillo

Don Stewart ha scritto:

[...]

Then there is something I'm missing.
Your code does not compile.


Sure it does:



As Daniel suggested, I'm using an old bytestring version that came with 
Debian Etch (GHC 6.8.2).




Thanks  Manlio


Re: [Haskell-cafe] advice on a parsing function

2009-03-11 Thread Manlio Perillo

Manlio Perillo ha scritto:

minh thu ha scritto:

[...]
The approach I suggested is a bit overkill. You can indeed use L.lines
to split the input into lines then work on that.

But still, avoid the pair (Int, Bytestring). Instead, you can 
basically map

on each line the unsafeReadInt modified to :
- return the id
- return if it is one kind of id or the other kind.

so :
type UserId = Int
type MovieId = Int
unsafeReadInt :: Line -> Either MovieId UserId

Now you have a nice list [Either MovieId UserId] that
you need to transform into (MovieId, [UserId]).



Thanks, this seems a much better solution.



Done:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2309


real1m15.220s
user0m4.816s
sys 0m0.308s


3084 KB memory usage

Previous version required 4956 KB of memory.


Thanks again for the suggestion, Minh.


Manlio

