Re: [Haskell-cafe] Data Structures GSoC

2010-04-05 Thread Nathan Hunter
Well, one of my most important questions has been indirectly answered. It
seems like Map is still the main point of interest, and Jamie Brandon's list
of remaining 
objectiveshttp://www.haskell.org/pipermail/haskell-cafe/2009-March/058270.html
include
modifying the API to work with the edison or collections API, and
benchmarking. I haven't attempted a project this large before, so I'm not
sure what is feasible in the allotted time. A complete Data Structures
overhaul seems out of the question, but polishing up a single one seems far
too sparse. Then again, the benchmarking could represent a significant chunk
of work.

-Nathan



On Wed, Mar 31, 2010 at 11:28 AM, wren ng thornton w...@freegeek.orgwrote:

 Nathan Hunter wrote:

 Hello.

 I am hoping to take on the Data Structures project proposed two years ago
 by
 Don Stewart here
 http://hackage.haskell.org/trac/summer-of-code/ticket/1549,

 this summer.
 Before I write up my proposal to Google, I wanted to gauge the reaction of
 the Haskell community to this project.
 Particularly:

 -What Data Structures in the current libraries are in most dire need of
 improvement?
 -How necessary do you think a Containers Library revision is?


 One thing I've seen come up repeatedly is the issue of presenting a unified
 and general interface for Data.Map, Data.IntMap, and related things like
 Data.Set, bytestring-trie, pqueue, etc which intended to mimic their
 interface. That alone isn't big enough for a GSoC, but it would be a very
 nice thing to have. Every few months there's a request on the librar...@list 
 to alter, generalize, or reunify the map interface in some way.

 ** Just to be clear, I do not mean coming up with a typeclass nor doing
 things like the generalized-trie tricks, I just mean a good old fashioned
 standard API. **

 There are countervailing forces for making a good API. On the one hand we
 want functions to do whatever we need, on the other hand we want the API to
 be small enough to be usable/memorable. In the bytestring-trie library I
 attempted to resolve this conflict by offering a small set of highly
 efficient ueber-combinators in the internals module, a medium sized set of
 functions for standard use in the main module, and then pushed most
 everything else off into a convenience module.

 The containers library would do good to follow this sort of design. The
 Data.Map and Data.IntMap structures don't provide the necessary
 ueber-combinators, which has led to the proliferation of convenience
 functions which are more general than the standard use functions but not
 general enough to make the interface complete. Also, these generalized
 functions are implemented via code duplication rather than having a single
 implementation, which has been known to lead to cutpaste bugs and
 maintenance issues. Provided the correct ueber-combinators are chosen, there
 is no performance benefit for this code duplication either (so far as I've
 discovered with bytestring-trie).

 Additionally, it'd be nice if some of the guts were made available for
 public use (e.g., the bit-twiddling tricks of Data.IntMap so I don't have to
 duplicate them in bytestring-trie).

 Also it would be nice to develop a cohesive test and benchmarking suite,
 which would certainly be a large enough task for GSoC, though perhaps not
 fundable.

 I would be willing to co-mentor API and algorithm design for cleaning the
 cobwebs out of the containers library. I wouldn't have the time for
 mentoring the choice of datastructures or benchmarking however.

 --
 Live well,
 ~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Data Structures GSoC

2010-03-31 Thread Nathan Hunter
Hello.

I am hoping to take on the Data Structures project proposed two years ago by
Don Stewart herehttp://hackage.haskell.org/trac/summer-of-code/ticket/1549,
this summer.
Before I write up my proposal to Google, I wanted to gauge the reaction of
the Haskell community to this project.
Particularly:

-What Data Structures in the current libraries are in most dire need of
improvement?
-How necessary do you think a Containers Library revision is?
-Should I attempt to build on the work Jamie Brandon did with Map as
generalised tries, or is that beyond the scope of this project

I am very excited to work with functional data structures, and I would
greatly appreciate any input.

-Nathan Hunter
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Data Structures GSoC

2010-03-31 Thread Darryn Reid
Nathan,

For what it is worth: I'd propose that Data.HashTable needs to be
replaced; it appears to me having played around with it and found it
wanting that its limitations are pretty common knowledge in the Haskell
community. (I'm sure most people on this list would already know much
more about the limitations of the existing HashTable library than do I,
for am only new to Haskell, so sorry if my suggestion is an
eyeball-roller).

Darryn.

On Wed, 2010-03-31 at 00:00 -0700, Nathan Hunter wrote:
 Hello. 
 
 
 I am hoping to take on the Data Structures project proposed two years
 ago by Don Stewart here, this summer.
 Before I write up my proposal to Google, I wanted to gauge the
 reaction of the Haskell community to this project.
 Particularly:
 -What Data Structures in the current libraries are in most
 dire need of improvement?
 -How necessary do you think a Containers Library revision is?
 -Should I attempt to build on the work Jamie Brandon did with
 Map as generalised tries, or is that beyond the scope of this
 project
 I am very excited to work with functional data structures, and I would
 greatly appreciate any input.
 
 
 -Nathan Hunter
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Data Structures GSoC

2010-03-31 Thread wren ng thornton

Nathan Hunter wrote:

Hello.

I am hoping to take on the Data Structures project proposed two years ago by
Don Stewart herehttp://hackage.haskell.org/trac/summer-of-code/ticket/1549,
this summer.
Before I write up my proposal to Google, I wanted to gauge the reaction of
the Haskell community to this project.
Particularly:

-What Data Structures in the current libraries are in most dire need of
improvement?
-How necessary do you think a Containers Library revision is?


One thing I've seen come up repeatedly is the issue of presenting a 
unified and general interface for Data.Map, Data.IntMap, and related 
things like Data.Set, bytestring-trie, pqueue, etc which intended to 
mimic their interface. That alone isn't big enough for a GSoC, but it 
would be a very nice thing to have. Every few months there's a request 
on the libraries@ list to alter, generalize, or reunify the map 
interface in some way.


** Just to be clear, I do not mean coming up with a typeclass nor doing 
things like the generalized-trie tricks, I just mean a good old 
fashioned standard API. **


There are countervailing forces for making a good API. On the one hand 
we want functions to do whatever we need, on the other hand we want the 
API to be small enough to be usable/memorable. In the bytestring-trie 
library I attempted to resolve this conflict by offering a small set of 
highly efficient ueber-combinators in the internals module, a medium 
sized set of functions for standard use in the main module, and then 
pushed most everything else off into a convenience module.


The containers library would do good to follow this sort of design. The 
Data.Map and Data.IntMap structures don't provide the necessary 
ueber-combinators, which has led to the proliferation of convenience 
functions which are more general than the standard use functions but not 
general enough to make the interface complete. Also, these generalized 
functions are implemented via code duplication rather than having a 
single implementation, which has been known to lead to cutpaste bugs 
and maintenance issues. Provided the correct ueber-combinators are 
chosen, there is no performance benefit for this code duplication either 
(so far as I've discovered with bytestring-trie).


Additionally, it'd be nice if some of the guts were made available for 
public use (e.g., the bit-twiddling tricks of Data.IntMap so I don't 
have to duplicate them in bytestring-trie).


Also it would be nice to develop a cohesive test and benchmarking suite, 
which would certainly be a large enough task for GSoC, though perhaps 
not fundable.


I would be willing to co-mentor API and algorithm design for cleaning 
the cobwebs out of the containers library. I wouldn't have the time for 
mentoring the choice of datastructures or benchmarking however.


--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe