Re: [Haskell-cafe] Please review my Xapian foreign function interface

2011-02-21 Thread Edward Z. Yang
Excerpts from Oliver Charles's message of Mon Feb 21 08:53:48 -0500 2011:
> Yes, this is a concern to me as well. The only places I've used
> unsafePerformIO is with Query objects, which I am mostly treating as
> immutable data, and never exposing any way to modify query
> objects.

That is a good way to start thinking about it.  If there is an efficient
mechanism for copying query objects, you can also implement persistent
update (e.g. copy the structure and then mutate it).
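
A minimal sketch of what I mean by persistent update, under loud assumptions:
an IORef stands in for the C++ object (a real binding would hold a ForeignPtr),
and newQuery, copyQuery, addTermInPlace and queryTerms are hypothetical names,
not functions from your bindings or from Xapian.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- Stand-in for a handle to a mutable C++ Query object (assumption:
-- a real binding would wrap a ForeignPtr here, not an IORef).
newtype Query = Query (IORef [String])

-- Hypothetical "foreign" operations, all of them in IO.
newQuery :: [String] -> IO Query
newQuery terms = Query <$> newIORef terms

copyQuery :: Query -> IO Query
copyQuery (Query ref) = readIORef ref >>= newQuery

addTermInPlace :: String -> Query -> IO ()
addTermInPlace t (Query ref) = modifyIORef' ref (t :)

queryTerms :: Query -> IO [String]
queryTerms (Query ref) = readIORef ref

-- Persistent update: copy first, mutate only the private copy, and
-- return the copy. The argument is never modified, so every existing
-- reference to it keeps its old value.
addTerm :: String -> Query -> Query
addTerm t q = unsafePerformIO $ do
  q' <- copyQuery q
  addTermInPlace t q'
  return q'
```

The unsafePerformIO is only defensible here because nothing outside addTerm
can see the copy while it is being mutated. If the module also exported
addTermInPlace, that argument would no longer hold.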

> However, what is better? Should I avoid taking this
> risk/assumption of immutability and use this within the IO monad also? I
> guess my biggest fear is that this entire library is only usable in the
> IO monad, which from what I understand limits my ability to test easily.

Don't take the risk: verify for yourself that there is no risk!  Note that
putting things in IO doesn't get you off the concurrency hook: things in
IO can be run in different threads and you need to synchronize them. Indeed,
as the Xapian FAQ states:

If you want to use the same object concurrently from different threads,
it's up to you to police access (with a mutex or in some other way) to
ensure only one method is being executed at once.
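
On the Haskell side, one way to police access is to keep the object inside an
MVar and only ever touch it through withMVar. A sketch, assuming an IORef
counter as a stand-in for the foreign object; Database, LockedDatabase,
openLocked, withDatabase, addDocument and documentCount are all hypothetical
names:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newMVar, newEmptyMVar, withMVar, putMVar, takeMVar)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Stand-in for a foreign object that is not thread safe (assumption).
newtype Database = Database (IORef Int)

-- The object lives inside an MVar, so it can only be reached by
-- taking the lock first.
newtype LockedDatabase = LockedDatabase (MVar Database)

openLocked :: IO LockedDatabase
openLocked = do
  ref <- newIORef 0
  LockedDatabase <$> newMVar (Database ref)

-- Run one action with exclusive access. withMVar puts the value back
-- even if the action throws an exception.
withDatabase :: LockedDatabase -> (Database -> IO a) -> IO a
withDatabase (LockedDatabase lock) = withMVar lock

-- Hypothetical operation that is deliberately not atomic: a separate
-- read and write, which could lose updates without the lock.
addDocument :: Database -> IO ()
addDocument (Database ref) = do
  n <- readIORef ref
  writeIORef ref (n + 1)

documentCount :: Database -> IO Int
documentCount (Database ref) = readIORef ref

-- Usage: n threads share one object; the lock serialises their calls.
addFromThreads :: Int -> LockedDatabase -> IO ()
addFromThreads n db = do
  done <- newEmptyMVar
  mapM_ (\_ -> forkIO (withDatabase db addDocument >> putMVar done ())) [1 .. n]
  mapM_ (\_ -> takeMVar done) [1 .. n]
```

This only helps if the unlocked Database never escapes from the callback, so
the constructors should stay private to the module.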

It is admittedly more annoying to test things in IO. One thing you can do is
if database objects are completely isolated from one another (which seems
to be the case) you can build up a custom monad for manipulating this object
in a single-threaded and/or thread safe manner.  I did something like
this (actually, I needed to enforce more complex invariants about when
what functions could get called), but unfortunately it was for work and the
code hasn't been cleared for publication yet.
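
That code I can't show, but here is a minimal sketch of the general shape,
written from scratch for this mail. Everything in it is an assumption: Handle
is an IORef standing in for the foreign database handle, and DB, runDB,
addDocument and documentCount are hypothetical names.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Stand-in for the foreign database handle (assumption).
newtype Handle = Handle (IORef [String])

-- An action on one database. If the module exports the type DB but not
-- its constructor, clients can only combine the primitives below; they
-- cannot smuggle in arbitrary IO such as forkIO.
newtype DB a = DB (Handle -> IO a)

-- Internal helper, not meant to be exported.
runWith :: Handle -> DB a -> IO a
runWith h (DB g) = g h

instance Functor DB where
  fmap f (DB g) = DB (fmap f . g)

instance Applicative DB where
  pure x = DB (\_ -> pure x)
  DB f <*> DB g = DB (\h -> f h <*> g h)

instance Monad DB where
  m >>= k = DB (\h -> runWith h m >>= runWith h . k)

-- Primitives (hypothetical).
addDocument :: String -> DB ()
addDocument d = DB (\(Handle ref) -> modifyIORef' ref (d :))

documentCount :: DB Int
documentCount = DB (\(Handle ref) -> length <$> readIORef ref)

-- The only way to run a DB action: against a fresh handle that never
-- escapes, so each run is sequential by construction.
runDB :: DB a -> IO a
runDB (DB g) = newIORef [] >>= g . Handle
```

For example, `runDB (addDocument "a" >> addDocument "b" >> documentCount)`
threads one handle through all three steps without ever exposing it. Tests
then only need runDB at the very edge.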

Cheers,
Edward

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Please review my Xapian foreign function interface

2011-02-20 Thread Edward Z. Yang
Thanks Oliver!

I haven't had time to look at your bindings very closely, but I do
have a few initial things to think about:

* You're writing your imports by hand.  Several other projects used
  to do this, and it's a pain in the neck when you have hundreds
  of functions that you need to bind and you don't quite do it all
  properly, and then you segfault because there was an API mismatch.
  Consider using a tool like c2hs which rules out this possibility
  (and reduces the code you need to write!)

* I see a lot of unsafePerformIO and no consideration for:
    - Interruptibility
    - Thread safety
  People who use Haskell tend to expect their code to be thread-safe and
  interruptible, so we have high standards ;-) But even C++ code
  that looks thread safe may be mutating shared memory under the hood,
  so check carefully.
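
To make the first point concrete, here is what a hand-written import looks
like, using sin from the C maths library instead of anything from Xapian so
that the example is self-contained. The names c_sin and sine are mine.

```haskell
import Foreign.C.Types (CDouble (..))

-- Hand-written import. GHC trusts this signature exactly as written;
-- it is not compared with the prototype in math.h. Declaring the wrong
-- argument or result type here would still compile.
foreign import ccall "math.h sin"
  c_sin :: CDouble -> CDouble

-- Thin wrapper converting between Haskell and C doubles.
sine :: Double -> Double
sine = realToFrac . c_sin . realToFrac
```

With one function this is easy to get right by eye. With hundreds it is not,
which is why generating the import from the header is attractive.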

I use Sup, so I deal with Xapian on a day-to-day basis. Bindings are good
to see.

Cheers,
Edward



[Haskell-cafe] Please review my Xapian foreign function interface

2011-02-18 Thread Oliver Charles
Hello!

I've finally come up with some motivation for a project to get my feet
wet using Haskell, and for this little pet project I need an interface
to Xapian. After reading various documents on FFI in general, I've got a
brief working implementation, and I'm now looking for how to better
structure the public API. First, a quick bit of background if you're not
familiar with Xapian.

Xapian is a search engine, and provides a C++ API. You store documents
in a database (handled by Xapian), and index documents by adding terms
to them. Xapian provides stemming algorithms to help generate these
terms from other data. Xapian also has an interface to queries (through
a Xapian::Enquire object), and also a query parser to allow for natural
language queries to be parsed and run. For more information, you can
check out the API at [1] - it's fairly small.

As Xapian is C++, it seems my best option is to create my own simple C
wrapper, which also lets me tailor my FFI to be easy to use from
Haskell. You can see my C api on Github [2] - for now it's very stripped
down; I've been wrapping stuff on a need-to-use basis.

* * *

Currently what I have is functional (in the sense that it works), but
it's extremely tied to I/O and very little of the code is pure. For
example, to create and index a document, you need to do something along
the lines of:

do document <- newDocument
   setDocumentData document "Document data"
   addPosting document "search_term" 1
   addDocument database document

(Assuming you already have an open database handle). How horribly
imperative this all looks! :-) A document *feels* like it should be
quite pure, however retrieving properties of a document performs
I/O. For example, I'd like to have something like:

data Document = Document { docData :: String, postings :: [String] }
do document <- getDocument database 123 -- Get doc #123

and have `document` refer to a pure Document object. I'm still stuck in
the IO monad a bit, but at least I can write pure functions to operate
on `Document` values now. The problem I see with this, is that I believe
I'd have to retrieve all parts of document in my `getDocument` function
(including the data and all postings), and I can't benefit from being lazy
here.
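
A rough sketch of that eager version, to show what I mean. The database here
is just an in-memory list and fetchData / fetchPostings are made-up stand-ins
for the foreign calls, not what is in my repository. The field is called
docData because `data` is a reserved word.

```haskell
-- Pure record describing a document after it has been fetched.
data Document = Document { docData :: String, postings :: [String] }
  deriving (Show, Eq)

-- Stand-in for an open database (assumption): id -> (data, terms).
newtype Database = Database [(Int, (String, [String]))]

-- Hypothetical stand-ins for the foreign calls; each one is in IO.
fetchData :: Database -> Int -> IO (Maybe String)
fetchData (Database docs) i = return (fst <$> lookup i docs)

fetchPostings :: Database -> Int -> IO [String]
fetchPostings (Database docs) i = return (maybe [] snd (lookup i docs))

-- Eager retrieval: every fetch happens up front, whether or not the
-- caller ever looks at the corresponding field.
getDocument :: Database -> Int -> IO (Maybe Document)
getDocument db i = do
  d  <- fetchData db i
  ps <- fetchPostings db i
  return (fmap (\s -> Document s ps) d)

-- Once a Document exists, functions over it are pure.
hasTerm :: String -> Document -> Bool
hasTerm t doc = t `elem` postings doc
```

So hasTerm needs no IO at all, but getDocument has already paid for the
postings even if I only wanted the data.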

From what I gather, all the methods on Xapian documents are lazy (such
as getting the document data, and getting terms associated with
documents), which would mean that my foreign imports would have to be
`IO String`, for example. This tends to fairly quickly cause the IO monad
to propagate everywhere.

* * *

I think that's enough information to explain my current progress, and my
concerns. It could well be that I'm overly worrying about everything
being in the IO monad, but as I said - Haskell is new to me.

All of my work is at [3], and I'd love any advice you have. Haddock
documents have been exported to ocharles.org.uk [4].

Thanks for your time,

Oliver Charles / ocharles

--

[1]: http://xapian.org/docs/apidoc/html/annotated.html
[2]: https://github.com/ocharles/Xapian-Haskell/blob/master/c/cxapian.h
[3]: https://github.com/ocharles/Xapian-Haskell
[4]: http://ocharles.org.uk/tmp/search-xapian/Search-Xapian.html
