RE: NO_NORMS and TOKENIZED?

Steven Parkes Fri, 16 Feb 2007 19:55:20 -0800

I guess I'm in the minority here. I tend to use custom classes because I
can tweak the API to make it easier for people writing to the API. I'll
duck type all over the place to make things act like what someone might
expect, but I don't like forcing people to create raw data structures
when there are semantics that can make things easier.


In the case at hand, I was going to say that making a doc a hash makes
adding fields sequentially difficult. But ruby's nice enough that it
simply makes it a little ugly:
        (doc[:contents] ||= []).push data

I like being able to write
        doc[:contents] = data
where I know there's no other data for that field, but I also like being
able to write
        doc << { :contents => data }
A doc can certainly act like a hash in some cases, but it's more than a
hash.
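
To make that concrete, here is a minimal sketch of a doc that takes both forms. SketchDoc and its methods are hypothetical names made up for illustration; this is not the solrb or jruby interface, just one way the semantics could look:

```ruby
# Hypothetical document class: hash-style access plus sequential appends.
class SketchDoc
  def initialize
    # Every field holds a list of values, created on first append.
    @fields = Hash.new { |hash, name| hash[name] = [] }
  end

  # doc[:contents] = data replaces whatever the field held before.
  def []=(name, value)
    @fields[name] = Array(value)
  end

  # doc[:contents] returns every value stored for the field.
  def [](name)
    @fields.key?(name) ? @fields[name] : []
  end

  # doc << { :contents => data } appends without clobbering earlier values.
  def <<(fields)
    fields.each { |name, value| @fields[name].concat(Array(value)) }
    self
  end
end

doc = SketchDoc.new
doc[:contents] = "To be,"
doc << { :contents => "or not " } << { :contents => "to be" }
p doc[:contents]
```

Assignment keeps the "I know there's only one value" case short, while << returns the doc so appends can be chained.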

One of the first tests I wrote (a long time ago, with less ruby under my
belt) for the jruby interface for adding docs was:

  def test_add_lists

    @index << [ :contents, "the quick brown fox jumped over the lazy dog" ] \
           << [ :contents, "Alas poor Yorick,", \
                :contents, "I knew him Horatio" ] \
           << [ :contents, [ "To be,", "or not ", "to be" ] ]

  end

Was never too happy with the middle case. Maybe an extra list level?

This doesn't use an explicit doc object but it makes one internally,
using that nice little cool (pronounced "scary") ability in ruby to say

class Array
  def to_lucene_doc
        ...
  end
end

class Hash
  def to_lucene_doc
        ...
  end
end

module Lucene
  class Document
    def to_lucene_doc
      self
    end
  end
  class Index
    def <<(*docs)
      docs.map! { |doc| doc.to_lucene_doc }
      ...
    end
  end
end
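
For anyone who wants something runnable, the elided parts could be filled in roughly like this. Everything here is an assumption for illustration: SketchLucene, to_sketch_doc, and the list-of-pairs document are stand-ins, not the real jruby interface, and an array is read as a flat name, value, name, value list:

```ruby
module SketchLucene
  # Hypothetical stand-in for a document: an ordered list of [name, value] pairs.
  class Document
    attr_reader :fields

    def initialize
      @fields = []
    end

    # A single value or a list of values is stored as one pair per value.
    def add(name, values)
      Array(values).each { |value| @fields << [name, value] }
      self
    end

    def to_sketch_doc
      self
    end
  end

  class Index
    attr_reader :docs

    def initialize
      @docs = []
    end

    # Accepts anything that responds to to_sketch_doc.
    def <<(doc)
      @docs << doc.to_sketch_doc
      self
    end
  end
end

class Array
  # Treats the array as a flat name, value, name, value, ... list.
  def to_sketch_doc
    each_slice(2).inject(SketchLucene::Document.new) do |doc, (name, values)|
      doc.add(name, values)
    end
  end
end

class Hash
  def to_sketch_doc
    inject(SketchLucene::Document.new) { |doc, (name, values)| doc.add(name, values) }
  end
end

index = SketchLucene::Index.new
index << [ :contents, "Alas poor Yorick,", :contents, "I knew him Horatio" ] \
      << { :contents => [ "To be,", "or not ", "to be" ] }
p index.docs.map { |doc| doc.fields }
```

The index only ever calls to_sketch_doc, so any new input shape just needs that one method.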

I haven't had as much time as I would like, either, to follow solrb (and
I guess I'm also in the minority thinking that solrb was a clever name).

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 16, 2007 5:43 PM
To: [email protected]
Subject: Fwd: NO_NORMS and TOKENIZED?

A recent e-mail from Mr. KinoSearch to java-user has a quote that I  
wanted to point out here:

Begin forwarded message:
> KS 0.20 doesn't even have Document or Field classes.  :)  They've  
> been eliminated, and native Perl hashes are now used to transport  
> document data.

I think we could simplify (wow, even at this early stage) the solrb  
code a bit by simply representing a document as a Hash.  For  
multiValued data, the values would be arrays.  Do we really need any  
other semantics at the solrb level, or does a Hash convey it all?   
Just thinking out loud here, so feel free to ignore me.
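
A rough sketch of the plain-Hash idea, assuming multiValued fields are simply repeated field elements in a Solr-style doc element. hash_to_solr_doc is a hypothetical helper name, not existing solrb code:

```ruby
# Hypothetical helper: renders a plain Hash as a Solr-style <doc> element.
# Array values become one <field> element per value.
def hash_to_solr_doc(doc)
  fields = doc.flat_map do |name, values|
    Array(values).map do |value|
      # String#encode with :xml escapes markup; :attr also adds the quotes.
      attr = name.to_s.encode(:xml => :attr)
      text = value.to_s.encode(:xml => :text)
      "<field name=#{attr}>#{text}</field>"
    end
  end
  "<doc>#{fields.join}</doc>"
end

doc = { :id => 42, :tags => [ "ruby", "search & index" ] }
puts hash_to_solr_doc(doc)
```

Nothing beyond the Hash is needed on the client side here; whether that is enough semantics is the open question.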

Marvin makes some other great points about fixed schemas, which map  
to the schema.xml facility of Solr, I believe.  I am interested in  
exploring how field-name mapping, along with client knowledge of  
Solr's schema.xml structure, can make for an elegant API.

        Erik

p.s. I've queued up several ruby-dev e-mails to respond to over the  
next several days.  I'm happily swamped and eager to keep the  
momentum of solrb and Flare going strong.
