Re: Namespaces in response (SOLR-1586)

Mattmann, Chris A (388J) Fri, 11 Dec 2009 13:53:12 -0800

Hi Hoss,

> : > : I think the initial geosearch feature can start off with
> : > : <str>10,20</str> for a point.
> : >
> : > +1.
> :
> : Fundamentally, how is a string a point?
> 
> Fundamentally a string is not a point, and a point is not a string -- but
> if you want to express the concept of a point in a manner that only uses very
> simple primitive types, then a string containing comma separated numbers
> is a pretty decent way to do it.  If you'd prefer, a pair of numbers would
> work just as well...
> 
>    <arr><float>10</float><float>20</float></arr>


I'm conflicted here. In simple semantics, sure, it's just an array of
float/double numbers. And if a string must be used, a comma is probably OK, so
long as it maps to some existing, known approach to representing points. I've
asked several times if there are examples. I can point to one that uses
spaces to separate the coordinates in the point (GeoRSS). Which others use a
comma?
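
To make the separator question concrete, here is a minimal Python sketch of a
client-side parser that accepts either form. The helper name parse_point is
hypothetical (it is not part of Solr or of any client library), and it assumes
a point is exactly two numbers.

```python
import re


def parse_point(text):
    """Parse '10,20' (comma) or '10 20' (whitespace) into a pair of floats."""
    # Split on any run of commas and/or whitespace, dropping empty pieces.
    parts = [p for p in re.split(r"[,\s]+", text.strip()) if p]
    if len(parts) != 2:
        raise ValueError(f"expected two coordinates, got {len(parts)}: {text!r}")
    return float(parts[0]), float(parts[1])
```

Either separator is cheap to handle on the client side; what matters is that
the response format picks one and documents it.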

> 
> : > The current XML format Solr uses was designed to be extremely simple, very
> : > JSON-esque, and easily parsable by *anyone* in any language, without
> : > needing special knowledge of types.
> :
> : Whoah. I'm totally confused now. Why have FieldTypes then? Why not just use
> : Lucene? The use case for FieldTypes is _not_ just for indexing, or querying.
> : It's also for representation?
> 
> No, actually the use case for FieldTypes is entirely about the internal
> logic of how Solr should deal with those fields, and how various
> operations should work on them.  FieldTypes can dictate the internal
> representation within the confines of a Lucene index, but they should not
> circumvent the contracts of the response writers in dictating what
> is/isn't a legal response.

Well, I actually would disagree. What's the point of #toInternal and
#toExternal then, other than to convert from the external representation to
an internal Lucene index representation, and then to do the opposite coming
out of the index? 
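
As a rough illustration of that round trip, here is a Python sketch. The class
and its method names are hypothetical stand-ins that only mirror the
#toInternal / #toExternal idea; this is not Solr's FieldType API, and the "|"
separated internal form is an assumption made up for the example.

```python
class PointFieldSketch:
    """Hypothetical stand-in for a field type; not Solr's actual API."""

    def to_internal(self, external):
        # External form: "lat,lon" as a client would supply it.
        lat, lon = (float(p) for p in external.split(","))
        # Internal form: fixed-precision text, standing in for whatever
        # representation the index actually stores.
        return f"{lat:.6f}|{lon:.6f}"

    def to_external(self, internal):
        # Reverse the conversion on the way out of the index.
        lat, lon = (float(p) for p in internal.split("|"))
        # The "g" format drops trailing zeros (six significant digits).
        return f"{lat:g},{lon:g}"
```

The open question in this thread is whether the external side of that pair is
allowed to influence what the response writer emits.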

> 
> : allowed for a while I think), why prevent it? Allowing namespaces does _not_
> : break anything.
>         ...
> : > introducing a new "point" concept, whether as <point> or as
> : > <georss:point/>, is going to break things for people.
> :
> : Show me an example, I fundamentally disagree with this.
> 
> Ok. Let's start with SolrJ then: take a look at the KnownType enum (line
> 151) in XMLResponseParser...
> 
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403&view=markup

Got it. OK, thanks for identifying a concrete place where it would break and
for taking the time to provide a link. So what you are saying is that this
breaks the SolrJ and Python clients, along with anyone else who develops
clients that parse and read the (undocumented) SOLR response schema.

> 
> ...or let's do a random google code search for "solr xml lst" -- check out
> ResponseContentHandler in solrpy...
> 
> http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841
> 
> ...I can't write Python code to save my life, but I have a pretty good idea
> what that code will do if it sees an unexpected tag.

Gotcha.
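
The failure mode can be sketched in a few lines of Python. This is an
illustration only, not the actual SolrJ or solrpy code: the KNOWN_TAGS set is
a small assumed subset of the response tags, and parse_value is a hypothetical
helper.

```python
import xml.etree.ElementTree as ET

# Assumed, deliberately small subset of tags a strict client might know.
KNOWN_TAGS = {"str", "int", "float", "arr", "lst"}


def parse_value(elem):
    """Convert a response element to a Python value, rejecting unknown tags."""
    if elem.tag not in KNOWN_TAGS:
        raise ValueError(f"unexpected tag: <{elem.tag}>")
    if elem.tag == "str":
        return elem.text or ""
    if elem.tag == "int":
        return int(elem.text)
    if elem.tag == "float":
        return float(elem.text)
    if elem.tag == "arr":
        # Unnamed children become a list.
        return [parse_value(child) for child in elem]
    # "lst": children are keyed by their name attribute.
    return {child.get("name"): parse_value(child) for child in elem}
```

With this sketch, <str>10,20</str> and the <arr> of two floats both parse,
while a <point> element, namespaced or not, raises an error. A client written
this way keeps working only as long as the set of tags stays fixed.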


> : And why is that? Isn't the point of SOLR to expand to use cases brought up
> : by users of the system? As long as those use cases can be principally
> : supported, without breaking backwards compatibility (or in that case,  if
> : they do, with large blinking red text that says it), then you're shutting
> : people out for 0 benefit? It's aesthetics we're talking about here.
> 
> I don't know if i'd say that's the point of Solr, but yes we should
> absolutely try to grow the capabilities of the system as new use cases
> come along.

Well that's what I was trying to do, but all I was hearing was a lot of
hollering without any help to understand why. Thanks for being the one to
finally provide that information.

> 
> I am 100% in agreement that the existing "simple" XMLResponseWriter is
> not for everyone -- Historically we've tried to maintain a sense of equality
> between all of the Response writers, so that they all contained the same
> data just with different markup -- but there are clearly cases where it
> would be nice to have a response writer that is allowed to "know more"
> about the real structure of the data and represent it in a manner that
> more closely represents its purpose.

I'd like to refactor the whole thing to be a bit less brittle, and also to
close it off to people who shouldn't be dealing with SOLR's XML in/out (by
taking away your favorite writePrim method and its public modifier and making
the class final, which it once was). We should also rename it to
SolrXmlResponseWriter, since it's not really generic XML (as the current name
suggests); it's SOLR's custom (undocumented) XML schema, right? And since
it's undocumented, I'd be happy to throw together documentation for its XML
format. Would that also be welcomed? Then we should develop an easy extension
point mechanism for people who want to develop their own XML response writers
and write their own clients (or leverage existing clients that understand
that XML).
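
One possible shape for such an extension point, sketched in Python purely for
illustration: the registry, the decorator and the writer names below are all
hypothetical, and the only detail borrowed from Solr is "wt" as the name of
the writer-type parameter. The georss output is a bare fragment without its
namespace declaration.

```python
# Hypothetical registry mapping a writer name to a rendering function.
WRITERS = {}


def register_writer(name):
    """Decorator that registers a function under the given writer name."""
    def decorator(func):
        WRITERS[name] = func
        return func
    return decorator


@register_writer("simple")
def write_simple(point):
    # Stays within the existing simple tag set: a comma separated string.
    lat, lon = point
    return f"<str>{lat:g},{lon:g}</str>"


@register_writer("georss")
def write_georss(point):
    # A separate, opt-in writer is free to emit richer markup.
    lat, lon = point
    return f"<georss:point>{lat:g} {lon:g}</georss:point>"


def render(point, wt="simple"):
    """Render a point with the writer selected by wt."""
    writer = WRITERS.get(wt)
    if writer is None:
        raise ValueError(f"no response writer registered as {wt!r}")
    return writer(point)
```

The default output is unchanged for existing clients, and only a request that
explicitly asks for the other writer sees the new tags.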

> There is a clear push for Solr to natively be able to generate responses that
> incorporate more "industry standard" XML schemas, and i would love to see
> us start adding functionality to do that, but bastardizing the existing
> XMLResponseWriter format is not the way to do it.

I see the light now.

> 
> Bottom Line: I am a big fat -1 on any patch to Solr that adds new xml tags
> to the output generated by the XMLResponseWriter.  Feel free to call me
> stubborn, call me obstinate, call me pedantic -- but there is no way in
> hell i'm going to support a patch that does that.

I won't call you any of those things. Thanks for the help. Let me know what
you think.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
