Re: Faceted search problem

2007-01-16 Thread Erik Hatcher


On Jan 16, 2007, at 10:05 PM, Peter McPeterson wrote:
Hi all, I'm trying this solr ruby DSL called Flare/solrb and I  
don't really know how the faceted search works, because I can't add  
whatever fields I want to the index. This is currently not working:


conn = Solr::Connection.new('http://localhost:8983/solr')
doc = {:id => 1, :cat => 'electronics', :features => 'video, music', :product => 'iPod'}

conn.send(Solr::Request::AddDocument.new(doc))
=> #<... @status_message="ERROR:unknown field 'cat'", @status_code="400",  
@raw_response="...ERROR:unknown field 'cat'...", @doc= ... >


If it were working, what I'd like to do is:
(pseudo-code)

request = Solr::Request::Standard.new(
:query => 'ipod',
:facets => {
 :fields => :cat
 }
)

Any help would be appreciated.


I'm copying in Ed Summers, who may not be on solr-user now, but who is  
also a key contributor to solrb at the moment.


Good question, Peter.  Bear with me, as I want to detail a lot here  
so folks can understand what is going on with solrb a bit more clearly  
than svn commits and brief allusions convey.


There are a couple of important things to note here, specifically  
about Solr itself.  It is driven by a schema (see solr/solr/conf/ 
schema.xml) which defines how fields are handled within Solr/Lucene.   
Solr needs to know what to do with field text when it gets it from an  
<add> request.  The solrb version of Solr's schema, which differs from  
the schema that ships with the Solr example application, locks things  
down to only three field-naming possibilities: id, *_text, and *_facet.  
I intentionally started it as simply as I could for now, knowing that  
opening up the schema is inevitable, and we want to do it wisely with  
a bit more knowledge of how we want Ruby and Solr to interoperate.


Two relatively quick fix options to get you started:

  (A) difficulty: easy.  Rename your non-id fields to *_text and  
*_facet.  For example:


   doc = {:id => 1, :cat_facet => 'electronics', :features_facet => 'video, music', :product_text => 'iPod'}


  (B) difficulty: Solr-experienced only.  You're welcome to tweak  
the schema.xml and go to town with Request::AddDocument and any field  
names you want.  Be sure you know what you're doing with faceting,  
tokenization, and sorting, though.


-- NOTE: If you're familiar with Solr, this will make sense as a  
difference to the Solr proper example schema --
  id: mandatory; a unique identifier for a document.  It can be any  
string you like.  How searchable this id is depends on what characters  
it contains; minimizing special characters makes it easier to search  
for a specific id without worrying about query parser syntax conflicts.


  *_text: tokenized and copied into the "text" field (so the  
client doesn't need to/shouldn't send a "text" field, only *_text  
field names).  The default search field is "text" and includes text  
from all *_text fields.


  *_facet: not tokenized, and suitable for use with the faceting  
features that Solr supports.

---

The faceting feature is only starting to come together through the  
API, and so it's not yet easily exposed.  In fact, only earlier  
today did the response-handling refactoring allow for facets to be  
accessed.


*** Sidebar ***
Why does the facet data come back outside the 'response'  
structure?  Here's an example:


{
'responseHeader'=>{
  'status'=>0,
  'QTime'=>3057,
  'params'=>{
'wt'=>'ruby',
'facet.limit'=>'2',
'rows'=>'0',
'facet.missing'=>'true',
'start'=>'0',
'facet'=>'true',
'facet.field'=>[
 'subject_genre_facet',
 'subject_era_facet',
 'subject_topic_facet'],
'indent'=>'on',
'q'=>'[* TO *]',
'facet.zeros'=>'true'}},
'response'=>{'numFound'=>4,'start'=>0,'docs'=>[]
},
'facet_counts'=>{
  'facet_queries'=>{},
  'facet_fields'=>{
'subject_genre_facet'=>{
 'Biography.'=>2605,
 'Congresses.'=>1837,
 ''=>38262},
'subject_era_facet'=>{
 '20th century.'=>1251,
 '20th century'=>1250,
 ''=>41219},
'subject_topic_facet'=>{
 'History.'=>2259,
 'History and criticism.'=>1769,
 ''=>15833}}}}
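Since the `:wt => 'ruby'` output is plain Ruby data once parsed, the facet_counts block can be navigated as ordinary nested hashes. A sketch, using an abbreviated copy of the response above (the empty-string key appears to hold the facet.missing count):

```ruby
# Abbreviated copy of the :wt => 'ruby' response shown above.
response = {
  'facet_counts' => {
    'facet_fields' => {
      'subject_genre_facet' => {
        'Biography.' => 2605, 'Congresses.' => 1837, '' => 38262 }
    }
  }
}

facets = response['facet_counts']['facet_fields']['subject_genre_facet']
# Drop the unnamed (facet.missing) bucket and sort by count, descending.
top = facets.reject { |value, _| value.empty? }.sort_by { |_, count| -count }
# => [["Biography.", 2605], ["Congresses.", 1837]]
```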

  (yes, i'm refactoring to add Yonik's latest facet changes in now!)


Have a look at the latest API, thanks in large part to Ed's ideas on  
where a Sol.rb DSL should head:




Here's the example pasted below:

  require 'solr'  # load the library
  include Solr    # allow Solr:: to be omitted from class/module references


  # connect to the solr instance
  conn = Connection.new('http://localhost:8983/solr', :autocommit  
=> :on)


  # add a document to the index
  conn.add(:id => 123, :title_text => 'Lucene in Action')

  # update the document
  conn.update(:id => 123, :title_text => 'Solr in Action')

  # print out the first hit in a query for 'action'
  response =

Re: One item, multiple fields, and range queries

2007-01-16 Thread Jeff Rodenburg

Yonik/Hoss -

OK, you lost me.  It sounds as if this PhraseQuery-ish approach involves
breaking datetime and lat/long values into pieces, and evaluation occurs
with positioning.  Is that accurate?



On 1/16/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 1/15/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> PhraseQuery artificially enforces that the Terms you add to it are
> in the same field ... you could easily write a PhraseQuery-ish query
> that takes Terms from different fields, and ensures that they appear
> "near" each other in terms of their token sequence -- the context of
> that comment was searching for instances of words with specific usage
> (i.e. "house" used as a noun) by putting the usage type of each term
> in a separate parallel field, but with identical token positions.

It seems like this could even be done in the same field if one had a
query type that allowed querying for tokens at the same position.
Just index "_noun" at the same position as "house" (and make sure
there can't be collisions between real terms and markers via escaping,
or use \0 instead of _, etc).

-Yonik



Re: One item, multiple fields, and range queries

2007-01-16 Thread Yonik Seeley

On 1/15/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

PhraseQuery artificially enforces that the Terms you add to it are
in the same field ... you could easily write a PhraseQuery-ish query
that takes Terms from different fields, and ensures that they appear
"near" each other in terms of their token sequence -- the context of
that comment was searching for instances of words with specific usage
(i.e. "house" used as a noun) by putting the usage type of each term
in a separate parallel field, but with identical token positions.


It seems like this could even be done in the same field if one had a
query type that allowed querying for tokens at the same position.
Just index "_noun" at the same position as "house" (and make sure
there can't be collisions between real terms and markers via escaping,
or use \0 instead of _, etc).
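The same-position idea can be illustrated with a toy model in plain Ruby (a sketch of the concept only, not Lucene's actual token-stream or query APIs):

```ruby
# Toy inverted index: term => list of token positions.  Indexing the
# marker "_noun" at the same position as "house" lets a query require
# both terms at one position.
index = Hash.new { |h, k| h[k] = [] }
[['my', 0], ['house', 1], ['_noun', 1], ['is', 2], ['green', 3]].each do |term, pos|
  index[term] << pos
end

# A "same position" query: do the two terms share any position?
def same_position?(index, a, b)
  !(index[a] & index[b]).empty?
end

same_position?(index, 'house', '_noun')  # => true  ("house" used as a noun)
same_position?(index, 'green', '_noun')  # => false
```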

-Yonik


Faceted search problem

2007-01-16 Thread Peter McPeterson
Hi all, I'm trying this solr ruby DSL called Flare/solrb and I don't really 
know how the faceted search works, because I can't add whatever fields I want 
to the index. This is currently not working:


conn = Solr::Connection.new('http://localhost:8983/solr')
doc = {:id => 1, :cat => 'electronics', :features => 'video, music', :product => 'iPod'}

conn.send(Solr::Request::AddDocument.new(doc))
=> #<... @status_message="ERROR:unknown field 'cat'", @doc= ... >


If it were working, what I'd like to do is:
(pseudo-code)

request = Solr::Request::Standard.new(
:query => 'ipod',
:facets => {
 :fields => :cat
 }
)

Any help would be appreciated.

Peter





Re: solr + cocoon problem

2007-01-16 Thread Chris Hostetter

: java.io.IOException: Server returned HTTP response code: 505 for URL:
: http://hostname/solr/select/?q=a b
:
:
: The interesting thing is that if I access http://hostname/solr/select/?q=a b
: directly it works.

i don't know anything about cocoon, but that is not a legal URL; URLs
can't have spaces in them ... if you type a space into your browser, it's
probably being nice and URL-escaping it for you (that's what most browsers
seem to do nowadays).

i'm guessing Cocoon automatically un-escapes the input to your app, and you
need to re-URL-escape it before sending it to Solr.
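The re-escaping step is the same in any language; sketched here in Ruby for brevity (the hostname is a placeholder from the thread):

```ruby
require 'cgi'

q = 'a b'                # the already-unescaped query from the framework
escaped = CGI.escape(q)  # percent/plus-encode the space => "a+b"
url = "http://hostname/solr/select/?q=#{escaped}"
# url is now a legal URL: "http://hostname/solr/select/?q=a+b"
```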




-Hoss



Re: XML querying

2007-01-16 Thread Thorsten Scherler
On Mon, 2007-01-15 at 13:42 +, Luis Neves wrote:
> Hi!
> 
> Thorsten Scherler wrote:
> 
> > On Mon, 2007-01-15 at 12:23 +, Luis Neves wrote:
> >> Hello.
> >> What I do now to index XML documents is to use a Filter to strip the
> >> markup; this works, but it's impossible to know where in the document
> >> the match is located.
> >> What would it take to make it possible to specify a filter query that
> >> accepts xpath expressions?... something like:
> >>
> >> fq=xmlField:/book/content/text()
> >>
> >> This way only the "/book/content/" element would be searched.
> >>
> >> Did I make sense? Is this possible?
> > 
> > AFAIK short answer: no.
> > 
> > The field is ALWAYS plain text. There is no xmlField type.
> > 
> > ...but why don't you just add your text in multiple fields when indexing.
> > 
> > Instead of plainly stripping the markup, do the above xpath on your
> > document and create different fields. Like
> >   > select="/book/content/text()"/>
> >  
> > 
> > Makes sense?
> 
> Yes, but I have documents with different schemas on the same "xml field";
> also, that way I would have to know the schema of the documents being
> indexed (which I don't).
> 
> The schema I use is something like:
> 
> 
> 
> Where each distinct DocumentType has its own schema.
> 
> I could revise this approach to use a Solr instance for each DocumentType,
> but I would have to find a way to "merge" results from the different
> instances, because I also need to search across different DocumentTypes...
> I guess I'm SOL :-(
> 

I think you should explain your use case a wee bit more.

> >> What I do now to index XML documents is to use a Filter to strip
> >> the markup; this works, but it's impossible to know where in the
> >> document the match is located.

why do you need to know where? 

Maybe we can think of something.

salu2
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)




Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Thorsten Scherler
On Tue, 2007-01-16 at 13:56 +0100, Bertrand Delacretaz wrote:
> On 1/16/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:
> 
> > ...Have a look at
> > https://issues.apache.org/jira/browse/SOLR-86...
> 
> Right, I should have mentioned this one as well. I have linked SOLR-20
> and SOLR-86 now, so that people can see the various options for Java
> clients.

Cheers, mate. :)

salu2
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)




Re: solr + cocoon problem

2007-01-16 Thread Thorsten Scherler
On Tue, 2007-01-16 at 16:02 -0500, [EMAIL PROTECTED] wrote:
> Hi,
> 
> I am trying to implement a cocoon based application using solr for searching.
> In particular, I would like to forward the request from my response page to
> solr.  I have tried several alternatives, but none of them worked for me.
> 

Please see http://wiki.apache.org/solr/SolrForrest.

salu2
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)




Re: Apostrophes in fields

2007-01-16 Thread Nick Jenkin

Using the fuzzy searching fixed the problem - I will have a play with
the analyzers and see if I can get it working nicely.

Thanks again, much appreciated.

On 1/17/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: This problem is why some sloppiness is recommended when dealing with
: WordDelimiterFilter.

particularly when using the generate___Parts="true" options

Nick: if you want simpler matching like this, you might want to consider
simplifying your definition of "text" ... if you look at the "textTight"
fieldtype in the example schema (used by the field "sku") you'll see a
simpler usage of WordDelimiterFilter ... alternately you may just want to
use lucene's basic StandardAnalyzer ... i believe it strips apostrophes.

as a real last resort, you could use the recently added
PatternReplaceFilter to strip out apostrophes prior to
WordDelimiterFilter (if you like everything WordDelim does for you except
splitting on apostrophes)

:   - optionally index ohara at *both* "o" and "hara"

then searching for "Shelley ohara memorial" fails unless you have
slop .. if you need slop, you might as well not index it twice (not to
mention it throws off the tf/idf calculations)

:   - pick the "alignment" based on the token position in the stream...
: right-justify the catenations if it's the first token, otherwise
: left-justify.  One could try to identify proper names and do the
: justification correctly too (blech).

oh for the love of god please no.



-Hoss





--
- Nick


Re: solr + cocoon problem

2007-01-16 Thread Thorsten Scherler
On Tue, 2007-01-16 at 16:19 -0500, Walter Lewis wrote:
> [EMAIL PROTECTED] wrote:
> > Any ideas on how to implement a cocoon layer above solr?

I just finished a forrest plugin (in the whiteboard, our testing ground
in forrest) that does what you asked for, plus some pagination.
Forrest is cocoon-based, so you just have to build the plugin jar and add
it to your cocoon project. Please ask on the forrest list if you have
problems.

http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/

> You're far from the only one approaching solr via cocoon ... :)
> 
> The approach we took, passes the search parameters to a "solrsearch" 
> stylesheet, the heart of which is a  block that embeds the 
> solr results.  A further transformation prepares the results of the solr 
> query for display.

That was my first version of the above plugin as well, but since forrest
makes use of the cocoon crawler I needed something with a default search
string for offline generation.

You should have a closer look at 
http://svn.apache.org/viewvc/forrest/trunk/whiteboard/plugins/org.apache.forrest.plugin.output.solr/output.xmap?view=markup
and 
http://svn.apache.org/viewvc/forrest/trunk/whiteboard/plugins/org.apache.forrest.plugin.output.solr/input.xmap?view=markup

For the original use case of this thread I added a generator:



as well as a paginator transformer that calculates the next pages based on 
start, rows and numFound:

 

We use it as follows:


  


  





  


  

You may be interested in the update generator as well. 

Please give feedback to [EMAIL PROTECTED] 

It really needs testing by people other than myself; you could be the first 
to provide feedback.




  

  


  

HTH

salu2
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)




Re: Apostrophes in fields

2007-01-16 Thread Chris Hostetter

: This problem is why some sloppiness is recommended when dealing with
: WordDelimiterFilter.

particularly when using the generate___Parts="true" options

Nick: if you want simpler matching like this, you might want to consider
simplifying your definition of "text" ... if you look at the "textTight"
fieldtype in the example schema (used by the field "sku") you'll see a
simpler usage of WordDelimiterFilter ... alternately you may just want to
use lucene's basic StandardAnalyzer ... i believe it strips apostrophes.

as a real last resort, you could use the recently added
PatternReplaceFilter to strip out apostrophes prior to
WordDelimiterFilter (if you like everything WordDelim does for you except
splitting on apostrophes)

:   - optionally index ohara at *both* "o" and "hara"

then searching for "Shelley ohara memorial" fails unless you have
slop .. if you need slop, you might as well not index it twice (not to
mention it throws off the tf/idf calculations)

:   - pick the "alignment" based on the token position in the stream...
: right-justify the catenations if it's the first token, otherwise
: left-justify.  One could try to identify proper names and do the
: justification correctly too (blech).

oh for the love of god please no.



-Hoss



Re: Apostrophes in fields

2007-01-16 Thread Yonik Seeley

On 1/16/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

> It appears to be matching author:"Shelley Ohara" but when I do this
> search no results are returned, searches like author:"Shelley O hara",
> author:"Shelley O'hara" work as expected. Any ideas?


This problem is why some sloppiness is recommended when dealing with
WordDelimiterFilter.

"Shelley Ohara"~1 should work.


Hmm, shouldn't "ohara" be generated at the same position as "o", not
"hara"?  It looks like it is failing to do exact phrase matching
because the index contains "shelley o (ohara|hara)"


The problem is, if you do it one way, the other way breaks.  If you
index "ohara" with "o", then a field like "O'hara Shelley" wouldn't
match a query like "ohara shelley".

There are a few possible options:
 - optionally index ohara at *both* "o" and "hara"
 - pick the "alignment" based on the token position in the stream...
right-justify the catenations if it's the first token, otherwise
left-justify.  One could try to identify proper names and do the
justification correctly too (blech).
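A toy model in plain Ruby (a sketch of the concept, not the actual Lucene/Solr code) shows why the exact phrase fails when the index contains "shelley o (ohara|hara)": the catenation "ohara" sits at position 2, so the phrase "shelley ohara" finds no consecutive positions, while "shelley o hara" does — which is also why the ~1 slop suggested earlier in the thread works.

```ruby
# Toy positions for "Shelley O'hara" after a WordDelimiterFilter-style
# split that aligns the catenation "ohara" with the last part, "hara".
indexed = { 'shelley' => [0], 'o' => [1], 'hara' => [2], 'ohara' => [2] }

# Exact phrase match: every query term at consecutive positions.
def phrase_match?(index, terms)
  starts = index[terms.first] || []
  starts.any? do |start|
    terms.each_with_index.all? { |t, i| (index[t] || []).include?(start + i) }
  end
end

phrase_match?(indexed, %w[shelley o hara])  # => true
phrase_match?(indexed, %w[shelley ohara])   # => false ("ohara" is at 2, not 1)
```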

-Yonik


Re: Apostrophes in fields

2007-01-16 Thread Mike Klaas

On 1/16/07, Nick Jenkin <[EMAIL PROTECTED]> wrote:

Hi Jeff, Bertrand
Thanks for your help,

The analyzers I am using are the same as in the example schema.xml
Author field:

analysis result:
http://nickjenkin.com/misc/solr.jpg

It appears to be matching author:"Shelley Ohara" but when I do this
search no results are returned, searches like author:"Shelley O hara",
author:"Shelley O'hara" work as expected. Any ideas?


Hmm, shouldn't "ohara" be generated at the same position as "o", not
"hara"?  It looks like it is failing to do exact phrase matching
because the index contains "shelley o (ohara|hara)"

-Mike


Re: Apostrophes in fields

2007-01-16 Thread Nick Jenkin

Hi Jeff, Bertrand
Thanks for your help,

The analyzers I am using are the same as in the example schema.xml
Author field:

analysis result:
http://nickjenkin.com/misc/solr.jpg

It appears to be matching author:"Shelley Ohara" but when I do this
search no results are returned, searches like author:"Shelley O hara",
author:"Shelley O'hara" work as expected. Any ideas?
Thanks
-Nick

On 1/16/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:

On 1/16/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
> Nick - this depends on the analyzer used to index the field as well as the
> analyzer used in your search query

Note that the Solr "analysis" page, in the admin interface, allows you
to see exactly how your field's content is converted for indexing.
There's an example at http://www.xml.com/lpt/a/1668 in the "Content
Analysis" part of the article.

-Bertrand




--
- Nick


Re: solr + cocoon problem

2007-01-16 Thread Walter Lewis

[EMAIL PROTECTED] wrote:

Any ideas on how to implement a cocoon layer above solr?

You're far from the only one approaching solr via cocoon ... :)

The approach we took, passes the search parameters to a "solrsearch" 
stylesheet, the heart of which is a  block that embeds the 
solr results.  A further transformation prepares the results of the solr 
query for display.


The latest rewrite is getting more complicated as we work in flowscript 
to manipulate the values more before presenting them to solr, but the 
heart of the solution is below.


Walter


===== From the sitemap.xmap =====
   
   
   
   
   
   
   
   
   
   
   

===== From solrsearch.xsl =====
[assuming parameters of q, start and rows]

   
   
http://localhost:8080/solr/select?q=select='$q' />&start=/>&rows=

   
   




solr + cocoon problem

2007-01-16 Thread mirko
Hi,

I am trying to implement a cocoon based application using solr for searching.
In particular, I would like to forward the request from my response page to
solr.  I have tried several alternatives, but none of them worked for me.

One approach that seems logical to me is to have a response page that is
forwarded to solr with cocoon's file generator.  It works fine if I perform
queries which contain only alphanumeric characters, but it gives the following
error if I try to query for a string containing non-alphanumeric characters:

http://hostname/cocoon/mywebapp/response?q=a+b

java.io.IOException: Server returned HTTP response code: 505 for URL:
http://hostname/solr/select/?q=a b


The interesting thing is that if I access http://hostname/solr/select/?q=a b
directly it works.


The relevant part of my sitemap.xmap:


  http://hostname/solr/select/?q={request-param:q}";
type="file" >
  
  


Any ideas on how to implement a cocoon layer above solr?

thanks,
mirko

ps. I realize this question might be more of a cocoon question, but I am
posting it here because I got the idea to use cocoon on top of solr from
http://wiki.apache.org/solr/XsltResponseWriter.  So, I assume some of you
have already run into similar issues and/or know the solution...


Re: separate log files

2007-01-16 Thread Chris Hostetter

: I wonder if jetty or tomcat can be configured to put logging output
: for different webapps in different log files...

i've never tried it, but the tomcat docs do talk about
tomcat providing a custom implementation of java.util.logging specifically
for this purpose.

Ben: please take a look at this doc...

http://tomcat.apache.org/tomcat-5.5-doc/logging.html

..specifically the section on java.util.logging (since that's what Solr
uses) ... I believe you'll want something like the "Example
logging.properties file to be placed in common/classes" so that you can
control the logging.
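For reference, a sketch of what such a per-webapp logging.properties might look like, adapted from the Tomcat 5.5 logging documentation; the app name, directory, and level here are illustrative guesses, not a tested configuration:

```properties
# Sketch only: a WEB-INF/classes/logging.properties for one webapp
# ("app1"), adapted from the Tomcat 5.5 logging docs.  The prefix below
# is what would produce app1-solr.yyyy-mm-dd.log files.
handlers = org.apache.juli.FileHandler

org.apache.juli.FileHandler.level = FINE
org.apache.juli.FileHandler.directory = ${catalina.base}/logs
org.apache.juli.FileHandler.prefix = app1-solr.
```

Each webapp would carry its own copy with a different prefix, which is what keeps the logs separate.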

Please let us all know if this works for you ... it would make a great
addition to the SolrTomcat wiki page.


: On 1/15/07, Ben Incani <[EMAIL PROTECTED]> wrote:
: > Hi Solr users,
: >
: > I'm running multiple instances of Solr, which are all loading from
: > the same war file.
: >
: > Below is an example of the servlet context file used for each
: > application.
: >
: >  debug="0" crossContext="true" >
: >  value="/var/local/app1" override="true" />
: > 
: >
: > Hence each application is using the same
: > WEB-INF/classes/logging.properties file to configure logging.
: >
: > I would like each instance to log to separate log files, such as:
: > app1-solr.yyyy-mm-dd.log
: > app2-solr.yyyy-mm-dd.log
: > ...
: >
: > Is there an easy way to append the context path to
: > org.apache.juli.FileHandler.prefix
: > E.g.
: > org.apache.juli.FileHandler.prefix = ${catalina.context}-solr.
: >
: > Or would this require a code change?
: >
: > Regards
: >
: > -Ben
:



-Hoss



Re: XML querying

2007-01-16 Thread Yonik Seeley

On 1/15/07, Luis Neves <[EMAIL PROTECTED]> wrote:

Yes, but I have documents with different schemas on the same "xml field", also,
that way I  would have to know the schema of the documents being indexed (which
I don't).


Solr and Lucene don't really support indexing structured data such as
XML... people are looking at ways to add flexible indexing to Lucene
so that XML indexing could be supported.  When that happens, then
we'll figure out how to fit that into Solr.

There are also XML databases out there, but performance currently
isn't great from what I've heard.

-Yonik


Re: separate log files

2007-01-16 Thread Yonik Seeley

I wonder if jetty or tomcat can be configured to put logging output
for different webapps in different log files...

-Yonik

On 1/15/07, Ben Incani <[EMAIL PROTECTED]> wrote:

Hi Solr users,

I'm running multiple instances of Solr, which are all loading from the
same war file.

Below is an example of the servlet context file used for each
application.

<Context ... debug="0" crossContext="true" >
  <Environment name="solr/home" ... value="/var/local/app1" override="true" />
</Context>
Hence each application is using the same
WEB-INF/classes/logging.properties file to configure logging.

I would like each instance to log to separate log files, such as:
app1-solr.yyyy-mm-dd.log
app2-solr.yyyy-mm-dd.log
...

Is there an easy way to append the context path to
org.apache.juli.FileHandler.prefix
E.g.
org.apache.juli.FileHandler.prefix = ${catalina.context}-solr.

Or would this require a code change?

Regards

-Ben


Re: Internationalization

2007-01-16 Thread Bess Sadler

Hi, Jörg.

At the Tibetan Himalayan Digital Library, we are working with XML  
files that have fields that might be in Tibetan, Chinese, Nepalese,  
or English. Our solr schema.xml file looks like this:


   stored="true" multiValued="true"/>
   stored="true" multiValued="true"/>
   stored="true" multiValued="true"/>
   stored="true" multiValued="true"/>


I run all of our XML data through an XSL transformation that puts it  
in solr-indexable form and also figures out what language a field is  
in and gives it an appropriate name, e.g., "location_eng" or  
"formalname_tib". So far this is working very well for us.


Currently, we are assigning all fields, no matter what the language,  
to type string, defined as:

   <fieldtype name="string" class="solr.StrField" sortMissingLast="true"/>

This does string matching very well, but doesn't do any stop words,  
or stemming, or anything fancy. We are toying with the idea of a  
custom Tibetan indexer to better break up the Tibetan into discrete  
words, but for this particular project (because it mostly has to do  
with proper names, not long passages of text) this hasn't been a  
problem yet, and the above solution seems to be doing the trick.


I hope this helps.

Good luck!

Bess

On Jan 16, 2007, at 10:23 AM, Jörg Pfründer wrote:


Hello,

is there anyone who has experience on internationalization  
(internationalisation) with SOLR?


How do you setup a multi language data index?  Should we use a  
dynamic field like text_en, text_fr, text_es?


Is there a GermanPorterFilterFactory or FrenchPorterFilterFactory?

Thank you very much.

Jörg Pfründer




Elizabeth (Bess) Sadler
Head, Technical and Metadata Services
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

[EMAIL PROTECTED]
(434) 243-2305




Re: Converting Solr response back to pojo's, experiences?

2007-01-16 Thread Ken Krugler
Anyone having experience converting xml responses back to pojo's, 
which technologies have you used?


We started off using a regular DOM parser and coding it by hand, but 
have switched to XStream. This (with some help handling the Solr <-> 
pojo mappings) seems to work fine once you get past a few encoding 
issues.


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"


Re: Internationalization

2007-01-16 Thread Bertrand Delacretaz

Hi Jorg,

On 1/16/07, Jörg Pfründer <[EMAIL PROTECTED]> wrote:

...is there anyone who has experience on internationalization 
(internationalisation) with SOLR?...


I've been setting up a French-language index over the last few months, and
it works very well.

There are some pointers on how to analyze French text in my article at
xml.com (see http://wiki.apache.org/solr/SolrResources).


...How do you setup a multi language data index?  Should we use a dynamic field 
like
text_en, text_fr, text_es?...


Yes, I don't think you can currently mix languages in the same field,
so having fields named after the language might be the easiest.


Is there a GermanPorterFilterFactory or FrenchPorterFilterFactory?...


The SnowballFilterFactory now supports a language parameter, see
http://issues.apache.org/jira/browse/SOLR-27
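For example, the schema.xml configuration looks roughly like the fragment below. This is a sketch against the SOLR-27 patch (where the factory is named SnowballPorterFilterFactory); the surrounding analyzer chain is illustrative, not a recommended default:

```xml
<!-- Sketch: a German-stemming field type using the Snowball stemmer
     from SOLR-27.  The tokenizer/filter chain here is illustrative. -->
<fieldType name="text_de" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>
```

A parallel `text_fr` type with `language="French"` would cover the French case, matching the per-language dynamic field approach discussed above.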

Hope this helps,
-Bertrand


Internationalization

2007-01-16 Thread Jörg Pfründer
Hello,

is there anyone who has experience on internationalization 
(internationalisation) with SOLR?

How do you setup a multi language data index?  Should we use a dynamic field 
like text_en, text_fr, text_es?

Is there a GermanPorterFilterFactory or FrenchPorterFilterFactory?

Thank you very much.

Jörg Pfründer





Can this be achieved? (Was: document support for file system crawling)

2007-01-16 Thread Eivind Hasle Amundsen
First: Please pardon the cross-post to solr-user, for reference. I hope 
to continue this thread in solr-dev. Please reply to solr-dev.



1) more documentation (and possibly some locking configuration options) on
how you can use Solr to access an index generated by the nutch crawler (i
think Thorsten has already done this) or by Compass, or any other system
that builds a Lucene index.


Thorsten Scherler? Is this code available anywhere? Sounds very 
interesting to me. Maybe someone could elaborate on the differences 
between the indexes created by Nutch/Solr/Compass/etc., or point me in 
the direction of an answer?



2) "contrib" code that runs as it's own process to crawl documents and
send them to a Solr server. (mybe it parses them, or maybe it relies on
the next item...)


Do you know FAST? It uses a step-by-step approach (a "pipeline") in which 
all of these tasks are done. Much of it is tuned in an easy web tool.


The point I'm trying to make is that contrib code is nice, but a 
"complete package" with these possibilities could broaden Solr's appeal 
somewhat.



3) Stock "update" plugins that can each read a raw inputstreams of a some
widely used file format (PDF, RDF, HTML, XML of any schema) and have
configuration options telling them them what fields in the schema each
part of their document type should go in.


Exactly, this sounds more like it. But if similar inputstreams can be 
handled by Nutch, what's the point in using Solr at all? The HTTP APIs? 
In other words, both Nutch and Solr seem to have functionality that 
enterprises would want, but neither gives you the "total solution".


Don't get me wrong, I don't want to bloat the products, even though it 
would be nice to have a crossover solution which is easy to set up.


The architecture could look something like this:

Connector -> Parser -> DocProc -> (via schema) -> Index

Possible connectors: JDBC, filesystem, crawler, manual feed
Possible parsers: PDF, whatever

The connectors, parsers AND the document processors would all be plugins. 
The DocProcs would typically be adjusted for each enterprise's needs, so 
that they fit with its schema.xml.


Problem is, I haven't worked enough with Solr, Nutch, Lucene etc. to 
really know all the possibilities and limitations. But I do believe that 
the outlined architecture would be flexible and answer many needs. So the 
question is:


What is Solr missing? Could parts of Nutch be used in Solr to achieve 
this? How? Have I misunderstood completely? :)


Eivind


Re: Converting Solr response back to pojo's, experiences?

2007-01-16 Thread Thorsten Scherler
On Tue, 2007-01-16 at 14:58 +0100, [EMAIL PROTECTED] wrote:
> Anyone having experience converting xml responses back to pojo's,  
> which technologies have you used?
> 
> Anyone doing json <-> pojo's?

Using pure xml myself but have a look at 
https://issues.apache.org/jira/browse/SOLR-20
and 
https://issues.apache.org/jira/secure/attachment/12348567/solr-client.zip

HTH
salu2

> 
> Grtz
> 



Converting Solr response back to pojo's, experiences?

2007-01-16 Thread maarten
Anyone having experience converting xml responses back to pojo's,  
which technologies have you used?


Anyone doing json <-> pojo's?

Grtz



Re: question: optimize

2007-01-16 Thread James liu

thank u.



2007/1/16, Stephanie Belton <[EMAIL PROTECTED]>:


Hi,

You need to send <optimize/> rather than <optimize> (closing the tag)

HTH
Steph






--
regards
jl


RE: question: optimize

2007-01-16 Thread Stephanie Belton
Hi,

You need to send <optimize/> rather than <optimize> (closing the tag)

HTH
Steph




Re: question: optimize

2007-01-16 Thread James liu

error information: "Exception during commit/optimize: java.io.EOFException:
no more data available - expected end tag </optimize> to close start
tag <optimize> from line 1, parser stopped on START_TAG seen ..."

It works well when I check the index data and search.

I want to know why it happens and how to fix it.

anyone with same question?

i use jetty.




2007/1/16, James liu <[EMAIL PROTECTED]>:


I find "Exception during commit/optimize: java.io.EOFException: no more
data" when I index my data.




--
regards
jl





--
regards
jl


question: optimize

2007-01-16 Thread James liu

I find "Exception during commit/optimize: java.io.EOFException: no more
data" when I index my data.




--
regards
jl


Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz

On 1/16/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:


...Have a look at
https://issues.apache.org/jira/browse/SOLR-86...


Right, I should have mentioned this one as well. I have linked SOLR-20
and SOLR-86 now, so that people can see the various options for Java
clients.

-Bertrand


Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz

On 1/16/07, Pavel Penchev <[EMAIL PROTECTED]> wrote:


...What about the case where solr and my application are deployed in the
same instance of say tomcat. Is there a way to skip the http requests
and use a direct api?...


The javax.servlet.RequestDispatcher interface allows you to access
other resources (including servlets) running in the same container.
I've never used it but it looks like what you'd need (including a
custom HttpServletResponse class to capture the other servlet's
output).

See http://java.sun.com/j2ee/1.4/docs/tutorial/doc/Servlets9.html#wp64684
which is part of
http://java.sun.com/j2ee/1.4/docs/tutorial/doc/index.html

Depending on how much faster this is than going the HTTP way, it might
be interesting to include it as another protocol in a Java Solr
client.

-Bertrand


Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Thorsten Scherler
On Tue, 2007-01-16 at 12:52 +0100, [EMAIL PROTECTED] wrote:
> Thanks!
> 
> and how would you do it calling it from another web application, let's  
> say from a servlet or so? I need to do some stuff in my web java code,  
> then call the Solr service and do some more stuff afterwards
> 

Have a look at 
https://issues.apache.org/jira/browse/SOLR-86

HTH

salu2




Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Pavel Penchev

A newbie question on the same topic:
What about the case where Solr and my application are deployed in the 
same instance of, say, Tomcat? Is there a way to skip the HTTP requests 
and use a direct API?


Regards,
Pavel



Bertrand Delacretaz wrote:
On 1/16/07, [EMAIL PROTECTED] 
<[EMAIL PROTECTED]> wrote:


...and how would you do it calling it from another web application, 
let's

say from a servlet or so?...


It doesn't make much difference whether your client is standalone or a
web application: your Solr client class will need to be configured with
the base URL of the Solr server; it will make HTTP requests to it and
parse the results as needed.

-Bertrand






Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz

On 1/16/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


...and how would you do it calling it from another web application, let's
say from a servlet or so?...


It doesn't make much difference whether your client is standalone or a
web application: your Solr client class will need to be configured with
the base URL of the Solr server; it will make HTTP requests to it and
parse the results as needed.

-Bertrand


Re: Calling Solr requests from java code - examples?

2007-01-16 Thread maarten

Thanks!

and how would you do it when calling it from another web application, let's  
say from a servlet or so? I need to do some stuff in my web Java code,  
then call the Solr service, and do some more stuff afterwards.



Quoting Bertrand Delacretaz <[EMAIL PROTECTED]>:

On 1/16/07, [EMAIL PROTECTED]   
<[EMAIL PROTECTED]> wrote:



...Could someone give me some code examples on how Solr requests can be
called by Java code...


Although our Java client landscape is still a bit fuzzy (there are
several variants floating around), you might want to look at the code
found in http://issues.apache.org/jira/browse/SOLR-20

If you're new to Java, I'd recommend playing with HttpClient first
(http://jakarta.apache.org/commons/httpclient/), see the tutorial
there for the basics.

The standard Java library classes are also usable to write HTTP
clients, but HttpClient will help a lot in getting the "details"
right, if you don't mind depending on that library.

-Bertrand






Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz

On 1/16/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


...Could someone give me some code examples on how Solr requests can be
called by Java code...


Although our Java client landscape is still a bit fuzzy (there are
several variants floating around), you might want to look at the code
found in http://issues.apache.org/jira/browse/SOLR-20

If you're new to Java, I'd recommend playing with HttpClient first
(http://jakarta.apache.org/commons/httpclient/), see the tutorial
there for the basics.

The standard Java library classes are also usable to write HTTP
clients, but HttpClient will help a lot in getting the "details"
right, if you don't mind depending on that library.
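As a minimal sketch of the standard-library route (no HttpClient dependency), assuming the default Solr URL from the tutorial; building the URL with properly encoded parameters is the main "detail" to get right:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class SolrQuery {
    // Builds a Solr select URL, encoding the query so spaces and
    // special characters survive the trip.
    public static String buildUrl(String base, String query) throws Exception {
        return base + "/select?q=" + URLEncoder.encode(query, "UTF-8");
    }

    // Fetches the raw XML response body with only standard library classes.
    public static String fetch(String url) throws Exception {
        URLConnection conn = new URL(url).openConnection();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        in.close();
        return sb.toString();
    }
}
```

Usage would be something like `fetch(buildUrl("http://localhost:8983/solr", "video ipod"))`, after which you parse the returned XML. HttpClient adds things like connection reuse and better error handling on top of this.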

-Bertrand


Calling Solr requests from java code - examples?

2007-01-16 Thread maarten

Hi,

Could someone give me some code examples of how Solr requests can be  
called from Java code? I'm new to Java and I'm not very sure how URLs  
+ params can be requested from Java code and how the responses can be  
captured. Or what the best practices are?


Grtz