Re: Simple Faceted Searching out of the box

2006-09-10 Thread Tim Archambault

For those using PHP to interface with Solr, can you explain to me how your PHP
code interacts with Solr? Does PHP build a query string manually and send
a URL like this:
http://localhost:8983/solr/select?q=vertical%3Ajobs+accounting&version=2.1&start=0&rows=10&fl=&qt=standard&stylesheet=&indent=on&explainOther=&hl.fl=
for example, and then use some PHP command to read the page and parse
the result?
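A minimal sketch of that pattern, written in Python rather than PHP or ColdFusion (the flow is the same in any language): build the query string, issue an HTTP GET against the select URL, and parse the XML that comes back. The host, port, and parameters are taken from the URL above. The response layout it expects (a result element carrying a numFound attribute, with one doc element per hit) is assumed from Solr's XML response format, and only single-valued fields are handled.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Assumed local Solr instance, as in the URL above.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_select_url(query, start=0, rows=10):
    """Build a select URL; urlencode escapes ':' as %3A and spaces as '+'."""
    params = {"q": query, "version": "2.1", "start": start, "rows": rows,
              "indent": "on"}
    return SOLR_SELECT + "?" + urlencode(params)

def parse_response(xml_text):
    """Return (numFound, docs) from an XML response body.

    Each child of a doc element (str, int, ...) carries the field name in its
    name attribute. Multi-valued (arr) fields are not handled in this sketch.
    """
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = [{field.get("name"): field.text for field in doc}
            for doc in result.findall("doc")]
    return int(result.get("numFound")), docs

def search(query):
    """Fetch the page over HTTP and parse it (requires a running server)."""
    with urlopen(build_select_url(query)) as resp:
        return parse_response(resp.read())
```

In ColdFusion the same two steps would be an HTTP request for the URL followed by XML parsing of the returned body; the specific tags to use are left to the CF documentation.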

I'm not much of a programmer, but I do know ColdFusion, so I'm trying to
apply the PHP principles to CF.

Thanks for any and all help.

Tim


On 9/10/06, Erik Hatcher [EMAIL PROTECTED] wrote:



On Sep 9, 2006, at 9:09 AM, Tim Archambault wrote:
 I need to understand this then. Thanks. I want to use Solr for our
 newspaper website and this would be a great way to break out content.
 Kind of greys the lines between what is search and what is browsing
 categories, which is a great thing actually. Thanks for the help.

Greys the lines indeed.  There isn't any difference between search
and browse in my view now.  Let's just call it findability :)  (By
the way, Ambient Findability is a fantastic book.)

   Erik




Re: Simple Faceted Searching out of the box

2006-09-10 Thread Chris Hostetter

:   What is faceted browsing? Maybe an example of a site interface

Whoops! ... sorry about that, I tend to get ahead of myself.

The examples Erik pointed out are very representative, but there are more
subtle ways faceted searching can come into play -- for example, if you
look at these two search results...

   http://shopper-search.cnet.com/search?q=gta
   http://shopper-search.cnet.com/search?q=ipod

...the categories in the left nav change based on what you search on,
because we treat category as a facet, and the individual categories as
possible constraints ... we don't show the user the exact count of how
many products match in each category, but we use that information to
determine the order of the categories (or whether we should include a
category in the list at all).
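A rough Python sketch of the idea described above: count how many matching documents fall into each category, then use the counts only to order (and prune) the category list, never to display them. The category field name, the min_count threshold, and the limit are hypothetical, and this is not CNET's actual code; with Solr, the per-category counts would come from the server's faceting rather than from counting documents on the client.

```python
from collections import Counter

def category_nav(matching_docs, min_count=1, limit=5):
    """Order category constraints by how many matching docs fall in each.

    matching_docs: iterable of dicts with a hypothetical "category" field.
    Categories with fewer than min_count hits are dropped. The counts are
    used only for ordering and are not returned to the caller.
    """
    counts = Counter(doc["category"] for doc in matching_docs)
    ranked = [cat for cat, n in counts.most_common() if n >= min_count]
    return ranked[:limit]
```

Because only the ranking is exposed, the same counts can drive both "which categories come first" and "should this category appear at all" (via min_count).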

: website and this would be a great way to break out content. Kind of greys
: the lines between what is search and what is browsing categories, which is a
: great thing actually. Thanks for the help.

Even without facets, browsing a set of documents is just a search for
all documents (or, depending on who you talk to, searching is just
browsing with a special user-entered constraint on the text facet).
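To make that symmetry concrete, here is a toy Python model (no Solr involved) in which browsing and searching are one operation: facet constraints narrow the set, and the free-text query is just one more optional constraint on a hypothetical "text" field. With no constraints at all, the result is every document.

```python
def find_docs(docs, text=None, **constraints):
    """Return docs matching every facet constraint plus an optional text one.

    Browsing is find_docs(docs, **constraints) with no text; a text search is
    browsing with one extra (case-insensitive substring) constraint on "text".
    """
    hits = []
    for doc in docs:
        # Every facet constraint must match exactly.
        if any(doc.get(field) != value for field, value in constraints.items()):
            continue
        # The user-entered text is treated as one more constraint.
        if text is not None and text.lower() not in doc.get("text", "").lower():
            continue
        hits.append(doc)
    return hits
```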




-Hoss



Re: Got it working! And some questions

2006-09-10 Thread Chris Hostetter

: - What is the loadFactor variable of HashDocSet? Should I optimize it too?

this is the same as the loadFactor in a HashMap constructor -- but I don't
think it has much effect on performance since the HashDocSets never
grow.

I personally have never tuned the loadFactor :)
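For intuition about what this knob does in a hash table that is sized once, up front, the usual rule is sketched below in Python: a lower load factor means a larger, sparser table. This is a generic power-of-two sizing rule assumed for illustration; it is not copied from Solr's HashDocSet.

```python
import math

def table_size(num_entries, load_factor=0.75):
    """Smallest power-of-two table keeping num_entries / size <= load_factor."""
    needed = math.ceil(num_entries / load_factor)
    size = 1
    while size < needed:
        size <<= 1  # double until the table is large enough
    return size
```

For example, 100 entries need a 256-slot table at a load factor of 0.75 but a 512-slot table at 0.25. Since these sets never grow, there are no later rehashes to tune for; the setting only trades memory for slightly shorter probe sequences.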

: - What are the units of the size value of the caches? Megs, number of
: queries, kilobytes? Not described anywhere.

entries ... the number of items allowed in the cache.
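In other words, size caps how many entries a cache holds, regardless of how large each entry is. A minimal Python sketch of an LRU cache with that kind of limit follows; the class name is hypothetical, and Solr's own cache implementations may differ in detail.

```python
from collections import OrderedDict

class EntryBoundedLRU:
    """LRU cache whose size limit counts entries, not bytes (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.size:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._data)
```

The memory a cache actually uses therefore depends on what each entry holds: a cache of 512 large document sets costs far more heap than a cache of 512 small ones, even though both have the same size setting.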

: - Any way to programmatically change the OR/AND preference of the query
: parser? I set it to AND by default for user queries, but I'd like to set
: it to OR for some server-side queries I must do (find related articles,
: order by score).

you mean using StandardRequestHandler? ... not that I can think of off the
top of my head, but typically it makes sense to just configure what you
want for your users in the schema, and then make any machine-generated
queries explicit.
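For a related-articles lookup, a hedged Python sketch of what an explicit machine-generated query could look like: join the extracted terms with an explicit OR (the Lucene query syntax expects the operator in upper case) and backslash-escape query-syntax characters, so the schema's default operator no longer matters. The helper names are hypothetical, and the escape set is assumed from the Lucene query parser's list of special characters.

```python
# Characters with special meaning in Lucene query syntax (escaped with a backslash).
SPECIAL = set('+-&|!(){}[]^"~*?:\\')

def escape(term):
    """Backslash-escape every special character in a single term."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)

def explicit_or_query(terms):
    """Join non-empty terms with an explicit OR, independent of the default operator."""
    return " OR ".join(escape(t) for t in terms if t)
```

For example, explicit_or_query(["solr", "lucene"]) gives "solr OR lucene", while the term "c++" is sent as c\+\+ so it is not read as two required-clause markers.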

: - What's the difference between the two commit types, blocking and
: non-blocking? Didn't see any differences at all; tried both.

do you mean the waitFlush and waitSearcher options?
If either of those is true, you shouldn't get a response back from the
server until the flush (or the new searcher) is done.  If they are false,
then the server should respond immediately, even if the operation takes
several seconds (or maybe even minutes) to complete (optimizes can take a
while in some cases -- as can opening new searchers if you have a lot of
cache warming configured).
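Assuming the XML update syntax, the two options are attributes on the commit element. A Python sketch that builds the message and posts it follows; the update URL is assumed to be the example server's default, and the defaults of true used here should be checked against your Solr version.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/update"  # assumed default update URL

def commit_message(wait_flush=True, wait_searcher=True):
    """Build a commit update message with the two blocking options as attributes."""
    elem = ET.Element("commit", waitFlush=str(wait_flush).lower(),
                      waitSearcher=str(wait_searcher).lower())
    return ET.tostring(elem, encoding="unicode")

def post_update(xml_body, url=UPDATE_URL):
    """POST an XML update message and return the raw response body."""
    req = Request(url, data=xml_body.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.read()
```

Sending post_update(commit_message(False, False)) should return as soon as the commit is accepted. With both options true, the call should not return until the new searcher is ready.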

: - Every time I do an optimize command, I get the following in my
: catalina logs - should I do anything about it?

the optimize command needs to be well-formed XML: try <optimize/>
instead of just <optimize>
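A quick client-side check in Python can catch this kind of mistake before the message reaches the server. The helper is hypothetical, and it only tests well-formedness, not whether Solr understands the command.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Here is_well_formed("<optimize/>") is True, while the unclosed "<optimize>" fails to parse.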

: - Any benefit to setting the allowed memory for Tomcat higher? Right
: now I'm allocating 384 megs.

the more memory you've got, the more caching you can support .. but if
your index changes so frequently compared to the rate of *unique*
queries you get that your caches never fill up, it may not matter.
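The trade-off can be put as a back-of-the-envelope model in Python. This is a simplifying assumption, not a description of Solr's internals: each commit opens a new searcher whose caches start empty (autowarming is ignored), and each unique query adds one entry.

```python
def peak_cache_entries(unique_queries_per_min, commit_interval_min, cache_size):
    """Rough upper bound on how full a query cache gets between commits.

    Simplified model: caches start empty after each commit (no autowarming),
    and every unique query adds exactly one entry until the size cap is hit.
    """
    return min(cache_size, unique_queries_per_min * commit_interval_min)
```

With 20 unique queries a minute and a commit every 5 minutes, a 512-entry cache peaks at about 100 entries, so extra heap for it buys nothing. With hourly commits, the same traffic fills it (512), and more memory for a larger cache could start to pay off.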




-Hoss



Re: Re: IIS web server and Solr integration

2006-09-10 Thread Jeff Rodenburg

Tim -

If you can help it, I would suggest running Solr under Tomcat under Linux.
Speaking from experience in a mixed-mode environment, the Linux/Tomcat/Solr
implementation just works.  We're not newbies under Linux, but we're also a
native Windows shop.  The memory management and system availability are just
outstanding in that stack.

If you must run Windows, Tomcat does integrate with IIS, but be prepared to
jump through a few hoops.  Spend time on making that combination work, and
you'll be 90% there.

Hope this helps.

-- j

On 9/10/06, Tim Archambault [EMAIL PROTECTED] wrote:


Good news. The rookie did just that. Thanks, Chris. I'm just having a
difficult time figuring out how to send my query parameters to the engine
from ColdFusion [intelligently]. I'm going to download the PHP app and see
if I can figure it out. Having lots of fun with this for sure.

Tim

On 9/10/06, Chris Hostetter [EMAIL PROTECTED] wrote:

 : Should it run on a separate port from IIS, or be integrated using the ISAPI
 : plug-in?

 I can't make any specific recommendations about Windows or IIS, but I
 personally wouldn't run Solr in the same webserver/appserver that your
 users hit -- from a security standpoint, I would protect your Solr
 instance the same way you would protect a database: let the applications
 running in your webserver connect to it and run queries against it, but
 don't expose it to the outside world directly.


 -Hoss
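One way to follow that advice, sketched in Python under assumed names: the public web application accepts a few user parameters, drops everything else, and builds the query against a Solr host that is not reachable from outside. The host name, whitelist, and row cap are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical internal address, reachable only from the application servers.
INTERNAL_SOLR = "http://solr.internal:8983/solr/select"
ALLOWED = {"q", "start", "rows"}
MAX_ROWS = 50

def backend_url(user_params):
    """Forward only whitelisted user parameters to the internal Solr instance.

    Raises ValueError if a non-numeric rows value is supplied.
    """
    params = {k: v for k, v in user_params.items() if k in ALLOWED}
    if "rows" in params:
        params["rows"] = min(int(params["rows"]), MAX_ROWS)  # cap page size
    return INTERNAL_SOLR + "?" + urlencode(params)
```

Anything outside the whitelist, such as a user-supplied qt, never reaches Solr. The browser only ever talks to the application.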





Re: Got it working! And some questions

2006-09-10 Thread Michael Imbeault
First of all, it seems the mailing list is having some trouble. Some of
my posts end up in the wrong thread (even new threads I post), I don't
receive them in my mail, and they're present only in the 'date archive'
of http://www.mail-archive.com, not in the 'thread' one. I also don't
receive some of the other people's posts in my mail; the problems started
last week, I think.


Secondly, Chris, thanks for all the useful answers; everything is much
clearer now. This info should be added to the wiki, I think; should I do
it? I'm still a little disappointed that I can't change the OR/AND
parsing by just changing some parameter (like I can for the number of
results returned, for example); adding an OR between each word in the
text I want to compare sounds suboptimal, but I'll probably do it that
way. It's a very minor nitpick; Solr is awesome, as I said before.


@ Brian Lucas: Don't worry, solrPHP was still 99.9% functional, great
work; the part about it sending one doc at a time was my fault, since I was
following the exact sequence (add to array, submit) displayed in the docs.
The only thing that could be added is a big //TODO: change this code
before the sections you have to change to make it work for a particular
schema. I'm pretty sure the custom-header curl submit works for everyone
but me; I'm on a Windows test box with WAMP on it, so it may be caused by
that. I'll send you the changes I made to the code tomorrow anyway; as I
said, nothing major.
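For reference, the XML update format accepts several documents inside a single add element, so a client can send one request per batch instead of one per document. Below is a Python sketch of building such a message. The field names are hypothetical, and the result would be POSTed to the update URL like any other update command.

```python
import xml.etree.ElementTree as ET

def batch_add_message(docs):
    """Build one add message containing a doc element per dict in docs.

    Field names must match your schema; the ones in the example are hypothetical.
    ElementTree escapes characters such as '&' when serializing.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_elem = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_elem, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")
```

Building the whole batch first and submitting it once avoids the add-one-doc, submit, repeat pattern.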



--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212