Re: Making stemming dynamic at query time

2007-12-18 Thread Bertrand Delacretaz
On Dec 18, 2007 9:41 PM, Kamran Shadkhast [EMAIL PROTECTED] wrote:

 ...it would be great if we could dynamically control this during
 search if we want to search with stemming or not

The easiest is probably to have two copies of your field, using
copyField, one stemmed and one not, and search in one or the other.
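
A minimal sketch of the query side of this in Python, assuming two hypothetical field names (text_stemmed and text_exact are not from the thread) and the usual /select URL. It only builds the URL and leaves the HTTP call out:

```python
from urllib.parse import urlencode

# Hypothetical field names, not from the original thread: "text_stemmed" is
# assumed to be the copy analyzed with a stemmer, "text_exact" the copy
# analyzed without one.
STEMMED_FIELD = "text_stemmed"
EXACT_FIELD = "text_exact"


def build_select_url(base_url, term, use_stemming):
    """Return a /select URL that searches the stemmed or the unstemmed copy."""
    field = STEMMED_FIELD if use_stemming else EXACT_FIELD
    # urlencode percent-encodes the ":" between field name and term.
    return base_url + "/select?" + urlencode({"q": field + ":" + term})
```

The stemming choice then becomes an ordinary per-request flag in the application, with no change to the index.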

-Bertrand


Re: Which terms in the query match

2007-10-17 Thread Bertrand Delacretaz
On 10/16/07, Nishant Soni [EMAIL PROTECTED] wrote:

 ...So is there a way to query solr about which of the tokens in the query
 actually matched ?...

The analyzer admin page should help, see
http://wiki.apache.org/solr/FAQ#head-b25df8c8393bbcca28f1f344c432975002e29ca9

-Bertrand


Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thierry Collogne [EMAIL PROTECTED] wrote:

 ..when we search for matthé or for matthe, we get two totally
 different results

The analyzer admin tool should help you find out what's happening, see
http://wiki.apache.org/solr/FAQ#head-b25df8c8393bbcca28f1f344c432975002e29ca9

-Bertrand


Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thierry Collogne [EMAIL PROTECTED] wrote:

 ...Thank you very much. Moving the
 <filter class="solr.ISOLatin1AccentFilterFactory"/> up in the chain fixed it

Yes, the problem was the EnglishPorterFilterFactory before the accents
removal: the stemmer doesn't know about accents, so no stemming
occurred on matthé, whereas matthe was stemmed to matth.
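
The effect of filter order can be illustrated with a toy Python sketch. The toy_stem function below is a made-up stand-in that only drops a trailing ASCII "e"; it is not the Porter algorithm, and the filter chain is just a list of functions:

```python
import unicodedata


def strip_accents(token):
    """Remove combining accents, e.g. turn "é" into "e"."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def toy_stem(token):
    """Toy stand-in for a stemmer: drop one trailing ASCII "e" if present."""
    return token[:-1] if token.endswith("e") else token


def analyze(token, filters):
    """Apply the filters to the token in the given order."""
    for apply_filter in filters:
        token = apply_filter(token)
    return token
```

With the stemmer first, matthé and matthe end up as different terms (matthe and matth); with accent removal first, both end up as matth.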

BTW, your rené example makes me think you're indexing French. If
that's the case, you might want to use a stemmer configured for that
language, for example

<filter
  class="solr.SnowballPorterFilterFactory"
  language="French"/>

-Bertrand


Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thorsten Scherler [EMAIL PROTECTED] wrote:
 ...Bertrand, does the French Snowball work fine?...

I've seen some weirdnesses, like tennis and tenir (means to hold)
both stemmed to ten, but in all of our (simple) tests it was ok.

The application where we're using it does not require high precision
though, so it looked good enough and we didn't create very
extensive tests for it.

-Bertrand


Re: SOLR developer

2007-08-30 Thread Bertrand Delacretaz
On 8/31/07, Tim Archambault [EMAIL PROTECTED] wrote:

 ...I'm thinking of sending a similar
 list-serv item out, but I noticed this is a solr-user list, not necessarily
 a developers list so I thought I'd ask

Note that there's also [EMAIL PROTECTED] for such purposes, see
http://www.apachenews.org/archives/000465.html

But AFAIK, project-related job offers are ok on ASF lists, preferably
with a [JOB] marker in the subject line.

-Bertrand (*not* available for consulting ATM, and currently inactive
on Solr anyway)


Re: solr question

2007-07-21 Thread Bertrand Delacretaz

On 7/21/07, Alessandro Ferrucci [EMAIL PROTECTED] wrote:


... the user could enter the following combinations of words:
... WORD WORD
...where the second instance is either last-name first-name OR
first-name last-name. ...


The dismax handler can indeed search terms in several fields, but I'd
also suggest, as an alternative, copying all names to an additional
allnames field at indexing time. This is done using copyField in
your schema.xml, see http://wiki.apache.org/solr/SchemaXml and the Solr
example schema.xml.

You can then search in this allnames field when you don't know if
terms belong to the first or last names, and also easily combine this
with other searches, boost it, etc.
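
As a rough Python sketch of the query side, assuming the combined field is called allnames as above and that the standard Lucene query syntax is in use. The helper name is made up, and it does no escaping of special query characters:

```python
def allnames_query(words, field="allnames"):
    """Require every word to match in the combined names field, in any order."""
    # No escaping is done here; words containing Lucene special characters
    # would need extra handling.
    return field + ":(" + " AND ".join(words) + ")"
```

Because both names live in the same field, the order in which the user typed them does not matter.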

-Bertrand


Re: LIUS/Fulltext indexing

2007-06-12 Thread Bertrand Delacretaz

On 6/12/07, Yonik Seeley [EMAIL PROTECTED] wrote:


... I think Tika will be the way forward (some of the code for Tika is
coming from LIUS)...


Work has indeed started to incorporate the Lius code into Tika, see
https://issues.apache.org/jira/browse/TIKA-7 and
http://incubator.apache.org/projects/tika.html

-Bertrand


Re: LIUS/Fulltext indexing

2007-06-12 Thread Bertrand Delacretaz

On 6/12/07, Vish D. [EMAIL PROTECTED] wrote:

...Sounds interesting. I can't seem to find any clear dates on the project
website. Do you know? ...V1 shipping date?...


Not at the moment, Tika just entered incubation and it's impossible to
predict what will happen.

But help is welcome, of course ;-)

-Bertrand


Re: how to crawl when Solr is search engine?

2007-06-07 Thread Bertrand Delacretaz

On 6/7/07, Ian Holsman [EMAIL PROTECTED] wrote:


...it's called XSLT. Most modern browsers can do the transform on the
client side.
Otherwise there are some server-side tools (Cocoon, I think, does this) to
do the transform on the server before sending it out


Solr also does server-side XSLT, see
http://wiki.apache.org/solr/XsltResponseWriter

-Bertrand


Re: Solr in Windows

2007-04-26 Thread Bertrand Delacretaz

On 4/26/07, guruprasad [EMAIL PROTECTED] wrote:


...Is it only for Linux or can I install
Solr on my Windows Desktop too?...


Solr itself should run fine on any 1.5 JVM, including on Windows (and
several Solr developers are working on Windows, IIUC).

Some of our docs refer to auxiliary scripts that do not run under
plain windows.

The SimplePostTool described in
http://lucene.apache.org/solr/tutorial.html helps, it's not released
yet but you can get it from
https://issues.apache.org/jira/browse/SOLR-194

-Bertrand


Re: Re[2]: Things are not quite stable...

2007-04-25 Thread Bertrand Delacretaz

On 4/25/07, Jack L [EMAIL PROTECTED] wrote:


...Maybe it's time to think about upgrading Jetty...


It's in the pipeline, see https://issues.apache.org/jira/browse/SOLR-128

-Bertrand


Re: Re[6]: Things are not quite stable...

2007-04-25 Thread Bertrand Delacretaz

On 4/25/07, Jack L [EMAIL PROTECTED] wrote:


...Regardless, I think it's a good idea to use a newer, released (not RC)
version in general, considering 5.1 is one major version behind


Agreed, but note that we don't have any factual evidence that the
Jetty RC that we use is indeed the cause of SOLR-118, so upgrading
might not solve the problem.

We're just at the wild guess stage at this point, and many of us have
never seen the problem. In my case, we have more urgent stuff to do
before looking at the problem in more detail.

-Bertrand


Re: snapshooter on OS X

2007-04-22 Thread Bertrand Delacretaz

On 4/23/07, Grant Ingersoll [EMAIL PROTECTED] wrote:

...The error says something about command not found line 15, but all the
files I looked at, line 15 was a comment...


Running your script with

 bash -x myscript

should help, it will echo commands before executing them.

-Bertrand


Re: finalizer() in SolrCore (was: Commits and Container Shutdown)

2007-04-16 Thread Bertrand Delacretaz

On 4/16/07, Yonik Seeley [EMAIL PROTECTED] wrote:


...Yes, it's a typo.


Fixed in revision 529367.

-Bertrand


finalizer() in SolrCore (was: Commits and Container Shutdown)

2007-04-15 Thread Bertrand Delacretaz

On 4/16/07, Erik Hatcher [EMAIL PROTECTED] wrote:


...Further details on this: SolrCore has a finalizer() method that
closes the update handler.  I'm not clear on finalizer() though.  How/
when is that invoked?   I know about Object.finalize(), but not
finalizer()...


Looking at the code, it seems like SolrCore.finalizer() is not called
anywhere. A typo maybe?

There's also a similar SolrIndexWriter.finalizer().

-Bertrand


Re: Solr Query Language

2007-04-15 Thread Bertrand Delacretaz

On 4/16/07, Jack L [EMAIL PROTECTED] wrote:


Is the lucene query syntax available in solr? ...


The syntax depends on the request handler used. If you're using the
standard one, the docs are at

 http://wiki.apache.org/solr/StandardRequestHandler

-Bertrand


Re: Posting PDF,DOC,TXT

2007-04-06 Thread Bertrand Delacretaz

On 4/6/07, Suresh Kannan [EMAIL PROTECTED] wrote:

I would like to post PDF, DOC, TXT into SOLR to do the indexing.


There's no way to do that directly at the moment, you'll need to
convert them to the XML format that Solr expects.

The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ lists a
number of tools that can help extract content and metadata from
various formats.

-Bertrand


Re: Solr logo poll

2007-04-06 Thread Bertrand Delacretaz

On 4/6/07, Yonik Seeley [EMAIL PROTECTED] wrote:


...What form of logo do you prefer, A or B?


B

-Bertrand (a Tex Avery fan ;-)


Re: Instructables on solr

2007-04-05 Thread Bertrand Delacretaz

On 4/4/07, Ryan McKinley [EMAIL PROTECTED] wrote:


...We have been running solr for months as a band-aid, this release
integrates solr deeply...


Awesome - thanks for sharing this!

If you don't mind, it'd be cool to add some info to
http://wiki.apache.org/solr/PublicServers

-Bertrand


Re: Reposting unABLE to match

2007-03-27 Thread Bertrand Delacretaz

On 3/27/07, Shridhar Venkatraman [EMAIL PROTECTED] wrote:

...Reposting unABLE to match

No need to repost if your message made it to the list.

If it hasn't been answered yet, it either means that no one knows the
answer or that no one has had the time to answer yet. We're all
volunteers here.

-Bertrand


Re: schema field type doesn't work

2007-03-24 Thread Bertrand Delacretaz

On 3/24/07, Dimitar Ouzounov [EMAIL PROTECTED] wrote:


...I must be doing something wrong, maybe in the schema. Does anyone
have any suggestions?..


The best way to debug such problems is with the analyzer admin tool:
http://localhost:8983/solr/admin/analysis.jsp

You can try various combinations of analyzers and see what Solr
actually indexes for various values.

HTH,
-Bertrand


Re: How to assure a permanent index.

2007-03-21 Thread Bertrand Delacretaz

On 3/21/07, Thierry Collogne [EMAIL PROTECTED] wrote:


...I mean if I do the following.

 -  delete all documents from the index
 -  add all documents
 -  do a commit.

Will this result in a temporary empty index, or will I always have results?...


Changes to the index are invisible to the search components until a
<commit/> is sent to Solr, so you should be fine (although personally
I'd feel safer replacing documents in smaller batches).
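
The delete / add / commit sequence could be sketched in Python as below. This only builds the XML update messages and does not send them; the match-all delete query *:* is an assumption about what the Solr version in use accepts, and the function name is made up:

```python
from xml.sax.saxutils import escape, quoteattr


def full_reindex_messages(docs):
    """Build the update messages: delete everything, add all docs, commit."""
    # Assumption: delete-by-query with *:* is supported by the Solr in use.
    messages = ["<delete><query>*:*</query></delete>"]
    for doc in docs:
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(name), escape(value))
            for name, value in doc.items()
        )
        messages.append("<add><doc>" + fields + "</doc></add>")
    # Per the explanation above, searches keep seeing the old index until
    # this last message is processed.
    messages.append("<commit/>")
    return messages
```

Each message would be POSTed to the update URL in order, with the commit sent only once, at the end.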

You could also use the index switching mechanism used when
replicating Solr indexes (see
http://wiki.apache.org/solr/CollectionDistribution) to prepare the
index in another Solr instance and activate it instantly when needed.

-Bertrand


Re: Problems with special characters

2007-03-21 Thread Bertrand Delacretaz

On 3/21/07, Thierry Collogne [EMAIL PROTECTED] wrote:


...I am using the post.jar file to update the search indexes. Problem is that
foreign characters like é, à, ... don't work correctly...


You're right, I have entered the issue in
https://issues.apache.org/jira/browse/SOLR-194

For now, using this as a workaround should help:

java -Dfile.encoding=UTF-8 -jar post.jar
http://localhost:8983/solr/update utf8-example.xml
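
To illustrate what probably goes wrong without the workaround, here is a small Python sketch. It assumes the file is UTF-8 and that the platform default is something like ISO-8859-1; the actual default on the affected machine is not known from the thread:

```python
def reinterpret(text, written_as, read_as):
    """Encode text with one charset, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as)
```

Reading UTF-8 bytes as ISO-8859-1 turns é into the two characters Ã©, which is the typical symptom of this kind of mismatch.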

-Bertrand


Re: Problems with special characters

2007-03-21 Thread Bertrand Delacretaz

On 3/21/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote:


...For now, using this as a workaround should help:

java -Dfile.encoding=UTF-8 -jar post.jar
http://localhost:8983/solr/update utf8-example.xml..


Should be fixed now: if you can grab the latest SimplePostTool code [1],
it should work irrespective of the default JVM encoding. Please confirm
if you test it.

It's a kind of brute force fix, I have hardcoded the encoding as
UTF-8, I'm keeping SOLR-194 open so that we don't forget to fix this
(but considering SOLR-190 it's not urgent to fix).

-Bertrand

[1] 
https://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java


Re: Date range boost

2007-03-12 Thread Bertrand Delacretaz

On 3/12/07, stefano nicolai [EMAIL PROTECTED] wrote:


...All of these items have a field containing the date they were created
(it's a string field at the moment, as i have this type inside my DB).

I want to give a higher score to the ones with the most recent date...


You should be able to use boost functions for this, see for example
http://www.mail-archive.com/solr-user@lucene.apache.org/msg01877.html

and

http://lucene.apache.org/solr/api/org/apache/solr/search/QueryParsing.html#parseFunction(java.lang.String,%20org.apache.solr.schema.IndexSchema)

-Bertrand


Re: production solr - app server choice ?

2007-03-10 Thread Bertrand Delacretaz

On 3/9/07, rubdabadub [EMAIL PROTECTED] wrote:


...The site is a local portal and the traffic is very high and I am not
sure if Jetty is enough maybe it is


Just an additional note on this: asking four people about what very
high traffic means might also give you five different answers ;-)

FWIW, I've been testing Solr on the plain Jetty example config at more
than 100 semi-random queries per second and it ran just fine, on a
medium-range server (dual Xeon 2 GHz IIRC).

But this is with our data and our type of queries - I agree with Erik
that testing is the only way to find out how your setup will perform
with your own data and queries.

Simply generating a lot of semi-random requests from a collection of
possible query parameters, and feeding the resulting URLs to multiple
instances of curl or wget to generate some load, will tell you a lot
about how your setup performs, and where the hotspots are.
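
A minimal Python sketch of the URL generation step, assuming a /select endpoint and a made-up dictionary of candidate parameter values. The seed only makes runs repeatable:

```python
import random
from urllib.parse import urlencode


def random_query_urls(base_url, param_choices, count, seed=0):
    """Build `count` query URLs by picking one value per parameter at random."""
    rng = random.Random(seed)  # seeded so that a test run can be repeated
    urls = []
    for _ in range(count):
        params = {name: rng.choice(values) for name, values in param_choices.items()}
        urls.append(base_url + "/select?" + urlencode(params))
    return urls
```

The resulting list can be written to a file, one URL per line, and fed to several curl or wget processes in parallel.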

-Bertrand


Re: Adding data as UTF-8

2007-03-10 Thread Bertrand Delacretaz

On 3/10/07, Walter Underwood [EMAIL PROTECTED] wrote:

It is better to use application/xml. See RFC 3023.
Using text/xml; charset=UTF-8 will override the XML
encoding declaration. application/xml will not...


I agree, but did you try this with our example setup, started with
java -jar start.jar?

It doesn't seem to work here: If I change our example/exampledocs/post.sh to use

  curl $URL --data-binary @$f -H 'Content-type:application/xml'

instead of

 curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'

the encoding declaration of my posted XML is ignored, characters are
interpreted according to my JVM encoding (-Dfile.encoding makes a
difference in that case).

Are you seeing something different, or do you know why this is so?

-Bertrand


Re: Adding data as UTF-8

2007-03-10 Thread Bertrand Delacretaz

On 3/10/07, Walter Underwood [EMAIL PROTECTED] wrote:

If it does something different, that is a bug. RFC 3023 is clear. --wunder..


Sure - just wanted to confirm what I'm seeing, thanks!

-Bertrand


Re: production solr - app server choice ?

2007-03-09 Thread Bertrand Delacretaz

On 3/9/07, rubdabadub [EMAIL PROTECTED] wrote:


...I am wondering what everyone is using when it comes to app server i.e.
Jetty, Resin, Tomcat etc


I suspect that asking four people might give you five different
answers on this one ;-)

Whichever servlet container you use, IMHO the important thing is to
learn how to tune it according to your needs, traffic
patterns, hardware and software environment, etc.

-Bertrand


Re: Error with bin/optimize and multiple solr webapps

2007-03-07 Thread Bertrand Delacretaz

On 3/7/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:

Oops, my bad I didn't see either 186 or 187 before entering 188.  :-)


I have closed SOLR-186 and SOLR-187 as duplicates, please add relevant
info to SOLR-188 if needed.

-Bertrand


Re: merely a suggestion: schema.xml validator or better schema validation logging

2007-03-03 Thread Bertrand Delacretaz

On 3/3/07, Ryan McKinley [EMAIL PROTECTED] wrote:


...The rationale with the solrconfig stuff is that a broken config should
behave as best it can.  This is great if you are running a real site
with people actively using it - it is a pain in the ass if you are
getting started and don't notice errors


I think it's a PITA in any case, I like my systems to fail loudly when
something's wrong in the configs (with details about what's happening,
of course).

-Bertrand


Re: merely a suggestion: schema.xml validator or better schema validation logging

2007-03-01 Thread Bertrand Delacretaz

On 3/2/07, Jed Reynolds [EMAIL PROTECTED] wrote:


...my first try at defining a schema.xml file was tough because my
only feedback for a long time was NullPointerException from SolrCore
when I was trying to add content...


Can you give us enough information to reproduce the problem? What was
wrong in your schema, exactly?

Please indicate also which version of Solr you used.

-Bertrand


Re: MoreLikeThis and term vectors - documentation suggestion

2007-02-26 Thread Bertrand Delacretaz

On 2/26/07, Ken Krugler [EMAIL PROTECTED] wrote:


...I was trying out the MoreLikeThis support, and getting some odd results...


Thanks for the info, I have added a link to your message at
https://issues.apache.org/jira/browse/SOLR-69

-Bertrand


Re: Tagging

2007-02-14 Thread Bertrand Delacretaz

On 2/14/07, Erik Hatcher [EMAIL PROTECTED] wrote:


...Sorry if I'm sending things mangled somehow - and if anyone has
suggestions on correcting I'm all ears


For long links I tend to use http://tinyurl.com/, but it's a bit
painful to do that for all links.

-Bertrand


Re: Incremental replication...

2007-02-13 Thread Bertrand Delacretaz

On 2/13/07, escher2k [EMAIL PROTECTED] wrote:


...At least from looking at the snapshooter script, it doesn't
seem to be doing anything specific...


The snapshooter script only makes an instant snapshot of the index
directory using cp -lr. This does not involve any copying of index
data.
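
The hard-link idea behind cp -lr can be sketched in Python as follows. This is a simplified illustration for a flat directory of files, not what the snapshooter script itself does, and the function name is made up:

```python
import os
import tempfile


def snapshot_with_hard_links(index_dir, snapshot_dir):
    """Create snapshot_dir with hard links to the files in index_dir."""
    os.makedirs(snapshot_dir)
    for name in os.listdir(index_dir):
        source = os.path.join(index_dir, name)
        if os.path.isfile(source):
            # A hard link is a second directory entry for the same data on
            # disk, so nothing is copied.
            os.link(source, os.path.join(snapshot_dir, name))


def demo():
    """Snapshot a one-file index and report whether both names share data."""
    root = tempfile.mkdtemp()
    index_dir = os.path.join(root, "index")
    os.makedirs(index_dir)
    with open(os.path.join(index_dir, "segments"), "w") as handle:
        handle.write("index data")
    snapshot_dir = os.path.join(root, "snapshot")
    snapshot_with_hard_links(index_dir, snapshot_dir)
    return os.path.samefile(
        os.path.join(index_dir, "segments"),
        os.path.join(snapshot_dir, "segments"),
    )
```

This relies on the filesystem supporting hard links, and both directories must be on the same filesystem.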

The actual replication is done using rsync in the other scripts, by
copying the index snapshot elsewhere.

Rsync only copies what has changed since the last copy, and not many
files change in a Lucene index when adding documents, so it's correct
that replication uses little bandwidth when adding documents.

Index optimization, OTOH, causes much larger changes in the index
directory, so after an optimization rsync will usually have much more
data to transfer.

-Bertrand


Re: performance testing practices

2007-02-05 Thread Bertrand Delacretaz

On 2/5/07, Erik Hatcher [EMAIL PROTECTED] wrote:

...What numbers are folks capturing?  What techniques are you using to
capture numbers?...


I've been using my httpstone utility
(http://code.google.com/p/httpstone/) along with ab
(http://httpd.apache.org/docs/2.2/programs/ab.html) to generate many
concurrent search requests, based on semi-random query URLs generated
by shell scripts.

The goal was to find out, on our hardware, how many typical queries
per second we could serve with acceptable response times (less than
2.5 seconds).
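
As a small Python sketch of how such a run might be summarized, assuming the response times (in seconds) and the total elapsed time have already been collected by the load tool. The function and key names are made up:

```python
def summarize(response_times, elapsed_seconds, limit_seconds=2.5):
    """Return throughput and the share of responses faster than the limit."""
    within = sum(1 for t in response_times if t < limit_seconds)
    return {
        "queries_per_second": len(response_times) / elapsed_seconds,
        "share_within_limit": within / len(response_times),
    }
```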

In our case, we found out that 100-200 requests per second were not a
problem, and stopped testing as this is much more than we need
currently. So I don't have precise numbers, but we know that we're
safe with our current load.

HTH, but it's more empirical than structured testing ;-)

-Bertrand


Re: MoreLikeThis similarity-type queries in Solr

2007-01-31 Thread Bertrand Delacretaz

On 1/31/07, Brian Whitman [EMAIL PROTECTED] wrote:

Does Solr have support for the Lucene query-contrib MoreLikeThis
query type or anything like it? ...


Yes, there's a patch in http://issues.apache.org/jira/browse/SOLR-69 -
if you try it, please add your comments on that page.

-Bertrand


Re: MoreLikeThis similarity-type queries in Solr

2007-01-31 Thread Bertrand Delacretaz

On 1/31/07, Andrew Nagy [EMAIL PROTECTED] wrote:


... Yes, there's a patch in http://issues.apache.org/jira/browse/SOLR-69 -...

Any word on something like this being incorporated into the official SOLR
release?


The patch is quite simple, I think we could commit it soon if the
other committers agree.

What's missing are unit tests, I'll try to write them next week unless
someone beats me to it (I'm quite busy with other stuff ATM).

-Bertrand


Re: How to Index Word, Excel, PDF files?

2007-01-29 Thread Bertrand Delacretaz

On 1/29/07, Leandro Saad [EMAIL PROTECTED] wrote:

...I'd like to know if solr can index Word, Excel and PDF files or I must
create a xml representation of those files matching my schema?...


Currently you must create the XML yourself outside of Solr.

This might change, see https://issues.apache.org/jira/browse/SOLR-104
and the recent related update plugins discussions.

-Bertrand


Re: Split one string into many fields

2007-01-22 Thread Bertrand Delacretaz

On 1/22/07, Yonik Seeley [EMAIL PROTECTED] wrote:

...When we get to it, I'd like to hear why it (things like PDF parsing)
should be inside Solr rather than outside using our update interfaces


Same here.

I haven't had time to follow the recent (rich) design discussions
about this stuff, but if I was designing this, I'd put all the
document processing code in a separate module (separate servlet?) and
keep the Solr core lean and mean, with as thin an interface as
possible.

-Bertrand


Re: Document freshness and Boost Functions

2007-01-17 Thread Bertrand Delacretaz

On 1/17/07, Luis Neves [EMAIL PROTECTED] wrote:


...I see that is possible to use
Boost Functions to influence the score. How would that work in order to
improve the score of recent documents? (I have a timestamp field in the
schema)...


I've been using expressions like these in boolean queries, based on a
broadcast_date field:

_val_:linear(recip(rord(broadcast_date),1,1000,1000),11,0)

Where recip computes an age-based score, and linear is used to boost it.
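
The arithmetic can be worked through in Python. The sketch assumes the usual definitions recip(x,m,a,b) = a/(m*x+b) and linear(x,m,c) = m*x+c, and that rord gives 1 to the most recent date and larger values to older ones:

```python
def recip(x, m, a, b):
    """Reciprocal function: a / (m*x + b)."""
    return a / (m * x + b)


def linear(x, m, c):
    """Linear function: m*x + c."""
    return m * x + c


def freshness_boost(rord_value):
    """Value of linear(recip(rord,1,1000,1000),11,0) for a given rord."""
    return linear(recip(rord_value, 1, 1000, 1000), 11, 0)
```

Under these assumptions the newest document gets a value just below 11, and a document whose date is 1000 ordinals older gets 5.5.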

See 
http://incubator.apache.org/solr/docs/api/org/apache/solr/search/QueryParsing.html,
and also the list archives, these functions have been discussed
before.

I'm not sure off the top of my head how to use this with dismax queries though.

-Bertrand


Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz

On 1/16/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


...Could someone give me some code examples on how Solr requests can be
called by Java code...


Although our Java client landscape is still a bit fuzzy (there are
several variants floating around), you might want to look at the code
found in http://issues.apache.org/jira/browse/SOLR-20

If you're new to Java, I'd recommend playing with HttpClient first
(http://jakarta.apache.org/commons/httpclient/), see the tutorial
there for the basics.

The standard Java library classes are also usable to write HTTP
clients, but HttpClient will help a lot in getting the details
right, if you don't mind depending on that library.

-Bertrand


Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz

On 1/16/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


...and how would you do it calling it from another web application, let's
say from a servlet or so?...


Doesn't make much difference if your client is a standalone or a web
application: your Solr client class will need to be configured with the
base URL of the Solr server, it will make HTTP requests to it and
parse the results as needed.
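
The shape of such a client can be sketched as follows. The sketch is in Python to keep it short, although the same structure applies to a Java class. The class name is made up, the HTTP call itself is left out, and the XML layout (a result element with a numFound attribute under the root) is an assumption about the response format:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


class SolrClientSketch:
    """Hypothetical client: holds the base URL, builds requests, parses replies."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def select_url(self, query, rows=10):
        """Build the search URL; fetching it is left to any HTTP library."""
        return self.base_url + "/select?" + urlencode({"q": query, "rows": rows})

    @staticmethod
    def parse_num_found(response_xml):
        """Read the hit count from the assumed <result numFound="..."> element."""
        root = ET.fromstring(response_xml)
        return int(root.find("result").get("numFound"))
```

Only the base URL differs between a standalone program and a servlet, which is why the calling context matters so little.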

-Bertrand


Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz

On 1/16/07, Pavel Penchev [EMAIL PROTECTED] wrote:


...What about the case where solr and my application are deployed in the
same instance of say tomcat. Is there a way to skip the http requests
and use a direct api?...


The javax.servlet.RequestDispatcher interface allows you to access
other resources (including servlets) running in the same container.
I've never used it but it looks like what you'd need (including a
custom HttpServletResponse class to capture the other servlet's
output).

See http://java.sun.com/j2ee/1.4/docs/tutorial/doc/Servlets9.html#wp64684
which is part of
http://java.sun.com/j2ee/1.4/docs/tutorial/doc/index.html

Depending on how much faster this is than going the http way, it might
be interesting to include it as another protocol in a Java Solr
client.

-Bertrand


Re: Faceted Dates

2007-01-09 Thread Bertrand Delacretaz

On 1/9/07, Ryan McKinley [EMAIL PROTECTED] wrote:

...I would like to use faceted browsing to group documents by year,
month, and day.  I can think of a few ways to do this, but I'd like to
see what folks think before i start down the wrong track


Dunno if you've already read it, but I found this page interesting
when it comes to date queries, it might give you some additional
ideas:

 http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing

-Bertrand


Re: Handling disparate data sources in Solr

2006-12-23 Thread Bertrand Delacretaz

On 12/23/06, Alan Burlison [EMAIL PROTECTED] wrote:

...As well as centralising the index, I also want
to centralise the handling of the different document types...


My Subversion and Solr presentation from the last Cocoon GetTogether
might give you ideas for how to handle this, see the link at
http://wiki.apache.org/solr/SolrResources.

Although it does not handle all binary formats out of the box (might
need to write some java glue code to implement new formats), Cocoon is
a good tool for transforming various document formats to XML and
filtering the results to generate the appropriate XML for Solr. I
wouldn't add functionality to Solr for doing this, it's best to keep
things loosely-coupled IMHO.

-Bertrand


Re: Opinions wanted about a new Solr logo (SOLR-58)

2006-12-18 Thread Bertrand Delacretaz

On 12/18/06, Linda Tan [EMAIL PROTECTED] wrote:

I just learned no attachments are allowed on this list. I've put the
image in the jira..


Thanks, it looks good indeed!
-Bertrand


Re: post the output of a URL to solr

2006-11-30 Thread Bertrand Delacretaz

On 11/30/06, Mike Klaas [EMAIL PROTECTED] wrote:


...Try something like:

wget http://localhost:/gaz/solr/f0.xml -O - | curl
http://localhost:8983/solr/update --data-binary - -H
'Content-type:text/xml; charset=utf-8'


and if you use curl you can use it on both sides to avoid the
dependency on both tools:

 curl http://localhost:/gaz/solr/f0.xml | curl ...

-Bertrand


Re: Solr and Oracle

2006-11-24 Thread Bertrand Delacretaz

On 11/23/06, Nicolas St-Laurent [EMAIL PROTECTED] wrote:


...I index huge Oracle tables with Lucene with a custom made
indexer/search engine. But I would prefer to use Solr instead...


Instead of using Lucene's API directly, with Solr you'll have to add
your documents to the index using HTTP POST messages.

There are a few Java clients for Solr floating around on the wiki and
in Jira IIRC, but you just need a POST, any way of doing it is fine
(using jakarta httpclient for example).
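
To show how little is needed, here is a Python sketch that prepares such a POST with the standard library. It is not a Java client; the helper name is made up and the Content-type value is the one used by the example post.sh script. The request is only built, not sent:

```python
import urllib.request


def build_update_request(update_url, xml_body):
    """Prepare an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),  # supplying a body makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
```

Passing the returned object to urllib.request.urlopen would send it to a running Solr instance.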

See http://wiki.apache.org/solr/SolrResources for more info.

-Bertrand


Re: Extending Solr's Admin functionality

2006-09-24 Thread Bertrand Delacretaz

On 9/24/06, Erik Hatcher [EMAIL PROTECTED] wrote:


...perhaps some authentication/
authorization as well as HTTPS should eventually make it into the
core, but getting more fine grained is unnecessary...


If meaningful URLs are used (admin/stats, admin/config,
admin/analysis, etc.), it is relatively easy to use either the servlet
container or something like mod_proxy to implement security. Designing
a good URL scheme might remove the need to address security concerns
at the Solr level.

-Bertrand


Re: Re: Doc add limit

2006-07-28 Thread Bertrand Delacretaz

On 7/28/06, Yonik Seeley [EMAIL PROTECTED] wrote:


...Getting all the little details of connection handling correct can be
tough... it's probably a good idea if we work toward common client
libraries so everyone doesn't have to reinvent them


Jakarta's HttpClient [1] is IMHO a good base for Java clients, and
it's easy to use, see the PostXML example in [2].

-Bertrand

[1] http://jakarta.apache.org/commons/httpclient/

[2] 
http://svn.apache.org/viewvc/jakarta/commons/proper/httpclient/trunk/src/examples/PostXML.java?revision=410848&view=markup


Re: Re: Cyrillic characters

2006-07-19 Thread Bertrand Delacretaz

On 7/19/06, Tricia Williams [EMAIL PROTECTED] wrote:


...What I called the _solr url encoding_ was the q= parameter
translated into I'm not sure what encoding in the url...


I think I've seen the same problem, haven't investigated deeper but
IIUC the encoding used when posting a form is related to both the
encoding indicated by the web server in the HTTP headers, and the
encoding indicated (optionally) in the HTML page with something like
<meta content="text/html; charset=UTF-8" http-equiv="content-type"/>

In my case I've found that, running SOLR from start.jar with default settings:

-If I search désormais from the solr/admin page, it is translated to
q=d%E9sormais in the URL, and nothing's found (the word is in my
index)

-If I replace the q= value with q=d%C3%A9sormais (which is the
encoding that I get when entering this word in the Google search
form), my query works
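
The two encodings seen above can be reproduced in Python, which suggests that the admin page percent-encodes the term as ISO-8859-1 while the working form uses UTF-8. The helper name is made up:

```python
from urllib.parse import quote


def encode_query_term(term, charset):
    """Percent-encode a query term using the given character encoding."""
    return quote(term, encoding=charset)
```

Encoding désormais as ISO-8859-1 gives d%E9sormais, and encoding it as UTF-8 gives d%C3%A9sormais.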

I haven't seen the problem with my own search form, which includes the
above http-equiv meta and is served as a static page from my web
server.

So I think something's wrong with the encoding on the solr/admin/
search page, but I haven't investigated further.

Hope this helps...not sure if it does but the above scenario looks
similar to yours.

-Bertrand