[Solrj] Documentation & SolrServer#Ping

2007-08-09 Thread franz see
Good day, Where can I find some documentation of Solrj? Does it have a wiki page or something? I am currently trying it out and I did a simple ping to see if it works. new CommonsHttpSolrServer( url ).ping(); However, I am getting a "Exception in thread "main" org.apache.solr.common.SolrExce

RE: Best use of wildcard searches

2007-08-09 Thread Jonathan Woods
Maybe there's a different way, in which path-like values like this are treated explicitly. I use a similar approach to Matthew at www.colfes.com, where all pages are generated from Lucene searches according to filters on a couple of hierarchical categories ('spaces'), i.e. subject and organisation

Re: [newbie] how to debug the schema?

2007-08-09 Thread Franz Allan Valencia See
Good day, danc86 of #lucene gave me the answer - I was not storing the fields :-) Thanks, Franz On 8/9/07, Ryan McKinley <[EMAIL PROTECTED]> wrote: > > > > > [QUESTION] > > What could be the problem? .Or what else can I do to debug this problem? > > > > In general 'luke' is a great tool to f

RE: Multivalued fields and the 'copyField' operator

2007-08-09 Thread Lance Norskog
If we have a field spellcheck_db, and have two lines for it: ... Basically the type without stemming... All I want to do is make a pile of words as input to the spellcheck feature. If I index with this, the spellcheck Analyser class c

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Norberto Meijome
On Thu, 9 Aug 2007 15:23:03 -0700 "Lance Norskog" <[EMAIL PROTECTED]> wrote: > Underlying this all, you have a sneaky network performance problem. Your > successive posts do not reuse a TCP socket. Obvious: re-opening a new socket > each post takes time. Not obvious: your server has sockets buildi

Re: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Sean Timm
It should probably be configurable: (1) return nothing if no match, (2) substitute with an alternate field, (3) return first sentence or N number of tokens. -Sean Yonik Seeley wrote on 8/9/2007, 5:50 PM: > On 8/9/07, Benjamin Higgins <[EMAIL PROTECTED]> wrote: > > Thanks Mike. I didn't thin

Re: tomcat and solr multiple instances

2007-08-09 Thread Pieter Berkel
The current working directory (Cwd) is the directory from which you started the Tomcat server and is not dependent on the Solr instance configurations. So as long as SolrHome is correct for each Solr instance, you shouldn't have a problem. cheers, Piete On 10/08/07, Jae Joo <[EMAIL PROTECTED]>

Re: Multivalued fields and the 'copyField' operator

2007-08-09 Thread Yonik Seeley
On 8/9/07, Lance Norskog <[EMAIL PROTECTED]> wrote: > I'm adding a field to be the source of the spellcheck database. Since that > is its only job, it has raw text lower-cased, de-Latin1'd, and > de-duplicated. > > Since it is only for the spellcheck DB, it does not need to keep duplicates. Dupli

RE: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Lance Norskog
Jython is a Python interpreter implemented in Java. (I have a lot of Python code.) Total throughput in the servlet is very sensitive to the total number of servlet sockets available v.s. the number of CPUs. The different analysers have very different performance. You might leave some data in the

Re: Returning a list of matching words

2007-08-09 Thread Yonik Seeley
On 8/9/07, Thiago Jackiw <[EMAIL PROTECTED]> wrote: > This may be obvious but I can't get my head straight. Is there a way > to return a list of matching words that a record got matched against? Unfortunately no... lucene doesn't provide that capability with standard queries. You could do it (slow

RE: tomcat and solr multiple instances

2007-08-09 Thread Jae Joo
Here are the Catalina/localhost/ files For "example" instance For ca_companies instance Urls http://host:8080/solr/admin --> pointing "example" instance (Problem...) http://host:8080/solr_ca/admin --> pointing "ca-companies" instance (it is working) -Original Message- From: Ja

EmbeddedSolr and optimize

2007-08-09 Thread Sundling, Paul
http://wiki.apache.org/solr/EmbeddedSolr Following the example on connecting to the Index directly without using HTTP, I tried to optimize by passing the true flag to the CommitUpdateCommand. When optimizing an index with Lucene directly it doubles the size of the index temporarily and then del

Re: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Yonik Seeley
On 8/9/07, Benjamin Higgins <[EMAIL PROTECTED]> wrote: > Thanks Mike. I didn't think of creating a blurb beforehand, but that's > a great solution. I'll probably do that. Yonik, I can still add a JIRA > issue if you'd like, though. Always 10 different ways to tackle the same problem in the sear

tomcat and solr multiple instances

2007-08-09 Thread Jae Joo
Hi, I have built 2 solr instances - one is "example" and the other is "ca_companies". The "ca_companies" solr instance is working fine, but "example" is not working... In the admin page, "/solr/admin", for "example" instance, it shows that Cwd=/rpt/src/apache-solr-1.2.0/ca_companies/s

Is it possible to know from where in the field highlighted text comes from?

2007-08-09 Thread Benjamin Higgins
Hi again, It'd be nice to know what the starting line number is for highlighted snippets. I imagine others might find it useful to know the starting byte offset. Is there an easy way to add this in? I'm not afraid of hacking the source if it's not too involved. Thanks. Ben

RE: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Benjamin Higgins
Thanks Mike. I didn't think of creating a blurb beforehand, but that's a great solution. I'll probably do that. Yonik, I can still add a JIRA issue if you'd like, though. Ben -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Thursday, August 09, 2007 2:32 PM To: solr

Multivalued fields and the 'copyField' operator

2007-08-09 Thread Lance Norskog
I'm adding a field to be the source of the spellcheck database. Since that is its only job, it has raw text lower-cased, de-Latin1'd, and de-duplicated. Since it is only for the spellcheck DB, it does not need to keep duplicates. I specified it as 'multiValued="false"' and used from a few other

Returning a list of matching words

2007-08-09 Thread Thiago Jackiw
This may be obvious but I can't get my head straight. Is there a way to return a list of matching words that a record got matched against? For instance: record_a: ruby, solr, mysql, rails record_b: solr, java Then ?q=solr+OR+rails would return the matched words for the records record_a: solr, ra
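A minimal client-side sketch of what the question above asks for, assuming the application already holds each record's words and the query terms. This is post-processing outside Solr, not a Solr or Lucene feature, and `matched_terms` is a hypothetical helper name.

```python
def matched_terms(query_terms, records):
    """Return, for each record, the query terms found among its words.

    Hypothetical helper: runs in the client after the search, using data
    the application already has. It does not call Solr or Lucene.
    """
    wanted = {term.lower() for term in query_terms}
    result = {}
    for name, words in records.items():
        # Keep the record's own spelling and order of the matching words.
        hits = [word for word in words if word.lower() in wanted]
        if hits:
            result[name] = hits
    return result


# Sample data taken from the question above.
records = {
    "record_a": ["ruby", "solr", "mysql", "rails"],
    "record_b": ["solr", "java"],
}
print(matched_terms(["solr", "rails"], records))
```

This only covers plain term matching; it does not reproduce whatever analysis (stemming, synonyms) the index applies.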

Re: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Mike Klaas
On 9-Aug-07, at 2:10 PM, Benjamin Higgins wrote: Hi all, I'd like to provide a blurb of documents matching a search in the case when there is no text highlighted. I assumed that perhaps the highlighter would give me back the first few words in a document if this occurred, but it doesn't. M

Re: Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Yonik Seeley
On 8/9/07, Benjamin Higgins <[EMAIL PROTECTED]> wrote: > Hi all, I'd like to provide a blurb of documents matching a search in > the case when there is no text highlighted. I assumed that perhaps the > highlighter would give me back the first few words in a document if this > occurred, but it does

Creating a document blurb when nothing is returned from highlight feature

2007-08-09 Thread Benjamin Higgins
Hi all, I'd like to provide a blurb of documents matching a search in the case when there is no text highlighted. I assumed that perhaps the highlighter would give me back the first few words in a document if this occurred, but it doesn't. My conundrum is that I'd rather not grab the whole docume

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote: > http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel% > 3EMen's%20Apparel% > 3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python > > The same exact query, with... wait.. > > Wow. I'm making myself look like an idiot. > > I

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel% 3EMen's%20Apparel% 3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python The same exact query, with... wait.. Wow. I'm making myself look like an idiot. I swear that these queries didn't work the first time I ran them.

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote: > Feel free to run some queries yourself. We opened the firewall for > this box... > > http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel% > 3EMen's\%20Apparel% > 3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python OK, so

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Feel free to run some queries yourself. We opened the firewall for this box... http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel% 3EMen's\%20Apparel% 3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python ++ |

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Hm, I don't see any attachments, I'm forwarding them to you directly. Would anyone else like to see them? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 +--

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Sure thing! Heres 1, and 2. 1 - just a space. 2 - a "\ ". ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Aug 9, 2007, at 1:14 PM, Yonik Seeley

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > They translate to different queries. > But can I see the XML output for 1 and 2 with &debugQuery=on&indent=on > appended? Or perhaps with wt=python would be less confusing seeing that there are '>' chars in there that would otherwise be escaped

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote: > Yes, we've reindexed several times. Here are three sample result sets.. > > 1 - ?q=department_exact:Apparel>Men's? > Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas > 2 - ?q=department_exact:Apparel>Men's\ > Apparel>Jackets*&fq=country_

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Yes, we've reindexed several times. Here are three sample result sets.. 1 - ?q=department_exact:Apparel>Men's? Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas 2 - ?q=department_exact:Apparel>Men's\ Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas 3 - ?q=department_exact:Appa

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Yonik Seeley
On 8/9/07, Kevin Holmes <[EMAIL PROTECTED]> wrote: > Python script queries the mysql DB then calls bash script > > Bash script performs a curl POST submit to solr For the most up-to-date solr client for python, check out https://issues.apache.org/jira/browse/SOLR-216 -Yonik

RE: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Kevin Holmes
Is this a native feature, or do we need to get creative with scp from one server to the other? If it's a contention between search and indexing, separate them via a query-slave and an index-master. --cw

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote: > Here you go.. I thought that "string" wasn't munged, so I used that... > > > stored="true"/> > Hmmm, that looks ok. You re-indexed since department_exact was added? If so, could you show the exact XML response containing a document with depa

Re: question: how to divide the indexing into separate domains

2007-08-09 Thread Yonik Seeley
Hmmm, I think you can map an empty (zero length) value to something else via f.foo.map=:something But that column does currently need to be there in the CSV. Specifying default values in a per-request basis is interesting, and something we could perhaps support in the future. The quickest way to i

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Here you go.. I thought that "string" wasn't munged, so I used that... stored="true"/> ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Aug 9,

Re: Best use of wildcard searches

2007-08-09 Thread Yonik Seeley
On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote: > Hmm.. I just tried the following three queries... > > /?q=department_exact:Apparel>Men's? > Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas... > (no results) > > /?q=department_exact:Apparel>Men's\ > Apparel>Jackets*&fq=country_code:US&

Re: Too many open files

2007-08-09 Thread Mike Klaas
On 9-Aug-07, at 7:52 AM, Ard Schrijvers wrote: ulimit -n 8192 Unless you have an old, creaky box, I highly recommend simply upping your filedesc cap. -Mike

Synonym questions

2007-08-09 Thread Tom Hill
Hi - Just looking at synonyms, and had a couple of questions. 1) For some of my synonyms, it seems to make sense to simply replace the original word with the other (e.g. "theatre" => "theater", so searches for either will find either). For others, I want to add an alternate term while preserving

always fail to update the first time after I restart the server

2007-08-09 Thread Xuesong Luo
Hi, I noticed the first index update after I restart my jboss server always fails with the exception below. Any update after that works fine. Does anyone know what the problem is? The solr version I'm using is solr1.2 Thanks Xuesong 2007-08-09 11:41:44,559 ERROR [STDERR] Aug 9, 2007 11:41:44 AM

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Yonik Seeley
On 8/9/07, Siegfried Goeschl <[EMAIL PROTECTED]> wrote: > +) my colleague just finished a database import service running within > the servlet container to avoid writing out the data to the file system > and transmitting it over HTTP. Most people doing this read data out of the database and constr

Re: Best use of wildcard searches

2007-08-09 Thread Matthew Runo
Hmm.. I just tried the following three queries... /?q=department_exact:Apparel>Men's? Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas... (no results) /?q=department_exact:Apparel>Men's\ Apparel>Jackets*&fq=country_code:US&fq=brand_exact:adidas... (no results) /?q=Apparel>Men's\

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Siegfried Goeschl
Hi Kevin, I'm also a newbie but some thoughts along the line ... +) for evaluating SOLR we used a less exotic setup for data import based on Pnuts (a JVM based scripting language) ... :-) ... but Groovy would do as well if you feel at home with Java. +) my colleague just finished a database i

RE: Too many open files

2007-08-09 Thread Stu Hood
If you check out the documentation for mergeFactor, you'll find that adjusting it downward can lower the number of open files. Just remember that it is a speed tradeoff, and only lower it as much as you need to to stop getting the "too many files" errors. See this section: http://www.onjava.c

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Yonik Seeley
On 8/9/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On 8/9/07, David Whalen <[EMAIL PROTECTED]> wrote: > > Plus, I have to believe there's a faster way to get documents > > into solr/lucene than using curl Oh yeah, and by "curl" I assume you meant HTTP in general. You certainly don't want to

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Yonik Seeley
On 8/9/07, David Whalen <[EMAIL PROTECTED]> wrote: > Plus, I have to believe there's a faster way to get documents > into solr/lucene than using curl One issue with HTTP is latency. You can get around that by adding multiple documents per request, or by using multiple threads concurrently. Y
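The suggestion above to add multiple documents per request can be sketched roughly as follows, assuming the Solr XML update format with several `<doc>` elements inside one `<add>`. The helper names, field names, and batch size are illustrative assumptions, and the sketch only builds the request bodies; posting them to the installation's update URL is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize several documents into a single <add> update message.

    Each document is a dict of field name -> value (hypothetical fields).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Illustrative records; in the thread these would come from the database.
rows = [{"id": i, "title": "record %d" % i} for i in range(5)]
for batch in batches(rows, 2):
    body = build_add_message(batch)
    # `body` would be sent as one HTTP POST instead of one POST per record.
    print(body)
```

Sending one request per batch rather than per record reduces the number of round trips; the right batch size depends on document size and is something to measure rather than assume.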

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Clay Webster
If it's a contention between search and indexing, separate them via a query-slave and an index-master. --cw On 8/9/07, David Whalen <[EMAIL PROTECTED]> wrote: > > What we're looking for is a way to inject *without* using > curl, or wget, or any other http-based communication. We'd > like for th

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Brian Whitman
On Aug 9, 2007, at 11:12 AM, Kevin Holmes wrote: 2: Is there a way to inject into solr without using POST / curl / http? Check http://wiki.apache.org/solr/EmbeddedSolr There's examples in java and cocoa to use the DirectSolrConnection class, querying and updating solr w/o a web serve

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Tobin Cataldo
(re)building the index separately (ie. on a different computer) and then replacing the active index may be an option. David Whalen wrote: What we're looking for is a way to inject *without* using curl, or wget, or any other http-based communication. We'd like for the HTTP daemon to only handle

RE: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread David Whalen
What we're looking for is a way to inject *without* using curl, or wget, or any other http-based communication. We'd like for the HTTP daemon to only handle search requests, not indexing requests on top of them. Plus, I have to believe there's a faster way to get documents into solr/lucene than u

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Clay Webster
Condensing the loader into a single executable sounds right if you have performance problems. ;-) You could also try adding multiple s in a single post if you notice your problems are with tcp setup time, though if you're doing localhost connections that should be minimal. If you're already local

Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Kevin Holmes
I inherited an existing (working) solr indexing script that runs like this: Python script queries the mysql DB then calls bash script Bash script performs a curl POST submit to solr We're injecting about 1000 records / minute (constantly), frequently pushing the edge of our CPU / RAM limit

RE: Too many open files

2007-08-09 Thread Ard Schrijvers
Hello, useCompoundFile set to true, should avoid the problem. You could also try to set maximum open files higher, something like (I assume linux) ulimit -n 8192 Ard > > You're a gentleman and a scholar. I will donate the M&Ms to > myself :). > Can you tell me from this snippet of my solrc

RE: Too many open files

2007-08-09 Thread Kevin Holmes
You're a gentleman and a scholar. I will donate the M&Ms to myself :). Can you tell me from this snippet of my solrconfig.xml what I might tweak to make this more betterer? -KH false 10 1000 2147483647 1 1000 1

question: how to divide the indexing into separate domains

2007-08-09 Thread Ben Shlomo, Yatir
Hi! Say I have 300 csv files that I need to index. Each one holds millions of lines (each line is a few fields separated by commas). Each csv file represents a different domain of data (e.g., file1 is computers, file2 is flowers, etc.). There is no indication of the domain ID in the data insid

RE: Too many open files

2007-08-09 Thread Jonathan Woods
You could try committing updates more frequently, or maybe optimising the index beforehand (and even during!). I imagine you could also change the Solr config, if you have access to it, to tweak indexing (or index creation) parameters - http://wiki.apache.org/solr/SolrConfigXml should be of use to

Too many open files

2007-08-09 Thread Kevin Holmes
java.io.FileNotFoundException: /usr/local/bin/apache-solr/enr/solr/data/index/_16ik.tii (Too many open files) When I'm importing, this is the error I get. I know it's vague and obscure. Can someone suggest where to start? I'll buy a bag of M&Ms (not peanut) for anyone who can help me solve t

Re: Best use of wildcard searches

2007-08-09 Thread Erick Erickson
I just saw an e-mail from Yonik suggesting escaping the space. I know so little about Solr that all I can do is parrot Yonik... Erick On 8/8/07, Matthew Runo <[EMAIL PROTECTED]> wrote: > > OK. > > So a followup question.. > > ?q=department_exact:Apparel%3EMen's% > 20Apparel*&fq=country_code:US&fq
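The advice to escape the space can be sketched as a small helper that backslash-escapes characters the Lucene query parser treats specially, whitespace included, before the trailing wildcard is appended. The function names are hypothetical, and escaping `&` and `|` individually is a simplification, since the parser's operators are `&&` and `||`.

```python
import re

# Query-parser special characters plus whitespace (simplified set).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|\s])')


def escape_term(text):
    """Prefix each special character or whitespace with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


def prefix_query(field, prefix):
    """Build a field:prefix* query with the prefix escaped."""
    return "%s:%s*" % (field, escape_term(prefix))


# Field name and value taken from the queries quoted in this thread.
print(prefix_query("department_exact", "Apparel>Men's Apparel>Jackets"))
```

The resulting string still has to be URL-encoded before it goes into a request, and whether it matches depends on how the field is defined and indexed.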

RE: Best use of wildcard searches

2007-08-09 Thread Pierre-Yves LANDRON
Hello, I'm in exactly the same situation as you. I've got some structured subjects (as subjects:main subject/sub subject/sub sub subject) and want to search them as literals from a given level (subjects:main subject/*). As you know subjects:"main subject/"* doesn't work (but it should, shouldn't

RE: Retrieving a float field

2007-08-09 Thread Seema Khandkar
That worked. I had to get the schema, get the FieldType, also get the Fieldable object from the document, then use fieldType.toExternal(fieldable).toString() but it ultimately worked! Thanks for your help, appreciate it. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECT