Re: Storing Related Data - At Different Times

2008-01-21 Thread Gavin
Hi Otis,
Thanks. Was thinking along those lines. But having two indexes will
hurt my search. 

1. Searching fields that belong only to the personal details should
result in 5 resumes being shown for the user (if he has 5). But now it
will only show one link to the personal details and no resumes.

2. Searching fields that belong to both the personal details and the
resume details will result in two sets of results, which I will have to
combine manually using text processing.

Can I avoid doing this?
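
[Editorial aside: a common way around this, instead of two indices, is to denormalize - repeat the personal-details fields inside each of the five resume documents, so one query matches complete resumes. When personal details change, all five documents are re-posted. A sketch; the field names here are illustrative, not from this thread:]

```xml
<add>
  <doc>
    <field name="id">user42-resume-1</field>
    <field name="user_id">user42</field>
    <!-- personal details, duplicated in every resume doc for this user -->
    <field name="name">Jane Example</field>
    <field name="city">Colombo</field>
    <!-- resume-specific fields -->
    <field name="resume_title">Senior Java Developer</field>
    <field name="resume_body">Resume text goes here.</field>
  </doc>
</add>
```

Updating personal details then becomes: re-build the user's five merged documents from the database, re-post them, and commit.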

Thanks,
Gavin



On Sun, 2008-01-20 at 22:52 -0800, Otis Gospodnetic wrote:
 You could have 2 separate indices tied with a common field (a la FK-PK).  
 Then you only need to change the item you are updating.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: Gavin [EMAIL PROTECTED]
 To: solr-user solr-user@lucene.apache.org
 Sent: Monday, January 21, 2008 12:09:23 AM
 Subject: Storing Related Data - At Different Times
 
 Hi,
 In the web application we are developing we have two sets of
  details.
 The personal details and the resume details. We allow 5 different
 resumes for each user, but we want the personal details to remain the
 same across all 5 resumes. Personal details are added at registration
 time. After that, for each resume we want to link the personal
 details. This is a simple join in the DB, but how do we achieve it in
 Solr? The problem is that when personal details change we will have to
 update all 5 resumes. 
 
 I read the thread "Some sort of join in SOLR?", but I'm not sure it
 answers my problem. Would very much appreciate some help on this
 one.
 
 Thanks,
-- 
Gavin Selvaratnam,
Project Leader

hSenid Mobile Solutions
Phone: +94-11-2446623/4 
Fax: +94-11-2307579 

Web: http://www.hSenidMobile.com 
 
Make it happen

Disclaimer: This email and any files transmitted with it are confidential and 
intended solely for 
the use of the individual or entity to which they are addressed. The content 
and opinions 
contained in this email are not necessarily those of hSenid Software 
International. 
If you have received this email in error please contact the sender.



Re: Term vector

2008-01-21 Thread Grant Ingersoll
Term vectors are, to some extent, the opposite of the inverted index.   
They store term, position and offset (the latter two are optional) on  
a per document basis, such that you can say give me the terms,  
position and offsets for document X.  In terms of MLT, they are used  
to figure out what the most important terms are in a document, such  
that  a new query can be formed to find other documents that are more  
like this document.  They are also useful for highlighting and other  
non-search related activities like clustering, etc.
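
[Editorial aside: term vectors are enabled per field in schema.xml - a sketch, with an illustrative field name:]

```xml
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Positions and offsets add index size, so enable only what the use case (MLT, highlighting) actually needs.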


For more info, see my talk at ApacheCon: http://cnlp.org/presentations/slides/AdvancedLucene.pdf 
   Also, search for term vectors on the Lucene user mailing list (you  
can do this via Nabble)


-Grant

On Jan 20, 2008, at 10:04 PM, anuvenk wrote:



what are term vectors? How do they help with mlt?
--
View this message in context: 
http://www.nabble.com/Term-vector-tp14990408p14990408.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






Newbie with Java + typo

2008-01-21 Thread Daniel Andersson

Hi people

First the typo on http://wiki.apache.org/solr/mySolr:
Production
Typically it's not recommended do have your front end

it should probably be ..recommended To have..



Second, I don't know much about Java, nor about Jetty/Resin/JBoss/ 
Tomcat. I went through the tutorial and was impressed with how easy  
it all seemed. Until the tutorial ended..


As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing  
that comes with the example (Jetty, or?)?


All the installation pages talk about this and that that doesn't make  
much sense to non-Java people like myself :-/


Would be MUCH appreciated with some after-tutorial page for us  
newbies. Right now I'm just looking for something that can be used  
on a production level machine. It doesn't have to be the fastest, as  
long as it's fairly easy to install.


Recommendations and pointers are very welcome :)



Thanks in advance!



/ d


Re: Newbie with Java + typo

2008-01-21 Thread Michael Kimsal
Daniel:

As a fellow 'non-java' person I feel your pain (well, felt it anyway).  A
lot depends on your load and the machine, but I successfully ran the stock
jetty system on a box last summer for work and didn't have performance
problems.  The bigger issue was from the other java people complaining that
I hadn't used the standard jboss setup they had already working.  However, I
didn't have access to that machine, nor would anyone give it to me at the
time, so it was a catch-22.  Performance-wise, the stock jetty will probably
do just fine for you.  Longer term, you may want to learn more about jboss
or tomcat or something else which can give you more application management
options and such.

But don't let those things stop you from running jetty/solr in production -
it's worked fine for me.


On Jan 21, 2008 10:48 AM, Daniel Andersson [EMAIL PROTECTED] wrote:

 Hi people

 First the typo on http://wiki.apache.org/solr/mySolr:
 Production
 Typically it's not recommended do have your front end

 it should probably be ..recommended To have..



 Second, I don't know much about Java, nor about Jetty/Resin/JBoss/
 Tomcat. I went through the tutorial and was impressed with how easy
 it all seemed. Until the tutorial ended..

 As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing
 that comes with the example (Jetty, or?)?

 All the installation pages talk about this and that that doesn't make
 much sense to non-Java people like myself :-/

 Would be MUCH appreciated with some after-tutorial page for us
 newbies. Right now I'm just looking for something that can be used
 on a production level machine. It doesn't have to be the fastest, as
 long as it's fairly easy to install.

 Recommendations and pointers are very welcome :)



 Thanks in advance!



 / d




-- 
Michael Kimsal
http://webdevradio.com


Multisearching with Solr

2008-01-21 Thread David Pratt
Hi. I am checking out solr after having some experience with lucene 
using pyLucene. I am looking at the potential of solr to search over a 
large index divided over multiple servers to collect results, sort of 
what the parallel multisearcher does in Lucene on its own. From quick 
scan of archives it appears SOLR-303 may be the answer to this. Can this 
functionality be incorporated into 1.2 in a sandbox environment? Has 
anyone written a recipe that would be helpful in getting a sandbox up 
and running with SOLR-303?


It will most likely be a few months before needing to incorporate this 
type of functionality in production but hoping to begin experimenting as 
soon as possible. On that note, is it anticipated that 1.3 will be out 
in a few months. If so, will it include this functionality? Lastly, what 
sort of load balancing and replication potential is anticipated for the 
multisearching capability? Many thanks.


Regards,
David


Re: Newbie with Java + typo

2008-01-21 Thread Ryan McKinley

Daniel Andersson wrote:

Hi people

First the typo on http://wiki.apache.org/solr/mySolr:
Production
Typically it's not recommended do have your front end

it should probably be ..recommended To have..



you can edit any of the wiki pages...  fixing typos is a great contribution!


As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing that 
comes with the example (Jetty, or?)?




Solr is servlet container agnostic -- it should run equally well on any 
of them.  Most people are constrained to use what they are already 
using.  If you really have no preference, perhaps stick with the jetty 
one included in the example.



Would be MUCH appreciated with some after-tutorial page for us newbies. 
Right now I'm just looking for something that can be used on a 
production level machine. It doesn't have to be the fastest, as long as 
it's fairly easy to install.


jetty is fine.  I think otis is using that in http://www.simpy.com/ -- I 
use resin.  Everyone you ask will give you a different answer ;) but the 
three containers that are most used by solr developers are jetty, resin 
and tomcat.


ryan


Re: Multisearching with Solr

2008-01-21 Thread Erick Erickson
You can always use the trunk build, but you'll have to check the
status of SOLR-303 to be sure it's in the trunk...

Here's a thread that discusses this...

http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489

Best
Erick

On Jan 21, 2008 10:55 AM, David Pratt [EMAIL PROTECTED] wrote:

 Hi. I am checking out solr after having some experience with lucene
 using pyLucene. I am looking at the potential of solr to search over a
 large index divided over multiple servers to collect results, sort of
 what the parallel multisearcher does in Lucene on its own. From quick
 scan of archives it appears SOLR-303 may be the answer to this. Can this
 functionality be incorporated into 1.2 in a sandbox environment? Has
 anyone written a recipe that would be helpful in getting a sandbox up
 and running with SOLR-303?

 It will most likely be a few months before needing to incorporate this
 type of functionality in production but hoping to begin experimenting as
 soon as possible. On that note, is it anticipated that 1.3 will be out
 in a few months. If so, will it include this functionality? Lastly, what
 sort of load balancing and replication potential is anticipated for the
 multisearching capability? Many thanks.

 Regards,
 David



Re: Newbie with Java + typo

2008-01-21 Thread Brian Whitman


On Jan 21, 2008, at 11:13 AM, Daniel Andersson wrote:
Well, no. It says "Immutable Page", and as far as I know (English not being  
my mother tongue), that means I can't edit the page



You need to create an account first.


Re: Newbie with Java + typo

2008-01-21 Thread Daniel Andersson

On Jan 21, 2008, at 4:53 PM, Michael Kimsal wrote:

As a fellow 'non-java' person I feel your pain (well, felt it  
anyway).  A
lot depends on your load and the machine, but I successfully ran  
the stock

jetty system on a box last summer for work and didn't have performance
problems.    Performance-wise, the stock jetty will probably
do just fine for you.  Longer term, you may want to learn more  
about jboss
or tomcat or something else which can give you more application  
management

options and such.

But don't let those things stop you from running jetty/solr in  
production -

it's worked fine for me.


Sounds good to me, thanks!

/ d


Re: Multisearching with Solr

2008-01-21 Thread David Pratt
Hi Erick. Thank you for your reply. Unfortunately, I cannot access the 
link you provided. Is this message from the solr-user list? Many thanks.


Regards,
David

Erick Erickson wrote:

You can always use the trunk build, but you'll have to check the
status of SOLR-303 to be sure it's in the trunk...

Here's a thread that discusses this...

http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489

Best
Erick

On Jan 21, 2008 10:55 AM, David Pratt [EMAIL PROTECTED] wrote:


Hi. I am checking out solr after having some experience with lucene
using pyLucene. I am looking at the potential of solr to search over a
large index divided over multiple servers to collect results, sort of
what the parallel multisearcher does in Lucene on its own. From quick
scan of archives it appears SOLR-303 may be the answer to this. Can this
functionality be incorporated into 1.2 in a sandbox environment? Has
anyone written a recipe that would be helpful in getting a sandbox up
and running with SOLR-303?

It will most likely be a few months before needing to incorporate this
type of functionality in production but hoping to begin experimenting as
soon as possible. On that note, is it anticipated that 1.3 will be out
in a few months. If so, will it include this functionality? Lastly, what
sort of load balancing and replication potential is anticipated for the
multisearching capability? Many thanks.

Regards,
David





Re: spellcheckhandler

2008-01-21 Thread anuvenk

I did try with the latest nightly build. The problem still exists. 
I tested with the example data that comes with solr package.
1) with termSourceField set to 'word' (which is a string fieldtype):
q=iped nano   returns   'ipod nano', which is good

2) with termSourceField set to 'spell' (which is the catch-all field of
'spell' fieldtype according to the tutorial 
http://wiki.apache.org/solr/SpellCheckerRequestHandler
that has my text fields copied into it at index time):
q=grapics returns 'graphics', which is good,
but q=grapics card returns nothing.

Not sure if i'm missing something. Please help!!
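
[Editorial aside: until multi-word support lands, one client-side workaround is to check each token separately and reassemble the query. A minimal sketch - the suggestion lookup is stubbed here; in practice each word would be sent to the SpellCheckerRequestHandler individually:]

```python
def correct_query(query, suggest):
    # Check each token against the spellchecker separately and
    # reassemble; suggest(word) returns a correction or None.
    return " ".join(suggest(tok) or tok for tok in query.split())

# Stubbed suggestion source standing in for per-word handler calls:
suggestions = {"grapics": "graphics", "iped": "ipod"}
print(correct_query("grapics card", suggestions.get))  # graphics card
```

This costs one handler round-trip per token, so it is only practical for short queries.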


Otis Gospodnetic wrote:
 
 You don't need to wait for 1.3 to be released - you can simply use a
 recent nightly build.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: anuvenk [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, January 21, 2008 12:35:52 AM
 Subject: Re: spellcheckhandler
 
 
 I followed the steps outlined in 
 http://wiki.apache.org/solr/SpellCheckerRequestHandler
 with regards to setting up of the schema with a new field 'spell' and
 copying other fields to this 'spell' field at index time.
 It works fine with single word queries but doesn't return anything for
 multi-word queries. I read previous posts where this has been
  discussed. I
 read that some of the active members are in the process of releasing
  patches
 that fixes this problem. I'm actually trying to implement this spell
  check
 in the production set up. Is it absolutely not possible to get spell
  check
 results back for multi-word queries, should i wait for 1.3 release. If
  there
 is any other option please educate me. In case a patch was already
  released,
 how to add it to the current 1.2 version that i'm using?
 -- 
 View this message in context:
  http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/spellcheckhandler-tp14627712p15002379.html
Sent from the Solr - User mailing list archive at Nabble.com.



DisMax and Search Components

2008-01-21 Thread Doug Steigerwald
Is there any support for DisMax (or any search request handlers) in search components, or is that 
something that still needs to be done?  It seems like it isn't supported at the moment.


We want to be able to use a field collapsing component 
(https://issues.apache.org/jira/browse/SOLR-236), but still be able to use our DisMax handlers.


Right now it's one or the other, and we -need- both.

Thanks.
doug


Re: Multisearching with Solr

2008-01-21 Thread Erick Erickson
Yep, it's from the SOLR user list. Well, not really. I mistakenly copied
my gmail url when I was looking at the relevant post, which *of course*
you can't access.

http://svn.apache.org/repos/asf/lucene/solr/trunk
or
http://lucene.apache.org/solr/version_control.html


Sorry 'bout that.
Erick


On Jan 21, 2008 11:34 AM, David Pratt [EMAIL PROTECTED] wrote:

 Hi Erick. Thank you for your reply. Unfortunately, I cannot access the
 link you provided. Is this message from the solr-user list? Many thanks.

 Regards,
 David

 Erick Erickson wrote:
  You can always use the trunk build, but you'll have to check the
  status of SOLR-303 to be sure it's in the trunk...
 
  Here's a thread that discusses this...
 
 
 http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489
 
  Best
  Erick
 
  On Jan 21, 2008 10:55 AM, David Pratt [EMAIL PROTECTED] wrote:
 
  Hi. I am checking out solr after having some experience with lucene
  using pyLucene. I am looking at the potential of solr to search over a
  large index divided over multiple servers to collect results, sort of
  what the parallel multisearcher does in Lucene on its own. From quick
  scan of archives it appears SOLR-303 may be the answer to this. Can
 this
  functionality be incorporated into 1.2 in a sandbox environment? Has
  anyone written a recipe that would be helpful in getting a sandbox up
  and running with SOLR-303?
 
  It will most likely be a few months before needing to incorporate this
  type of functionality in production but hoping to begin experimenting
 as
  soon as possible. On that note, is it anticipated that 1.3 will be out
  in a few months. If so, will it include this functionality? Lastly,
 what
  sort of load balancing and replication potential is anticipated for the
  multisearching capability? Many thanks.
 
  Regards,
  David
 
 



Re: solr 1.3

2008-01-21 Thread Mike Klaas


On 20-Jan-08, at 5:07 PM, anuvenk wrote:


when will this be released? where can i find the list of
improvements/enhancements in 1.3 if its been documented already?


see http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?view=markup


We're not sure on a timeframe for release yet.

-Mike


RE: solr 1.3

2008-01-21 Thread Lance Norskog
Would someone please consider marking a label on the Subversion repository
that says "This is a clean version"? I only do HTTP requests and have no
custom software, so I don't care about internal interfaces changing.

Thanks,

Lance Norskog

-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 21, 2008 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 1.3


On 20-Jan-08, at 5:07 PM, anuvenk wrote:

 when will this be released? where can i find the list of 
 improvements/enhancements in 1.3 if its been documented already?

see http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?view=markup

We're not sure on a timeframe for release yet.

-Mike



Re: Missing Content Stream

2008-01-21 Thread Ismail Siddiqui
I am trying to index with SolrJ, using the following code:

 String url = "http://localhost:8080/solr";
  SolrServer server = new CommonsHttpSolrServer( url );

It's giving an "undefined symbol" error for the String constructor. Can someone
tell me why this constructor throws an error, when I can clearly see the
constructor in the source file?

thanks



On 1/15/08, Ismail Siddiqui [EMAIL PROTECTED] wrote:

 thanks brian and otis,
 i will definitely try solrj.. but actually now the problem is resolved by
 setting the content length in the header - I was missing it:
 c.setRequestProperty("Content-Length", xmlText.length()+"");
 but now it's not throwing any error but not indexing the document either..
 do I have to set autoCommit on in solrconfig.xml ???


 thanks


  On 1/15/08, Brian Whitman [EMAIL PROTECTED] wrote:
 
 
  On Jan 15, 2008, at 1:50 PM, Ismail Siddiqui wrote:
 
   Hi Everyone,
   I am new to solr. I am trying to index xml using http post as follows
 
 
  Ismail, you seem to have a few spelling mistakes in your xml string.
  fiehld, nadme etc. (a) try fixing them, (b) try solrj instead, I
  agree w/ otis.
 
 
 
 



Is it possible to have append kind update operation?

2008-01-21 Thread zqzuk

Hi, is it possible to have append-like updates, where if two records with the
same id are posted to solr, the contents of the two are merged into a
single record with that id? I am asking because my program works in a
multi-threaded manner, where several threads produce several parts of a final
record which is to be posted and indexed. Currently I have a
preprocessing stage where the threads produce the parts, then a
post-processing stage where the parts are merged into a single xml file and
posted to solr. If append-like updating were possible, each thread could
post to solr directly without writing temporary files.

For example, thread 1 produce an xml file like:
--
<?xml version="1.0" encoding="UTF-8"?>
<add allowDups="true" overwriteCommitted="false" overwritePending="false">
<doc>
<field name="record-id">198</field>
<field name="description">This is my short text. This is part 1 of the
record with id=198</field>
</doc>
</add>
--

thread 2 produces xml like

--
<?xml version="1.0" encoding="UTF-8"?>
<add allowDups="true" overwriteCommitted="false" overwritePending="false">
<doc>
<field name="record-id">198</field>
<field name="title">Title here. This is part 2 of record with id=198</field>
</doc>
</add>

Currently my program needs to produce the two separate files, then merge
them into

--
<?xml version="1.0" encoding="UTF-8"?>
<add allowDups="true" overwriteCommitted="false" overwritePending="false">
<doc>
<field name="record-id">198</field>
<field name="description">This is my short text. This is part 1 of the
record with id=198</field>
<field name="title">Title here. This is part 2 of record with id=198</field>
</doc>
</add>
--
 
Then post the final file. If I post the two separately, I get two separate
records with the same id=198, where one has only the description field and
the other has only the title field.

Is it possible to append? Or is my allowDups setting incorrect?

Many thanks!
-- 
View this message in context: 
http://www.nabble.com/Is-it-possible-to-have-%22append%22-kind-update-operation--tp15006743p15006743.html
Sent from the Solr - User mailing list archive at Nabble.com.
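
[Editorial aside: Solr 1.2 has no server-side append, so the merge has to happen client-side - but the temporary files can be avoided by merging the partial payloads in memory before posting. A sketch in Python, stdlib only; the XML shapes mirror the thread's examples but are simplified:]

```python
import xml.etree.ElementTree as ET

def merge_partial_adds(*xml_parts):
    # Collect <field> elements from several partial <add> payloads,
    # keyed by the record-id field, and emit one merged <add>.
    merged = {}
    for part in xml_parts:
        doc = ET.fromstring(part).find("doc")
        rid = doc.find("field[@name='record-id']").text
        merged.setdefault(rid, []).extend(
            f for f in doc.findall("field") if f.get("name") != "record-id")
    add = ET.Element("add")
    for rid, fields in merged.items():
        d = ET.SubElement(add, "doc")
        ET.SubElement(d, "field", name="record-id").text = rid
        for f in fields:
            d.append(f)
    return ET.tostring(add, encoding="unicode")

part1 = ('<add><doc><field name="record-id">198</field>'
         '<field name="description">part 1</field></doc></add>')
part2 = ('<add><doc><field name="record-id">198</field>'
         '<field name="title">part 2</field></doc></add>')
print(merge_partial_adds(part1, part2))
```

Each worker thread could then hand its fragment to a single posting thread that merges and submits, instead of writing temporary files.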



illegal characters in xml file to be posted?

2008-01-21 Thread zqzuk

Hi, I am using the SimplePostTool to post files to solr. I have encountered
a problem with the content of some xml files. I noticed that if my xml file
has fields whose values contain the character &, <, or >, the post
fails and I get the exception:

javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y]
Message: The entity name must immediately follow the '&' in the entity
reference

It looks like these characters are illegal as embedded xml content - but I
did extract them from xml in the first place. Is there a list of such
characters I need to deal with before passing content to SimplePostTool?

Thanks!
-- 
View this message in context: 
http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp15006748p15006748.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: illegal characters in xml file to be posted?

2008-01-21 Thread Binkley, Peter
You should encode those three characters, and it doesn't hurt to encode
the ampersand and double-quote characters too:
http://en.wikipedia.org/wiki/XML#Entity_references
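
[Editorial aside: for illustration, a minimal escaping sketch using Python's standard library; the input string is made up:]

```python
from xml.sax.saxutils import escape

raw = 'AT&T reported <10% growth in "Q4"'

# &, < and > must always be escaped in element content;
# " and ' only need escaping inside attribute values.
print(escape(raw))
print(escape(raw, {'"': "&quot;", "'": "&apos;"}))
```

The second call shows the optional `entities` argument for escaping quotes as well.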

Peter 

-Original Message-
From: zqzuk [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 21, 2008 2:24 PM
To: solr-user@lucene.apache.org
Subject: illegal characters in xml file to be posted?


Hi, I am using the SimplePostTool to post files to solr. I have
encountered a problem with the content of some xml files. I noticed that if
my xml file has fields whose values contain the character &, <, or
>, the post fails and I get the exception:

javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y]
Message: The entity name must immediately follow the '&' in the entity
reference

Looks like these characters are illegal in xml as embedded contents -
but I did extract them from xml in the first place. Is there a list of
such characters I need to deal with before I pass that to
SimplePostTool?

Thanks!
--
View this message in context:
http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp150
06748p15006748.html
Sent from the Solr - User mailing list archive at Nabble.com.



Wildcards

2008-01-21 Thread dojolava
Hello,

I just started to use solr and I experience strange behaviour when it comes
to wildcards.

When I use the StandardRequestHandler, queries like "eur?p?an" or "eur*an"
work fine.
But "garden?r" or "admini*tion" do not bring any results (without wildcards
there are some, of course).

All affected fields are of type text, with the standard schema.xml from the
example.

Does anybody know how to fix this?


RE: illegal characters in xml file to be posted?

2008-01-21 Thread zqzuk

Thanks for the quick advice!


pbinkley wrote:
 
 You should encode those three characters, and it doesn't hurt to encode
 the ampersand and double-quote characters too:
 http://en.wikipedia.org/wiki/XML#Entity_references
 
 Peter 
 
 -Original Message-
 From: zqzuk [mailto:[EMAIL PROTECTED] 
 Sent: Monday, January 21, 2008 2:24 PM
 To: solr-user@lucene.apache.org
 Subject: illegal characters in xml file to be posted?
 
 
 Hi, I am using the SimplePostTool to post files to solr. I have
 encountered a problem with the content of some xml files. I noticed that if
 my xml file has fields whose values contain the character &, <, or
 >, the post fails and I get the exception:
 
 javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y]
 Message: The entity name must immediately follow the '&' in the entity
 reference
 
 Looks like these characters are illegal in xml as embedded contents -
 but I did extract them from xml in the first place. Is there a list of
 such characters I need to deal with before I pass that to
 SimplePostTool?
 
 Thanks!
 --
 View this message in context:
 http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp150
 06748p15006748.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp15006748p15007840.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Wildcards

2008-01-21 Thread Yonik Seeley
On Jan 21, 2008 5:18 PM, dojolava [EMAIL PROTECTED] wrote:
 I just started to use solr and I experience strange behaviour when it comes
 to wildcards.

 When I use the StandardRequestHandler queries like eur?p?an or eur*an
 work fine.
 But garden?r or admini*tion do not bring any results (without wildcards
 there are some of course).

It's probably stemming.  Something like gardener is probably stemmed
to garden, so
a wildcard query that expects something longer than garden won't
find anything.

If you really need more accurate wildcard queries, do a copyField of
this field into another that does not have stemming (perhaps just
whitespace tokenizer and lowercase filter, and maybe stop filter).
Then use this alternate field for wildcard queries.
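
[Editorial aside: in schema.xml terms, the suggestion looks roughly like this - field and type names are illustrative:]

```xml
<fieldType name="text_ws_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body" type="text" indexed="true" stored="true"/>
<field name="body_wild" type="text_ws_lc" indexed="true" stored="false"/>
<copyField source="body" dest="body_wild"/>
```

Wildcard queries would then target the unstemmed copy, e.g. body_wild:garden?r.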

-Yonik


Re: Wildcards

2008-01-21 Thread dojolava
Thanks a lot!

I checked it, when I search for g?rden it works, only g?rdener does
not...

I will try the copyField solution.

On Jan 21, 2008 11:23 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 On Jan 21, 2008 5:18 PM, dojolava [EMAIL PROTECTED] wrote:
  I just started to use solr and I experience strange behaviour when it
 comes
  to wildcards.
 
  When I use the StandardRequestHandler queries like eur?p?an or
 eur*an
  work fine.
  But garden?r or admini*tion do not bring any results (without
 wildcards
  there are some of course).

 It's probably stemming.  Something like gardener is probably stemmed
 to garden, so
 a wildcard query that expects something longer than garden won't
 find anything.

 If you really need more accurate wildcard queries, do a copyField of
 this field into another that does not have stemming (perhaps just
 whitespace tokenizer and lowercase filter, and maybe stop filter).
 Then use this alternate field for wildcard queries.

 -Yonik



Re: DisMax and Search Components

2008-01-21 Thread Charles Hornberger
On Jan 21, 2008 10:23 AM, Doug Steigerwald
[EMAIL PROTECTED] wrote:
 Is there any support for DisMax (or any search request handlers) in search 
 components, or is that
 something that still needs to be done?  It seems like it isn't supported at 
 the moment.

I was curious about this, too ... If it *is* something that needs to
be done, am happy to help w/ the coding. But I would need some
advice/guidance up front --  I'm new enough to Solr that the design
behind the SearchComponents refactoring is not immediately obvious to
me, either from the Jira comments or the code itself.

-Charlie


Re: DisMax and Search Components

2008-01-21 Thread Yonik Seeley
The QueryComponent supports both lucene queryparser syntax and dismax
query syntax.
The dismax request handler now simply sets defType (the default base
query type) to dismax

-Yonik

On Jan 21, 2008 1:23 PM, Doug Steigerwald
[EMAIL PROTECTED] wrote:
 Is there any support for DisMax (or any search request handlers) in search 
 components, or is that
 something that still needs to be done?  It seems like it isn't supported at 
 the moment.

 We want to be able to use a field collapsing component
 (https://issues.apache.org/jira/browse/SOLR-236), but still be able to use 
 our DisMax handlers.

 Right now it's one or the other, and we -need- both.

 Thanks.
 doug



Re: DisMax and Search Components

2008-01-21 Thread Doug Steigerwald

We've found a way to work around it.  In our search components, we're doing 
something like:

  defType = defType == null ? DisMaxQParserPlugin.NAME : defType;

If you add defType=dismax to the query string, it'll use the 
DisMaxQParserPlugin.

Unfortunately, I haven't been able to figure out an easy way to access the config for the different 
dismax handlers defined in the config, so on our service side (Rails app) we're going to keep a 
configuration with all the params we need to pass (qf, pf, fl, etc.) and send them based on 
parameters coming into the service that we use to figure out which dismax handler to use 
(uh, yeah, I think that sounds right).


This may not be the best way to do it, but it will work fine for us until we can dedicate more time 
to it (we roll out Solr and our search service to QA next week).


Doug

Charles Hornberger wrote:

On Jan 21, 2008 10:23 AM, Doug Steigerwald
[EMAIL PROTECTED] wrote:

Is there any support for DisMax (or any search request handlers) in search 
components, or is that
something that still needs to be done?  It seems like it isn't supported at the 
moment.


I was curious about this, too ... If it *is* something that needs to
be done, am happy to help w/ the coding. But I would need some
advice/guidance up front --  I'm new enough to Solr that the design
behind the SearchComponents refactoring is not immediately obvious to
me, either from the Jira comments or the code itself.

-Charlie


Re: DisMax and Search Components

2008-01-21 Thread Yonik Seeley
On Jan 21, 2008 9:06 PM, Doug Steigerwald
[EMAIL PROTECTED] wrote:
 We've found a way to work around it.  In our search components, we're doing 
 something like:

defType = defType == null ? DisMaxQParserPlugin.NAME : defType;

Would it be easier to just add it as a default parameter in the request handler?

-Yonik
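
[Editorial aside: Yonik's suggestion would look roughly like this in solrconfig.xml - handler name and qf values are illustrative:]

```xml
<requestHandler name="/collapse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 description^0.5</str>
  </lst>
</requestHandler>
```

Requests to this handler then get dismax parsing by default, while other handlers keep the standard parser.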


Re: DisMax and Search Components

2008-01-21 Thread Doug Steigerwald

We don't always want to use the dismax handler in our setup.

Doug

Yonik Seeley wrote:

On Jan 21, 2008 9:06 PM, Doug Steigerwald
[EMAIL PROTECTED] wrote:

We've found a way to work around it.  In our search components, we're doing 
something like:

   defType = defType == null ? DisMaxQParserPlugin.NAME : defType;


Would it be easier to just add it as a default parameter in the request handler?

-Yonik


RE: copyField limitation

2008-01-21 Thread Lance Norskog
Sorting on a non-integer has space problems. As I understand it, sorting
creates an array of integers the size of the number of records in the entire
index. Sorting on a non-integer type also creates a separate array of the
same size with the field data copied into it.  Thus sorting a non-integer
field can use several times as much memory.

We have a very large index with very small records. We are creating matching
integer fields for various fields just to have faster sorts, and we are
doing this after benchmarking our speed and space behaviours.
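
[Editorial aside: a very rough back-of-envelope for the FieldCache cost described above. It assumes every document carries a distinct ~16-byte term; the real StringIndex stores one String per unique term plus an ord array, so this is closer to an upper bound:]

```python
def sort_cache_estimate(num_docs, avg_term_bytes=16):
    # int sort field: one int[maxDoc] of raw values, 4 bytes each.
    int_sort = 4 * num_docs
    # string sort field: an ord int[maxDoc] plus the term data itself.
    str_sort = 4 * num_docs + avg_term_bytes * num_docs
    return int_sort, str_sort

ints, strs = sort_cache_estimate(10_000_000)
print(ints // 2**20, "MB vs", strs // 2**20, "MB")  # 38 MB vs 190 MB
```

For a large index of small records, that multiple is exactly why a parallel integer sort field can pay off.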

I filed a Jira issue:

https://issues.apache.org/jira/browse/SOLR-464

Thanks for your time,

Lance Norskog

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Thursday, January 17, 2008 2:53 PM
To: solr-user@lucene.apache.org
Subject: Re: copyField limitation

On Jan 17, 2008 4:53 PM, Lance Norskog [EMAIL PROTECTED] wrote:
 Because sort works much faster on type 'integer', but range queries do 
 not work on type 'integer',

Really?  The sort speed should be identical.

-Yonik



OOM during indexing

2008-01-21 Thread Marcus Herou
Hi.

I get an OOM (OutOfMemoryError) with Solr 1.3. Autowarming seems to be the
villain, in conjunction with the FieldCache somehow.
JVM args: -Xmx512m -Xms512m -Xss128k

Index size is ~4 Million docs, where I index text and store database primary
keys.
du /srv/solr/feedItem/data/index/
1.7G/srv/solr/feedItem/data/index/

To ensure that the docs I index do not swell too much, I only allow 5K per doc
over the wire, i.e. I substring(0, 5000) on the content field.
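In code, the truncation looks roughly like this (a sketch; the class and method
names are made up for illustration):

```java
// Cap field content at maxChars before sending the doc to Solr,
// so very large documents do not swell the index or the request.
public class FieldTruncator {

    static String truncate(String content, int maxChars) {
        if (content == null || content.length() <= maxChars) {
            return content;
        }
        return content.substring(0, maxChars);
    }

    public static void main(String[] args) {
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            doc.append('x');
        }
        // prints 5000
        System.out.println(truncate(doc.toString(), 5000).length());
    }
}
```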

I have removed the firstSearcher and newSearcher listeners, since the queries I
used before killed performance when reindexing the whole index. I will add them
back later, once I move to delta (incremental) index updates.
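For reference, the auto-warming that shows up in the trace below is driven by
the cache settings in solrconfig.xml; one mitigation (a sketch, sizes
illustrative) is to set autowarmCount to 0 on queryResultCache so warming does
not replay sorted queries through the FieldCache:

```xml
<queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>
```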

Stack trace:
[06:25:53.122] [null] /update wt=xml&version=2.2 0 3165
[06:25:53.877] Error during auto-warming of key:
[EMAIL PROTECTED]:java.lang.OutOfMemoryError:
Java heap space
[06:25:53.877]  at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java
:104)
[06:25:53.877]  at org.apache.lucene.index.SegmentTermEnum.term(
SegmentTermEnum.java:159)
[06:25:53.877]  at org.apache.lucene.index.SegmentMergeInfo.next(
SegmentMergeInfo.java:66)
[06:25:53.877]  at org.apache.lucene.index.MultiTermEnum.next(
MultiReader.java:315)
[06:25:53.877]  at org.apache.lucene.search.FieldCacheImpl$10.createValue(
FieldCacheImpl.java:388)
[06:25:53.877]  at org.apache.lucene.search.FieldCacheImpl$Cache.get(
FieldCacheImpl.java:72)
[06:25:53.877]  at org.apache.lucene.search.FieldCacheImpl.getStringIndex(
FieldCacheImpl.java:350)
[06:25:53.877]  at
org.apache.lucene.search.FieldSortedHitQueue.comparatorString(
FieldSortedHitQueue.java:266)
[06:25:53.877]  at
org.apache.lucene.search.FieldSortedHitQueue$1.createValue(
FieldSortedHitQueue.java:182)
[06:25:53.877]  at org.apache.lucene.search.FieldCacheImpl$Cache.get(
FieldCacheImpl.java:72)
[06:25:53.877]  at
org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(
FieldSortedHitQueue.java:155)
[06:25:53.877]  at org.apache.lucene.search.FieldSortedHitQueue.<init>(
FieldSortedHitQueue.java:56)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher.getDocListNC(
SolrIndexSearcher.java:862)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher.getDocListC(
SolrIndexSearcher.java:808)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher.access$000(
SolrIndexSearcher.java:56)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem
(SolrIndexSearcher.java:254)
[06:25:53.877]  at org.apache.solr.search.LRUCache.warm(LRUCache.java:192)
[06:25:53.877]  at org.apache.solr.search.SolrIndexSearcher.warm(
SolrIndexSearcher.java:1393)
[06:25:53.877]  at org.apache.solr.core.SolrCore$2.call(SolrCore.java:702)
[06:25:53.877]  at java.util.concurrent.FutureTask$Sync.innerRun(
FutureTask.java:269)
[06:25:53.877]  at java.util.concurrent.FutureTask.run(FutureTask.java:123)
[06:25:53.877]  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
ThreadPoolExecutor.java:650)
[06:25:53.877]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:675)
[06:25:53.877]  at java.lang.Thread.run(Thread.java:595)

Help anyone?

Attaching schema.xml and solrconfig.xml

Kindly

//Marcus Herou
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--
 This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default)
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml
-->

<schema name="example" version="1.1">
  <types>

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
fieldType