Re: stopfilterFactory isn't removing field name

2009-09-14 Thread mike anderson
Yeah.. that was weird. Removing the line "forever,for ever" from my synonyms
file fixed the problem. In fact, I was having the same problem for every
double word like that. I decided I didn't really need the synonym filter for
that field so I just took it out, but I'd really like to know what the
problem is.
-mike

On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

 That's pretty strange... perhaps something to do with your synonyms
 file mapping "for" to a zero-length token?

 -Yonik
 http://www.lucidimagination.com

 On Mon, Sep 14, 2009 at 12:13 AM, mike anderson <saidthero...@gmail.com>
 wrote:
  I'm kind of stumped by this one.. is it something obvious?
  I'm running the latest trunk. In some cases the stopFilterFactory isn't
  removing the field name.
 
  Thanks in advance,
 
  -mike
 
  From debugQuery (both words are in the stopwords file):
 
http://localhost:8983/solr/select?q=citations:for&debugQuery=true
 
<str name="rawquerystring">citations:for</str>
<str name="querystring">citations:for</str>
<str name="parsedquery">citations:</str>
<str name="parsedquery_toString">citations:</str>
 
 
http://localhost:8983/solr/select?q=citations:the&debugQuery=true
 
<str name="rawquerystring">citations:the</str>
<str name="querystring">citations:the</str>
<str name="parsedquery"></str>
<str name="parsedquery_toString"></str>
 
 
 
 
  schema analyzer for this field:
<!-- Citation text -->
<fieldType name="citationtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
  </analyzer>
</fieldType>
 



Re: Issue on Facet field and exact match

2009-09-14 Thread Shalin Shekhar Mangar
On Mon, Sep 14, 2009 at 10:49 AM, dharhsana <rekha.dharsh...@gmail.com> wrote:



 This is my code where I add fields for blog details to Solr:

  SolrInputDocument solrInputDocument = new SolrInputDocument();
  solrInputDocument.addField("blogTitle", "$Never Fails$");
  solrInputDocument.addField("blogId", "$Never Fails$");
  solrInputDocument.addField("userId", "1");

 This is my code to add fields for post details to Solr:

 solrInputDocument.addField("blogId", "$Never Fails$");
 solrInputDocument.addField("postId", "$Never Fails post$");
 solrInputDocument.addField("postTitle", "$Never Fails post$");
 solrInputDocument.addField("postMessage", "$Never Fails post message$");

 While I am querying it from Solr, this is my code:


  SolrQuery queryOfMyBlog = new SolrQuery("blogId_exact:Never Fails");
 queryOfMyBlog.setFacet(true);
 queryOfMyBlog.addFacetField("blogTitle_exact");
 queryOfMyBlog.addFacetField("userId_exact");
 queryOfMyBlog.addFacetField("blogId_exact");
 queryOfMyBlog.setFacetMinCount(1);
 queryOfMyBlog.setIncludeScore(true);


You are indexing "$Never Fails$" into the field, so you should search for the
same. You don't need to add $ for exact searches on a string type field, so
you can just index "Never Fails". Also, to make it an exact phrase search,
enclose the query in quotes, e.g. blogId_exact:"Never Fails"
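For illustration, the quoted-phrase form described above can be built client-side like this (a minimal plain-Java sketch; the field name is taken from the thread, and this only builds the query string rather than making a SolrJ call):

```java
public class PhraseQuery {
    // Wrap the value in quotes so a multi-word term on a string field
    // is matched as one exact phrase rather than two separate terms.
    static String phrase(String field, String value) {
        return field + ":\"" + value + "\"";
    }

    public static void main(String[] args) {
        // Prints: blogId_exact:"Never Fails"
        System.out.println(phrase("blogId_exact", "Never Fails"));
    }
}
```

The resulting string can then be passed to `new SolrQuery(...)` as in the original code.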


-- 
Regards,
Shalin Shekhar Mangar.


Re: [DIH] Multiple repeat XPath stmts

2009-09-14 Thread Grant Ingersoll

As I said, copying is not an option.  That will break everything else.

On Sep 14, 2009, at 1:07 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



The XPathRecordReader has a limit of one mapping per XPath, so copying is
the best solution

On Mon, Sep 14, 2009 at 2:54 AM, Fergus McMenemie
<fer...@twig.me.uk> wrote:

I'm trying to import several RSS feeds using DIH and running into a
bit of a problem.  Some feeds define a GUID value that I map to my
Solr ID, while others don't.  I also have a link field which I fill in
with the RSS link field.  For the feeds that don't have the GUID value
set, I want to use the link field as the id.  However, if I define the
same XPath twice, but map it to two diff. columns, I don't get the id
value set.

For instance, I want to do:

schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="link" type="string" indexed="true" stored="false"/>

DIH config:
<field column="id" xpath="/rss/channel/item/link" />
<field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I'm not able to do
copyFields, unless of course, I wanted to implement conditional copy
fields (only copy if the field is not defined) which I would  
rather not.


How do I solve this?



How about:

<entity name="x" ... transformer="TemplateTransformer">
  <field column="link" xpath="/rss/channel/item/link" />
  <field column="GUID" xpath="/rss/channel/GUID" />
  <field column="id"   template="${x.link}" />
  <field column="id"   template="${x.GUID}" />

The TemplateTransformer does nothing if its source expression is null.
So the first transform assigns the fallback value to id; this is
overwritten by the GUID if it is defined.

You can now sort of do if-then-else using a combination of template
and regex transformers. Add a bit of maths to the transformers and
I think we will have a Turing-complete language :-)

fergus.


Thanks,
Grant


--

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===





--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: [DIH] Multiple repeat XPath stmts

2009-09-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
if you wish to use conditional copy you can use a RegexTransformer

<field column="guid" xpath="/rss/channel/guid"/>
<field column="id" regex=".*" sourceColName="guid" replaceWith="${entityname.guid}"/>

this means that if guid != null, 'id' will be set to guid
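The effect of this transformer chain can be sketched in plain Java (a simulation of the semantics only, not the DIH API; the method and argument names are illustrative):

```java
public class IdFallback {
    // RegexTransformer's replaceWith only fires when the source column
    // (guid) is non-null; otherwise the previously assigned fallback
    // value (link) is left in place.
    static String resolveId(String guid, String link) {
        return guid != null ? guid : link;
    }

    public static void main(String[] args) {
        System.out.println(resolveId(null, "http://example.com/item/1"));
        System.out.println(resolveId("abc-123", "http://example.com/item/1"));
    }
}
```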


On Mon, Sep 14, 2009 at 4:16 PM, Grant Ingersoll <gsing...@apache.org> wrote:
 As I said, copying is not an option.  That will break everything else.

 On Sep 14, 2009, at 1:07 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 The XPathRecordReader has a limit of one mapping per XPath, so copying is
 the best solution

 On Mon, Sep 14, 2009 at 2:54 AM, Fergus McMenemie <fer...@twig.me.uk>
 wrote:

 I'm trying to import several RSS feeds using DIH and running into a
 bit of a problem.  Some feeds define a GUID value that I map to my
 Solr ID, while others don't.  I also have a link field which I fill in
 with the RSS link field.  For the feeds that don't have the GUID value
 set, I want to use the link field as the id.  However, if I define the
 same XPath twice, but map it to two diff. columns I don't get the id
 value set.

 For instance, I want to do:
 schema.xml:
 <field name="id" type="string" indexed="true" stored="true" required="true"/>
 <field name="link" type="string" indexed="true" stored="false"/>

 DIH config:
 <field column="id" xpath="/rss/channel/item/link" />
 <field column="link" xpath="/rss/channel/item/link" />

 Because I am consolidating multiple fields, I'm not able to do
 copyFields, unless of course, I wanted to implement conditional copy
 fields (only copy if the field is not defined) which I would rather not.

 How do I solve this?


 How about:

 <entity name="x" ... transformer="TemplateTransformer">
  <field column="link" xpath="/rss/channel/item/link" />
  <field column="GUID" xpath="/rss/channel/GUID" />
  <field column="id"   template="${x.link}" />
  <field column="id"   template="${x.GUID}" />

 The TemplateTransformer does nothing if its source expression is null.
 So the first transform assigns the fallback value to id; this is
 overwritten by the GUID if it is defined.

 You can now sort of do if-then-else using a combination of template
 and regex transformers. Add a bit of maths to the transformers and
 I think we will have a Turing-complete language :-)

 fergus.

 Thanks,
 Grant

 --

 ===
 Fergus McMenemie               Email:fer...@twig.me.uk
 Techmore Ltd                   Phone:(UK) 07721 376021

 Unix/Mac/Intranets             Analyst Programmer
 ===




 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
 Solr/Lucene:
 http://www.lucidimagination.com/search





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Seeking help setting up solr in eclipse

2009-09-14 Thread Grant Ingersoll
I'm not familiar w/ Eclipse, but do you need to set solr.solr.home? 
Perhaps http://wiki.apache.org/solr/SolrTomcat can help too.

On Sep 13, 2009, at 7:12 PM, Markus Fischer wrote:


Hi,

I'd like to set up Eclipse to run Solr (in Tomcat for example), but I'm
struggling with the issue that I can't get index.jsp and other files
to be properly executed, for debugging and working on a plugin.

I've checked out Solr via the Subclipse plugin and created a Dynamic Web
Project. It seems that I have to know in advance which directories contain
the proper web files. Since I can't find a definitive UI to change that
afterwards, I modified .settings/org.eclipse.wst.common.component by
hand, but I can't get it to work.

When I open solr/src/webapp/web/index.jsp via Run As/Run on Server,
Tomcat gets started and the browser window opens the URL
http://localhost:8080/solr/index.jsp which only gives me a HTTP Status
404 - /solr/index.jsp. That's straight to the point for me, but I'm not
sure where to fix this. My org.eclipse.wst.common.component looks like this:


<?xml version="1.0" encoding="UTF-8"?>
<project-modules id="moduleCoreId" project-version="1.5.0">
    <wb-module deploy-name="solr">
        <wb-resource deploy-path="/" source-path="/src/webapp/web"/>
        <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/common"/>
        <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/java"/>
        <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/webapp/src"/>
        <wb-resource deploy-path="/WEB-INF/classes" source-path="/src/webapp/web"/>
        <property name="java-output-path"/>
        <property name="context-root" value="/"/>
    </wb-module>
</project-modules>

I see that Tomcat gets started with these values (stripped path to
workspace):

/usr/lib/jvm/java-6-sun-1.6.0.15/bin/java
-Dcatalina.base=/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0
-Dcatalina.home=/apache-tomcat-6.0.20
-Dwtp.deploy=/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps
-Djava.endorsed.dirs=/apache-tomcat-6.0.20/endorsed
-Dfile.encoding=UTF-8
-classpath /apache-tomcat-6.0.20/bin/bootstrap.jar:/usr/lib/jvm/java-6-sun-1.6.0.15/lib/tools.jar
org.apache.catalina.startup.Bootstrap start

The configuration files in /workspace/Servers/Tomcat v6.0 Server at
localhost-config, e.g. server.xml, contain:

<Host appBase="webapps" autoDeploy="true" name="localhost"
unpackWARs="true" xmlNamespaceAware="false"
xmlValidation="false"><Context docBase="solr" path="/solr"
reloadable="true" source="org.eclipse.jst.jee.server:solr"/></Host>

I see files copied, e.g.

/workspace/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/solr/WEB-INF/classes/index.jsp


I'm currently bumping against a wall; I don't see the woods anymore ...


thanks for any help,
- Markus


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Questions on copyField

2009-09-14 Thread Rahul R
Hello,
I have a few questions regarding the copyField directive in schema.xml

1. Does the destination field store a reference or the actual data?
If I have something like this:
<copyField source="name" dest="text"/>
then will the values in the 'name' field get copied into the 'text' field, or
will the 'text' field only store a reference to the 'name' field? To put it
more simply, if I later delete the 'name' field from the index, will I lose
the corresponding data in the 'text' field?

2. Is there any inbuilt API which I can use to do the copyField action
programmatically ?

3. Can I do a copyField from the schema as well as programmatically for the
same destination field?
Suppose I want the 'text' field to contain values for name, age and
location. In my index only 'name' and 'age' are defined as fields, so I can
add directives like:
<copyField source="name" dest="text"/>
<copyField source="age" dest="text"/>
The location, however, I want to add to the 'text' field programmatically.
I don't want to store the location as a separate field in the index. Can I
do this?
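For what it's worth, copyField copies the actual values into the destination at index time rather than storing a reference, and the same effect can be produced client-side by appending values to the destination field before sending the document. A rough plain-Java simulation of those semantics (not the SolrJ API; field names mirror the question and the location value is made up):

```java
import java.util.*;

public class CopyFieldSim {
    // Copy all values of src into dest, as copyField does at index time.
    static void copyField(Map<String, List<String>> doc, String src, String dest) {
        List<String> values = doc.getOrDefault(src, Collections.emptyList());
        doc.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(values);
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("name", new ArrayList<>(List.of("Rahul")));
        doc.put("age", new ArrayList<>(List.of("30")));
        copyField(doc, "name", "text");
        copyField(doc, "age", "text");
        doc.get("text").add("Bangalore");    // "location" added only to text
        doc.remove("name");                  // dropping the source later...
        System.out.println(doc.get("text")); // ...leaves the copied data intact
    }
}
```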

Thank you.

Regards
Rahul


Searching for the '+' character

2009-09-14 Thread Paul Forsyth

Hi all,

I need some help with a curious problem I can't find a solution for. I
am somewhat of a newbie with the various analyzers and handlers and
how they work together, so I'm looking for advice on how to proceed
with my issue.

I have content with text like 'product+' which has been indexed as
text. I need to search for the character '+', but try as I might I
can't do this.


From the docs it should just be a matter of escaping:

http://lucene.apache.org/java/2_4_1/queryparsersyntax.html#Escaping%20Special%20Characters

So queries like:

http://localhost:8983/solr/select/?q=\+&debugQuery=true or
http://localhost:8983/solr/select/?q=\%2B&debugQuery=true

should do the trick but they don't. I get:

http://pastie.org/616055 and http://pastie.org/616052, respectively.

Only with the + URL-encoded does it appear in the output, but no
results are returned.

I believe that the + is being stripped somehow, but I'm not sure where
exactly to look.

I included the debug info from the query, but I'm not sure if the output
is helpful.

Does anyone have ideas on this issue, and how I should try to proceed?

Many thanks,

Paul
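The escaping described in the question above can be sketched client-side as follows (a plain-Java illustration based on the special-character list in the linked query-parser docs; a given Lucene version may treat a slightly different set as special, and whether the escaped '+' survives still depends on the field's analysis chain, as the replies later in this digest point out):

```java
public class QueryEscape {
    // Metacharacters from the Lucene query parser syntax page.
    static final String SPECIALS = "\\+-!():^[]\"{}~*?|&;";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIALS.indexOf(c) >= 0) sb.append('\\'); // prefix with backslash
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("product+")); // prints: product\+
    }
}
```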



Solr results filtered on MoreLikeThis

2009-09-14 Thread Marcelk

Hi,

I hope someone can help me in my search for finding the right solution for
my search application. I hope I'm not repeating a question that has been
asked before, but I could not find a similar question out there. So that is
why I'm asking it here...

Here goes:

My index contains documents which could also contain duplicates based on
content. The sources of these documents are various locations on the
internet. In some cases these documents look the same, and in some cases they
are the same.

What I am trying to achieve is a result with matching documents, but where
the results are unique based on the MoreLikeThis. So I want to provide
matching documents only in the details, not in the results. The results
should state the number of morelikethis.

So if 3 documents match and another 4 documents match, I only want 2 results
like this:

- document1 (3 similar documents)
- document2 (4 similar documents)

And when users click further I will let them see all the similar documents,
but not in the search result.

I have used MoreLikeThis via the standard query, not the
MoreLikeThisHandler. And I can see that the results are separate from the
morelikethis element in the result.

I would like to have the morelikethis results filtered against the actual
result list.

Sorry if I'm repeating myself; I'm just trying to explain it as best as
I can.

Regards,
Marcel



-- 
View this message in context: 
http://www.nabble.com/Solr-results-filtered-on-MoreLikeThis-tp25434881p25434881.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: standard requestHandler components

2009-09-14 Thread Peter Wolanin
I just copied this information to the wiki at
http://wiki.apache.org/solr/SolrRequestHandler

-Peter

On Fri, Sep 11, 2009 at 7:43 PM, Jay Hill <jayallenh...@gmail.com> wrote:
 RequestHandlers are configured in solrconfig.xml. If no components are
 explicitly declared in the request handler config then the defaults are used.
 They are:
 - QueryComponent
 - FacetComponent
 - MoreLikeThisComponent
 - HighlightComponent
 - StatsComponent
 - DebugComponent

 If you wanted to have a custom list of components (either omitting defaults
 or adding custom) you can specify the components for a handler directly:
    <arr name="components">
      <str>query</str>
      <str>facet</str>
      <str>mlt</str>
      <str>highlight</str>
      <str>debug</str>
      <str>someothercomponent</str>
    </arr>

 You can add components before or after the main ones like this:
    <arr name="first-components">
      <str>mycomponent</str>
    </arr>

    <arr name="last-components">
      <str>myothercomponent</str>
    </arr>

 and that's how the spell check component can be added:
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>

 Note that a component (except the defaults) must be configured in
 solrconfig.xml with the name used in the <str> element as well.

 Have a look at the solrconfig.xml in the example directory
 (.../example/solr/conf/) for examples on how to set up the spellcheck
 component, and on how the request handlers are configured.

 -Jay
 http://www.lucidimagination.com


 On Fri, Sep 11, 2009 at 3:04 PM, michael8 <mich...@saracatech.com> wrote:


 Hi,

 I have a newbie question about the 'standard' requestHandler in
 solrconfig.xml.  What I'd like to know is where the config information for
 this requestHandler is kept.  When I go to http://localhost:8983/solr/admin,
 I see the following info, but am curious where the supposedly 'chained'
 components (e.g. QueryComponent, FacetComponent, MoreLikeThisComponent) are
 configured for this requestHandler.  I see timing and process debug output
 from these components with debugQuery=true, so somewhere these components
 must have been configured for this 'standard' requestHandler.

 name:    standard
 class:  org.apache.solr.handler.component.SearchHandler
 version:        $Revision: 686274 $
 description:    Search using components:

 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent,
 stats:  handlerStart : 1252703405335
 requests : 3
 errors : 0
 timeouts : 0
 totalTime : 201
 avgTimePerRequest : 67.0
 avgRequestsPerSecond : 0.015179728


 What I'd like to do, based on understanding this, is to properly integrate
 the spellcheck component into the standard requestHandler, as suggested in
 a Solr spellcheck example.

 Thanks for any info in advance.
 Michael
 --
 View this message in context:
 http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25409075.html
 Sent from the Solr - User mailing list archive at Nabble.com.






-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: stopfilterFactory isn't removing field name

2009-09-14 Thread Yonik Seeley
Thanks, I'll see if I can reproduce...

-Yonik
http://www.lucidimagination.com

On Mon, Sep 14, 2009 at 2:10 AM, mike anderson <saidthero...@gmail.com> wrote:
 Yeah.. that was weird. Removing the line "forever,for ever" from my synonyms
 file fixed the problem. In fact, I was having the same problem for every
 double word like that. I decided I didn't really need the synonym filter for
 that field so I just took it out, but I'd really like to know what the
 problem is.
 -mike

 On Mon, Sep 14, 2009 at 1:10 AM, Yonik Seeley <yo...@lucidimagination.com>
 wrote:

 That's pretty strange... perhaps something to do with your synonyms
 file mapping "for" to a zero-length token?

 -Yonik
 http://www.lucidimagination.com

 On Mon, Sep 14, 2009 at 12:13 AM, mike anderson <saidthero...@gmail.com>
  wrote:
  I'm kind of stumped by this one.. is it something obvious?
  I'm running the latest trunk. In some cases the stopFilterFactory isn't
  removing the field name.
 
  Thanks in advance,
 
  -mike
 
  From debugQuery (both words are in the stopwords file):
 
  http://localhost:8983/solr/select?q=citations:for&debugQuery=true
 
  <str name="rawquerystring">citations:for</str>
  <str name="querystring">citations:for</str>
  <str name="parsedquery">citations:</str>
  <str name="parsedquery_toString">citations:</str>
 
 
  http://localhost:8983/solr/select?q=citations:the&debugQuery=true
 
  <str name="rawquerystring">citations:the</str>
  <str name="querystring">citations:the</str>
  <str name="parsedquery"></str>
  <str name="parsedquery_toString"></str>
 
 
 
 
  schema analyzer for this field:
  <!-- Citation text -->
  <fieldType name="citationtext" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="substitutions.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="citationstopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    </analyzer>
  </fieldType>
 




Re: Searching for the '+' character

2009-09-14 Thread AHMET ARSLAN
 Hi all,
 
 I need some help with a curious problem i can't find a
 solution for. I am somewhat of a newbie with the various
 analyzers and handlers and how they work together, so im
 looking for advice on how to proceed with my issue.
 
 I have content with text like 'product+' which has been
 indexed as text. I need to search for the character '+', but
 try as I might i can't do this.
 
 From the docs it should just be a matter of escaping:

 I believe that the + is being stripped somehow but im not
 sure where exactly to look.

I think your analyzer is eating up '+'. Which tokenizer are you using in it?
Do you want to return documents containing 'product+' by searching for '+'?



  


Configuring slaves for a master backup without restarting

2009-09-14 Thread nourredine khadri
Hi,

A question about scalability.

Let's imagine the following architecture based on a master/slave scheme:

- A master for indexing, called Master 1
- A backup of Master 1 (called Master 2)
- Several slaves for search, linked to Master 1

Can I configure the slaves to be automatically linked to Master 2 if Master 1
fails, without restarting the JVMs?

Thanks in advance.

Nourredine.


  

Re: shards and facet_count

2009-09-14 Thread Paul Rosen

Shalin Shekhar Mangar wrote:

On Fri, Sep 11, 2009 at 2:35 AM, Paul Rosen <p...@performantsoftware.com> wrote:


Hi again,

I've mostly gotten the multicore working except for one detail.

(I'm using solr 1.3 and solr-ruby 0.0.6 in a rails project.)

I've done a few queries and I appear to be able to get hits from either
core. (yeah!)

I'm forming my request like this:

req = Solr::Request::Standard.new(
  :start => start,
  :rows => max,
  :sort => sort_param,
  :query => query,
  :filter_queries => filter_queries,
  :field_list => @field_list,
  :facets => {:fields => @facet_fields, :mincount => 1, :missing => true, :limit => -1},
  :highlighting => {:field_list => ['text'], :fragment_size => 600},
  :shards => @cores)

If I leave :shards => @cores out, then the response includes:

'facet_counts' => {
 'facet_dates' => {},
 'facet_queries' => {},
 'facet_fields' => { 'myfacet' => [ etc...], etc... }

which is what I expect.

If I add the :shards => @cores back in (so that I'm doing the exact
request above), I get:

'facet_counts' => {
 'facet_dates' => {},
 'facet_queries' => {},
 'facet_fields' => {}

so I've lost my facet information.

Why would it correctly find my documents, but not report the facet info?



I'm not a ruby guy, but the response format in both cases is exactly the
same, so I don't think there is any problem with the ruby client parsing. Can
you check the Solr logs to see if there were any exceptions when you sent
the shards parameter?



I don't see any exceptions. The Solr activity is pretty different for
the two cases. Without the shards, it makes one call that looks
something like this (I've ellipsed the id and field parameters for clarity):


Sep 14, 2009 9:32:09 AM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select 
params={facet.limit=-1&wt=ruby&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti)&fl=archive,...,license&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true}
hits=27 status=0 QTime=6


Note that facet=true.

With the shards, it has five lines for the single call that I make:

Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [exhibits] webapp=/solr path=/select 
params={wt=javabin&rows=30&start=0&facet=true&fl=uri,score&q=(rossetti)&version=2.2&isShard=true&facet.missing=true&hl.fl=text&fsv=true&hl.fragsize=600&facet.field=genre&facet.field=archive&facet.field=freeculture&hl=false}
hits=6 status=0 QTime=0


Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select 
params={wt=javabin&rows=30&start=0&facet=true&fl=uri,score&q=(rossetti)&version=2.2&isShard=true&facet.missing=true&hl.fl=text&fsv=true&hl.fragsize=600&facet.field=genre&facet.field=archive&facet.field=freeculture&hl=false}
hits=27 status=0 QTime=3


Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select 
params={facet.limit=-1&wt=javabin&rows=30&start=0&ids=...,...&facet=false&facet.mincount=1&q=(rossetti)&fl=archive,...,uri&version=2.2&facet.missing=true&isShard=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true}
status=0 QTime=35


Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [exhibits] webapp=/solr path=/select 
params={facet.limit=-1&wt=javabin&rows=30&start=0&ids=...,...&facet=false&facet.mincount=1&q=(rossetti)&fl=archive,...,uri&version=2.2&facet.missing=true&isShard=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true}
status=0 QTime=41


Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
INFO: [resources] webapp=/solr path=/select 
params={facet.limit=-1&wt=ruby&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti)&fl=archive,...,license&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true&shards=localhost:8983/solr/resources,localhost:8983/solr/exhibits}
status=0 QTime=57


Note that on the third and fourth lines, facet=false. Is that significant?

Thanks,
Paul



Re: Searching for the '+' character

2009-09-14 Thread Paul Forsyth

Hi Ahmet,

I believe it's the WhitespaceTokenizerFactory, but I may be wrong.

I've pasted the schema.xml into http://pastie.org/616162



On 14 Sep 2009, at 14:29, AHMET ARSLAN wrote:


Hi all,

I need some help with a curious problem I can't find a
solution for. I am somewhat of a newbie with the various
analyzers and handlers and how they work together, so I'm
looking for advice on how to proceed with my issue.

I have content with text like 'product+' which has been
indexed as text. I need to search for the character '+', but
try as I might I can't do this.

From the docs it should just be a matter of escaping:

I believe that the + is being stripped somehow but I'm not
sure where exactly to look.

I think your analyzer is eating up '+'. Which tokenizer are you
using in it?
Do you want to return documents containing 'product+' by
searching for '+'?







Best regards,

Paul Forsyth

mail: p...@ez.no
skype: paulforsyth



Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-14 Thread palexv

I know that my issue is related to
http://www.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html#a19160129
and https://issues.apache.org/jira/browse/SOLR-728
but my case is quite different.
As I understand it, the patch at https://issues.apache.org/jira/browse/SOLR-728
prevents concurrent execution of the import operation but does NOT put the
command in a queue.

I have only a few records to index. When I run a full reindex it works very
fast. But when I try to rerun it even after a couple of seconds, I get:
Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
No operations allowed after connection closed.

At this time, when I check the status, it says that the status is idle and
everything was indexed successfully.
A second run of the reindex without the exception is only possible after 10
seconds. That does not work for me! If I apply the patch from
https://issues.apache.org/jira/browse/SOLR-728 I will be unable to reindex in
the next 10 seconds as well.
Any suggestions?
-- 
View this message in context: 
http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436605.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Configuring slaves for a master backup without restarting

2009-09-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
You can put both Master 1 and Master 2 behind a VIP.

If Master 1 goes down, make the VIP point to Master 2.
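The same failover idea can be sketched client-side (the URLs and health-check predicate here are hypothetical; in practice the VIP does this at the network layer, so the slaves need no reconfiguration):

```java
import java.util.List;
import java.util.function.Predicate;

public class MasterFailover {
    // Return the first master in preference order that passes the health check.
    static String pickMaster(List<String> masters, Predicate<String> isUp) {
        for (String m : masters) {
            if (isUp.test(m)) return m;
        }
        throw new IllegalStateException("no master available");
    }

    public static void main(String[] args) {
        List<String> masters = List.of("http://master1:8983/solr", "http://master2:8983/solr");
        // Simulate Master 1 being down: only master2 passes the check.
        System.out.println(pickMaster(masters, m -> m.contains("master2")));
    }
}
```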

On Mon, Sep 14, 2009 at 7:11 PM, nourredine khadri
<nourredin...@yahoo.com> wrote:
 Hi,

 A question about scalability.

 Let's imagine the following architecture based on a master/slave scheme:

 - A master for the indexation called Master 1
 - A backup of Master 1 (called Master 2)
 - Several slaves for search linked to Master 1

 Can I configure the slaves to be automatically linked to Master 2 if Master
 1 fails, without restarting the JVMs?

 Thanks in advance.

 Nourredine.






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Searching for the '+' character

2009-09-14 Thread AHMET ARSLAN
 Hi Ahmet,
 
 I believe its the WhitespaceTokenizerFactory, but i may be
 wrong.
 
 I've pasted the schema.xml into http://pastie.org/616162
 

I looked at your field type named "text".

WordDelimiterFilterFactory is eating up '+'.

You can use the .../solr/admin/analysis.jsp tool to see the behaviour of each
tokenizer/token filter for a particular input.

But more importantly: do you want to return documents containing 'product+' by
searching for '+'? You said you need to search for the character '+'. What is
that query supposed to return?


  


Re: Single Core or Multiple Core?

2009-09-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
The problem is that if we use multicore it forces you to use a core
name. This is inconvenient. We must get rid of this restriction before
we move from single-core to multicore.



 On Sat, Sep 12, 2009 at 3:14 PM, Uri Boness <ubon...@gmail.com> wrote:
 +1
 Can you add a JIRA issue for that so we can vote for it?

 Chris Hostetter wrote:

 :  For the record: even if you're only going to have one SolrCore, using the
 :  multicore support (ie: having a solr.xml file) might prove handy from a
 :  maintenance standpoint ... the ability to configure new on-deck cores with
        ...
 : Yeah, it is a shame that single-core deployments (no solr.xml) do not have
 : a way to enable CoreAdminHandler. This is something we should definitely
 : look at in Solr 1.5.

 I think the most straight forward starting point is to switch how we
 structure the examples so that all of the examples uses a solr.xml with
 multicore support.

 Then we can move forward on deprecating the specification of Solr Home
 using JNDI/systemvars and switch to having the location of the solr.xml be
 the one master config option with everything else coming after that.



 -Hoss







-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Searching for the '+' character

2009-09-14 Thread AHMET ARSLAN


--- On Mon, 9/14/09, Paul Forsyth p...@ez.no wrote:

 From: Paul Forsyth p...@ez.no
 Subject: Re: Searching for the '+' character
 To: solr-user@lucene.apache.org
 Date: Monday, September 14, 2009, 5:55 PM
 With words like 'product+' i'd expect
 a search for '+' to return results like any other character
 or word, so '+' would be found within 'product+' or similar
 text.
 
 I've tried removing the worddelimiter from the query
 analyzer, restarting and reindexing but i get the same
 result. Nothing is found. I assume one of the filters could
 be adjusted to keep the '+'.
 
 Weird thing is that i tried to remove all filters from the
 analyzer and i get the same result.
 
 Paul

When you remove all filters '+' is kept, but '+' still won't match 'product+', 
because you want to search inside a token.

If the + sign is always at the end of your text, and you want to search only 
the last character of your text, EdgeNGramFilterFactory can do that with the 
settings side="back" maxGramSize="1" minGramSize="1".

The fieldType below will match '+' to 'product+'

<fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.EdgeNGramFilterFactory" side="back" maxGramSize="1" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>


But this time 'product+' will be reduced to only '+'. You won't be able to 
search for it any other way, for example with product*. Along with the last 
character, if you want to keep the original word itself you can set maxGramSize 
to 512. By doing this, the token 'product+' will produce 8 tokens (and the 
queries product* or product+ will return it):

+ word
t+ word
ct+ word
uct+ word
duct+ word
oduct+ word
roduct+ word
product+ word

If + sign can be anywhere inside the text you can use NGramTokenFilter.
Hope this helps. 
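Ahmet's eight-token list can be reproduced with a small Python sketch of the side="back" edge n-gram logic (a simulation for illustration, not Solr's actual implementation):

```python
def back_edge_ngrams(token, min_gram=1, max_gram=512):
    # Emit the suffixes of the token, shortest first, like
    # EdgeNGramFilter configured with side="back".
    n = len(token)
    return [token[n - size:] for size in range(min_gram, min(max_gram, n) + 1)]

print(back_edge_ngrams("product+"))
# ['+', 't+', 'ct+', 'uct+', 'duct+', 'oduct+', 'roduct+', 'product+']
```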


  


Re: Searching for the '+' character

2009-09-14 Thread Paul Forsyth
With words like 'product+' i'd expect a search for '+' to return  
results like any other character or word, so '+' would be found within  
'product+' or similar text.


I've tried removing the worddelimiter from the query analyzer,  
restarting and reindexing but i get the same result. Nothing is found.  
I assume one of the filters could be adjusted to keep the '+'.


Weird thing is that i tried to remove all filters from the analyzer  
and i get the same result.


Paul


On 14 Sep 2009, at 15:17, AHMET ARSLAN wrote:


Hi Ahmet,

I believe its the WhitespaceTokenizerFactory, but i may be
wrong.

I've pasted the schema.xml into http://pastie.org/616162



I looked at your field type named text.

WordDelimiterFilterFactory is eating up '+'

You can use .../solr/admin/analysis.jsp tool to see behaviour of  
each tokenizer/tokenfilter for particular input.


But more importantly do you want to return documents containing  
'product+' by searching '+'? You said you need to search for the  
character '+'. What that query supposed to return back?






Best regards,

Paul Forsyth

mail: p...@ez.no
skype: paulforsyth



Re: Single Core or Multiple Core?

2009-09-14 Thread Israel Ekpo
I concur with Uri, but I would also add that it might be helpful to specify
a default core to use somewhere in the configuration file.

So that if no core is specified, the default one will be implicitly
selected.

I am not sure if this feature is available yet.

What do you think?
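For reference, a sketch of what such a default could look like in solr.xml. This assumes the defaultCoreName attribute on the cores element, which was added in later Solr releases, so verify availability in your version; the core names and paths are made up:

```xml
<solr persistent="true">
  <!-- requests that omit a core name are routed to "core0" -->
  <cores adminPath="/admin/cores" defaultCoreName="core0">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```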

On Mon, Sep 14, 2009 at 10:46 AM, Uri Boness ubon...@gmail.com wrote:

 Is it really a problem? I mean, as i see it, solr to cores is what RDBMS is
 to databases. When you connect to a database you also need to specify the
 database name.

 Cheers,
 Uri


 On Sep 14, 2009, at 16:27, Noble Paul നോബിള്‍  नोब्ळ् 
 noble.p...@corp.aol.com wrote:

  The problem is that, if we use multicore it forces you to use a core
 name. this is inconvenient. We must get rid of this restriction before
 we move single-core to multicore.



 On Sat, Sep 12, 2009 at 3:14 PM, Uri Boness ubon...@gmail.com wrote:

 +1
 Can you add a JIRA issue for that so we can vote for it?

 Chris Hostetter wrote:


 :  For the record: even if you're only going to have one SOlrCore,
 using
 the
 :  multicore support (ie: having a solr.xml file) might prove handy
 from
 a
 :  maintence standpoint ... the ability to configure new on deck
 cores
 with
   ...
 : Yeah, it is a shame that single-core deployments (no solr.xml) does
 not
 have
 : a way to enable CoreAdminHandler. This is something we should
 definitely
 : look at in Solr 1.5.

 I think the most straight forward starting point is to switch how we
 structure the examples so that all of the examples uses a solr.xml with
 multicore support.

 Then we can move forward on deprecating the specification of Solr Home
 using JNDI/systemvars and switch to having the location of the solr.xml
 be
 the one master config option with everything else coming after that.



 -Hoss







 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Single Core or Multiple Core?

2009-09-14 Thread Shalin Shekhar Mangar
On Mon, Sep 14, 2009 at 8:16 PM, Uri Boness ubon...@gmail.com wrote:

 Is it really a problem? I mean, as i see it, solr to cores is what RDBMS is
 to databases. When you connect to a database you also need to specify the
 database name.


The problem is compatibility. If we make solr.xml compulsory then we only
force people to do a configuration change. But if we make a core name
mandatory, then we force them to change their applications (or the
applications' configurations). It is better if we can avoid that. Besides,
if there's only one core, why need a name?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Searching for the '+' character

2009-09-14 Thread Paul Forsyth

Thanks Ahmet,

That's excellent, thanks :) I may have to increase the gram size to take  
into account other possible uses, but I can now read around these  
filters to make the adjustments.


With regard to WordDelimiterFilterFactory: is there a way to place a  
delimiter on this filter to still get most of its functionality  
without it absorbing the + signs? Will I lose a lot of 'good'  
functionality by removing it? 'preserveOriginal' sounds promising and  
seems to work, but is it a good idea to use it?
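For reference, preserveOriginal is a real WordDelimiterFilterFactory option that keeps the unsplit token (e.g. 'product+') alongside the split parts. A sketch; the other attribute values here are illustrative, not a recommendation:

```xml
<!-- keep the original token in addition to the word-delimited parts -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0"
        preserveOriginal="1"/>
```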


On 14 Sep 2009, at 16:16, AHMET ARSLAN wrote:




--- On Mon, 9/14/09, Paul Forsyth p...@ez.no wrote:


From: Paul Forsyth p...@ez.no
Subject: Re: Searching for the '+' character
To: solr-user@lucene.apache.org
Date: Monday, September 14, 2009, 5:55 PM
With words like 'product+' i'd expect
a search for '+' to return results like any other character
or word, so '+' would be found within 'product+' or similar
text.

I've tried removing the worddelimiter from the query
analyzer, restarting and reindexing but i get the same
result. Nothing is found. I assume one of the filters could
be adjusted to keep the '+'.

Weird thing is that i tried to remove all filters from the
analyzer and i get the same result.

Paul


When you remove all filters '+' is kept, but still '+' won't match  
'product+'. Because you want to search inside a token.


If the + sign is always at the end of your text, and you want to  
search only the last character of your text, EdgeNGramFilterFactory can  
do that with the settings side="back" maxGramSize="1" minGramSize="1".

The fieldType below will match '+' to 'product+'

<fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.EdgeNGramFilterFactory" side="back" maxGramSize="1" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>


But this time 'product+' will be reduced to only '+'. You won't be  
able to search for it any other way, for example with product*. Along with  
the last character, if you want to keep the original word itself you  
can set maxGramSize to 512. By doing this, the token 'product+' will  
produce 8 tokens (and the queries product* or product+ will return it):


+ word
t+ word
ct+ word
uct+ word
duct+ word
oduct+ word
roduct+ word
product+ word

If + sign can be anywhere inside the text you can use  
NGramTokenFilter.

Hope this helps.





Best regards,

Paul Forsyth

mail: p...@ez.no
skype: paulforsyth



Re: Searching for the '+' character

2009-09-14 Thread Chantal Ackermann



Paul Forsyth wrote:

Hi Erick,

In this specific case my client does have a new product with a '+' at
the end. It's just one of those odd ones!

Customers are expected to put + into the search box so i have to have
results to show.

I hear your concerns though. Originally i thought I would need to
transform the + into something else, and do this back and forwards to
get a match!


Sorry for jumping into the discussion with my limited knowledge, but I 
actually think transforming the '+' into something else in the index 
(something like 'pluzz', which has a low probability of appearing as such in 
the regular input) is a good solution. You just have to do the same on 
the query side. You could have your own filter for that and put it in the 
schema, or just do it manually at index and query time.


is that a possibility?

Chantal
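Chantal's idea can be sketched outside Solr as a symmetric pre-tokenization substitution (illustrative Python; 'pluzz' is her hypothetical placeholder, and in Solr this would live in a custom filter or be applied before indexing and querying):

```python
PLACEHOLDER = "pluzz"  # stand-in assumed not to occur in regular input

def encode(text):
    # Apply the exact same substitution at index time and at query time,
    # so an indexed '+' and a queried '+' meet as the same token.
    return text.replace("+", PLACEHOLDER)

print(encode("product +"))  # index side -> 'product pluzz'
print(encode("+"))          # query side -> 'pluzz'
```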



Hopefully this will be a standard solr install, but with this tweak
for escaped chars

Paul

On 14 Sep 2009, at 17:01, Erick Erickson wrote:


Before you go too much further with this, I've just got to ask
whether the
use case for searching product+ really serves your customers.
If you mess around with analyzers to make things include the +,
what does that mean for ? *? .? any other weird character
you can think of?

Would it be a bad thing for product to match product+ and vice
versa? Would it be more or less confusing for your users to have
product
FAIL to match product+?

Of course only you really know your problem space, but think carefully
about this issue before you take on the work of making product+ work
because it'll inevitably be wy more work than you think. Imagine
the
bug reports when product fails to match product+, both of which
fail to match product

I'd also get a copy of Luke and look at the index to be sure what you
*think*
is in there is *actually* there. It'll also help you understand what
analyzers
do better.

Don't forget that using different analyzers when indexing and
querying will
lead to...er...interesting results.

Best
Erick

On Mon, Sep 14, 2009 at 11:38 AM, Paul Forsyth p...@ez.no wrote:


Thanks Ahmet,

Thats excellent, thanks :) I may have to increase the gramsize to
take into
account other possible uses but i can now read around these filters
to make
the adjustments.

With regard to WordDelimiterFilterFactory. Is there a way to place a
delimiter on this filter to still get most of its functionality
without it
absorbing the + signs? Will I lose a lot of 'good' functionality by
removing it? 'preserveOriginal' sounds promising and seems to work
but is it
a good idea to use this?


On 14 Sep 2009, at 16:16, AHMET ARSLAN wrote:



--- On Mon, 9/14/09, Paul Forsyth p...@ez.no wrote:

From: Paul Forsyth p...@ez.no

Subject: Re: Searching for the '+' character
To: solr-user@lucene.apache.org
Date: Monday, September 14, 2009, 5:55 PM
With words like 'product+' i'd expect
a search for '+' to return results like any other character
or word, so '+' would be found within 'product+' or similar
text.

I've tried removing the worddelimiter from the query
analyzer, restarting and reindexing but i get the same
result. Nothing is found. I assume one of the filters could
be adjusted to keep the '+'.

Weird thing is that i tried to remove all filters from the
analyzer and i get the same result.

Paul


When you remove all filters '+' is kept, but still '+' won't match
'product+'. Because you want to search inside a token.

If the + sign is always at the end of your text, and you want to search
only the last character of your text, EdgeNGramFilterFactory can do that
with the settings side="back" maxGramSize="1" minGramSize="1".

The fieldType below will match '+' to 'product+'

<fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.EdgeNGramFilterFactory" side="back" maxGramSize="1" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>


But this time 'product+' will be reduced to only '+'. You won't be able to
search for it any other way, for example with product*. Along with the last
character, if you want to keep the original word itself you can set maxGramSize
to 512. By doing this, the token 'product+' will produce 8 tokens (and the
queries product* or product+ will return it):

+ word
t+ word
ct+ word
uct+ word
duct+ word
oduct+ word
roduct+ word
product+ word

If + sign can be anywhere inside the text you can use
NGramTokenFilter.
Hope this helps.





Best regards,

Paul Forsyth


Re: Searching for the '+' character

2009-09-14 Thread Paul Forsyth

Hi Erick,

In this specific case my client does have a new product with a '+' at  
the end. It's just one of those odd ones!


Customers are expected to put + into the search box so i have to have  
results to show.


I hear your concerns though. Originally i thought I would need to  
transform the + into something else, and do this back and forwards to  
get a match!


Hopefully this will be a standard solr install, but with this tweak  
for escaped chars


Paul

On 14 Sep 2009, at 17:01, Erick Erickson wrote:

Before you go too much further with this, I've just got to ask  
whether the

use case for searching product+ really serves your customers.
If you mess around with analyzers to make things include the +,
what does that mean for ? *? .? any other weird character
you can think of?

Would it be a bad thing for product to match product+ and vice
versa? Would it be more or less confusing for your users to have  
product

FAIL to match product+?

Of course only you really know your problem space, but think carefully
about this issue before you take on the work of making product+ work
because it'll inevitably be wy more work than you think. Imagine  
the

bug reports when product fails to match product+, both of which
fail to match product

I'd also get a copy of Luke and look at the index to be sure what you
*think*
is in there is *actually* there. It'll also help you understand what
analyzers
do better.

Don't forget that using different analyzers when indexing and  
querying will

lead to...er...interesting results.

Best
Erick

On Mon, Sep 14, 2009 at 11:38 AM, Paul Forsyth p...@ez.no wrote:


Thanks Ahmet,

Thats excellent, thanks :) I may have to increase the gramsize to  
take into
account other possible uses but i can now read around these filters  
to make

the adjustments.

With regard to WordDelimiterFilterFactory. Is there a way to place a
delimiter on this filter to still get most of its functionality  
without it

absorbing the + signs? Will I lose a lot of 'good' functionality by
removing it? 'preserveOriginal' sounds promising and seems to work  
but is it

a good idea to use this?


On 14 Sep 2009, at 16:16, AHMET ARSLAN wrote:




--- On Mon, 9/14/09, Paul Forsyth p...@ez.no wrote:

From: Paul Forsyth p...@ez.no

Subject: Re: Searching for the '+' character
To: solr-user@lucene.apache.org
Date: Monday, September 14, 2009, 5:55 PM
With words like 'product+' i'd expect
a search for '+' to return results like any other character
or word, so '+' would be found within 'product+' or similar
text.

I've tried removing the worddelimiter from the query
analyzer, restarting and reindexing but i get the same
result. Nothing is found. I assume one of the filters could
be adjusted to keep the '+'.

Weird thing is that i tried to remove all filters from the
analyzer and i get the same result.

Paul



When you remove all filters '+' is kept, but still '+' won't match
'product+'. Because you want to search inside a token.

If the + sign is always at the end of your text, and you want to  
search only the last character of your text, EdgeNGramFilterFactory can do that
with the settings side="back" maxGramSize="1" minGramSize="1".

The fieldType below will match '+' to 'product+'

<fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <filter class="solr.EdgeNGramFilterFactory" side="back" maxGramSize="1" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>


But this time 'product+' will be reduced to only '+'. You won't be  
able to search for it any other way, for example with product*. Along with  
the last character, if you want to keep the original word itself you  
can set maxGramSize to 512. By doing this, the token 'product+' will  
produce 8 tokens (and the queries product* or product+ will return it):

+ word
t+ word
ct+ word
uct+ word
duct+ word
oduct+ word
roduct+ word
product+ word

If + sign can be anywhere inside the text you can use  
NGramTokenFilter.

Hope this helps.





Best regards,

Paul Forsyth

mail: p...@ez.no
skype: paulforsyth




Best regards,

Paul Forsyth

mail: p...@ez.no
skype: paulforsyth



Re: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-14 Thread palexv

I am using 1.3.
Do you suggest 1.4 from the developer trunk? I am concerned about whether it is
stable. Is it safe to use in a big commerce app?



Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 which version of Solr are you using. can you try with a recent one and
 confirm this?
 
 On Mon, Sep 14, 2009 at 7:45 PM, palexv pal...@gmail.com wrote:

 I know that my issue is related to
 http://www.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html#a19160129
 and https://issues.apache.org/jira/browse/SOLR-728
 but my case is quite different.
 As I understand patch at https://issues.apache.org/jira/browse/SOLR-728
 prevents concurrent executing of import operation but does NOT put
 command
 in a queue.

 I have only a few records to index. When I run a full reindex, it works very
 fast. But when I try to rerun it even after a couple of seconds, I am
 getting
 Caused by:
 com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
 No operations allowed after connection closed.

 At this time, when I check the status, it says the status is idle and
 everything was indexed successfully.
 A second reindex without the exception can only be run after 10 seconds.
 That does not work for me! If I apply the patch from
 https://issues.apache.org/jira/browse/SOLR-728 I will be unable to reindex
 in the next 10 seconds as well.
 Any suggestions?
 --
 View this message in context:
 http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436605.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
 
 

-- 
View this message in context: 
http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436948.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Searching for the '+' character

2009-09-14 Thread Erick Erickson
Before you go too much further with this, I've just got to ask whether the
use case for searching product+ really serves your customers.
If you mess around with analyzers to make things include the +,
what does that mean for ? *? .? any other weird character
you can think of?

Would it be a bad thing for product to match product+ and vice
versa? Would it be more or less confusing for your users to have product
FAIL to match product+?

Of course only you really know your problem space, but think carefully
about this issue before you take on the work of making product+ work
because it'll inevitably be wy more work than you think. Imagine the
bug reports when product fails to match product+, both of which
fail to match product

I'd also get a copy of Luke and look at the index to be sure what you
*think*
is in there is *actually* there. It'll also help you understand what
analyzers
do better.

Don't forget that using different analyzers when indexing and querying will
lead to...er...interesting results.

Best
Erick

On Mon, Sep 14, 2009 at 11:38 AM, Paul Forsyth p...@ez.no wrote:

 Thanks Ahmet,

 Thats excellent, thanks :) I may have to increase the gramsize to take into
 account other possible uses but i can now read around these filters to make
 the adjustments.

 With regard to WordDelimiterFilterFactory. Is there a way to place a
 delimiter on this filter to still get most of its functionality without it
 absorbing the + signs? Will I lose a lot of 'good' functionality by
 removing it? 'preserveOriginal' sounds promising and seems to work but is it
 a good idea to use this?


 On 14 Sep 2009, at 16:16, AHMET ARSLAN wrote:



 --- On Mon, 9/14/09, Paul Forsyth p...@ez.no wrote:

  From: Paul Forsyth p...@ez.no
 Subject: Re: Searching for the '+' character
 To: solr-user@lucene.apache.org
 Date: Monday, September 14, 2009, 5:55 PM
 With words like 'product+' i'd expect
 a search for '+' to return results like any other character
 or word, so '+' would be found within 'product+' or similar
 text.

 I've tried removing the worddelimiter from the query
 analyzer, restarting and reindexing but i get the same
 result. Nothing is found. I assume one of the filters could
 be adjusted to keep the '+'.

 Weird thing is that i tried to remove all filters from the
 analyzer and i get the same result.

 Paul


 When you remove all filters '+' is kept, but still '+' won't match
 'product+'. Because you want to search inside a token.

 If the + sign is always at the end of your text, and you want to search
 only the last character of your text, EdgeNGramFilterFactory can do that
 with the settings side="back" maxGramSize="1" minGramSize="1".

 The fieldType below will match '+' to 'product+'

 <fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="ISOLatin1AccentFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
     <filter class="solr.EdgeNGramFilterFactory" side="back" maxGramSize="1" minGramSize="1"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="ISOLatin1AccentFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
   </analyzer>
 </fieldType>


 But this time 'product+' will be reduced to only '+'. You won't be able to
 search for it any other way, for example with product*. Along with the last
 character, if you want to keep the original word itself you can set maxGramSize
 to 512. By doing this, the token 'product+' will produce 8 tokens (and the
 queries product* or product+ will return it):

 + word
 t+ word
 ct+ word
 uct+ word
 duct+ word
 oduct+ word
 roduct+ word
 product+ word

 If + sign can be anywhere inside the text you can use NGramTokenFilter.
 Hope this helps.




 Best regards,

 Paul Forsyth

 mail: p...@ez.no
 skype: paulforsyth




50% discount on Taming Text , Lucene in Action, etc

2009-09-14 Thread Fuad Efendi
http://www.manning.com/ingersoll/
And other books too, such as Lucene in Action 3rd edition... PDF only (MEAP)

Today Only! Save 50% on any ebook! This offer applies to all final ebooks
or ebook editions purchased through the Manning Early Access Program. Enter
code pop0914 in the Promotional Code box when you check out at manning.com.




Only one usage of each socket address error

2009-09-14 Thread R. Tan
Hi guys,
I'm getting an exception while in the middle of a batch indexing job. Can
anybody help me figure this out?

Error: Only one usage of each socket address (protocol/network address/port)
is normally permitted 127.0.0.1:8080

Solr is 1.4 on Tomcat.

Big thanks.

Rihaed


Re: Single Core or Multiple Core?

2009-09-14 Thread Uri Boness
Is it really a problem? I mean, as i see it, solr to cores is what  
RDBMS is to databases. When you connect to a database you also need to  
specify the database name.


Cheers,
Uri

On Sep 14, 2009, at 16:27, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com wrote:



The problem is that, if we use multicore it forces you to use a core
name. this is inconvenient. We must get rid of this restriction before
we move single-core to multicore.



On Sat, Sep 12, 2009 at 3:14 PM, Uri Boness ubon...@gmail.com wrote:

+1
Can you add a JIRA issue for that so we can vote for it?

Chris Hostetter wrote:


:  For the record: even if you're only going to have one  
SOlrCore, using

the
:  multicore support (ie: having a solr.xml file) might prove  
handy from

a
:  maintence standpoint ... the ability to configure new on deck  
cores

with
   ...
: Yeah, it is a shame that single-core deployments (no solr.xml)  
does not

have
: a way to enable CoreAdminHandler. This is something we should  
definitely

: look at in Solr 1.5.

I think the most straight forward starting point is to switch how we
structure the examples so that all of the examples uses a solr.xml  
with

multicore support.

Then we can move forward on deprecating the specification of Solr  
Home
using JNDI/systemvars and switch to having the location of the  
solr.xml be

the one master config option with everything else coming after that.



-Hoss









--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re : Configuring slaves for a master backup without restarting

2009-09-14 Thread nourredine khadri
Good idea. Thanks.

Also, in such an architecture (master/slave), are there any best practices for 
an index stored on an NFS-mounted filesystem?


Especially about the rsync step, when the slaves want to synchronize their 
index from a remote filesystem (the problem of inconsistent views of the directory).

Nourredine.





From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
To: solr-user@lucene.apache.org
Sent: Monday, September 14, 2009, 16:15:55
Subject: Re: Configuring slaves for a master backup without restarting

you can put both master1 and master2 behind a VIP.

If Master 1 goes down make the VIP point to Master2

On Mon, Sep 14, 2009 at 7:11 PM, nourredine khadri
nourredin...@yahoo.com wrote:
 Hi,

 A question about scalability.

 Let's imagine the following architecture, based on a master/slave scheme:

 - A master for the indexation called Master 1
 - A backup of Master 1 (called Master 2)
 - Several slaves for search linked to Master 1

 Can I configure the slaves to be automatically linked to Master 2 if Master 
 1 fails without restarting the JVMs?

 Thanks in advance.

 Nourredine.






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com



  

Re: Dataimport MySQLNonTransientConnectionException: No operations allowed after connection closed

2009-09-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
which version of Solr are you using. can you try with a recent one and
confirm this?

On Mon, Sep 14, 2009 at 7:45 PM, palexv pal...@gmail.com wrote:

 I know that my issue is related to
 http://www.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html#a19160129
 and https://issues.apache.org/jira/browse/SOLR-728
 but my case is quite different.
 As I understand patch at https://issues.apache.org/jira/browse/SOLR-728
 prevents concurrent executing of import operation but does NOT put command
 in a queue.

 I have only a few records to index. When I run a full reindex, it works very
 fast. But when I try to rerun it even after a couple of seconds, I am
 getting
 Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
 No operations allowed after connection closed.

 At this time, when I check the status, it says the status is idle and
 everything was indexed successfully.
 A second reindex without the exception can only be run after 10 seconds.
 That does not work for me! If I apply the patch from
 https://issues.apache.org/jira/browse/SOLR-728 I will be unable to reindex in
 the next 10 seconds as well.
 Any suggestions?
 --
 View this message in context: 
 http://www.nabble.com/Dataimport-MySQLNonTransientConnectionException%3A-No-operations-allowed-after-connection-closed-tp25436605p25436605.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Single Core or Multiple Core?

2009-09-14 Thread Jonathan Ariel
Yes, I think it is better to be backward compatible or the impact of moving
to the new solr version would be big.


On Mon, Sep 14, 2009 at 12:24 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Sep 14, 2009 at 8:16 PM, Uri Boness ubon...@gmail.com wrote:

  Is it really a problem? I mean, as i see it, solr to cores is what RDBMS
 is
  to databases. When you connect to a database you also need to specify the
  database name.
 
 
 The problem is compatibility. If we make solr.xml compulsory then we only
 force people to do a configuration change. But if we make a core name
 mandatory, then we force them to change their applications (or the
 applications' configurations). It is better if we can avoid that. Besides,
 if there's only one core, why need a name?

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Searching for the '+' character

2009-09-14 Thread Paul Forsyth
Interesting. I thought that would be the 'hard' approach rather than  
adding a filter, but I guess that's all it really is anyway.


Has this been done before? Build a filter to transform a word there  
and back?


On 14 Sep 2009, at 17:17, Chantal Ackermann wrote:




Paul Forsyth schrieb:

Hi Erick,
In this specific case my client does have a new product with a '+' at
the end. Its just one of those odd ones!
Customers are expected to put + into the search box so i have to have
results to show.
I hear your concerns though. Originally i thought I would need to
transform the + into something else, and do this back and forwards to
get a match!


sorry for jumping into the discussion with my little knowledge - but  
I actually think transforming the '+' into something else in the  
index (something like 'pluzz' that has a low probability to appear  
as such in the regular input) is a good solution. You just have to  
do the same on the query side. You could have your own filter for  
that to put it in the schema or just do it manually at index and  
query time.


is that a possibility?

Chantal


Hopefully this will be a standard solr install, but with this tweak
for escaped chars
Paul
On 14 Sep 2009, at 17:01, Erick Erickson wrote:

Before you go too much further with this, I've just got to ask whether the
use case for searching product+ really serves your customers.
If you mess around with analyzers to make things include the +,
what does that mean for ? *? .? any other weird character
you can think of?

Would it be a bad thing for product to match product+ and vice
versa? Would it be more or less confusing for your users to have
product
FAIL to match product+?

Of course only you really know your problem space, but think  
carefully
about this issue before you take on the work of making product+  
work

because it'll inevitably be way more work than you think. Imagine
the
bug reports when product fails to match product+, both of which
fail to match product

I'd also get a copy of Luke and look at the index to be sure what  
you

*think*
is in there is *actually* there. It'll also help you understand what
analyzers
do better.

Don't forget that using different analyzers when indexing and
querying will
lead to...er...interesting results.

Best
Erick

On Mon, Sep 14, 2009 at 11:38 AM, Paul Forsyth p...@ez.no wrote:


Thanks Ahmet,

That's excellent, thanks :) I may have to increase the gramsize to
take into
account other possible uses but i can now read around these filters
to make
the adjustments.

With regard to WordDelimiterFilterFactory. Is there a way to  
place a

delimiter on this filter to still get most of its functionality
without it
absorbing the + signs? Will I lose a lot of 'good' functionality
by

removing it? 'preserveOriginal' sounds promising and seems to work
but is it
a good idea to use this?


On 14 Sep 2009, at 16:16, AHMET ARSLAN wrote:



--- On Mon, 9/14/09, Paul Forsyth p...@ez.no wrote:

From: Paul Forsyth p...@ez.no

Subject: Re: Searching for the '+' character
To: solr-user@lucene.apache.org
Date: Monday, September 14, 2009, 5:55 PM
With words like 'product+' i'd expect
a search for '+' to return results like any other character
or word, so '+' would be found within 'product+' or similar
text.

I've tried removing the worddelimiter from the query
analyzer, restarting and reindexing but i get the same
result. Nothing is found. I assume one of the filters could
be adjusted to keep the '+'.

Weird thing is that i tried to remove all filters from the
analyzer and i get the same result.

Paul


When you remove all filters '+' is kept, but '+' still won't match
'product+', because you want to search inside a token.

If the + sign is always at the end of your text, and you want to
search only the last character of your text, EdgeNGramFilterFactory
can do that.

with the settings side=back maxGramSize=1 minGramSize=1

The fieldType below will match '+' to 'product+'

<fieldType name="textx" class="solr.TextField"
  positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
      language="English"/>
    <filter class="solr.EdgeNGramFilterFactory" side="back"
      maxGramSize="1" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
      ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
      language="English"/>
  </analyzer>
</fieldType>


But this time 'product+' will be reduced to only '+'. You won't be
able to search it other ways, for example product*. Along with the last
character, if you want to keep the original word itself you can set
maxGramSize to 512. By doing this the token 'product+' will produce 8 

Disabling tf (term frequency) during indexing and/or scoring

2009-09-14 Thread Aaron McKee

Hello,

Let me preface this by admitting that I'm still fairly new to Lucene and 
Solr, so I apologize if any of this sounds naive and I'm open to 
thinking about my problem differently.


I'm currently responsible for a rather large dataset of business records 
that I'm trying to build a Lucene/Solr infrastructure around, to replace 
an in-house solution that we've been using for a few years. These 
records are sourced from multiple providers and there's often a fair bit 
of overlap in the business coverage. I have a set of fuzzy correlation 
libraries that I use to identify these documents and I ultimately create 
a super-record that includes metadata from each of the providers. Given 
the nature of things, these providers often have slight variations in 
wording or spelling in the overlapping fields (it's amazing how many 
ways people find to refer to the same business or address). I'd like to 
capture these variations, as they facilitate searching, but TF 
considerations are currently borking field scoring here.


For example, taking business names into consideration, I have a Solr 
schema similar to:


<field name="name_provider1" type="string" indexed="false"
  stored="false" multiValued="true"/>

...
<field name="name_providerN" type="string" indexed="false"
  stored="false" multiValued="true"/>
<field name="nameNorm" type="text" indexed="true" stored="false"
  multiValued="true" omitNorms="true"/>


<copyField source="name_provider1" dest="nameNorm"/>
...
<copyField source="name_providerN" dest="nameNorm"/>

For any given business record, there may be 1..N business names present 
in the nameNorm field (some with naming variations, some identical). 
With TF enabled, however, I'm getting different match scores on this 
field simply based on how many providers contributed to the record, 
which is not meaningful to me. For example, a record containing 
<nameNorm>foo bar</nameNorm><nameNorm>foo bar</nameNorm> is necessarily 
scoring higher than a record just containing 
<nameNorm>foo bar</nameNorm>.  Although I wouldn't mind TF data being considered 
within each discrete field value, I need to find a way to prevent score 
inflation based simply on the number of contributing providers.


Looking at the mailing list archive and searching around, it sounds like 
the omitTf boolean in Lucene used to function somewhat in this manner, 
but has since taken on a broader interpretation (and name) that now also 
disables positional and payload data. Unfortunately, phrase support for 
fields like this is absolutely essential. So what's the best way to 
address a need like this? I guess I don't mind whether this is handled 
at index time or search time, but I'm not sure what I may need to 
override or if there's some existing provision I should take advantage of.


Thank you for any help you may have.

Best regards,
Aaron
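
A direction worth noting (my sketch, not something proposed in the thread): in Lucene of that era one could subclass DefaultSimilarity and override tf(float) to return a constant for any positive frequency, flattening exactly the score inflation described. The arithmetic of that change, shown without a Lucene dependency (class and method names are illustrative):

```java
// Sketch of tf flattening: with the default tf = sqrt(freq), a field value
// repeated by several providers scores higher; a flat tf treats any
// positive frequency the same.
public class FlatTfSketch {

    // Lucene DefaultSimilarity-style term frequency factor
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Flattened variant: presence matters, repetition does not
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(defaultTf(2) > defaultTf(1)); // true: duplicates inflate
        System.out.println(flatTf(2) == flatTf(1));      // true: duplicates ignored
    }
}
```

Note that Similarity applied index-wide at the time, so flattening tf this way would affect every field, not just nameNorm.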


Re: KStem download

2009-09-14 Thread darniz



Pascal Dimassimo wrote:
 
 Hi,
 
 I want to try KStem. I'm following the instructions on this page:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
 
 ... but the download link doesn't work.
 
 Does anyone know the new location to download KStem?
 
I am stuck with the same issue.
The link has not been working for a long time.


Is there any alternate link?
Please let us know.

darniz
-- 
View this message in context: 
http://www.nabble.com/KStem-download-tp24375856p25440432.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: KStem download

2009-09-14 Thread Yonik Seeley
On Mon, Sep 14, 2009 at 1:56 PM, darniz rnizamud...@edmunds.com wrote:
 Pascal Dimassimo wrote:

 Hi,

 I want to try KStem. I'm following the instructions on this page:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

 ... but the download link doesn't work.

 Is anyone know the new location to download KStem?

 I am stuck with the same issue
 its link is not working for a long time


 is there any alternate link
 Please let us know

*shrug* - looks like they changed their download structure (or just
took it down).  I searched around their site a bit but couldn't find
another one (and google wasn't able to find it either).

The one from Lucid is functionally identical, free, and much, much
faster though - I'd just use that.

-Yonik
http://www.lucidimagination.com


Return one word - Auto Complete Request Handler

2009-09-14 Thread Mohamed Parvez
I am trying to configure a request handler that will be used for the Auto
Complete query.

I am limiting the result to one field by using the fl parameter, which can
be used to specify the fields to return.

How do I make the field return only one word, not full sentences?



Thanks/Regards,
Parvez


Seattle / PNW Hadoop/Lucene/HBase Meetup, Wed Sep 30th

2009-09-14 Thread Bradford Stephens
Greetings,

It's time for another Hadoop/Lucene/Apache Cloud Stack meetup!
This month it'll be on Wednesday, the 30th, at 6:45 pm.

We should have a few interesting guests this time around -- someone from
Facebook may be stopping by to talk about Hive :)

We've had great attendance in the past few months, let's keep it up! I'm always
amazed by the things I learn from everyone.

We're back at the University of Washington, Allen Computer Science
Center (not Computer Engineering)
Map: http://www.washington.edu/home/maps/?CSE

Room: 303 -or- the Entry level. If there are changes, signs will be posted.

More Info:

The meetup is about 2 hours (and there's usually food): we'll have two
in-depth talks of 15-20 minutes each, and then several lightning talks
of 5 minutes. We'll then have discussion and 'social time'. If no one
offers a talk, we'll just have general discussion. Let me know if you're
interested in speaking or attending. We'd like to focus on education, so
every presentation *needs* to ask some questions at the end. We can talk
about these after the presentations, and I'll record what we've learned
in a wiki and share that with the rest of us.

Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com

Cheers,
Bradford
-- 
http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Re: Searching for the '+' character

2009-09-14 Thread AHMET ARSLAN
 Thanks Ahmet,
 
 Thats excellent, thanks :) I may have to increase the
 gramsize to take into account other possible uses but i can
 now read around these filters to make the adjustments.
 
 With regard to WordDelimiterFilterFactory. Is there a way
 to place a delimiter on this filter to still get most of its
 functionality without it absorbing the + signs? 

Yes you are right, preserveOriginal=1 will cause the original token to be 
indexed without modifications.

 Will I lose a lot of 'good' functionality by removing it?

It depends on your input data. It is used to break one token into subwords,
like: Wi-Fi -> Wi, Fi and PowerShot -> Power, Shot.
If your input data set contains such words, you may need it.

But I think using NGramFilter(s) just to make the last character searchable 
is not an optimal solution. I don't know what type of dataset you have, but 
I think using two separate fields (with different types) for that is more 
suitable. One field will contain the actual data itself. The other will hold 
only the last character(s).

You can achieve this with a copyField or programmatically during indexing. 
The type of the field lastCharsField will use EdgeNGramFilter so that only 
the last character of token(s) will pass that filter.

During searching you will search those two fields: 
originalField:\+ OR lastCharsField:\+

The query lastCharsField:\+ will return you all the products ending with +.

Hope this helps.
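
A schema sketch of that two-field approach (the type name is mine; originalField and lastCharsField follow the names used above, and the tokenizer choice is an assumption):

```xml
<!-- field type that keeps only the final character of each token -->
<fieldType name="lastChars" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" side="back"
            minGramSize="1" maxGramSize="1"/>
  </analyzer>
</fieldType>

<field name="originalField"  type="text"      indexed="true" stored="true"/>
<field name="lastCharsField" type="lastChars" indexed="true" stored="false"/>
<copyField source="originalField" dest="lastCharsField"/>
```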
 



  


Re: KStem download

2009-09-14 Thread Joe Calderon
is the source for the lucid kstemmer available ? from the lucid solr
package i only found the compiled jars

On Mon, Sep 14, 2009 at 11:04 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Mon, Sep 14, 2009 at 1:56 PM, darniz rnizamud...@edmunds.com wrote:
 Pascal Dimassimo wrote:

 Hi,

 I want to try KStem. I'm following the instructions on this page:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

 ... but the download link doesn't work.

 Is anyone know the new location to download KStem?

 I am stuck with the same issue
 its link is not working for a long time


 is there any alternate link
 Please let us know

 *shrug* - looks like they changed their download structure (or just
 took it down).  I searched around their site a bit but couldn't find
 another one (and google wasn't able to find it either).

 The one from Lucid is functionally identical, free, and much, much
 faster though - I'd just use that.

 -Yonik
 http://www.lucidimagination.com



Re: KStem download

2009-09-14 Thread darniz

OK, I downloaded the Lucid Imagination version of Solr.

From the lib directory I copied the two jars,
lucid-kstem.jar and lucid-solr-kstem.jar,
and put them in my local Solr instance
at
C:\solr\apache-solr-1.3.0\lib

When I declare a field type like this

<fieldtype name="lucidkstemmer" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LucidKStemFilterFactory"
      protected="protwords.txt"/>
  </analyzer>
</fieldtype>

it's throwing a class-not-found exception.

Are there some other files I am missing?

Please let me know thanks

Rashid






Yonik Seeley-2 wrote:
 
 On Mon, Sep 14, 2009 at 1:56 PM, darniz rnizamud...@edmunds.com wrote:
 Pascal Dimassimo wrote:

 Hi,

 I want to try KStem. I'm following the instructions on this page:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

 ... but the download link doesn't work.

 Is anyone know the new location to download KStem?

 I am stuck with the same issue
 its link is not working for a long time


 is there any alternate link
 Please let us know
 
 *shrug* - looks like they changed their download structure (or just
 took it down).  I searched around their site a bit but couldn't find
 another one (and google wasn't able to find it either).
 
 The one from Lucid is functionally identical, free, and much, much
 faster though - I'd just use that.
 
 -Yonik
 http://www.lucidimagination.com
 
 

-- 
View this message in context: 
http://www.nabble.com/KStem-download-tp24375856p25440690.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: 50% discount on Taming Text , Lucene in Action, etc

2009-09-14 Thread Erik Hatcher

3rd edition?!   *whew* - let's get the 2nd edition in print first ;)

Erik

On Sep 14, 2009, at 12:10 PM, Fuad Efendi wrote:


http://www.manning.com/ingersoll/
And other books too, such as Lucene in Action 3rd edition... PDF  
only (MEAP)


Today Only! Save 50% on any ebook! This offer applies to all final  
ebooks
or ebook editions purchased through the Manning Early Access  
Program. Enter
code pop0914 in the Promotional Code box when you check out at  
manning.com.







Re: Searching for the '+' character

2009-09-14 Thread Matt Weber
Why don't you create a synonym for + that expands to your customer's  
product name that includes the plus?  You can even have your FE do  
this sort of replacement BEFORE submitting to Solr.
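
As a sketch of that synonym idea (the product name "widget+" and its replacement token are hypothetical, and this assumes the analysis chain actually emits "+" and "widget+" as tokens rather than stripping them):

```
# synonyms.txt (illustrative): map the plus-suffixed name and the bare
# "+" to a single searchable token at index and query time
widget+ => widgetplus
+ => widgetplus
```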


Thanks,

Matt Weber

On Sep 14, 2009, at 11:42 AM, AHMET ARSLAN wrote:


Thanks Ahmet,

Thats excellent, thanks :) I may have to increase the
gramsize to take into account other possible uses but i can
now read around these filters to make the adjustments.

With regard to WordDelimiterFilterFactory. Is there a way
to place a delimiter on this filter to still get most of its
functionality without it absorbing the + signs?


Yes you are right, preserveOriginal=1 will cause the original  
token to be indexed without modifications.



Will i loose a lot of 'good' functionality by removing it?


It depends on your input data. It is used to break one token into  
subwords,

like: Wi-Fi -> Wi, Fi and PowerShot -> Power, Shot.
If your input data set contains such words, you may need it.

But I think using NGramFilter(s) just to make the last character  
searchable is not an optimal solution. I don't know what type of  
dataset you have, but I think using two separate fields (with  
different types) for that is more suitable. One field will contain  
the actual data itself. The other will hold only the last character(s).


You can achieve this with a copyField or programmatically during  
indexing. The type of the field lastCharsField will use  
EdgeNGramFilter so that only the last character of token(s) will pass  
that filter.


During searching you will search those two fields:
originalField:\+ OR lastCharsField:\+

The query lastCharsField:\+ will return you all the products ending  
with +.


Hope this helps.









RE: 50% discount on Taming Text , Lucene in Action, etc

2009-09-14 Thread Fuad Efendi

Yes, 2nd edition; but subscription-based Manning Early Access Program
(MEAP) is available, $13.75 (today only...), plus Author Online:
http://www.manning-sandbox.com/forum.jspa?forumID=451
http://www.manning.com/hatcher3/


 -Original Message-
 From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
 Sent: September-14-09 3:05 PM
 To: solr-user@lucene.apache.org
 Subject: Re: 50% discount on Taming Text , Lucene in Action, etc
 
 3rd edition?!   *whew* - let's get the 2nd edition in print first ;)
 
   Erik
 
 On Sep 14, 2009, at 12:10 PM, Fuad Efendi wrote:
 
  http://www.manning.com/ingersoll/
  And other books too, such as Lucene in Action 3rd edition... PDF
  only (MEAP)
 
  Today Only! Save 50% on any ebook! This offer applies to all final
  ebooks
  or ebook editions purchased through the Manning Early Access
  Program. Enter
  code pop0914 in the Promotional Code box when you check out at
  manning.com.
 
 





Load synonyms dynamically

2009-09-14 Thread Mohamed Parvez
Is there a way to load the synonyms dynamically?

I mean, if the synonym.txt file changes, then at query time the newly
added synonyms should be active.

Currently it requires a reindex.


Thanks/Regards,
Parvez


Solr 1.4 - autoSuggest - is it a default service

2009-09-14 Thread Yerraguntla

Hi,

I am trying to use autoSuggest in Solr 1.4. Is the autoSuggest service
available by default, like select? Or should I configure anything?


Solrconfig.xml contains the termcomponent defined.

Thanks
R


-- 
View this message in context: 
http://www.nabble.com/Solr-1.4---autoSuggestis-it-a-default-service-tp25443128p25443128.html
Sent from the Solr - User mailing list archive at Nabble.com.



Difficulty with Multi-Word Synonyms

2009-09-14 Thread Gregg Donovan
I'm running into an odd issue with multi-word synonyms in Solr (using
the latest [9/14/09] nightly). Things generally seem to work as
expected, but I sometimes see words that are the leading term in a
multi-word synonym being replaced with the token that follows them in
the stream when they should just be ignored (i.e. there's no synonym
match for just that token). When I preview the analysis at
admin/analysis.jsp it looks fine, but at runtime I see problems like
the one in the unit test below. It's a simple case, so I assume I'm
making some sort of configuration and/or usage error.

package org.apache.solr.analysis;

import java.io.*;
import java.util.*;

import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class TestMultiWordSynonmys extends junit.framework.TestCase {

  public void testMultiWordSynonmys() throws IOException {
    List<String> rules = new ArrayList<String>();
    rules.add("a b c,d");
    SynonymMap synMap = new SynonymMap(true);
    SynonymFilterFactory.parseRules(rules, synMap, "=>", ",", true, null);

    SynonymFilter ts = new SynonymFilter(
        new WhitespaceTokenizer(new StringReader("a e")), synMap);
    TermAttribute termAtt = (TermAttribute) ts.getAttribute(TermAttribute.class);

    ts.reset();
    List<String> tokens = new ArrayList<String>();
    while (ts.incrementToken()) tokens.add(termAtt.term());

    // This fails because [e, e] is the value of the token stream
    assertEquals(Arrays.asList("a", "e"), tokens);
  }
}

Any help would be much appreciated. Thanks.

--Gregg


multicore shards and relevancy score

2009-09-14 Thread Paul Rosen

Hi,

I've done a few experiments with searching two cores with the same 
schema using the shard syntax. (using solr 1.3)
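
For context, that shard syntax looks roughly like this (host, port, and core names are illustrative):

```
http://localhost:8983/solr/core0/select?q=foo&shards=localhost:8983/solr/core0,localhost:8983/solr/core1
```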


My use case is that I want to have multiple cores because a few 
different people will be managing the indexing, and that will happen at 
different times. The data, however, is homogeneous.


I've noticed in my tests that the results are not interwoven, but it 
might just be my test data. In other words, all the results from one 
core appear, then all the results from the other core.


In thinking about it, it would make sense if the relevancy scores for 
each core were completely independent of each other. And that would mean 
that there is no way to compare the relevancy scores between the cores.


In other words, I'd like the following results:

- really relevant hit from core0
- pretty relevant hit from core1
- kind of relevant hit from core0
- not so relevant hit from core1

but I get:

- really relevant hit from core0
- kind of relevant hit from core0
- pretty relevant hit from core1
- not so relevant hit from core1

So, are the results supposed to be interwoven, and I need to study my 
data more, or is this just not something that is possible?


Also, if this is insurmountable, I've discovered two show stoppers that 
will prevent using multicore in my project (counting the lack of support 
for faceting in multicore). Are these issues addressed in solr 1.4?


Thanks,
Paul


Re: 50% discount on Taming Text , Lucene in Action, etc

2009-09-14 Thread Lukáš Vlček
Hi,

I can confirm it works! :-)
Regards,
Lukas


On Mon, Sep 14, 2009 at 10:20 PM, Fuad Efendi f...@efendi.ca wrote:


 Yes, 2nd edition; but subscription-based Manning Early Access Program
 (MEAP) is available, $13.75 (today only...), plus Author Online:
 http://www.manning-sandbox.com/forum.jspa?forumID=451
 http://www.manning.com/hatcher3/


  -Original Message-
  From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
  Sent: September-14-09 3:05 PM
  To: solr-user@lucene.apache.org
  Subject: Re: 50% discount on Taming Text , Lucene in Action, etc
 
  3rd edition?!   *whew* - let's get the 2nd edition in print first ;)
 
Erik
 
  On Sep 14, 2009, at 12:10 PM, Fuad Efendi wrote:
 
   http://www.manning.com/ingersoll/
   And other books too, such as Lucene in Action 3rd edition... PDF
   only (MEAP)
  
   Today Only! Save 50% on any ebook! This offer applies to all final
   ebooks
   or ebook editions purchased through the Manning Early Access
   Program. Enter
   code pop0914 in the Promotional Code box when you check out at
   manning.com.
  
  






Re: Solr 1.4 - autoSuggest - is it a default service

2009-09-14 Thread Mohamed Parvez
I guess you are looking for terms; it's in 1.4.

Just use a query like

http://localhost:port
/solr/terms/?terms=true&terms.fl=field_name&terms.prefix=da
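
If no /terms handler is registered, a solrconfig.xml sketch along the lines of the Solr 1.4 TermsComponent setup would be (handler and component names are illustrative):

```xml
<!-- solrconfig.xml: expose the TermsComponent at /terms -->
<searchComponent name="termsComponent"
    class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms"
    class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>
```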


Thanks/Regards,
Parvez



On Mon, Sep 14, 2009 at 3:35 PM, Yerraguntla raveend...@yahoo.com wrote:


 Hi,

 I am trying to use autoSuggest in Solr 1.4. Is autoSugest service available
 by default like select? or should I configure anything?


 Solrconfig.xml contains the termcomponent defined.

 Thanks
 R


 --
 View this message in context:
 http://www.nabble.com/Solr-1.4---autoSuggestis-it-a-default-service-tp25443128p25443128.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Single Core or Multiple Core?

2009-09-14 Thread Uri Boness
IMO forcing the users to make a configuration change in Solr or in their 
application is the same thing - it all boils down to a configuration 
change (I'll be very surprised if someone is actually hardcoding the 
Solr URL in their system - most probably it is configurable, and if it's 
not, forcing them to change it is actually a good thing).

Besides,
if there's only one core, why need a name?
Consistency. Having a default core as Israel suggested can probably do 
the trick. At first it might seem that having a default core and 
not needing to specify the core name will make it easier for users. 
But I actually disagree - don't underestimate the power of being 
consistent. I'd rather have a manual telling me this is how it works and 
it always works like that in all scenarios than having something like 
this is how it works but if you have scenario A then it works 
differently and you have to do this instead.


Shalin Shekhar Mangar wrote:

On Mon, Sep 14, 2009 at 8:16 PM, Uri Boness ubon...@gmail.com wrote:

  

Is it really a problem? I mean, as i see it, solr to cores is what RDBMS is
to databases. When you connect to a database you also need to specify the
database name.




The problem is compatibility. If we make solr.xml compulsory then we only
force people to do a configuration change. But if we make a core name
mandatory, then we force them to change their applications (or the
applications' configurations). It is better if we can avoid that. Besides,
if there's only one core, why need a name?

  


Is it possible to query for everything ?

2009-09-14 Thread Jonathan Vanasco

I'm using Solr for search and faceted browsing.

Is it possible to have Solr search for 'everything', at least as far  
as q is concerned?


The request handlers I've found don't like it if I don't pass in a q  
parameter.


Re: Is it possible to query for everything ?

2009-09-14 Thread Matt Weber

Query for *:*

Thanks,

Matt Weber

On Sep 14, 2009, at 4:18 PM, Jonathan Vanasco wrote:


I'm using Solr for search and faceted browsing

Is it possible to have solr search for 'everything' , at least as  
far as q is concerned ?


The request handlers I've found don't like it if I don't pass in a q  
parameter




Re: Is it possible to query for everything ?

2009-09-14 Thread Jay Hill
Use: ?q=*:*

-Jay
http://www.lucidimagination.com


On Mon, Sep 14, 2009 at 4:18 PM, Jonathan Vanasco jvana...@2xlp.com wrote:

 I'm using Solr for search and faceted browsing

 Is it possible to have solr search for 'everything' , at least as far as q
 is concerned ?

 The request handlers I've found don't like it if I don't pass in a q
 parameter



Re: Is it possible to query for everything ?

2009-09-14 Thread Jonathan Vanasco

Thanks Jay  Matt

I tried *:* on my app, and it didn't work

I tried it on the solr admin, and it did

I checked the solr config file, and realized that it works on  
standard, but not on dismax, queries


So i have my app checking *:* on a standard qt, and then filtering  
what I need on other qts!


I would never have figured this out without you two!


Re: Is it possible to query for everything ?

2009-09-14 Thread Jay Hill
With dismax you can use q.alt when the q param is missing:
q.alt=*:*
should work.
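
A solrconfig.xml sketch of that (the handler name and qf value are illustrative):

```xml
<!-- dismax handler with a match-all fallback when q is absent -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
```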

-Jay


On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco jvana...@2xlp.com wrote:

 Thanks Jay  Matt

 I tried *:* on my app, and it didn't work

 I tried it on the solr admin, and it did

 I checked the solr config file, and realized that it works on standard, but
 not on dismax, queries

 So i have my app checking *:* on a standard qt, and then filtering what I
 need on other qts!

 I would never have figured this out without you two!



Re: KStem download

2009-09-14 Thread darniz

I was able to declare a field type when I use the Lucid distribution of
Solr:

<fieldtype name="lucidkstemmer" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter
      class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
      protected="protwords.txt"/>
  </analyzer>
</fieldtype>

But if I copy the two jars and put them in the lib directory of the Apache
Solr distribution, it still gives me the following error.

SEVERE: java.lang.NoClassDefFoundError:
org/apache/solr/util/plugin/ResourceLoaderAware
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
at
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:278)
at
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at
org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:781)
at
org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:56)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:413)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:431)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:440)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:92)
at org.apache.solr.core.SolrCore.init(SolrCore.java:412)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at
org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
at
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
at
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
at org.mortbay.jetty.Server.doStart(Server.java:210)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException:
org.apache.solr.util.plugin.ResourceLoaderAware
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
   

Re: KStem download

2009-09-14 Thread darniz

I was able to declare a field type when I use the Lucid distribution of Solr:
<fieldtype name="lucidkstemmer" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
            protected="protwords.txt"/>
  </analyzer>
</fieldtype>

But if I copy the two jars into the lib directory of the stock Apache Solr
distribution, it still gives me the following error:

SEVERE: java.lang.NoClassDefFoundError: org/apache/solr/util/plugin/ResourceLoaderAware
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
    at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:278)
    at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
    at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:781)
    at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:56)
    at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:413)
    at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:431)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:440)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:92)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:412)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.util.plugin.ResourceLoaderAware
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 

Re: KStem download

2009-09-14 Thread Jay Hill
The two jar files are all you should need, and the configuration is correct.
However, I noticed that you are on Solr 1.3. I haven't tested the Lucid
KStemmer on a non-Lucid-certified distribution of 1.3, but I have tested it on
recent versions of 1.4 and it works fine (just tested with the most recent
nightly build).

So there are two options, but I don't know if either will work for you:
1. Move up to Solr 1.4, copy over the jars and configure.
2. Get the free Lucid-certified distribution of 1.3, which already includes the
Lucid KStemmer (and other fixes that improve on the standard 1.3).
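Option 1 boils down to dropping the two jars into the Solr core's lib
directory. A minimal sketch, using temporary directories and placeholder jar
names (the actual Lucid KStemmer jar names and your Solr home path may
differ):

```shell
# Stand-ins for the real locations; in practice SOLR_HOME would be
# something like apache-solr-1.4/example/solr, and SRC the Lucid distribution.
SOLR_HOME=$(mktemp -d)
SRC=$(mktemp -d)
mkdir -p "$SOLR_HOME/lib"

# Placeholder names for the two Lucid KStemmer jars.
touch "$SRC/lucid-kstem.jar" "$SRC/lucid-solr-kstem.jar"

# Copy both jars into the core's lib dir so Solr's plugin loader can find them.
cp "$SRC"/*.jar "$SOLR_HOME/lib/"
ls "$SOLR_HOME/lib"
```

With the jars in place, the filter declared in the schema should resolve by
class name; if the NoClassDefFoundError persists on 1.3, the version
incompatibility described above is the likelier cause.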

-Jay
http://www.lucidimagination.com


On Mon, Sep 14, 2009 at 6:09 PM, darniz rnizamud...@edmunds.com wrote:


 I was able to declare a field type when I use the Lucid distribution of Solr:
 <fieldtype name="lucidkstemmer" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
             protected="protwords.txt"/>
   </analyzer>
 </fieldtype>

 But if I copy the two jars into the lib directory of the stock Apache Solr
 distribution, it still gives me the following error:

 SEVERE: java.lang.NoClassDefFoundError: org/apache/solr/util/plugin/ResourceLoaderAware
 [...]

Re: Is it possible to query for everything?

2009-09-14 Thread Bill Au
For the standard query handler, try [* TO *].
Bill
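The two match-all forms discussed in this thread can be sketched as request
URLs (assuming a local Solr instance at the default port; host and handler
names are illustrative):

```shell
BASE="http://localhost:8983/solr/select"

# Standard query parser: *:* matches every document. An open range on a
# specific field, e.g. somefield:[* TO *], instead matches every document
# that has a value in that field.
MATCH_ALL_STANDARD="$BASE?q=*:*"

# Dismax treats q as user text rather than query syntax, so omit q and
# supply q.alt instead; q.alt is parsed by the standard parser and is only
# used when q is missing.
MATCH_ALL_DISMAX="$BASE?qt=dismax&q.alt=*:*"

echo "$MATCH_ALL_STANDARD"
echo "$MATCH_ALL_DISMAX"
```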

On Mon, Sep 14, 2009 at 8:46 PM, Jay Hill jayallenh...@gmail.com wrote:

 With dismax you can use q.alt when the q param is missing:
 q.alt=*:*
 should work.

 -Jay


 On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco jvana...@2xlp.com
 wrote:

  Thanks Jay & Matt
 
  I tried *:* on my app, and it didn't work
 
  I tried it on the solr admin, and it did
 
  I checked the solr config file, and realized that it works on standard
  queries, but not on dismax
 
  So I have my app checking *:* on a standard qt, and then filtering what I
  need on other qts!
 
  I would never have figured this out without you two!