Re: Rules engine and Solr

2010-01-06 Thread Avlesh Singh
Thanks for the reply, Ravi.

I am currently working on some kind of rules in front
 (application side) of our Solr instance. These rules are application
 specific and not general, like deciding which fields to facet on, which
 fields to return in the response, which fields to highlight, and the boost
 value for each field (both at query time and at index time).
  The approach I have taken is to define a database table which
 holds these field parameters, which are then interpreted by my application
 to decide the query to be sent to Solr. This allows tweaking the Solr fields
 on the fly and hence influencing the search results.

I guess this is the usual usage of a Solr server. In my case it is no
different. Search queries have a personalized experience, which means
behaviors for facets, highlighting, etc. are customizable. We pull it off
using a database and Java data structures.
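A minimal sketch of this DB-driven approach — turning stored field rules into Solr request parameters. The `FieldRule` record and every name below are illustrative stand-ins for a row of such a rules table, not the actual application's schema:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RuleDrivenQueryBuilder {
    // Hypothetical stand-in for one row of the rules table.
    record FieldRule(String field, boolean facet, boolean highlight, double queryBoost) {}

    // Interpret the rule rows into Solr request parameters.
    static Map<String, String> buildParams(String userQuery, List<FieldRule> rules) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", userQuery);
        // Query-time boost per field: dismax-style "field^boost" clauses.
        p.put("qf", rules.stream()
                .map(r -> r.field() + "^" + r.queryBoost())
                .collect(Collectors.joining(" ")));
        p.put("facet", "true");
        // Only fields whose rule enables faceting become facet.field entries.
        p.put("facet.field", rules.stream().filter(FieldRule::facet)
                .map(FieldRule::field).collect(Collectors.joining(",")));
        p.put("hl.fl", rules.stream().filter(FieldRule::highlight)
                .map(FieldRule::field).collect(Collectors.joining(",")));
        return p;
    }

    public static void main(String[] args) {
        List<FieldRule> rules = List.of(
                new FieldRule("title", true, true, 2.0),
                new FieldRule("body", false, true, 1.0));
        System.out.println(buildParams("camera", rules));
    }
}
```

With this shape, changing a boost or enabling a facet means updating a table row rather than touching the schema or redeploying.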

I will be interested to hear from you about the kind of rules you talk
 about and your approach towards them. Are these rules like a regular
 expression that, when matched with the user query, executes a specific
 Solr query?

http://en.wikipedia.org/wiki/Business_rules_engine

Cheers
Avlesh

On Wed, Jan 6, 2010 at 12:12 PM, Ravi Gidwani ravi.gidw...@gmail.com wrote:

 Avlesh:
   I am currently working on some kind of rules in front
 (application side) of our Solr instance. These rules are application
 specific and not general, like deciding which fields to facet on, which
 fields to return in the response, which fields to highlight, and the boost
 value for each field (both at query time and at index time).
  The approach I have taken is to define a database table which
 holds these field parameters, which are then interpreted by my application
 to decide the query to be sent to Solr. This allows tweaking the Solr fields
 on the fly and hence influencing the search results.

 I will be interested to hear from you about the kind of rules you talk
 about and your approach towards them. Are these rules like a regular
 expression that, when matched with the user query, executes a specific
 Solr query?

 ~Ravi

 On Tue, Jan 5, 2010 at 8:25 PM, Avlesh Singh avl...@gmail.com wrote:

  
   Your question appears to be an XY Problem ... that is: you are dealing
   with X, you are assuming Y will help you, and you are asking about Y
   without giving more details about the X so that we can understand the
   full issue.  Perhaps the best solution doesn't involve Y at all? See Also:
   http://www.perlmonks.org/index.pl?node_id=542341
  
  Hahaha, that's classic Hoss!
  Thanks for introducing me to the XY problem. Had I known the two
  completely,
  I wouldn't have posted it on the mailing list. And I wasn't looking for a
  solution either. Anyways, as I replied back earlier, I'll get back with
  questions once I get more clarity.
 
  Cheers
  Avlesh
 
  On Wed, Jan 6, 2010 at 2:02 AM, Chris Hostetter 
 hossman_luc...@fucit.org
  wrote:
 
  
   : I am planning to build a rules engine on top of search. The rules are
   : database driven and can't be stored inside solr indexes. These rules would
   : ultimately do two things -
   :
   :1. Change the order of Lucene hits.
   :2. Add/remove some results to/from the Lucene hits.
   :
   : What should be my starting point? Custom search handler?
  
   This smells like an XY problem ... can you elaborate on the types of
   rules/conditions/situations when you want #1 and #2 listed above to
   happen?
  
   http://people.apache.org/~hossman/#xyproblem
   XY Problem
  
   Your question appears to be an XY Problem ... that is: you are dealing
   with X, you are assuming Y will help you, and you are asking about Y
   without giving more details about the X so that we can understand the
   full issue.  Perhaps the best solution doesn't involve Y at all?
   See Also: http://www.perlmonks.org/index.pl?node_id=542341
  
  
  
  
  
   -Hoss
  
  
 



readOnly=true IndexReader

2010-01-06 Thread Patrick Sauts
In the wiki page
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I found:
"Open the IndexReader with readOnly=true. This makes a big difference
when multiple threads are sharing the same reader, as it removes certain
sources of thread contention."


How can I open the IndexReader with readOnly=true?
I can't find anything related to this parameter.

Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any
effect on Solr with a standard solrConfig.xml?


Thank you for your answers.

Patrick.


Re: readOnly=true IndexReader

2010-01-06 Thread Shalin Shekhar Mangar
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts patrick.via...@gmail.com wrote:

 In the Wiki page :
 http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, I've found
 -Open the IndexReader with readOnly=true. This makes a big difference when
 multiple threads are sharing the same reader, as it removes certain sources
 of thread contention.

 How to open the IndexReader with readOnly=true ?
 I can't find anything related to this parameter.


Solr always opens the IndexReader with readOnly=true. This was added in
SOLR-730 and released in Solr 1.3.

-- 
Regards,
Shalin Shekhar Mangar.


Re: readOnly=true IndexReader

2010-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts patrick.via...@gmail.com wrote:
 In the Wiki page : http://wiki.apache.org/lucene-java/ImproveSearchingSpeed,
 I've found
 -Open the IndexReader with readOnly=true. This makes a big difference when
 multiple threads are sharing the same reader, as it removes certain sources
 of thread contention.

 How to open the IndexReader with readOnly=true ?
 I can't find anything related to this parameter.

 Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any
 effect on Solr with a standard solrConfig.xml?
These are not variables used by Solr. They are just substituted in
solrconfig.xml and probably consumed by ReplicationHandler (this is
not a standard).

 Thank you for your answers.

 Patrick.




-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


schema.xml and Xinclude

2010-01-06 Thread Patrick Sauts
As the <types/> section in schema.xml is the same across all our indexes,
I'd like to make it an XInclude, so I tried:


<?xml version="1.0" encoding="UTF-8"?>

<schema name="example" version="1.2"
xmlns:xi="http://www.w3.org/2001/XInclude">

  <xi:include href="solr-types.xml"/>
  <fields>
    ...
  </fields>
</schema>

My syntax might not be correct?
Or is it not possible yet?

Thank you again for your time.

Patrick.
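For what it's worth, the XInclude syntax above looks plausible; whether the include is actually resolved depends on the XML parser being configured as XInclude-aware. A self-contained check using only the JDK's built-in parser (the temp files stand in for the real schema; this is not Solr's own config-loading code):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import org.w3c.dom.Document;

public class XIncludeCheck {
    static int includedFieldTypes() throws Exception {
        Path dir = Files.createTempDirectory("xinc");
        // The shared types file we want to pull into every schema.
        Files.writeString(dir.resolve("solr-types.xml"),
                "<types><fieldType name=\"string\"/></types>");
        Path schema = dir.resolve("schema.xml");
        Files.writeString(schema,
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
              + "<schema name=\"example\" version=\"1.2\""
              + " xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
              + "<xi:include href=\"solr-types.xml\"/></schema>");
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);   // XInclude processing requires namespace awareness
        dbf.setXIncludeAware(true);    // without this the xi:include is left unresolved
        Document doc = dbf.newDocumentBuilder().parse(schema.toFile());
        // Count fieldType elements that arrived via the include.
        return doc.getElementsByTagName("fieldType").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(includedFieldTypes()); // 1 when the include was resolved
    }
}
```

If `setXIncludeAware(true)` is omitted, the `xi:include` element is simply left in the tree unresolved — the kind of silent failure worth ruling out first.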


Yankee's Solr integration

2010-01-06 Thread Nicolas Kern
Hello everybody,

I was wondering how Yankee (
http://www.yankeegroup.com/search.do?searchType=advancedSearch) managed to
provide the ability to Create Alerts, Save Searches, and generate an RSS
Feed from a custom search using Solr. Do you have any idea?

Thanks a lot,
Best regards and happy new year!
Nicolas


Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Erick Erickson
Hmmm, the name WordDelimiterFilterFactory might be leading
you astray. Its purpose isn't to break things up into words
that have anything to do with grammatical rules. Rather, its
purpose is to break up strings of funky characters into
searchable stuff. See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

In the grammatical sense, PowerShot should just be
PowerShot, not "power shot" (which is what WordDelimiterFilterFactory
gives you, options permitting). So I think you probably want
one of the other analyzers.

Have you tried any other analyzers? StandardAnalyzer might be
more friendly.

HTH
Erick

On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land caleb.l...@gmail.com wrote:

 I've tracked this problem down to the fact that I'm using the
 WordDelimiterFilter. I don't quite understand what's happening, but if I
 add preserveOriginal=1 as an option, everything looks fine. I think it
 has
 to do with the period being stripped in the token stream.

 On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land caleb.l...@gmail.com wrote:

  Hello,
  I'm using Solr 1.4, and I'm trying to get the regex fragmenter to parse
  basic sentences, and I'm running into a problem.
 
  I'm using the default regex specified in the example solr configuration:
 
  [-\w ,/\n\']{20,200}
 
  But I am using a larger fragment size (140) with a slop of 1.0.
 
  Given the passage:
 
  Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a neque a
  ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut congue
  vitae, molestie quis nunc.
 
  When I search for Nulla (the first word of the second sentence) and
 grab
  the first highlighted snippet, this is what I get:
 
   . <em>Nulla</em> a neque a ipsum accumsan iaculis at id lacus
 
  As you can see, there's a leading period from the previous sentence and
 the
  period from the current sentence is missing.
 
  I understand this regex isn't that advanced, but I've tried everything I
  can think of, regex-wise, to get this to work, and I always end up with
 this
  problem.
 
  For example, I've tried: \w[^.!?]{0,200}[.!?]
 
  Which seems like it should include the ending punctuation, but it
 doesn't,
  so I think I'm missing something.
 
  Does anybody know a regex that works?
  --
  Caleb Land
 



 --
 Caleb Land
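At the pure-regex level, a pattern that starts on a non-space, non-punctuation character and requires a closing `.`, `!` or `?` does capture whole sentences including the final punctuation. A hedged sketch with java.util.regex — note it enforces no minimum fragment length, and the token-stream behavior discussed above (the WordDelimiterFilter stripping periods) can still shift the highlighter's boundaries regardless of the regex:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceFragments {
    // Start on a character that is neither whitespace nor sentence punctuation,
    // take up to 200 non-punctuation characters, then require the closing . ! or ?
    static final Pattern SENTENCE = Pattern.compile("[^.!?\\s][^.!?]{0,200}[.!?]");

    static List<String> fragments(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        String passage = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
                + "Nulla a neque a ipsum accumsan iaculis at id lacus. "
                + "Sed magna velit, aliquam ut congue vitae, molestie quis nunc.";
        // Each match is one sentence, leading space skipped, trailing period kept.
        for (String f : fragments(passage)) System.out.println(f);
    }
}
```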



Re: performance question

2010-01-06 Thread A. Steven Anderson
 Strictly speaking there are some insignificant distinctions in performance
 related to how a field name is resolved -- Grant alluded to this
 earlier in this thread -- but it only comes into play when you actually
 refer to that field by name and Solr has to look them up in the
 metadata.  So for example if your request referred to 100 different field
 names in the q, fq, and facet.field params there would be a small overhead
 for any of those 100 fields that existed because of <dynamicField/>
 declarations, that would not exist for any of those fields that were
 declared using <field/> -- but there would be no added overhead to that
 query if there were 999 other fields that existed in your index
 because of that same <dynamicField/> declaration.

 But frankly: we're talking about seriously ridiculous
 pico-optimizing at this point ... if you find yourself with performance
 concerns there are probably 500 other things worth worrying about before
 this should ever cross your mind.


Thanks for the follow up.

I've converted our schema to required fields only with every other field
being a dynamic field.

The only negative that I've found so far is that you lose the copyField
capability, so it makes my ingest a little bigger, since I have to manually
copy the values myself.

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com


Re: performance question

2010-01-06 Thread Erik Hatcher
You don't lose copyField capability with dynamic fields.  You can copy
dynamic fields into a fixed field name like *_s => text, or dynamic
fields into another dynamic field like *_s => *_t.


Erik

On Jan 6, 2010, at 9:35 AM, A. Steven Anderson wrote:

Strictly speaking there are some insignificant distinctions in performance
related to how a field name is resolved -- Grant alluded to this
earlier in this thread -- but it only comes into play when you actually
refer to that field by name and Solr has to look them up in the
metadata.  So for example if your request referred to 100 different field
names in the q, fq, and facet.field params there would be a small overhead
for any of those 100 fields that existed because of <dynamicField/>
declarations, that would not exist for any of those fields that were
declared using <field/> -- but there would be no added overhead to that
query if there were 999 other fields that existed in your index
because of that same <dynamicField/> declaration.

But frankly: we're talking about seriously ridiculous
pico-optimizing at this point ... if you find yourself with performance
concerns there are probably 500 other things worth worrying about before
this should ever cross your mind.


Thanks for the follow up.

I've converted our schema to required fields only with every other field
being a dynamic field.

The only negative that I've found so far is that you lose the copyField
capability, so it makes my ingest a little bigger, since I have to manually
copy the values myself.

--
A. Steven Anderson
Independent Consultant
st...@asanderson.com




ord on TrieDateField always returning max

2010-01-06 Thread Nagelberg, Kallin
Hi everyone,

I've been trying to add a date based boost to my queries. I have a field like:

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
precisionStep="6" positionIncrementGap="0"/>
<field name="datetime" type="tdate" indexed="true" stored="true"
required="true"/>

When I look at the datetime field in the solr schema browser I can see that 
there are 9051 distinct dates.

When I try to add the parameter to my query like: bf=ord(datetime) (on a dismax 
query) I always get 9051 as the result of the function. I see this in the debug 
data:


1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:

9051.0 = 9051

1.0 = boost

0.18767032 = queryNorm



It is exactly the same for every result, even though each result has a 
different value for datetime.



Does anyone have any suggestions as to why this could be happening? I have done 
extensive googling with no luck.



Thanks,

Kallin Nagelberg.



replication -- missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
I set up replication between 2 cores on one master and 2 cores on one slave. 
Before doing this the master was working without issues, and I stopped all 
indexing on the master.

Now that replication has synced the index files, an .fdt file is suddenly 
missing on both the master and the slave. Pretty much every operation (core 
reload, commit, add document) fails with an error like the one posted below.

How could this happen? How can one recover from such an error? Is there any way 
to regenerate the FDT file without re-indexing everything?

This brings me to a question about backups. If I run the 
replication?command=backup command, where is this backup stored? I've tried 
this a few times and get an OK response from the machine, but I don't see the 
backup generated anywhere.

Thanks,
Gio.

org.apache.solr.common.SolrException: Error handling 'reload' action
   at 
org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
   at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
   at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
   at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
   at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
   at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
   at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
specified)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
   at 
org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
   ... 18 more
Caused by: java.io.FileNotFoundException: 
Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
specified)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.<init>(Unknown Source)
   at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
   at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
   at 
org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
   at 
org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:104)
   at 
org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
   at 
org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:103)
   at 
org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
   at 
org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
   at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:68)
   at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
   at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
   at 
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
   at org.apache.solr.core.SolrCore.getSearcher(So







Re: ord on TrieDateField always returning max

2010-01-06 Thread Yonik Seeley
Besides using up a lot more memory, ord() isn't even going to work for
a field with multiple tokens indexed per value (like tdate).
I'd recommend using a function on the date value itself.
http://wiki.apache.org/solr/FunctionQuery#ms

-Yonik
http://www.lucidimagination.com
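For intuition about why ord() degenerates here: ord(field) is essentially the position of a document's indexed term in the sorted list of unique terms, and a Trie field indexes several precision tokens per value, so the single per-document ordinal ends up not corresponding to the date itself. A toy illustration of the ordinal idea only (plain Java, nothing Trie-specific):

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

public class OrdDemo {
    // ord(value) = 1-based position of the value in the sorted set of unique indexed terms.
    static int ord(String value, Collection<String> indexedTerms) {
        TreeSet<String> sorted = new TreeSet<>(indexedTerms);
        return sorted.headSet(value, true).size();
    }

    public static void main(String[] args) {
        List<String> terms = List.of("2009-12-01", "2010-01-05", "2010-01-06");
        System.out.println(ord("2009-12-01", terms)); // 1: earliest date, lowest ordinal
        System.out.println(ord("2010-01-06", terms)); // 3: latest date, the max ordinal
        // A Trie field contributes several precision terms per document, so the
        // one ordinal kept per document is not the date's own position.
    }
}
```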



On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:
 Hi everyone,

 I've been trying to add a date based boost to my queries. I have a field like:

 <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
 precisionStep="6" positionIncrementGap="0"/>
 <field name="datetime" type="tdate" indexed="true" stored="true"
 required="true"/>

 When I look at the datetime field in the solr schema browser I can see that 
 there are 9051 distinct dates.

 When I try to add the parameter to my query like: bf=ord(datetime) (on a 
 dismax query) I always get 9051 as the result of the function. I see this in 
 the debug data:


 1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:

    9051.0 = 9051

    1.0 = boost

    0.18767032 = queryNorm



 It is exactly the same for every result, even though each result has a 
 different value for datetime.



 Does anyone have any suggestions as to why this could be happening? I have 
 done extensive googling with no luck.



 Thanks,

 Kallin Nagelberg.




RE: ord on TrieDateField always returning max

2010-01-06 Thread Nagelberg, Kallin
Thanks Yonik, I was just looking at that actually.
Trying something like recip(ms(NOW,datetime),3.16e-11,1,1)^10  now.
My 'inspiration' for the ord method was actually the Solr 1.4 Enterprise Search 
server book. Page 126 has a section 'using reciprocals and rord with dates'. 
You should let those guys know what's up!

Thanks,
Kallin.
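For intuition, recip(x,m,a,b) in Solr's FunctionQuery is a/(m*x + b), so with m = 3.16e-11 and x = ms(NOW,datetime) the multiplier is ~1 for brand-new documents and roughly 0.5 at one year old. A tiny sanity check of the arithmetic (the constants mirror the query above; nothing Solr-specific runs here):

```java
public class RecipBoost {
    // recip(x, m, a, b) = a / (m*x + b), as defined by Solr's FunctionQuery.
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        double msPerYear = 365.25 * 24 * 60 * 60 * 1000; // ~3.156e10 ms
        double m = 3.16e-11;                             // constant from the query above
        System.out.println(recip(0, m, 1, 1));             // brand-new doc -> 1.0
        System.out.println(recip(msPerYear, m, 1, 1));     // one year old -> ~0.5
        System.out.println(recip(10 * msPerYear, m, 1, 1)); // ten years old -> ~0.09
    }
}
```

Choosing m ≈ 1/(ms in a year) is what puts the "half-life" of the boost at one year; a larger m decays the boost faster.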

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, January 06, 2010 11:23 AM
To: solr-user@lucene.apache.org
Subject: Re: ord on TrieDateField always returning max

Besides using up a lot more memory, ord() isn't even going to work for
a field with multiple tokens indexed per value (like tdate).
I'd recommend using a function on the date value itself.
http://wiki.apache.org/solr/FunctionQuery#ms

-Yonik
http://www.lucidimagination.com



On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:
 Hi everyone,

 I've been trying to add a date based boost to my queries. I have a field like:

 <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
 precisionStep="6" positionIncrementGap="0"/>
 <field name="datetime" type="tdate" indexed="true" stored="true"
 required="true"/>

 When I look at the datetime field in the solr schema browser I can see that 
 there are 9051 distinct dates.

 When I try to add the parameter to my query like: bf=ord(datetime) (on a 
 dismax query) I always get 9051 as the result of the function. I see this in 
 the debug data:


 1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:

    9051.0 = 9051

    1.0 = boost

    0.18767032 = queryNorm



 It is exactly the same for every result, even though each result has a 
 different value for datetime.



 Does anyone have any suggestions as to why this could be happening? I have 
 done extensive googling with no luck.



 Thanks,

 Kallin Nagelberg.




Re: ord on TrieDateField always returning max

2010-01-06 Thread Yonik Seeley
On Wed, Jan 6, 2010 at 11:26 AM, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:
 Thanks Yonik, I was just looking at that actually.
 Trying something like recip(ms(NOW,datetime),3.16e-11,1,1)^10  now.

I'd also recommend looking into a multiplicative boost too - IMO they
normally make more sense.
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

-Yonik
http://www.lucidimagination.com




 My 'inspiration' for the ord method was actually the Solr 1.4 Enterprise 
 Search server book. Page 126 has a section 'using reciprocals and rord with 
 dates'. You should let those guys know what's up!

 Thanks,
 Kallin.

 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: Wednesday, January 06, 2010 11:23 AM
 To: solr-user@lucene.apache.org
 Subject: Re: ord on TrieDateField always returning max

 Besides using up a lot more memory, ord() isn't even going to work for
 a field with multiple tokens indexed per value (like tdate).
 I'd recommend using a function on the date value itself.
 http://wiki.apache.org/solr/FunctionQuery#ms

 -Yonik
 http://www.lucidimagination.com



 On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg, Kallin
 knagelb...@globeandmail.com wrote:
 Hi everyone,

 I've been trying to add a date based boost to my queries. I have a field 
 like:

 <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
 precisionStep="6" positionIncrementGap="0"/>
 <field name="datetime" type="tdate" indexed="true" stored="true"
 required="true"/>

 When I look at the datetime field in the solr schema browser I can see that 
 there are 9051 distinct dates.

 When I try to add the parameter to my query like: bf=ord(datetime) (on a 
 dismax query) I always get 9051 as the result of the function. I see this in 
 the debug data:


 1698.6041 = (MATCH) FunctionQuery(top(ord(datetime))), product of:

    9051.0 = 9051

    1.0 = boost

    0.18767032 = queryNorm



 It is exactly the same for every result, even though each result has a 
 different value for datetime.



 Does anyone have any suggestions as to why this could be happening? I have 
 done extensive googling with no luck.



 Thanks,

 Kallin Nagelberg.





Re: performance question

2010-01-06 Thread A. Steven Anderson
 You don't lose copyField capability with dynamic fields.  You can copy
 dynamic fields into a fixed field name like *_s => text, or dynamic fields
 into another dynamic field like *_s => *_t.


Ahhh...I missed that little detail.  Nice!

Ok, so there are no negatives to using dynamic fields then. ;-)

Thanks for all the info!

-- 
A. Steven Anderson
Independent Consultant
st...@asanderson.com


Re: replication -- missing field data file

2010-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
 I set up replication between 2 cores on one master and 2 cores on one slave. 
 Before doing this the master was working without issues, and I stopped all 
 indexing on the master.

 Now that replication has synced the index files, an .fdt file is suddenly 
 missing on both the master and the slave. Pretty much every operation (core 
 reload, commit, add document) fails with an error like the one posted below.

 How could this happen? How can one recover from such an error? Is there any 
 way to regenerate the FDT file without re-indexing everything?

 This brings me to a question about backups. If I run the 
 replication?command=backup command, where is this backup stored? I've tried 
 this a few times and get an OK response from the machine, but I don't see the 
 backup generated anywhere.
The backup is done asynchronously, so it always gives an OK response immediately.
The backup is created in the data dir itself.

 Thanks,
 Gio.

 org.apache.solr.common.SolrException: Error handling 'reload' action
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
       at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
       at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
       at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
       at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
       at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
       at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
       at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
       at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
       at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
       at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
       at 
 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
       at 
 org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
       at 
 org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
       at 
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
       at java.lang.Thread.run(Unknown Source)
 Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
 Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
 specified)
       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
       at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
       ... 18 more
 Caused by: java.io.FileNotFoundException: 
 Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
 specified)
       at java.io.RandomAccessFile.open(Native Method)
       at java.io.RandomAccessFile.<init>(Unknown Source)
       at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
       at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
       at 
 org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
       at 
 org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:104)
       at 
 org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
       at 
 org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:103)
       at 
 org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
       at 
 org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
       at 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
       at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:68)
       at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
       at 

solr and patch - SOLR-64 SOLR-792

2010-01-06 Thread Thibaut Lassalle
hi,

I tried to apply patches to solr-1.4

Here is the result

javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 < SOLR-64.patch
patching file src/java/org/apache/solr/schema/HierarchicalFacetField.java
patching file src/common/org/apache/solr/common/params/FacetParams.java
Hunk #1 FAILED at 108.
1 out of 1 hunk FAILED -- saving rejects to file
src/common/org/apache/solr/common/params/FacetParams.java.rej
patching file example/solr/conf/schema.xml
Hunk #1 FAILED at 144.
Hunk #2 FAILED at 417.
2 out of 2 hunks FAILED -- saving rejects to file
example/solr/conf/schema.xml.rej
patching file src/java/org/apache/solr/request/SimpleFacets.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 227.
Hunk #3 FAILED at 238.
Hunk #4 FAILED at 484.
Hunk #5 FAILED at 541.
5 out of 5 hunks FAILED -- saving rejects to file
src/java/org/apache/solr/request/SimpleFacets.java.rej
javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 < SOLR-792.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|diff --git a/example/solr/conf/solrconfig.xml
b/example/solr/conf/solrconfig.xml
|index e2c0a48..1cd18bc 100755
|--- a/example/solr/conf/solrconfig.xml
|+++ b/example/solr/conf/solrconfig.xml
--
File to patch:


It failed on Windows XP too.
What am I doing wrong?

Thanks
t.


Re: solr and patch - SOLR-64 SOLR-792

2010-01-06 Thread Erik Hatcher
You probably aren't doing anything wrong, other than those patches are  
a bit out of date with trunk.  You might have to fight through getting  
them current a bit, or wait until I or someone else can get to  
updating them.


Erik

On Jan 6, 2010, at 11:52 AM, Thibaut Lassalle wrote:


hi,

I tried to apply patches to solr-1.4

Here is the result

javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 < SOLR-64.patch
patching file src/java/org/apache/solr/schema/HierarchicalFacetField.java
patching file src/common/org/apache/solr/common/params/FacetParams.java

Hunk #1 FAILED at 108.
1 out of 1 hunk FAILED -- saving rejects to file
src/common/org/apache/solr/common/params/FacetParams.java.rej
patching file example/solr/conf/schema.xml
Hunk #1 FAILED at 144.
Hunk #2 FAILED at 417.
2 out of 2 hunks FAILED -- saving rejects to file
example/solr/conf/schema.xml.rej
patching file src/java/org/apache/solr/request/SimpleFacets.java
Hunk #1 FAILED at 33.
Hunk #2 FAILED at 227.
Hunk #3 FAILED at 238.
Hunk #4 FAILED at 484.
Hunk #5 FAILED at 541.
5 out of 5 hunks FAILED -- saving rejects to file
src/java/org/apache/solr/request/SimpleFacets.java.rej
javad...@javadev5:~/Java/apache-solr-1.4.0$ patch -p0 < SOLR-792.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|diff --git a/example/solr/conf/solrconfig.xml
b/example/solr/conf/solrconfig.xml
|index e2c0a48..1cd18bc 100755
|--- a/example/solr/conf/solrconfig.xml
|+++ b/example/solr/conf/solrconfig.xml
--
File to patch:


It failed on windows xp too.
What's wrong with what I'm doing?

Thanks
t.




RE: replication -- missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
How can you differentiate between the backup and the normal index files?

-Original Message-
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
Paul നോബിള്‍ नोब्ळ्
Sent: Wednesday, January 06, 2010 11:52 AM
To: solr-user
Subject: Re: replication -- missing field data file

On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
 I set up replication between 2 cores on one master and 2 cores on one slave. 
 Before doing this the master was working without issues, and I stopped all 
 indexing on the master.

 Now that replication has synced the index files, an .fdt file is suddenly 
 missing on both the master and the slave. Pretty much every operation (core 
 reload, commit, add document) fails with an error like the one posted below.

 How could this happen? How can one recover from such an error? Is there any 
 way to regenerate the FDT file without re-indexing everything?

 This brings me to a question about backups. If I run the 
 replication?command=backup command, where is this backup stored? I've tried 
 this a few times and get an OK response from the machine, but I don't see the 
 backup generated anywhere.
The backup is done asynchronously. So it always gives an OK response immediately.
The backup is created in the data dir itself.

 Thanks,
 Gio.

 org.apache.solr.common.SolrException: Error handling 'reload' action
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
       at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
       at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
       at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
       at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
       at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
       at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
       at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
       at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
       at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
       at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
       at 
 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
       at 
 org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
       at 
 org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
       at 
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
       at java.lang.Thread.run(Unknown Source)
 Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
 Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
 specified)
       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
       at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
       ... 18 more
 Caused by: java.io.FileNotFoundException: 
 Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
 specified)
       at java.io.RandomAccessFile.open(Native Method)
       at java.io.RandomAccessFile.<init>(Unknown Source)
       at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
       at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
       at 
 org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
       at 
 org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:104)
       at 
 org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
       at 
 org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:103)
       at 
 org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
       at 
 

Re: replication -- missing field data file

2010-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
the index dir is in the name "index"; the others will be stored as
"index.<date-as-number>"
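Assuming the naming described above (the live index in a bare "index" directory, backups carrying a date suffix), the two are easy to tell apart with a directory listing. A local sketch with invented directory names:

```shell
# Simulated data dir: one live index plus two date-suffixed backups.
mkdir -p data/index data/index.20100105090000 data/index.20100106120000
# The live index is the bare "index" dir; anything with a date suffix is
# a backup, and the lexically greatest name is the newest backup.
ls -d data/index.* | sort | tail -n 1
```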

On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
 How can you differentiate between the backup and the normal index files?

 -Original Message-
 From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
 Paul നോബിള്‍ नोब्ळ्
 Sent: Wednesday, January 06, 2010 11:52 AM
 To: solr-user
 Subject: Re: replication -- missing field data file

 On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade
 gfernandez-kinc...@capitaliq.com wrote:
 I set up replication between 2 cores on one master and 2 cores on one slave. 
 Before doing this the master was working without issues, and I stopped all 
 indexing on the master.

 Now that replication has synced the index files, an .fdt file is suddenly 
 missing on both the master and the slave. Pretty much every operation (core 
 reload, commit, add document) fails with an error like the one posted below.

 How could this happen? How can one recover from such an error? Is there any 
 way to regenerate the FDT file without re-indexing everything?

 This brings me to a question about backups. If I run the 
 replication?command=backup command, where is this backup stored? I've tried 
 this a few times and get an OK response from the machine, but I don't see 
 the backup generated anywhere.
 The backup is done asynchronously. So it always gives an OK response
 immediately.
 The backup is created in the data dir itself.

 Thanks,
 Gio.

 org.apache.solr.common.SolrException: Error handling 'reload' action
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
       at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
       at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
       at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
       at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
       at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
       at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
       at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
       at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
       at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
       at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
       at 
 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
       at 
 org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
       at 
 org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
       at 
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
       at java.lang.Thread.run(Unknown Source)
 Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
 Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
 specified)
       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
       at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
       ... 18 more
 Caused by: java.io.FileNotFoundException: 
 Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
 specified)
       at java.io.RandomAccessFile.open(Native Method)
       at java.io.RandomAccessFile.<init>(Unknown Source)
       at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
       at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
       at 
 org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
       at 
 org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:104)
       at 
 org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
       at 
 

Re: Solr Cell - PDFs plus literal metadata - GET or POST ?

2010-01-06 Thread Ross
On Tue, Jan 5, 2010 at 2:25 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
 Really? Doesn't it have to be delimited differently, if both the file 
 contents and the document metadata will be part of the POST data? How does 
 Solr Cell tell the difference between the literals and the start of the file? 
 I've tried this before and haven't had any luck with it.

Thanks Shalin.

And Giovanni, yes it definitely works.

This will set literal.mydata to the contents of mydata.txt

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" \
  -F myfi...@tutorial.html -F "literal.mydata=<mydata.txt"

Unfortunately I could not get the UTF-8 encoding to work properly.
It's probably a curl or o/s configuration issue. I tried mydata.txt
with and without a BOM; a "more mydata.txt" command displays the
special characters correctly on my terminal (set to UTF-8), but
they get screwed up when indexed.

I gave up in the end and went back to putting it urlencoded in the url.
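The fallback Ross describes — percent-encoding the UTF-8 value and putting it in the URL — can be sketched like this (host, parameter names, and sample text are placeholders; python3 is used here only to do the percent-encoding):

```shell
# Percent-encode a UTF-8 value so it survives transport in a URL.
value='café & crème'
encoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$value")
echo "$encoded"
# The request would then look like (placeholder host and params):
#   curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.mydata=$encoded" \
#     -F myfile=@tutorial.html
```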

Ross


 -Original Message-
 From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
 Sent: Monday, January 04, 2010 4:28 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Cell - PDFs plus literal metadata - GET or POST ?

 On Wed, Dec 30, 2009 at 7:49 AM, Ross tetr...@gmail.com wrote:

 Hi all

 I'm experimenting with Solr. I've successfully indexed some PDFs and
 all looks good but now I want to index some PDFs with metadata pulled
 from another source. I see this example in the docs.

 curl "http://localhost:8983/solr/update/extract?literal.id=doc4&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_t&boost.foo_t=3&literal.blah_s=Bah" \
   -F tutori...@tutorial.pdf

 I can write code to generate a script with those commands substituting
 my own literal.whatever.  My metadata could be up to a couple of KB in
 size. Is there a way of making the literal a POST variable rather than
 a GET?


 With Curl? Yes, see the man page.


  Will Solr Cell accept it as a POST?


 Yes, it will.
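A hedged sketch of what such a POST might look like, with the literals sent as multipart form fields instead of query-string parameters (the field names and file are illustrative; no server is contacted here, the command line is only built and printed):

```shell
# Build the command as a string so it can be inspected before running it
# against a live Solr instance.
url='http://localhost:8983/solr/update/extract'
cmd="curl $url -F literal.id=doc4 -F literal.blah_s=Bah -F commit=true -F file=@tutorial.pdf"
echo "$cmd"
```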

 --
 Regards,
 Shalin Shekhar Mangar.



Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread danben

The problem:

Not all of the documents that I expect to be indexed are showing up in the
index.

The background:

I start off with an empty index based on a schema with a single field named
'query', marked as unique and using the following analyzer:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

My input is a utf-8 encoded file with one sentence per line.  Its total size
is about 60MB.  I would like each line of the file to correspond to a single
document in the solr index.  If I print the number of unique lines in the
file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
the total number of lines in the file gives me around 2.7M.

I use the following to start indexing:

curl
'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

When this command completes, I see numDocs is approximately 470k (which is
what I find strange) and maxDocs is approximately 890k (which is fine since
I know I have around 700k duplicates).  Even more confusing is that if I run
this exact command a second time without performing any other operations,
numDocs goes up to around 610k, and a third time brings it up to about 750k.

Can anyone tell me what might cause Solr not to index everything in my input
file the first time, and why it would be able to index new documents the
second and third times?

I also have this line in solrconfig.xml, if it matters:

<requestParsers enableRemoteStreaming="true"
                multipartUploadLimitInKB="2048" />

Thanks,
Dan

-- 
View this message in context: 
http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.4 - stats page slow

2010-01-06 Thread Stephen Weiss
Sorry, I know I'm a little late in replying, but the LukeRequestHandler
tip was just what I needed!  Thank you so much.


--
Steve

On Dec 25, 2009, at 2:03 AM, Chris Hostetter wrote:



: I've noticed this as well, usually when working with a large field cache. I
: haven't done in-depth analysis of this yet, but it seems like when the stats
: page is trying to pull data from a large field cache it takes quite a long
: time.

In Solr 1.4, the stats page was modified to start reporting stats on the
FieldCache (using the new FieldCache introspection API added by Lucene
Java 2.9), so that may be what you are seeing.

:  more than 10 seconds.  We call this programmatically to retrieve the last
:  commit date so that we can keep users from committing too frequently.  This
:  means some of our administration pages are now taking a long time to load.


i'm not really following this ... what piece of data from the stats.jsp
are you using to compute/infer a commit date?

if you are looking at registration date of the SolrIndexSearcher you can
also get that from the LukeRequestHandler which is much more efficient
(it has options for limiting the work it does)...


http://localhost:8983/solr/admin/luke?numTerms=0&fl=BOGUS




-Hoss





How to ignore term frequency > 1? Field-specific Similarity class?

2010-01-06 Thread Andreas Schwarz
Hi,

I want to modify scoring to ignore term frequency > 1. This is useful for short 
fields like titles or subjects, where the number of times a term appears does 
not correspond to relevancy. I found several discussions of this problem, and 
also an implementation that changes the Similarity class to achieve this 
(http://osdir.com/ml/solr-user.lucene.apache.org/2009-09/msg00672.html). 
However, this change is global, but I only need the behavior for some fields. 
What's the best way to do this? Is there a way to use a field-specific 
similarity class, or to evaluate field names/parameters inside a Similarity 
class?

Thanks!
Andreas

Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Caleb Land
I've looked at the docs/source for WordDelimiterFilter, and I understand
what it does now.

Here is my configuration:

http://gist.github.com/270590

I've tried the StandardTokenizerFactory instead of the
WhitespaceTokenizerFactory, but I get the same problem as before, a the
period from the previous sentence shows up and the period from the current
sentence is cut off of highlighter fragments.

I've tried the WhitespaceTokenizer with the StandardFilter, and this kinda
works, but to match a word at the end of a sentence, you need to search for
the period at the end of the sentence (the period is being tokenized along
with the word).

In any case, if I use the WordDelimiterFilter or add preserveOriginal=1,
everything seems to work. (If I remove the WordDelimiterFilter, the periods
are indexed with the word they're connected to, and searching for those
words doesn't match unless the user includes the period)

I'm trying to go through the code to understand how this works.

On Wed, Jan 6, 2010 at 9:13 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, the name WordDelimiterFilterFactory might be leading
 you astray. Its purpose isn't to break things up into words
 that have anything to do with grammatical rules. Rather, its
 purpose is to break up strings of funky characters into
 searchable stuff. see:

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

 In the grammatical sense, "PowerShot" should just be
 "PowerShot", not "power shot" (which is what WordDelimiterFilterFactory
 gives you, options permitting). So I think you probably want
 one of the other analyzers

 Have you tried any other analyzers? StandardAnalyzer might be
 more friendly

 HTH
 Erick

 On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land caleb.l...@gmail.com wrote:

  I've tracked this problem down to the fact that I'm using the
  WordDelimiterFilter. I don't quite understand what's happening, but if I
  add preserveOriginal=1 as an option, everything looks fine. I think it
  has
  to do with the period being stripped in the token stream.
 
  On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land caleb.l...@gmail.com wrote:
 
   Hello,
   I'm using Solr 1.4, and I'm trying to get the regex fragmenter to parse
   basic sentences, and I'm running into a problem.
  
   I'm using the default regex specified in the example solr
 configuration:
  
   [-\w ,/\n\']{20,200}
  
   But I am using a larger fragment size (140) with a slop of 1.0.
  
   Given the passage:
  
   Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a neque
 a
   ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut congue
   vitae, molestie quis nunc.
  
   When I search for Nulla (the first word of the second sentence) and
  grab
   the first highlighted snippet, this is what I get:
  
    . <em>Nulla</em> a neque a ipsum accumsan iaculis at id lacus
  
   As you can see, there's a leading period from the previous sentence and
  the
   period from the current sentence is missing.
  
   I understand this regex isn't that advanced, but I've tried everything
 I
   can think of, regex-wise, to get this to work, and I always end up with
  this
   problem.
  
   For example, I've tried: \w[^.!?]{0,200}[.!?]
  
   Which seems like it should include the ending punctuation, but it
  doesn't,
   so I think I'm missing something.
  
   Does anybody know a regex that works?
   --
   Caleb Land
  
 
 
 
  --
  Caleb Land
 




-- 
Caleb Land


No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread MitchK

I have tested a lot and all the time I thought I set wrong options for my
custom analyzer.
Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.
It seems like it only stores the original input.

I am using the example-configuration of the current Solr 1.4 release.
What's wrong?

Thank you!
-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: replication -- missing field data file

2010-01-06 Thread Giovanni Fernandez-Kincade
How can you tell when the backup is done? 

-Original Message-
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
Paul നോബിള്‍ नोब्ळ्
Sent: Wednesday, January 06, 2010 12:23 PM
To: solr-user
Subject: Re: replication -- missing field data file

the index dir is in the name "index"; the others will be stored as
"index.<date-as-number>"

On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
 How can you differentiate between the backup and the normal index files?

 -Original Message-
 From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
 Paul നോബിള്‍ नोब्ळ्
 Sent: Wednesday, January 06, 2010 11:52 AM
 To: solr-user
 Subject: Re: replication -- missing field data file

 On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade
 gfernandez-kinc...@capitaliq.com wrote:
 I set up replication between 2 cores on one master and 2 cores on one slave. 
 Before doing this the master was working without issues, and I stopped all 
 indexing on the master.

 Now that replication has synced the index files, an .fdt file is suddenly 
 missing on both the master and the slave. Pretty much every operation (core 
 reload, commit, add document) fails with an error like the one posted below.

 How could this happen? How can one recover from such an error? Is there any 
 way to regenerate the FDT file without re-indexing everything?

 This brings me to a question about backups. If I run the 
 replication?command=backup command, where is this backup stored? I've tried 
 this a few times and get an OK response from the machine, but I don't see 
 the backup generated anywhere.
 The backup is done asynchronously. So it always gives an OK response
 immediately.
 The backup is created in the data dir itself.

 Thanks,
 Gio.

 org.apache.solr.common.SolrException: Error handling 'reload' action
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412)
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142)
       at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
       at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
       at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
       at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
       at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
       at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
       at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
       at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
       at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
       at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
       at 
 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
       at 
 org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
       at 
 org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
       at 
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
       at java.lang.Thread.run(Unknown Source)
 Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
 Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
 specified)
       at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
       at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486)
       at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409)
       ... 18 more
 Caused by: java.io.FileNotFoundException: 
 Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file 
 specified)
       at java.io.RandomAccessFile.open(Native Method)
       at java.io.RandomAccessFile.<init>(Unknown Source)
       at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
       at 
 org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
       at 
 org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
       at 
 org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:104)
       at 
 

Re: Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread Erick Erickson
I think the root of your problem is that unique fields should NOT
be multivalued. See
http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)

In this case, since you're tokenizing, your query field is
implicitly multi-valued; I don't know what the behavior will be.

But there's another problem:
All the filters in your analyzer definition will mess up the
correspondence between the Unix uniq and numDocs even
if you got by the above. I.e.:

StopFilter would make the lines "a problem" and "the problem" identical.
WordDelimiter would do all kinds of interesting things
LowerCaseFilter would make "Myproblem" and "myproblem" identical.
RemoveDuplicatesFilter would make "interesting interesting" and
"interesting" identical

You could define a second field, make *that* one unique and NOT analyze
it in any way...

You could hash your sentences and define the hash as your unique key.

You could
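The hashing option above can be sketched in a few lines of shell: prepend an MD5 of each raw line, producing a tab-separated (hash, sentence) file whose first column can serve as the unique key (file names are invented for the demo):

```shell
# Three input sentences, one of them a duplicate.
printf 'first sentence\nsecond sentence\nfirst sentence\n' > sentences.txt
# Emit "md5<TAB>sentence" per line; the hash is computed on the raw line,
# before any analysis, so duplicates collapse on the key.
while IFS= read -r line; do
  printf '%s\t%s\n' "$(printf '%s' "$line" | md5sum | cut -d' ' -f1)" "$line"
done < sentences.txt > sentences.tsv
# 3 lines in, but only 2 distinct keys.
cut -f1 sentences.tsv | sort -u | wc -l
```

The resulting two-column TSV could then be fed to the CSV handler with the hash column declared as the unique key.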

HTH
Erick

On Wed, Jan 6, 2010 at 1:06 PM, danben dan...@gmail.com wrote:


 The problem:

 Not all of the documents that I expect to be indexed are showing up in the
 index.

 The background:

 I start off with an empty index based on a schema with a single field named
 'query', marked as unique and using the following analyzer:

 <analyzer type="index">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
   <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="1" generateNumberParts="1" catenateWords="1"
           catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>

 My input is a utf-8 encoded file with one sentence per line.  Its total
 size
 is about 60MB.  I would like each line of the file to correspond to a
 single
 document in the solr index.  If I print the number of unique lines in the
 file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
 the total number of lines in the file gives me around 2.7M.

 I use the following to start indexing:

 curl
 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

 When this command completes, I see numDocs is approximately 470k (which is
 what I find strange) and maxDocs is approximately 890k (which is fine since
 I know I have around 700k duplicates).  Even more confusing is that if I
 run
 this exact command a second time without performing any other operations,
 numDocs goes up to around 610k, and a third time brings it up to about
 750k.

 Can anyone tell me what might cause Solr not to index everything in my
 input
 file the first time, and why it would be able to index new documents the
 second and third times?

 I also have this line in solrconfig.xml, if it matters:

 <requestParsers enableRemoteStreaming="true"
                 multipartUploadLimitInKB="2048" />

 Thanks,
 Dan

 --
 View this message in context:
 http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Erick Erickson
Hmmm, I'll have to defer to the highlighter experts here

Erick

On Wed, Jan 6, 2010 at 3:23 PM, Caleb Land redhatd...@gmail.com wrote:

 I've looked at the docs/source for WordDelimiterFilter, and I understand
 what it does now.

 Here is my configuration:

 http://gist.github.com/270590

 I've tried the StandardTokenizerFactory instead of the
 WhitespaceTokenizerFactory, but I get the same problem as before: the
 period from the previous sentence shows up and the period from the current
 sentence is cut off of highlighter fragments.

 I've tried the WhitespaceTokenizer with the StandardFilter, and this kinda
 works, but to match a word at the end of a sentence, you need to search for
 the period at the end of the sentence (the period is being tokenized along
 with the word).

 In any case, if I use the WordDelimiterFilter or add preserveOriginal=1,
 everything seems to work. (If I remove the WordDelimiterFilter, the periods
 are indexed with the word they're connected to, and searching for those
 words doesn't match unless the user includes the period)

 I'm trying to go through the code to understand how this works.

 On Wed, Jan 6, 2010 at 9:13 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  Hmmm, the name WordDelimiterFilterFactory might be leading
  you astray. Its purpose isn't to break things up into words
  that have anything to do with grammatical rules. Rather, its
  purpose is to break up strings of funky characters into
  searchable stuff. see:
 
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
 
  In the grammatical sense, "PowerShot" should just be
  "PowerShot", not "power shot" (which is what WordDelimiterFilterFactory
  gives you, options permitting). So I think you probably want
  one of the other analyzers
 
  Have you tried any other analyzers? StandardAnalyzer might be
  more friendly
 
  HTH
  Erick
 
  On Tue, Jan 5, 2010 at 5:18 PM, Caleb Land caleb.l...@gmail.com wrote:
 
   I've tracked this problem down to the fact that I'm using the
   WordDelimiterFilter. I don't quite understand what's happening, but if
 I
   add preserveOriginal=1 as an option, everything looks fine. I think
 it
   has
   to do with the period being stripped in the token stream.
  
   On Tue, Jan 5, 2010 at 2:05 PM, Caleb Land caleb.l...@gmail.com
 wrote:
  
Hello,
I'm using Solr 1.4, and I'm trying to get the regex fragmenter to
 parse
basic sentences, and I'm running into a problem.
   
I'm using the default regex specified in the example solr
  configuration:
   
[-\w ,/\n\']{20,200}
   
But I am using a larger fragment size (140) with a slop of 1.0.
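For context, the settings described above correspond roughly to these Solr 1.4 highlighting parameters (shown here as request-handler defaults; they can equally be passed per request as hl.* parameters):

```xml
<!-- Sketch: regex-fragmenter settings matching the values above.
     These are standard Solr 1.4 highlighting parameters. -->
<str name="hl.fragmenter">regex</str>
<str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
<str name="hl.fragsize">140</str>
<str name="hl.regex.slop">1.0</str>
```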
   
Given the passage:
   
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla a
 neque
  a
ipsum accumsan iaculis at id lacus. Sed magna velit, aliquam ut
 congue
vitae, molestie quis nunc.
   
When I search for Nulla (the first word of the second sentence) and
   grab
the first highlighted snippet, this is what I get:
   
 . <em>Nulla</em> a neque a ipsum accumsan iaculis at id lacus
   
As you can see, there's a leading period from the previous sentence
 and
   the
period from the current sentence is missing.
   
I understand this regex isn't that advanced, but I've tried
 everything
  I
can think of, regex-wise, to get this to work, and I always end up
 with
   this
problem.
   
For example, I've tried: \w[^.!?]{0,200}[.!?]
   
Which seems like it should include the ending punctuation, but it
   doesn't,
so I think I'm missing something.
   
Does anybody know a regex that works?
--
Caleb Land
   
  
  
  
   --
   Caleb Land
  
 



 --
 Caleb Land



Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread Erick Erickson
Well, I have noticed that Solr isn't using ANY analyzer

How do you know this? Because it's highly unlikely that SOLR
is completely broken on that level.

Erick

On Wed, Jan 6, 2010 at 3:48 PM, MitchK mitc...@web.de wrote:


 I have tested a lot and all the time I thought I set wrong options for my
 custom analyzer.
 Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer.
 It seems like it only stores the original input.

 I am using the example-configuration of the current Solr 1.4 release.
 What's wrong?

 Thank you!
 --
 View this message in context:
 http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread Ryan McKinley


On Jan 6, 2010, at 3:48 PM, MitchK wrote:



I have tested a lot and all the time I thought I set wrong options  
for my

custom analyzer.
Well, I have noticed that Solr isn't using ANY analyzer, filter or  
stemmer.

It seems like it only stores the original input.


The stored value is always the original input.

The *indexed* values are transformed by analysis.

If you really need to store the analyzed fields, that may be possible  
with an UpdateRequestProcessor.  also see:

https://issues.apache.org/jira/browse/SOLR-314

ryan


How to set User.dir or CWD for Solr during Tomcat startup

2010-01-06 Thread Turner, Robbin J
Is there any way to force the cwd that Solr starts up in when using the standard 
startup scripts for Tomcat?  I'm working on Solaris, and using SMF to start 
and stop Tomcat sets the working directory to /root.  I've been doing a bunch of 
googling and haven't seen a parameter to set within Tomcat other than the 
solr/home which is set up in the solr.xml under 
$CATALINA_HOME/conf/Catalina/localhost/.

I've had one person give me instructions using the GUI on Windows, but I'm at a 
loss as to which configuration file would set that, or which environment 
variable can or should be defined.
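For what it's worth, the usual way to make Solr independent of the process working directory is to pin solr/home via JNDI in the per-application context file; a sketch (the paths are examples, not your actual layout):

```xml
<!-- Sketch: $CATALINA_HOME/conf/Catalina/localhost/solr.xml
     Paths are examples. Pinning solr/home here avoids relying on
     whatever cwd SMF hands the Tomcat process. -->
<Context docBase="/opt/solr/apache-solr-1.4.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```

Alternatively, setting -Dsolr.solr.home=/opt/solr/home in JAVA_OPTS/CATALINA_OPTS in the startup script achieves the same without touching the context file.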

Any help would be appreciated.

Thanks
Robbin



Search query log using solr

2010-01-06 Thread Ravi Gidwani
Hi All:
 I am currently using solr 1.4 as the search engine for my
application. I am planning to add a search query log that will capture all
the search queries (and more information like IP, user info, date/time, etc.).
I understand I can easily do this on the application side by capturing all the
search requests and logging them in a DB/file before sending them to Solr for
execution.
 But I wanted to check with the forum whether there is a better
approach, a best practice, or anything that has been added to Solr for such a
requirement.

The idea is then to use this search log for statistics as well as for improving
the search results.
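One low-effort option, assuming Tomcat (or another servlet container) fronts Solr, is an access-log valve, which records client IP, timestamp, and the full request line including the query string; a sketch for server.xml (file locations and names are illustrative, and user info would still need application-side logging):

```xml
<!-- Sketch: Tomcat AccessLogValve inside the <Host> element.
     %h = remote IP, %t = timestamp, %r = request line (includes the
     query string, i.e. the full Solr query). -->
<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="solr_access." suffix=".log"
       pattern="%h %t %r" resolveHosts="false"/>
```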

Please share your experience/ideas.

TIA
~Ravi.


Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Yonik Seeley
On Wed, Jan 6, 2010 at 2:43 AM, Andy angelf...@yahoo.com wrote:
 I'd like to boost every query using {!boost b=log(popularity)}. But I'd 
 rather not have to prepend that to every query. It'd be much cleaner for me 
 to configure Solr to use that as default.

 My plan is to make DisMaxRequestHandler the default handler and add the 
 following to solrconfig.xml:

 <requestHandler name="dismax" class="solr.SearchHandler" default="true">
     <lst name="defaults">
       <str name="defType">dismax</str>
       <str name="echoParams">explicit</str>
       <float name="tie">0.01</float>
       <str name="bf">
          log(popularity)
       </str>
     </lst>
 </requestHandler>

 Is this the correct way to do it?

bf adds in the function query
{!boost} multiplies by the function query
In the new edismax (which may replace dismax soon) you can specify the
multiplicative boost via
boost=log(popularity)
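Spelled out as queries (URLs are illustrative, assuming a local example instance):

```
# additive: dismax bf adds log(popularity) to the relevance score
http://localhost:8983/solr/select?defType=dismax&q=foo&bf=log(popularity)

# multiplicative: the boost query parser multiplies the score by log(popularity)
http://localhost:8983/solr/select?q={!boost b=log(popularity)}foo
```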


-Yonik
http://www.lucidimagination.com


Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Andy
So if I want to configure Solr to turn every query q=foo into q={!boost 
b=log(popularity)}foo, dismax wouldn't work but edismax would?

If that's the case, can you tell me how to set up/use edismax? I can't find 
much documentation on it. Is it recommended for production use?


--- On Wed, 1/6/10, Yonik Seeley yo...@lucidimagination.com wrote:

From: Yonik Seeley yo...@lucidimagination.com
Subject: Re: DisMaxRequestHandler bf configuration
To: solr-user@lucene.apache.org
Date: Wednesday, January 6, 2010, 7:09 PM

On Wed, Jan 6, 2010 at 2:43 AM, Andy angelf...@yahoo.com wrote:
 I'd like to boost every query using {!boost b=log(popularity)}. But I'd 
 rather not have to prepend that to every query. It'd be much cleaner for me 
 to configure Solr to use that as default.

 My plan is to make DisMaxRequestHandler the default handler and add the 
 following to solrconfig.xml:

 <requestHandler name="dismax" class="solr.SearchHandler" default="true">
     <lst name="defaults">
       <str name="defType">dismax</str>
       <str name="echoParams">explicit</str>
       <float name="tie">0.01</float>
       <str name="bf">
          log(popularity)
       </str>
     </lst>
 </requestHandler>

 Is this the correct way to do it?

bf adds in the function query
{!boost} multiplies by the function query
In the new edismax (which may replace dismax soon) you can specify the
multiplicative boost via
boost=log(popularity)


-Yonik
http://www.lucidimagination.com



  

Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Yonik Seeley
On Wed, Jan 6, 2010 at 7:43 PM, Andy angelf...@yahoo.com wrote:
 So if I want to configure Solr to turn every query q=foo into q={!boost 
 b=log(popularity)}foo, dismax wouldn't work but edismax would?

You can do it with dismax; it's just that the syntax is slightly
more convoluted.
Check out the section on boosting newer documents:
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

 If that's the case, can you tell me how to set up/use edismax? I can't find 
 much documentation on it. Is it recommended for production use?

It's in trunk (not 1.4).
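If you do build trunk, the edismax equivalent of the dismax configuration discussed earlier in the thread would look something like this (a sketch; handler name is illustrative):

```xml
<!-- Sketch (trunk-only edismax): here "boost" is multiplicative,
     unlike dismax's additive "bf". -->
<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="boost">log(popularity)</str>
  </lst>
</requestHandler>
```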

-Yonik
http://www.lucidimagination.com


Re: SOLR or Hibernate Search?

2010-01-06 Thread Márcio Paulino
Hi!

Thanks for the answers. These were crucial to my decision. I've adopted
Solr in my application.

On Wed, Dec 30, 2009 at 2:00 AM, Ryan McKinley ryan...@gmail.com wrote:

 If you need to search via the Hibernate API, then use hibernate search.

 If you need a scaleable HTTP (REST) then solr may be the way to go.

 Also, i don't think hibernate has anything like the faceting / complex
 query stuff etc.




 On Dec 29, 2009, at 3:25 PM, Márcio Paulino wrote:

  Hey Everyone!

 I was making a comparison of the two technologies (SOLR and Hibernate Search)
 and I see many things are equal. Could anyone tell me when I should use SOLR
 and when I should use Hibernate Search?

 In my project I will have:

 1. Queries on indexed fields (Strings) and on non-indexed fields (Integer,
 Float, Date). [In Hibernate Search or in SOLR, I must search on the index and,
 with the results of that query, search on the database (I can't search in both
 places at the same time).]
 I will have searches like:
 Give me all Registers Where Value > 190 And Name Contains 'JAVA'

 2. My client needs to process a lot of email (20,000 per day) and I must
 index all fields (excluding sentDate), including attachments, and performance
 is a requirement of my system.

 3. My application is multi-client, and I need to separate the index by
 client.

 In this scenario, what's the best solution? SOLR or Hibernate Search?

 I see SOLR is a dedicated server and has good performance test results. I don't
 see any advantage to using hibernate-search in comparison with SOLR (except the
 fact that it integrates with my mapped objects).

 Thanks for Help

 --
 att,

 **
 Márcio Paulino
 Campo Grande - MS
 MSN / Gtalk: mcopaul...@gmail.com
 ICQ: 155897898
 **





-- 
att,

**
Márcio Paulino
Campo Grande - MS
MSN / Gtalk: mcopaul...@gmail.com
ICQ: 155897898
**


Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Andy
I meant can I do it with dismax without modifying every single query? I'm 
accessing Solr through haystack and all queries are generated by haystack. I'd 
much rather not have to go under haystack to modify the generated queries.  
Hence I'm trying to find a way to boost every query by default.

--- On Wed, 1/6/10, Yonik Seeley yo...@lucidimagination.com wrote:

From: Yonik Seeley yo...@lucidimagination.com
Subject: Re: DisMaxRequestHandler bf configuration
To: solr-user@lucene.apache.org
Date: Wednesday, January 6, 2010, 7:48 PM

On Wed, Jan 6, 2010 at 7:43 PM, Andy angelf...@yahoo.com wrote:
 So if I want to configure Solr to turn every query q=foo into q={!boost 
 b=log(popularity)}foo, dismax wouldn't work but edismax would?

You can do it with dismax; it's just that the syntax is slightly
more convoluted.
Check out the section on boosting newer documents:
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents




  

Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Yonik Seeley
On Wed, Jan 6, 2010 at 8:24 PM, Andy angelf...@yahoo.com wrote:
 I meant can I do it with dismax without modifying every single query? I'm 
 accessing Solr through haystack and all queries are generated by haystack. 
 I'd much rather not have to go under haystack to modify the generated 
 queries.  Hence I'm trying to find a way to boost every query by default.

If you can get haystack to pass through the user query as something
like qq, then yes - just use something like the last link I showed at
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
and set defaults for everything except qq.
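Concretely, the pattern boils down to something like this in the dismax defaults (a sketch; the boostfunc parameter name is made up, and clients then send only qq=foo):

```xml
<!-- Sketch of the wiki FAQ pattern: q is fixed in the defaults and the
     real user query arrives as qq. "boostfunc" is an illustrative name. -->
<lst name="defaults">
  <str name="q">{!boost b=$boostfunc v=$qq defType=dismax}</str>
  <str name="boostfunc">log(popularity)</str>
</lst>
```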

-Yonik
http://www.lucidimagination.com




 --- On Wed, 1/6/10, Yonik Seeley yo...@lucidimagination.com wrote:

 From: Yonik Seeley yo...@lucidimagination.com
 Subject: Re: DisMaxRequestHandler bf configuration
 To: solr-user@lucene.apache.org
 Date: Wednesday, January 6, 2010, 7:48 PM

 On Wed, Jan 6, 2010 at 7:43 PM, Andy angelf...@yahoo.com wrote:
 So if I want to configure Solr to turn every query q=foo into q={!boost 
 b=log(popularity)}foo, dismax wouldn't work but edismax would?

 You can do it with dismax; it's just that the syntax is slightly
 more convoluted.
 Check out the section on boosting newer documents:
 http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents







Re: DisMaxRequestHandler bf configuration

2010-01-06 Thread Andy
Let me make sure I understand you.

I'd get my regular query from haystack as qq=foo rather than q=foo.

Then I put in solrconfig within the dismax section:

<str name="q.alt">{!boost b=$popularityboost v=$qq}</str>
<str name="popularityboost">log(popularity)</str>

Is that what you meant?


--- On Wed, 1/6/10, Yonik Seeley yo...@lucidimagination.com wrote:

From: Yonik Seeley yo...@lucidimagination.com
Subject: Re: DisMaxRequestHandler bf configuration
To: solr-user@lucene.apache.org
Date: Wednesday, January 6, 2010, 8:42 PM

On Wed, Jan 6, 2010 at 8:24 PM, Andy angelf...@yahoo.com wrote:
 I meant can I do it with dismax without modifying every single query? I'm 
 accessing Solr through haystack and all queries are generated by haystack. 
 I'd much rather not have to go under haystack to modify the generated 
 queries.  Hence I'm trying to find a way to boost every query by default.

If you can get haystack to pass through the user query as something
like qq, then yes - just use something like the last link I showed at
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
and set defaults for everything except qq.

-Yonik
http://www.lucidimagination.com




 --- On Wed, 1/6/10, Yonik Seeley yo...@lucidimagination.com wrote:

 From: Yonik Seeley yo...@lucidimagination.com
 Subject: Re: DisMaxRequestHandler bf configuration
 To: solr-user@lucene.apache.org
 Date: Wednesday, January 6, 2010, 7:48 PM

 On Wed, Jan 6, 2010 at 7:43 PM, Andy angelf...@yahoo.com wrote:
 So if I want to configure Solr to turn every query q=foo into q={!boost 
 b=log(popularity)}foo, dismax wouldn't work but edismax would?

 You can do it with dismax; it's just that the syntax is slightly
 more convoluted.
 Check out the section on boosting newer documents:
 http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents








  

Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread MitchK

Hello Erick,

thank you for answering.

I can do whatever I want - Solr does nothing.
For example: If I use the textgen fieldtype, which is predefined, nothing
happens to the text. Even the StopFilter is not working - no stopword from
stopword.txt was removed. I think that this only affects the index,
because if I query for "for" it returns nothing, which is quite correct,
due to the work of the StopFilter.

Everything works fine on analysis.jsp, but not in reality.

If you have any testcase data you want me to add, please tell me and I
will show you the saved data afterwards.

Thank you.

Mitch


Erick Erickson wrote:
 
 Well, I have noticed that Solr isn't using ANY analyzer
 
 How do you know this? Because it's highly unlikely that SOLR
 is completely broken on that level.
 
 Erick
 
 On Wed, Jan 6, 2010 at 3:48 PM, MitchK mitc...@web.de wrote:
 

 I have tested a lot and all the time I thought I set wrong options for my
 custom analyzer.
 Well, I have noticed that Solr isn't using ANY analyzer, filter or
 stemmer.
 It seems like it only stores the original input.

 I am using the example-configuration of the current Solr 1.4 release.
 What's wrong?

 Thank you!
 --
 View this message in context:
 http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055510.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread MitchK

Hello Ryan,

thank you for answering.

In my schema.xml I am defining the field as indexed = true.
The problem is: nothing works - even the original predefined analyzers don't
work.
Please, have a look on my response to Erick.

Mitch

P.S.
Oh, I see what you mean. The field is indexed = true. My wording was a
little bit tricky ;).


ryantxu wrote:
 
 
 On Jan 6, 2010, at 3:48 PM, MitchK wrote:
 

 I have tested a lot and all the time I thought I set wrong options  
 for my
 custom analyzer.
 Well, I have noticed that Solr isn't using ANY analyzer, filter or  
 stemmer.
 It seems like it only stores the original input.
 
 The stored value is always the original input.
 
 The *indexed* values are transformed by analysis.
 
 If you really need to store the analyzed fields, that may be possible  
 with an UpdateRequestProcessor.  also see:
 https://issues.apache.org/jira/browse/SOLR-314
 
 ryan
 
 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055512.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread Erik Hatcher

Mitch,

Again, I think you're misunderstanding what analysis does.  You must  
be expecting (we think, though you've not provided exact duplication  
steps to be sure) that the value you get back from Solr is the  
analyzer-processed output.  It's not; it's exactly what you provide.   
Internally, for searching, the analysis takes place and writes to the  
index in an inverted fashion, but the stored content is left alone.


There's some thinking going on about implementing it such that analyzed  
output is stored.


You can, however, use the analysis request handler componentry to get  
analyzed stuff back as you see it in analysis.jsp on a per-document or  
per-field text basis - if you're looking to leverage the analyzer  
output in that fashion from a client.


Erik

On Jan 7, 2010, at 1:21 AM, MitchK wrote:



Hello Erick,

thank you for answering.

I can do whatever I want - Solr does nothing.
For example: If I use the textgen fieldtype, which is predefined, nothing
happens to the text. Even the StopFilter is not working - no stopword from
stopword.txt was removed. I think that this only affects the index,
because if I query for "for" it returns nothing, which is quite correct,
due to the work of the StopFilter.

Everything works fine on analysis.jsp, but not in reality.

If you have any testcase data you want me to add, please tell me and I
will show you the saved data afterwards.

Thank you.

Mitch


Erick Erickson wrote:


Well, I have noticed that Solr isn't using ANY analyzer

How do you know this? Because it's highly unlikely that SOLR
is completely broken on that level.

Erick

On Wed, Jan 6, 2010 at 3:48 PM, MitchK mitc...@web.de wrote:



I have tested a lot and all the time I thought I set wrong options  
for my

custom analyzer.
Well, I have noticed that Solr isn't using ANY analyzer, filter or
stemmer.
It seems like it only stores the original input.

I am using the example-configuration of the current Solr 1.4  
release.

What's wrong?

Thank you!
--
View this message in context:
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html
Sent from the Solr - User mailing list archive at Nabble.com.







--
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055510.html
Sent from the Solr - User mailing list archive at Nabble.com.