Re: Query regarding Solr-2242 patch for getting distinct facet counts.

2011-05-27 Thread Bill Bell
I am pretty sure it does not yet support distributed shards..

But the patch was written for 4.0... So there might be issues with running
it on 1.4.1.

On 5/26/11 11:08 PM, rajini maski rajinima...@gmail.com wrote:

 The patch solr 2242 for getting count of distinct facet terms doesn't
work for distributedProcess

(https://issues.apache.org/jira/browse/SOLR-2242)

The error log says

 HTTP ERROR 500
Problem accessing /solr/select. Reason:

For input string: numFacetTerms

java.lang.NumberFormatException: For input string: "numFacetTerms"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:403)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:331)
at org.apache.solr.schema.TrieField.toInternal(TrieField.java:344)
at org.apache.solr.handler.component.FacetComponent$DistribFieldFacet.add(FacetComponent.java:619)
at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:265)
at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:235)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


The query I passed :
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=2&facet.field=648&facet.mincount=1&facet.limit=-1&f.2.facet.numFacetTerms=1&rows=0&shards=localhost:8983/solr,localhost:8985/solrtwo

Can anyone suggest the changes I need to make to enable the same
functionality for shards?

When I do it across a single core I get the correct results. I have
applied
the SOLR-2242 patch in Solr 1.4.1

Awaiting a reply

Regards,
Rajani




HTMLStripTransformer will remove the content in XML??

2011-05-27 Thread Ellery Leung
I have an XML string like this:

 

<?xml version="1.0" encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr]]></loc></language>

 

By using HTMLStripTransformer, I expect to get 'hello,solr'.

 

But actually this transformer removes ALL THE TEXT INSIDE!

 

Did I do something silly, or is it a bug? 

 

Thank you



Re: Query regarding Solr-2242 patch for getting distinct facet counts.

2011-05-27 Thread rajini maski
 No such issues. It integrated successfully with 1.4.1 and works across a
single index.

For the f.2.facet.numFacetTerms=1 parameter it gives the distinct count
result.

For the f.2.facet.numFacetTerms=2 parameter it gives the counts as well as
the results for facets.

But this works only across a single index, not distributed process. The
conditions you added in SimpleFacets.java (the if checks where the namedistinct
count == 0, 1 and 2): should they also be added to the distributed process
function to enable it to work across shards?

Rajani



On Fri, May 27, 2011 at 12:33 PM, Bill Bell billnb...@gmail.com wrote:

 I am pretty sure it does not yet support distributed shards..

 But the patch was written for 4.0... So there might be issues with running
 it on 1.4.1.

 On 5/26/11 11:08 PM, rajini maski rajinima...@gmail.com wrote:

  The patch solr 2242 for getting count of distinct facet terms doesn't
 work for distributedProcess
 
 (https://issues.apache.org/jira/browse/SOLR-2242)
 
 The error log says
 
  HTTP ERROR 500
 Problem accessing /solr/select. Reason:
 
 For input string: numFacetTerms
 
 java.lang.NumberFormatException: For input string: "numFacetTerms"
 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
 at java.lang.Long.parseLong(Long.java:403)
 at java.lang.Long.parseLong(Long.java:461)
 at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:331)
 at org.apache.solr.schema.TrieField.toInternal(TrieField.java:344)
 at org.apache.solr.handler.component.FacetComponent$DistribFieldFacet.add(FacetComponent.java:619)
 at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:265)
 at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:235)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 
 
 The query I passed :
 
 http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=2&facet.field=648&facet.mincount=1&facet.limit=-1&f.2.facet.numFacetTerms=1&rows=0&shards=localhost:8983/solr,localhost:8985/solrtwo
 
 Anyone can suggest me the changes i need to make to enable the same
 funcionality for shards?
 
 When i do it across single core.. I get the correct results. I have
 applied
 the solr 2242 patch in solr1.4.1
 
 Awaiting for reply
 
 Regards,
 Rajani





Facet Query

2011-05-27 Thread Jasneet Sabharwal

Hi

When I do a facet query on my data, it shows me a list of all the words
present in my database with their count. Is it possible to not get the
results of common words like a, an, the, http and so on, but only get
the count of the stuff we need, like microsoft, ipad, solr, etc.?


--
Thanx & Regards

Jasneet Sabharwal



Re: Facet Query

2011-05-27 Thread Chandan Tamrakar
Which analyzer do you use for indexing? You could exclude those stop words
during indexing.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
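
For example, a minimal sketch of an analyzer that drops common words at index
time (the type name, tokenizer choice and stop-word file here are only
placeholders):

<fieldType name="text_nostop" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

Words listed in stopwords.txt are then never indexed, so they cannot show up as
facet values on that field.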




On Fri, May 27, 2011 at 1:36 PM, Jasneet Sabharwal 
jasneet.sabhar...@ngicorporation.com wrote:

 Hi

 When I do a facet query on my data, it shows me a list of all the words
 present in my database with their count. Is it possible to not get the
 results of common words like a, an, the, http and so on but only get the
 count of stuff we need like microsoft, ipad, solr, etc.

 --
 Thanx & Regards

 Jasneet Sabharwal




-- 
Chandan Tamrakar
*
*


Re: Problem with spellchecking, dont want multiple request to SOLR

2011-05-27 Thread roySolr
mm ok. I configure 2 spellcheckers:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">spell_what</str>
    <str name="field">spell_what</str>
    <str name="buildOnOptimize">true</str>
    <str name="spellcheckIndexDir">spellchecker_what</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">spell_where</str>
    <str name="field">spell_where</str>
    <str name="buildOnOptimize">true</str>
    <str name="spellcheckIndexDir">spellchecker_where</str>
  </lst>
</searchComponent>

How can I enable it in my search request handler and search both in one
request?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-spellchecking-dont-want-multiple-request-to-SOLR-tp2988167p2992076.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet Query

2011-05-27 Thread Juan Antonio Farré Basurte
Are you talking about a facet query or a facet field?
If it's a facet query, I don't get what's going on.
If it's a facet field... well, if it's a fixed set of words you're interested 
in, filter the query to only those words and you'll get counts only for them. 
If you just need to filter out common words, I don't remember exactly how it 
works, but when you declare the text field (or its type) you can specify a 
processor that does exactly that: removes common words from the indexed field 
and, hence, you shouldn't get counts on them, because they just aren't there.
Sorry if my information is inexact. I haven't had to deal with this feature yet.

On 27/05/2011, at 09:51, Jasneet Sabharwal wrote:

 Hi
 
 When I do a facet query on my data, it shows me a list of all the words 
 present in my database with their count. Is it possible to not get the 
 results of common words like a, an, the, http and so on but only get the
 count of stuff we need like microsoft, ipad, solr, etc.
 
 -- 
 Thanx & Regards
 
 Jasneet Sabharwal
 



Re: copyField of dates unworking?

2011-05-27 Thread Ahmet Arslan
   <copyfield source="date" dest="text"/>

The letter f should be capital: copyfield => copyField.


RE: Spellcheck: Two dictionaries

2011-05-27 Thread roySolr
That uber dictionary is not what I want. I also get suggestions from the
"where" in the "what". An example:

what                 where
chelsea              London
Soccerclub Bondon    London

When I type "soccerclub london" I want the suggestion from the "what"
dictionary: Did you mean "Soccerclub Bondon"? With the uber dictionary I
don't get this suggestion because it is spelled correctly (based on the
"where").

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-Two-dictionaries-tp2931458p2992093.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: HTMLStripTransformer will remove the content in XML??

2011-05-27 Thread bryan rasmussen
I would expect that it doesn't understand CDATA and thinks of
everything between < and > as a 'tag'.

Best Regards,
Bryan Rasmussen

On Fri, May 27, 2011 at 9:41 AM, Ellery Leung elleryle...@be-o.com wrote:
 I have an XML string like this:



 <?xml version="1.0" encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr]]></loc></language>



 By using HTMLStripTransformer, I expect to get 'hello,solr'.



 But actual this transformer will remove ALL THE TEXT INSIDE!



 Did I do something silly, or is it a bug?



 Thank you




frange vs TrieRange

2011-05-27 Thread Juan Antonio Farré Basurte
Hello,
I have to perform range queries against a date field. It is a TrieDateField, and
I'm already using it for sorting, so there will already be an entry in the
FieldCache for it.
According to:

http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/

frange queries are typically faster than normal range queries when there are
many terms between the endpoints (though they can be slower if less than about
5% of the terms fall between the endpoints). The cost of this speedup is the
memory associated with a FieldCache entry for the field. In my case, there is no
additional memory overhead, as there is already such an entry.
It also states that TrieRange queries have the best space/speed tradeoff.
Now my doubt is: if I have no memory overhead, then I only care about relative 
speed between frange and trie. The good speed/space tradeoff of trie is not the 
measure I need in this case, but just a comparison at pure speed level.
Does anybody know if there's data about this? Any clue on whether to choose 
frange or trie in this case?

Thanks,

Juan
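
(For reference, the two query styles being compared look roughly like this, using
a hypothetical numeric field named price:

  fq=price:[100 TO 200]              standard (Trie) range query
  fq={!frange l=100 u=200}price      frange, evaluated against the FieldCache

For a TrieDateField the frange endpoints would typically be expressed through a
function such as ms(); the blog post linked above covers the details.)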

TermFreqVector Problem

2011-05-27 Thread deniz
Hi all

here is what I have been trying and the problem

I am trying to see how many times a single word appears in a field.
Basically, I have a field called universal, and lets say the field is like
this:

car house road age sex school education education tree garden

and I am searching using the word education, so I am expecting 2 as my
result.

I have done the configuration described on
http://wiki.apache.org/solr/TermVectorComponent and my piece of code is this:


TermFreqVector vector = this.reader.getTermFreqVector(this.docId,
"universal");

int index = vector.indexOf("education");

int freq = vector.getTermFrequencies()[index];


but here vector.indexOf("education") returns -1, so I got an error.


In addition, I have tried this too:

TermFreqVector vector = reader.getTermFreqVector(this.docId, "universal");
String universalTerms[] = vector.getTerms();




to see the length of the universalTerms array: it is 1, and the only value the
array stores is the whole field value:

universalTerms[0] = "car house road age sex school education education tree garden"



Can anyone help me with this?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/TermFreqVector-Problem-tp2992163p2992163.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: HTMLStripTransformer will remove the content in XML??

2011-05-27 Thread Ellery Leung
Got it.  Actually I use solr.MappingCharFilterFactory to replace the <![CDATA[
and ]]> with empty strings first, and then use HTMLStripCharFilterFactory to get
hello and solr.

For future reference, here is part of schema.xml:

<fieldType name="textMaxWord" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mappings.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    ...

In mappings.txt (2 lines):

"<![CDATA[" => ""

"]]>" => ""

Restart Solr

It works.

Thank you

-Original Message-
From: bryan rasmussen [mailto:rasmussen.br...@gmail.com] 
Sent: 27 May 2011, 4:20 PM
To: solr-user@lucene.apache.org; elleryle...@be-o.com
Subject: Re: HTMLStripTransformer will remove the content in XML??

I would expect that it doesn't understand CDATA and thinks of
everything between < and > as a 'tag'.

Best Regards,
Bryan Rasmussen

On Fri, May 27, 2011 at 9:41 AM, Ellery Leung elleryle...@be-o.com wrote:
 I have an XML string like this:



 <?xml version="1.0" encoding="UTF-8"?><language><intl><![CDATA[hello]]></intl><loc><![CDATA[solr]]></loc></language>



 By using HTMLStripTransformer, I expect to get 'hello,solr'.



 But actual this transformer will remove ALL THE TEXT INSIDE!



 Did I do something silly, or is it a bug?



 Thank you





Re: Returning documents using multi-valued field

2011-05-27 Thread Kurt Sultana
Thanks for your answer James :)

For anyone who runs into this problem,
http://markmail.org/thread/xce4qyzs5367yplo also speaks about this, and
reaches James' conclusion too.

On Thu, May 26, 2011 at 10:19 PM, Dyer, James james.d...@ingrambook.comwrote:

 This is a limitation of Lucene/Solr in that there is no way to tell it to
 not match across multi-valued field occurrences.

 A workaround is to convert your query to a phrase and add a slop factor
 less than your positionIncrementGap.  ex:  q="alice trudy"~99  ... This
 example assumes that your positionIncrementGap is set to 100 (the default I
 think) or greater.  This tells it that rather than search for a strict
 phrase, the words in the phrase can be up to 99 positions apart.  Because
 the multi-valued fields are implemented under-the-covers by simply
 increasing the position of the next occurrence by the positionIncrementGap
 value, this will effectively prevent Lucene/Solr from matching across
 occurrences.

 The downside to this workaround is that wildcards are not permitted in
 phrase searches.  So if you need wildcard support also, then you're out of
 luck.
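
 (As a rough illustration of that workaround, assuming keyword_text_mv has a
 positionIncrementGap of 100:

   q=keyword_text_mv:"alice trudy"~99     can only match both words inside one value
   q=keyword_text_mv:"alice trudy"~200    slop exceeds the gap, so it may match across values

 The field name is taken from the query quoted further down in this thread.)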

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Kurt Sultana [mailto:kurtanat...@gmail.com]
 Sent: Thursday, May 26, 2011 3:05 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Returning documents using multi-valued field

 Hi, maybe I wasn't so clear in my previous post. Here's another go (I'd
 like
 a reply :) ):

 Currently I'm issuing this query on Solr:

 http://localhost:9001/solrfacetsearch/master_Shop/select/?q=%28keyword_text_mv%3A%28alice+AND+trudy%29%29+AND+%28catalogId%3A%22Default%22%29+AND+%28catalogVersion%3AOnline%29&start=0&rows=2147483647&facet=true&facet.field=category_string_mv&sort=preferred_boolean+desc%2Cgeo_distance+asc&facet.mincount=1&facet.limit=50&facet.sort=index&radius=111.84681460272012&long=5.2864094&qt=geo&lat=52.2119418&debugQuery=on

 where as you can see I'm searching for keywords Alice AND Trudy. This query
 returns a document which contains:

 <arr name="keyword_text_mv">
   <str>alice jill</str>
   <str>trudy alex</str>
 </arr>

 The problem is I'd like the document to be returned only if it contains a
 string alice trudy in one of its values, in other words, if it contains :

 <arr name="keyword_text_mv">
   <str>alice trudy</str>
   <str>jill alex</str>
 </arr>

 How could I achieve this? I'm supporting the code written by someone else
 and I'm quite new to Solr.

 Thanks in advance :)

 Kurt


 On Wed, May 25, 2011 at 11:44 AM, Kurt Sultana kurtanat...@gmail.com
 wrote:

 
  Hi all,
 
  I'm quite new to Solr and I'm supporting an existing Solr search engine
  which was written by someone else. I've been reading on Solr for the last
  couple of weeks so I'd consider myself beyond the basics.
 
  A particular field, let's say name, is multi-valued. For example, a
  document has a field name with values Alice, Trudy. We want that the
  document is returned when Alice or Trudy is input and not when Alice
  Trudy is entered. Currently the document is even with Alice Trudy. How
  could this be done?
 
  Thanks a lot!
  Kurt
 
 



How to disable QueryElevationComponent

2011-05-27 Thread Romi
Hi, in my indexed documents I do not want a uniqueKey field, but when I do not
give any uniqueKey in schema.xml it throws an exception:
org.apache.solr.common.SolrException: QueryElevationComponent requires the
schema to have a uniqueKeyField.
It means QueryElevationComponent requires a uniqueKey field. How can I
disable this QueryElevationComponent? Please reply.

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-disable-QueryElevationComponent-tp2992195p2992195.html
Sent from the Solr - User mailing list archive at Nabble.com.


test

2011-05-27 Thread Romi
test

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/test-tp2992199p2992199.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to disable QueryElevationComponent

2011-05-27 Thread Markus Jelsma
Remove the component configuration from your solrconfig.

 Hi, in my indexed document i do not want a uniqueKey field, but when i do
 not give any uniqueKey in schema.xml then it shows an exception
 org.apache.solr.common.SolrException: QueryElevationComponent requires the
 schema to have a uniqueKeyField.
 it means QueryElevationComponent requires a uniqueKey field.then how can i
 disable this QueryEvelationComponent. please reply.
 
 -
 Thanks & Regards
 Romi
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-disable-QueryElevationComponent-
 tp2992195p2992195.html Sent from the Solr - User mailing list archive at
 Nabble.com.


Re: How to disable QueryElevationComponent

2011-05-27 Thread Romi
I removed

<searchComponent name="elevator"
    class="org.apache.solr.handler.component.QueryElevationComponent">

  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

from solrconfig.xml but it is showing the following exception:

java.lang.NullPointerException
at
org.apache.solr.handler.dataimport.DataImporter.identifyPk(DataImporter.java:152)
at
org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:111)
at
org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
at
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486)
at org.apache.solr.core.SolrCore.init(SolrCore.java:588)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
at
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
at org.mortbay.jetty.Server.doStart(Server.java:210)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)




-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-disable-QueryElevationComponent-tp2992195p2992320.html
Sent from the Solr - User mailing list archive at Nabble.com.


what is the need of setting writeLockTimeout and commitLockTimeout in solrconfig.xml

2011-05-27 Thread Romi
I wanted to get a basic idea of setting these parameters in solrconfig.xml:
<writeLockTimeout>...</writeLockTimeout>
<commitLockTimeout>...</commitLockTimeout>

What do writeLockTimeout and commitLockTimeout actually indicate here?


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-the-need-of-setting-writeLockTimeout-and-commitLockTimeout-in-solrconfig-xml-tp2992358p2992358.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH render html entities

2011-05-27 Thread anass talby
Is there any way to render html entities in DIH for a specific field?

Thanks
-- 
   Anass


Re: Documents update

2011-05-27 Thread Gora Mohanty
2011/5/27 Denis Kuzmenok forward...@ukr.net:
 Hi.

 I have an indexed database which is indexed a few times a day and
 contains tinyint flags (like is_enabled, is_active, etc), and the content
 isn't changed too often, but the flags are.
 So if I index via post.jar only the flags, then the entire document is deleted
 and there's only the unique key and the flags.
 Is there any way to index certain columns, and not to change the whole
 document?
[...]

Not with 1.4, but apparently there is a patch for trunk. Not
sure if it is in 3.1.

If you are on 1.4, you could first query Solr to get the data
for the document to be changed, change the modified values,
and make a complete XML, including all fields, for post.jar.
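
A rough SolrJ sketch of that read-modify-resend loop (field names and the URL
are only placeholders, and this is an untested sketch rather than the post.jar
route described above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

void updateFlag() throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrDocument old = server.query(new SolrQuery("id:123")).getResults().get(0);
    SolrInputDocument doc = new SolrInputDocument();
    for (String name : old.getFieldNames()) {
        doc.addField(name, old.getFieldValue(name));   // copy every stored field
    }
    doc.setField("is_enabled", 1);                     // overwrite just the flag
    server.add(doc);
    server.commit();
}

This only works when every field that needs to be resent is stored; indexed-only
fields cannot be reconstructed this way.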

Regards,
Gora


Re: Documents update

2011-05-27 Thread Denis Kuzmenok
I'm using 3.1 now. Indexing lasts for a few hours, and the plain data size
is big. Getting all documents would be rather slow :(


 Not with 1.4, but apparently there is a patch for trunk. Not
 sure if it is in 3.1.

 If you are on 1.4, you could first query Solr to get the data
 for the document to be changed, change the modified values,
 and make a complete XML, including all fields, for post.jar.

 Regards,
 Gora






Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-27 Thread Gora Mohanty
On Thu, May 26, 2011 at 6:52 PM, Rahul Warawdekar
rahul.warawde...@gmail.com wrote:
 Hi All,

 I am using Solr 3.1 for one of our search based applications.
 We are using DIH to index our data and TikaEntityProcessor to index
 attachments.
 Currently we are running into an issue while extracting content from one of
 our MS Excel 2007 files, using TikaEntityProcessor.
[...]

Have not done this with Tika, but we have run into similar
issues while trying to convert Microsoft Word documents
externally, before indexing to Solr. It turned out in our case
that these documents were referring to external URLs, which
were not always accessible to our converter sitting behind
a firewall.

 Also, does someone know of a way to just skip this type of behaviour for
 that file and move to the next document to be indexed ?
[...]

This is probably not of much help to you, but what we ended
up doing was killing a conversion process that was taking
longer than a maximum time.

Regards,
Gora


Re: DIH render html entities

2011-05-27 Thread anass talby
Sorry, my question was not clear.
When I get data from the database, some fields contain HTML special chars,
and what I want to do is just convert them automatically.

On Fri, May 27, 2011 at 1:00 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Fri, May 27, 2011 at 3:50 PM, anass talby anass.ta...@gmail.com
 wrote:
  Is there any way to render html entities in DIH for a specific field?
 [...]

 This does not make too much sense: What do you mean by
 rendering HTML entities. DIH just indexes, so where would
 it render HTML to, even if it could?

 Please take a look at http://wiki.apache.org/solr/UsingMailingLists

 Regards,
 Gora




-- 
   Anass


Re: Nested grouping/field collapsing

2011-05-27 Thread Martijn Laarman
Hi,
I was wondering if this issue had already been raised.

We currently have a use case where nested field collapsing would be really
helpful

I.e Collapse on field X then Collapse on Field Y within the groups returned
by field X

The current behavior of specifying multiple fields seems to be returning
multiple result sets.

Has this already been feature requested ? Does anybody know of a workaround
?

Many thanks,

Martijn


Re: Nested grouping/field collapsing

2011-05-27 Thread Juan Antonio Farré Basurte
I've found the same issue.
As far as I know, the only solution is to create a copy field which combines
both fields' values and facet on that field.
If one of the fields has a set of distinct values known in advance and its
cardinality c is not too big, it isn't a great problem: you can get by with c
queries.

On 27/05/2011, at 15:03, Martijn Laarman wrote:

 Hi,
 I was wondering if this issue had already been raised.
 
 We currently have a use case where nested field collapsing would be really
 helpful
 
 I.e Collapse on field X then Collapse on Field Y within the groups returned
 by field X
 
 The current behavior of specifying multiple fields seem to be returning
 mutiple result sets.
 
 Has this already been feature requested ? Does anybody know of a workaround
 ?
 
 Many thanks,
 
 Martijn



Re: Comma delemitered words shawn in terms like one word.

2011-05-27 Thread abhay kumar
Thanks, I was looking exactly for this.
I needed to split tokens based on commas.

On Fri, Jun 18, 2010 at 10:12 PM, Joe Calderon calderon@gmail.comwrote:

 set generateWordParts=1 on wordDelimiter or use
 PatternTokenizerFactory to split on commas


 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternTokenizerFactory
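
 For example, a one-line sketch of the PatternTokenizerFactory route (splitting
 on a literal comma):

   <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>

 Alternatively, keep the existing chain and set generateWordParts="1" on the
 WordDelimiterFilterFactory shown in the field type quoted below.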


 you can use the analysis page to see what your filter chains are going
 to do before you index

 /admin/analysis.jsp

 On Fri, Jun 18, 2010 at 6:41 AM, Vitaliy Avdeev vavd...@sistyma.net
 wrote:
  Hello.
  In indexing text I have such string John,Mark,Sam. Then I looks at it in
  TermVectorComponent it looks like this johnmarksam.
 
  I am using this type for storing data
 
 <fieldType name="textTight2" class="solr.TextField"
  positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
      ignoreCase="true" expand="false"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="0" generateNumberParts="0" catenateWords="1"
      catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>
 
  What filter I need to use to get John Mark Sam as different words?
 




-- 
Thanks and Regards
Abhay Kumar Singh


Splitting fields

2011-05-27 Thread Joe Fitzgerald
Hello,

 

I am in an odd position.  The application server I use has built-in
integration with SOLR.  Unfortunately, its native capabilities are
fairly limited, specifically, it only supports a standard/pre-defined
set of fields which can be indexed.  As a result, it has left me
kludging how I work with Solr and doing things like putting what I'd
like to be multiple, separate fields into a single Solr field.

 

As an example, I may put a customer id and name into a single field
called 'custom1'.  Ideally, I'd like this information to be returned in
separate fields...and even better would be for them to be indexed as
separate fields but I can live without the latter.  Currently, I'm
building out a json representation of this information which makes it
easy for me to deal with when I extract the results...but it all feels
wrong.

 

I do have complete control over the actual Solr installation (just not
the indexing call to Solr), so I was hoping there may be a way to
configure Solr to take my single field and split it up into a different
field for each key in my json representation.

 

I don't see anything native to Solr that would do this for me but there
are a few features that I thought sounded similar and was hoping to get
some opinions on how I may be able to move forward with this...

 

Poly fields, such as the spatial location, might help?  Can I build my
own poly-field that would split up the main field into subfields?  Do
poly-fields let me return the subfields?  I don't quite have my head
around polyfields yet.

 

Another option although I suspect this won't be considered a good
approach, but what about extending the copyField functionality of
schema.xml to support my needs?  It would seem not entirely unreasonable
that copyField would provide a means to extract only a portion of the
contents of the source field to place in the destination field, no?  I'm
sure people more familiar with Solr's architecture could explain why
this isn't really an appropriate thing for Solr to handle (just because
it could doesn't mean it should)...

The other - and probably best -- option would be to leverage Solr
directly, bypassing the native integration of my application server,
which we've already done for most cases.  I'd love to go this route but
I'm having a hard time figuring out how to easily accomplish the same
functionality provided by my app server integration...perhaps someone on
the list could help me with this path forward?  Here is what I'm trying
to accomplish:

 

I'm indexing documents (text, pdf, html...) but I need to include fields
in the results of my searches which are only available from a db query.
I know how to have Solr index results from a db query, but I'm having
trouble getting it to index the documents that are associated to each
record of that query (full path/filename is one of the fields of that
query).

 

I started to try to use the dataImport handler to do this, by setting up
a FileDataSource in addition to my jdbc data source.  I tried to
leverage the filedatasource to populate a sub-entity based on the db
field that contains the full path/filename, but I wasn't sure how to
specify the db field from the root query/entity.  Before I spent too
much time, I also realized I wasn't sure how to get Solr to deal with
binary file types this way either which upon further reading seemed like
I would need to leverage Tika - can that be done within the confines of
dataimporthandler?
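
For what it's worth, the shape of data-config.xml being described would look
roughly like this (a sketch only: entity, column and path names are made up, and
it assumes the TikaEntityProcessor that ships with Solr 3.1):

<dataConfig>
  <dataSource name="db" type="JdbcDataSource" driver="..." url="..."/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="rec" dataSource="db"
            query="select id, title, file_path from docs">
      <entity name="attachment" dataSource="bin" processor="TikaEntityProcessor"
              url="${rec.file_path}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>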

 

Any advice is greatly appreciated.  Thanks in advance,

 

Joe



Re: solr Invalid Date in Date Math String/Invalid Date String

2011-05-27 Thread Mike Sokolov
The * endpoint for range terms wasn't implemented yet in 1.4.1.  As a
workaround, we use very large and very small values.
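
For example, instead of [* TO NOW], a range like
[1900-01-01T00:00:00Z TO NOW] (or an equally distant bound in the other
direction) behaves the same way in practice.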


-Mike

On 05/27/2011 12:55 AM, alucard001 wrote:

Hi all

I am using SOLR 1.4.1 (according to solr info), but no matter what date
field I use (date or tdate) defined in default schema.xml, I cannot do a
search in solr-admin analysis.jsp:

fieldtype: date(or tdate)
fieldvalue(index): 2006-12-22T13:52:13Z (I type it in manually, no trailing
space)
fieldvalue(query):

The only success case:
2006-12-22T13:52:13Z

All search below are failed:
* TO NOW
[* TO NOW]

2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z
2006\-12\-22T00\:00\:00Z TO 2006\-12\-22T23\:59\:59Z
[2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z]
[2006\-12\-22T00\:00\:00Z TO 2006\-12\-22T23\:59\:59Z]

2006-12-22T00:00:00.000Z TO 2006-12-22T23:59:59.999Z
2006\-12\-22T00\:00\:00\.000Z TO 2006\-12\-22T23\:59\:59\.999Z
[2006-12-22T00:00:00.000Z TO 2006-12-22T23:59:59.999Z]
[2006\-12\-22T00\:00\:00\.000Z TO 2006\-12\-22T23\:59\:59\.999Z]

2006-12-22T00:00:00Z TO *
2006\-12\-22T00\:00\:00Z TO *
[2006-12-22T00:00:00Z TO *]
[2006\-12\-22T00\:00\:00Z TO *]

2006-12-22T00:00:00.000Z TO *
2006\-12\-22T00\:00\:00\.000Z TO *
[2006-12-22T00:00:00.000Z TO *]
[2006\-12\-22T00\:00\:00\.000Z TO *]
(vice versa)

I get either:
Invalid Date in Date Math String or
Invalid Date String
error

What's wrong with it?  Can anyone please help me on that?

Thank you.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-Invalid-Date-in-Date-Math-String-Invalid-Date-String-tp2991763p2991763.html
Sent from the Solr - User mailing list archive at Nabble.com.
   


Re: highlighting in multiValued field

2011-05-27 Thread Jeffrey Chang
Hi Bob,

Hmm... I don't think this approach will scale with bigger and more documents :(

Thanks for your help though; I think I should take a look at customizing 
highlight component to achieve this...

Thanks,
Jeff



On May 27, 2011, at 12:24 PM, Bob Sandiford bob.sandif...@sirsidynix.com 
wrote:

 The only thing I can think of is to post-process your snippets.  I.E. pull 
 the highlighting tags out of the strings, look for the match in your result 
 description field looking for a match, and if you find one, replace that 
 description with the original highlight text (i.e. with the highlight tags 
 still in place).
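
 A rough Java sketch of that post-processing (assumes the default <em> highlight
 tags and hypothetical variable names; untested):

   // snippet: one highlighted string from the "description" highlight list
   String plain = snippet.replaceAll("</?em>", "");
   // descriptions: the doc's stored "description" values, in order
   int pos = descriptions.indexOf(plain);
   // when the lists line up, descID.get(pos) is the matching id

 This only holds while the snippet is not truncated (i.e. hl.fragsize is large
 enough to cover the whole value).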
 
 Bob Sandiford | Lead Software Engineer | SirsiDynix
 P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
 www.sirsidynix.com 
 Join the conversation - you may even get an iPad or Nook out of it!
 
 Like us on Facebook!
 
 Follow us on Twitter!
 
 
 
 -Original Message-
 From: Jeffrey Chang [mailto:jclal...@gmail.com]
 Sent: Friday, May 27, 2011 12:16 AM
 To: solr-user@lucene.apache.org
 Subject: Re: highlighting in multiValued field
 
 Hi Bob,
 
 I have no idea how I missed that! Thanks for pointing me to use
 hl.snippets
 - that did the magic!
 
 Please allow me squeeze one more question along the same line.
 
 Since I'm now able to display multiple snippets - what I'm trying to
 achieve
 is, determine which highlighted snippet maps back to what position in
 the
 original document.
 
 e.g. If I search for Tel, with highlighting and hl.snippets=2 it'll
 return
 me:
 <doc>
 ...
 <arr name="descID">
   <str>1</str>
   <str>2</str>
   <str>3</str>
 </arr>
 <arr name="description">
   <str>Tel to talent 1</str>
   <str>Tel to talent 2</str>
   <str>Tel to talent 3</str>
 </arr>
 ...
 </doc>
 <lst name="highlighting">
   <lst name="1">
    <arr name="description">
      <str><em>Tel</em> to talent 1</str>
      <str><em>Tel</em> to talent 2</str>
    </arr>
   </lst>
 ...
 
 Is there a way for me to figure out which highlighted snippet belongs
 to
 which descID so I can also display the non-highlighted rows for
 my
 search results.
 
 Or is this not the way how highlighting is designed and to be used?
 
 Thanks so much,
 Jeff
 [snip]
 


RE: Spellcheck: Two dictionaries

2011-05-27 Thread Dyer, James
You're up against a couple of real limitations with Solr's spell checking.  The 
first limitation is that you can only use 1 dictionary per query.  

The second limitation is that if a word is in the dictionary it never tries to 
correct it.  This will happen even if you *don't* combine your two dictionaries 
(albeit it will happen less because the dictionary you use will be smaller).  
The best workaround to this second limitation is to use 
spellcheck.onlyMorePopular=true.  This is a pretty bad solution though 
because onlyMorePopular then makes the spellchecker assume *all* of the words 
in the query need to be re-spelled.  

The solr spellchecker really does need a hybrid option that will both correct 
the obviously misspelled words and also try some more popular alternates.  It 
could then try different combinations, creating collation queries and testing 
them against the index prior to returning them.  SOLR-2010 (included in 3.1) 
got us part of the way there but there is still more work to do.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: roySolr [mailto:royrutten1...@gmail.com] 
Sent: Friday, May 27, 2011 3:09 AM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck: Two dictionaries

That uber dictionary is not what i want. I get also suggestions form the
where in the what. An example:

what                 where
chelsea              London
Soccerclub Bondon    London

When i type soccerclub london i want the suggestion from the what
dictionary. Did you mean Soccerclub Bondon. With the uber dictionary i
don't get this suggestion because it is spelled correctly.(based on the
where)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-Two-dictionaries-tp2931458p2992093.html
Sent from the Solr - User mailing list archive at Nabble.com.


Nested grouping/field collapsing

2011-05-27 Thread Martijn Laarman
Hi,

I was wondering if this issue had already been raised.

We currently have a use case where nested field collapsing would be really
helpful

I.e Collapse on field X then Collapse on Field Y within the groups returned
by field X

The current behavior of specifying multiple fields seems to be returning
multiple result sets.

Has this already been feature requested ? Does anybody know of a workaround
?

Many thanks,

Martijn


Re: Nested grouping/field collapsing

2011-05-27 Thread Bill Bell
Did you try pivot?

Bill Bell
Sent from mobile


On May 27, 2011, at 4:13 AM, Martijn Laarman mpdre...@gmail.com wrote:

 Hi,
 
 I was wondering if this issue had already been raised.
 
 We currently have a use case where nested field collapsing would be really
 helpful
 
 I.e Collapse on field X then Collapse on Field Y within the groups returned
 by field X
 
 The current behavior of specifying multiple fields seem to be returning
 mutiple result sets.
 
 Has this already been feature requested ? Does anybody know of a workaround
 ?
 
 Many thanks,
 
 Martijn


Re: what is the need of setting autocommit in solrconfig.xml

2011-05-27 Thread Yury Kats
On 5/27/2011 6:48 AM, Romi wrote:
 What is the benefit of setting autocommit in solrconfig.xml?
 I read somewhere that these settings control how often pending updates will
 be automatically pushed to the index.
 Does it mean that if the Solr server is running it automatically starts the
 indexing process if it finds any updates in the database???

No, it means it automatically commits recently added documents to the index
so that they become searchable.
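
For reference, the setting lives in the <updateHandler> section of
solrconfig.xml; a minimal sketch (the values are only examples):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit once this many docs are pending -->
    <maxTime>60000</maxTime>  <!-- or once the oldest pending doc is this many ms old -->
  </autoCommit>
</updateHandler>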


Re: Edgengram

2011-05-27 Thread Brian Lamb
For this, I ended up just changing it to string and using abcdefg* to
match. That seems to work so far.

Thanks,

Brian Lamb

On Wed, May 25, 2011 at 4:53 PM, Brian Lamb
brian.l...@journalexperts.comwrote:

 Hi all,

 I'm running into some confusion with the way edgengram works. I have the
 field set up as:

 <fieldType name="edgengram" class="solr.TextField"
  positionIncrementGap="1000">
    <analyzer>
      <tokenizer class="solr.LowerCaseTokenizerFactory" />
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
       maxGramSize="100" side="front" />
    </analyzer>
 </fieldType>

 I've also set up my own similarity class that returns 1 as the idf score.
 What I've found this does is if I match a string abcdefg against a field
 containing abcdefghijklmnop, then the idf will score that as a 7:

 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2)

 I get why that's happening, but is there a way to avoid that? Do I need to
 do a new field type to achieve the desired affect?

 Thanks,

 Brian Lamb



Re: Similarity per field

2011-05-27 Thread Brian Lamb
I'm still not having any luck with this. Has anyone actually gotten this to
work so far? I feel like I've followed the directions to the letter but it
just doesn't work.

Thanks,

Brian Lamb

On Wed, May 25, 2011 at 2:48 PM, Brian Lamb
brian.l...@journalexperts.comwrote:

 I looked at the patch page and saw the files that were changed. I went into
 my install and looked at those same files and found that they had indeed
 been changed. So it looks like I have the correct version of solr.


 On Wed, May 25, 2011 at 1:01 PM, Brian Lamb brian.l...@journalexperts.com
  wrote:

 Hi all,

 I sent a mail in about this topic a week ago but now that I have more
 information about what I am doing, as well as a better understanding of how
 the similarity class works, I wanted to start a new thread with a bit more
 information about what I'm doing, what I want to do, and how I can make it
 work correctly.

 I have written a similarity class that I would like applied to a specific
 field.

 This is how I am defining the fieldType:

  <fieldType name="edgengram_cust" class="solr.TextField"
   positionIncrementGap="1000">
     <analyzer>
       <tokenizer class="solr.LowerCaseTokenizerFactory" />
       <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
        maxGramSize="1" side="front" />
     </analyzer>
     <similarity class="my.package.similarity.MySimilarity"/>
  </fieldType>

 And then I assign a specific field to that fieldType:

  <field name="myfield" multiValued="true" type="edgengram_cust"
   indexed="true" stored="true" required="false" omitNorms="true" />

 Then, I restarted solr and did a fullimport. However, the changes I have
 made do not appear to be taking hold. For simplicity, right now I just have
 the idf function returning 1. When I do a search with debugQuery=on, the idf
 behaves as it normally does. However, when I search on this field, the idf
 should be 1 and that is not the case.

 To try and nail down where the problem occurs, I commented out the
 similarity class definition in the fieldType and added it globally to the
 schema file:

  <similarity class="my.package.similarity.MySimilarity"/>

 Then, I restarted solr and did a fullimport. This time, the idf scores
 were all 1. So it seems to me the problem is not with my similarity class
 but in trying to apply it to a specific fieldType.

 According to https://issues.apache.org/jira/browse/SOLR-2338, this should
 be in the trunk now yes? I have run svn up on both my lucene and solr
 installs and it still is not recognizing it on a per field basis.

 Is the tag different inside a fieldType? Did I not update solr correctly?
 Where is my mistake?

 Thanks,

 Brian Lamb





Re: Nested grouping/field collapsing

2011-05-27 Thread Michael McCandless
Can you open a Lucene issue (against the new grouping module) for
this?

I think this is a compelling use case that we should try to support.

In theory, with the general two-pass grouping collector, this should
be possible, but will require three passes, and we also must
generalize the 2nd pass collector to accept arbitrary collectors for
each group (today it's hardwired to sort-by-SortField collectors).

I suspect coupling the single-pass grouping collector (currently still
a patch on LUCENE-3129) with the two-pass collector could also work.

Also, can you describe more details about the two fields you want to
group/collapse by?

Mike

http://blog.mikemccandless.com

On Fri, May 27, 2011 at 6:13 AM, Martijn Laarman mpdre...@gmail.com wrote:
 Hi,

 I was wondering if this issue had already been raised.

 We currently have a use case where nested field collapsing would be really
 helpful

 I.e Collapse on field X then Collapse on Field Y within the groups returned
 by field X

 The current behavior of specifying multiple fields seem to be returning
 mutiple result sets.

 Has this already been feature requested ? Does anybody know of a workaround
 ?

 Many thanks,

 Martijn



Result Grouping always returns grouped output

2011-05-27 Thread kare...@gmail.com
Hello,

I am using the latest nightly build of Solr 4.0 and I would like to
use grouping/field collapsing while maintaining compatibility with my
current parser.  I am using the regular webinterface to test it, the
same commands like in the wiki, just with the field names matching my
dataset.

Grouping itself works, group=true and group.field return the expected
results, but neither group.main=true or group.format=simple seem to
change anything.

Do I have to include something special in solrconfig.xml or
schema.xml to make the simple output work?

Thanks for any hints,
K


Re: Pivot with Stats (or Stats with Pivot)

2011-05-27 Thread eduardo
Nobody?

Please, help





edua...@calandra.com.br 
17/05/2011 16:13
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
Pivot with Stats (or Stats with Pivot)






Hi All,

  Is it possible to get stats (like Stats Component: min ,max, sum, count, 

missing, sumOfSquares, mean and stddev) from numeric fields inside 
hierarchical facets (with more than one level, like Pivot)?

  I would like to query: 
...?q=*:*&version=2.2&start=0&rows=0&stats=true&stats.field=numeric_field1&stats.field=numeric_field2&stats.pivot=field_x,field_y,field_z
  and get min, max, sum, count, etc. from numeric_field1 and 
numeric_field2 from all combinations of field_x, field_y and field_z 
(hierarchical values).


  Using stats.facet I get just one field at one level and using 
facet.pivot I get just counts, but no stats.

  Looping in the client application to do all combinations of facet values
will be too slow because there are a lot of combinations.


  Thanks a lot!



Re: copyField of dates unworking?

2011-05-27 Thread Jack Repenning

On May 27, 2011, at 1:04 AM, Ahmet Arslan wrote:

 The letter f should be capital

Hah! Well-spotted! Thanks.

-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep













problem getting Solr to commit

2011-05-27 Thread David Hill

We verified with the Fiddler proxy server that when we use the Java
CommonsHttpSolrServer to communicate with our Solr server we are not able to
get the client to post a <commit/> message back to Solr. The result is that we
can't force the tail end of a batch job to commit after it has run, and we can't
do integration testing that needs data to have been committed to Solr.

We have tried all variations of the .commit method and the workaround posted at
http://www.mail-archive.com/solr-dev@lucene.apache.org/msg12289.html

Our solution was to hack the source code supplied by Solr for SimplePostTool to
create a utility to post a <commit/> tag to Solr.

http://stackoverflow.com/questions/6141417/solr-lucene-server-does-not-post-commit-message
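
For reference, the same <commit/> can also be pushed with any plain HTTP client
against the default /update handler, e.g. (a sketch, assuming a local Solr on
port 8983):

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>'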





 



Re: very slow commits and overlapping commits

2011-05-27 Thread Bill Au
I managed to get a thread dump during a slow commit:

resin-tcp-connection-*:5062-129 Id=12721 in RUNNABLE total cpu
time=391530.ms user time=390620.ms
at java.lang.String.intern(Native Method)
at
org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74)
at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
at
org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:691)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:667)
at
org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:956)
at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:5207)
at
org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4370)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4209)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4200)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2195)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2158)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2122)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:230)
at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:70)
at
com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:173)
at
com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:229)
at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:274)
at com.caucho.server.port.TcpConnection.run(TcpConnection.java:511)
at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:520)
at com.caucho.util.ThreadPool.run(ThreadPool.java:442)
at java.lang.Thread.run(Thread.java:619)

It looks like Lucene's StringHelper is hardcoding the max size of the hash
table of SimpleStringInterner to 1024 and I might be hitting that limit,
causing an actual call to java.lang.String.intern().

I think I need to reduce the number of fields in my index.  Are there any other
things I can do to help in this case?

Bill

On Wed, May 25, 2011 at 11:28 AM, Bill Au bill.w...@gmail.com wrote:

 I am taking a snapshot after every commit.  From looking at the snapshots,
 it does not look like the delay in caused by segments merging because I am
 not seeing any large new segments after a commit.

 I still can't figure out why there is a 2 minutes gap between start
 commit and SolrDelectionPolicy.onCommit.  Will changing the deletion
 policy make any difference?  I am using the default deletion policy now.

 Bill

 2011/5/21 Erick Erickson erickerick...@gmail.com

 Well, committing less often is a possibility <g>. Here's what's probably
 happening. When you pass certain thresholds, segments are merged, which can
 take quite some time.  How are you triggering commits? If it's external,
 think about using autocommit instead.

 Best
 Erick
 On May 20, 2011 6:04 PM, Bill Au bill.w...@gmail.com wrote:
  On my Solr 1.4.1 master I am doing commits regularly at a fixed
 interval.
 I
  noticed that from time to time commit will take longer than the commit
  interval, causing commits to overlap. Then things will get worse as
 commit
  will take longer and longer. Here is the logs for a long commit:
 
 
  [2011-05-18 23:47:30.071] start
 

 commit(optimize=false,waitFlush=false,waitSearcher=false,expungeDeletes=false)
  [2011-05-18 23:49:48.119] SolrDeletionPolicy.onCommit: commits:num=2
  [2011-05-18 23:49:48.119]
 

 commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpa,version=1247782702272,generation=249742,filenames=[_4dqu_2g.del,
  _4e66.tis, _4e3r.tis, _4e59.nrm, _4e68_1.del, _4e4n.prx, _4e4n.fnm,
  _4e67.fnm, _4e3r.frq, _4e3r.tii, _4e6d.fnm, _4e6c.prx, _4e68.fdx,
 _4e68.nrm,
  _4e6a.frq, _4e68.fdt, _4dqu.fnm, _4e4n.tii, _4e69.fdx, _4e69.fdt,
 _4e0e.nrm,
  _4e4n.tis, _4e6e.fnm, _4e3r.prx, _4e66.fnm, _4e3r.nrm, _4e0e.prx,
 

Re: Nested grouping/field collapsing

2011-05-27 Thread Martijn Laarman
Thanks Mike,

I've opened https://issues.apache.org/jira/browse/SOLR-2553 for this.

It's exciting to hear a workable implementation might be possible!

On Fri, May 27, 2011 at 6:23 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Can you open a Lucene issue (against the new grouping module) for
 this?

 I think this is a compelling use case that we should try to support.

 In theory, with the general two-pass grouping collector, this should
 be possible, but will require three passes, and we also must
 generalize the 2nd pass collector to accept arbitrary collectors for
 each group (today it's hardwired to sort-by-SortField collectors).

 I suspect coupling the single-pass grouping collector (currently still
 a patch on LUCENE-3129) with the two-pass collector could also work.

 Also, can you describe more details about the two fields you want to
 group/collapse by?

 Mike

 http://blog.mikemccandless.com

 On Fri, May 27, 2011 at 6:13 AM, Martijn Laarman mpdre...@gmail.com
 wrote:
  Hi,
 
  I was wondering if this issue had already been raised.
 
  We currently have a use case where nested field collapsing would be
 really
  helpful
 
  I.e Collapse on field X then Collapse on Field Y within the groups
 returned
  by field X
 
  The current behavior of specifying multiple fields seem to be returning
  mutiple result sets.
 
  Has this already been feature requested ? Does anybody know of a
 workaround
  ?
 
  Many thanks,
 
  Martijn
 



Custom Scoring relying on another server.

2011-05-27 Thread arian487
I know this question has been asked before but I think my situation is a
little different.  Basically I need to do custom scores that the traditional
function queries simply won't allow me to do.  I actually need to hit
another server from Java (passing in a bunch of things mostly relying on how
to score result).  So I want to extend the current scorer and add in the
things I need it to do for the scoring (make a trip to the scoring server
with a bunch of parameters, and come back with the scores).  

Can someone point me in the right direction for doing this?  Exactly where
does the document scoring happen in Solr?  Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Scoring-relying-on-another-server-tp2994546p2994546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellcheck Phrases

2011-05-27 Thread Tanner Postert
are there any updates on this? any third party apps that can make this work
as expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James james.d...@ingrambook.comwrote:

 Tanner,

 Currently Solr will only make suggestions for words that are not in the
 dictionary, unless you specifiy spellcheck.onlyMorePopular=true.  However,
 if you do that, then it will try to improve every word in your query, even
 the ones that are spelled correctly (so while it might change brake to
 break it might also change leg to log.)

 You might be able to alleviate some of the pain by setting the
 thresholdTokenFrequency so as to remove misspelled and rarely-used words
 from your dictionary, although I personally haven't been able to get this
 parameter to work.  It also doesn't seem to be documented on the wiki but it
 is in the 1.4.1. source code, in class IndexBasedSpellChecker.  Its also
 mentioned in Smiley & Pugh's book.  I tried setting it like this, but got a
 ClassCastException on the float value:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">text_spelling</str>
   <lst name="spellchecker">
    <str name="name">spellchecker</str>
    <str name="field">Spelling_Dictionary</str>
    <str name="fieldType">text_spelling</str>
    <str name="buildOnOptimize">true</str>
    <str name="thresholdTokenFrequency">.001</str>
   </lst>
 </searchComponent>

 I have it on my to-do list to look into this further but haven't yet.  If
 you decide to try it and can get it to work, please let me know how you do
 it.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311

 -Original Message-
 From: Tanner Postert [mailto:tanner.post...@gmail.com]
 Sent: Wednesday, February 23, 2011 12:53 PM
 To: solr-user@lucene.apache.org
 Subject: Spellcheck Phrases

 right now when I search for 'brake a leg', solr returns valid results with
 no indication of misspelling, which is understandable since all of those
 terms are valid words and are probably found in a few pieces of our
 content.
 My question is:

 is there any way for it to recognize that the phrase should be "break a leg"
 and not "brake a leg" and suggest the proper phrase?



Re: K-Stemmer for Solr 3.1

2011-05-27 Thread Mark

Where can one find the KStemmer source for 4.0?

On 5/12/11 11:28 PM, Bernd Fehling wrote:
I backported a Lucid KStemmer version from solr 4.0 which I found 
somewhere.

Just changed from
import org.apache.lucene.analysis.util.CharArraySet;  // solr4.0
to
import org.apache.lucene.analysis.CharArraySet;  // solr3.1

Bernd


Am 12.05.2011 16:32, schrieb Mark:
java.lang.AbstractMethodError: 
org.apache.lucene.analysis.TokenStream.incrementToken()Z


Would you mind explaining your modifications? Thanks

On 5/11/11 11:14 PM, Bernd Fehling wrote:


Am 12.05.2011 02:05, schrieb Mark:
It appears that the older version of the Lucid Works KStemmer is 
incompatible with Solr 3.1. Has anyone been able to get this to 
work? If not,

what are you using as an alternative?

Thanks


Lucid KStemmer works nice with Solr3.1 after some minor mods to
KStemFilter.java and KStemFilterFactory.java.
What problems do you have?

Bernd


LucidWorks source

2011-05-27 Thread Mark
Is LucidWorks source no longer available? In earlier versions their 
source code was available but after the latest install I can not seem to 
find it?


RE: solr Invalid Date in Date Math String/Invalid Date String

2011-05-27 Thread Ellery Leung
Thank you Mike.

So I understand that now.  But what about the other items that have values
on both sides?  They don't work at all.


-Original Message-
From: Mike Sokolov [mailto:soko...@ifactory.com] 
Sent: 27 May 2011, 10:23 PM
To: solr-user@lucene.apache.org
Cc: alucard001
Subject: Re: solr Invalid Date in Date Math String/Invalid Date String

The * endpoint for range terms wasn't implemented yet in 1.4.1  As a 
workaround, we use very large and very small values.

-Mike

On 05/27/2011 12:55 AM, alucard001 wrote:
 Hi all

 I am using SOLR 1.4.1 (according to solr info), but no matter what date
 field I use (date or tdate) defined in default schema.xml, I cannot do a
 search in solr-admin analysis.jsp:

 fieldtype: date(or tdate)
 fieldvalue(index): 2006-12-22T13:52:13Z (I type it in manually, no
trailing
 space)
 fieldvalue(query):

 The only success case:
 2006-12-22T13:52:13Z

 All search below are failed:
 * TO NOW
 [* TO NOW]

 2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z
 2006\-12\-22T00\:00\:00Z TO 2006\-12\-22T23\:59\:59Z
 [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z]
 [2006\-12\-22T00\:00\:00Z TO 2006\-12\-22T23\:59\:59Z]

 2006-12-22T00:00:00.000Z TO 2006-12-22T23:59:59.999Z
 2006\-12\-22T00\:00\:00\.000Z TO 2006\-12\-22T23\:59\:59\.999Z
 [2006-12-22T00:00:00.000Z TO 2006-12-22T23:59:59.999Z]
 [2006\-12\-22T00\:00\:00\.000Z TO 2006\-12\-22T23\:59\:59\.999Z]

 2006-12-22T00:00:00Z TO *
 2006\-12\-22T00\:00\:00Z TO *
 [2006-12-22T00:00:00Z TO *]
 [2006\-12\-22T00\:00\:00Z TO *]

 2006-12-22T00:00:00.000Z TO *
 2006\-12\-22T00\:00\:00\.000Z TO *
 [2006-12-22T00:00:00.000Z TO *]
 [2006\-12\-22T00\:00\:00\.000Z TO *]
 (vice versa)

 I get either:
 Invalid Date in Date Math String or
 Invalid Date String
 error

 What's wrong with it?  Can anyone please help me on that?

 Thank you.

 --
 View this message in context:
http://lucene.472066.n3.nabble.com/solr-Invalid-Date-in-Date-Math-String-Invalid-Date-String-tp2991763p2991763.html
 Sent from the Solr - User mailing list archive at Nabble.com.