Re: Removing old documents

2012-05-01 Thread Paul Libbrecht
With which client?

paul


On 2 May 2012 at 01:29, alx...@aim.com wrote:

> all caching is disabled and I restarted jetty. The same results.



Re: Lucene FieldCache - Out of memory exception

2012-05-01 Thread Rahul R
Here is one sample query that I picked up from the log file :

q=*%3A*&fq=Category%3A%223__107%22&fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S_C1503120369&facet.field=S_P1406389942&facet.field=S_P1430116878&facet.field=S_P1430116881&facet.field=S_P1406453552&facet.field=S_P1406451296&facet.field=S_P1406452465&facet.field=S_C2968809156&facet.field=S_P1406389980&facet.field=S_P1540477699&facet.field=S_P1406389982&facet.field=S_P1406389984&facet.field=S_P1406451284&facet.field=S_P1406389926&facet.field=S_P1424886581&facet.field=S_P2017662632&facet.field=F_P1946367021&facet.field=S_P1430116884&facet.field=S_P2017662620&facet.field=F_P1406451304&facet.field=F_P1406451306&facet.field=F_P1406451308&facet.field=S_P1500901421&facet.field=S_P1507138990&facet.field=I_P1406452433&facet.field=I_P1406453565&facet.field=I_P1406452463&facet.field=I_P1406453573&facet.field=I_P1406451324&facet.field=I_P1406451288&facet.field=S_P1406451282&facet.field=S_P1406452471&facet.field=S_P1424886605&facet.field=S_P1946367015&facet.field=S_P1424886598&facet.field=S_P1946367018&facet.field=S_P1406453556&facet.field=S_P1406389932&facet.field=S_P2017662623&facet.field=S_P1406450978&facet.field=F_P1406452455&facet.field=S_P1406389972&facet.field=S_P1406389974&facet.field=S_P1406389986&facet.field=F_P1946367027&facet.field=F_P1406451294&facet.field=F_P1406451286&facet.field=F_P1406451328&facet.field=S_P1424886593&facet.field=S_P1406453567&facet.field=S_P2017662629&facet.field=S_P1406453571&facet.field=F_P1946367030&facet.field=S_P1406453569&facet.field=S_P2017662626&facet.field=S_P1406389978&facet.field=F_P1946367024

My primary question here is: can Solr handle this kind of query, with so
many facet fields? I have tried using both enum and fc for facet.method and
there is no improvement with either.

Appreciate any help on this. Thank you.

- Rahul


On Mon, Apr 30, 2012 at 2:53 PM, Rahul R  wrote:

> Hello,
> I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application
> server on Solaris. I use embedded solr server. More details :
> Number of docs in solr index : 1.4 million
> Physical size of index : 640MB
> Total number of fields in the index : 700 (99% of these are dynamic fields)
> Total number of fields enabled for faceting : 440
> Avg number of facet fields participating in a faceted query : 50-70
> Total RAM allocated to weblogic appserver : 3GB (max possible)
>
> In a multi user environment with 3 users using this application for a
> period of around 40 minutes, the application runs out of memory. Analysis
> of the heap dump shows that almost 85% of the memory is retained by the
> FieldCache. Now I understand that the field cache is out of our control but
> would appreciate some suggestions on how to handle this issue.
>
> Some questions on this front :
> - some mail threads on this forum seem to indicate that there could be
> some connection between having dynamic fields and usage of FieldCache. Is
> this true ? Most of the fields in my index are dynamic fields.
> - as mentioned above, most of my faceted queries could have around 50-70
> facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields
> per query). Could this be the source of the problem ? Is this too high for
> solr to support ?
> - Initially, I had a facet.sort defined in solrconfig.xml. Since
> FieldCache builds up on sorting, I even removed the facet.sort and tried,
> but no respite. The behavior is the same as before.
> - The document id that I have for each document is quite big (around 50
> characters on average). Can this be a problem ? I reduced this to around 15
> characters and tried but still there is no improvement.
> - Can the size of the data be a problem ? But on this forum, I see many
> users talking of more than 100 million documents in their index. I have
> only 1.4 million with physical size of 640MB. The physical server on which
> this application is running, has sufficient RAM and CPU.
> - What gets stored in the FieldCache ? Is it the entire document or just
> the document Id ?
>
>
> Any help is much appreciated. Thank you.
>
> regards
> Rahul
>
>
>


Re: Error with distributed search and Suggester component (Solr 3.4)

2012-05-01 Thread Robert Muir
On Tue, May 1, 2012 at 6:48 PM, Ken Krugler  wrote:
> Hi list,
>
> Does anybody know if the Suggester component is designed to work with shards?
>

I'm not really sure it is? They would probably have to override the
default merge implementation specified by SpellChecker.

But, all of the current suggesters pump out over 100,000 QPS on my
machine, so I'm wondering what the usefulness of this is?

And if it was useful, merging results from different machines is
pretty inefficient, for suggest you would shard by term instead so
that you need only contact a single host?
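(For illustration only, a client-side sketch of that term-based routing idea; nothing like this exists in Solr, and the class below is hypothetical:)

// Hypothetical router: if each suggestion term lives on exactly one shard,
// chosen by its leading character, then a prefix lookup needs only one host.
public class SuggestShardRouter {
    private final String[] shardUrls;

    public SuggestShardRouter(String[] shardUrls) {
        this.shardUrls = shardUrls;
    }

    public String shardForPrefix(String prefix) {
        // route on the first character so all terms sharing it co-locate;
        // assumes a non-empty prefix
        int bucket = Math.abs(Character.toLowerCase(prefix.charAt(0))) % shardUrls.length;
        return shardUrls[bucket];
    }
}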


-- 
lucidimagination.com


Looking for a way to separate MySQL query from DIH data-config.xml

2012-05-01 Thread Peter Boudreau
Hello everyone,

I have a working DIH setup with a couple of long and complicated MySQL queries 
in data-config.xml. To make it easier/safer for myself and other developers in 
my company to edit the MySQL query, I’d like to remove it from data-config.xml 
and store it in a separate file, and then call to that from data-config.xml.

Is there anyone who’s doing this right now and could share what
method was used to accomplish it?

At some point on this list I saw someone mention that they had done just what 
I’m trying to do by putting the query in a separate SQL file as a MySQL stored 
procedure, and then calling that procedure from the query="" portion of 
data-config.xml, but I don’t quite understand how/at what point that SQL file 
with the stored procedure would be read by DIH.

Does anyone know how this would be done, or have any other suggestions for how 
to move the query into a separate document?
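For what it's worth, a sketch of the stored-procedure approach, with a made-up procedure name; I can't vouch that every MySQL driver configuration returns result sets from CALL through DIH's JDBC data source:

First, in MySQL (kept in its own .sql file, applied once):

    CREATE PROCEDURE get_solr_docs()
    BEGIN
      SELECT id, title, body FROM documents;
    END

Then data-config.xml only carries the call:

    <entity name="doc" query="CALL get_solr_docs()">
      ...
    </entity>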

Thanks in advance,
Peter

Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?

2012-05-01 Thread Benson Margulies
Yes, I'm the author of that JIRA.

On Tue, May 1, 2012 at 8:45 PM, Ryan McKinley  wrote:
> check a release since r1332752
>
> If things still look problematic, post a comment on:
> https://issues.apache.org/jira/browse/SOLR-3426
>
> this should now have a less verbose message with an older SLF4j and with Log4j
>
>
> On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa  wrote:
>> I have a similar issue using log4j for logging with a trunk build; the
>> CoreContainer class prints a big stack trace on our jboss 4.2.2 startup. I am
>> using slf4j 1.5.2
>>
>> 10:07:45,918 WARN  [CoreContainer] Unable to read SLF4J version
>> java.lang.NoSuchMethodError:
>> org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
>> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395)
>> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355)
>> at
>> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)
>>
>>
>>
>> On Tue, May 1, 2012 at 9:25 AM, Benson Margulies 
>> wrote:
>>
>>> On Tue, May 1, 2012 at 12:16 PM, Mark Miller 
>>> wrote:
>>> > There is a recent JIRA issue about keeping the last n logs to display in
>>> the admin UI.
>>> >
>>> > That introduced a problem - and then the fix introduced a problem - and
>>> then the fix mitigated the problem but left that ugly logging as a by
>>> product.
>>> >
>>> > Don't remember the issue # offhand. I think there was a dispute about
>>> what should be done with it.
>>> >
>>> > On May 1, 2012, at 11:14 AM, Benson Margulies wrote:
>>> >
>>> >> CoreContainer.java, in the method 'load', finds itself calling
>>> >> loader.newInstance with an 'fname' of 'Log4j' if the slf4j backend is
>>> >> 'Log4j'.
>>>
>>> Couldn't someone just fix the if statement to say, 'OK, if we're doing
>>> log4j, we have no log watcher' and skip all the loud failing on the
>>> way?
>>>
>>>
>>>
>>> >>
>>> >> e.g.:
>>> >>
>>> >> 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer  - Unable
>>> >> to load LogWatcher
>>> >> org.apache.solr.common.SolrException: Error loading class 'Log4j'
>>> >>
>>> >> What is it actually looking for? Have I misplaced something?
>>> >
>>> > - Mark Miller
>>> > lucidimagination.com
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>


Re: Ampersand issue

2012-05-01 Thread Ryan McKinley
If your json value is & the proper xml value is &amp;.

What is the value you are setting on the stored field?  Is it & or &amp;?
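For example, if the stored value is the five characters a & b, then (field name made up):

    wt=XML must escape it:    <str name="body">a &amp; b</str>
    wt=JSON can emit it raw:  "body":"a & b"

The escaping itself is required by the XML spec, so it isn't something Solr can skip in a well-formed XML response.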


On Mon, Apr 30, 2012 at 12:57 PM, William Bell  wrote:
> One idea was to wrap the field with CDATA. Or base64 encode it.
>
>
>
> On Fri, Apr 27, 2012 at 7:50 PM, Bill Bell  wrote:
>> We are indexing a simple XML field from SQL Server into Solr as a stored
>> field. We have noticed that the & is output as &amp; when using
>> wt=XML. When using wt=JSON we get the normal &. Is there a way to
>> indicate that we don't want to encode the field, since it is already XML, when
>> using wt=XML?
>>
>> Bill Bell
>> Sent from mobile
>>
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?

2012-05-01 Thread Ryan McKinley
check a release since r1332752

If things still look problematic, post a comment on:
https://issues.apache.org/jira/browse/SOLR-3426

this should now have a less verbose message with an older SLF4j and with Log4j


On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa  wrote:
> I have a similar issue using log4j for logging with a trunk build; the
> CoreContainer class prints a big stack trace on our jboss 4.2.2 startup. I am
> using slf4j 1.5.2
>
> 10:07:45,918 WARN  [CoreContainer] Unable to read SLF4J version
> java.lang.NoSuchMethodError:
> org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355)
> at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)
>
>
>
> On Tue, May 1, 2012 at 9:25 AM, Benson Margulies wrote:
>
>> On Tue, May 1, 2012 at 12:16 PM, Mark Miller 
>> wrote:
>> > There is a recent JIRA issue about keeping the last n logs to display in
>> the admin UI.
>> >
>> > That introduced a problem - and then the fix introduced a problem - and
>> then the fix mitigated the problem but left that ugly logging as a by
>> product.
>> >
>> > Don't remember the issue # offhand. I think there was a dispute about
>> what should be done with it.
>> >
>> > On May 1, 2012, at 11:14 AM, Benson Margulies wrote:
>> >
>> >> CoreContainer.java, in the method 'load', finds itself calling
>> >> loader.newInstance with an 'fname' of 'Log4j' if the slf4j backend is
>> >> 'Log4j'.
>>
>> Couldn't someone just fix the if statement to say, 'OK, if we're doing
>> log4j, we have no log watcher' and skip all the loud failing on the
>> way?
>>
>>
>>
>> >>
>> >> e.g.:
>> >>
>> >> 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer  - Unable
>> >> to load LogWatcher
>> >> org.apache.solr.common.SolrException: Error loading class 'Log4j'
>> >>
>> >> What is it actually looking for? Have I misplaced something?
>> >
>> > - Mark Miller
>> > lucidimagination.com
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>


Re: Boosting documents based on search term/phrase

2012-05-01 Thread Donald Organ
Perfect, this is working well.

On Tue, May 1, 2012 at 5:33 PM, Jeevanandam  wrote:

> Yes, you can add it in the last-components section of the default query handler.
>
> <arr name="last-components">
>   <str>elevator</str>
> </arr>
>
> - Jeevanandam
>
>
> On 02-05-2012 3:53 am, Donald Organ wrote:
>
>> query elevation was exactly what I was talking about.
>>
>> Now is there a way to add this to the default query handler?
>>
>> On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky wrote:
>>
>>  Do you mean besides "query elevation"?
>>>
>>>
>>> http://wiki.apache.org/solr/QueryElevationComponent
>>>
>>> And besides explicit boosting by the user (the "^" suffix operator after
>>> a
>>> term/phrase)?
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Donald Organ
>>> Sent: Tuesday, May 01, 2012 3:59 PM
>>> To: solr-user
>>> Subject: Boosting documents based on search term/phrase
>>>
>>> Is there a way to boost documents based on the search term/phrase?
>>>
>>>
>


Re: NPE when faceting

2012-05-01 Thread Jamie Johnson
I don't have any more details than I provided here, but I created a
ticket with this information.  Thanks again

https://issues.apache.org/jira/browse/SOLR-3427

On Tue, May 1, 2012 at 5:20 PM, Yonik Seeley  wrote:
> Darn... looks likely that it's another bug from when part of
> UnInvertedField was refactored into Lucene.
> We really need some random tests that can catch bugs like these though
> - I'll see if I can reproduce.
>
> Can you open a JIRA issue for this?
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10
>
>
> On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson  wrote:
>> I had reported this issue a while back, hoping that it was something
>> with my environment, but that doesn't seem to be the case.  I am
>> getting the following stack trace on certain facet queries.
>> Previously when I did an optimize the error went away, does anyone
>> have any insight into why specifically this could be happening?
>>
>> May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log
>> SEVERE: java.lang.NullPointerException
>>        at 
>> org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807)
>>        at 
>> org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636)
>>        at 
>> org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411)
>>        at 
>> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300)
>>        at 
>> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396)
>>        at 
>> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)
>>        at 
>> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81)
>>        at 
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
>>        at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>>        at 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>>        at 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>>        at 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>>        at 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>>        at 
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>>        at 
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>>        at 
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>>        at 
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>>        at 
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
>>        at 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>>        at 
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
>>        at 
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
>>        at 
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>>        at org.eclipse.jetty.server.Server.handle(Server.java:351)
>>        at 
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
>>        at 
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
>>        at 
>> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
>>        at 
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
>>        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
>>        at 
>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>>        at 
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
>>        at 
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>>        at 
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>>        at 
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>>        at java.lang.Thread.run(Thread.java:662)


Re: Boosting documents based on search term/phrase

2012-05-01 Thread Otis Gospodnetic
Hi,

Can you please give an example of what you mean?

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



>
> From: Donald Organ 
>To: solr-user  
>Sent: Tuesday, May 1, 2012 3:59 PM
>Subject: Boosting documents based on search term/phrase
> 
>Is there a way to boost documents based on the search term/phrase?
>
>
>

Re: get latest 50 documents the fastest way

2012-05-01 Thread Li Li
you should reverse your sort algorithm. Maybe you can override the tf
method of Similarity and return -1.0f * tf() (I don't know whether the
default collector allows scores smaller than zero).
Or you can hack this by adding a large number, or write your own
collector; in its collect(int doc) method, you can do something like this:

    public void collect(int doc) {
        float score = scorer.score();
        score *= -1.0f;
        // ... then collect (doc, score) as usual
    }

If you don't sort by relevance score, just set a Sort.
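For example, if the index has an indexed timestamp field (the field name below is made up), plain field sorting avoids relevance scoring entirely:

    q=<your query>&sort=timestamp desc&rows=50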

On Tue, May 1, 2012 at 10:38 PM, Yuval Dotan  wrote:
> Hi Guys
> We have a use case where we need to get the 50 *latest* documents that
> match my query - without additional ranking, sorting, etc. on the results.
> My index contains 1,000,000,000 documents and I noticed that if the number
> of found documents is very big (larger than 50% of the index size -
> 500,000,000 docs) then it takes more than 5 seconds to get the results even
> with the rows=50 parameter.
> Is there a way to get the results faster?
> Thanks
> Yuval


Re: Removing old documents

2012-05-01 Thread alxsss

 

 all caching is disabled and I restarted jetty. The same results.

Thanks.
Alex.

 

-Original Message-
From: Lance Norskog 
To: solr-user 
Sent: Tue, May 1, 2012 2:57 pm
Subject: Re: Removing old documents


Maybe this is the HTTP caching feature? Solr comes with HTTP caching
turned on by default, so when you query again after changes, your
browser does not fetch your changed documents.

On Tue, May 1, 2012 at 11:53 AM,   wrote:
> Hello,
>
> I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/
>
> without and with -noCommit  and restarted solr server
>
> Log  shows that 5 documents were removed but they are still in the search 
results.
Is this a bug, or is something missing?
> I use nutch-1.4 and solr 3.5
>
> Thanks.
> Alex.
>
>
>
>
>
>
>
> -Original Message-
> From: Markus Jelsma 
> To: solr-user 
> Sent: Tue, May 1, 2012 7:41 am
> Subject: Re: Removing old documents
>
>
Nutch 1.4 has a separate tool to remove 404 and redirect documents from your
> index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents
> in one run based on segment data.
>
> On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
>> I'm running Nutch, so it's updating the documents, but I'm wanting to
>> remove ones that are no longer available.  So in that case, there's no
>> update possible.
>>
>> On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
>>
>> mav.p...@holidaylettings.co.uk> wrote:
>> > Not sure if there is an automatic way but we do it via a delete query and
>> > where possible we update doc under same id to avoid deletes.
>> >
>> > On 01/05/2012 13:43, "Bai Shen"  wrote:
>> > >What is the best method to remove old documents?  Things that now
>> > >generate 404 errors, etc.
>> > >
>> > >Is there an automatic method or do I have to do it manually?
>> > >
>> > >THanks.
>
> --
> Markus Jelsma - CTO - Openindex
>
>



-- 
Lance Norskog
goks...@gmail.com

 


Re: How to integrate sen and lucene-ja in SOLR 3.x

2012-05-01 Thread Koji Sekiguchi

(12/05/02 1:47), Shanmugavel SRD wrote:

Hi,
   Can anyone help me on how to integrate sen and lucene-ja.jar in SOLR 3.4
or 3.5 or 3.6 version?


I think lucene-ja.jar no longer exists on the Internet, and it doesn't work with
Lucene/Solr 3.x because the interface doesn't match (lucene-ja doesn't know
about AttributeSource).

Use lucene-gosen, the descendant project of sen/lucene-ja, instead.

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/


Re: Error with distributed search and Suggester component (Solr 3.4)

2012-05-01 Thread Ken Krugler
I should have also included one more bit of information.

If I configure the top-level (sharding) request handler to use just the suggest 
component as such:

  <requestHandler name="..." class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="shards.qt">suggest-core</str>
      <str name="shards">localhost:8080/solr/core0/,localhost:8080/solr/core1/</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

Then I don't get an NPE, but I also get a response with no results.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">r</str>
    </lst>
  </lst>
</response>


For completeness, here are the other pieces to the solrconfig.xml puzzle:

  <requestHandler name="suggest-core" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest-one</str>
      <str name="spellcheck.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">suggest-one</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <str name="field">name</str>
      <float name="threshold">0.05</float>
      <str name="buildOnCommit">true</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">suggest-two</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <str name="field">content</str>
      <float name="threshold">0.0</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

Thanks,

-- Ken

On May 1, 2012, at 3:48pm, Ken Krugler wrote:

> Hi list,
> 
> Does anybody know if the Suggester component is designed to work with shards?
> 
> I'm asking because the documentation implies that it should (since 
> ...Suggester reuses much of the SpellCheckComponent infrastructure…, and the 
> SpellCheckComponent is documented as supporting a distributed setup).
> 
> But when I make a request, I get an exception:
> 
> java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:493)
>   at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:390)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
>   at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
>   at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Looking at the QueryComponent.java:493 code, I see:
> 
>SolrDocumentList docs = 
> (SolrDocumentList)srsp.getSolrResponse().getResponse().get("response");
> 
>// calculate global maxScore and numDocsFound
>    if (docs.getMaxScore() != null) {   // <-- This is line 493
> 
> So I'm assuming the "docs" variable is null, which would happen if there is 
> no "response" element in the Solr response.
> 
> If I make a direct request to the request handler in one core (e.g. 
> http://hostname:8080/solr/core0/select?qt=suggest-core&q=rad), the query 
> works.
> 
> But I see that there's no element named "response", unlike a regular query.
> 
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>   </lst>
>   <lst name="spellcheck">
>     <lst name="suggestions">
>       <lst name="rad">
>         <int name="numFound">10</int>
>         <int name="startOffset">0</int>
>         <int name="endOffset">3</int>
>         <arr name="suggestion">
>           <str>radair</str>
>           <str>radar</str>
>           ...
>         </arr>
>       </lst>
>     </lst>
>   </lst>
> </response>
> 
> 
> So I'm wondering if my configuration is just borked and this should work, or 
> the fact that the Suggester doesn't return a response field means that it 
> just doesn't work with shards.
> Thanks,
> -- Ken
> 
> http://about.me/kkrugler
> +1 530-210-6378
> 
> 
> 
> 
> 
> 
> --
> Ken Krugler
> http://www.scaleunlimited.com
> 

response codes from http update requests

2012-05-01 Thread Welty, Richard
Should I be concerned with the HTTP response codes from update requests?

I can't find documentation on what values come back from them anywhere
(although maybe I'm not looking hard enough). Are they just HTTP standard,
with 200 for success and 400/500 for failures?

thanks,
   richard


Error with distributed search and Suggester component (Solr 3.4)

2012-05-01 Thread Ken Krugler
Hi list,

Does anybody know if the Suggester component is designed to work with shards?

I'm asking because the documentation implies that it should (since ...Suggester 
reuses much of the SpellCheckComponent infrastructure…, and the 
SpellCheckComponent is documented as supporting a distributed setup).

But when I make a request, I get an exception:

java.lang.NullPointerException
at 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:493)
at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:390)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:132)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Looking at the QueryComponent.java:493 code, I see:

SolrDocumentList docs = 
(SolrDocumentList)srsp.getSolrResponse().getResponse().get("response");

// calculate global maxScore and numDocsFound
if (docs.getMaxScore() != null) {   // <-- This is line 493

So I'm assuming the "docs" variable is null, which would happen if there is no 
"response" element in the Solr response.

If I make a direct request to the request handler in one core (e.g. 
http://hostname:8080/solr/core0/select?qt=suggest-core&q=rad), the query works.

But I see that there's no element named "response", unlike a regular query.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="rad">
        <int name="numFound">10</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>radair</str>
          <str>radar</str>
          ...
        </arr>
      </lst>
    </lst>
  </lst>
</response>


So I'm wondering if my configuration is just borked and this should work, or 
the fact that the Suggester doesn't return a response field means that it just 
doesn't work with shards.
Thanks,
-- Ken

http://about.me/kkrugler
+1 530-210-6378






--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr






Re: Removing old documents

2012-05-01 Thread Paul Libbrecht
I've been surprised to see Firefox cache even after empty-cache was ordered, for
JSON results...
This is quite annoying, but I have gotten accustomed to it by doing the following
when I need to debug: add an extra random parameter. But only when debugging!

Using wget or curl showed me that the browser (and not Solr's caching) was guilty
of caching.
I think the "If-Modified-Since" header might be the culprit; it would still be sent even
after emptying the cache...
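Something along these lines (assuming the default example port and core; the parameter name is arbitrary and simply ignored by Solr):

    http://localhost:8983/solr/select?q=foo&nocache=12345    (in the browser, with a random extra param)
    curl 'http://localhost:8983/solr/select?q=foo'           (no browser cache involved at all)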

paul



On 1 May 2012 at 23:57, Lance Norskog wrote:

> Maybe this is the HTTP caching feature? Solr comes with HTTP caching
> turned on by default, so when you query again after changes, your
> browser does not fetch your changed documents.
> 
> On Tue, May 1, 2012 at 11:53 AM,   wrote:
>> Hello,
>> 
>> I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/
>> 
>> without and with -noCommit  and restarted solr server
>> 
>> Log  shows that 5 documents were removed but they are still in the search 
>> results.
>> Is this a bug, or is something missing?
>> I use nutch-1.4 and solr 3.5
>> 
>> Thanks.
>> Alex.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> -Original Message-
>> From: Markus Jelsma 
>> To: solr-user 
>> Sent: Tue, May 1, 2012 7:41 am
>> Subject: Re: Removing old documents
>> 
>> 
>> Nutch 1.4 has a separate tool to remove 404 and redirect documents from your
>> index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents
>> in one run based on segment data.
>> 
>> On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
>>> I'm running Nutch, so it's updating the documents, but I'm wanting to
>>> remove ones that are no longer available.  So in that case, there's no
>>> update possible.
>>> 
>>> On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
>>> 
>>> mav.p...@holidaylettings.co.uk> wrote:
 Not sure if there is an automatic way but we do it via a delete query and
 where possible we update doc under same id to avoid deletes.
 
 On 01/05/2012 13:43, "Bai Shen"  wrote:
> What is the best method to remove old documents?  Things that now
> generate 404 errors, etc.
> 
> Is there an automatic method or do I have to do it manually?
> 
> THanks.
>> 
>> --
>> Markus Jelsma - CTO - Openindex
>> 
>> 
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com



Re: Removing old documents

2012-05-01 Thread Lance Norskog
Maybe this is the HTTP caching feature? Solr comes with HTTP caching
turned on by default, so when you query again after changes, your
browser does not fetch your changed documents.
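If that is the cause, the knob is in solrconfig.xml; roughly like this (check the example solrconfig.xml of your version for the exact placement):

    <requestDispatcher handleSelect="true">
      <httpCaching never304="true"/>
    </requestDispatcher>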

On Tue, May 1, 2012 at 11:53 AM,   wrote:
> Hello,
>
> I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/
>
> without and with -noCommit  and restarted solr server
>
> Log  shows that 5 documents were removed but they are still in the search 
> results.
> Is this a bug, or is something missing?
> I use nutch-1.4 and solr 3.5
>
> Thanks.
> Alex.
>
>
>
>
>
>
>
> -Original Message-
> From: Markus Jelsma 
> To: solr-user 
> Sent: Tue, May 1, 2012 7:41 am
> Subject: Re: Removing old documents
>
>
> Nutch 1.4 has a separate tool to remove 404 and redirect documents from your
> index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents
> in one run based on segment data.
>
> On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
>> I'm running Nutch, so it's updating the documents, but I'm wanting to
>> remove ones that are no longer available.  So in that case, there's no
>> update possible.
>>
>> On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
>>
>> mav.p...@holidaylettings.co.uk> wrote:
>> > Not sure if there is an automatic way but we do it via a delete query and
>> > where possible we update doc under same id to avoid deletes.
>> >
>> > On 01/05/2012 13:43, "Bai Shen"  wrote:
>> > >What is the best method to remove old documents?  Things that now
>> > >generate 404 errors, etc.
>> > >
>> > >Is there an automatic method or do I have to do it manually?
>> > >
>> > >THanks.
>
> --
> Markus Jelsma - CTO - Openindex
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Boosting documents based on search term/phrase

2012-05-01 Thread Jack Krupansky

Here's some doc from Lucid:
http://lucidworks.lucidimagination.com/display/solr/The+Query+Elevation+Component
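For reference, the elevation rules themselves live in elevate.xml; a sketch along these lines (the ids here are made up):

    <elevate>
      <query text="foo">
        <doc id="1"/>                  <!-- always ranked first for "foo" -->
        <doc id="2" exclude="true"/>   <!-- never returned for "foo" -->
      </query>
    </elevate>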

-- Jack Krupansky

-Original Message- 
From: Donald Organ

Sent: Tuesday, May 01, 2012 5:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Boosting documents based on search term/phrase

query elevation was exactly what I was talking about.

Now is there a way to add this to the default query handler?

On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky 
wrote:



Do you mean besides "query elevation"?

http://wiki.apache.org/solr/QueryElevationComponent

And besides explicit boosting by the user (the "^" suffix operator after a
term/phrase)?

-- Jack Krupansky

-Original Message- From: Donald Organ
Sent: Tuesday, May 01, 2012 3:59 PM
To: solr-user
Subject: Boosting documents based on search term/phrase

Is there a way to boost documents based on the search term/phrase?





Re: Boosting documents based on search term/phrase

2012-05-01 Thread Jeevanandam

Yes, you can add it in the last-components section of the default query handler.

<arr name="last-components">
  <str>elevator</str>
</arr>

- Jeevanandam


On 02-05-2012 3:53 am, Donald Organ wrote:

query elevation was exactly what I was talking about.

Now is there a way to add this to the default query handler?

On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky
wrote:


Do you mean besides "query elevation"?


http://wiki.apache.org/solr/QueryElevationComponent

And besides explicit boosting by the user (the "^" suffix operator after a
term/phrase)?

-- Jack Krupansky

-Original Message- From: Donald Organ
Sent: Tuesday, May 01, 2012 3:59 PM
To: solr-user
Subject: Boosting documents based on search term/phrase

Is there a way to boost documents based on the search term/phrase?





Re: Boosting documents based on search term/phrase

2012-05-01 Thread Donald Organ
query elevation was exactly what I was talking about.

Now is there a way to add this to the default query handler?

On Tue, May 1, 2012 at 4:26 PM, Jack Krupansky wrote:

> Do you mean besides "query elevation"?
>
> http://wiki.apache.org/solr/QueryElevationComponent
>
> And besides explicit boosting by the user (the "^" suffix operator after a
> term/phrase)?
>
> -- Jack Krupansky
>
> -Original Message- From: Donald Organ
> Sent: Tuesday, May 01, 2012 3:59 PM
> To: solr-user
> Subject: Boosting documents based on search term/phrase
>
> Is there a way to boost documents based on the search term/phrase?
>


Re: NPE when faceting

2012-05-01 Thread Yonik Seeley
Darn... looks likely that it's another bug from when part of
UnInvertedField was refactored into Lucene.
We really need some random tests that can catch bugs like these though
- I'll see if I can reproduce.

Can you open a JIRA issue for this?

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson  wrote:
> I had reported this issue a while back, hoping that it was something
> with my environment, but that doesn't seem to be the case.  I am
> getting the following stack trace on certain facet queries.
> Previously when I did an optimize the error went away, does anyone
> have any insight into why specifically this could be happening?
>
> May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
>        at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807)
>        at 
> org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636)
>        at 
> org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411)
>        at 
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300)
>        at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396)
>        at 
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)
>        at 
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81)
>        at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>        at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>        at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>        at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>        at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>        at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>        at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>        at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>        at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>        at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
>        at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>        at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
>        at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
>        at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>        at org.eclipse.jetty.server.Server.handle(Server.java:351)
>        at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
>        at 
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
>        at 
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
>        at 
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
>        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
>        at 
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>        at 
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
>        at 
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>        at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>        at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>        at java.lang.Thread.run(Thread.java:662)


Re: NPE when faceting

2012-05-01 Thread Jamie Johnson
it may be related this this

http://stackoverflow.com/questions/10124055/solr-faceted-search-throws-nullpointerexception-with-http-500-status

we are doing deletes from our index as well so it is possible that
we're running into the same issue.  I hope that sheds more light on
things.

On Tue, May 1, 2012 at 4:51 PM, Jamie Johnson  wrote:
> I had reported this issue a while back, hoping that it was something
> with my environment, but that doesn't seem to be the case.  I am
> getting the following stack trace on certain facet queries.
> Previously when I did an optimize the error went away, does anyone
> have any insight into why specifically this could be happening?
>
> May 1, 2012 8:48:52 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
>        at org.apache.lucene.index.DocTermOrds.lookupTerm(DocTermOrds.java:807)
>        at 
> org.apache.solr.request.UnInvertedField.getTermValue(UnInvertedField.java:636)
>        at 
> org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:411)
>        at 
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:300)
>        at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:396)
>        at 
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)
>        at 
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81)
>        at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1550)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
>        at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>        at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>        at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>        at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>        at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>        at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>        at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>        at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>        at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
>        at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>        at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
>        at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
>        at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>        at org.eclipse.jetty.server.Server.handle(Server.java:351)
>        at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
>        at 
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
>        at 
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
>        at 
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
>        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
>        at 
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>        at 
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
>        at 
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>        at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>        at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>        at java.lang.Thread.run(Thread.java:662)


Re: question on tokenization control

2012-05-01 Thread Walter Underwood
Use synonyms at index time. Make "eval" and "evaluate" equivalent words.
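For example (a sketch; the file and filter follow the stock schema.xml conventions), in synonyms.txt:

    eval, evaluate, evaluating, evaluation

and in the field type's index-time analyzer:

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>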

wunder

On May 1, 2012, at 1:31 PM, Dan Tuffery wrote:

> Hi,
> 
> "Is that an indexing setting or query setting that will tokenize 'evalu'
> but not 'eval'?"
> 
> Without seeing the tokenizers you're using for the field type it's hard to
> say. You can use Solr's analysis page to see the tokens that are generated
> by the tokenizers in your analysis chain at both query time and index time.
> 
> http://localhost:8983/solr/admin/analysis.jsp
> 
> "how do I get 'eval' to be a match?"
> 
> You could use synonyms to map 'eval' to 'evaluation'.
> 
> Dan
> 
> On Tue, May 1, 2012 at 8:17 PM, kfdroid  wrote:
> 
>> I have a field that is defined using what I believe is fairly standard
>> "text"
>> fieldType. I have documents with the words 'evaluate', 'evaluating',
>> 'evaluation' in them. When I search on the whole word, obviously it works,
>> if I search on 'eval' it finds nothing. However for some reason if I search
>> on 'evalu' it finds all the matches.  Is that an indexing setting or query
>> setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval'
>> to
>> be a match?
>> 
>> Thanks,
>> Ken
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: question on tokenization control

2012-05-01 Thread Dan Tuffery
Hi,

"Is that an indexing setting or query setting that will tokenize 'evalu'
but not 'eval'?"

Without seeing the tokenizers you're using for the field type it's hard to
say. You can use Solr's analysis page to see the tokens that are generated
by the tokenizers in your analysis chain at both query time and index time.

http://localhost:8983/solr/admin/analysis.jsp

"how do I get 'eval' to be a match?"

You could use synonyms to map 'eval' to 'evaluation'.

Dan

On Tue, May 1, 2012 at 8:17 PM, kfdroid  wrote:

> I have a field that is defined using what I believe is fairly standard
> "text"
> fieldType. I have documents with the words 'evaluate', 'evaluating',
> 'evaluation' in them. When I search on the whole word, obviously it works,
> if I search on 'eval' it finds nothing. However for some reason if I search
> on 'evalu' it finds all the matches.  Is that an indexing setting or query
> setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval'
> to
> be a match?
>
> Thanks,
> Ken
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Boosting documents based on search term/phrase

2012-05-01 Thread Jack Krupansky

Do you mean besides "query elevation"?

http://wiki.apache.org/solr/QueryElevationComponent

And besides explicit boosting by the user (the "^" suffix operator after a 
term/phrase)?
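e.g.:

    q=solr^4 OR "apache solr"^2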


-- Jack Krupansky

-Original Message- 
From: Donald Organ

Sent: Tuesday, May 01, 2012 3:59 PM
To: solr-user
Subject: Boosting documents based on search term/phrase

Is there a way to boost documents based on the search term/phrase? 



Re: core sleep/wake

2012-05-01 Thread Ofer Fort
My random searches can be a bit slow on startup, so I still would like to
get that lazy load but have more cores available.
I'm actually trying now the LotsOfCores way of handling things.
Had to work a bit to get the patch suitable for 3.5, but it seems to be
doing what I need.


On Tue, May 1, 2012 at 2:31 PM, Erick Erickson wrote:

> Well, that'll be kinda self-defeating. The whole point of auto-warming
> is to fill up the caches, consuming memory. Without that, searches
> will be slow. So the idea of using minimal resources is really
> antithetical to having these in-memory structures filled up.
>
> You can try configuring minimal caches & etc. Or just give it
> lots of memory and count on your OS to swap the pages out
> if the particular core doesn't get used.
>
> Best
> Erick
>
> On Mon, Apr 30, 2012 at 5:18 PM, oferiko  wrote:
> > I have a multicore solr with a lot of cores that contains a lot of data
> (~50M
> > documents), but are rarely used.
> > Can I load a core from configuration, but keep it in sleep mode, where
> > it has all the configuration available but hardly consumes resources,
> > and based on a query or an update, it will "come to life"?
> > Thanks
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/core-sleep-wake-tp3951850.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Boosting documents based on search term/phrase

2012-05-01 Thread Donald Organ
Is there a way to boost documents based on the search term/phrase?


Re: dataimport handler (DIH) - notify when it has finished?

2012-05-01 Thread Gora Mohanty
On 1 May 2012 23:12, geeky2  wrote:
> Hello all,
>
> is there a notification / trigger / callback mechanism people use that
> allows them to know when a dataimport process has finished?
>
> we will be doing daily delta-imports and I need some way for an operations
> group to know when the DIH has finished.
>

Never tried it myself, but this should meet your needs:
http://wiki.apache.org/solr/DataImportHandler#EventListeners
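Roughly, per that page (the class name here is made up), in data-config.xml:

    <document onImportEnd="com.example.dih.NotifyOps">
      ...
    </document>

and the listener:

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.EventListener;

    public class NotifyOps implements EventListener {
        public void onEvent(Context ctx) {
            // the import just ended: notify ops here (mail, touch a file, etc.)
        }
    }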

Regards,
Gora


question on tokenization control

2012-05-01 Thread kfdroid
I have a field that is defined using what I believe is fairly standard "text"
fieldType. I have documents with the words 'evaluate', 'evaluating',
'evaluation' in them. When I search on the whole word, obviously it works,
if I search on 'eval' it finds nothing. However for some reason if I search
on 'evalu' it finds all the matches.  Is that an indexing setting or query
setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to
be a match? 

Thanks, 
Ken

--
View this message in context: 
http://lucene.472066.n3.nabble.com/question-on-tokenization-control-tp3953550.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: correct XPATH syntax

2012-05-01 Thread Twomey, David
Ludovic,

Thanks for your help.  I tried your suggestion but it didn't work for
Authors.  Below are 3 snippets from data-config.xml, the XML file and the
XML response from the DB

Data-config:
 








  



XML Snippet for Author:

<AuthorList>
  <Author>
    <LastName>Malathi</LastName>
    <ForeName>K</ForeName>
    <Initials>K</Initials>
  </Author>
  <Author>
    <LastName>Xiao</LastName>
    <ForeName>Y</ForeName>
    <Initials>Y</Initials>
  </Author>
  <Author>
    <LastName>Mitchell</LastName>
    <ForeName>A P</ForeName>
    <Initials>AP</Initials>
  </Author>
</AuthorList>


Response from SOLR:

















Journal of cancer research and clinical oncology




Thanks
David

On 5/1/12 8:05 AM, "lboutros"  wrote:

>Hi David,
>
>I think you should add this option: flatten=true
>
>and then could you try to use this XPath:
>
>/MedlineCitationSet/MedlineCitation/AuthorList/Author
>
>see here for the description :
>
>http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
>I don't think that the commonField option is needed here; I think you
>should suppress it.
>
>Ludovic. 
>
>-
>Jouve
>France.
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3952812.html
>Sent from the Solr - User mailing list archive at Nabble.com.



Re: Removing old documents

2012-05-01 Thread alxsss
Hello,

I did bin/nutch solrclean crawl/crawldb http://127.0.0.1:8983/solr/

without and with -noCommit  and restarted solr server

Log  shows that 5 documents were removed but they are still in the search 
results.
Is this a bug, or is something missing?
I use nutch-1.4 and solr 3.5

Thanks.
Alex. 

 

 

 

-Original Message-
From: Markus Jelsma 
To: solr-user 
Sent: Tue, May 1, 2012 7:41 am
Subject: Re: Removing old documents


Nutch 1.4 has a separate tool to remove 404 and redirect documents from your
index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents 
in one run based on segment data.

On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
> I'm running Nutch, so it's updating the documents, but I'm wanting to
> remove ones that are no longer available.  So in that case, there's no
> update possible.
> 
> On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
> 
> mav.p...@holidaylettings.co.uk> wrote:
> > Not sure if there is an automatic way but we do it via a delete query and
> > where possible we update doc under same id to avoid deletes.
> > 
> > On 01/05/2012 13:43, "Bai Shen"  wrote:
> > >What is the best method to remove old documents?  Things that now
> > >generate 404 errors, etc.
> > >
> > >Is there an automatic method or do I have to do it manually?
> > >
> > >THanks.

-- 
Markus Jelsma - CTO - Openindex

 


Re: How to expand list into multi-valued fields?

2012-05-01 Thread Jeevanandam

here you go

specify the regex transformer in the entity tag of the DIH config xml like below

<entity name="fruit" transformer="RegexTransformer" query="...">

and then, inside that entity:

<field column="FruitField" sourceColName="ColumnA" splitBy="\|"/>

That's it!

- Jeevanandam


On 02-05-2012 12:35 am, invisbl wrote:
I am indexing content from an RDBMS. I have a column in a table with
pipe-separated values, and upon indexing I would like to transform these
values into multi-valued fields in SOLR's index. For example,

ColumnA (From RDBMS)
-
apple|orange|banana

I want to expand this to,

SOLR Index

FruitField=apple
FruitField=orange
FruitField=banana

or number expand to,

SOLR Index

FruitField1=apple
FruitField2=orange
FruitField3=banana

Please help, thank you!


--
View this message in context:

http://lucene.472066.n3.nabble.com/How-to-expand-list-into-multi-valued-fields-tp3953378.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Grouping ngroups count

2012-05-01 Thread Francois Perron
Thanks for your response, Cody.

  First, I used distributed grouping on 2 shards, and I'm sure that all
documents of each group are in the same shard.

I took a look at the JIRA issue and it seems really similar. There is the same
problem with group.ngroups. The count is calculated in the second pass, so we
only had results from the "useful" shards, and that's why when I increase the
rows limit I get the right count (it must hit all my shards).

Unless it's a feature (I hope not), I will create a new JIRA issue for this.

Thanks

On 2012-05-01, at 12:32 PM, Young, Cody wrote:

> Hello,
> 
> When you say 2 slices, do you mean 2 shards? As in, you're doing a 
> distributed query?
> 
> If you're doing a distributed query, then for group.ngroups to work you need 
> to ensure that all documents for a group exist on a single shard.
> 
> However, what you're describing sounds an awful lot like this JIRA issue that 
> I entered a while ago for distributed grouping. I found that the hit count 
> was coming only from the shards that ended up having results in the documents 
> that were returned. I didn't test group.ngroups at the time.
> 
> https://issues.apache.org/jira/browse/SOLR-3316
> 
> If this is a similar issue then you should make a new Jira issue.
> 
> Cody
> 
> -Original Message-
> From: Francois Perron [mailto:francois.per...@wantedanalytics.com] 
> Sent: Tuesday, May 01, 2012 6:47 AM
> To: solr-user@lucene.apache.org
> Subject: Grouping ngroups count
> 
> Hello all,
> 
>  I tried to use grouping with 2 slices with an index of 35K documents.  When I
> ask for the top 10 rows, grouped by field A, it gave me about 16K groups.  But, if I
> ask for the top 20K rows, the ngroups property is now at 30K.
> 
> Do you know why and of course how to fix it ?
> 
> Thanks.



How to expand list into multi-valued fields?

2012-05-01 Thread invisbl
I am indexing content from an RDBMS. I have a column in a table with pipe
separated values, and upon indexing I would like to transform these values
into multi-valued fields in SOLR's index. For example,

ColumnA (From RDBMS)
-
apple|orange|banana

I want to expand this to,

SOLR Index

FruitField=apple
FruitField=orange
FruitField=banana

or, with numbered fields, expand to,

SOLR Index

FruitField1=apple
FruitField2=orange
FruitField3=banana

Please help, thank you!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-expand-list-into-multi-valued-fields-tp3953378.html
Sent from the Solr - User mailing list archive at Nabble.com.


dataimport handler (DIH) - notify when it has finished?

2012-05-01 Thread geeky2
Hello all,

is there a notification / trigger / callback mechanism people use that
allows them to know when a dataimport process has finished?

we will be doing daily delta-imports and i need some way for an operations
group to know when the DIH has finished.
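
One approach is to poll the status command (a sketch; this assumes the handler is registered at the default /dataimport path) until the importer goes back to idle:

  while curl -s "http://localhost:8983/solr/dataimport?command=status" | grep -q busy; do sleep 60; done
  echo "delta-import finished" | mail -s "DIH done" ops@example.com

The status response also carries statusMessages (rows fetched, documents added, start/end times) that can be included in the notification.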

thank you,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-handler-DIH-notify-when-it-has-finished-tp3953339.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AW: Email classification with solr

2012-05-01 Thread Jack Krupansky
If you have the code that does all of that analysis, then you could 
integrate it with Solr using one of the approaches I listed, but Solr itself 
would not provide any of that analysis.


-- Jack Krupansky

-Original Message- 
From: Ramo Karahasan

Sent: Tuesday, May 01, 2012 1:14 PM
To: solr-user@lucene.apache.org
Subject: AW: Email classification with solr

Hi Jack,

thanks for the feedback. I'm really new to this stuff and not sure if I have
fully understood it.

Currently I've split emails into their properties and saved them into
relational tables, for example the body part. Most of my e-mails are HTML
emails. Now I have, for example, three categories: newsletter is one of these
categories. I would like to classify incoming emails as newsletters if they
fulfill a number of attributes, e.g. the email address of the sender
contains "newsletter" or variants of this word, AND the content (body) looks
like a newsletter; such mails should be classified as a newsletter.

Is it possible to do that just with solr? Or do I need other tools for
classifying on the basis of text analysis? Isn't it necessary to build up a
taxonomy for "newsletter emails" so that the classifier can match the mail
text against some ruleset (a defined taxonomy)?

Thanks,
Ramo

-Ursprüngliche Nachricht-
Von: Jack Krupansky [mailto:j...@basetechnology.com]
Gesendet: Dienstag, 1. Mai 2012 18:49
An: solr-user@lucene.apache.org
Betreff: Re: Email classification with solr

There are a number of different routes you can go, one of which is to use
SolrCell (Tika) to parse mbox files and then add your own update processor
that does whatever mail classification analysis you desire and then
generates additional field values for the classification.

A simpler approach is to do the analysis yourself outside of Solr and then
feed the mbox data for each message into SolrCell along with the specific
literal field values derived from your classification analysis. SolrCell
(Tika) would then parse the mail message and add your literal field values.

Or, you may want to consider fully parsing the mail messages outside of Solr
so that you have full control over what gets parsed and which schema fields
are used or not used, in addition to your content analysis field values.

-- Jack Krupansky

-Original Message-
From: Ramo Karahasan
Sent: Tuesday, May 01, 2012 12:17 PM
To: solr-user@lucene.apache.org
Subject: Email classification with solr

Hello,



just a short question:



Is it possible to use solr/Lucene as an e-mail classifier? I mean, analyzing
an e-mail to add it automatically to a category (four are available)?





Thanks,

Ramo



Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?

2012-05-01 Thread Gopal Patwa
I have a similar issue using log4j for logging with a trunk build; the
CoreContainer class prints a big stack trace on our jboss 4.2.2 startup. I am
using slf4j 1.5.2

10:07:45,918 WARN  [CoreContainer] Unable to read SLF4J version
java.lang.NoSuchMethodError:
org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)



On Tue, May 1, 2012 at 9:25 AM, Benson Margulies wrote:

> On Tue, May 1, 2012 at 12:16 PM, Mark Miller wrote:
> > There is a recent JIRA issue about keeping the last n logs to display in
> the admin UI.
> >
> > That introduced a problem - and then the fix introduced a problem - and
> then the fix mitigated the problem but left that ugly logging as a by
> product.
> >
> > Don't remember the issue # offhand. I think there was a dispute about
> what should be done with it.
> >
> > On May 1, 2012, at 11:14 AM, Benson Margulies wrote:
> >
> >> CoreContainer.java, in the method 'load', finds itself calling
> >> loader.NewInstance with an 'fname' of Log4j if the slf4j backend is
> >> 'Log4j'.
>
> Couldn't someone just fix the if statement to say, 'OK, if we're doing
> log4j, we have no log watcher' and skip all the loud failing on the
> way?
>
>
>
> >>
> >> e.g.:
> >>
> >> 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer  - Unable
> >> to load LogWatcher
> >> org.apache.solr.common.SolrException: Error loading class 'Log4j'
> >>
> >> What is it actually looking for? Have I misplaced something?
> >
> > - Mark Miller
> > lucidimagination.com
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>


AW: Email classification with solr

2012-05-01 Thread Ramo Karahasan
Hi Jack,

thanks for the feedback. I'm really new to this stuff and not sure if I have
fully understood it.

Currently I've split emails into their properties and saved them into
relational tables, for example the body part. Most of my e-mails are HTML
emails. Now I have, for example, three categories: newsletter is one of these
categories. I would like to classify incoming emails as newsletters if they
fulfill a number of attributes, e.g. the email address of the sender
contains "newsletter" or variants of this word, AND the content (body) looks
like a newsletter; such mails should be classified as a newsletter.

Is it possible to do that just with solr? Or do I need other tools for
classifying on the basis of text analysis? Isn't it necessary to build up a
taxonomy for "newsletter emails" so that the classifier can match the mail
text against some ruleset (a defined taxonomy)?

Thanks,
Ramo

-Ursprüngliche Nachricht-
Von: Jack Krupansky [mailto:j...@basetechnology.com] 
Gesendet: Dienstag, 1. Mai 2012 18:49
An: solr-user@lucene.apache.org
Betreff: Re: Email classification with solr

There are a number of different routes you can go, one of which is to use
SolrCell (Tika) to parse mbox files and then add your own update processor
that does whatever mail classification analysis you desire and then
generates additional field values for the classification.

A simpler approach is to do the analysis yourself outside of Solr and then
feed the mbox data for each message into SolrCell along with the specific
literal field values derived from your classification analysis. SolrCell
(Tika) would then parse the mail message and add your literal field values.

Or, you may want to consider fully parsing the mail messages outside of Solr
so that you have full control over what gets parsed and which schema fields
are used or not used, in addition to your content analysis field values.

-- Jack Krupansky

-Original Message-
From: Ramo Karahasan
Sent: Tuesday, May 01, 2012 12:17 PM
To: solr-user@lucene.apache.org
Subject: Email classification with solr

Hello,



just a short question:



Is it possible to use solr/Lucene as an e-mail classifier? I mean, analyzing
an e-mail to add it automatically to a category (four are available)?





Thanks,

Ramo




Re: Does Solr fit my needs?

2012-05-01 Thread Mikhail Khludnev
no problem - you are welcome.
Nothing out-of-the-box yet. Only approach is ready

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
https://issues.apache.org/jira/browse/SOLR-3076

Regards

On Mon, Apr 30, 2012 at 12:06 PM, G.Long  wrote:

> Hi :)
>
> Thank you all for your answers. I'll try these solutions :)
>
> Kind regards,
>
> Gary
>
> Le 27/04/2012 16:31, G.Long a écrit :
>
>> Hi there :)
>>
>> I'm looking for a way to save xml files into some sort of database and
>> i'm wondering if Solr would fit my needs.
>> The xml files I want to save have a lot of child nodes which also contain
>> child nodes with multiple values. The depth level can be more than 10.
>>
>> After having indexed the files, I would like to be able to query for
>> subparts of those xml files and be able to reconstruct them as xml files
>> with all their children included. However, I'm wondering if it is possible
>> with an index like solr lucene to keep or easily recover the structure of
>> my xml data?
>>
>> Thanks for your help,
>>
>> Regards,
>>
>> Gary
>>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Upgrading to 3.6 broke cachedsqlentityprocessor

2012-05-01 Thread Mikhail Khludnev
I know about one regression at least. The fix is already committed; see
https://issues.apache.org/jira/browse/SOLR-3360
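
For comparison, the 3.6-style cache configuration uses explicit cache attributes, roughly like this (the entity and column names here are assumptions):

<entity name="details" processor="SqlEntityProcessor"
        cacheImpl="SortedMapBackedCache"
        cacheKey="UserID" cacheLookup="Users.UserID"
        query="select * from Details">

With the SOLR-2382 rework, CachedSqlEntityProcessor became essentially SqlEntityProcessor plus a pluggable cache, which appears to be where the regression crept in.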

On Tue, May 1, 2012 at 12:53 AM, Brent Mills  wrote:

> I've read some things in jira on the new functionality that was put into
> caching in the DIH but I wouldn't think it should break the old behavior.
>  It doesn't look as though any errors are being thrown, it's just ignoring
> the caching part and opening a ton of connections.  Also I cannot find any
> documentation on the new functionality that was added so I'm not sure what
> syntax is valid and what's not.  Here is my entity that worked in 3.1 but
> no longer works in 3.6:
>
>  <entity name="..." query="..." processor="CachedSqlEntityProcessor" where="UserID=Users.UserID">
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Email classification with solr

2012-05-01 Thread Jack Krupansky
There are a number of different routes you can go, one of which is to use 
SolrCell (Tika) to parse mbox files and then add your own update processor 
that does whatever mail classification analysis you desire and then 
generates additional field values for the classification.


A simpler approach is to do the analysis yourself outside of Solr and then 
feed the mbox data for each message into SolrCell along with the specific 
literal field values derived from your classification analysis. SolrCell 
(Tika) would then parse the mail message and add your literal field values.


Or, you may want to consider fully parsing the mail messages outside of Solr 
so that you have full control over what gets parsed and which schema fields 
are used or not used, in addition to your content analysis field values.
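
For the second approach, pushing a message to SolrCell with literal classification fields might look like this (URL, id, and field names are illustrative assumptions, not from the original setup):

curl "http://localhost:8983/solr/update/extract?literal.id=msg-001&literal.category=newsletter&commit=true" -F "file=@message.eml"

The literal.* parameters attach your externally computed classification values while Tika extracts the message body.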


-- Jack Krupansky

-Original Message- 
From: Ramo Karahasan

Sent: Tuesday, May 01, 2012 12:17 PM
To: solr-user@lucene.apache.org
Subject: Email classification with solr

Hello,



just a short question:



Is it possible to use solr/Lucene as an e-mail classifier? I mean, analyzing
an e-mail to add it automatically to a category (four are available)?





Thanks,

Ramo



How to integrate sen and lucene-ja in SOLR 3.x

2012-05-01 Thread Shanmugavel SRD
Hi,
  Can anyone help me on how to integrate sen and lucene-ja.jar in SOLR 3.4,
3.5, or 3.6?
Thanks,
Shanmugavel

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-integrate-sen-and-lucene-ja-in-SOLR-3-x-tp3953266.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: post.jar failing

2012-05-01 Thread William Bell
I am not sure. It just started working.

On Tue, May 1, 2012 at 9:39 AM, Jack Krupansky  wrote:
> Sounds as if maybe it was some other kind of error having nothing to do with
> the data itself. Were there any additional errors or exceptions shortly
> before the failure? Maybe memory was low and some component wouldn't load,
> or somebody caught an exception without reporting the actual cause. After
> all, the message you provided said nothing about the actual problem. Maybe
> Solr itself needs a better diagnostic in that case.
>
>
> -- Jack Krupansky
>
> -Original Message- From: William Bell
> Sent: Tuesday, May 01, 2012 11:09 AM
> To: solr-user@lucene.apache.org
> Subject: Re: post.jar failing
>
>
> OK. I am using SOLR 3.6.
>
> I restarted SOLR and it started working. No idea why. You were right I
> showed the error log from a different document.
>
> We might want to add a test case for CDATA.
>
> 
> 
>  SP2514N
>  Samsung SpinPoint P120 SP2514N - hard drive - 250
> GB - ATA-133
>  Samsung Electronics Co. Ltd.
>  electronics
>  hard drive
>  7200RPM, 8MB cache, IDE Ultra ATA-133
>  NoiseGuard, SilentSeek technology, Fluid
> Dynamic Bearing (FDB) motor
>  92
>  6
>  true
>  
>  2006-02-13T15:26:37Z
>  
>  35.0752,-97.032
> 
>
> 
>
>
>
> On Tue, May 1, 2012 at 7:03 AM, Jack Krupansky wrote:
>>
>> Please clarify the problem, because the error message you provide refers
>> to
>> address data that is not in the input data that you provide. It doesn't
>> match!
>>
>> The error refers to an "edu" element, but the input data uses a "poff"
>> element. Maybe you have multiple "SP2514N" documents; maybe somebody made
>> a
>> copy of the original and edited the address_xml field value. And maybe
>> that
>> edited version that has an "edu" element has some obvious error.
>>
>> In short, show us the full actual input address_xml field element, but
>> preferably the entire Solr input document for the version of the SP2514N
>> document that actually generates the error .
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: William Bell
>> Sent: Monday, April 30, 2012 4:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: post.jar failing
>>
>>
>> I am getting a post.jar failure when trying to post the following
>> CDATA field... It used to work on older versions. This is in SOlr 3.6.
>>
>> 
>> 
>>  SP2514N
>>  Samsung SpinPoint P120 SP2514N - hard drive - 250
>> GB - ATA-133
>>  Samsung Electronics Co. Ltd.
>>  electronics
>>  hard drive
>>  7200RPM, 8MB cache, IDE Ultra ATA-133
>>  NoiseGuard, SilentSeek technology, Fluid
>> Dynamic Bearing (FDB) motor
>>  92
>>  6
>>  true
>>  
>>  2006-02-13T15:26:37Z
>>  
>>  35.0752,-97.032
>> 
>>
>> 
>>
>> Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error
>> adding
>> field 'address_xml'='
>>  
>>      MEDSCH
>>      
>>          UNIVERSITY OF COLORADO SCHOOL OF MEDICINE
>>          1974
>>          MD
>>      
>>  
>> '
>>
>>
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Solr Parent/Child Searching

2012-05-01 Thread Mikhail Khludnev
Hello Simon,

Let me reply to solr-user. We consider BJQ as a promising solution for
parent/child usecase, we have a facet component prototype for it; but it's
too raw and my team had to switch to another challenges temporarily.
I participated in SOLR-3076, but the achievement is really modest. I've
attached the essential BJQParser with "god-mode" syntax. I think the next stage
should be block indexing support in Solr; I'm not sure how to do that
right. I suppose that by next month I'll be able to provide something like
essential support for block updates.

Regards

On Tue, May 1, 2012 at 12:05 AM, Simon Guindon wrote:
>
> Hello Mikhail,
>
> I came across your blog post about Solr with an alternative approach to
> the block join solution for LUCENE-3171. We have hit the same situation
> where we need the parent/child relationship for our Solr queries.
>
> I was wondering if your solution was available anywhere? It would be nice
> if a solution could make its way into Solr at some point :)
>
> Thanks and take care,
>
> Simon Guindon
>



-- 
Sincerely yours
Mikhail Khludnev.
Tech Lead,
Grid Dynamics.


 


RE: Grouping ngroups count

2012-05-01 Thread Young, Cody
Hello,

When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed 
query?

If you're doing a distributed query, then for group.ngroups to work you need to 
ensure that all documents for a group exist on a single shard.

However, what you're describing sounds an awful lot like this JIRA issue that I 
entered a while ago for distributed grouping. I found that the hit count was 
coming only from the shards that ended up having results in the documents that 
were returned. I didn't test group.ngroups at the time.

https://issues.apache.org/jira/browse/SOLR-3316

If this is a similar issue then you should make a new Jira issue.

Cody

-Original Message-
From: Francois Perron [mailto:francois.per...@wantedanalytics.com] 
Sent: Tuesday, May 01, 2012 6:47 AM
To: solr-user@lucene.apache.org
Subject: Grouping ngroups count

Hello all,

  I tried to use grouping with 2 slices with an index of 35K documents.  When I 
ask for the top 10 rows, grouped by field A, it gave me about 16K groups.  But, if I 
ask for the top 20K rows, the ngroups property is now at 30K.  

Do you know why and of course how to fix it ?

Thanks.


Re: hierarchical faceting?

2012-05-01 Thread sam
yup.


and ?facet.field=colors_facet
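
A minimal sketch of a facet field type along those lines (the type and field names are assumptions):

<fieldType name="facet_string" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
<dynamicField name="*_facet" type="facet_string" indexed="true" stored="false" multiValued="true"/>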



On Mon, Apr 30, 2012 at 9:35 PM, Chris Hostetter
wrote:

>
> : Is there a tokenizer that tokenizes the string as one token?
>
> Using KeywordTokenizer at query time should do whta you want.
>
>
> -Hoss
>


Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?

2012-05-01 Thread Benson Margulies
On Tue, May 1, 2012 at 12:16 PM, Mark Miller  wrote:
> There is a recent JIRA issue about keeping the last n logs to display in the 
> admin UI.
>
> That introduced a problem - and then the fix introduced a problem - and then 
> the fix mitigated the problem but left that ugly logging as a by product.
>
> Don't remember the issue # offhand. I think there was a dispute about what 
> should be done with it.
>
> On May 1, 2012, at 11:14 AM, Benson Margulies wrote:
>
>> CoreContainer.java, in the method 'load', finds itself calling
>> loader.NewInstance with an 'fname' of Log4j if the slf4j backend is
>> 'Log4j'.

Couldn't someone just fix the if statement to say, 'OK, if we're doing
log4j, we have no log watcher' and skip all the loud failing on the
way?



>>
>> e.g.:
>>
>> 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer  - Unable
>> to load LogWatcher
>> org.apache.solr.common.SolrException: Error loading class 'Log4j'
>>
>> What is it actually looking for? Have I misplaced something?
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Email classification with solr

2012-05-01 Thread Ramo Karahasan
Hello,

 

just a short question:

 

Is it possible to use solr/Lucene as an e-mail classifier? I mean, analyzing
an e-mail to add it automatically to a category (four are available)?

 

 

Thanks,

Ramo



Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?

2012-05-01 Thread Mark Miller
There is a recent JIRA issue about keeping the last n logs to display in the 
admin UI.

That introduced a problem - and then the fix introduced a problem - and then 
the fix mitigated the problem but left that ugly logging as a by product.

Don't remember the issue # offhand. I think there was a dispute about what 
should be done with it.

On May 1, 2012, at 11:14 AM, Benson Margulies wrote:

> CoreContainer.java, in the method 'load', finds itself calling
> loader.NewInstance with an 'fname' of Log4j if the slf4j backend is
> 'Log4j'.
> 
> e.g.:
> 
> 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer  - Unable
> to load LogWatcher
> org.apache.solr.common.SolrException: Error loading class 'Log4j'
> 
> What is it actually looking for? Have I misplaced something?

- Mark Miller
lucidimagination.com



Re: question on word parsing control

2012-05-01 Thread Jack Krupansky
This is a stemming artifact: all of the forms of evaluat* are being 
stemmed to "evalu". That may seem odd, but stemming/stemmers are odd to 
begin with.


1. You could choose a different stemmer.
2. You could add synonyms to map various forms of the word to the desired 
form, such as eval.

3. Accept that Solr ain't perfect or optimal for every fine detail.
4. Or, maybe the stemmer behavior is technically "perfect", but perfection 
can be subjective.


In this particular case, maybe you might consider a synonym rule such as 
"eval=>evaluate".

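As a sketch (the file name and analyzer placement are assumptions): add the line

  eval => evaluate

to synonyms.txt, and make sure the field type's analyzer applies

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>

before the stemmer, so "eval" is rewritten to "evaluate" and then stemmed to "evalu" like the other variants.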

-- Jack Krupansky

-Original Message- 
From: kenf_nc

Sent: Tuesday, May 01, 2012 9:23 AM
To: solr-user@lucene.apache.org
Subject: question on word parsing control

I have a field that is defined using what I believe is fairly standard "text"
fieldType. I have documents with the words 'evaluate', 'evaluating',
'evaluation' in them. When I search on the whole word, obviously it works,
if I search on 'eval' it finds nothing. However for some reason if I search
on 'evalu' it finds all the matches.  Is that an indexing setting or query
setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to
be a match?

Thanks,
Ken

--
View this message in context: 
http://lucene.472066.n3.nabble.com/question-on-word-parsing-control-tp3952925.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: post.jar failing

2012-05-01 Thread Jack Krupansky
Sounds as if maybe it was some other kind of error having nothing to do with 
the data itself. Were there any additional errors or exceptions shortly 
before the failure? Maybe memory was low and some component wouldn't load, 
or somebody caught an exception without reporting the actual cause. After 
all, the message you provided said nothing about the actual problem. Maybe 
Solr itself needs a better diagnostic in that case.


-- Jack Krupansky

-Original Message- 
From: William Bell

Sent: Tuesday, May 01, 2012 11:09 AM
To: solr-user@lucene.apache.org
Subject: Re: post.jar failing

OK. I am using SOLR 3.6.

I restarted SOLR and it started working. No idea why. You were right I
showed the error log from a different document.

We might want to add a test case for CDATA.



<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
  <field name="manu">Samsung Electronics Co. Ltd.</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
  <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
  <field name="price">92</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="address_xml"><![CDATA[ ... ]]></field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
  <field name="store">35.0752,-97.032</field>
</doc>
</add>






On Tue, May 1, 2012 at 7:03 AM, Jack Krupansky wrote:
Please clarify the problem, because the error message you provide refers to
address data that is not in the input data that you provide. It doesn't
match!

The error refers to an "edu" element, but the input data uses a "poff"
element. Maybe you have multiple "SP2514N" documents; maybe somebody made a
copy of the original and edited the address_xml field value. And maybe that
edited version that has an "edu" element has some obvious error.

In short, show us the full actual input address_xml field element, but
preferably the entire Solr input document for the version of the SP2514N
document that actually generates the error .

-- Jack Krupansky

-Original Message- From: William Bell
Sent: Monday, April 30, 2012 4:18 PM
To: solr-user@lucene.apache.org
Subject: post.jar failing


I am getting a post.jar failure when trying to post the following
CDATA field... It used to work on older versions. This is in SOlr 3.6.



<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
  <field name="manu">Samsung Electronics Co. Ltd.</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
  <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
  <field name="price">92</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="address_xml"><![CDATA[ ... ]]></field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
  <field name="store">35.0752,-97.032</field>
</doc>
</add>




Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error
adding
field 'address_xml'='
  
  MEDSCH
  
  UNIVERSITY OF COLORADO SCHOOL OF MEDICINE
  1974
  MD
  
  
'


--
Bill Bell
billnb...@gmail.com
cell 720-256-8076




--
Bill Bell
billnb...@gmail.com
cell 720-256-8076 



question on word parsing control

2012-05-01 Thread kenf_nc
I have a field that is defined using what I believe is fairly standard "text"
fieldType. I have documents with the words 'evaluate', 'evaluating',
'evaluation' in them. When I search on the whole word, obviously it works,
if I search on 'eval' it finds nothing. However for some reason if I search
on 'evalu' it finds all the matches.  Is that an indexing setting or query
setting that will tokenize 'evalu' but not 'eval' and how do I get 'eval' to
be a match?

Thanks,
Ken

--
View this message in context: 
http://lucene.472066.n3.nabble.com/question-on-word-parsing-control-tp3952925.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr error after relacing schema.xml

2012-05-01 Thread BillB1951
PROBLEM RESOLVED.

Solr 3.6.0 changed where it looks for stopwords_en.txt (now in the sub-directory
/lang).  The schema.xml generated by Haystack 2.0.0 beta needs to be edited. 
Everything is working now.

-
BillB1951
--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-error-after-relacing-schema-xml-tp3940133p3953115.html
Sent from the Solr - User mailing list archive at Nabble.com.


Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?

2012-05-01 Thread Benson Margulies
CoreContainer.java, in the method 'load', finds itself calling
loader.NewInstance with an 'fname' of Log4j if the slf4j backend is
'Log4j'.

e.g.:

2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer  - Unable
to load LogWatcher
org.apache.solr.common.SolrException: Error loading class 'Log4j'

What is it actually looking for? Have I misplaced something?


Re: post.jar failing

2012-05-01 Thread William Bell
OK. I am using SOLR 3.6.

I restarted SOLR and it started working. No idea why. You were right I
showed the error log from a different document.

We might want to add a test case for CDATA.



<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
  <field name="manu">Samsung Electronics Co. Ltd.</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
  <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
  <field name="price">92</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="address_xml"><![CDATA[ ... ]]></field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
  <field name="store">35.0752,-97.032</field>
</doc>
</add>






On Tue, May 1, 2012 at 7:03 AM, Jack Krupansky  wrote:
> Please clarify the problem, because the error message you provide refers to
> address data that is not in the input data that you provide. It doesn't
> match!
>
> The error refers to an "edu" element, but the input data uses a "poff"
> element. Maybe you have multiple "SP2514N" documents; maybe somebody made a
> copy of the original and edited the address_xml field value. And maybe that
> edited version that has an "edu" element has some obvious error.
>
> In short, show us the full actual input address_xml field element, but
> preferably the entire Solr input document for the version of the SP2514N
> document that actually generates the error .
>
> -- Jack Krupansky
>
> -Original Message- From: William Bell
> Sent: Monday, April 30, 2012 4:18 PM
> To: solr-user@lucene.apache.org
> Subject: post.jar failing
>
>
> I am getting a post.jar failure when trying to post the following
> CDATA field... It used to work on older versions. This is in SOlr 3.6.
>
> 
> 
>  SP2514N
>  Samsung SpinPoint P120 SP2514N - hard drive - 250
> GB - ATA-133
>  Samsung Electronics Co. Ltd.
>  electronics
>  hard drive
>  7200RPM, 8MB cache, IDE Ultra ATA-133
>  NoiseGuard, SilentSeek technology, Fluid
> Dynamic Bearing (FDB) motor
>  92
>  6
>  true
>  
>  2006-02-13T15:26:37Z
>  
>  35.0752,-97.032
> 
>
> 
>
> Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error
> adding
> field 'address_xml'='
>   
>       MEDSCH
>       
>           UNIVERSITY OF COLORADO SCHOOL OF MEDICINE
>           1974
>           MD
>       
>   
> '
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: get latest 50 documents the fastest way

2012-05-01 Thread Otis Gospodnetic
Hi,

The first thing that comes to mind is to not query with *:*, which I'm guessing 
you are doing, but to run a query with a time range constraint that you 
know will return you enough docs, but not so many that performance suffers.

And, of course, thinking beyond Solr, if you really know you always need last 
50, you could simply keep last 50 in memory somewhere and get it from there, 
not from Solr, which should be faster.
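
For illustration, the time-range variant might look like this (the timestamp field name and window are assumptions):

  q=yourQuery&fq=timestamp:[NOW-1HOUR TO NOW]&sort=timestamp desc&rows=50

The filter keeps the sort from having to walk hundreds of millions of matches; widen the window until it reliably covers at least 50 hits.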

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



>
> From: Yuval Dotan 
>To: solr-user  
>Sent: Tuesday, May 1, 2012 10:38 AM
>Subject: get latest 50 documents the fastest way
> 
>Hi Guys
>We have a use case where we need to get the 50 *latest *documents that
>match my query - without additional ranking,sorting,etc on the results.
>My index contains 1,000,000,000 documents and i noticed that if the number
>of found documents is very big (larger than 50% of the index size -
>500,000,000 docs) than it takes more than 5 seconds to get the results even
>with rows=50 parameter.
>Is there a way to get the results faster?
>Thanks
>Yuval
>
>
>

Re: get a total count

2012-05-01 Thread Rahul R
Hello,
A related question on this topic. How do I programmatically find the total
number of documents across many shards ? For EmbeddedSolrServer, I use the
following command to get the total count :
solrSearcher.getStatistics().get("numDocs")

With distributed search, how do i get the count of all records in all
shards. Apart from doing a *:* query, is there a way to get the total count
? I am not able to use the same command above because, I am not able to get
a handle to the SolrIndexSearcher object with distributed search. The conf
and data directories of my index reside directly under a folder called solr
(no core) under the weblogic domain. I don't have a SolrCore
object. With EmbeddedSolrServer, I used to get the SolrIndexSearcher object
using the following call :
solrSearcher = (SolrIndexSearcher)SolrCoreObject.getSearcher().get();
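
For reference, the *:* fallback over shards is a single request whose numFound is the cross-shard total (the host names here are placeholders):

  http://host1:8983/solr/select?q=*:*&rows=0&shards=host1:8983/solr,host2:8983/solr

With rows=0 no documents are fetched, so the query is cheaper than it looks.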

Stack Information :
OS : Solaris
jdk : 1.5.0_14 32 bit
Solr : 1.3
App Server : Weblogic 10MP1

Thank you.

- Rahul

On Tue, Nov 15, 2011 at 10:49 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> I'm assuming the question was about how MANY documents have been indexed
> across all shards.
>
> Answer #1:
> Look at the Solr Admin Stats page on each of your Solr instances and add
> up the numDocs numbers you see there
>
> Answer #2:
> Use Sematext's free Performance Monitoring tool for Solr
> On Index report choose "all, sum" in the Solr Host selector and that will
> show you the total # of docs across the cluster, total # of deleted docs,
> total segments, total size on disk, etc.
> URL: http://www.sematext.com/spm/solr-performance-monitoring/index.html
>
> Otis
> 
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> >
> >From: U Anonym 
> >To: solr-user@lucene.apache.org
> >Sent: Monday, November 14, 2011 11:50 AM
> >Subject: get a total count
> >
> >Hello everyone,
> >
> >A newbie question:  how do I find out how many documents have been indexed
> >across all shards?
> >
> >Thanks much!
> >
> >
> >
>


Re: Logging from data-config.xml

2012-05-01 Thread Twomey, David
Fixed the error (a stupid typo), but the log msg didn't appear until the typo was
fixed.  I would have thought they would be unrelated.


On 5/1/12 10:42 AM, "Twomey, David"  wrote:

>
>
>I'm getting this error (below) when doing an import.   I'd like to add a
>Log line so I can see if the file path is messed up.
>
>So my data-config.xml looks like below but I'm not getting any extra info
>in the solr.log file under jetty.  Is there a way to log to this log file
>from data-import.xml?
>
>
>
><dataConfig>
>  <dataSource type="FileDataSource" />
>  <document>
>    <entity name="medlineFileList"
>            processor="FileListEntityProcessor"
>            fileName=".*xml"
>            rootEntity="false"
>            dataSource="null"
>            baseDir="/index_files/pubmed/">
>
>      <entity name="..."
>              processor="XPathEntityProcessor"
>              url="${medlineFileList.fileAblsolutePath}"
>              forEach="/MedlineCitationSet/MedlineCitation"
>              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
>              logTemplate="   processing ${medlineFileList.fileAbsolutePath}"
>              logLevel="info"
>              stream="true">
>
>        <field column="..." xpath="/MedlineCitationSet/MedlineCitation/PMID" commonField="true" />
>  ...
>
>
>Thanks.
>
>
>INFO: Starting Full Import
>May 1, 2012 10:34:29 AM org.apache.solr.handler.dataimport.SolrWriter
>readIndexerProperties
>INFO: Read dataimport.properties
>May 1, 2012 10:34:29 AM org.apache.solr.common.SolrException log
>SEVERE: Exception while processing: medlineFileList document :
>null:org.apache.solr.handler.dataimport.DataImportHandlerException:
>java.lang.RuntimeException: java.io.FileNotFoundException: Could not find
>file: 
> at 
>org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow
>(
>DataImportHandlerException.java:64)
> at 
>org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEnt
>i
>tyProcessor.java:286)
> at 
>org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPath
>E
>ntityProcessor.java:224)
> at 
>org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntit
>y
>Processor.java:204)
> at 
>org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityPr
>o
>cessorWrapper.java:238)
> at 
>org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.jav
>a
>:591)
> at 
>org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.jav
>a
>:617)
> at 
>org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:2
>6
>7)
> at 
>org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
> at 
>org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.
>j
>ava:353)
> at 
>org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:4
>1
>1)
> at 
>org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:39
>2
>)
>Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
>Could not find file:
> at 
>org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.j
>a
>va:113)
> at 
>org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.j
>a
>va:85)
> at 
>org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.j
>a
>va:47)
> at 
>org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEnt
>i
>tyProcessor.java:283)
> ... 10 more
>Caused by: java.io.FileNotFoundException: Could not find file:
> at 
>org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.j
>a
>va:111)
> ... 13 more
>



Re: Removing old documents

2012-05-01 Thread mav.p...@holidaylettings.co.uk
Hi

What I do is store the date created for when the doc was inserted or
updated, and then I do a search/delete query based on that
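
For example, a minimal sketch of that cleanup (the field name and cutoff are assumptions):

curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" \
  --data-binary "<delete><query>dateCreated:[* TO NOW/DAY-30DAYS]</query></delete>"

Documents that were re-inserted or updated recently keep a fresh date and survive the sweep.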

Mav



On 01/05/2012 15:31, "Bai Shen"  wrote:

>I'm running Nutch, so it's updating the documents, but I'm wanting to
>remove ones that are no longer available.  So in that case, there's no
>update possible.
>
>On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
>mav.p...@holidaylettings.co.uk> wrote:
>
>> Not sure if there is an automatic way but we do it via a delete query
>>and
>> where possible we update doc under same id to avoid deletes.
>>
>>
>>
>>
>>
>> On 01/05/2012 13:43, "Bai Shen"  wrote:
>>
>> >What is the best method to remove old documents?  Things that now
>> >generate 404 errors, etc.
>> >404 errors, etc.
>> >
>> >Is there an automatic method or do I have to do it manually?
>> >
>> >THanks.
>>
>>



Logging from data-config.xml

2012-05-01 Thread Twomey, David


I'm getting this error (below) when doing an import.   I'd like to add a
Log line so I can see if the file path is messed up.

So my data-config.xml looks like below but I'm not getting any extra info
in the solr.log file under jetty.  Is there a way to log to this log file
from data-import.xml?

<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="medlineFileList"
            processor="FileListEntityProcessor"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null"
            baseDir="/index_files/pubmed/">

      <entity name="..."
              processor="XPathEntityProcessor"
              url="${medlineFileList.fileAblsolutePath}"
              forEach="/MedlineCitationSet/MedlineCitation"
              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
              logTemplate="   processing ${medlineFileList.fileAbsolutePath}"
              logLevel="info"
              stream="true">

        <field column="..." xpath="/MedlineCitationSet/MedlineCitation/PMID" commonField="true" />
  ...


Thanks.


INFO: Starting Full Import
May 1, 2012 10:34:29 AM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
May 1, 2012 10:34:29 AM org.apache.solr.common.SolrException log
SEVERE: Exception while processing: medlineFileList document :
null:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find
file: 
 at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(
DataImportHandlerException.java:64)
 at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEnti
tyProcessor.java:286)
 at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathE
ntityProcessor.java:224)
 at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntity
Processor.java:204)
 at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityPro
cessorWrapper.java:238)
 at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java
:591)
 at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java
:617)
 at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:26
7)
 at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
 at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.j
ava:353)
 at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:41
1)
 at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392
)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
Could not find file:
 at 
org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.ja
va:113)
 at 
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.ja
va:85)
 at 
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.ja
va:47)
 at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEnti
tyProcessor.java:283)
 ... 10 more
Caused by: java.io.FileNotFoundException: Could not find file:
 at 
org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.ja
va:111)
 ... 13 more



Re: Removing old documents

2012-05-01 Thread Markus Jelsma
Nutch 1.4 has a separate tool to remove 404 and redirect documents from your 
index based on your CrawlDB. Trunk's SolrIndexer can add and remove documents 
in one run based on segment data.

On Tuesday 01 May 2012 16:31:47 Bai Shen wrote:
> I'm running Nutch, so it's updating the documents, but I'm wanting to
> remove ones that are no longer available.  So in that case, there's no
> update possible.
> 
> On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
> 
> mav.p...@holidaylettings.co.uk> wrote:
> > Not sure if there is an automatic way but we do it via a delete query and
> > where possible we update doc under same id to avoid deletes.
> > 
> > On 01/05/2012 13:43, "Bai Shen"  wrote:
> > >What is the best method to remove old documents?  Things that now
> > >generate 404 errors, etc.
> > >
> > >Is there an automatic method or do I have to do it manually?
> > >
> > >THanks.

-- 
Markus Jelsma - CTO - Openindex


Re: extracting/indexing HTML via cURL

2012-05-01 Thread okayndc
Awesome, I'll give it a try.  Thanks Jack!

On Tue, May 1, 2012 at 10:23 AM, Jack Krupansky wrote:

> Sorry for the confusion. It is doable. If you feed the raw HTML into a
> field that has the HTMLStripCharFilter, the stored value will retain the
> HTML tags, while the indexed text will be stripped of the tags
> during analysis and be searchable just like a normal text field. Then,
> search will not see "".
>
>
> -- Jack Krupansky
>
> -Original Message- From: okayndc
> Sent: Tuesday, May 01, 2012 10:08 AM
> To: solr-user@lucene.apache.org
> Subject: Re: extracting/indexing HTML via cURL
>
>
> Thank you Jack.
>
> So, it's not doable/possible to search and highlight keywords within a
> field that contains the raw formatted HTML?  and strip out the HTML tags
> during analysis...so that a user would get back nothing if they did a
> search for (ex. )?
>
> On Mon, Apr 30, 2012 at 5:17 PM, Jack Krupansky wrote:
>
>  I was thinking that you wanted to index the actual text from the HTML
>> page, but have the stored field value still have the raw HTML with tags.
>> If
>> you just want to store only the raw HTML, a simple string field is
>> sufficient, but then you can't easily do a text search on it.
>>
>> Or, you can have two fields, one string field for the raw HTML (stored,
>> but not indexed) and then do a CopyField to a text field field that has
>> the
>> HTMLStripCharFilter to strip the HTML tags and index only the text
>> (indexed, but not stored.)
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: okayndc
>> Sent: Monday, April 30, 2012 5:06 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr: extracting/indexing HTML via cURL
>>
>> Great, thank you for the input.  My understanding of HTMLStripCharFilter is
>> that it strips HTML tags, which is not what I want ~ is this correct?  I
>> want to keep the HTML tags intact.
>>
>> On Mon, Apr 30, 2012 at 11:55 AM, Jack Krupansky wrote:
>>
>>
>>  If by "extracting HTML content via cURL" you mean using SolrCell to parse
>>
>>> html files, this seems to make sense. The sequence is that regardless of
>>> the file type, each file extraction "parser" will strip off all formatting
>>> and produce a raw text stream. Office, PDF, and HTML files are all treated
>>> the same in that way. Then, the unformatted text stream is sent through the
>>> field type analyzers to be tokenized into terms that Lucene can index. The
>>> input string to the field type analyzer is what gets stored for the field,
>>> but this occurs after the extraction file parser has already removed
>>> formatting.
>>>
>>> No way for the formatting to be preserved in that case, other than to go
>>> back to the original input document before extraction parsing.
>>>
>>> If you really do want to preserve full HTML formatted text, you would need
>>> to define a field whose field type uses the HTMLStripCharFilter and then
>>> directly add documents that direct the raw HTML to that field.
>>>
>>> There may be some other way to hook into the update processing chain, but
>>> that may be too much effort compared to the HTML strip filter.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: okayndc
>>> Sent: Monday, April 30, 2012 10:07 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Solr: extracting/indexing HTML via cURL
>>>
>>>
>>> Hello,
>>>
>>> Over the weekend I experimented with extracting HTML content via cURL and
>>> just
>>> wondering why the extraction/indexing process does not include the HTML
>>> tags.
>>> It seems as though the HTML tags either being ignored or stripped
>>> somewhere
>>> in the pipeline.
>>> If this is the case, is it possible to include the HTML tags, as I would
>>> like to keep the
>>> formatted HTML intact?
>>>
>>> Any help is greatly appreciated.
>>>
>>>
>>>
>>
>


Re: Removing old documents

2012-05-01 Thread Bai Shen
I'm running Nutch, so it's updating the documents, but I'm wanting to
remove ones that are no longer available.  So in that case, there's no
update possible.

On Tue, May 1, 2012 at 8:47 AM, mav.p...@holidaylettings.co.uk <
mav.p...@holidaylettings.co.uk> wrote:

> Not sure if there is an automatic way but we do it via a delete query and
> where possible we update doc under same id to avoid deletes.
>
>
>
>
>
> On 01/05/2012 13:43, "Bai Shen"  wrote:
>
> >What is the best method to remove old documents?  Things that now generate
> >404 errors, etc.
> >
> >Is there an automatic method or do I have to do it manually?
> >
> >THanks.
>
>


Re: extracting/indexing HTML via cURL

2012-05-01 Thread Jack Krupansky
Sorry for the confusion. It is doable. If you feed the raw HTML into a field 
that has the HTMLStripCharFilter, the stored value will retain the HTML 
tags, while the indexed text will be stripped of the tags during 
analysis and be searchable just like a normal text field. Then, search will 
not see "".
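
A sketch of such a field type (the type and field names are assumptions):

<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="body_html" type="text_html" indexed="true" stored="true"/>

The stored value is captured before the char filter runs, so the original markup survives while queries match only the visible text.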


-- Jack Krupansky

-Original Message- 
From: okayndc

Sent: Tuesday, May 01, 2012 10:08 AM
To: solr-user@lucene.apache.org
Subject: Re: extracting/indexing HTML via cURL

Thank you Jack.

So, it's not doable/possible to search and highlight keywords within a
field that contains the raw formatted HTML?  and strip out the HTML tags
during analysis...so that a user would get back nothing if they did a
search for (ex. )?

On Mon, Apr 30, 2012 at 5:17 PM, Jack Krupansky wrote:



I was thinking that you wanted to index the actual text from the HTML
page, but have the stored field value still have the raw HTML with tags. If
you just want to store only the raw HTML, a simple string field is
sufficient, but then you can't easily do a text search on it.

Or, you can have two fields, one string field for the raw HTML (stored,
but not indexed) and then do a CopyField to a text field that has the
HTMLStripCharFilter to strip the HTML tags and index only the text
(indexed, but not stored.)

-- Jack Krupansky

-Original Message- From: okayndc
Sent: Monday, April 30, 2012 5:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr: extracting/indexing HTML via cURL

Great, thank you for the input.  My understanding of HTMLStripCharFilter is
that it strips HTML tags, which is not what I want ~ is this correct?  I
want to keep the HTML tags intact.

On Mon, Apr 30, 2012 at 11:55 AM, Jack Krupansky wrote:

 If by "extracting HTML content via cURL" you mean using SolrCell to parse

html files, this seems to make sense. The sequence is that regardless of
the file type, each file extraction "parser" will strip off all formatting
and produce a raw text stream. Office, PDF, and HTML files are all treated
the same in that way. Then, the unformatted text stream is sent through the
field type analyzers to be tokenized into terms that Lucene can index. The
input string to the field type analyzer is what gets stored for the field,
but this occurs after the extraction file parser has already removed
formatting.

No way for the formatting to be preserved in that case, other than to go
back to the original input document before extraction parsing.

If you really do want to preserve full HTML formatted text, you would need
to define a field whose field type uses the HTMLStripCharFilter and then
directly add documents that direct the raw HTML to that field.

There may be some other way to hook into the update processing chain, but
that may be too much effort compared to the HTML strip filter.

-- Jack Krupansky

-Original Message- From: okayndc
Sent: Monday, April 30, 2012 10:07 AM
To: solr-user@lucene.apache.org
Subject: Solr: extracting/indexing HTML via cURL


Hello,

Over the weekend I experimented with extracting HTML content via cURL and
just
wondering why the extraction/indexing process does not include the HTML
tags.
It seems as though the HTML tags either being ignored or stripped
somewhere
in the pipeline.
If this is the case, is it possible to include the HTML tags, as I would
like to keep the
formatted HTML intact?

Any help is greatly appreciated.








Re: Solr Merge during off peak times

2012-05-01 Thread Otis Gospodnetic
Hi Prabhu,

I don't think such a merge policy exists, but it would be nice to have this 
option and I imagine it wouldn't be hard to write if you really just base the 
merge or no merge decision on the time of day (and maybe day of the week).

Note that this should go into Lucene, not Solr, so if you decide to contribute 
your work, please see http://wiki.apache.org/lucene-java/HowToContribute
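
For illustration, the gate could be as small as this (a hypothetical subclass against the Lucene 3.x API, untested):

import java.io.IOException;
import java.util.Calendar;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.TieredMergePolicy;

public class OffPeakMergePolicy extends TieredMergePolicy {
  // Assumed off-peak window: 22:00 to 06:00 local time.
  private boolean offPeak() {
    int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
    return hour >= 22 || hour < 6;
  }

  @Override
  public MergeSpecification findMerges(SegmentInfos infos) throws IOException {
    // Returning null defers natural merges; they are reconsidered on the next flush/commit.
    return offPeak() ? super.findMerges(infos) : null;
  }
}

Note that explicit optimize calls go through findMergesForOptimize, so forced merges would still run at any hour unless that method is gated too.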

Otis

Performance Monitoring for Solr - http://sematext.com/spm




>
> From: "Prakashganesh, Prabhu" 
>To: "solr-user@lucene.apache.org"  
>Sent: Tuesday, May 1, 2012 8:45 AM
>Subject: Solr Merge during off peak times
> 
>Hi,
>  I would like to know if there is a way to configure index merge policy in 
>solr so that the merging happens during off peak hours. Can you please let me 
>know if such a merge policy configuration exists?
>
>Thanks
>Prabhu
>
>
>

Re: extracting/indexing HTML via cURL

2012-05-01 Thread okayndc
Thank you Jack.

So, it's not doable/possible to search and highlight keywords within a
field that contains the raw formatted HTML?  and strip out the HTML tags
during analysis...so that a user would get back nothing if they did a
search for (ex. )?

On Mon, Apr 30, 2012 at 5:17 PM, Jack Krupansky wrote:

> I was thinking that you wanted to index the actual text from the HTML
> page, but have the stored field value still have the raw HTML with tags. If
> you just want to store only the raw HTML, a simple string field is
> sufficient, but then you can't easily do a text search on it.
>
> Or, you can have two fields, one string field for the raw HTML (stored,
> but not indexed) and then do a CopyField to a text field that has the
> HTMLStripCharFilter to strip the HTML tags and index only the text
> (indexed, but not stored.)
>
> -- Jack Krupansky
>
> -Original Message- From: okayndc
> Sent: Monday, April 30, 2012 5:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr: extracting/indexing HTML via cURL
>
> Great, thank you for the input.  My understanding of HTMLStripCharFilter is
> that it strips HTML tags, which is not what I want ~ is this correct?  I
> want to keep the HTML tags intact.
>
> On Mon, Apr 30, 2012 at 11:55 AM, Jack Krupansky wrote:
>
>  If by "extracting HTML content via cURL" you mean using SolrCell to parse
>> html files, this seems to make sense. The sequence is that regardless of
>> the file type, each file extraction "parser" will strip off all formatting
>> and produce a raw text stream. Office, PDF, and HTML files are all treated
>> the same in that way. Then, the unformatted text stream is sent through
>> the
>> field type analyzers to be tokenized into terms that Lucene can index. The
>> input string to the field type analyzer is what gets stored for the field,
>> but this occurs after the extraction file parser has already removed
>> formatting.
>>
>> No way for the formatting to be preserved in that case, other than to go
>> back to the original input document before extraction parsing.
>>
>> If you really do want to preserve full HTML formatted text, you would need
>> to define a field whose field type uses the HTMLStripCharFilter and then
>> directly add documents that direct the raw HTML to that field.
>>
>> There may be some other way to hook into the update processing chain, but
>> that may be too much effort compared to the HTML strip filter.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: okayndc
>> Sent: Monday, April 30, 2012 10:07 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr: extracting/indexing HTML via cURL
>>
>>
>> Hello,
>>
>> Over the weekend I experimented with extracting HTML content via cURL and
>> just
>> wondering why the extraction/indexing process does not include the HTML
>> tags.
>> It seems as though the HTML tags either being ignored or stripped
>> somewhere
>> in the pipeline.
>> If this is the case, is it possible to include the HTML tags, as I would
>> like to keep the
>> formatted HTML intact?
>>
>> Any help is greatly appreciated.
>>
>>
>


Grouping ngroups count

2012-05-01 Thread Francois Perron
Hello all,

  I tried to use grouping with 2 slices with an index of 35K documents.  When I 
ask for the top 10 rows, grouped by field A, it gave me about 16K groups.  But, if I 
ask for the top 20K rows, the ngroups property is now at 30K.  

Do you know why and of course how to fix it ?

Thanks.

Re: post.jar failing

2012-05-01 Thread Jack Krupansky
Please clarify the problem, because the error message you provide refers to 
address data that is not in the input data that you provide. It doesn't 
match!


The error refers to an "edu" element, but the input data uses a "poff" 
element. Maybe you have multiple "SP2514N" documents; maybe somebody made a 
copy of the original and edited the address_xml field value. And maybe that 
edited version that has an "edu" element has some obvious error.


In short, show us the full actual input address_xml field element, but 
preferably the entire Solr input document for the version of the SP2514N 
document that actually generates the error.


-- Jack Krupansky

-Original Message- 
From: William Bell

Sent: Monday, April 30, 2012 4:18 PM
To: solr-user@lucene.apache.org
Subject: post.jar failing

I am getting a post.jar failure when trying to post the following
CDATA field... It used to work on older versions. This is in SOlr 3.6.



<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
  <field name="manu">Samsung Electronics Co. Ltd.</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
  <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
  <field name="price">92</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="address_xml"><![CDATA[ ... ]]></field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
  <field name="store">35.0752,-97.032</field>
</doc>
</add>




Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error 
adding

field 'address_xml'='
   
   MEDSCH
   
   UNIVERSITY OF COLORADO SCHOOL OF MEDICINE
   1974
   MD
   
   
'


--
Bill Bell
billnb...@gmail.com
cell 720-256-8076 



Re: should slave replication be turned off / on during master clean and re-index?

2012-05-01 Thread geeky2
hello shawn,

thanks for the reply.

ok - i did some testing and yes you are correct.  

autocommit is doing the "commit" work in chunks. yes - the slaves are also
going to go from having everything to nothing, then slowly building back up again,
lagging behind the master.

... and yes - this is probably not what we need - as far as a replication
strategy for the slaves.

you said, you don't use autocommit.  if so - then why don't you use / like
autocommit?

since we have not done this here - there is no established reference point,
from an operations perspective.

i am looking to formulate some sort of operation strategy, so ANY ideas or
input is really welcome.



it seems to me that we have to account for two operational strategies - 

the first operational mode is a "daily" append to the solr core after the
database tables have been updated.  this can probably be done with a simple
delta import.  i would think that autocommit could remain on for the master
and replication could also be left on so the slaves picked up the changes
ASAP.  this seems like the mode that we would / should be in most of the
time.


the second operational mode would be a "build from scratch" mode, where
changes in the schema necessitated a full re-index of the data.  given that
our site (powered by solr) must be up all of the time, and that our full
index time on the master (for the moment) is hovering somewhere around 16
hours - it makes sense that some sort of parallel path - with a cut-over,
must be used.

in this situation is it possible to have the indexing process going on in
the background - then have one commit at the end - then turn replication on
for the slaves?

are there disadvantages to this approach?

also - i really like your suggestion of a "build core" and "live core".  is
this the approach you use?

thank you for all of the great input






--
View this message in context: 
http://lucene.472066.n3.nabble.com/should-slave-replication-be-turned-off-on-during-master-clean-and-re-index-tp3945531p3952904.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Removing old documents

2012-05-01 Thread mav.p...@holidaylettings.co.uk
Not sure if there is an automatic way but we do it via a delete query and
where possible we update doc under same id to avoid deletes.





On 01/05/2012 13:43, "Bai Shen"  wrote:

>What is the best method to remove old documents?  Things that now generate
>404 errors, etc.
>
>Is there an automatic method or do I have to do it manually?
>
>THanks.



Re: correct XPATH syntax

2012-05-01 Thread lboutros
Hi David,

I think you should add this option : flatten=true

and then could you try to use this XPath :

/MedlineCitationSet/MedlineCitation/AuthorList/Author

see here for the description:

http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

I don't think that the commonField option is needed here; I think you
should remove it.
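
As a sketch, assuming your entity already iterates over the citations (the
entity name and url are just placeholders for what you have in your
data-config.xml):

<entity name="citation"
        processor="XPathEntityProcessor"
        forEach="/MedlineCitationSet/MedlineCitation"
        url="medline.xml">
  <field column="author"
         xpath="/MedlineCitationSet/MedlineCitation/AuthorList/Author"
         flatten="true"/>
</entity>

With flatten="true", the text of all the child nodes of each Author element
is concatenated into the field value.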

Ludovic. 

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3952812.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CJKBigram filter questions: single character queries, bigrams created across script/character types

2012-05-01 Thread Lance Norskog
I've no experience in the language nuances. I've found that I had to
mix unigram phrase searches with free-text searches in bigram fields.
This is for Chinese, not Japanese. The bigram idea apparently comes
about because Chinese characters tend to be clumped into 2-3 character
"words", in a way that is not consistent across different kinds of
text. I have no pretense of understanding the whys.
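
As a rough sketch of what I mean (the field names are made up - one field
analyzed into bigrams, the other into unigrams):

q=body_bigram:(中文分词) OR body_unigram:"中文分词"

i.e. a free-text search against the bigram field OR'ed with a phrase search
against the unigram field.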



On Mon, Apr 30, 2012 at 2:21 PM, Burton-West, Tom  wrote:
> Thanks wunder,
>
> I really appreciate the help.
>
> Tom
>



-- 
Lance Norskog
goks...@gmail.com


Re: core sleep/wake

2012-05-01 Thread Erick Erickson
Well, that'll be kinda self-defeating. The whole point of auto-warming
is to fill up the caches, consuming memory. Without that, searches
will be slow. So the idea of using minimal resources is really
antithetical to having these in-memory structures filled up.

You can try configuring minimal caches, etc. Or just give it
lots of memory and count on your OS to swap the pages out
if a particular core doesn't get used.
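
If you do go the minimal-cache route, the entries in solrconfig.xml would
look something like this (the sizes here are just an illustration):

<filterCache class="solr.FastLRUCache" size="16" initialSize="4" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="16" initialSize="4" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="16" initialSize="4"/>

autowarmCount="0" means nothing gets re-warmed when a new searcher opens.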

Best
Erick

On Mon, Apr 30, 2012 at 5:18 PM, oferiko  wrote:
> I have a multicore solr setup with a lot of cores that contain a lot of data (~50M
> documents) but are rarely used.
> Can i load a core from configuration but keep it in a sleep mode, where
> it has all the configuration available but hardly consumes resources,
> and, based on a query or an update, it will "come to life"?
> Thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/core-sleep-wake-tp3951850.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: post.jar failing

2012-05-01 Thread Erick Erickson
Works fine for me on 3.6 with address_xml as a string type, indexed and
stored. What version of Solr are you using?
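
One thing worth checking: if the raw address markup sits unescaped inside
the <field> element, the posted XML itself is malformed. Wrapping the value
in CDATA keeps it well-formed - a sketch (the inner markup is just a
placeholder):

<field name="address_xml"><![CDATA[
  <address>...your address elements...</address>
]]></field>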

Best
Erick

On Mon, Apr 30, 2012 at 4:18 PM, William Bell  wrote:
> I am getting a post.jar failure when trying to post the following
> CDATA field... It used to work on older versions. This is in Solr 3.6.
>
> <add>
> <doc>
>   <field name="id">SP2514N</field>
>   <field name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</field>
>   <field name="manu">Samsung Electronics Co. Ltd.</field>
>   <field name="cat">electronics</field>
>   <field name="cat">hard drive</field>
>   <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
>   <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
>   <field name="price">92</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
>
>   <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
>
>   <field name="store">35.0752,-97.032</field>
> </doc>
>
> </add>
>
> Apr 30, 2012 1:53:49 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=SP2514N] Error adding field 'address_xml'='
>    
>        MEDSCH
>        
>            UNIVERSITY OF COLORADO SCHOOL OF MEDICINE
>            1974
>            MD
>        
>    
> '
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: Newbie question on sorting

2012-05-01 Thread Erick Erickson
The easiest way is to do that in the app. That is, return the top
10 to the app (by score), then re-order them there. There's nothing
in Solr that I know of that does what you want out of the box.
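
In other words, something like this (the query itself is just an example):

q=your_query&sort=score desc&rows=10&fl=*,score

and then re-sort those 10 documents on the creationdate field in your
client code before displaying them.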

Best
Erick

On Mon, Apr 30, 2012 at 11:10 AM, Jacek  wrote:
> Hello all,
>
> I'm facing a simple problem that I haven't been able to resolve (I'm a
> Solr newbie).
> I need to sort the results by score (that part is simple, of course), but
> then I need to take the top 10 results and re-order them (only those top
> 10 results) by a date field.
> It's not the same as sort=score,creationdate
>
> Any suggestions will be greatly appreciated!