RE: Is Solr right for my business situation ?

2010-09-27 Thread Jonathan Rochkind
"Staging" the data in a non-Solr store sounds like a potentially reasonable 
idea to me. You might want to consider a NoSQL store of some kind, like 
MongoDB perhaps, instead of an RDBMS. 

The way to think about Solr is not as a store or a database -- it's an index 
for serving your application. That's also the way to think about how to get 
your multiple tables in there: denormalize, denormalize, denormalize. You 
need to think about what you actually need to search over, and build your index 
to serve that efficiently, rather than thinking about normalization or data 
modelling the way we are used to with RDBMSs. It's a different way of 
thinking.  
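A minimal Python sketch of "denormalize" under made-up assumptions: two hypothetical tables (`AUTHORS`, `BOOKS`) are joined once, at index time, so each search document already carries every field it will be searched or displayed on. It only builds plain dicts; sending them to Solr is out of scope.

```python
from typing import Dict, List

# Hypothetical normalized tables, standing in for two RDBMS tables.
AUTHORS: Dict[int, Dict[str, str]] = {
    1: {"name": "Ann Lee", "country": "US"},
    2: {"name": "Bo Chen", "country": "CA"},
}
BOOKS: List[Dict] = [
    {"book_id": 10, "title": "Index Design", "author_id": 1},
    {"book_id": 11, "title": "Query Basics", "author_id": 2},
    {"book_id": 12, "title": "Faceting Notes", "author_id": 1},
]


def denormalize(books: List[Dict], authors: Dict[int, Dict[str, str]]) -> List[Dict[str, str]]:
    """Do the join once, at index time: one flat document per book.

    Author fields are copied into every book document, so searching on
    author_name needs no join at query time (at the cost of duplication).
    """
    docs = []
    for book in books:
        author = authors[book["author_id"]]
        docs.append({
            "id": f"book-{book['book_id']}",  # unique key for the index
            "title": book["title"],
            "author_name": author["name"],
            "author_country": author["country"],
        })
    return docs


for doc in denormalize(BOOKS, AUTHORS):
    print(doc)
```

The cost of this design is duplication: if an author's details change, every document that copied them has to be reindexed.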

A Solr index basically gives you one collection of documents. But the documents 
can all have different fields -- so you _could_ (but probably don't want to) 
essentially put all your tables in there with unique fields. They're all in 
the same index, they're all just "documents", but some have a table1_title and 
table1_author, and others have no data in those fields but a table2_productName 
and a table2_price. Then if you want to query on just one type of thing, you 
just query on those fields. Except... you don't get any joins. Which is why 
you probably don't want to do that after all; it probably won't serve your 
needs. 
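To make the "no joins" point concrete, here is a toy Python model of one index holding documents from two tables, using the table1_/table2_ field names from the paragraph above. `INDEX` and `matches` are hypothetical stand-ins for a Solr index and a field query, not Solr APIs.

```python
from typing import Dict, List

# One "index" holding documents that came from two different source tables.
INDEX: List[Dict[str, str]] = [
    {"id": "t1-1", "table1_title": "Index Design", "table1_author": "Kim"},
    {"id": "t1-2", "table1_title": "Query Basics", "table1_author": "Lee"},
    {"id": "t2-1", "table2_productName": "Query Basics", "table2_price": "19.99"},
]


def matches(field: str, value: str) -> List[Dict[str, str]]:
    """Documents whose `field` equals `value`; docs lacking the field never match."""
    return [doc for doc in INDEX if doc.get(field) == value]


# Querying one "type" of thing is just querying that type's fields.
lee_books = matches("table1_author", "Lee")

# "What does Lee's book cost?" needs a second query stitched together by the
# application, because the index itself cannot join table1 docs to table2 docs.
prices = [
    product["table2_price"]
    for book in lee_books
    for product in matches("table2_productName", book["table1_title"])
]
print(prices)  # ['19.99']
```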

Figuring out the right way to model your data in Solr can be tricky, and it is 
sometimes hard to do exactly what you want. Solr isn't an RDBMS, and in some 
ways it isn't as powerful as an RDBMS, in the sense of being as flexible about 
what kinds of queries you can run on any given data. What it does give you is 
very fast inverted-index lookups, set combinations, and faceting, all of which 
would be very hard to do efficiently in an RDBMS. It is a trade-off. But 
there's not really a general answer to "how do I take these dozen RDBMS tables 
and store them in Solr the best way?" -- it depends on what kinds of searching 
you need to support and the nature of your data. 
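As a rough illustration of inverted-index lookups, set combinations and faceting, the toy Python sketch below maps each term to the set of documents containing it, answers an AND query by intersecting those sets, and computes a facet by counting a field's values over the hits. Lucene uses compressed postings lists and field caches rather than Python sets; the documents and field names are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set

DOCS: Dict[int, Dict[str, str]] = {
    1: {"text": "red wool sweater", "category": "clothing"},
    2: {"text": "red coffee mug", "category": "kitchen"},
    3: {"text": "blue wool scarf", "category": "clothing"},
    4: {"text": "red wool scarf", "category": "clothing"},
}

# Build the inverted index once: term -> set of ids of documents containing it.
inverted: Dict[str, Set[int]] = defaultdict(set)
for doc_id, doc in DOCS.items():
    for term in doc["text"].split():
        inverted[term].add(doc_id)


def search_all(terms: List[str]) -> Set[int]:
    """AND query: intersect the posting sets of every term."""
    postings = [inverted.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def facet(doc_ids: Set[int], field: str) -> Counter:
    """Facet: count the values of `field` across the matching documents."""
    return Counter(DOCS[d][field] for d in doc_ids)


hits = search_all(["red", "wool"])
print(sorted(hits))             # [1, 4]
print(facet(hits, "category"))  # Counter({'clothing': 2})
```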

From: Sharma, Raghvendra [sraghven...@corelogic.com]
Sent: Tuesday, September 28, 2010 2:15 AM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?


RE: Is Solr right for my business situation ?

2010-09-27 Thread Sharma, Raghvendra
Thanks for the responses, people.

@Grant  

1. Can you point me in some direction on that? Loading data from an incoming 
stream: do I need some third-party tools, or do I need to build something myself?

4. I am basically attempting to build a very fast search interface for the 
existing data. The volume I mentioned is mostly static (the data is already 
there). The SQL statements I mentioned are the daily updates coming in. The good 
thing is that there is no history, so the overall volume is not growing, but I 
need to apply the update statements. 

One workaround I had in mind (though performance is not so great) is to apply 
the updates to a copy of the RDBMS, and then feed the RDBMS extract to Solr. 
It sounds like overkill, but I don't have another idea right now. Perhaps business 
discussions would yield something.
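For the incoming-stream question and the daily updates, one possible shape, sketched in Python under stated assumptions: a consumer drains a queue of change records and turns them into Solr XML update messages, grouping consecutive inserts/updates into one `<add>` (which replaces any existing document with the same unique key) and emitting `<delete><id>` for removed rows. `queue.Queue` stands in for a real MQ, the record layout (`op`, `id`, `fields`) is hypothetical, and POSTing each message to the /update handler and sending a `<commit/>` are left out.

```python
import queue
import xml.etree.ElementTree as ET
from typing import Dict, List


def add_message(docs: List[Dict[str, str]]) -> str:
    """Build one Solr XML <add> message holding several documents."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


def delete_message(doc_id: str) -> str:
    """Build a Solr XML delete-by-id message."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def drain(changes: queue.Queue, batch_size: int = 100) -> List[str]:
    """Turn queued change records into update messages, preserving their order.

    Consecutive upserts share one <add> (at most batch_size docs); a delete
    flushes pending adds first so an add/delete of the same id stays ordered.
    """
    messages: List[str] = []
    pending: List[Dict[str, str]] = []

    def flush() -> None:
        if pending:
            messages.append(add_message(pending))
            pending.clear()

    while True:
        try:
            change = changes.get_nowait()
        except queue.Empty:
            break
        if change["op"] == "delete":
            flush()
            messages.append(delete_message(change["id"]))
        else:  # insert or update: <add> overwrites a doc with the same key
            pending.append({"id": change["id"], **change["fields"]})
            if len(pending) >= batch_size:
                flush()
    flush()
    return messages


if __name__ == "__main__":
    updates: queue.Queue = queue.Queue()
    updates.put({"op": "upsert", "id": "1", "fields": {"city": "San Jose"}})
    updates.put({"op": "upsert", "id": "2", "fields": {"city": "Clark"}})
    updates.put({"op": "delete", "id": "3"})
    for message in drain(updates):
        print(message)
```

Batching matters mostly for throughput: sending many documents per request is usually much cheaper than one request per row.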

@All -

Some more questions, guys.  

1. I have about 3-5 tables. Designing schema.xml for a single table looks 
OK, but I am not sure about the direction for handling multiple table 
structures. Would it be one big XML, wherein those three tables (assuming it's 
three) would show up as three different tag trees, nullable? 

My source provides me a single flat file per table (tab-delimited).

2. Further, loading into Solr could use some performance tuning. Any tips? Best 
practices?

3. Also, is there a way to specify an XSLT on the server side and make it the 
default, i.e. whenever a response is returned, that XSLT is applied to the 
response automatically?

4. And the last question for the day :) There was one post saying that the 
spatial support is really basic in Solr and is going to be improved in the next 
versions. Can you help me get a definitive yes or no on spatial 
support: in its current form, does it work or not? I would store lat and 
long, and would need to make them searchable.

Looks like I'm close to my solution. :)

--raghav

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Tuesday, September 28, 2010 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Is Solr right for my business situation ?

Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

> When do you need to deploy?
> 
> As I understand it, the spatial search in Solr is being rewritten and is 
> slated for Solr 4.0, the release after next.

It will be in 3.x, the next release

> 
> The existing spatial search has some serious problems and is deprecated.
> 
> Right now, I think the only way to get spatial search in Solr is to deploy a 
> nightly snapshot from the active development on trunk. If you are deploying a 
> year from now, that might change.
> 
> There is not any support for SQL-like statements or for joins. The best 
> practice for Solr is to think of your data as a single table, essentially 
> creating a view from your database. The rows become Solr documents, the 
> columns become Solr fields.
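Walter's rule of thumb (rows become documents, columns become fields) applied to the tab-delimited flat files mentioned earlier in the thread, as a small Python sketch. The column names are made up; prefixing the id with the table name is an assumption so that keys from different tables cannot collide if they share one index, and empty cells are skipped instead of being indexed as empty strings.

```python
import csv
import io
from typing import Dict, Iterator


def rows_to_docs(tsv_text: str, table: str, key_column: str) -> Iterator[Dict[str, str]]:
    """Yield one flat document per row; every non-empty column becomes a field."""
    # QUOTE_NONE: plain tab-delimited exports rarely quote values, and a stray
    # quote character should not swallow the rest of the line.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        doc = {"id": f"{table}-{row[key_column]}", "table": table}
        for column, value in row.items():
            if column != key_column and value:
                doc[column] = value
        yield doc


SAMPLE = "parcel_id\tcity\towner\n101\tSan Jose\tA. Smith\n102\tClark\t\n"

for doc in rows_to_docs(SAMPLE, "parcels", "parcel_id"):
    print(doc)
```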

There are now group-by capabilities in trunk as well, which may or may not help.

> 
> wunder
> 
> On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
> 
>> I am sure these kinds of questions keep coming to you guys, but I want to 
>> raise the same question in a different context: my own business situation.
>> I am very, very new to Solr, and though I have tried to read through the 
>> documentation, I am nowhere near finishing it.
>> 
>> The need is like this: 
>> 
>> We have a huge RDBMS database/table. A single table perhaps houses 100+ 
>> million rows. Though Oracle is doing a fine job of handling the insertion 
>> and updating of data, the querying is where our main concerns lie. Since we 
>> have spatial data, the index building takes hours and hours for such tables.
>> 
>> That's when we thought of moving away from a standard RDBMS and trying 
>> something different and fast. 
>> My last week has been spent in a journey reading through Bigtable to Hadoop 
>> to HBase, to Hive, and then I finally landed on Solr. As far as I am in my 
>> tests, it looks pretty good, but I still have a few unanswered questions. 
>> Trying this group for them :) (I am sure I can find some answers if I 
>> read/google more on the topic, but right now I'm being lazy and feel that 
>> asking the people who are already using it, or perhaps developing it, is a 
>> better bet.)
>> 
>> 1. Can I get my Solr instance to load data (fresh data for indexing) from a 
>> stream (imagine an MQ kind of queue, or similar)?

Yes, with a little bit of work.

>> 2. Can I host my Solr instance to use HBase as the database/file system 
>> (read HDFS)?

Probably, but I doubt it will be fast.  Local disk is usually the best.  100+ M 
rows is large but not unreasonable.

>> 3. Are there any reports available somewhere (as in benchmarks) for a Solr 
>> instance's performance? 

You can probably search the web for these.  I've personally seen several 
installs w/ 1B+ docs and subsecond search and faceting and heard of others.  
You might look at the stuff the HathiTrust has put up.

Re: Solr UIMA integration

2010-09-27 Thread Tommaso Teofili
Hi Maheshkumar,
I attached a patch for the inclusion of this project as a Solr contrib module
[1]; there you can find the patch to apply to the Solr trunk along with the
needed jars (attached as a zip archive).
I think your issue could be related to the fact that the GC project
dependency is from Solr 1.4.1, not from trunk, so the patch should fix it.
Hope this helps,
Tommaso

[1] : https://issues.apache.org/jira/browse/SOLR-2129

2010/9/27 maheshkumar 

>
> Hi Tommaso,
>
> All UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator,
> Tagger, WhitespaceTokenizer) are 2.3.1-SNAPSHOT. All are checked out from svn:
>
> AlchemyAPIAnnotator:
> http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator
> OpenCalaisAnnotator:
> http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator
> Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger
> WhitespaceTokenizer:
> http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer
>
> solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima
>
> I am using the latest Solr version checked out from svn; I guess it is
> newer than 1.4.1.
>
> Tommaso, is it possible for you to upload all the dependency jars @
> http://code.google.com/p/solr-uima/downloads/list?
>
> Thanks
> Mahesh
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1587660.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Search Interface

2010-09-27 Thread Claudio Devecchi
Hi everybody,

I'm implementing my first Solr engine for conceptual tests. I'm crawling my
wiki intranet to make some searches; the engine is already working fine, but
I need some interface to run my searches.
Does anybody know where I can find a search interface that I could just
customize?

Tks
-- 
Claudio Devecchi
flickr.com/cdevecchi


Re: FieldType for storing date

2010-09-27 Thread Chris Hostetter

: I was wondering what would be the best FieldType for storing date with a 
: millisecond precision that would allow me to sort and run range queries 
: against this field. We would like to achieve the best query performance, 
: minimal heap - fieldcache - requirements, good indexing throughput and 
: minimal index size in that order.

if you don't need sortMissingLast or sortMissingFirst then TrieDateField 
should be exactly what you are looking for.

: We could probably use TrieLongField, however, as we understand, this 
: doubles the heap requirements for fieldcache. Was wondering if there is 
: a clever way of achieving this without adding to the heap.

TrieDateField uses the long[] FieldCache, I'm not sure what you mean by 
"doubles the heap requirements" ... unless you are comparing to "int" ?

In that case: using TrieIntField seems like what you want?

(but if you are comparing to DateField, the FieldCache for TrieDateField 
is going to be a lot smaller)
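Back-of-the-envelope numbers behind the int-vs-long question, as a Python sketch. It assumes the FieldCache holds one primitive per document (4 bytes per int, 8 per long) and ignores array overhead; 100 million documents is only an example size. It also checks whether epoch milliseconds fit in a 32-bit int at all: they do not, so an int field would have to store seconds or a coarser unit, giving up the millisecond precision asked for.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1


def fieldcache_mb(num_docs: int, bytes_per_value: int) -> float:
    """Approximate FieldCache array size in MB: one primitive per document."""
    return num_docs * bytes_per_value / (1024 * 1024)


docs = 100_000_000
print(f"long[]: {fieldcache_mb(docs, 8):.0f} MB")  # long[]: 763 MB
print(f"int[]:  {fieldcache_mb(docs, 4):.0f} MB")  # int[]:  381 MB

# Does a timestamp fit in a 32-bit int?
moment = datetime(2010, 9, 27, tzinfo=timezone.utc)
millis = int(moment.timestamp() * 1000)
seconds = int(moment.timestamp())
print(millis > INT32_MAX)   # True: epoch milliseconds need a long
print(seconds > INT32_MAX)  # False: epoch seconds fit (until January 2038)
```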


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Grouping in solr ?

2010-09-27 Thread Chris Hostetter
: References:
: 
: In-Reply-To:
: 
: Subject: Grouping in solr ?

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss




Re: Renaming Solr mbean

2010-09-27 Thread Chris Hostetter

: In our setup, we run several instances of Solr under one instance of 
: Tomcat.  I simply rename the WAR to something we use internally - 
: solr-people, solr-connections, solr-companies, etc.  This part works 
: fine and lets us have, use, and maintain individual instances.
...
: What I'm finding is every instance reporting to /solr, which skews my 
: queries.  Although I can somewhat predict which searcher is which, it's 
: not reliable enough to be able to associate statistics with our named 
: versions of our indices.

Take a look at the commit associated with this Jira issue...

https://issues.apache.org/jira/browse/SOLR-1843

It's available on the trunk, but is not yet available in a released 
version of solr.

http://svn.apache.org/viewvc?view=revision&revision=942292


-Hoss




Re: DIH ConcurrentModificationException

2010-09-27 Thread Reuben A Christie


  
  
Is this fixed in Solr 1.4.1? I have seen a ConcurrentModificationException
during search operations using EmbeddedSolrServer when testing with JMeter
with more than one concurrent user.

best,
Reuben

On 5/5/2009 2:25 AM, Shalin Shekhar Mangar wrote:

  This is fixed in trunk.

2009/5/5 Noble Paul നോബിള്‍ नोब्ळ् 


  
hi Walter,
it needs synchronization. I shall open a bug.
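What "needs synchronization" means, illustrated in Python rather than Java (this is not the DataImporter code): iterating a map while it is being modified fails, and taking a copy under the same lock that guards the writes avoids it. Python raises RuntimeError where Java throws ConcurrentModificationException; the `StatusMessages` class is hypothetical.

```python
import threading
from typing import Dict


class StatusMessages:
    """Status map shared by a writer (the import) and readers (status requests)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._messages: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        with self._lock:
            self._messages[key] = value

    def snapshot(self) -> Dict[str, str]:
        # Copy under the lock; callers iterate the copy, never the shared dict.
        with self._lock:
            return dict(self._messages)


def unsafe_iteration_fails() -> bool:
    """The failure mode: structurally modifying a dict while iterating it."""
    shared = {"a": "1"}
    try:
        for key in shared:
            shared[key + "x"] = "2"
    except RuntimeError:  # "dictionary changed size during iteration"
        return True
    return False


status = StatusMessages()


def writer(prefix: str) -> None:
    for n in range(1000):
        status.put(f"{prefix}-{n}", "ok")


threads = [threading.Thread(target=writer, args=(f"w{i}",)) for i in range(4)]
for t in threads:
    t.start()
for _ in range(100):
    for key in status.snapshot():  # safe while the writers are still running
        assert key.startswith("w")
for t in threads:
    t.join()

print(unsafe_iteration_fails())  # True
print(len(status.snapshot()))    # 4000
```

In Java the equivalent would be a synchronized block (or a lock) around both the writes and the copy that the status request iterates.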



On Mon, May 4, 2009 at 7:31 PM, Walter Ferrara 
wrote:


I've got a ConcurrentModificationException during a cron-scheduled delta import
of DIH; I'm using multicore Solr nightly from Hudson 2009-04-02_08-06-47.
I don't know if this stack trace may be useful to you, but here it is:

java.util.ConcurrentModificationException
   at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(Unknown Source)
   at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
   at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
   at org.apache.solr.handler.dataimport.DataImporter.getStatusMessages(DataImporter.java:384)
   at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:210)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
   at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
   at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
   at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Of course, due to the nature of this exception I doubt it can be reproduced
easily (this is the only one I've got, and the cron job has run many
times), but maybe a synchronized block should be added somewhere?
ciao,
Walter






--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


  
  







Re: Is Solr right for our project?

2010-09-27 Thread Jan Høydahl / Cominvent
Solr will match this in version 3.1 which is the next major release.
Read this page: http://wiki.apache.org/solr/SolrCloud for feature descriptions
Coming to a trunk near you - see https://issues.apache.org/jira/browse/SOLR-1873

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 27. sep. 2010, at 17.44, Mike Thomsen wrote:

> (I apologize in advance if I missed something in your documentation,
> but I've read through the Wiki on the subject of distributed searches
> and didn't find anything conclusive)
> 
> We are currently evaluating Solr and Autonomy. Solr is attractive due
> to its open source background, following and price. Autonomy is
> expensive, but we know for a fact that it can handle our distributed
> search requirements perfectly.
> 
> What we need to know is whether Solr has capabilities that match or roughly
> approximate Autonomy's Distributed Search Handler. What it does is act
> as a front-end for all of Autonomy's IDOL search servers (which
> correspond in this scenario to Solr shards). It is configured to know
> what is on each shard and which servers hold each shard, and it intelligently
> farms out queries based on that configuration. There is no need to
> specify which IDOL servers to hit while querying; the DiSH just knows
> where to go. Additionally, I believe that in cases where an index piece is
> mirrored, it also monitors server health and falls back intelligently
> on other backup instances of a shard/index piece based on that.
> 
> I'd appreciate it if someone can give me a frank explanation of where
> Solr stands in this area.
> 
> Thanks,
> 
> Mike
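The front-end behaviour Mike describes (a router that knows which servers hold each shard, sends the query to one live replica per shard, and falls back to a mirror when a server is down) can be sketched roughly as below. This is a hypothetical Python model, not Solr or Autonomy code; the host names and the health check are made up. In Solr 1.4 distributed search, the client itself lists the shards to query in the `shards` request parameter, which is the gap SolrCloud is meant to close.

```python
from typing import Callable, Dict, List

# Hypothetical cluster layout: shard name -> hosts holding a copy of that shard.
LAYOUT: Dict[str, List[str]] = {
    "shard1": ["search-a:8983", "search-b:8983"],
    "shard2": ["search-c:8983", "search-d:8983"],
}


def plan_query(layout: Dict[str, List[str]],
               is_healthy: Callable[[str], bool]) -> Dict[str, str]:
    """Pick one healthy replica per shard, preferring hosts in listed order.

    Raises RuntimeError if every copy of some shard is down, because the
    merged result would otherwise silently miss that part of the index.
    """
    plan: Dict[str, str] = {}
    for shard, hosts in layout.items():
        live = [h for h in hosts if is_healthy(h)]
        if not live:
            raise RuntimeError(f"no live replica for {shard}")
        plan[shard] = live[0]
    return plan


down = {"search-a:8983"}
plan = plan_query(LAYOUT, lambda host: host not in down)
print(plan)  # {'shard1': 'search-b:8983', 'shard2': 'search-c:8983'}

# Roughly the value one would pass in Solr's `shards` parameter.
print(",".join(f"{host}/solr" for host in plan.values()))
```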



Re: Need help with spellcheck city name

2010-09-27 Thread Savannah Beckett
No, I checked: there is a city called Swan in Iowa. So it is coming from the 
city index, and so is Clerk. But why does it favor Swan over San? Spellcheck 
gets weird after I treat city name as one token. If I do it the old way, it lets 
San go, and corrects Jos as Ojos instead of Jose because Ojos is ranked #1 and 
Jose is in the middle. Any more suggestions? Ranking by frequency first and then 
score doesn't work either.

From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 5:24:25 PM
Subject: Re: Need help with spellcheck city name


Re: Need help with spellcheck city name

2010-09-27 Thread Erick Erickson
Hmmm, did you rebuild your spelling index after the config changes?

And it really looks like somehow you're getting results from a field other
than city. Are you also sure that your cityname field is of type
autocomplete1?

Shooting in the dark here, but these results are so weird that I suspect
it's something fundamental.

Best
Erick

On Mon, Sep 27, 2010 at 8:05 PM, Savannah Beckett <
savannah_becket...@yahoo.com> wrote:


Re: Need help with spellcheck city name

2010-09-27 Thread Savannah Beckett
No, it doesn't work; I got a weird result. I set my city name field to be parsed 
as a single token as follows:

I got the following result for spellcheck:

  1  0  3  swan
  1  4  8  clark

From: Tom Hill 
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 3:52:48 PM
Subject: Re: Need help with spellcheck city name


Re: Need help with spellcheck city name

2010-09-27 Thread Tom Hill
Maybe process the city name as a single token?
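To see why a single token can help, here is a rough Python comparison using `difflib.get_close_matches` as a stand-in for a spellchecker's string distance. Solr's SpellCheckComponent uses its own distance measures and term frequencies, so this only illustrates the idea; the city list is made up. Checked word by word, "jos" is exactly as close to "ojos" as to "jose"; checked as a whole phrase, "san jos" clearly points at "san jose".

```python
import difflib
from typing import List

# Made-up dictionaries: one of individual words, one of whole city names.
CITY_NAMES = ["san jose", "san diego", "swan", "clark", "ojos negros"]
WORDS = sorted({w for name in CITY_NAMES for w in name.split()})


def suggest(term: str, vocabulary: List[str], n: int = 3) -> List[str]:
    """Closest vocabulary entries by difflib's similarity ratio, best first."""
    return difflib.get_close_matches(term, vocabulary, n=n, cutoff=0.6)


# Per-token checking: "jose" and "ojos" get the same similarity score.
print(suggest("jos", WORDS))
# Whole-name checking: the full phrase disambiguates.
print(suggest("san jos", CITY_NAMES, n=1))  # ['san jose']
```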

On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett
 wrote:


Need help with spellcheck city name

2010-09-27 Thread Savannah Beckett
Hi,
I have city name as a text field, and I want to do spellcheck on it. I use the 
settings in http://wiki.apache.org/solr/SpellCheckComponent

If I set up city name as a text field and spellcheck "San Jos" for San 
Jose, I get the suggestion "ojos" for Jos. I checked the extendedResults and 
found that Jose is in the middle of all 10 suggestions in terms of score and 
frequency. I then set city name as a string field and spellchecked again; I got 
Van for San and Ross for Jos, which is weird because San is correct.  


How do you set up the spellchecker to spellcheck city names? A city name can have 
multiple words.
Thanks.


  

DIH XML Entity Help (Newbie)

2010-09-27 Thread audev

I am trying to configure the data-config.xml using the XPathEntityProcessor
to index nested xml entities such as the following:

 
Drug
fentanyl sublingual spray
  
  
Other
questionnaire administration
  


The data-config.xml looks like this:

  



but it only indexes the first occurrence of  intervention_type_t and
intervention_name_t and they are placed as children of root entity instead
of being children of intervention.
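The original XML did not survive in the archive, so this sketch assumes a hypothetical layout (a `<study>` root containing repeated `<intervention>` elements, each with `<intervention_type>` and `<intervention_name>` children). It shows, in plain Python rather than DIH configuration, the flattening a Solr document needs: documents are flat, so every repeated nested value goes into a list-valued field (which would have to be multi-valued in schema.xml) rather than becoming a child of its parent element.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical record; element names are assumptions, values from the post.
SAMPLE = """
<study>
  <intervention>
    <intervention_type>Drug</intervention_type>
    <intervention_name>fentanyl sublingual spray</intervention_name>
  </intervention>
  <intervention>
    <intervention_type>Other</intervention_type>
    <intervention_name>questionnaire administration</intervention_name>
  </intervention>
</study>
"""


def flatten(xml_text: str) -> Dict[str, List[str]]:
    """Collect every occurrence of the nested values into list-valued fields."""
    root = ET.fromstring(xml_text.strip())
    doc: Dict[str, List[str]] = {"intervention_type_t": [], "intervention_name_t": []}
    for intervention in root.findall("intervention"):
        doc["intervention_type_t"].append(intervention.findtext("intervention_type", ""))
        doc["intervention_name_t"].append(intervention.findtext("intervention_name", ""))
    return doc


print(flatten(SAMPLE))
```

Once flattened, the only link between a type and its name is their position in the two lists; if that pairing must be searchable, one option is to index one document per intervention instead.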

I would appreciate your help!

Thanks in advance,

Aurelia
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-XML-Entity-Help-Newbie-tp1592723p1592723.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question Related to sorting on Date

2010-09-27 Thread Peter Sturge
Hi Ahson,

You'll really want to store an additional date field (make it a
TrieDateField type) that has only the date, and in the reverse order
from how you've shown it. You can still keep the one you've got, just
use it only for 'human viewing' rather than sorting.
Something like:
20080205  if your example is 5 Feb, or 20080502 for May 2nd.

This way, the parsing is most efficient, you won't have to do any
tricky parsing at sort time, and, when your index gets large, your
sorted searches will remain fast.
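A small Python sketch of that conversion, assuming the stored strings look like "5/2/2008 4:33:30 PM" and are month/day/year (swap %m and %d in the format if they are day/month). It produces both the compact 20080502 form described above, which sorts correctly as a string or an integer, and the ISO-8601 form with a trailing Z that Solr's date field types expect, truncated to midnight so only the date takes part in sorting.

```python
from datetime import datetime

# Assumed input format: month/day/year, 12-hour clock with AM/PM.
SOURCE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def sortable_day(raw: str) -> str:
    """'5/2/2008 4:33:30 PM' -> '20080502' (date only)."""
    return datetime.strptime(raw, SOURCE_FORMAT).strftime("%Y%m%d")


def solr_day(raw: str) -> str:
    """'5/2/2008 4:33:30 PM' -> '2008-05-02T00:00:00Z' (time dropped)."""
    day = datetime.strptime(raw, SOURCE_FORMAT).date()
    return f"{day.isoformat()}T00:00:00Z"


values = ["5/2/2008 4:33:30 PM", "12/25/2007 9:00:00 AM", "5/2/2008 8:15:00 AM"]
# Same-day entries keep their input order: the time plays no part in the key.
print(sorted(values, key=sortable_day))
print(sortable_day(values[0]), solr_day(values[0]))  # 20080502 2008-05-02T00:00:00Z
```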




On Mon, Sep 27, 2010 at 7:45 PM, Ahson Iqbal  wrote:
> hi all
>
> I have a question related to sorting a date field. I have a Date field that
> is indexed as a string and looks like "5/2/2008 4:33:30 PM". I want to sort
> on this field by date; the time does not matter. Any suggestions on how I
> could ignore the time part of this field and just sort on the date?
>
>
>


Re: Is Solr right for my business situation ?

2010-09-27 Thread PeterKerk

Ah, I totally overlooked that news: spatial search in 3.x! :-D :-D

Any idea yet when this will be released? 

Awesome to hear that it has been moved forward! :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592448.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is Solr right for my business situation ?

2010-09-27 Thread Dennis Gearon
Wow, that is a relief!

I was going to have to look at ElasticSearch instead.


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/27/10, Grant Ingersoll  wrote:

> From: Grant Ingersoll 
> Subject: Re: Is Solr right for my business situation ?
> To: solr-user@lucene.apache.org
> Date: Monday, September 27, 2010, 12:35 PM
> Inline.
>
> On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:
>
> > When do you need to deploy?
> > 
> > As I understand it, the spatial search in Solr is being rewritten and is
> > slated for Solr 4.0, the release after next.
>
> It will be in 3.x, the next release.
>
> > 
> > The existing spatial search has some serious problems and is deprecated.
> > 
> > Right now, I think the only way to get spatial search in Solr is to deploy
> > a nightly snapshot from the active development on trunk. If you are
> > deploying a year from now, that might change.
> > 
> > There is not any support for SQL-like statements or for joins. The best
> > practice for Solr is to think of your data as a single table, essentially
> > creating a view from your database. The rows become Solr documents, the
> > columns become Solr fields.
>
> There are now group-by capabilities in trunk as well, which may or may not
> help.
>
> > 
> > wunder
> > 
> > On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
> > 
> >> I am sure these kinds of questions keep coming to you guys, but I want to
> >> raise the same question in a different context: my own business situation.
> >> I am very, very new to Solr, and though I have tried to read through the
> >> documentation, I am nowhere near finishing it.
> >> 
> >> The need is like this:
> >> 
> >> We have a huge RDBMS database/table. A single table perhaps houses 100+
> >> million rows. Though Oracle is doing a fine job of handling the insertion
> >> and updating of data, the querying is where our main concerns lie. Since
> >> we have spatial data, the index building takes hours and hours for such
> >> tables.
> >> 
> >> That's when we thought of moving away from a standard RDBMS and trying
> >> something different and fast.
> >> My last week has been spent in a journey reading through Bigtable to
> >> Hadoop to HBase, to Hive, and then I finally landed on Solr. As far as I am
> >> in my tests, it looks pretty good, but I still have a few unanswered
> >> questions. Trying this group for them :) (I am sure I can find some
> >> answers if I read/google more on the topic, but right now I'm being lazy
> >> and feel that asking the people who are already using it, or perhaps
> >> developing it, is a better bet.)
> >> 
> >> 1. Can I get my Solr instance to load data (fresh data for indexing) from
> >> a stream (imagine an MQ kind of queue, or similar)?
>
> Yes, with a little bit of work.
>
> >> 2. Can I host my Solr instance to use HBase as the database/file system
> >> (read HDFS)?
>
> Probably, but I doubt it will be fast.  Local disk is usually the best.  100+
> M rows is large but not unreasonable.
>
> >> 3. Are there any reports available somewhere (as in benchmarks) for a
> >> Solr instance's performance?
>
> You can probably search the web for these.  I've personally seen several
> installs w/ 1B+ docs and subsecond search and faceting and heard of others.
> You might look at the stuff the HathiTrust has put up.
>
> >> 4. Are there any APIs available which might help me apply ANSI SQL kind
> >> of statements to my Solr data?
>
> No.  Question back?  What kinds of things are you trying to do?
>
> >> 
> >> It would be great if people could help share their experience in the
> >> area... if it's too much trouble writing all of it, perhaps a URL would be
> >> easier... I welcome all kinds of help here... any advice/suggestions are
> >> good...
> >> 
> >> Looking forward to your viewpoints.
> >> 
> >> --raghav..
> >>
> >> **
> >> 
> >> This message may contain confidential or proprietary information intended
> >> only for the use of the addressee(s) named above or may contain
> >> information that is legally privileged. If you are not the intended
> >> addressee, or the person responsible for delivering it to the intended
> >> addressee, you are hereby notified that reading, disseminating,
> >> distributing or copying this message is strictly prohibited. If you have
> >> received this message by mistake, please immediately notify us by
> >> replying to the message and delete the original message and any copies
> >> immediately thereafter.
> >> 
> >> Thank you.
> >> **
> >> 
> >> CLLD
> >> 
> > 
> 
> --
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
> 
>


resources for relevancy score tuning

2010-09-27 Thread Luke Crouch
Can someone share some good resources (books, articles, links, etc.) for
tuning relevancy scores with multiple factors? I'm playing with different
fields and boosts in my 'qf', 'pf', and 'bf' defaults but I feel like I'm
shooting in the dark. http://wiki.apache.org/solr/SolrRelevancyCookbook has
a couple of individual tips, but I need some help devising a good
combination of boosts across multiple fields for scoring.

E.g., I want to tweak scoring derived from a primary identifier field, a
name field, a description field, a rating field, and a "number of downloads"
field. But it seems when I adjust any single factor, it affects too many
others.
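One way to see why nudging one boost shifts everything else is to model how the dismax parser combines per-field scores: a DisjunctionMaxQuery takes the best boosted field score and adds `tie` times the sum of the other field scores, and a `bf` function adds its own value on top. The sketch below is a toy model only. It assumes made-up per-field scores for one document (real Lucene scores also fold in tf, idf, norms and query normalization, all ignored here), hypothetical field names based on the question, and a log-of-downloads boost function.

```python
import math

def dismax_score(field_scores, qf_boosts, tie=0.0):
    """Best boosted field score plus `tie` times the sum of the other boosted scores."""
    boosted = [field_scores.get(f, 0.0) * b for f, b in qf_boosts.items()]
    best = max(boosted)
    return best + tie * (sum(boosted) - best)

def total_score(field_scores, qf_boosts, downloads, tie=0.0, bf_weight=1.0):
    # Toy stand-in for a boost function like log(downloads): simply added on top.
    return dismax_score(field_scores, qf_boosts, tie) + bf_weight * math.log10(1 + downloads)

# Hypothetical per-field scores for one document against one query.
doc = {"id": 0.0, "name": 2.0, "description": 1.5}
qf = {"id": 10.0, "name": 3.0, "description": 1.0}

print(round(total_score(doc, qf, downloads=999), 2))            # 9.0: name wins (6.0) + bf 3.0
print(round(total_score(doc, qf, downloads=999, tie=0.1), 2))   # 9.15: adds 0.1 * 1.5 from description
print(round(total_score(doc, dict(qf, description=5.0), downloads=999), 2))  # 10.5: description now wins
```

In this model, raising the description boost from 1 to 5 changes which field "wins" the max, which is one reason a single tweak can reorder many results at once.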

Thanks,
-L


Re: Is Solr right for my business situation ?

2010-09-27 Thread PeterKerk

@Walter Underwood:

Walter Underwood wrote:
> 
> Right now, I think the only way to get spatial search in Solr is to deploy
> a nightly snapshot from the active development on trunk.
> 

Could you give me the link to this trunk? I need it very much!

Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592330.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is Solr right for my business situation ?

2010-09-27 Thread Jonathan Rochkind
Right, I know, I was curious about its current closeness to being in the 
main distro, not a patch.  Among other things, when those who know 
better decide it goes in the core distro, that makes me more comfortable 
that they've decided it works acceptably, and also makes me more 
comfortable that it will continue to be supported in _future_ versions 
without someone having to prepare a new patch.


Ravi Julapalli wrote:

Hi Jonathan,

Field collapsing is available in 1.4 by applying the patch 
https://issues.apache.org/jira/browse/SOLR-236

-Ravi


From: Jonathan Rochkind 
To: "solr-user@lucene.apache.org" 
Sent: Mon, September 27, 2010 9:18:20 PM
Subject: Re: Is Solr right for my business situation ?

Grant Ingersoll wrote:
> There are now group-by capabilities in trunk as well, which may or may not 
> help.

Really, the field collapsing stuff has been committed to trunk finally? Or are 
you talking about something else?

If it's the field collapsing stuff, and it's been committed to trunk, does that 
mean it'll be in the 3.0 release?

Jonathan



Re: Is Solr right for my business situation ?

2010-09-27 Thread Ravi Julapalli
Hi Jonathan,

Field collapsing is available in 1.4 by applying the patch 
https://issues.apache.org/jira/browse/SOLR-236

-Ravi


From: Jonathan Rochkind 
To: "solr-user@lucene.apache.org" 
Sent: Mon, September 27, 2010 9:18:20 PM
Subject: Re: Is Solr right for my business situation ?

Grant Ingersoll wrote:
> There are now group-by capabilities in trunk as well, which may or may not 
> help.

Really, the field collapsing stuff has been committed to trunk finally? Or are 
you talking about something else?

If it's the field collapsing stuff, and it's been committed to trunk, does that 
mean it'll be in the 3.0 release?

Jonathan


Re: Is Solr right for my business situation ?

2010-09-27 Thread Jonathan Rochkind

Grant Ingersoll wrote:
> There are now group-by capabilities in trunk as well, which may or may not help.

Really, the field collapsing stuff has been committed to trunk finally? 
Or are you talking about something else?

If it's the field collapsing stuff, and it's been committed to trunk, 
does that mean it'll be in the 3.0 release?

Jonathan



Re: Is Solr right for my business situation ?

2010-09-27 Thread Grant Ingersoll
Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

> When do you need to deploy?
> 
> As I understand it, the spatial search in Solr is being rewritten and is 
> slated for Solr 4.0, the release after next.

It will be in 3.x, the next release

> 
> The existing spatial search has some serious problems and is deprecated.
> 
> Right now, I think the only way to get spatial search in Solr is to deploy a 
> nightly snapshot from the active development on trunk. If you are deploying a 
> year from now, that might change.
> 
> There is not any support for SQL-like statements or for joins. The best 
> practice for Solr is to think of your data as a single table, essentially 
> creating a view from your database. The rows become Solr documents, the 
> columns become Solr fields.

There are now group-by capabilities in trunk as well, which may or may not help.
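Field collapsing (grouping) can be pictured as keeping only the top-scoring few hits for each distinct value of a chosen field. The sketch below is a plain-Python model of that idea over an already-ranked hit list; it is not the SOLR-236 patch or the trunk implementation, and the `site` field and the hits are hypothetical.

```python
from collections import OrderedDict

def collapse(ranked_hits, field, per_group=1):
    """Group an already-ranked hit list by `field`, keeping the first
    `per_group` hits of each group; groups are ordered by their best hit."""
    groups = OrderedDict()
    for hit in ranked_hits:
        bucket = groups.setdefault(hit[field], [])
        if len(bucket) < per_group:
            bucket.append(hit)
    return groups

hits = [
    {"id": "a1", "site": "a.com", "score": 9.1},
    {"id": "b1", "site": "b.com", "score": 8.7},
    {"id": "a2", "site": "a.com", "score": 8.2},
    {"id": "c1", "site": "c.com", "score": 5.0},
]
for site, docs in collapse(hits, "site").items():
    print(site, [d["id"] for d in docs])
```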

> 
> wunder
> 
> On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
> 
>> I am sure these kinds of questions keep coming to you guys, but I want to 
>> raise the same question in a different context...my own business situation.
>> I am very very new to solr and though I have tried to read through the 
>> documentation, I have nowhere near completing the whole read.
>> 
>> The need is like this - 
>> 
>> We have a huge rdbms database/table. A single table perhaps houses 100+ 
>> million rows. Though oracle is doing a fine job of handling the insertion 
>> and updation of data, the querying is where our main concerns lie.  Since we 
>> have spatial data, the index building takes hours and hours for such tables.
>> 
>> That's when we thought of moving away from standard rdbms and thought of 
>> trying something different and fast. 
>> My last week has been spent in a journey reading through bigtable to hadoop 
>> to hbase, to hive and then finally landed on solr. As far as I am in my 
>> tests, it looks pretty good, but I have a few unanswered questions still. 
>> Trying this group for them  :)  (I am sure I can find some answers if I 
>> read/google more on the topic, but now I'm being lazy and feel asking the 
>> people who are already using it/or perhaps developing it is a better bet).
>> 
>> 1. Can I get my solr instance to load data (fresh data for indexing) from a 
>> stream (imagine a mq kind of queue, or similar) ?

Yes, with a little bit of work.
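One reading of "a little bit of work": a small consumer drains the queue and turns batches of messages into `<add><doc>` messages for Solr's XML update handler. A minimal sketch, assuming the messages are already flat dicts and using Python's `queue.Queue` as a stand-in for the real MQ. The batch size and field names are hypothetical, and actually sending each payload (an HTTP POST to the update handler, plus a commit) is left out.

```python
import queue
import xml.etree.ElementTree as ET

def to_add_xml(docs):
    """Render a batch of flat dicts as a Solr <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:  # multi-valued fields repeat the <field> element
                f = ET.SubElement(d, "field", name=name)
                f.text = str(v)
    return ET.tostring(add, encoding="unicode")

def drain_in_batches(q, batch_size=500):
    """Yield one update payload per batch until the queue is empty."""
    batch = []
    while True:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            yield to_add_xml(batch)
            batch = []
    if batch:
        yield to_add_xml(batch)

q = queue.Queue()
for i in range(3):
    q.put({"id": str(i), "title": "row %d" % i})
for payload in drain_in_batches(q, batch_size=2):
    print(payload)  # a real consumer would POST this to the update handler
```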

>> 2. Can I host my solr instance to use hbase as the database/file system 
>> (read HDFS) ?

Probably, but I doubt it will be fast.  Local disk is usually the best.  100+ M 
rows is large but not unreasonable.

>> 3. are there somewhere any reports available (as in benchmarks ) for a solr 
>> instance's performance ? 

You can probably search the web for these.  I've personally seen several 
installs w/ 1B+ docs and subsecond search and faceting and heard of others.  
You might look at the stuff the HathiTrust has put up.  

>> 4. are there any APIs available which might help me apply ANSI sql kind of 
>> statements to my solr data ? 

No.  Question back?  What kinds of things are you trying to do?

>> 
>> It would be great if people could help share their experience in the area... 
>> if it's too much trouble writing all of it, perhaps url would be easier... I 
>> welcome all kinds of help here... any advice/suggestions are good ...
>> 
>> Looking forward to your viewpoints..
>> 
>> --raghav..

--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8



Question Related to sorting on Date

2010-09-27 Thread Ahson Iqbal
Hi all,

I have a question related to sorting on a date field. I have a date field that 
is indexed as a string and looks like "5/2/2008 4:33:30 PM". I want to sort on 
this field by date; the time does not matter. Any suggestions on how I could 
ignore the time part of this field and just sort on the date?
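A common approach is to normalize the value at index time into a separate, sortable field. A minimal sketch, assuming the strings are US-style month/day/year (so "5/2/2008" is May 2) and that a hypothetical date-only field is filled with Solr's ISO-8601 UTC form (`YYYY-MM-DDThh:mm:ssZ`) truncated to midnight. Sorting on that field, whether it is a date field or a plain string, then orders by day and ignores the time.

```python
from datetime import datetime

def day_key(raw):
    """Turn e.g. '5/2/2008 4:33:30 PM' into '2008-05-02T00:00:00Z'.

    Assumes month/day/year order and a 12-hour clock with AM/PM.
    """
    parsed = datetime.strptime(raw.strip(), "%m/%d/%Y %I:%M:%S %p")
    return parsed.strftime("%Y-%m-%dT00:00:00Z")

values = ["5/2/2008 4:33:30 PM", "12/31/2007 9:00:00 AM", "5/2/2008 8:01:00 AM"]
for v in sorted(values, key=day_key):
    print(day_key(v), "<-", v)
```

Documents from the same day tie on this key, so a secondary sort (for example on score) decides their order.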


  

Re: The search response time is too long

2010-09-27 Thread Simon Willnauer
2010/9/27 newsam :
> I have set up a SOLR searcher instance with Tomcat 5.5.21. However, the 
> response time is too long. Here is my scenario:
> 1. The index file is 8.2G. The doc num is 6110745.
> 2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem.
>
> I used "Key:*" to query all records by localhost:8080. The response time is 
> 68703 milliseconds. The cpu load is 50% and mem usage is over 400M.

If you want to get all records, use q=*:* instead of Key:*; that should
give you faster results - way faster :)

Why are you actually requesting all results, and how many of them are
you fetching? Maybe it would be a good idea to explain your use case /
problem first.
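For reference, the only difference is the `q` parameter: `*:*` is Solr's match-all query, while `Key:*` is, as I understand it, a wildcard query that has to enumerate every term in the Key field (about 6 million unique keys here) before matching anything. A minimal sketch of building both requests, assuming the default `/solr` context path on the Tomcat port from the original post and asking only for a small page of ids via the standard `rows` and `fl` parameters.

```python
from urllib.parse import urlencode

def select_url(base, q, rows=10, fields="id"):
    """Build a Solr select URL; a small rows/fl keeps the response cheap."""
    return base + "/select?" + urlencode({"q": q, "rows": rows, "fl": fields})

base = "http://localhost:8080/solr"
print(select_url(base, "*:*"))    # match-all query
print(select_url(base, "Key:*"))  # wildcard over the terms of Key
```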

simon

>
> Any comments are welcomed.
>
>
>


RE: bi-grams for common terms - any analyzers do that?

2010-09-27 Thread Burton-West, Tom
Hi Yonik,

>>If the new "autoGeneratePhraseQueries" is off, position doesn't matter, and 
>>the query will 
>>be treated as "index" OR "reader".

Just wanted to make sure: in Solr, does autoGeneratePhraseQueries = "off" combine 
the pieces with the *default* query operator as set in the configuration, rather 
than necessarily using the Boolean "OR" operator?

i.e. if the default operator is set to AND (defaultOperator="AND") and 
autoGeneratePhraseQueries = off,

then "IndexReader" -> "index"  "reader" -> "index" AND "reader" ?
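The behaviour being asked about can be written down as a small model: when the analyzer splits one query-parser token into several pieces, either a phrase is generated or the pieces become separate clauses. This toy sketch only builds query strings and does not call Lucene; it assumes, as the question does (this is not confirmed in the thread), that the separate clauses pick up the configured default operator.

```python
def combine_split_tokens(pieces, auto_phrase, default_op="OR"):
    """Model how sub-tokens of one query-parser token could be combined.

    auto_phrase=True  -> a phrase query over the pieces.
    auto_phrase=False -> separate clauses joined by default_op (assumed).
    """
    if len(pieces) == 1:
        return pieces[0]
    if auto_phrase:
        return '"' + " ".join(pieces) + '"'
    return "(" + (" %s " % default_op).join(pieces) + ")"

# What a case-splitting filter plus lowercasing might make of "IndexReader".
pieces = ["index", "reader"]
print(combine_split_tokens(pieces, auto_phrase=True))                     # "index reader"
print(combine_split_tokens(pieces, auto_phrase=False))                    # (index OR reader)
print(combine_split_tokens(pieces, auto_phrase=False, default_op="AND"))  # (index AND reader)
```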

Tom




RE: bi-grams for common terms - any analyzers do that?

2010-09-27 Thread Burton-West, Tom
Hi Jonathan,

>> I'm afraid I'm having trouble understanding   "if the analyzer returns more 
>> than one position back from a "queryparser token"

>>I'm not sure if "the queryparser forms a phrase query without explicit phrase 
>>quotes" is a problem for me, I had no idea it happened until now, never 
>>noticed, and still don't really understand in what circumstances it happens.

The problem I had was with the Boolean query "l'art AND historie": the 
WordDelimiterFilter tokenized "l'art" as two tokens, "l" at position 1 and 
"art" at position 2, so the queryparser decided this means a phrase query for 
"l" followed immediately by "art".  See
http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance 
for details.  

This happens whenever a token filter splits a token into more than one token 
at different positions, for example a filter that splits foo-bar into "foo" 
"bar".  The exception is SynonymFilter or something like it.  In the case of 
SynonymFilter, it's not really a case of "splitting" one token into multiple 
tokens: given one token of input, it outputs all the synonyms of the term, and 
all of those tokens have the same position attribute. (see: 
http://www.lucidimagination.com/search/document/CDRG_ch05_5.6.19?q=synonym%20filter)

So, for example, for the string "the small thing", if you had a synonym list 
for small:
small=>small,tiny,teeny

input:
position | 1   | 2     | 3
token    | the | small | thing

would output:
position | 1   | 2     | 2    | 2     | 3
token    | the | small | tiny | teeny | thing

In this case, when the queryParser gets back "small", "tiny" and "teeny", they 
have the same position, so they are not turned into a phrase query.

For "l'art":

input:
position | 1
token    | l'art

output:
position | 1 | 2
token    | l | art

In this case there are two tokens with different positions, so it treats them 
as a phrase query.
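The rule of thumb above (same position means alternatives, increasing positions mean a phrase) can be sketched over (position, token) pairs. This is a simplified model, not the actual Lucene QueryParser code, which handles the mixed case with a MultiPhraseQuery; the `(b|c)` notation for a multi-term phrase slot is invented here for display.

```python
from itertools import groupby

def model_query(tokens):
    """tokens: list of (position, term) pairs from the analyzer, in order."""
    by_pos = [[t for _, t in grp] for _, grp in groupby(tokens, key=lambda pt: pt[0])]
    if len(by_pos) == 1:
        terms = by_pos[0]
        # All tokens share one position: plain alternatives (e.g. synonyms).
        return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"
    # More than one position: a phrase, with alternatives shown as (a|b).
    slots = [p[0] if len(p) == 1 else "(" + "|".join(p) + ")" for p in by_pos]
    return '"' + " ".join(slots) + '"'

print(model_query([(2, "small"), (2, "tiny"), (2, "teeny")]))  # (small OR tiny OR teeny)
print(model_query([(1, "l"), (2, "art")]))                      # "l art"
```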

Tom Burton-West


Re: Is Solr right for my business situation ?

2010-09-27 Thread Walter Underwood
When do you need to deploy?

As I understand it, the spatial search in Solr is being rewritten and is slated 
for Solr 4.0, the release after next.

The existing spatial search has some serious problems and is deprecated.

Right now, I think the only way to get spatial search in Solr is to deploy a 
nightly snapshot from the active development on trunk. If you are deploying a 
year from now, that might change.

There is not any support for SQL-like statements or for joins. The best 
practice for Solr is to think of your data as a single table, essentially 
creating a view from your database. The rows become Solr documents, the columns 
become Solr fields.
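To make the "single table / view" advice concrete: flatten each parent row and its joined child rows into one document, with a multi-valued field for the child values. A minimal sketch using hypothetical `products` and `tags` tables held in memory; in practice the join would more likely live in a SQL view or a DataImportHandler query.

```python
from collections import defaultdict

# Hypothetical rows, as they might come back from two RDBMS tables.
products = [
    {"id": 1, "name": "Widget", "price": 9.99},
    {"id": 2, "name": "Gadget", "price": 24.50},
]
tags = [
    {"product_id": 1, "tag": "blue"},
    {"product_id": 1, "tag": "small"},
    {"product_id": 2, "tag": "red"},
]

def denormalize(products, tags):
    """One document per product; joined tag rows become a multi-valued field."""
    tags_by_product = defaultdict(list)
    for row in tags:
        tags_by_product[row["product_id"]].append(row["tag"])
    docs = []
    for p in products:
        doc = dict(p)                          # columns become fields
        doc["tag"] = tags_by_product[p["id"]]  # child rows collapse into one field
        docs.append(doc)
    return docs

for doc in denormalize(products, tags):
    print(doc)
```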

wunder

On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:

> I am sure these kinds of questions keep coming to you guys, but I want to 
> raise the same question in a different context...my own business situation.
> I am very very new to solr and though I have tried to read through the 
> documentation, I have nowhere near completing the whole read.
> 
> The need is like this - 
> 
> We have a huge rdbms database/table. A single table perhaps houses 100+ 
> million rows. Though oracle is doing a fine job of handling the insertion and 
> updation of data, the querying is where our main concerns lie.  Since we have 
> spatial data, the index building takes hours and hours for such tables.
> 
> That's when we thought of moving away from standard rdbms and thought of 
> trying something different and fast. 
> My last week has been spent in a journey reading through bigtable to hadoop 
> to hbase, to hive and then finally landed on solr. As far as I am in my 
> tests, it looks pretty good, but I have a few unanswered questions still. 
> Trying this group for them  :)  (I am sure I can find some answers if I 
> read/google more on the topic, but now I'm being lazy and feel asking the 
> people who are already using it/or perhaps developing it is a better bet).
> 
> 1. Can I get my solr instance to load data (fresh data for indexing) from a 
> stream (imagine a mq kind of queue, or similar) ?
> 2. Can I host my solr instance to use hbase as the database/file system (read 
> HDFS) ?
> 3. are there somewhere any reports available (as in benchmarks ) for a solr 
> instance's performance ? 
> 4. are there any APIs available which might help me apply ANSI sql kind of 
> statements to my solr data ? 
> 
> It would be great if people could help share their experience in the area... 
> if it's too much trouble writing all of it, perhaps url would be easier... I 
> welcome all kinds of help here... any advice/suggestions are good ...
> 
> Looking forward to your viewpoints..
> 
> --raghav..






Is Solr right for my business situation ?

2010-09-27 Thread Sharma, Raghvendra
I am sure these kinds of questions keep coming to you guys, but I want to raise 
the same question in a different context...my own business situation.
I am very very new to solr and though I have tried to read through the 
documentation, I have nowhere near completing the whole read.

The need is like this - 

We have a huge rdbms database/table. A single table perhaps houses 100+ million 
rows. Though oracle is doing a fine job of handling the insertion and updation 
of data, the querying is where our main concerns lie.  Since we have spatial 
data, the index building takes hours and hours for such tables.

That's when we thought of moving away from standard rdbms and thought of trying 
something different and fast. 
My last week has been spent in a journey reading through bigtable to hadoop to 
hbase, to hive and then finally landed on solr. As far as I am in my tests, it 
looks pretty good, but I have a few unanswered questions still. Trying this 
group for them  :)  (I am sure I can find some answers if I read/google more on 
the topic, but now I'm being lazy and feel asking the people who are already 
using it/or perhaps developing it is a better bet).

1. Can I get my solr instance to load data (fresh data for indexing) from a 
stream (imagine a mq kind of queue, or similar) ?
2. Can I host my solr instance to use hbase as the database/file system (read 
HDFS) ?
3. are there somewhere any reports available (as in benchmarks ) for a solr 
instance's performance ? 
4. are there any APIs available which might help me apply ANSI sql kind of 
statements to my solr data ? 

It would be great if people could help share their experience in the area... if 
it's too much trouble writing all of it, perhaps url would be easier... I 
welcome all kinds of help here... any advice/suggestions are good ...

Looking forward to your viewpoints..

--raghav..



Is Solr right for our project?

2010-09-27 Thread Mike Thomsen
(I apologize in advance if I missed something in your documentation,
but I've read through the Wiki on the subject of distributed searches
and didn't find anything conclusive)

We are currently evaluating Solr and Autonomy. Solr is attractive due
to its open source background, following and price. Autonomy is
expensive, but we know for a fact that it can handle our distributed
search requirements perfectly.

What we need to know is if Solr has capabilities that match or roughly
approximate Autonomy's Distributed Search Handler. What it does is act as
a front-end for all of Autonomy's IDOL search servers (which
correspond in this scenario to Solr shards). It is configured to know
what is on each shard, which servers hold each shard and intelligently
farms out queries based on that configuration. There is no need to
specify which IDOL servers to hit while querying; the DiSH just knows
where to go. Additionally, I believe in cases where an index piece is
mirrored, it also monitors server health and falls back intelligently
on other backup instances of a shard/index piece based on that.
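For comparison, as far as I know stock Solr 1.4 distributed search does not keep such a shard map itself: the client, or a front-end you write, passes an explicit comma-separated `shards` parameter with one `host:port/path` entry per shard. A minimal sketch of the routing and failover part of a DiSH-like front-end, with a hypothetical shard map and health set (e.g. from periodic pings); it only builds the `shards` value.

```python
# Hypothetical shard map: each logical shard has one or more replicas.
SHARD_REPLICAS = {
    "shard1": ["solr1:8983/solr", "solr1b:8983/solr"],
    "shard2": ["solr2:8983/solr", "solr2b:8983/solr"],
}

def shards_param(replicas, healthy):
    """Pick the first healthy replica of every shard and join them into the
    comma-separated value used by Solr's `shards` request parameter."""
    chosen = []
    for shard, hosts in sorted(replicas.items()):
        live = [h for h in hosts if h in healthy]
        if not live:
            raise RuntimeError("no healthy replica for %s" % shard)
        chosen.append(live[0])
    return ",".join(chosen)

healthy = {"solr1:8983/solr", "solr2b:8983/solr"}  # solr2 is down
print(shards_param(SHARD_REPLICAS, healthy))
# solr1:8983/solr,solr2b:8983/solr
```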

I'd appreciate it if someone can give me a frank explanation of where
Solr stands in this area.

Thanks,

Mike


Re: urgent SOLR query server request hangs

2010-09-27 Thread Yonik Seeley
On Mon, Sep 27, 2010 at 11:09 AM, Bharat Jain  wrote:
>   We are running into issues with SOLR queries. Our solr queries just hang.

Are you perhaps using distributed search and accidentally set up an
infinite loop?
Do *not* configure a default "shards" param on your /select handler.

Other than that - you'll need to get some thread dumps from Solr to
see why it's hanging, and provide an example of what requests you are
sending.
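Why a default `shards` value on the /select handler can loop: each shard sub-request lands on the same handler, picks up the same default, and fans out again. A toy simulation of that recursion with a depth guard so it terminates; the handler model is hypothetical and much simpler than Solr's real distributed search.

```python
def handle(params, defaults, depth=0, max_depth=5):
    """Toy /select handler: request params override handler defaults.

    If a 'shards' value is in effect, the query is forwarded to each shard,
    and each shard request is handled by this same function again.
    Returns the deepest nesting level reached.
    """
    effective = dict(defaults, **params)
    shards = effective.get("shards")
    if not shards:
        return depth                    # plain local search
    if depth >= max_depth:
        return depth                    # guard so the demo terminates
    sub_params = {"q": effective["q"]}  # in this model, sub-requests omit 'shards'
    return max(handle(sub_params, defaults, depth + 1, max_depth)
               for _ in shards.split(","))

# Default 'shards' on the handler itself: every sub-request fans out again.
print(handle({"q": "*:*"}, {"shards": "host1:8983/solr,host2:8983/solr"}))  # 5 (hit the guard)
# Handler without the default; the caller passes 'shards' explicitly.
print(handle({"q": "*:*", "shards": "host1:8983/solr,host2:8983/solr"}, {}))  # 1
```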

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


urgent SOLR query server request hangs

2010-09-27 Thread Bharat Jain
Hi,
   We are running into issues with SOLR queries. Our solr queries just hang.
We are using SOLR 1.3 and below is the stack trace from the thread dump. We are
clueless about what could be causing this issue. We are in the midst of
firefighting with our customer and any help is appreciated.

"TP-Processor113" daemon prio=3 tid=0x071c3400 nid=0x134
runnable [0xfd7ed72a..0xfd7ed72a3920]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    - locked <0xfd7f26c1caf0> (a java.io.BufferedInputStream)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
    - locked <0xfd7f2a260c50> (a sun.net.www.protocol.http.HttpURLConnection)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373)
    at com.xxx..search.solr.SolrSearchServiceImpl.query(SolrSearchServiceImpl.java:271)
    at com.xxx..search.Searchable.query(Searchable.java:460)
    at com.xxx..search.JobReqSearchObject.query(JobReqSearchObject.java:903)




Thanks
Bharat Jain


Re: Re:The search response time is too long

2010-09-27 Thread Timothy Potter
Also, how many rows are you requesting at one time? I've seen cases where
the query time is blazing fast and the response writing is terribly slow
because of too many documents being sent in the response.
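Paging keeps each response small: ask for a modest `rows` and advance `start`. A minimal sketch that only computes the `start`/`rows` pairs needed to walk a result set; the page size is a hypothetical choice, and `numFound` would come from the first response.

```python
def pages(num_found, rows=100):
    """Yield (start, rows) pairs that cover num_found results."""
    for start in range(0, num_found, rows):
        yield start, min(rows, num_found - start)

print(list(pages(250, rows=100)))  # [(0, 100), (100, 100), (200, 50)]
```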

On Mon, Sep 27, 2010 at 6:37 AM, kenf_nc  wrote:

>
> "mem usage is over 400M", do you mean Tomcat mem size? If you don't give your
> cache sizes enough room to grow you will choke the performance. You should
> adjust your Tomcat settings to let the cache grow to at least 1GB or better
> would be 2GB. You may also want to look into
> http://wiki.apache.org/solr/SolrCaching warming the cache  to make the first
> time call a little faster.
>
> For comparison, I also have about 8GB in my index but only 2.8 million
> documents. My search query times on a smaller box than you specify are 6533
> milliseconds on an unwarmed (newly rebooted) instance.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Re:The search response time is too long

2010-09-27 Thread kenf_nc

"mem usage is over 400M", do you mean Tomcat mem size? If you don't give your
cache sizes enough room to grow you will choke the performance. You should
adjust your Tomcat settings to let the cache grow to at least 1GB or better
would be 2GB. You may also want to look into 
http://wiki.apache.org/solr/SolrCaching warming the cache  to make the first
time call a little faster. 

For comparison, I also have about 8GB in my index but only 2.8 million
documents. My search query times on a smaller box than you specify are 6533
milliseconds on an unwarmed (newly rebooted) instance. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Concurrent DB updates and delta import misses few records

2010-09-27 Thread Shawn Heisey
 You could get it from Solr, yes.  That didn't even occur to me because 
when I was designing my scripts, I didn't yet have a fully integrated 
Solr index. :)  With hindsight, I still wouldn't get it from Solr.  I 
would lose some flexibility and ease of administration.


It's certainly possible to store all build-related tracking information 
in the database.  The build system for our old search product did it 
that way.  I decided to go with simple text files in an NFS-mounted 
directory for the rewrite.  It's easier for me to administer, just ssh 
to a server and examine or modify simple one-line text files.  On the 
script side, the files get read into a Perl hash.  With the old system, 
I found it cumbersome to go through the database interfaces.  The only 
thing that's still in the database is the delete table, because it is 
populated by triggers on the metadata table.





On 9/23/2010 12:48 AM, Shashikant Kore wrote:

Thanks for the pointer, Shawn.  It is definitely useful.

I am wondering if you could retrieve minDid from Solr rather than
storing it externally. The max id from the Solr index and the max id from the
DB should define the lower and upper thresholds, respectively, of the delta
range. Am I missing something?
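Shashikant's idea can be written as a half-open id range: everything above the highest id already in the index, up to the current maximum in the database. A minimal sketch with hypothetical values and hypothetical table/column names. Note the assumption it bakes in: ids become visible in increasing order. Concurrent inserts can violate that, since a row with a lower id committed after a higher one has been indexed falls below the next lower bound and is skipped.

```python
def delta_range(solr_max_id, db_max_id):
    """Ids still to import: solr_max_id < id <= db_max_id (empty if none)."""
    return range(solr_max_id + 1, db_max_id + 1)

def delta_query(solr_max_id, db_max_id):
    # Hypothetical table and column names, for illustration only.
    return ("SELECT * FROM metadata WHERE did > %d AND did <= %d"
            % (solr_max_id, db_max_id))

print(list(delta_range(100, 104)))  # [101, 102, 103, 104]
print(delta_query(100, 104))

# Pitfall with concurrent writers: row 103 was still uncommitted when 104
# was indexed, so the next run starts above 104 and never sees 103.
indexed_ids = [101, 102, 104]
print(103 in delta_range(max(indexed_ids), 110))  # False
```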




Multi-lingual auto-complete?

2010-09-27 Thread Andy
I want to provide auto-complete to users when they're inputting tags. The 
auto-complete tag suggestions would be based on tags that are already in the 
system.

Multiple tags are separated by commas. A single tag could contain multiple 
words such as "Apple computer".

One issue is that a tag could be in multiple languages, including both 
languages (e.g. English, French) that use whitespace as word separator and 
languages that don't (e.g. CJK)

An example of such a multi-lingual tag is "Apple 电脑".

If a user types "apple", I'd like the autocomplete suggestions to include both 
"Apple computer" (i.e. matches are case insensitive) and "green apple" (i.e. 
matches aren't restricted to prefixes). And a user typing "电脑" should match 
"Apple 电脑".

Is it possible to do that? I read the article:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

In that article KeywordTokenizerFactory is used. If I changed it to CJKTokenizer, 
would that work? 

With an input of "Apple 电脑", what would CJKTokenizer produce?

-is it "Apple", "电", "脑" ?
or
- is it "A", "p", "p", "l", "e", "电", "脑" ?
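To make the requirement concrete, here is a plain-Python imitation of the kind of analysis being asked about: split the tag into words, emit overlapping bigrams for runs of CJK characters, lowercase everything, and index edge n-grams of each word so that any word of the tag can be matched by prefix. This is a hypothetical model to reason with, not the output of CJKTokenizer or any Solr filter, and the CJK check covers only the basic CJK Unified Ideographs block.

```python
import re

def is_cjk(ch):
    # Basic CJK Unified Ideographs only (U+4E00..U+9FFF); a simplification.
    return "\u4e00" <= ch <= "\u9fff"

def tokenize(tag):
    """Whitespace-separated words lowercased; CJK runs become overlapping bigrams."""
    tokens = []
    for run in re.findall(r"[\u4e00-\u9fff]+|[^\s\u4e00-\u9fff]+", tag):
        if is_cjk(run[0]) and len(run) > 1:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
    return tokens

def edge_ngrams(token, min_len=1):
    return [token[:n] for n in range(min_len, len(token) + 1)]

def index_terms(tag):
    return {g for tok in tokenize(tag) for g in edge_ngrams(tok)}

def suggest(typed, tags):
    """Tags where every typed token is a prefix of some token of the tag."""
    wanted = tokenize(typed)
    return [t for t in tags if all(w in index_terms(t) for w in wanted)]

tags = ["Apple computer", "green apple", "Apple 电脑", "Banana"]
print(tokenize("Apple 电脑"))   # ['apple', '电脑']
print(suggest("apple", tags))  # ['Apple computer', 'green apple', 'Apple 电脑']
print(suggest("电脑", tags))    # ['Apple 电脑']
```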

Any help would be greatly appreciated.

Andy





Re: Solr UIMA integration

2010-09-27 Thread maheshkumar

Hi Tommaso,

All UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator,
Tagger, WhitespaceTokenizer) are 2.3.1-SNAPSHOT. All are checked out from svn:

AlchemyAPIAnnotator:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator
OpenCalaisAnnotator:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator
Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger
WhitespaceTokenizer:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer

solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima

I am using the latest Solr version checked out from svn; I guess it is
newer than 1.4.1.

Tommaso, would it be possible for you to upload all the dependency jars to
http://code.google.com/p/solr-uima/downloads/list.

Thanks
Mahesh




-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1587660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TokenFilter that removes payload ?

2010-09-27 Thread Teruhiko Kurosaka
Robert & Erick,
I appreciate your suggestions, but we use Type for other purposes.
Also, the product is out and we can't change the design so easily.

So it seems the conclusion is that there is no such TokenFilter.
I'll write one.
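The shape of such a filter is simple: pass every token through unchanged except for clearing its payload. A toy Python model of that token-stream pattern, with tokens as small immutable records and the filter as a generator; a real implementation would be a Lucene `TokenFilter` in Java, whose API is not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, Optional

@dataclass(frozen=True)
class Token:
    term: str
    position: int
    payload: Optional[bytes] = None

def strip_payloads(tokens: Iterable[Token]) -> Iterator[Token]:
    """Pass tokens through unchanged except that the payload is dropped."""
    for tok in tokens:
        yield replace(tok, payload=None)

stream = [Token("run", 1, b"VB"), Token("fast", 2, b"RB")]
print(list(strip_payloads(stream)))
```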

Thanks.

On Sep 27, 2010, at 1:00 PM, Robert Muir wrote:

> On Sun, Sep 26, 2010 at 11:49 PM, Teruhiko Kurosaka wrote:
> 
>> 
>> As I understand it, payloads go to the Lucene index.
>> In most cases, the part-of-speech tags are not used if
>> retrieved by the search applications.  So they shouldn't
>> go to the index.  So I'd like to know if there is an
>> existing TokenFilter that does this.  Otherwise, I'd like
>> to write one.
>> 
> 
> I agree with Erick, I think a better approach would be to put the part of
> speech tags into another attribute.
> 
> For example, you can put them in TypeAttribute, which is not stored in the
> index by default.
> Then, if the user wants to store them in the index, they just add
> TypeAsPayloadTokenFilterFactory, which copies the type into the payload...
> but otherwise they would not be stored.
> 
> -- 
> Robert Muir
> rcm...@gmail.com


T. "Kuro" Kurosaka, 415-227-9600x122, 617-386-7122(direct)





RE: spellcheck on multiple fields?

2010-09-27 Thread Markus Jelsma
You can use copyField to get multiple fields into the field you use for spell 
checking; don't forget to set it to multiValued. 
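What copyField does can be modelled as copying the raw values of several source fields into one destination field before analysis, which is why the destination needs to be multiValued. A plain-Python sketch of that effect with hypothetical field names (`title`, `body`, `spell`); in Solr itself this is declared in schema.xml with one `<copyField source="..." dest="..."/>` line per source field.

```python
def apply_copy_fields(doc, rules):
    """rules: list of (source, dest). Appends source values to a list-valued dest."""
    out = dict(doc)
    for source, dest in rules:
        if source not in doc:
            continue
        values = doc[source] if isinstance(doc[source], list) else [doc[source]]
        out[dest] = out.get(dest, []) + values
    return out

doc = {"id": "1", "title": "Solr spelling", "body": "Spell checking on many fields"}
rules = [("title", "spell"), ("body", "spell")]
print(apply_copy_fields(doc, rules)["spell"])
# ['Solr spelling', 'Spell checking on many fields']
```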
 
-Original message-
From: Savannah Beckett 
Sent: Mon 27-09-2010 10:08
To: solr-user@lucene.apache.org; 
Subject: spellcheck on multiple fields?

Is it possible to do spellcheck on multiple fields in my solr index?  If so, 
how?  The following setup works for only one field:
    

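    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">myfield</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="accuracy">0.5</str>
      <str name="buildOnCommit">true</str>
    </lst>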


Thanks.


      

Re:The search response time is too long

2010-09-27 Thread newsam
We used SOLR 1.4. All queries were executed in the SOLR back-end. I guess that I/O 
operations consume too much of the time.

>From: "newsam" 
>Reply-To: solr-user@lucene.apache.org"newsam" 
>To: solr-user@lucene.apache.org
>Subject: Re:The search response time is too long
>Date: Mon, 27 Sep 2010 16:05:49 +0800
>
>I have set up a SOLR searcher instance with Tomcat 5.5.21. However, the 
>response time is too long. Here is my scenario:
>1. The index file is 8.2G. The doc num is 6110745.
>2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem.
>
>I used "Key:*" to query all records by localhost:8080. The response time is 
>68703 milliseconds. The cpu load is 50% and mem usage is over 400M.
>
>Any comments are welcomed.
>
>
> 

The search response time is too long

2010-09-27 Thread newsam
I have set up a SOLR searcher instance with Tomcat 5.5.21. However, the response 
time is too long. Here is my scenario:
1. The index file is 8.2G. The doc num is 6110745.
2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem.

I used "Key:*" to query all records by localhost:8080. The response time is 
68703 milliseconds. The cpu load is 50% and mem usage is over 400M.

Any comments are welcomed.




spellcheck on multiple fields?

2010-09-27 Thread Savannah Beckett
Is it possible to do spellcheck on multiple fields in my solr index?  If so, 
how?  The following setup works for only one field:
    

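    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">myfield</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="accuracy">0.5</str>
      <str name="buildOnCommit">true</str>
    </lst>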


Thanks.