Shard Query in Solrsharp

2010-09-30 Thread Maddy.Jsh

Hi,

I have been using SolrSharp to integrate Solr in my project. Everything was
going fine until I tried to incorporate a shard query.
I tested the shard query from the browser and everything worked. I tried
to do the same in SolrSharp by adding the following line:

queryBuilder.AddSearchParameter("shards", @"solrserver:8983/solr/Core1");

When running the app I get the following error in the logs. Can someone help
me with this?


Sep 30, 2010 9:37:19 AM org.apache.solr.core.SolrCore execute
INFO: [Core1] webapp=/solr path=/select/
params={shards=solrserver:8983/solr/Core1&fl=id&sort=id+asc&debugQuery=false&start=0&q=text:*&rows=2147483647}
status=500 QTime=0 
Sep 30, 2010 9:37:19 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NegativeArraySizeException
at 
org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:89)
at
org.apache.solr.handler.component.ShardFieldSortedHitQueue.init(ShardDoc.java:110)
at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:393)
at
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:298)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Shard-Query-in-Solrsharp-tp1606768p1606768.html
Sent from the Solr - User mailing list archive at Nabble.com.


AW: WordDelimiterFilter combined with PositionFilter

2010-09-30 Thread Mathias Walter
Hi Robert,

 On Fri, Sep 24, 2010 at 3:54 AM, Mathias Walter mathias.wal...@gmx.netwrote:
 
  Hi,
 
  I combined the WordDelimiterFilter with the PositionFilter to prevent the
  creation of expensive Phrase and MultiPhraseQueries. But
  if I now parse an escaped string consisting of two terms, the analyser
  returns a BooleanQuery. That's not what I would expect: if a
  string is escaped, I would expect a PhraseQuery, not a BooleanQuery.
 
  What should be the correct behavior?
 
 
 instead of PositionFilter, you can upgrade to either trunk or branch_3x from
 svn, and use:
 
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
 autoGeneratePhraseQueries="false">
 
 then you will get phrase queries when the user asked for them, but not
 automatically.

Are term vector positions still correctly computed if positionIncrementGap is 
used?

--
Kind regards,
Mathias



Re: DataImportHandler dynamic fields clarification

2010-09-30 Thread David Stuart
Two things: first, are your DB columns uppercase? This would affect the output.

Second, what does your db-data-config.xml look like?

Regards,

Dave

On 30 Sep 2010, at 03:01, harrysmith wrote:

 
 Looking for some clarification on DIH to make sure I am interpreting this
 correctly.
 
 I have a wide DB table, 100 columns. I'd rather not have to add 100 values
 in schema.xml and data-config.xml. I was under the impression that if the
 column name matched a dynamicField name, it would be added. I am finding
 this is not the case; it only works when the column name is explicitly
 listed as a static field.
 
 Example: 100 column table, columns named 'COLUMN_1, COLUMN_2 ... COLUMN_100'
 
 If I add something like:
 <field name="column_60" type="string" indexed="true" stored="true"/>
 to schema.xml, and don't reference the column in data-config entity/field
 tag, it gets imported, as expected.
 
 However, if I use:
 <dynamicField name="column_*" type="string" indexed="true"
 stored="true"/>
 It does not get imported into Solr, I would expect it would.
 
 
 Is this the expected behavior?
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1606159.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Solr Cluster Indexing data question

2010-09-30 Thread ZAROGKIKAS,GIORGOS
Hi there solr experts


I have a Solr cluster with two nodes and separate index files for each
node.

Node1 is master 
Node2 is slave 


Node1 is the one that I index my data on, replicating it to Node2.

How can I index my data at both nodes simultaneously?
Is there any specific setup?


The problem is that when Node1 is down and I index the data from Node2,
Solr creates backup index folders like index.20100929060410,
reducing the space on my hard disk.

Thanks in advance


General hardware requirements?

2010-09-30 Thread Nicholas Swarr

Our index is about 10 gigs in size with about 3 million documents.  The 
documents range in size from dozens to hundreds of kilobytes.  Per week, we 
only get about 50k queries.
Currently, we use Lucene and have one box for our indexer that has 32 gigs of 
memory and an 8-core CPU.  We have a pair of search boxes that have about 16 
gigs of RAM apiece and 8-core CPUs.  They hardly break a sweat.
We're looking to adopt Solr.  Should we consider changing our configuration at 
all?  Are there any other hardware considerations for adopting Solr?
Thanks,
Nick

Re: General hardware requirements?

2010-09-30 Thread Gora Mohanty
On Thu, Sep 30, 2010 at 8:09 PM, Nicholas Swarr nsw...@hotmail.com wrote:

 Our index is about 10 gigs in size with about 3 million documents.  The 
 documents range in size from dozens to hundreds of kilobytes.  Per week, we 
 only get about 50k queries.
 Currently, we use lucene and have one box for our indexer that has 32 gigs of 
 memory and an 8 core CPU.  We have a pair of search boxes that have about 16 
 gigs of ram a piece and 8 core CPUs.  They hardly break a sweat.
 We're looking to adopt Solr.  Should we consider changing our configuration 
 at all?  Are there any other hardware considerations for adopting Solr?
[...]

On the face of it, your machines should easily be adequate for the
search volume you are looking at. However, there are other things
that you should consider:
* How are you indexing? What are acceptable times for this?
* Are there any new Solr-specific features that you are considering
  using, e.g., faceting? What performance benchmarks are you looking
  to achieve?
* What is your front-end for the search? Where is it running?

Regards,
Gora


Multiple Indexes and relevance ranking question

2010-09-30 Thread Valli Indraganti
I am new to Solr and search technologies. I am playing around with
multiple indexes. I configured Solr for Tomcat, created two Tomcat fragments
so that two Solr webapps listen on port 8080 in Tomcat, and have created two
separate indexes using each webapp successfully.

My documents are very primitive; below is the structure. I have four such
documents with different doc ids and an increasing number of the word Hello
corresponding to the name of the document (this is only to make my analysis
of the results easier). Documents one and two are in shard 1, and three and
four are in shard 2. Obviously, document two is ranked higher when queried
against that index (for the word Hello), and document four is ranked higher
when queried against the second index. When using the shards parameter, the
scores remain unaltered.
My question is: if distributed search does not consider IDF, how is it
able to rank these documents correctly? Or do I not have the indexes truly
distributed? Is something wrong with my term distribution?

<add>
  <doc>
    <field name="id">Valli1</field>
    <field name="name">One</field>
    <field name="text">Hello! This is a test document testing relevancy
scores.</field>
  </doc>
</add>
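The local-vs-global IDF question above can be sketched numerically. This is a hedged illustration, not Solr code: it uses the classic Lucene DefaultSimilarity idf formula (1 + ln(numDocs / (docFreq + 1))) with hypothetical counts matching the four-document setup, to show why per-shard (local) IDF can differ from collection-wide (global) IDF:

```python
import math

def idf(num_docs, doc_freq):
    # Classic Lucene DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    # The exact Similarity in your Solr version may differ slightly.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical: 4 docs total, "Hello" appears in all 4; 2 docs per shard.
global_idf = idf(4, 4)   # computed over the whole collection
local_idf = idf(2, 2)    # computed inside one shard

# By default, distributed search scores with each shard's local idf,
# so scores are only comparable when term stats are evenly spread.
print(global_idf, local_idf)
```

When terms are distributed evenly (as in this toy setup), local and global IDF rank documents the same way, which is why the results still look correct even without global IDF.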


Re: Solr Cluster Indexing data question

2010-09-30 Thread Jak Akdemir
If you want to use both of your nodes for building the index (which means
two masters), that unifies them and collapses the master-slave
relation.

Would you take a look at the link below for the index snapshot problem?
http://wiki.apache.org/solr/SolrCollectionDistributionScripts

On Thu, Sep 30, 2010 at 11:03 AM, ZAROGKIKAS,GIORGOS
g.zarogki...@multirama.gr wrote:
 Hi there solr experts


 I have an solr cluster with two nodes  and separate index files for each
 node

 Node1 is master
 Node2 is slave


 Node1 is the one that I index my data and replicate them to Node2

 How can I index my data at both nodes simultaneously ?
 Is there any specific setup 


 The problem is when my Node1 is down and I index the data from Node2 ,
 Solr creates backup index folders like this index.20100929060410
 and reduce the space of my hard disk

 Thanks in advance


can i have more update processors with solr

2010-09-30 Thread Dilshod Temirkhodjaev
I don't know if this is a bug or not, but when I write the following in
solrconfig.xml:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">CustomRank</str>
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

only the first update.processor works. Why is the second not working?
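If I recall the Solr 1.4 behavior correctly (verify against your version), `update.processor` selects a single processor chain by name, and duplicate defaults are read only once, so stacking two `<str>` entries does not chain them. The usual approach is to declare one `updateRequestProcessorChain` containing both steps and point the handler at it. A sketch, where `my.pkg.CustomRankUpdateProcessorFactory` is a hypothetical stand-in for your CustomRank factory:

```xml
<!-- Sketch: one chain that runs both custom steps, then the standard
     log/run processors that actually apply the update. -->
<updateRequestProcessorChain name="rank-and-dedupe">
  <processor class="my.pkg.CustomRankUpdateProcessorFactory"/>
  <processor class="solr.processor.SignatureUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <!-- select the chain by name, instead of listing processors here -->
    <str name="update.processor">rank-and-dedupe</str>
  </lst>
</requestHandler>
```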


Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Renee Sun

Hi -
I posted this problem but got no response; I guess I need to post it in the
Solr-User forum. Hopefully you can help me with this.

We were running Solr 1.3 for a long time, with 130 cores. We just upgraded to
Solr 1.4, and now starting Solr takes about 45 minutes. The catalina.log
shows Solr loading all the cores very slowly.

We did optimize; it did not help at all.

I ran JConsole to monitor the memory. I noticed the first 70 cores were
loaded pretty fast, in 3-4 minutes.

But after that, the memory went all the way up to about 15GB (we allocated
16GB to Solr), and loading slows down right there, getting slower and slower.
We use concurrent GC. JConsole shows only ParNew GCs kicking off, but they
don't bring the memory down.

With Solr 1.3, all 130 cores loaded in 5-6 minutes.

Please let me know if there is a known memory issue with Solr 1.4, or if there
is something (configuration) we need to tweak to make it work efficiently in
1.4.

thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1608728.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is Solr right for my business situation ?

2010-09-30 Thread Dennis Gearon
You need to be able to query the database with the 'Mother of all Queries', 
i.e. one that completely flattens all tables into each row. 

In other words, the JOIN section of the query will have EVERY table in it, and, 
depending on your schema, some of them twice or more.

Trying to do that with CSV or separate tables would require you to load those into 
your OWN database first, then query against that, as above.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/29/10, Sharma, Raghvendra sraghven...@corelogic.com wrote:

 From: Sharma, Raghvendra sraghven...@corelogic.com
 Subject: RE: Is Solr right for my business situation ?
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Wednesday, September 29, 2010, 9:40 AM
 Some questions.  
 
 1. I have about 3-5 tables. Now designing schema.xml for a
 single table looks ok, but whats the direction for handling
 multiple table structures is something I am not sure about.
 Would it be like a big huge xml, wherein those three tables
 (assuming its three) would show up as three different
 tag-trees, nullable. 
 
 My source provides me a single flat file per table (tab
 delimited).
 
 Do you think having multiple indexes could be a solution
 for this case ?? or do I really need to spend effort in
 denormalizing the data ?
 
 2. Further, loading into solr can use some perf tuning..
 any tips ? best practices ?
 
 3. Also, is there a way to specify a xslt at the server
 side, and make it default, i.e. whenever a response is
 returned, that xslt is applied to the response
 automatically...
 
 4. And last question for the day - :) there was one post
 saying that the spatial support is really basic in solr and
 is going to be improved in next versions... Can you ppl help
 me get a definitive yes or no on spatial support... in the
 current form, does it work or not? I would store lat and
 long, and would need to make them searchable...
 
 --raghav..
 
 -Original Message-
 From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
 
 Sent: Tuesday, September 28, 2010 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Is Solr right for my business situation ?
 
 Thanks for the responses people.
 
 @Grant  
 
 1. can you show me some direction on that.. loading data
 from an incoming stream.. do I need some third party tools,
 or need to build something myself...
 
 4. I am basically attempting to build a very fast search
 interface for the existing data. The volume I mentioned is
 more like static one (data is already there). The sql
 statements I mentioned are daily updates coming. The good
 thing is that the history is not there, so the overall
 volume is not growing, but I need to apply the update
 statements. 
 
 One workaround I had in mind is, (though not so great
 performance) is to apply the updates to a copy of rdbms, and
 then feed the rdbms extract to solr.  Sounds like
 overkill, but I don't have another idea right now. Perhaps
 business discussions would yield something.
 
 @All -
 
 Some more questions guys.  
 
 1. I have about 3-5 tables. Now designing schema.xml for a
 single table looks ok, but whats the direction for handling
 multiple table structures is something I am not sure about.
 Would it be like a big huge xml, wherein those three tables
 (assuming its three) would show up as three different
 tag-trees, nullable. 
 
 My source provides me a single flat file per table (tab
 delimited).
 
 2. Further, loading into solr can use some perf tuning..
 any tips ? best practices ?
 
 3. Also, is there a way to specify a xslt at the server
 side, and make it default, i.e. whenever a response is
 returned, that xslt is applied to the response
 automatically...
 
 4. And last question for the day - :) there was one post
 saying that the spatial support is really basic in solr and
 is going to be improved in next versions... Can you ppl help
 me get a definitive yes or no on spatial support... in the
 current form, does it work or not? I would store lat and
 long, and would need to make them searchable...
 
 Looks like I m close to my solution.. :)
 
 --raghav
 
 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 
 Sent: Tuesday, September 28, 2010 1:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Is Solr right for my business situation ?
 
 Inline.
 
 On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:
 
  When do you need to deploy?
  
  As I understand it, the spatial search in Solr is
 being rewritten and is slated for Solr 4.0, the release
 after next.
 
 It will be in 3.x, the next release
 
  
  The existing spatial search has some serious problems
 and is deprecated.
  
  Right now, I think the only way to get spatial search
 in Solr is to deploy a nightly snapshot from the active
 development on trunk. If you are deploying a year from now,
 that might 

Re: spatial sorting

2010-09-30 Thread dan sutton
Hi All,

This is more of an FYI for those wanting to filter and sort by distance, and
have the values returned in the result set after determining a way to do
this with existing code.

Using solr 4.0 an example query would contain the following parameters:

/select?

q=stevenage^0.0
+_val_:"ghhsin(6371,geohash(52.0274,-0.4952),location)"^1.0

Make the boost on all parts of the query other than the ghhsin distance
value function 0, and 1 on the function, so that the score is then
equal to the distance. (52.0274,-0.4952) here is the query point and
'location' is the geohash field to search against.



sort=score asc

basically sort by distance asc (closest first)



fq={!sfilt fl=location}&pt=52.0274,-0.4952&d=30

This is the spatial filter to limit the necessary distance calculations.



fl=*,score

Return all fields (if required) but include the score (which contains the
distance calculation)


Does anyone know if it's possible to return the distance and score
separately?  I know there has been a patch to sort by value function, but
how can one return the values from this?
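For readers unfamiliar with the hsin/ghhsin functions used above: they compute great-circle distance via the haversine formula. A minimal sketch (plain Python, not Solr code; 6371 is the Earth radius in km, matching the query above):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius=6371.0):
    # Great-circle distance between two (lat, lon) points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + \
        math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

# Distance from the query point to itself is zero; nearby points sort first
# when score is made equal to distance and sorted ascending.
print(haversine_km(52.0274, -0.4952, 52.0274, -0.4952))
```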

Cheers,
Dan


On Fri, Sep 17, 2010 at 2:45 PM, dan sutton danbsut...@gmail.com wrote:

 Hi,

 I'm trying to filter and sort by distance with this URL:


 http://localhost:8080/solr/select/?q=*:*&fq={!sfilt fl=loc_lat_lon}&pt=52.02694,-0.49567&d=2&sort={!func}hsin(52.02694,-0.49567,loc_lat_lon_0_d,loc_lat_lon_1_d,3963.205) asc

 Filtering is fine, but it's failing to parse the sort with:

 The request sent by the client was syntactically incorrect (can not sort
 on undefined field or function: {!func}(52.02694,-0.49567,loc_lat_lon_0_d,
 loc_lat_lon_1_d, 3963.205)).

 I'm using the solr/lucene trunk to try this out ... does anyone know what
 is wrong with the syntax?

 Additionally am I able to return the distance sort values e.g. with param
 fl ? ... else am I going to have to either write my own component (which
 would also look up the filtered cached values rather than re-calculating
 distance) or use an alternative like localsolr ?

 Dan



Re: Memory usage

2010-09-30 Thread Jeff Moss
There are 14,696,502 documents; we are doing a lot of funky stuff but I'm
not sure which is most likely to cause an impact. We're sorting on a dynamic
field; there are about 1000 different variants of this field that look like
priority_sort_for_client_id, which is an integer field. I've heard that
sorting can have a big impact on memory consumption; could that be it?

How do I find out about the number of unique words in a field? These aren't
very large documents but there is a text field that contains user input,
which may be html and javascript so lots of symbols in there.
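A rough back-of-the-envelope check of the sorting theory above: Lucene's FieldCache keeps one int (4 bytes) per document for every field ever sorted on, and entries are held for the life of the reader. With ~14.7M docs and up to 1000 per-client sort fields, that alone approaches the observed old-gen size. This is hypothetical arithmetic, assuming int sort fields and ignoring all other overhead:

```python
num_docs = 14_696_502
bytes_per_int = 4
fields_touched = 1000  # worst case: every client's sort field gets used

per_field_mb = num_docs * bytes_per_int / 2**20
total_gb = fields_touched * num_docs * bytes_per_int / 2**30

print(round(per_field_mb, 1))  # roughly 56 MB per sorted-on field
print(round(total_gb, 1))      # roughly 55 GB if all 1000 variants are sorted on
```

That would be consistent with an old gen that climbs toward 60-70GB and never shrinks, since FieldCache entries are not evicted by GC while the index reader is open.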

Thanks,

-Jeff

On Wed, Sep 29, 2010 at 9:07 PM, Lance Norskog goks...@gmail.com wrote:

 How many documents are there? How many unique words are in a text
 field? Both of these numbers can have a non-linear effect on the
 amount of space used.

 But, usually a 22Gb index (on disk) might need 6-12G of ram total.
 There is something odd going on here.

 Lance

 On Wed, Sep 29, 2010 at 4:34 PM, Jeff Moss jm...@heavyobjects.com wrote:
  My server has 128GB of ram, the index is 22GB large. It seems the memory
  consumption goes up on every query and the garbage collector will never
 free
  up as much memory as I expect it to. The memory consumption looks like a
  curve, it eventually levels off but the old gen is always 60 or 70GB. I
 have
  tried adjusting the cache settings but it doesn't seem to make any
  difference.
 
  Is there something I'm doing wrong or is this expected behavior?
 
  Here is a screenshot of what I see in jconsole after running for a few
  minutes:
  http://i51.tinypic.com/2qntca1.png
 
  Here is a 24 hour period of the same data taken from a custom jmx
 monitor:
  http://i51.tinypic.com/2vcu9u8.png
 
  The server performs pretty much as good at the beginning of this cycle as
 it
  does at the end so all of this memory accumulation seems to not be doing
  anything useful.
 
  I am running the 1.4 war but I was having this problem with 1.3 also.
 Tomcat
  6.0.18, Java 1.6.0. I haven't gone as far as doing any memory profiling
 or
  java debugging because I'm inexperienced, but that will be the next thing
 I
  try. Any help would be appreciated.
 
  Thanks,
 
  -Jeff
 



 --
 Lance Norskog
 goks...@gmail.com



Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread webdev1977

I have been reading through all the jira issues and patches, as well as the
wikis and I still have a few things that are not clear to me. 

I am currently running with Solr 1.4.1 and using Nutch for my crawling. 
Everything is working great, I am using a Nutch plugin to add lat long
information, I just don't know if it is possible to do what I am wanting to
do. 

1.  I noticed that it said that the LatLonType type cannot be
multivalued. Does that mean that I cannot have multiple lat/lon values for
one document?  If so, that would be quite a limitation.  I have an average
of 10 geotags per document. 

2. Are LocalSolr and SpatialSearch the same thing?

3. If I did want to use the LatLonType with the BBOX filter,  where would I
go to get a patch for 1.4.1? Is it even possible to patch 1.4.1 or do I have
to go to an entirely different dev version of Solr? 

Thanks for your input!!! 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Local-Solr-Spatial-Search-and-LatLonType-clarification-tp1609570p1609570.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler dynamic fields clarification

2010-09-30 Thread harrysmith


Two things: first, are your DB columns uppercase? This would affect the output.



Interesting, I was under the impression that case does not matter. 

From http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config :
"It is possible to totally avoid the field entries in entities if the names
of the fields are the same (case does not matter) as those in the Solr schema."

I confirmed that matching the schema.xml field case to the database table is
needed for dynamic fields, so the wiki statement above is incorrect, or at
the very least confusing; possibly a bug.

My database is Oracle 10g and the column names have been created in all
uppercase in the database. 

In Oracle: 
Table name: wide_table
Column names: COLUMN_1 ... COLUMN_100 (yes, uppercase)

Please see following scenarios and results I found:

data-config.xml
<entity name="item" query="select column_1,column_100 from wide_table">
  <field column="column_100" name="id"/>
</entity>

schema.xml
<dynamicField name="column_*" type="string" indexed="true" stored="true"
multiValued="true"/>

Result:
Nothing Imported

=

data-config.xml
<entity name="item" query="select COLUMN_1,COLUMN_100 from wide_table">
  <field column="column_100" name="id"/>
</entity>

schema.xml
<dynamicField name="column_*" type="string" indexed="true" stored="true"
multiValued="true"/>

Result:
Note query column names changed to uppercase.
Nothing Imported

=


data-config.xml
<entity name="item" query="select column_1,column_100 from wide_table">
  <field column="COLUMN_100" name="id"/>
</entity>

schema.xml
<dynamicField name="column_*" type="string" indexed="true" stored="true"
multiValued="true"/>

Result:
Note ONLY the field entry was changed to caps

All records imported, with only COLUMN_100 id field.



data-config.xml
<entity name="item" query="select column_1,column_100 from wide_table">
  <field column="COLUMN_100" name="id"/>
</entity>

schema.xml
<dynamicField name="COLUMN_*" type="string" indexed="true" stored="true"
multiValued="true"/>

Result:
Note BOTH the field entry in data-config.xml and the dynamicField wildcard
in schema.xml were changed to caps.

All records imported, with all fields specified. This is the behavior
desired.

=

Second, what does your db-data-config.xml look like?



The relevant data-config.xml is as follows:

<document name="">
  <entity name="item" query="select COLUMN_1,COLUMN_100 from wide_table">
    <field column="COLUMN_100" name="id"/>
  </entity>
</document>

Ideally, I would rather have the query be 'select * from wide_table', with
the fields being dynamically matched by column name against the
dynamicField wildcard from schema.xml:

<dynamicField name="COLUMN_*" type="string" indexed="true" stored="true"/>


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1609578.html
Sent from the Solr - User mailing list archive at Nabble.com.


Automatic xslt to responses ??

2010-09-30 Thread Sharma, Raghvendra
Is there a way to specify an XSLT at the server side and make it the default, i.e. 
whenever a response is returned, that XSLT is applied to the response 
automatically?
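One approach, sketched and worth verifying against your Solr version: Solr ships an XSLT response writer selected with wt=xslt, with the stylesheet named by the tr parameter and loaded from the core's conf/xslt/ directory. You can make those the defaults on a handler so every response is transformed unless the client overrides them. The stylesheet name example.xsl is a placeholder:

```xml
<!-- solrconfig.xml sketch: make the XSLT writer the default for this
     handler. Clients can still override wt/tr per request. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">xslt</str>
    <str name="tr">example.xsl</str>
  </lst>
</requestHandler>
```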
**
 
This message may contain confidential or proprietary information intended only 
for the use of the 
addressee(s) named above or may contain information that is legally privileged. 
If you are 
not the intended addressee, or the person responsible for delivering it to the 
intended addressee, 
you are hereby notified that reading, disseminating, distributing or copying 
this message is strictly 
prohibited. If you have received this message by mistake, please immediately 
notify us by  
replying to the message and delete the original message and any copies 
immediately thereafter. 

Thank you. 
**
 
CLLD


Faster loading to solr...

2010-09-30 Thread Sharma, Raghvendra
I have been able to load around a million rows/docs in around five minutes.  The 
schema contains around 250+ fields.  For the moment, I have kept everything as 
string. 
I am sure there are ways to get better loading speeds than this.

Will the data type matter for loading speed? Or anything else?

Can someone help me with any tips? Perhaps a best-practices kind of 
document/article..
Anything ..

--raghav..
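One common speed-up, independent of field types, is to post documents in large batches with a single commit at the end, rather than adding and committing per row. A small sketch of the batching logic (plain Python; the chunking is generic, and actually posting each batch to Solr's /update endpoint is left as a comment since it needs a running server):

```python
def batches(rows, size):
    # Split an iterable of rows into fixed-size chunks for bulk posting.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

rows = range(10_500)
sizes = [len(b) for b in batches(rows, 1000)]
# Each batch becomes one <add>...</add> POST to /update;
# commit once after the final batch instead of per document.
print(sizes)
```

Batching cuts per-request overhead dramatically; committing once avoids repeated index flushes, which is usually the bigger win.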



Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:09 PM, webdev1977 webdev1...@gmail.com wrote:
 1.  I noticed that it said that the type of LatLongType can not be
 mulitvalued. Does that mean that I can not have multiple lat/lon values for
 one document.

That means that if you want to have multiple points per document, each
point must be in a different field.
This often makes sense anyway, when the points have different
semantics - i.e. work and home locations.

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


RE: General hardware requirements?

2010-09-30 Thread Nicholas Swarr


I think the indexing will be fine.  We are looking to use multi-select 
faceting, spelling suggestions, and highlighting to name a few.  On the front 
end (and on separate machines) are .NET web applications that issue queries via 
HTTP requests to our searchers.
I can't think of anything else that will require extra processing.  Thanks for 
bringing those considerations to my attention.  Is there anything there that 
significantly impacts the hardware needs?

 

-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Thursday, September 30, 2010 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: General hardware requirements?

On Thu, Sep 30, 2010 at 8:09 PM, Nicholas Swarr nsw...@hotmail.com wrote:

 Our index is about 10 gigs in size with about 3 million documents.  The
 documents range in size from dozens to hundreds of kilobytes.  Per week, we
 only get about 50k queries.
 Currently, we use lucene and have one box for our indexer that has 32 gigs
 of memory and an 8 core CPU.  We have a pair of search boxes that have about
 16 gigs of ram a piece and 8 core CPUs.  They hardly break a sweat.
 We're looking to adopt Solr.  Should we consider changing our configuration
 at all?  Are there any other hardware considerations for adopting Solr?

[...]

On the face of it, your machines should easily be adequate for the
search volume you are looking at. However, there are other things
that you should consider:
* How are you indexing? What are acceptable times for this?
* Are there any new Solr-specific features that you are considering
  using, e.g., faceting? What performance benchmarks are you looking
  to achieve?
* What is your front-end for the search? Where is it running?

Regards,
Gora

Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:40 PM, webdev1977 webdev1...@gmail.com wrote:
 Or.. do you mean each field must have a unique name, but both be of type
 latLon (solr.LatLonType)?
 <work>x,y</work>
 <home>x,y</home>

Yes.

 If the statement directly above is true (I hope that it is not), how does
 one dynamically create fields when adding geotags?

Dynamic field types.  You can configure it such that anything ending
with _latlon is of type LatLonType.
Perhaps we should do this in the example schema.
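The suggestion above would look roughly like this in schema.xml (field-type name and suffix are illustrative, and LatLonType requires trunk/branch_3x at the time of this thread, not 1.4.1):

```xml
<!-- Sketch: any field ending in _latlon becomes a lat/lon point field. -->
<fieldType name="latlon" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_latlon" type="latlon" indexed="true" stored="true"/>
```

With this in place, documents can add work_latlon, home_latlon, or any other per-tag point field without further schema changes.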

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


error sending a delete all request

2010-09-30 Thread Christopher Gross
I'm writing some code that pushes data into a Solr instance.  I have my
Tomcat (5.5.28) set up to use 2 indexes, I'm hitting the second one for
this.
I try to issue the basic command to clear out the index
(<delete><query>*:*</query></delete>), and I get the error posted below.

Does anyone have an idea of what I'm missing or what could cause this
error?  I can clip in more from the logs if need be.

Thanks!

Logs:
2010-09-30 13:21:35,078 [pool-2-thread-1] DEBUG httpclient.wire.header - 
POST /solr2/update HTTP/1.1[\r][\n]
2010-09-30 13:21:35,078 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpMethodBase - Adding Host request header
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
User-Agent: Jakarta Commons-HttpClient/3.0[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
Host: localhost:8080[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
Content-Length: 35[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
Content-Type: text/xml; charset=UTF-8[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - 
<delete><query>*:*</query></delete>
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.methods.EntityEnclosingMethod - Request body
sent
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
HTTP/1.1 500 Internal Server Error[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
Server: Apache-Coyote/1.1[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
Content-Type: text/html;charset=utf-8[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
Content-Length: 7117[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
Date: Thu, 30 Sep 2010 17:21:35 GMT[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.header - 
Connection: close[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpMethodBase - Buffering response body
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - 
<html><head><title>Apache Tomcat/5.5.28 - Error report</title>
<style><!-- ...(Tomcat error-page CSS elided)... --></style>
</head><body><h1>HTTP Status 500 - null[\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - 
[\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - 
java.lang.AbstractMethodError[\r][\n]
2010-09-30 13:21:35,093 [pool-2-thread-1] DEBUG httpclient.wire.content - 
[0x9]at
org.apache.lucene.search.Searcher.search(Searcher.java:150)[\r][\n]

Then the actual stack trace in case that helps:
java.lang.AbstractMethodError
at org.apache.lucene.search.Searcher.search(Searcher.java:150)
at
org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:343)
at
org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:260)
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:204)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at

Re: Local Solr, Spatial Search, and LatLonType clarification

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 1:48 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 Dynamic field types.  You can configure it such that anything ending
 with _latlon is of type LatLonType.
 Perhaps we should do this in the example schema.

Looks like we already have it:

   <dynamicField name="*_p" type="location" indexed="true" stored="true"/>

So you should be able to add stuff like home_p and work_p w/o defining
them ahead of time.  Anything ending in _p is of type location.
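For instance, an update using such fields could look like the following (a hedged illustration; the field names besides id are invented):

```xml
<add>
  <doc>
    <field name="id">employee-1</field>
    <field name="home_p">37.7752,-122.4232</field>
    <field name="work_p">37.7898,-122.3942</field>
  </doc>
</add>
```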

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


Re: SEVERE: Unable to move index file

2010-09-30 Thread wojtekpia

Hi,
I ran into this problem again the other night. I've looked through my log
files in more detail, and nothing seems out of place (I stripped user
queries out and included it below). I have the following setup:
1. Indexer has 2 cores. One core gets incremental updates, the other is for
full re-syncs with a database. The last step in my full re-sync process is
to swap cores (so that the searchers don't have to change their replication
master URLs).
2. Searcher that is subscribed to a constant indexer URL.

I noticed this replication error occurred right after I swapped my indexer's
cores. Since the index version and generation numbers are independent across
the 2 cores, could the searcher's index clean up be pre-emptively deleting
the active searcher index? When the error occurred, index.20100921053730 did
not exist, but index.properties was pointing to it. Previous entries in the
log make it seem like the directory did exist a few minutes earlier
(replication + warmup succeeded pointing at that directory). 

I've tried to reproduce this in a development environment, but haven't been
able to so far. 
https://issues.apache.org/jira/browse/SOLR-1822?focusedCommentId=12845175
SOLR-1822  seems to address a similar issue. I suspect that it would solve
what I'm seeing, but it treats the symptom rather than the cause (and I'd
like to be able to repro before trying it). Any insight/theories are
appreciated.

Thanks,

Wojtek

Sep 21, 2010 5:35:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1271723727936, generation: 18616
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1271723727935, generation: 18615
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Number of files in latest index in master: 118
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller
downloadIndexFiles
INFO: Skipping download for /solr/data/index.20100921053730/_13n9.prx
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller
downloadIndexFiles
INFO: Skipping download for /solr/data/index.20100921053730/_13nx.fnm
...
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller
downloadIndexFiles
INFO: Skipping download for /solr/data/index.20100921053730/_13m5.fnm
...
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller
downloadIndexFiles
INFO: Skipping download for /solr/data/index.20100921053730/_13n9.frq
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Total time taken for download : 0 secs
Sep 21, 2010 5:37:31 PM org.apache.solr.update.DirectUpdateHandler2
__AW_commit
INFO: start
commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening searc...@61080339 main
Sep 21, 2010 5:37:31 PM org.apache.solr.update.DirectUpdateHandler2
__AW_commit
INFO: end_commit_flush
Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher __AW_warm
INFO: autowarming searc...@61080339 main from searc...@26aebd8c main

fieldValueCache{lookups=866,hits=866,hitratio=1.00,inserts=0,evictions=0,size=11,warmupTime=0,cumulative_lookups=493365,cumulative_hits=493351,cumulative_hitratio=0.99,cumulative_inserts=7,cumulative_evictions=0,item_FeaturesFacet={field=FeaturesFacet,memSize=51896798,tindexSize=56,time=988,phase1=936,nTerms=50,bigTerms=9,termInstances=5403271,uses=146},...}
...
Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher __AW_warm
INFO: autowarming result for searc...@61080339 main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=2036931,cumulative_hits=836191,cumulative_hitratio=0.41,cumulative_inserts=1200740,cumulative_evictions=1103563}
Sep 21, 2010 5:37:31 PM org.apache.solr.core.QuerySenderListener
__AW_newSearcher
INFO: QuerySenderListener sending requests to searc...@61080339 main
Sep 21, 2010 5:37:31 PM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field
{field=BedFacet,memSize=48178130,tindexSize=42,time=313,phase1=261,nTerms=6,bigTerms=4,termInstances=328351,uses=0}
...
INFO: [] webapp=null path=null params={*:*} hits=11546888 status=0
QTime=20687 
Sep 21, 2010 5:37:58 PM org.apache.solr.core.QuerySenderListener
__AW_newSearcher
INFO: QuerySenderListener done.
Sep 21, 2010 5:37:58 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@61080339 main
Sep 21, 2010 5:37:58 PM org.apache.solr.search.SolrIndexSearcher __AW_close
INFO: Closing searc...@26aebd8c main


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Yonik Seeley
On Thu, Sep 30, 2010 at 10:41 AM, Renee Sun renee_...@mcafee.com wrote:

 Hi -
 I posted this problem but no response, I guess I need to post this in the
 Solr-User forum. Hopefully you will help me on this.

 We were running Solr 1.3 for a long time, with 130 cores. We just upgraded to
 Solr 1.4, and now when we start Solr, it takes about 45 minutes. The
 catalina.log shows Solr loading all the cores very slowly.

Have you tried 1.4.1 yet?
Could you open a JIRA issue for this and give whatever info you can?
Info like:
  - do you have any warming queries configured?
  - do the cores have documents already, and if so, how many per core?
  - are you using the same schema & solrconfig, or did you upgrade?
  - have you tried finding out what is taking up all the memory (or
all the CPU time)?

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


RE: Is Solr right for my business situation ?

2010-09-30 Thread Sharma, Raghvendra
Thanks for the ideas.

I think after reading enough documentation and articles around solr and xml 
indexing in general, I have come around to understand that there is no escaping 
denormalization.

However, one tiny thought remains... perhaps my last shot at avoiding 
denormalization (of course it's going to be a costly affair)..

I was reading about how Solr can handle multiple cores and therefore multiple 
indexes. Can a single search interface send queries to these three cores? In 
that case, who would do the load balancing? The merging of the results? And 
would I be running three instances of Solr on my system(s), or can one handle 
all of it?



-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Thursday, September 30, 2010 9:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

You need to be able to query the database with the 'Mother of all Queries', 
i.e. one that completely flattens all tables into each row. 

In other words, the JOIN section of the query will have EVERY table in it, and 
depending on your schema, some of them twice or more.

Trying to do that with CSV, separate tables would require you to put those into 
your OWN database, then query against that, as above.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/29/10, Sharma, Raghvendra sraghven...@corelogic.com wrote:

 From: Sharma, Raghvendra sraghven...@corelogic.com
 Subject: RE: Is Solr right for my business situation ?
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Wednesday, September 29, 2010, 9:40 AM
 Some questions.  
 
 1. I have about 3-5 tables. Now designing schema.xml for a
 single table looks ok, but whats the direction for handling
 multiple table structures is something I am not sure about.
 Would it be like a big huge xml, wherein those three tables
 (assuming its three) would show up as three different
 tag-trees, nullable. 
 
 My source provides me a single flat file per table (tab
 delimited).
 
 Do you think having multiple indexes could be a solution
 for this case ?? or do I really need to spend effort in
 denormalizing the data ?
 
 2. Further, loading into solr can use some perf tuning..
 any tips ? best practices ?
 
 3. Also, is there a way to specify a xslt at the server
 side, and make it default, i.e. whenever a response is
 returned, that xslt is applied to the response
 automatically...
 
 4. And last question for the day - :) there was one post
 saying that the spatial support is really basic in solr and
 is going to be improved in next versions... Can you ppl help
 me get a definitive yes or no on spatial support... in the
 current form, does it work or not? I would store lat and
 long, and would need to make them searchable...
 
 --raghav..
 
 -Original Message-
 From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
 
 Sent: Tuesday, September 28, 2010 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Is Solr right for my business situation ?
 
 Thanks for the responses people.
 
 @Grant  
 
 1. can you show me some direction on that.. loading data
 from an incoming stream.. do I need some third party tools,
 or need to build something myself...
 
 4. I am basically attempting to build a very fast search
 interface for the existing data. The volume I mentioned is
 more like static one (data is already there). The sql
 statements I mentioned are daily updates coming. The good
 thing is that the history is not there, so the overall
 volume is not growing, but I need to apply the update
 statements. 
 
 One workaround I had in mind is, (though not so great
 performance) is to apply the updates to a copy of rdbms, and
 then feed the rdbms extract to solr.  Sounds like
 overkill, but I don't have another idea right now. Perhaps
 business discussions would yield something.
 
 @All -
 
 Some more questions guys.  
 
 1. I have about 3-5 tables. Now designing schema.xml for a
 single table looks ok, but whats the direction for handling
 multiple table structures is something I am not sure about.
 Would it be like a big huge xml, wherein those three tables
 (assuming its three) would show up as three different
 tag-trees, nullable. 
 
 My source provides me a single flat file per table (tab
 delimited).
 
 2. Further, loading into solr can use some perf tuning..
 any tips ? best practices ?
 
 3. Also, is there a way to specify a xslt at the server
 side, and make it default, i.e. whenever a response is
 returned, that xslt is applied to the response
 automatically...
 
 4. And last question for the day - :) there was one post
 saying that the spatial support is really basic in solr and
 is going to be improved in next versions... Can you ppl help
 me get a definitive yes or no on spatial support... in the
 current form, does it work 

Re: Automatic xslt to responses ??

2010-09-30 Thread Gora Mohanty
On Thu, Sep 30, 2010 at 10:47 PM, Sharma, Raghvendra
sraghven...@corelogic.com wrote:
 Is there a way to specify a xslt at the server side, and make it default, 
 i.e. whenever a response is returned, that xslt is applied to the response 
 automatically...

This should be of help: http://wiki.apache.org/solr/XsltResponseWriter

Regards,
Gora


RE: Grouping in solr ?

2010-09-30 Thread Papp Richard
I'm really sorry - thank you for the note.

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Tuesday, September 28, 2010 05:12
To: solr-user@lucene.apache.org
Subject: Re: Grouping in solr ?

: References:
: abcc5d9ce0798544a169c584b8f1447d230313c...@exchange01.toolbox.local
: In-Reply-To:
: abcc5d9ce0798544a169c584b8f1447d230313c...@exchange01.toolbox.local
: Subject: Grouping in solr ?

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!
 

__ Information from ESET NOD32 Antivirus, version of virus signature
database 5419 (20100902) __

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com
 
 

 



Re: Solr Cluster Indexing data question

2010-09-30 Thread Steve Cohen
So how would one set it up to use multiple nodes for building an index? I
see a document for solr + hadoop (http://wiki.apache.org/solr/HadoopIndexing)
and it says it has an example but the example is missing.

Thanks,
Steve Cohen

On Thu, Sep 30, 2010 at 10:58 AM, Jak Akdemir jakde...@gmail.com wrote:

 If you want to use both of your nodes for building the index (which means
 two masters), it makes them unified and collapses the master-slave
 relation.

 Would you take a look at the link below for the index snapshot problem?
 http://wiki.apache.org/solr/SolrCollectionDistributionScripts

 On Thu, Sep 30, 2010 at 11:03 AM, ZAROGKIKAS,GIORGOS
 g.zarogki...@multirama.gr wrote:
  Hi there solr experts
 
 
  I have an solr cluster with two nodes  and separate index files for each
  node
 
  Node1 is master
  Node2 is slave
 
 
  Node1 is the one that I index my data and replicate them to Node2
 
  How can I index my data at both nodes simultaneously?
  Is there any specific setup?
 
 
  The problem is that when my Node1 is down and I index the data from Node2,
  Solr creates backup index folders like index.20100929060410,
  which reduces the free space on my hard disk.
 
  Thanks in advance
 
 
 
 
 
 
 



parsedquery is different from querystrin

2010-09-30 Thread abhayd

hi 
I am searching for blackberry and for some reason the parsed query shows up
as blackberri.

I checked synonyms but I don't see it anywhere.

<lst name="debug">
<str name="rawquerystring">text:blackberry</str>
<str name="querystring">text:blackberry</str>
<str name="parsedquery">text:blackberri</str>
<str name="parsedquery_toString">text:blackberri</str>

Not sure if it's related, but query results are showing up when matched with
black.

Any help or directions for finding out why a document is showing up in the
results, i.e. which word in the doc hit the search term? I am seeing docs in
the results which do not contain the search term at all.

thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/parsedquery-is-different-from-querystrin-tp1610081p1610081.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Memory usage

2010-09-30 Thread Chris Hostetter

: There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
: not sure which is most likely to cause an impact. We're sorting on a dynamic
: field there are about 1000 different variants of this field that look like
: priority_sort_for_client_id, which is an integer field. I've heard that
: sorting can have a big impact on memory consumption, could that be it?

sorting on a field requires that an array of the corresponding type be 
constructed for that field - the size of the array is the size of maxDoc 
(ie: the number of documents in your index, including deleted documents).

If you are using TrieInts, and have an index with no deletions, sorting 
~14.7Mil docs on 1000 diff int fields will take up about ~55GB.
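A quick sanity check of that figure (assuming one 4-byte int per maxDoc entry per sorted field):

```latex
14{,}696{,}502 \text{ docs} \times 1000 \text{ fields} \times 4 \text{ bytes}
\approx 5.88 \times 10^{10} \text{ bytes} \approx 55 \text{ GiB}
```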

That's a minimum just for the sorting of those int fields (SortableIntField, 
which keeps a string version of the field value, will be significantly 
bigger) and doesn't take into consideration any other data structures used 
for searching.

I'm not a GC expert, but based on my limited understanding your graph 
actually seems fine to me .. particularly the part where it says 
you've configured a Max heap of ~122GB of RAM, and it's 
never spent any time doing ConcurrentMarkSweep.  My uneducated 
understanding of those two numbers is that you've told the JVM it can use 
an ungodly amount of RAM, so it is.  It's done some basic cleanup of 
young gen (ParNew) but because the heap size has never gone above 50GB, 
it hasn't found any reason to actually start a CMS GC to look for dead 
objects in Old Gen that it can clean up.


(Can someone who understands GC and JVM tuning better than me please 
sanity check me on that?)


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Memory usage

2010-09-30 Thread Jeff Moss
I think you've probably nailed it, Chris - thanks for that. I think I can get
by with a different approach than this.

Do you know if I will get the same memory consumption using the
RandomFieldType vs the TrieInt?

-Jeff

On Thu, Sep 30, 2010 at 12:36 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
 : not sure which is most likely to cause an impact. We're sorting on a
 dynamic
 : field there are about 1000 different variants of this field that look
 like
 : priority_sort_for_client_id, which is an integer field. I've heard
 that
 : sorting can have a big impact on memory consumption, could that be it?

 sorting on a field requires that an array of the corrisponding type be
 constructed for that field - the size of the array is the size of maxDoc
 (ie: the number of documents in your index, including deleted documents).

 If you are using TrieInts, and have an index with no deletions, sorting
 ~14.7Mil docs on 1000 diff int fields will take up about ~55GB.

 Thats a minimum just for the sorting of those int fields (SortablIntField
 which keeps a string version of the field value will be signifcantly
 bigger) and doesn't take into consideration any other data structures used
 for searching.

 I'm not a GC expert, but based on my limited understanding your graph
 actually seems fine to me .. particularly the part where it says
 you've configured a Max heap of ~122GB or ram, and it's
 never spend anytime doing ConcurrentMarkSweep.  My uneducated
 understanding of those two numbers is that you've told the JVM it can use
 an ungodly amount of RAM, so it is.  It's done some basic cleanup of
 young gen (ParNew) but because the heap size has never gone above 50GB,
 it hasn't found any reason to actualy start a CMS GC to look for dea
 objects in Old Gen that it can clean up.


 (Can someone who understands GC and JVM tunning better then me please
 sanity check me on that?)


 -Hoss

 --
 http://lucenerevolution.org/  ...  October 7-8, Boston
 http://bit.ly/stump-hoss  ...  Stump The Chump!




SolrJ

2010-09-30 Thread Christopher Gross
Where can I get SolrJ?  The wiki makes reference to it, and says that it is
a part of the Solr builds that you download, but I can't find it in the jars
that come with it.  Can anyone shed some light on this for me?

Thanks!

-- Chris


updating the solr index

2010-09-30 Thread Vicedomine, James (TS)
Sometimes when I update the Solr index (for example, post new DOCs with
the same id), old DOC ATTRIBUTE VALUES appear to be available to queries,
but are not visible when the DOC ATTRIBUTE VALUES are listed. In other
words, queries sometimes return results based upon old attribute values?

Thank you in advance.



James Vicedomine 
Software Development Analyst 4 
Northrop Grumman, Integrated Data and Software Solutions 
978-247-7842




Re: SolrJ

2010-09-30 Thread Allistair Crossley
it's in the dist folder with the name provided by the wiki page you refer to

On Sep 30, 2010, at 3:01 PM, Christopher Gross wrote:

 Where can I get SolrJ?  The wiki makes reference to it, and says that it is
 a part of the Solr builds that you download, but I can't find it in the jars
 that come with it.  Can anyone shed some light on this for me?
 
 Thanks!
 
 -- Chris



LocalSolr, Spatial Search, LatLonType clarification

2010-09-30 Thread webdev1977

I have been reading through all the jira issues and patches, as well as the
wikis and I still have a few things that are not clear to me. 

I am currently running with Solr 1.4.1 and using Nutch for my crawling. 
Everything is working great, I am using a Nutch plugin to add lat long
information, I just don't know if it is possible to do what I am wanting to
do.

1.  I noticed that it said that fields of type LatLonType cannot be
multivalued. Does that mean that I cannot have multiple lat/lon values for
one document?  If so, that would be quite a limitation.  I have an average
of 10 geotags per document.

2. Is LocalSolr and SpatialSearch the same thing?  

3. If I did want to use the LatLonType with the BBOX filter,  where would I
go to get a patch for 1.4.1? Is it even possible to patch 1.4.1 or do I have
to go to an entirely different dev version of Solr? 

Thanks for your input!!! 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/LocalSolr-Spatial-Search-LatLonType-clarification-tp1609043p1609043.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ

2010-09-30 Thread Christopher Gross
Now I feel dumb, it was right there.  Thanks! :)

-- Chris


On Thu, Sep 30, 2010 at 3:04 PM, Allistair Crossley a...@roxxor.co.ukwrote:

 it's in the dist folder with the name provided by the wiki page you refer
 to

 On Sep 30, 2010, at 3:01 PM, Christopher Gross wrote:

  Where can I get SolrJ?  The wiki makes reference to it, and says that it
 is
  a part of the Solr builds that you download, but I can't find it in the
 jars
  that come with it.  Can anyone shed some light on this for me?
 
  Thanks!
 
  -- Chris




RE: updating the solr index

2010-09-30 Thread Markus Jelsma
Updates will not show up if they weren't committed, either through a manual 
commit or auto commit. 
 
-Original message-
From: Vicedomine, James (TS) james.vicedom...@ngc.com
Sent: Thu 30-09-2010 21:04
To: solr-user@lucene.apache.org; 
Subject: updating the solr index

Sometimes with I update the solr index (for example post new DOCs with
the same id) old DOC ATTRIBUTE VALUES appear to be available to queries;
but not visible when the DOC ATTRIBUTE VALUES are listed?  In other
words, queries sometimes return results based upon old attribute values?

Thank you in advance.



James Vicedomine 
Software Development Analyst 4 
Northrop Grumman, Integrated Data and Software Solutions 
978-247-7842




Re: Memory usage

2010-09-30 Thread Lance Norskog
You can also sort on a field by using a function query instead of the
sort=field+desc parameter. This will not eat up memory; instead, it
will be slower. In short, it is a classic speed vs. space trade-off.

You'll have to benchmark and decide which you want, and maybe some
fields need the fast sort and some can get away with the slow one.

http://www.lucidimagination.com/search/?q=function+query

On Thu, Sep 30, 2010 at 11:47 AM, Jeff Moss jm...@heavyobjects.com wrote:
 I think you've probably nailed it Chris, thanks for that, I think I can get
 by with a different approach than this.

 Do you know if I will get the same memory consumption using the
 RandomFieldType vs the TrieInt?

 -Jeff

 On Thu, Sep 30, 2010 at 12:36 PM, Chris Hostetter
 hossman_luc...@fucit.orgwrote:


 : There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
 : not sure which is most likely to cause an impact. We're sorting on a
 dynamic
 : field there are about 1000 different variants of this field that look
 like
 : priority_sort_for_client_id, which is an integer field. I've heard
 that
 : sorting can have a big impact on memory consumption, could that be it?

 sorting on a field requires that an array of the corrisponding type be
 constructed for that field - the size of the array is the size of maxDoc
 (ie: the number of documents in your index, including deleted documents).

 If you are using TrieInts, and have an index with no deletions, sorting
 ~14.7Mil docs on 1000 diff int fields will take up about ~55GB.

 Thats a minimum just for the sorting of those int fields (SortablIntField
 which keeps a string version of the field value will be signifcantly
 bigger) and doesn't take into consideration any other data structures used
 for searching.

 I'm not a GC expert, but based on my limited understanding your graph
 actually seems fine to me .. particularly the part where it says
 you've configured a Max heap of ~122GB or ram, and it's
 never spend anytime doing ConcurrentMarkSweep.  My uneducated
 understanding of those two numbers is that you've told the JVM it can use
 an ungodly amount of RAM, so it is.  It's done some basic cleanup of
 young gen (ParNew) but because the heap size has never gone above 50GB,
 it hasn't found any reason to actualy start a CMS GC to look for dea
 objects in Old Gen that it can clean up.


 (Can someone who understands GC and JVM tunning better then me please
 sanity check me on that?)


 -Hoss

 --
 http://lucenerevolution.org/  ...  October 7-8, Boston
 http://bit.ly/stump-hoss      ...  Stump The Chump!






-- 
Lance Norskog
goks...@gmail.com


RE: Automatic xslt to responses ??

2010-09-30 Thread Markus Jelsma
You can add a default setting to your request handler. Read about defaults, 
appends and invariants in requesthandlers defined in your solrconfig.xml. 
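As a sketch of that (the handler name and stylesheet file are placeholders; Solr looks the stylesheet up under conf/xslt/):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- apply conf/xslt/example.xsl to every response by default -->
    <str name="wt">xslt</str>
    <str name="tr">example.xsl</str>
  </lst>
</requestHandler>
```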
 
-Original message-
From: Sharma, Raghvendra sraghven...@corelogic.com
Sent: Thu 30-09-2010 19:17
To: solr-user@lucene.apache.org; 
Subject: Automatic xslt to responses ??

Is there a way to specify an XSLT at the server side and make it default, i.e. 
whenever a response is returned, that XSLT is applied to the response 
automatically...
**
 
This message may contain confidential or proprietary information intended only 
for the use of the 
addressee(s) named above or may contain information that is legally privileged. 
If you are 
not the intended addressee, or the person responsible for delivering it to the 
intended addressee, 
you are hereby notified that reading, disseminating, distributing or copying 
this message is strictly 
prohibited. If you have received this message by mistake, please immediately 
notify us by  
replying to the message and delete the original message and any copies 
immediately thereafter. 

Thank you. 
**
 
CLLD


RE: can i have more update processors with solr

2010-09-30 Thread Markus Jelsma
Almost: you can define an updateRequestProcessorChain that houses multiple 
update processors.

 

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">title_signature</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">title</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">content_signature</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">content</str>
      <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
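For completeness, the chain is then referenced from the update handler, e.g. (a sketch; depending on the Solr release the parameter is update.processor or update.chain):

```xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>
```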
 
-Original message-
From: Dilshod Temirkhodjaev tdils...@gmail.com
Sent: Thu 30-09-2010 17:12
To: solr-user@lucene.apache.org; 
Subject: can i have more update processors with solr

I don't know if this is a bug or not, but when I'm writing this in
solrconfig.xml

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">CustomRank</str>
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

only the first update.processor works; why is the second not working?


Re: error sending a delete all request

2010-09-30 Thread Christopher Gross
I have also tried using SolrJ to hit my index, and I get this error:

2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.useragent = Jakarta Commons-HttpClient/3.0
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.protocol.version = HTTP/1.1
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.class = class
org.apache.commons.httpclient.SimpleHttpConnectionManager
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.protocol.cookie-policy = rfc2109
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.protocol.element-charset = US-ASCII
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.protocol.content-charset = ISO-8859-1
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.method.retry-handler =
org.apache.commons.httpclient.defaulthttpmethodretryhand...@1a082e2
2010-09-30 16:23:14,406 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.dateparser.patterns = [EEE, dd MMM  HH:mm:ss zzz, , dd-MMM-yy
HH:mm:ss zzz, EEE MMM d HH:mm:ss , EEE, dd-MMM- HH:mm:ss z, EEE,
dd-MMM- HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM- HH:mm:ss
z, EEE dd MMM  HH:mm:ss z, EEE dd-MMM- HH-mm-ss z, EEE dd-MMM-yy
HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z,
EEE,dd-MMM- HH:mm:ss z, EEE, dd-MM- HH:mm:ss z]
2010-09-30 16:23:14,421 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.max-per-host = {HostConfiguration[]=32}
2010-09-30 16:23:14,421 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.max-total = 128
2010-09-30 16:23:14,421 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.socket.timeout = 2
2010-09-30 16:23:14,421 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.timeout = 4
2010-09-30 16:23:14,453 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.max-per-host = {HostConfiguration[]=100}
2010-09-30 16:23:14,453 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.params.DefaultHttpParams - Set parameter
http.connection-manager.max-total = 100
2010-09-30 16:23:14,484 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager -
HttpConnectionManager.getConnection:  config = HostConfiguration[host=
http://localhost:8080], timeout = 4
2010-09-30 16:23:14,484 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager -
Allocating new connection, hostConfig=HostConfiguration[host=
http://localhost:8080]
2010-09-30 16:23:14,500 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpConnection - Open connection to
localhost:8080
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpMethodBase - Adding Host request header
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.methods.EntityEnclosingMethod - Request body
sent
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpMethodBase - Should close connection in
response to directive: close
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.HttpConnection - Releasing connection back to
connection manager.
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager - Freeing
connection, hostConfig=HostConfiguration[host=http://localhost:8080]
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.util.IdleConnectionHandler - Adding connection
at: 1285878194515
2010-09-30 16:23:14,515 [pool-2-thread-1] DEBUG
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager - Notifying
no-one, there are no waiting threads
2010-09-30 16:23:14,515 [pool-2-thread-1] WARN
gov.dni.search.intelsync.exporter.SyncExporter -
org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

request: http://localhost:8080/solr2/update?wt=xml&version=2.2
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:343)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at
org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at

DataImportHandler Error CHARBytesToJavaChars

2010-09-30 Thread harrysmith

Anyone ever see this error on an import? 

Caused by: java.lang.NullPointerException
at
oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:1015)

The Oracle column being converted is VARCHAR2(4000 Char) and there are NULLs
present in the record set.

Environment: Solr 1.4, Windows, Jetty 


Full stack trace below:

at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.NullPointerException
at oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:1015)
at oracle.jdbc.driver.DBConversion.CHARBytesToJavaChars(DBConversion.java:892)
at oracle.jdbc.driver.T4CVarcharAccessor.unmarshalOneRow(T4CVarcharAccessor.java:282)
at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:919)
at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:843)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:630)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:961)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1072)
at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:845)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1154)
at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1726)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1696)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
... 32 more
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-Error-CHARBytesToJavaChars-tp1611016p1611016.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: parsedquery is different from querystring

2010-09-30 Thread Markus Jelsma
We cannot really give an answer without knowing your fieldType and query. We 
can see that blackberry => blackberri is caused by a stemmer you have 
configured, perhaps a Porter or Snowball stemmer. Anyway, that's normal.
 
-Original message-
From: abhayd ajdabhol...@hotmail.com
Sent: Thu 30-09-2010 20:32
To: solr-user@lucene.apache.org; 
Subject: parsedquery is different from querystring


Hi,
I am searching for blackberry, and for some reason the parsed query shows up
as blackberri.

I checked the synonyms, but I don't see it anywhere.

<lst name="debug">
<str name="rawquerystring">text:blackberry</str>
<str name="querystring">text:blackberry</str>
<str name="parsedquery">text:blackberri</str>
<str name="parsedquery_toString">text:blackberri</str>

Not sure if it's related, but query results are also showing up when matched
with black.

Any help or directions on finding out why a document shows up in the results,
i.e. which word in the doc hit the search term? I am seeing docs in the
results which do not contain the search term at all.

thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/parsedquery-is-different-from-querystrin-tp1610081p1610081.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is Solr right for my business situation ?

2010-09-30 Thread Markus Jelsma
Recent versions support sharding and handle distribution of your query and 
result-set merging. The problem is that it won't help you join separate 
`tables`. The fields you query need to be present in each shard, or you'll end 
up with an HTTP 400 "undefined field" error.
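
For reference, a sharded request is just an ordinary query carrying a shards parameter that lists each core; the hostnames and core names below are placeholders, not from the original thread. A small Python sketch of building such a URL:

```python
from urllib.parse import urlencode

# Placeholder shard locations -- each entry is host:port/path to a core.
shards = ','.join([
    'solr1.example.com:8983/solr/core0',
    'solr2.example.com:8983/solr/core1',
])

# urlencode percent-escapes the commas and any special characters in q.
params = urlencode({'q': 'title:solr', 'shards': shards})
print('http://solr1.example.com:8983/solr/select?' + params)
```

Each shard must define every field the query touches, per the HTTP 400 caveat above.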

 

Indeed, there is no escape.
 
-Original message-
From: Sharma, Raghvendra sraghven...@corelogic.com
Sent: Thu 30-09-2010 20:07
To: solr-user@lucene.apache.org; 
Subject: RE: Is Solr right for my business situation ?

Thanks for the ideas.

I think after reading enough documentation and articles around solr and xml 
indexing in general, I have come around to understand that there is no escaping 
denormalization.

However, one tiny thought remains... perhaps my last shot at avoiding 
denormalization (of course it's going to be a costly affair)..

I was reading about how Solr can handle multiple cores and therefore multiple 
indexes. Can there be a single search interface sending queries to these three 
cores? In that case, who would do the load balancing? The merging of the 
results? And would I be running three instances of Solr on my system(s), or 
can one handle all of that?



-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Thursday, September 30, 2010 9:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

You need to be able to query the database with the 'Mother of all Queries', 
i.e. one that completely flattens all tables into each row. 

In other words, the JOIN section of the query will have EVERY table in it, and 
depending on your schema, some of them twice or more.

Trying to do that with CSV, separate tables would require you to put those into 
your OWN database, then query against that, as above.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
 otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/29/10, Sharma, Raghvendra sraghven...@corelogic.com wrote:

 From: Sharma, Raghvendra sraghven...@corelogic.com
 Subject: RE: Is Solr right for my business situation ?
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Wednesday, September 29, 2010, 9:40 AM
 Some questions.  
 
 1. I have about 3-5 tables. Now designing schema.xml for a
 single table looks ok, but whats the direction for handling
 multiple table structures is something I am not sure about.
 Would it be like a big huge xml, wherein those three tables
 (assuming its three) would show up as three different
 tag-trees, nullable. 
 
 My source provides me a single flat file per table (tab
 delimited).
 
 Do you think having multiple indexes could be a solution
 for this case ?? or do I really need to spend effort in
 denormalizing the data ?
 
 2. Further, loading into solr can use some perf tuning..
 any tips ? best practices ?
 
 3. Also, is there a way to specify a xslt at the server
 side, and make it default, i.e. whenever a response is
 returned, that xslt is applied to the response
 automatically...
 
 4. And last question for the day - :) there was one post
 saying that the spatial support is really basic in solr and
 is going to be improved in next versions... Can you ppl help
 me get a definitive yes or no on spatial support... in the
 current form, does it work on not ? I would store lat and
 long, and would need to make them searchable...
 
 --raghav..
 
 -Original Message-
 From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
 
 Sent: Tuesday, September 28, 2010 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Is Solr right for my business situation ?
 
 Thanks for the responses people.
 
 @Grant  
 
 1. can you show me some direction on that.. loading data
 from an incoming stream.. do I need some third party tools,
 or need to build something myself...
 
 4. I am basically attempting to build a very fast search
 interface for the existing data. The volume I mentioned is
 more like static one (data is already there). The sql
 statements I mentioned are daily updates coming. The good
 thing is that the history is not there, so the overall
 volume is not growing, but I need to apply the update
 statements. 
 
 One workaround I had in mind is, (though not so great
 performance) is to apply the updates to a copy of rdbms, and
 then feed the rdbms extract to solr.  Sounds like
 overkill, but I don't have another idea right now. Perhaps
 business discussions would yield something.
 
 @All -
 
 Some more questions guys.  
 
 1. I have about 3-5 tables. Now designing schema.xml for a
 single table looks ok, but whats the direction for handling
 multiple table structures is something I am not sure about.
 Would it be like a big huge xml, wherein those three tables
 (assuming its three) would show up as three different
 tag-trees, nullable. 
 
 My source provides me a single flat file per table (tab
 delimited).
 
 2. Further, loading into solr can 

Re: tomcat, solr and dismax syntax

2010-09-30 Thread Chris Hostetter

: it turns the plus(es) into spaces. Is this a tomcat setting or a solr 
: one to stop this happening? How can I get the plus into solr so it 
: actually means a required word.

It's part of the URL specification -- all of your query params (not just 
the query string) need to be properly URL escaped regardless of what 
QParser you use...

http://wiki.apache.org/solr/SolrQuerySyntax#NOTE:_URL_Escaping_Special_Characters
http://en.wikipedia.org/wiki/Percent-encoding
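
Hoss's point can be shown in a few lines. A sketch using Python's standard library (the host, port, and query below are only illustrative):

```python
from urllib.parse import quote_plus

# A raw dismax-style query where a leading '+' marks a required term.
q = '+ipod +belkin'

# Sent unescaped, the servlet container decodes '+' as a space.
# Percent-encoding first turns '+' into %2B and spaces into '+':
escaped = quote_plus(q)
print(escaped)  # %2Bipod+%2Bbelkin

url = 'http://localhost:8983/solr/select?q=' + escaped
print(url)
```

On the receiving side Solr decodes `%2B` back into a literal `+`, so the QParser sees the required-term operator rather than whitespace.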


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



PHP Solr API

2010-09-30 Thread Scott Yeadon

 Hi,

I have inherited an application which uses Solr search and the PHP Solr 
API (http://pecl.php.net/package/solr). While the list of search results 
with appropriate highlighting is all good, when selecting a result that 
navigates to an individual article the users want to have all the hits 
highlighted in the full text.


The problem is that the article text is HTML and Solr appears to strip 
the HTML by default. The highlight snippets contain no formatting and 
neither does the stored version of the text. This means that using a 
large snippet size and using the returned text as the article text is 
not satisfactory, nor is using the stored version returned in the 
response.


Obtaining offset information from the search and applying the 
highlighting myself within the webapp using the HTML version would be 
fine, but the offsets will be wrong due to the stripping of the tags. 
Does anyone have any advice on how I might get this to work, it doesn't 
seem to be a particularly unusual use case yet I could not find 
information on how to achieve it. It's likely I'm overlooking something 
simple. Anyone have any advice?


Thanks.

Scott.


Re: PHP Solr API

2010-09-30 Thread Scott Yeadon
 Thanks, but I still need to store text at any rate in order to get 
the highlighted snippets for the search results list. This isn't a 
problem. The issue is how to obtain correct offsets or other mechanisms 
for being able to display the original HTML text plus term highlighting 
when navigating to an individual search result.


Scott.

On 1/10/10 12:53 PM, Neil Lunn wrote:

On Fri, 2010-10-01 at 12:00 +1000, Scott Yeadon wrote:

Hi,

The problem is that the article text is HTML and Solr appears to strip
the HTML by default.

I think what you need to look at is how the fields are defined by
default in your schema. If data sent as HTML is being added to the
standard html-text type and stored, then the HTML is stripped and the words
indexed by default. If you want to store the raw HTML, then maybe you
should be doing that instead of storing the stripped version, and just
indexing it.
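
A sketch of what Neil describes, as it might look in a schema.xml: keep the raw HTML in a stored-only field and index a stripped copy. The field names, analyzer choices, and use of HTMLStripCharFilterFactory here are illustrative assumptions, not taken from the original poster's schema:

```xml
<!-- Analyzer that strips tags before tokenizing, so only words are indexed. -->
<fieldType name="html_text" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Raw HTML: stored for display, not indexed. -->
<field name="article_html" type="string" indexed="false" stored="true"/>
<!-- Stripped text: indexed for search, not stored. -->
<field name="article_text" type="html_text" indexed="true" stored="false"/>

<copyField source="article_html" dest="article_text"/>
```

One caveat: highlighting snippets are built from stored text, so for the results-list use case the indexed field may still need stored="true".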





Re: Faster loading to solr...

2010-09-30 Thread Gora Mohanty
On Thu, Sep 30, 2010 at 10:49 PM, Sharma, Raghvendra
sraghven...@corelogic.com wrote:
 I have been able to load around a million rows/docs in around 5+ minutes.  
 The schema contains around 250+ fields.  For the moment, I have kept 
 everything as string.
 I am sure there are ways to get better loading speeds than this.

A million documents with 250 fields in 5 minutes sounds fast to
me. As a comparison, we do a million documents with about 60 fields
in an hour, using multiple Solr cores. However, this is very likely an
apples to oranges comparison, as we are pulling large amounts of
data from a database over a network. What indexing times are you
aiming for?

If you can shard your data, using multiple cores on a single Solr
instance, and/or multiple Solr instances will speed up your indexing.
However, if you want a complete, non-sharded index, you will need
to merge the sharded ones.

 Will the data type matter in loading speeds ?? or anything else ?

Data type might matter if there is a lot of processing involved for
that data type. E.g., the text type has several analyzers and tokenizers.

 Can someone help me with any tips ? perhaps any best practices  kind of 
 document/article..
 Anything ..
[...]

The Solr Wiki has many suggestions, e.g., look at the documentation
on the DataImportHandler. In our experience, XML import has been
very fast. A generic document is difficult as the speed is dependent
on many things, such as the data source, number and type of fields,
size of data, etc. Your best bet is to try out several approaches.

Regards,
Gora


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Renee Sun

Hi Yonik,
thanks for your reply.

I entered a bug for this at :
https://issues.apache.org/jira/browse/SOLR-2138

to answer your questions here:
  - do you have any warming queries configured? 
 no, autowarmCount is set to 0 for all caches
  - do the cores have documents already, and if so, how many per core? 
 yes, 130 cores total; 2-3 of them already have 1 to 2.4 million
documents, the others have about 50,000 documents
  - are you using the same schema  solrconfig, or did you upgrade? 
 yes, absolutely no change
  - have you tried finding out what is taking up all the memory (or 
all the CPU time)? 
 yes, JConsole shows that after 70 cores are loaded in about 4 minutes, all
16 GB of memory are taken and the rest of the cores load extremely slowly. The
memory remains high and never drops.

We are in the process of upgrading to 1.4.1

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1611030.html
Sent from the Solr - User mailing list archive at Nabble.com.


Highlighting match term in bold rather than italic

2010-09-30 Thread efr...@gmail.com
Hi all -

Does anyone know how to produce solr results where the match term is
highlighted in bold rather than italic?

thanks in advance,

Brad


Re: Highlighting match term in bold rather than italic

2010-09-30 Thread Scott Gonyea
Your solrconfig has a highlighting section.  You can make that CDATA
thing whatever you want.  I changed it to <strong>.

On Thu, Sep 30, 2010 at 2:54 PM, efr...@gmail.com efr...@gmail.com wrote:
 Hi all -

 Does anyone know how to produce solr results where the match term is
 highlighted in bold rather than italic?

 thanks in advance,

 Brad



Re: Highlighting match term in bold rather than italic

2010-09-30 Thread Scott Yeadon

 Check out
http://wiki.apache.org/solr/HighlightingParameters
and the hl.simple.pre/hl.simple.post options

You may also be able to control the display of the default <em/> via CSS, 
but it will depend on your rendering context whether this is feasible.
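
To illustrate those parameters concretely, the highlight wrappers can also be set per request; the URL and field name below are assumptions for the sketch, not from the original post:

```python
from urllib.parse import urlencode

# Override the default <em>...</em> highlight wrappers with bold tags.
params = urlencode({
    'q': 'blackberry',
    'hl': 'true',
    'hl.fl': 'text',
    'hl.simple.pre': '<strong>',
    'hl.simple.post': '</strong>',
})
url = 'http://localhost:8983/solr/select?' + params
print(url)
```

Setting hl.simple.pre/post in the request overrides whatever default is configured in solrconfig.xml for that query only.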


Scott.

On 1/10/10 7:54 AM, efr...@gmail.com wrote:

Hi all -

Does anyone know how to produce solr results where the match term is
highlighted in bold rather than italic?

thanks in advance,

Brad