deadlock in solrj?

2010-09-29 Thread Michal Stefanczak
Hello!

 

I'm using SolrJ 1.4.0 with Java 1.6. On two occasions when indexing
~18000 documents we got the following problem:

 

(trace from jconsole)

 

Name: pool-1-thread-1

State: WAITING on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@11e464a

Total blocked: 25  Total waited: 1

Stack trace: 

sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:196)
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)

 

This is the code block that's used for indexing:

 

public UpdateResponse indexDocuments(Collection<SolrInputDocument> docs, int commitWithin) {

    UpdateResponse updated = null;

    if (docs.isEmpty()) {
        return null;
    }

    try {
        UpdateRequest req = new UpdateRequest();
        req.setCommitWithin(commitWithin);
        req.add(docs);
        updated = req.process(solr);
    } catch (SolrServerException e) {
        logger.error("Error while indexing documents [" + docs + "]", e);
    } catch (IOException e) {
        logger.error("IOException while indexing documents [" + docs + "]", e);
    }

    return updated;
}

 

 

The commitWithin used in the application is 1.

 

 

If I'm not wrong it's a deadlock. Is this a known issue? 

 

With regards

Michal Stefanczak



Re: Best way to check Solr index for completeness

2010-09-29 Thread Dennis Gearon
How soon do you need to know? Couldn't you just regenerate the index using some 
kind of 'nice' factor to not use too much processor/disk/etc?

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Tue, 9/28/10, dshvadskiy dshvads...@gmail.com wrote:

 From: dshvadskiy dshvads...@gmail.com
 Subject: Re: Best way to check Solr index for completeness
 To: solr-user@lucene.apache.org
 Date: Tuesday, September 28, 2010, 2:11 PM
 
 That will certainly work for most recent updates but I need
 to compare entire
 index.
 
 Dmitriy
 
 Luke Crouch wrote:
  
  Is there a 1:1 ratio of db records to solr documents?
 If so, couldn't you
  simply select the most recent updated record from the
 db and check to make
  sure the corresponding solr doc has the same
 timestamp?
  
  -L
  
  On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy
  dshvads...@gmail.comwrote:
  
  Hello,
  What would be the best way to check Solr index
 against original system
  (Database) to make sure index is up to date? I can
 use Solr fields like
  Id
  and timestamp to check against appropriate fields
 in database. Our index
  currently contains over 2 mln documents across
 several cores. Pulling all
  documents from Solr index via search (1000 docs at
 a time) is very slow.
  Is
  there a better way to do it?
 
  Thanks,
  Dmitriy
 
  
  
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1598733.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.
 


Re: Best way to check Solr index for completeness

2010-09-29 Thread Peter Karich
How long does it take to get 1000 docs?
Why not ensure this while indexing?

I think besides your suggestion or the suggestion of Luke there is no
other way...

Regards,
Peter.

 Hello,
 What would be the best way to check Solr index against original system
 (Database) to make sure index is up to date? I can use Solr fields like Id
 and timestamp to check against appropriate fields in database. Our index
 currently contains over 2 mln documents across several cores. Pulling all
 documents from Solr index via search (1000 docs at a time) is very slow. Is
 there a better way to do it?

 Thanks,
 Dmitriy
   

-- 
http://jetwick.com twitter search prototype



Re: deadlock in solrj?

2010-09-29 Thread Avi Rosenschein
This sounds like https://issues.apache.org/jira/browse/SOLR-1711. It is a
known issue in Solr 1.4.0, which is apparently fixed in Solr 1.4.1. We also
encountered it when indexing large numbers of documents with SolrJ, and are
therefore in the process of upgrading to 1.4.1.
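
If upgrading isn't an option right away, one stopgap is to fall back to the plain synchronous client, which doesn't go through the LinkedBlockingQueue that hangs here. A minimal sketch (the URL is an assumption for illustration):

import java.net.MalformedURLException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SolrClientFactory {
    // CommonsHttpSolrServer sends each request synchronously, so it avoids
    // the SOLR-1711 hang in StreamingUpdateSolrServer's internal queue.
    // The base URL below is an assumption for illustration.
    public static SolrServer create() throws MalformedURLException {
        return new CommonsHttpSolrServer("http://localhost:8983/solr");
    }
}

You lose the streaming throughput, but indexing keeps running until 1.4.1 is in place.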

-- Avi

On Wed, Sep 29, 2010 at 8:14 AM, Michal Stefanczak 
michal.stefanc...@nhst.no wrote:

 Hello!



 I'm using SolrJ 1.4.0 with Java 1.6. On two occasions when indexing
 ~18000 documents we got the following problem:



 (trace from jconsole)



 Name: pool-1-thread-1

 State: WAITING on
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@11e464a

 Total blocked: 25  Total waited: 1

 Stack trace:

 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
 org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:196)
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)



 This is the code block that's used for indexing:



 public UpdateResponse indexDocuments(Collection<SolrInputDocument> docs, int commitWithin) {

     UpdateResponse updated = null;

     if (docs.isEmpty()) {
         return null;
     }

     try {
         UpdateRequest req = new UpdateRequest();
         req.setCommitWithin(commitWithin);
         req.add(docs);
         updated = req.process(solr);
     } catch (SolrServerException e) {
         logger.error("Error while indexing documents [" + docs + "]", e);
     } catch (IOException e) {
         logger.error("IOException while indexing documents [" + docs + "]", e);
     }

     return updated;
 }





 The commitWithin used in the application is 1.





 If I'm not wrong it's a deadlock. Is this a known issue?



 With regards

 Michal Stefanczak




Missing facet values for zero counts

2010-09-29 Thread Allistair Crossley
Hello list,

I am implementing a directory using Solr. The user is able to search with a 
free-text query or 2 filters (provided as pick-lists) for country. A directory 
entry only has one country.

I am using Solr facets for country and I use the facet counts generated 
initially by a *:* search to generate my pick-list.

This is working fairly well but there are a couple of issues I am facing.

Specifically the countries pick-list does not contain ALL possible countries. 
It only contains those that have been indexed against a document. 

I have looked at facet.missing but I cannot see how this will work - if no 
documents have a country of Sweden, then how would Solr know to generate a 
missing total of zero for Sweden - it's never heard of it.

I feel I am missing something - is there a way by which you tell Solr all 
possible countries rather than relying on counts generated from the index? 

The countries in question reside in a database table belonging to our 
application.

Thanks, Allistair

Re: Missing facet values for zero counts

2010-09-29 Thread Chantal Ackermann
Hi Allistair,


On Wed, 2010-09-29 at 15:37 +0200, Allistair Crossley wrote:
 Hello list,
 
 I am implementing a directory using Solr. The user is able to search with a 
 free-text query or 2 filters (provided as pick-lists) for country. A 
 directory entry only has one country.
 
 I am using Solr facets for country and I use the facet counts generated 
 initially by a *:* search to generate my pick-list.
 
 This is working fairly well but there are a couple of issues I am facing.
 
 Specifically the countries pick-list does not contain ALL possible countries. 
 It only contains those that have been indexed against a document. 
 
 I have looked at facet.missing but I cannot see how this will work - if no 
 documents have a country of Sweden, then how would Solr know to generate a 
 missing total of zero for Sweden - it's never heard of it.
 
 I feel I am missing something - is there a way by which you tell Solr all 
 possible countries rather than relying on counts generated from the index? 
 

I don't think you are missing anything. Instead, you've described it
very well: how should SOLR know of something that never made it into the
index?

Why not just state in the interface that there are no hits for all the
missing countries (deducing those from the facets and the list retrieved
from the database)? You can list those countries separately (or even add
them to the facets after processing Solr's result).

If you do want to have them in the index, you'd have to add them by
adding empty documents. But you might get into trouble with required
fields etc. And you will change the statistics of the fields.


Chantal





Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy

Using TermsComponent is an interesting suggestion. However, my understanding is
that it will work only for unique terms, for example comparing a database primary
key with the Solr id field. A variation of that is to calculate some kind of unique
record hash and store it in the index. Then retrieve the id and hash via
TermsComponent and compare them with the hash calculated on the database record.
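
For illustration, a minimal sketch of the record-hash idea (pure JDK; the MD5 choice and the column concatenation are assumptions):

import java.security.MessageDigest;

public class RecordHash {
    // Hash the row's column values in a fixed order; store the result in a
    // Solr "hash" field at index time, then recompute it from the DB row
    // when auditing and compare the two strings.
    public static String recordHash(String... columns) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (String c : columns) {
            md5.update(c.getBytes("UTF-8"));
            md5.update((byte) 0); // separator so ("ab","c") != ("a","bc")
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}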
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1602597.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to set up multiple indexes?

2010-09-29 Thread Andy
I installed Solr according to the tutorial. My schema.xml & solrconfig.xml are in 
~/apache-solr-1.4.1/example/solr/conf

Everything so far is just like that in the tutorial. But I want to set up a 2nd 
index (separate from the main index) just for the purpose of auto-complete.

I understand that I need to set up multicore for this. But I'm not sure how to 
do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am still 
pretty confused.

- where do I put the 2nd index?
- do I need separate schema.xml & solrconfig.xml for the 2nd index? Where do I 
put them?
- how do I tell Solr which index I want a document to go to?
- how do I tell Solr which index I want to query against?
- any step-by-step instruction on setting up multicore?

Thanks.
Andy



  


Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy

Regenerating the index is a slow operation due to limitations of the source
systems. We run several complex SQL statements to generate 1 Solr document.
A full reindex takes about 24 hours.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1602610.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to set up multiple indexes?

2010-09-29 Thread Christopher Gross
Hi Andy!

I configured this a few days ago, and found a good resource --
http://wiki.apache.org/solr/MultipleIndexes

That page has links that will give you the instructions for setting up
Tomcat, Jetty and Resin.  I used the Tomcat ones the other day, and it gave
me everything that I needed to get it up and running.  You basically just
need to create a new directory to contain the second instance, then create a
context file for it in the TOMCAT_HOME/conf/Catalina/localhost directory.

Good luck!

-- Chris


On Wed, Sep 29, 2010 at 10:41 AM, Andy angelf...@yahoo.com wrote:

 I installed Solr according to the tutorial. My schema.xml & solrconfig.xml
 is in
 ~/apache-solr-1.4.1/example/solr/conf

 Everything so far is just like that in the tutorial. But I want to set up a
 2nd index (separate from the main index) just for the purpose of
 auto-complete.

 I understand that I need to set up multicore for this. But I'm not sure how
 to do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am
 still pretty confused.

 - where do I put the 2nd index?
 - do I need separate schema.xml & solrconfig.xml for the 2nd index? Where
 do I put them?
 - how do I tell solr which index do I want a document to go to?
 - how do I tell solr which index do I want to query against?
 - any step-by-step instruction on setting up multicore?

 Thanks.
 Andy







Re: How to set up multiple indexes?

2010-09-29 Thread Luke Crouch
Check
http://doc.ez.no/Extensions/eZ-Find/2.2/Advanced-Configuration/Using-multi-core-features

It's for eZ-Find, but it's the basic setup for multiple cores in any
environment.

We have cores designed like so:

solr/sfx/
solr/forum/
solr/mail/
solr/news/
solr/tracker/

each of those core directories has its own conf/ with schema.xml and
solrconfig.xml. then solr/solr.xml looks like:

  <cores adminPath="/admin/cores">
    <core name="sfx" instanceDir="sfx" />
    <core name="tracker" instanceDir="tracker" />

etc.

After that you add the core name into the url for all requests to the core:

http:///solr/sfx/select?...
http:///solr/sfx/update...
http:///solr/tracker/select?...
http:///solr/tracker/update...
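
From SolrJ, a client object per core keeps the routing explicit. A minimal sketch (URLs are assumptions):

import java.net.MalformedURLException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CoreClients {
    public static void main(String[] args) throws MalformedURLException {
        // Each core behaves like an independent Solr index; you pick the
        // index simply by the base URL the client is constructed with.
        SolrServer sfx = new CommonsHttpSolrServer("http://localhost:8983/solr/sfx");
        SolrServer tracker = new CommonsHttpSolrServer("http://localhost:8983/solr/tracker");
        // sfx.add(...) goes to the sfx core; tracker.query(...) searches tracker.
    }
}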

On Wed, Sep 29, 2010 at 9:41 AM, Andy angelf...@yahoo.com wrote:

 I installed Solr according to the tutorial. My schema.xml & solrconfig.xml
 is in
 ~/apache-solr-1.4.1/example/solr/conf

 Everything so far is just like that in the tutorial. But I want to set up a
 2nd index (separate from the main index) just for the purpose of
 auto-complete.

 I understand that I need to set up multicore for this. But I'm not sure how
 to do that. I read the doc (http://wiki.apache.org/solr/CoreAdmin) but am
 still pretty confused.

 - where do I put the 2nd index?
 - do I need separate schema.xml & solrconfig.xml for the 2nd index? Where
 do I put them?
 - how do I tell solr which index do I want a document to go to?
 - how do I tell solr which index do I want to query against?
 - any step-by-step instruction on setting up multicore?

 Thanks.
 Andy







Re: Queries, Functions, and Params

2010-09-29 Thread Yonik Seeley
On Tue, Sep 28, 2010 at 6:08 PM, Robert Thayer
robert.tha...@bankserv.com wrote:
 On the http://wiki.apache.org/solr/FunctionQuery page, the following query 
 function is listed:

 q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0

 When run against the default solr instance, server returns the error(400): 
 undefined field $v1.

 Any way to remedy this?

 Using version: 3.1-2010-09-28_05-53-44


The wiki page indicates this is a 4.0 feature - so you need a recent
4.0-dev build to try it out.

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


Swap on large memory multi-core multi-cpu NUMA

2010-09-29 Thread Glen Newton
In a recent blog entry (The MySQL “swap insanity” problem and the
effects of the NUMA architecture
http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/),
Jeremy Cole describes a particular but common problem with large memory
installations of MySql on multi-core multi-cpu 64bit NUMA machines,
where debilitating swapping of large amounts of memory occurs even
when there is no (direct) indication of a need to swap.

Without getting into the details (it involves how Linux assigns memory
to the different nodes (each multi-core CPU is viewed as a
'node' in the Linux NUMA view)), the offered partial solution is to
start MySql using the
numactl[1] program, like:
 numactl --interleave=all mysql

I was wondering if any of the SOLR people have used this when starting
up Apache
(or whatever servlet engine you use for your SOLR) to reduce unnecessary swap.

You probably want to be monitoring the NUMA memory hit statistics
found here, with and without the numactl, while testing this:
 /sys/devices/system/node/node*/numastat

--

Note that numactl has a number of other interesting and useful
features. One that I have used is the --cpubind  which restricts the
number of CPUs that an application can run on. There are times when
this can improve performance, such as when you have 2 demanding
applications running: by assigning one to half of the CPUs and the
other to the other half of
the CPUs, you _can_ have improved performance due to better locality, cache
hits, etc. It takes some tuning and experimentation. YMMV

-Glen
http://zzzoot.blogspot.com/

[1]http://linuxmanpages.com/man8/numactl.8.php



-- 

-


Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy

Actually retrieving 1000 docs via search isn't that bad. It turned out to take
under 1 sec.  I still like the idea of using TermsComponent and will use it
in the future if the number of docs in the index grows. Thanks for all the
suggestions.
Dmitriy
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1603108.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to check Solr index for completeness

2010-09-29 Thread Walter Underwood
Think about what fields you need to return. For this, you probably only need 
the id. That could be a lot faster than the default set of fields.
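
For illustration, a minimal SolrJ sketch of this id-only paging (the field name "id" and the page size are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class IdDump {
    // Walk the whole index a page at a time, returning only the key field,
    // so each page is cheap to fetch and compare against the database.
    public static void dumpIds(SolrServer solr) throws SolrServerException {
        final int rows = 1000;
        for (int start = 0; ; start += rows) {
            SolrQuery q = new SolrQuery("*:*");
            q.setFields("id");   // fetch nothing but the key
            q.setStart(start);
            q.setRows(rows);
            SolrDocumentList page = solr.query(q).getResults();
            for (SolrDocument d : page) {
                System.out.println(d.getFieldValue("id"));
            }
            if (start + rows >= page.getNumFound()) {
                break;
            }
        }
    }
}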

wunder

On Sep 29, 2010, at 9:04 AM, dshvadskiy wrote:

 
 Actually retrieving 1000 docs via search isn't that bad. Turned out it takes
 under 1 sec.  I still like the idea of using TermComponent and will use it
 in the future if number of docs in the index will grow. Thanks for all
 suggestions.
 Dmitriy
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1603108.html
 Sent from the Solr - User mailing list archive at Nabble.com.






RE: Is Solr right for my business situation ?

2010-09-29 Thread Sharma, Raghvendra
Some questions.  

1. I have about 3-5 tables. Now designing schema.xml for a single table looks 
ok, but what's the direction for handling multiple table structures is something 
I am not sure about. Would it be like a big huge xml, wherein those three 
tables (assuming it's three) would show up as three different tag-trees, 
nullable?

My source provides me a single flat file per table (tab delimited).

Do you think having multiple indexes could be a solution for this case ?? or do 
I really need to spend effort in denormalizing the data ?

2. Further, loading into Solr can use some perf tuning... any tips? Best 
practices?

3. Also, is there a way to specify an XSLT at the server side, and make it 
default, i.e. whenever a response is returned, that XSLT is applied to the 
response automatically...

4. And last question for the day - :) there was one post saying that the 
spatial support is really basic in Solr and is going to be improved in the next 
versions... Can you help me get a definitive yes or no on spatial 
support... in the current form, does it work or not? I would store lat and 
long, and would need to make them searchable...

--raghav..

-Original Message-
From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com] 
Sent: Tuesday, September 28, 2010 11:45 AM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

Thanks for the responses people.

@Grant  

1. can you show me some direction on that.. loading data from an incoming 
stream.. do I need some third party tools, or need to build something myself...

4. I am basically attempting to build a very fast search interface for the 
existing data. The volume I mentioned is more like static one (data is already 
there). The sql statements I mentioned are daily updates coming. The good thing 
is that the history is not there, so the overall volume is not growing, but I 
need to apply the update statements. 

One workaround I had in mind is, (though not so great performance) is to apply 
the updates to a copy of rdbms, and then feed the rdbms extract to solr.  
Sounds like overkill, but I don't have another idea right now. Perhaps business 
discussions would yield something.

@All -

Some more questions guys.  

1. I have about 3-5 tables. Now designing schema.xml for a single table looks 
ok, but whats the direction for handling multiple table structures is something 
I am not sure about. Would it be like a big huge xml, wherein those three 
tables (assuming its three) would show up as three different tag-trees, 
nullable. 

My source provides me a single flat file per table (tab delimited).

2. Further, loading into solr can use some perf tuning.. any tips ? best 
practices ?

3. Also, is there a way to specify a xslt at the server side, and make it 
default, i.e. whenever a response is returned, that xslt is applied to the 
response automatically...

4. And last question for the day - :) there was one post saying that the 
spatial support is really basic in solr and is going to be improved in next 
versions... Can you ppl help me get a definitive yes or no on spatial 
support... in the current form, does it work on not ? I would store lat and 
long, and would need to make them searchable...

Looks like I m close to my solution.. :)

--raghav

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Tuesday, September 28, 2010 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Is Solr right for my business situation ?

Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

 When do you need to deploy?
 
 As I understand it, the spatial search in Solr is being rewritten and is 
 slated for Solr 4.0, the release after next.

It will be in 3.x, the next release

 
 The existing spatial search has some serious problems and is deprecated.
 
 Right now, I think the only way to get spatial search in Solr is to deploy a 
 nightly snapshot from the active development on trunk. If you are deploying a 
 year from now, that might change.
 
 There is not any support for SQL-like statements or for joins. The best 
 practice for Solr is to think of your data as a single table, essentially 
 creating a view from your database. The rows become Solr documents, the 
 columns become Solr fields.

There is now group-by capabilities in trunk as well, which may or may not help.

 
 wunder
 
 On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
 
 I am sure these kind of questions keep coming to you guys, but I want to 
 raise the same question in a different context...my own business situation.
 I am very very new to solr and though I have tried to read through the 
 documentation, I have nowhere near completing the whole read.
 
 The need is like this - 
 
 We have a huge rdbms database/table. A single table perhaps houses 100+ 
 million rows. Though oracle is doing a fine job of handling the insertion 
 and updation of data, the querying is where our main concerns lie.  Since we 
 

Re: Best way to check Solr index for completeness

2010-09-29 Thread Erick Erickson
Yep, I was thinking of this on a uniqueKey field. I was assuming that
there was
a PK in the database that you were mapping to the uniqueKey field, but if
that's
not so then it's more of a problem.

But you'd have problems anyway if you *don't* have a uniqueKey when it comes
time
to update any records, so it might be worth going back around and putting
one in...

Erick

On Wed, Sep 29, 2010 at 10:40 AM, dshvadskiy dshvads...@gmail.com wrote:


 Using TermComponent is an interesting suggestion. However my understanding
 it
 will work only for unique terms. For example compare database primary key
 with Solr id field.  A variation of that is to calculate some kind of
 unique
 record hash and store it in the index.Then retrieve id and hash via
 TermComponent and compare them with hash calculated on database record.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1602597.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Is Solr right for my business situation ?

2010-09-29 Thread Erick Erickson
If at all possible, denormalize the data. Anytime you find yourself trying
to make Solr
behave like a database, the probability is high that you're mis-using Solr
or the DB.
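
For illustration, a minimal SolrJ sketch of denormalizing - one Solr document per logical row, with columns from the joined tables flattened into plain fields (all names here are assumptions):

import org.apache.solr.common.SolrInputDocument;

public class Denormalizer {
    // Flatten the result of the SQL join into a single flat document;
    // the source tables survive only as field names, not structure.
    public static SolrInputDocument toDoc(String id, String customerName, String itemDescription) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);                             // PK from the main table
        doc.addField("customer_name", customerName);        // from the customer table
        doc.addField("item_description", itemDescription);  // from the items table
        return doc;
    }
}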

Best
Erick

On Wed, Sep 29, 2010 at 12:40 PM, Sharma, Raghvendra 
sraghven...@corelogic.com wrote:

 Some questions.

 1. I have about 3-5 tables. Now designing schema.xml for a single table
 looks ok, but whats the direction for handling multiple table structures is
 something I am not sure about. Would it be like a big huge xml, wherein
 those three tables (assuming its three) would show up as three different
 tag-trees, nullable.

 My source provides me a single flat file per table (tab delimited).

 Do you think having multiple indexes could be a solution for this case ??
 or do I really need to spend effort in denormalizing the data ?

 2. Further, loading into solr can use some perf tuning.. any tips ? best
 practices ?

 3. Also, is there a way to specify a xslt at the server side, and make it
 default, i.e. whenever a response is returned, that xslt is applied to the
 response automatically...

 4. And last question for the day - :) there was one post saying that the
 spatial support is really basic in solr and is going to be improved in next
 versions... Can you ppl help me get a definitive yes or no on spatial
 support... in the current form, does it work on not ? I would store lat and
 long, and would need to make them searchable...

 --raghav..

 -Original Message-
 From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
 Sent: Tuesday, September 28, 2010 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Is Solr right for my business situation ?

 Thanks for the responses people.

 @Grant

 1. can you show me some direction on that.. loading data from an incoming
 stream.. do I need some third party tools, or need to build something
 myself...

 4. I am basically attempting to build a very fast search interface for the
 existing data. The volume I mentioned is more like static one (data is
 already there). The sql statements I mentioned are daily updates coming. The
 good thing is that the history is not there, so the overall volume is not
 growing, but I need to apply the update statements.

 One workaround I had in mind is, (though not so great performance) is to
 apply the updates to a copy of rdbms, and then feed the rdbms extract to
 solr.  Sounds like overkill, but I don't have another idea right now.
 Perhaps business discussions would yield something.

 @All -

 Some more questions guys.

 1. I have about 3-5 tables. Now designing schema.xml for a single table
 looks ok, but whats the direction for handling multiple table structures is
 something I am not sure about. Would it be like a big huge xml, wherein
 those three tables (assuming its three) would show up as three different
 tag-trees, nullable.

 My source provides me a single flat file per table (tab delimited).

 2. Further, loading into solr can use some perf tuning.. any tips ? best
 practices ?

 3. Also, is there a way to specify a xslt at the server side, and make it
 default, i.e. whenever a response is returned, that xslt is applied to the
 response automatically...

 4. And last question for the day - :) there was one post saying that the
 spatial support is really basic in solr and is going to be improved in next
 versions... Can you ppl help me get a definitive yes or no on spatial
 support... in the current form, does it work on not ? I would store lat and
 long, and would need to make them searchable...

 Looks like I m close to my solution.. :)

 --raghav

 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Tuesday, September 28, 2010 1:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Is Solr right for my business situation ?

 Inline.

 On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

  When do you need to deploy?
 
  As I understand it, the spatial search in Solr is being rewritten and is
 slated for Solr 4.0, the release after next.

 It will be in 3.x, the next release

 
  The existing spatial search has some serious problems and is deprecated.
 
  Right now, I think the only way to get spatial search in Solr is to
 deploy a nightly snapshot from the active development on trunk. If you are
 deploying a year from now, that might change.
 
  There is not any support for SQL-like statements or for joins. The best
 practice for Solr is to think of your data as a single table, essentially
 creating a view from your database. The rows become Solr documents, the
 columns become Solr fields.

 There is now group-by capabilities in trunk as well, which may or may not
 help.

 
  wunder
 
  On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
 
  I am sure these kind of questions keep coming to you guys, but I want to
 raise the same question in a different context...my own business situation.
  I am very very new to solr and though I have tried to read through the
 documentation, I 

Re: Missing facet values for zero counts

2010-09-29 Thread kenf_nc

I don't understand why you would want to show Sweden if it isn't in the
index, what will your UI do if the user selects Sweden?

However, one way to handle this would be to make a second document type.
Have a field called type or some such, and make the new document type be
'dummy' or 'system' or something like that. You can put documents in here
with fields for any pick-lists you want to facet on and include all possible
values from your database.

Do your facets on either just this doc, or all docs, either way should work.
However on your search queries always include   fq=-type:system
basically exclude all documents of type system from all your searches. 
Messy, but should do what you want.
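
For illustration, a minimal SolrJ sketch of faceting while excluding the dummy documents (the field and type names are assumptions):

import org.apache.solr.client.solrj.SolrQuery;

public class PickListQuery {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("-type:system"); // hide the dummy/system documents
        q.setFacet(true);
        q.addFacetField("country");       // counts still cover all real docs
        return q;
    }
}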
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Missing-facet-values-for-zero-counts-tp1602276p1603893.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Queries, Functions, and Params

2010-09-29 Thread Robert Thayer
Yes, just after sending the email I reread the wiki and noticed the 4.0 
requirement.  I will try that, thanks.



From: ysee...@gmail.com on behalf of Yonik Seeley
Sent: Wed 9/29/2010 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Queries, Functions, and Params



On Tue, Sep 28, 2010 at 6:08 PM, Robert Thayer
robert.tha...@bankserv.com wrote:
 On the http://wiki.apache.org/solr/FunctionQuery page, the following query 
 function is listed:

 q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0

 When run against the default solr instance, server returns the error(400): 
 undefined field $v1.

 Any way to remedy this?

 Using version: 3.1-2010-09-28_05-53-44


The wiki page indicates this is a 4.0 feature - so you need a recent
4.0-dev build to try it out.

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8




Re: Missing facet values for zero counts

2010-09-29 Thread Allistair Crossley
Hi,

For us this is a usability concern. Either you don't show Sweden in a pick-list 
called Country and some users go away thinking you don't *ever* support Sweden 
(not true), OR you allow a user to execute an empty-result search - but at 
least they know you do support Sweden.

It is, we believe, undesirable for a pick-list to change from day to day as the 
index changes - we have a category pick-list that acts the same. One day a 
user could see Productions, the next day nothing. Regular users would see this 
as odd.

We believe that usability dictates we show all possible values with a zero 
count after each, to stop the user executing empty searches while still 
showing them the possibilities. The best of both worlds, we hope.

I have solved this using the earlier suggestions of merging a database list query 
with the Solr facet counts. I like your idea though - good thinking - but the way 
I've done it is working great also :)
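
For illustration, a minimal SolrJ sketch of that merge (the facet field name "country" and the allCountries list are assumptions):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetMerge {
    // Start every known country at zero, then overlay whatever counts
    // Solr actually returned; countries Solr never saw stay at 0.
    public static Map<String, Long> mergeCounts(List<String> allCountries, QueryResponse rsp) {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        for (String country : allCountries) {
            counts.put(country, 0L);
        }
        FacetField ff = rsp.getFacetField("country");
        if (ff != null && ff.getValues() != null) {
            for (FacetField.Count c : ff.getValues()) {
                counts.put(c.getName(), c.getCount());
            }
        }
        return counts;
    }
}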

Thanks and best wishes, Allistair

On 29 Sep 2010, at 14:08, kenf_nc wrote:

 
 I don't understand why you would want to show Sweden if it isn't in the
 index, what will your UI do if the user selects Sweden?
 
 However, one way to handle this would be to make a second document type.
 Have a field called type or some such, and make the new document type be
 'dummy' or 'system' or something like that. You can put documents in here
 with fields for any pick-lists you want to facet on and include all possible
 values from your database.
 
 Do your facets on either just this doc, or all docs, either way should work.
 However on your search queries always include   fq=-type:system
 basically exclude all documents of type system from all your searches. 
 Messy, but should do what you want.
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Missing-facet-values-for-zero-counts-tp1602276p1603893.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Issues with SolrJ and IndexReader reopening (again)

2010-09-29 Thread Antoniya Statelova
I saw there had been a previous discussion on commit failing for
EmbeddedSolrServer here:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg28236.html

But it was never resolved. I have an embedded solr server and it does not
seem to pick up changes in the index after a commit through Solrj.

Looking at the logs, I can see a new searcher was opened -
20100929.141930/162 INFO  [pool-1-thread-1] [] core.SolrCore - [] Registered
new searcher searc...@8611b5c main
20100929.141930/162 INFO  [pool-1-thread-1] [] search.SolrIndexSearcher -
Closing searc...@5ff78541 main

I'm using a single searcher, autowarming sizes of 0 to make sure no invalid
entries get transferred over to the new searcher; I even set the httpCaching
max-age=0 (I know it's pointless, but I believe it technically is off then).

Am I missing a form of caching or a configuration that will make sure this
new searcher is pure or at least after time will be purified once invalid
results expire?

Thanks,
Tony


Solr rate limiting / DoS attacks

2010-09-29 Thread Ian Upright
Hi, I'm curious as to what approaches one would take to defend against users
attacking a Solr service, especially if exposed to the internet as opposed
to an intranet.  I'm fairly new to Solr, is there anything built in?

Is there anything in place to prevent the search engine from getting
overwhelmed by a particular user or group of users, submitting loads of
time-consuming queries as some form of a DoS attack?  

Additionally, is there a way of rate-limiting it so that only a certain
number of queries per user/per hour can be submitted, etc?  (for example, to
prevent programmatic access to the search engine as opposed to a human user)

Thanks, Ian


Re: Solr rate limiting / DoS attacks

2010-09-29 Thread Allistair Crossley
This kind of thing is not limited to Solr and you normally wouldn't solve it in 
software - it's more a network concern. I'd be looking at a web server solution 
such as Apache mod_evasive combined with a good firewall for more conventional 
DOS attacks. Just hide your Solr install behind the firewall and communicate 
with it locally from your web application or whatever.

Rate limiting sounds like something Solr should or could provide but I don't 
know the answer to that. 

Cheers

On Sep 29, 2010, at 2:52 PM, Ian Upright wrote:

 Hi, I'm curious as to what approaches one would take to defend against users
 attacking a Solr service, especially if exposed to the internet as opposed
 to an intranet.  I'm fairly new to Solr, is there anything built in?
 
 Is there anything in place to prevent the search engine from getting
 overwhelmed by a particular user or group of users, submitting loads of
 time-consuming queries as some form of a DoS attack?  
 
 Additionally, is there a way of rate-limiting it so that only a certain
 number of queries per user/per hour can be submitted, etc?  (for example, to
 prevent programmatic access to the search engine as opposed to a human user)
 
 Thanks, Ian



How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
Hi,
  I am using XPath to index different parts of the HTML pages into different 
fields.  Now, I have some pure text documents that have no HTML, so I can't use 
XPath.  How do I index this pure text into different fields of the index?  How 
do I make Nutch/Solr understand that these different parts belong to different 
fields?  Maybe I can use existing content in the fields in my index?
Thanks.


  

Re: Data Import Handler Rich Format Documents

2010-09-29 Thread Chris Hostetter

: What's a GA release?

http://en.wikipedia.org/wiki/Software_release_life_cycle#General_availability

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Dismax Request handler and Solrconfig.xml

2010-09-29 Thread Chris Hostetter
: In Solrconfig.xml, default request handler is set to standard. I am
: planning to change that to use dismax as the request handler but when I
: set default=true for dismax - Solr does not return any results - I get
: results only when I comment out <str name="defType">dismax</str>. 

you need to elaborate on what you mean by "does not return any results" 
... doesn't return results for what exactly?  What do your requests look 
like? (i.e. full URLs with all params) What do you expect to get back?  

what URLs are you using when you don't use defType=dismax? what do you get 
back then?

not setting defType means you are getting the standard LuceneQParser 
instead of the DismaxQParser, which means the qf param is being ignored and 
the defaultSearchField is being used instead.  are the terms you are 
searching for in your default search field but not in your title or 
pagedescription field?

Please note these guidelines
http://wiki.apache.org/solr/UsingMailingLists#Information_useful_for_searching_problems


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



terms / stemming?

2010-09-29 Thread Peter A. Kirk
Hi

I issue a request like the following, in order to get a list of search-terms in 
a particular field:

http://localhost:8983/solr/terms?terms.limit=-1&terms.fl=bodytext

But some of the terms which are returned are not quite the same as those which 
were indexed (or which are returned in a search). For example, my request above 
might return a term like "famili" when the indexed term was "familie".

Could this have something to do with stemming?

If so, how do I ensure that I get the same search-terms from my terms request, 
as those which were indexed?

Thanks,
Peter


Re: terms / stemming?

2010-09-29 Thread Luke Crouch
Make sure your index and query analyzers are identical, and pay special
attention if you're using any of the
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
analyzers - many of them have a number of configurable attributes that could
cause differences.

-L

On Wed, Sep 29, 2010 at 4:42 PM, Peter A. Kirk p...@alpha-solutions.dkwrote:

 Hi

 I issue a request like the following, in order to get a list of
 search-terms in a particular field:

 http://localhost:8983/solr/terms?terms.limit=-1&terms.fl=bodytext

 But some of the terms which are returned are not quite the same as those
 which were indexed (or which are returned in a search). For example, my
 request above might return a term like famili when the indexed term was
 familie.

 Could this have something to do with stemming?

 If so, how do I ensure that I get the same search-terms from my terms
 request, as those which were indexed?

 Thanks,
 Peter



Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Erick Erickson
Can you provide a few more details? You mention xpath, which leads me
to believe that you are using DIH, is that true? How are you getting
your documents to index? Parts of a filesystem?

Because it's possible to do many things. If you're using DIH against a
filesystem,
you could use two fileDataSources, one that works only on files with
a particular extension (xml, say) and another that processes .txt files.

But that said, if you're trying to index just the text of a Word document,
you
have to parse it quite differently than a plain text file, take a look at
Tika.

All of which may not help you at all, because I'm guessing...

So I think a more complete problem statement would help us help you.

Best
Erick

On Wed, Sep 29, 2010 at 3:56 PM, Savannah Beckett 
savannah_becket...@yahoo.com wrote:

 Hi,
   I am using xpath to index different parts of the html pages into
 different
 fields.  Now, I have some pure text documents that has no html.  So I can't
 use
 xpath.  How do I index these pure text into different fields of the index?
 How
 do I make nutch/solr understand these different parts belong to different
 fields?  Maybe I can use existing content in the fields in my index?
 Thanks.





Re: terms / stemming?

2010-09-29 Thread Erick Erickson
Yes, this is almost certainly stemming. Take a look at solr/admin, [schema
browser], then click on Home > fields > your field here. Then the index and
query details link shows you exactly what's happening.

You can also get some joy from the admin [analysis] page. That takes input
and shows you exactly what transformations occur given your schema. Both of
these are well worth taking an hour to understand; it'll save you hours and
hours of head-scratching.

You could use copyField to copy your bodytext to a field that doesn't stem,
then
query the copy for the terms.

HTH
Erick

On Wed, Sep 29, 2010 at 5:42 PM, Peter A. Kirk p...@alpha-solutions.dkwrote:

 Hi

 I issue a request like the following, in order to get a list of
 search-terms in a particular field:

 http://localhost:8983/solr/terms?terms.limit=-1&terms.fl=bodytext

 But some of the terms which are returned are not quite the same as those
 which were indexed (or which are returned in a search). For example, my
 request above might return a term like famili when the indexed term was
 familie.

 Could this have something to do with stemming?

 If so, how do I ensure that I get the same search-terms from my terms
 request, as those which were indexed?

 Thanks,
 Peter



Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
No, I am using XPath for HTML; that is not the question.  I am indexing pure 
text in addition to the HTML that I was indexing.  Pure text like a TXT file or 
a Microsoft Word doc.  So, with no XPath for TXT, how do I index a TXT file into 
different fields in my index, the way I use XPath to index HTML into 
different fields in my index?

My question is referring to pure text like .txt files and Microsoft Word, not 
HTML.  I am completely fine with HTML.
Thanks.





From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, September 29, 2010 2:59:26 PM
Subject: Re: How to Index Pure Text into Seperate Fields?

Can you provide a few more details? You mention xpath, which leads me
to believe that you are using DIH, is that true? How are you getting
your documents to index? Parts of a filesystem?

Because it's possible to do many things. If you're using DIH against a
filesystem,
you could use two fileDataSources, one that works only on files with
a particular extension (xml, say) and another that processes .txt files.

But that said, if you're trying to index just the text of a Word document,
you
have to parse it quite differently than a plain text file, take a look at
Tika.

Al of which may not help you at all, because I'm guessing...

So I think a more complete problem statement would help us help you.

Best
Erick

On Wed, Sep 29, 2010 at 3:56 PM, Savannah Beckett 
savannah_becket...@yahoo.com wrote:

 Hi,
  I am using xpath to index different parts of the html pages into
 different
 fields.  Now, I have some pure text documents that has no html.  So I can't
 use
 xpath.  How do I index these pure text into different fields of the index?
 How
 do I make nutch/solr understand these different parts belong to different
 fields?  Maybe I can use existing content in the fields in my index?
 Thanks.






  

Memory usage

2010-09-29 Thread Jeff Moss
My server has 128GB of ram, the index is 22GB large. It seems the memory
consumption goes up on every query and the garbage collector will never free
up as much memory as I expect it to. The memory consumption looks like a
curve, it eventually levels off but the old gen is always 60 or 70GB. I have
tried adjusting the cache settings but it doesn't seem to make any
difference.

Is there something I'm doing wrong or is this expected behavior?

Here is a screenshot of what I see in jconsole after running for a few
minutes:
http://i51.tinypic.com/2qntca1.png

Here is a 24 hour period of the same data taken from a custom jmx monitor:
http://i51.tinypic.com/2vcu9u8.png

The server performs pretty much as well at the beginning of this cycle as it
does at the end, so all of this memory accumulation seems not to be doing
anything useful.

I am running the 1.4 war but I was having this problem with 1.3 also. Tomcat
6.0.18, Java 1.6.0. I haven't gone as far as doing any memory profiling or
java debugging because I'm inexperienced, but that will be the next thing I
try. Any help would be appreciated.

Thanks,

-Jeff


DataImportHandler dynamic fields clarification

2010-09-29 Thread harrysmith

Looking for some clarification on DIH to make sure I am interpreting this
correctly.

I have a wide DB table, 100 columns. I'd rather not have to add 100 values
in schema.xml and data-config.xml. I was under the impression that if the
column name matched a dynamicField name, it would be added. I am not
finding this to be the case; it only works when the column name is explicitly
listed as a static field.

Example: 100 column table, columns named 'COLUMN_1, COLUMN_2 ... COLUMN_100'

If I add something like:
<field name="column_60" type="string" indexed="true" stored="true"/>
to schema.xml, and don't reference the column in data-config entity/field
tag, it gets imported, as expected.

However, if I use:
<dynamicField name="column_*" type="string" indexed="true" stored="true"/>
It does not get imported into Solr, I would expect it would.


Is this the expected behavior?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1606159.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr with example Jetty and score problem

2010-09-29 Thread Floyd Wu
Does anybody can help on this ?
Many thanks

2010/9/29 Floyd Wu floyd...@gmail.com

 Hi there

 I have a problem. The situation is that when I issue a query to a single
 instance, Solr responds with XML like the following;
 as you can see, the score is normal (<float name="score">...</float>)
 ===
  <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">23</int>
     <lst name="params">
       <str name="fl">_l_title,score</str>
       <str name="start">0</str>
       <str name="q">_l_unique_key:12</str>
       <str name="hl.fl">*</str>
       <str name="hl">true</str>
       <str name="rows">999</str>
     </lst>
   </lst>
   <result name="response" numFound="1" start="0" maxScore="1.9808292">
     <doc>
       <float name="score">1.9808292</float>
       <str name="_l_title">GTest</str>
     </doc>
   </result>
   <lst name="highlighting">
     <lst name="12">
       <arr name="_l_unique_key">
         <str><em>12</em></str>
       </arr>
     </lst>
   </lst>
  </response>

 ===

 But when I issue the query with shards (two instances), the response XML will
 be like the following;
 as you can see, the score has been transferred into an arr element of the doc
 ===
  <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">64</int>
     <lst name="params">
       <str name="shards">localhost:8983/solr/core0,172.16.6.35:8983/solr</str>
       <str name="fl">_l_title,score</str>
       <str name="start">0</str>
       <str name="q">_l_unique_key:12</str>
       <str name="hl.fl">*</str>
       <str name="hl">true</str>
       <str name="rows">999</str>
     </lst>
   </lst>
   <result name="response" numFound="1" start="0" maxScore="1.9808292">
     <doc>
       <str name="_l_title">Gtest</str>
       <arr name="score">
         <float name="score">1.9808292</float>
       </arr>
     </doc>
   </result>
   <lst name="highlighting">
     <lst name="12">
       <arr name="_l_unique_key">
         <str><em>12</em></str>
       </arr>
     </lst>
   </lst>
  </response>

 ===
 My schema.xml is like the following:
  <fields>
    <field name="_l_unique_key" type="string" indexed="true" stored="true"
     required="true" omitNorms="true"/>
    <field name="_l_read_permission" type="string" indexed="true"
     stored="true" omitNorms="true" multiValued="true"/>
    <field name="_l_title" type="text" indexed="true" stored="true"
     omitNorms="false" termVectors="true" termPositions="true"
     termOffsets="true"/>
    <field name="_l_summary" type="text" indexed="true" stored="true"
     omitNorms="false" termVectors="true" termPositions="true"
     termOffsets="true"/>
    <field name="_l_body" type="text" indexed="true" stored="true"
     multiValued="true" termVectors="true" termPositions="true"
     termOffsets="true" omitNorms="false"/>

    <dynamicField name="*" type="text" indexed="true" stored="true"
     multiValued="true" termVectors="true" termPositions="true"
     termOffsets="true" omitNorms="false"/>
  </fields>
  <uniqueKey>_l_unique_key</uniqueKey>
  <defaultSearchField>_l_body</defaultSearchField>
 I don't really know what happened. Is it a problem with my schema, or is this
 the behavior of Solr?
 Please help on this.



Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Lance Norskog
Simple text .txt files and MS Office .doc files are very, very different beasts.
You can do simple .txt files with some more lines in your
DataImportHandler script.
With DOC files it is easiest to use the extracting request handler,
/update/extract. This is on the wiki.
If you want to do this inside the DataImportHandler, you need to use
3.x or the trunk. And it has bugs.
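
For illustration, a minimal SolrJ sketch of pushing a .doc file through the extracting handler (the literal.id value is an assumption; see the ExtractingRequestHandler wiki page for the full parameter list):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractIndexer {
    // Streams the binary file to Solr, where Tika parses it; metadata and
    // body text are mapped onto index fields by the handler's fmap.* params.
    public static void indexDoc(SolrServer solr, File f) throws Exception {
        ContentStreamUpdateRequest req =
            new ContentStreamUpdateRequest("/update/extract");
        req.addFile(f);
        req.setParam("literal.id", f.getName()); // supply the unique key ourselves
        req.process(solr);
    }
}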

On Wed, Sep 29, 2010 at 3:55 PM, Savannah Beckett
savannah_becket...@yahoo.com wrote:
 No, I am using xpath for html, this is not the question.  I am indexing pure
 text in addition to html that I was indexing.  Pure text like TXT file or
 Microsoft Word doc.  So, no xpath for TXT, how do I index TXT file into
 different fields in my index like the way I use xpath to index html into
 differernt fields in my index?

 My question is referring to pure TXT like .txt file and microsoft word, not
 html.  I am completely fine with html.
 Thanks.




 
 From: Erick Erickson erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wed, September 29, 2010 2:59:26 PM
 Subject: Re: How to Index Pure Text into Seperate Fields?

 Can you provide a few more details? You mention xpath, which leads me
 to believe that you are using DIH, is that true? How are you getting
 your documents to index? Parts of a filesystem?

 Because it's possible to do many things. If you're using DIH against a
 filesystem,
 you could use two fileDataSources, one that works only on files with
 a particular extension (xml, say) and another that processes .txt files.

 But that said, if you're trying to index just the text of a Word document,
 you
 have to parse it quite differently than a plain text file, take a look at
 Tika.

 Al of which may not help you at all, because I'm guessing...

 So I think a more complete problem statement would help us help you.

 Best
 Erick

 On Wed, Sep 29, 2010 at 3:56 PM, Savannah Beckett 
 savannah_becket...@yahoo.com wrote:

 Hi,
  I am using xpath to index different parts of the html pages into
 different
 fields.  Now, I have some pure text documents that has no html.  So I can't
 use
 xpath.  How do I index these pure text into different fields of the index?
 How
 do I make nutch/solr understand these different parts belong to different
 fields?  Maybe I can use existing content in the fields in my index?
 Thanks.










-- 
Lance Norskog
goks...@gmail.com


Re: Swap on large memory multi-core multi-cpu NUMA

2010-09-29 Thread Lance Norskog
This would be a Java VM option, not something Solr or other apps can know about.
Using this or procset seems like a great way to handle it.

On Wed, Sep 29, 2010 at 8:46 AM, Glen Newton glen.new...@gmail.com wrote:
 In a recent blog entry (The MySQL “swap insanity” problem and the
 effects of the NUMA architecture
 http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/),
 Jeremy Cole describes a particular but common problem with large memory
 installations of MySql on multi-core multi-cpu 64bit NUMA machines,
 where debilitating swapping of large amounts of memory occurs even
 when there is no (direct) indication of a need to swap.

 Without getting into the details (it involves how Linux assigns memory
 to the different nodes (each multi-core CPU is viewed as a
 'node' in the Linux NUMA view)), the offered partial solution is to
 start MySql using the
 numactl[1] program, like:
  numactl --interleave all mysql

 I was wondering if any of the SOLR people have used this when starting
 up Apache
 (or whatever servlet engine you use for your SOLR) to reduce unnecessary swap.

 You probably want to be monitoring the NUMA memory hit statistics
 found here, with and without the numactl, while testing this:
  /sys/devices/system/node/node*/numastat

 --

 Note that numactl has a number of other interesting and useful
 features. One that I have used is the --cpubind  which restricts the
 number of CPUs that an application can run on. There are times when
 this can improve performance, such as when you have 2 demanding
 applications running: by assigning one to half of the CPUs and the
 other to the other half of
 the CPUs, you _can_ have improved performance due to better locality, cache
 hits, etc. It takes some tuning and experimentation. YMWV

 -Glen
 http://zzzoot.blogspot.com/

 [1]http://linuxmanpages.com/man8/numactl.8.php



 --

 -




-- 
Lance Norskog
goks...@gmail.com


Re: Memory usage

2010-09-29 Thread Lance Norskog
How many documents are there? How many unique words are in a text
field? Both of these numbers can have a non-linear effect on the
amount of space used.

But, usually a 22Gb index (on disk) might need 6-12G of ram total.
There is something odd going on here.

Lance

On Wed, Sep 29, 2010 at 4:34 PM, Jeff Moss jm...@heavyobjects.com wrote:
 My server has 128GB of ram, the index is 22GB large. It seems the memory
 consumption goes up on every query and the garbage collector will never free
 up as much memory as I expect it to. The memory consumption looks like a
 curve, it eventually levels off but the old gen is always 60 or 70GB. I have
 tried adjusting the cache settings but it doesn't seem to make any
 difference.

 Is there something I'm doing wrong or is this expected behavior?

 Here is a screenshot of what I see in jconsole after running for a few
 minutes:
 http://i51.tinypic.com/2qntca1.png

 Here is a 24 hour period of the same data taken from a custom jmx monitor:
 http://i51.tinypic.com/2vcu9u8.png

 The server performs pretty much as good at the beginning of this cycle as it
 does at the end so all of this memory accumulation seems to not be doing
 anything useful.

 I am running the 1.4 war but I was having this problem with 1.3 also. Tomcat
 6.0.18, Java 1.6.0. I haven't gone as far as doing any memory profiling or
 java debugging because I'm inexperienced, but that will be the next thing I
 try. Any help would be appreciated.

 Thanks,

 -Jeff




-- 
Lance Norskog
goks...@gmail.com


Re: Is Solr right for my business situation ?

2010-09-29 Thread Lance Norskog
Some of these are big questions- try them in different emails.

On Wed, Sep 29, 2010 at 9:40 AM, Sharma, Raghvendra
sraghven...@corelogic.com wrote:
 Some questions.

 1. I have about 3-5 tables. Now designing schema.xml for a single table looks 
 ok, but whats the direction for handling multiple table structures is 
 something I am not sure about. Would it be like a big huge xml, wherein those 
 three tables (assuming its three) would show up as three different tag-trees, 
 nullable.

 My source provides me a single flat file per table (tab delimited).

 Do you think having multiple indexes could be a solution for this case ?? or 
 do I really need to spend effort in denormalizing the data ?

 2. Further, loading into solr can use some perf tuning.. any tips ? best 
 practices ?

 3. Also, is there a way to specify a xslt at the server side, and make it 
 default, i.e. whenever a response is returned, that xslt is applied to the 
 response automatically...

 4. And last question for the day - :) there was one post saying that the 
 spatial support is really basic in solr and is going to be improved in next 
 versions... Can you ppl help me get a definitive yes or no on spatial 
 support... in the current form, does it work on not ? I would store lat and 
 long, and would need to make them searchable...

 --raghav..

 -Original Message-
 From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
 Sent: Tuesday, September 28, 2010 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Is Solr right for my business situation ?

 Thanks for the responses people.

 @Grant

 1. can you show me some direction on that.. loading data from an incoming 
 stream.. do I need some third party tools, or need to build something 
 myself...

 4. I am basically attempting to build a very fast search interface for the
 existing data. The volume I mentioned is mostly static (the data is already
 there). The SQL statements I mentioned are the daily updates coming in. The
 good thing is that there is no history, so the overall volume is not growing,
 but I do need to apply the update statements.

 One workaround I had in mind (though not so great for performance) is to
 apply the updates to a copy of the RDBMS and then feed an RDBMS extract to
 Solr. Sounds like overkill, but I don't have another idea right now. Perhaps
 business discussions would yield something.

 Looks like I'm close to my solution.. :)

 --raghav

 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Tuesday, September 28, 2010 1:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Is Solr right for my business situation ?

 Inline.

 On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

 When do you need to deploy?

 As I understand it, the spatial search in Solr is being rewritten and is 
 slated for Solr 4.0, the release after next.

 It will be in 3.x, the next release.


 The existing spatial search has some serious problems and is deprecated.

 Right now, I think the only way to get spatial search in Solr is to deploy a 
 nightly snapshot from the active development on trunk. If you are deploying 
 a year from now, that might change.

 There is no support for SQL-like statements or for joins. The best
 practice for Solr is to think of your data as a single table, essentially
 creating a view from your database. The rows become Solr documents, the
 columns become Solr fields (see the sketch at the end of this message).

There are now group-by capabilities in trunk as well, which may or may not
help.


 wunder

 On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:

 I am sure these kinds of questions keep coming to you, but I want to
 raise the same question in the context of my own business situation.
 I am very, very new to Solr, and though I have tried to read through the
 documentation, I am nowhere near finishing the whole read.

 The need is like this -

 We have a huge rdbms 
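
To make the single-table advice above concrete, a minimal SolrJ sketch of one
denormalized document: one joined row from the database view becomes one Solr
document, each column a field. All field names here are hypothetical and would
have to be declared in schema.xml:

    import org.apache.solr.common.SolrInputDocument;

    public class DenormalizedRow {
        public static SolrInputDocument fromRow(String id, String name,
                                                String city, double lat, double lng) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);      // unique key
            doc.addField("name", name);  // column from the first table
            doc.addField("city", city);  // column from a joined table, repeated per row
            doc.addField("lat", lat);    // lat/long as plain, searchable fields
            doc.addField("lng", lng);
            return doc;
        }
    }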

Re: Why the query performance is so different for queries?

2010-09-29 Thread Lance Norskog
How much ram does the JVM have?

Wildcard queries are slow, and queries starting with '*' are even slower. If
you want all documents that have a value in a field, try field:[* TO *]. This
is a range query and lets you pick a range of values - this one picks
everything.

The *:* is not a wildcard. It is a magic syntax for all documents
and does not cause a search. (See the sketch after this message.)

2010/9/28 newsam new...@zju.edu.cn:
 Hi guys,

 I have posted a thread "The search response time is too long".


 The SOLR searcher instance is deployed with Tomcat 5.5.21.
 .
 The index file is 8.2GB and the doc count is 6110745. The DELL server has an
 Intel(R) Xeon(TM) CPU (4 cores, 3.00GHz) and 6GB of RAM.

 In the SOLR back-end, query=key:* costs almost 60s while query=*:* only needs
 500ms. Another case is query=product_name_title:*, which costs 7s. I am
 confused about the query performance. Do you have any suggestions?

 btw, the cache settings (size, initialSize, autowarmCount) are as follows:

 filterCache: 256, 256, 0
 queryResultCache: 1024, 512, 128
 documentCache: 16384, 4096, n/a

 Thanks.






-- 
Lance Norskog
goks...@gmail.com
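
For illustration, a minimal SolrJ 1.4 sketch contrasting the three query forms
discussed above; the server URL and the field name "key" are placeholders for
your own setup:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QueryForms {
        public static void main(String[] args) throws Exception {
            // Placeholder URL - point this at your own Solr instance.
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Match-all: magic syntax, no term enumeration, fast.
            QueryResponse all = solr.query(new SolrQuery("*:*"));

            // Range query: matches every document with any value in the
            // field, without expanding to each term the way key:* does.
            QueryResponse anyKey = solr.query(new SolrQuery("key:[* TO *]"));

            // Trailing wildcard: rewritten against every term in the field,
            // which is why it crawls on a large index.
            QueryResponse wildcard = solr.query(new SolrQuery("key:*"));

            System.out.println("*:* -> " + all.getResults().getNumFound());
            System.out.println("key:[* TO *] -> " + anyKey.getResults().getNumFound());
            System.out.println("key:* -> " + wildcard.getResults().getNumFound());
        }
    }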


Re: Why the query performance is so different for queries?

2010-09-29 Thread newsam
Thanks for your reply.

Our box is Windows Server 2003 (32-bit) with 6GB of RAM in total. A large heap
(2G) may not be workable for the JVM on a 32-bit box, so we set JAVA_OPTIONS to
-Xms521m -Xmx1400m. Is my understanding right?

Thanks.

 

Re: Why the query performance is so different for queries?

2010-09-29 Thread Walter Underwood
Stop running 32-bit operating systems. You'll never get good performance with a 
toy like that. --wunder

On Sep 29, 2010, at 8:18 PM, newsam wrote:

 Thanks for your reply.
 
 Our box is Windows Server 2003 (32-bit) with 6GB of RAM in total. A large heap
 (2G) may not be workable for the JVM on a 32-bit box, so we set JAVA_OPTIONS to
 -Xms521m -Xmx1400m. Is my understanding right?
 
 Thanks.
 

--
Walter Underwood
Venture ASM, Troop 14, Palo Alto





Where is the lock file?

2010-09-29 Thread Steve Cohen
Hello,

We were testing Nutch configurations, and apparently we got heavy-handed with
our approach to stopping things.

Now when Nutch starts indexing into Solr, we are seeing these messages:

org.apache.solr.common.SolrException: Lock obtain timed out: SingleInstanceLock: write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1140)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)

I've looked through the configuration file. I can see where it defines the
lock type and I can see the unlock configuration. But I don't see where it
specifies the lock file. Where is it? What is its name?

Also, to speed up Nutch, we changed the configuration to start several map
tasks at once. Is Nutch trying to kick off several Solr sessions at once, and
is that causing messages like the above? Should we just change the lock to
simple?

Thanks,
Steve Cohen