Re: DataImporter : Java heap space

2009-04-16 Thread Shalin Shekhar Mangar
On Thu, Apr 16, 2009 at 10:31 AM, Mani Kumar manikumarchau...@gmail.com wrote:

 Aah, Bryan you got it ... Thanks!
 Noble: so I can hope that it'll be fixed soon :) Thank you for fixing it
 ...
 please let me know when it's done.


This is fixed in trunk. The next nightly build should have this fix.

-- 
Regards,
Shalin Shekhar Mangar.


truncating indexed docs

2009-04-16 Thread CIF Search
Is it possible to truncate large documents once they are indexed? (Can this
be done without re-indexing?)

Regards,
CI


Re: Using CSV for indexing ... Remote Streaming disabled

2009-04-16 Thread vivek sar
Any help on this? Could this error be because of something else (not
remote streaming issue)?

Thanks.

On Wed, Apr 15, 2009 at 10:04 AM, vivek sar vivex...@gmail.com wrote:
 Hi,

  I'm trying to use CSV indexing (Solr 1.4, 03/29 nightly), following the wiki
 (http://wiki.apache.org/solr/UpdateCSV). I've updated
 solrconfig.xml to have these lines:

    <requestDispatcher handleSelect="true">
        <requestParsers enableRemoteStreaming="true"
                        multipartUploadLimitInKB="20480" />
        ...
    </requestDispatcher>

    <requestHandler name="/update/csv" class="solr.CSVRequestHandler"
                    startup="lazy" />

 When I try to upload the csv,

  curl 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c&stream.file=/Users/opal/temp/afterchat/data/csv/1239759267339.csv'

 I get the following response:

    ...</head><body><h1>HTTP Status 400 - Remote Streaming is
    disabled.</h1><HR size="1" noshade="noshade"><p><b>type</b> Status
    report</p><p><b>message</b> <u>Remote Streaming is
    disabled.</u></p><p><b>description</b> <u>The request sent by the
    client was syntactically incorrect (Remote Streaming is
    disabled.).</u></p><HR size="1" noshade="noshade"><h3>Apache
    Tomcat/6.0.18</h3></body></html>

 Why is it complaining about the remote streaming if it's already
 enabled? Is there anything I'm missing?

 Thanks,
 -vivek



Invalid_Date_String on posting XML to the index

2009-04-16 Thread Mark Allan

Hi all,

I'm encountering a problem when I try to add records with a date field  
to the index.


The records I'm adding have very little date precision: usually YYYYMMDD,
but some only have year and month, and others only a year.
I'm trying to get around this by using a pattern-replace filter to
modify the field before indexing.  This seems to work fine if the
class is solr.TextField: a date will be converted from e.g. 1953 to
1953-01-01T00:00:00.000Z and then inserted into the index.


However, if I want to have the field as an actual date field (for  
doing range searches etc) I get the following error when I post the  
XML file.


SimplePostTool: FATAL: Solr returned an error: Invalid_Date_String1953

The corresponding stack trace from the solr server is:

Apr 15, 2009 4:27:26 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'1953'
	at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
	at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
	at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
	at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
	at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
	at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


My schema.xml file looks something like this:

...
   <fieldType name="dateFormatter" class="solr.DateField"
              sortMissingLast="true" omitNorms="true">
      <analyzer>
         <filter class="solr.TrimFilterFactory" />
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})$"
                 replacement="$1.01.01" replace="all" />
         <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})$"
                 replacement="$1.$2.01" replace="all" />
         <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})\.(\d{2})$"
                 replacement="$1-$2-$3T00:00:00.000Z" replace="all" />
      </analyzer>
   </fieldType>
...
   <field name="DateRecorded" type="dateFormatter" indexed="true"
          stored="true" multiValued="false"/>
...


My thinking is that Solr is trying to add the field directly as '1953'  
before doing the text factory stuff and is therefore not in the right  
format for indexing.  Does that sound like a reasonable assumption and  
am I missing something which is causing it to go wrong?  Can anyone  
help please?


I was originally storing the date in YYMMDD format as a text field and  
searching with wildcards, but that strikes me as somewhat  
inefficient.  I could go back to doing that if necessary, but I'd  
rather do it the right way if I can.


Many thanks for your help.

Mark
PS. Apologies if this message comes through twice - I sent it  
yesterday afternoon but it hasn't turned up on the mailing list yet,  
so I'm trying again.


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Shalin Shekhar Mangar
On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan mark.al...@ed.ac.uk wrote:


 My thinking is that Solr is trying to add the field directly as '1953'
 before doing the text factory stuff and is therefore not in the right format
 for indexing.  Does that sound like a reasonable assumption and am I missing
 something which is causing it to go wrong?  Can anyone help please?


That is correct. You'll need to do the date creation in your own code so
that you send a well-formed date to Solr.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Mark Allan


On 16 Apr 2009, at 9:00 am, Shalin Shekhar Mangar wrote:

On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan mark.al...@ed.ac.uk  
wrote:


My thinking is that Solr is trying to add the field directly as '1953'
before doing the text factory stuff and is therefore not in the right
format for indexing.  Does that sound like a reasonable assumption and am
I missing something which is causing it to go wrong?  Can anyone help
please?


That is correct. You'll need to do the date creation in your own code so
that you send a well-formed date to Solr.



Hi, thanks for your prompt reply.  I'm a bit confused though - the  
only way to do this is a two-step process?


I have to write code to munge the XML into another document which is  
exactly the same except for the format of the Date field, and then  
import that second file?  Isn't that the whole purpose of having an  
analyzer with the solr.PatternReplaceFilterFactory filters?  What's  
odd is that the pattern replacement works if I store the field as text  
but not as a date.  Are you sure this isn't a bug?


Mark

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Shalin Shekhar Mangar
On Thu, Apr 16, 2009 at 1:45 PM, Mark Allan mark.al...@ed.ac.uk wrote:


 Hi, thanks for your prompt reply.  I'm a bit confused though - the only way
 to do this is a two-step process?

 I have to write code to munge the XML into another document which is
 exactly the same except for the format of the Date field, and then import
 that second file?  Isn't that the whole purpose of having an analyzer with
 the solr.PatternReplaceFilterFactory filters?  What's odd is that the
 pattern replacement works if I store the field as text but not as a date.
  Are you sure this isn't a bug?


Analyzers are applied only to the indexed value, not the stored value. A
value added to a DateField is converted to the same internal format
(for both indexing and storing purposes) and then added to the index. The
DateField#toInternal method is the one attempting to parse the
string into a date, and it fails when the field is created.

There is another option. You could create a class which extends DateField
and overrides toInternal(String) to do the conversion. You can specify this
class in the schema.xml instead of DateField.
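For illustration, a minimal sketch of what such a subclass might look like
(the class name and package are made up, and the padding rules are
assumptions based on the patterns in the schema above):

    package com.example.solr;

    import org.apache.solr.schema.DateField;

    public class LenientDateField extends DateField {
        @Override
        public String toInternal(String val) {
            String v = val.trim();
            // Pad partial dates out to full precision before handing
            // them to DateField's normal parsing/validation.
            if (v.matches("\\d{4}")) {                          // e.g. "1953"
                v = v + "-01-01T00:00:00Z";
            } else if (v.matches("\\d{4}\\.\\d{2}")) {          // e.g. "1953.06"
                v = v.replace('.', '-') + "-01T00:00:00Z";
            } else if (v.matches("\\d{4}\\.\\d{2}\\.\\d{2}")) { // e.g. "1953.06.15"
                v = v.replace('.', '-') + "T00:00:00Z";
            }
            return super.toInternal(v);
        }
    }

The fieldType would then declare class="com.example.solr.LenientDateField"
instead of class="solr.DateField", and the analyzer section would no longer
be needed.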

-- 
Regards,
Shalin Shekhar Mangar.


OutofMemory on Highlighting

2009-04-16 Thread Gargate, Siddharth
Hi,

I am analyzing the memory usage for my Solr setup. I am
testing with 500 text documents of 2 MB each.

I have defined a field for displaying the teasers, storing 1 MB of
text in it.  I am testing with just 128 MB max heap (I know I should be
increasing it, but I am testing the worst-case scenario).

If I search for all 500 documents with a row size of 500 and highlighting
disabled, it works fine. But if I enable highlighting I get an
OutOfMemoryError.

It looks like the stored fields for all the matched results are read into
memory. How do I avoid this memory consumption?

 

Thanks,

Siddharth 



Re: using multisearcher

2009-04-16 Thread Brent Palmer
Thanks Hoss.  I haven't had time to try it yet, but that is exactly the 
kind of help I was looking for. 


Brent

Chris Hostetter wrote:

: As for the second part, I was thinking of trying to replace the standard
: SolrIndexSearcher with one that employs a MultiSearcher.  But I'm not very
: familiar with the workings of Solr, especially with respect to the caching
: that goes on.  I thought that maybe people who are more familiar with it might
: have some tips on how to go about it.  Or perhaps there are reasons that make
: this a bad idea. 

If your indexes are all local, then using a MultiReader would be simpler
than trying to shoehorn MultiSearcher-type logic into SolrIndexSearcher.


https://issues.apache.org/jira/browse/SOLR-243


-Hoss

  


--
Brent Palmer
Widernet.org
University of Iowa
319-335-2200
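
(For reference, the MultiReader approach Hoss suggests looks roughly like
this in Lucene 2.x terms -- a sketch only, with made-up index paths:

    IndexReader[] readers = new IndexReader[] {
        IndexReader.open("/indexes/core1"),
        IndexReader.open("/indexes/core2")
    };
    Searcher searcher = new IndexSearcher(new MultiReader(readers));

All the sub-indexes are searched through the one reader, so everything
downstream still sees a single IndexReader.)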



Stored Document encoding

2009-04-16 Thread AlexxelA

I'm using the DataImportHandler and my database is in latin1.  When I
retrieve documents that I have indexed in Solr, they seem to have been
converted to UTF-8. Is this normal?  Is it possible to store in latin1 in
Solr?
-- 
View this message in context: 
http://www.nabble.com/Stored-Document-encoding-tp23078724p23078724.html
Sent from the Solr - User mailing list archive at Nabble.com.



DataImport, remove doc when marked as deleted

2009-04-16 Thread Ruben Chadien

Hi

I am new to Solr, but have been using Lucene for a while. I am trying
to rewrite some old Lucene indexing code using the JDBC DataImport in
Solr. My problem:

I have entities that can be marked in the DB as deleted; these I don't
want to index, and that's no problem when doing a full-import. When doing
a delta-import, my deltaQuery will catch entities that have been marked as
deleted since the last index, but how do I get it to delete those from the
index? I tried making the deltaImportQuery so that it doesn't return the
entity if it's deleted, but that didn't help...


Any ideas ?

Thanks
Ruben




Re: solr 1.3 + tomcat 5.5

2009-04-16 Thread andrysha nihuhoid
No, there is no such file there.
How can I configure more detailed error reporting for this message?

2009/4/15 Shalin Shekhar Mangar shalinman...@gmail.com:
 From the log it seems like there is a solr.xml inside
 var/lib/tomcat5/webapps/ which Tomcat is trying to deploy and failing. Very
 strange. You should remove that file and see if that fixes it.


 On Tue, Apr 14, 2009 at 11:35 PM, andrysha nihuhoid nihuh...@gmail.com wrote:

 Hi, I have a problem setting up Solr + Tomcat:
 Tomcat 5.5 + Apache Solr 1.3.0 + CentOS 5.3.
 I'm not familiar with Java at all, so sorry if it's a dumb question.
 Here is what I did:
 placed solr.war in webapps folder
 changed solr home to /etc/solr
 copied contents of solr distribution example folder to /etc/solr

 Tomcat starts successfully and I can even access the admin interface, but
 the following error appears in catalina.out every 10 seconds:
 SEVERE: Error deploying configuration descriptor
 var#lib#tomcat5#webapps#solr.xml
 Apr 14, 2009 1:30:14 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor etc#solr#.xml
 Apr 14, 2009 1:30:24 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor
 var#lib#tomcat5#webapps#solr.xml
 Apr 14, 2009 1:30:24 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor etc#solr#.xml
 Apr 14, 2009 1:30:34 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor
 var#lib#tomcat5#webapps#solr.xml
 Apr 14, 2009 1:30:34 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor etc#solr#.xml
 Apr 14, 2009 1:30:44 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor
 var#lib#tomcat5#webapps#solr.xml
 Apr 14, 2009 1:30:44 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor etc#solr#.xml
 Apr 14, 2009 1:30:54 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor
 var#lib#tomcat5#webapps#solr.xml
 Apr 14, 2009 1:30:54 PM org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor etc#solr#.xml


 Googled about 3 hours.

 Tried setting write permissions for all on /etc, /etc/solr, and
 /var/lib/tomcat5/webapps.
 Tried creating an empty file named solr.xml in /etc and /etc/solr.
 Tried copying solrconfig.xml to /etc/ and /etc/solr.




 --
 Regards,
 Shalin Shekhar Mangar.



Re: OutofMemory on Highlighting

2009-04-16 Thread Otis Gospodnetic

Hi,

Have you tried: 
http://wiki.apache.org/solr/HighlightingParameters#head-2ca22f63cb8d1b2ba3ff0cfc05e85b94898c59cf
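
If the parameter behind that anchor is hl.maxAnalyzedChars (a guess from the
link fragment), a request capping how much of each stored field is analyzed
for highlighting would look something like this (the field name is
illustrative):

    http://localhost:8080/solr/select?q=test&hl=true&hl.fl=teaser&hl.maxAnalyzedChars=10240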

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Gargate, Siddharth sgarg...@ptc.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 16, 2009 6:33:46 AM
 Subject: OutofMemory on Highlighting
 
 Hi,
 
 I am analyzing the memory usage for my Solr setup. I am
 testing with 500 text documents of 2 MB each.
 
 I have defined a field for displaying the teasers and storing 1 MB of
 text in it.  I am testing with just 128 MB maxHeap(I know I should be
 increasing it but just testing the worst case scenario).
 
 If I search for all 500 documents with row size as 500 and highlighting
 disabled, it works fine. But if I enable highlighting I get
 OutofMemoryError. 
 
 Looks like stored field for all the matched results are read into the
 memory. How to avoid this memory consumption?
 
 
 
 Thanks,
 
 Siddharth 



Re: Using CSV for indexing ... Remote Streaming disabled

2009-04-16 Thread Otis Gospodnetic

Hi,

Are you absolutely sure you are changing the correct config file?
What is the 20090414_1 part in your URL?  The name of the core?  Be sure to 
change ITS config (you can get to it from Solr Admin page) and to restart Solr.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 16, 2009 3:34:42 AM
 Subject: Re: Using CSV for indexing ... Remote Streaming disabled
 
 Any help on this? Could this error be because of something else (not
 remote streaming issue)?
 
 Thanks.
 
 On Wed, Apr 15, 2009 at 10:04 AM, vivek sar wrote:
  Hi,
 
   I'm trying to use CSV indexing (Solr 1.4, 03/29 nightly), following the wiki
  (http://wiki.apache.org/solr/UpdateCSV). I've updated
  solrconfig.xml to have these lines:

     <requestDispatcher handleSelect="true">
         <requestParsers enableRemoteStreaming="true"
                         multipartUploadLimitInKB="20480" />
         ...
     </requestDispatcher>

     <requestHandler name="/update/csv" class="solr.CSVRequestHandler"
                     startup="lazy" />
 
  When I try to upload the csv,
 
   curl 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c&stream.file=/Users/opal/temp/afterchat/data/csv/1239759267339.csv'
 
  I get the following response:

     HTTP Status 400 - Remote Streaming is disabled.
     type: Status report
     message: Remote Streaming is disabled.
     description: The request sent by the client was syntactically
     incorrect (Remote Streaming is disabled.).
     Apache Tomcat/6.0.18
 
  Why is it complaining about the remote streaming if it's already
  enabled? Is there anything I'm missing?
 
  Thanks,
  -vivek
 



Re: truncating indexed docs

2009-04-16 Thread Otis Gospodnetic

Hi,


No, you typically truncate them (i.e. index only the first N terms) at
indexing time, using the maxFieldLength setting in solrconfig.xml.  You can,
however, limit how many characters (or bytes?) to copy when using the
copyField functionality.
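
For reference, the two knobs mentioned above might be set like this (the
values and field names are illustrative):

    <!-- solrconfig.xml: index at most the first 10000 tokens per field -->
    <maxFieldLength>10000</maxFieldLength>

    <!-- schema.xml: copy at most 300 characters into a shorter field -->
    <copyField source="body" dest="teaser" maxChars="300"/>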

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: CIF Search cifsea...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 16, 2009 2:44:22 AM
 Subject: truncating indexed docs
 
 Is it possible to truncate large documents once they are indexed? (Can this
 be done without re-indexing)
 
 Regards,
 CI



Re: Question on StreamingUpdateSolrServer

2009-04-16 Thread Otis Gospodnetic

Hi,

Lots of little things to look at here.
You should do lsof as root, and it looks like you aren't doing that.
You should double-check Tomcat's maxThreads param in server.xml.
You should give Jetty a try.
I don't think you said anything about looking at the container's or solr logs 
and finding errors.


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, April 15, 2009 7:28:57 PM
 Subject: Re: Question on StreamingUpdateSolrServer
 
 Thanks Otis.
 
 I did increase the number of file descriptors to 22K, but I still get
 this problem. I've noticed the following so far:

 1) As soon as I get to around 1140 index segments (this is the total over
 multiple cores) I start seeing this problem.
 2) When the problem starts, occasionally the index request
 (solrserver.commit) also fails with the following error:
   java.net.SocketException: Connection reset
 3) Whenever the commit fails, I'm able to access Solr via the browser
 (http://ets11.co.com/solr). If the commit is successful and in progress, I
 get a blank page in Firefox. Even telnet to 8080 fails with
 Connection closed by foreign host.
 
 It does seem like there is some resource issue, as it happens only once
 we reach a breaking point (too many index segment files) - lsof at
 this point usually shows around 1400, but my ulimit is much higher than
 that.

 I already use the compound format for index files. I can also run optimize
 occasionally (though that's not preferred, as it blocks the whole index
 cycle for a long time). I do want to find out what resource limitation is
 causing this; it has something to do with the Indexer committing records
 when there are a large number of segment files.
 
 Any other ideas?
 
 Thanks,
 -vivek
 
 On Wed, Apr 15, 2009 at 3:10 PM, Otis Gospodnetic
 wrote:
 
  One more thing.  I don't think this was mentioned, but you can:
  - optimize your indices
  - use compound index format
 
  That will lower the number of open file handles.
 
   Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar 
  To: solr-user@lucene.apache.org
  Sent: Friday, April 10, 2009 5:59:37 PM
  Subject: Re: Question on StreamingUpdateSolrServer
 
  I also noticed that the Solr app has over 6000 file handles open -
 
  lsof | grep solr | wc -l   - shows 6455
 
  I've 10 cores (using multi-core) managed by the same Solr instance. As
  soon as I start up Tomcat, the open file count goes up to 6400.  A few
  questions:
 
  1) Why is Solr holding on to all the segments from all the cores - is
  it because of auto-warmer?
  2) How can I reduce the open file count?
  3) Is there a way to stop the auto-warmer?
  4) Could this be related to Tomcat returning blank page for every 
  request?
 
  Any ideas?
 
  Thanks,
  -vivek
 
  On Fri, Apr 10, 2009 at 1:48 PM, vivek sar wrote:
   Hi,
  
I was using CommonsHttpSolrServer for indexing, but having two
   threads writing (10K batches) at the same time was throwing,
  
ProtocolException: Unbuffered entity enclosing request can not be 
 repeated.
  
  
   I switched to StreamingUpdateSolrServer (using addBeans) and I don't
   see the problem anymore. The speed is very fast - getting around
   25k/sec (single thread), but I'm facing another problem. When the
   indexer using StreamingUpdateSolrServer is running I'm not able to
   send any url request from browser to Solr web app. I just get blank
   page. I can't even get to the admin interface. I'm also not able to
   shutdown the Tomcat running the Solr webapp when the Indexer is
   running. I've to first stop the Indexer app and then stop the Tomcat.
   I don't have this problem when using CommonsHttpSolrServer.
  
   Here is how I'm creating it,
  
   server = new StreamingUpdateSolrServer(url, 1000,3);
  
   I simply call server.addBeans(...) on it. Is there anything else I
   need to do to make use of StreamingUpdateSolrServer? Why does Tomcat
   become unresponsive  when Indexer using StreamingUpdateSolrServer is
   running (though, indexing happens fine)?
  
   Thanks,
   -vivek
  
 
 



Re: httpclient.ProtocolException using Solrj

2009-04-16 Thread Otis Gospodnetic

I don't think you gain anything on the Solr end of things by using multiple 
threads if you are already using StreamingUpdateSolrServer.
 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 9, 2009 5:47:11 PM
 Subject: Re: httpclient.ProtocolException using Solrj
 
 Here is what I'm doing,
 
 SolrServer server = new StreamingUpdateSolrServer(url, 1000,5);
 
 server.addBeans(dataList);  // where dataList is a List with 10K elements
 
 I run two threads each using the same server object and then each call
 server.addBeans(...).
 
 I'm able to get 50K/sec inserted using that, but the commit after that
 (after 100k records) takes 70sec - which messes up the avg time.
 
 There are two problems here,
 
 1) Once in a while I get connection reset error,
 
 Caused by: java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:168)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
 at 
 org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
 at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
 
 Note: if I use CommonsHttpSolrServer I get the buffer error.
 
 2) The commit takes way too long for every 100k  (I may commit more
 often if this can not be improved)
 
 I'm trying to fix this error, which happens only if I run two
 threads both calling addBeans (10k at a time). One thread works fine.
 I'm not sure how I can use the MultiThreadedHttpConnectionManager to
 create a StreamingUpdateSolrServer, or whether it would help.
 
 Thanks,
 -vivek
 
 2009/4/9 Noble Paul നോബിള്‍  नोब्ळ् :
  using a single request is the fastest
 
  
 http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65
 
  I could index at the rate of 10,000 docs/sec using this and 
 BinaryRequestWriter
 
  On Thu, Apr 9, 2009 at 10:36 PM, vivek sar wrote:
  I'm inserting 10K in a batch (using addBeans method). I read somewhere
  in the wiki that it's better to use the same instance of SolrServer
  for better performance. Would MultiThreadedConnectionManager help? How
  do I use it?
 
  I also wanted to know how can use EmbeddedSolrServer - does my app
  needs to be running in the same jvm with Solr webapp?
 
  Thanks,
  -vivek
 
  2009/4/9 Noble Paul നോബിള്‍  नोब्ळ् :
  how many documents are you inserting ?
  may be you can create multiple instances of CommonshttpSolrServer and
  upload in parallel
 
 
  On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote:
  Thanks Shalin and Paul.
 
  I'm not using MultipartRequest. I do share the same SolrServer between
  two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
  simply using CommonsHttpSolrServer to create the SolrServer. I've also
  tried StreamingUpdateSolrServer, which works much faster, but does
  throws connection reset exception once in a while.
 
  Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
  anything on it on Wiki.
 
  I was also thinking of using EmbeddedSolrServer - in what case would I
  be able to use it? Does my application and the Solr web app need to
  run into the same JVM for this to work? How would I use the
  EmbeddedSolrServer?
 
  Thanks,
  -vivek
 
 
  On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar
  wrote:
  Vivek, do you share the same SolrServer instance between your two 
  threads?
  If so, are you using the MultiThreadedHttpConnectionManager when 
  creating
  the HttpClient instance?
 
  On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote:
 
   With a single thread everything works fine. Two threads are fine too for a
   while, and then all of a sudden the problem starts happening.
 
   I tried indexing using REST services as well (instead of Solrj), but
   with that too I get the following error after a while:

   2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer -
   indexData()- Failed to index
   java.net.SocketException: Broken pipe
          at java.net.SocketOutputStream.socketWrite0(Native Method)
          at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
          at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
          at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
          at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
          at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
          at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
          at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
          at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
          at ...

Re: Field Collapsing Patch

2009-04-16 Thread Otis Gospodnetic

I know of a company that used it, but then determined it was this component 
that was slowing down their search.  They might have modified it some, too, I 
don't recall now.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Matthew Runo mr...@zappos.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, April 8, 2009 1:18:08 PM
 Subject: Field Collapsing Patch
 
 Hello folks -
 
 Is anyone using the Field Collapsing patch from SOLR-236 
 (https://issues.apache.org/jira/browse/SOLR-236) in their production 
 environment? We're considering using it, but wanted to ensure it was at a 
 point 
 where it could be used before spending a lot of time on it.
 
 Any thoughts on the patch / issue? Any reasons not to use it?
 
 Thanks for your time!
 
 Matthew Runo
 Software Engineer, Zappos.com
 mr...@zappos.com - 702-943-7833



Re: hardware requirements for solr

2009-04-16 Thread Otis Gospodnetic

Roman,

This depends on multiple factors - amount of data, type of data/analysis, query 
rate and query complexity, etc.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Roman Dissertori r.dissert...@ecom.at
 To: solr-user@lucene.apache.org
 Sent: Wednesday, April 8, 2009 9:39:07 AM
 Subject: hardware requirements for solr
 
 Hello all,
 
 I want to use solr only(for performance reason) on a single server.
 What would be the minimum hardware requirements and what OS would you
 suggest for that?
 
 Thanks and regards,
 Roman Dissertori



Re: Solr Search Error

2009-04-16 Thread vivek sar
Hi,

  I'm using the Solr 1.4 (03/29 nightly build) and when searching on a
large index (40G) I get the same exception as in this thread,

 HTTP Status 500 - 13724 java.lang.ArrayIndexOutOfBoundsException: 13724
	at org.apache.lucene.search.TermScorer.score(TermScorer.java:74)
	at org.apache.lucene.search.TermScorer.score(TermScorer.java:61)
	at org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:262)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
	at org.apache.lucene.search.Searcher.search(Searcher.java:126)
	at org.apache.lucene.search.Searcher.search(Searcher.java:105)
	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1072)
	at ...

The search url  is,

http://think2.co.com:8080/solr/20090415_1/select/?q=japan&version=2.2&start=0&rows=10&indent=on

It would have millions of records matching this term, but I guess that
shouldn't throw this exception. I saw a similar JIRA for an
ArrayIndexOutOfBoundsException,
https://issues.apache.org/jira/browse/SOLR-450  (it's not the same
though).

I also see that someone reported this same problem back in 2007, so I'm not
sure whether it's a real bug or a configuration issue:

http://www.nabble.com/ArrayIndexOutOfBoundsException-on-TermScorer-td11750899.html#a11750899

Any ideas?

Thanks,
-vivek



On Fri, Mar 27, 2009 at 10:11 AM, Narayanan, Karthikeyan
karthikeyan.naraya...@gs.com wrote:
 Hi Otis,
              Thanks for the recommendation. I will try with the latest
 nightly build. I did a couple of full data imports and got this error a
 few times while searching.


 Thanks.

 Karthik


 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Friday, March 27, 2009 12:57 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Search Error


 Hi Karthik,

 First thing I'd do is get the latest Solr nightly build.
 If that doesn't fix things, I'd grab the latest Lucene nightly build and
 use it to replace Lucene jars that are in your version of Solr.
 If that doesn't work I'd email the ML with a bit more info about the
 type of search that causes this (e.g. Do all searches cause this or only
 some?  What do those that trigger this error look like or have in
 common?)

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Narayanan, Karthikeyan karthikeyan.naraya...@gs.com
 To: solr-user@lucene.apache.org
 Sent: Friday, March 27, 2009 11:42:12 AM
 Subject: Solr Search Error

 Hi All,
            I am intermittently getting this exception when I do a
 search. What could be the reason?

 Caused by: org.apache.solr.common.SolrException: 11938
 java.lang.ArrayIndexOutOfBoundsException: 11938
 	at org.apache.lucene.search.TermScorer.score(TermScorer.java:74)
 	at org.apache.lucene.search.TermScorer.score(TermScorer.java:61)
 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137)
 	at org.apache.lucene.search.Searcher.search(Searcher.java:126)
 	at org.apache.lucene.search.Searcher.search(Searcher.java:105)
 	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
 	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
 	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
 	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
 	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
 	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
 	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
 	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
 	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint
Re: Question on StreamingUpdateSolrServer

2009-04-16 Thread Yonik Seeley
On Wed, Apr 15, 2009 at 7:28 PM, vivek sar vivex...@gmail.com wrote:
 lsof at
 this point usually shows at 1400, but my ulimit is much higher than
 that.

Could you be hitting a kernel limit?

cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr

http://www.netadmintools.com/art295.html

-Yonik
http://www.lucidimagination.com


Authentication Error

2009-04-16 Thread Allahbaksh Asadullah
Hi, I have followed the procedure given on this blog to set up Solr.

Below is my code. I am trying to index the data, but I am not able to
connect to the server, and I get an authentication error.


HttpClient client = new HttpClient();
client.getState().setCredentials(
        new AuthScope("localhost", 80, AuthScope.ANY_SCHEME),
        new UsernamePasswordCredentials("admin", "admin"));

Can you please let me know what the problem may be?

The other problem I am facing is with load balancing:

SolrServer lbHttpSolrServer = new LBHttpSolrServer(
        "http://localhost:8080/solr", "http://localhost:8983/solr");

Now the problem: if the first server is down, I get an error. If I swap the
servers in the constructor, giving the port 8983 server first and 8080
second, it works fine.

The problem is: if only the last server in the list is active and the rest
are down, Solr throws an exception and the search is not performed.

Regards,
Allahbaksh


Re: DataImport, remove doc when marked as deleted

2009-04-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
did you try the deletedPkQuery?
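
For reference, a sketch of how that is typically wired up in data-config.xml
(the table and column names here are made up):

    <entity name="item" pk="ID"
            query="select * from item where deleted = 0"
            deltaQuery="select ID from item where last_modified &gt;
                        '${dataimporter.last_index_time}'"
            deletedPkQuery="select ID from item where deleted = 1 and
                        last_modified &gt; '${dataimporter.last_index_time}'">
        ...
    </entity>

The deletedPkQuery returns the primary keys of rows to remove from the index
during a delta-import.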

On Thu, Apr 16, 2009 at 7:49 PM, Ruben Chadien ruben.chad...@aspiro.com wrote:
 Hi

 I am new to Solr, but have been using Lucene for a while. I am trying to
 rewrite
 some old lucene indexing code using the Jdbc DataImport i Solr, my problem:

 I have Entities that can be marked in the db as deleted, these i don't
 want to index
 and thats no problem when doing a full-import. When doing a delta-import my
 deltaQuery will catch
 Entities that has been marked as deleted since last index, but how do i get
 it to delete those from the index ?
 I tried making the deltaImportQuery so that in don't return the Entity if
 its deleted, that didnt help...

 Any ideas ?

 Thanks
 Ruben






-- 
--Noble Paul


Re: Authentication Error

2009-04-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah
allahbaks...@gmail.com wrote:
 Hi,I have followed the procedure given on this blog to setup the solr

 Below is my code. I am trying to index the data but I am not able to connect
 to server and getting authentication error.


  HttpClient client = new HttpClient();
  client.getState().setCredentials(
          new AuthScope("localhost", 80, AuthScope.ANY_SCHEME),
          new UsernamePasswordCredentials("admin", "admin"));

 Can you please let me know what may be the problem.

 The other problem which I am facing is using Load Banlancing
  SolrServer lbHttpSolrServer = new LBHttpSolrServer(
          "http://localhost:8080/solr", "http://localhost:8983/solr");

 Now the problem is the first server is down then I will get an error. If I
 swap the server in constructor by giving port 8983 server as first and 8080
 as second it works fine. The thing

 Problem is If only the last server which is set is active and the rest of
 other are down then Solr throws and exception and search is not performed.

I shall write a testcase and let you know
 Regards,
 Allahbaksh




-- 
--Noble Paul


Re: Stored Document encoding

2009-04-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess strings are always stored by Lucene in UTF-8. BTW, as you pass
the object as a String, the original encoding is lost.

On Thu, Apr 16, 2009 at 7:37 PM, AlexxelA alexandre.boudrea...@canoe.ca wrote:

 I'm using the DataImportHandler and my database is in latin1.  When i
 retreive documents that i have indexed in solr they seem to have been
 converted in utf-8. Is it normal ?  Is it possible to store in latin1 in
 solr ?
 --
 View this message in context: 
 http://www.nabble.com/Stored-Document-encoding-tp23078724p23078724.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: What is QTime a measure of?

2009-04-16 Thread Otis Gospodnetic

Not sure if you got the answer - QTime represents the number of milliseconds it 
took Solr to execute a search.  It does not include the time it takes to send 
back the response (that depends on its size, network speed...)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Andrew McCombe eupe...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Monday, April 6, 2009 7:08:45 AM
 Subject: What is QTime a measure of?
 
 Hi
 
 Just started using Solr/Lucene and am getting to grips with it.  Great
 product!
 
 What is the QTime a measure of?  Is it milliseconds, seconds?  I tried a
 Google search but couldn't find anything definitive.
 
 Thanks In Advance
 
 Andrew McCombe



Re: Phrase Query Issue

2009-04-16 Thread Otis Gospodnetic

Let me second this.  People ask for this pretty often.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Erik Hatcher e...@ehatchersolutions.com
 To: solr-user@lucene.apache.org
 Sent: Saturday, April 4, 2009 8:33:46 PM
 Subject: Re: Phrase Query Issue
 
 
 On Apr 4, 2009, at 1:25 AM, dabboo wrote:
 
  
  Erik,
  
  Thanks a lot for your reply. I have made some changes in the solr code and
  now field clauses are working fine with dismax request. Not only this,
  wildcard characters are also working with dismax and q query parameter.
  
  If you want I can share modified code with you.
 
 That'd be good to share.  Simply open a Solr JIRA issue with this enhancement 
 request and post your code there.  Test cases and documentation always 
 appreciated too, but working code to start with is fine.
 
 Erik



Re: Spelling Component

2009-04-16 Thread Otis Gospodnetic

Hi,

It looks like your spellchecker index did get created (doesn't it get created 
automatically when Solr starts?), but it looks rather empty. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Anoop Atre anoop.a...@mnsu.edu
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Friday, April 3, 2009 3:46:10 PM
 Subject: Re: Spelling Component
 
 Shalin, I think I did build the spellcheck index, I made the changes
 to solrconfig and schema, restarted, passed a spellcheck.build=true
 which created the index.
 
 ls -ltr ./spellchecker
 -rw-r--r-- 1 XXX users 20 2009-04-03 13:23 segments.gen
 -rw-r--r-- 1 XXX users 28 2009-04-03 13:23 segments_f
 
 Hmm...how would I know if the word is in the index? As for the threshold
 do you mean reduce the 0.7 entry in
 solrconfig? Thanks!
 
 Shalin Shekhar Mangar wrote:
  On Sat, Apr 4, 2009 at 12:01 AM, Anoop Atre wrote:
  
  I still don't get any suggestions when I do
  /spellCheckCompRH?q=helultrashar&spellcheck=true&spellcheck.collate=true
 
 
  Did you build the spellcheck index? Try specifying a correct word which you
  know is in the index. See if spellchecker returns it. If it does, then it
  might be that no suggestions are available or there are no suggestions above
  the configured threshold.



Garbage Collectors

2009-04-16 Thread David Baker
I have an issue with garbage collection on our Solr servers.  We have an
issue where the old generation never gets cleaned up on one of our
servers.  This server has a little over 2 million records, which are
updated every hour or so.  I have tried the parallel GC and the
concurrent GC.  The parallel collector seems more stable for us, but both
end up running out of memory.  I have increased the memory allocated to
the servers, but this just seems to delay the problem.  My question is:
what are the suggested options for using the parallel GC?  Currently we
are using something of this nature:


-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy 
-XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m 
-XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr


I am new to solr and GC tuning, so any advice is appreciated.


Boosting by facets with standard query

2009-04-16 Thread ashokc

I have a query that yields results binned in several facets. How can I boost
the results that fall in certain facets over the rest of them that do not
belong to those facets? I use the standard query format. Thank you
- ashok
-- 
View this message in context: 
http://www.nabble.com/Boosting-by-facets-with-standard-query-tp23084860p23084860.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Garbage Collectors

2009-04-16 Thread Otis Gospodnetic

Personally, I'd start from scratch:
-Xmx -Xms...

-server is not even needed any more.

If you are not using Java 1.6, I suggest you do.

Next, I'd try to investigate why objects are not being cleaned up - this should 
not be happening in the first place.  Is Solr the only webapp running?
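
As a bare-bones starting point (the heap sizes here are illustrative, and
since this is Tomcat the flags would go in JAVA_OPTS):

    export JAVA_OPTS="-Xms2048m -Xmx2048m -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr"

Only add GC flags back once you can see from the GC logs what the heap is
actually doing.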


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: David Baker dav...@mate1inc.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 16, 2009 3:33:18 PM
 Subject: Garbage Collectors
 
 I have an issue with garbage collection on our solr servers.  We have an 
 issue 
 where the  old generation  never  gets cleaned up on one of our servers.  
 This 
 server has a little over 2 million records which are updated every hour or 
 so.  
 I have tried the parallel GC and the concurrent GC.  The parallel seems more 
 stable for us, but both end up running out of memory.  I have increased the 
 memory allocated to the servers, but this just seems to delay the problem.  
 My 
 question is, what are the suggested options for using the parallel GC.  
 Currently we are using something of this nature:
 
 -server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC 
 -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 
 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr
 
 I am new to solr and GC tuning, so any advice is appreciated.



Re: NPE creating EmbeddedSolrServer

2009-04-16 Thread Clay Fink

This worked great. Thanks!

The only catch is you have to (eventually) call CoreContainer.shutdown(),
otherwise the app just hangs.


Alexandre Rafalovitch wrote:
 
 To reply to my own message.
 
 The following worked starting from scratch (example):
 
 
  SolrConfig solrConfig = new SolrConfig(
          "D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm\\solr",
          "solrconfig.xml",
          null);
  IndexSchema indexSchema = new IndexSchema(
          solrConfig,
          "schema.xml",
          null);

  CoreContainer container = new CoreContainer(new
          SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
  CoreDescriptor dcore = new CoreDescriptor(container, "",
          solrConfig.getResourceLoader().getInstanceDir());
  dcore.setConfigName(solrConfig.getResourceName());
  dcore.setSchemaName(indexSchema.getResourceName());
  SolrCore core = new SolrCore(
          null,
          "D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm\\solr\\data",
          solrConfig, indexSchema, dcore);
  container.register("", core, false);
  SolrServer server = new EmbeddedSolrServer(container, "");
 
 
 
 Not sure I get the magical sequence yet, but maybe it will save
 somebody else half a day.
 
 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 Research group: http://www.clt.mq.edu.au/Research/
 
 
 
 On Tue, Mar 17, 2009 at 6:22 PM, Alexandre Rafalovitch
 arafa...@gmail.com wrote:
 Hello,

  I am trying to create a basic single-core embedded Solr instance. I
  figured out how to set up a single-core instance and got (I believe)
  all files in the right places. However, I am unable to run trivial code
  without an exception:
 
 

-- 
View this message in context: 
http://www.nabble.com/NPE-creating-EmbeddedSolrServer-tp22569143p23086774.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Non-linear structure for search and index documents

2009-04-16 Thread Chris Hostetter

: I need index/search words extracted from pdf files with coordinates and page
: number, so I have this  structure:
: 
:- index the document id
:- a document has many pages
:- a page has many words
:- a word has geometry[w,h,x,y](inside of page)
: 
: Is this possible with solr?   
: If yes, how the best way to do that? Is using field collapsing?

it's possible, but Solr doesn't currently have any features that make it
*easy*.

the main thing you have to ask yourself, before deciding on the best way
to approach this problem, is: what do I want to be able to do with
this data?

if you need to search for docs where dog appears inside a certain
x1,y1,x2,y2 box, then you have to structure your index much differently
than if you just need to find all docs containing dog and then, as part
of your result, get the w,h,x,y coordinates for each instance of the word.

The main Lucene feature that's probably going to be at the core of any
work like this is Payloads ... but there's going to be a significant
amount of Java coding needed to take advantage of it in any of the ways I
can think of that you might be wanting.


-Hoss



Re: Dictionary lookup possibilities

2009-04-16 Thread Chris Hostetter

: For instance, my dictionary holds the following terms:
: 1 - a b c d
: 2 - c d e
: 3 - a b
: 4 - a e f g h
: 
: If I put the sentence [a b c d f g h] in as a query, I want to recieve
: dictionary items 1 (matching all words a b c d) and 3 (matching words a b)
: as matches

this is a pretty hard problem in general ... in my mind I call it the
longest matching sub-phrase problem, but I have no idea if it has a real
name.

the only solution I know of using Lucene is to construct a phrase query
for each of the sub-phrases, giving a bigger query boost to the longer
phrases ... but it might be possible to design a custom query impl for
solving this problem.

(I've never had an important enough use case to dedicate a significant
amount of time to figuring it out)
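
(A rough sketch of that approach for the example above, boosting longer
sub-phrases higher:

    q="a b c d"^4 "a b c"^3 "b c d"^3 "a b"^2 "b c"^2 "c d"^2

with the clauses generated programmatically from all sub-phrases of the
input.)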





-Hoss



Seattle / PNW Hadoop + Lucene User Group?

2009-04-16 Thread Bradford Stephens
Greetings,

Would anybody be willing to join a PNW Hadoop and/or Lucene User Group
with me in the Seattle area? I can donate some facilities, etc. -- I
also always have topics to speak about :)

Cheers,
Bradford


Re: special characters in Solr search query.

2009-04-16 Thread Chris Hostetter

: the special characters but the issue is while the document which I am 
: going to index contains any of these special characters it is throwing 
: query parse exception. Can anyone give pointer over this? Thanks in 

your question is kind of vague ... for instance: it seems like you are
saying that you get query parse exceptions when you try to *index* a
document containing one of these characters ... which would be really odd.

can you give some specifics about what exactly it is you are doing? ...
either the literal Solr URLs that you are GETting manually, or the code
that you are using to talk to Solr?

-Hoss



Advice on moving from 1.3 to 1.4-dev or trunk?

2009-04-16 Thread ristretto.rb
Hello, I'm using solr 1.3 with solr.py.   We have a basic schema.xml,
nothing custom or out of the ordinary.
I need the following feature from
http://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt

SOLR-911: Add support for multi-select faceting by allowing filters to be
tagged and facet commands to exclude certain filters.  This patch also
added the ability to change the output key for facets in the response, and
optimized distributed faceting refinement by lowering parsing overhead and
by making requests and responses smaller.
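
(For the curious, the tag/exclude syntax this enables looks roughly like the
following; the field and tag names are made up:

    q=*:*&fq={!tag=cat}category:books&facet=true&facet.field={!ex=cat}category

i.e. the category facet counts are computed as if the category filter were
not applied, which is what a multi-select UI needs.)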

Since this requires 1.4, it looks like I have to upgrade (or roll my own
version of what this feature provides).
I'm looking for a bit of advice.  I have looked through the bugs here
http://issues.apache.org/jira/browse/SOLR/fixforversion/12313351

1.  I would need to get the source for 1.4 and build it, right?  No
release yet, eh?
2.  Any one using 1.4 in production without issue; is this wise?  Or
should I wait?
3.  Will I need to make changes to my schema.xml to support my current
field set under 1.4?
4.  Do I need to reindex all my data?

thanks
gene


Re: Solr posts xml

2009-04-16 Thread Chris Hostetter

: I installed Solr on tomcat 6 and whenever I click search it displays the xml
: like I am editing it?
: 
: is that normal?

I'm afraid I don't really understand your question ... if you mean you get
an XML formatted response when you click the Search button on the admin
screen, then yes -- that is normal.  By default Solr returns results in
XML.  Other output formats (like JSON, etc.) are supported, or you can use
an XSLT to transform the XML to HTML or anything else you might want.


-Hoss



Re: Garbage Collectors

2009-04-16 Thread Bryan Talbot
If you're using Java 5 or 6, jmap is a useful tool for tracking down
memory leaks.


http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html

jmap -histo:live pid

will print a histogram of all live objects in the heap.  Start at the  
top and work your way down until you find something suspicious -- the  
trick is in knowing what is suspicious of course.



-Bryan




On Apr 16, 2009, at Apr 16, 3:40 PM, David Baker wrote:


Otis Gospodnetic wrote:

Personally, I'd start from scratch:
-Xmx -Xms...

-server is not even needed any more.

If you are not using Java 1.6, I suggest you do.

Next, I'd try to investigate why objects are not being cleaned up -  
this should not be happening in the first place.  Is Solr the only  
webapp running?



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 


From: David Baker dav...@mate1inc.com
To: solr-user@lucene.apache.org
Sent: Thursday, April 16, 2009 3:33:18 PM
Subject: Garbage Collectors

I have an issue with garbage collection on our solr servers.  We  
have an issue where the  old generation  never  gets cleaned up on  
one of our servers.  This server has a little over 2 million  
records which are updated every hour or so.  I have tried the  
parallel GC and the concurrent GC.  The parallel seems more stable  
for us, but both end up running out of memory.  I have increased  
the memory allocated to the servers, but this just seems to delay  
the problem.  My question is, what are the suggested options for  
using the parallel GC.  Currently we are using something of this  
nature:


-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX: 
+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m - 
XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr


I am new to solr and GC tuning, so any advice is appreciated.

Thanks for the reply. Yes, Solr is the only app running under this
Tomcat server. I will remove -server and the other options except the
heap allocation options and see how it performs. Any suggestions on
how to go about finding out why objects are not being cleaned up, if
these changes don't work?






Re: Boosting by facets with standard query

2009-04-16 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 1:03 AM, ashokc ash...@qualcomm.com wrote:


 I have a query that yields results binned in several facets. How can I
 boost
 the results that fall in certain facets over the rest of them that do not
 belong to those facets? I use the standard query format. Thank you


I'm not sure what you mean by boosting by facet. Do you mean that you want
to boost documents which match a term query?

If yes, you can use your_field_name:value^2.0 in the q parameter.
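
For example (the field and values are made up), in the standard query syntax:

    q=+laptop category:electronics^2.0

requires laptop to match, and ranks documents whose category field matches
electronics higher among those results.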
-- 
Regards,
Shalin Shekhar Mangar.


Re: Dictionary lookup possibilities

2009-04-16 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 3:37 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


  this is a pretty hard problem in general ... in my mind i call it the
 longest matching sub-phrase problem, but i have no idea if it has a real
 name.

 the only solution i know of using Lucene is to construct a phrase query
 for each of the sub phrases, giving a bigger query boost to the longer
 phrases ... but it might be possible to design a customer query impl for
 solving this problem.


There was an issue opened for something similar but there is not patch yet.

https://issues.apache.org/jira/browse/SOLR-633

-- 
Regards,
Shalin Shekhar Mangar.


Faceted Search

2009-04-16 Thread Sajith Weerakoon
Hi all,

Can one of you tell me how to implement a faceted search?

 

Thanks,

Regards,

Sajith Vimukthi Weerakoon.

 



The facetd Search

2009-04-16 Thread Sajith Weerakoon
Hi all,

I am developing a search tool that uses Solr as the key querying
technology. At the moment I have a very stable version, and I need to
enhance the application by introducing faceted search. I went through the
documentation and made some modifications to my code, but I could not get
it to work. Is there a configuration step involved? How can I
implement the faceted search?

 

Thanks,

Regards,

Sajith Vimukthi Weerakoon.

 



Re: The facetd Search

2009-04-16 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 9:58 AM, Sajith Weerakoon saji...@zone24x7.com wrote:

 Hi all,

 I am developing a search tool and it uses solr as the key querying
 technique. At the moment I have got a very much stable version and I need
 to
 enhance the application by introducing a faceted search. I went through the
 documentation and did some modifications to my code. I could not get
 anything out of it done. Is there a configuration involved? How can I
 implement the faceted search?


What were the modifications that you did? What was not working?

Also see:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
http://wiki.apache.org/solr/SimpleFacetParameters
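
A minimal faceting request, with a made-up field name, looks like:

    http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.mincount=1

The counts come back in the facet_counts section of the response; the
faceted field should be indexed, and is typically a non-tokenized (string)
field.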

-- 
Regards,
Shalin Shekhar Mangar.