RE: Solr replication

2008-01-16 Thread Dilip.TS
Hi Bill,
I have some questions regarding the SOLR collection distribution.
1) Is it possible to perform index operations on the slave server using
Solr collection distribution and still have the master server updated with
these changes?
2) I have a requirement for more than one Solr instance (with a
corresponding data directory for each Solr core). Is it possible to maintain
different Solr cores and still achieve Solr collection distribution for each
of these cores independently? If yes, then how?

Regards,
Dilip


  -Original Message-
  From: Bill Au [mailto:[EMAIL PROTECTED]
  Sent: Monday, January 14, 2008 9:40 PM
  To: [EMAIL PROTECTED]
  Subject: Re: Solr replication


  Yes, you need the same changes in scripts.conf on the slave server but you
don't need the post commit hook enabled on the slave server.
  The post commit hook is used to create snapshots.  You will see a new
snapshot in the data directory every time you do a commit on the master
server.  There is no need to create snapshots on the slave server as the
slave server copies the snapshots from the master server.

  The scripts are designed to run under Unix/Linux.  They use symbolic links
and Unix/Linux commands like scp, ssh, rsync, and cp.  I don't know much about
Windows, so I can't say for sure whether all of the Unix/Linux tools used by
the scripts are available there.

  Bill


  On 1/14/08, Dilip.TS [EMAIL PROTECTED] wrote:
Hi Bill,
I am trying to use Solr collection distribution and have made the following
changes:

1)Changes done in Master server on linux
#In scripts.conf file

user=
solr_hostname=localhost
solr_port=8983
rsyncd_port=18983
data_dir=/usr/solr/data/data_tenantID_1
webapp_name=solr
master_host=192.168.168.50
master_data_dir=/usr/solr/data/data_tenantID_1
master_status_dir=/usr/solr/logs

2) Enable the postCommit hook in solrconfig.xml:

<!-- A postCommit event is fired after every commit or optimize command -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/usr/solr/bin/snapshooter</str>
  <str name="dir">/usr/solr/bin</str>
  <bool name="wait">true</bool>
  <!-- <arr name="args"><str>-u jetty-6.1.6</str><str>-d /opt/solr/data</str></arr> -->
  <arr name="env" />
</listener>

I ran the embedded Solr instance, added a document to it, and did a search
for a word on the same server.
I observed the following in the console:

INFO: query parser default operator is OR
Jan 14, 2008 3:37:38 PM org.apache.solr.schema.IndexSchema readSchema
INFO: unique key field: id
Jan 14, 2008 3:37:38 PM org.apache.solr.core.SolrCore init
INFO: Opening new SolrCore at //usr//solr/,
dataDir=//usr//solr//data//data_tenantID_1
Jan 14, 2008 3:37:38 PM org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //[EMAIL PROTECTED]firstSearcher]
Jan 14, 2008 3:37:38 PM org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //[EMAIL PROTECTED]newSearcher]
Jan 14, 2008 3:37:39 PM org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created xslt: org.apache.solr.request.XSLTResponseWriter
Jan 14, 2008 3:37:39 PM org.apache.solr.request.XSLTResponseWriter init
INFO: xsltCacheLifetimeSeconds=5
Jan 14, 2008 3:37:39 PM org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created standard: org.apache.solr.handler.StandardRequestHandler
.
.
.
.
INFO: Opening [EMAIL PROTECTED] main
Jan 14, 2008 3:37:39 PM org.apache.solr.core.SolrCore registerSearcher
INFO: Registered new searcher [EMAIL PROTECTED] main
Jan 14, 2008 3:37:39 PM org.apache.solr.update.UpdateHandler
parseEventListeners
INFO: added SolrEventListener for postCommit:

org.apache.solr.core.RunExecutableListener{exe=/usr/solr/bin/snapshooter,dir
=/usr/solr/bin,wait=true,env=[]}
Jan 14, 2008 3:37:39 PM
org.apache.solr.update.DirectUpdateHandler2$CommitTracker init
INFO: AutoCommit: disabled


In the console output above I can see the postCommit listener

org.apache.solr.core.RunExecutableListener{exe=/usr/solr/bin/snapshooter,dir
=/usr/solr/bin,wait=true,env=[]}

being invoked after doing a commit.
This is the scenario where the add/search is done on the same master server
on Linux.


1) Do we require similar entries in scripts.conf, and the postCommit hook
enabled in solrconfig.xml, on the slave server too?
  If yes, should these entries on the slave server be identical to those on
the master, or different?

2) Also, can we have a Linux machine acting as the master server while the
slave runs on a Windows machine?

Thanks in advance.
Regards
Dilip






-Original Message-
From: Bill Au [mailto:[EMAIL PROTECTED] ]
Sent: Saturday, December 15, 2007 1:08 AM
To: solr-user@lucene.apache.org; [EMAIL PROTECTED]
   

Problem with dismax handler when searching Solr along with field

2008-01-16 Thread farhanali

When I search with a query such as

http://localhost:8983/solr/select/?q=category&qt=dismax

it gives results, but when I want to search on the basis of a field name,
like

http://localhost:8983/solr/select/?q=maincategory:Cars&qt=dismax

it does not give results. However,

http://localhost:8983/solr/select/?q=maincategory:Cars

returns results for cars from the field maincategory.


-- 
View this message in context: 
http://www.nabble.com/Problem-with-dismax-handler-when-searching-Solr-along-with-field-tp14878239p14878239.html
Sent from the Solr - User mailing list archive at Nabble.com.



Indexing two sets of details

2008-01-16 Thread Gavin
Hi,
In the web application we are developing we have two sets of details:
personal details and resume details. We allow 5 different resumes for each
user, but we want the personal details to remain the same across all 5
resumes. The problem is that when the personal details change we have to
update all 5 resumes.
I was thinking that if we index the personal-details fields separately we
would only have to change/update those fields. But the problem is searching
for users using fields from both the personal details and the resume: I
would then have to manually combine both searches, and what if one search
gives more results than the other? I would really appreciate it if anyone
has a suggestion on how I should tackle this problem.


Thanks,
-- 
Gavin Selvaratnam,
Project Leader

hSenid Mobile Solutions
Phone: +94-11-2446623/4 
Fax: +94-11-2307579 

Web: http://www.hSenidMobile.com 
 
Make it happen

Disclaimer: This email and any files transmitted with it are confidential and 
intended solely for 
the use of the individual or entity to which they are addressed. The content 
and opinions 
contained in this email are not necessarily those of hSenid Software 
International. 
If you have received this email in error please contact the sender.



Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Shalin Shekhar Mangar
Look at http://issues.apache.org/jira/browse/SOLR-303

Please note that it is still a work in progress, so you may not be able to
use it immediately.

On Jan 16, 2008 10:53 AM, Srikant Jakilinki [EMAIL PROTECTED] wrote:

 Hi All,

 There is a requirement in our group of indexing and searching several
 millions of documents (TREC) in real-time and millisecond responses.
 For the moment we are preferring scale-out (throw more commodity
 machines) approaches rather than scale-up (faster disks, more
 RAM). This is in turn inspired by the Scale-out vs. Scale-up paper
 (mail me if you want a copy) in which it was proven that this kind of
 distribution scales better and is more resilient.

 So, are there any resources available (Wiki, Tutorials, Slides, README
 etc.) that throw light and guide newbies on how to run Solr in a
 multi-machine scenario? I have gone through the mailing lists and site
 but could not really find any answers or hands-on stuff to do so. An
 ad hoc guideline to get things working with 2 machines might just be
 enough but for the sake of thinking out loud and solicit responses
 from the list, here are my questions:

 1) Solr that has to handle a fairly large index which has to be split
 up on multiple disks (using Multicore?)
 - Space is not a problem since we can use NFS but that is not
 recommended as we would only exploit 1 processor
 2) Solr that has to handle a large collective index which has to be
 split up on multi-machines
 - The index is ever increasing (TB scale) and dynamic and all of it
 has to be searched at any point
 3) Solr that has to exploit multi-machines because we have plenty of
 them in a tightly coupled P2P scenario
 - Machines are not a problem but will they be if they are of varied
 configurations (PIII to Core2; Linux to Vista; 32-bit to 64-bit; J2SE
 1.1 to 1.6)
 4) Solr that has to distribute load on several machines
 - The index(s) could be common though like say using a distributed
 filesystem (Hadoop?)

 In each the above cases (we might use all of these strategies at
 various use cases) the application should use Solr as a strict backend
 and named service (IP or host:port) so that we can expose this
 application (and the service) to the web or intranet. Machine failures
 should be tolerated too. Also, does Solr manage load balancing out of
 the box if it was indeed configured to work with multi-machines?

 Maybe it is superfluous but is Solr and/or Nutch the only way to use
 Lucene in a multi-machine environment? Or is there some hidden
 document/project somewhere that makes it possible by exposing a
 regular Lucene process over the network using RMI or something? It is
 my understanding (could be wrong) that Nutch and to some extent, Solr
 do not perform well when there is a lot of indexing activity in
 parallel to search. Batch processing is also there and perhaps we can
 use Nutch/Solr there. Even so, we need multi-machine directions.

 I am sure that multi-machines make possible for a lot of other ways
 which might solve the goal better and that others have practical
 experience on. So, any advise and tips are also very welcome. We
 intend to document things and do some benchmarking along the way in
 the open spirit.

 Really sorry for the length but I hope some answers are forthcoming.

 Cheers,
 Srikant




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr replication

2008-01-16 Thread Bill Au
My answers inline...

On Jan 16, 2008 3:51 AM, Dilip.TS [EMAIL PROTECTED] wrote:

 Hi Bill,
 I have some questions regarding the SOLR collection distribution.
 1) Is it possible to perform index operations on the slave server using
 Solr collection distribution and still have the master server updated with
 these changes?


No.  The replication process is only one way, from the master to the slave.
The idea behind it is that the slave servers would be for query only and the
number of slaves can
be increased or decreased according to traffic load.


 2) I have a requirement for more than one Solr instance (with a
 corresponding data directory for each Solr core). Is it possible to
 maintain different Solr cores and still achieve Solr collection
 distribution for each of these cores independently? If yes, then how?


Does each Solr instance have its own Solr home?  If so, you can use
replication within each instance by simply adjusting the parameters in
scripts.conf for each instance.  Even if they all share a single Solr home,
the replication-related scripts all have command-line options to override
the values set in scripts.conf:

http://wiki.apache.org/solr/SolrCollectionDistributionScripts

So you can invoke the scripts for each instance by setting the data
directory on the command line.
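
For instance, on the master you might snapshot each instance's index
separately (a sketch only; the option letter and paths are assumptions, so
please check them against the wiki page above, and data_tenantID_2 is just a
made-up second directory):

/usr/solr/bin/snapshooter -d /usr/solr/data/data_tenantID_1
/usr/solr/bin/snapshooter -d /usr/solr/data/data_tenantID_2

The same style of per-invocation override applies to snappuller and
snapinstaller on the slaves.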



 Regards,
 Dilip


  -Original Message-
  From: Bill Au [mailto:[EMAIL PROTECTED]
  Sent: Monday, January 14, 2008 9:40 PM
  To: [EMAIL PROTECTED]
  Subject: Re: Solr replication


  Yes, you need the same changes in scripts.conf on the slave server but
 you
 don't need the post commit hook enabled on the slave server.
  The post commit hook is used to create snapshots.  You will see a new
 snapshot in the data directory every time you do a commit on the master
 server.  There is no need to create snapshots on the slave server as the
 slave server copies the snapshots from the master server.

  The scripts are designed to run under Unix/Linux.  They use symbolic links
 and Unix/Linux commands like scp, ssh, rsync, and cp.  I don't know much about
 Windows, so I can't say for sure whether all of the Unix/Linux tools used by
 the scripts are available there.

  Bill


  On 1/14/08, Dilip.TS [EMAIL PROTECTED] wrote:
Hi Bill,
I am trying to use Solr collection distribution and have made the following
changes:

1)Changes done in Master server on linux
#In scripts.conf file

user=
solr_hostname=localhost
solr_port=8983
rsyncd_port=18983
data_dir=/usr/solr/data/data_tenantID_1
webapp_name=solr
master_host=192.168.168.50
master_data_dir=/usr/solr/data/data_tenantID_1
master_status_dir=/usr/solr/logs

2) Enable the postCommit hook in solrconfig.xml:

<!-- A postCommit event is fired after every commit or optimize command -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/usr/solr/bin/snapshooter</str>
  <str name="dir">/usr/solr/bin</str>
  <bool name="wait">true</bool>
  <!-- <arr name="args"><str>-u jetty-6.1.6</str><str>-d /opt/solr/data</str></arr> -->
  <arr name="env" />
</listener>

I ran the embedded Solr instance, added a document to it, and did a search
for a word on the same server.
I observed the following in the console:

INFO: query parser default operator is OR
Jan 14, 2008 3:37:38 PM org.apache.solr.schema.IndexSchema readSchema
INFO: unique key field: id
Jan 14, 2008 3:37:38 PM org.apache.solr.core.SolrCore init
INFO: Opening new SolrCore at //usr//solr/,
dataDir=//usr//solr//data//data_tenantID_1
Jan 14, 2008 3:37:38 PM org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //[EMAIL PROTECTED]firstSearcher]
Jan 14, 2008 3:37:38 PM org.apache.solr.core.SolrCore parseListener
INFO: Searching for listeners: //[EMAIL PROTECTED]newSearcher]
Jan 14, 2008 3:37:39 PM
 org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created xslt: org.apache.solr.request.XSLTResponseWriter
Jan 14, 2008 3:37:39 PM org.apache.solr.request.XSLTResponseWriter init
INFO: xsltCacheLifetimeSeconds=5
Jan 14, 2008 3:37:39 PM
 org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created standard: org.apache.solr.handler.StandardRequestHandler
.
.
.
.
INFO: Opening [EMAIL PROTECTED] main
Jan 14, 2008 3:37:39 PM org.apache.solr.core.SolrCore registerSearcher
INFO: Registered new searcher [EMAIL PROTECTED] main
Jan 14, 2008 3:37:39 PM org.apache.solr.update.UpdateHandler
parseEventListeners
INFO: added SolrEventListener for postCommit:

 org.apache.solr.core.RunExecutableListener{exe=/usr/solr/bin/snapshooter
 ,dir
=/usr/solr/bin,wait=true,env=[]}
Jan 14, 2008 3:37:39 PM
org.apache.solr.update.DirectUpdateHandler2$CommitTracker init
INFO: AutoCommit: disabled


In the above console i find postCommit:

 

Re: Indexing very large files.

2008-01-16 Thread David Thibault
All,
I just found a thread about this on the mailing list archives because I'm
troubleshooting the same problem.  The kicker is that it doesn't take such
large files to kill the StringBuilder.  I have discovered the following:

By using a text file made up of  3,443,464 bytes or less, I get no error.

AT 3,443,465 bytes:


Exception in thread main java.lang.OutOfMemoryError: Java heap space

at java.lang.String.init(String.java:208)

at java.lang.StringBuilder.toString(StringBuilder.java:431)

at org.junit.Assert.format(Assert.java:321)

at org.junit.ComparisonFailure$ComparisonCompactor.compact(
ComparisonFailure.java:80)

at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java:37)

at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)

at java.lang.Throwable.toString(Throwable.java:344)

at java.lang.String.valueOf(String.java:2615)

at java.io.PrintWriter.print(PrintWriter.java:546)

at java.io.PrintWriter.println(PrintWriter.java:683)

at java.lang.Throwable.printStackTrace(Throwable.java:510)

at org.apache.tools.ant.util.StringUtils.getStackTrace(
StringUtils.java:96)

at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace
(JUnitTestRunner.java:856)

at
org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError
(XMLJUnitResultFormatter.java:280)

at
org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError
(XMLJUnitResultFormatter.java:255)

at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(
JUnitTestRunner.java:988)

at junit.framework.TestResult.addError(TestResult.java:38)

at junit.framework.JUnit4TestAdapterCache$1.testFailure(
JUnit4TestAdapterCache.java:51)

at org.junit.runner.notification.RunNotifier$4.notifyListener(
RunNotifier.java:96)

at org.junit.runner.notification.RunNotifier$SafeNotifier.run(
RunNotifier.java:37)

at org.junit.runner.notification.RunNotifier.fireTestFailure(
RunNotifier.java:93)

at org.junit.internal.runners.TestMethodRunner.addFailure(
TestMethodRunner.java:104)

at org.junit.internal.runners.TestMethodRunner.runUnprotected(
TestMethodRunner.java:87)

at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
BeforeAndAfterRunner.java:34)

at org.junit.internal.runners.TestMethodRunner.runMethod(
TestMethodRunner.java:75)

at org.junit.internal.runners.TestMethodRunner.run(
TestMethodRunner.java:45)

at
org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(
TestClassMethodsRunner.java:71)

at org.junit.internal.runners.TestClassMethodsRunner.run(
TestClassMethodsRunner.java:35)

at org.junit.internal.runners.TestClassRunner$1.runUnprotected(
TestClassRunner.java:42)

at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
BeforeAndAfterRunner.java:34)

at org.junit.internal.runners.TestClassRunner.run(
TestClassRunner.java:52)

at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:32)



AT 3,443,466 bytes (or more):


Exception in thread main java.lang.OutOfMemoryError: Java heap space

at java.lang.AbstractStringBuilder.expandCapacity(
AbstractStringBuilder.java:99)

at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java
:393)

at java.lang.StringBuilder.append(StringBuilder.java:120)

at org.junit.Assert.format(Assert.java:321)

at org.junit.ComparisonFailure$ComparisonCompactor.compact(
ComparisonFailure.java:80)

at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java:37)

at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)

at java.lang.Throwable.toString(Throwable.java:344)

at java.lang.String.valueOf(String.java:2615)

at java.io.PrintWriter.print(PrintWriter.java:546)

at java.io.PrintWriter.println(PrintWriter.java:683)

at java.lang.Throwable.printStackTrace(Throwable.java:510)

at org.apache.tools.ant.util.StringUtils.getStackTrace(
StringUtils.java:96)

at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace
(JUnitTestRunner.java:856)

at
org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError
(XMLJUnitResultFormatter.java:280)

at
org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError
(XMLJUnitResultFormatter.java:255)

at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(
JUnitTestRunner.java:988)

at junit.framework.TestResult.addError(TestResult.java:38)

at junit.framework.JUnit4TestAdapterCache$1.testFailure(
JUnit4TestAdapterCache.java:51)

at org.junit.runner.notification.RunNotifier$4.notifyListener(
RunNotifier.java:96)

at 

Cache size and Heap size

2008-01-16 Thread Evgeniy Strokin
Hello,
I have a relatively large amount of RAM (10 GB) on the server which is running Solr. I 
increased the cache settings and started to see OutOfMemory exceptions, especially on 
facet searches.
Does anybody have suggestions on how the cache settings relate to memory 
consumption? What are optimal settings? How can they be calculated?
 
Thank you for any advise,
Gene

conceptual issues with solr

2008-01-16 Thread Philippe Guillard
Hi here,

It seems that Lucene accepts any kind of XML document, but Solr accepts only
flat name/value pairs inside a document to be indexed.
You'll find below what I'd like to do. Thanks for help of any kind!

Phil


I need to index products (hotels) which have a price by date, then search
them by date or date range and price range.
Is there a way to do that with Solr?

At the moment I have a document for each hotel:
<add>
<doc>
<field name="url">http:///yyy</field>
<field name="id">1</field>
<field name="name">Hotel Opera</field>
<field name="category">4 stars</field>
...
</doc>
</add>

I would need to add my date/price values like this, but it is forbidden in
Solr indexing:
<date value="30/01/2008" price="200"/>
<date value="31/01/2008" price="150"/>

Otherwise I could define a default field (being an integer) and have as many
fields as dates, like this:
<field name="30/01/2008">200</field>
<field name="31/01/2008">150</field>
Indexing would accept it, but I think I would not be able to search or sort
by date.

The only solution I have found so far is to create a document for each
date/price:
<add>
<doc>
<field name="url">http:///yyy</field>
<field name="id">1</field>
<field name="name">Hotel Opera</field>
<field name="date">30/01/2008</field>
<field name="price">200</field>
</doc>
<doc>
<field name="url">http:///yyy</field>
<field name="id">1</field>
<field name="name">Hotel Opera</field>
<field name="date">31/01/2008</field>
<field name="price">150</field>
</doc>
</add>
Then I'll have many documents for one hotel, and in order to search by date
range I would need even more documents, like this:
<field name="date-range">28/01/2008 to 31/01/2008</field>
<field name="date-range">29/01/2008 to 31/01/2008</field>
<field name="date-range">30/01/2008 to 31/01/2008</field>

Since I need to index a lot of other information about a hotel (address,
telephone, amenities, etc.), I wouldn't like to duplicate too much
information, and I think it would not be scalable to search first in a dates
index and then in a hotels index to retrieve the hotel information.

Any idea?


Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
I don't think this is a StringBuilder limitation, but rather your Java
JVM doesn't start with enough memory. i.e. -Xmx.

In raw Lucene, I've indexed 240M files

Best
Erick

On Jan 16, 2008 10:12 AM, David Thibault [EMAIL PROTECTED]
wrote:

 All,
 I just found a thread about this on the mailing list archives because I'm
 troubleshooting the same problem.  The kicker is that it doesn't take such
 large files to kill the StringBuilder.  I have discovered the following:

 By using a text file made up of  3,443,464 bytes or less, I get no error.

 AT 3,443,465 bytes:


 Exception in thread main java.lang.OutOfMemoryError: Java heap space

at java.lang.String.init(String.java:208)

at java.lang.StringBuilder.toString(StringBuilder.java:431)

at org.junit.Assert.format(Assert.java:321)

at org.junit.ComparisonFailure$ComparisonCompactor.compact(
 ComparisonFailure.java:80)

at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java
 :37)

at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)

at java.lang.Throwable.toString(Throwable.java:344)

at java.lang.String.valueOf(String.java:2615)

at java.io.PrintWriter.print(PrintWriter.java:546)

at java.io.PrintWriter.println(PrintWriter.java:683)

at java.lang.Throwable.printStackTrace(Throwable.java:510)

at org.apache.tools.ant.util.StringUtils.getStackTrace(
 StringUtils.java:96)

at

 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace
 (JUnitTestRunner.java:856)

at

 org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError
 (XMLJUnitResultFormatter.java:280)

at

 org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError
 (XMLJUnitResultFormatter.java:255)

at
 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(
 JUnitTestRunner.java:988)

at junit.framework.TestResult.addError(TestResult.java:38)

at junit.framework.JUnit4TestAdapterCache$1.testFailure(
 JUnit4TestAdapterCache.java:51)

at org.junit.runner.notification.RunNotifier$4.notifyListener(
 RunNotifier.java:96)

at org.junit.runner.notification.RunNotifier$SafeNotifier.run(
 RunNotifier.java:37)

at org.junit.runner.notification.RunNotifier.fireTestFailure(
 RunNotifier.java:93)

at org.junit.internal.runners.TestMethodRunner.addFailure(
 TestMethodRunner.java:104)

at org.junit.internal.runners.TestMethodRunner.runUnprotected(
 TestMethodRunner.java:87)

at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
 BeforeAndAfterRunner.java:34)

at org.junit.internal.runners.TestMethodRunner.runMethod(
 TestMethodRunner.java:75)

at org.junit.internal.runners.TestMethodRunner.run(
 TestMethodRunner.java:45)

at
 org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(
 TestClassMethodsRunner.java:71)

at org.junit.internal.runners.TestClassMethodsRunner.run(
 TestClassMethodsRunner.java:35)

at org.junit.internal.runners.TestClassRunner$1.runUnprotected(
 TestClassRunner.java:42)

at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
 BeforeAndAfterRunner.java:34)

at org.junit.internal.runners.TestClassRunner.run(
 TestClassRunner.java:52)

at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:32)



 AT 3,443,466 bytes (or more):


 Exception in thread main java.lang.OutOfMemoryError: Java heap space

at java.lang.AbstractStringBuilder.expandCapacity(
 AbstractStringBuilder.java:99)

at java.lang.AbstractStringBuilder.append(
 AbstractStringBuilder.java
 :393)

at java.lang.StringBuilder.append(StringBuilder.java:120)

at org.junit.Assert.format(Assert.java:321)

at org.junit.ComparisonFailure$ComparisonCompactor.compact(
 ComparisonFailure.java:80)

at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java
 :37)

at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)

at java.lang.Throwable.toString(Throwable.java:344)

at java.lang.String.valueOf(String.java:2615)

at java.io.PrintWriter.print(PrintWriter.java:546)

at java.io.PrintWriter.println(PrintWriter.java:683)

at java.lang.Throwable.printStackTrace(Throwable.java:510)

at org.apache.tools.ant.util.StringUtils.getStackTrace(
 StringUtils.java:96)

at

 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace
 (JUnitTestRunner.java:856)

at

 org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError
 (XMLJUnitResultFormatter.java:280)

at

 org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError
 (XMLJUnitResultFormatter.java:255)

at
 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(
 

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
P.S. Lucene by default limits the maximum field length
to 10K tokens, so you have to bump that for large files.
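
For example, in solrconfig.xml (a sketch only; the value below is simply a
very large number, and the element appears under the indexDefaults/mainIndex
sections of the example config):

<maxFieldLength>2147483647</maxFieldLength>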

Erick

On Jan 16, 2008 11:04 AM, Erick Erickson [EMAIL PROTECTED] wrote:

 I don't think this is a StringBuilder limitation, but rather your Java
 JVM doesn't start with enough memory. i.e. -Xmx.

 In raw Lucene, I've indexed 240M files

 Best
 Erick


 On Jan 16, 2008 10:12 AM, David Thibault [EMAIL PROTECTED]
 wrote:

  All,
  I just found a thread about this on the mailing list archives because
  I'm
  troubleshooting the same problem.  The kicker is that it doesn't take
  such
  large files to kill the StringBuilder.  I have discovered the following:
 
 
  By using a text file made up of  3,443,464 bytes or less, I get no
  error.
 
  AT 3,443,465 bytes:
 
 
  Exception in thread main java.lang.OutOfMemoryError: Java heap space
 
 at java.lang.String .init(String.java:208)
 
 at java.lang.StringBuilder.toString(StringBuilder.java:431)
 
 at org.junit.Assert.format(Assert.java:321)
 
 at org.junit.ComparisonFailure$ComparisonCompactor.compact (
  ComparisonFailure.java:80)
 
 at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java
  :37)
 
 at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
 
 at java.lang.Throwable.toString (Throwable.java:344)
 
 at java.lang.String.valueOf(String.java:2615)
 
 at java.io.PrintWriter.print(PrintWriter.java:546)
 
 at java.io.PrintWriter.println(PrintWriter.java:683)
 
 at java.lang.Throwable.printStackTrace(Throwable.java:510)
 
 at org.apache.tools.ant.util.StringUtils.getStackTrace(
  StringUtils.java:96)
 
 at
 
  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace
  (JUnitTestRunner.java:856)
 
 at
 
  org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError
  (XMLJUnitResultFormatter.java:280)
 
 at
 
  org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError
  (XMLJUnitResultFormatter.java:255)
 
 at
  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(
  JUnitTestRunner.java:988)
 
 at junit.framework.TestResult.addError(TestResult.java :38)
 
 at junit.framework.JUnit4TestAdapterCache$1.testFailure(
  JUnit4TestAdapterCache.java:51)
 
 at org.junit.runner.notification.RunNotifier$4.notifyListener(
  RunNotifier.java:96)
 
 at org.junit.runner.notification.RunNotifier$SafeNotifier.run(
  RunNotifier.java:37)
 
 at org.junit.runner.notification.RunNotifier.fireTestFailure(
  RunNotifier.java:93)
 
 at org.junit.internal.runners.TestMethodRunner.addFailure (
  TestMethodRunner.java:104)
 
 at org.junit.internal.runners.TestMethodRunner.runUnprotected(
  TestMethodRunner.java:87)
 
 at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
  BeforeAndAfterRunner.java:34)
 
 at org.junit.internal.runners.TestMethodRunner.runMethod(
  TestMethodRunner.java:75)
 
 at org.junit.internal.runners.TestMethodRunner.run(
  TestMethodRunner.java :45)
 
 at
  org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(
  TestClassMethodsRunner.java:71)
 
 at org.junit.internal.runners.TestClassMethodsRunner.run(
  TestClassMethodsRunner.java :35)
 
 at org.junit.internal.runners.TestClassRunner$1.runUnprotected(
  TestClassRunner.java:42)
 
 at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
  BeforeAndAfterRunner.java:34)
 
 at org.junit.internal.runners.TestClassRunner.run(
  TestClassRunner.java:52)
 
 at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java
  :32)
 
 
 
   AT 3,443,466 bytes (or more):
 
 
  Exception in thread main java.lang.OutOfMemoryError: Java heap space
 
 at java.lang.AbstractStringBuilder.expandCapacity(
  AbstractStringBuilder.java:99)
 
 at java.lang.AbstractStringBuilder.append (
  AbstractStringBuilder.java
  :393)
 
 at java.lang.StringBuilder.append(StringBuilder.java:120)
 
 at org.junit.Assert.format(Assert.java:321)
 
 at org.junit.ComparisonFailure$ComparisonCompactor.compact (
  ComparisonFailure.java:80)
 
 at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java
  :37)
 
 at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
 
 at java.lang.Throwable.toString (Throwable.java:344)
 
 at java.lang.String.valueOf(String.java:2615)
 
 at java.io.PrintWriter.print(PrintWriter.java:546)
 
 at java.io.PrintWriter.println(PrintWriter.java:683)
 
 at java.lang.Throwable.printStackTrace(Throwable.java:510)
 
 at org.apache.tools.ant.util.StringUtils.getStackTrace(
  StringUtils.java:96)
 
 at
 
  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace
  (JUnitTestRunner.java:856)
 

Re: Indexing very large files.

2008-01-16 Thread David Thibault
I think your PS might do the trick.  My JVM doesn't seem to be the issue,
because I've set it to -Xmx512m -Xms256m.  I will track down the solr config
parameter you mentioned and try that.  Thanks for the quick response!
Dave

On 1/16/08, Erick Erickson [EMAIL PROTECTED] wrote:

 P.S. Lucene by default limits the maximum field length
 to 10K tokens, so you have to bump that for large files.

 Erick

 On Jan 16, 2008 11:04 AM, Erick Erickson [EMAIL PROTECTED] wrote:

  I don't think this is a StringBuilder limitation, but rather your Java
  JVM doesn't start with enough memory. i.e. -Xmx.
 
  In raw Lucene, I've indexed 240M files
 
  Best
  Erick
 
 
  On Jan 16, 2008 10:12 AM, David Thibault [EMAIL PROTECTED]
  wrote:
 
   All,
   I just found a thread about this on the mailing list archives because
   I'm
   troubleshooting the same problem.  The kicker is that it doesn't take
   such
   large files to kill the StringBuilder.  I have discovered the
 following:
  
  
   By using a text file made up of  3,443,464 bytes or less, I get no
   error.
  
   AT 3,443,465 bytes:
  
  
   Exception in thread main java.lang.OutOfMemoryError: Java heap space
  
  at java.lang.String .init(String.java:208)
  
  at java.lang.StringBuilder.toString(StringBuilder.java:431)
  
  at org.junit.Assert.format(Assert.java:321)
  
  at org.junit.ComparisonFailure$ComparisonCompactor.compact (
   ComparisonFailure.java:80)
  
  at org.junit.ComparisonFailure.getMessage(
 ComparisonFailure.java
   :37)
  
  at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
  
  at java.lang.Throwable.toString (Throwable.java:344)
  
  at java.lang.String.valueOf(String.java:2615)
  
  at java.io.PrintWriter.print(PrintWriter.java:546)
  
  at java.io.PrintWriter.println(PrintWriter.java:683)
  
  at java.lang.Throwable.printStackTrace(Throwable.java:510)
  
  at org.apache.tools.ant.util.StringUtils.getStackTrace(
   StringUtils.java:96)
  
  at
  
  
 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace
   (JUnitTestRunner.java:856)
  
  at
  
  
 org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError
   (XMLJUnitResultFormatter.java:280)
  
  at
  
  
 org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError
   (XMLJUnitResultFormatter.java:255)
  
  at
  
 org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(
   JUnitTestRunner.java:988)
  
  at junit.framework.TestResult.addError(TestResult.java :38)
  
  at junit.framework.JUnit4TestAdapterCache$1.testFailure(
   JUnit4TestAdapterCache.java:51)
  
  at org.junit.runner.notification.RunNotifier$4.notifyListener(
   RunNotifier.java:96)
  
  at org.junit.runner.notification.RunNotifier$SafeNotifier.run(
   RunNotifier.java:37)
  
  at org.junit.runner.notification.RunNotifier.fireTestFailure(
   RunNotifier.java:93)
  
  at org.junit.internal.runners.TestMethodRunner.addFailure (
   TestMethodRunner.java:104)
  
  at org.junit.internal.runners.TestMethodRunner.runUnprotected(
   TestMethodRunner.java:87)
  
  at org.junit.internal.runners.BeforeAndAfterRunner.runProtected
 (
   BeforeAndAfterRunner.java:34)
  
  at org.junit.internal.runners.TestMethodRunner.runMethod(
   TestMethodRunner.java:75)
  
  at org.junit.internal.runners.TestMethodRunner.run(
   TestMethodRunner.java :45)
  
  at
   org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(
   TestClassMethodsRunner.java:71)
  
  at org.junit.internal.runners.TestClassMethodsRunner.run(
   TestClassMethodsRunner.java :35)
  
  at org.junit.internal.runners.TestClassRunner$1.runUnprotected(
   TestClassRunner.java:42)
  
  at org.junit.internal.runners.BeforeAndAfterRunner.runProtected
 (
   BeforeAndAfterRunner.java:34)
  
  at org.junit.internal.runners.TestClassRunner.run(
   TestClassRunner.java:52)
  
  at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java
   :32)
  
  
  
    AT 3,443,466 bytes (or more):
  
  
   Exception in thread main java.lang.OutOfMemoryError: Java heap space
  
  at java.lang.AbstractStringBuilder.expandCapacity(
   AbstractStringBuilder.java:99)
  
  at java.lang.AbstractStringBuilder.append (
   AbstractStringBuilder.java
   :393)
  
  at java.lang.StringBuilder.append(StringBuilder.java:120)
  
  at org.junit.Assert.format(Assert.java:321)
  
  at org.junit.ComparisonFailure$ComparisonCompactor.compact (
   ComparisonFailure.java:80)
  
  at org.junit.ComparisonFailure.getMessage(
 ComparisonFailure.java
   :37)
  
  at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
  
  at java.lang.Throwable.toString (Throwable.java:344)
  
  at 

Re: Indexing very large files.

2008-01-16 Thread David Thibault
I tried raising the maxFieldLength setting under
mainIndex as well as indexDefaults and still no luck. I'm trying to
upload a text file that is about 8 MB in size.  I think the following stack
trace still points to some sort of overflowed String issue.  Thoughts?
Solr returned an error: Java heap space  java.lang.OutOfMemoryError: Java
heap space
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
at java.lang.StringCoding.encode(StringCoding.java:272)
at java.lang.String.getBytes(String.java:947)
at org.apache.lucene.index.FieldsWriter.addDocument(FieldsWriter.java:98)
at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:107)

at org.apache.lucene.index.IndexWriter.buildSingleDocSegment(
IndexWriter.java:977)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:965)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:947)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(
DirectUpdateHandler2.java:270)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(
XmlUpdateRequestHandler.java:166)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(
XmlUpdateRequestHandler.java:84)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(
RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)  at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)

at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
SolrDispatchFilter.java:159)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(
ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke(
StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(
StandardContextValve.java:174)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java
:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java
:117)
 at org.apache.catalina.core.StandardEngineValve.invoke(
StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)

at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)

at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
PoolTcpEndpoint.java:528)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
LeaderFollowerWorkerThread.java:81)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:619)

java.io.IOException: Server returned HTTP response code: 500 for URL:
http://solr:8080/solr/update
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(
HttpURLConnection.java:1170)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.postData(
SimplePostTool.java:134)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.postFile(
SimplePostTool.java:87)
at com.itstrategypartners.sents.solrUpload.Uploader.uploadFile(
Uploader.java:97)
at com.itstrategypartners.sents.solrUpload.UploaderTest.uploadFile(
UploaderTest.java:95)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.junit.internal.runners.TestMethodRunner.executeMethodBody(
TestMethodRunner.java:99)
at org.junit.internal.runners.TestMethodRunner.runUnprotected(
TestMethodRunner.java:81)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
BeforeAndAfterRunner.java:34)
at org.junit.internal.runners.TestMethodRunner.runMethod(
TestMethodRunner.java:75)
at org.junit.internal.runners.TestMethodRunner.run(
TestMethodRunner.java:45)
at
org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(
TestClassMethodsRunner.java:71)
at org.junit.internal.runners.TestClassMethodsRunner.run(
TestClassMethodsRunner.java:35)
at org.junit.internal.runners.TestClassRunner$1.runUnprotected(
TestClassRunner.java:42)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
BeforeAndAfterRunner.java:34)
at org.junit.internal.runners.TestClassRunner.run(
TestClassRunner.java:52)
at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:32)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(
JUnitTestRunner.java:421)
at
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(
JUnitTestRunner.java:912)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
The PS really wasn't related to your OOM, and raising that shouldn't
have changed the behavior. All that happens if you go beyond 10,000
tokens is that the rest gets thrown away.

But we're beyond my real knowledge level about SOLR, so I'll defer
to others. A very quick-n-dirty test as to whether you're actually
allocating more memory to the process you *think* you are would be
to bump it ridiculously higher. I'm completely unclear about what
process gets the increased memory relative to the server.

[EMAIL PROTECTED]


On Jan 16, 2008 11:33 AM, David Thibault [EMAIL PROTECTED]
wrote:

 I tried raising the maxFieldLength setting under
 mainIndex as well as indexDefaults and still no luck. I'm trying to
 upload a text file that is about 8 MB in size.  I think the following
 stack
 trace still points to some sort of overflowed String issue.  Thoughts?
 Solr returned an error: Java heap space  java.lang.OutOfMemoryError: Java
 heap space
 at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
 at java.lang.StringCoding.encode(StringCoding.java:272)
 at java.lang.String.getBytes(String.java:947)
 at org.apache.lucene.index.FieldsWriter.addDocument(FieldsWriter.java:98)
 at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java
 :107)

 at org.apache.lucene.index.IndexWriter.buildSingleDocSegment(
 IndexWriter.java:977)
 at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:965)
 at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:947)
 at org.apache.solr.update.DirectUpdateHandler2.addDoc(
 DirectUpdateHandler2.java:270)
 at org.apache.solr.handler.XmlUpdateRequestHandler.update(
 XmlUpdateRequestHandler.java:166)
 at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(
 XmlUpdateRequestHandler.java:84)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(
 RequestHandlerBase.java:77)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)  at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java
 :191)

 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
 SolrDispatchFilter.java:159)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
 ApplicationFilterChain.java:215)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(
 ApplicationFilterChain.java:188)
 at org.apache.catalina.core.StandardWrapperValve.invoke(
 StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke(
 StandardContextValve.java:174)
  at org.apache.catalina.core.StandardHostValve.invoke(
 StandardHostValve.java
 :127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(
 ErrorReportValve.java
 :117)
  at org.apache.catalina.core.StandardEngineValve.invoke(
 StandardEngineValve.java:108)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
 :151)

 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java
 :874)

 at

 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
 (Http11BaseProtocol.java:665)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
 PoolTcpEndpoint.java:528)
 at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
 LeaderFollowerWorkerThread.java:81)
  at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
 ThreadPool.java:689)
 at java.lang.Thread.run(Thread.java:619)

 java.io.IOException: Server returned HTTP response code: 500 for URL:
 http://solr:8080/solr/update
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(
 HttpURLConnection.java:1170)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.postData(
 SimplePostTool.java:134)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.postFile(
 SimplePostTool.java:87)
at com.itstrategypartners.sents.solrUpload.Uploader.uploadFile(
 Uploader.java:97)
at com.itstrategypartners.sents.solrUpload.UploaderTest.uploadFile(
 UploaderTest.java:95)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
 NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
 DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.junit.internal.runners.TestMethodRunner.executeMethodBody(
 TestMethodRunner.java:99)
at org.junit.internal.runners.TestMethodRunner.runUnprotected(
 TestMethodRunner.java:81)
at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
 BeforeAndAfterRunner.java:34)
at org.junit.internal.runners.TestMethodRunner.runMethod(
 TestMethodRunner.java:75)
at org.junit.internal.runners.TestMethodRunner.run(
 TestMethodRunner.java:45)
at
 org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(
 TestClassMethodsRunner.java:71)
at org.junit.internal.runners.TestClassMethodsRunner.run(
 

Re: Indexing very large files.

2008-01-16 Thread Walter Underwood
This error means that the JVM has run out of heap space. Increase the
heap space. That is an option on the java command. I set my heap to
200 Meg and do it this way with Tomcat 6:

JAVA_OPTS=-Xmx600M tomcat/bin/startup.sh

wunder

On 1/16/08 8:33 AM, David Thibault [EMAIL PROTECTED] wrote:

 java.lang.OutOfMemoryError: Java heap space



Re: Indexing very large files.

2008-01-16 Thread David Thibault
Nice signature...=)

On 1/16/08, Erick Erickson [EMAIL PROTECTED] wrote:

 The PS really wasn't related to your OOM, and raising that shouldn't
 have changed the behavior. All that happens if you go beyond 10,000
 tokens is that the rest gets thrown away.

 But we're beyond my real knowledge level about SOLR, so I'll defer
 to others. A very quick-n-dirty test as to whether you're actually
 allocating more memory to the process you *think* you are would be
 to bump it ridiculously higher. I'm completely unclear about what
 process gets the increased memory relative to the server.

 [EMAIL PROTECTED]


 On Jan 16, 2008 11:33 AM, David Thibault [EMAIL PROTECTED]
 wrote:

  I tried raising the maxFieldLength setting under
  mainIndex as well as indexDefaults and still no luck. I'm trying to
  upload a text file that is about 8 MB in size.  I think the following
  stack
  trace still points to some sort of overflowed String issue.  Thoughts?
  Solr returned an error: Java heap space  java.lang.OutOfMemoryError:
 Java
  heap space
  at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
  at java.lang.StringCoding.encode(StringCoding.java:272)
  at java.lang.String.getBytes(String.java:947)
  at org.apache.lucene.index.FieldsWriter.addDocument(FieldsWriter.java
 :98)
  at org.apache.lucene.index.DocumentWriter.addDocument(
 DocumentWriter.java
  :107)
 
  at org.apache.lucene.index.IndexWriter.buildSingleDocSegment(
  IndexWriter.java:977)
  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:965)
  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:947)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(
  DirectUpdateHandler2.java:270)
  at org.apache.solr.handler.XmlUpdateRequestHandler.update(
  XmlUpdateRequestHandler.java:166)
  at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(
  XmlUpdateRequestHandler.java:84)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(
  RequestHandlerBase.java:77)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)  at
  org.apache.solr.servlet.SolrDispatchFilter.execute(
 SolrDispatchFilter.java
  :191)
 
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
  SolrDispatchFilter.java:159)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
  ApplicationFilterChain.java:215)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(
  ApplicationFilterChain.java:188)
  at org.apache.catalina.core.StandardWrapperValve.invoke(
  StandardWrapperValve.java:213)
  at org.apache.catalina.core.StandardContextValve.invoke(
  StandardContextValve.java:174)
   at org.apache.catalina.core.StandardHostValve.invoke(
  StandardHostValve.java
  :127)
   at org.apache.catalina.valves.ErrorReportValve.invoke(
  ErrorReportValve.java
  :117)
   at org.apache.catalina.core.StandardEngineValve.invoke(
  StandardEngineValve.java:108)
  at org.apache.catalina.connector.CoyoteAdapter.service(
 CoyoteAdapter.java
  :151)
 
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java
  :874)
 
  at
 
 
 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
  (Http11BaseProtocol.java:665)
  at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
  PoolTcpEndpoint.java:528)
  at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
  LeaderFollowerWorkerThread.java:81)
   at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
  ThreadPool.java:689)
  at java.lang.Thread.run(Thread.java:619)
 
  java.io.IOException: Server returned HTTP response code: 500 for URL:
  http://solr:8080/solr/update
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(
  HttpURLConnection.java:1170)
 at
 com.itstrategypartners.sents.solrUpload.SimplePostTool.postData(
  SimplePostTool.java:134)
 at
 com.itstrategypartners.sents.solrUpload.SimplePostTool.postFile(
  SimplePostTool.java:87)
 at com.itstrategypartners.sents.solrUpload.Uploader.uploadFile(
  Uploader.java:97)
 at
 com.itstrategypartners.sents.solrUpload.UploaderTest.uploadFile(
  UploaderTest.java:95)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
  NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
  DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.junit.internal.runners.TestMethodRunner.executeMethodBody(
  TestMethodRunner.java:99)
 at org.junit.internal.runners.TestMethodRunner.runUnprotected(
  TestMethodRunner.java:81)
 at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(
  BeforeAndAfterRunner.java:34)
 at org.junit.internal.runners.TestMethodRunner.runMethod(
  TestMethodRunner.java:75)
 at org.junit.internal.runners.TestMethodRunner.run(
  TestMethodRunner.java:45)
 

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Walter and all,
I had been bumping up the heap for my Java app (running outside of Tomcat)
but I hadn't yet tried bumping up my Tomcat heap.  That seems to have helped
me upload the 8 MB file, but it's crashing while uploading a 32 MB file now. I
just bumped Tomcat to 1024 MB of heap, so I'm not sure what the problem is
now.  I suspect Walter was on to something, since it sort of fixed my
problem.  I will keep troubleshooting the Tomcat memory and go from there.

Best,
Dave

On 1/16/08, Walter Underwood [EMAIL PROTECTED] wrote:

 This error means that the JVM has run out of heap space. Increase the
 heap space. That is an option on the java command. I set my heap to
 200 Meg and do it this way with Tomcat 6:

 JAVA_OPTS=-Xmx600M tomcat/bin/startup.sh

 wunder

 On 1/16/08 8:33 AM, David Thibault [EMAIL PROTECTED] wrote:

  java.lang.OutOfMemoryError: Java heap space




Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Srikant Jakilinki
Thanks for that Shalin. Looks like I have to wait and keep track of 
developments.


Forgetting about indexes that cannot be fit on a single machine 
(distributed search), any links to have Solr running in a 2-machine 
environment? I want to measure how much improvement there will be in 
performance with the addition of machines for computation (space later) 
and I need a 2-machine setup for that.


Thanks
Srikant

Shalin Shekhar Mangar wrote:

Look at http://issues.apache.org/jira/browse/SOLR-303

Please note that it is still work in progress. So you may not be able to use
it immediately.



--
Find out how you can get spam free email.
http://www.bluebottle.com/tag/3



Re: Cache size and Heap size

2008-01-16 Thread evgeniy . strokin
I'm using Tomcat. I set the max heap size to 5 GB and checked in a profiler that it 
actually uses the whole memory. There is no significant memory use by other 
applications.
The whole change was that I increased the size of the cache to:
LRU Cache(maxSize=1048576, initialSize=1048576, autowarmCount=524288, [EMAIL PROTECTED])
I know this is a lot and I'm going to decrease it; I was just experimenting, 
but I need some guidelines on how to calculate the right size of the cache.
 
Thank you
Gene



- Original Message 
From: Daniel Alheiros [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 10:48:50 AM
Subject: Re: Cache size and Heap size

Hi Gene.

Have you set your app server / servlet container to allocate some of
this memory to the application?

You can define the maximum and minimum heap size by adding/replacing some
parameters on the app server initialization:

-Xmx1536m -Xms1536m

Which app server / servlet container are you using?

Regards,
Daniel Alheiros

On 16/1/08 15:23, Evgeniy Strokin [EMAIL PROTECTED] wrote:

 Hello,
 I have a relatively large amount of RAM (10 GB) on the server which is running Solr. I
 increased the cache settings and started to see OutOfMemory exceptions, especially on
 facet searches.
 Does anybody have suggestions on how the cache settings relate to memory
 consumption? What are optimal settings? How can they be calculated?
  
 Thank you for any advise,
 Gene


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Shalin Shekhar Mangar
Solr provides a few scripts to create a multiple-machine deployment. One box
is set up as the master (used primarily for writes) and the others as slaves.
Slaves are added as per application requirements. The index is transferred
using rsync. Look at http://wiki.apache.org/solr/CollectionDistribution for
details.

You can put the slaves behind a load balancer or share the slaves among your
front-end servers to measure performance.
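
As a rough sketch of the moving parts (script names per the wiki page above;
the schedule and paths here are assumptions): the master's postCommit hook in
solrconfig.xml runs snapshooter after each commit, and each slave pulls and
installs the latest snapshot from cron, e.g.

*/5 * * * * /usr/solr/bin/snappuller && /usr/solr/bin/snapinstaller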

On Jan 17, 2008 12:39 AM, Srikant Jakilinki [EMAIL PROTECTED]
wrote:

 Thanks for that Shalin. Looks like I have to wait and keep track of
 developments.

 Forgetting about indexes that cannot be fit on a single machine
 (distributed search), any links to have Solr running in a 2-machine
 environment? I want to measure how much improvement there will be in
 performance with the addition of machines for computation (space later)
 and I need a 2-machine setup for that.

 Thanks
 Srikant

 Shalin Shekhar Mangar wrote:
  Look at http://issues.apache.org/jira/browse/SOLR-303
 
  Please note that it is still work in progress. So you may not be able to
 use
   it immediately.
 

 --
 Find out how you can get spam free email.
 http://www.bluebottle.com/tag/3




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Mike Klaas

On 16-Jan-08, at 11:09 AM, Srikant Jakilinki wrote:

Thanks for that Shalin. Looks like I have to wait and keep track of  
developments.


Forgetting about indexes that cannot be fit on a single machine  
(distributed search), any links to have Solr running in a 2-machine  
environment? I want to measure how much improvement there will be  
in performance with the addition of machines for computation (space  
later) and I need a 2-machine setup for that.


If you are looking for automatic replication and load-balancing
across multiple machines, Solr does not provide that.  The typical
strategy is as follows: index half the documents on one machine and
half on another.  Execute both queries simultaneously (using threads,
for instance), and combine the results.  You should observe a speedup.
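A minimal sketch of that strategy using only JDK classes -- the shard host
names are placeholders, and parsing/merging of the two XML responses
(re-sorting by score, de-duplicating ids) is left to the caller:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoShardSearch {

    // Fetch one shard's response body as a string.
    static String fetch(String url) throws Exception {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"));
        StringBuilder sb = new StringBuilder();
        for (String line; (line = in.readLine()) != null; ) {
            sb.append(line).append('\n');
        }
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        final String params = "q=ipod&fl=id,score";
        // Hypothetical hosts, each indexing half of the collection.
        final String[] shards = {
            "http://shard1:8983/solr/select?" + params,
            "http://shard2:8983/solr/select?" + params
        };

        ExecutorService pool = Executors.newFixedThreadPool(shards.length);
        List<Future<String>> futures = new ArrayList<Future<String>>();
        for (final String shard : shards) {
            futures.add(pool.submit(new Callable<String>() {
                public String call() throws Exception {
                    return fetch(shard);
                }
            }));
        }

        // Both shards are queried concurrently; print the raw responses here.
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}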


-Mike


Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Mike Klaas

On 15-Jan-08, at 9:23 PM, Srikant Jakilinki wrote:


2) Solr that has to handle a large collective index which has to be
split up on multi-machines
- The index is ever increasing (TB scale) and dynamic and all of it
has to be searched at any point


This will require significant development on your part.  Nutch may be  
able to provide more of what you need OOB.



3) Solr that has to exploit multi-machines because we have plenty of
them in a tightly coupled P2P scenario
- Machines are not a problem but will they be if they are of varied
configurations (PIII to Core2; Linux to Vista; 32-bit to 64-bit; J2SE
1.1 to 1.6)


Solr requires Java 1.5, Lucene requires Java 1.4.  Also, there is
certainly no point mixing PIIIs and modern CPUs: trying to achieve
the appropriate balance between machines of such disparate capability
will take much more effort than you will gain from using them.


-Mike


Re: Cache size and Heap size

2008-01-16 Thread Mike Klaas

On 16-Jan-08, at 11:15 AM, [EMAIL PROTECTED] wrote:

I'm using Tomcat. I set Max Size = 5Gb and I checked in profiler  
that it's actually uses whole memory. There is no significant  
memory use by other applications.

Whole change was I increased the size of cache to:
LRU Cache(maxSize=1048576, initialSize=1048576,  
autowarmCount=524288,  
[EMAIL PROTECTED])


An autowarmCount that close to maxSize certainly doesn't make sense.

I know this is a lot and I'm going to decrease it, I was just  
experimenting, but I need some guidelines of how to calculate the  
right size of the cache.


Each filter that matches more than ~3000 documents will occupy  
maxDocs/8 bytes of memory.  Certain kinds of faceting require one  
entry per unique value in a field.  The best way to tune this is to  
monitor your cache hit/expunge statistics for the filter cache (on  
the solr admin statistics screen).
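As a rough illustration (the document count here is just an assumed example):
with 10 million documents in the index, each such cached filter costs about
10,000,000 / 8 = 1.25 MB, so a filterCache with maxSize=1048576 could in the
worst case ask for well over a terabyte; maxSize values in the hundreds or low
thousands are usually far more realistic.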


-Mike


Re: Problem with dismax handler when searching Solr along with field

2008-01-16 Thread Mike Klaas

On 16-Jan-08, at 3:15 AM, farhanali wrote:



when i search the query for example

http://localhost:8983/solr/select/?q=category&qt=dismax

it gives the results but when i want to search on the basis of  
field name

like

http://localhost:8983/solr/select/?q=maincategory:Cars&qt=dismax

it does not give results; however,

http://localhost:8983/solr/select/?q=maincategory:Cars

returns results for cars from the field maincategory

Anyone have some idea???


The dismax handler does not allow you to use Lucene query syntax.
The qf parameter must be used to select the fields to query
(alternatively, you can provide a Lucene-style query in an fq filter).
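For example (a sketch using the field name from your question):

http://localhost:8983/solr/select/?q=Cars&qt=dismax&qf=maincategory
http://localhost:8983/solr/select/?q=Cars&qt=dismax&fq=maincategory:Cars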


See the documentation here:
http://wiki.apache.org/solr/DisMaxRequestHandler

-Mike


IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I am using the embedded Solr API for my indexing process. I created a brand new 
index with my application without any problem. I then ran my indexer in 
incremental mode. This process copies the working index to a temporary Solr 
location, adds/updates any records, optimizes the index, and then copies it 
back to the working location. There are currently not any instances of Solr 
reading this index. Also, I commit after every 10 rows. The schema.xml and 
solrconfig.xml files have not changed.

Here is my function call.
protected void optimizeProducts() throws IOException {
    UpdateHandler updateHandler = m_SolrCore.getUpdateHandler();
    CommitUpdateCommand commitCmd = new CommitUpdateCommand(true);
    commitCmd.optimize = true;

    updateHandler.commit(commitCmd);

    log.info("Optimized index");
}

So, during the optimize phase, I get the following stack trace:
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:89)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
at org.apache.lucene.store.IndexInput.readChars(IndexInput.java:107)
at org.apache.lucene.store.IndexInput.readString(IndexInput.java:93)
at 
org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:211)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:119)
at 
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:323)
at 
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:206)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
at 
org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1835)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1195)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:508)
at ...

There are no exceptions or anything else that appears to be incorrect during 
the adds or commits. After this, the index files are still non-optimized.

I know there is not a whole lot to go on here. Anything in particular that I 
should look at?



Re: Big number of conditions of the search

2008-01-16 Thread evgeniy . strokin
I see, but I really need to run it on Solr. We have already indexed
everything.
I don't really want to construct a query with 1K OR conditions, and send it to
Solr to parse first and run afterwards.
Maybe there is a way to go directly to Lucene, or Solr, and run such a query from
Java, passing an array of IDs, or something like this?
Could anybody give me some advice on how to do this in a better way?
 
Thank you
Gene



- Original Message 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, January 11, 2008 12:26:14 AM
Subject: Re: Big number of conditions of the search

Evgeniy - sound like a problem best suited for RDBMS, really.

You can run such an OR query, but you'll have to manually increase the max
number of clauses allowed (in one of the configs) and make sure the JVM has
plenty of memory.  But again, this is best done in an RDBMS with some count(*) and
GROUP BY selects.
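For reference, the setting Otis means is maxBooleanClauses in solrconfig.xml
(inside the query section); the value below is only an example:

<!-- raise the default of 1024 if you really need thousands of OR clauses -->
<maxBooleanClauses>10240</maxBooleanClauses>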

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Evgeniy Strokin [EMAIL PROTECTED]
To: Solr User solr-user@lucene.apache.org
Sent: Thursday, January 10, 2008 4:39:44 PM
Subject: Big number of conditions of the search

Hello, I don't know how to formulate this right, I'll give an example:
I have 20 millions documents with unique ID indexed.
I have list of IDs stored somewhere. I need to run query which will
take documents with ID from my list and gives me some statistic. 
For example: my documents are addresses with unique ID. I have list
which contains 10 thousand IDs of some addresses. I need to find how many
addresses are in NJ from my list? Or another scenario: give me all
states my addresses from and how many addresses in each state (only
addresses from my list)?

So I was thinking I could run a facet search by the field State, but my
query would be like this: ID:123 OR ID:23987 OR ID:294343 ... 10K such OR
conditions in a row, which is ridiculous and not even possible, I think.

Could somebody suggest some solution for this?

Thank you
Gene

Re: Indexing very large files.

2008-01-16 Thread David Thibault
OK, I have now bumped my tomcat JVM up to 1024MB min and 1500MB max.  For
some reason Walter's suggestion helped me get past the 8MB file upload to
Solr but it's still choking on a 32MB file.  Is there a way to set
per-webapp JVM settings in tomcat, or is the overall tomcat JVM sufficient
to set?  I can't see anything in the tomcat manager to suggest that there
are smaller memory limitations for solr or any other webapp (all the demo
webapps that tomcat comes with are still there right now).
Here's the trace I get when I try to upload the 32MB file:


java.lang.OutOfMemoryError: Java heap space
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java
:95)
at sun.net.www.http.PosterOutputStream.write(PosterOutputStream.java
:61)
at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java
:336)
at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java
:395)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:191)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.pipe(
SimplePostTool.java:167)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.postData(
SimplePostTool.java:125)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.postFile(
SimplePostTool.java:87)
at com.itstrategypartners.sents.solrUpload.Uploader.uploadFile(
Uploader.java:97)
at com.itstrategypartners.sents.solrUpload.UploaderTest.uploadFile(
UploaderTest.java:95)

Any more thoughts on possible causes?

Best,
Dave

On 1/16/08, David Thibault [EMAIL PROTECTED] wrote:

 Walter and all,

 I had been bumping up the heap for my Java app (running outside of Tomcat)
 but I hadn't yet tried bumping up my Tomcat heap.  That seems to have helped
 me upload the 8MB file, but it's crashing while uploading a 32MB file now. I
 Just bumped tomcat to 1024MB of heap, so I'm not sure what the problem is
 now.  I suspect Walter was on to something, since it sort of fixed my
 problem.  I will keep troubleshooting the Tomcat memory and go from there..


 Best,
 Dave

 On 1/16/08, Walter Underwood  [EMAIL PROTECTED] wrote:
 
  This error means that the JVM has run out of heap space. Increase the
  heap space. That is an option on the java command. I set my heap to
  200 Meg and do it this way with Tomcat 6:
 
  JAVA_OPTS=-Xmx600M tomcat/bin/startup.sh
 
  wunder
 
  On 1/16/08 8:33 AM, David Thibault  [EMAIL PROTECTED]
  wrote:
 
   java.lang.OutOfMemoryError: Java heap space
 
 




RE: Indexing very large files.

2008-01-16 Thread Timothy Wonil Lee
I think you should try isolating the problem.
It may turn out that the problem isn't really to do with Solr, but file
uploading.
I'm no expert, but that's what I'd try out in such a situation.

Cheers,

Timothy Wonil Lee

http://timundergod.blogspot.com/
http://www.google.com/reader/shared/16849249410805339619


-Original Message-
From: David Thibault [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 17 January 2008 8:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing very large files.

OK, I have now bumped my tomcat JVM up to 1024MB min and 1500MB max.  For
some reason Walter's suggestion helped me get past the 8MB file upload to
Solr but it's still choking on a 32MB file.  Is there a way to set
per-webapp JVM settings in tomcat, or is the overall tomcat JVM sufficient
to set?  I can't see anything in the tomcat manager to suggest that there
are smaller memory limitations for solr or any other webapp (all the demo
webapps that tomcat comes with are still there right now).
Here's the trace I get when I try to upload the 32MB file:


java.lang.OutOfMemoryError: Java heap space
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java
:95)
at sun.net.www.http.PosterOutputStream.write(PosterOutputStream.java
:61)
at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java
:336)
at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java
:395)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:191)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.pipe(
SimplePostTool.java:167)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.postData(
SimplePostTool.java:125)
at com.itstrategypartners.sents.solrUpload.SimplePostTool.postFile(
SimplePostTool.java:87)
at com.itstrategypartners.sents.solrUpload.Uploader.uploadFile(
Uploader.java:97)
at com.itstrategypartners.sents.solrUpload.UploaderTest.uploadFile(
UploaderTest.java:95)

Any more thoughts on possible causes?

Best,
Dave

On 1/16/08, David Thibault [EMAIL PROTECTED] wrote:

 Walter and all,

 I had been bumping up the heap for my Java app (running outside of Tomcat)
 but I hadn't yet tried bumping up my Tomcat heap.  That seems to have
helped
 me upload the 8MB file, but it's crashing while uploading a 32MB file now.
I
 Just bumped tomcat to 1024MB of heap, so I'm not sure what the problem is
 now.  I suspect Walter was on to something, since it sort of fixed my
 problem.  I will keep troubleshooting the Tomcat memory and go from
there..


 Best,
 Dave

 On 1/16/08, Walter Underwood  [EMAIL PROTECTED] wrote:
 
  This error means that the JVM has run out of heap space. Increase the
  heap space. That is an option on the java command. I set my heap to
  200 Meg and do it this way with Tomcat 6:
 
  JAVA_OPTS=-Xmx600M tomcat/bin/startup.sh
 
  wunder
 
  On 1/16/08 8:33 AM, David Thibault  [EMAIL PROTECTED]
  wrote:
 
   java.lang.OutOfMemoryError: Java heap space
 
 







Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin,

Don't have the answer to the EOF but I'm wondering why the index is moving.  You
don't need to do that as far as Solr is concerned.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Kevin Osborn [EMAIL PROTECTED]
To: Solr solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 3:07:23 PM
Subject: IOException: read past EOF during optimize phase

I am using the embedded Solr API for my indexing process. I created a
 brand new index with my application without any problem. I then ran my
 indexer in incremental mode. This process copies the working index to a
 temporary Solr location, adds/updates any records, optimizes the index,
 and then copies it back to the working location. There are currently
 not any instances of Solr reading this index. Also, I commit after every
 10 rows. The schema.xml and solrconfig.xml files have not changed.

Here is my function call.
protected void optimizeProducts() throws IOException {
UpdateHandler updateHandler = m_SolrCore.getUpdateHandler();
CommitUpdateCommand commitCmd = new CommitUpdateCommand(true);
commitCmd.optimize = true;

updateHandler.commit(commitCmd);

log.info("Optimized index");
}

So, during the optimize phase, I get the following stack trace:
java.io.IOException: read past EOF
at
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:89)
at
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
at
 org.apache.lucene.store.IndexInput.readChars(IndexInput.java:107)
at
 org.apache.lucene.store.IndexInput.readString(IndexInput.java:93)
at
 org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:211)
at
 org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:119)
at
 org.apache.lucene.index.SegmentReader.document(SegmentReader.java:323)
at
 org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:206)
at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
at
 org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1835)
at
 org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1195)
at
 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:508)
at ...

There are no exceptions or anything else that appears to be incorrect
 during the adds or commits. After this, the index files are still
 non-optimized.

I know there is not a whole lot to go on here. Anything in particular
 that I should look at?






Re: Spell checker index rebuild

2008-01-16 Thread Otis Gospodnetic
Do you trust the spellchecker 100% (I'm not looking at its source now)?  I'd peek
at the index with Luke (Luke I trust :)) and see if that term is really there
first.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Doug Steigerwald [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 2:56:35 PM
Subject: Spell checker index rebuild

Having another weird spell checker index issue.  Starting off from a
 clean index and spell check 
index, I'll index everything in example/exampledocs.  On the first
 rebuild of the spellchecker index 
using the query below says the word 'blackjack' exists in the
 spellchecker index.  Great, no problems.

Rebuild it again and the word 'blackjack' does not exist any more.

http://localhost:8983/solr/core0/select?q=blackjack&qt=spellchecker&cmd=rebuild

Any ideas?  This is with a Solr trunk build from yesterday.

doug





Re: Indexing very large files.

2008-01-16 Thread Yonik Seeley
From your stack trace, it looks like it's your client running out of
memory, right?

SimplePostTool was meant as a command-line replacement to curl to
remove that dependency, not as a recommended way to talk to Solr.
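If a client does have to push large files itself, one workaround (a sketch,
not part of SimplePostTool) is to enable chunked streaming on the connection
so the JDK does not buffer the whole request body in the client's heap:

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamingPost {
    public static void post(String fileName, String solrUpdateUrl) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(solrUpdateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        conn.setChunkedStreamingMode(8192);   // stream instead of buffering the whole body

        InputStream in = new FileInputStream(fileName);
        OutputStream out = conn.getOutputStream();
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) > 0; ) {
            out.write(buf, 0, n);
        }
        out.close();
        in.close();

        // Reading the response code forces the request to complete.
        System.out.println("HTTP " + conn.getResponseCode());
    }
}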

-Yonik

On Jan 16, 2008 4:29 PM, David Thibault [EMAIL PROTECTED] wrote:
 OK, I have now bumped my tomcat JVM up to 1024MB min and 1500MB max.  For
 some reason Walter's suggestion helped me get past the 8MB file upload to
 Solr but it's still choking on a 32MB file.  Is there a way to set
 per-webapp JVM settings in tomcat, or is the overall tomcat JVM sufficient
 to set?  I can't see anything in the tomcat manager to suggest that there
 are smaller memory limitations for solr or any other webapp (all the demo
 webapps that tomcat comes with are still there right now).
 Here's the trace I get when I try to upload the 32MB file:


 java.lang.OutOfMemoryError: Java heap space
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java
 :95)
 at sun.net.www.http.PosterOutputStream.write(PosterOutputStream.java
 :61)
 at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java
 :336)
 at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(StreamEncoder.java
 :395)
 at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:136)
 at java.io.OutputStreamWriter.write(OutputStreamWriter.java:191)
 at com.itstrategypartners.sents.solrUpload.SimplePostTool.pipe(
 SimplePostTool.java:167)
 at com.itstrategypartners.sents.solrUpload.SimplePostTool.postData(
 SimplePostTool.java:125)
 at com.itstrategypartners.sents.solrUpload.SimplePostTool.postFile(
 SimplePostTool.java:87)
 at com.itstrategypartners.sents.solrUpload.Uploader.uploadFile(
 Uploader.java:97)
 at com.itstrategypartners.sents.solrUpload.UploaderTest.uploadFile(
 UploaderTest.java:95)

 Any more thoughts on possible causes?

 Best,
 Dave


Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin,

Perhaps you want to look at how Solr can be used in a master-slave setup.  This 
will separate your indexing from searching.  Don't have the URL, but it's on 
zee Wiki.

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Kevin Osborn [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 5:25:34 PM
Subject: Re: IOException: read past EOF during optimize phase

It is more of a file structure thing for our application. We build in
 one place and do our index syncing in a different place. I doubt it is
 relevant to this issue, but figured I would include this information
 anyway.

- Original Message 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 2:21:31 PM
Subject: Re: IOException: read past EOF during optimize phase


Kevin,

Don't have the answer to the EOF but I'm wondering why the index is
 moving.  You don't need to do that as far as Solr is concerned.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Kevin Osborn [EMAIL PROTECTED]
To: Solr solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 3:07:23 PM
Subject: IOException: read past EOF during optimize phase

I am using the embedded Solr API for my indexing process. I created a
 brand new index with my application without any problem. I then ran my
 indexer in incremental mode. This process copies the working index to
 a
 temporary Solr location, adds/updates any records, optimizes the
 index,
 and then copies it back to the working location. There are currently
 not any instances of Solr reading this index. Also, I commit after
 every
 10 rows. The schema.xml and solrconfig.xml files have not changed.

Here is my function call.
protected void optimizeProducts() throws IOException {
UpdateHandler updateHandler = m_SolrCore.getUpdateHandler();
CommitUpdateCommand commitCmd = new CommitUpdateCommand(true);
commitCmd.optimize = true;

updateHandler.commit(commitCmd);

log.info("Optimized index");
}

So, during the optimize phase, I get the following stack trace:
java.io.IOException: read past EOF
at


 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:89)
at


 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
at
 org.apache.lucene.store.IndexInput.readChars(IndexInput.java:107)
at
 org.apache.lucene.store.IndexInput.readString(IndexInput.java:93)
at


 org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:211)
at
 org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:119)
at
 org.apache.lucene.index.SegmentReader.document(SegmentReader.java:323)
at


 org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:206)
at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
at


 org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1835)
at
 org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1195)
at


 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:508)
at ...

There are no exceptions or anything else that appears to be incorrect
 during the adds or commits. After this, the index files are still
 non-optimized.

I know there is not a whole lot to go on here. Anything in particular
 that I should look at?












Re: Indexing very large files.

2008-01-16 Thread Otis Gospodnetic
David,
I bet you can quickly identify the source using YourKit or another Java 
profiler  jmap command line tool might also give you some direction.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: David Thibault [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 1:31:23 PM
Subject: Re: Indexing very large files.

Walter and all,
I had been bumping up the heap for my Java app (running outside of
 Tomcat)
but I hadn't yet tried bumping up my Tomcat heap.  That seems to have
 helped
me upload the 8MB file, but it's crashing while uploading a 32MB file
 now. I
Just bumped tomcat to 1024MB of heap, so I'm not sure what the problem
 is
now.  I suspect Walter was on to something, since it sort of fixed my
problem.  I will keep troubleshooting the Tomcat memory and go from
 there..

Best,
Dave

On 1/16/08, Walter Underwood [EMAIL PROTECTED] wrote:

 This error means that the JVM has run out of heap space. Increase the
 heap space. That is an option on the java command. I set my heap to
 200 Meg and do it this way with Tomcat 6:

 JAVA_OPTS=-Xmx600M tomcat/bin/startup.sh

 wunder

 On 1/16/08 8:33 AM, David Thibault [EMAIL PROTECTED]
 wrote:

  java.lang.OutOfMemoryError: Java heap space







Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I did see that bug, which made me suspect Lucene. In my case, I tracked down 
the problem. It was my own application. I was using Java's 
FileChannel.transferTo functions to copy my index from one location to another. 
One of the files is bigger than 2^31-1 bytes. So, one of my files was corrupted 
during the copy because I was just doing one pass. I now loop the copy function 
until the entire file is copied and everything works fine.

DOH!
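For reference, a minimal sketch of the looped copy Kevin describes (not his
actual code): FileChannel.transferTo() may copy fewer bytes than requested,
so keep calling it until the whole file has been transferred.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class BigFileCopy {
    public static void copy(String src, String dst) throws IOException {
        FileChannel in = new FileInputStream(src).getChannel();
        FileChannel out = new FileOutputStream(dst).getChannel();
        try {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo returns the number of bytes actually copied
                position += in.transferTo(position, size - position, out);
            }
        } finally {
            in.close();
            out.close();
        }
    }
}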

- Original Message 
From: Yonik Seeley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, January 16, 2008 4:57:08 PM
Subject: Re: IOException: read past EOF during optimize phase


This may be a Lucene bug... IIRC, I saw at least one other lucene user
with a similar stack trace.  I think the latest lucene version (2.3
dev) should fix it if that's the case.

-Yonik

On Jan 16, 2008 3:07 PM, Kevin Osborn [EMAIL PROTECTED] wrote:
 I am using the embedded Solr API for my indexing process. I created a
 brand new index with my application without any problem. I then ran my
 indexer in incremental mode. This process copies the working index to
 a temporary Solr location, adds/updates any records, optimizes the
 index, and then copies it back to the working location. There are currently
 not any instances of Solr reading this index. Also, I commit after
 every 10 rows. The schema.xml and solrconfig.xml files have not
 changed.

 Here is my function call.
 protected void optimizeProducts() throws IOException {
 UpdateHandler updateHandler = m_SolrCore.getUpdateHandler();
 CommitUpdateCommand commitCmd = new
 CommitUpdateCommand(true);
 commitCmd.optimize = true;

 updateHandler.commit(commitCmd);

 log.info("Optimized index");
 }

 So, during the optimize phase, I get the following stack trace:
 java.io.IOException: read past EOF
 at
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:89)
 at
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:34)
 at
 org.apache.lucene.store.IndexInput.readChars(IndexInput.java:107)
 at
 org.apache.lucene.store.IndexInput.readString(IndexInput.java:93)
 at
 org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:211)
 at
 org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:119)
 at
 org.apache.lucene.index.SegmentReader.document(SegmentReader.java:323)
 at
 org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:206)
 at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
 at
 org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1835)
 at
 org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1195)
 at
 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:508)
 at ...

 There are no exceptions or anything else that appears to be incorrect
 during the adds or commits. After this, the index files are still
 non-optimized.

 I know there is not a whole lot to go on here. Anything in particular
 that I should look at?







Logging in Solr

2008-01-16 Thread David Thibault
All,
I'm new to Solr and Tomcat and I'm trying to track down some odd errors.
 How do I set up Tomcat to do fine-grained Solr-specific logging?  I have
looked around enough to know that it should be possible to do per-webapp
logging in Tomcat 5.5, but the details are hard to follow for a newbie.  Any
suggestions would be greatly appreciated.

Best,
Dave


Re: conceptual issues with solr

2008-01-16 Thread Norberto Meijome
On Wed, 16 Jan 2008 16:54:56 +0100
Philippe Guillard [EMAIL PROTECTED] wrote:

 Hi here,
 
 It seems that Lucene accepts any kind of XML document but Solr accepts only
 flat name/value pairs inside a document to be indexed.
 You'll find below what I'd like to do. Thanks for any kind of help!
 
 Phil
 

Hey Phil,

 
 I need to index products (hotels) which have a price by date, then search
 them by date or date range and price range.
 Is there a way to do that with Solr?

yes - look at the data type definitions (in the wiki or in the sample
schema.xml) for indexing dates, integers, etc. There are some
caveats about using date fields (too much resolution can slow things
down too much).

 
 At the moment i have a document for each hotel :
 <add>
 <doc>
 <field name="url">http:///yyy</field>
 <field name="id">1</field>
 <field name="name">Hotel Opera</field>
 <field name="category">4 stars</field>
 .
 </doc>
 </add>
 
 I would need to add my dates/price values like this but it is forbidden in
 Solr indexing:
 <date value="30/01/2008" price="200">
 <date value="31/01/2008" price="150">
 
 Otherwise i could define a default field (being an integer) and have as many
 fields as dates, like this:
 <field name="30/01/2008">200</field>
 <field name="31/01/2008">150</field>
 indexing would accept it but i think i will not be able to search or sort by
 date

for simple dates like that, why not make use of dynamic fields? Define, for
example, bydate_* as a dynamic field; then you can do:

<field name="bydate_MMDD"></field>

so, from your example: 
<field name="bydate_20080131">200</field>
<field name="bydate_20080130">150</field>
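The matching declaration in schema.xml might look like this (a sketch; "sint"
is assumed to be a sortable integer type already defined in the schema):

<dynamicField name="bydate_*" type="sint" indexed="true" stored="true"/>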


 The only solution i found at the moment is to create a document for each
 date/price
 <add>
 <doc>
 <field name="url">http:///yyy</field>
 <field name="id">1</field>
 <field name="name">Hotel Opera</field>
 <field name="date">30/01/2008</field>
 <field name="price">200</field>
 </doc>
 <doc>
 <field name="url">http:///yyy</field>
 <field name="id">1</field>
 <field name="name">Hotel Opera</field>
 <field name="date">31/01/2008</field>
 <field name="price">150</field>
 </doc>
 </add>

If the field 'id' is your schema's ID, then this wouldn't work, but sure,
the approach would be valid, though a bit wasteful wrt storing the metadata
about the hotel. There was a thread some time ago in this list (a month or 2
ago) about clever uses of the field defined as ID in the schema.

 then i'll have many documents for 1 hotel
 and in order to search by date range i would need more documents
 like this :
 <field name="date-range">28/01/2008 to 31/01/2008</field>
 <field name="date-range">29/01/2008 to 31/01/2008</field>
 <field name="date-range">30/01/2008 to 31/01/2008</field>
 
 Since i need to index much other information about a hotel (address,
 telephone, amenities etc...) i wouldn't like to duplicate too much
 information, and i think it would not be scalable to search first in a dates
 index and then in a hotels index to retrieve hotel information.
 
 Any idea?

It strikes me you'd probably want a relational DB for this kind
of thing

B
_
{Beto|Norberto|Numard} Meijome

Unix is user friendly. However, it isn't idiot friendly.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Solr schema filters

2008-01-16 Thread Chris Hostetter
: For this exact example, use the WordDelimiterFilter exactly as
: configured in the text fieldType in the example schema that ships
: with solr.  The trick is to then use some slop when querying.
: 
: FT-50-43 will be indexed as FT, 50, 43 / 5043  (the last two tokens
: are in the same position).
: Now when querying, FT-5043 won't match without slop because there is
: a 50 token in the middle of the indexed terms... so try FT-5043~1

FYI: this was the motivation for the qs param on dismax ... 

http://localhost:8983/solr/select?debugQuery=true&qt=dismax&pf=&qf=text&q=FT-5043&qs=3


-Hoss



Re: DisMax Syntax

2008-01-16 Thread Chris Hostetter

: I may be mistaken, but this is not equivalent to my query. In my query i have
: matches for x1, matches for x2 without slop and/or boosting, and then a match
: for x1 x2 (exact match) with slop (~) a and boost (b) in order to have
: results where an exact match scores better.
: The total score is the sum of all the above.
: Your query seems different

the structure of the query will look different in debugging, and the 
scores won't be exactly the same, but the concept is the same.



-Hoss



Re: Fuzziness with DisMaxRequestHandler

2008-01-16 Thread Chris Hostetter

: Is there any way to make the DisMaxRequestHandler a bit more forgiving with
: user queries, I'm only getting results when the user enters a close to
: perfect match. I'd like to allow near matches if possible, but I'm not sure
: how to add something like this when special query syntax isn't allowed.

the principal goal of dismax was to leave query string syntax as simple
as possible, and move the mechanisms for controlling the query structure
into other parameters.

the idea of making queries fuzzy is an interesting one ... it's something
i don't remember anyone ever asking about before, and i'd never really
considered it (from a UI perspective i find "did you mean" style
spellchecking to be a better approach than making a user's query
implicitly fuzzy) but it seems like it would be pretty easy to add support
for something ...  one approach would be to add a numeric "fuzz"
parameter that, if set, would make the DisMaxQueryParser return
FuzzyQueries in place of TermQueries ... an alternate approach would be to
allow per-field fuzziness by tweaking the qf syntax so instead of just
fieldA^4 where 4 is the boost value, you could have fieldA^4~0.8 where 4
is the boost value and 0.8 is the fuzziness factor.

I haven't thought about it hard enough to have an opinion about which
would make more sense ... but the overall idea certainly seems like it
could be a useful feature if someone wants to submit a patch.




-Hoss



Re: Transactions and Solr Was: Re: Delte by multiple id problem

2008-01-16 Thread Chris Hostetter

: Does anyone have more experience doing this kind of stuff and whants to share?

My advice: don't.

I work with (or work with people who work with) about two dozen Solr
indexes -- we don't attempt to update a single one of them in any sort of
transactional way.  Some of them are updated in real time (ie: as soon as
the authoritative DB is updated by some code, the same code updates the
Solr index); some of them are updated in batch (ie: once every N minutes
code checks a log of all logical objects modified/deleted in the DB and
sends the adds/deletes to Solr); and some are only ever rebuilt from scratch
every N hours (because the data in them isn't very time sensitive and
rebuilding from scratch is easier than dealing with incremental or batch
updates).

But as i said: we never attempt to be transactional about it, for a few
reasons:
  1) why should it be part of the transaction?  a Solr index is a
denormalized/inverted index of data .. why should a tool (or any other
process) be prevented from writing to an authoritative data store just
because a non-authoritative copy of that data can't be updated?  ... if
you used MySQL with replication, would you really want to block all writes
to the master just because there's a glitch in replicating to a slave?
  2) why worry about it?  It's really a non-issue.  If an add or
delete fails it's usually either developer error (ie: the code
generating your add statements thinks there's a field that doesn't
exist), a transient timeout (maybe because of a commit in progress) or a
network glitch (have the client retry once or twice), or in very rare
instances the whole Solr index was completely jacked (either from disk
failure, or OOM due to a huge spike in load) and we want to revert
to a backup of the index in the short term and rebuild the index from
scratch to play it safe.
  3) why limit yourself?  you're going to want the ability to trigger
arbitrary indexing of your data objects at any time -- if for no other
reason than so when you decide to add a field to your index you can
reindex them all -- so why make your index updating code inherently tied
to your DB updating code?


As for your specific question along the lines of "why can't we do a
mix of adds and deletes all as part of one update message?" the answer
is: because no one ever wrote any code to parse messages like that.  BUT!
... that's not the question you really want to ask.  the question you
really want to ask is: *IF* someone wrote code to allow a mix of adds
and deletes all as part of one update message, would it solve my problem
of wanting to be able to modify my Solr index transactionally? and the
answer is "No."  Even if Solr accepted update messages that looked
like this...

<update>
   <delete><id>42</id></delete>
   <add><field name="id">7</field><field name="a">bb</field></add>
   <add><field name="id">666</field><field name="a"></field></add>
</update>

...the low level Lucene calls that it would be doing internally still
aren't transactional, so the first delete and add might succeed, but
if there was then some kind of internal error, or a timeout because the
first add took a while (maybe it triggered a segment merge) and the second
add didn't happen -- the first two commands would have still been
executed, and there would be no way to roll back.

In a nutshell: you would be no better off than if your client code had
sent all three as separate update messages.


-Hoss



Re: Restrict values in a multivalued field

2008-01-16 Thread Chris Hostetter

: In my schema I have a multivalued field, and the values of that field are
: stored and indexed in the index. I wanted to know if it's possible to
: restrict the number of multiple values being returned from that field, on a
: search? And how? Because, lets say, if I have thousands of values in that
: multivalued field, returning all of them would be a lot of load on the
: system. So, I want to restrict it to send me only say, 50 values out of the
: thousands.

How would Solr pick which 50 to return?
Why not index all thousand (so you can search on them) in an unstored
field, and only store the 50 you want returned in a separate (unindexed)
field.  the index size will be exactly the same -- admittedly you'll have
to send a bit more data over the wire for each doc you index, but that's
probably a trivial amount (assuming the 50 values you want to store are
representative of the thousands you index, you are talking about at most
a 5% increase in the amount of data you send Solr on each add)
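In schema.xml terms, that split might look something like this (field names
are made up for illustration):

<field name="values_all"     type="text"   indexed="true"  stored="false" multiValued="true"/>
<field name="values_display" type="string" indexed="false" stored="true"  multiValued="true"/>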





-Hoss



Re: Fwd: Solr Text field

2008-01-16 Thread Chris Hostetter
: searches. That is fine by me. But I'm still at the first question:
: How do I conduct a wildcard search for ARIZONA on a solr.textField?  I tried

as i said:  it really depends on what kind of index analyzer you have
configured for the field -- the query analyzer isn't used at all when
dealing with wildcard and prefix queries, so what you type in before the
"*" must match the prefix of an actually indexed term that makes it into
your index as a result of the index analyzer.

If you add the debugQuery=true param to your queries, and compare the
differences you see in the parsedquery_toString value between searching for
field:AR* and field:Arizona and field:ARIZONA, you'll see what i mean.

if you take a look at the Luke request handler, which will show you the
actual raw terms in your index (or the top N anyway), you can see what's
really in there -- or -- if you use the analysis.jsp interface, it will
show you
what Terms your analyzer will actually produce if you index the raw string
ARIZONA ... whatever you see there is what you need to be searching for
when you do your prefix queries.


-Hoss



Re: batch indexing takes more time than shown on SOLR output -- something to do with IO?

2008-01-16 Thread Chris Hostetter
: INFO: {add=[10485, 10488, 10489, 10490, 10491, 10495, 10497, 10498, ...(42
: more)
: ]} 0 875
: 
: However, when timing this instruction on the client side (I use SolrJ --
: req.process(server)) I get totally different numbers (in the beginning the
: client-side measured time is about 2 seconds on average, but after some time
: it goes up to about 30-40 seconds, although the Solr-reported time
: stays between 0.8-1.3 seconds)? 

as Otis mentioned, that time is the raw processing of the request, not
counting any network IO between the client and the server, or any time
spent by the ResponseWriter formatting the response.  you can get more
accurate numbers about exactly how long the server spent doing all of these
things from the access log of your servlet container (which should be
recording the time only after every last byte is written back to the
client).

that said: there's really no reason for as big a discrepancy as you are
describing, particularly on updates where the ResponseWriter has almost
nothing to do (30-40 seconds per update?!?!?!)

I'm not very familiar with SolrJ, but are you by any chance using it in a
way that sends a commit after every update command?  (commits can get
successively longer as your index gets bigger.)
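For what it's worth, a sketch of the batching pattern Hoss is hinting at --
add documents in batches and commit once at the end (class names are from
the SolrJ of that period and may differ in your version):

import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("name", "document " + i);
            batch.add(doc);
            if (batch.size() == 100) {   // send documents in batches of 100...
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();                 // ...but commit only once at the end
    }
}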

: Does this have anything to do with costly IO-activity that is accounted for
: in the SOLR output? If this is true, what tool do you recommend using to
: monitor IO-activity?

Which IO-activity are you talking about?




-Hoss



Re: FunctionQuery in a custom request handler

2008-01-16 Thread Chris Hostetter

: How do I access the ValueSource for my DateField? I'd like to use a
: ReciprocalFloatFunction from inside the code, adding it alongside others in the
: main BooleanQuery.

The FieldType API provides a getValueSource method (so every FieldType
picks its own best ValueSource implementation).
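A rough sketch of what that looks like in code -- class and method names are
recalled from the Solr source of that era and may differ in your version, and
the m/a/b constants are arbitrary:

import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.function.FunctionQuery;
import org.apache.solr.search.function.ReciprocalFloatFunction;
import org.apache.solr.search.function.ValueSource;

public class DateBoostHelper {
    // Adds an optional clause built on the date field's ValueSource to the
    // BooleanQuery being assembled by the handler.  "myDateField" is a
    // placeholder for whatever the schema calls the field.
    public static void addDateFunction(IndexSchema schema, BooleanQuery mainQuery) {
        SchemaField sf = schema.getField("myDateField");
        ValueSource vs = sf.getType().getValueSource(sf);
        // ReciprocalFloatFunction computes a / (m*x + b); tune m, a and b for the curve you want.
        Query func = new FunctionQuery(new ReciprocalFloatFunction(vs, 1.0f, 1000.0f, 1000.0f));
        mainQuery.add(func, BooleanClause.Occur.SHOULD);
    }
}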


-Hoss