Re: Using Solr Analyzers in Lucene

2010-10-05 Thread Max Lynch
I guess I missed the init() method.  I was looking at the factory and
thought I saw config loading stuff (like getInt) which I assumed meant it
needed to have schema.xml available.

Thanks!

-Max

On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter wrote:

> Hi Max,
>
> why don't you use WordDelimiterFilterFactory directly? I'm doing the same
> stuff inside my own analyzer:
>
> final Map<String, String> args = new HashMap<String, String>();
>
> args.put("generateWordParts", "1");
> args.put("generateNumberParts", "1");
> args.put("catenateWords", "0");
> args.put("catenateNumbers", "0");
> args.put("catenateAll", "0");
> args.put("splitOnCaseChange", "1");
> args.put("splitOnNumerics", "1");
> args.put("preserveOriginal", "1");
> args.put("stemEnglishPossessive", "0");
> args.put("language", "English");
>
> wordDelimiter = new WordDelimiterFilterFactory();
> wordDelimiter.init(args);
> stream = wordDelimiter.create(stream);
>
> --
> Kind regards,
> Mathias
>
> > -Original Message-
> > From: Max Lynch [mailto:ihas...@gmail.com]
> > Sent: Tuesday, October 05, 2010 1:03 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Using Solr Analyzers in Lucene
> >
> > I have made progress on this by writing my own Analyzer.  I basically
> added
> > the TokenFilters that are under each of the solr factory classes.  I had
> to
> > copy and paste the WordDelimiterFilter because, of course, it was package
> > protected.
> >
> >
> >
> > On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch  wrote:
> >
> > > Hi,
> > > I asked this question a month ago on lucene-user and was referred here.
> > >
> > > I have content being analyzed in Solr using these tokenizers and
> > > filters:
> > >
> > > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <tokenizer class="..."/>
> > >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> > >       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > >       catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="..."/>
> > >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > >       protected="protwords.txt"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="..."/>
> > >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> > >       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > >       catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="..."/>
> > >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > >       protected="protwords.txt"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Basically I want to be able to search against this index in Lucene with
> one
> > > of my background searching applications.
> > >
> > > My main reason for using Lucene over Solr for this is that I use the
> > > highlighter to keep track of exactly which terms were found which I use
> for
> > > my own scoring system and I always collect the whole set of found
> > > documents.  I've messed around with using Boosts but it wasn't fine
> grained
> > > enough and I wasn't able to effectively create a score threshold (would
> > > creating my own scorer be a better idea?)
> > >
> > > Is it possible to use this analyzer from Lucene, or at least re-create
> it
> > > in code?
> > >
> > > Thanks.
> > >
> > >
>
>
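
Putting Mathias's snippet together into a complete Lucene Analyzer might look
roughly like the sketch below (Solr 1.4 / Lucene 2.9 era API; the
WhitespaceTokenizer and LowerCaseFilter choices are assumptions, not something
stated in the thread, and only a few of the args are shown):

import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.WordDelimiterFilterFactory;

public class WordDelimiterAnalyzer extends Analyzer {

    private final WordDelimiterFilterFactory wordDelimiter;

    public WordDelimiterAnalyzer() {
        // no schema.xml needed; the factory only needs the arg map
        Map<String, String> args = new HashMap<String, String>();
        args.put("generateWordParts", "1");
        args.put("generateNumberParts", "1");
        args.put("splitOnCaseChange", "1");
        args.put("preserveOriginal", "1");
        // add the rest of the args from Mathias's mail as needed
        wordDelimiter = new WordDelimiterFilterFactory();
        wordDelimiter.init(args);
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new WhitespaceTokenizer(reader); // assumed tokenizer
        stream = wordDelimiter.create(stream);
        stream = new LowerCaseFilter(stream);                 // assumed filter
        return stream;
    }
}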


Re: Using Solr Analyzers in Lucene

2010-10-04 Thread Max Lynch
I have made progress on this by writing my own Analyzer.  I basically added
the TokenFilters that are under each of the solr factory classes.  I had to
copy and paste the WordDelimiterFilter because, of course, it was package
protected.



On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch  wrote:

> Hi,
> I asked this question a month ago on lucene-user and was referred here.
>
> I have content being analyzed in Solr using these tokenizers and filters:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>       catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="..."/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>       protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
>       generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>       catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="..."/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>       protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> Basically I want to be able to search against this index in Lucene with one
> of my background searching applications.
>
> My main reason for using Lucene over Solr for this is that I use the
> highlighter to keep track of exactly which terms were found which I use for
> my own scoring system and I always collect the whole set of found
> documents.  I've messed around with using Boosts but it wasn't fine grained
> enough and I wasn't able to effectively create a score threshold (would
> creating my own scorer be a better idea?)
>
> Is it possible to use this analyzer from Lucene, or at least re-create it
> in code?
>
> Thanks.
>
>


Using Solr Analyzers in Lucene

2010-10-04 Thread Max Lynch
Hi,
I asked this question a month ago on lucene-user and was referred here.

I have content being analyzed in Solr using these tokenizers and filters:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
      generateNumberParts="1" catenateWords="1" catenateNumbers="1"
      catenateAll="0" splitOnCaseChange="1"/>
    <filter class="..."/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
      protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
      generateNumberParts="1" catenateWords="1" catenateNumbers="1"
      catenateAll="0" splitOnCaseChange="1"/>
    <filter class="..."/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
      protected="protwords.txt"/>
  </analyzer>
</fieldType>

Basically I want to be able to search against this index in Lucene with one
of my background searching applications.

My main reason for using Lucene over Solr for this is that I use the
highlighter to keep track of exactly which terms were found which I use for
my own scoring system and I always collect the whole set of found
documents.  I've messed around with using Boosts but it wasn't fine grained
enough and I wasn't able to effectively create a score threshold (would
creating my own scorer be a better idea?)

Is it possible to use this analyzer from Lucene, or at least re-create it in
code?

Thanks.
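
One way to avoid re-creating the chain by hand is to load the core from Java
and ask the schema for its analyzers. A hedged sketch against the Solr 1.4
API (the Solr home path and core name are placeholders):

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.IndexSchema;

public class SchemaAnalyzerLoader {
    public static void main(String[] args) throws Exception {
        String solrHome = "/path/to/solr";    // placeholder
        CoreContainer container = new CoreContainer();
        container.load(solrHome, new File(solrHome, "solr.xml"));

        SolrCore core = container.getCore("mycore");   // placeholder core name
        try {
            IndexSchema schema = core.getSchema();
            Analyzer indexAnalyzer = schema.getAnalyzer();      // per-field index chain
            Analyzer queryAnalyzer = schema.getQueryAnalyzer(); // per-field query chain
            // hand these to IndexWriter / QueryParser in the Lucene application
        } finally {
            core.close();          // release the reference from getCore()
            container.shutdown();
        }
    }
}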


Search a URL

2010-09-23 Thread Max Lynch
Is there a tokenizer that will allow me to search for parts of a URL?  For
example, the search "google" would match on the data
"http://mail.google.com/dlkjadf"

This tokenizer factory doesn't seem to be sufficient:

[fieldType definition stripped by the mail archive]

Thanks.
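
For reference, a quick way to see what a filter chain does to a URL is to
print the tokens it emits. The sketch below (Lucene 2.9 / Solr 1.4 era API;
the whitespace tokenizer is an assumption) checks whether "google" comes out
of the WordDelimiterFilter as its own term:

import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.solr.analysis.WordDelimiterFilterFactory;

public class UrlTokenDump {
    public static void main(String[] args) throws Exception {
        TokenStream stream =
            new WhitespaceTokenizer(new StringReader("http://mail.google.com/dlkjadf"));

        Map<String, String> params = new HashMap<String, String>();
        params.put("generateWordParts", "1");  // split on '.', '/', ':' etc.
        WordDelimiterFilterFactory factory = new WordDelimiterFilterFactory();
        factory.init(params);
        stream = factory.create(stream);

        TermAttribute term = (TermAttribute) stream.addAttribute(TermAttribute.class);
        while (stream.incrementToken()) {
            System.out.println(term.term());   // should include "mail", "google", "com"
        }
    }
}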


Re: Updating document without removing fields

2010-08-30 Thread Max Lynch
Thanks Lance.

I have decided to just put all of my processing on a bigger server along
with solr.  It's too bad, but I can manage.

-Max

On Sun, Aug 29, 2010 at 9:59 PM, Lance Norskog  wrote:

> No. Document creation is all-or-nothing, fields are not updateable.
>
> I think you have to filter all of your field changes through a "join"
> server. That is,
> all field updates could go to a database and the master would read
> document updates
> from that database. Or, you could have one updater feed updates to the
> other, which then
> sends all updates to the master.
>
> Lance
>
> On Sun, Aug 29, 2010 at 6:19 PM, Max Lynch  wrote:
> > Hi,
> > I have a master solr server and two slaves.  On each of the slaves I have
> > programs running that read the slave index, do some processing on each
> > document, add a few new fields, and commit the changes back to the
> master.
> >
> > The problem I'm running into right now is one slave will update one
> document
> > and the other slave will eventually update the same document, but the
> > changes will overwrite each other.  For example, one slave will add a
> field
> > and commit the document, but the other slave won't have that field yet so
> it
> > won't duplicate the document when it updates the doc with its own new
> > field.  This causes the document to miss one set of fields from one of
> the
> > slaves.
> >
> > Can I update a document without having to recreate it?  Is there a way to
> > update the slave and then have the slave commit the changes to the master
> > (adding new fields in the process?)
> >
> > Thanks.
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
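
Since fields are not individually updatable, the usual workaround is to
re-send the whole document: read back every stored field, copy it, add the
new fields, and add the document again. A rough SolrJ 1.4 sketch (the URL,
document id and new field name are made up, and every field must be stored):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReAddWithNewField {
    public static void main(String[] args) throws Exception {
        SolrServer master = new CommonsHttpSolrServer("http://master:8983/solr/core1");

        // fetch the current version of the document (only stored fields come back)
        SolrDocument current =
            master.query(new SolrQuery("id:doc-42")).getResults().get(0);

        // copy every stored field, then bolt on the new field
        // (multi-valued fields would need getFieldValues() instead)
        SolrInputDocument updated = new SolrInputDocument();
        for (String name : current.getFieldNames()) {
            updated.addField(name, current.getFieldValue(name));
        }
        updated.addField("slave_a_score", 0.87f);   // hypothetical new field

        master.add(updated);   // replaces the old document wholesale
        master.commit();
    }
}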


Updating document without removing fields

2010-08-29 Thread Max Lynch
Hi,
I have a master solr server and two slaves.  On each of the slaves I have
programs running that read the slave index, do some processing on each
document, add a few new fields, and commit the changes back to the master.

The problem I'm running into right now is one slave will update one document
and the other slave will eventually update the same document, but the
changes will overwrite each other.  For example, one slave will add a field
and commit the document, but the other slave won't have that field yet so it
won't duplicate the document when it updates the doc with its own new
field.  This causes the document to miss one set of fields from one of the
slaves.

Can I update a document without having to recreate it?  Is there a way to
update the slave and then have the slave commit the changes to the master
(adding new fields in the process?)

Thanks.


Re: Duplicating a Solr Doc

2010-08-25 Thread Max Lynch
It seems like this is a way to accomplish what I was looking for:
CoreContainer coreContainer = new CoreContainer();
File home = new
File("/home/max/packages/test/apache-solr-1.4.1/example/solr");
File f = new File(home, "solr.xml");


coreContainer.load("/home/max/packages/test/apache-solr-1.4.1/example/solr",
f);

SolrCore core = coreContainer.getCore("newsblog");
IndexSchema schema = core.getSchema();
DocumentBuilder builder = new DocumentBuilder(schema);


// get a Lucene Doc
// Document d = ...


SolrDocument solrDocument = new SolrDocument();

builder.loadStoredFields(solrDocument, d);
logger.debug("Loaded stored date: " +
solrDocument.getFieldValue("date_added_solr"));

However, one thing that scares me is the warning message I get from the
CoreContainer:
 [java] Aug 25, 2010 10:25:23 PM org.apache.solr.update.SolrIndexWriter
finalize
 [java] SEVERE: SolrIndexWriter was not closed prior to finalize(),
indicates a bug -- POSSIBLE RESOURCE LEAK!!!

I'm not sure what exactly triggers that but it's a result of the code I
posted above.
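
A guess at that warning, not verified: getCore() hands out a reference-counted
core, and the container keeps index writers open, so releasing both when the
CoreContainer snippet above is finished should stop SolrIndexWriter from being
reclaimed by finalize():

// ... after the loadStoredFields() work above:
core.close();              // releases the reference handed out by getCore()
coreContainer.shutdown();  // closes the cores and their index writers cleanly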

On Wed, Aug 25, 2010 at 10:49 PM, Max Lynch  wrote:

> Right now I am doing some processing on my Solr index using Lucene Java.
> Basically, I loop through the index in Java and do some extra processing of
> each document (processing that is too intensive to do during indexing).
>
> However, when I try to update the document in solr with new fields (using
> SolrJ), the document either loses fields I don't explicitly set, or if I
> have Solr-specific fields such as a solr "date" field type, I am not able to
> copy the value as I can't read the value from Java.
>
> Is there a way to add a field to a solr document without having to
> re-create the document?  If not, how can I read the value of a Solr date in
> java?  Document.get("date_field") returns null even though the value shows
> up when I access it through solr.  If I could read this value I could just
> copy the fields from the Lucene Document to a SolrInputDocument.
>
> Thanks.
>


Re: Delete by query issue

2010-08-25 Thread Max Lynch
Thanks Lance.  I'll give that a try going forward.

On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog  wrote:

> Here's the problem: the standard Solr parser is a little weird about
> negative queries. The way to make this work is to say
>*:* AND -field:[* TO *]
>
> This means "select everything AND only these documents without a value
> in the field".
>
> On Wed, Aug 25, 2010 at 7:55 PM, Max Lynch  wrote:
> > I was trying to filter out all documents that HAVE that field.  I was
> trying
> > to delete any documents where that field had empty values.
> >
> > I just found a way to do it, but I did a range query on a string date in
> the
> > Lucene DateTools format and it worked, so I'm satisfied.  However, I
> believe
> > it worked because all of my documents have values for that field.
> >
> > Oh well.
> >
> > -max
> >
> > On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹)  >wrote:
> >
> >> Excuse me, what's the hyphen before the field name 'date_added_solr'?
> Is
> >> this some kind of new query format that I didn't know?
> >>
> >> -date_added_solr:[* TO *]'
> >>
> >> - Original Message -
> >> From: "Max Lynch" 
> >> To: 
> >> Sent: Thursday, August 26, 2010 6:12 AM
> >> Subject: Delete by query issue
> >>
> >>
> >> > Hi,
> >> > I am trying to delete all documents that have null values for a
> certain
> >> > field.  To that effect I can see all of the documents I want to delete
> by
> >> > doing this query:
> >> > -date_added_solr:[* TO *]
> >> >
> >> > This returns about 32,000 documents.
> >> >
> >> > However, when I try to put that into a curl call, no documents get
> >> deleted:
> >> > curl http://localhost:8985/solr/newsblog/update?commit=true -H
> >> > "Content-Type: text/xml" --data-binary
> >> > '<delete><query>-date_added_solr:[*
> >> > TO *]</query></delete>'
> >> >
> >> > Solr responds with:
> >> > <response>
> >> > <lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
> >> > </response>
> >> >
> >> > But nothing happens, even if I explicitly issue a commit afterward.
> >> >
> >> > Any ideas?
> >> >
> >> > Thanks.
> >> >
> >>
> >>
> >>
> >>
> 
> >>
> >>
> >>
> >> Checked by AVG - www.avg.com
> >> Version: 9.0.851 / Virus Database: 271.1.1/3093 - Release Date: 08/25/10
> >> 14:34:00
> >>
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
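
The same corrected query works from SolrJ as well; a small sketch (SolrJ 1.4,
URL taken from the curl call above) of deleting every document that has no
value in the field:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteMissingField {
    public static void main(String[] args) throws Exception {
        SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8985/solr/newsblog");
        // "*:* AND -field:[* TO *]" = everything, minus docs that have a value
        server.deleteByQuery("*:* AND -date_added_solr:[* TO *]");
        server.commit();
    }
}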


Duplicating a Solr Doc

2010-08-25 Thread Max Lynch
Right now I am doing some processing on my Solr index using Lucene Java.
Basically, I loop through the index in Java and do some extra processing of
each document (processing that is too intensive to do during indexing).

However, when I try to update the document in solr with new fields (using
SolrJ), the document either loses fields I don't explicitly set, or if I
have Solr-specific fields such as a solr "date" field type, I am not able to
copy the value as I can't read the value from Java.

Is there a way to add a field to a solr document without having to re-create
the document?  If not, how can I read the value of a Solr date in java?
Document.get("date_field") returns null even though the value shows up when
I access it through solr.  If I could read this value I could just copy the
fields from the Lucene Document to a SolrInputDocument.

Thanks.


Re: Delete by query issue

2010-08-25 Thread Max Lynch
I was trying to filter out all documents that HAVE that field.  I was trying
to delete any documents where that field had empty values.

I just found a way to do it, but I did a range query on a string date in the
Lucene DateTools format and it worked, so I'm satisfied.  However, I believe
it worked because all of my documents have values for that field.

Oh well.

-max

On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) wrote:

> Excuse me, what's the hyphen before the field name 'date_added_solr'? Is
> this some kind of new query format that I didn't know?
>
> -date_added_solr:[* TO *]'
>
> - Original Message -
> From: "Max Lynch" 
> To: 
> Sent: Thursday, August 26, 2010 6:12 AM
> Subject: Delete by query issue
>
>
> > Hi,
> > I am trying to delete all documents that have null values for a certain
> > field.  To that effect I can see all of the documents I want to delete by
> > doing this query:
> > -date_added_solr:[* TO *]
> >
> > This returns about 32,000 documents.
> >
> > However, when I try to put that into a curl call, no documents get
> deleted:
> > curl http://localhost:8985/solr/newsblog/update?commit=true -H
> > "Content-Type: text/xml" --data-binary
> > '<delete><query>-date_added_solr:[*
> > TO *]</query></delete>'
> >
> > Solr responds with:
> > <response>
> > <lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
> > </response>
> >
> > But nothing happens, even if I explicitly issue a commit afterward.
> >
> > Any ideas?
> >
> > Thanks.
> >
>
>
>
> 
>
>
>
> Checked by AVG - www.avg.com
> Version: 9.0.851 / Virus Database: 271.1.1/3093 - Release Date: 08/25/10
> 14:34:00
>


Delete by query issue

2010-08-25 Thread Max Lynch
Hi,
I am trying to delete all documents that have null values for a certain
field.  To that effect I can see all of the documents I want to delete by
doing this query:
-date_added_solr:[* TO *]

This returns about 32,000 documents.

However, when I try to put that into a curl call, no documents get deleted:
curl http://localhost:8985/solr/newsblog/update?commit=true -H
"Content-Type: text/xml" --data-binary '<delete><query>-date_added_solr:[*
TO *]</query></delete>'

Solr responds with:
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">364</int></lst>
</response>

But nothing happens, even if I explicitly issue a commit afterward.

Any ideas?

Thanks.


Re: Duplicate a core

2010-08-03 Thread Max Lynch
What I'm doing now is just adding the documents to the other core each night
and deleting old documents from the other core when I'm finished.  Is there
a better way?

On Tue, Aug 3, 2010 at 4:38 PM, Max Lynch  wrote:

> Is it possible to duplicate a core?  I want to have one core contain only
> documents within a certain date range (ex: 3 days old), and one core with
> all documents that have ever been in the first core.  The small core is then
> replicated to other servers which do "real-time" processing on it, but the
> "archive" core exists for longer term searching.
>
> I understand I could just connect to both cores from my indexer, but I
> would like to not have to send duplicate documents across the network to
> save bandwidth.
>
> Is this possible?
>
> Thanks.
>
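
For reference, the brute-force version of that (SolrJ 1.4; the core names,
date field and 3-day window are made up) is just adding every document to
both cores and pruning the small one, which is exactly the duplicate network
traffic the question is trying to avoid:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DualCoreIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer recent  = new CommonsHttpSolrServer("http://localhost:8983/solr/recent");
        SolrServer archive = new CommonsHttpSolrServer("http://localhost:8983/solr/archive");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "http://example.com/story-1");      // placeholder fields
        doc.addField("title", "example story");
        doc.addField("date_added", new java.util.Date());

        recent.add(doc);    // small, replicated core
        archive.add(doc);   // long-term "archive" core

        // nightly cleanup of the small core, using Solr date math
        recent.deleteByQuery("date_added:[* TO NOW-3DAYS]");
        recent.commit();
        archive.commit();
    }
}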


Duplicate a core

2010-08-03 Thread Max Lynch
Is it possible to duplicate a core?  I want to have one core contain only
documents within a certain date range (ex: 3 days old), and one core with
all documents that have ever been in the first core.  The small core is then
replicated to other servers which do "real-time" processing on it, but the
"archive" core exists for longer term searching.

I understand I could just connect to both cores from my indexer, but I would
like to not have to send duplicate documents across the network to save
bandwidth.

Is this possible?

Thanks.


Re: Know which terms are in a document

2010-07-29 Thread Max Lynch
Yea, I've had mild success with the highlighting approach with lucene, but
wasn't sure if there was another method available from solr.

Thanks Mike.

On Thu, Jul 29, 2010 at 5:17 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> This is a fairly frequently requested and missing feature in Lucene/Solr...
>
> Lucene actually "knows" this information while it's scoring each
> document; it's just that it in no way tries to record that.
>
> If you will only do this on a few documents (eg the one page of
> results) then piggybacking on the highlighter is an OK approach.
>
> If you need it on more docs than that, then probably you should
> customize how your queries are scored to also tally up which docs had
> which terms.
>
> Mike
>
> On Wed, Jul 28, 2010 at 6:53 PM, Max Lynch  wrote:
> > I would like to be able to search against my index, and then *know* which of a
> set
> > of given terms were found in each document.
> >
> > For example, let's say I want to show articles with the word "pizza" or
> > "cake" in them, but would like to be able to say which of those two was
> > found.  I might use this to handle the article differently if it is about
> > pizza, or if it is about cake.  I understand I can do multiple queries
> but I
> > would like to avoid that.
> >
> > One thought I had was to use a highlighter and only return a fragment
> with
> > the highlighted word, but I'm not sure how to do this with the various
> > highlighting options.
> >
> > Is there a way?
> >
> > Thanks.
> >
>
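
A rough sketch of the highlighter piggyback Mike describes, using the classic
Lucene contrib Highlighter (2.9-era API; the field name, analyzer and document
are placeholders) to see which of the query terms actually appear in a given
document:

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.util.Version;

public class WhichTermsMatched {
    // returns the best highlighted fragment, or null if neither term occurs
    static String matchedFragment(Document doc, Analyzer analyzer) throws Exception {
        Query query = new QueryParser(Version.LUCENE_29, "content", analyzer)
                .parse("pizza OR cake");
        Highlighter highlighter = new Highlighter(new QueryScorer(query));

        String text = doc.get("content");   // field must be stored
        TokenStream tokens = analyzer.tokenStream("content", new StringReader(text));

        // the fragment (with <B>...</B> around hits) tells you whether "pizza",
        // "cake", or both matched this particular document
        return highlighter.getBestFragment(tokens, text);
    }
}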


Know which terms are in a document

2010-07-28 Thread Max Lynch
I would like to be able to search against my index, and then *know* which of a set
of given terms were found in each document.

For example, let's say I want to show articles with the word "pizza" or
"cake" in them, but would like to be able to say which of those two was
found.  I might use this to handle the article differently if it is about
pizza, or if it is about cake.  I understand I can do multiple queries but I
would like to avoid that.

One thought I had was to use a highlighter and only return a fragment with
the highlighted word, but I'm not sure how to do this with the various
highlighting options.

Is there a way?

Thanks.


Re: CommonsHttpSolrServer add document hangs

2010-07-20 Thread Max Lynch
I'm still having trouble with this.  My program will run for a while, then
hang up at the same place.  Here is my add/commit process:

I am using StreamingUpdateSolrServer with queue size = 100 and num threads =
3.  My indexing process spawns 8 threads to process a subset of RSS feeds
which each thread then loops through.  Once a thread has processed a new
article, it constructs a new SolrInputDocument, creates a temporary
Collection containing just the one new document, then
calls server.add(docs).  I never call commit() or optimize() from my java
code (I did before though, but I took that out).

On the server side, I have these related settings:

[solrconfig.xml update/commit settings stripped by the mail archive; only the
values 300 and 1 survive]

I also have replication set up, as this is the master, here are the
settings:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

Those are the only extra settings I've set.  I also have a cron job running
every minute executing this command:
curl http://localhost:8985/solr/mycore/update -F stream.body='<commit/>'

Otherwise I don't see the numDocs number increase on the admin statistics
page.

This process will soon be ONLY for indexing.  Is there a better way to
optimize it?  I replicate from the slaves every 60 seconds, and I want
documents to be available to the slaves as soon as possible.  Currently I
have a search process that has some IndexSearchers on the Solr index (it's
a pure Lucene program), could that be causing issues?  This process never
opens an IndexWriter.

Thanks!
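
For reference, the client side of the setup described above boils down to
something like this (SolrJ 1.4; the URL and field names are placeholders):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FeedIndexer {
    public static void main(String[] args) throws Exception {
        // queue size 100, 3 background threads draining it, as described above
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8985/solr/mycore", 100, 3);

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "http://example.com/feed/item-1");   // placeholder fields
        doc.addField("content", "article text goes here");

        server.add(doc);   // no explicit commit; the cron'd <commit/> handles visibility
    }
}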


On Tue, Jul 13, 2010 at 10:52 AM, Max Lynch  wrote:

> Great, thanks!
>
>
> On Tue, Jul 13, 2010 at 2:55 AM, Fornoville, Tom  > wrote:
>
>> If you're only adding documents you can also have a go with
>> StreamingUpdateSolrServer instead of the CommonsHttpSolrServer.
>> Couple that with the suggestion of master/slave so the searches don't
>> interfere with the indexing and you should have a pretty responsive
>> system.
>>
>> -Original Message-
>> From: Robert Petersen [mailto:rober...@buy.com]
>> Sent: maandag 12 juli 2010 22:30
>> To: solr-user@lucene.apache.org
>> Subject: RE: CommonsHttpSolrServer add document hangs
>>
>> You could try a master slave setup using replication perhaps, then the
>> slave serves searches and indexing commits on the master won't hang up
>> searches at least...
>>
>> Here is the description:  http://wiki.apache.org/solr/SolrReplication
>>
>>
>> -Original Message-
>> From: Max Lynch [mailto:ihas...@gmail.com]
>> Sent: Monday, July 12, 2010 11:57 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: CommonsHttpSolrServer add document hangs
>>
>> Thanks Robert,
>>
>> My script did start going again, but it was waiting for about half an
>> hour
>> which seems a bit excessive to me.  Is there some tuning I can do on the
>> solr end to optimize for my use case, which is very heavy on commits and
>> very light on searches (I do most of my searches on the raw Lucene index
>> in
>> the background)?
>>
>> Thanks.
>>
>> On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen 
>> wrote:
>>
>> > Maybe solr is busy doing a commit or optimize?
>> >
>> > -Original Message-
>> > From: Max Lynch [mailto:ihas...@gmail.com]
>> > Sent: Monday, July 12, 2010 9:59 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: CommonsHttpSolrServer add document hangs
>> >
>> > Hey guys,
>> > I'm using Solr 1.4.1 and I've been having some problems lately with
>> code
>> > that adds documents through a CommonsHttpSolrServer.  It seems that
>> > randomly
>> > the call to theserver.add() will hang.  I am currently running my code
>> > in a
>> > single thread, but I noticed this would happen in multi threaded code
>> as
>> > well.  The jar version of commons-httpclient is 3.1.
>> >
>> > I got a thread dump of the process, and one thread seems to be waiting
>> > on
>> > the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager
>> as
>> > shown below.  All other threads are in a RUNNABLE state (besides the
>> > Finalizer daemon).
>> >
>> > [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM
>> (16.3-b01
>> > mixed mode):
>> > [java]
>> > [java] "MultiThreadedHttpConnectionManager cleanup" daemon prio=10
>> > tid=0x7f441051c800 nid=0x527c in Object.wait()
>> [0x7f4417e2f000]
>> > [java]java.lang.Thread.State: WAITING (on object monitor)
>> > [java] at java.lang.Object.wait(Native Method)
>> > [java] - waiting on <0x7f443ae5b290> (a
>> > java.lang.ref.ReferenceQueue$Lock)
>> > [java] at
>> > java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
>> > [java] - locked <0x7f443ae5b290> (a
>> > java.lang.ref.ReferenceQueue$Lock)
>> > [java] at
>> > java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
>> > [java] at
>> >
>> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen
>> > ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)
>> >
>> > Any ideas?
>> >
>> > Thanks.
>> >
>>
>
>


Re: CommonsHttpSolrServer add document hangs

2010-07-13 Thread Max Lynch
Great, thanks!

On Tue, Jul 13, 2010 at 2:55 AM, Fornoville, Tom
wrote:

> If you're only adding documents you can also have a go with
> StreamingUpdateSolrServer instead of the CommonsHttpSolrServer.
> Couple that with the suggestion of master/slave so the searches don't
> interfere with the indexing and you should have a pretty responsive
> system.
>
> -Original Message-
> From: Robert Petersen [mailto:rober...@buy.com]
> Sent: maandag 12 juli 2010 22:30
> To: solr-user@lucene.apache.org
> Subject: RE: CommonsHttpSolrServer add document hangs
>
> You could try a master slave setup using replication perhaps, then the
> slave serves searches and indexing commits on the master won't hang up
> searches at least...
>
> Here is the description:  http://wiki.apache.org/solr/SolrReplication
>
>
> -Original Message-
> From: Max Lynch [mailto:ihas...@gmail.com]
> Sent: Monday, July 12, 2010 11:57 AM
> To: solr-user@lucene.apache.org
> Subject: Re: CommonsHttpSolrServer add document hangs
>
> Thanks Robert,
>
> My script did start going again, but it was waiting for about half an
> hour
> which seems a bit excessive to me.  Is there some tuning I can do on the
> solr end to optimize for my use case, which is very heavy on commits and
> very light on searches (I do most of my searches on the raw Lucene index
> in
> the background)?
>
> Thanks.
>
> On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen 
> wrote:
>
> > Maybe solr is busy doing a commit or optimize?
> >
> > -Original Message-
> > From: Max Lynch [mailto:ihas...@gmail.com]
> > Sent: Monday, July 12, 2010 9:59 AM
> > To: solr-user@lucene.apache.org
> > Subject: CommonsHttpSolrServer add document hangs
> >
> > Hey guys,
> > I'm using Solr 1.4.1 and I've been having some problems lately with
> code
> > that adds documents through a CommonsHttpSolrServer.  It seems that
> > randomly
> > the call to theserver.add() will hang.  I am currently running my code
> > in a
> > single thread, but I noticed this would happen in multi threaded code
> as
> > well.  The jar version of commons-httpclient is 3.1.
> >
> > I got a thread dump of the process, and one thread seems to be waiting
> > on
> > the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager
> as
> > shown below.  All other threads are in a RUNNABLE state (besides the
> > Finalizer daemon).
> >
> > [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM
> (16.3-b01
> > mixed mode):
> > [java]
> > [java] "MultiThreadedHttpConnectionManager cleanup" daemon prio=10
> > tid=0x7f441051c800 nid=0x527c in Object.wait()
> [0x7f4417e2f000]
> > [java]java.lang.Thread.State: WAITING (on object monitor)
> > [java] at java.lang.Object.wait(Native Method)
> > [java] - waiting on <0x7f443ae5b290> (a
> > java.lang.ref.ReferenceQueue$Lock)
> > [java] at
> > java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
> > [java] - locked <0x7f443ae5b290> (a
> > java.lang.ref.ReferenceQueue$Lock)
> > [java] at
> > java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
> > [java] at
> >
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen
> > ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)
> >
> > Any ideas?
> >
> > Thanks.
> >
>


Re: CommonsHttpSolrServer add document hangs

2010-07-12 Thread Max Lynch
Thanks Robert,

My script did start going again, but it was waiting for about half an hour
which seems a bit excessive to me.  Is there some tuning I can do on the
solr end to optimize for my use case, which is very heavy on commits and
very light on searches (I do most of my searches on the raw Lucene index in
the background)?

Thanks.

On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen  wrote:

> Maybe solr is busy doing a commit or optimize?
>
> -Original Message-
> From: Max Lynch [mailto:ihas...@gmail.com]
> Sent: Monday, July 12, 2010 9:59 AM
> To: solr-user@lucene.apache.org
> Subject: CommonsHttpSolrServer add document hangs
>
> Hey guys,
> I'm using Solr 1.4.1 and I've been having some problems lately with code
> that adds documents through a CommonsHttpSolrServer.  It seems that
> randomly
> the call to theserver.add() will hang.  I am currently running my code
> in a
> single thread, but I noticed this would happen in multi threaded code as
> well.  The jar version of commons-httpclient is 3.1.
>
> I got a thread dump of the process, and one thread seems to be waiting
> on
> the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as
> shown below.  All other threads are in a RUNNABLE state (besides the
> Finalizer daemon).
>
> [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01
> mixed mode):
> [java]
> [java] "MultiThreadedHttpConnectionManager cleanup" daemon prio=10
> tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
> [java]java.lang.Thread.State: WAITING (on object monitor)
> [java] at java.lang.Object.wait(Native Method)
> [java] - waiting on <0x7f443ae5b290> (a
> java.lang.ref.ReferenceQueue$Lock)
> [java] at
> java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
> [java] - locked <0x7f443ae5b290> (a
> java.lang.ref.ReferenceQueue$Lock)
> [java] at
> java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
> [java] at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$Referen
> ceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)
>
> Any ideas?
>
> Thanks.
>


CommonsHttpSolrServer add document hangs

2010-07-12 Thread Max Lynch
Hey guys,
I'm using Solr 1.4.1 and I've been having some problems lately with code
that adds documents through a CommonsHttpSolrServer.  It seems that randomly
the call to theserver.add() will hang.  I am currently running my code in a
single thread, but I noticed this would happen in multi threaded code as
well.  The jar version of commons-httpclient is 3.1.

I got a thread dump of the process, and one thread seems to be waiting on
the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as
shown below.  All other threads are in a RUNNABLE state (besides the
Finalizer daemon).

 [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01
mixed mode):
 [java]
 [java] "MultiThreadedHttpConnectionManager cleanup" daemon prio=10
tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
 [java]java.lang.Thread.State: WAITING (on object monitor)
 [java] at java.lang.Object.wait(Native Method)
 [java] - waiting on <0x7f443ae5b290> (a
java.lang.ref.ReferenceQueue$Lock)
 [java] at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 [java] - locked <0x7f443ae5b290> (a
java.lang.ref.ReferenceQueue$Lock)
 [java] at
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 [java] at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

Any ideas?

Thanks.


MailEntityProcessor class cast exception

2010-06-16 Thread Max Lynch
With last night's build of solr, I am trying to use the MailEntityProcessor
to index an email account.  However, when I call my dataimport url, I
receive a class cast exception:

INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=44
Jun 16, 2010 8:16:03 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
WARNING: Unable to read: dataimport.properties
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Jun 16, 2010 8:16:03 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1

 
commit{dir=/home/m/g/spider/misc/solrindex_nl/index,segFN=segments_1,version=1276738117525,generation=1,filenames=[segments_1]
Jun 16, 2010 8:16:03 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1276738117525
Jun 16, 2010 8:16:03 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
load EntityProcessor implementation for entity:99544078513223 Processing
Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:804)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:535)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:260)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:184)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:392)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:373)
Caused by: java.lang.ClassCastException:
org.apache.solr.handler.dataimport.MailEntityProcessor cannot be cast to
org.apache.solr.handler.dataimport.EntityProcessor
at
org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:801)
... 6 more
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback

Here is my dataimport part of my solrconfig.xml:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/max/packages/apache-solr-4.0-2010-06-16_08-05-33/e/solr/conf/data-config.xml</str>
  </lst>
</requestHandler>

and my data-config.xml:

<dataConfig>
  <document>
    <entity processor="MailEntityProcessor" .../>
  </document>
</dataConfig>

I did try to rebuild the solr nightly, but I still receive the same error.
 I have all of the required jars (AFAIK) in my application's lib folder.

Any ideas?

Thanks.