cleanup after OutOfMemoryError

2013-09-04 Thread Ryan McKinley
I have an application where I am calling DirectUpdateHandler2 directly with:

  update.addDoc(cmd);

This will sometimes hit:

java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
at org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
at org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)

and then a little while later:

auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)


Is there anything I can/should do to clean up after the OOME?  At a minimum
I do not want any new requests using the same IndexWriter.  Should I use:


  catch(OutOfMemoryError ex) {
    update.getCommitTracker().cancelPendingCommit();
    update.newIndexWriter(false);
    ...

or perhaps 'true' for rollback?

Thanks
Ryan


NRT persistent flags?

2013-03-13 Thread Ryan McKinley
I'm looking for a way to quickly flag/unflag documents.

This could be one at a time or by query (even *:*)

I have hacked together something based on ExternalFileField that is
essentially an FST holding all the ids (solr not lucene).  Like the
FieldCache, it holds a WeakHashMap<AtomicReader,OpenBitSet> where the
OpenBitSet is loaded by iterating the FST on the reader (just like
ExternalFileField)
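
A minimal sketch of that caching pattern (illustration only, not the actual
code; Lucene 4.x-era types assumed, and the Loader interface is a hypothetical
stand-in for the FST iteration):

  import java.io.IOException;
  import java.util.Collections;
  import java.util.Map;
  import java.util.WeakHashMap;
  import org.apache.lucene.index.AtomicReader;
  import org.apache.lucene.util.OpenBitSet;

  public class FlagCache {
    public interface Loader {
      OpenBitSet load(AtomicReader reader) throws IOException;
    }

    // readers are held weakly, so entries go away when a reader is closed
    private final Map<AtomicReader, OpenBitSet> cache =
        Collections.synchronizedMap(new WeakHashMap<AtomicReader, OpenBitSet>());
    private final Loader loader;

    public FlagCache(Loader loader) { this.loader = loader; }

    public OpenBitSet get(AtomicReader reader) throws IOException {
      OpenBitSet bits = cache.get(reader);
      if (bits == null) {          // load once per reader, like ExternalFileField
        bits = loader.load(reader);
        cache.put(reader, bits);
      }
      return bits;
    }
  }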

This seems to work OK, but there *must* be something better!

Any ideas on the right approach for something like this?  This feels like
it should be related to DocValues or the FieldCache

Thanks for any pointers!

ryan


edismax bq, ignore tf/idf?

2012-10-26 Thread Ryan McKinley
Hi-

I am trying to add a setting that will boost results based on
existence in different buckets.  Using edismax, I added the bq
parameter:

location:A^5 location:B^3

I want this to put everything in location A above everything in
location B.  This mostly works, BUT depending on the number of matches
for each location, location:B can get a higher final score.

Is there a way to ignore tf/idf when boosting this location?

location from a field type:
 class="solr.StrField" omitNorms="true"


Thanks for any pointers!

ryan


Re: edismax bq, ignore tf/idf?

2012-10-26 Thread Ryan McKinley
thanks!


On Fri, Oct 26, 2012 at 4:20 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 : How about a boost function, bf or boost?
 :
 : bf=if(exists(query(location:A)),5,if(exists(query(location:B)),3,0))

 Right ... assuming you only want to ignore tf/idf on these fields in this
 specific context, function queries are the way to go -- otherwise you could
 just use a per-field similarity to ignore tf/idf.

 I would suggest however that instead of using the exists(query())
 consider the tf() function ...

 bf=if(tf(location,A),5,0)&bf=if(tf(location,B),3,0)

 s/bf/boost/g  s/0/1/g if you want multiplicative boosts.
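
The same thing expressed through SolrJ would look roughly like this (a sketch,
not from the original thread; the query string is a placeholder and the
field/bucket names match the question above):

  SolrQuery q = new SolrQuery("some user query");
  q.set("defType", "edismax");
  q.add("bf", "if(tf(location,A),5,0)");
  q.add("bf", "if(tf(location,B),3,0)");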


 -Hoss


Re: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

2012-08-25 Thread Ryan McKinley
If you optimize the index, are the results the same?

maybe it is showing counts for deleted docs (i think it does... and
this is expected)

ryan


On Sat, Aug 25, 2012 at 9:57 AM, Fuad Efendi f...@efendi.ca wrote:

 This is bug in Solr 4.0.0-Beta Schema Browser: Load Term Info shows 9682
 News, but direct query shows 3577.

 /solr/core0/select?q=channel:News&facet=true&facet.field=channel&rows=0

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
 <lst name="params">
 <str name="facet">true</str>
 <str name="q">channel:News</str>
 <str name="facet.field">channel</str>
 <str name="rows">0</str>
 </lst>
 </lst>
 <result name="response" numFound="3577" start="0"/>
 <lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
 <lst name="channel">
 <int name="News">3577</int>
 <int name="Blogs">0</int>
 <int name="Message Boards">0</int>
 <int name="Video">0</int>
 </lst>
 </lst>
 <lst name="facet_dates"/>
 <lst name="facet_ranges"/>
 </lst>
 </response>


 -Original Message-
 Sent: August-24-12 11:29 PM
 To: solr-user@lucene.apache.org
 Cc: sole-...@lucene.apache.org
 Subject: RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser
 Importance: High

 Any news?
 CC: Dev


 -Original Message-
 Subject: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

 Hi there,

 Load term Info shows 3650 for a specific term MyTerm, and when I execute
 query channel:MyTerm it shows 650 documents found… possibly a bug… it
 happens after I commit data too, nothing changes; and this field is a
 single-valued non-tokenized string.

 -Fuad

 --
 Fuad Efendi
 416-993-2060
 http://www.tokenizer.ca





 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ContentStreamUpdateRequest method addFile in 4.0 release.

2012-06-08 Thread Ryan McKinley
for the ExtractingRequestHandler, you can put anything into the
request contentType.

try:
addFile( file, "application/octet-stream" )

but anything should work

ryan
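
A sketch of the wiki example below adjusted to the new two-argument signature
(the file name, handler path and server variable come from that example; the
content type is just the generic placeholder suggested above):

  ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
  up.addFile(new File("mailing_lists.pdf"), "application/octet-stream");
  up.setParam("literal.id", "mailing_lists.pdf");
  up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
  NamedList<Object> result = server.request(up);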




On Thu, Jun 7, 2012 at 2:32 PM, Koorosh Vakhshoori
kvakhsho...@gmail.com wrote:
 In latest 4.0 release, the addFile() method has a new argument 'contentType':

 addFile(File file, String contentType)

 In context of Solr Cell how should addFile() method be called? Specifically
 I refer to the Wiki example:

 ContentStreamUpdateRequest up = new
 ContentStreamUpdateRequest("/update/extract");
 up.addFile(new File("mailing_lists.pdf"));
 up.setParam("literal.id", "mailing_lists.pdf");
 up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 result = server.request(up);
 assertNotNull("Couldn't upload mailing_lists.pdf", result);
 rsp = server.query( new SolrQuery( "*:*") );
 Assert.assertEquals( 1, rsp.getResults().getNumFound() );

 given at URL: http://wiki.apache.org/solr/ExtractingRequestHandler

 Since Solr Cell is calling Tika under the hood, isn't the file
 content-type already identified by Tika? Looking at the code, it seems
 passing NULL would do the job, is that correct? Also, for Solr Cell, is the
 ContentStreamUpdateRequest class the right one to use, or is there a
 different class that is more appropriate here?

 Thanks


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/ContentStreamUpdateRequest-method-addFile-in-4-0-release-tp3988344.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-15 Thread Ryan McKinley
In 4.0, solr no longer uses JSP, so it is not enabled in the example setup.

You can enable JSP in your servlet container using whatever method
they provide.  For Jetty, using start.jar, you need to add the command
line: java -jar start.jar -OPTIONS=jsp

ryan



On Mon, May 14, 2012 at 2:34 PM, Naga Vijayapuram nvija...@tibco.com wrote:
 Hello,

 How do I enable JSP support in Solr 4.0 ?

 Thanks
 Naga


Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-15 Thread Ryan McKinley
just use the admin UI -- look at the 'cloud' tab


On Tue, May 15, 2012 at 12:53 PM, Naga Vijayapuram nvija...@tibco.com wrote:
 Alright; thanks.  Tried with -OPTIONS=jsp and am still seeing this on
 console …

 2012-05-15 12:47:08.837:INFO:solr:No JSP support.  Check that JSP jars are
 in lib/jsp and that the JSP option has been specified to start.jar

 I am trying to go after
 http://localhost:8983/solr/collection1/admin/zookeeper.jsp (or its
 equivalent in 4.0) after going through
 http://wiki.apache.org/solr/SolrCloud

 May I know the right zookeeper url in 4.0 please?

 Thanks
 Naga


 On 5/15/12 10:56 AM, Ryan McKinley ryan...@gmail.com wrote:

In 4.0, solr no longer uses JSP, so it is not enabled in the example
setup.

You can enable JSP in your servlet container using whatever method
they provide.  For Jetty, using start.jar, you need to add the command
line: java -jar start.jar -OPTIONS=jsp

ryan



On Mon, May 14, 2012 at 2:34 PM, Naga Vijayapuram nvija...@tibco.com
wrote:
 Hello,

 How do I enable JSP support in Solr 4.0 ?

 Thanks
 Naga



Re: syntax for negative query OR something

2012-05-02 Thread Ryan McKinley
thanks!



On Wed, May 2, 2012 at 4:43 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : How do I search for things that have no value or a specified value?

 Things with no value...
        (*:* -fieldName:[* TO *])
 Things with a specific value...
        fieldName:A
 Things with no value or a specific value...
        (*:* -fieldName:[* TO *]) fieldName:A
 ...or if you aren't using OR as your default op
        (*:* -fieldName:[* TO *]) OR fieldName:A

 : I have a few variations of:
 : -fname:[* TO *] OR fname:(A B C)

 that is just syntactic sugar for...
        -fname:[* TO *] fname:(A B C)

 which is an empty set.

 you need to be explicit that the "exclude docs with a value in this field"
 clause should be applied to the set of all documents


 -Hoss


Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?

2012-05-01 Thread Ryan McKinley
check a release since r1332752

If things still look problematic, post a comment on:
https://issues.apache.org/jira/browse/SOLR-3426

this should now have a less verbose message with an older SLF4j and with Log4j


On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa gopalpa...@gmail.com wrote:
 I have similar issue using log4j for logging with trunk build, the
 CoreContainer class prints a big stack trace on our jboss 4.2.2 startup, I am
 using slf4j 1.5.2

 10:07:45,918 WARN  [CoreContainer] Unable to read SLF4J version
 java.lang.NoSuchMethodError:
 org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355)
 at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)



 On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.comwrote:

 On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com
 wrote:
  There is a recent JIRA issue about keeping the last n logs to display in
 the admin UI.
 
  That introduced a problem - and then the fix introduced a problem - and
 then the fix mitigated the problem but left that ugly logging as a by
 product.
 
  Don't remember the issue # offhand. I think there was a dispute about
 what should be done with it.
 
  On May 1, 2012, at 11:14 AM, Benson Margulies wrote:
 
  CoreContainer.java, in the method 'load', finds itself calling
  loader.NewInstance with an 'fname' of Log4j if the slf4j backend is
  'Log4j'.

 Couldn't someone just fix the if statement to say, 'OK, if we're doing
 log4j, we have no log watcher' and skip all the loud failing on the
 way?



 
  e.g.:
 
  2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer  - Unable
  to load LogWatcher
  org.apache.solr.common.SolrException: Error loading class 'Log4j'
 
  What is it actually looking for? Have I misplaced something?
 
  - Mark Miller
  lucidimagination.com


Re: Ampersand issue

2012-05-01 Thread Ryan McKinley
If your json value is &amp; the proper xml value is &amp;amp;

What is the value you are setting on the stored field?  is it & or &amp;?


On Mon, Apr 30, 2012 at 12:57 PM, William Bell billnb...@gmail.com wrote:
 One idea was to wrap the field with CDATA. Or base64 encode it.



 On Fri, Apr 27, 2012 at 7:50 PM, Bill Bell billnb...@gmail.com wrote:
 We are indexing a simple XML field from SQL Server into Solr as a stored
 field. We have noticed that the &amp; is output as &amp;amp; when using
 wt=XML. When using wt=JSON we get the normal &amp;. Is there a way to
 indicate that we don't want to encode the field since it is already XML when
 using wt=XML ?

 Bill Bell
 Sent from mobile




 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076


Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-26 Thread Ryan McKinley
In general -- i would not suggest mixing EmbeddedSolrServer with a
different style (unless the other instances are read only).  If you
have multiple instances writing to the same files on disk you are
asking for problems.

Have you tried just using StreamingUpdateSolrServer for daily update?
I would suspect that it would be faster than EmbeddedSolrServer
anyway.

ryan



On Wed, Apr 25, 2012 at 11:32 PM, pcrao purn...@gmail.com wrote:
 Hi,

 Any more thoughts??

 Thanks,
 PC Rao.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3940383.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boosting fields in SOLR using Solrj

2012-04-26 Thread Ryan McKinley
I would suggest debugging with browser requests -- then switching to
Solrj after you are at 1st base.

In particular, try adding the debugQuery=true parameter to the
request and see what solr thinks is happening.

The value that will work for the 'qt' parameter depends on what is
configured in solrconfig.xml -- I suspect you want to point to a
requestHandler that is configured to use edismax query parser.  This
can be configured by default with:

<lst name="defaults">
  <str name="defType">edismax</str>
</lst>
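
A request-time alternative, sketched with SolrJ (not from the original mail;
it assumes the same server and field names as the question below):

  SolrQuery query = new SolrQuery("apples oranges");
  query.set("defType", "edismax");
  query.set("qf", "title^10.0 content^1.0");
  query.set("debugQuery", "true");   // see what solr thinks is happening
  QueryResponse rsp = m_Server.query(query);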

ryan


On Wed, Apr 25, 2012 at 3:57 PM, Joe joe.pol...@gmail.com wrote:
 Hi,

 I'm using the solrj API to query my SOLR 3.6 index. I have multiple text
 fields, which I would like to weight differently. From what I've read, I
 should be able to do this using the dismax or edismax query types. I've
 tried the following:

 SolrQuery query = new SolrQuery();
 query.setQuery( "title:apples oranges content:apples oranges" );
 query.setQueryType("edismax");
 query.set("qf", "title^10.0 content^1.0");
 QueryResponse rsp = m_Server.query( query );

 But this doesn't work. I've tried the following variations to set the query
 type, but it doesn't seem to make a difference.

 query.setQueryType("dismax");
 query.set("qt", "dismax");
 query.set("type", "edismax");
 query.set("qt", "edismax");
 query.set("type", "dismax");

 I'd like to retain the full Lucene query syntax, so I prefer ExtendedDisMax
 to DisMax. Boosting individual terms in the query (as shown below) does
 work, but is not a valid solution, since the queries are automatically
 generated and can get arbitrarily complex in syntax.

 query.setQuery( "title:apples^10.0 oranges^10.0 content:apples oranges" );

 Any help would be much appreciated.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Boosting-fields-in-SOLR-using-Solrj-tp3939789p3939789.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: 'No JSP support' error in embedded Jetty for solrCloud as of apache-solr-4.0-2012-04-02_11-54-55

2012-04-09 Thread Ryan McKinley
zookeeper.jsp was removed (along with all JSP stuff) in trunk

Take a look at the cloud tab in the UI, or check the /zookeeper
servlet for the JSON raw output

ryan


On Mon, Apr 9, 2012 at 6:42 AM, Benson Margulies bimargul...@gmail.com wrote:
 Starting the leader with:

  java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=rnicloud
 -DzkRun -DnumShards=3 -Djetty.port=9167  -jar start.jar

 and browsing to

 http://localhost:9167/solr/rnicloud/admin/zookeeper.jsp

 I get:

 HTTP ERROR 500

 Problem accessing /solr/rnicloud/admin/zookeeper.jsp. Reason:

    JSP support not configured
 Powered by Jetty://


Re: SolrCloud Zookeeper view does not work on latest snapshot

2012-04-06 Thread Ryan McKinley
There have been a bunch of changes getting the zookeeper info and UI
looking good.  The info moved from being on the core to using a
servlet at the root level.

Note, it is not a request handler anymore, so the wt=XXX has no
effect.  It is always JSON

ryan


On Fri, Apr 6, 2012 at 7:01 AM, Jamie Johnson jej2...@gmail.com wrote:
 I looked at our old system and indeed it used to make a call to
 /solr/zookeeper not /solr/corename/zookeeper.  I am making a change
  locally so I can run with this but is this a bug or did I muck
  something up with my configuration?

 On Fri, Apr 6, 2012 at 9:33 AM, Jamie Johnson jej2...@gmail.com wrote:
 I just downloaded the latest snapshot and fired it up to take a look
 around and I'm getting the following error when looking at the Cloud
 view.

 Loading of undefined failed with HTTP-Status 404

 The request I see going out is as follows

 http://localhost:8501/solr/slice1_shard1/zookeeper?wt=json

 this doesn't work but this does

 http://localhost:8501/solr/zookeeper?wt=json

 Any thoughts why this would happen?


Re: solr geospatial / spatial4j

2012-03-08 Thread Ryan McKinley
On Wed, Mar 7, 2012 at 7:25 AM, Matt Mitchell goodie...@gmail.com wrote:
 Hi,

 I'm researching options for handling a better geospatial solution. I'm
 currently using Solr 3.5 for a read-only database, and the
 point/radius searches work great. But I'd like to start doing point in
 polygon searches as well. I've skimmed through some of the geospatial
  jira issues, and read about spatial4j, which is very interesting. I
 see on the github page that this will soon be part of lucene, can
 anyone confirm this?

perhaps -- see the discussion on:
https://issues.apache.org/jira/browse/LUCENE-3795

This will involve a few steps before it is actually integrated with
the lucene project -- and then a few more to be usable from solr


 I attempted to build the spatial4j demo but no luck. It had problems
 finding lucene 4.0-SNAPSHOT, which I guess is because there are no
 4.0-SNAPSHOT nightly builds? If anyone knows how I can get around
 this, please let me know!


ya they are published -- you just have to specify where you want to
pull them from.  If you use the 'updateLucene' profile, it will pull
them from:  https://repository.apache.org/content/groups/snapshots/

use:  mvn clean install -P updateLucene


 Other than spatial4j, is there a way to do point in polgyon searches
 with solr 3.5.0 right now? Is there some tricky indexing/querying
 strategy that would allow this?


I don't know of anything else -- and note that polygon stuff has a
ways to go before it is generally ready for prime-time.

ryan


Re: Improving performance for SOLR geo queries?

2012-02-08 Thread Ryan McKinley
Hi Matthias-

I'm trying to understand how you have your data indexed so we can give
reasonable direction.

What field type are you using for your locations?  Is it using the
solr spatial field types?  What do you see when you look at the debug
information from debugQuery=true?

From my experience, there is no single best practice for spatial
queries -- it will depend on your data density and distribution.

You may also want to look at:
http://code.google.com/p/lucene-spatial-playground/
but note this is off lucene trunk -- the geohash queries are super fast though

ryan




2012/2/8 Matthias Käppler matth...@qype.com:
 Hi Erick,

 if we're not doing geo searches, we filter by location tags that we
 attach to places. This is simply a hierarchical regional id, which is
 simple to filter for, but much less flexible. We use that on Web a
 lot, but not on mobile, where we want to perform searches in
 arbitrary radii around arbitrary positions. For those location tag
 kind of queries, the average time spent in SOLR is 43msec (I'm looking
 at the New Relic snapshot of the last 12 hours). I have disabled our
 optimization again just yesterday, so for the bbox queries we're now
 at an avg of 220ms (same time window). That's a 5 fold increase in
 response time, and in peak hours it's worse than that.

 I've also found a blog post from 3 years ago which outlines the inner
 workings of the SOLR spatial indexing and searching:
 http://www.searchworkings.org/blog/-/blogs/23842
 From that it seems as if SOLR already performs a similar optimization
 we had in mind during the index step, so if I understand correctly, it
 doesn't even search over all records, only those that were mapped to
 the grid box identified during indexing.

 What I would love to see is what the suggested way is to perform a geo
 query on SOLR, considering that they're so difficult to cache and
 expensive to run. Is the best approach to restrict the candidate set
 as much as possible using cheap filter queries, so that SOLR merely
 has to do the geo search against these subsets? How does the query
 planner work here? I see there's a cost attached to a filter query,
 but one can only set it when cache is set to false? Are cached geo
 queries executed last when there are cheaper filter queries to cut
 down on documents? If you have a real world practical setup to share,
 one that performs well in a production environment that serves
 requests in the Millions per day, that would be great.

 I'd love to contribute documentation by the way, if you knew me you'd
 know I'm an avid open source contributor and actually run several open
 source projects myself. But tell me, how can I possibly contribute
 answer to questions I don't have an answer to? That's why I'm here,
 remember :) So please, these kinds of snippy replies are not helping
 anyone.

 Thanks
 -Matthias

 On Tue, Feb 7, 2012 at 3:06 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 So the obvious question is what is your
 performance like without the distance filters?

 Without that knowledge, we have no clue whether
 the modifications you've made had any hope of
 speeding up your response times

 As for the docs, any improvements you'd like to
 contribute would be happily received

 Best
 Erick

 2012/2/6 Matthias Käppler matth...@qype.com:
 Hi,

 we need to perform fast geo lookups on an index of ~13M places, and
 were running into performance problems here with SOLR. We haven't done
 a lot of query optimization / SOLR tuning up until now so there's
 probably a lot of things we're missing. I was wondering if you could
 give me some feedback on the way we do things, whether they make
 sense, and especially why a supposed optimization we implemented
 recently seems to have no effect, when we actually thought it would
 help a lot.

 What we do is this: our API is built on a Rails stack and talks to
 SOLR via a Ruby wrapper. We have a few filters that almost always
 apply, which we put in filter queries. Filter cache hit rate is
 excellent, about 97%, and cache size caps at 10k filters (max size is
 32k, but it never seems to reach that many, probably because we
 replicate / delta update every few minutes). Still, geo queries are
 slow, about 250-500msec on average. We send them with cache=false, so
 as to not flood the fq cache and cause undesirable evictions.

 Now our idea was this: while the actual geo queries are poorly
 cacheable, we could clearly identify geographical regions which are
 more often queried than others (naturally, since we're a user driven
 service). Therefore, we dynamically partition Earth into a static grid
 of overlapping boxes, where the grid size (the distance of the nodes)
 depends on the maximum allowed search radius. That way, for every user
 query, we would always be able to identify a single bounding box that
 covers it. This larger bounding box (200km edge length) we would send
 to SOLR as a cached filter query, along with the actual user query
 

Best approach to Intersect results with big Set<String>?

2011-09-01 Thread Ryan McKinley
I have an application where I need to return all results that are not
in a Set<String>  (the Set is managed from hazelcast... but that is
not relevant)

As a first approach, I have a SearchComponent that injects a BooleanQuery:

  BooleanQuery bq = new BooleanQuery(true);
  for( String id : ids) {
    bq.add(new BooleanClause(new TermQuery(new Term("id", id)), Occur.MUST_NOT));
  }

This works, but i'm concerned about how many terms we could end up
with as the size grows.

Another possibility could be a Filter that iterates through the FieldCache
and checks if each value is in the Set<String>
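
A rough sketch of what that Filter could look like (Lucene 3.x-era APIs; the
"id" field name and the externally managed Set are assumptions, not code from
the original mail):

  import java.io.IOException;
  import java.util.Set;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.DocIdSet;
  import org.apache.lucene.search.FieldCache;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.util.OpenBitSet;

  public class NotInSetFilter extends Filter {
    private final Set<String> excluded;

    public NotInSetFilter(Set<String> excluded) {
      this.excluded = excluded;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
      String[] vals = FieldCache.DEFAULT.getStrings(reader, "id");
      OpenBitSet bits = new OpenBitSet(reader.maxDoc());
      for (int doc = 0; doc < vals.length; doc++) {
        // keep only docs whose id is NOT in the excluded set
        if (vals[doc] != null && !excluded.contains(vals[doc])) {
          bits.set(doc);
        }
      }
      return bits;
    }
  }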

Any thoughts/directions on things to look at?

thanks
ryan


Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-05 Thread Ryan McKinley

 Ah, thanks Hoss - I had meant to respond to the original email, but
 then I lost track of it.

 Via pseudo-fields, we actually already have the ability to retrieve
 values via FieldCache.
 fl=id:{!func}id

 But using CSF would probably be better here - no memory overhead for
 the FieldCache entry.


Not sure if this is related, but we should also consider using the
memory codec for id field
https://issues.apache.org/jira/browse/LUCENE-3209


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Ryan McKinley
patches are always welcome!


On Tue, Jul 5, 2011 at 3:04 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro per.new...@gmx.ch wrote:
 i've tried to add the params for group=true and group.field=myfield by using
 the SolrQuery.
 But the result is null. Do i have to configure something? In wiki part for
 field collapsing i couldn't
 find anything.

 No specific (type-safe) support for grouping is in SolrJ currently.
 But you should still have access to the complete generic solr response
 via SolrJ regardless (i.e. use getResponse())
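
 A sketch of what that generic access might look like (not from the original
 mail; it assumes group=true/group.field were set on the request, and the
 exact NamedList layout depends on the grouping params):

   QueryResponse rsp = server.query(query);
   NamedList<Object> raw = rsp.getResponse();
   NamedList<?> grouped = (NamedList<?>) raw.get("grouped");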

 -Yonik
 http://www.lucidimagination.com



Re: JOIN, query on the parent?

2011-07-01 Thread Ryan McKinley
On Fri, Jul 1, 2011 at 9:06 AM, Yonik Seeley yo...@lucidimagination.com wrote:
 On Thu, Jun 30, 2011 at 6:19 PM, Ryan McKinley ryan...@gmail.com wrote:
 Hello-

 I'm looking for a way to find all the links from a set of results.  Consider:

 <doc>
  id:1
  type:X
  link:a
  link:b
 </doc>

 <doc>
  id:2
  type:X
  link:a
  link:c
 </doc>

 <doc>
  id:3
  type:Y
  link:a
 </doc>

 Is there a way to search for all the links from stuff of type X -- in
 this case (a,b,c)

 Do the links point to other documents somehow?
 Let's assume that there are documents with ids of a,b,c

 fq={!join from=link to=id}type:X

 Basically, you start with the set of documents that match type:X, then
 follow from link to id to arrive at the new set of documents.
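
 Set through SolrJ, that filter would look something like this (a sketch, not
 from the original thread):

   SolrQuery query = new SolrQuery("*:*");
   query.addFilterQuery("{!join from=link to=id}type:X");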


Yup -- that works.  Thank you!

ryan


JOIN, query on the parent?

2011-06-30 Thread Ryan McKinley
Hello-

I'm looking for a way to find all the links from a set of results.  Consider:

<doc>
 id:1
 type:X
 link:a
 link:b
</doc>

<doc>
 id:2
 type:X
 link:a
 link:c
</doc>

<doc>
 id:3
 type:Y
 link:a
</doc>

Is there a way to search for all the links from stuff of type X -- in
this case (a,b,c)

If I'm understanding the {!join stuff, it lets you search on the
children, but i don't really see how to limit the parent values.

Am I missing something, or is this a further extension to the JoinQParser?


thanks
ryan


Re: Solr: Images, Docs and Binary data

2011-04-06 Thread Ryan McKinley
You can store binary data using a binary field type -- then you need
to send the data base64 encoded.
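
A sketch of what that might look like from SolrJ (field names are
placeholders, the "data" field is assumed to be declared with a
binary-capable field type in schema.xml, and commons-codec is just one way to
do the base64 step):

  byte[] raw = "some binary payload".getBytes();    // stand-in for real bytes
  String encoded = Base64.encodeBase64String(raw);  // org.apache.commons.codec.binary.Base64
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "img-1");
  doc.addField("data", encoded);
  server.add(doc);
  server.commit();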

I would strongly recommend against storing large binary files in solr
-- unless you really don't care about performance -- the file system
is a good option that springs to mind.

ryan




2011/4/6 Ezequiel Calderara ezech...@gmail.com:
 Another question that maybe is easier to answer, how can i store binary
 data? Any example schema?

 2011/4/6 Ezequiel Calderara ezech...@gmail.com

 Hello everyone, i need to know if some has used solr for indexing and
 storing images (upt to 16MB) or binary docs.

 How does solr behave with this type of docs? How does it affect performance?

 Thanks Everyone

 --
 __
 Ezequiel.

 Http://www.ironicnet.com




 --
 __
 Ezequiel.

 Http://www.ironicnet.com



Re: [WKT] Spatial Searching

2011-03-29 Thread Ryan McKinley
 Does anyone know of a patch or even when this functionality might be included 
 in to Solr4.0? I need to query for polygons ;-)

check:
http://code.google.com/p/lucene-spatial-playground/

This is my sketch / soon-to-be-proposal for what I think lucene
spatial should look like.  It includes a WKTField that can do complex
geometry queries:

https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/jts/


ryan


Re: please make JSONWriter public

2011-03-01 Thread Ryan McKinley
You may have noticed the ResponseWriter code is pretty hairy!  Things
are package protected so that the API can change between minor releases
without concern for back compatibility.

In 4.0 (/trunk) I hope to rework the whole ResponseWriter framework so
that it is more clean and hopefully stable enough that making parts
public is helpful.

For now, you can:
- copy the code
- put your class in the same package name
- make it public in your own distribution

ryan



On Mon, Feb 28, 2011 at 2:56 PM, Paul Libbrecht p...@hoplahup.net wrote:

 Hello fellow SOLR experts,

 may I ask to make top-level and public the class
    org.apache.solr.request.JSONWriter
 inside
    org.apache.solr.request.JSONResponseWriter
 I am re-using it to output JSON search result to code that I wish not to 
 change on the client but the current visibility settings (JSONWriter is 
 package protected) makes it impossible for me without actually copying the 
 code (which is possible thanks to the good open-source nature).

 thanks in advance

 paul


Re: Solr 4.0 trunk in production

2011-02-20 Thread Ryan McKinley
Not crazy -- but be aware of a few *key* caveats.

1. Do good testing on a stable snapshot.
2. Don't get surprised if you have to rebuild the index from scratch
to upgrade in the future.  The official releases will upgrade smoothly
-- but within dev builds, anything may happen.



On Sat, Feb 19, 2011 at 9:50 AM, Mark static.void@gmail.com wrote:
 Would I be crazy even to consider putting this in production? Thanks



Re: boosting results by a query?

2011-02-14 Thread Ryan McKinley
found something that works great!

in 3.1+ we can sort by a function query, so:

sort=query({!lucene v='field:value'}) desc, score desc

will put everything that matches 'field:value' first, then order the
rest by score

check:
http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
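
Through SolrJ that sort can be set as a raw parameter (a sketch; the field
name and value are placeholders):

  SolrQuery q = new SolrQuery("*:*");
  q.set("sort", "query({!lucene v='field:value'}) desc, score desc");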




On Fri, Feb 11, 2011 at 4:31 PM, Ryan McKinley ryan...@gmail.com wrote:
 I have an odd need, and want to make sure I am not reinventing a wheel...

 Similar to the QueryElevationComponent, I need to be able to move
 documents to the top of a list that match a given query.

 If there were no sort, then this could be implemented easily with
 BooleanQuery (i think) but with sort it gets more complicated.  Seems
 like I need:

  sortSpec.setSort( new Sort( new SortField[] {
    new SortField( something that only sorts results in the boost query ),
    new SortField( the regular sort )
  }));

 Is there an existing FieldComparator I should look at?  Any other
 pointers/ideas?

 Thanks
 ryan



boosting results by a query?

2011-02-11 Thread Ryan McKinley
I have an odd need, and want to make sure I am not reinventing a wheel...

Similar to the QueryElevationComponent, I need to be able to move
documents to the top of a list that match a given query.

If there were no sort, then this could be implemented easily with
BooleanQuery (i think) but with sort it gets more complicated.  Seems
like I need:

  sortSpec.setSort( new Sort( new SortField[] {
new SortField( something that only sorts results in the boost query ),
new SortField( the regular sort )
  }));

Is there an existing FieldComparator I should look at?  Any other
pointers/ideas?

Thanks
ryan


edismax with windows path input?

2011-02-10 Thread Ryan McKinley
I am using the edismax query parser -- its awesome!  works well for
standard dismax type queries, and allows explicit fields when
necessary.

I have hit a snag when people enter something that looks like a windows path:
<lst name="params">
  <str name="q">F:\path\to\a\file</str>
</lst>
this gets parsed as:
<str name="rawquerystring">F:\path\to\a\file</str>
<str name="querystring">F:\path\to\a\file</str>
<str name="parsedquery">+()</str>

Putting it in quotes makes the not-quite right query:
<str name="rawquerystring">"F:\path\to\a\file"</str>
<str name="querystring">"F:\path\to\a\file"</str>
<str name="parsedquery">
+DisjunctionMaxQuery((path:f:pathtoafile^4.0 | name:f (pathtoafile
fpathtoafile)^7.0)~0.01)
</str>
<str name="parsedquery_toString">
+(path_path:f:pathtoafile^4.0 | name:f (pathtoafile fpathtoafile)^7.0)~0.01
</str>

Telling people to escape the query:
q=F\:\\path\\to\\a\\file
is unrealistic, but gives the proper parsed query:

+DisjunctionMaxQuery((path_path:f:/path/to/a/file^4.0 | name:f path
to a (file fpathtoafile)^7.0)~0.01)

Any ideas on how to support this?  I could try looking for things like
paths in the app, and then modify the query, or maybe look at
extending edismax.  Perhaps when F: does not match a given field, it
could auto escape the rest of the word?

thanks
ryan


Re: edismax with windows path input?

2011-02-10 Thread Ryan McKinley
ah -- that makes sense.

Yonik... looks like you were assigned to it last week -- should I take
a look, or do you already have something in the works?


On Thu, Feb 10, 2011 at 2:52 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : extending edismax.  Perhaps when F: does not match a given field, it
 : could auto escape the rest of the word?

 that's actually what yonik initially said it was supposed to do, but when i
 tried to add a param to let you control which fields would be supported
 using the : syntax i discovered it didn't work but couldn't figure out
 why ... details are in the SOLR-1553 comments


 -Hoss



Re: edismax with windows path input?

2011-02-10 Thread Ryan McKinley

 foo_s:foo\-bar
 is a valid lucene query (with only a dash between the foo and the
 bar), and presumably it should be treated the same in edismax.
 Treating it as foo_s:foo\\-bar (a backslash and a dash between foo and
 bar) might cause more problems than it's worth?


I don't think we should escape anything that has a valid field name.
If foo_s is a field, then foo_s:foo\-bar should be used as is.

If foo_s is not a field, I would want the whole thing escaped to:
foo_s\:foo\\-bar before getting passed to the rest of the dismax mojo.

Does that make sense?

marking edismax as experimental for 3.1 makes sense!

ryan


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Ryan McKinley

 Where do you get your Lucene/Solr downloads from?

 [] ASF Mirrors (linked in our release announcements or via the Lucene website)

 [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

 [X] I/we build them from source via an SVN/Git checkout.



Re: Different behavior for q=goo.com vs q=@goo.com in queries?

2010-12-31 Thread Ryan McKinley
also try debugQuery=true and see why each result matched



On Thu, Dec 30, 2010 at 4:10 PM, mrw mikerobertsw...@gmail.com wrote:


 Basically, just what you've suggested.  I did the field/query analysis piece
 with verbose output.  Not entirely sure how to interpret the results, of
 course.  Currently reading anything I can find on that.


 Thanks


 Erick Erickson wrote:

 What steps have you taken to figure out whether the
 contents of your index are what you think? I suspect
 that the fields you're indexing aren't being
 analyzed/tokenized quite the way you expect either at
 query time or index time (or maybe both!).

 Take a look at the admin/analysis page for the field you're indexing
 the data into. If that doesn't shed any light on the problem,
 please paste in the fieldType definition for the field in question,
 maybe another set of eyes can see the issue.

 Best
 Erick





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Different-behavior-for-q-goo-com-vs-q-goo-com-in-queries-tp2168935p2169478.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: API for using Multi cores with SolrJ

2010-10-18 Thread Ryan McKinley
On Mon, Oct 18, 2010 at 10:12 AM, Tharindu Mathew mcclou...@gmail.com wrote:
 Thanks Peter. That helps a lot. It's weird that this is not documented anywhere. 
 :(

Feel free to edit the wiki :)


Re: how can i use solrj binary format for indexing?

2010-10-18 Thread Ryan McKinley
Do you already have the files as solr XML?  If so, I don't think you need solrj

If you need to build SolrInputDocuments from your existing structure,
solrj is a good choice.  If you are indexing lots of stuff, check the
StreamingUpdateSolrServer:
http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html
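
A minimal sketch of that (the URL, queue size, thread count and field names
are placeholders):

  StreamingUpdateSolrServer server =
      new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "doc-1");
  doc.addField("title", "example");
  server.add(doc);
  server.commit();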


On Sun, Oct 17, 2010 at 11:01 PM, Jason, Kim hialo...@gmail.com wrote:

 Hi all
 I have a huge amount of xml files for indexing.
 I want to index using solrj binary format to get performance gain.
 Because I heard that using xml files to index is quite slow.
 But I don't know how to index through solrj binary format and can't find
 examples.
 Please give some help.
 Thanks,
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1722612.html
 Sent from the Solr - User mailing list archive at Nabble.com.



query pending commits?

2010-10-18 Thread Ryan McKinley
I have an indexing pipeline that occasionally needs to check if a
document is already in the index (even if not commited yet).

Any suggestions on how to do this without calling commit/ before each check?

I have a list of document ids and need to know which ones are in the
index (actually I need to know which ones are not in the index)  I
figured I would write a custome RequestHandler that would check the
main Reader and the UpdateHander reader, but it now looks like
'update' is handled directly within IndexWriter.

Any ideas?

thanks
ryan


Re: is indexing single-threaded?

2010-09-23 Thread Ryan McKinley
Multiple threads work well.

If you are using solrj, check the StreamingUpdateSolrServer for an
implementation that will keep X number of threads busy.

Your mileage will vary, but in general I find a reasonable thread
count is ~ (number of cores)+1


On Wed, Sep 22, 2010 at 5:52 AM, Andy angelf...@yahoo.com wrote:
 Does Solr index data in a single thread or can data be indexed concurrently 
 in multiple threads?

 Thanks
 Andy






Re: How can I delete the entire contents of the index?

2010-09-23 Thread Ryan McKinley
<delete><query>*:*</query></delete>

will leave you a fresh index
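
The SolrJ equivalent (a sketch; it assumes an existing SolrServer instance):

  server.deleteByQuery("*:*");
  server.commit();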


On Thu, Sep 23, 2010 at 12:50 AM, xu cheng xcheng@gmail.com wrote:
 <delete><query>the query that fetch the data you wanna
 delete</query></delete>
 I did like this to delete my data
 best regards

 2010/9/23 Igor Chudov ichu...@gmail.com

 Let's say that I added a number of elements to Solr (I use
 Webservice::Solr as the interface to do so).

 Then I change my mind and want to delete them all.

 How can I delete all contents of the database, but leave the database
 itself, just empty?

 Thanks

 i




Re: No more trunk support for 2.9 indexes

2010-09-12 Thread Ryan McKinley
 I suppose an index 'remaker' might be something like a DIH reader for
 a Solr index - streams everything out of the existing index, writing
 it into the new one?

This works fine if all fields are stored (and copy field does not go
to a stored field), otherwise you would need/want to start with the
original source.

ryan


Re: Logic behind Solr creating files in .../data/index path.

2010-09-07 Thread Ryan McKinley
Check:
http://lucene.apache.org/java/3_0_2/fileformats.html


On Tue, Sep 7, 2010 at 3:16 AM, rajini maski rajinima...@gmail.com wrote:
 All,

 While we post data to Solr... The data get stored in   //data/index  path
 in some multiple files with different file extensions...
 Not worrying about the extensions, I want to know how are these number of
 files created ?
 Does anyone know on what logic are these multiple index files  created in
 data/index  path ... ? If we do an optimize , The number of files get
 reduced...
 Else, say some N number of files are  created.. Based on what parameter it
 creates? And how are the sizes of file varies there?


 Hope I am clear about the doubt I have...



help refactoring from 3.x to 4.x

2010-08-23 Thread Ryan McKinley
I have a function that works well in 3.x, but when I tried to
re-implement in 4.x it runs very very slow (~20ms vs 45s on an index w
~100K items).

Big picture, I am trying to calculate a bounding box for items that
match the query.  To calculate this, I have two fields bboxNS, and
bboxEW that get filled with the min and max values for that doc.  To
get the bounding box, I just need the first matching term in the index
and the last matching term.

In 3.x the code looked like this:

public class FirstLastMatchingTerm
{
  String first = null;
  String last = null;

  public static FirstLastMatchingTerm read(SolrIndexSearcher searcher,
      String field, DocSet docs) throws IOException
  {
    FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
    if( docs.size() > 0 ) {
      IndexReader reader = searcher.getReader();
      TermEnum te = reader.terms(new Term(field,""));
      do {
        Term t = te.term();
        if( null == t || !t.field().equals(field) ) {
          break;
        }

        if( searcher.numDocs(new TermQuery(t), docs) > 0 ) {
          firstLast.last = t.text();
          if( firstLast.first == null ) {
            firstLast.first = firstLast.last;
          }
        }
      }
      while( te.next() );
    }
    return firstLast;
  }
}


In 4.x, I tried:

public class FirstLastMatchingTerm
{
  String first = null;
  String last = null;

  public static FirstLastMatchingTerm read(SolrIndexSearcher searcher,
      String field, DocSet docs) throws IOException
  {
    FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
    if( docs.size() > 0 ) {
      IndexReader reader = searcher.getReader();

      Terms terms = MultiFields.getTerms(reader, field);
      TermsEnum te = terms.iterator();
      BytesRef term = te.next();
      while( term != null ) {
        if( searcher.numDocs(new TermQuery(new Term(field,term)), docs) > 0 ) {
          firstLast.last = term.utf8ToString();
          if( firstLast.first == null ) {
            firstLast.first = firstLast.last;
          }
        }
        term = te.next();
      }
    }
    return firstLast;
  }
}

but the results are slow (and incorrect).  I tried some variations of
using ReaderUtil.Gather(), but the real hit seems to come from
  if( searcher.numDocs(new TermQuery(new Term(field,term)), docs) > 0 )

Any ideas?  I'm not tied to the approach or indexing strategy, so if
anyone has other suggestions that would be great.  Looking at it
again, it seems crazy that you have to run a query for each term, but
in 3.x

thanks
ryan


Re: Problem in setting the request writer in SolrJ (wiki page wrong?)

2010-08-23 Thread Ryan McKinley
Note that the 'setRequestWriter' is not part of the SolrServer API, it
is on the CommonsHttpSolrServer:
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.html#setRequestWriter%28org.apache.solr.client.solrj.request.RequestWriter%29
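
For the HTTP case that looks roughly like this (a sketch; the URL is a
placeholder):

  CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  server.setRequestWriter(new BinaryRequestWriter());  // javabin updates instead of XML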

If you are using EmbeddedSolrServer, the params are not serialized via
RequestWriter, so you don't have any options there.

ryan


On Mon, Aug 23, 2010 at 9:24 AM, Constantijn Visinescu
baeli...@gmail.com wrote:
 Hello,

 I'm using an embedded solrserver in my Java webapp, but as far as i
 can tell it's defaulting to sending updates in XML, which seems like a
 huge waste compared to sending it in Java binary format.

 According to this page:
 http://wiki.apache.org/solr/Solrj#Setting_the_RequestWriter

 I'm supposed to be able to set the requestwriter like so:
 server.setRequestWriter(new BinaryRequestWriter());

 However this method doesn't seem to exists in the SolrServer class of
 SolrJ 1.4.1 ?

 How do i set it to process updates in the java binary format?

 Thanks in advance,
 Constantijn Visinescu

 P.S.
 I'm creating my SolrServer instance like this:
         private SolrServer solrServer;
         CoreContainer container = new CoreContainer.Initializer().initialize();
         solrServer = new EmbeddedSolrServer(container, "");

 this solrServer wont let me set a request writer.



Sort by index order desc?

2010-07-23 Thread Ryan McKinley
Any pointers on how to sort by reverse index order?
http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc

it seems like it should be easy to do with the function query stuff,
but i'm not sure what to sort by (unless I add a new field for indexed
time)


Any pointers?

Thanks
Ryan


Re: Sort by index order desc?

2010-07-23 Thread Ryan McKinley
Looks like you can sort by _docid_ to get things in index order or
reverse index order.

?sort=_docid_ asc

thank you solr!


On Fri, Jul 23, 2010 at 2:23 PM, Ryan McKinley ryan...@gmail.com wrote:
 Any pointers on how to sort by reverse index order?
 http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc

 it seems like it should be easy to do with the function query stuff,
 but i'm not sure what to sort by (unless I add a new field for indexed
 time)


 Any pointers?

 Thanks
 Ryan



Re: REST calls

2010-06-30 Thread Ryan McKinley
If there is a real desire/need to make things restful in the
official sense, it is worth looking at using a REST framework as the
controller rather then the current solution.  perhaps:

http://www.restlet.org/
https://jersey.dev.java.net/

These would be cool since they encapsulate lots of the request
plumbing work -- it would be better to leverage more widely
used approaches than to support our own.

That said, what we have is functional and powerful -- if you are
concerned about people editing the index (with GET/POST or whatever)
there are plenty of ways to solve this.

ryan


On Wed, Jun 30, 2010 at 5:31 PM, Lance Norskog goks...@gmail.com wrote:
 I've looked at the problem. It's fairly involved. It probably would
 take several iterations. (But not as many as field collapsing :)

 On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote:
  Apparently this is not ReStFuL It is IMVHO insane.

 Patches welcome...

 -Yonik
 http://www.lucidimagination.com




 --
 Lance Norskog
 goks...@gmail.com



Re: Build query programmatically with lucene, but issue to solr?

2010-05-28 Thread Ryan McKinley
Interesting -- I don't think there is anything that does this.

Though it seems like something the XML Query syntax should be able to
do, but we would still need to add the ability to send the xml style
query to solr.



On Fri, May 28, 2010 at 12:23 PM, Phillip Rhodes
rhodebumpl...@gmail.com wrote:
 Hi.
 I am building up a query with quite a bit of logic such as parentheses, plus
 signs, etc... and it's a little tedious dealing with it all at a string
 level.  I was wondering if anyone has any thoughts on constructing the query
 in lucene and using the string representation of the query to send to solr.

 Thanks,
 Phillip



Re: multicore Vs multiple solr webapps

2010-05-27 Thread Ryan McKinley
The two approaches solve different needs.  In 'multicore' you have a
single webapp with multiple indexes.  This means they are all running
in the same JVM.  This may be an advantage or a disadvantage depending
on what you are doing.

ryan



On Thu, May 27, 2010 at 10:44 AM, Antonello Mangone
antonello.mang...@gmail.com wrote:
 Hi to all, I have a question for you ...
 Can someone explain to me the differences between a single multicore solr
 application and multiple solr webapps ???
 Thank you all in advance



Re: SolrJ/EmbeddedSolrServer

2010-05-22 Thread Ryan McKinley
Check:
http://wiki.apache.org/solr/CoreAdmin

Unless I'm missing something, I think you should be able to sort out what you need


On Fri, May 21, 2010 at 7:55 PM, Ken Krugler
kkrugler_li...@transpac.com wrote:
 I've got a situation where my data directory (a) needs to live elsewhere
 besides inside of Solr home, (b) moves to a different location when updating
 indexes, and (c) setting up a symlink from solr_home/data isn't a great
 option.

 So what's the best approach to making this work with SolrJ? The low-level
 solution seems to be

 - create my own SolrCore instance, where I specify the data directory
 - use that to update the CoreContainer
 - create a new EmbeddedSolrServer

 But recreating the EmbeddedSolrServer with each index update feels wrong,
 and I'd like to avoid mucking around with low-level SolrCore instantiation.

 Any other approaches?

 Thanks,

 -- Ken

 
 Ken Krugler
 +1 530-210-6378
 http://bixolabs.com
 e l a s t i c   w e b   m i n i n g







Re: Special Circumstances for embedded Solr

2010-05-21 Thread Ryan McKinley

 Any other commonly compelling reasons to use SolrJ?

The most compelling reason (I think) is that if you program against
the Solrj API, you can switch between embedded/http/streaming
implementations without changing anything.

This is great for our app that is either run as a small local instance
or in a big enterprise setting.

ryan


Re: Moving from Lucene to Solr?

2010-05-21 Thread Ryan McKinley
On Wed, May 19, 2010 at 6:38 AM, Peter Karich peat...@yahoo.de wrote:
 Hi all,

 while asking a question on stackoverflow [1] some other questions appear:
 Is SolrJ a recommended way to access Solr or should I prefer the HTTP
 interface?

solrj vs HTTP interface?  That will just be a matter of taste.  If you
are working in java, then solrj is likely a good option.



 How can I (j)unit-test Solr? (e.g. create+delete index via Java call)


If you want to mess with creating/removing indexes at runtime, see:
http://wiki.apache.org/solr/CoreAdmin


 Is Lucene faster than Solr? ... do you have experiences, preferable with
 the same index?

solr is built on top of lucene, so in that regard it is the same speed.
 Depending on your app, the abstractions that solr makes may make it
less efficient than working directly in lucene.  Unless you have very
specialized needs, I doubt this will make a big difference.


Re: checking the size of the index using solrj API's

2010-04-05 Thread Ryan McKinley
On Fri, Apr 2, 2010 at 7:07 AM, Na_D nabam...@zaloni.com wrote:

 hi,


 I need to monitor the index for the following information:

 1. Size of the index
 2 Last time the index was updated.


If by 'size of the index' you mean document count, then check the Luke
Request Handler
http://wiki.apache.org/solr/LukeRequestHandler

ryan


Re: [POLL] Users of abortOnConfigurationError ?

2010-03-23 Thread Ryan McKinley
The 'abortOnConfigurationError' option was added a long time ago...
at the time, there were many errors that would just be written to the
logs but startup would continue normally.

I felt (and still do) that if there is a configuration error
everything should fail loudly.  The option in solrconfig.xml was added
as a back-compatible way to get both behaviors.

I don't see any value in letting solr continue working even though
something was configured wrong.

Does a lack of replies to this thread imply that everyone agrees?
(Reading the email, and following directions, I should just ignore
this email)

Ryan


On Thu, Mar 18, 2010 at 9:12 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 Due to some issues with the (lack of) functionality behind the
 abortOnConfigurationError option in solrconfig.xml, I'd like to take a
 quick poll of the solr-user community...

  * If you have never heard of the abortOnConfigurationError
   option prior to this message, please ignore this email.

  * If you have seen abortOnConfigurationError in solrconfig.xml,
   or in error messages when using Solr, but you have never
   modified the value of this option in your configs, or changed
   it at run time, please ignore this email.

  * If you have ever set abortOnConfigurationError=false, either
   in your config files or at run time, please reply to these
   three questions...

 1) What version of Solr are you using ?

 2) What advantages do you percieve that you have by setting
   abortOnConfigurationError=false ?

 3) What problems do you suspect you would encounter if this
   option was eliminated in future versions of Solr ?

 Thank you.

 (For people who are interested, the impetuses for this Poll can be found in
 SOLR-1743, SOLR-1817, SOLR-1824, and SOLR-1832)


 -Hoss




Re: Interesting OutOfMemoryError on a 170M index

2010-01-13 Thread Ryan McKinley


On Jan 13, 2010, at 5:34 PM, Minutello, Nick wrote:


Agreed, commit every second.


Do you need the index to be updated this often?  Are you reading from
it every second and need results that are that fresh?


If not, i imagine increasing the auto-commit time to 1min or even 10  
secs would help some.


Re, calling commit from the client with auto-commit...  if you are  
using auto-commit, you should not call commit from the client


ryan





Assuming I understand what you're saying correctly:
There shouldn't be any index readers - as at this point, just  
writing to the index.

Did I understand correctly what you meant?

-Nick

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: 13 January 2010 22:28
To: solr-user@lucene.apache.org
Subject: Re: Interesting OutOfMemoryError on a 170M index

The time in autocommit is in milliseconds. You are committing every
second while indexing.  This then causes a build-up of successive
index readers that absorb each commit, which is probably the
out-of-memory.


On Wed, Jan 13, 2010 at 10:36 AM, Minutello, Nick nick.minute...@credit-suisse.com 
 wrote:


Hi,

I have a bit of an interesting OutOfMemoryError that I'm trying to
figure out.

My client & Solr server are running in the same JVM (for deployment
simplicity). FWIW, I'm using Jetty to host Solr. I'm using the
supplied code for the http-based client interface. Solr 1.3.0.

My app is adding about 20,000 documents per minute to the index - one
at a time (it is listening to an event stream and for every event, it
adds a new document to the index).
The size of the documents, however, is tiny - the total index growth
is only about 170M (after about 1 hr and the OutOfMemoryError) At this
point, there is zero querying happening - just updates to the index
point, there is zero querying happening - just updates to the index
(only adding documents, no updates or deletes) After about an hour or
so, my JVM runs out of heap space - and if I look at the memory
utilisation over time, it looks like a classic memory leak. It slowly
ramps up until we end up with constant FULL GC's and eventual OOME.
Max heap space is 512M.

In Solr, I'm using autocommit (to buffer the updates)
    <autoCommit>
      <maxDocs>1</maxDocs>
      <maxTime>1000</maxTime>
    </autoCommit>

(Aside: Now, I'm not sure if I am meant to call commit or not on the
client SolrServer class if I am using autocommit - but as it turns
out, I get OOME whether I do that or not)

Any suggestions/advice of quick things to check before I dust off the
profiler?

Thanks in advance.

Cheers,
Nick

--
Lance Norskog
goks...@gmail.com



Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-07 Thread Ryan McKinley


On Jan 7, 2010, at 10:50 AM, MitchK wrote:



Eric,

you mean, everything is okay, but I do not see it?


Internally for searching the analysis takes place and writes to the
index in an inverted fashion, but the stored stuff is left alone.


if I use an analyzer, Solr stores its output two ways?
One public output, which is similar to the original input
and one hidden or internal output, which is based on the  
analyzer's work?

Did I understand that right?


yes.

indexed fields and stored fields are different.

Solr results show stored fields in the results (however facets are  
based on indexed fields)


Take a look at Lucene in Action for a better description of what is  
happening.  The best tool to get your head around what is happening is  
probably luke (http://www.getopt.org/luke/)





If yes, I have got another problem:
I don't want to waste any diskspace.


You have control over what is stored and what is indexed -- how that  
is configured is up to you.


ryan


Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-07 Thread Ryan McKinley


On Jan 7, 2010, at 12:11 PM, MitchK wrote:



Thank you, Ryan. I will have a look on lucene's material and luke.

I think I got it. :)

Sometimes there will be a need to return, on the one hand, the original
value and,
on the other hand, the indexed version of the value.
How can I fulfil such needs? By doing a copyField to indexed-only fields?



see erik's response on 'analysis request handler'





ryantxu wrote:



On Jan 7, 2010, at 10:50 AM, MitchK wrote:



Eric,

you mean, everything is okay, but I do not see it?

Internally for searching the analysis takes place and writes to  
the

index in an inverted fashion, but the stored stuff is left alone.


if I use an analyzer, Solr stores it's output two ways?
One public output, which is similar to the original input
and one hidden or internal output, which is based on the
analyzer's work?
Did I understand that right?


yes.

indexed fields and stored fields are different.

Solr results show stored fields in the results (however facets are
based on indexed fields)

Take a look at Lucene in Action for a better description of what is
happening.  The best tool to get your head around what is happening  
is

probably luke (http://www.getopt.org/luke/)




If yes, I have got another problem:
I don't want to waste any diskspace.


You have control over what is stored and what is indexed -- how that
is configured is up to you.

ryan




--
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27063452.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: SolJ and query parameters

2010-01-07 Thread Ryan McKinley


On Jan 7, 2010, at 1:05 PM, Jon Poulton wrote:

I've also just noticed that QueryParsing is not in the SolrJ API.  
It's in one of the other Solr jar dependencies.


I'm beginning to think that maybe the best approach is to write a  
query string generator which can generate strings of the form:


q={!lucene q.op=AND df=text}myfield:foo +bar -baz

Then just set this on a SolrQuery instance and send it over the
wire. It's not the kind of string you'd want an end user to have to
type out.




Yes, if you need to manipulate the local params, that seems like a  
good approach.
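
For example (a sketch -- server setup and exception handling omitted):

  import org.apache.solr.client.solrj.SolrQuery;

  SolrQuery query = new SolrQuery();
  query.setQuery("{!lucene q.op=AND df=text}myfield:foo +bar -baz");
  // send it with server.query(query) as usual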


Solrj was written before the local params syntax was introduced.

A patch that adds LocalParams support to solrj would be welcome :)

ryan




Re: Corrupted Index

2010-01-07 Thread Ryan McKinley

what version of solr are you running?


On Jan 7, 2010, at 3:08 PM, Jake Brownell wrote:


Hi all,

Our application uses solrj to communicate with our solr servers. We  
started a fresh index yesterday after upping the maxFieldLength  
setting in solrconfig. Our task indexes content in batches and all  
appeared to be well until noonish today, when after 40k docs, I  
started seeing errors. I've placed three stack traces below, the  
first occurred once and was the initial error, the second occurred a  
few times before the third started occurring on each request. I'd  
really appreciate any insight into what could have caused this, a  
missing file and then a corrupt index. If you know we'll have to
nuke the entire index and start over, I'd like to know that too - oddly
enough, searches against the index appear to be working.


Thanks!
Jake

#1

January 7, 2010 12:10:06 PM CST Caught error; TaskWrapper block 1
January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/ 
_fsk_1uj.del (No such file or directory)


solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)

request: /core0/update solr-home/core0/data/index/_fsk_1uj.del (No  
such file or directory)


solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)

request: /core0/update
January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/ 
_fsk_1uj.del (No such file or directory)


solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)

request: /core0/update solr-home/core0/data/index/_fsk_1uj.del (No  
such file or directory)


solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)

request: /core0/update
org.benetech.exception.WrappedException   
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)

org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243)

org 
.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105)

   org.apache.solr.client.solrj.SolrServer#commit(86)
   org.apache.solr.client.solrj.SolrServer#commit(75)
   org.bookshare.search.solr.SolrSearchServerWrapper#add(63)
   org.bookshare.search.solr.SolrSearchEngine#index(232)

org 
.bookshare 
.service.task.SearchEngineIndexingTask#initialInstanceLoad(95)

   org.bookshare.service.task.SearchEngineIndexingTask#run(53)
   org.bookshare.service.scheduler.TaskWrapper#run(233)
   java.util.TimerThread#mainLoop(512)
   java.util.TimerThread#run(462)
Caused by:
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)

solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)

request: /core0/update
org.apache.solr.common.SolrException
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)

org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243)

org 
.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105)

   org.apache.solr.client.solrj.SolrServer#commit(86)
   org.apache.solr.client.solrj.SolrServer#commit(75)
   org.bookshare.search.solr.SolrSearchServerWrapper#add(63)
   org.bookshare.search.solr.SolrSearchEngine#index(232)

org 
.bookshare 
.service.task.SearchEngineIndexingTask#initialInstanceLoad(95)

   org.bookshare.service.task.SearchEngineIndexingTask#run(53)
   org.bookshare.service.scheduler.TaskWrapper#run(233)
   java.util.TimerThread#mainLoop(512)
   java.util.TimerThread#run(462)

#2

January 7, 2010 12:10:10 PM CST Caught error; TaskWrapper block 1
January 7, 2010 12:10:10 PM CST  
org.apache.lucene.index.CorruptIndexException: doc counts differ for  
segment _hug: fieldsReader shows 8 but segmentInfo shows 2


org.apache.lucene.index.CorruptIndexException: doc counts differ for  
segment _hug: fieldsReader shows 8 but segmentInfo shows 2


request: /core0/update  
org.apache.lucene.index.CorruptIndexException: doc counts differ for  
segment _hug: fieldsReader shows 8 but segmentInfo shows 2


org.apache.lucene.index.CorruptIndexException: doc counts differ for  
segment _hug: fieldsReader shows 8 but segmentInfo shows 2


request: /core0/update
January 7, 2010 12:10:10 PM CST  
org.apache.lucene.index.CorruptIndexException: doc counts differ for  
segment _hug: fieldsReader shows 8 but segmentInfo shows 2


org.apache.lucene.index.CorruptIndexException: doc counts differ for  
segment _hug: fieldsReader shows 8 but segmentInfo shows 2


request: /core0/update  
org.apache.lucene.index.CorruptIndexException: doc counts differ for  
segment _hug: fieldsReader shows 8 but segmentInfo shows 2


org.apache.lucene.index.CorruptIndexException: doc counts differ for  
segment _hug: fieldsReader shows 8 but segmentInfo shows 2


request: /core0/update
org.benetech.exception.WrappedException   
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)


Re: No Analyzer, tokenizer or stemmer works at Solr

2010-01-06 Thread Ryan McKinley


On Jan 6, 2010, at 3:48 PM, MitchK wrote:



I have tested a lot and all the time I thought I set wrong options  
for my

custom analyzer.
Well, I have noticed that Solr isn't using ANY analyzer, filter or  
stemmer.

It seems like it only stores the original input.


The stored value is always the original input.

The *indexed* values are transformed by analysis.

If you really need to store the analyzed fields, that may be possible  
with an UpdateRequestProcessor.  also see:

https://issues.apache.org/jira/browse/SOLR-314

ryan


Re: how to do a Parent/Child Mapping using entities

2009-12-30 Thread Ryan McKinley

Ya, structured data gets a little funny.

For starters, the order of multi-valued fields should be maintained,  
so if you have:


<doc>
  <field name="url">http://aaa</field>
  <field name="url_rank">5</field>
  <field name="url">http://bbb</field>
  <field name="url_rank">4</field>
</doc>

the response will return results in order, so you can map them with
array indices.
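
For example, reading them back side by side with SolrJ (a sketch; "rsp" is an
assumed QueryResponse and the field names are the ones from the doc above):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.common.SolrDocument;

  // the two multi-valued fields come back in the order they were indexed,
  // so position i in "url" lines up with position i in "url_rank"
  SolrDocument doc = rsp.getResults().get(0);
  List<Object> urls  = new ArrayList<Object>(doc.getFieldValues("url"));
  List<Object> ranks = new ArrayList<Object>(doc.getFieldValues("url_rank"));
  for (int i = 0; i < urls.size(); i++) {
    System.out.println(urls.get(i) + " -> rank " + ranks.get(i));
  }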


I have played some tricks with a JSON field analyzer that give you  
some more control.


For example, if you index:

<doc>
  <field name="url">{ "url":"http://host/", "rank":5 }</field>
</doc>

Then I use an analyzer that indexes the terms:
  url:http://host/
  rank:5

I just posted SOLR-1690, if you want to take a look at that approach

ryan


On Dec 30, 2009, at 4:25 AM, magui wrote:



Thanks Sascha for your post - I find it interesting, but in my case I
don't want to use an additional field; I want to be able, with the same
schema, to do a simple query like q=res_url:some url, and a query like
the other one.
In other words: is there any solution to make two or more multivalued fields
in the same document linked with each other? E.g. in this result:

- result name=response numFound=1 start=0
- doc
 str name=id1/str
 str name=keywordKey1/str
- arr name=res_url
 strurl1/str
 strurl2/str
 strurl3/str
 strurl4/str
 /arr
- arr name=res_rank
 str1/str
 str2/str
 str3/str
 str4/str
 /arr
 /doc
 /result

I would like to make Solr understand that, for this document, the value url1 of
the res_url field is linked to the value 1 of the res_rank field, and all of them
are linked to the common field keyword.
I think that i should use a custom field analyser or some thing like  
that;

but i don't know what to do.

but thanks for all; and any supplied help will be lovable.


Sascha Szott wrote:


Hi,

you could create an additional index field res_ranked_url that  
contains

the concatenated value of an url and its corresponding rank, e.g.,

res_rank + " " + res_url

Then, q=res_ranked_url:"1 url1" retrieves all documents with url1 as the
first url.

A drawback of this workaround is that you have to use a phrase query
thus preventing wildcard searches for urls.

-Sascha



Hello everybody, i would like to know how to create index  
supporting a

parent/child mapping and then querying the child to get the results.
in other words; imagine that we have a database containing 2
tables:Keyword[id(int), value(string)] and Result[id(int),  
res_url(text),

res_text(tex), res_date(date), res_rank(int)]
For indexing, i used the DataImportHandler to import data and it  
works

well,
and my query response seems good:(q=*:*) (imagine that we have  
only this

to
keywords and their results)

  ?xml version=1.0 encoding=UTF-8 ?
-response
-lst name=responseHeader
  int name=status0/int
  int name=QTime0/int
-lst name=params
  str name=q*:*/str
  /lst
  /lst
-result name=response numFound=2 start=0
-doc
  str name=id1/str
  str name=keywordKey1/str
-arr name=res_url
  strurl1/str
  strurl2/str
  strurl3/str
  strurl4/str
  /arr
-arr name=res_rank
  str1/str
  str2/str
  str3/str
  str4/str
  /arr
  /doc
-doc
  str name=id2/str
  str name=keywordKey2/str
-arr name=res_url
  strurl1/str
  strurl5/str
  strurl8/str
  strurl7/str
  /arr
-arr name=res_rank
  str1/str
  str2/str
  str3/str
  str4/str
  /arr
  /doc
  /result
  /response

but the problem is when I type a query like this: q=res_url:url2 AND
res_rank:1 - that is to say, I want to search for the keywords in
which the url (url2) is ranked at the first position - I have a result like
this:

?xml version=1.0 encoding=UTF-8 ?
-response
-lst name=responseHeader
  int name=status0/int
  int name=QTime0/int
-lst name=params
  str name=qres_url:url2 AND res_rank:1/str
  /lst
  /lst
-result name=response numFound=1 start=0
-doc
  str name=id1/str
  str name=keywordKey1/str
-arr name=res_url
  strurl1/str
  strurl2/str
  strurl3/str
  strurl4/str
  /arr
-arr name=res_rank
  str1/str
  str2/str
  str3/str
  str4/str
  /arr
  /doc
  /result
  /response

But this is not true; because the url present in the 1st position  
in the

results of the keyword key1 is url1 and not url2.
So what i want to say is : is there any solution to make the  
values of

the
multivalued fields linked;
so in our case we can see that the previous result say that:
 - url1 is present in 1st position of key1 results
 - url2 is present in 2nd position of key1 results
 - url3 is present in 3rd position of key1 results
 - url4 is present in 4th position of key1 results

and i would like that solr consider this when executing queries.

Any helps please; and thanks for all :)






--
View this message in context: 
http://old.nabble.com/how-to-do-a-Parent-Child-Mapping-using-entities-tp26956426p26965478.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: SOLR or Hibernate Search?

2009-12-29 Thread Ryan McKinley

If you need to search via the Hibernate API, then use hibernate search.

If you need a scalable HTTP (REST) interface, then Solr may be the way to go.

Also, I don't think Hibernate has anything like the faceting / complex
query stuff, etc.




On Dec 29, 2009, at 3:25 PM, Márcio Paulino wrote:


Hey Everyone!

I was making a comparison of both technologies (SOLR and Hibernate
Search) and I see many things are equal. Could anyone tell me when I must use
SOLR and
when I must use Hibernate Search?

Im my project i will have:

1. Queries for indexed fields (Strings) and for non-indexed fields (Integer,
Float, Date). [In Hibernate Search or in SOLR, I must search on the
index and, with the results of the query, search on the database (I can't
search in both places at the same time).]
I will have searches like:
Give me all Register Where Value  190 And Name Contains = 'JAVA' 

2. My client need process a lot of email (20.000 per day) and i must  
indexed
all fields (excluded sentDate ) included Attachments, and  
performance is

requirement of my System

3. My Application is multiclient, and i need to separate the index by
client.

In this Scenario, whats the best solution? SOLR or HIbernateSearch

I See SOLR is a dedicated server and has a good performance test. I  
don't
see advantages to use hibernate-search in comparison with SOLR  
(Except the

fact of integrate with my Mapped Object)

Thanks for Help

--
att,

**
Márcio Paulino
Campo Grande - MS
MSN / Gtalk: mcopaul...@gmail.com
ICQ: 155897898
**




Re: logger in embedded solr

2009-11-19 Thread Ryan McKinley

check:
http://wiki.apache.org/solr/SolrLogging

if you are using 1.4 you want to drop in the slf4j-log4j jar file and  
then it should read your log4j configs



On Nov 19, 2009, at 2:15 PM, Harsch, Timothy J. (ARC-TI)[PEROT  
SYSTEMS] wrote:



Hi all,
I have an J2EE application using embedded solr via solr4j.  It seems  
the logging that SOLR produces has a mind of its own, and is not  
changeable via my log4j.properties.  In fact I know this because I  
wired in a Log4J config listener in my web.xml and redirected all my  
logs to a custom location.  Which works, but now all my messages go  
to the custom location and all the embedded SOLR messages are still  
going into catalina.out.  How can I get access to the logger of the  
Embedded SOLR.


Thanks,
Tim Harsch
Sr. Software Engineer
Perot Systems





Re: Missing slf4j jar in solr 1.4.0 distribution?

2009-11-18 Thread Ryan McKinley
Solr includes slf4j-jdk14-1.5.5.jar; if you want to use the nop (or
log4j, or loopback) impl you will need to include that in your own
project.


Solr uses slf4j so that each user can decide their logging
implementation. It includes the jdk version so that something works
off-the-shelf, but if you want more control, you can switch in
whatever you want.


ryan


On Nov 18, 2009, at 1:22 AM, Per Halvor Tryggeseth wrote:

Thanks. I see. It seems that slf4j-nop-1.5.5.jar is the only jar  
file missing in solrj-lib, so I suggest that it should be included  
in the next release.


Per Halvor





-Opprinnelig melding-
Fra: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sendt: 17. november 2009 20:51
Til: 'solr-user@lucene.apache.org'
Emne: Re: Missing slf4j jar in solr 1.4.0 distribution?


: I downloaded solr 1.4.0 but discovered when using solrj 1.4 that a
: required slf4j jar was missing in the distribution (i.e.
: apache-solr-1.4.0/dist). I got a java.lang.NoClassDefFoundError:
: org/slf4j/impl/StaticLoggerBinder when using solrj
   ...
: Have I overlooked something or are not all necessary classes  
required

: for using solrj in solr 1.4.0 included in the distribution?

Regrettably, Solr releases aren't particularly consistent about where
third-party libraries can be found.


If you use the pre-built war, the 'main' dependencies are
already bundled into it.  If you want to roll your own, you need to
look at the ./lib directory -- ./dist is only *supposed* to
contain the artifacts built from Solr source (but that solrj-lib
directory can be confusing)...


hoss...@brunner:apache-solr-1.4.0$ ls ./lib/slf4j-*
lib/slf4j-api-1.5.5.jar lib/slf4j-jdk14-1.5.5.jar

-Hoss





Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Ryan McKinley

It looks like solr+spatial will get some attention in 1.5, check:
https://issues.apache.org/jira/browse/SOLR-1561

Depending on your needs, that may be enough.  More robust/scalable  
solutions will hopefully work their way into 1.5 (any help is always  
appreciated!)



On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote:


Hey,

  I am interested in using LocalSolr to go Local/Geo/Spatial/Distance
search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr 
)
points to pretty old documentation. Is there a better document I  
refer to

for the setting up of LocalSolr and some performance analysis?

  Just sync-ed Solr codebase and found LocalSolr is still NOT in the
contrib package. Do we have a plan to incorporate it? I download a  
LocalSolr

lib localsolr-1.5.jar from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and  
notice
that the namespace is com.pjaol.search. blah blah, while LocalLucene  
package
is in Lucene codebase and the package name is  
org.apache.lucene.spatial blah

blah.

  But localsolr-1.5.jar from from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/   
does not
work with lucene-spatial-3.0-dev.jar I build from Lucene codebase  
directly.
After I restart tomcat, I could not load solr admin page. The error  
is as

follows. It looks like Solr is still looking for
the old class names.

 Thanks.

HTTP Status 500 - Severe errors in solr configuration. Check your  
log files
for more detailed information on what may be wrong. If you want solr  
to

continue after configuration errors, change:
abortOnConfigurationErrorfalse/abortOnConfigurationError in null
-
java.lang.NoClassDefFoundError:
com/pjaol/search/geo/utils/DistanceFilter at  
java.lang.Class.forName0(Native

Method) at java.lang.Class.forName(Class.java:247) at
org 
.apache 
.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at
org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at
org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java: 
833) at

org.apache.solr.core.SolrCore.init(SolrCore.java:551) at
org.apache.solr.core.CoreContainer 
$Initializer.initialize(CoreContainer.java:137)

at
org 
.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java: 
83)

at
org 
.apache 
.catalina 
.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java: 
221)

at
org 
.apache 
.catalina 
.core 
.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java: 
302)

at
org 
.apache 
.catalina 
.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78)

at
org 
.apache 
.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at  
org.apache.catalina.core.StandardContext.start(StandardContext.java: 
4222)

at
org 
.apache 
.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
at org.apache.catalina.core.ContainerBase.access 
$0(ContainerBase.java:744)

at
org.apache.catalina.core.ContainerBase 
$PrivilegedAddChild.run(ContainerBase.java:144)

at java.security.AccessController.doPrivileged(Native Method) at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java: 
738) at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java: 
544) at
org 
.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java: 
626)

at
org 
.apache 
.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java: 
488) at

org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138) at
org 
.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java: 
311)

at
org 
.apache 
.catalina 
.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 
1022) at

org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 
1014) at
org.apache.catalina.core.StandardEngine.start(StandardEngine.java: 
443) at
org.apache.catalina.core.StandardService.start(StandardService.java: 
448) at
org.apache.catalina.core.StandardServer.start(StandardServer.java: 
700) at

org.apache.catalina.startup.Catalina.start(Catalina.java:552) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun 
.reflect 
.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at
sun 
.reflect 
.DelegatingMethodAccessorImpl 
.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun 
.reflect 

Re: The status of Local/Geo/Spatial/Distance Solr

2009-11-13 Thread Ryan McKinley

Also:
https://issues.apache.org/jira/browse/SOLR-1302


On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote:


Hey,

  I am interested in using LocalSolr to go Local/Geo/Spatial/Distance
search. But the wiki of LocalSolr(http://wiki.apache.org/solr/LocalSolr 
)
points to pretty old documentation. Is there a better document I  
refer to

for the setting up of LocalSolr and some performance analysis?

  Just sync-ed Solr codebase and found LocalSolr is still NOT in the
contrib package. Do we have a plan to incorporate it? I download a  
LocalSolr

lib localsolr-1.5.jar from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and  
notice
that the namespace is com.pjaol.search. blah blah, while LocalLucene  
package
is in Lucene codebase and the package name is  
org.apache.lucene.spatial blah

blah.

  But localsolr-1.5.jar from from
http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/   
does not
work with lucene-spatial-3.0-dev.jar I build from Lucene codebase  
directly.
After I restart tomcat, I could not load solr admin page. The error  
is as

follows. It looks like Solr is still looking for
the old class names.

 Thanks.

HTTP Status 500 - Severe errors in solr configuration. Check your  
log files
for more detailed information on what may be wrong. If you want solr  
to

continue after configuration errors, change:
abortOnConfigurationErrorfalse/abortOnConfigurationError in null
-
java.lang.NoClassDefFoundError:
com/pjaol/search/geo/utils/DistanceFilter at  
java.lang.Class.forName0(Native

Method) at java.lang.Class.forName(Class.java:247) at
org 
.apache 
.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at
org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492) at
org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525) at
org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java: 
833) at

org.apache.solr.core.SolrCore.init(SolrCore.java:551) at
org.apache.solr.core.CoreContainer 
$Initializer.initialize(CoreContainer.java:137)

at
org 
.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java: 
83)

at
org 
.apache 
.catalina 
.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java: 
221)

at
org 
.apache 
.catalina 
.core 
.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java: 
302)

at
org 
.apache 
.catalina 
.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78)

at
org 
.apache 
.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at  
org.apache.catalina.core.StandardContext.start(StandardContext.java: 
4222)

at
org 
.apache 
.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
at org.apache.catalina.core.ContainerBase.access 
$0(ContainerBase.java:744)

at
org.apache.catalina.core.ContainerBase 
$PrivilegedAddChild.run(ContainerBase.java:144)

at java.security.AccessController.doPrivileged(Native Method) at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java: 
738) at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java: 
544) at
org 
.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java: 
626)

at
org 
.apache 
.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java: 
488) at

org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138) at
org 
.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java: 
311)

at
org 
.apache 
.catalina 
.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 
1022) at

org.apache.catalina.core.StandardHost.start(StandardHost.java:736) at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java: 
1014) at
org.apache.catalina.core.StandardEngine.start(StandardEngine.java: 
443) at
org.apache.catalina.core.StandardService.start(StandardService.java: 
448) at
org.apache.catalina.core.StandardServer.start(StandardServer.java: 
700) at

org.apache.catalina.startup.Catalina.start(Catalina.java:552) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun 
.reflect 
.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at
sun 
.reflect 
.DelegatingMethodAccessorImpl 
.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun 
.reflect 
.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at
sun 
.reflect 
.DelegatingMethodAccessorImpl 
.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597) 

Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread Ryan McKinley
The HTMLStripCharFilter will strip the html for the *indexed* terms;
it does not affect the *stored* field.


If you don't want html in the stored field, can you just strip it out  
before passing to solr?
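
For example, a naive client-side strip before building the document (just a
sketch -- a real HTML parser would handle entities and broken markup better;
the field names are the ones from your example):

  import org.apache.solr.common.SolrInputDocument;

  String html = "<center>content</center>";
  String plain = html.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();

  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "http://haha.com");
  doc.addField("text", plain);   // the stored value is now plain text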



On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:


Hey Guys,
How do I add HTML/XML documents using SolrJ such that they do not bypass
the HTML char filter?

SolrJ escapes the HTML/XML value of a field, and that makes it bypass
the HTML char filter. For example, <center>content</center>, if added to
a field with HTMLStripCharFilter on the field using SolrJ, is not
stripped of center tags. But if I check in analysis.jsp, it does get
stripped. When I look at the SolrJ XML feed, it looks like this:
<add><doc boost="1.0"><field name="id">http://haha.com</field><field
name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

Any help is highly appreciated. Thanks.

--
Aseem




Re: Problems downloading lucene 2.9.1

2009-11-02 Thread Ryan McKinley


On Nov 2, 2009, at 8:29 AM, Grant Ingersoll wrote:



On Nov 2, 2009, at 12:12 AM, Licinio Fernández Maurelo wrote:


Hi folks,

as we are using a snapshot dependency on Solr 1.4, today we are getting
problems when Maven tries to download Lucene 2.9.1 (there isn't any 2.9.1
there).

Which repository can i use to download it?


They won't be there until 2.9.1 is officially released.  We are  
trying to speed up the Solr release by piggybacking on the Lucene  
release, but this little bit is the one downside.


Until then, you can add a repo to:

http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/




Re: Programmatically configuring SLF4J for Solr 1.4?

2009-11-01 Thread Ryan McKinley
I'm sure it is possible to configure JDK logging (java.util.logging)
programmatically... but I have never had much luck with it.


It is very easy to configure log4j programmatically, and this works
great with Solr.


To use log4j rather than JDK logging, simply add slf4j-log4j12-1.5.8.jar
(from http://www.slf4j.org/download.html) to your classpath.
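
Something along these lines should do it (a sketch using the log4j 1.2 API;
the logfile path and pattern are made up):

  import java.io.IOException;
  import org.apache.log4j.FileAppender;
  import org.apache.log4j.Level;
  import org.apache.log4j.Logger;
  import org.apache.log4j.PatternLayout;

  static void configureSolrLogging() throws IOException {
    // route everything under org.apache.solr to its own file
    Logger solr = Logger.getLogger("org.apache.solr");
    solr.setLevel(Level.INFO);
    solr.addAppender(new FileAppender(
        new PatternLayout("%d %-5p [%c{2}] %m%n"),
        "logs/solr.log",   // hypothetical path
        true));            // append
  }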


ryan



On Nov 1, 2009, at 11:05 PM, Don Werve wrote:

So, I've spent a bit of the day banging my head against this, and  
can't get

it sorted.  I'm using a DirectSolrConnection embedded in a JRuby
application, and everything works great, except I can't seem to get  
it to do

anything except log to the console.  I've tried pointing
'java.util.logging.config.file' to a properties file, as well as  
specifying
a logfile as part of the constructor for DirectSolrConnection, but  
so far,

nothing has really worked.

What I'd like to do is programmatically direct the Solr logs to a  
logfile,
so that I can have my app start up, parse its config, and throw the  
Solr

logs where they need to go based on that.

So, I don't suppose anybody has a code snippet (in Java) that sets  
up SLF4J
for Solr logging (and that doesn't reference an external properties  
file)?


Using the latest (1 Nov 2009) nightly build of Solr 1.4.0-dev




Re: (Solr 1.4 dev) Why solr.common.* packages are in solrj-*.jar ?

2009-10-14 Thread Ryan McKinley


I wonder why the common classes are in the solrj JAR?
Is the solrj JAR not just for the clients?


the solr server uses solrj for distributed search.  This makes solrj  
the general way to talk to solr (even from within solr)






releasing memory?

2009-10-08 Thread Ryan McKinley

Hello-

I have an application that can run in the background on a user Desktop  
-- it will go through phases of being used and not being used.  I want  
to be able to free as many system resources when not in use as possible.


Currently I have a timer that waits for 10 mins of inactivity and
releases a bunch of memory (unrelated to lucene/solr).  Any
suggestion on the best way to do this in lucene/solr?  Perhaps reload
a core?


thanks for any pointers
ryan


Re: Solrj possible deadlock

2009-09-23 Thread Ryan McKinley

do you have anything custom going on?

The fact that the lock is in java2d seems suspicious...


On Sep 23, 2009, at 7:01 PM, pof wrote:



I had the same problem again yesterday except the process halted  
after about

20mins this time.


pof wrote:


Hello, I was running a batch index the other day using the Solrj
EmbeddedSolrServer when the process abruptly froze in it's tracks  
after
running for about 4-5 hours and indexing ~400K documents. There  
were no
document locks so it would seem likely that there was some kind of  
thread
deadlock. I was hoping someone might be able to tell me some  
information

about the following thread dump taken at the time:

Full thread dump OpenJDK Client VM (1.6.0-b09 mixed mode):

DestroyJavaVM prio=10 tid=0x9322a800 nid=0xcef waiting on condition
[0x..0x0018a044]
  java.lang.Thread.State: RUNNABLE

Java2D Disposer daemon prio=10 tid=0x0a28cc00 nid=0xf1c in  
Object.wait()

[0x0311d000..0x0311def4]
  java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x97a96840 (a java.lang.ref.ReferenceQueue 
$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 
133)

   - locked 0x97a96840 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 
149)

   at sun.java2d.Disposer.run(Disposer.java:143)
   at java.lang.Thread.run(Thread.java:636)

pool-1-thread-1 prio=10 tid=0x93a26c00 nid=0xcf7 waiting on  
condition

[0x08a6a000..0x08a6b074]
  java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x967acfd0 (a
java.util.concurrent.locks.AbstractQueuedSynchronizer 
$ConditionObject)

   at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at
java.util.concurrent.locks.AbstractQueuedSynchronizer 
$ConditionObject.await(AbstractQueuedSynchronizer.java:1978)

   at
java 
.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java: 
386)

   at
java 
.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java: 
1043)

   at
java 
.util 
.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java: 
1103)

   at
java.util.concurrent.ThreadPoolExecutor 
$Worker.run(ThreadPoolExecutor.java:603)

   at java.lang.Thread.run(Thread.java:636)

Low Memory Detector daemon prio=10 tid=0x93a00c00 nid=0xcf5  
runnable

[0x..0x]
  java.lang.Thread.State: RUNNABLE

CompilerThread0 daemon prio=10 tid=0x09fe9800 nid=0xcf4 waiting on
condition [0x..0x096a7af4]
  java.lang.Thread.State: RUNNABLE

Signal Dispatcher daemon prio=10 tid=0x09fe8800 nid=0xcf3 waiting  
on

condition [0x..0x]
  java.lang.Thread.State: RUNNABLE

Finalizer daemon prio=10 tid=0x09fd7000 nid=0xcf2 in Object.wait()
[0x005ca000..0x005caef4]
  java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x966e6d40 (a java.lang.ref.ReferenceQueue 
$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 
133)

   - locked 0x966e6d40 (a java.lang.ref.ReferenceQueue$Lock)
   at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 
149)
   at java.lang.ref.Finalizer 
$FinalizerThread.run(Finalizer.java:177)


Reference Handler daemon prio=10 tid=0x09fd2c00 nid=0xcf1 in
Object.wait() [0x00579000..0x00579d74]
  java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0x966e6dc8 (a java.lang.ref.Reference$Lock)
   at java.lang.Object.wait(Object.java:502)
   at
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
   - locked 0x966e6dc8 (a java.lang.ref.Reference$Lock)

VM Thread prio=10 tid=0x09fcf800 nid=0xcf0 runnable

VM Periodic Task Thread prio=10 tid=0x93a02400 nid=0xcf6 waiting on
condition

JNI global references: 1072

Heap
def new generation   total 36288K, used 23695K [0x93f1,  
0x9667,

0x9667)
 eden space 32256K,  73% used [0x93f1, 0x95633f60, 0x95e9)
 from space 4032K,   0% used [0x95e9, 0x95e9, 0x9628)
 to   space 4032K,   0% used [0x9628, 0x9628, 0x9667)
tenured generation   total 483968K, used 72129K [0x9667,  
0xb3f1,

0xb3f1)
  the space 483968K,  14% used [0x9667, 0x9ace04b8, 0x9ace0600,
0xb3f1)
compacting perm gen  total 23040K, used 22983K [0xb3f1,  
0xb559,

0xb7f1)
  the space 23040K,  99% used [0xb3f1, 0xb5581ff8, 0xb5582000,
0xb559)
No shared spaces configured.

Cheers. Brett.



--
View this message in context: 
http://www.nabble.com/Solrj-possible-deadlock-tp25530146p25531321.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Solr SVN build problem

2009-09-12 Thread Ryan McKinley

Should be fixed in trunk.  Try updating and see if it works for you

See:
https://issues.apache.org/jira/browse/SOLR-1424



On Sep 9, 2009, at 8:12 PM, Allahbaksh Asadullah wrote:


Hi ,
I am building Solr from source. During building it from source I am  
getting

following error.

generate-maven-artifacts:
   [mkdir] Created dir: c:\Downloads\solr_trunk\build\maven
   [mkdir] Created dir: c:\Downloads\solr_trunk\dist\maven
[copy] Copying 1 file to
c:\Downloads\solr_trunk\build\maven\c:\Downloads\s
olr_trunk\src\maven

BUILD FAILED
c:\Downloads\solr_trunk\build.xml:741: The following error occurred  
while

execut
ing this line:
c:\Downloads\solr_trunk\common-build.xml:261: Failed to copy
c:\Downloads\solr_t
runk\src\maven\solr-parent-pom.xml.template to
c:\Downloads\solr_trunk\build\mav
en\c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template  
due to

java.io
.FileNotFoundException
c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_tru
nk\src\maven\solr-parent-pom.xml.template (The filename, directory  
name, or

volu
me label syntax is incorrect)

Regards,
Allahbaksh




Re: If field A is empty take field B. Functionality available?

2009-08-28 Thread Ryan McKinley

can you just add a new field that has the real or average price?
Just populate that field at index time...  make it indexed but not stored.


If you want the real or average price to be treated the same in  
faceting, you are really going to want them in the same field.
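
Something like this at index time (a sketch; the combined "price" field and
the two price lookups are hypothetical names, not from your schema):

  import org.apache.solr.common.SolrInputDocument;

  SolrInputDocument doc = new SolrInputDocument();
  Double realPrice = lookupRealPrice(id);   // hypothetical lookups; realPrice may be null
  Double avgPrice  = lookupAvgPrice(id);
  if (realPrice != null) {
    doc.addField("realprice", realPrice);
  }
  doc.addField("avgprice", avgPrice);
  // "price" is the extra indexed-only field you filter/sort/facet on
  doc.addField("price", realPrice != null ? realPrice : avgPrice);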



On Aug 28, 2009, at 1:16 PM, Britske wrote:



I have 2 fields:
realprice
avgprice

I'd like to be able to take the contents of avgprice if realprice is  
not

available.
due to design the average price cannot be encoded in the 'realprice'- 
field.


Since I need to be able to filter, sort and facet on these fields,  
it would
be really nice to be able to do that just on something like a  
virtual-field
called 'price' or something. That field should contain the  
conditional logic

to know from which actual field to take the contents from.

I was looking at using functionqueries, but to me knowledge these  
can't be

used to filter and facet on.

Would creating a custom field work for this or does a field know  
nothing
from its sibling-fields? What would performance impact be like,  
since this

is really important in this instance.

Any better ways? Subclassing standardrequestHandler and hacking it all
together seems rather ugly to me, but if it's needed...

Thanks,
Geert-Jan

--
View this message in context: 
http://www.nabble.com/If-field-A-is-empty-take-field-B.-Functionality-available--tp25193668p25193668.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Why isn't this working?

2009-08-27 Thread Ryan McKinley


On Aug 27, 2009, at 10:35 PM, Paul Tomblin wrote:


Yesterday or the day before, I asked specifically if I would need to
restart the Solr server if somebody else loaded data into the Solr
index using the EmbeddedServer, and I was told confidently that no,
the Solr server would see the new data as soon as it was committed.
So today I fired up the Solr server (and after making
apache-tomcat-6.0.20/solr/data a symlink to where the Solr data really
lives and restarting the web server), and did some queries.  Then I
ran a program that loaded a bunch of data and committed it.  Then I
did the queries again.  And the new data is NOT showing.  Using Luke,
I can see 10022 documents in the index, but the Solr statistics page
(http://localhost:8080/solrChunk/admin/stats.jsp) is still showing
8677, which is how many there were before I reloaded the data.

So am I doing something wrong, or was the assurance I got yesterday
that this is possible wrong?



I did not follow the advice from yesterday... but...

the commit word can be a bit misleading; it could also be called reload.


Say you have an embedded solr server and an http solr server pointed  
to the same location.

1. make sure only one of them is writing (the other should be read only)!
otherwise you can make a mess.
2. calling commit on the embedded solr instance will not have any
effect on the http instance UNTIL you call commit (reload) on the http
instance - see the sketch below.
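
For example (a sketch -- the URL is made up, exception handling omitted):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  // after the embedded writer has committed, tell the read-only HTTP
  // instance to reopen its searcher with an empty commit
  SolrServer readOnly = new CommonsHttpSolrServer("http://localhost:8080/solrChunk");
  readOnly.commit();   // writes nothing, but newly committed docs become visible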


ryan


Re: ${solr.abortOnConfigurationError:false} - does it defaults to false

2009-08-26 Thread Ryan McKinley


On Aug 26, 2009, at 3:33 PM, djain101 wrote:



I have one quick question...

If in solrconfig.xml, if it says ...

abortOnConfigurationError${solr.abortOnConfigurationError:false}/ 
abortOnConfigurationError


does it mean abortOnConfigurationError defaults to false if it is  
not set

as system property?



correct



Re: Solr-773 (GEO Module) question

2009-08-19 Thread Ryan McKinley


On Aug 19, 2009, at 6:45 AM, johan.sjob...@findwise.se wrote:


Hi,


we're glancing at the GEO search module known from the jira issue 773
(http://issues.apache.org/jira/browse/SOLR-773).


It seems to us that the issue is still open and not yet included in  
the

nightly builds.


correct



Is there a release plan for the nightly builds, and is this module
considered core or contrib?



activity on the nightly builds is winding down as we gear up for the  
1.4 release.


After 1.4 is out, I expect progress on the geo stuff.  It will be in  
contrib (not core) and will likely be marked experimental for a  
while.  That is, stuff will be added without the expectation that the  
interfaces will be set in stone.


best
ryan


Re: Posting data in JSON

2009-07-30 Thread Ryan McKinley

check:
https://issues.apache.org/jira/browse/SOLR-945

this will not likely make it into 1.4



On Jul 30, 2009, at 1:41 PM, Jérôme Etévé wrote:


Hi,

 Nope, I'm not using solrj (my client code is in Perl), and I'm with  
solr 1.3.


J.

2009/7/30 Shalin Shekhar Mangar shalinman...@gmail.com:
On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé  
jerome.et...@gmail.com

wrote:


Hi All,

I'm wondering if it's possible to post documents to solr in JSON  
format.


JSON is much faster than XML to get the queries results, so I think
it'd be great to be able to post data in JSON to speed up the  
indexing

and lower the network load.


If you are using Java,Solrj on 1.4 (trunk), you can use the binary  
format
which is extremely compact and efficient. Note that with Solr/Solrj  
1.3,

binary became the default response format for Solrj clients.

--
Regards,
Shalin Shekhar Mangar.





--
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net




Re: LocalSolr - order of fields on xml response

2009-07-22 Thread Ryan McKinley
ya...  'expected', but perhaps not ideal.  As is, LocalSolr munges the  
document on its way out the door to add the distance.


When LocalSolr makes it into the source, it will likely use a method  
like:

 https://issues.apache.org/jira/browse/SOLR-705
to augment each document with the calculated distance.

This will at least have consistent behavior.



On Jul 22, 2009, at 10:47 AM, Daniel Cassiano wrote:


Hi folks,

When I do some query with LocalSolr to get the geo_distance, the  
order of

xml fields is different of a standard query.
It's a simple query, like this:
http://myhost.com:8088/solr/core/select?qt=geo&x=-46.01&y=-23.01&radius=15&sort=geo_distance asc&q=*:*

Is this an expected behavior of LocalSolr?


Thanks!

--
Daniel Cassiano
_
http://www.apontador.com.br/
http://www.maplink.com.br/




Re: Solr JMX and Cacti

2009-07-20 Thread Ryan McKinley


On Jul 20, 2009, at 8:47 AM, Edward Capriolo wrote:


Hey all,

We have several deployments of Solr across our enterprise. Our largest
one is a several GB and when enough documents are added an OOM
exception is occurring.

To debug this problem I have enable JMX. My goal is to write some
cacti templates similar to the ones I have done for hadoop.
http://www.jointhegrid.com/hadoop/. The only cacti template for solr I
have found is old, broken and is using curl and PHP to try and read
the values off the web interface. I have a few general
questions/comments and also would like to know how others are dealing
with this.

1) SNMP has counters/gauges. With JMX it is hard to know what a
variable is without watching it for a while. Some fields are obvious,
(total_x) (cumulative_x); it would be worthwhile to add some notes in the
MBEAN info to say works like counter / works like gauge. This way a
network engineer like me does not have to go code surfing to figure
out how to graph them.

Has anyone written up a list of what the attributes are, types, and
what they mean?

2) The values that are not counter style I am assuming are sampled,
what is the sampling rate and is it adjustable?

Any tips are helpful. Thank you,


Check:
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/RequestHandlerBase.java

For cacti, you should probably ignore the two 'rate' based  
calculations as they are just derivatives:
lst.add("avgTimePerRequest", (float) totalTime / (float) this.numRequests);
lst.add("avgRequestsPerSecond", (float) numRequests * 1000 / (float) (System.currentTimeMillis() - handlerStart));






Re: SolrJ embedded server : error while adding document

2009-07-20 Thread Ryan McKinley

not sure what you mean...  yes, i guess...

you send a bunch of requests with add( doc/collection ) and they are  
not visible until you send commit()



On Jul 20, 2009, at 9:07 AM, Gérard Dupont wrote:

my mistake - a problem with the buffer I added. But it raises a question: does Solr
(using the embedded server) have its own buffer mechanism for indexing or not? I
guess not, but I might be wrong.

2009/7/20 Gérard Dupont ger.dup...@gmail.com


Hi SolR guys,

I'm starting to play with Solr after a few years with classic Lucene. I'm
trying to index a single document using the embedded server, but I got a
strange error which looks like an XML parsing problem (see trace hereafter). To
add details, this is a simple JUnit test which creates a single document and passes
it to the server in an ArrayList<SolrInputDocument>. The document only has 2
fields, id and text, as described in the configuration.

Jul 20, 2009 5:50:50 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: missing content stream
   at
org 
.apache 
.solr 
.handler 
.XmlUpdateRequestHandler 
.handleRequestBody(XmlUpdateRequestHandler.java:114)

   at
org 
.apache 
.solr 
.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java: 
131)

   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
   at
org 
.apache 
.solr 
.client 
.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java: 
147)

   at
org 
.apache 
.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java: 
217)

   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
   at
org 
.weblab_project 
.services.solr.SolrComponent.flushIndexBuffer(SolrComponent.java:132)

   at
org 
.weblab_project 
.services 
.solr.SolrComponentTest.testAddOneDocument(SolrComponentTest.java:66)

   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
sun 
.reflect 
.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

   at
sun 
.reflect 
.DelegatingMethodAccessorImpl 
.invoke(DelegatingMethodAccessorImpl.java:25)

   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:154)
   at junit.framework.TestCase.runBare(TestCase.java:127)
   at junit.framework.TestResult$1.protect(TestResult.java:106)
   at junit.framework.TestResult.runProtected(TestResult.java:124)
   at junit.framework.TestResult.run(TestResult.java:109)
   at junit.framework.TestCase.run(TestCase.java:118)
   at junit.framework.TestSuite.runTest(TestSuite.java:208)
   at junit.framework.TestSuite.run(TestSuite.java:203)
   at
org 
.eclipse 
.jdt 
.internal 
.junit 
.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)

   at
org 
.eclipse 
.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)

   at
org 
.eclipse 
.jdt 
.internal 
.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)

   at
org 
.eclipse 
.jdt 
.internal 
.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)

   at
org 
.eclipse 
.jdt 
.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java: 
386)

   at
org 
.eclipse 
.jdt 
.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java: 
196)


Jul 20, 2009 5:50:50 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=/update params={} status=500 QTime=6
Cannot flush the index buffer : Server error while adding documents

--
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org

Document  Learning team - LITIS Laboratory





--
Gérard Dupont
Information Processing Control and Cognition (IPCC) - EADS DS
http://weblab-project.org

Document  Learning team - LITIS Laboratory




Luke / get doc count for each term

2009-06-16 Thread Ryan McKinley
Hi-

I'm trying to use the LukeRequestHandler with an index of ~9 million
docs.  I know that counting the top / distinct terms for each field is
expensive and can take a LONG time to return.

Is there a faster way to check the number of documents for each field?
 Currently this gets the doc count for each term:

   if( sfield != null && sfield.indexed() ) {
     Query q = qp.parse( fieldName + ":[* TO *]" );
     int docCount = searcher.numDocs( q, matchAllDocs );
...

Looking at it again, that could be replaced with:

   if( sfield != null && sfield.indexed() ) {
     Query q = qp.parse( fieldName + ":[* TO *]" );
     int docCount = searcher.getDocSet( q ).size();
...

Is there any faster option then running a query for each field?

thanks
ryan


Re: Luke / get doc count for each term

2009-06-16 Thread Ryan McKinley


On Jun 16, 2009, at 5:21 PM, Grant Ingersoll wrote:



On Jun 16, 2009, at 1:57 PM, Ryan McKinley wrote:



Is there a faster way to check the number of documents for each  
field?

Currently this gets the doc count for each term:



In the past, I've created a field that contains the names of the  
Fields present on the document.  Then, simply facet on the new  
Field.  I think that gets you what you want and the mechanism is all  
built in to Solr and is quite speedy.



makes sense -- i like this idea.

ryan


filter on millions of IDs from external query

2009-06-03 Thread Ryan McKinley
I am working with an index of ~10 million documents.  The index  
does not change often.


I need to perform some external search criteria that will return some  
number of results -- this search could take up to 5 mins and return  
anywhere from 0-10M docs.


I would like to use the output of this long running query as a filter  
in solr.


Any suggestions on how to wire this all together?

My initial ideas (I have not implemented anything yet -- just want to  
check with you all before starting down the wrong path) is to:
* assume the index will always be optimized, in this case every id  
maps to a lucene int id.

* Store the results of the expensive query as a bitset.
* use the stored bitset in the lucene query.
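
A rough sketch of the "stored bitset" idea as a Lucene Filter (you would still
need a QParserPlugin or custom component to hook it into a Solr request, and
it is only valid while the doc id assignment does not change):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.DocIdSet;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.util.OpenBitSet;

  public class PrecomputedFilter extends Filter {
    private final OpenBitSet bits;   // bit i set = lucene doc i matched the external query

    public PrecomputedFilter(OpenBitSet bits) {
      this.bits = bits;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
      return bits;   // assumes top-level doc ids on an optimized, unchanged index
    }
  }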

I'm sure I can get this to work, but it seems kinda ugly (and  
brittle).  Any better thoughts on how to do this?  If we had some sort  
of external tagging interface, each document could just get tagged  
with what query it matches.


thanks
ryan




Re: When searching for !...@#$%^*() all documents are matched incorrectly

2009-05-30 Thread Ryan McKinley
two key things to try (for anyone ever wondering why a query matches documents)

1.  add debugQuery=true and look at the explain text below --
anything that contributed to the score is listed there
2.  check /admin/analysis.jsp -- this will let you see how analyzers
break text up into tokens.

Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has
something to do with it...


On Sat, May 30, 2009 at 5:59 PM, Sam Michaels mas...@yahoo.com wrote:

 Hi,

 I'm running Solr 1.3/Java 1.6.

 When I run a query like  - (activity_type:NAME) AND title:(\...@#$%\^\*\(\))
 all the documents are returned even though there is not a single match.
 There is no title that matches the string (which has been escaped).

 My document structure is as follows

 <doc>
   <str name="activity_type">NAME</str>
   <str name="title">Bathing</str>
   ...
 </doc>


 The title field is of type text_title which is described below.

 <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
             ignoreCase="true" expand="false"/>
     -->
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="1" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="1" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 When I run the query against Luke, no results are returned. Any suggestions
 are appreciated.


 --
 View this message in context: 
 http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: getting all rows from SOLRJ client using setRows method

2009-05-21 Thread Ryan McKinley


careful what you ask for...  what if you have a million docs?  will  
you get an OOM?


Maybe a better solution is to run a loop where you grab a bunch of  
docs and then increase the start value.
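
For example (a sketch; "server" is an assumed SolrServer, exception handling
omitted):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.common.SolrDocumentList;

  SolrQuery query = new SolrQuery("*:*");
  int pageSize = 1000;
  query.setRows(pageSize);
  for (int start = 0; ; start += pageSize) {
    query.setStart(start);
    SolrDocumentList page = server.query(query).getResults();
    // ... process this page ...
    if (start + page.size() >= page.getNumFound()) {
      break;   // last page reached
    }
  }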


but you can always use:
query.setRows( Integer.MAX_VALUE )

ryan


On May 21, 2009, at 8:37 PM, darniz wrote:



Hello
is there a way you can get all the results back from SOLR when  
querying

solrJ client

my gut feeling was that this might work
query.setRows(-1)

The way is to change the configuration xml file, but that is like hard-coding
the configuration, and there also I have to set some valid number; I can't
say return all rows.

Is there a way to done through query.

Thanks
rashid


--
View this message in context: 
http://www.nabble.com/getting-all-rows-from-SOLRJ-client-using-setRows-method-tp23662668p23662668.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: How to retrieve all available Cores in a static way ?

2009-05-20 Thread Ryan McKinley

I cringe to suggest this but you can use the deprecated call:
 SolrCore.getSolrCore().getCoreContainer()


On May 19, 2009, at 11:21 AM, Giovanni De Stefano wrote:


Hello all,

I have a quick question but I cannot find a quick answer :-)

I have a Java client running on the same JVM where Solr is running.

The Solr I have is a multicore.

How can I retrieve from the Java client the different cores available?

I tried with:

...
CoreContainer container = new CoreContainer();
CollectionSolrCore cores = container.getCores();
...

but I get nothing useful... :-(

Is there any static method that lets me get this collection?

Thanks a lot!

Giovanni




Re: multicore for 20k users?

2009-05-18 Thread Ryan McKinley
since there is so little overlap, I would look at a core for each  
user...


However, to manage 20K cores, you will not want to use the off-the-shelf
core management implementation to maintain these cores.
Consider overriding SolrDispatchFilter to initialize a CoreContainer  
that you manage.



On May 17, 2009, at 10:11 PM, Chris Cornell wrote:


On Sun, May 17, 2009 at 8:38 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:


Chris,

Yes, disk space is cheap, and with so little overlap you won't gain  
much by putting everything in a single index.  Plus, when each user  
has a separate index, it's easy to to split users and distribute  
over multiple machines if you ever need to do that, it's easy and  
fast to completely reindex one user's data without affecting other  
users, etc.


Several years ago I built Simpy at http://www.simpy.com/ that way  
(but pre-Solr, so it uses Lucene directly) and never regretted it.   
There are way more than 20K users there with many searches per  
second and with constant indexing.  Each user has an index for  
bookmarks and an index for notes.  Each group has its own index,  
shared by all group members.  The main bookmark search is another  
index.  People search is yet another index.  And so on.  Single  
server.




Thank you very much for your insight and experience; it sounds like we
shouldn't be thinking about prematurely optimizing this.

Has someone actually used multicore this way, though?  With  
thousands of them?


Independently of advice in that regard, I guess our next step is to
explore and create some dummy scenarios/tests to try and stress
multicore (search latency is not as much of a factor as memory usage
is).  I'll report back on any conclusion we come to.

Thanks!
Chris




Re: multicore for 20k users?

2009-05-17 Thread Ryan McKinley

how much overlap is there with the 20k user documents?

if you create a separate index for each of them will you be indexing  
90% of the documents 20K times?  How many total documents could an  
individual user typically see?  How many total distinct documents are  
you talking about?  Is the indexing strategy the same for all users?   
(the same analysis etc)


Is it actually possible to limit visibility by role rather than user?

I would start with trying to put everything in one index -- if that is  
not possible, then look at a multi-core option.
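
For the single-index route, a per-user or per-role filter query is usually enough. A rough SolrJ sketch -- the field name and role values are placeholders:

  SolrQuery q = new SolrQuery("annual report");           // whatever the user searched for
  q.addFilterQuery("role:(sales OR managers)");           // the roles the current user belongs to
  QueryResponse rsp = server.query(q);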




On May 17, 2009, at 5:53 PM, Chris Cornell wrote:


Trying to create a search solution for about 20k users at a company.
Each person's documents are private and different (some overlap... it
would be nice to not have to store/index copies).

Is multicore something that would work or should we auto-insert a
facet into each query generated by the person?

Thanks for any advice, I am very new to solr.  Any tiny push in the
right direction would be appreciated.

Thanks,
Chris




Re: CommonsHttpSolrServer vs EmbeddedSolrServer

2009-05-14 Thread Ryan McKinley
right -- which one you pick will depend more on your runtime
environment than anything else.


If you need to hit a server (on a different machine)  
CommonsHttpSolrServer is your only option.


If you are running an embedded application -- where your custom code
lives in the same JVM as Solr -- you can use EmbeddedSolrServer. The
nice thing is that since they implement the same interface, you can
change the implementation later.


The performance comments on the wiki can be a bit misleading -- yes,  
in some cases embedded could be faster, but that may depend on how you  
are sending things -- are you sending 1000s of single document  
requests really fast?  If so, try sending a bunch of documents  
together in one request.


Also consider using the StreamingHttpSolrServer (https://issues.apache.org/jira/browse/SOLR-906 
) -- it has a few quirks, but can be much faster.


In any case, as long as you program against the SolrServer interface,  
then you could swap the implementation as needed.
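
A rough sketch of both points -- only the construction line changes when you swap implementations, and the documents go out in one batched request (URL, core name, and field names are placeholders):

  SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  // SolrServer server = new EmbeddedSolrServer(coreContainer, "core0");  // same client code either way

  List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
  for (int i = 0; i < 1000; i++) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-" + i);
    doc.addField("name", "document " + i);
    batch.add(doc);
  }
  server.add(batch);   // one request instead of 1000
  server.commit();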


ryan


On May 14, 2009, at 3:35 PM, Eric Pugh wrote:

CommonsHttpSolrServer is how you access Solr from a Java client via
HTTP; you can connect to a Solr running anywhere. EmbeddedSolrServer
starts up Solr internally and connects directly, all in a single JVM.
Embedded may be faster (the jury is out), but you have to have your
Solr server and your Solr client on the same box. Unless you really
need that, I would start with CommonsHttpSolrServer: it's easier to
configure and get going with, and more flexible.


Eric


On May 14, 2009, at 1:30 PM, sachin78 wrote:



What is the difference between EmbeddedSolrServer and
CommonsHttpSolrServer? Which is the preferred server to use?

In some blog I read that EmbeddedSolrServer is 50% faster than
CommonsHttpSolrServer, so why do we need to use CommonsHttpSolrServer?

Can anyone please point me in the right direction so that I pick the
right implementation?

Thanks in advance.

--Sachin



-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal








Re: Does solrj return result in XML format? If not then how to make it do that.

2009-05-04 Thread Ryan McKinley


use this constructor:

  public CommonsHttpSolrServer(String solrServerUrl, HttpClient  
httpClient, ResponseParser parser) throws MalformedURLException {

this(new URL(solrServerUrl), httpClient, parser, false);
  }

and give it the XMLResponseParser
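
For example (a sketch; passing null for the HttpClient lets the client build its own, and the URL is a placeholder):

  SolrServer server = new CommonsHttpSolrServer(
      "http://localhost:8983/solr",    // placeholder URL
      null,                            // use the default HttpClient
      new XMLResponseParser());        // responses come back as XML instead of javabin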

 -- - - -

Is this just helpful for debugging with packet sniffing? The XML
format will be a bit slower than the binary format.


ryan


On May 4, 2009, at 8:22 AM, Erik Hatcher wrote:

Just out of curiosity, what's the use case for getting the result  
back in XML from SolrJ?


Erik

On May 4, 2009, at 8:13 AM, ahmed baseet wrote:

Can we get the results as received by SolrJ in XML format? If yes, how do
we do that? I think there must be some way to make SolrJ return results
in XML format.
I need some pointers in this direction. As far as I know, Solr returns
the results as SolrDocuments that we have to iterate to extract the
fields. Thank you.

--Ahmed.






Re: Does solrj return result in XML format? If not then how to make it do that.

2009-05-04 Thread Ryan McKinley
The point of using solrj is that you don't have to do any parsing  
yourself -- you get access to the results in object form.


If you need to do parsing, just grab the XML directly:
http://host/solr/select?q=*:*&wt=xml
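
If you stay with the object route, iterating the response looks roughly like this (field names are placeholders):

  QueryResponse rsp = server.query(new SolrQuery("*:*"));
  for (SolrDocument doc : rsp.getResults()) {
    Object id = doc.getFieldValue("id");
    Object title = doc.getFieldValue("title");
    // build whatever structure the browser-facing code needs
  }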


On May 4, 2009, at 9:36 AM, ahmed baseet wrote:


As far as I know, when we query Solr from the Solr admin interface we get
back the results in XML format, so I thought there must be something
similar for SolrJ as well, which I could run through an XML parser at the
other end and display all the results in the browser. Otherwise I have to
iterate the SolrDocumentList and create a list (maybe) to hold the results
and return it to the browser, which will handle displaying that list/map etc.

--Ahmed.



On Mon, May 4, 2009 at 5:52 PM, Erik Hatcher e...@ehatchersolutions.com 
wrote:


Just out of curiosity, what's the use case for getting the result  
back in

XML from SolrJ?

  Erik


On May 4, 2009, at 8:13 AM, ahmed baseet wrote:

Can we get the results as received by SolrJ in XML format? If yes, how do
we do that? I think there must be some way to make SolrJ return results
in XML format.
I need some pointers in this direction. As far as I know, Solr returns
the results as SolrDocuments that we have to iterate to extract the
fields. Thank you.

--Ahmed.








Re: How to index the contents from SVN repository

2009-04-27 Thread Ryan McKinley

I would suggest looking at Apache commons VFS and using the solrj API:

http://commons.apache.org/vfs/

With SVN, you may be able to use the webdav provider.
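
A rough sketch of that combination -- it assumes the repository is exposed over WebDAV (e.g. via mod_dav_svn), that the VFS webdav provider and its dependencies are on the classpath, and that the URL and field names are placeholders:

  FileSystemManager fsManager = VFS.getManager();
  FileObject trunk = fsManager.resolveFile("webdav://svn.example.com/repos/project/trunk");
  SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
  for (FileObject child : trunk.getChildren()) {
    if (child.getType() == FileType.FILE) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", child.getName().getURI());
      doc.addField("name", child.getName().getBaseName());
      doc.addField("text", IOUtils.toString(child.getContent().getInputStream()));
      solr.add(doc);
    }
  }
  solr.commit();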

ryan



On Apr 26, 2009, at 4:08 AM, Ashish P wrote:



Is there any way to index the contents of an SVN repository in Solr?





Re: Access HTTP headers from custom request handler

2009-04-23 Thread Ryan McKinley
Right, you will have to build a new war with your own subclass of
SolrDispatchFilter *rather* than using the packaged one.
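
On the handler side it then looks roughly like this -- a sketch that assumes the subclassed filter stores the HttpServletRequest in the request context under the key "HttpServletRequest", as in the execute() override quoted below:

  public class MyHeaderAwareHandler extends RequestHandlerBase {
    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
      HttpServletRequest http =
          (HttpServletRequest) req.getContext().get("HttpServletRequest");
      if (http != null) {
        rsp.add("user-agent", http.getHeader("User-Agent"));  // or any header you need
      }
    }
    // getDescription(), getSource(), etc. from RequestHandlerBase omitted for brevity
  }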



On Apr 23, 2009, at 12:34 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



nope.
you must edit the web.xml and register the filter there

On Thu, Apr 23, 2009 at 3:45 PM, Giovanni De Stefano
giovanni.destef...@gmail.com wrote:

Hello Hoss,

thank you for your reply.

I have no problems subclassing the SolrDispatchFilter... but where shall I
configure it? :-)

I cannot find any doc/wiki explaining how to configure a custom dispatch
filter.

I believe it should be in solrconfig.xml

<requestDispatcher> ... </requestDispatcher>

Any idea? Is there a schema for solrconfig.xml? It would make my life
easier... ;-)

Thanks,
Giovanni



On Wed, Apr 15, 2009 at 12:48 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:



: Solr cannot assume that the request would always come from http  
(think
: of EmbeddedSolrServer) .So it assumes that there are only  
parameters


exactly.

: Your best bet is to modify SolrDispatchFilter and readthe params  
and

: set them in the SolrRequest Object

SolrDispatchFilter is designed to be subclassed to make this easy by
overriding the execute method...

  protected void execute( HttpServletRequest req, SolrRequestHandler handler,
                          SolrQueryRequest sreq, SolrQueryResponse rsp ) {
    sreq.getContext().put( "HttpServletRequest", req );
    super.execute( req, handler, sreq, rsp );
  }

-Hoss








--
--Noble Paul




Re: CollapseFilter with the latest Solr in trunk

2009-04-20 Thread Ryan McKinley
I have not looked at this in a while, but I think the biggest thing it  
is missing right now is a champion -- someone to get the patches (and  
bug fixes) to a state where it can easily be committed.  Minor bug  
fixes are road blocks to getting things integrated.


ryan


On Apr 20, 2009, at 10:16 AM, Jeff Newburn wrote:

What are the current issues holding this back?  Seems to be working  
with

some minor bug fixes.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



From: Otis Gospodnetic otis_gospodne...@yahoo.com
Reply-To: solr-user@lucene.apache.org
Date: Sun, 19 Apr 2009 20:30:22 -0700 (PDT)
To: solr-user@lucene.apache.org
Subject: Re: CollapseFilter with the latest Solr in trunk


Once somebody really makes it work, I'm sure it will be released!


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Antonio Eggberg antonio_eggb...@yahoo.se
To: solr-user@lucene.apache.org
Sent: Sunday, April 19, 2009 9:21:20 PM
Subject: Re: CollapseFilter with the latest Solr in trunk


I wish it would be planned for 1.4 :))


--- On Sun, 2009-04-19, Otis Gospodnetic wrote:


From: Otis Gospodnetic
Subject: Re: CollapseFilter with the latest Solr in trunk
To: solr-user@lucene.apache.org
Date: Sunday, 19 April 2009, 15.06

Thanks for sharing!
It would be good if you (or Jeff from Zappos, or anyone making changes
to this) could put up a new patch for this most-voted JIRA issue.


Thanks,
Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: climbingrose
To: solr-user@lucene.apache.org
Sent: Sunday, April 19, 2009 8:12:11 AM
Subject: Re: CollapseFilter with the latest Solr in

trunk


Ok, here is how I fixed this problem:

  public DocListAndSet getDocListAndSet(Query query, List<Query> filterList,
                                        DocSet docSet, Sort lsort, int offset,
                                        int len, int flags) throws IOException {
    //DocListAndSet ret = new DocListAndSet();
    //getDocListC(ret, query, filterList, docSet, lsort, offset, len, flags |= GET_DOCSET);
    DocSet theFilt = getDocSet(filterList);
    if (docSet != null) theFilt = (theFilt != null) ? theFilt.intersection(docSet) : docSet;

    QueryCommand qc = new QueryCommand();
    qc.setQuery(query).setFilter(theFilt);
    qc.setSort(lsort).setOffset(offset).setLen(len).setFlags(flags |= GET_DOCSET);
    QueryResult result = new QueryResult();
    getDocListC(result, qc);
    return result.getDocListAndSet();
  }


There is also an off-by-one error in CollapseFilter; you can find the
solution on Jira.

Cheers,
Cuong

On Sat, Apr 18, 2009 at 4:41 AM, Jeff Newburn wrote:


We are currently trying to do the same thing.  With the patch unaltered we
can use fq as long as collapsing is turned on.  If we just send a normal
document level query with an fq parameter it blows up.

Additionally, it does not appear that the collapse.facet option works at all.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562




From: climbingrose
Reply-To:
Date: Fri, 17 Apr 2009 16:53:00 +1000
To: solr-user
Subject: CollapseFilter with the latest Solr in trunk


Hi all,

Has anyone tried to use CollapseFilter with the latest version of Solr in
trunk? It looks like Solr 1.4 doesn't allow calling setFilterList() and
setFilter() on one instance of the QueryCommand. I modified the code in
QueryCommand to allow this:

    public QueryCommand setFilterList(Query f) {
//      if( filter != null ) {
//        throw new IllegalArgumentException( "Either filter or filterList may be set in the QueryCommand, but not both." );
//      }
      filterList = null;
      if (f != null) {
        filterList = new ArrayList<Query>(2);
        filterList.add(f);
      }
      return this;
    }

However, I still have a problem which prevents query filters from working
when used in conjunction with CollapseFilter. In other words, query filters
don't seem to have any effect on the result set when CollapseFilter is used.

The other problem is related to OpenBitSet:

java.lang.ArrayIndexOutOfBoundsException: 2183
at org.apache.lucene.util.OpenBitSet.fastSet(OpenBitSet.java:242)
at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:202)
at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:161)
at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:141)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:217)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)


Re: Multiple Solr-instance share same solr.home

2009-04-19 Thread Ryan McKinley
as long as you make sure there are never two applications writing to  
the same index, you *should* be ok.


But tread carefully...




On Apr 19, 2009, at 3:28 PM, vivek sar wrote:


Both Solr instances will be writing to separate indexes, but can they
share the same solr.home? So, here is what I want,

1) solr.home = solr/multicore
2) There is a single solr.xml under multicore directory
3) Each instance would use the same solr.xml, which will have entries
for multiple cores
4) Each instance will write to different core at a time - so one index
will be written by only one writer at a time.

not sure if this is a supported configuration.

Thanks.
-vivek




On Sun, Apr 19, 2009 at 5:55 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:


Vivek - no, unless you want trouble - only 1 writer can write to a  
specific index at a time.



Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: vivek sar vivex...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sunday, April 19, 2009 4:33:00 AM
Subject: Multiple Solr-instance share same solr.home

Hi,

Is it possible to have two solr instances share the same solr.home?
I've two Solr instances running on the same box and I was  
wondering if
I can configure them to have the same solr.home. I tried it, but  
looks

like the second instance overwrites the first one's value in the
solr.xml (I'm using multicore for both instances). This is just for
convenience so I don't have to manage multiple solr index directory
locations - I can have all the indexes written into the same  
location

and do the clean up from one place itself. If this is not supported
then it's not a big deal.

Thanks,
-vivek







Re: Advice on moving from 1.3 to 1.4-dev or trunk?

2009-04-19 Thread Ryan McKinley


When you say Test ... Are you suggesting there is a test suite I
should run, or do just do my own testing?



your own testing...

If you use a 'nightly' the unit tests all pass.

BUT if you are not running from a standard release, there may be things
that are not totally fleshed out, or configurations that have not been
tried yet.  For a release build, a lot of effort is made to make sure all
loose ends are tied up.


ryan


Re: [solr-user] Upgrade from 1.2 to 1.3 gives 3x slowdown

2009-04-15 Thread Ryan McKinley


The work being done is addressing the deletes, AIUI, but of course
there are other things happening during shutdown, too.

There are no deletes to do. It was a clean index to begin with
and there were no duplicates.



I have not followed this thread, so forgive me if this has already  
been suggested


If you know that there are not any duplicates, have you tried indexing  
with allowDups=true?


It will not change the fsync cost, but it may reduce some other  
checking times.


ryan


Re: Search included in *all* fields

2009-04-13 Thread Ryan McKinley

what about:
 fieldA:value1 AND fieldB:value2

this can also be written as:
 +fieldA:value1 +fieldB:value2


On Apr 13, 2009, at 9:53 PM, Johnny X wrote:



I'll start a new thread to make things easier, because I've only  
really got

one problem now.

I've configured my Solr to search across all fields, and a query on a
specific field (e.g. q=Date:October) will only search the 'Date' field,
rather than all the others.

The issue is when you build up multiple fields to search on. Only  
one of
those has to match for a result to be returned, rather than all of  
them. Is

there a way to change this?


Cheers!





Re: QueryElevationComponent : hot update of elevate.xml

2009-04-10 Thread Ryan McKinley


On Apr 10, 2009, at 7:48 AM, Nicolas Pastorino wrote:


Hello !


Browsing the mailing-list's archives did not help me find the  
answer, hence the question asked directly here.


Some context first :
Integrating Solr with a CMS ( eZ Publish ), we chose to support  
Elevation. The idea is to be able to 'elevate' any object from the  
CMS. This can be achieved through eZ Publish's back office, with a  
dedicated Elevate administration GUI, the configuration is stored in  
the CMS temporarily, and then synchronized frequently and/or on  
demand onto Solr. This synchronisation is currently done as follows :

1. Generate the elevate.xml based on the stored configuration
2. Replace elevate.xml in Solr's dataDir
3. Commit. It appears that when having elevate.xml in Solr's  
dataDir, and solely in this case, commiting triggers a reload of  
elevate.xml. This does not happen when elevate.xml is stored in  
Solr's conf dir.



This method has one main issue though : eZ Publish needs to have  
access to the same filesystem as the one on which Solr's dataDir is  
stored. This is not always the case when the CMS is clustered for  
instance -- show stopper :(


Hence the following idea / RFC :
How about extending the Query Elevation system with the possibility  
to push an updated elevate.xml file/XML through HTTP ?
This would update the file where it is actually located, and trigger  
a reload of the configuration.
Not being very knowledgeable about Solr's API (yet!), I cannot figure out
whether this would be possible, how it could be achieved (which type of
plugin, for instance), or even whether it is a valid approach.



Perhaps look at implementing a custom RequestHandler:
http://wiki.apache.org/solr/SolrRequestHandler

maybe it could POST the new elevate.xml and then save it to the right
place and call commit...
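
A rough sketch of what such a handler might look like against the Solr 1.4 APIs -- the handler name, taking the first content stream as the new elevate.xml, and the hard commit are all assumptions, not an existing feature:

  public class ElevateUploadHandler extends RequestHandlerBase {
    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
      // Treat the POSTed body as the new elevate.xml
      ContentStream stream = req.getContentStreams().iterator().next();
      File target = new File(req.getCore().getDataDir(), "elevate.xml");
      Writer out = new FileWriter(target);
      try {
        IOUtils.copy(stream.getReader(), out);
      } finally {
        out.close();
      }
      // Commit so the QueryElevationComponent reloads the file from the dataDir
      req.getCore().getUpdateHandler().commit(new CommitUpdateCommand(false));
      rsp.add("elevate.updated", target.getAbsolutePath());
    }
    // getDescription(), getSource(), etc. from RequestHandlerBase omitted for brevity
  }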


ryan






Re: logging

2009-04-10 Thread Ryan McKinley
If you use the off the shelf .war, it *should* be the same.  (if not,  
we need to fix it)


If you are building your own .war, how SLF4J behaves depends on what
binding is on the runtime classpath.  If you want to use log4j logging,
put the slf4j-log4j12 jar in your classpath and you should be all set.



On Apr 9, 2009, at 4:56 PM, Kevin Osborn wrote:

We built our own webapp that used the Solr JARs. We used Apache  
Commons/log4j logging and just put log4j.properties in the Resin  
conf directory. The commons-logging and log4j jars were put in the  
Resin lib directory. Everything worked great and we got log files  
for our code only.


So, I upgraded to Solr 1.4 and I no longer get my log file. I assume  
it has something to do with Solr 1.4 using SLF4J instead of JDK  
logging, but it seems like my code would be independent of that. Any  
ideas?








Re: [Newbie]How to influante Revelance in Solr ?

2009-03-29 Thread Ryan McKinley


On Mar 29, 2009, at 8:42 AM, Shalin Shekhar Mangar wrote:


On Sun, Mar 29, 2009 at 4:57 PM, aerox7 amyne.berr...@me.com wrote:



I want to get results ordered by keyword matching (score) and popularity.

When I tried something like this: q=hp&sort=popularity desc, score desc
I get HP printer, HP laptop and HP jet, so it works! But when I try to
search hp jet (q=hp jet&sort=popularity desc, score desc) I get the same
result as the first query, which is totally wrong!

How do I influence the score in my case? For example, give the keyword
match a factor of 1 and popularity 1.5 (or 2).



Do not sort by popularity first as it will dominate the score. Look at
function queries for influencing the score based on the popularity.

http://wiki.apache.org/solr/FunctionQuery



also consider using the dismax parser with the 'bf' parameter.  I  
think the example has that configured, also check:


http://wiki.apache.org/solr/DisMaxRequestHandler#head-14b9ca618089829d139e6f3d6f52ff63e22a80d1
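
For example, something along these lines with SolrJ -- assuming a dismax handler is registered in solrconfig.xml, and with placeholder field names:

  SolrQuery q = new SolrQuery("hp jet");
  q.set("qt", "dismax");             // or defType=dismax, depending on how the handler is set up
  q.set("qf", "name features");      // fields the keyword match is scored against
  q.set("bf", "popularity^0.5");     // folds popularity into the score instead of sorting on it
  QueryResponse rsp = server.query(q);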

ryan

