Re: How to list all dynamic fields of a document using solrj?

2011-08-31 Thread Michael Szalay
You are right, they are not stored...
But is it possible to see them, the way the schema browser in the admin application
does?

Regards Michael

--
Michael Szalay
Senior Software Engineer

basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business

- Original Message -
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, 30 August 2011 16:31:05
Subject: Re: How to list all dynamic fields of a document using solrj?

This works for me, admittedly with the 3.3 code base:

Hmmm, did you *store* the dynamic fields? only stored fields are returned

CommonsHttpSolrServer server = new
CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery();
query.setQuery("*");
query.setRows(1000);
QueryResponse qr = server.query(query, SolrRequest.METHOD.POST);

for (Object obj : qr.getHeader()) log(obj.toString());


SolrDocumentList sdl = qr.getResults();
for (SolrDocument d : sdl) {
  // Print out all the fields in the record.
  for (String key : d.getFieldNames()) {
      log(key + " : " + d.get(key).toString());
  }
  // try a specific dynamic field
  Object val = d.getFieldValue("e1_t");
  if (val != null)
      log("getting specific value: " + val.toString());
}


On Tue, Aug 30, 2011 at 2:11 AM, Michael Szalay
michael.sza...@basis06.ch wrote:
 Hi Juan

 I tried with the following code first:

 final SolrQuery allDocumentsQuery = new SolrQuery();
 allDocumentsQuery.setQuery("id:" + myId);
 allDocumentsQuery.setFields("*");
 allDocumentsQuery.setRows(1);
 QueryResponse response = solr.query(allDocumentsQuery, METHOD.POST);


 With this, only non-dynamic fields are returned.
 Then I wrote the following helper method:

  private Set<String> getDynamicFields() throws SolrServerException,
 IOException {
        final LukeRequest luke = new LukeRequest();
        luke.setShowSchema(false);
        final LukeResponse process = luke.process(solr);
        final Map<String, FieldInfo> fieldInfo = process.getFieldInfo();
        final Set<String> dynamicFields = new HashSet<String>();
        for (final String key : fieldInfo.keySet()) {
            if (key.endsWith("_string") || key.endsWith("_dateTime")) {
                dynamicFields.add(key);
            }
        }
        return dynamicFields;
    }

 whereas _string and _dateTime are the suffixes of my dynamic fields.
 This one really does return all stored fields of the document:

 final Set<String> dynamicFields = getDynamicFields();
 final SolrQuery allDocumentsQuery = new SolrQuery();
 allDocumentsQuery.setQuery("uri:" + myId);
 allDocumentsQuery.setFields("*");
 for (final String df : dynamicFields) {
    allDocumentsQuery.addField(df);
 }

 allDocumentsQuery.setRows(1);
 QueryResponse response = solr.query(allDocumentsQuery, METHOD.POST);

 Is there a more elegant way to do this? We are using solrj 3.1.0 and solr 
 3.1.0.

 Regards
 Michael
 --
 Michael Szalay
 Senior Software Engineer

 basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
 http://www.basis06.ch - source of smart business

 - Original Message -
 From: Juan Grande juan.gra...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Monday, 29 August 2011 18:19:05
 Subject: Re: How to list all dynamic fields of a document using solrj?

 Hi Michael,

 It's supposed to work. Can we see a snippet of the code you're using to
 retrieve the fields?

 *Juan*



 On Mon, Aug 29, 2011 at 8:33 AM, Michael Szalay
 michael.sza...@basis06.chwrote:

 Hi all

 how can I list all dynamic fields and their values of a document using
 solrj?
 The dynamic fields are never returned when I use setFields("*").

 Thanks

 Michael

 --
 Michael Szalay
 Senior Software Engineer

 basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
 http://www.basis06.ch - source of smart business





Re: Stream still in memory after tika exception? Possible memoryleak?

2011-08-31 Thread Marc Jacobs
Hi Erick,

This is one of the errors I get (at the 4GB memory machine) and after
a while Tomcat crashes:

   SEVERE: SolrIndexWriter was not closed prior to finalize(),
indicates a bug -- POSSIBLE RESOURCE LEAK!!!

And this is part of my solrconfig.xml (I'm indexing 200k documents per run):

  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>128</ramBufferSizeMB>
    <maxFieldLength>1024000</maxFieldLength>
    <writeLockTimeout>6</writeLockTimeout>
    <commitLockTimeout>6</commitLockTimeout>
    <lockType>native</lockType>
  </indexDefaults>

  <mainIndex>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <unlockOnStartup>false</unlockOnStartup>
    <reopenReaders>true</reopenReaders>
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">1</str>
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>
    <infoStream file="INFOSTREAM.txt">false</infoStream>
  </mainIndex>

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>6</maxTime>
    </autoCommit>
  </updateHandler>

  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>

    <filterCache class="solr.FastLRUCache"
                 size="0"
                 initialSize="0"
                 autowarmCount="0"/>

    <queryResultCache class="solr.LRUCache"
                      size="0"
                      initialSize="0"
                      autowarmCount="0"/>

    <documentCache class="solr.LRUCache"
                   size="0"
                   initialSize="0"
                   autowarmCount="0"/>

    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>20</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
      </arr>
    </listener>

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">static firstSearcher warming in solrconfig.xml</str>
        </lst>
      </arr>
    </listener>

    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>

In my opinion Solr should work fine with these settings. Any comments?

Thanks again.

Greetings,

Marc

2011/8/31 Erick Erickson erickerick...@gmail.com

 See solrconfig.xml, particularly ramBufferSizeMB,
 also maxBufferedDocs.

 There's no reason you can't index as many documents
 as you want, unless your documents are absolutely
 huge (as in 100s of M, possibly G size).

 Are you actually getting out of memory problems?

 Erick

 On Tue, Aug 30, 2011 at 4:24 PM, Marc Jacobs jacob...@gmail.com wrote:
  Hi Chris,
 
  Thanks for the response.
  Eventually I want to install Solr on a machine with a maximum memory of 4GB.
  I tried to index the data on that machine before, but it resulted in index
  locks and memory errors.
  Is 4GB not enough to index 100,000 documents in a row? How much should it
  be? Is there a way to tune this?
 
  Regards,
 
  Marc
 
  2011/8/30 Chris Hostetter hossman_luc...@fucit.org
 
 
  : The current system I'm using has 150GB of memory and while I'm indexing
  the
  : memoryconsumption is growing and growing (eventually more then 50GB).
  : In the attached graph (http://postimage.org/image/acyv7kec/) I indexed
  about
  : 70k of office-documents (pdf,doc,xls etc) and between 1 and 2 percent
  throws
 
  Unless I'm misunderstanding something about your graph, only ~12GB of
  memory is used by applications on that machine.  About 60GB is in use by
  the filesystem cache.
 
  The Filesystem cache is not memory being used by Solr, it's memory that is
  free and not in use by an application, so your OS is (wisely) using it to
  cache files from disk that you've recently accessed in case you need them
  again.  This is handy, and for max efficiency (when keeping your index on
  disk) it's useful to make sure you allocate resources so that you have
  enough extra memory on your server that the entire index can be kept in
  the filesystem cache -- but the OS will happily free up that space for
  other apps that need it if they ask for more memory.
 
  : After indexing the memoryconsumption isn't dropping. Even after an
  optimize
  : command it's still there.
 
  as for why your Used memory grows to ~12GB and doesn't decrease even
  after an optimize: that's the way the Java memory model works.  When you
  run the JVM you specify (either explicitly or implicitly via defaults) a
  min & max heap size for the JVM to allocate for itself.  It starts out
  asking the OS for the min, and as it needs more it asks for more up to the
  max.  But (most JVM implementations I know of) don't give back RAM to
  the OS if they don't need it anymore -- they keep it as free space in the
  heap for future object allocation.
 
 
 
  -Hoss
 
 


Re: Reading results from FieldCollapsing

2011-08-31 Thread Sowmya V.B.
Hi Erick

I downloaded the latest build from (
https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/
)
But, I don't find the required class CollapseComponent in the src.
(org.apache.solr.handler.component.CollapseComponent).

The SolrJ in 3.4 does seem to have something like GroupResponse,
GroupCommand classes, which might be the ones I am looking for (though I am
not very sure).


Regards
Sowmya.

On Tue, Aug 30, 2011 at 5:14 PM, Erick Erickson erickerick...@gmail.comwrote:

 Ahhh, see: https://issues.apache.org/jira/browse/SOLR-2637

 Short form: It's in 3.4, not 3.3.

 So, your choices are:
 1 parse the XML yourself
 2 get a current 3x build (as in one of the nightlys) and use SolrJ there.

 Best
 Erick

 On Tue, Aug 30, 2011 at 11:09 AM, Sowmya V.B. vbsow...@gmail.com wrote:
  Hi Erick
 
  Yes, I did see the XML format. But, I did not understand how to read the
  response using SolrJ.
 
  I found some information about Collapse Component on googling, which
 looks
  like a normal Solr XML results format.
 
 http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/
 
  However, this class CollapseComponent does not seem to exist in Solr
  3.3. (org.apache.solr.handler.component.CollapseComponent)
  was the component mentioned in that link, which is not there in Solr3.3
  class files.
 
  Sowmya.
 
  On Tue, Aug 30, 2011 at 4:48 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Have you looked at the XML (or JSON) response format?
  You're right, it is different and you have to parse it
  differently, there are more levels. Try this query
  and you'll see the format (default data set).
 
  http://localhost:8983/solr/select?q=*:*&group=on&group.field=manu_exact
 
 
  Best
  Erick
 
  On Tue, Aug 30, 2011 at 9:25 AM, Sowmya V.B. vbsow...@gmail.com
 wrote:
   Hi All
  
   I am trying to use FieldCollapsing feature in Solr. On the Solr admin
    interface, I give ...&group=true&group.field=fieldA and I can see
  grouped
   results.
   But, I am not able to figure out how to read those results in that
 order
  on
   java.
  
   Something like: SolrDocumentList doclist = response.getResults();
   gives me a set of results, on which I iterate, and get something like
    doclist.get(1).getFieldValue("title") etc.
  
   After grouping, doing the same step throws me error (apparently,
 because
  the
   returned xml formats are different too).
  
   How can I read groupValues and thereby other fieldvalues of the
 documents
   inside that group?
  
   S.
   --
   Sowmya V.B.
   
   Losing optimism is blasphemy!
   http://vbsowmya.wordpress.com
   
  
 
 
 
 
  --
  Sowmya V.B.
  
  Losing optimism is blasphemy!
  http://vbsowmya.wordpress.com
  
 




-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Solr custom plugins: is it possible to have them persistent?

2011-08-31 Thread samuele.mattiuzzo
Eh eh you're right! This is what happens when you try to learn too many
things at the same time!

Btw, i found this
http://webdevelopersjournal.com/columns/connection_pool.html which is
perfect, i can use the provided code as my singleton instance, now i just
have to figure out how i can detect the end of the indexing operation and so
close all the connections to the pool (the example shows a servlet using
that singleton, and they define a destroy method, but they never use it
inside the servlet class)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3297716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom FilterFactory is when called

2011-08-31 Thread occurred
Sorry, it's been a long time since my last post...

Now I found out that the only good solution is to do a core reload:
http://wiki.apache.org/solr/CoreAdmin#RELOAD

It's been working very well for our needs.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-FilterFactory-is-when-called-tp3274503p3297814.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Find out why a document wasn't found

2011-08-31 Thread Oleg Tikhonov
Hi,
why don't you index the file metadata, e.g. the file name? Once the
file's metadata is indexed you could start querying by file name.

BR,
Oleg

On Wed, Aug 31, 2011 at 12:02 PM, occurred 
schaubm...@infodienst-ausschreibungen.de wrote:

 Hi,

 I'm looking for a solution to find out why a specific document wasn't found
 with my search query.

 This should be a solution which can be included in our application to show
 the customers how to rework their search queries.

 My first idea was:
 Search with the query and add to it the document id, then delete the search
 terms till the document was found...

 It's just a starting idea, but maybe someone has a better idea or there is
 already a plugin existing? ;-)

 cheers
 Charlie

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Find-out-why-a-document-wasn-t-found-tp3297821p3297821.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Find out why a document wasn't found

2011-08-31 Thread occurred
Hi Oleg,

ah, maybe there is a misunderstanding.
By "document" I meant a record in the index, not a file.
The records are indexed via a DB.

cheers
Charlie

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Find-out-why-a-document-wasn-t-found-tp3297821p3297875.html
Sent from the Solr - User mailing list archive at Nabble.com.


Field grouping?

2011-08-31 Thread Denis Kuzmenok
Hi.

Suppose i have a field "price" with different values, and i want to
get ranges for this field depending on doc count. For example i want
to get 5 ranges for 100 docs with 20 docs in each range, 6 ranges for
200 docs = ~34 docs in each range, etc.

Is it possible with solr?



Re: Shingle and Query Performance

2011-08-31 Thread Lord Khan Han
Thanks Erick.. If I figure out something I will let you know also.. Nobody
replied except you; I thought there might be more people involved here..

Thanks


On Wed, Aug 31, 2011 at 3:47 AM, Erick Erickson erickerick...@gmail.comwrote:

 OK, I'll have to defer because this makes no sense.
 4+ seconds in the debug component?

 Sorry I can't be more help here, but nothing really
 jumps out.
 Erick

 On Tue, Aug 30, 2011 at 12:45 PM, Lord Khan Han khanuniver...@gmail.com
 wrote:
  Below is the output of the debug. I am measuring pure Solr QTime, which shows
 in
  the QTime field in the Solr XML.
 
  <arr name="parsed_filter_queries">
  <str>mrank:[0 TO 100]</str>
  </arr>
  <lst name="timing">
  <double name="time">8584.0</double>
  <lst name="prepare">
  <double name="time">12.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent">
  <double name="time">12.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.FacetComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.StatsComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.SpellCheckComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.DebugComponent">
  <double name="time">0.0</double>
  </lst>
  </lst>
  <lst name="process">
  <double name="time">8572.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent">
  <double name="time">4480.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.FacetComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent">
  <double name="time">41.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.StatsComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.SpellCheckComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.DebugComponent">
  <double name="time">4051.0</double>
  </lst>
 
  On Tue, Aug 30, 2011 at 5:38 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Can we see the output if you specify both
  debugQuery=on&debug=true
 
  the debug=true will show the time taken up with various
  components, which is sometimes surprising...
 
  Second, we never asked the most basic question, what are
  you measuring? Is this the QTime of the returned response?
  (which is the time actually spent searching) or the time until
  the response gets back to the client, which may involve lots besides
  searching...
 
  Best
  Erick
 
  On Tue, Aug 30, 2011 at 7:59 AM, Lord Khan Han khanuniver...@gmail.com
 
  wrote:
   Hi Eric,
  
   Fields are lazy loading, content is stored in solr and the machine has 32 gig..
 solr
   has a 20 gig heap. There is no swapping.
  
   As you see we have many phrases in the same query. I couldn't find a
 way
  to
   drop qtime to subseconds. Surprisingly the non-shingled setup gives better qtime!
  
  
   On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
  
   Oh, one other thing: have you profiled your machine
   to see if you're swapping? How much memory are
   you giving your JVM? What is the underlying
   hardware setup?
  
   Best
   Erick
  
   On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
200K docs and 36G index? It sounds like you're storing
your documents in the Solr index. In and of itself, that
shouldn't hurt your query times, *unless* you have
lazy field loading turned off, have you checked that
lazy field loading is enabled?
   
   
   
Best
Erick
   
On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han 
  khanuniver...@gmail.com
   wrote:
Another interesting thing is: all one-word or multi-word queries,
   including
phrase queries such as "barack obama", are slower in the shingle
  configuration.
   What
am I doing wrong? Without shingles "barack obama" query time is 300ms,
   with
shingles 780 ms.
   
   
On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han 
  khanuniver...@gmail.com
   wrote:
   
Hi,
   
What is the difference between solr 3.3 and the trunk?
I will try 3.3 and let you know the results.
   
   
Here is the search handler:
   
<requestHandler name="search" class="solr.SearchHandler"
   default="true">
 <lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
   <!--<str name="fq">category:vv</str>-->
 <str name="fq">mrank:[0 TO 100]</str>
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
 <str name="defType">edismax</str>
   <!--<str name="qf">title^0.05 url^1.2 content^1.7
m_title^10.0</str>-->
<str name="qf">title^1.05 url^1.2 content^1.7 m_title^10.0</str>
 <!-- <str name="bf">recip(ee_score,-0.85,1,0.2)</str> -->
 <str

Re: How to list all dynamic fields of a document using solrj?

2011-08-31 Thread Erick Erickson
Sure, just use the Luke handler. See LukeRequest and LukeResponse
in the API documentation.

Best
Erick

On Wed, Aug 31, 2011 at 2:23 AM, Michael Szalay
michael.sza...@basis06.ch wrote:
 You are right, they are not stored...
 But is it possible to see them, the way the schema browser in the admin application
 does?

 Regards Michael

 --
 Michael Szalay
 Senior Software Engineer

 basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
 http://www.basis06.ch - source of smart business

 - Original Message -
 From: Erick Erickson erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, 30 August 2011 16:31:05
 Subject: Re: How to list all dynamic fields of a document using solrj?

 This works for me, admittedly with the 3.3 code base:

 Hmmm, did you *store* the dynamic fields? only stored fields are returned

     CommonsHttpSolrServer server = new
 CommonsHttpSolrServer("http://localhost:8983/solr");
     SolrQuery query = new SolrQuery();
     query.setQuery("*");
     query.setRows(1000);
     QueryResponse qr = server.query(query, SolrRequest.METHOD.POST);

     for (Object obj : qr.getHeader()) log(obj.toString());


     SolrDocumentList sdl = qr.getResults();
     for (SolrDocument d : sdl) {
       // Print out all the fields in the record.
       for (String key : d.getFieldNames()) {
           log(key + " : " + d.get(key).toString());
       }
       // try a specific dynamic field
       Object val = d.getFieldValue("e1_t");
       if (val != null)
           log("getting specific value: " + val.toString());
     }


 On Tue, Aug 30, 2011 at 2:11 AM, Michael Szalay
 michael.sza...@basis06.ch wrote:
 Hi Juan

 I tried with the following code first:

 final SolrQuery allDocumentsQuery = new SolrQuery();
 allDocumentsQuery.setQuery("id:" + myId);
 allDocumentsQuery.setFields("*");
 allDocumentsQuery.setRows(1);
 QueryResponse response = solr.query(allDocumentsQuery, METHOD.POST);


 With this, only non-dynamic fields are returned.
 Then I wrote the following helper method:

  private Set<String> getDynamicFields() throws SolrServerException,
 IOException {
        final LukeRequest luke = new LukeRequest();
        luke.setShowSchema(false);
        final LukeResponse process = luke.process(solr);
        final Map<String, FieldInfo> fieldInfo = process.getFieldInfo();
        final Set<String> dynamicFields = new HashSet<String>();
        for (final String key : fieldInfo.keySet()) {
            if (key.endsWith("_string") || key.endsWith("_dateTime")) {
                dynamicFields.add(key);
            }
        }
        return dynamicFields;
    }

 whereas _string and _dateTime are the suffixes of my dynamic fields.
 This one really does return all stored fields of the document:

 final Set<String> dynamicFields = getDynamicFields();
 final SolrQuery allDocumentsQuery = new SolrQuery();
 allDocumentsQuery.setQuery("uri:" + myId);
 allDocumentsQuery.setFields("*");
 for (final String df : dynamicFields) {
    allDocumentsQuery.addField(df);
 }

 allDocumentsQuery.setRows(1);
 QueryResponse response = solr.query(allDocumentsQuery, METHOD.POST);

 Is there a more elegant way to do this? We are using solrj 3.1.0 and solr 
 3.1.0.

 Regards
 Michael
 --
 Michael Szalay
 Senior Software Engineer

 basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
 http://www.basis06.ch - source of smart business

 - Original Message -
 From: Juan Grande juan.gra...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Monday, 29 August 2011 18:19:05
 Subject: Re: How to list all dynamic fields of a document using solrj?

 Hi Michael,

 It's supposed to work. Can we see a snippet of the code you're using to
 retrieve the fields?

 *Juan*



 On Mon, Aug 29, 2011 at 8:33 AM, Michael Szalay
 michael.sza...@basis06.chwrote:

 Hi all

 how can I list all dynamic fields and their values of a document using
 solrj?
 The dynamic fields are never returned when I use setFields("*").

 Thanks

 Michael

 --
 Michael Szalay
 Senior Software Engineer

 basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
 http://www.basis06.ch - source of smart business






Re: Reading results from FieldCollapsing

2011-08-31 Thread Erick Erickson
Actually, I haven't used the new stuff yet, so I'm not entirely sure either,
but that sure would be the place to start. There's some historical
ambiguity, Grouping started out as Field Collapsing, and they are
used interchangeably.

If you go to the bug I linked to and open up the patch file, you'll
see the code that implements the grouping in SolrJ, that should
give you a good place to start.

Best
Erick

On Wed, Aug 31, 2011 at 3:28 AM, Sowmya V.B. vbsow...@gmail.com wrote:
 Hi Erick

 I downloaded the latest build from (
 https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/
 )
 But, I don't find the required class CollapseComponent in the src.
 (org.apache.solr.handler.component.CollapseComponent).

 The SolrJ in 3.4 does seem to have something like GroupResponse,
 GroupCommand classes, which might be the ones I am looking for (though I am
 not very sure).


 Regards
 Sowmya.

 On Tue, Aug 30, 2011 at 5:14 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Ahhh, see: https://issues.apache.org/jira/browse/SOLR-2637

 Short form: It's in 3.4, not 3.3.

 So, your choices are:
 1 parse the XML yourself
 2 get a current 3x build (as in one of the nightlys) and use SolrJ there.

 Best
 Erick

 On Tue, Aug 30, 2011 at 11:09 AM, Sowmya V.B. vbsow...@gmail.com wrote:
  Hi Erick
 
  Yes, I did see the XML format. But, I did not understand how to read the
  response using SolrJ.
 
  I found some information about Collapse Component on googling, which
 looks
  like a normal Solr XML results format.
 
 http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/
 
  However, this class CollapseComponent does not seem to exist in Solr
  3.3. (org.apache.solr.handler.component.CollapseComponent)
  was the component mentioned in that link, which is not there in Solr3.3
  class files.
 
  Sowmya.
 
  On Tue, Aug 30, 2011 at 4:48 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Have you looked at the XML (or JSON) response format?
  You're right, it is different and you have to parse it
  differently, there are more levels. Try this query
  and you'll see the format (default data set).
 
  http://localhost:8983/solr/select?q=*:*&group=on&group.field=manu_exact
 
 
  Best
  Erick
 
  On Tue, Aug 30, 2011 at 9:25 AM, Sowmya V.B. vbsow...@gmail.com
 wrote:
   Hi All
  
   I am trying to use FieldCollapsing feature in Solr. On the Solr admin
    interface, I give ...&group=true&group.field=fieldA and I can see
  grouped
   results.
   But, I am not able to figure out how to read those results in that
 order
  on
   java.
  
   Something like: SolrDocumentList doclist = response.getResults();
   gives me a set of results, on which I iterate, and get something like
    doclist.get(1).getFieldValue("title") etc.
  
   After grouping, doing the same step throws me error (apparently,
 because
  the
   returned xml formats are different too).
  
   How can I read groupValues and thereby other fieldvalues of the
 documents
   inside that group?
  
   S.
   --
   Sowmya V.B.
   
   Losing optimism is blasphemy!
   http://vbsowmya.wordpress.com
   
  
 
 
 
 
  --
  Sowmya V.B.
  
  Losing optimism is blasphemy!
  http://vbsowmya.wordpress.com
  
 




 --
 Sowmya V.B.
 
 Losing optimism is blasphemy!
 http://vbsowmya.wordpress.com
 



Re: shareSchema=true - location of schema.xml?

2011-08-31 Thread François Schiettecatte
Satish

You don't say which platform you are on, but have you tried links (with ln on
linux/unix)?
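
For example (paths purely illustrative, assuming the shared copy lives in multicore/shared-conf/):

  ln -s ../../shared-conf/schema.xml multicore/core0/conf/schema.xml
  ln -s ../../shared-conf/schema.xml multicore/core1/conf/schema.xml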

François

On Aug 31, 2011, at 12:25 AM, Satish Talim wrote:

 I have 1000's of cores and to reduce the cost of loading unloading
 schema.xml, I have my solr.xml as mentioned here -
 http://wiki.apache.org/solr/CoreAdmin
 namely:
 
 solr
  cores adminPath=/admin/cores shareSchema=true
...
  /cores
 /solr
 
 However, I am not sure where to keep the common schema.xml file? In which
 case, do I need the schema.xml in the conf folder of each and every core?
 
 My folder structure is:
 
 multicore (contains solr.xml)
|_ core0
 |_ conf
 ||_ schema.xml
 ||_ solrconfig.xml
 ||_ other files
   core1
 |_ conf
 ||_ schema.xml
 ||_ solrconfig.xml
 ||_ other files
 |
   exampledocs (contains 1000's of .csv files and post.jar)
 
 Satish



Re: Reading results from FieldCollapsing

2011-08-31 Thread Martijn v Groningen
The CollapseComponent was never committed. This class exists in the
SOLR-236 patches. You don't need to change the configuration in order
to use grouping.
The blog you mentioned is based on the SOLR-236 patches. The current
grouping in Solr 3.3 has superseded these patches.

From Solr 3.4 (not yet released) the QueryResponse class in solrj has
a method getGroupResponse. Use this method to get the grouped
response.
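
A minimal sketch of reading the grouped response with that 3.4 SolrJ API (the query, grouped field and server variable are only placeholders, not taken from your setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.GroupResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class GroupingExample {
    public static void listGroups(SolrServer server) throws Exception {
        SolrQuery query = new SolrQuery("*:*");
        query.set("group", true);
        query.set("group.field", "manu_exact");   // placeholder field

        QueryResponse rsp = server.query(query);
        GroupResponse groupResponse = rsp.getGroupResponse();
        // one GroupCommand per group.field/group.query that was requested
        for (GroupCommand command : groupResponse.getValues()) {
            // one Group per distinct value of the grouped field
            for (Group group : command.getValues()) {
                System.out.println("group value: " + group.getGroupValue());
                // the documents returned inside this group
                for (SolrDocument doc : group.getResult()) {
                    System.out.println("  title: " + doc.getFieldValue("title"));
                }
            }
        }
    }
}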

On 31 August 2011 14:10, Erick Erickson erickerick...@gmail.com wrote:
 Actually, I haven't used the new stuff yet, so I'm not entirely sure either,
 but that sure would be the place to start. There's some historical
 ambiguity, Grouping started out as Field Collapsing, and they are
 used interchangeably.

 If you go to the bug I linked to and open up the patch file, you'll
 see the code that implements the grouping in SolrJ, that should
 give you a good place to start.

 Best
 Erick

 On Wed, Aug 31, 2011 at 3:28 AM, Sowmya V.B. vbsow...@gmail.com wrote:
 Hi Erick

 I downloaded the latest build from (
 https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/
 )
 But, I don't find the required class CollapseComponent in the src.
 (org.apache.solr.handler.component.CollapseComponent).

 The SolrJ in 3.4 does seem to have something like GroupResponse,
 GroupCommand classes, which might be the ones I am looking for (though I am
 not very sure).


 Regards
 Sowmya.

 On Tue, Aug 30, 2011 at 5:14 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Ahhh, see: https://issues.apache.org/jira/browse/SOLR-2637

 Short form: It's in 3.4, not 3.3.

 So, your choices are:
 1 parse the XML yourself
 2 get a current 3x build (as in one of the nightlys) and use SolrJ there.

 Best
 Erick

 On Tue, Aug 30, 2011 at 11:09 AM, Sowmya V.B. vbsow...@gmail.com wrote:
  Hi Erick
 
  Yes, I did see the XML format. But, I did not understand how to read the
  response using SolrJ.
 
  I found some information about Collapse Component on googling, which
 looks
  like a normal Solr XML results format.
 
 http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/
 
  However, this class CollapseComponent does not seem to exist in Solr
  3.3. (org.apache.solr.handler.component.CollapseComponent)
  was the component mentioned in that link, which is not there in Solr3.3
  class files.
 
  Sowmya.
 
  On Tue, Aug 30, 2011 at 4:48 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Have you looked at the XML (or JSON) response format?
  You're right, it is different and you have to parse it
  differently, there are more levels. Try this query
  and you'll see the format (default data set).
 
  http://localhost:8983/solr/select?q=*:*&group=on&group.field=manu_exact
 
 
  Best
  Erick
 
  On Tue, Aug 30, 2011 at 9:25 AM, Sowmya V.B. vbsow...@gmail.com
 wrote:
   Hi All
  
   I am trying to use FieldCollapsing feature in Solr. On the Solr admin
    interface, I give ...&group=true&group.field=fieldA and I can see
  grouped
   results.
   But, I am not able to figure out how to read those results in that
 order
  on
   java.
  
   Something like: SolrDocumentList doclist = response.getResults();
   gives me a set of results, on which I iterate, and get something like
    doclist.get(1).getFieldValue("title") etc.
  
   After grouping, doing the same step throws me error (apparently,
 because
  the
   returned xml formats are different too).
  
   How can I read groupValues and thereby other fieldvalues of the
 documents
   inside that group?
  
   S.
   --
   Sowmya V.B.
   
   Losing optimism is blasphemy!
   http://vbsowmya.wordpress.com
   
  
 
 
 
 
  --
  Sowmya V.B.
  
  Losing optimism is blasphemy!
  http://vbsowmya.wordpress.com
  
 




 --
 Sowmya V.B.
 
 Losing optimism is blasphemy!
 http://vbsowmya.wordpress.com
 





-- 
Kind regards,

Martijn van Groningen


Re: highlight on prefix query

2011-08-31 Thread Ahmed Boubaker
Well, that's one use case, there're others where you need to highlight only
what is matching.

For now, I solved the problem by writing an additional procedure to correct
the highlighting.  Not nice, but it works!

On Sat, Aug 6, 2011 at 11:10 AM, Kissue Kissue kissue...@gmail.com wrote:

 I think this is correct behaviour. If you go to Google and search for
 "Tel",
 you will see that "telephone" is highlighted.

 On Fri, Aug 5, 2011 at 5:42 PM, Ahmed Boubaker
 abdeka.boubake...@gmail.comwrote:

  Hi,
 
  I am using solr 3 and highlighting is working fine.  However when using
  prefix query like tel*, the highlighter highlights the whole matching
 words
  (i.e. television, telephone, ...).  I am highlighting a very short field
  (3~5 words length).
 
  How can I prevent the highlighter from doing so?  I want to get only the
  prefix of these words highlighted (i.e. <em>tel</em>evision,
  <em>tel</em>ephone, ...), any solution or idea?
 
  Many thanks for your help,
 
  Boubaker
 



Re: Find results with or without whitespace

2011-08-31 Thread roySolr
Frankie, Have you fixes this issue? I'm interested in your solution,,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Find-results-with-or-without-whitespace-tp3117144p3298298.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: core creation and instanceDir parameter

2011-08-31 Thread Gérard Dupont
up !

No-one has any clue about this question? Is it more a dev-related question?

2011/8/26 Gérard Dupont ger.dup...@gmail.com

 Hi all,

 Playing with multicore and dynamic creation of new cores, I found out that
 there is one mandatory parameter, instanceDir, which is needed to find
 out the location of solrconfig.xml and schema.xml. Since all my cores share
 the same configuration (found relative to the $SOLR_HOME defined on the server
 side) and all data is saved in the same folder (one sub-folder per
 core), I was wondering why we still need to send this parameter. In my
 configuration, I would like to avoid that the client, which asks for core
 creation, needs to be aware of the instance location on the server.

 BTW I'm on solr 3.3.0

 Thanks for any advice.

 --
 Gérard Dupont
 Information Processing Control and Cognition (IPCC)
 CASSIDIAN - an EADS company

 Document  Learning team - LITIS Laboratory




-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC)
CASSIDIAN - an EADS company

Document  Learning team - LITIS Laboratory


Re: I can't pass the unit test when compile from apache-solr-3.3.0-src

2011-08-31 Thread lee carroll
Not sure if this has progressed further but I'm getting test failure
for 3.3 also.

Trunk builds and tests fine but 3.3 fails the test below

(Note: I've a new box so it could be a silly setup issue I've missed, but
I think everything is in place (latest version of Java 1.6, latest
version of Ant); the main difference is the number of CPUs went from
1 to 4)

failed test output is:
Testsuite: org.apache.solr.common.util.ContentStreamTest
Tests run: 3, Failures: 0, Errors: 1, Time elapsed: 21.172 sec
- Standard Error -
NOTE: reproduce with: ant test -Dtestcase=ContentStreamTest
-Dtestmethod=testURLStream
-Dtests.seed=743785413891938113:-7792321629547565878
NOTE: test params are: locale=ar_QA, timezone=Europe/Vilnius
NOTE: all tests run in this JVM:
[CommonGramsQueryFilterFactoryTest, TestBrazilianStemFilterFactory,
TestCzechStemFilterFactory, TestFrenchMinimalStemFilterFactory,
TestHindiFilters, TestKeywordMarkerFilterFactory,
TestPatternReplaceFilter, TestRemoveDuplicatesTokenFilter,
TestStemmerOverrideFilterFactory, TestUAX29URLEmailTokenizerFactory,
SolrExceptionTest, LargeVolumeJettyTest, TestUpdateRequestCodec,
ContentStreamTest]
NOTE: Windows XP 5.1 x86/Sun Microsystems Inc. 1.6.0_27
(32-bit)/cpus=4,threads=2,free=6342464,total=16252928
-  ---

Testcase: testStringStream took 0 sec
Testcase: testFileStream took 0 sec
Testcase: testURLStream took 21.157 sec
Caused an ERROR
Connection timed out: connect
java.net.ConnectException: Connection timed out: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
at java.net.URL.openStream(URL.java:1010)
at 
org.apache.solr.common.util.ContentStreamTest.testURLStream(ContentStreamTest.java:70)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)



On 3 August 2011 01:51, Shawn Heisey s...@elyograg.org wrote:
 On 7/29/2011 5:26 PM, Chris Hostetter wrote:

 Can you please be specific...
  * which test(s) fail for you?
  * what are the failures?

 Any time a test fails, that info appears in the ant test output, and the
 full details or all tests are written to build/test-results

 you can run ant test-reports from the solr directory to generate an HTML
 report of all the success/failure info.

 I am also having a consistent build failure with the 3.3 source.  Some info
 from junit about the failure is below.  If you want something different I
 still have it in my session, let me know.

    [junit] NOTE: reproduce with: ant test
 -Dtestcase=TestSqlEntityProcessorDelta
 -Dtestmethod=testNonWritablePersistFile
 -Dtests.seed=4609081405510352067:771607526385155597
    [junit] NOTE: test params are: locale=ko_KR, timezone=Asia/Saigon
    [junit] NOTE: all tests run in this JVM:
    [junit] [TestCachedSqlEntityProcessor, TestClobTransformer,
 TestContentStreamDataSource, TestDataConfig, TestDateFormatTransformer,
 TestDocBuilder, TestDocBuilder2, TestEntityProcessorBase, TestErrorHandling,
  TestEvaluatorBag, TestFieldReader,
 TestFileListEntityProcessor, TestJdbcDataSource, TestLineEntityProcessor,
 TestNumberFormatTransformer, TestPlainTextEntityProcessor,
 TestRegexTransformer, TestScriptTransformer, TestSqlEntityProcessor,
 TestSqlEntityProcessor2
  TestSqlEntityProcessorDelta]
    [junit] NOTE: Linux 2.6.18-238.12.1.el5.centos.plusxen amd64/Sun
 Microsystems Inc. 1.6.0_26
 (64-bit)/cpus=3,threads=4,free=100917744,total=254148608


 Here's what I did on the last run:

 rm -rf lucene_solr_3_3
 svn co https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3
 lucene_solr_3_3
 cd lucene_solr_3_3/solr
 

DataImportHandler Transformer and environment property

2011-08-31 Thread Ahmed Boubaker
Hello,

Does anyone know how you can access an environment property from a custom
Transformer I defined?
Also, I am wondering where solrcore.properties should be located in a
multicore setup, and how I can access the properties defined inside it from
various Solr plugins?

Many thanks for your help,

Boubaker


Re: is it possible to do automatic indexing in solr ?

2011-08-31 Thread Erik Hatcher
There is no scheduling built into Solr.  But many search systems, including the one
deployed on our (Lucid's) website, are powered by cron jobs kicking off indexers
of various varieties all the time.

Look into your operating system's scheduling capabilities and leverage those, is
my advice.  Cron is your friend.
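
For example, a crontab entry along these lines kicks off a DataImportHandler full-import every night at 2am (the URL and handler path are just assumptions about your setup):

0 2 * * * curl -s 'http://localhost:8983/solr/dataimport?command=full-import' > /dev/null 2>&1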

Erik

On Aug 31, 2011, at 09:59 , vighnesh wrote:

 hi all
 
 i am unable to do the scheduling in solr so is there any way to do automatic
 indexing in solr. Please give the solution on automatic indexing or specify
 the procedure for how to do scheduling in solr.
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/is-it-possible-to-do-automatic-indexing-in-solr-tp3298428p3298428.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: is it possible to do automatic indexing in solr ?

2011-08-31 Thread 于浩
you can use the task scheduler of Windows or a Tomcat listener; the related
solution is posted on the Solr wiki:
http://wiki.apache.org/solr/DataImportHandler#HTTPPostScheduler


Re: how to do schduling in solr ?

2011-08-31 Thread 于浩
You can have a look at this page:
http://wiki.apache.org/solr/DataImportHandler#HTTPPostScheduler
this scheduler can post not only commands like delta-import but also commands
like full-import


Re: Solr 3.3 dismax MM parameter not working properly

2011-08-31 Thread Alexei Martchenko
I'm printing a big bold cheatsheet about it and stickin' it everywhere :-)

I wish I could change this thread's subject to "alexei is not working
properly" :-/

2011/8/30 Erick Erickson erickerick...@gmail.com

 Yep, that one takes a while to figure out, then
 I wind up re-figuring it out every time I have
 to change it G...

 Best
 Erick

 On Tue, Aug 30, 2011 at 6:36 PM, Alexei Martchenko
 ale...@superdownloads.com.br wrote:
  Hmmm I believe I discovered the problem.
 
  When you have something like this:
 
  2<50% 6<-60%
 
  you should read it from right to left and use the word MORE.
 
  MORE THAN SIX clauses, 60% are optional; MORE THAN TWO clauses (and that
  includes 3, 4 and 5 AND 6) half is mandatory.
 
  if you want a special rule for 2 terms just add:
 
  1<1 2<50% 6<-60%
 
  MORE THAN ONE clause (2) should match 1.
 
  NOW this makes sense!
 
  2011/8/30 Alexei Martchenko ale...@superdownloads.com.br
 
  Anyone else strugglin' with dismax's MM parameter?
 
  We're having a problem here, seems that configs from 3 terms and more
 are
  being ignored by solr and it assumes previous configs.
 
  if I use <str name="mm">3&lt;1</str> or <str name="mm">3&lt;100%</str> i
  get the same results for a 3-term query.
  If i try <str name="mm">4&lt;25%</str> or <str name="mm">4&lt;100%</str>
 I
  also get same data for a 4-term query.
 
  I'm searching: windows service pack
  <str name="mm">1&lt;100% 2&lt;50% 3&lt;100%</str> - 13000 results
  <str name="mm">1&lt;100% 2&lt;50% 3&lt;1</str> - the same 13000 results
  <str name="mm">1&lt;100% 2&lt;50%</str> - very same 13000 results
  <str name="mm">1&lt;100% 2&lt;100%</str> - 93 results. seems that here i
  get the 3<3 clause working.
  <str name="mm">2&lt;100%</str> - same 93 results, just in case.
  <str name="mm">2&lt;50%</str> - very same 13000 results as it should
  <str name="mm">2&lt;-50%</str> - 1121 results (weird)
 
  then i tried to control 3-term queries.
 
  <str name="mm">2&lt;-50% 3&lt;100%</str> 1121, the same as 2<-50%,
 ignoring
  the 3 clause.
  <str name="mm">2&lt;-50% 3&lt;1</str> the same 1121 results, ignoring
  it again.
 
  I'd like to accomplish something like this:
  <str name="mm">2&lt;1 3&lt;2 4&lt;3 8&lt;-50%</str>
 
  translating: 1 or 2 - 1 term, 3 at least 2, 4 at least 3 and 5, 6, 7, 8
  terms at least half rounded up (5-3, 6-3, 7-4, 8-4)
 
  seems that he's only using 1 and 2 clauses.
 
  thanks in advance
 
  alexei
 
 
 
 
  --
 
  *Alexei Martchenko* | *CEO* | Superdownloads
  ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
  5083.1018/5080.3535/5080.3533
 




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


RE: Document Size for Indexing

2011-08-31 Thread Tirthankar Chatterjee
I am using a 64-bit JVM and we are going out of memory in the extraction phase, where
Tika assigns the extracted content to the SolrInputDocument in the pipeline,
which gets loaded in memory.

We are using released 3.1 version of SOLR.

Thanks,
Tirthankar

-Original Message-
From: simon [mailto:mtnes...@gmail.com] 
Sent: Tuesday, August 30, 2011 1:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Document Size for Indexing

what issues exactly ?

are you using 32 bit Java ? That will restrict the JVM heap size to 2GB max.

-Simon

On Tue, Aug 30, 2011 at 11:26 AM, Tirthankar Chatterjee  
tchatter...@commvault.com wrote:

 Hi,

 I have a machine (win 2008R2) with 16GB RAM, and I am having issues
 indexing 1/2GB files. How do we avoid creating a SolrInputDocument, or
 is there any way to directly use the Lucene IndexWriter classes?

 What would be the best approach. We need some suggestions.
 
 Thanks,
 Tirthankar




RE: NRT and commit behavior

2011-08-31 Thread Tirthankar Chatterjee
Also noticed that the waitSearcher parameter value is not honored inside commit.
It is always defaulted to true, which makes it slow during indexing.

What we are trying to do is use an invalid query (which won't return any
results) as a warming query. This way the commit returns faster. Are we doing
something wrong here?

Thanks,
Tirthankar

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: Monday, July 18, 2011 11:38 AM
To: solr-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: NRT and commit behavior

In practice, in my experience at least, a very 'expensive' commit can still 
slow down searches significantly, I think just due to CPU (or
i/o?) starvation. Not sure anything can be done about that.  That's my 
experience in Solr 1.4.1, but since searches have always been async with 
commits, it probably is the same situation even in more recent versions, I'd 
guess.

On 7/18/2011 11:07 AM, Yonik Seeley wrote:
 On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chasench...@earthlink.net  wrote:
 Very glad to hear that NRT is finally here!  But my question is this: 
 will things still come to a standstill during a commit?
 New updates can now proceed in parallel with a commit, and searches 
 have always been completely asynchronous w.r.t. commits.

 -Yonik
 http://www.lucidimagination.com



RE: very slow commits and overlapping commits

2011-08-31 Thread Tirthankar Chatterjee
Try looking at your warming queries. Create a warming query that will not
return any results. See if it helps commits return faster.
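
A minimal sketch of what that could look like in solrconfig.xml (the query string is only a placeholder that should match nothing in your index):

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">id:___warming_query_matches_nothing___</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>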

Thx

-Original Message-
From: Bill Au [mailto:bill.w...@gmail.com] 
Sent: Friday, May 27, 2011 3:47 PM
To: solr-user@lucene.apache.org
Subject: Re: very slow commits and overlapping commits

I managed to get a thread dump during a slow commit:

resin-tcp-connection-*:5062-129 Id=12721 in RUNNABLE total cpu 
time=391530.ms user time=390620.ms at java.lang.String.intern(Native 
Method) at
org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74)
at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:356)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
at
org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:691)
at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:667)
at
org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:956)
at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:5207)
at
org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4370)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4209)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4200)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2195)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2158)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2122)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:230)
at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:70)
at
com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:173)
at
com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:229)
at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:274)
at com.caucho.server.port.TcpConnection.run(TcpConnection.java:511)
at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:520)
at com.caucho.util.ThreadPool.run(ThreadPool.java:442)
at java.lang.Thread.run(Thread.java:619)

It looks like Lucene's StringHelper is hardcoding the max size of the hash 
table of SimpleStringInterner to 1024 and I might be hitting that limit, 
causing an actual call to java.lang.String.intern().

I think I need to reduce the number of fields in my index.  Any other things I 
can do to help in this case.

Bill

On Wed, May 25, 2011 at 11:28 AM, Bill Au bill.w...@gmail.com wrote:

 I am taking a snapshot after every commit.  From looking at the 
 snapshots, it does not look like the delay in caused by segments 
 merging because I am not seeing any large new segments after a commit.

 I still can't figure out why there is a 2 minute gap between start
 commit and SolrDeletionPolicy.onCommit.  Will changing the
 deletion policy make any difference?  I am using the default deletion policy 
 now.

 Bill

 2011/5/21 Erick Erickson erickerick...@gmail.com

 Well, committing less often is a possibility <g>. Here's what's
 probably happening: when you pass certain thresholds, segments are
 merged, which can take quite some time.  How are you triggering
 commits? If it's external, think about using auto commit instead.

 Best
 Erick
 On May 20, 2011 6:04 PM, Bill Au bill.w...@gmail.com wrote:
  On my Solr 1.4.1 master I am doing commits regularly at a fixed
 interval.
 I
  noticed that from time to time commit will take longer than the 
  commit interval, causing commits to overlap. Then things will get 
  worse as
 commit
  will take longer and longer. Here is the logs for a long commit:
 
 
  [2011-05-18 23:47:30.071] start
 

 commit(optimize=false,waitFlush=false,waitSearcher=false,expungeDelet
 es=false)
  [2011-05-18 23:49:48.119] SolrDeletionPolicy.onCommit: 
  commits:num=2
  [2011-05-18 23:49:48.119]
 

 commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpa,v
 

Solr Upgrade from 1.4 to 3.1

2011-08-31 Thread Pawan Darira
Hi

I want to upgrade my solr version 1.4 to 3.1. Please suggest the steps and
what challenges might occur.

I have started using solr from 1.4 and this is my 1st experience upgrading
the version

thanks
Pawan


Re: Solr Upgrade from 1.4 to 3.1

2011-08-31 Thread Markus Jelsma
Everything you need to know about upgrading is listed in CHANGES.txt

On Wednesday 31 August 2011 18:14:11 Pawan Darira wrote:
 Hi
 
 I want to upgrade my solr version 1.4 to 3.1. Please suggest the steps and
 what challenges might occur.
 
 I have started using solr from 1.4 and this is my 1st experience upgrading
 the version
 
 thanks
 Pawan

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-31 Thread samuele.mattiuzzo
SEVERE: org.apache.solr.common.SolrException: Error Instantiating
UpdateRequestProcessorFactory, ToTheGoCustom is not a
org.apache.solr.update.processor.UpdateRequestProcessorFactory

i'm getting this error, but i don't know how to fix it

this is solrconfig.xml:

  <updateRequestProcessorChain name="ToTheGoCustom">
  <processor class="ToTheGoCustom" />
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

...

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">

   <lst name="defaults">
 <str name="update.processor">ToTheGoCustom</str>
   </lst>

 </requestHandler>

and this is my class implementation

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
/* Solr import */
import org.apache.solr.request.SolrQueryRequest;
//import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;


class ToTheGoCustom extends UpdateRequestProcessor
{



public ToTheGoCustom( UpdateRequestProcessor next) {
super( next );

}

// modification routine
@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
SolrInputDocument doc = cmd.getSolrInputDocument();

// salary from the document
Object sal = doc.getFieldValue( "salary" );
setSalary(doc,sal);

// location from the document
Object loc = doc.getFieldValue( "location" );
Object cc = doc.getFieldValue( "countrycode" );
setLocation(doc,loc,cc);

// jobfield, jobposition from the document
Object title = doc.getFieldValue( "job_title" );
Object description = doc.getFieldValue( "description" );
//setFieldPosition(doc,title,description);


// return the modified document to the main handler
super.processAdd(cmd);
}
/* stuff here, not dangerous */
}

the file is called ToTheGoCustom.java, inside a NetBeans project called
ToTheGoCustom, and built as jar ToTheGoCustom.jar
i put it inside the solr-installation lib folder. I already did that once,
and it worked smoothly, i just added some methods and it gave me that error.

The only thing that may have changed is my editor, since I went through a
reformat and reinstalled everything... So I think I built the plugins in
different ways (one working and one not, but I cannot recall the working
one...)

what am I missing? Please be explicit, I'm really close to giving up, this is too
messy to even understand :(

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3298850.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-31 Thread Koji Sekiguchi

(11/09/01 1:22), samuele.mattiuzzo wrote:

SEVERE: org.apache.solr.common.SolrException: Error Instantiating
UpdateRequestProcessorFactory, ToTheGoCustom is not a
org.apache.solr.update.processor.UpdateRequestProcessorFactory

i'm getting this error, but i don't know how to fix it

this is solrconfig.xml:

   <updateRequestProcessorChain name="ToTheGoCustom">
   <processor class="ToTheGoCustom" />
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>


ToTheGoCustom should be the type of UpdateRequestProcessorFactory,
but ...



...

class ToTheGoCustom extends UpdateRequestProcessor
{


it seems to be an UpdateRequestProcessor.

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: Solr custom plugins: is it possible to have them persistent?

2011-08-31 Thread Tomás Fernández Löbbe
You also need to create a class that
extends org.apache.solr.update.processor.UpdateRequestProcessorFactory. This
is the one that you reference in solrconfig.xml, and it is the one that will
instantiate your UpdateRequestProcessor.
See
http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html

Regards,
Tomás
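
A minimal sketch of such a factory, assuming the Solr 3.x API (the class name
and its placement are just an example, not the poster's actual code):

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ToTheGoCustomFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        // hand the rest of the chain to the processor that does the real work
        return new ToTheGoCustom(next);
    }
}

The processor element in the chain would then point at the factory class
(fully qualified if it lives in a package), e.g.
<processor class="ToTheGoCustomFactory" />.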




Re: Changing the DocCollector

2011-08-31 Thread Jamie Johnson
Believe I found it, wasn't populating the docset and doclist.  Again
thanks for all of the support.

On Tue, Aug 30, 2011 at 11:00 PM, Jamie Johnson jej2...@gmail.com wrote:
 Found score, so this works for regular queries but now I'm getting an
 exception when faceting.

 SEVERE: Exception during facet.field of type:java.lang.NullPointerException
        at 
 org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:451)
        at 
 org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:313)
        at 
 org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:357)
        at 
 org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:191)
        at 
 org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81)
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1290)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
        at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

 Any insight into what would cause that?

 On Tue, Aug 30, 2011 at 10:13 PM, Jamie Johnson jej2...@gmail.com wrote:
 So I looked at doing this, but I don't see a way to get the scores
 from the docs as well.  Am I missing something in that regard?

 On Mon, Aug 29, 2011 at 8:53 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks Hoss.  I am actually ok with that, I think something like
 50,000 results from each shard as a max would be reasonable since my
 check takes about 1s for 50,000 records.  I'll give this a whirl and
 see how it goes.

 On Mon, Aug 29, 2011 at 6:46 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:

 : Also I see that this is before sorting, is there a way to do something
 : similar after sorting?  The reason is that I'm ok with the total
 : result not being completely accurate so long as the first say 10 pages
 : are accurate.  The results could get more accurate as you page through
 : them though.  Does that make sense?

 Munging results after sorting is dangerous in the general case, but if you
 have a specific use case where you're okay with only guaranteeing accurate
 results up to result #X, then you might be able to get away with something
 like...

 * custom SearchComponent
 * configure to run after QueryComponent
 * in prepare, record the start & rows params, and replace them with 0 &
 (MAX_PAGE_NUM * rows)
 * in process, iterate over the DocList and build up your own new
 DocSlice based on the docs that match your special criteria - then use the
 original start/rows to generate a subset and return that

 ...getting this to play nicely with stuff like faceting should be possible with
 more work, and manipulation of the DocSet (assuming you're okay with the
 facet counts only being as accurate as the DocList is -- filtered
 up to row X).

 It could fail miserably with distributed search since you have no idea
 how many results will pass your filter.

 (note: this is all off the top of my head ... no idea if it would actually
 work)



 -Hoss
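
A rough, untested transcription of those steps into code (Solr 3.x component
API assumed; MAX_PAGE_NUM and the keepDoc() check are placeholders, and, as
Hoss says above, how the paging override interacts with QueryComponent's own
handling of start/rows may need adjusting):

import java.io.IOException;
import java.util.Map;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.DocSlice;

public class PostFilteringComponent extends SearchComponent {

    private static final int MAX_PAGE_NUM = 10; // guarantee accuracy up to this many pages

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        SolrParams params = rb.req.getParams();
        int start = params.getInt(CommonParams.START, 0);
        int rows = params.getInt(CommonParams.ROWS, 10);

        // remember the original paging so process() can restore it
        Map<Object, Object> ctx = rb.req.getContext();
        ctx.put("orig.start", start);
        ctx.put("orig.rows", rows);

        // over-fetch the first MAX_PAGE_NUM pages worth of results
        ModifiableSolrParams overridden = new ModifiableSolrParams(params);
        overridden.set(CommonParams.START, 0);
        overridden.set(CommonParams.ROWS, MAX_PAGE_NUM * rows);
        rb.req.setParams(overridden);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        Map<Object, Object> ctx = rb.req.getContext();
        int start = (Integer) ctx.get("orig.start");
        int rows = (Integer) ctx.get("orig.rows");

        DocList fetched = rb.getResults().docList;
        int[] docs = new int[fetched.size()];
        float[] scores = new float[fetched.size()];
        int kept = 0;
        float maxScore = 0f;

        for (DocIterator it = fetched.iterator(); it.hasNext();) {
            int docId = it.nextDoc();
            float score = it.score(); // only meaningful if scores were requested
            if (keepDoc(docId)) {
                docs[kept] = docId;
                scores[kept] = score;
                maxScore = Math.max(maxScore, score);
                kept++;
            }
        }

        // re-apply the original start/rows window over the filtered list
        int len = Math.max(0, Math.min(rows, kept - start));
        rb.getResults().docList = new DocSlice(start, len, docs, scores, kept, maxScore);
    }

    // placeholder for the expensive per-document check
    private boolean keepDoc(int docId) {
        return true;
    }

    @Override
    public String getDescription() { return "post-sort filtering sketch"; }

    @Override
    public String getSource() { return ""; }

    @Override
    public String getSourceId() { return ""; }

    @Override
    public String getVersion() { return "1.0"; }
}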






[Q]Solr response passed to remote JsonStore - highlighting properties embed in the response part

2011-08-31 Thread malic
Hello, I have a very specific question about the Solr response passed to
remote JsonStore.

*Solr response passed to remote JsonStore*

var myJsonStore =  new Ext.data.JsonStore({
// store configs
url: myurl,
baseParams:
{'wt':'json','facet':true,'facet.limit':-1,'facet.sort':'description','hl':true,'hl.fl':'*'},
  // reader configs
totalProperty: 'total',
idProperty: 'handle',
root:function(v){
return v.response.docs;
},
fields: ['handle', 'description']
})

*Solr standard output:*

{
    "responseHeader": {
        "status": 0,
        "QTime": 32
    },
    "response": {
        "total": 21,
        "start": 0,
        "docs": [
            {
                "description": "The matte finish waves on this wedding band
contrast with the high polish borders. This sharp and elegant design was
finely crafted in Japan.",
                "handle": 8252,

            },
            {
                "description": "This elegant ring has an Akoya cultured
pearl with a band of bezel-set round diamonds making it perfect for her to
wear to work or the night out.",
                "handle": 8142,

            },

        ]
    },
    "highlighting": {
        "8252": {
            "description": [
                " and <em>elegant</em> design was finely crafted in Japan."
            ]
        },
        "8142": {
            "description": [
                "This <em>elegant</em> ring has an Akoya cultured pearl with
a band of bezel-set round diamonds making"
            ]
        },

    }
}


*What I want:* to change the output by embedding the highlighting properties
into the response properties, such that the response part looks like:

"response": {
    "numFound": 21,
    "start": 0,
    "docs": [
        {
            "description": "The matte finish waves on this wedding band
contrast with the high polish borders. This sharp and <em>elegant</em>
design was finely crafted in Japan.",
            "UID_PK": 8252,

        },
        {
            "description": "This <em>elegant</em> ring has an Akoya
cultured pearl with a band of bezel-set round diamonds making it perfect for
her to wear to work or the night out.",
            "UID_PK": 8142,

        },

    ]
},

Can anyone suggest an approach to do this? Thx a lot. 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Q-Solr-response-passed-to-remote-JsonStore-highlighting-properties-embed-in-the-response-part-tp3297811p3297811.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr commit process and read downtime

2011-08-31 Thread Mike Austin
I've set up a master/slave configuration and it's working great!  I know
this is the better setup, but if I had just one index due to requirements,
I'd like to know more about the performance hit of the commit. Let's just
assume I have a decent-sized index of a few gigs of normal-sized documents with
high traffic.  A few questions:

- (main question) When you do a commit on a single index, is there any time
when reads will not have an index to search on?
- With the rebuilding of caches and whatever else happens, is the only
downside the fact that the server performance will be degraded due to file
copy, cache warming, etc., or will the index actually be locked at some
point?
- On a commit, do the files get copied so you need double the space, or is
that just for optimize?

I know a master/slave setup is used to reduce these issues, but if I had
only one server I need to know the potential risks.

Thanks,
Mike


RE: core creation and instanceDir parameter

2011-08-31 Thread Jaeger, Jay - DOT
Well, if it is for creating a *new* core, Solr doesn't know it is pointing to 
your shared conf directory until after you create it, does it?

JRJ

-Original Message-
From: Gérard Dupont [mailto:ger.dup...@gmail.com] 
Sent: Wednesday, August 31, 2011 8:17 AM
To: solr-user@lucene.apache.org
Subject: Re: core creation and instanceDir parameter

up !

No-one has any clue about this question? Is it more a dev-related question
?

2011/8/26 Gérard Dupont ger.dup...@gmail.com

 Hi all,

 Playing with multicore and dynamic creation of new cores, I found out that
 there is one mandatory parameter, instanceDir, which is required to find
 the location of solrconfig.xml and schema.xml. Since all my cores share
 the same configuration (found relative to the $SOLR_HOME defined on the server
 side) and all data is saved in the same folder (one sub-folder per
 core), I was wondering why we still need to send this parameter. In my
 configuration, I would like to avoid the client that asks for core
 creation needing to be aware of the instance location on the server.

 BTW I'm on solr 3.3.0

 Thanks for any advice.

 --
 Gérard Dupont
 Information Processing Control and Cognition (IPCC)
 CASSIDIAN - an EADS company

 Document  Learning team - LITIS Laboratory




-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC)
CASSIDIAN - an EADS company

Document  Learning team - LITIS Laboratory


Re: Document Size for Indexing

2011-08-31 Thread simon
So if I understand you, you are using Tika and SolrJ together in a Solr client
process which talks to your Solr server? What is the heap size? Can you
give us a stack trace from the OOM exception?

-Simon

On Wed, Aug 31, 2011 at 10:58 AM, Tirthankar Chatterjee 
tchatter...@commvault.com wrote:

 I am using a 64-bit JVM, and we are going out of memory in the extraction phase,
 where Tika assigns the extracted content to the SolrInputDocument in the
 pipeline, which gets loaded into memory.

 We are using released 3.1 version of SOLR.

 Thanks,
 Tirthankar

 -Original Message-
 From: simon [mailto:mtnes...@gmail.com]
 Sent: Tuesday, August 30, 2011 1:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Document Size for Indexing

 what issues exactly ?

 are you using 32 bit Java ? That will restrict the JVM heap size to 2GB
 max.

 -Simon

 On Tue, Aug 30, 2011 at 11:26 AM, Tirthankar Chatterjee 
 tchatter...@commvault.com wrote:

  Hi,
 
  I have a machine (Win 2008 R2) with 16GB RAM, and I am having issues
  indexing 1/2GB files. How do we avoid creating a SolrInputDocument, or
  is there any way to directly use the Lucene IndexWriter classes?
 
  What would be the best approach. We need some suggestions.
 
  Thanks,
  Tirthankar
 
 
  **Legal Disclaimer***
  This communication may contain confidential and privileged material
  for the sole use of the intended recipient. Any unauthorized review,
  use or distribution by others is strictly prohibited. If you have
  received the message in error, please advise the sender by reply email
  and delete the message. Thank you.
  *
 **Legal Disclaimer***
 This communication may contain confidential and privileged
 material for the sole use of the intended recipient. Any
 unauthorized review, use or distribution by others is strictly
 prohibited. If you have received the message in error, please
 advise the sender by reply email and delete the message. Thank
 you.
 *



Re: syntax for functions used in the fq parameter

2011-08-31 Thread Chris Hostetter

: Why doesn't  AND text:foo fill this requirement?

or fq=text:foo (if you don't want it to affect scoring, and it sounds 
like you don't)

But since you asked: if you want to use functions in fq you have to tell 
Solr to parse it as a function.  There are a variety of options...

https://wiki.apache.org/solr/FunctionQuery#Using_FunctionQuery

...but for an fq you'll probably want to use the frange QParser (since 
your goal is to limit matches based on the value of a function)...

https://lucene.apache.org/solr/api/org/apache/solr/search/FunctionRangeQParserPlugin.html



-Hoss
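
For example, to keep only documents whose function value falls in a range
(the field names and bounds here are made up for illustration):

  fq={!frange l=0 u=100}sum(popularity,editor_rank)

The l and u parameters are the lower and upper bounds; see the
FunctionRangeQParserPlugin page above for the details.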


Re: Field grouping?

2011-08-31 Thread Alexei Martchenko
Yes, Ranged Facets
http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range
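
A typical range-facet request looks something like this (the price field,
bounds and gap are illustrative; range faceting needs Solr 3.1 or later):

  facet=true&facet.range=price&facet.range.start=0&facet.range.end=1000&facet.range.gap=100

Note these are fixed-width buckets, not buckets holding an equal number of
documents each.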

2011/8/31 Denis Kuzmenok forward...@ukr.net

 Hi.

 Suppose  i  have  a field price with different values, and i want to
 get  ranges for this field depending on docs count, for example i want
 to  get 5 ranges for 100 docs with 20 docs in each range, 6 ranges for
 200 docs = 34 docs in each field, etc.

 Is it possible with solr?




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Solr Geodist

2011-08-31 Thread Erick Erickson
No, this forum is part of the Apache Solr project. Lucid does maintain
a searchable index of this list, though...

Best
Erick

On Tue, Aug 30, 2011 at 10:40 PM, solrnovice manisha...@yahoo.com wrote:
 hi Lance, thanks for the link, i went to their site, lucidimagination forum,
 when i searched on geodist, i see my own posts. Is this forum part of
 lucidimagination?

 Just curious.

 thanks
 SN

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3297262.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Geodist

2011-08-31 Thread Erick Erickson
This is the one I've used,
http://wiki.apache.org/solr/SpatialSearch

Best
Erick
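
A typical query from that page, adapted to the field name used earlier in
this thread (the point and distance values are illustrative):

  q=*:*&fq={!geofilt}&sfield=store_lat_lon&pt=39.86347,-105.04888&d=100&sort=geodist() asc

With no arguments, geodist() picks up sfield/pt from the request; the wiki
page also documents passing the field and point in as arguments.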

On Tue, Aug 30, 2011 at 9:09 PM, solrnovice manisha...@yahoo.com wrote:
 Hi Erick, today I had the distance working. Since the Solr version under
 LucidImagination is not returning geodist(),  I downloaded Solr 4.0 from the
 nightly build. On lucid we had the full schema defined. So i copied that
 schema to the example directory of solr-4 and removed all references to
 Lucid and started the index.
 I wanted to try our schema under solr-4.

 Then i had the data indexed ( we have a rake written in ruby to index the
 contents) and ran the geodist queries and they all run like a charm. I do
 get distance as a pseudo column.

 Is there any documentation that gives me all the arguments of geodist()? I
 couldn't find it online.


 Erick, thanks for your help in going through my examples. Now they all work
 on my Solr installation.


 thanks
 SN

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3297088.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Duplication of Output

2011-08-31 Thread Erick Erickson
The first question I'd ask is "why are there duplicates
in your index in the first place?". If you're denormalizing,
that would account for it. Mostly, I'm just asking to be
sure that you expect duplicate product IDs. If you make
your productid a uniqueKey, there'll only be one of each.

You'll have to re-index if you make this change, though.

But grouping/field collapsing would, indeed, apply to this
problem.

Deduplication isn't applicable, since you know exactly what
duplicates are. Deduplication is more for fuzzy removal
of near-duplicates.

Hope this helps
Erick
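
To make the two options concrete (using the productid field from the question
quoted below; the grouping syntax needs Solr 3.3+):

  in schema.xml:   <uniqueKey>productid</uniqueKey>
  at query time:   &group=true&group.field=productid&group.limit=1

Keep in mind that the grouped response is nested differently from a flat
result list, so the client code that reads it has to change accordingly.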

On Wed, Aug 31, 2011 at 12:01 AM, Aaron Bains aaronba...@gmail.com wrote:
 Hello,

 What is the best way to remove duplicate values on output. I am using the
 following query:

 /solr/select/?q=wrt54g2&version=2.2&start=0&rows=10&indent=on&fl=productid

 And I get the following results:

 <doc>
   <int name="productid">1011630553</int>
 </doc>
 <doc>
   <int name="productid">1011630553</int>
 </doc>
 <doc><int name="productid">1011630553</int>
 </doc>
 <doc><int name="productid">1011630553</int>
 </doc>
 <doc><int name="productid">1011630553</int>
 </doc>
 <doc><int name="productid">1011630553</int>
 </doc>
 <doc><int name="productid">1011630553</int>
 </doc>
 <doc><int name="productid">1013033708</int>
 </doc>
 <doc><int name="productid">1013033708</int>
 </doc>
 <doc><int name="productid">1013033708</int>
 </doc>


 But I don't want those results because there are duplicates. I am looking
 for results like below:

 <doc>
   <int name="productid">1011630553</int>
 </doc>
 <doc>
   <int name="productid">1013033708</int>
 </doc>

 I know there is deduplication and field collapsing but I am not sure if they
 are applicable in this situation. Thanks for your help!



Re: Find out why a document wasn't found

2011-08-31 Thread Erick Erickson
For a specific document, try explainOther, see:
http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_doesn.27t_document_id:juggernaut_appear_in_the_top_10_results_for_my_query

Don't quite know whether this will work for your users, you may have to
massage the output to make something more concise. But the information
should be there.

Best
Erick
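
The usage is roughly (debugQuery plus an explainOther query matching the
record you expected to see; the id value here is a placeholder):

  q=your+query&debugQuery=on&explainOther=id:12345

The explainOther section of the debug output then shows how that record
scored, or failed to match, against the main query.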

On Wed, Aug 31, 2011 at 5:37 AM, occurred
schaubm...@infodienst-ausschreibungen.de wrote:
 Hi Oleg,

 ah, maybe there is a misunderstanding.
 With document I meant a record in the index not a file.
 The records are indexed via a DB.

 cheers
 Charlie

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Find-out-why-a-document-wasn-t-found-tp3297821p3297875.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Duplication of Output

2011-08-31 Thread Aaron Bains
Thanks! I appreciate your input. You are right, yesterday I actually
denormalized my index using multivalued fields. Now I am using Solr the way
it was designed and I am happy, everything seems to work great.

On Wed, Aug 31, 2011 at 6:06 PM, Erick Erickson erickerick...@gmail.comwrote:

 The first question I'd ask is why are there duplicates
 in your index in the first place?. If you're denormalizing,
 that would account for it. Mostly, I'm just asking to be
 sure that you expect duplicate product IDs. If you make
 your productid a uniqueKey, there'll only be one of each

 You'll have to re-index if you make this change though.

 But grouping/field collapsing would, indeed, apply to this
 problem.

 deduplication isn't applicable, since you know exactly what
 duplicates are. deduplication is more for fuzzy removal
 of near-duplicates..

 Hope this helps
 Erick





-- 
Aaron Bains, Ivey HBA
+1 519.868.0820 (Mobile)
aar...@microcad.ca


Re: Duplication of Output

2011-08-31 Thread Markus Jelsma

 The first question I'd ask is why are there duplicates
 in your index in the first place?. If you're denormalizing,
 that would account for it. Mostly, I'm just asking to be
 sure that you expect duplicate product IDs. If you make
 your productid a uniqueKey, there'll only be one of each
 
 You'll have to re-index if you make this change though.
 
 But grouping/field collapsing would, indeed, apply to this
 problem.
 
 deduplication isn't applicable, since you know exactly what
 duplicates are. deduplication is more for fuzzy removal
 of near-duplicates..

That's only if you use Nutch's TextProfileSignature; MD5 and Lookup3 are meant 
for exact matching. I don't know if Lookup3Signature works on non-string/text 
values, but I see no reason it should not work.

It might be an improvement to allow deduplication that skips creating a signature 
field and dedups on non-string values instead of that signature field.

 
 Hope this helps
 Erick
 

word proximity and queryoperator OR

2011-08-31 Thread abhayd
Hi,

I have some issues with search result relevancy.

The default query operator is OR.
I search for "iphone 4" and I'm not sure how I would get iPhone 4 results to show
first.

I tried
?q=iphone+4&start=0&wt=json&indent=on&fl=displayName,score&qt=dismax&fq=productType:Device&debug=true&pf=displayName&ps=3

proximity (pf and ps) but still no use.
{
  "responseHeader":{
    "status":0,
    "QTime":62,
    "params":{
      "fl":"displayName,score",
      "wt":"json",
      "ps":"3",
      "fq":"productType:Device",
      "rows":"10",
      "indent":"on",
      "pf":"displayName",
      "qt":"dismax",
      "debug":"true",
      "start":"0",
      "q":"iphone 4"}},
  "response":{"numFound":25,"start":0,"maxScore":1.1627537,"docs":[
      {
        "displayName":"Apple iPhone 3GS - 32 GB - Black",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 3GS - 8 GB - Black",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 4 - 16 GB - White",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 4 - 16 GB - Black",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 4 - 32 GB - White",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 4 - 32 GB - Black",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 3GS - 8 GB - Black (Refurb)",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 3GS (Refurb Cosmetic Blemish) - 8GB",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 4 - 32 GB - Black (Refurb)",
        "score":1.1627537},
      {
        "displayName":"Apple iPhone 4 (Refurb Cosmetic Blemish) - 16GB - Black",
        "score":1.1301228}]
  },

I can add debug output if required.

Any help?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/word-proximity-and-queryoperator-OR-tp3299729p3299729.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: word proximity and queryoperator OR

2011-08-31 Thread Markus Jelsma
Debug output of a few would help. There can be other factors that produce more 
weight than pf/ps. Most of the time it's tf and norms that play a big part.


Re: shareSchema=true - location of schema.xml?

2011-08-31 Thread Erick Erickson
Would it work to just (relative) path the schema
file for your cores with the schema parameter?

Best
Erick
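
If the per-core "schema" attribute is available in your version, something
along these lines in solr.xml might do it (the shared path is illustrative):

<solr>
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core0" instanceDir="core0" schema="../shared/schema.xml"/>
    <core name="core1" instanceDir="core1" schema="../shared/schema.xml"/>
  </cores>
</solr>

With shareSchema="true", cores that resolve to the same schema file should end
up sharing one parsed copy of it.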

2011/8/31 François Schiettecatte fschietteca...@gmail.com:
 Satish

 You don't say which platform you are on but have you tried links (with ln on 
 linux/unix) ?

 François

 On Aug 31, 2011, at 12:25 AM, Satish Talim wrote:

 I have 1000's of cores, and to reduce the cost of loading and unloading
 schema.xml, I have my solr.xml as mentioned here -
 http://wiki.apache.org/solr/CoreAdmin
 namely:

 <solr>
   <cores adminPath="/admin/cores" shareSchema="true">
     ...
   </cores>
 </solr>

 However, I am not sure where to keep the common schema.xml file. In that
 case, do I need the schema.xml in the conf folder of each and every core?

 My folder structure is:

     multicore (contains solr.xml)
        |_ core0
             |_ conf
             |    |_ schema.xml
             |    |_ solrconfig.xml
             |    |_ other files
           core1
             |_ conf
             |    |_ schema.xml
             |    |_ solrconfig.xml
             |    |_ other files
             |
           exampledocs (contains 1000's of .csv files and post.jar)

 Satish




Re: shareSchema=true - location of schema.xml?

2011-08-31 Thread Satish Talim
I am experimenting with Solr on Windows, for now.

Satish

2011/8/31 François Schiettecatte fschietteca...@gmail.com

 Satish

 You don't say which platform you are on but have you tried links (with ln
 on linux/unix) ?

 François

 On Aug 31, 2011, at 12:25 AM, Satish Talim wrote:

  I have 1000's of cores and to reduce the cost of loading unloading
  schema.xml, I have my solr.xml as mentioned here -
  http://wiki.apache.org/solr/CoreAdmin
  namely:
 
  solr
   cores adminPath=/admin/cores shareSchema=true
 ...
   /cores
  /solr
 
  However, I am not sure where to keep the common schema.xml file? In which
  case, do I need the schema.xml in the conf folder of each and every core?
 
  My folder structure is:
 
  multicore (contains solr.xml)
 |_ core0
  |_ conf
  ||_ schema.xml
  ||_ solrconfig.xml
  ||_ other files
core1
  |_ conf
  ||_ schema.xml
  ||_ solrconfig.xml
  ||_ other files
  |
exampledocs (contains 1000's of .csv files and post.jar)
 
  Satish




RE: word proximity and queryoperator OR

2011-08-31 Thread O. Klein
You might want to check your analyzers in schema.xml. It appears numbers are
filtered out.

So basically you are searching for iphone instead of iphone 4

 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/word-proximity-and-queryoperator-OR-tp3299729p3299919.html
Sent from the Solr - User mailing list archive at Nabble.com.


geodist() parameters?

2011-08-31 Thread William Bell
I want to do a geodist() calculation on 2 different sfields. How would
I do that?

http://localhost:8983/solr/select?q={!func}add(geodist(),geodist())&fq={!geofilt}&pt=39.86347,-105.04888&d=100&sfield=store_lat_lon

But I really want geodist() for one pt, and another geodist() for another pt.

Can I do something like geodist(store_lat_lon,39.86347,-105.04888,100)?



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


RE: word proximity and queryoperator OR

2011-08-31 Thread abhayd

Hi,
I don't understand why, though.

Here is my displayName field type "text":


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Why would the search term not be "iphone 4" and only "iphone"?

Synonyms.txt
iphone 4, itouch 

thanks
abhay

--
View this message in context: 
http://lucene.472066.n3.nabble.com/word-proximity-and-queryoperator-OR-tp3299729p3300214.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Geodist

2011-08-31 Thread solrnovice
ok, thank you Erick, i will check this forum as well.


thanks
SN

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3300236.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to display results across multiple pages when grouping is enabled by default

2011-08-31 Thread Balaji N.S. [via Lucene]
Solr 3.3 is what I use, and I have configured grouping of results by default.
I have some 30-40 sample documents in my index. I use the Solritas UI. When I
search, I don't get the results across pages. Even when I specify an empty
query, the results that are returned are just for the first page.

Which file in the Velocity templates should I change to achieve this? Can you
please help me out.




Re: Field grouping?

2011-08-31 Thread Denis Kuzmenok
But I don't know what values the price field would have in that query. It
can be 100-1000, or 10-100, and I want to get ranges in every query,
just splitting the price field by document count.

 Yes, Ranged Facets
 http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range

 2011/8/31 Denis Kuzmenok forward...@ukr.net

 Hi.

 Suppose  i  have  a field price with different values, and i want to
 get  ranges for this field depending on docs count, for example i want
 to  get 5 ranges for 100 docs with 20 docs in each range, 6 ranges for
 200 docs = 34 docs in each field, etc.

 Is it possible with solr?