Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Stephen Boesch
 [X] ASF Mirrors (linked in our release announcements or via the Lucene
website)
 [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
 [X] I/we build them from source via an SVN/Git checkout.
 [] Other (someone in your company mirrors them internally or via a
downstream project)

2011/1/18 Ahmet Arslan iori...@yahoo.com

  [] ASF Mirrors (linked in our release announcements or via
  the Lucene website)
 
  [X] Maven repository (whether you use Maven, Ant+Ivy,
  Buildr, etc.)
 
  [] I/we build them from source via an SVN/Git checkout.
 
  [] Other (someone in your company mirrors them internally
  or via a downstream project)






Re: Including Small Amounts of New Data in Searches (MultiSearcher ?)

2011-01-09 Thread Stephen Boesch
Thanks Lance for mentioning the MergePolicies and specifically this one
contributed by LinkedIn.

2011/1/8 Lance Norskog goks...@gmail.com

 There are always slowdowns when merging new segments during indexing.
 A MergePolicy decides when to merge segments.  The older MergePolicies
 followed a strategy which is quite disruptive in an NRT environment.

 There is a new feature in 3.x & the trunk called
 'BalancedSegmentMergePolicy'. This new MergePolicy is designed for the
 near-real-time use case. It was contributed by LinkedIn. You may find
 it works well enough for your case.

 Lance

 --
 Lance Norskog
 goks...@gmail.com



Including Small Amounts of New Data in Searches (MultiSearcher ?)

2011-01-06 Thread Stephen Boesch
Solr/Lucene newbie here.

We would like searches against a Solr/Lucene index to see newly added data
immediately.  I stress a small amount of new data, given that any significant
amount would introduce excessive latency.

Looking around, I'm wondering whether the way to go would be a MultiSearcher
living on top of our standard directory-based IndexReader plus a custom
Searchable that handles the newest documents, and then combining the two
results.

Is that a reasonable approach, and are there examples of similar
implementations?
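
To make the "combine the two results" idea concrete, here is a minimal sketch in plain Python rather than the Lucene API. The names Hit and merge_hits are hypothetical, and the sketch assumes that scores from the large index and the small "newest documents" index are directly comparable, which may not hold when each index computes its own term statistics.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit: a document id plus a relevance score."""
    doc_id: str
    score: float


def merge_hits(main_hits: Iterable[Hit], recent_hits: Iterable[Hit], k: int) -> List[Hit]:
    """Merge two hit lists, each sorted by descending score, into a top-k list.

    If a document appears in both lists, the copy from recent_hits wins,
    on the assumption that the small index holds the newer version.
    """
    recent = list(recent_hits)
    recent_ids = {hit.doc_id for hit in recent}
    main = [hit for hit in main_hits if hit.doc_id not in recent_ids]
    # heapq.merge expects ascending order, so sort on the negated score.
    merged = heapq.merge(main, recent, key=lambda hit: -hit.score)
    return list(islice(merged, k))


main = [Hit("a", 0.9), Hit("b", 0.5), Hit("c", 0.1)]
recent = [Hit("d", 0.7), Hit("b", 0.6)]
print(merge_hits(main, recent, 3))
```

Dropping the stale copy before merging keeps an updated document from showing up twice in the combined list.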

thanks!

stephenb


Re: Including Small Amounts of New Data in Searches (MultiSearcher ?)

2011-01-06 Thread Stephen Boesch
Thanks Yonik,
  Using a stable release of Solr, what would you suggest doing, given
MultiSearcher's demise and that the other work is still ongoing?

2011/1/6 Yonik Seeley yo...@lucidimagination.com

 On Thu, Jan 6, 2011 at 12:37 PM, Stephen Boesch java...@gmail.com wrote:
  Solr/lucene newbie here ..
 
  We would like searches against a solr/lucene index to immediately be able
  to view data that was added.  I stress small amount of new data given that
  any significant amount would require excessive latency.

 There has been significant ongoing work in lucene-core for NRT (near real
 time).
 We need to overhaul Solr's DirectUpdateHandler2 to take advantage of
 all this work.
 Mark Miller took a first crack at it (sharing a single IndexWriter,
 letting lucene handle the concurrency issues, etc)
 but if there's a JIRA issue, I'm having trouble finding it.

  Looking around, i'm wondering if the direction would be a MultiSearcher
  living on top of our standard directory-based IndexReader as well as a
  custom Searchable that handles the newest documents - and then combines
  the two results?

 If you look at trunk, MultiSearcher has already gone away.

 -Yonik
 http://www.lucidimagination.com



Re: Luke for inspecting indexes on remote solr servers?

2011-01-04 Thread Stephen Boesch
I am interested in the LukeRequestHandler.  In fact, having been pointed to
it, I will try to find a comprehensive list of Solr request handlers for
future reference.

As regards the -X option: our Linux box does not have X Windows installed for
security reasons, which is why I did not try that approach.


2011/1/4 Peter Karich peat...@yahoo.de

  On 04.01.2011 21:43, Ahmet Arslan wrote:
  Is that supported?  Pointer(s)
  to how to do it?
  perhaps http://wiki.apache.org/solr/LukeRequestHandler ?

 or via
 ssh u...@host -X
 ;-)



solr newbie: Diagnose why DataImportHandler DIH not saving documents

2010-12-31 Thread Stephen Boesch
I am asking for a full DataImport via a url.  It seems to be partially
 happy with the request - with debug=on I can see it saying that 10
documents were indexed.  The backend however realizes there are actually 440
records available for the query.

Not sure why only 10 records were selected and then why even those 10
records are not stored.


Here is the obfuscated url used for invoking the DataImport:

mySolrHost:8983/solr/core0/dataimport?command=full-import&debug=on


Here is the output:  looks reasonable for the 10 records it does find:
notice it says *added/updated 10 documents*

0
360
db-data-config.xml
full-import
debug
Brad is testing this | java.math.BigDecimal:1 | java.math.BigDecimal:15000
947 Wood Duck Lane | java.math.BigDecimal:3 | java.math.BigDecimal:15002
Stanford Quad Sculpture | java.math.BigDecimal:3 | java.math.BigDecimal:15200
Apple Store - Palo Alto | java.math.BigDecimal:3 | java.math.BigDecimal:15201
Fox Theater | java.math.BigDecimal:3 | java.math.BigDecimal:15220
java.math.BigDecimal:3 | java.math.BigDecimal:15222
Knowtate promo | java.math.BigDecimal:4 | welcome to Knowtate | java.math.BigDecimal:16163
The Green Dragon Tavern | java.math.BigDecimal:5 | java.math.BigDecimal:15020
The All New Infiniti M | java.math.BigDecimal:5 | Intro | java.math.BigDecimal:15100
The All New Infiniti M | java.math.BigDecimal:5 | To hear current specials | java.math.BigDecimal:15100
idle
Configuration Re-loaded sucessfully
1
10
0
2010-12-31 16:45:11
Indexing completed. *Added/Updated: 10 documents.* Deleted 0 documents.
10
0:0:0.331
This response format is experimental. It is likely to change in the future.


But when I go to the Admin screen, it tells me Documents Processed: 10 but
*Total Documents Processed: 0*.

So what is the difference between Documents and Total Documents?  Note that
there is presently *no* data in the indexes.

mySolrHost:8983/solr/core0/admin/stats.jsp

*name:* /dataimport
*class:* org.apache.solr.handler.dataimport.DataImportHandler
*version:* 1.0
*description:* Manage data import from databases to Solr
*stats:*
Status : IDLE
Documents Processed : 10
Requests made to DataSource : 1
Rows Fetched : 10
Documents Deleted : 0
Documents Skipped : 0
Total Documents Processed : 0
Total Requests made to DataSource : 0
Total Rows Fetched : 0
Total Documents Deleted : 0
Total Documents Skipped : 0
handlerStart : 1293831460260
requests : 2


Re: solr newbie: Diagnose why DataImportHandler DIH not saving documents

2010-12-31 Thread Stephen Boesch
One little extra piece of info: part of the stats page got omitted.  Notably,
the number of errors was reported as 0.

errors : 0
timeouts : 0
totalTime : 1963
avgTimePerRequest : 981.5
avgRequestsPerSecond : 0.0011371888
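
As a sanity check on these numbers, avgTimePerRequest looks like a simple ratio of totalTime to the request count (requests : 2 on the same stats page). The sketch below assumes that reading, and also assumes totalTime is in milliseconds; the function name is only illustrative.

```python
def avg_time_per_request(total_time_ms: float, requests: int) -> float:
    """Average handler time per request, assuming totalTime is in milliseconds."""
    if requests == 0:
        return 0.0  # no requests served yet, so there is nothing to average
    return total_time_ms / requests


# totalTime : 1963 and requests : 2, taken from the stats page
print(avg_time_per_request(1963, 2))
```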


Re: solr newbie: Diagnose why DataImportHandler DIH not saving documents

2010-12-31 Thread Stephen Boesch
sure I'll try that.

2010/12/31 Ahmet Arslan iori...@yahoo.com

 It seems that with debug=on there is a hard-coded default of rows=10.


 http://knowtate.servehttp.com:8983/solr/core0/dataimport?command=full-import&debug=on&echoParams=all&rows=50

 returns  Added/Updated: 50 documents. Deleted 0 documents.

 It seems that the debug parameter is related to the
 /solr/core0/admin/dataimport.jsp page.

 I don't know the exact purpose of the debug parameter, but can't you just
 ignore it and use


 http://knowtate.servehttp.com:8983/solr/core0/dataimport?command=full-import
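
The requests discussed here differ only in their query parameters, so a small helper can build them and keep the & separators straight. This is a sketch using Python's standard library; dih_url is a hypothetical helper and mySolrHost is the placeholder host from earlier in the thread. The parameter names (command, debug, rows) are the ones that appear in the thread.

```python
from urllib.parse import urlencode


def dih_url(host: str, core: str, command: str = "full-import", **params: str) -> str:
    """Build a DataImportHandler request URL (hypothetical helper for illustration)."""
    query = urlencode({"command": command, **params})
    return f"http://{host}/solr/{core}/dataimport?{query}"


# Plain full import, with no debug parameter
print(dih_url("mySolrHost:8983", "core0"))

# Debug run with an explicit row count instead of the apparent default of 10
print(dih_url("mySolrHost:8983", "core0", debug="on", rows="50"))
```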



Re: solr newbie: Diagnose why DataImportHandler DIH not saving documents

2010-12-31 Thread Stephen Boesch
Yes, that fixed the problem.  Interesting: I usually think of setting debug as
just changing the verbosity level, but in this case it caused docs not to be
processed.

0
2
db-data-config.xml
full-import
idle
1
440
0
2010-12-31 17:45:03
Indexing completed. Added/Updated: 440 documents. Deleted 0 documents.
2010-12-31 17:45:03
2010-12-31 17:45:03
440
0:0:0.258
This response format is experimental. It is likely to change in the future.

Now I am seeing the full 440 docs being processed.
cool!

*name:* /dataimport
*class:* org.apache.solr.handler.dataimport.DataImportHandler
*version:* 1.0
*description:* Manage data import from databases to Solr
*stats:*
Status : IDLE
Documents Processed : 440
Requests made to DataSource : 1
Rows Fetched : 440
Documents Deleted : 0
Documents Skipped : 0
Total Documents Processed : 880
Total Requests made to DataSource : 2
Total Rows Fetched : 880
Total Documents Deleted : 0
Total Documents Skipped : 0
handlerStart : 1293831460260
requests : 35
errors : 0
timeouts : 0
totalTime : 3170
avgTimePerRequest : 90.57143
avgRequestsPerSecond : 0.008557899
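
On the earlier question about Documents vs. Total Documents: the figures above are consistent with the "Total" counters being running sums over the non-debug imports (two runs of 440 documents each). That reading is an inference from the numbers, not something confirmed in this thread. A small sketch that parses the "key : value" lines and checks it; parse_stats is a hypothetical helper.

```python
def parse_stats(text: str) -> dict:
    """Parse 'key : value' lines from the stats page into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip lines that do not contain the ' : ' separator
            stats[key.strip()] = value.strip()
    return stats


# A subset of the stats lines shown above
STATS = """\
Documents Processed : 440
Requests made to DataSource : 1
Rows Fetched : 440
Total Documents Processed : 880
Total Requests made to DataSource : 2
Total Rows Fetched : 880
"""

stats = parse_stats(STATS)
runs = int(stats["Total Requests made to DataSource"])
per_run = int(stats["Documents Processed"])
total = int(stats["Total Documents Processed"])

# True if the total equals the per-run count times the number of runs
print(total == per_run * runs)
```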
