Re: How to make UnInvertedField faster?

2011-10-22 Thread Simon Willnauer
On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Well... the limitation of DocValues is that it cannot handle more than
 one value per document (which UnInvertedField can).

you can pack this into one byte[] or use more than one field? I don't
see a real limitation here.

simon

 Hopefully we can fix that at some point :)

 Mike McCandless

 http://blog.mikemccandless.com

 On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer
 simon.willna...@googlemail.com wrote:
 In trunk we have a feature called IndexDocValues which basically
 creates the uninverted structure at index time. You can then simply
 suck that into memory or even access it on disk directly
 (RandomAccess). Even if I can't help you right now this is certainly
 going to help you here. There is no need to uninvert at all anymore in
 lucene 4.0

 simon

 On Wed, Oct 19, 2011 at 8:05 PM, Michael Ryan mr...@moreover.com wrote:
 I was wondering if anyone has any ideas for making UnInvertedField.uninvert()
 faster, or other alternatives for generating facets quickly.

 The vast majority of the CPU time for our Solr instances is spent generating
 UnInvertedFields after each commit. Here's an example of one of our slower 
 fields:

 [2011-10-19 17:46:01,055] INFO125974[pool-1-thread-1] - (SolrCore:440) -
 UnInverted multi-valued field 
 {field=authorCS,memSize=38063628,tindexSize=422652,
 time=15610,phase1=15584,nTerms=1558514,bigTerms=0,termInstances=4510674,uses=0}

 That is from an index with approximately 8 million documents. After each
 commit, it takes on average about 90 seconds to uninvert all the fields that
 we facet on.

 Any ideas at all would be greatly appreciated.

 -Michael





Re: Solr Open File Descriptors

2011-10-22 Thread samarth s
Thanks for sharing your insights, Shawn.

On Mon, Oct 17, 2011 at 1:27 AM, Shawn Heisey s...@elyograg.org wrote:

 On 10/16/2011 12:01 PM, samarth s wrote:

 Hi,

 Is it safe to assume that with a mergeFactor of 10 the open file
 descriptors required by Solr would be around (1 + 10) * 10 = 110?
 ref: http://onjava.com/pub/a/onjava/2003/03/05/lucene.html#indexing_speed
 The Solr wiki
 (http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations)
 states that the FDs required per segment are around 7.

 Are these estimates appropriate? Do they in any way depend on the size of
 the index and the number of docs (assuming the same number of segments in
 any case) as well?


 My index has 10 files per normal  segment (the usual 7 plus three more for
 termvectors).  Some of the segments also have a .del file, and there is a
 segments_* file and a segments.gen file.  Your servlet container and other
 parts of the OS will also have to open files.

 I have personally seen three levels of segment merging taking place at the
 same time on a slow filesystem during a full-import, along with new content
 coming in at the same time.  With a mergefactor of 10, each merge is 11
 segments - the ten that are being merged and the merged segment.  If you
 have three going on at the same time, that's 33 segments, and you can have
 up to 10 more that are actively being built by ongoing index activity, so
 that's 43 potential segments.  If your filesystem is REALLY slow, you might
 end up with even more segments as existing merges are paused for new ones to
 start, but if you run into that, you'll want to upgrade your hardware, so I
 won't consider it.

 Multiplying 43 segments by 11 files per segment yields a working
 theoretical maximum of 473 files.  Add in the segments files, you're up to
 475.
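
 To make that arithmetic concrete, here is a rough sketch in plain Java (the
 merge-level count and files-per-segment numbers are just the assumptions
 from above, not hard limits):

 // Rough worst-case open-file estimate for one Solr core; all inputs are
 // assumptions taken from the discussion above.
 public class FdEstimate {
     public static void main(String[] args) {
         int mergeFactor = 10;           // assumed mergeFactor
         int filesPerSegment = 11;       // 7 usual + 3 term vector files + a possible .del
         int concurrentMergeLevels = 3;  // merges observed running at once on a slow filesystem

         int segmentsPerMerge = mergeFactor + 1;                          // 10 sources + 1 target = 11
         int mergingSegments = concurrentMergeLevels * segmentsPerMerge;  // 33
         int buildingSegments = mergeFactor;                              // up to 10 from ongoing indexing
         int totalSegments = mergingSegments + buildingSegments;          // 43

         int files = totalSegments * filesPerSegment + 2;  // + segments_* and segments.gen
         System.out.println("approximate worst-case open files: " + files);  // prints 475
     }
 }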

 Most operating systems have a default FD limit that's at least 1024.  If
 you only have one index (core) on your Solr server, Solr is the only thing
 running on that server, and it's using the default mergeFactor of 10, you
 should be fine with the default.  If you are going to have more than one
 index on your Solr server (such as a build core and a live core), you plan
 to run other things on the server, or you want to increase your mergeFactor
 significantly, you might need to adjust the OS configuration to allow more
 file descriptors.

 Thanks,
 Shawn




-- 
Regards,
Samarth


Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
Hi All,

I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
(http://na11.apachecon.com/talks/18396).  It's based on my observation that 
over the years, a number of us in the community have done some pretty cool 
things using Lucene/Solr that don't fit under the core premise of full text 
search.  I've got a fair number of ideas for the talk (easily enough for 1 
hour), but I wanted to reach out to hear your stories of ways you've (ab)used 
Lucene and Solr to see if we couldn't extend the conversation to a bit more 
than the conference and also see if I can't inject more ideas beyond the ones I 
have.  I don't need deep technical details, but just high level use case and 
the basic insight that led you to believe Lucene/Solr could solve the problem.

Thanks in advance,
Grant


Grant Ingersoll
http://www.lucidimagination.com



Re: How to make UnInvertedField faster?

2011-10-22 Thread Michael McCandless
On Sat, Oct 22, 2011 at 4:10 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 On Fri, Oct 21, 2011 at 4:37 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Well... the limitation of DocValues is that it cannot handle more than
 one value per document (which UnInvertedField can).

 you can pack this into one byte[] or use more than one field? I don't
 see a real limitation here.

Well... not very easily?

UnInvertedField (DocTermOrds in Lucene) is the same as DocValues'
BYTES_VAR_SORTED.

So for an app to do this on top it'd have to handle the term -> ord
resolving itself, save that somewhere, then encode the multiple ords
into a byte[].

I agree for other simple types (no deref/sorting involved) an app
could pack them into its own byte[] that's otherwise opaque to Lucene.
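
As a rough illustration of the kind of packing we're talking about, here's a
minimal sketch (plain Java; the app would still have to do the term -> ord
resolving itself, and nothing here is an existing Lucene API):

import java.io.ByteArrayOutputStream;

// Minimal sketch: encode one document's sorted ords as delta + VInt into a
// byte[] that an app could store in a single opaque per-document binary field.
public class OrdPacker {
    public static byte[] pack(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            int delta = ord - prev;  // deltas stay small because ords are sorted
            prev = ord;
            while ((delta & ~0x7F) != 0) {  // standard VInt encoding, 7 bits per byte
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }
}

Decoding is just the reverse: read the VInts back and re-accumulate the deltas.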

Mike McCandless

http://blog.mikemccandless.com


Re: data import in 4.0

2011-10-22 Thread Adeel Qureshi
yup that was it .. the version of my data import jars was not the same as the
solr war .. now I am having another problem though

I tried doing a simple data import

<document>
  <entity name="p" query="SELECT ID, Status, Title FROM project">
    <field column="ID" name="id" />
    <field column="Status" name="status_s" />
    <field column="Title" name="title_t" />
  </entity>
</document>

simple in terms of just pulling three fields from a table and adding them to
the index, and this worked fine, but when I add a nested or joined table ..

<document>
  <entity name="project" query="SELECT ID, Status, Title FROM project">
    <field column="ID" name="id" />
    <field column="Status" name="status_s" />
    <field column="Title" name="title_t" />
    <entity name="related" query="select last_name FROM person per inner
        join project proj on proj.pi_pid = per.pid where proj.ID = ${project.ID}">
      <field column="last_name" name="pi_s" />
    </entity>
  </entity>
</document>

this data import doesn't seem to end .. it just keeps going .. I only have
about 15000 records in the main table and about 22000 in the joined table ..
but the Fetch count in the dataimport handler status indicator shows that
it has fetched close to half a million records or something .. I'm not sure
what those records are .. is there a way to see exactly what queries are
being run by the dataimport handler .. is there something wrong with my nested
query ..

Thanks
Adeel

On Fri, Oct 21, 2011 at 3:05 PM, Alireza Salimi alireza.sal...@gmail.com wrote:

 So to me it heightens the probability of classloader conflicts,
 I haven't worked with Solr 4.0, so I don't know if the set of JAR files
 is the same as in Solr 3.4. Anyway, make sure that there is only
 ONE instance of apache-solr-dataimporthandler-***.jar in your
 whole tomcat+webapp.

 Maybe you have this jar file in CATALINA_HOME\lib folder.

 On Fri, Oct 21, 2011 at 3:06 PM, Adeel Qureshi adeelmahm...@gmail.com
 wrote:

  it's deployed on a tomcat server ..
 
  On Fri, Oct 21, 2011 at 12:49 PM, Alireza Salimi
   alireza.sal...@gmail.com wrote:
 
   Hi,
  
    How do you start Solr: through start.jar, or do you deploy it to a web
    container?
    Sometimes problems like this are because of different class loaders.
    I hope my answer helps you.
  
   Regards
  
  
   On Fri, Oct 21, 2011 at 12:47 PM, Adeel Qureshi 
 adeelmahm...@gmail.com
   wrote:
  
 Hi, I am trying to set up the data import handler with solr 4.0 and having
 some unexpected problems. I have a multi-core setup and only one core needed
 the dataimport handler, so I have added the request handler to it and added
 the lib imports in the config file
   
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-extras-\d.*\.jar" />
   
for some reason this doesn't work .. it still keeps giving me a ClassNotFound
error message, so I moved the jar files to the shared lib folder and then at
least I was able to see the admin screen with the dataimport plugin loaded.
But when I try to do the import it's throwing this error message
   
INFO: Starting Full Import
Oct 21, 2011 11:35:41 AM org.apache.solr.core.SolrCore execute
INFO: [DW] webapp=/solr path=/select params={command=status&qt=/dataimport} status=0 QTime=0
Oct 21, 2011 11:35:41 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
WARNING: Unable to read: dataimport.properties
Oct 21, 2011 11:35:41 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
java.lang.NoSuchMethodError: org.apache.solr.update.DeleteUpdateCommand: method <init>()V not found
    at org.apache.solr.handler.dataimport.SolrWriter.doDeleteAll(SolrWriter.java:193)
    at org.apache.solr.handler.dataimport.DocBuilder.cleanByQuery(DocBuilder.java:1012)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:183)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
Oct 21, 2011 11:35:41 AM org.apache.solr.handler.dataimport.SolrWriter rollback
SEVERE: Exception while solr rollback.
java.lang.NoSuchMethodError: org.apache.solr.update.RollbackUpdateCommand: method <init>()V not found
    at org.apache.solr.handler.dataimport.SolrWriter.rollback(SolrWriter.java:184)
    at org.apache.solr.handler.dataimport.DocBuilder.rollback(DocBuilder.java:249)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:340)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
    at

Re: questions about autocommit committing documents

2011-10-22 Thread darul
Old entry, but I am trying to configure auto commit.

I am still not sure I understand how Solr handles the commit process.

Does Solr really wait for 10000 documents before sending a commit?

I was thinking it would use maxTime and then commit a number of documents
less than 10000.

Could you please correct the following scenario:
- 20 documents are added.
- After the value of maxTime is reached, the 20 documents are committed
because there are less than 10000?
- 20000 documents are added.
- After the value of maxTime is reached, only the first 10000 documents are
committed. The next 10000 will be committed on the next iteration of the
commit phase.

Is this the right way to understand both the maxTime and maxDocs parameters?
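
For reference, this is roughly the kind of autoCommit block in solrconfig.xml
I am talking about (the 10000 comes from this thread; the maxTime value is
only illustrative, not a recommendation):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many docs have been added -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds, whichever comes first -->
  </autoCommit>
</updateHandler>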

Thanks, 



 - If I enable autoCommit and set maxDocs at 10000, does it mean that
 my new documents won't be available for searching until 10,000 new
 documents have been added?
 
Yes, that's correct. However, you can do a commit explicitly, if you want to
do so. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/questions-about-autocommit-committing-documents-tp1582487p3443838.html
Sent from the Solr - User mailing list archive at Nabble.com.