Re: Documents in facet results

2009-05-17 Thread Fergus McMenemie
Dear community,

I'm wondering if there is a clean solution to my rather interesting problem. 
The following facet query results in a list of all facet values and the number 
of documents matching each value, as seen below:


Probably the quickest way would be to write another XSLT transform to reformat
the results to match your requirements, then add extra query params to invoke
the transform:

   <str name="wt">xslt</str>
   <str name="tr">reformat.xsl</str>
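
A minimal reformat.xsl might look like the following - an untested sketch;
the match path assumes Solr's standard XML response writer, and the output
<facet> element is invented, so adapt it to whatever structure you need:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- identity: copy the response through unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- rewrite each facet count entry into a custom element -->
  <xsl:template match="lst[@name='facet_fields']/lst/int">
    <facet name="{@name}" count="{.}"/>
  </xsl:template>
</xsl:stylesheet>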

Fergus.

Query:
  <str name="q">*:*</str>
  <str name="facet.limit">5</str>
  <str name="facet.field">en_atmosphere</str>
  <str name="rows">0</str>
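
Expressed as a request URL this is roughly (a sketch; it assumes the default
/select handler, and note that facet=true must also be set):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=en_atmosphere&facet.limit=5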

Results:
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="en_atmosphere">
   <int name="Snug and pleasant">675</int>
   <int name="Authentic">385</int>
   <int name="Modern and functional">378</int>
   <int name="Romantic">374</int>
   <int name="Modest">339</int>
  </lst>
 </lst>
</lst>

Now I would like to have the documents as child nodes of the various facet 
values, so that the result would be something like:

<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="en_atmosphere">
   <docs facet="Snug and pleasant">
    <doc>...
    <doc>...
   </docs>
   <docs facet="Authentic">
    <doc>...
    <doc>...
   </docs>
   ...
  </lst>
 </lst>
</lst>

Of course it would be possible to send a separate query for each facet value to 
get the corresponding docs, or I can parse the response XML, but it would be 
more efficient if Solr could return the result as above.
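
For reference, the per-value follow-up queries being avoided here would look
roughly like this - one filter-query request per facet value, with the
multi-word value URL-encoded:

http://localhost:8983/solr/select?q=*:*&fq=en_atmosphere:%22Snug+and+pleasant%22&rows=10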

Thanks!

-- 
Jeffrey Gelens  Buyways B.V.  Tel. 050 853 6600
Webengineer Friesestraatweg 215c  Fax. 050 853 6601
http://www.buyways.nl   9743 AD Groningen KvK  01074105

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Solr vs Sphinx

2009-05-17 Thread Fergus McMenemie
Something that would be interesting is to share solr configs for  
various types of indexing tasks.  From a solr configuration aimed at  
indexing web pages to one doing large amounts of text to one that  
indexes specific structured data.  I could see those being posted on  
the wiki and helping folks who say "I want to do X, is there an example?".

I think most folks start with the example Solr install and tweak from  
there, which probably isn't the best path...

Eric

Yep, a Solr cookbook with lots of different example recipes. However,
these would need to be very actively maintained to ensure they always
represented best practice. While using Cocoon I made extensive use
of the examples section of the Cocoon website. However, most of the
massive number of examples represented obsolete Cocoon practice, or
there were four or five examples doing the same thing in different
ways with no text explaining the pros/cons of the different approaches.
This held me back as a newcomer and gave a bad impression of Cocoon.

I was wondering about a performance hints page. I was caught out by an
issue indexing CSV content, where the use of overwrite=false made
an almost 3x difference to my indexing speed. I still do not really
know why!
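
For anyone hitting the same thing, the flag goes on the CSV update request
itself; a sketch (the file path is hypothetical, and stream.file assumes
remote streaming is enabled in solrconfig.xml):

curl 'http://localhost:8983/solr/update/csv?stream.file=/data/docs.csv&overwrite=false&commit=true'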


On May 15, 2009, at 8:09 AM, Mark Miller wrote:

 In the spirit of good defaults:

 I think we should change the Solr highlighter to highlight phrase  
 queries by default, as well as prefix, range, and wildcard constant-score  
 queries. It's awkward to have to tell people you have to turn those  
 on. I'd certainly prefer to have to turn them off if I have some  
 limitation, rather than on.

Yep, I agree: all whizzy new features should ideally be on by default
unless there is a significant performance penalty. It is not enough
to issue a default solrconfig.xml with the feature on; it has to
be on by default inside the code.
 

 - Mark

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal

Fergus


Re: query regarding Indexing xml files -db-data-config.xml

2009-05-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi,
You may not need that enclosing entity if you only wish to index one file.

baseDir is not required if you give an absolute path in the fileName.

There is no need to mention forEach or fields if you set useSolrAddSchema=true.
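
Putting those three suggestions together, the config might collapse to
something like this - an untested sketch reusing the file path from the
original post, and assuming the input file is wrapped in the <add><doc>...
format that useSolrAddSchema expects:

<dataConfig>
  <dataSource type="FileDataSource" name="xmlindex"/>
  <document>
    <!-- one entity: useSolrAddSchema reads a file already in Solr add/doc
         format, so no forEach or field declarations are needed -->
    <entity name="data" processor="XPathEntityProcessor"
            useSolrAddSchema="true" dataSource="xmlindex"
            url="c:\test\ipod_other.xml"/>
  </document>
</dataConfig>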

On Sat, May 16, 2009 at 1:23 AM, jayakeerthi s mail2keer...@gmail.com wrote:
 Hi All,

 I am trying to index the fields from the XML files; here is the
 configuration that I am using.


 db-data-config.xml

 <dataConfig>
   <dataSource type="FileDataSource" name="xmlindex"/>
   <document name="products">
     <entity name="xmlfile" processor="FileListEntityProcessor"
             fileName="c:\test\ipod_other.xml" recursive="true" rootEntity="false"
             dataSource="null" baseDir="${dataimporter.request.xmlDataDir}">
       <entity name="data" processor="XPathEntityProcessor"
               forEach="/record | /the/record/xpath"
               url="${xmlfile.fileAbsolutePath}">
         <field column="manu" name="manu"/>
       </entity>
     </entity>
   </document>
 </dataConfig>

 Schema.xml has the field manu

 The input xml file used to import the field is

 <doc>
  <field name="id">F8V7067-APL-KIT</field>
  <field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
  <field name="manu">Belkin</field>
  <field name="cat">electronics</field>
  <field name="cat">connector</field>
  <field name="features">car power adapter, white</field>
  <field name="weight">4</field>
  <field name="price">19.95</field>
  <field name="popularity">1</field>
  <field name="inStock">false</field>
 </doc>


 Doing the full-import, this is the response I am getting:

 <lst name="statusMessages">
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">0</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2009-05-15 11:58:00</str>
  <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
  <str name="Committed">2009-05-15 11:58:00</str>
  <str name="Optimized">2009-05-15 11:58:00</str>
  <str name="Time taken">0:0:0.172</str>
 </lst>
 <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
 </response>


 Am I missing anything here, or is there a required format for the input XML?
 Please help me resolve this.

 Thanks and regards,
 Jay




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr memory requirements?

2009-05-17 Thread Peter Wolanin
I think that if you have in your index any documents with norms, you
will still use norms for those fields even if the schema is changed
later.  Did you wipe and re-index after all your schema changes?

-Peter

On Fri, May 15, 2009 at 9:14 PM, vivek sar vivex...@gmail.com wrote:
 Some more info,

  Profiling the heap dump shows
 org.apache.lucene.index.ReadOnlySegmentReader as the biggest object
 - taking up almost 80% of total memory (6G) - see the attached screen
 shot for a smaller dump. There is some norms object - not sure where
 it is coming from, as I've omitNorms=true for all indexed records.

 I also noticed that if I run a query - let's say a generic query that
 hits 100 million records - and then follow up with a specific query
 which hits only 1 record, the second query causes the increase in
 heap.

 Looks like there are a few bytes being loaded into memory for each
 document. I've checked the schema - all indexed fields have omitNorms=true,
 all caches are commented out - and I'm still looking to see what else might
 put things in memory that don't get collected by GC.

 I also saw, https://issues.apache.org/jira/browse/SOLR- for Solr
 1.4 (which I'm using). Not sure if that can cause any problem. I do
 use range queries for dates - would that have any effect?

 Any other ideas?

 Thanks,
 -vivek

 On Thu, May 14, 2009 at 8:38 PM, vivek sar vivex...@gmail.com wrote:
 Thanks Mark.

 I checked all the items you mentioned,

 1) I've omitNorms=true for all my indexed fields (stored-only fields I
 guess don't matter)
 2) I've tried commenting out all caches in the solrconfig.xml, but
 that doesn't help much
 3) I've tried commenting out the first and new searcher listeners
 settings in the solrconfig.xml - the only way that helps is that at
 startup time the memory usage doesn't spike up - that's only because
 there is no auto-warmer query to run. But, I noticed commenting out
 searchers slows down any other queries to Solr.
 4) I don't have any sort or facet in my queries
 5) I'm not sure how to change the Lucene term interval from Solr -
 is there a way to do that?
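
 For 5), this appears to be exposed in solrconfig.xml's <indexDefaults>
 section in Solr 1.4 - a hedged sketch, worth verifying against your
 version; a larger interval keeps fewer terms in memory at the cost of
 slower term lookups:

 <indexDefaults>
   <termIndexInterval>256</termIndexInterval>
 </indexDefaults>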

 I've been playing around with this memory thing the whole day and have
 found that it's the search that's hogging the memory. Any time there
 is a search on all the records (800 million) the heap consumption
 jumps by 5G. This makes me think there has to be some configuration in
 Solr that's causing some terms per document to be loaded in memory.

 I've posted my settings several times on this forum, but no one has
 been able to pin point what configuration might be causing this. If
 someone is interested I can attach the solrconfig and schema files as
 well. Here are the settings again under Query tag,

 <query>
   <maxBooleanClauses>1024</maxBooleanClauses>
   <enableLazyFieldLoading>true</enableLazyFieldLoading>
   <queryResultWindowSize>50</queryResultWindowSize>
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   <HashDocSet maxSize="3000" loadFactor="0.75"/>
   <useColdSearcher>false</useColdSearcher>
   <maxWarmingSearchers>2</maxWarmingSearchers>
 </query>

 and schema,

  <field name="id" type="long" indexed="true" stored="true" required="true" omitNorms="true" compressed="false"/>

  <field name="atmps" type="integer" indexed="false" stored="true" compressed="false"/>
  <field name="bcid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="cmpcd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="ctry" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="dlt" type="date" indexed="false" stored="true" default="NOW/HOUR" compressed="false"/>
  <field name="dmn" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="eaddr" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="emsg" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="erc" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="evt" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="from" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="lfid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="lsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="prsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="rc" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="rmcd" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="rmscd" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="scd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="sip" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="ts" type="date" indexed="true" stored="false" default="NOW/HOUR" omitNorms="true"/>

  <!-- catchall field, containing all other searchable text fields
       (implemented via copyField further on in this schema) -->
  <field name="all" 

Re: Solr memory requirements?

2009-05-17 Thread jlist9
I've never paid attention to the post/commit ratio. I usually do a commit
after maybe 100 posts. Is there a guideline about this? Thanks.

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume 
 during indexing.  There is no need to commit every 50K docs unless you want 
 to trigger snapshot creation.
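
For reference, both knobs live in solrconfig.xml; a hedged sketch with
illustrative (not recommended) values, letting the server decide when to
commit instead of counting posts on the client:

<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>   <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>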


Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-17 Thread Mike Klaas

Hi Jayson,

It is on my list of things to do.  I've been having a very busy week
and am also working all weekend.  I hope to get to it next week
sometime, if no-one else has taken it.


cheers,
-mike

On 8-May-09, at 10:15 PM, jayson.minard wrote:



First cut of updated handler now in:
https://issues.apache.org/jira/browse/SOLR-1155

Needs review from those that know Lucene better, and a double check for
errors in locking or other areas of the code.  Thanks.

--j


jayson.minard wrote:


Can we move this to patch files within the JIRA issue please? That will
make it easier to review and to help out, as a patch against current trunk.

--j


Jim Murphy wrote:




Yonik Seeley-2 wrote:


...your code snippet elided and edited below ...





Don't take this code as correct (or even compiling) but is this the
essence?  I moved shared access to the writer inside the read lock and
kept the other non-commit bits under the write lock.  I'd need to rethink
the locking in a more fundamental way, but is this close to the idea?



public void commit(CommitUpdateCommand cmd) throws IOException {

    if (cmd.optimize) {
      optimizeCommands.incrementAndGet();
    } else {
      commitCommands.incrementAndGet();
    }

    Future[] waitSearcher = null;
    if (cmd.waitSearcher) {
      waitSearcher = new Future[1];
    }

    boolean error = true;
    iwCommit.lock();
    try {
      log.info("start " + cmd);

      if (cmd.optimize) {
        closeSearcher();
        openWriter();
        writer.optimize(cmd.maxOptimizeSegments);
      }
    } finally {
      iwCommit.unlock();
    }

    iwAccess.lock();
    try {
      writer.commit();
    } finally {
      iwAccess.unlock();
    }

    iwCommit.lock();
    try {
      callPostCommitCallbacks();
      if (cmd.optimize) {
        callPostOptimizeCallbacks();
      }
      // open a new searcher in the sync block to avoid opening it
      // after a deleteByQuery changed the index, or in between deletes
      // and adds of another commit being done.
      core.getSearcher(true, false, waitSearcher);

      // reset commit tracking
      tracker.didCommit();

      log.info("end_commit_flush");

      error = false;
    } finally {
      iwCommit.unlock();
      addCommands.set(0);
      deleteByIdCommands.set(0);
      deleteByQueryCommands.set(0);
      numErrors.set(error ? 1 : 0);
    }

    // if we are supposed to wait for the searcher to be registered, then
    // we should do it outside of the synchronized block so that other
    // update operations can proceed.
    if (waitSearcher != null && waitSearcher[0] != null) {
      try {
        waitSearcher[0].get();
      } catch (InterruptedException e) {
        SolrException.log(log, e);
      } catch (ExecutionException e) {
        SolrException.log(log, e);
      }
    }
  }














Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-17 Thread jayson.minard

Thanks Mike. I'm running it in a few environments that do not have
post-commit hooks and so far have not seen any issues.  A white-box review
will be helpful in seeing things that may rarely occur, or if I had any
misuse of internal data structures that I do not know well enough to
measure.

--j



Mike Klaas wrote:
 
 Hi Jayson,
 
 It is on my list of things to do.  I've been having a very busy week  
 and and am also working all weekend.  I hope to get to it next week  
 sometime, if no-one else has taken it.
 
 cheers,
 -mike
 
 On 8-May-09, at 10:15 PM, jayson.minard wrote:
 

 First cut of updated handler now in:
 https://issues.apache.org/jira/browse/SOLR-1155

 Needs review from those that know Lucene better, and double check  
 for errors
 in locking or other areas of the code.  Thanks.

 --j


 
 
 




multicore for 20k users?

2009-05-17 Thread Chris Cornell
Trying to create a search solution for about 20k users at a company.
Each person's documents are private and different (some overlap... it
would be nice to not have to store/index copies).

Is multicore something that would work or should we auto-insert a
facet into each query generated by the person?

Thanks for any advice, I am very new to solr.  Any tiny push in the
right direction would be appreciated.

Thanks,
Chris


Re: multicore for 20k users?

2009-05-17 Thread Ryan McKinley

how much overlap is there with the 20k user documents?

if you create a separate index for each of them will you be indexing  
90% of the documents 20K times?  How many total documents could an  
individual user typically see?  How many total distinct documents are  
you talking about?  Is the indexing strategy the same for all users?   
(the same analysis etc)


Is it actually possible to limit visibility by role rather than by user?

I would start with trying to put everything in one index -- if that is  
not possible, then look at a multi-core option.
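
With the single-index approach, per-user privacy is usually enforced by
indexing an owner field on every document and having the application append
a filter query to each request - a sketch, with a hypothetical user field:

http://localhost:8983/solr/select?q=report&fq=user:ralph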




On May 17, 2009, at 5:53 PM, Chris Cornell wrote:


Trying to create a search solution for about 20k users at a company.
Each person's documents are private and different (some overlap... it
would be nice to not have to store/index copies).

Is multicore something that would work or should we auto-insert a
facet into each query generated by the person?

Thanks for any advice, I am very new to solr.  Any tiny push in the
right direction would be appreciated.

Thanks,
Chris




Re: multicore for 20k users?

2009-05-17 Thread Chris Cornell
Thanks for helping Ryan,

On Sun, May 17, 2009 at 7:17 PM, Ryan McKinley ryan...@gmail.com wrote:
 how much overlap is there with the 20k user documents?

There are around 20k users but each one has anywhere from zero to
thousands of documents.  The final overlap is unknown because there is
a current set of documents but each user will add documents on the fly
(it's like their own personal search engine in a way).


 if you create a separate index for each of them will you be indexing 90% of
 the documents 20K times?

Probably more like 5-10%

 How many total documents could an individual user
 typically see?

Average is around 100 now but we want them to be able to add more.

 How many total distinct documents are you talking about?  Is
 the indexing strategy the same for all users?  (the same analysis etc)

The indexing strategy is the same for each user.


 Is it actually possible to limit visibility by role rather than by user?

No, it has to be by user since it is a private document set.  We just
want to save on diskspace when there are big documents that are the
same across users (based on document checksum).


 I would start with trying to put everything in one index -- if that is not
 possible, then look at a multi-core option.

OK.  Another thing is that we want to allow the user to restrict
searches based on when the document was added... if we do share an
indexed item and insert some attribute into each query (like
user:ralph) then it couldn't have date-added based search.  Unless a
field was added like date-added-by-ralph, date-added-by-sally (ugh!).

Or maybe diskspace is cheap and we just should strive for simplicity?

Thanks,
Chris




 On May 17, 2009, at 5:53 PM, Chris Cornell wrote:

 Trying to create a search solution for about 20k users at a company.
 Each person's documents are private and different (some overlap... it
 would be nice to not have to store/index copies).

 Is multicore something that would work or should we auto-insert a
 facet into each query generated by the person?

 Thanks for any advice, I am very new to solr.  Any tiny push in the
 right direction would be appreciated.

 Thanks,
 Chris




Re: multicore for 20k users?

2009-05-17 Thread Otis Gospodnetic

Chris,

Yes, disk space is cheap, and with so little overlap you won't gain much by 
putting everything in a single index.  Plus, when each user has a separate 
index, it's easy to split users and distribute over multiple machines if you 
ever need to do that, it's easy and fast to completely reindex one user's data 
without affecting other users, etc.

Several years ago I built Simpy at http://www.simpy.com/ that way (but 
pre-Solr, so it uses Lucene directly) and never regretted it.  There are way 
more than 20K users there with many searches per second and with constant 
indexing.  Each user has an index for bookmarks and an index for notes.  Each 
group has its own index, shared by all group members.  The main bookmark search 
is another index.  People search is yet another index.  And so on.  Single 
server.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Chris Cornell srchn...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sunday, May 17, 2009 8:37:44 PM
 Subject: Re: multicore for 20k users?
 
 Thanks for helping Ryan,
 
 On Sun, May 17, 2009 at 7:17 PM, Ryan McKinley wrote:
  how much overlap is there with the 20k user documents?
 
 There are around 20k users but each one has anywhere from zero to
 thousands of documents.  The final overlap is unknown because there is
 a current set of documents but each user will add documents on the fly
 (it's like their own personal search engine in a way).
 
 
  if you create a separate index for each of them will you be indexing 90% of
  the documents 20K times?
 
 Probably more like 5-10%
 
  How many total documents could an individual user
  typically see?
 
 Average is around 100 now but we want them to be able to add more.
 
  How many total distinct documents are you talking about?  Is
  the indexing strategy the same for all users?  (the same analysis etc)
 
 The indexing strategy is the same for each user.
 
 
  Is it actually possible to limit visibility by role rather than by user?
 
 No, it has to be by user since it is a private document set.  We just
 want to save on diskspace when there are big documents that are the
 same across users (based on document checksum).
 
 
  I would start with trying to put everything in one index -- if that is not
  possible, then look at a multi-core option.
 
 OK.  Another thing is that we want to allow the user to restrict
 searches based on when the document was added... if we do share an
 indexed item and insert some attribute into each query (like
 user:ralph) then it couldn't have date-added based search.  Unless a
 field was added like date-added-by-ralph, date-added-by-sally (ugh!).
 
 Or maybe diskspace is cheap and we just should strive for simplicity?
 
 Thanks,
 Chris
 
 
 
 
  On May 17, 2009, at 5:53 PM, Chris Cornell wrote:
 
  Trying to create a search solution for about 20k users at a company.
  Each person's documents are private and different (some overlap... it
  would be nice to not have to store/index copies).
 
  Is multicore something that would work or should we auto-insert a
  facet into each query generated by the person?
 
  Thanks for any advice, I am very new to solr.  Any tiny push in the
  right direction would be appreciated.
 
  Thanks,
  Chris
 
 



Re: multicore for 20k users?

2009-05-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
A few questions,
1) what is the frequency of inserts?
2) how many cores need to be up and running at any given point?



On Mon, May 18, 2009 at 3:23 AM, Chris Cornell srchn...@gmail.com wrote:
 Trying to create a search solution for about 20k users at a company.
 Each person's documents are private and different (some overlap... it
 would be nice to not have to store/index copies).

 Is multicore something that would work or should we auto-insert a
 facet into each query generated by the person?

 Thanks for any advice, I am very new to solr.  Any tiny push in the
 right direction would be appreciated.

 Thanks,
 Chris




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: multicore for 20k users?

2009-05-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, May 18, 2009 at 8:18 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Chris,

 As far as I know, AOL is using Solr with lots of cores.  What I don't know is 
 how they are handling shutting down of idle cores, which is something you'll 
 need to do if your machine can't handle all cores being open and their data 
 structures being populated at all times.  I know I had to do the same for 
 Simpy. :)

We have a custom build of Solr: we do just-in-time automatic loading
of cores and LRU-based unloading of cores when the upper watermark
is crossed.
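
Not AOL's actual code, but the unloading side can be sketched in a few
lines with java.util.LinkedHashMap's access-order eviction; the use of
SolrCore#close() and the water-mark policy here are assumptions:

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.core.SolrCore;

public class LruCoreCache extends LinkedHashMap<String, SolrCore> {
  private final int highWaterMark;

  public LruCoreCache(int highWaterMark) {
    super(16, 0.75f, true);        // true = access-order iteration (LRU)
    this.highWaterMark = highWaterMark;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, SolrCore> eldest) {
    if (size() > highWaterMark) {
      eldest.getValue().close();   // unload the least recently used core
      return true;
    }
    return false;
  }
}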

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Chris Cornell srchn...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sunday, May 17, 2009 10:11:10 PM
 Subject: Re: multicore for 20k users?

 On Sun, May 17, 2009 at 8:38 PM, Otis Gospodnetic
 wrote:
 
  Chris,
 
  Yes, disk space is cheap, and with so little overlap you won't gain much by
 putting everything in a single index.  Plus, when each user has a separate
 index, it's easy to split users and distribute over multiple machines if 
 you
 ever need to do that, it's easy and fast to completely reindex one user's 
 data
 without affecting other users, etc.
 
  Several years ago I built Simpy at http://www.simpy.com/ that way (but
 pre-Solr, so it uses Lucene directly) and never regretted it.  There are way
 more than 20K users there with many searches per second and with constant
 indexing.  Each user has an index for bookmarks and an index for notes.  Each
 group has its own index, shared by all group members.  The main bookmark 
 search
 is another index.  People search is yet another index.  And so on.  Single
 server.
 

 Thank you very much for your insight and experience; sounds like we
 shouldn't be thinking about prematurely optimizing this.

 Has someone actually used multicore this way, though?  With thousands of 
 them?

 Independently of advice in that regard, I guess our next step is to
 explore and create some dummy scenarios/tests to try and stress
 multicore (search latency is not as much of a factor as memory usage
 is).  I'll report back on any conclusion we come to.

 Thanks!
 Chris





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: multicore for 20k users?

2009-05-17 Thread Chris Cornell
On Sun, May 17, 2009 at 8:38 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Chris,

 Yes, disk space is cheap, and with so little overlap you won't gain much by 
 putting everything in a single index.  Plus, when each user has a separate 
 index, it's easy to split users and distribute over multiple machines if 
 you ever need to do that, it's easy and fast to completely reindex one user's 
 data without affecting other users, etc.

 Several years ago I built Simpy at http://www.simpy.com/ that way (but 
 pre-Solr, so it uses Lucene directly) and never regretted it.  There are way 
 more than 20K users there with many searches per second and with constant 
 indexing.  Each user has an index for bookmarks and an index for notes.  Each 
 group has its own index, shared by all group members.  The main bookmark 
 search is another index.  People search is yet another index.  And so on.  
 Single server.


Thank you very much for your insight and experience; sounds like we
shouldn't be thinking about prematurely optimizing this.

Has someone actually used multicore this way, though?  With thousands of them?

Independently of advice in that regard, I guess our next step is to
explore and create some dummy scenarios/tests to try and stress
multicore (search latency is not as much of a factor as memory usage
is).  I'll report back on any conclusion we come to.

Thanks!
Chris


Re: multicore for 20k users?

2009-05-17 Thread Otis Gospodnetic

Chris,

As far as I know, AOL is using Solr with lots of cores.  What I don't know is 
how they are handling shutting down of idle cores, which is something you'll 
need to do if your machine can't handle all cores being open and their data 
structures being populated at all times.  I know I had to do the same for 
Simpy. :)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Chris Cornell srchn...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sunday, May 17, 2009 10:11:10 PM
 Subject: Re: multicore for 20k users?
 
 On Sun, May 17, 2009 at 8:38 PM, Otis Gospodnetic
 wrote:
 
  Chris,
 
  Yes, disk space is cheap, and with so little overlap you won't gain much by 
 putting everything in a single index.  Plus, when each user has a separate 
 index, it's easy to split users and distribute over multiple machines if 
 you 
 ever need to do that, it's easy and fast to completely reindex one user's 
 data 
 without affecting other users, etc.
 
  Several years ago I built Simpy at http://www.simpy.com/ that way (but 
 pre-Solr, so it uses Lucene directly) and never regretted it.  There are way 
 more than 20K users there with many searches per second and with constant 
 indexing.  Each user has an index for bookmarks and an index for notes.  Each 
 group has its own index, shared by all group members.  The main bookmark 
 search 
 is another index.  People search is yet another index.  And so on.  Single 
 server.
 
 
 Thank you very much for your insight and experience; sounds like we
 shouldn't be thinking about prematurely optimizing this.
 
 Has someone actually used multicore this way, though?  With thousands of them?
 
 Independently of advice in that regard, I guess our next step is to
 explore and create some dummy scenarios/tests to try and stress
 multicore (search latency is not as much of a factor as memory usage
 is).  I'll report back on any conclusion we come to.
 
 Thanks!
 Chris



Re: Order document result by facet count

2009-05-17 Thread Brian Mansell
Patric -

See the "Documents in facet results" thread for a creative method of handling
this need with XSLT transformations.

Cheers,

--bemansell

On May 16, 2009 2:11 AM, patric.wi...@rtl.de wrote:

Hello,

I've got a little problem. My index contains a formatid which I count in my
queries with facet.field:

select?q=text%3A(TEST)&start=0&rows=100&facet=true&facet.field=formatid&facet.mincount=1&facet.sort=true

The facet fields are sorted by count, but my result is still sorted by the
score! Can I change that so all documents are grouped by the facet count?

<lst name="formatid">
  <int name="1">26</int>
  <int name="24">21</int>
  <int name="2">20</int>
  <int name="20">12</int>
  <int name="27">4</int>
  <int name="12">2</int>
  <int name="26">2</int>
  <int name="3">2</int>
  <int name="38">2</int>
  <int name="41">2</int>
  <int name="35">1</int>
</lst>


Kind regards,
Patric




Re: Solr core naming convention for multicores

2009-05-17 Thread KK
Thank you, Otis.
One silly question: how would I know that a particular character is
forbidden? I assume Solr will give me an exception saying that the character
is not allowed, right?
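
A hedged sketch of one defensive approach: normalize the mail-id up front
instead of waiting for CREATE to fail (the assumption that [a-z0-9_-] is a
safe character set is mine):

public class CoreNames {
  // deterministic mapping, so the core for a given user is always findable
  static String coreNameFor(String emailId) {
    return emailId.toLowerCase().replaceAll("[^a-z0-9_-]", "_");
  }

  public static void main(String[] args) {
    System.out.println(coreNameFor("Alex.Smith@abc.com")); // alex_smith_abc_com
  }
}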

Thanks,
KK.

On Sun, May 17, 2009 at 3:12 AM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 KK,

 That should work just fine.  Should any of the characters in email
 addresses turn out to be forbidden, just replace them consistently.  For
 example, if @ turns out to be the problem, you could simply replace it with
 _.

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: KK dioxide.softw...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Saturday, May 16, 2009 3:45:01 AM
  Subject: Solr core naming convention for multicores
 
  Hi All,
  I'm trying to set up multicores for Solr [I'm finding the multicore config a
  bit difficult; any good/simple steps to do the same? any pointers?].
  Let me come to the point: essentially what I want is that whenever a person
  registers for our service, I'll use his mail-id [this is unique] as the
  core name. I don't know if it's viable or not. As per the wiki example, the
  creation/registration of a new core is done like this,
 
 
 http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schem_file_name.xml&dataDir=data
 
  This gives the name as something like coreX, where X is a number. Is it
  possible to have a name like, say, alex...@abc.com? If not, maybe I'll have
  to map the mail-id to some unique number that I'll use as the core name. I
  don't want to do all this [and don't know how, either], hence my question.
  Do let me know some smart ways of doing the same.
  Note: I have to use the mail-id as the unique identifier. Thanks in
  anticipation.
 
  Thanks,
  KK




Re: start param for MoreLikeThis?

2009-05-17 Thread Brian Mansell
That's correct - you can paginate/offset MLT results only through the
MoreLikeThisHandler, rather than through the method you're using
(StandardRequestHandler with mlt enabled).
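
A sketch of what that looks like against the dedicated handler - it assumes
/mlt is mapped to MoreLikeThisHandler in solrconfig.xml, and the id value and
field names are invented:

http://localhost:8983/solr/mlt?q=id:12345&mlt.fl=name,features&start=10&rows=10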

Cheers,
--bemansell

On May 9, 2009 10:42 AM, jli...@gmail.com wrote:

Hi. I'm using the StandardRequestHandler for MoreLikeThis queries.
I find that although I can specify how many results I want returned
with mlt.count, it seems I cannot specify a start location
so that I can paginate the results. Is this the case?

Thanks