Thanks for your note, Anand. What was the maximum chunk size for you? Could you post the relevant portions of your configuration file?
Thanks!
Pete

On Oct 21, 2011, at 4:20 AM, anand.ni...@rbs.com wrote:

> Hi,
>
> I was also facing the issue of highlighting the large text files. I applied
> the solution proposed here and it worked. But I am getting following error :
>
>
> Basically 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where can I
> get this file from. Its reference is present in browse.vm
>
> <div class="results">
>   #if($response.response.get('grouped'))
>     #foreach($grouping in $response.response.get('grouped'))
>       #parse("hitGrouped.vm")
>     #end
>   #else
>     #foreach($doc in $response.results)
>       #parse("hit.vm")
>     #end
>   #end
> </div>
>
>
> HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or
> 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/',
> cwd=C:\glassfish3\glassfish\domains\domain1\config
> java.lang.RuntimeException: Can't find resource 'hitGrouped.vm' in classpath
> or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/',
> cwd=C:\glassfish3\glassfish\domains\domain1\config
>     at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
>     at org.apache.solr.response.SolrVelocityResourceLoader.getResourceStream(SolrVelocityResourceLoader.java:42)
>     at org.apache.velocity.Template.process(Template.java:98)
>     at org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:446)
>     at
>
> Thanks & Regards,
> Anand
> Anand Nigam
> RBS Global Banking & Markets
> Office: +91 124 492 5506
>
>
> -----Original Message-----
> From: karsten-s...@gmx.de [mailto:karsten-s...@gmx.de]
> Sent: 21 October 2011 14:58
> To: solr-user@lucene.apache.org
> Subject: Re: Can Solr handle large text files?
>
> Hi Peter,
>
> highlighting in large text files can not be fast without dividing the
> original text in small piece.
> So take a look in
> http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
> and in
> http://www.lucidimagination.com/blog/2010/09/16/2446/
>
> Which means that you should divide your files and use Result Grouping / Field
> Collapsing to list only one hit per original document.
>
> (xtf also would solve your problem "out of the box" but xtf does not use
> solr).
>
> Best regards
> Karsten
>
> -------- Original Message --------
>> Date: Thu, 20 Oct 2011 17:59:04 -0700
>> From: Peter Spam <ps...@mac.com>
>> To: solr-user@lucene.apache.org
>> Subject: Can Solr handle large text files?
>
>> I have about 20k text files, some very small, but some up to 300MB,
>> and would like to do text searching with highlighting.
>>
>> Imagine the text is the contents of your syslog.
>>
>> I would like to type in some terms, such as "error" and "mail", and
>> have Solr return the syslog lines with those terms PLUS two lines of context.
>> Pretty much just like Google's highlighting.
>>
>> 1) Can Solr handle this? I had extremely long query times when I
>> tried this with Solr 1.4.1 (yes I was using TermVectors, etc.). I
>> tried breaking the files into 1MB pieces, but searching would be wonky
>> => return the wrong number of documents (ie. if one file had a term 5
>> times, and that was the only file that had the term, I want 1 result, not 5
>> results).
>>
>> 2) What sort of tokenizer would be best? Here's what I'm using:
>>
>> <field name="body" type="text_pl" indexed="true" stored="true"
>> multiValued="false" termVectors="true" termPositions="true"
>> termOffsets="true" />
>>
>> <fieldType name="text_pl" class="solr.TextField">
>>   <analyzer>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>       generateWordParts="0" generateNumberParts="0" catenateWords="0"
>>       catenateNumbers="0"
>>       catenateAll="0" splitOnCaseChange="0"/>
>>   </analyzer>
>> </fieldType>
>>
>>
>> Thanks!
>> Pete
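
For reference, the chunk-and-group approach Karsten describes might look roughly like the sketch below. It is only an illustration under assumptions: the field names `file_id` and `id` and the 1,000,000-character default are made up for the example (only `body` comes from the schema quoted above), and the query string uses Solr's standard Result Grouping and highlighting parameters (`group`, `group.field`, `group.limit`, `hl`, `hl.fl`). Each file is cut at line boundaries into chunks that share one `file_id`, so grouping on that field returns one hit per original file.

```python
from urllib.parse import urlencode


def chunk_lines(file_id, lines, max_chars=1_000_000):
    """Yield one document dict per chunk, cutting only at line boundaries.

    `file_id` and the `id` scheme are hypothetical names for this sketch;
    every chunk of a file carries the same `file_id` so it can be grouped on.
    """
    buf, size, n = [], 0, 0
    for line in lines:
        # Start a new chunk when adding this line would exceed the limit.
        if buf and size + len(line) > max_chars:
            yield {"id": f"{file_id}#{n}", "file_id": file_id, "body": "".join(buf)}
            buf, size, n = [], 0, n + 1
        buf.append(line)
        size += len(line)
    if buf:
        yield {"id": f"{file_id}#{n}", "file_id": file_id, "body": "".join(buf)}


def grouped_query(terms):
    """Build a query string that collapses chunk hits to one per file."""
    return urlencode({
        "q": terms,
        "group": "true",          # turn on Result Grouping
        "group.field": "file_id", # assumed field shared by all chunks of a file
        "group.limit": 1,         # show only the best chunk per file
        "hl": "true",             # highlight within the (small) chunk
        "hl.fl": "body",
    })


if __name__ == "__main__":
    for doc in chunk_lines("syslog", ["aaaa\n", "bbbb\n", "cc\n"], max_chars=10):
        print(doc["id"], len(doc["body"]))
    print(grouped_query("error mail"))
```

With grouping on, a term that occurs in five chunks of a single file comes back as one group rather than five separate results, which is the behavior asked for in the original question.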