Re: Can Solr handle large text files?

Peter Spam Fri, 21 Oct 2011 10:35:21 -0700

Thanks for your note, Anand.  What was the maximum chunk size for you?  Could 
you post the relevant portions of your configuration file?



Thanks!
Pete

On Oct 21, 2011, at 4:20 AM, anand.ni...@rbs.com wrote:

> Hi,
> 
> I was also facing the issue of highlighting large text files. I applied 
> the solution proposed here and it worked, but I am now getting the error 
> shown below.
> 
> 
> Basically, 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where can I 
> get this file from? It is referenced in browse.vm:
> 
> <div class="results">
>  #if($response.response.get('grouped'))
>    #foreach($grouping in $response.response.get('grouped'))
>      #parse("hitGrouped.vm")
>    #end
>  #else
>    #foreach($doc in $response.results)
>      #parse("hit.vm")
>    #end
>  #end
> </div>
> 
> 
> HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', cwd=C:\glassfish3\glassfish\domains\domain1\config
> 
> java.lang.RuntimeException: Can't find resource 'hitGrouped.vm' in classpath or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', cwd=C:\glassfish3\glassfish\domains\domain1\config
>  at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
>  at org.apache.solr.response.SolrVelocityResourceLoader.getResourceStream(SolrVelocityResourceLoader.java:42)
>  at org.apache.velocity.Template.process(Template.java:98)
>  at org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:446)
>  at 
> 
> Thanks & Regards,
> Anand
> Anand Nigam
> RBS Global Banking & Markets
> Office: +91 124 492 5506   
> 
> 
> -----Original Message-----
> From: karsten-s...@gmx.de [mailto:karsten-s...@gmx.de] 
> Sent: 21 October 2011 14:58
> To: solr-user@lucene.apache.org
> Subject: Re: Can Solr handle large text files?
> 
> Hi Peter,
> 
> Highlighting in large text files cannot be fast without dividing the 
> original text into small pieces.
> So take a look at
> http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
> and at
> http://www.lucidimagination.com/blog/2010/09/16/2446/
> 
> Which means that you should divide your files and use Result Grouping / Field 
> Collapsing to list only one hit per original document.
> 
> (xtf would also solve your problem "out of the box", but xtf does not use 
> Solr.)
> 
> Best regards
>  Karsten
> 
> -------- Original Message --------
>> Date: Thu, 20 Oct 2011 17:59:04 -0700
>> From: Peter Spam <ps...@mac.com>
>> To: solr-user@lucene.apache.org
>> Subject: Can Solr handle large text files?
> 
>> I have about 20k text files, some very small, but some up to 300MB, 
>> and would like to do text searching with highlighting.
>> 
>> Imagine the text is the contents of your syslog.
>> 
>> I would like to type in some terms, such as "error" and "mail", and 
>> have Solr return the syslog lines with those terms PLUS two lines of context.
>> Pretty much just like Google's highlighting.
>> 
>> 1) Can Solr handle this?  I had extremely long query times when I 
>> tried this with Solr 1.4.1 (yes I was using TermVectors, etc.).  I 
>> tried breaking the files into 1MB pieces, but searching then returned 
>> the wrong number of documents (i.e. if one file had a term 5 
>> times, and that was the only file that had the term, I want 1 result, not 5 
>> results).
>> 
>> 2) What sort of tokenizer would be best?  Here's what I'm using:
>> 
>>   <field name="body" type="text_pl" indexed="true" stored="true"
>>          multiValued="false" termVectors="true" termPositions="true"
>>          termOffsets="true" />
>> 
>>   <fieldType name="text_pl" class="solr.TextField">
>>     <analyzer>
>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>       <filter class="solr.WordDelimiterFilterFactory"
>>               generateWordParts="0" generateNumberParts="0"
>>               catenateWords="0" catenateNumbers="0"
>>               catenateAll="0" splitOnCaseChange="0"/>
>>     </analyzer>
>>   </fieldType>
>> 
>> 
>> Thanks!
>> Pete
> 
>
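For reference, the chunk-and-group approach Karsten describes in the quoted
message could be sketched roughly as follows. This is only an illustration,
not anyone's actual setup: the field names `file_id` and `start_line`, the
helper names, and the chunk size of 1000 lines are assumptions; only the
`body` field comes from the schema quoted above. The query side uses Solr's
standard Result Grouping and highlighting parameters (`group`, `group.field`,
`group.limit`, `hl`, `hl.fl`).

```python
from urllib.parse import urlencode


def chunk_lines(file_id, lines, lines_per_chunk=1000):
    """Yield one document (as a dict) per block of consecutive lines.

    file_id and start_line are hypothetical field names; every chunk of
    the same file shares file_id so hits can be grouped back together.
    """
    for start in range(0, len(lines), lines_per_chunk):
        block = lines[start:start + lines_per_chunk]
        yield {
            "id": f"{file_id}#{start // lines_per_chunk}",  # unique per chunk
            "file_id": file_id,        # shared by all chunks of one file
            "start_line": start + 1,   # 1-based line number of the first line
            "body": "".join(block),
        }


def grouped_query(terms, group_field="file_id"):
    """Build a query string that returns one highlighted hit per file."""
    return urlencode({
        "q": " AND ".join(f"body:{t}" for t in terms),
        "group": "true",             # enable Result Grouping
        "group.field": group_field,  # collapse chunks of the same file
        "group.limit": 1,            # keep only the best chunk per file
        "hl": "true",
        "hl.fl": "body",
    })
```

With this layout a file whose term occurs in five chunks still shows up as a
single group, e.g. `grouped_query(["error", "mail"])` appended to the select
URL, while highlighting only has to process one small chunk at a time.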
