Thanks for the reminder - I had that set to 214xxx... (the max), but perf was 
terrible when I injected large files.

So what's the maximum recommended field size, in KB?  I can try chopping up the 
syslogs into arbitrarily small pieces, but would love to know where to start.
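
For anyone following along, here is a rough sketch of the chunk-and-group idea from this thread. It is only a starting point under stated assumptions, not a tested recipe: the field names `file_id` and `chunk_no`, the `id` format, and the 64 KB default are all made up for illustration (the right chunk size has to be found by measurement), and only `body` comes from the schema quoted below. The `group`, `group.field`, `hl` and `hl.fl` request parameters are the usual Result Grouping and highlighting parameters, so this assumes a Solr version that supports Result Grouping.

```python
from urllib.parse import urlencode


def chunk_lines(file_id, lines, max_chars=64 * 1024):
    """Yield one document (a plain dict) per chunk of whole lines.

    Lines are never split, so each chunk starts and ends on a line
    boundary. A single line longer than max_chars becomes its own
    oversized chunk. max_chars is an arbitrary placeholder to tune.
    """
    buf, size, chunk_no = [], 0, 0
    for line in lines:
        # Flush the current chunk before it would grow past max_chars.
        if buf and size + len(line) > max_chars:
            yield make_doc(file_id, chunk_no, buf)
            buf, size, chunk_no = [], 0, chunk_no + 1
        buf.append(line)
        size += len(line)
    if buf:
        yield make_doc(file_id, chunk_no, buf)


def make_doc(file_id, chunk_no, buf):
    # "file_id" and "chunk_no" are hypothetical fields; "body" is the
    # field from the schema in this thread.
    return {
        "id": f"{file_id}#{chunk_no}",
        "file_id": file_id,
        "chunk_no": chunk_no,
        "body": "".join(buf),
    }


def grouped_query(terms, rows=10):
    """Build a query string that returns one group per original file."""
    params = {
        "q": " AND ".join(f"body:{t}" for t in terms),
        "rows": rows,
        "group": "true",           # enable Result Grouping
        "group.field": "file_id",  # collapse chunks of the same file
        "hl": "true",              # enable highlighting
        "hl.fl": "body",           # highlight within the chunk text
    }
    return urlencode(params)


if __name__ == "__main__":
    sample = ["Oct 20 mail error one\n", "Oct 20 ok\n", "Oct 21 error two\n"]
    for doc in chunk_lines("syslog.1", sample, max_chars=40):
        print(doc["id"], len(doc["body"]))
    print(grouped_query(["error", "mail"]))
```

Each chunk is indexed as its own document, and grouping on the hypothetical `file_id` field is what collapses several matching chunks of one file back into a single result.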

Thanks!


On Oct 23, 2011, at 2:01 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Also be aware that by default Solr is configured to index only the
> first 10,000 tokens of each field. See maxFieldLength in solrconfig.xml.
> 
> Best
> Erick
> 
> On Fri, Oct 21, 2011 at 7:34 PM, Peter Spam <ps...@mac.com> wrote:
>> Thanks for your note, Anand.  What was the maximum chunk size for you?  
>> Could you post the relevant portions of your configuration file?
>> 
>> 
>> Thanks!
>> Pete
>> 
>> On Oct 21, 2011, at 4:20 AM, anand.ni...@rbs.com wrote:
>> 
>>> Hi,
>>> 
>>> I was also facing the issue of highlighting large text files. I applied 
>>> the solution proposed here and it worked, but I am now getting the following error:
>>> 
>>> 
>>> Basically, 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where can I 
>>> get this file? It is referenced in browse.vm:
>>> 
>>> <div class="results">
>>>  #if($response.response.get('grouped'))
>>>    #foreach($grouping in $response.response.get('grouped'))
>>>      #parse("hitGrouped.vm")
>>>    #end
>>>  #else
>>>    #foreach($doc in $response.results)
>>>      #parse("hit.vm")
>>>    #end
>>>  #end
>>> </div>
>>> 
>>> 
>>> HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or 
>>> 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', 
>>> cwd=C:\glassfish3\glassfish\domains\domain1\config
>>> 
>>> java.lang.RuntimeException: Can't find resource 'hitGrouped.vm' in 
>>> classpath or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/', 
>>> cwd=C:\glassfish3\glassfish\domains\domain1\config
>>>  at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
>>>  at org.apache.solr.response.SolrVelocityResourceLoader.getResourceStream(SolrVelocityResourceLoader.java:42)
>>>  at org.apache.velocity.Template.process(Template.java:98)
>>>  at org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:446)
>>>  at
>>> 
>>> Thanks & Regards,
>>> Anand
>>> Anand Nigam
>>> RBS Global Banking & Markets
>>> Office: +91 124 492 5506
>>> 
>>> 
>>> -----Original Message-----
>>> From: karsten-s...@gmx.de [mailto:karsten-s...@gmx.de]
>>> Sent: 21 October 2011 14:58
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Can Solr handle large text files?
>>> 
>>> Hi Peter,
>>> 
>>> Highlighting in large text files cannot be fast without dividing the 
>>> original text into small pieces.
>>> So take a look at
>>> http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
>>> and at
>>> http://www.lucidimagination.com/blog/2010/09/16/2446/
>>> 
>>> Which means that you should divide your files and use Result Grouping / 
>>> Field Collapsing to list only one hit per original document.
>>> 
>>> (xtf also would solve your problem "out of the box" but xtf does not use 
>>> solr).
>>> 
>>> Best regards
>>>  Karsten
>>> 
>>> -------- Original Message --------
>>>> Date: Thu, 20 Oct 2011 17:59:04 -0700
>>>> From: Peter Spam <ps...@mac.com>
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Can Solr handle large text files?
>>> 
>>>> I have about 20k text files, some very small, but some up to 300MB,
>>>> and would like to do text searching with highlighting.
>>>> 
>>>> Imagine the text is the contents of your syslog.
>>>> 
>>>> I would like to type in some terms, such as "error" and "mail", and
>>>> have Solr return the syslog lines with those terms PLUS two lines of 
>>>> context.
>>>> Pretty much just like Google's highlighting.
>>>> 
>>>> 1) Can Solr handle this?  I had extremely long query times when I
>>>> tried this with Solr 1.4.1 (yes I was using TermVectors, etc.).  I
>>>> tried breaking the files into 1MB pieces, but searching would be wonky:
>>>> it returned the wrong number of documents (i.e. if one file had a term 5
>>>> times, and that was the only file that had the term, I want 1 result, not 
>>>> 5 results).
>>>> 
>>>> 2) What sort of tokenizer would be best?  Here's what I'm using:
>>>> 
>>>>   <field name="body" type="text_pl" indexed="true" stored="true"
>>>>          multiValued="false" termVectors="true" termPositions="true"
>>>>          termOffsets="true" />
>>>> 
>>>>   <fieldType name="text_pl" class="solr.TextField">
>>>>     <analyzer>
>>>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>       <filter class="solr.LowerCaseFilterFactory"/>
>>>>       <filter class="solr.WordDelimiterFilterFactory"
>>>>               generateWordParts="0" generateNumberParts="0"
>>>>               catenateWords="0" catenateNumbers="0"
>>>>               catenateAll="0" splitOnCaseChange="0"/>
>>>>     </analyzer>
>>>>   </fieldType>
>>>> 
>>>> 
>>>> Thanks!
>>>> Pete
>>> 
>>> 
>> 
>> 
