Thanks for the reminder - I had that set to 214xxx... (the max), but perf was terrible when I injected large files.
So what's the maximum recommended field size in KB? I can try chopping up the syslogs into arbitrarily small pieces, but would love to know where to start.

Thanks!

On Oct 23, 2011, at 2:01 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Also be aware that by default Solr is configured to only index the
> first 10,000 tokens of text. See maxFieldLength in solrconfig.xml
>
> Best
> Erick
>
> On Fri, Oct 21, 2011 at 7:34 PM, Peter Spam <ps...@mac.com> wrote:
>> Thanks for your note, Anand. What was the maximum chunk size for you?
>> Could you post the relevant portions of your configuration file?
>>
>>
>> Thanks!
>> Pete
>>
>> On Oct 21, 2011, at 4:20 AM, anand.ni...@rbs.com wrote:
>>
>>> Hi,
>>>
>>> I was also facing the issue of highlighting large text files. I applied
>>> the solution proposed here and it worked. But I am getting the following error:
>>>
>>>
>>> Basically 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where can I
>>> get this file from? It is referenced in browse.vm:
>>>
>>> <div class="results">
>>>   #if($response.response.get('grouped'))
>>>     #foreach($grouping in $response.response.get('grouped'))
>>>       #parse("hitGrouped.vm")
>>>     #end
>>>   #else
>>>     #foreach($doc in $response.results)
>>>       #parse("hit.vm")
>>>     #end
>>>   #end
>>> </div>
>>>
>>>
>>> HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or
>>> 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/',
>>> cwd=C:\glassfish3\glassfish\domains\domain1\config
>>> java.lang.RuntimeException: Can't find resource 'hitGrouped.vm' in
>>> classpath or 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/',
>>> cwd=C:\glassfish3\glassfish\domains\domain1\config at
>>> org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:268)
>>> at
>>> org.apache.solr.response.SolrVelocityResourceLoader.getResourceStream(SolrVelocityResourceLoader.java:42)
>>> at org.apache.velocity.Template.process(Template.java:98) at
>>> org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:446)
>>> at
>>>
>>> Thanks & Regards,
>>> Anand
>>> Anand Nigam
>>> RBS Global Banking & Markets
>>> Office: +91 124 492 5506
>>>
>>>
>>> -----Original Message-----
>>> From: karsten-s...@gmx.de [mailto:karsten-s...@gmx.de]
>>> Sent: 21 October 2011 14:58
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Can Solr handle large text files?
>>>
>>> Hi Peter,
>>>
>>> highlighting in large text files cannot be fast without dividing the
>>> original text into small pieces.
>>> So take a look at
>>> http://xtf.cdlib.org/documentation/under-the-hood/#Chunking
>>> and at
>>> http://www.lucidimagination.com/blog/2010/09/16/2446/
>>>
>>> This means that you should divide your files and use Result Grouping /
>>> Field Collapsing to list only one hit per original document.
>>>
>>> (xtf would also solve your problem "out of the box", but xtf does not use
>>> Solr.)
>>>
>>> Best regards
>>> Karsten
>>>
>>> -------- Original Message --------
>>>> Date: Thu, 20 Oct 2011 17:59:04 -0700
>>>> From: Peter Spam <ps...@mac.com>
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Can Solr handle large text files?
>>>
>>>> I have about 20k text files, some very small, but some up to 300MB,
>>>> and would like to do text searching with highlighting.
>>>>
>>>> Imagine the text is the contents of your syslog.
>>>>
>>>> I would like to type in some terms, such as "error" and "mail", and
>>>> have Solr return the syslog lines with those terms PLUS two lines of
>>>> context.
>>>> Pretty much just like Google's highlighting.
>>>>
>>>> 1) Can Solr handle this? I had extremely long query times when I
>>>> tried this with Solr 1.4.1 (yes I was using TermVectors, etc.). I
>>>> tried breaking the files into 1MB pieces, but searching would be wonky
>>>> => return the wrong number of documents (ie.
>>>> if one file had a term 5 times, and that was the only file that had the
>>>> term, I want 1 result, not 5 results).
>>>>
>>>> 2) What sort of tokenizer would be best? Here's what I'm using:
>>>>
>>>> <field name="body" type="text_pl" indexed="true" stored="true"
>>>>        multiValued="false" termVectors="true" termPositions="true"
>>>>        termOffsets="true" />
>>>>
>>>> <fieldType name="text_pl" class="solr.TextField">
>>>>   <analyzer>
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>             generateWordParts="0" generateNumberParts="0" catenateWords="0"
>>>>             catenateNumbers="0"
>>>>             catenateAll="0" splitOnCaseChange="0"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>>
>>>> Thanks!
>>>> Pete
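
The chunk-and-group approach Karsten describes (split each file into small pieces, then use Result Grouping to show one hit per original file) could be sketched roughly as below. This is only an illustration under stated assumptions, not anyone's actual setup from this thread: the field names file_id and first_line, the helper function names, and the default of 1000 lines per chunk are all hypothetical, and nobody in the thread has recommended a specific chunk size. The 2-line overlap is meant to keep the "two lines of context" available at chunk boundaries. The request parameters group, group.field, group.limit, hl and hl.fl are Solr's Result Grouping and highlighting parameters.

```python
from urllib.parse import urlencode


def chunk_lines(lines, chunk_size=1000, overlap=2):
    """Yield (start_index, chunk) pairs; consecutive chunks share `overlap` lines."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    start = 0
    while start < len(lines):
        yield start, lines[start:start + chunk_size]
        if start + chunk_size >= len(lines):
            break  # the chunk just emitted reached the end of the file
        start += step


def make_docs(file_id, lines, chunk_size=1000, overlap=2):
    """Build one Solr document (as a dict) per chunk of a single log file."""
    docs = []
    for n, (start, chunk) in enumerate(chunk_lines(lines, chunk_size, overlap)):
        docs.append({
            "id": f"{file_id}#{n}",   # unique key per chunk
            "file_id": file_id,       # hypothetical field shared by all chunks of a file
            "first_line": start,      # hypothetical field: offset of the chunk in the file
            "body": "\n".join(chunk),
        })
    return docs


def grouped_query(terms):
    """Build a query string asking for one highlighted hit per original file."""
    return urlencode({
        "q": " AND ".join(f"body:{t}" for t in terms),
        "group": "true",
        "group.field": "file_id",  # collapse all chunks of a file into one group
        "group.limit": 1,          # return only the top chunk of each group
        "hl": "true",
        "hl.fl": "body",
    })


if __name__ == "__main__":
    sample = [f"line {i}" for i in range(10)]
    for doc in make_docs("syslog.1", sample, chunk_size=4, overlap=2):
        print(doc["id"], repr(doc["body"]))
    print(grouped_query(["error", "mail"]))
```

For this to work, file_id would have to be declared in schema.xml, as far as I know as a single-valued, indexed string field, and the chunk size would need to be tuned by experiment against highlighting time.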