I don't know the exact upload limits, but the default settings
certainly impose some, and that could explain the 20MB limit. Which
upload mechanism do you use on the Solr side? I suspect this is not a
Lucene problem but rather one in Solr's HTTP layer.

If you manage to stream your PDF and start parsing it from the stream,
you should then go for the filter that sets the positionIncrement to
0, as mentioned.
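
To make the position arithmetic concrete, here is a minimal sketch in plain Python. It is not Lucene code and does not use the Lucene API; the function names are made up for illustration. It only models how position increments turn into positions, both for the "stack everything" filter and for the per-page variant. Whether the field length limit actually counts positions rather than tokens is an assumption taken from the description in this thread, not something the sketch verifies.

```python
from typing import Iterable, List, Tuple


def stack_tokens(tokens: Iterable[str]) -> List[Tuple[str, int]]:
    """Pair each token with a position increment: 1 for the first, 0 for the rest."""
    return [(tok, 1 if i == 0 else 0) for i, tok in enumerate(tokens)]


def stack_per_page(pages: Iterable[Iterable[str]]) -> List[Tuple[str, int]]:
    """Per-page variant: the first token of each page advances the position."""
    out: List[Tuple[str, int]] = []
    for page in pages:
        out.extend(stack_tokens(page))
    return out


def positions(stream: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Turn increments into absolute positions (the first token lands on 0)."""
    pos = -1
    out: List[Tuple[str, int]] = []
    for tok, inc in stream:
        pos += inc
        out.append((tok, pos))
    return out


if __name__ == "__main__":
    # All tokens share position 0.
    print(positions(stack_tokens(["very", "long", "pdf"])))
    # One position per page.
    print(positions(stack_per_page([["page", "one"], ["page", "two"]])))
```

The trade-off is that phrase and proximity queries stop being meaningful within a stacked span, since every token in it sits on the same position.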

What we did once for PDF files: we parsed them into plain text
beforehand and indexed that text with a streamReader (but we were
using Lucene directly).
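
As a rough illustration of the streaming idea (again plain Python, not the Lucene reader we used; `stream_tokens` and `chunk_size` are hypothetical names), the already extracted plain text can be read in small chunks and tokenized on the fly, so the whole document never has to sit in memory:

```python
import io
import re
from typing import Iterator, TextIO


def stream_tokens(reader: TextIO, chunk_size: int = 8192) -> Iterator[str]:
    """Yield lowercase word tokens, reading chunk_size characters at a time."""
    tail = ""  # partial word carried over from the previous chunk
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        text = tail + chunk
        words = re.findall(r"\w+", text)
        # If the text ends inside a word, hold that word back until the
        # next chunk shows where it really ends.
        if words and re.search(r"\w\Z", text):
            tail = words.pop()
        else:
            tail = ""
        for word in words:
            yield word.lower()
    if tail:
        yield tail.lower()


if __name__ == "__main__":
    sample = io.StringIO("Hello big world\nfoo")
    print(list(stream_tokens(sample, chunk_size=4)))
```

In practice the reader would be an open file containing the extracted text, for example `open("extracted.txt", encoding="utf-8")`, where the file name is only a placeholder.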


Regards, Jan

On 01.12.2010 at 18:13, "ext Ma, Xiaohui (NIH/NLM/LHC) [C]"
<xiao...@mail.nlm.nih.gov> wrote:

> Thanks so much for your reply, Jan. I just found that I cannot index
> PDF files larger than 20MB.
>
> I use curl to index them, and I didn't get any error either. Do you
> have any suggestions for indexing PDF files larger than 20MB?
>
> Thanks,
> Xiaohui
>
> -----Original Message-----
> From: jan.kure...@nokia.com [mailto:jan.kure...@nokia.com]
> Sent: Wednesday, December 01, 2010 11:30 AM
> To: solr-user@lucene.apache.org; solr-user-i...@lucene.apache.org; 
> solr-user-...@lucene.apache.org
> Subject: RE: how to set maxFieldLength to unlimitd
>
> You just can't set it to "unlimited". What you could do is ignore
> the positions and put in a filter that sets the position increment
> for all but the first token to 0 (this means the field length will be
> just 1, with all tokens "stacked" on the first position).
> You could also break per page, so that each "page" starts on a new
> position.
>
> Jan
>
>> -----Original Message-----
>> From: ext Ma, Xiaohui (NIH/NLM/LHC) [C]  
>> [mailto:xiao...@mail.nlm.nih.gov]
>> Sent: Tuesday, November 30, 2010 19:49
>> To: solr-user@lucene.apache.org; 'solr-user- 
>> i...@lucene.apache.org'; 'solr-user-...@lucene.apache.org'
>> Subject: how to set maxFieldLength to unlimitd
>>
>> I need to index and search some PDF files which are very big (around
>> 1000 pages each). How can I set maxFieldLength to unlimited?
>>
>> Thanks so much for your help in advance,
>> Xiaohui
