RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Does anyone know how to index a very large PDF file (more than 100MB)?

Thanks so much,
Xiaohui 
-Original Message-
From: Ma, Xiaohui (NIH/NLM/LHC) [C] 
Sent: Tuesday, November 30, 2010 4:22 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: how to set maxFieldLength to unlimitd

I set maxFieldLength to 2147483647, restarted Tomcat, and re-indexed the PDF
files. I also commented out the one in the mainIndex section. Unfortunately,
the files are still truncated if the file size is more than 20MB.

Any suggestions? I really appreciate your help!
Xiaohui 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, November 30, 2010 2:01 PM
To: solr-user@lucene.apache.org
Subject: Re: how to set maxFieldLength to unlimitd

Set the maxFieldLength value in solrconfig.xml to, say, 2147483647

Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
It appears you can just comment out the one in the mainIndex section.
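
The gotcha is that maxFieldLength can appear in more than one section of solrconfig.xml, so raising it in one place may leave a lower value elsewhere. A minimal Python sketch, assuming a heavily trimmed config with an indexDefaults and a mainIndex section (the XML snippet and the function name are hypothetical, made up for illustration), lists every section that sets the value so that a leftover one is easy to spot:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed solrconfig.xml; a real file has many more elements.
SOLRCONFIG = """
<config>
  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>
</config>
"""


def max_field_lengths(xml_text):
    """Return {section name: maxFieldLength} for every top-level section that sets one."""
    root = ET.fromstring(xml_text)
    found = {}
    for section in root:
        value = section.findtext("maxFieldLength")
        if value is not None:
            found[section.tag] = int(value)
    return found


settings = max_field_lengths(SOLRCONFIG)
for section, value in settings.items():
    print(f"{section}: maxFieldLength = {value}")
```

If the script reports more than one section, the values should either agree or the extra one should be commented out, as suggested above.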

Best
Erick

On Tue, Nov 30, 2010 at 1:48 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] 
xiao...@mail.nlm.nih.gov wrote:

 I need to index and search some PDF files which are very big (around 1000
 pages each). How can I set maxFieldLength to unlimited?

 Thanks so much for your help in advance,
 Xiaohui



RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread jan.kurella
You can't simply set it to unlimited. What you could do is ignore the
positions and put in a filter that sets the position increment of every token
but the first to 0 (so the field length will be just 1, with all tokens
stacked on the first position).
You could also break per page, so that each page is put on a new position.
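
The position arithmetic behind this suggestion can be sketched in plain Python. This is only a model of how position increments add up, not Lucene's TokenFilter API, and all function names here are hypothetical. It assumes the first token starts at position 0 and each later token's position is the previous position plus its increment:

```python
def stack_positions(tokens):
    """Give the first token increment 1 and every later token increment 0."""
    return [(term, 1 if i == 0 else 0) for i, term in enumerate(tokens)]


def positions(tokens_with_increments):
    """Turn position increments into absolute positions (first position is 0)."""
    pos = -1
    out = []
    for term, inc in tokens_with_increments:
        pos += inc
        out.append((term, pos))
    return out


def one_position_per_page(pages):
    """Per-page variant: only the first token of each page advances the position."""
    out = []
    for page in pages:
        out.extend(stack_positions(page))
    return out


tokens = ["very", "large", "pdf", "file"]
stacked = positions(stack_positions(tokens))
print(stacked)

pages = [["page", "one"], ["page", "two", "text"]]
paged = positions(one_position_per_page(pages))
print(paged)
```

With everything stacked, all tokens share one position; with the per-page variant, the number of distinct positions equals the number of pages. Note that stacked positions also mean phrase and proximity queries can no longer tell the tokens apart within a stack.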

Jan



RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply, Jan. I just found that I cannot index PDF files
larger than 20MB.

I use curl to index them and didn't get any error either. Do you have any
suggestions for indexing PDF files larger than 20MB?

Thanks,
Xiaohui 



Re: how to set maxFieldLength to unlimitd

2010-12-01 Thread jan.kurella
I don't know the upload limits, but the default settings certainly have
some, which could explain the 20MB limit. Which upload mechanism do you use
on the Solr side? I guess this is not a Lucene problem but rather the HTTP
layer of Solr.

If you manage to stream your PDF and start parsing it on the stream, you
should then go for the filter that sets the positionIncrement to 0, as
mentioned.

What we once did for PDF files was to parse them into plain text beforehand
and index that with a streamReader (but we were using Lucene directly).
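
As a rough illustration of reading already-extracted plain text from a stream rather than loading it whole, here is a small Python sketch. It is not Lucene's reader API; the function name and chunk size are hypothetical, and it assumes simple whitespace tokenization:

```python
import io


def stream_tokens(reader, chunk_size=8192):
    """Yield whitespace-separated tokens from a text stream, one chunk at a time."""
    leftover = ""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        buffer = leftover + chunk
        parts = buffer.split()
        # If the buffer does not end in whitespace, the last part may be an
        # incomplete token; hold it back until the next chunk arrives.
        if parts and not buffer[-1].isspace():
            leftover = parts.pop()
        else:
            leftover = ""
        yield from parts
    if leftover:
        yield leftover


text = "alpha beta gamma delta"
tokens = list(stream_tokens(io.StringIO(text), chunk_size=4))
print(tokens)
```

Only one chunk plus at most one partial token is held in memory at a time, so the size of the extracted text does not matter to the reader itself.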


Regards, Jan



RE: how to set maxFieldLength to unlimitd

2010-12-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much, Jan. I use curl to index PDF files. Is there another way to do it?

I changed the positionIncrement to 0, but I couldn't get it to work either.

Thanks,
Xiaohui 



how to set maxFieldLength to unlimitd

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I need to index and search some PDF files which are very big (around 1000 pages
each). How can I set maxFieldLength to unlimited?

Thanks so much for your help in advance,
Xiaohui


Re: how to set maxFieldLength to unlimitd

2010-11-30 Thread Erick Erickson
Set the maxFieldLength value in solrconfig.xml to, say, 2147483647

Also, see this thread for a common gotcha:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html
It appears you can just comment out the one in the mainIndex section.

Best
Erick




RE: how to set maxFieldLength to unlimitd

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help!
Xiaohui 




RE: how to set maxFieldLength to unlimitd

2010-11-30 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I set maxFieldLength to 2147483647, restarted Tomcat, and re-indexed the PDF
files. I also commented out the one in the mainIndex section. Unfortunately,
the files are still truncated if the file size is more than 20MB.

Any suggestions? I really appreciate your help!
Xiaohui 
