Re: Fail to huge collection extraction

2012-09-10 Thread neosky
To Alex: Thanks for your advice. I did ask, and I understand that the requirement is necessary for them. They won't browse all the results in one page, but they will use the query results to do some additional research. What they want are the results that exactly match the query, so they need to pull out

Re: Fail to huge collection extraction

2012-09-09 Thread neosky
Thanks Alex! Yes, you hit my key points. Actually, I have to implement both of the requirements. The first one works very well, for the reason you state. I now have a website client that shows 20 records per page, and it is fast. However, my customer also wants to use a Servlet to download the whole query

Re: Fail to huge collection extraction

2012-09-08 Thread neosky
I am sorry, but I don't get your point. Would you explain a little more? I am still struggling with this problem. It sometimes seems to crash for no reason, even when I reduce to 5000 records each time, yet sometimes it works well with 1 per page.

Fail to huge collection extraction

2012-08-27 Thread neosky
I am using Solr 3.5 and Jetty 8.12. I need to pull out huge query results at one time (for example, 1 million documents, probably a couple of gigabytes in size), and my machine has about 64 GB of memory. I use the javabin format and SolrJ as my client, and I use a Servlet to provide a query download service for the end
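
A minimal sketch of one way to avoid holding the whole result set in memory: request fixed-size pages with Solr's `start` and `rows` parameters and stream each page out before fetching the next. The names `iter_all_docs`, `fetch_page` and `fake_fetch` are hypothetical and not part of the thread; `fake_fetch` stands in for a real query against Solr, and the page size of 5000 is only the figure mentioned in the follow-up messages.

```python
from typing import Callable, Dict, Iterator, List


def iter_all_docs(
    fetch_page: Callable[[int, int], List[Dict]], rows: int = 5000
) -> Iterator[Dict]:
    """Yield every document, asking for one page of `rows` documents at a time."""
    start = 0
    while True:
        page = fetch_page(start, rows)
        if not page:
            break  # nothing left to read
        yield from page
        if len(page) < rows:
            break  # a short page is the last page
        start += rows


def fake_fetch(start: int, rows: int) -> List[Dict]:
    """Hypothetical stand-in for a Solr query with start=<start>&rows=<rows>."""
    total = 12  # pretend the query matches 12 documents
    return [{"id": i} for i in range(start, min(start + rows, total))]


ids = [doc["id"] for doc in iter_all_docs(fake_fetch, rows=5)]
print(ids)
```

Because the generator yields documents as each page arrives, a download handler could write them to the response one page at a time instead of building one multi-gigabyte list.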

Re: The index speed in the solr

2012-04-25 Thread neosky
Thanks for your suggestion. I will try it later and give you feedback if possible. For now, my approach is to remove some of the n-grams. Thanks again! -- View this message in context: http://lucene.472066.n3.nabble.com/The-index-speed-in-the-solr-tp3931338p3939366.html Sent from the Solr - User mailing list

Re: How can I get the top term in solr?

2012-04-22 Thread neosky
You are very helpful. Thanks a lot! -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-get-the-top-term-in-solr-tp3926536p3931252.html Sent from the Solr - User mailing list archive at Nabble.com.

The index speed in the solr

2012-04-22 Thread neosky
It takes me 50 hours to index a 9 GB file in total (about 2,000,000 documents) with an n-gram filter from min=6 to max=10. My token before the n-gram filter is long (not a word; at most 300,000 bytes, with white space). I split it into 4 files and use post.sh to update them at the same time. I also tried to write a
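
A rough worked example of why this configuration is expensive: a string of length L produces L - n + 1 grams of size n, so the gram sizes 6 through 10 together yield about five grams per character. The sketch below assumes the worst case from the post, a single unsplit token of 300,000 characters with one byte per character; `count_ngrams` is a hypothetical helper, not a Solr function.

```python
def count_ngrams(length: int, min_gram: int, max_gram: int) -> int:
    """Number of character n-grams of sizes min_gram..max_gram in one token."""
    return sum(max(length - n + 1, 0) for n in range(min_gram, max_gram + 1))


# Worst case from the post: one token of 300,000 characters, grams of size 6..10.
grams = count_ngrams(300_000, 6, 10)
print(grams)  # 1499965
```

Under that assumption a single long token expands to roughly 1.5 million indexed terms, which is consistent with removing some n-gram sizes being an effective way to cut indexing time.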

Re: Further questions about behavior in ReversedWildcardFilterFactory

2012-04-20 Thread neosky
I have to discard this method at this time. Thank you all the same. -- View this message in context: http://lucene.472066.n3.nabble.com/Further-questions-about-behavior-in-ReversedWildcardFilterFactory-tp3905416p3926423.html Sent from the Solr - User mailing list archive at Nabble.com.

How can I get the top term in solr?

2012-04-20 Thread neosky
Actually, I would like to know the top terms in two senses: at the document level and at the index-file level. 1. The top term at the document level means that I would like to know the top term frequency across all documents (counted only once per document). The Solr schema.jsp seems to provide the top 10 terms, but

Further questions about behavior in ReversedWildcardFilterFactory

2012-04-12 Thread neosky
I asked the question in http://lucene.472066.n3.nabble.com/A-little-onfusion-with-maxPosAsterisk-tt3889226.html However, when I did some implementation, I ran into further questions. 1. Suppose I don't use ReversedWildcardFilterFactory at index time; it seems that Solr doesn't allow the leading

Does the lucene can read the index file from solr?

2012-04-11 Thread neosky
Both are version 3.5. I have confirmed that Solr can read an index file created by Lucene, but I tried to use Lucene to read the Solr index file on a specific field. It returns results when I do the *.* search

which approach is correct?

2012-04-09 Thread neosky
Here are my fields: <field name="id">101</field><field name="sequence">NGHGJGKGKLHJFKGJGKGK</field> The sequence field is from 300 bytes to 56K bytes, with no spaces. I want n-grams from 3 to 8: NGH GHG HGJ ... NGHG GHGJ HGJG ... ... <fieldType name="nGram1" class="solr.TextField" positionIncrementGap="100">
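
The expected tokens listed in the post can be reproduced with a small sketch. This is plain Python that mimics the listing (all grams of one size, then the next size); it is not Solr's tokenizer, and `char_ngrams` is a hypothetical helper.

```python
from typing import Iterator


def char_ngrams(text: str, min_gram: int, max_gram: int) -> Iterator[str]:
    """Yield every substring of length min_gram..max_gram, shortest sizes first."""
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


sequence = "NGHGJGKGKLHJFKGJGKGK"  # the sample sequence from the post
grams = list(char_ngrams(sequence, 3, 8))
print(grams[:3])     # ['NGH', 'GHG', 'HGJ']
print(grams[18:21])  # ['NGHG', 'GHGJ', 'HGJG']
```

Running the same function over a few real sequences is a cheap way to check what a 3-to-8 configuration should produce before comparing it with the analysis output in Solr.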

Re: Two questions about the Ngramtokenizerfactory

2012-04-08 Thread neosky
neosky wrote: I use Solr version 3.5. 1. It seems that the NGramTokenizerFactory only tokenizes the first 1024 characters. I searched for the problem on the Internet; somebody noticed the bug in 2007, but I can't find the solution. PS: my max field length has been modified

Two questions about the Ngramtokenizerfactory

2012-04-07 Thread neosky
I use Solr version 3.5. 1. It seems that the NGramTokenizerFactory only tokenizes the first 1024 characters. I searched for the problem on the Internet; somebody noticed the bug in 2007, but I can't find the solution. PS: my max field length has been modified: <maxFieldLength>5</maxFieldLength> This

Re: A little onfusion with maxPosAsterisk

2012-04-06 Thread neosky
great! thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/A-little-onfusion-with-maxPosAsterisk-tp3889226p3890776.html Sent from the Solr - User mailing list archive at Nabble.com.

How to store the secondary results from the Solr

2012-04-05 Thread neosky
Because the first query result doesn't meet my requirement, I have to do a secondary process manually, based on the full results of the first query. Only after I finish the secondary process do I begin to show the results to the end user a few records at a time (for instance, as Solr does, 10 records at a time) one

A little onfusion with maxPosAsterisk

2012-04-05 Thread neosky
maxPosAsterisk - maximum position (1-based) of the asterisk wildcard ('*') that triggers the reversal of query term. Asterisk that occurs at positions higher than this value will not cause the reversal of query term. Defaults to 2, meaning that asterisks on positions 1 and 2 will cause a reversal.
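
The quoted rule can be sketched as a small predicate. This models only the sentence above; the real filter factory has other parameters that also influence the decision, and `asterisk_triggers_reversal` is a hypothetical name, not a Solr API.

```python
def asterisk_triggers_reversal(term: str, max_pos_asterisk: int = 2) -> bool:
    """True if the first '*' sits at a 1-based position <= max_pos_asterisk."""
    pos = term.find("*")  # 0-based index of the first asterisk, -1 if absent
    return pos != -1 and pos + 1 <= max_pos_asterisk


print(asterisk_triggers_reversal("*abc"))  # True  (asterisk at position 1)
print(asterisk_triggers_reversal("a*bc"))  # True  (asterisk at position 2)
print(asterisk_triggers_reversal("ab*c"))  # False (asterisk at position 3)
```

Checking only the first asterisk is enough for this rule, because any later asterisk sits at a higher position.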

How to return a result with multiple query?

2012-04-03 Thread neosky
1. I used 5-gram tokens in my sequence field, and I search as follows: http://192.168.52.137:8983/solr/select?indent=on&defType=dismax&version=2.2&q=sequence:N%20sequence:N%20sequence:G&fq=&start=0&rows=10&fl=*,score&qt=&wt=&explainOther=&hl=on&hl.fl=sequence I want to return a document with

Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread neosky
Does anyone know whether it is a bug or not? I use n-grams in my index. <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.NGramTokenizerFactory" minGramSize="5" maxGramSize="5"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> <analyzer

Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread neosky
My current version is Solr 3.5, which should be the latest. -- View this message in context: http://lucene.472066.n3.nabble.com/Why-my-highlights-are-wrong-one-character-offset-tp3860286p3862872.html Sent from the Solr - User mailing list archive at Nabble.com.

Why my highlights are wrong(one character offset)?

2012-03-26 Thread neosky
All of my highlights are off by one character in the offset. Here are some fragments from my response. Thanks! <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">259</int> <lst name="params"> <str name="explainOther"/> <str name="indent">on</str> <str name="hl.fl">sequence</str> <str name="wt"/> <str

Re: Does the Solr provide hightlight token position in the field?

2012-03-19 Thread neosky
Thanks! I looked at the API carefully before, but I was not very sure. So it seems that the highlighter might not be helpful. I am considering an alternative solution for this problem. To explain exactly what I want: for instance, I got a candidate record from my query: RVCES (I implemented a 5-gram index)

Does the Solr provide hightlight token position in the field?

2012-03-18 Thread neosky
Can the highlighter provide the exact position of the query match? For instance: MSAQLRKPTA*RVCES*CGRAEHWDDDLEAWQIARTDGTKQVGSPHCLHEWDINGNFNPVAMDD I want to know the position of R in the highlighted token. I want to do the secondary query based on that position. Thanks!
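
One client-side way to get the position, sketched under the assumption that the highlighted fragment marks the match with a pair of asterisks (as in the example above) and that the sequence itself never contains an asterisk. `highlight_offsets` is a hypothetical helper, not part of Solr's highlighter.

```python
from typing import List, Tuple


def highlight_offsets(marked: str, marker: str = "*") -> Tuple[str, List[Tuple[int, int]]]:
    """Strip the markers and return (plain_text, [(start, end), ...]).

    Offsets are 0-based and end-exclusive, measured in the unmarked text.
    """
    plain: List[str] = []
    spans: List[Tuple[int, int]] = []
    start = None
    for ch in marked:
        if ch == marker:
            if start is None:
                start = len(plain)  # opening marker
            else:
                spans.append((start, len(plain)))  # closing marker
                start = None
        else:
            plain.append(ch)
    return "".join(plain), spans


marked = "MSAQLRKPTA*RVCES*CGRAEHWDDDLEAWQIARTDGTKQVGSPHCLHEWDINGNFNPVAMDD"
plain, spans = highlight_offsets(marked)
print(spans)               # [(10, 15)]
print(plain[10:15])        # RVCES
print(spans[0][0] + 1)     # 11, the 1-based position of R
```

This only recovers positions inside the returned fragment; if the highlighter returns a snippet rather than the whole field, the offset of the snippet within the field would still have to be found separately.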

Re: How to avoid the unexpected character error?

2012-03-16 Thread neosky
I am sorry, but I don't understand what you mean. I tried the HTMLStripCharFilter and PatternReplaceCharFilter. They don't work. Could you give me an example? Thanks! <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100"> <analyzer> <charFilter

How to avoid the unexpected character error?

2012-03-14 Thread neosky
I use XML to index the data. One field might contain some characters like '' =. It seems that this produces the error. I changed that field so that it is not indexed, but it doesn't work. I need to store the field, but it does not have to be indexed. Thanks!

Re: How to avoid the unexpected character error?

2012-03-14 Thread neosky
Thanks! Does the schema.xml support this parameter? I am using the example post.jar to index my file. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3825959.html Sent from the Solr - User mailing list archive at

Does the lucene support the substring search?

2012-03-11 Thread neosky
Thank you! I now use awk to preprocess it, and it seems quite efficient. I think other scripting languages would also be helpful. Returning to the post, I would like to know whether Lucene supports substring search or not. As you can see, one field of my document is a long string field

How to index a single big file?

2012-03-10 Thread neosky
Hello, I have a great challenge here. I have a big file (1.2 GB) with more than 200 million records that I need to index. It might later grow to more than 9 GB, with more than 1000 million records. One record contains 3 fields. I am quite new to Solr and Lucene, so I have some questions: 1. It seems that Solr