Re: searching using the CJKAnalyzer
Jon Schuster wrote:
I didn't need to make any changes to Entities to get Japanese searches working. Are you using the CJKAnalyzer when you perform the search, not only when building the index?

Yes, I use CJKAnalyzer all around. When searching, I translate character entities in order to find anything. When displaying search results, I don't see anything that looks like part of an East Asian character set; instead I see accented Latin characters and mathematical symbols. By the way, when I don't pass entities, things get really nasty:

query passed: ?? char(, LATIN_1_SUPPLEMENT) char(?, LATIN_1_SUPPLEMENT)
token found : length: 1 char(?, LATIN_1_SUPPLEMENT) char(, LATIN_1_SUPPLEMENT) char(, LATIN_1_SUPPLEMENT)
token found : length: 1 char(, LATIN_1_SUPPLEMENT)
searching contents:

This was a query for two Japanese characters.

-Original Message-
From: Daan Hoogland [mailto:[EMAIL PROTECTED]
Sent: Sunday, October 10, 2004 10:48 PM
To: Lucene Users List
Subject: Re: searching using the CJKAnalyzer
Importance: Low

Che Dong wrote:
This seems to be not an analyzer problem but an HTML parser charset detection error. Could you show me the details of the problem?

Thanks Che, I got it working by making the decode() method of the Entities class in the demo public. I wrote a scanner to translate any entities in the query. I want to translate back to entities in the results, but I'm not sure what the criteria should be. It seems to be just binary data. How to conclude that 04?03?04 means ?

Thanks
Che Dong

Daan Hoogland wrote:
LS, in http://issues.apache.org/eyebrowse/ReadMsg?listId=30msgNo=8980 Jon Schuster explains how to get a Japanese search system working. I followed his advice and got an index that Luke shows as what I expected it to be. I don't know how to enter a search so that it gets passed to the engine properly. It works in Luke but not in WebLucene or in my own app.
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. ASML is neither liable for the proper and complete transmission of the information contained in this communication, nor for any delay in its receipt.
Re: Indexing Strategy for 20 million documents
--- Christoph Kiehl [EMAIL PROTECTED] wrote:

Otis Gospodnetic wrote:
I would try putting everything in a single index first, and split it up only if I see performance issues.

Why would you put everything into a single index? I found some benchmark results on the list (starting with your post from 06/08/04) from which I got the impression that the performance loss is very small if I choose to search in multiple indexes with MultiSearcher instead of using one big index.

I think it's simpler to deal with a single index. One directory, one set of lock files, etc. If you don't gain anything by having multiple indices, why have them?

Going from 1 index to N indices is not a lot of work (not a lot of Lucene-related code).

How do you get from 1 index to N indices without adding the documents again?

Yes, you would have to re-create N Lucene indices.

Otis
Re: searching using the CJKAnalyzer
CJKAnalyzer does not support a single-byte stream; the front-end interface and the back-end indexing process need to transform the source into a double-byte character stream properly before searching or indexing. Please let me know the output of http://www.chedong.com/tech/HelloUnicode.java when compiled with javac -encoding gb2312 and with javac -encoding iso-8859-1.

Regards
Che Dong
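The LATIN_1_SUPPLEMENT characters reported in this thread are what one would expect when multi-byte text is decoded with a single-byte charset. The following is a minimal sketch of that effect, assuming the query bytes were UTF-8 and were decoded as ISO-8859-1; the thread does not say which encodings were actually involved, and the class name CharsetDemo and the sample string are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    /** Describes each char of s as char(U+XXXX, UNICODE_BLOCK). */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("char(U+%04X, %s)", (int) c, Character.UnicodeBlock.of(c)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Japanese characters, written as escapes so the source file encoding does not matter.
        String query = "\u65E5\u672C";
        byte[] utf8 = query.getBytes(StandardCharsets.UTF_8);

        // Assumption: the receiving side decodes the bytes with the wrong (single-byte) charset.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        // Decoding with the charset that was used for encoding restores the original text.
        String right = new String(utf8, StandardCharsets.UTF_8);

        System.out.println("wrong: " + describe(wrong));
        System.out.println("right: " + describe(right));
    }
}
```

With the wrong charset, every byte becomes its own char, so the analyzer never sees a CJK character at all. Fixing the decoding step, rather than the analyzer, is the point Che Dong makes above.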
Multisearcher question
Hi,

Index side information:

No. of indexes: two (for clarity, I call these index_a and index_b).
Fields in index_a: x and y.
Fields in index_b: y and z.

I have written multi-search code like this:

    Searcher search_a = new IndexSearcher(LOCATION_OF_INDEX_A);
    Searcher search_b = new IndexSearcher(LOCATION_OF_INDEX_B);
    Searcher[] searchers = new Searcher[2];
    searchers[0] = search_a;
    searchers[1] = search_b;
    MultiSearcher searcher = new MultiSearcher(searchers);

I am getting the following results:

x:query - WORKS
x:query AND y:query - WORKS
x:query AND z:query - DOESN'T WORK

Is this expected behavior? My question is: can MultiSearcher be used to search indexes with different fields? If yes, could you please correct the above code?

Thanks,
-Sreedhar
Special field values
Hi everybody,

I am thinking about extending the Lucene search with metadata in the following way:

Field   Value
-----   -----
Title   (n1, n2, n3, ..., nm) | ni element of {0,1}, where m is the number of distinct metadata values for title

Expressed informally, I want to store a tuple of values in a field. The values in the tuple show whether a value is used in the title or not. My question is whether I have to code that on my own or whether the model is already set up to work like that.

Thanks,
Michael
Re: Special field values
Hello Michael,

This is something you'd have to code on your own.

Otis
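As a rough illustration of what "coding it on your own" could look like, the sketch below computes such a 0/1 tuple for a title. It is only one reading of the question: the vocabulary list, the whitespace tokenization, the lower-casing, and the name TitleFlags are all assumptions, and nothing here is part of Lucene.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TitleFlags {
    /**
     * Returns one '0' or '1' per vocabulary entry, in vocabulary order:
     * '1' if the (lower-case) entry occurs as a word in the title, else '0'.
     */
    static String encode(List<String> vocabulary, String title) {
        // Assumption: words are separated by whitespace and compared case-insensitively.
        Set<String> words = new HashSet<>(Arrays.asList(title.toLowerCase().split("\\s+")));
        StringBuilder flags = new StringBuilder();
        for (String value : vocabulary) {
            flags.append(words.contains(value) ? '1' : '0');
        }
        return flags.toString();
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary of m = 3 distinct metadata values.
        List<String> vocabulary = Arrays.asList("lucene", "search", "index");
        System.out.println(encode(vocabulary, "Lucene index tips"));
    }
}
```

The resulting string could then be stored as an ordinary field value next to the title when the document is indexed.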
Re: Multisearcher question
Hello Sreedhar,

This is the expected behaviour. The query is run against each index separately, and x:query AND z:query won't have any matches in either index, because neither index has both fields.

Otis
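The behaviour described above can be reproduced with a toy model that does not use Lucene at all. In this sketch a "document" is just the set of field names in which the query term occurs, and the whole query is evaluated inside each index before the hit counts are added up. The class PerIndexAnd and its methods are invented for illustration; they are not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PerIndexAnd {
    /** Builds a toy document: the set of field names in which the query term occurs. */
    static Set<String> fields(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Counts the documents of one index that contain the term in ALL required fields. */
    static int search(List<Set<String>> index, String... requiredFields) {
        int hits = 0;
        for (Set<String> doc : index) {
            if (doc.containsAll(Arrays.asList(requiredFields))) {
                hits++;
            }
        }
        return hits;
    }

    /** Runs the same query against each index on its own and adds up the hits. */
    static int multiSearch(List<List<Set<String>>> indexes, String... requiredFields) {
        int total = 0;
        for (List<Set<String>> index : indexes) {
            total += search(index, requiredFields);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Set<String>> indexA = Collections.singletonList(fields("x", "y")); // index_a: x and y
        List<Set<String>> indexB = Collections.singletonList(fields("y", "z")); // index_b: y and z
        List<List<Set<String>>> both = new ArrayList<>();
        both.add(indexA);
        both.add(indexB);

        System.out.println("x:query             -> " + multiSearch(both, "x"));
        System.out.println("x:query AND y:query -> " + multiSearch(both, "x", "y"));
        System.out.println("x:query AND z:query -> " + multiSearch(both, "x", "z"));
    }
}
```

Because no single document carries both x and z, the last query finds nothing in either index, which matches the results reported in the question.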
RE: Multisearcher question
Thanks, Otis, for your reply. If I want to solve the problem that I described in my previous mail, what is the suggested approach?

Thanks,
-Sreedhar

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 12, 2004 6:35 PM
To: Lucene Users List
Subject: Re: Multisearcher question

Hello Sreedhar,

This is the expected behaviour. The query is run against each index, and it won't have any matches in either index, because neither index has both fields.

Otis
SearchBlox J2EE Search Component Version 2.0 released
SearchBlox is a J2EE Search Component that delivers out-of-the-box search functionality for fast and easy implementation with your websites, applications, intranets and portals. SearchBlox uses the Lucene Search API and incorporates integrated HTTP/HTTPS and File System crawlers, support for various document formats including HTML, Word, PDF, PowerPoint and Excel, support for indexing and searching content in 18 languages and customizable search results, all controlled from a browser-based Admin Console.

Main features in this release:
==============================

- Advanced Search: search by file format, language, keyword occurrence and modified date
- Keyword-in-Context Display: search results are displayed with areas of content where the keyword occurs
- Upgrade to Lucene 1.4.2
- Performance and stability improvements
- Bug fixes

SearchBlox is available as a Web Archive (WAR) and is deployable on any Servlet 2.3/JSP 1.2 compliant server. SearchBlox Getting-Started Guides are available for the following servers:

JBoss - http://www.searchblox.com/gettingstarted_jboss.html
Jetty - http://www.searchblox.com/gettingstarted_jetty.html
JRun - http://www.searchblox.com/gettingstarted_jrun.html
Oracle - http://www.searchblox.com/gettingstarted_oracle.html
Pramati - http://www.searchblox.com/gettingstarted_pramati.html
Resin - http://www.searchblox.com/gettingstarted_resin.html
Sun - http://www.searchblox.com/gettingstarted_sun.html
Tomcat - http://www.searchblox.com/gettingstarted_tomcat.html
Weblogic - http://www.searchblox.com/gettingstarted_weblogic.html
Websphere - http://www.searchblox.com/gettingstarted_websphere.html

SearchBlox is also available as SearchBlox Server. The Server is an integrated application incorporating everything you need to run SearchBlox. The Server includes the SearchBlox J2EE Component, the Jetty Application Server and the Java Runtime Environment (JRE) 1.4. With the SearchBlox Server, there are no additional software requirements to deploy SearchBlox.

The SearchBlox FREE Edition is available free of charge and can index up to 1000 documents. The software can be downloaded from http://www.searchblox.com
Re: indexing numeric entities?
Yes, you need to parse the entities yourself. I implemented an HTML entity parser as part of the http://objectledge.org project. You may use it if it fits your needs. It is in the ledge-components project module. See http://objectledge.org/modules/ledge-components/index.html

Have fun,
--
Damian Gajda
Caltha Sp. j.
http://www.caltha.pl/
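For readers who only need numeric character references and would rather not pull in a full parser, here is a minimal sketch of decoding them by hand before the text is handed to an analyzer. It is not the objectledge parser mentioned above; the class name NumericEntities is made up, and named entities such as &amp;amp; are deliberately left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntities {
    // Matches hexadecimal (&#x672C;) and decimal (&#26085;) character references.
    private static final Pattern ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    /** Replaces numeric character references with the characters they denote. */
    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: leave the reference untouched
            try {
                int codePoint = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // Number too large for an int: keep the original text.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#26085;&#x672C; &amp; more"));
    }
}
```

The same decoding would have to be applied both to the documents at indexing time and to the query string at search time, so that both sides produce the same tokens.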
RE: indexing numeric entities?
-Original Message-
From: Damian Gajda [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 12, 2004 10:23 AM
To: Lucene Users List
Subject: Re: indexing numeric entities?

Yes, you need to parse the entities yourself. I implemented an HTML entity parser as part of the http://objectledge.org project. You may use it if it fits your needs. It is in the ledge-components project module. See http://objectledge.org/modules/ledge-components/index.html

Have fun,
--
Damian Gajda
Caltha Sp. j.
http://www.caltha.pl/
Re: Special field values
On Tuesday 12 October 2004 15:02, Otis Gospodnetic wrote:
Hello Michael, This is something you'd have to code on your own. Otis

--- Michael Hartmann [EMAIL PROTECTED] wrote:
Hi everybody, I am thinking about extending the Lucene search with metadata in the following way:

Field   Value
-----   -----
Title   (n1, n2, n3, ..., nm) | ni element of {0,1}, where m is the number of distinct metadata values for title

Expressed informally, I want to store a tuple of values in a field. The values in the tuple show whether a value is used in the title or not.

A Lucene index can easily be used to determine whether or not a term is in a field of a document:

IndexReader.open(indexName).termDocs(new Term(field, term)).skipTo(documentNr)

returns the boolean indicating that. What do you need the {0,1} values for?

Regards,
Paul Elschot.
Re: Multisearcher question
I think what Sreedhar is asking for is the capability to form a join across multiple indices, and if so, I could certainly use that capability myself. However, I think Lucene's logic focuses only on a single query, so I doubt that's easily done.

- Original Message -
From: Otis Gospodnetic
To: Lucene Users List
Sent: Tuesday, October 12, 2004 9:04 AM
Subject: Re: Multisearcher question

Hello Sreedhar,

This is the expected behaviour. The query is run against each index, and it won't have any matches in either index, because neither index has both fields.

Otis
Re: Special field values
On Tuesday 12 October 2004 19:27, Paul Elschot wrote:
IndexReader.open(indexName).termDocs(new Term(field, term)).skipTo(documentNr) returns the boolean indicating that.

Well, almost. When it returns true, one still needs to check that the TermDocs is positioned at documentNr.

Paul Elschot
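The extra check is needed because, as far as I understand the TermDocs contract, skipTo moves to the first entry whose document number is greater than or equal to the target, so a true result only says that some later document contains the term. The sketch below models that contract with a plain sorted array instead of a real Lucene index; PostingList and termInDoc are hypothetical names used only here.

```java
final class PostingList {
    private final int[] docs; // sorted document numbers that contain the term
    private int pos = -1;     // current position; -1 means "before the first entry"

    PostingList(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Advances to the first entry beyond the current one with number >= target; false if none. */
    boolean skipTo(int target) {
        do {
            pos++;
        } while (pos < docs.length && docs[pos] < target);
        return pos < docs.length;
    }

    /** Document number at the current position; only valid after skipTo returned true. */
    int doc() {
        return docs[pos];
    }

    /** The combined check: skipTo must succeed AND must have landed exactly on documentNr. */
    static boolean termInDoc(PostingList postings, int documentNr) {
        return postings.skipTo(documentNr) && postings.doc() == documentNr;
    }

    public static void main(String[] args) {
        // The term occurs in documents 2, 5 and 9.
        System.out.println(termInDoc(new PostingList(2, 5, 9), 5)); // lands exactly on 5
        System.out.println(termInDoc(new PostingList(2, 5, 9), 6)); // skipTo succeeds but lands on 9
    }
}
```

With a real index, the same pattern would presumably read termDocs.skipTo(documentNr) && termDocs.doc() == documentNr.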
sorting and score ordering
If I use a Sort instance on my searcher, what will have priority: score or sort? Assuming I have pages with scores of .9, .9, and .5, if the .5 page has a higher 'sort' value, will it be returned ahead of the .9 pages if their sort values are lower?

--
___
Chris Fraschetti
e [EMAIL PROTECTED]
Problem indexing
Hi, I have a problem indexing in the path C:\TXT\DOC\, but indexing in the path C:\TXT works fine. What is the problem?

P.S. If anybody on the list speaks Spanish, please reply. Thanks.

--
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277
Re: sorting and score ordering
As far as my testing showed, the sort will take priority, because it's basically an opt-in sort as opposed to the default score sort. So you're basically displaying a sorted set over all your results as opposed to sorting the most relevant results.

Hope this helps
Nader Henein
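The effect described above can be illustrated without Lucene. In this sketch the hits are ordered purely by their sort value, and the score plays no part in the ordering. The Hit class, the descending direction of the sort, and the sample values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortPriority {
    /** A toy search hit: an id, a relevance score, and the value of the sort field. */
    static final class Hit {
        final String id;
        final double score;
        final int sortValue;

        Hit(String id, double score, int sortValue) {
            this.id = id;
            this.score = score;
            this.sortValue = sortValue;
        }
    }

    /** Orders hits by sortValue, highest first; the score is not consulted at all. */
    static List<String> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.sortValue).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("a", 0.9, 10));
        hits.add(new Hit("b", 0.9, 20));
        hits.add(new Hit("c", 0.5, 30)); // lowest score, highest sort value
        System.out.println(order(hits));
    }
}
```

Here the page with the .5 score comes first because its sort value is the highest, which is the situation asked about in the question.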