Re: Indexing TIKA extracted text. Are there some issues?

2009-07-29 Thread ashokc
extracted doc: http://www.nabble.com/file/p24728917/china.tika.xml (china.tika.xml). Grant Ingersoll wrote: Hmm, looks very much like an encoding problem. Can you post a sample showing it, along with the commands you invoked? Thanks, Grant. On Jul 28, 2009, at 6:14 PM, ashokc wrote

Re: Indexing TIKA extracted text. Are there some issues?

2009-07-29 Thread ashokc
On Jul 28, 2009, at 6:14 PM, ashokc wrote: I am finding that the search results based on indexing Tika-extracted text are very different from results based on indexing the text extracted via other means. This shows up, for example, with a Chinese web site that I am trying to index. I created

Indexing TIKA extracted text. Are there some issues?

2009-07-28 Thread ashokc
I am finding that the search results based on indexing Tika-extracted text are very different from results based on indexing the text extracted via other means. This shows up, for example, with a Chinese web site that I am trying to index. I created the documents (for posting to SOLR) in two ways.
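The reply from Grant Ingersoll in this thread suspects an encoding problem. A minimal sketch of one way to rule that out, assuming the extracted text is supposed to be UTF-8 and that updates are posted to a local `/solr/update` URL (both are assumptions, not details from the thread): decode the bytes strictly so mis-encoded text fails loudly, then label the request body's charset explicitly instead of relying on a default.

```python
import urllib.request

# Hypothetical update URL; adjust host, port and path to the actual setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_bytes: bytes) -> urllib.request.Request:
    """Check that the update body is valid UTF-8 and declare that charset.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8, which is a
    quick way to catch a mis-encoded extraction before it is indexed.
    """
    xml_bytes.decode("utf-8")  # strict decode: fails fast on bad bytes
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


doc = '<add><doc><field name="id">china-1</field><field name="content">中国</field></doc></add>'
req = build_update_request(doc.encode("utf-8"))
# urllib.request.urlopen(req) would send it; that step is left out here.
```

Running both extraction paths through the same check makes it easier to see whether one of them produces bytes in a different encoding (for example GBK) that only looks like UTF-8 further down the pipeline.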

Re: CJKTokenizerFactory seems to work for Korea but not for China and Japan

2009-07-01 Thread ashokc
Yes, I reindexed the entire repository after each of my changes. Here is the output with debug on. == DEBUG OUTPUT BEGIN == <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">83</int> <lst name="params"> <str name="wt">standard</str> <str name="rows">10</str>

CJKTokenizerFactory seems to work for Korea but not for China and Japan

2009-06-30 Thread ashokc
Hi, I have the following fieldType that processes Korean/Chinese/Japanese text: <fieldType name="cjk_text" class="solr.TextField"> <analyzer type="index"> <tokenizer class="solr.CJKTokenizerFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.CJKTokenizerFactory"/>
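When debugging CJK matching it can help to know roughly what tokens the analyzer emits. CJKTokenizer is commonly described as splitting runs of CJK characters into overlapping two-character tokens (bigrams). The sketch below approximates only that idea for a single run of CJK characters; it is not the Lucene implementation and ignores Latin text, punctuation and other details.

```python
def cjk_bigrams(text: str) -> list[str]:
    """Approximate CJK bigramming: overlapping two-character tokens.

    Assumes `text` is a single run of CJK characters; a lone character
    is emitted as-is and an empty string yields no tokens.
    """
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(cjk_bigrams("中华人民"))
```

Because the same tokenizer is used at index and query time in the fieldType above, a query is matched bigram by bigram, so comparing these token lists for the query and a document that "should" match is a reasonable first check alongside the debug output.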

copyfield and 'store' and highlighting

2009-06-10 Thread ashokc
Hi, I copy 'field1' to 'field2' so that I can apply a different set of analyzers and filters. Content-wise, they are identical. 'field2' has to be stored because it is used for highlighting. Do I have to declare 'field1' as stored too? 'field1' is never returned in the response. Thanks. - ashok

qf boost Versus field boost for Dismax queries

2009-06-09 Thread ashokc
When 'dismax' queries are used, where is the best place to apply boost values/factors? While indexing, by supplying the 'boost' attribute on the field, or in solrconfig.xml, by specifying the 'qf' parameter with the same boosts? What are the advantages/disadvantages of each? What happens if both
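For comparison, a minimal sketch of the query-time option: the boosts travel in the `qf` parameter of each request. The field names, weights and the use of `defType=dismax` (rather than a handler preconfigured for dismax) are assumptions for illustration.

```python
from urllib.parse import urlencode


def dismax_query_string(user_query: str) -> str:
    """Build request parameters that apply field boosts at query time via qf."""
    params = {
        "defType": "dismax",                # assumes the Solr version honors defType
        "q": user_query,
        "qf": "title^2.0 url^1.5 content",  # hypothetical fields and weights
        "debugQuery": "on",                 # score explanation shows the boosts
    }
    return urlencode(params)


print(dismax_query_string("wireless modem"))
```

The trade-off in short: an index-time field boost is folded into that field's norm when the document is indexed, so changing it means reindexing (and it is lost if norms are omitted), while `qf` weights can be tuned per request. If both are used, both contribute to the score.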

How to disable posting updates from a remote server

2009-06-04 Thread ashokc
Hi, I find that I am freely able to post to my production SOLR server, from any other host that can run the post command. So somebody can wipe out the whole index by posting a delete query. Is there a way SOLR can be configured so that it will take updates ONLY from the server on which it is

Highlighting and Field options

2009-06-01 Thread ashokc
Hi, The 'content' field that I am indexing is usually large (e.g. a PDF doc a few MB in size). I need highlighting to be on. This 'seems' to require that I set the 'content' field to be STORED. This returns the whole content field in the search result XML for each matching document.
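Highlighting does need the field to be stored, but storing it does not force it into every result. The `fl` parameter selects which stored fields come back in the document list, while `hl.fl` names the fields to build snippets from, and the snippets are returned in a separate highlighting section. A minimal sketch, where `id` and `title` are hypothetical field names and the snippet settings are arbitrary:

```python
from urllib.parse import urlencode


def highlight_params(user_query: str) -> dict:
    """Request parameters that highlight 'content' without returning it whole."""
    return {
        "q": user_query,
        "fl": "id,title,score",  # hypothetical field list that leaves out 'content'
        "hl": "true",
        "hl.fl": "content",      # snippets still come from the stored 'content'
        "hl.snippets": "3",
        "hl.fragsize": "100",
    }


print(urlencode(highlight_params("power management")))
```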

Re: Boosting by facets with standard query

2009-04-19 Thread ashokc
AM, ashokc ash...@qualcomm.com wrote: What we need is for the white_papers PDFs to be boosted, but if and only if such documents are valid results for the search term in question. How would I write my above 'q' to accomplish that? Thanks for explaining in detail. Basically, all you
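One standard-syntax pattern for "boost, but only among valid results" is to make the user's query a required clause and add the facet value as an optional, boosted clause. A minimal sketch; the field name `doc_type`, the value and the boost are hypothetical, and it assumes the default OR operator (with `defaultOperator="AND"` the unprefixed clause would become required as well).

```python
def boost_facet_value(user_query: str, field: str, value: str, boost: float) -> str:
    """Boost docs with field:value, but only among docs matching user_query.

    The user's query is a required (+) clause, so it alone decides which
    documents match; the facet clause is optional and only adds score.
    """
    return f"+({user_query}) {field}:{value}^{boost}"


print(boost_facet_value("wireless modem", "doc_type", "white_papers", 5.0))
```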

Re: Boosting by facets with standard query

2009-04-17 Thread ashokc
that? Thanks - ashok Shalin Shekhar Mangar wrote: On Fri, Apr 17, 2009 at 1:03 AM, ashokc ash...@qualcomm.com wrote: I have a query that yields results binned in several facets. How can I boost the results that fall in certain facets over the rest of them that do not belong to those

Boosting by facets with standard query

2009-04-16 Thread ashokc
I have a query that yields results binned in several facets. How can I boost the results that fall in certain facets over the rest of them that do not belong to those facets? I use the standard query format. Thank you - ashok

DIH uniqueKey

2009-04-14 Thread ashokc
Hi, I have separate JDBC datasources (DS1 and DS2) that I want to index with DIH in a single SOLR instance. The unique record keys for the two sources are different. Do I have to synthesize a uniqueKey that spans both the datasources? Something like this? That is, the uniqueKey values will be like (+
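A common way to get one uniqueKey across two sources is to prefix each record's own key with its source name, so that id 42 from DS1 and id 42 from DS2 stay distinct. The sketch below only illustrates that key scheme; the separator and helper names are hypothetical, and inside DIH the same value would more likely be produced in the SQL (for example by string concatenation) or by a transformer.

```python
def composite_key(source: str, pk: object) -> str:
    """Prefix a record's own primary key with its data source name."""
    if "-" in source:
        raise ValueError("source name must not contain the '-' separator")
    return f"{source}-{pk}"


def split_key(key: str) -> tuple[str, str]:
    """Recover (source, original key); splits on the first separator only."""
    source, _, pk = key.partition("-")
    return source, pk


print(composite_key("DS1", 42), composite_key("DS2", 42))
```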

Re: More than one language in the same document

2009-04-07 Thread ashokc
What I am doing right now is to capture all the content under content_korea, for example, and use 'copyField' to duplicate that content to content_english. content_korea gets processed with CJK analyzers, and content_english gets processed with the usual detailed index/query analyzers, filters and synonyms.
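With the content duplicated into two differently analyzed fields, the query side has to search both. A minimal sketch in standard query syntax, assuming the two field names from the post; each field applies its own query analyzer to the same user text, and a document matches if either copy matches.

```python
def bilingual_query(user_text: str) -> str:
    """Search both analyzed copies of the same content.

    content_korea is assumed to use CJK analysis and content_english the
    usual English chain; a document matches if either copy matches.
    """
    group = f"({user_text})"
    return f"content_korea:{group} OR content_english:{group}"


print(bilingual_query("서울 office"))
```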

Re: Oracle Clob column with DIH does not turn to String

2009-04-04 Thread ashokc
not always be in uppercase; it can be in mixed case as well. On Sat, Apr 4, 2009 at 12:58 AM, ashokc ash...@qualcomm.com wrote: Happy to report that it is working. Looks like we have to use UPPER CASE for all the column names. When I examined the map 'aRow', it had the column names in upper case

Re: Multi-valued fields with DIH

2009-04-04 Thread ashokc
That worked. Thanks again. Noble Paul നോബിള്‍ नोब्ळ् wrote: the column names are case sensitive, try this: <field column="PROJECT_AREA" name="projects" /> <field column="PROJECT_VERSION" name="projects" /> On Sat, Apr 4, 2009 at 3:58 AM, ashokc ash...@qualcomm.com wrote: Hi, I need

Re: Oracle Clob column with DIH does not turn to String

2009-04-03 Thread ashokc
that is the easiest --Noble On Fri, Apr 3, 2009 at 9:35 AM, ashokc ash...@qualcomm.com wrote: That would require me to recompile (with ant/maven scripts?) the source and replace the jar for DIH, right? I can try - for the first time. - ashok Noble Paul നോബിള്‍  नोब्ळ् wrote: This looks strange

Re: Oracle Clob column with DIH does not turn to String

2009-04-03 Thread ashokc
wrong with your setup. Can you just paste the whole data-config.xml? --Noble On Fri, Apr 3, 2009 at 5:39 PM, ashokc ash...@qualcomm.com wrote: Noble, I put in a few 'System.out.println' statements in the ClobTransformer.java file and remade the war. But I see none of these prints coming up

Re: Oracle Clob column with DIH does not turn to String

2009-04-03 Thread ashokc
behavior with the 'war' that the download came with. Thanks Noble. Noble Paul നോബിള്‍ नोब्ळ् wrote: and which version of Solr are you using? On Fri, Apr 3, 2009 at 10:09 PM, ashokc ash...@qualcomm.com wrote: Sure: data-config.xml: <dataConfig> <dataSource driver

Re: Oracle Clob column with DIH does not turn to String

2009-04-03 Thread ashokc
of clue, why this may happen. I even wrote a testcase and it seems to work fine --Noble On Fri, Apr 3, 2009 at 10:23 PM, ashokc ash...@qualcomm.com wrote: I downloaded the nightly build yesterday (2nd April), modified the ClobTransformer.java file with some prints, compiled it all (ant dist

Multi-valued fields with DIH

2009-04-03 Thread ashokc
Hi, I need to assign multiple values to a field, with each value coming from a different column of the SQL query. My data config snippet has lines like <field column="project_area" name="projects" /> <field column="project_version" name="projects" /> where 'project_area'
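Outside DIH, the equivalent in Solr's XML update format is simply to repeat the field element once per value; the `projects` field must be declared `multiValued="true"` in schema.xml for this to be accepted. A minimal sketch that builds such a message, with hypothetical id and values:

```python
import xml.etree.ElementTree as ET


def add_doc_xml(doc_id: str, projects: list[str]) -> str:
    """Build an <add> message with one 'projects' element per value."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for value in projects:
        # Repeating the same field name adds another value to the multiValued field.
        ET.SubElement(doc, "field", name="projects").text = value
    return ET.tostring(add, encoding="unicode")


print(add_doc_xml("p1", ["Area51", "v2.0"]))
```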

Oracle Clob column with DIH does not turn to String

2009-04-02 Thread ashokc
Hi, I have set up DIH to import some Oracle CLOB columns. I am using the latest nightly release. My config says, But it does not seem to turn this CLOB into a String. The search results show: 1.8670129 oracle.sql.c...@aed3a5 4486 Any pointers on why I do not get

Re: Oracle Clob column with DIH does not turn to String

2009-04-02 Thread ashokc
? Is the nightly war NOT the right one to use? Thanks for your help. - ashok ashokc wrote: Hi, I have set up to import some Oracle CLOB columns with DIH. I am using the latest nightly release. My config says, <entity name="description" transformer="ClobTransformer" ...> <field column="description" clob="true"

Re: Oracle Clob column with DIH does not turn to String

2009-04-02 Thread ashokc
ClobTransformer adding (System.out.println into ClobTransformer may help). On Fri, Apr 3, 2009 at 6:04 AM, ashokc ash...@qualcomm.com wrote: Correcting my earlier post. It lost some lines somehow. Hi, I have set up to import some Oracle CLOB columns with DIH. I am using the latest nightly release

More than one language in the same document

2009-03-26 Thread ashokc
Hi, I have documents where text from two languages, e.g. (English and Korean) or (English and German), is mixed up in a fairly intensive way. 20-30% of the text is in English and the rest in the other language. Can somebody indicate how I should set up the 'analyzers' and 'fields' in schema.xml? Should I have

Re: Highlighting Oddities

2009-02-04 Thread ashokc
I have seen some of these oddities that Chris is referring to. In my case, terms that are NOT in the query get highlighted. For example, searching for 'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms either. Do these filter factories add some extra intelligence to the

Single index - multiple SOLR instances

2009-01-12 Thread ashokc
Hello, Is it possible to have the index created by a single SOLR instance, but have several SOLR instances field the search queries? Or do I HAVE to replicate the index for each SOLR instance that I want to answer queries? I need to set up a fail-over instance. Thanks - ashok

Re: Single index - multiple SOLR instances

2009-01-12 Thread ashokc
, if there is network in the picture. Otis ----- Original Message ----- From: ashokc ash...@qualcomm.com To: solr-user@lucene.apache.org Sent: Monday, January 12, 2009 3:05:40 PM Subject: Single index - multiple SOLR instances

Re: Boost a query by field at query time - Standard Request Handler

2008-12-09 Thread ashokc
Thanks for the reply. I figured there is no simple solution here. I am parsing the query in my code, separating out negations, assertions and such, and building the final SOLR query to issue. I simply use the boost as given by the user. If none is given, I use a default boost for title and url matches.

Re: Merging Indices

2008-12-05 Thread ashokc
to search over. Are there better approaches? Thanks - ashok Yonik Seeley wrote: On Thu, Dec 4, 2008 at 6:39 PM, ashokc [EMAIL PROTECTED] wrote: The SOLR wiki says 3. Make sure both indexes you want to merge are closed. What exactly does 'closed' mean? If you do a commit, and then prevent

Boost a query by field at query time - Standard Request Handler

2008-12-04 Thread ashokc
Here is the problem I am trying to solve. I have to use the Standard Request Handler. Query (can be quite complex, as it gets built from an advanced search form): term1^2.0 OR term2 OR term3 term4 I have 3 fields - content (the default search field), title and url. Any matches in the title or

Merging Indices

2008-12-04 Thread ashokc
The SOLR wiki says: "3. Make sure both indexes you want to merge are closed." What exactly does 'closed' mean? 1. Do I need to stop SOLR search on both indexes before running the merge command? So a brief downtime is required? Or do I simply prevent any 'updates/deletes' to these indices during

solrQueryParser does not take effect - nightly build

2008-11-20 Thread ashokc
Hi, I have set <solrQueryParser defaultOperator="AND"/> but it is not taking effect. It continues to treat it as OR. I am working with the latest nightly build (11/20/2008). For a query like term1 term2, debug shows <str name="parsedquery">content:term1 content:term2</str> Bug? Thanks - ashok
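While the schema setting is being investigated, one workaround that does not depend on `defaultOperator` at all is to mark each term as required with `+` in the query itself (passing `q.op=AND` per request may also work on versions that support that parameter, which is an assumption here). A minimal sketch; note the `+` must be URL-encoded as `%2B`, otherwise it decodes to a space.

```python
from urllib.parse import urlencode


def require_all_terms(terms: list[str]) -> str:
    """Make every term mandatory with '+', independent of defaultOperator."""
    q = " ".join(f"+{t}" for t in terms)
    return urlencode({"q": q, "debugQuery": "on"})


print(require_all_terms(["term1", "term2"]))
```

With debug on, the parsedquery should then show each clause prefixed with `+`, which makes it easy to confirm the AND semantics are in effect.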