http://www.nabble.com/file/p24728917/china.tika.xml china.tika.xml
Grant Ingersoll-6 wrote:
Hmm, looks very much like an encoding problem. Can you post a sample
showing it, along with the commands you invoked?
Thanks,
Grant
On Jul 28, 2009, at 6:14 PM, ashokc wrote:
I am finding that the search results based on indexing Tika-extracted text
are very different from results based on indexing the text extracted via
other means. This shows up, for example, with a Chinese web site that I am
trying to index.
I created the documents (for posting to SOLR) in two ways.
Yes, I reindexed the entire repository after each of my changes. Here is the
output with debug on.
== DEBUG OUTPUT BEGIN ==
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">83</int>
<lst name="params">
<str name="wt">standard</str>
<str name="rows">10</str>
Hi,
I have the following fieldType that processes Korean/Chinese/Japanese text:
<fieldType name="cjk_text" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.CJKTokenizerFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.CJKTokenizerFactory"/>
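For reference, a complete version of the truncated cjk_text type might look like the sketch below. The closing tags are assumed, and the field name content_cjk is hypothetical. Because the index and query analyzers are identical here, a single <analyzer> element without a type attribute would behave the same way.

```xml
<!-- schema.xml sketch: the cjk_text type with the closing tags the excerpt cut off -->
<fieldType name="cjk_text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type -->
<field name="content_cjk" type="cjk_text" indexed="true" stored="false"/>
```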
Hi,
I copy 'field1' to 'field2' so that I can apply a different set of analyzers
and filters. Content-wise, they are identical. 'field2' has to be stored
because it is used for highlighting. Do I have to declare 'field1' as stored
too? 'field1' is never returned in the response. Thanks. - ashok
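A minimal schema.xml sketch of the arrangement described, assuming hypothetical field types text_a and text_b. copyField copies the value sent in the update request, before any analysis, not the stored value, so the source field can be indexed without being stored.

```xml
<!-- field1 is only searched, never returned, so it is not stored.
     text_a and text_b are hypothetical field types. -->
<field name="field1" type="text_a" indexed="true" stored="false"/>
<!-- field2 is stored because highlighting needs its stored text -->
<field name="field2" type="text_b" indexed="true" stored="true"/>

<!-- Copies the raw incoming value of field1 into field2 at index time -->
<copyField source="field1" dest="field2"/>
```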
When 'dismax' queries are used, where is the best place to apply boost
values/factors? While indexing, by supplying the 'boost' attribute on the
field, or in solrconfig.xml, by specifying the 'qf' parameter with the same
boosts? What are the advantages/disadvantages of each? What happens if both
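A sketch of both options, with illustrative field names and boost values. Query-time boosts in qf can be changed in solrconfig.xml without reindexing. Index-time boosts are folded into the field norms, so changing them requires reindexing, and they have no effect on fields declared with omitNorms="true".

```xml
<!-- Option 1, solrconfig.xml: query-time boosts via the dismax qf parameter -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 url^1.5 content</str>
  </lst>
</requestHandler>

<!-- Option 2, update message: an index-time boost on one field of one document -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title" boost="2.0">Example title</field>
  </doc>
</add>
```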
Hi,
I find that I am freely able to post to my production SOLR server, from any
other host that can run the post command. So somebody can wipe out the whole
index by posting a delete query. Is there a way SOLR can be configured so
that it will take updates ONLY from the server on which it is
Hi,
The 'content' field that I am indexing is usually large (e.g. a PDF document
a few MB in size). I need highlighting to be on. This 'seems' to require that
I set the 'content' field to be STORED, which returns the whole content field
in the search result XML for each matching document.
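One way to keep highlighting without sending the whole field back is to limit fl, which controls the stored fields returned for each document; the highlighter still reads the stored content field and returns its snippets in a separate highlighting section of the response. A sketch of request handler defaults, where the handler name and the returned field list are assumptions:

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- Only these fields appear in each result document; content is left out -->
    <str name="fl">id,title,url,score</str>
    <!-- Highlighting still works on the stored content field -->
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
    <!-- 51200 characters is the default; a larger value highlights matches
         deeper in long documents at some CPU cost -->
    <str name="hl.maxAnalyzedChars">51200</str>
  </lst>
</requestHandler>
```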
AM, ashokc ash...@qualcomm.com wrote:
What we need is for the white_papers PDFs to be boosted, but if and only if
such documents are valid results for the search term in question. How
would I write my above 'q' to accomplish that?
Thanks for explaining in detail.
Basically, all you
that?
Thanks
- ashok
Shalin Shekhar Mangar wrote:
On Fri, Apr 17, 2009 at 1:03 AM, ashokc ash...@qualcomm.com wrote:
I have a query that yields results binned in several facets. How can I boost
the results that fall in certain facets over the rest of them that do not
belong to those facets? I use the standard query format. Thank you
- ashok
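One way to boost without changing the result set, assuming the dismax parser, is the bq parameter: it adds an optional clause, so documents in the boosted category score higher only if they already match q. The field doctype and the value white_papers below are hypothetical. With the standard parser and its default OR operator, the equivalent is a query of the form "+(user terms) doctype:white_papers^5", where the leading + keeps the user's query mandatory; if defaultOperator is set to AND, the unprefixed boost clause would become required as well.

```xml
<!-- solrconfig.xml sketch: bq raises scores of matching documents in one
     category without adding new matches. doctype is a hypothetical field. -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">content title</str>
    <str name="bq">doctype:white_papers^5</str>
  </lst>
</requestHandler>
```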
Hi,
I have separate JDBC datasources (DS1 and DS2) that I want to index with DIH
in a single SOLR instance. The unique keys for the two sources are
different. Do I have to synthesize a uniqueKey that spans both
datasources? Something like this? That is, the uniqueKey values will be like
(+
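A sketch of one way to do this with DIH: declare both data sources by name, point each root entity at one of them, and build the key with TemplateTransformer, prefixing it with the source name so the two key spaces cannot collide. The driver, JDBC URLs, credentials, table and column names are placeholders, and the schema is assumed to declare <uniqueKey>id</uniqueKey>.

```xml
<dataConfig>
  <!-- Hypothetical connection details -->
  <dataSource name="ds1" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@host1:1521:DB1" user="dbuser" password="dbpass"/>
  <dataSource name="ds2" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@host2:1521:DB2" user="dbuser" password="dbpass"/>
  <document>
    <!-- Each root entity produces its own documents -->
    <entity name="rec1" dataSource="ds1" transformer="TemplateTransformer"
            query="select RECORD_ID, TITLE from TABLE_ONE">
      <!-- Prefixed key keeps DS1 ids distinct from DS2 ids -->
      <field column="id" template="ds1-${rec1.RECORD_ID}"/>
      <field column="TITLE" name="title"/>
    </entity>
    <entity name="rec2" dataSource="ds2" transformer="TemplateTransformer"
            query="select DOC_KEY, TITLE from TABLE_TWO">
      <field column="id" template="ds2-${rec2.DOC_KEY}"/>
      <field column="TITLE" name="title"/>
    </entity>
  </document>
</dataConfig>
```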
What I am doing right now is to capture all the content under content_korea,
for example, and use 'copyField' to duplicate that content to content_english.
content_korea gets processed with the CJK analyzers, and content_english
gets processed with the usual detailed index/query analyzers, filters, and synonyms.
not always be in uppercase; it can be in mixed case as well.
On Sat, Apr 4, 2009 at 12:58 AM, ashokc ash...@qualcomm.com wrote:
Happy to report that it is working. Looks like we have to use UPPER CASE
for all the column names. When I examined the map 'aRow', it had the column
names in upper case.
That worked. Thanks again.
Noble Paul നോബിള് नोब्ळ् wrote:
The column names are case sensitive. Try this:
<field column="PROJECT_AREA" name="projects" />
<field column="PROJECT_VERSION" name="projects" />
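Putting the thread's findings together, a sketch of the working setup: the destination field is declared multiValued in schema.xml so it can hold one value per mapped column, and the column attributes use the upper-case names Oracle returned. The table name, the ID column and the string type are assumptions.

```xml
<!-- schema.xml: multiValued lets "projects" receive values from two columns -->
<field name="projects" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- data-config.xml: column names in upper case, matching the row map -->
<entity name="project" query="select ID, PROJECT_AREA, PROJECT_VERSION from PROJECTS">
  <field column="ID" name="id"/>
  <field column="PROJECT_AREA" name="projects"/>
  <field column="PROJECT_VERSION" name="projects"/>
</entity>
```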
On Sat, Apr 4, 2009 at 3:58 AM, ashokc ash...@qualcomm.com wrote:
Hi,
I need
that is the easiest
--Noble
On Fri, Apr 3, 2009 at 9:35 AM, ashokc ash...@qualcomm.com wrote:
That would require me to recompile (with ant/maven scripts?) the source and
replace the jar for DIH, right? I can try - for the first time.
- ashok
Noble Paul നോബിള് नोब्ळ् wrote:
This looks strange
wrong with your setup.
can you just paste the whole data-config.xml
--Noble
On Fri, Apr 3, 2009 at 5:39 PM, ashokc ash...@qualcomm.com wrote:
Noble,
I put in a few 'System.out.println' statements in the ClobTransformer.java
file and remade the war. But I see none of these prints coming up
behavior with the 'war' that came with the download. Thanks Noble.
Noble Paul നോബിള് नोब्ळ् wrote:
And which version of Solr are you using?
On Fri, Apr 3, 2009 at 10:09 PM, ashokc ash...@qualcomm.com wrote:
Sure:
data-config.xml
===
<dataConfig>
<dataSource driver
of clue, why this may happen. I
even wrote a testcase and it seems to work fine
--Noble
On Fri, Apr 3, 2009 at 10:23 PM, ashokc ash...@qualcomm.com wrote:
I downloaded the nightly build yesterday (2nd April), modified the
ClobTransformer.java file with some prints, compiled it all (ant dist
Hi,
I need to assign multiple values to a field, with each value coming from a
different column of the sql query.
My data config snippet has lines like
<field column="project_area" name="projects" />
<field column="project_version" name="projects" />
where 'project_area'
Hi,
I have set up to import some oracle clob columns with DIH. I am using the
latest nightly release. My config says,
But it does not seem to turn this clob into a String. The search results
show:
1.8670129
oracle.sql.c...@aed3a5
4486
Any pointers on why I do not get
?
Is the nightly war NOT the right one to use?
Thanks for your help.
- ashok
ashokc wrote:
Hi,
I have set up to import some oracle clob columns with DIH. I am using the
latest nightly release. My config says,
<entity name="description" transformer="ClobTransformer" ...>
<field column="description" clob="true" />
ClobTransformer (adding System.out.println
into ClobTransformer may help)
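A sketch of a ClobTransformer entity: the transformer has to be named on the entity, clob="true" goes on the field whose column holds the CLOB, and the column name is written in upper case, as the Oracle column names turned out to be in the PROJECT_AREA exchange. The query, table and ID column are hypothetical.

```xml
<!-- data-config.xml sketch: converts the DESCRIPTION CLOB column to a String -->
<entity name="description" transformer="ClobTransformer"
        query="select ID, DESCRIPTION from DOCS">
  <field column="ID" name="id"/>
  <field column="DESCRIPTION" name="description" clob="true"/>
</entity>
```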
On Fri, Apr 3, 2009 at 6:04 AM, ashokc ash...@qualcomm.com wrote:
Correcting my earlier post. It lost some lines somehow.
Hi,
I have set up to import some oracle clob columns with DIH. I am using the
latest nightly release
Hi,
I have documents where text from two languages, e.g. (English and Korean) or
(English and German), is mixed up in a fairly intensive way. 20-30% of the
text is in English and the rest in the other language. Can somebody indicate how I
should set up the 'analyzers' and 'fields' in schema.xml? Should I have
I have seen some of these oddities that Chris is referring to. In my case,
terms that are NOT in the query get highlighted. For example, searching for
'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms
either. Do these filter factories add some extra intelligence to the
Hello,
Is it possible to have the index created by a single SOLR instance, but have
several SOLR instances field the search queries? Or do I HAVE to replicate
the index for each SOLR instance that I want to answer queries? I need to
set up a fail-over instance. Thanks
- ashok
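If upgrading is an option, the Java ReplicationHandler of Solr 1.4 (present in trunk nightly builds before that release; Solr 1.3 relied on the rsync-based snapshooter/snappuller scripts) lets one master build the index while slaves poll it and pull copies. Each query-serving instance still holds its own copy of the index. A sketch, with a hypothetical master host name:

```xml
<!-- Master solrconfig.xml: the single instance that builds the index -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>

<!-- Slave solrconfig.xml, on each query-serving instance -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://indexer.example.com:8983/solr/replication</str>
    <!-- Poll the master every 60 seconds (HH:mm:ss) -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```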
, if
there is network in the picture.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: ashokc ash...@qualcomm.com
To: solr-user@lucene.apache.org
Sent: Monday, January 12, 2009 3:05:40 PM
Subject: Single index - multiple SOLR instances
Thanks for the reply. I figured there is no simple solution here. I am
parsing the query in my code, separating out negations, assertions and such,
and building the final SOLR query to issue. I simply use the boost as given
by the user. If none is given, I use a default boost for title and url matches.
-
to search over.
Are there better approaches?
Thanks
- ashok
Yonik Seeley wrote:
On Thu, Dec 4, 2008 at 6:39 PM, ashokc [EMAIL PROTECTED] wrote:
The SOLR wiki says
3. Make sure both indexes you want to merge are closed.
What exactly does 'closed' mean?
If you do a commit, and then prevent
Here is the problem I am trying to solve. I have to use the Standard Request
Handler.
Query (can be quite complex, as it gets built from an advanced search form):
term1^2.0 OR term2 OR term3 term4
I have 3 fields - content (the default search field), title and url.
Any matches in the title or
The SOLR wiki says
3. Make sure both indexes you want to merge are closed.
What exactly does 'closed' mean?
1. Do I need to stop SOLR search on both indexes before running the merge
command? So a brief downtime is required?
Or do I simply prevent any 'updates/deletes' to these indices during
Hi,
I have set
<solrQueryParser defaultOperator="AND"/>
but it is not taking effect. It continues to take it as OR. I am working
with the latest nightly build (11/20/2008).
For a query like
term1 term2
Debug shows
<str name="parsedquery">content:term1 content:term2</str>
Bug?
Thanks
- ashok
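A likely thing to check: solrQueryParser is a schema.xml setting, a direct child of the <schema> element, not a solrconfig.xml one, and schema changes take effect only after Solr (or the core) is restarted or reloaded. When AND is in effect, the debug output for term1 term2 should read +content:term1 +content:term2; the unprefixed clauses shown above are optional, which is OR behavior. The setting applies to the standard Lucene query parser; dismax does not use it and relies on mm instead. A sketch, with the schema name and version as placeholders:

```xml
<!-- schema.xml sketch: defaultOperator sits at the top level of <schema> -->
<schema name="example" version="1.1">
  <!-- types, fields and uniqueKey omitted -->
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="AND"/>
</schema>
```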