Re: Fwd: Language detection for solr 3.6.1

2014-07-09 Thread T. Kuro Kurosaka
directory to the lib directory, and use LangDetectLanguageIdentifierUpdateProcessorFactory instead of TikaLanguageIdentifierUpdateProcessorFactory in the commented out portion of example/solr/conf/solrconfig.xml (and you need to un-comment out that portion, of course) Hope this helps. -- T. Kuro

Re: ICUTokenizer or StandardTokenizer or ??? for text_all type field that might include non-whitespace langs

2014-06-20 Thread T. Kuro Kurosaka
On 06/20/2014 04:04 AM, Allison, Timothy B. wrote: Let's say a predominantly English document contains a Chinese sentence. If the English field uses the WhitespaceTokenizer with a basic WordDelimiterFilter, the Chinese sentence could be tokenized as one big token (if it doesn't have any
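The effect described in the quoted message can be seen with a rough sketch. Here Python's `str.split()` is only a stand-in for a whitespace-only tokenizer (it is not Lucene's WhitespaceTokenizer), and the sample sentence is made up for illustration:

```python
def whitespace_tokens(text):
    # str.split() with no arguments splits on runs of whitespace only,
    # which approximates a tokenizer that knows nothing about CJK text.
    return text.split()


# Mostly English text with an embedded Chinese phrase (illustrative sample).
sample = "The report says 我喜欢全文检索 in one sentence"
tokens = whitespace_tokens(sample)
print(tokens)
```

Because the Chinese phrase contains no spaces, it comes out as a single token, so a search for any word inside it would not match.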

Re: Strict mode at searching and indexing

2014-06-03 Thread T. Kuro Kurosaka
On 05/30/2014 08:29 AM, Erick Erickson wrote: I see errors in both cases. Do you 1) have schemaless configured or 2) have a dynamic field pattern that matches your non_exist_field? Maybe <!--dynamicField name="*" type="ignored" multiValued="true" /--> is un-commented-out in schema.xml? Kuro

Re: Stemming for Chinese and Japanese

2014-06-03 Thread T. Kuro Kurosaka
On 05/20/2014 11:31 AM, Geepalem wrote: Hi, What is the filter to be used to implement stemming for Chinese and Japanese language field types? For English, I have used <filter class="solr.SnowballPorterFilterFactory" language="English" /> and it's working fine. What do you mean by working fine? Try

Any Solrj API to obtain field list?

2014-05-27 Thread T. Kuro Kurosaka
. But I cannot find a suitable Solrj API. Is there any? I'm using Solr 4.6.1. I could write code to use Schema REST API (https://wiki.apache.org/solr/SchemaRESTAPI) but I would much prefer to use the existing code if one exists. -- T. Kuro Kurosaka • Senior Software Engineer
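For readers who do go the Schema REST API route, a minimal sketch of the idea follows, written in Python rather than SolrJ. It assumes the `/schema/fields` endpoint returns JSON with a top-level `fields` array of objects that each carry a `name`; the core URL and the sample payload are hypothetical and hand-written, not real server output.

```python
import json
from urllib.request import urlopen


def field_names(schema_fields_json):
    """Extract field names from a /schema/fields JSON response body."""
    payload = json.loads(schema_fields_json)
    return [field["name"] for field in payload.get("fields", [])]


def fetch_field_names(core_url):
    # core_url is a hypothetical base such as
    # http://localhost:8983/solr/collection1 (an assumption, adjust as needed).
    with urlopen(core_url + "/schema/fields?wt=json") as response:
        return field_names(response.read().decode("utf-8"))


# Hand-written sample shaped like a Schema API response.
SAMPLE = (
    '{"responseHeader": {"status": 0}, '
    '"fields": [{"name": "id", "type": "string"}, '
    '{"name": "title", "type": "text_general"}]}'
)
print(field_names(SAMPLE))
```

Keeping the parsing separate from the HTTP call makes it possible to check the extraction logic without a running Solr instance.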

Re: Any Solrj API to obtain field list?

2014-05-27 Thread T. Kuro Kurosaka
On 05/27/2014 02:29 PM, Jack Krupansky wrote: You might consider an update request processor as an alternative. It runs on the server and might be simpler. You can even use the stateless script update processor to avoid having to write any custom Java code. -- Jack Krupansky That's an

Re: Any Solrj API to obtain field list?

2014-05-27 Thread T. Kuro Kurosaka
On 05/27/2014 02:55 PM, Steve Rowe wrote: You can call the Schema API from SolrJ - see Shawn Heisey’s example code here:http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/%3c51daecd2.6030...@elyograg.org%3e Steve It looks like this returns a Json representation of fields

Re: Any Solrj API to obtain field list?

2014-05-27 Thread T. Kuro Kurosaka
On 05/27/2014 04:21 PM, Steve Rowe wrote: Shawn’s code shows that SolrJ parses the JSON for you into NamedList (response.getResponse()). - Steve Thank you for pointing it out. It wasn't apparent what get(key) returns since the method signature of getResponse() merely tells it would return a

Re: Solr special characters like '(' and '&'?

2014-04-08 Thread T. Kuro Kurosaka
I don't think & is special to the parser. Classic examples like AT&T just work, as far as the query parser is concerned. https://wiki.apache.org/solr/SolrQuerySyntax even says that you can escape the special meaning with a backslash. & is special in the URL, however, and that has to be hex-escaped
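The URL-level escaping mentioned here can be sketched with Python's standard library. The field name `company` and the parameter values are made up for illustration; the point is only that an unescaped `&` ends the `q` parameter early, while percent-encoding it (`%26`) keeps the query intact.

```python
from urllib.parse import parse_qs, urlencode

# Naive concatenation: the raw '&' is read as a parameter separator.
naive = "q=company:AT&T&wt=json"
print(parse_qs(naive)["q"])

# urlencode percent-encodes reserved characters, so '&' becomes %26.
params = urlencode({"q": "company:AT&T", "wt": "json"})
print(params)
print(parse_qs(params)["q"])
```

Most HTTP client libraries do this encoding when parameters are passed as key/value pairs rather than pasted into the URL by hand.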

Re: Analysis of Japanese characters

2014-04-07 Thread T. Kuro Kurosaka
Tom, You should be using JapaneseAnalyzer (kuromoji). Neither CJK nor ICU tokenize at word boundaries. On 04/02/2014 10:33 AM, Tom Burton-West wrote: Hi Shawn, I'm not sure I understand the problem and why you need to solve it at the ICUTokenizer level rather than the CJKBigramFilter Can you
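To illustrate why bigram-style indexing does not respect word boundaries, here is a simplified imitation of overlapping character bigrams. It is not Lucene's CJKBigramFilter (which, among other things, handles single characters and mixed scripts differently); the function name is hypothetical.

```python
def cjk_bigrams(text):
    # Emit every pair of adjacent characters, regardless of word boundaries.
    return [text[i:i + 2] for i in range(len(text) - 1)]


# 東京都庁 (Tokyo Metropolitan Government Office) as a sample string.
bigrams = cjk_bigrams("東京都庁")
print(bigrams)
```

The middle bigram 京都 happens to spell "Kyoto", so a search for Kyoto would match a document that only mentions the Tokyo office. A morphological analyzer that splits at word boundaries avoids this kind of false match.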

Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-24 Thread T. Kuro Kurosaka
On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi, Guessing it's surround query parser's support for "within" backed by span queries. Otis You mean this? http://wiki.apache.org/solr/SurroundQueryParser I guess this parser needs improvement in the documentation area. It doesn't explain or have an

w/10 ? [was: Partial Counts in SOLR]

2014-03-19 Thread T. Kuro Kurosaka
In the thread "Partial Counts in SOLR", Salman gave us this sample query: ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or purchase* or repurchase*)) w/10 (executive or director) I'm not familiar with this w/10 notation. What does this mean, and what parser(s) support this

Re: Apache Solr Configuration Problem (Japanese Language)

2014-03-06 Thread T. Kuro Kurosaka
with multilingual fields. -- T. Kuro Kurosaka • Senior Software Engineer Healthline - The Power of Intelligent Health www.healthline.com |@Healthline | @HealthlineCorp

What types is supported by Solrj addBean() in the fields of POJO objects?

2014-03-03 Thread T. Kuro Kurosaka
What are the supported types of the POJO objects that are sent to SolrServer.addBean(obj)? A quick glance at DocumentObjectBinder seems to suggest that an arbitrary combination of a Collection, List, ArrayList, array ([]), Map, HashMap, of primitive types, String and Date is supported, but I'm

search across cores

2014-02-21 Thread T. Kuro Kurosaka
If I want to search across cores, can I use (abuse?) the distributed search? My simple experiment seems to confirm this, but I'd like to know if there are any drawbacks other than those of distributed search listed here?
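A minimal sketch of the kind of request such an experiment would send, assuming the standard `shards` parameter of distributed search (a comma-separated list of `host:port/path` entries without the URL scheme). The host, port, and core names below are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def cross_core_url(base_core_url, core_urls, query):
    # The shards parameter lists each core as host:port/path, no scheme.
    shards = ",".join(url.split("://", 1)[-1] for url in core_urls)
    return base_core_url + "/select?" + urlencode({"q": query, "shards": shards})


# Hypothetical cores on one local Solr instance.
cores = [
    "http://localhost:8983/solr/core0",
    "http://localhost:8983/solr/core1",
]
url = cross_core_url(cores[0], cores, "title:solr")
print(url)
```

The request goes to one core, which fans the query out to every entry in `shards` and merges the results.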

Re: Escape \\n from getting highlighted - highlighter component

2014-02-18 Thread T. Kuro Kurosaka
Your search expression means 'talk' OR 'n' OR 'text'. I think you want to do a phrase search. To do that, quote the whole thing with double-quotes, "talk n text", if you are using one of the Solr standard query parsers. On 02/17/2014 03:53 PM, Developer wrote: Hi, When searching for a text
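The phrase-quoting advice can be sketched as a small helper. The function name is hypothetical, and the escaping covers only backslashes and embedded double quotes, which is an assumption about what a caller would need rather than a full query-escaping routine.

```python
def as_phrase(text):
    # Escape backslashes first, then embedded double quotes,
    # and wrap the result in double quotes to form a phrase query.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


phrase = as_phrase("talk n text")
print(phrase)
```

Sent as the `q` value, the quoted form asks for the three terms adjacent and in order instead of any one of them.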

Re: geo/spatial search performance comparison using different methods

2013-11-06 Thread T. Kuro Kurosaka
on, then definitely don't use SOLR-2155 or RPT for that, use LatLonType. It's surely faster but I haven't measured it. The best multi-valued distance sort option for Solr 4 is currently this: https://issues.apache.org/jira/browse/SOLR-5170 ~ David On 11/5/13 1:36 PM, T. Kuro Kurosaka k...@healthline.com

geo/spatial search performance comparison using different methods

2013-11-05 Thread T. Kuro Kurosaka
GeoHash, Solr 4 implementation of GeoHash and Solr 4's SpatialRecursivePrefixTreeFieldType (location_rpt). I see comparison of Solr 3 LatLongType vs Solr-2155 3.6.2-work/example/solr/conf/ but it is 2 years old. -- - T. Kuro Kurosaka • Senior Software

Re: character encoding issue...

2013-11-05 Thread T. Kuro Kurosaka
to be a character encoding issue. Any pointers on how to resolve this one? I have seen that this occurs mostly for Japanese & Chinese characters. -- - T. Kuro Kurosaka • Senior Software Engineer

Phrase query with prefix query

2013-08-02 Thread T. Kuro Kurosaka
Is there a query parser that supports a phrase query with prefix query at the end, such as San Fran* ? -- - T. Kuro Kurosaka • Senior Software Engineer

Re: predefined variables usable in schema.xml ?

2012-11-30 Thread T. Kuro Kurosaka
I tried to use ${solr.core.instanceDir} in schema.xml with Solr 4.0, where every deployment is multi-core, and it didn't work. It must be that the description about pre-defined properties in CoreAdmin wiki page is wrong, or it only works in solrconfig.xml, perhaps? On 11/28/12 5:17 PM, T. Kuro

Re: predefined variables usable in schema.xml ?

2012-11-30 Thread T. Kuro Kurosaka
, ${solr.core.instanceDir} is replaced by the value collection1 (no solr/). I was hoping that ${solr.core.instanceDir} would be replaced by the absolute path to the examples/core/collection1 directory. On 11/30/12 2:41 PM, T. Kuro Kurosaka wrote: I tried to use ${solr.core.instanceDir

Re: predefined variables usable in schema.xml ?

2012-11-28 Thread T. Kuro Kurosaka
Thank you, Hoss. I found that this SolrWiki page talks about pre-defined properties such as solr.core.instanceDir: http://wiki.apache.org/solr/CoreAdmin I tried to use ${solr.core.instanceDir} in the default single-core schema.xml, and it didn't work. Is this page wrong, or these properties are

predefined variables usable in schema.xml ?

2012-11-27 Thread T. Kuro Kurosaka
Is there a pre-defined variable that can be used in schema.xml to point to the solr core directory, or the conf subdirectory? I thought ${solr.home} or perhaps ${solr.solr.home} might work but they didn't (unless -Dsolr.home=/my/solr/home is supplied, that is). The default solrconfig.xml seems

Re: Any filter to map mutiple tokens into one ?

2012-10-15 Thread T. Kuro Kurosaka
On 10/14/12 12:19 PM, Jack Krupansky wrote: There's a miscommunication here somewhere. Is Solr 4.0 still passing *:* to the analyzer? Show us the parsed query for *:*, as well as the debugQuery explain for the score. I'm not quite sure what you mean by the parsed query for *:*. This fake

Re: Any filter to map mutiple tokens into one ?

2012-10-15 Thread T. Kuro Kurosaka
of a way to tell if a tokenizer is really invoked, let me know. -- Jack Krupansky -Original Message- From: T. Kuro Kurosaka Sent: Monday, October 15, 2012 1:28 PM To: solr-user@lucene.apache.org Subject: Re: Any filter to map mutiple tokens into one ? On 10/14/12 12:19 PM, Jack

Re: Any filter to map mutiple tokens into one ?

2012-10-14 Thread T. Kuro Kurosaka
to check, but SOLR-3261, which was fixed in Solr 3.6, may be your culprit. See: https://issues.apache.org/jira/browse/SOLR-3261 So, try Solr 3.6 or 3.6.1 or 4.0 to see if your issue goes away. -- Jack Krupansky -Original Message- From: T. Kuro Kurosaka Sent: Friday, October 12, 2012 3:15

Re: Any filter to map mutiple tokens into one ?

2012-10-12 Thread T. Kuro Kurosaka
around this by reconstructing *:* in the analysis chain. But, what is it you are really trying to do? What's the real problem? (This sounds like a proverbial XY Problem.) -- Jack Krupansky -Original Message- From: T. Kuro Kurosaka Sent: Thursday, October 11, 2012 7:35 PM To: solr-user

Re: Any filter to map mutiple tokens into one ?

2012-10-12 Thread T. Kuro Kurosaka
that is fixed now... oh well. I mean, feel free to check the revision history for edismax since the 3.5 release. -- Jack Krupansky -Original Message- From: T. Kuro Kurosaka Sent: Friday, October 12, 2012 11:54 AM To: solr-user@lucene.apache.org Subject: Re: Any filter to map mutiple tokens

Any filter to map mutiple tokens into one ?

2012-10-11 Thread T. Kuro Kurosaka
if an analyzer that doesn't break *:* is used. So I'd like to stitch together *, :, * into *:* again to make DisjunctionMaxQuery happy. Thanks. T. Kuro Kurosaka

Why does Solr (1.4.1) keep so many Tokenizer objects?

2012-09-08 Thread T. Kuro Kurosaka
While investigating a bug, I found that Solr keeps many Tokenizer objects. This experimental 80-core Solr 1.4.1 system runs on Tomcat. It was continuously sent indexing requests in parallel, and it eventually died due to OutOfMemory. The heap dump that was taken by the JVM shows there were