Re: Apache Solr Configuration Problem (Japanese Language)
Andy,

I don't have a direct answer to your question, but I do have a question of my own.

On 03/05/2014 07:21 AM, Andy Alexander wrote:
> fq=ss_language:ja&q=製品

I am guessing that you have a field called ss_language that stores the language code of each document, and that your index holds Solr documents in different languages.

> +DisjunctionMaxQuery((content:製品)~0.01)

This indicates that your default query field is "content". What does the analyzer for this field look like? Does the analyzer work for every language that you want to support? Many analyzers are language-dependent and won't work with multilingual fields.

--
T. "Kuro" Kurosaka • Senior Software Engineer
Healthline - The Power of Intelligent Health
www.healthline.com | @Healthline | @HealthlineCorp
Apache Solr Configuration Problem (Japanese Language)
I am trying to pass a string of Japanese characters to an Apache Solr query. The string in question is '製品'.

When a search is run without any arguments, it returns all of the indexed information, including all of the documents that contain this string. However, when the string is passed in as q=製品, only one of the items is displayed. Furthermore, when I use the query fq=ss_language:ja&q=製品, *three* items are shown.

What would cause this peculiar behavior? The field in which I am searching for this string is indexed, and my assumption is that the query should return all documents that contain the string.

Here's the debug information:

製品
製品
+DisjunctionMaxQuery((content:製品)~0.01)
+(content:製品)~0.01

0.41303736 = (MATCH) fieldWeight(content:製品 in 80), product of:
  1.4142135 = tf(termFreq(content:製品)=2)
  5.3405533 = idf(docFreq=3, maxDocs=307)
  0.0546875 = fieldNorm(field=content, doc=80)

0.33378458 = (MATCH) fieldWeight(content:製品 in 66), product of:
  1.0 = tf(termFreq(content:製品)=1)
  5.3405533 = idf(docFreq=3, maxDocs=307)
  0.0625 = fieldNorm(field=content, doc=66)

0.2529327 = (MATCH) fieldWeight(content:製品 in 46), product of:
  3.4641016 = tf(termFreq(content:製品)=12)
  5.3405533 = idf(docFreq=3, maxDocs=307)
  0.013671875 = fieldNorm(field=content, doc=46)

ExtendedDismaxQParser
ss_language:ja
ss_language:ja
1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
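
Each score in the debug output above is the product of three factors. The following is a minimal sketch of that arithmetic, assuming the classic Lucene TF-IDF similarity (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1))). The class and method names are made up for illustration and are not Lucene API. Lucene computes in single precision, so the last digits can differ slightly.

```java
// Illustrative only: reproduces the fieldWeight arithmetic from the explain
// output, assuming the classic Lucene TF-IDF formulas. Not Lucene code.
final class FieldWeightSketch {

    // tf grows with the square root of the term frequency in the field.
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // idf shrinks as more documents contain the term.
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldNorm is taken from the explain output as-is (it encodes field length).
    static double fieldWeight(int termFreq, int docFreq, int maxDocs, double fieldNorm) {
        return tf(termFreq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Values copied from the three explain entries (docs 80, 66 and 46).
        System.out.printf("doc 80: %.6f%n", fieldWeight(2, 3, 307, 0.0546875));
        System.out.printf("doc 66: %.6f%n", fieldWeight(1, 3, 307, 0.0625));
        System.out.printf("doc 46: %.6f%n", fieldWeight(12, 3, 307, 0.013671875));
    }
}
```

Note that docFreq=3 means the index holds the term 製品 in the content field of only three documents, which matches the three hits returned with the filter query.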
Re: Configuration problem
On 03.03.2014 at 22:43, Shawn Heisey wrote:

> On 3/3/2014 9:02 AM, Thomas Fischer wrote:
>> The setting is
>> solr directories (I use different solr versions at the same time):
>> /srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the
>> new "discovery type" (no cores), and inside the core directories are empty
>> files core.properties and symbolic links to the universal conf directory.
>> solr webapps (I use very different webapps simultaneously):
>> /srv/www/webapps/solr/solr4.6.1 is the solr webapp
>>
>> I tried to convey this information to the tomcat server by putting a file
>> solr4.6.1.xml into the Catalina/localhost folder with the contents
>>
>> crossContext="true">
>> value="/srv/solr/solr4.6.1" override="true"/>
>
> Your message is buried deep in another message thread about NoSQL, because
> you replied to an existing message rather than starting a new message to
> solr-user@lucene.apache.org. On list-mirroring forums like Nabble, nobody
> will even see your message (or this reply) unless they actually open that
> other thread. This is what it looks like on a threading mail reader
> (Thunderbird):
>
> https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

Yes, I'm sorry. I only realized afterwards that my question inherited the thread from the e-mail I was reading and using as a template for my message.

Meanwhile I figured out that I had overlooked the third place where solr home can be defined for Tomcat (after JAVA_OPTS and JNDI): web.xml in the WEB-INF directory of the given webapp. This overrides the other definitions, which created the impression that I couldn't set solr home.

But now I get the message
"Could not load config file /srv/solr/solr4.6.1/cores/geo/solrconfig.xml"
for the core "geo".

In the solr wiki I read (http://wiki.apache.org/solr/ConfiguringSolr):
"In each core, Solr will look for a conf/solrconfig.xml file"
and so I expected solr to look for /srv/solr/solr4.6.1/cores/geo/conf/solrconfig.xml (which exists), but obviously it doesn't. Why? Is this a misunderstanding on my part?

Best
Thomas
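
The competing solr home definitions described above can be sketched as a simple precedence rule. This is a minimal sketch, assuming the lookup order that Solr 4.x uses as far as I know: the JNDI entry java:comp/env/solr/home first, then the solr.solr.home system property, then the relative default "solr/". The class and method names are hypothetical, and the real lookup lives inside Solr, not in code like this.

```java
// Illustrative only: models the precedence between the places where solr home
// can be defined. The order is an assumption based on Solr 4.x behavior.
final class SolrHomeLookupSketch {

    /**
     * @param jndiValue      value bound to java:comp/env/solr/home, or null.
     *                       An <env-entry> in WEB-INF/web.xml and an
     *                       <Environment> in the Tomcat context file both
     *                       end up here.
     * @param systemProperty value of -Dsolr.solr.home, or null
     */
    static String locateSolrHome(String jndiValue, String systemProperty) {
        if (jndiValue != null && !jndiValue.isEmpty()) {
            return jndiValue;          // JNDI wins when it is set
        }
        if (systemProperty != null && !systemProperty.isEmpty()) {
            return systemProperty;     // next: the JVM system property
        }
        return "solr/";                // last resort: relative to the cwd
    }

    public static void main(String[] args) {
        // A JNDI value (for example from web.xml) hides the system property.
        // The first path below is a made-up example value.
        System.out.println(locateSolrHome("/some/other/home", "/srv/solr/solr4.6.1"));
        System.out.println(locateSolrHome(null, "/srv/solr/solr4.6.1"));
        System.out.println(locateSolrHome(null, null));
    }
}
```

If my reading of the Tomcat documentation is right, override="true" on the Environment element allows an env-entry with the same name in web.xml to replace the value from the context file, which would explain why the web.xml definition took precedence here.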
Re: Configuration problem
On 3/3/2014 9:02 AM, Thomas Fischer wrote:
> The setting is
> solr directories (I use different solr versions at the same time):
> /srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the
> new "discovery type" (no cores), and inside the core directories are empty
> files core.properties and symbolic links to the universal conf directory.
> solr webapps (I use very different webapps simultaneously):
> /srv/www/webapps/solr/solr4.6.1 is the solr webapp
>
> I tried to convey this information to the tomcat server by putting a file
> solr4.6.1.xml into the Catalina/localhost folder with the contents

Your message is buried deep in another message thread about NoSQL, because you replied to an existing message rather than starting a new message to solr-user@lucene.apache.org. On list-mirroring forums like Nabble, nobody will even see your message (or this reply) unless they actually open that other thread. This is what it looks like on a threading mail reader (Thunderbird):

https://www.dropbox.com/s/87ilv7jls7y5gym/solr-reply-thread.png

I don't use Tomcat, so I can't even begin to comment on that. I can talk about your solr home setting and what Solr is going to do with that.

You probably do not have /srv/solr/solr4.6.1/solr.xml on your system. Solr will look for solr.xml in your solr home, and if it cannot find it, it assumes that you are not running multicore, so it looks for things like collection1/conf/solrconfig.xml instead.

There is a solr.xml in the example. Use that, changing as necessary, or create a solr.xml file with just the following line in it. It will probably start working:

You *might* need the following instead, but since Solr uses standard XML parsing libraries, I would guess that the above line will work.

Thanks,
Shawn
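
The fallback described in this reply can be sketched as follows. This is a minimal sketch that follows the reply's description, not Solr's actual code: if solr.xml is missing from solr home, Solr behaves as a single-core setup and goes looking for collection1/conf/solrconfig.xml. The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: which file Solr will go looking for next, depending on
// whether solr.xml exists in solr home. Follows the description in the reply.
final class SolrXmlFallbackSketch {

    static Path configToLoad(Path solrHome, boolean solrXmlPresent) {
        if (solrXmlPresent) {
            // Multicore or discovery mode: solr.xml drives everything else.
            return solrHome.resolve("solr.xml");
        }
        // No solr.xml: assume the single default core named "collection1".
        return solrHome.resolve("collection1").resolve("conf").resolve("solrconfig.xml");
    }

    public static void main(String[] args) {
        // Pass the solr home as the first argument, or use the path from the thread.
        Path home = Paths.get(args.length > 0 ? args[0] : "/srv/solr/solr4.6.1");
        boolean present = Files.exists(home.resolve("solr.xml"));
        System.out.println("solr.xml present: " + present);
        System.out.println("Solr will try to load: " + configToLoad(home, present));
    }
}
```

A "collection1" path in an error message is therefore a hint that Solr is not reading the solr home you think it is, or that solr.xml is not where Solr expects it.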
Configuration problem
Hello,

for some reason I am having trouble getting my local solr system to run (MacBook, tomcat 6.0.35).

The setting is

solr directories (I use different solr versions at the same time):
/srv/solr/solr4.6.1 is the solr home, in solr home is a file solr.xml of the new "discovery type" (no cores), and inside the core directories are empty files core.properties and symbolic links to the universal conf directory.

solr webapps (I use very different webapps simultaneously):
/srv/www/webapps/solr/solr4.6.1 is the solr webapp

I tried to convey this information to the tomcat server by putting a file solr4.6.1.xml into the Catalina/localhost folder with the contents

The Tomcat Manager shows solr4.6.1 as started, but following the given link gives an error with the message:

"SolrCore 'collection1' is not available due to init failure: Could not load config file /srv/solr4.6.1/collection1/solrconfig.xml"

which is plausible, since
1. there is no folder /srv/solr4.6.1/collection1 and
2. for the actual cores, solrconfig.xml is inside of /srv/solr4.6.1/cores/geo/conf/

But why does Tomcat try to find a solrconfig.xml there?

The problem persists if I start tomcat with -Dsolr.solr.home=/srv/solr/solr4.6.1; it seems that the system just ignores the solr home setting.

Can somebody give me a hint as to what I'm doing wrong?

Best regards
Thomas

P.S.: Is there a way to stop Tomcat from throwing these errors in my face threefold: once as heading (!), once as message and once as description?
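
To check a "discovery type" layout like the one described above from the outside, one can walk the solr home, treat every directory that contains a core.properties file as a core, and test whether conf/solrconfig.xml resolves inside it (symbolic links are followed by Files.isRegularFile). This is a minimal sketch with hypothetical class and method names, not Solr's discovery code. Unlike Solr, it keeps descending below a directory that already contains core.properties.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: a stand-alone layout check, not Solr's own discovery logic.
final class CoreDiscoverySketch {

    // Every directory below solrHome that holds a core.properties file.
    static List<Path> discoverCores(Path solrHome) {
        try (Stream<Path> paths = Files.walk(solrHome)) {
            return paths
                .filter(p -> p.getFileName() != null
                          && p.getFileName().toString().equals("core.properties"))
                .map(Path::getParent)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if <coreDir>/conf/solrconfig.xml exists, following symlinks.
    static boolean hasSolrConfig(Path coreDir) {
        return Files.isRegularFile(coreDir.resolve("conf").resolve("solrconfig.xml"));
    }

    // Demo helper (hypothetical): a throwaway solr home with one core, "cores/geo",
    // that has a core.properties file but no conf directory.
    static Path createDemoHome() {
        try {
            Path home = Files.createTempDirectory("solrhome");
            Path geo = Files.createDirectories(home.resolve("cores").resolve("geo"));
            Files.createFile(geo.resolve("core.properties"));
            return home;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path home = args.length > 0 ? java.nio.file.Paths.get(args[0]) : createDemoHome();
        for (Path core : discoverCores(home)) {
            System.out.println(core + " -> conf/solrconfig.xml found: " + hasSolrConfig(core));
        }
    }
}
```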
Re: SolrEntityProcessor Configuration Problem
The SolrEntityProcessor resolves all of its parameters at start time, not for each query. This technique cannot work. I filed it:
https://issues.apache.org/jira/browse/SOLR-3336

On Fri, Apr 6, 2012 at 11:13 AM, wrote:
> Dear all,
> I'm facing a problem with SolrEntityProcessor, when having it configured
> under a JDBC Datasource.
> My configuration looks like this:
>
> clob="true"/>
> name="extended_keywords" clob="true"/>
> name="publication_date"/>
> name="dl_file_entry_id" />
> name="dl_file_version_id" />
> />
> fl="content" url="http://vmcenter120:8983/solr/"
> query="folderId:${V_MARKET_STUDIES.DL_FOLDER_ID}"
> fq="entryClassPK:${V_MARKET_STUDIES.DL_FILE_ENTRY_ID}">
>
> I have 6 rows in the Oracle Database, but only the first row is processed
> right, meaning that the 2nd Solr is queried and the results go into the
> document. The remaining 5 rows were processed without querying the 2nd Solr
> and therefore didn't have the content field filled.
>
> Any suggestions?
> Did I configure something wrong, or misunderstand something?
> Thanks for your help
>
> Best regards
> Michael

--
Lance Norskog
goks...@gmail.com
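
The difference between resolving a parameter once at start time and resolving it for every parent row can be shown with a toy placeholder resolver. This is a minimal sketch, not DataImportHandler code: the class, the resolve method, and the row values are made up for illustration. Only the template string is taken from the configuration quoted above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a toy model of ${...} resolution against the current row.
final class RowTemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${key} with the row's value for that key (empty if absent).
    static String resolve(String template, Map<String, String> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = row.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "folderId:${V_MARKET_STUDIES.DL_FOLDER_ID}";
        String[] folderIds = {"10", "20", "30"};   // made-up parent row values
        String resolvedOnce = null;
        for (String id : folderIds) {
            Map<String, String> row = new HashMap<>();
            row.put("V_MARKET_STUDIES.DL_FOLDER_ID", id);
            if (resolvedOnce == null) {
                resolvedOnce = resolve(template, row);   // start-time resolution
            }
            System.out.println("resolved once: " + resolvedOnce
                + "   resolved per row: " + resolve(template, row));
        }
    }
}
```

With start-time resolution, the query keeps the value of the first row, so later rows cannot get matching results from the second Solr.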
SolrEntityProcessor Configuration Problem
Dear all,

I'm facing a problem with SolrEntityProcessor, when having it configured under a JDBC Datasource.
My configuration looks like this:

http://vmcenter120:8983/solr/"
query="folderId:${V_MARKET_STUDIES.DL_FOLDER_ID}"
fq="entryClassPK:${V_MARKET_STUDIES.DL_FILE_ENTRY_ID}">

I have 6 rows in the Oracle Database, but only the first row is processed right, meaning that the 2nd Solr is queried and the results go into the document. The remaining 5 rows were processed without querying the 2nd Solr and therefore didn't have the content field filled.

Any suggestions?
Did I configure something wrong, or misunderstand something?
Thanks for your help

Best regards
Michael
Re: HTMLStripCharFilterFactory configuration problem
> Actually I am using SolrJ client..
> Is there anyway to do same using solrj.
>
> thanks

If you are using Java, life is easier. You can use this static function before adding a field to SolrInputDocument.

    // Imports needed by the function. The package names are those of
    // Solr 1.4 / Lucene 2.9 and may differ in other versions.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StringReader;
    import org.apache.lucene.analysis.CharReader;
    import org.apache.solr.analysis.HTMLStripCharFilter;

    // Returns the text of 'value' with HTML markup removed, or null on failure.
    static String stripHTMLX(String value) {
        StringBuilder out = new StringBuilder();
        StringReader strReader = new StringReader(value);
        try {
            HTMLStripCharFilter html = new HTMLStripCharFilter(
                CharReader.get(strReader.markSupported()
                    ? strReader
                    : new BufferedReader(strReader)));
            char[] cbuf = new char[1024 * 10];
            while (true) {
                int count = html.read(cbuf);
                if (count == -1)
                    break; // end of stream mark is -1
                if (count > 0)
                    out.append(cbuf, 0, count);
            }
            html.close();
        } catch (IOException e) {
            // Failed stripping HTML
            e.printStackTrace();
            return null;
        }
        return out.toString();
    }
Re: HTMLStripCharFilterFactory configuration problem
thanks..

Actually I am using the SolrJ client..
Is there any way to do the same using SolrJ?

thanks

On Sat, Apr 17, 2010 at 8:06 PM, Ahmet Arslan wrote:
> > Thanks for reply..
> > but how will I get the stored value instead of indexed
> > value..
> > where I need to configure to get stored instead of indexed
> > value.
> > please help...
>
> You need to remove html tags before analysis (charfilter, tokenizer,
> tokenfilter) phase. For example if you are using DIH to index, you can use
> HTMLStripTransformer[1]. How are you indexing your data?
>
> [1]http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
Re: HTMLStripCharFilterFactory configuration problem
> Thanks for reply..
> but how will I get the stored value instead of indexed
> value..
> where I need to configure to get stored instead of indexed
> value.
> please help...

You need to remove the HTML tags before the analysis phase (charfilter, tokenizer, tokenfilter). For example, if you are using DIH to index, you can use HTMLStripTransformer[1]. How are you indexing your data?

[1]http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
Re: HTMLStripCharFilterFactory configuration problem
Hi Sven,

Thanks for the reply..
but how will I get the stored value instead of the indexed value..
where do I need to configure Solr to get the stored value instead of the indexed value?
please help...

thanks
with regards

On Wed, Apr 14, 2010 at 3:16 PM, Sven Maurmann wrote:
> Hi,
>
> please note that you get the stored value of the field as a result and
> not the indexed one.
>
> Cheers,
> Sven
>
> --On Wednesday, April 14, 2010 02:54:52 PM +0530 Ranveer Kumar <
> ranveer.s...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am facing problem to configure HTMLStripCharFilterFactory.
>> following is the schema :
>> positionIncrementGap="100">
>> ignoreCase="true"
>> words="stopwords.txt"
>> enablePositionIncrements="true"
>> />
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> language="English" protected="protwords.txt"/>
>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>> ignoreCase="true"
>> words="stopwords.txt"
>> enablePositionIncrements="true"
>> />
>> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> language="English" protected="protwords.txt"/>
>>
>> when I am checking with analysis.jsp it giving true result. But in
>> my query result still I am getting html tage..
>> I am using solrj client..
>>
>> please help me
>
> --
> kippdata informationstechnologie GmbH
> Sven Maurmann Tel: 0228 98549 -12
> Bornheimer Str. 33a Fax: 0228 98549 -50
> D-53111 Bonn sven.maurm...@kippdata.de
>
> HRB 8018, Bonn District Court (Amtsgericht Bonn) / VAT ID No. DE 196 457 417
> Managing directors: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann
Re: HTMLStripCharFilterFactory configuration problem
Hi,

please note that you get the stored value of the field as a result and not the indexed one.

Cheers,
Sven

--On Wednesday, April 14, 2010 02:54:52 PM +0530 Ranveer Kumar wrote:
> Hi all,
>
> I am facing problem to configure HTMLStripCharFilterFactory.
> following is the schema :
>
> when I am checking with analysis.jsp it giving true result. But in
> my query result still I am getting html tage..
> I am using solrj client..
>
> please help me

--
kippdata informationstechnologie GmbH
Sven Maurmann Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn sven.maurm...@kippdata.de

HRB 8018, Bonn District Court (Amtsgericht Bonn) / VAT ID No. DE 196 457 417
Managing directors: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann
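
The distinction between the stored and the indexed value can be illustrated with a toy model. This is a minimal sketch and not Solr or Lucene code: the class and method names are hypothetical, and a crude regular expression stands in for HTMLStripCharFilterFactory. The point it makes is only that the analysis chain changes what is searchable, while the stored value that comes back in query results is the raw input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a toy model of "stored" versus "indexed" field content.
final class StoredVsIndexedSketch {

    // Crude stand-in for an HTML-stripping char filter (assumption: simple tags only).
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // What ends up in the index: the output of the analysis chain.
    static List<String> indexedTerms(String raw) {
        List<String> terms = new ArrayList<>();
        for (String token : stripTags(raw).toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // What a query result returns: the value exactly as it was sent to Solr.
    static String storedValue(String raw) {
        return raw;
    }

    public static void main(String[] args) {
        String raw = "<p>Hello <b>Solr</b></p>";
        System.out.println("indexed terms: " + indexedTerms(raw));
        System.out.println("stored value:  " + storedValue(raw));
        // Stripping before the document is sent changes the stored value too.
        System.out.println("stored after client-side stripping: "
            + storedValue(stripTags(raw)).trim());
    }
}
```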
HTMLStripCharFilterFactory configuration problem
Hi all,

I am having a problem configuring HTMLStripCharFilterFactory.
The following is the schema:

When I check with analysis.jsp, it gives the correct result. But in my query results I am still getting HTML tags.
I am using the SolrJ client.

Please help me.
Re: Tomcat JNDI and CWD Configuration problem with multiple solrs
OMG! I found the error. I can't believe how much time I spent on this, and it turns out I should have paid more attention. (Or really, I can believe it, because it happens more frequently than I wish it would.)

Anyway, for those having the same issue in the future: I am using acts_as_solr, a rails plugin for solr searching. To set up solr for my rails app, I blindly copied the solrconfig.xml from the plugin to my solr/conf dir, installed the JNDI context into tomcat, and expected it to work. What I experienced was that it then proceeded to create a solr/index dir under cwd. This is because the solrconfig.xml in acts_as_solr has this line in the config:

${solr.data.dir:./solr/data}

It seems this creates the data dir under cwd, which is probably not wanted when you install it on a systemwide tomcat app server. The problem was solved when I commented that line out. Hopefully this will save future acts_as_solr users some pain.

Albert

On Sat, Apr 19, 2008 at 12:05 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : The apps seem to work fine, only for some reason, when I start tomcat it
> : creates a solr dir in the cwd. So naturally, depending on where i do the
>
> Solr does not ever attempt to create a directory named "solr" (the only
> directories Solr tries to create if they don't already exist are inside
> of the data dir)
>
> : restart, it wont work. If i cd to some dir where I have write access, the
> : apps goes up fine, and it even says the solr/home is where it should be.
>
> so what gets put in this solr dir that is created for you? my guess is
> it's the expanded war file -- there is probably a tomcat setting for where
> these should go, and your tomcat configs have it as "."
>
> : (The dir i defined in the xml file, NOT cwd). But under statistics, both
> : separate solr apps seems to use an IndexReader under CWD. Ideally I would
>
> can you be more explicit about what exactly you are seeing (ie: cut+paste
> the log messages from Solr startup about solr home and JNDI, cut+paste
> exactly what you see on the statistics page, etc..., cut+paste the shell
> commands you are running -- starting with a call to pwd so we know what
> the current directory is, cut+paste the directory listings of each
> directory you are referring to.)
>
> : my own install of tomcat 6. (Although I dont understand why the example has
> : "f:/" stuff in the directory paths, since that notation throws errors at me.
>
> That's just an example of a windows path.
>
> -Hoss
Re: Tomcat JNDI and CWD Configuration problem with multiple solrs
: The apps seem to work fine, only for some reason, when I start tomcat it
: creates a solr dir in the cwd. So naturally, depending on where i do the

Solr does not ever attempt to create a directory named "solr" (the only directories Solr tries to create if they don't already exist are inside of the data dir)

: restart, it wont work. If i cd to some dir where I have write access, the
: apps goes up fine, and it even says the solr/home is where it should be.

so what gets put in this solr dir that is created for you? my guess is it's the expanded war file -- there is probably a tomcat setting for where these should go, and your tomcat configs have it as "."

: (The dir i defined in the xml file, NOT cwd). But under statistics, both
: separate solr apps seems to use an IndexReader under CWD. Ideally I would

can you be more explicit about what exactly you are seeing (ie: cut+paste the log messages from Solr startup about solr home and JNDI, cut+paste exactly what you see on the statistics page, etc..., cut+paste the shell commands you are running -- starting with a call to pwd so we know what the current directory is, cut+paste the directory listings of each directory you are referring to.)

: my own install of tomcat 6. (Although I dont understand why the example has
: "f:/" stuff in the directory paths, since that notation throws errors at me.

That's just an example of a windows path.

-Hoss
Tomcat JNDI and CWD Configuration problem with multiple solrs
Hello List!

I am not an expert at configuring Tomcat, so I must be doing something wrong, but for the life of me, I cannot find anything that would explain this:

I want to have two separate solr apps running on one tomcat. I use the exact configuration suggested here:
http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac
- under "multiple solr webapps".

The apps seem to work fine, only for some reason, when I start tomcat it creates a solr dir in the cwd. So naturally, depending on where I do the restart, it won't work. If I cd to some dir where I have write access, the apps come up fine, and it even says the solr/home is where it should be (the dir I defined in the xml file, NOT cwd). But under statistics, both separate solr apps seem to use an IndexReader under cwd.

Ideally I would want to be able to configure this, so I know where the reader keeps its files. And must both apps share this directory? I suspect this sharing is why I cannot reindex both apps at the same time, since it touches some .lock file in the reader dir during reindexing.

I use the exact same xml files under /Catalina/localhost/solr1.xml etc. as the wiki says. I see the same behaviour in the tomcat 5.5 that ships with Ubuntu 7.10 and in my own install of tomcat 6. (Although I don't understand why the example has "f:/" stuff in the directory paths, since that notation throws errors at me.)

Does anyone know if I am doing something wrong, and how I can have separate IndexReader folders?

Albert