How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
Hi All, I'm trying to post some files to a Solr server. I've done this using post.jar to post xml files residing on my local disk [I tried posting all the xml files from the example directory]. Now I'm trying to generate xml files on the fly, with the text to be indexed included therein, and want to post these files to Solr. As per the examples we've used SimplePostTool for posting locally residing files, but can someone give me direction on indexing in-memory xml files [files generated on the fly]? Actually I want to automate this process in a loop, so that I'll extract some information, put it into an xml file, and push it off to Solr for indexing. Thanks in anticipation. --Ahmed.
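[Editor's note: a minimal sketch of the "generate the XML on the fly" idea in plain Java, with no temp files. The field names `id` and `text` and the doc values are placeholders, not from the thread; whatever fields your schema.xml defines go here.]

```java
// Build the Solr <add> XML entirely in memory inside a loop.
public class InMemoryXmlBuilder {
    // Minimal XML escaping for text going into <field> elements.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        StringBuilder xml = new StringBuilder("<add>");
        for (int i = 0; i < 3; i++) {
            // In the real loop this text would come from your extraction step.
            String extracted = "plain text pulled from page " + i;
            xml.append("<doc>")
               .append("<field name=\"id\">doc-").append(i).append("</field>")
               .append("<field name=\"text\">").append(escape(extracted)).append("</field>")
               .append("</doc>");
        }
        xml.append("</add>");
        System.out.println(xml);
        // POST this string to http://localhost:8983/solr/update with
        // Content-Type: text/xml (e.g. via java.net.HttpURLConnection),
        // then POST <commit/> the same way to make the docs searchable.
    }
}
```

The string never touches disk; it is posted directly over HTTP, which is all post.jar does with files anyway.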
Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
On Mon, Apr 27, 2009 at 3:30 PM, ahmed baseet ahmed.bas...@gmail.com wrote: [...] can someone give me direction on indexing in-memory xml files [files generated on the fly]? [...] You can use the Solrj client to avoid building the intermediate XML yourself. Extract the information, use the Solrj api to add the extracted text to fields and send them to the solr server. http://wiki.apache.org/solr/Solrj -- Regards, Shalin Shekhar Mangar.
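[Editor's note: a hedged sketch of what Shalin describes, using the Solr 1.3-era Solrj API. The server URL, loop, and field names are examples; it assumes a running Solr instance and the solrj jars on the classpath, so it is not runnable as-is.]

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexInMemory {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        for (int i = 0; i < 10; i++) {
            // In the real loop, this text comes from your extraction step.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("text", "extracted plain text " + i);
            server.add(doc);   // no intermediate XML file needed
        }
        server.commit();       // make the documents searchable
    }
}
```

Solrj builds and posts the update request itself, which is why no XML has to be assembled by hand.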
Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
Hi, After going through the solrj wiki I found that we have to set some dependencies in pom.xml to use Solrj, which I haven't done yet. So I googled for how to do that but found no help. I searched the solr directory and found a bunch of *-pom.template files [like solr-core-pom.xml, solr-solrj-pom.xml etc] and I'm not able to figure out which one to use. Any help would be appreciated. Thanks, Ahmed. On Mon, Apr 27, 2009 at 4:53 PM, ahmed baseet ahmed.bas...@gmail.com wrote: Shalin, thanks for your quick response. Actually I'm trying to pull plain text from html pages and make xml files for each page. I went through the SolrJ webpage and found that we have to add all the fields and their contents anyway, right? But yes, it makes adding/updating etc. quite a bit easier than using that SimplePostTool. I tried to use the SolrJ client but it does not seem to be working. I added all the jar files mentioned in the SolrJ wiki to the classpath but it still gives me an error. To be precise, it gives me the following error: cannot find symbol: symbol : class CommonsHttpSolrServer I rechecked to make sure that commons-httpclient-3.1.jar is in the class path. Can someone please point me to what the issue is? I'm working on Windows and my classpath variable is this: .;E:\Program Files\Java\jdk1.6.0_05\bin;D:\firefox download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-httpclient-3.1.jar;D:\firefox download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-common.jar;D:\firefox download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\apache-solr-1.3.0.jar;D:\firefox download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\solr-solrj-1.3.0.jar;D:\firefox download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-io-1.3.1.jar;D:\firefox download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-codec-1.3.jar;D:\firefox download\apache-solr-1.3.0\apache-solr-1.3.0\dist\solrj-lib\commons-logging-1.0.4.jar Thank you very much. Ahmed.
On Mon, Apr 27, 2009 at 3:55 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [...] You can use the Solrj client to avoid building the intermediate XML yourself. Extract the information, use the Solrj api to add the extracted text to fields and send them to the solr server. http://wiki.apache.org/solr/Solrj -- Regards, Shalin Shekhar Mangar.
Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
On Mon, Apr 27, 2009 at 4:53 PM, ahmed baseet ahmed.bas...@gmail.com wrote: To be precise, it gives me the following error: cannot find symbol: symbol : class CommonsHttpSolrServer I rechecked to make sure that commons-httpclient-3.1.jar is in the class path. Can someone please point me to what the issue is? [...] The jars look right. It is likely a problem with your classpath. CommonsHttpSolrServer is in the solr-solrj jar. If you are using Maven, then you'd need to change your pom.xml -- Regards, Shalin Shekhar Mangar.
Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
Can anyone help me select the proper pom.xml file out of the bunch of *-pom.xml.template files available? Searching for pom.xml files I got the following: solr-common-csv-pom.xml solr-lucene-analyzers-pom.xml solr-lucene-contrib-pom.xml solr-lucene-*-pom.xml [a lot of solr-lucene-... pom files are available, hence shortened to avoid typing them all] solr-dataimporthandler-pom.xml solr-common-pom.xml solr-core-pom.xml solr-parent-pom.xml solr-solr-pom.xml Thanks, Ahmed. On Mon, Apr 27, 2009 at 5:38 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [...] The jars look right. It is likely a problem with your classpath. CommonsHttpSolrServer is in the solr-solrj jar. If you are using Maven, then you'd need to change your pom.xml -- Regards, Shalin Shekhar Mangar.
Re: How to post in-memory[not residing on local disks] Xml files to Solr server for indexing?
On Mon, Apr 27, 2009 at 6:27 PM, ahmed baseet ahmed.bas...@gmail.comwrote: Can anyone help me selecting the proper pom.xml file out of the bunch of *-pom.xml.templates available. Ahmed, are you using Maven? If not, then you do not need these pom files. If you are using Maven, then you need to add a dependency to solrj. http://wiki.apache.org/solr/Solrj#head-674dd7743df665fdd56e8eccddce16fc2de20e6e -- Regards, Shalin Shekhar Mangar.
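[Editor's note: for reference, if Maven is in use, the Solrj dependency for the 1.3.0 release discussed in this thread looks roughly like the following fragment in the project's pom.xml; the template files in the Solr source tree are for building Solr itself, not for consuming it.]

```xml
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>1.3.0</version>
</dependency>
```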
Re: Date faceting - howto improve performance
You mean doc A and doc B will become one doc after adding index 2 to index 1? I don't think this is currently supported either at the Lucene level or at the Solr level. If index 1 has m docs and index 2 has n docs, index 1 will have m+n docs after adding index 2 to index 1. Documents themselves are not modified by an index merge. Cheers, Ning On Sat, Apr 25, 2009 at 4:03 PM, Marcus Herou marcus.he...@tailsweep.com wrote: Hmm, looking in the code for the IndexMerger in Solr (org.apache.solr.update.DirectUpdateHandler2) I see that IndexWriter.addIndexesNoOptimize(dirs) is used (union of indexes)? And the test class org.apache.solr.client.solrj.MergeIndexesExampleTestBase suggests: add doc A to index1 with id=AAA,name=core1 add doc B to index2 with id=BBB,name=core2 merge the two indexes into one index which then contains both docs. The resulting index will have 2 docs. Great, but in my case I think it should work more like this: add doc A to index1 with id=X,title=blog entry title,description=blog entry description add doc B to index2 with id=X,score=1.2 somehow add index2 to index1 so id=X has score=1.2 when searching in index1 The resulting index should have 1 doc. So this is not really what I want, right? Sorry for being a smart-ass... Kindly //Marcus On Sat, Apr 25, 2009 at 5:10 PM, Marcus Herou marcus.he...@tailsweep.com wrote: Guys! Thanks for these insights. I think we will head for a Lucene-level merging strategy (two or more indexes). When merging I guess the second index needs to have the same doc ids somehow. This is an internal id in Lucene, not that easy to get hold of, right? So are you saying that the solr ExternalFileField + FunctionQuery stuff would not work very well performance wise, or what do you mean?
I sure like the bleeding edge :) Cheers dudes //Marcus On Sat, Apr 25, 2009 at 3:46 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I should emphasize that the PR trick I mentioned is something you'd do at the Lucene level, outside Solr, and then you'd just slip the modified index back into Solr. Or, if you like the bleeding edge, perhaps you can make use of Ning Li's Solr index merging functionality (patch in JIRA). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Otis Gospodnetic otis_gospodne...@yahoo.com To: solr-user@lucene.apache.org Sent: Saturday, April 25, 2009 9:41:45 AM Subject: Re: Date faceting - howto improve performance Yes, you could simply round the date, no need for a non-date type field. Yes, you can add a field after the fact by making use of ParallelReader and merging (I don't recall the details, search the ML for ParallelReader and Andrzej), I remember he once provided the working recipe. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Marcus Herou To: solr-user@lucene.apache.org Sent: Saturday, April 25, 2009 6:54:02 AM Subject: Date faceting - howto improve performance Hi. One of our faceting use-cases: We are creating trend graphs of how many blog posts contain a certain term, grouped by day/week/year etc. with the nice DateMathParser functions. The performance degrades really fast and consumes a lot of memory, which forces an OOM from time to time. We think it is due to the fact that the cardinality of the field publishedDate in our index is huge, almost equal to the number of documents in the index. We need to address that... Some questions: 1. Can a datefield have other date-formats than the default of yyyy-MM-dd HH:mm:ssZ ? 2. We are thinking of adding a field to the index which has the format yyyy-MM-dd to reduce the cardinality; if that field can't be a date, it could perhaps be a string, but the question then is whether faceting can be used? 3.
Since we now already have such a huge index, is there a way to add a field afterwards and apply it to all documents without actually reindexing the whole shebang? 4. If the field cannot be a string, can we just leave out the hour/minute/second information to reduce the cardinality and improve performance? Example: 2009-01-01 00:00:00Z 5. I am afraid that we need to reindex everything to get this to work (negates Q3). We have 8 shards at the moment; what would be the most efficient way to reindex the whole shebang? Dump the entire database to disk (sigh), create many xml file splits and use curl in a random/hash(numServers) manner on them? Kindly //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/ -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/
RE: How to index the contents from SVN repository
Hi Ashish, The excellent SVN/CVS repo browser ViewVC http://www.viewvc.org/ has tools to record SVN/CVS commit metadata in a database - seeing how they do it may give you some hints. The INSTALL file gives pointers to the relevant tools (look for the SQL CHECKIN DATABASE section): http://viewvc.tigris.org/svn/viewvc/trunk/INSTALL ViewVC doesn't have file content search capabilities yet - maybe while you're at it, you could contribute your work to that project :). Good luck, Steve On 4/27/2009 at 1:12 AM, Ashish P wrote: Right. But is there a way to track file updates and diffs. Thanks, Ashish Noble Paul നോബിള് नोब्ळ् wrote: If you can check it out into a directory using SVN command then you may use DIH to index the content. a combination of FileListEntityProcessor and PlainTextEntityProcessor may help On Sun, Apr 26, 2009 at 1:38 PM, Ashish P ashish.ping...@gmail.com wrote: Is there any way to index contents of SVN rep in Solr ??
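[Editor's note: a hedged sketch of the DIH approach Noble describes, assuming the repository has already been checked out to a local working copy. The path /data/svn-checkout, the fileName filter, and the field name content are examples, and PlainTextEntityProcessor requires a recent Solr (1.4-era DIH).]

```xml
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <!-- outer entity walks the checked-out working copy -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/data/svn-checkout" fileName=".*\.(txt|java|xml)"
            recursive="true" rootEntity="false">
      <!-- inner entity reads each file's text into one field -->
      <entity name="text" processor="PlainTextEntityProcessor"
              url="${f.fileAbsolutePath}">
        <field column="plainText" name="content" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

This only indexes the current working-copy content; tracking updates and diffs, as Ashish asks, still needs commit metadata like the ViewVC checkin database mentioned above.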
Solr 1.4 Release Date
Hi, I am curious to know when is the scheduled/tentative release date of Solr 1.4. Thanks, Gurjot
Configuration of format and type index with solr
Hi, I have worked with Lucene for some years and I use some advanced features of the library, such as different index formats and types of persistence. Now I would like to use Solr. Is it possible to configure these features in Solr? My doubts concern four points: 1- Guarantee that my searcher (solr) ALWAYS searches my index in *memory* (use RAMDirectory), not a cache. 2- Guarantee that my searcher (solr) ALWAYS searches my index on the *file system* (use FSDirectory). 3- Persist my generated index in only one archive on the File System (optimized). 4- Persist my index (RAMDirectory) in a serialized java archive. I need to create a loader that loads my .ser file, deserializes the RAMDirectory class and sets it in the searcher class. Is it possible to add a component that manipulates the index? Thanks Haroldo
Re: How to index the contents from SVN repository
I would suggest looking at Apache commons VFS and using the solrj API: http://commons.apache.org/vfs/ With SVN, you may be able to use the webdav provider. ryan On Apr 26, 2009, at 4:08 AM, Ashish P wrote: Is there any way to index contents of SVN rep in Solr ?? -- View this message in context: http://www.nabble.com/How-to-index-the-contents-from-SVN-repository-tp23240110p23240110.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Performance bottleneck
As a follow-up note, we solved our problem by moving the indexes to local store and upgrading to Solr 1.4. I did a thread dump against our 1.3 Solr instance and it was spending lots of time blocking on index section loading. The NIO implementation in 1.4 solved that problem and copying to local store almost certainly reduced file loading time. Trying to point multiple Solrs on multiple boxes at a single shared directory is almost certainly doomed to failure; the read-only Solrs won't know when the read/write Solr instance has updated the index. We are going to try to move our indexes back to shared disk, as our backup solutions are all tied to the shared disk. Also, if an individual box fails, we can bring up a new box and point it at the shared disk. Are there any known problems with NIO and NFS that will cause this to fail? Can anyone suggest a better solution? Thanks, Jon -- View this message in context: http://www.nabble.com/Solr-Performance-bottleneck-tp23209595p23262198.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Performance bottleneck
This isn't a new problem; NFS was 100X slower than local disk for me with Solr 1.1. Backing up indexes is very tricky. You need to do it while they are not being updated, or you'll get a corrupt copy. If your indexes aren't large, you are probably better off backing up the source documents and building new indexes from scratch. wunder On 4/27/09 11:27 AM, Jon Bodner jbod...@blackboard.com wrote: [...]
adding plug-in after search is done
Trying to manipulate search results (like further filtering out unwanted ones) and ordering the results differently. Where is the suitable place for doing this? I've been using QueryResponseWriter but that doesn't seem to be the right place. Thanks.
fail to create or find snapshoot
Hi, According to Solr's wiki page http://wiki.apache.org/solr/SolrReplication, if I send the following request to the master, a snapshot will be created: http://master_host:port/solr/replication?command=snapshoot But after I did it, nothing seemed to happen. I got this response back: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst> </response> and I checked the data directory; no snapshot was created. I am not sure what to expect after making the request, and where to find the snapshot files (and what they are). Thanks, Jianhan
Re: fail to create or find snapshoot
Actually, I found the snapshot in the directory where solr was launched. Is this done on purpose? Shouldn't it be in the data directory? Thanks, Jianhan On Mon, Apr 27, 2009 at 11:43 AM, Jian Han Guo jian...@gmail.com wrote: [...]
Re: Configuration of format and type index with solr
On Mon, Apr 27, 2009 at 10:40 PM, hpn1975 nasc hpn1...@gmail.com wrote: 1- Guarantee that my searcher (solr) ALWAYS searches my index in *memory* (use RAMDirectory), not a cache. It is possible to disable all caches. But it is not possible to use RAMDirectory right now. This is in progress. https://issues.apache.org/jira/browse/SOLR-465 2- Guarantee that my searcher (solr) ALWAYS searches my index on the *file system* (use FSDirectory). Yes, that is the default and the only way currently. 3- Persist my generated index in only one archive on the File System (optimized). The useCompoundFile setting in solrconfig.xml can help here. 4- Persist my index (RAMDirectory) in a serialized java archive. I need to create a loader that loads my .ser file, deserializes the RAMDirectory class and sets it in the searcher class. I don't think you can use RAMDirectory right now. However, if the use-case behind serializing a ram directory is replication, then there are alternate methods available. http://wiki.apache.org/solr/CollectionDistribution http://wiki.apache.org/solr/SolrReplication Is it possible to add a component that manipulates the index? Yes. You can write your own request handlers and search components. -- Regards, Shalin Shekhar Mangar.
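[Editor's note: for point 3, the setting Shalin mentions lives in the index section of solrconfig.xml, e.g.:]

```xml
<mainIndex>
  <useCompoundFile>true</useCompoundFile>
</mainIndex>
```

Note that the compound format produces one .cfs file per segment rather than literally one file for the whole index; running an optimize afterwards gets close to a single-file index.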
Re: adding plug-in after search is done
On Tue, Apr 28, 2009 at 12:04 AM, siping liu siping...@hotmail.com wrote: trying to manipulate search result (like further filtering out unwanted), and ordering the results differently. Where is the suitable place for doing it? I've been using QueryResponseWriter but that doesn't seem to be the right place. You should probably look at writing your own SearchComponent. Also look at the QueryElevationComponent which can help with fixing the positions of some documents in the result set. -- Regards, Shalin Shekhar Mangar.
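[Editor's note: a skeleton of the SearchComponent approach, with method names per the Solr 1.3-era API; the actual filtering/reordering logic is left as a placeholder, and the class would still need to be registered in solrconfig.xml via a searchComponent element and added to a handler's components list. Not runnable without the Solr jars.]

```java
import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ReorderComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Runs before the query executes; adjust the query or params here.
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Runs after the query executes; the matched documents are available
        // on the ResponseBuilder -- filter or reorder them here.
    }

    @Override
    public String getDescription() { return "post-search filtering/reordering"; }

    @Override
    public String getSource() { return "$URL$"; }

    @Override
    public String getSourceId() { return "$Id$"; }

    @Override
    public String getVersion() { return "1.0"; }
}
```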
Re: offline solr indexing
On Tue, Apr 28, 2009 at 12:38 AM, Charles Federspiel charles.federsp...@gmail.com wrote: Solr Users, Our app servers are setup on read-only filesystems. Is there a way to perform indexing from the command line, then copy the index files to the app-server and use Solr to perform search from inside the servlet container? If the filesystem is read-only, then how can you index at all? But what I think you are describing is the regular master-slave setup that we use. A dedicated master on which writes are performed. Multiple slaves on which searches are performed. The index is replicated to slaves through script or the new java based replication. If the Solr implementation is bound to http requests, can Solr perform searches against an index that I create with Lucene? thank you, It can but it is a little tricky to get the schema and analysis correct between your Lucene writer and Solr searcher. -- Regards, Shalin Shekhar Mangar.
Re: DataImportHandler Questions-Load data in parallel and temp tables
On Tue, Apr 28, 2009 at 3:43 AM, Amit Nithian anith...@gmail.com wrote: All, I have a few questions regarding the data import handler. We have some pretty gnarly SQL queries to load our indices and our current loader implementation is extremely fragile. I am looking to migrate over to the DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff to remotely load the indices so that my index loader and main search engine are separated. Currently if you want to use DIH then the Solr master doubles up as the index loader as well. Currently, unless I am missing something, the data gathering from the entity and the data processing (i.e. conversion to a Solr Document) is done sequentially and I was looking to make this execute in parallel so that I can have multiple threads processing different parts of the resultset and loading documents into Solr. Secondly, I need to create temporary tables to store results of a few queries and use them later for inner joins was wondering how to best go about this? I am thinking to add support in DIH for the following: 1) Temporary tables (maybe call it temporary entities)? --Specific only to SQL though unless it can be generalized to other sources. Pretty specific to DBs. However, isn't this something that can be done in your database with views? 2) Parallel support Parallelizing import of root-entities might be the easiest to attempt. There's also an issue open to write to Solr (tokenization/analysis) in a separate thread. Look at https://issues.apache.org/jira/browse/SOLR-1089 We actually wrote a multi-threaded DIH during the initial iterations. But we discarded it because we found that the bottleneck was usually the database (too many queries) or Lucene indexing itself (analysis, tokenization) etc. The improvement was ~10% but it made the code substantially more complex. The only scenario in which it helped a lot was when importing from HTTP or a remote database (slow networks). 
But if you think it can help in your scenario, I'd say go for it. - Including some mechanism to get the number of records (whether it be count or the MAX(custom_id)-MIN(custom_id)) Not sure what you mean here. 3) Support in DIH or Solr to post documents to a remote index (i.e. create a new UpdateHandler instead of DirectUpdateHandler2). Solrj integration would be helpful to many I think. There's an issue open. Look at https://issues.apache.org/jira/browse/SOLR-853 -- Regards, Shalin Shekhar Mangar.
Re: Solr test anyone?
Yes, look at AbstractSolrTestCase which is the base class of almost all Solr tests. http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/util/AbstractSolrTestCase.java On Mon, Apr 27, 2009 at 6:38 PM, Eric Pugh ep...@opensourceconnections.comwrote: Look into the test code that Solr uses, there is a lot of good stuff on how to do testing. http://svn.apache.org/repos/asf/lucene/solr/trunk/src/test/. Eric On Apr 27, 2009, at 6:25 AM, tarjei wrote: Hi, I'm looking for ways to test that my indexing methods work correctly with my Solr schema. Therefore I'm wondering if someone has created a test setup where they start a Solr instance and then add some documents to the instance - as a Junit/testng test - preferably with a working Maven dependencies for it as well. I've tried googling for this as well as setting it up myself, but I have never managed to get a test working like I want it to. Kind regards, Tarjei - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- Regards, Shalin Shekhar Mangar.
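[Editor's note: a minimal sketch of a test built on that base class, which starts an embedded Solr core around your config. The schema/config file names and the field names are whatever your project uses; it needs the Solr core and test jars on the classpath, so it is not runnable standalone.]

```java
import org.apache.solr.util.AbstractSolrTestCase;

public class IndexingTest extends AbstractSolrTestCase {
    @Override
    public String getSchemaFile() { return "schema.xml"; }

    @Override
    public String getSolrConfigFile() { return "solrconfig.xml"; }

    public void testAddAndQuery() {
        // adoc() builds an <add><doc> request; assertU posts it to the
        // embedded core started by the base class.
        assertU(adoc("id", "1", "text", "hello solr"));
        assertU(commit());
        // assertQ runs a query and checks the response with XPath.
        assertQ(req("text:hello"), "//result[@numFound='1']");
    }
}
```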
Re: facet results in order of rank
On Fri, Apr 24, 2009 at 12:25 PM, ristretto.rb ristretto...@gmail.comwrote: Hello, Is it possible to order the facet results on some ranking score? I've had a look at the facet.sort param, ( http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1 ) but that seems to order the facet either by count or by index value (in my case alphabetical.) Facets are not ranked because there is no criteria for determining relevancy for them. They are just the count of documents for each term in a given field computed for the current result set. We are facing a big number of facet results for multiple termed queries that are OR'ed together. We want to keep the OR nature of our queries, but, we want to know which facet values are likely to give you higher ranked results. We could AND together the terms, to get the facet list to be more manageable, but we would be filtering out too many results. We prefer to OR terms and let the ranking bring the good stuff to the top. For example, suppose we have a index of all known animals and each doc has a field AO for animal-origin. Suppose we search for: wolf grey forest Europe And generate facets AO. We might get the following facet results: For the AO field, lots of countries of the world probably have grey or forest or wolf or Europe in their indexing data, so I'm asserting we'd get a big list here. But, only some of the countries will have all 4 terms, and those are the facets that will be the most interesting to drill down on. Is there a way to figure out which facet is the most highly ranked like this? Suppose 10 documents match the query you described. If you facet on AO, then it would just go through all the terms in AO and give you the number of documents which have that term. There's no question of relevance at all here. The returned documents themselves are of course ranked according to the relevancy score. Perhaps I've misunderstood the query? -- Regards, Shalin Shekhar Mangar.
Re: Get the field value that caused the result
On Sat, Apr 25, 2009 at 8:25 PM, Wouter Samaey wouter.sam...@gmail.com wrote: I'm looking into a way to determine the value of the field that caused the result to be returned. Can highlighting help here? It returns the snippet from the document which matched the query. http://wiki.apache.org/solr/HighlightingParameters -- Regards, Shalin Shekhar Mangar.
Re: Authenticated Indexing Not working
On Sun, Apr 26, 2009 at 11:04 AM, Allahbaksh Asadullah allahbaks...@gmail.com wrote: HI Otis, I am using HTTPClient for authentication. When I use the server with Authentication for searching it works fine. But when I use it for indexing it throws error. What is the error? Is it thrown by Solr or your servlet container? One difference between a search request and update request with Solrj is that a search request uses HTTP GET by default but an update request uses an HTTP POST by default. Perhaps your authentication scheme is not configured correctly for POST requests? -- Regards, Shalin Shekhar Mangar.
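[Editor's note: if Basic auth is the scheme in play, one way to rule out the GET-vs-POST difference Shalin mentions is to put the credentials on the HttpClient that Solrj uses and enable preemptive authentication, so POSTs carry credentials up front instead of relying on a 401 retry. Host, port, and credentials below are placeholders; API per commons-httpclient 3.1, and this assumes a running secured Solr, so it is a sketch, not a tested recipe.]

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class AuthenticatedSolr {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        client.getState().setCredentials(
                new AuthScope("localhost", 8983),
                new UsernamePasswordCredentials("user", "secret"));
        // Send credentials with the first request instead of waiting for a
        // 401 challenge -- POST bodies are not always replayed on retry.
        client.getParams().setAuthenticationPreemptive(true);
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr", client);
        // server.add(...) / server.commit() now authenticate on updates too.
    }
}
```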
Re: Phonetic analysis with the spell-check component?
On Sun, Apr 26, 2009 at 11:55 PM, David Smiley @MITRE.org dsmi...@mitre.org wrote: It appears to me that the spell-check component can't build a dictionary based on phonetic similarity (i.e. using a phonetic analysis filter). Sure, you can go ahead and configure the spell check component to use a field type that uses a phonetic filter, but the suggestions presented to the user are based on the indexed values (i.e. phonemes), not the original words. Thus the user will be presented with a suggested phoneme, which is a poor user experience. It's not clear how this shortcoming could be rectified, because for a given phoneme there are potentially multiple words that could be encoded to it. Hmm. I think the problem here is that the spell checker creates its own index from the indexed tokens of a Solr field, so it does not have the original words anymore. But if we had an option to store the original words in the spell check index as well, we could return them as suggestions. Do you mind creating a Jira issue so that we don't forget about this? -- Regards, Shalin Shekhar Mangar.
Re: MacOS Failed to initialize DataSource:db+ DataimportHandler ???
Hi, sure: message Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null - org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311) at org.apache.solr.core.SolrCore.init(SolrCore.java:480) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525) at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627) at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) at
org.apache.catalina.core.StandardHost.start(StandardHost.java:719) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) at org.apache.catalina.core.StandardService.start(StandardService.java:516) at org.apache.catalina.core.StandardServer.start(StandardServer.java:710) at org.apache.catalina.startup.Catalina.start(Catalina.java:578) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413) Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Failed to initialize DataSource: mydb Processing Document # at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:308) at org.apache.solr.handler.dataimport.DataImporter.addDataSource(DataImporter.java:273) at org.apache.solr.handler.dataimport.DataImporter.initEntity(DataImporter.java:228) at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:98) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) ... 31 more Caused by: org.apache.solr.common.SolrException: Could not load driver: com.mysql.jdbc.Driver at org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:112) at org.apache.solr.handler.dataimport.JdbcDataSource.init(JdbcDataSource.java:65) at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:306) ... 
35 more Caused by: java.lang.ClassNotFoundException: Unable to load com.mysql.jdbc.Driver or org.apache.solr.handler.dataimport.com.mysql.jdbc.Driver at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:587) at org.apache.solr.handler.dataimport.JdbcDataSource.createConnectionFactory(JdbcDataSource.java:110) ... 37 more Caused by: org.apache.solr.common.SolrException: Error loading class 'com.mysql.jdbc.Driver' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273) at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:577) ... 38 more Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1387) at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:242) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257) ... 39 more
Re: facet results in order of rank
Thanks for the reply. Your thoughts are what I was initially thinking. But, given some more consideration, I imagined a system that would take all the docs that would be returned for a given facet, and get an average score based on their scores from the original search that produced the facets. This would be the facet value's rank. So, a higher ranked facet value would be more likely to return higher ranked results. The idea is that if you do a broad, loose search over a large dataset and order the results by rank, you get the most relevant results at the top, e.g. the first page in a search engine website. You might have pages and pages of results, but it's the first few pages of highly ranked results that most users generally see. As the relevance tapers off, users generally do another search. However, if you compute facet values on these results, you have no way of knowing whether one facet value for a field is more or less likely to return higher scored, relevant records for the user. You end up getting facet values that match records that are often totally irrelevant. We can sort by index order, or by count of docs returned. What I would like is a sort based on score, such that it would be sum(scores)/count. I would assume that most users would be interested in the higher ranked ones more often. So, a more efficient UI could be built to show just the facets ranked high on this score, and provide a control to show all the facets (not just the high ranked ones). Does this clear up my post at all? Perhaps this wouldn't be too hard for me to implement. I have lots of Java experience, but no experience with Lucene or Solr code. Thoughts? thanks gene On Tue, Apr 28, 2009 at 10:56 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Apr 24, 2009 at 12:25 PM, ristretto.rb ristretto...@gmail.comwrote: Hello, Is it possible to order the facet results on some ranking score? 
I've had a look at the facet.sort param, ( http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1 ) but that seems to order the facet either by count or by index value (in my case alphabetical.) Facets are not ranked because there is no criteria for determining relevancy for them. They are just the count of documents for each term in a given field computed for the current result set. We are facing a big number of facet results for multiple termed queries that are OR'ed together. We want to keep the OR nature of our queries, but, we want to know which facet values are likely to give you higher ranked results. We could AND together the terms, to get the facet list to be more manageable, but we would be filtering out too many results. We prefer to OR terms and let the ranking bring the good stuff to the top. For example, suppose we have a index of all known animals and each doc has a field AO for animal-origin. Suppose we search for: wolf grey forest Europe And generate facets AO. We might get the following facet results: For the AO field, lots of countries of the world probably have grey or forest or wolf or Europe in their indexing data, so I'm asserting we'd get a big list here. But, only some of the countries will have all 4 terms, and those are the facets that will be the most interesting to drill down on. Is there a way to figure out which facet is the most highly ranked like this? Suppose 10 documents match the query you described. If you facet on AO, then it would just go through all the terms in AO and give you the number of documents which have that term. There's no question of relevance at all here. The returned documents themselves are of course ranked according to the relevancy score. Perhaps I've misunderstood the query? -- Regards, Shalin Shekhar Mangar.
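The sum(scores)/count ranking Gene describes is not something Solr's faceting computes, but the post-processing step is easy to sketch client-side. A minimal illustration (the doc dicts are made-up stand-ins for a result set carrying a score and a facet field value, not a Solr API):

```python
from collections import defaultdict

def rank_facets_by_avg_score(docs, facet_field):
    """Rank facet values by the mean relevancy score of the docs carrying them."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        value = doc[facet_field]
        totals[value] += doc["score"]
        counts[value] += 1
    # sum(scores)/count per facet value, highest average first
    return sorted(
        ((v, totals[v] / counts[v]) for v in totals),
        key=lambda pair: pair[1],
        reverse=True,
    )

# e.g. the animal-origin (AO) field from the wolf/grey/forest/Europe example
docs = [
    {"AO": "Germany", "score": 4.2},
    {"AO": "Germany", "score": 3.8},
    {"AO": "Brazil", "score": 0.9},
]
print(rank_facets_by_avg_score(docs, "AO"))
```

In a real setup the scores would come from the search response (fl=id,score plus the facet field), so this costs one extra pass over the returned docs rather than a change to Solr itself.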
highlighting html content
Hi, I've been looking around but can't seem to find any clear instructions on how to do this... I'm storing HTML content and would like to enable highlighting on it. The problem is that the search can sometimes match HTML element names or attributes, and when the highlighter adds the highlight tags, the HTML is broken. I've been toying with setting custom pre/post delimiters and then removing them in the client, but I thought I'd ask the list before I go too far with that idea :) Thanks, Matt
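For what it's worth, the custom-delimiter idea can be sketched as: set hl.simple.pre/hl.simple.post to tokens that cannot occur in the stored HTML, escape the whole snippet in the client, then swap the tokens back for real tags. A rough client-side sketch (the delimiter strings are invented for illustration):

```python
import html

# these must match hl.simple.pre / hl.simple.post in the Solr request
HL_PRE, HL_POST = "@@HL@@", "@@/HL@@"

def render_snippet(snippet):
    """Escape the raw HTML so matched tags/attributes can't break markup,
    then restore the highlight placeholders as real <em> tags."""
    escaped = html.escape(snippet)
    return escaped.replace(HL_PRE, "<em>").replace(HL_POST, "</em>")

raw = '<a href="@@HL@@wolf@@/HL@@.html">@@HL@@wolf@@/HL@@ page</a>'
print(render_snippet(raw))
```

The trade-off is that the snippet is displayed as escaped text rather than rendered HTML; if the HTML must be rendered, a stricter approach (strip markup before indexing into the highlighted field) avoids the problem at the source.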
Re: Solr 1.4 Release Date
Gurjot, please see http://wiki.apache.org/solr/Solr1.4 - we are currently 33 JIRA issues away. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Gurjot Singh gurjotas...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, April 27, 2009 12:45:32 PM Subject: Solr 1.4 Release Date Hi, I am curious to know when is the scheduled/tentative release date of Solr 1.4. Thanks, Gurjot
Re: Term highlighting with MoreLikeThisHandler?
Eric, Have you tried using MLT with parameters described on http://wiki.apache.org/solr/HighlightingParameters ? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Eric Sabourin eric.sabourin2...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, April 27, 2009 10:31:38 AM Subject: Term highlighting with MoreLikeThisHandler? I submit a query to the MoreLikeThisHandler to find documents similar to a specified document. This works and I've configured my request handler to also return the interesting terms. Is it possible to have MLT return to me highlight snippets in the similar documents it returns? I mean generate hl snippets of the interesting terms? If so how? Thanks... Eric
Re: SOLRizing advice?
My turn to help, Paul. There is no such page on the Solr Wiki, but I agree with Paul, this can really be a quick and painless migration for typical Lucene/Solr setups. This is roughly how I'd do things:
- I'd set up Solr
- I'd create the schema.xml mimicking the fields in the existing Lucene index
- I'd copy over the Lucene index, keeping in mind Lucene jar versions, Solr/Lucene jar versions, and index compatibility
- Start Solr
- Go to Admin page and run test queries
- Go to schema/solrconfig.xml and add various other things - proper cache sizes, index replication, dismax, spellchecker, etc.
- Go to Lucene-based indexer classes and change them to use Solrj
- Go to Lucene-based searcher classes and change them to use Solrj
I'd leave embedded Solr and dynamic fields for phase 2 of the migration, unless those things really are necessary. I don't think you'd need to do anything with web.xml - solr comes as a webapp, packaged in a way, which contains its own web.xml Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message From: Paul Libbrecht p...@activemath.org To: solr-user@lucene.apache.org Sent: Monday, April 27, 2009 12:35:59 AM Subject: SOLRizing advice? Hello list, I am surely not the only one who wishes to migrate from bare lucene to solr. Many different reasons can be there, e.g. facetting, web-externalization, ease of update... what interests me here are the steps needed in the form of advice as to what to use. Here's a few hints. I would love a web-page grouping all these:
- first change references to indexwriter/indexreader/indexsearch to be those of SOLR using embedded-solr-server
- make a first solr schema with appropriate analyzers by defining particular dynamic fields
- slowly replace the queries methods with solr queries, slowly taking advantage of solr features
- web-expose the solr core for at least admin by merging the web.xml
Does such a web-page already exist? thanks in advance paul
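The "create the schema.xml mimicking the fields in the existing Lucene index" step might look something like this for a typical index (field names and types here are purely illustrative; they must match whatever the old Lucene indexer actually wrote, and the analyzers on the text type must match those used when the index was built, or the copied-over terms won't line up with query-time analysis):

```xml
<fields>
  <!-- one <field> per field the Lucene indexer created -->
  <field name="id"    type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text"   indexed="true" stored="true"/>
  <field name="body"  type="text"   indexed="true" stored="false"/>
</fields>
<uniqueKey>id</uniqueKey>
```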
Re: Fwd: Question about MoreLikeThis
Hello, Well, if you want documents similar to a specific document, then just make sure the query (q) matches that one document. You can do that by using the uniqueKey field in the query, e.g. q=id:123 . Then you will get documents similar to that one document that matched your id:123 query. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: jli...@gmail.com jli...@gmail.com To: solr-user@lucene.apache.org Sent: Sunday, April 26, 2009 5:49:26 AM Subject: Fwd: Question about MoreLikeThis I think I understand it now. It means to return MoreLikeThis docs for every doc in the result. ===8==Original message text=== Hi, I have a question about what MoreLikeThis means - I suppose it means get more documents that are similar to _this_ document. So I expect the query always take a known document as argument. I wonder how I should interpret this query: http://localhost:8983/solr/select?q=apachemlt=truemlt.fl=manu,catmlt.mindf=1mlt.mintf=1fl=id,score It doesn't seem to specify a document. So what's the This in MoreLikeThis in this case? Or, this means something else, and not a document?
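Combining Otis's answer with the URL from the original question, a MoreLikeThis request pinned to one known document would look something like this (assuming id is the uniqueKey field and 123 is an existing document; the parameter separators, which were lost in the quoted URL, are restored as &):

```
http://localhost:8983/solr/select?q=id:123&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score
```

Because q matches exactly one document, the mlt section of the response contains documents similar to that one document.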
half width katakana
I want to convert half-width katakana to full-width katakana. I tried using the CJK analyzer but it is not working. Does cjkAnalyzer do it, or is there any other way? -- View this message in context: http://www.nabble.com/half-width-katakana-tp23270186p23270186.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler Questions-Load data in parallel and temp tables
There is already an issue open for writing to the index in a separate thread: https://issues.apache.org/jira/browse/SOLR-1089 On Tue, Apr 28, 2009 at 4:15 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Apr 28, 2009 at 3:43 AM, Amit Nithian anith...@gmail.com wrote: All, I have a few questions regarding the data import handler. We have some pretty gnarly SQL queries to load our indices and our current loader implementation is extremely fragile. I am looking to migrate over to the DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff to remotely load the indices so that my index loader and main search engine are separated. Currently if you want to use DIH then the Solr master doubles up as the index loader as well. Currently, unless I am missing something, the data gathering from the entity and the data processing (i.e. conversion to a Solr Document) is done sequentially, and I was looking to make this execute in parallel so that I can have multiple threads processing different parts of the resultset and loading documents into Solr. Secondly, I need to create temporary tables to store the results of a few queries and use them later for inner joins, and was wondering how best to go about this. I am thinking to add support in DIH for the following: 1) Temporary tables (maybe call them temporary entities)? --Specific only to SQL though, unless it can be generalized to other sources. Pretty specific to DBs. However, isn't this something that can be done in your database with views? 2) Parallel support Parallelizing import of root-entities might be the easiest to attempt. There's also an issue open to write to Solr (tokenization/analysis) in a separate thread. Look at https://issues.apache.org/jira/browse/SOLR-1089 We actually wrote a multi-threaded DIH during the initial iterations. But we discarded it because we found that the bottleneck was usually the database (too many queries) or Lucene indexing itself (analysis, tokenization) etc. 
The improvement was ~10% but it made the code substantially more complex. The only scenario in which it helped a lot was when importing from HTTP or a remote database (slow networks). But if you think it can help in your scenario, I'd say go for it. - Including some mechanism to get the number of records (whether it be count or the MAX(custom_id)-MIN(custom_id)) Not sure what you mean here. 3) Support in DIH or Solr to post documents to a remote index (i.e. create a new UpdateHandler instead of DirectUpdateHandler2). Solrj integration would be helpful to many I think. There's an issue open. Look at https://issues.apache.org/jira/browse/SOLR-853 -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
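The parallel import Amit describes (one thread pulling rows from the entity, several threads converting and indexing them) boils down to a producer/consumer queue. A minimal sketch of that shape, with stand-in functions rather than DIH internals:

```python
import queue
import threading

def parallel_index(fetch_rows, index_doc, workers=4):
    """Producer pulls rows from the datasource; workers convert and index them."""
    q = queue.Queue(maxsize=100)
    done = object()  # sentinel telling workers to stop

    def worker():
        while True:
            row = q.get()
            if row is done:
                q.put(done)  # re-post so the remaining workers also stop
                break
            index_doc(row)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for row in fetch_rows():  # the single "entity" producer
        q.put(row)
    q.put(done)
    for t in threads:
        t.join()

# simulate: rows from a DB cursor, "indexing" = collecting into a list
indexed = []
lock = threading.Lock()
def collect(row):
    with lock:
        indexed.append(row)

parallel_index(lambda: iter(range(10)), collect, workers=3)
print(sorted(indexed))
```

As Shalin notes, this only pays off when the per-row work (analysis, or a slow network) dominates; if the database is the bottleneck, the queue just sits empty.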
Re: MacOS Failed to initialize DataSource:db+ DataimportHandler ???
apparently you do not have the driver in the path. drop your driver jar into ${solr.home}/lib On Tue, Apr 28, 2009 at 4:42 AM, gateway0 reiterwo...@yahoo.de wrote: Hi, sure: message Severe errors in solr configuration. [...] Caused by: org.apache.solr.common.SolrException: Could not load driver: com.mysql.jdbc.Driver [...] Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver [...]
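Once the MySQL connector jar is in ${solr.home}/lib, the dataSource element in data-config.xml that references the driver looks roughly like this (the URL and credentials are placeholders; "mydb" matches the name in the Failed to initialize DataSource: mydb error):

```xml
<dataConfig>
  <dataSource name="mydb" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="dbuser" password="dbpass"/>
  <document>
    <!-- entities referencing dataSource="mydb" go here -->
  </document>
</dataConfig>
```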
Re: half width katakana
Ashish P wrote: I want to convert half width katakana to full width katakana. I tried using cjk analyzer but not working. Does cjkAnalyzer do it or is there any other way?? CharFilter, which comes with trunk/Solr 1.4, covers exactly this type of problem. If you are using Solr 1.3, try the patch attached to: https://issues.apache.org/jira/browse/SOLR-822 Koji
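With trunk/1.4 (or the SOLR-822 patch on 1.3), the CharFilter is wired into the analyzer chain in schema.xml ahead of the tokenizer, along these lines. The mapping file name is made up here; it would contain the half-width-to-full-width katakana character mappings:

```xml
<fieldType name="text_cjk" class="solr.TextField">
  <analyzer>
    <!-- normalizes half-width katakana to full-width before tokenization -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-japanese.txt"/>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
```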
Re: half width katakana
After this, should I be using the same cjkAnalyzer or use CharFilter? Thanks, Ashish Koji Sekiguchi-2 wrote: Ashish P wrote: I want to convert half width katakana to full width katakana. I tried using cjk analyzer but not working. Does cjkAnalyzer do it or is there any other way?? CharFilter which comes with trunk/Solr 1.4 just covers this type of problem. If you are using Solr 1.3, try the patch attached below: https://issues.apache.org/jira/browse/SOLR-822 Koji -- View this message in context: http://www.nabble.com/half-width-katakana-tp23270186p23270453.html Sent from the Solr - User mailing list archive at Nabble.com.