Re: Restrict access to localhost
for 1) use the Tomcat configuration in conf/server.xml: bind the Connector to localhost, e.g. Connector address="127.0.0.1" port="8080" ... for 2) if the clients have direct access to Solr, either insert a middleware layer or create a write lock ;-) Hello all, 1) I want to restrict access to Solr to localhost only. How can I achieve that? 2) If I want to allow the clients to search but not to delete, how do I restrict the access? Any thoughts? Regards, Ganesh. -- http://jetwick.com twitter search prototype
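For reference, the server.xml change is just adding the address attribute to the existing HTTP Connector. The port and other attributes below are only placeholders from a default Tomcat install:

    <Connector port="8080" protocol="HTTP/1.1" address="127.0.0.1"
               connectionTimeout="20000" redirectPort="8443" />

With address set to 127.0.0.1, Tomcat binds only to the loopback interface, so Solr is no longer reachable from other machines.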
SOLR Thesaurus
Hi List, Coming to the end of a prototype evaluation of SOLR (all very good etc etc). Getting to the point of looking at bells and whistles. Does SOLR have a thesaurus? Can't find any reference to one in the docs or on the wiki etc. (apart from a few mail threads which describe synonym.txt as a thesaurus). I mean something like: PT: xxx, BT: xxx,xxx, NT: xxx,xxx, RT: xxx,xxx,xxx, Scope Note: xxx. Like I say, bells and whistles. cheers Lee
Re: Preventing index segment corruption when windows crashes
The Win7 crashes aren't from disk drivers - they come from, in this case, a Broadcom wireless adapter driver. The corruption comes as a result of the 'hard stop' of Windows. I would imagine this same problem could/would occur on any OS if the plug was pulled from the machine. Thanks, Peter On Thu, Dec 2, 2010 at 4:07 AM, Lance Norskog goks...@gmail.com wrote: Is there any way that Windows 7 and disk drivers are not honoring the fsync() calls? That would cause files and/or blocks to get saved out of order. On Tue, Nov 30, 2010 at 3:24 PM, Peter Sturge peter.stu...@gmail.com wrote: After a recent Windows 7 crash (:-\), upon restart, Solr starts giving LockObtainFailedException errors: (excerpt) 30-Nov-2010 23:10:51 org.apache.solr.common.SolrException log SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: nativefsl...@solr\.\.\data0\index\lucene-ad25f73e3c87e6f192c4421756925f47-write.lock When I run CheckIndex, I get: (excerpt) 30 of 30: name=_2fi docCount=857 compound=false hasProx=true numFiles=8 size (MB)=0.769 diagnostics = {os.version=6.1, os=Windows 7, lucene.version=3.1-dev ${svnver sion} - 2010-09-11 11:09:06, source=flush, os.arch=amd64, java.version=1.6.0_18, java.vendor=Sun Microsystems Inc.} no deletions test: open reader.FAILED WARNING: fixIndex() would remove reference to this segment; full exception: org.apache.lucene.index.CorruptIndexException: did not read all bytes from file _2fi.fnm: read 1 vs size 512 at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:367) at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:71) at org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReade r.java:119) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:583) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:561) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:467) at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:878) WARNING: 1 broken segments (containing 857 documents) detected This seems to happen every time Windows 7 crashes, and it would seem extraordinary bad luck for this tiny test index to be in the middle of a commit every time. (it is set to commit every 40secs, but for such a small index it only takes millis to complete) Does this seem right? I don't remember seeing so many corruptions in the index - maybe it is the world of Win7 dodgy drivers, but it would be worth investigating if there's something amiss in Solr/Lucene when things go down unexpectedly... Thanks, Peter On Tue, Nov 30, 2010 at 9:19 AM, Peter Sturge peter.stu...@gmail.com wrote: The index itself isn't corrupt - just one of the segment files. This means you can read the index (less the offending segment(s)), but once this happens it's no longer possible to access the documents that were in that segment (they're gone forever), nor write/commit to the index (depending on the env/request, you get 'Error reading from index file..' and/or WriteLockError) (note that for my use case, documents are dynamically created so can't be re-indexed). Restarting Solr fixes the write lock errors (an indirect environmental symptom of the problem), and running CheckIndex -fix is the only way I've found to repair the index so it can be written to (rewrites the corrupted segment(s)). I guess I was wondering if there's a mechanism that would support something akin to a transactional rollback for segments. 
Thanks, Peter On Mon, Nov 29, 2010 at 5:33 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Nov 29, 2010 at 10:46 AM, Peter Sturge peter.stu...@gmail.com wrote: If a Solr index is running at the time of a system halt, this can often corrupt a segments file, requiring the index to be -fix'ed by rewriting the offending file. Really? That shouldn't be possible (if you mean the index is truly corrupt - i.e. you can't open it). -Yonik http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
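For reference, CheckIndex is run from the command line against the index directory like this (the jar name and index path are just examples; -fix rewrites the index and drops the documents in any broken segments, so try it on a copy first):

    java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index -fix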
Re: Preventing index segment corruption when windows crashes
On Thu, Dec 2, 2010 at 4:10 AM, Peter Sturge peter.stu...@gmail.com wrote: The Win7 crashes aren't from disk drivers - they come from, in this case, a Broadcom wireless adapter driver. The corruption comes as a result of the 'hard stop' of Windows. I would imagine this same problem could/would occur on any OS if the plug was pulled from the machine. Actually, Lucene should be robust to this -- losing power, OS crash, hardware failure (as long as the failure doesn't flip bits), etc. This is because we do not delete files associated with an old commit point until all files referenced by the new commit point are successfully fsync'd. However it sounds like something is wrong, at least on Windows 7. I suspect it may be how we do the fsync -- if you look in FSDirectory.fsync, you'll see that we take a String fileName in. We then open a new read/write RandomAccessFile, and call its .getFD().sync(). I think this is potentially risky, ie, it would be better if we called .sync() on the original file we had opened for writing and written lots of data to, before closing it, instead of closing it, opening a new FileDescriptor, and calling sync on it. We could conceivably take this approach, entirely in the Directory impl, by keeping the pool of file handles for write open even after .close() was called. When a file is deleted we'd remove it from that pool, and when it's finally sync'd we'd then sync it and remove it from the pool. Could it be that on Windows 7 the way we fsync (opening a new FileDescriptor long after the first one was closed) doesn't in fact work? Mike
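To illustrate the difference being described, a small sketch using plain java.io (this is not the actual FSDirectory code, just the two patterns side by side):

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    class FsyncSketch {
        // Roughly the current approach: the bytes were written and the file closed earlier;
        // the sync happens later through a brand-new FileDescriptor for the same file name.
        static void syncByName(File dir, String fileName) throws IOException {
            RandomAccessFile file = new RandomAccessFile(new File(dir, fileName), "rw");
            try {
                file.getFD().sync();
            } finally {
                file.close();
            }
        }

        // The alternative described above: sync on the descriptor that actually wrote the
        // bytes, before it is closed.
        static void writeAndSync(File dir, String fileName, byte[] data) throws IOException {
            RandomAccessFile file = new RandomAccessFile(new File(dir, fileName), "rw");
            try {
                file.write(data);
                file.getFD().sync();
            } finally {
                file.close();
            }
        }
    }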
Re: Preventing index segment corruption when windows crashes
As I'm not familiar with the syncing in Lucene, I couldn't say whether there's a specific problem with regards Win7/2008 server etc. Windows has long had the somewhat odd behaviour of deliberately caching file handles after an explicit close(). This has been part of NTFS since NT 4 days, but there may be some new behaviour introduced in Windows 6.x (and there is a lot of new behaviour) that causes an issue. I have also seen this problem in Windows Server 2008 (server version of Win7 - same file system). I'll try some further testing on previous Windows versions, but I've not previously come across a single segment corruption on Win 2k3/XP after hard failures. In fact, it was when I first encountered this problem on Server 2008 that I even discovered CheckIndex existed! I guess a good question for the community is: Has anyone else seen/reproduced this problem on Windows 6.x (i.e. Server 2008 or Win7)? Mike, are there any diagnostics/config etc. that I could try to help isolate the problem? Many thanks, Peter On Thu, Dec 2, 2010 at 9:28 AM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Dec 2, 2010 at 4:10 AM, Peter Sturge peter.stu...@gmail.com wrote: The Win7 crashes aren't from disk drivers - they come from, in this case, a Broadcom wireless adapter driver. The corruption comes as a result of the 'hard stop' of Windows. I would imagine this same problem could/would occur on any OS if the plug was pulled from the machine. Actually, Lucene should be robust to this -- losing power, OS crash, hardware failure (as long as the failure doesn't flip bits), etc. This is because we do not delete files associated with an old commit point until all files referenced by the new commit point are successfully fsync'd. However it sounds like something is wrong, at least on Windows 7. I suspect it may be how we do the fsync -- if you look in FSDirectory.fsync, you'll see that we take a String fileName in. We then open a new read/write RandomAccessFile, and call its .getFD().sync(). I think this is potentially risky, ie, it would be better if we called .sync() on the original file we had opened for writing and written lots of data to, before closing it, instead of closing it, opening a new FileDescriptor, and calling sync on it. We could conceivably take this approach, entirely in the Directory impl, by keeping the pool of file handles for write open even after .close() was called. When a file is deleted we'd remove it from that pool, and when it's finally sync'd we'd then sync it and remove it from the pool. Could it be that on Windows 7 the way we fsync (opening a new FileDescriptor long after the first one was closed) doesn't in fact work? Mike
Re: Best practice for Delta every 2 Minutes.
At the moment no OOM occurs, but we are not on the real live system yet ... I thought maybe I would get this problem there ... We are running seven cores and each needs to be updated very quickly. Only one core has a huge index with 28M docs. Maybe it makes sense for the future to use Solr with replication!? Or can I run two instances, one for searching and one for updating? Or is there a danger of corrupt indexes?
Dataimport destroys our harddisks
Hi, we have a serious harddisk problem, and it's definitely related to a full-import from a relational database into a solr index. The first time it happened on our development server, where the raidcontroller crashed during a full-import of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2 of the harddisks where the solr index files are located stopped working (we needed to replace them). After the crash of the raid controller, we decided to move the development of solr/index related stuff to our local development machines. Yesterday i was running another full-import of ~10 Million documents on my local development machine, and during the import, a harddisk failure occurred. Since this failure, my harddisk activity seems to be around 100% all the time, even if no solr server is running at all. I've been googling the last 2 days to find some info about solr related harddisk problems, but i didn't find anything useful. Are there any steps we need to take care of in respect to harddisk failures when doing a full-import? Right now, our steps look like this: 1. Delete the current index 2. Restart solr, to load the updated schemas 3. Start the full import Initially, the solr index and the relational database were located on the same harddisk. After the crash, we moved the index to a separate harddisk, but nevertheless this harddisk crashed too. I'd really appreciate any hints on what we might do wrong when importing data, as we can't release this on our production servers when there's the risk of harddisk failures. thanks. -robert
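In terms of actual HTTP calls, those three steps look roughly like this (host, port, core name and handler paths below are just placeholders for whatever the setup uses):

    1. Delete the current index:
       curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
    2. Reload the core so the updated schema is picked up (or restart the servlet container):
       curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0"
    3. Kick off the full import through the DataImportHandler:
       curl "http://localhost:8983/solr/dataimport?command=full-import"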
Re: Preventing index segment corruption when windows crashes
On Thu, Dec 2, 2010 at 4:53 AM, Peter Sturge peter.stu...@gmail.com wrote: As I'm not familiar with the syncing in Lucene, I couldn't say whether there's a specific problem with regards Win7/2008 server etc. Windows has long had the somewhat odd behaviour of deliberately caching file handles after an explicit close(). This has been part of NTFS since NT 4 days, but there may be some new behaviour introduced in Windows 6.x (and there is a lot of new behaviour) that causes an issue. I have also seen this problem in Windows Server 2008 (server version of Win7 - same file system). I'll try some further testing on previous Windows versions, but I've not previously come across a single segment corruption on Win 2k3/XP after hard failures. In fact, it was when I first encountered this problem on Server 2008 that I even discovered CheckIndex existed! I guess a good question for the community is: Has anyone else seen/reproduced this problem on Windows 6.x (i.e. Server 2008 or Win7)? Mike, are there any diagnostics/config etc. that I could try to help isolate the problem? Actually it might be easiest to make a standalone Java test, maybe using Lucene's FSDir, that opens files in sequence (0.bin, 1.bin, 2.bin...), writes verifiable data to them (eg random bytes from a fixed seed) and then closes and syncs each one. Then, crash the box while this is running. Finally, run a verify step that checks that the data is correct, ie that our attempt to fsync worked. It could very well be that Windows 6.x is now smarter about fsync in that it only syncs bytes actually written with the currently open file descriptor, and not bytes written against the same file by past file descriptors (ie via a global buffer cache, like Linux). Mike
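Something along these lines, perhaps (plain java.io rather than FSDirectory, and the file size and paths are arbitrary; the writer deliberately mimics the write, close, reopen, sync pattern discussed above):

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.Arrays;
    import java.util.Random;

    public class FsyncCrashTest {
        static final int FILE_SIZE = 1 << 20; // 1 MB per file

        public static void main(String[] args) throws IOException {
            File dir = new File(args[1]);
            if (args[0].equals("write")) {
                write(dir);
            } else {
                verify(dir, Integer.parseInt(args[2])); // last file number reported as synced
            }
        }

        // Run this, then hard-crash the machine (pull the plug) while it is printing.
        static void write(File dir) throws IOException {
            for (int i = 0; ; i++) {
                byte[] data = new byte[FILE_SIZE];
                new Random(i).nextBytes(data); // verifiable contents: seed = file number
                File f = new File(dir, i + ".bin");
                RandomAccessFile out = new RandomAccessFile(f, "rw");
                out.write(data);
                out.close();
                // reopen and sync through a new descriptor, like the close-then-sync pattern under suspicion
                RandomAccessFile sync = new RandomAccessFile(f, "rw");
                sync.getFD().sync();
                sync.close();
                System.out.println("synced " + i + ".bin");
            }
        }

        // After reboot, every file reported as synced must match the bytes generated from its seed.
        static void verify(File dir, int lastSynced) throws IOException {
            for (int i = 0; i <= lastSynced; i++) {
                byte[] expected = new byte[FILE_SIZE];
                new Random(i).nextBytes(expected);
                byte[] actual = new byte[FILE_SIZE];
                RandomAccessFile in = new RandomAccessFile(new File(dir, i + ".bin"), "r");
                in.readFully(actual);
                in.close();
                System.out.println(i + ".bin " + (Arrays.equals(expected, actual) ? "OK" : "CORRUPT"));
            }
        }
    }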
Re: Troubles with forming query for solr.
Hello, would something along these lines: (field1:term AND field2:term AND field3:term)^2 OR (field1:term AND field2:term)^0.8 OR (field2:term AND field3:term)^0.5 work? You'll probably need to experiment with the boost values to get the desired result. Another option could be investigating the Dismax handler. On 1 December 2010 02:38, kolesman alekkolesni...@gmail.com wrote: Hi, I have some troubles with forming a query for Solr. Here is my task: I'm indexing objects with 3 fields, for example {field1, field2, field3}. In Solr's response I want to get objects in a special order: 1. Firstly I want to get objects where all 3 fields are matched 2. Then I want to get objects where ONLY field1 and field2 are matched 3. And finally I want to get objects where ONLY field2 and field3 are matched. Could you explain to me how to form a query for my task?
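For the dismax option mentioned above, the rough equivalent would be request parameters along these lines (the boost, tie and mm values are arbitrary starting points and would need the same experimentation):

    defType=dismax
    q=term
    qf=field1^2.0 field2^1.5 field3^1.0
    tie=1.0
    mm=1

With tie=1.0 the per-field scores are summed rather than only the best field counting, so documents matching in more of the fields tend to rank higher; it weights per field rather than per field combination, though, so it may not reproduce the exact ordering asked for.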
RE: Return Lucene DocId in Solr Results
I know the doc ids from one core have nothing to do with the other. I was going to use the docId returned from the first core in the solr results and store it in the second core that way the second core knows about the doc ids from the first core. So when you query the second core from the Filter in the first core you get returned a set of data that includes the docId from the first core that the document relates to. I have backed off from this approach and have a user defined primary key in the firstCore, which is stored as the reference in the secondCore and when the filter performs the search it goes off and queries the firstCore for each primary key and gets the lucene docId from the returned doc. Thanks, Steve -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 02:19 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results On the face of it, this doesn't make sense, so perhaps you can explain a bit.The doc IDs from one Solr instance have no relation to the doc IDs from another Solr instance. So anything that uses doc IDs from one Solr instance to create a filter on another instance doesn't seem to be something you'd want to do... Which may just mean I don't understand what you're trying to do. Can you back up a bit and describe the higher-level problem? This seems like it may be an XY problem, see: http://people.apache.org/~hossman/#xyproblem Best Erick On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: Hi, I was wondering how I would go about getting the lucene docid included in the results from a solr query? I've built a QueryParser to query another solr instance and and join the results of the two instances through the use of a Filter. The Filter needs the lucene docid to work. This is the only bit I'm missing right now. Thanks, Steve
Re: Tuning Solr caches with high commit rates (NRT)
Great thread, and exactly my problem :D I set up two Solr instances, one for updating the index and another for searching. When I perform an update, the search instance doesn't get the new documents. When I issue a commit on the searcher, it finds them. How can I tell the searcher to always look at the new index and not only the old one? Automatic refresh? XD
Re: Tuning Solr caches with high commit rates (NRT)
In order for the 'read-only' instance to see any new/updated documents, it needs to do a commit (since it's read-only, it is a commit of 0 documents). You can do this via a client service that issues periodic commits, or use autorefresh from within solrconfig.xml. Be careful that you don't do anything in the read-only instance that will change the underlying index - like optimize. Peter On Thu, Dec 2, 2010 at 12:51 PM, stockii st...@shopgate.com wrote: great thread and exactly my problems :D i set up two solr-instances, one for update the index and another for searching. When i perform an update. the search-instance dont get the new documents. when i start a commit on searcher he found it. how can i say the searcher that he alwas look not only the old index. automatic refresh ? XD -- View this message in context: http://lucene.472066.n3.nabble.com/Tuning-Solr-caches-with-high-commit-rates-NRT-tp1461275p2005738.html Sent from the Solr - User mailing list archive at Nabble.com.
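For example, the periodic commit against the read-only instance can be as simple as a cron job hitting its update handler (the URL is a placeholder for your search instance):

    curl "http://search-host:8983/solr/update?commit=true"

or POSTing <commit/> to the same /update URL.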
Re: Best practice for Delta every 2 Minutes.
In fact, having a master/slave where the master is the indexing/updating machine and the slave(s) are searchers is one of the recommended configurations. The replication is used in many, many sites so it's pretty solid. It's generally not recommended, though, to run separate instances on the *same* server. No matter how many cores/instances/etc, you're still running on the same physical hardware so I/O contention, memory issues, etc are still bounded by your hardware Best Erick On Thu, Dec 2, 2010 at 5:12 AM, stockii st...@shopgate.com wrote: at the time no OOM occurs. but we are not in correct live system ... i thougt maybe i get this problem ... we are running seven cores and each want be update very fast. only one core have a huge index with 28M docs. maybe it makes sense for the future to use solr with replication !? or can i runs two instances, one for search and one for updating ? or is there the danger of corrupt indizes ? -- View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-Delta-every-2-Minutes-tp1992714p2005108.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dataimport destroys our harddisks
The very first thing I'd ask is how much free space is on your disk when this occurs? Is it possible that you're simply filling up your disk? do note that an optimize may require up to 2X the size of your index if/when it occurs. Are you sure you aren't optimizing as you add items to your index? But I've never heard of Solr causing hard disk crashes, it doesn't do anything special but read/write... Best Erick 2010/12/2 Robert Gründler rob...@dubture.com Hi, we have a serious harddisk problem, and it's definitely related to a full-import from a relational database into a solr index. The first time it happened on our development server, where the raidcontroller crashed during a full-import of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2 of the harddisks where the solr index files are located stopped working (we needed to replace them). After the crash of the raid controller, we decided to move the development of solr/index related stuff to our local development machines. Yesterday i was running another full-import of ~10 Million documents on my local development machine, and during the import, a harddisk failure occurred. Since this failure, my harddisk activity seems to be around 100% all the time, even if no solr server is running at all. I've been googling the last 2 days to find some info about solr related harddisk problems, but i didn't find anything useful. Are there any steps we need to take care of in respect to harddisk failures when doing a full-import? Right now, our steps look like this: 1. Delete the current index 2. Restart solr, to load the updated schemas 3. Start the full import Initially, the solr index and the relational database were located on the same harddisk. After the crash, we moved the index to a separate harddisk, but nevertheless this harddisk crashed too. I'd really appreciate any hints on what we might do wrong when importing data, as we can't release this on our production servers when there's the risk of harddisk failures. thanks. -robert
Re: Return Lucene DocId in Solr Results
Sounds good, especially because your old scenario was fragile. The doc IDs in your first core could change as a result of a single doc deletion and optimize. So the doc IDs stored in the second core would then be wrong... Your user-defined unique key is definitely a better way to go. There are some tricks you could try if there are performance issues Best Erick On Thu, Dec 2, 2010 at 7:47 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: I know the doc ids from one core have nothing to do with the other. I was going to use the docId returned from the first core in the solr results and store it in the second core that way the second core knows about the doc ids from the first core. So when you query the second core from the Filter in the first core you get returned a set of data that includes the docId from the first core that the document relates to. I have backed off from this approach and have a user defined primary key in the firstCore, which is stored as the reference in the secondCore and when the filter performs the search it goes off and queries the firstCore for each primary key and gets the lucene docId from the returned doc. Thanks, Steve -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 02:19 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results On the face of it, this doesn't make sense, so perhaps you can explain a bit.The doc IDs from one Solr instance have no relation to the doc IDs from another Solr instance. So anything that uses doc IDs from one Solr instance to create a filter on another instance doesn't seem to be something you'd want to do... Which may just mean I don't understand what you're trying to do. Can you back up a bit and describe the higher-level problem? This seems like it may be an XY problem, see: http://people.apache.org/~hossman/#xyproblem Best Erick On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: Hi, I was wondering how I would go about getting the lucene docid included in the results from a solr query? I've built a QueryParser to query another solr instance and and join the results of the two instances through the use of a Filter. The Filter needs the lucene docid to work. This is the only bit I'm missing right now. Thanks, Steve
RE: SOLR Thesaurus
Hi Lee, Can you describe your thesaurus format (it's not exactly self-descriptive) and how you would like it to be applied? I gather you're referring to a thesaurus feature in another product (or product class)? Maybe if you describe that it would help too. Steve -Original Message- From: lee carroll [mailto:lee.a.carr...@googlemail.com] Sent: Thursday, December 02, 2010 3:56 AM To: solr-user@lucene.apache.org Subject: SOLR Thesaurus Hi List, Coming to and end of a proto type evaluation of SOLR (all very good etc etc) Getting to the point at looking at bells and whistles. Does SOLR have a thesuarus. Cant find any refrerence to one in the docs and on the wiki etc. (Apart from a few mail threads which describe the synonym.txt as a thesuarus) I mean something like: PT: BT: xxx,, NT: xxx,, RT:xxx,xxx,xxx Scope Note: xx, Like i say bells and whistles cheers Lee
RE: Return Lucene DocId in Solr Results
I would be interested in hearing about some ways to improve the algorithm. I have done a very straightforward Lucene query within a loop to get the docIds. Here's what I did to get it working where favsBean are objects returned from a query of the second core, but there is probably a better way to do it:

    private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, List<Favorites> favsBeans) throws ParseException {
        // open the core & get data directory
        String indexDir = req.getCore().getIndexDir();
        FSDirectory index = null;
        try {
            index = FSDirectory.open(new File(indexDir));
        } catch (IOException e) {
            throw new ParseException("IOException, cannot open the index at: " + indexDir + " " + e.getMessage());
        }

        int[] docIds = new int[favsBeans.size()];
        int i = 0;
        for (Favorites favBean : favsBeans) {
            String pkQueryString = "resourceId:" + favBean.getResourceId();
            Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, "resourceId", new StandardAnalyzer()).parse(pkQueryString);
            IndexSearcher searcher = null;
            TopScoreDocCollector collector = null;
            try {
                searcher = new IndexSearcher(index, true);
                collector = TopScoreDocCollector.create(1, true);
                searcher.search(pkQuery, collector);
            } catch (IOException e) {
                throw new ParseException("IOException, cannot search the index at: " + indexDir + " " + e.getMessage());
            }
            ScoreDoc[] hits = collector.topDocs().scoreDocs;
            if (hits != null && hits[0] != null) {
                docIds[i] = hits[0].doc;
                i++;
            }
        }
        Arrays.sort(docIds);
        return docIds;
    }

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 13:46 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results Sounds good, especially because your old scenario was fragile. The doc IDs in your first core could change as a result of a single doc deletion and optimize. So the doc IDs stored in the second core would then be wrong... Your user-defined unique key is definitely a better way to go. There are some tricks you could try if there are performance issues Best Erick On Thu, Dec 2, 2010 at 7:47 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: I know the doc ids from one core have nothing to do with the other. I was going to use the docId returned from the first core in the solr results and store it in the second core that way the second core knows about the doc ids from the first core. So when you query the second core from the Filter in the first core you get returned a set of data that includes the docId from the first core that the document relates to. I have backed off from this approach and have a user defined primary key in the firstCore, which is stored as the reference in the secondCore and when the filter performs the search it goes off and queries the firstCore for each primary key and gets the lucene docId from the returned doc. Thanks, Steve -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 02:19 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results On the face of it, this doesn't make sense, so perhaps you can explain a bit.The doc IDs from one Solr instance have no relation to the doc IDs from another Solr instance. So anything that uses doc IDs from one Solr instance to create a filter on another instance doesn't seem to be something you'd want to do... Which may just mean I don't understand what you're trying to do. Can you back up a bit and describe the higher-level problem?
This seems like it may be an XY problem, see: http://people.apache.org/~hossman/#xyproblem Best Erick On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: Hi, I was wondering how I would go about getting the lucene docid included in the results from a solr query? I've built a QueryParser to query another solr instance and and join the results of the two instances through the use of a Filter. The Filter needs the lucene docid to work. This is the only bit I'm missing right now. Thanks, Steve
Re: Return Lucene DocId in Solr Results
Ahhh, you're already down in Lucene. That makes things easier... See TermDocs. Particularly seek(Term). That'll directly access the indexed unique key rather than having to form a bunch of queries. Best Erick On Thu, Dec 2, 2010 at 8:59 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: I would be interested in hearing about some ways to improve the algorithm. I have done a very straightforward Lucene query within a loop to get the docIds. Here's what I did to get it working where favsBean are objects returned from a query of the second core, but there is probably a better way to do it: private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, ListFavorites favsBeans) throws ParseException { // open the core get data directory String indexDir = req.getCore().getIndexDir(); FSDirectory index = null; try { index = FSDirectory.open(new File(indexDir)); } catch (IOException e) { throw new ParseException(IOException, cannot open the index at: + indexDir + + e.getMessage()); } int[] docIds = new int[favsBeans.size()]; int i = 0; for(Favorites favBean: favsBeans) { String pkQueryString = resourceId: + favBean.getResourceId(); Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, resourceId, new StandardAnalyzer()).parse(pkQueryString); IndexSearcher searcher = null; TopScoreDocCollector collector = null; try { searcher = new IndexSearcher(index, true); collector = TopScoreDocCollector.create(1, true); searcher.search(pkQuery, collector); } catch (IOException e) { throw new ParseException(IOException, cannot search the index at: + indexDir + + e.getMessage()); } ScoreDoc[] hits = collector.topDocs().scoreDocs; if(hits != null hits[0] != null) { docIds[i] = hits[0].doc; i++; } } Arrays.sort(docIds); return docIds; } -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 13:46 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results Sounds good, especially because your old scenario was fragile. The doc IDs in your first core could change as a result of a single doc deletion and optimize. So the doc IDs stored in the second core would then be wrong... Your user-defined unique key is definitely a better way to go. There are some tricks you could try if there are performance issues Best Erick On Thu, Dec 2, 2010 at 7:47 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: I know the doc ids from one core have nothing to do with the other. I was going to use the docId returned from the first core in the solr results and store it in the second core that way the second core knows about the doc ids from the first core. So when you query the second core from the Filter in the first core you get returned a set of data that includes the docId from the first core that the document relates to. I have backed off from this approach and have a user defined primary key in the firstCore, which is stored as the reference in the secondCore and when the filter performs the search it goes off and queries the firstCore for each primary key and gets the lucene docId from the returned doc. Thanks, Steve -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 02:19 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results On the face of it, this doesn't make sense, so perhaps you can explain a bit.The doc IDs from one Solr instance have no relation to the doc IDs from another Solr instance. 
So anything that uses doc IDs from one Solr instance to create a filter on another instance doesn't seem to be something you'd want to do... Which may just mean I don't understand what you're trying to do. Can you back up a bit and describe the higher-level problem? This seems like it may be an XY problem, see: http://people.apache.org/~hossman/#xyproblem Best Erick On Tue, Nov 30, 2010 at 6:57 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: Hi, I was wondering how I would go about getting the lucene docid included in the results from a solr query? I've built a QueryParser to query another solr instance and and join the results of the two instances through the use of a Filter. The Filter needs the lucene docid to work. This is the only bit I'm missing right now. Thanks, Steve
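A rough sketch of that TermDocs approach (the class and method names are made up for illustration; this assumes the Lucene 3.x API and the same resourceId unique key field as in the code above, and note that doc() is only read after next() has returned true):

    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    class PrimaryKeyDocIdLookup {
        // Resolve a batch of unique key values to Lucene doc ids with a single TermDocs enum.
        static int[] docIdsForKeys(IndexReader reader, String[] resourceIds) throws IOException {
            int[] docIds = new int[resourceIds.length];
            int n = 0;
            TermDocs termDocs = reader.termDocs();
            try {
                for (String id : resourceIds) {
                    termDocs.seek(new Term("resourceId", id));
                    if (termDocs.next()) { // positions the enum on the first (and, for a unique key, only) match
                        docIds[n++] = termDocs.doc();
                    }
                }
            } finally {
                termDocs.close();
            }
            int[] found = Arrays.copyOf(docIds, n);
            Arrays.sort(found);
            return found;
        }
    }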
Re: Dataimport destroys our harddisks
The very first thing I'd ask is how much free space is on your disk when this occurs? Is it possible that you're simply filling up your disk? no, i've checked that already. all disks have plenty of space (they have a capacity of 2TB, and are currently filled up to 20%. do note that an optimize may require up to 2X the size of your index if/when it occurs. Are you sure you aren't optimizing as you add items to your index? index size is not a problem in our case. Our index currently has about 3GB. What do you mean with optimizing as you add items to your index? But I've never heard of Solr causing hard disk crashes, neither did we, and google is the same opinion. One thing that i've found is the mergeFactor value: http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor Our sysadmin speculates that maybe the chunk size of our raid/harddisks and the segment size of the lucene index does not play well together. Does the lucene segment size affect how the data is written to the disk? thanks for your help. -robert Best Erick 2010/12/2 Robert Gründler rob...@dubture.com Hi, we have a serious harddisk problem, and it's definitely related to a full-import from a relational database into a solr index. The first time it happened on our development server, where the raidcontroller crashed during a full-import of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2 of the harddisks where the solr index files are located stopped working (we needed to replace them). After the crash of the raid controller, we decided to move the development of solr/index related stuff to our local development machines. Yesterday i was running another full-import of ~10 Million documents on my local development machine, and during the import, a harddisk failure occurred. Since this failure, my harddisk activity seems to be around 100% all the time, even if no solr server is running at all. I've been googling the last 2 days to find some info about solr related harddisk problems, but i didn't find anything useful. Are there any steps we need to take care of in respect to harddisk failures when doing a full-import? Right now, our steps look like this: 1. Delete the current index 2. Restart solr, to load the updated schemas 3. Start the full import Initially, the solr index and the relational database were located on the same harddisk. After the crash, we moved the index to a separate harddisk, but nevertheless this harddisk crashed too. I'd really appreciate any hints on what we might do wrong when importing data, as we can't release this on our production servers when there's the risk of harddisk failures. thanks. -robert
Re: Dataimport destroys our harddisks
What Raid controller do you use, and what kernel version? (Assuming Linux). We hade problems during high load with a 3Ware raid controller and the current kernel for Ubuntu 10.04, we hade to downgrade the kernel... The problem was a bug in the driver that only showed up with very high disk load (as is the case when doing imports) /Sven 2010/12/2 Robert Gründler rob...@dubture.com: The very first thing I'd ask is how much free space is on your disk when this occurs? Is it possible that you're simply filling up your disk? no, i've checked that already. all disks have plenty of space (they have a capacity of 2TB, and are currently filled up to 20%. do note that an optimize may require up to 2X the size of your index if/when it occurs. Are you sure you aren't optimizing as you add items to your index? index size is not a problem in our case. Our index currently has about 3GB. What do you mean with optimizing as you add items to your index? But I've never heard of Solr causing hard disk crashes, neither did we, and google is the same opinion. One thing that i've found is the mergeFactor value: http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor Our sysadmin speculates that maybe the chunk size of our raid/harddisks and the segment size of the lucene index does not play well together. Does the lucene segment size affect how the data is written to the disk? thanks for your help. -robert Best Erick 2010/12/2 Robert Gründler rob...@dubture.com Hi, we have a serious harddisk problem, and it's definitely related to a full-import from a relational database into a solr index. The first time it happened on our development server, where the raidcontroller crashed during a full-import of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2 of the harddisks where the solr index files are located stopped working (we needed to replace them). After the crash of the raid controller, we decided to move the development of solr/index related stuff to our local development machines. Yesterday i was running another full-import of ~10 Million documents on my local development machine, and during the import, a harddisk failure occurred. Since this failure, my harddisk activity seems to be around 100% all the time, even if no solr server is running at all. I've been googling the last 2 days to find some info about solr related harddisk problems, but i didn't find anything useful. Are there any steps we need to take care of in respect to harddisk failures when doing a full-import? Right now, our steps look like this: 1. Delete the current index 2. Restart solr, to load the updated schemas 3. Start the full import Initially, the solr index and the relational database were located on the same harddisk. After the crash, we moved the index to a separate harddisk, but nevertheless this harddisk crashed too. I'd really appreciate any hints on what we might do wrong when importing data, as we can't release this on our production servers when there's the risk of harddisk failures. thanks. -robert
Re: Dataimport destroys our harddisks
On Dec 2, 2010, at 15:43 , Sven Almgren wrote: What Raid controller do you use, and what kernel version? (Assuming Linux). We hade problems during high load with a 3Ware raid controller and the current kernel for Ubuntu 10.04, we hade to downgrade the kernel... The problem was a bug in the driver that only showed up with very high disk load (as is the case when doing imports) We're running freebsd: RaidController 3ware 9500S-8 Corrupt unit: Raid-10 3725.27GB 256K Stripe Size without BBU Freebsd 7.2, UFS Filesystem. /Sven 2010/12/2 Robert Gründler rob...@dubture.com: The very first thing I'd ask is how much free space is on your disk when this occurs? Is it possible that you're simply filling up your disk? no, i've checked that already. all disks have plenty of space (they have a capacity of 2TB, and are currently filled up to 20%. do note that an optimize may require up to 2X the size of your index if/when it occurs. Are you sure you aren't optimizing as you add items to your index? index size is not a problem in our case. Our index currently has about 3GB. What do you mean with optimizing as you add items to your index? But I've never heard of Solr causing hard disk crashes, neither did we, and google is the same opinion. One thing that i've found is the mergeFactor value: http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor Our sysadmin speculates that maybe the chunk size of our raid/harddisks and the segment size of the lucene index does not play well together. Does the lucene segment size affect how the data is written to the disk? thanks for your help. -robert Best Erick 2010/12/2 Robert Gründler rob...@dubture.com Hi, we have a serious harddisk problem, and it's definitely related to a full-import from a relational database into a solr index. The first time it happened on our development server, where the raidcontroller crashed during a full-import of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2 of the harddisks where the solr index files are located stopped working (we needed to replace them). After the crash of the raid controller, we decided to move the development of solr/index related stuff to our local development machines. Yesterday i was running another full-import of ~10 Million documents on my local development machine, and during the import, a harddisk failure occurred. Since this failure, my harddisk activity seems to be around 100% all the time, even if no solr server is running at all. I've been googling the last 2 days to find some info about solr related harddisk problems, but i didn't find anything useful. Are there any steps we need to take care of in respect to harddisk failures when doing a full-import? Right now, our steps look like this: 1. Delete the current index 2. Restart solr, to load the updated schemas 3. Start the full import Initially, the solr index and the relational database were located on the same harddisk. After the crash, we moved the index to a separate harddisk, but nevertheless this harddisk crashed too. I'd really appreciate any hints on what we might do wrong when importing data, as we can't release this on our production servers when there's the risk of harddisk failures. thanks. -robert
Re: Dataimport destroys our harddisks
That's the same series we use... we hade problems when running other disk-heavy operations like rsync and backup on them too.. But in our case we mostly had hangs or load 180 :P... Can you simulate very heavy random disk i/o? if so then you could check if you still have the same problems... That's all I can be of help with, good luck :) /Sven 2010/12/2 Robert Gründler rob...@dubture.com: On Dec 2, 2010, at 15:43 , Sven Almgren wrote: What Raid controller do you use, and what kernel version? (Assuming Linux). We hade problems during high load with a 3Ware raid controller and the current kernel for Ubuntu 10.04, we hade to downgrade the kernel... The problem was a bug in the driver that only showed up with very high disk load (as is the case when doing imports) We're running freebsd: RaidController 3ware 9500S-8 Corrupt unit: Raid-10 3725.27GB 256K Stripe Size without BBU Freebsd 7.2, UFS Filesystem. /Sven 2010/12/2 Robert Gründler rob...@dubture.com: The very first thing I'd ask is how much free space is on your disk when this occurs? Is it possible that you're simply filling up your disk? no, i've checked that already. all disks have plenty of space (they have a capacity of 2TB, and are currently filled up to 20%. do note that an optimize may require up to 2X the size of your index if/when it occurs. Are you sure you aren't optimizing as you add items to your index? index size is not a problem in our case. Our index currently has about 3GB. What do you mean with optimizing as you add items to your index? But I've never heard of Solr causing hard disk crashes, neither did we, and google is the same opinion. One thing that i've found is the mergeFactor value: http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor Our sysadmin speculates that maybe the chunk size of our raid/harddisks and the segment size of the lucene index does not play well together. Does the lucene segment size affect how the data is written to the disk? thanks for your help. -robert Best Erick 2010/12/2 Robert Gründler rob...@dubture.com Hi, we have a serious harddisk problem, and it's definitely related to a full-import from a relational database into a solr index. The first time it happened on our development server, where the raidcontroller crashed during a full-import of ~ 8 Million documents. This happened 2 weeks ago, and in this period 2 of the harddisks where the solr index files are located stopped working (we needed to replace them). After the crash of the raid controller, we decided to move the development of solr/index related stuff to our local development machines. Yesterday i was running another full-import of ~10 Million documents on my local development machine, and during the import, a harddisk failure occurred. Since this failure, my harddisk activity seems to be around 100% all the time, even if no solr server is running at all. I've been googling the last 2 days to find some info about solr related harddisk problems, but i didn't find anything useful. Are there any steps we need to take care of in respect to harddisk failures when doing a full-import? Right now, our steps look like this: 1. Delete the current index 2. Restart solr, to load the updated schemas 3. Start the full import Initially, the solr index and the relational database were located on the same harddisk. After the crash, we moved the index to a separate harddisk, but nevertheless this harddisk crashed too. 
I'd really appreciate any hints on what we might do wrong when importing data, as we can't release this on our production servers when there's the risk of harddisk failures. thanks. -robert
RE: Return Lucene DocId in Solr Results
I must be missing something as I'm getting a NPE on the line: docIds[i] = termDocs.doc(); here's what I came up with: private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, ListFavorites favsBeans) throws ParseException { // open the core get data directory String indexDir = req.getCore().getIndexDir(); FSDirectory indexDirectory = null; try { indexDirectory = FSDirectory.open(new File(indexDir)); } catch (IOException e) { throw new ParseException(IOException, cannot open the index at: + indexDir + + e.getMessage()); } //String pkQueryString = resourceId: + favBean.getResourceId(); //Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, resourceId, new StandardAnalyzer()).parse(pkQueryString); IndexSearcher searcher = null; TopScoreDocCollector collector = null; IndexReader indexReader = null; TermDocs termDocs = null; try { searcher = new IndexSearcher(indexDirectory, true); indexReader = new FilterIndexReader(searcher.getIndexReader()); termDocs = indexReader.termDocs(); } catch (IOException e) { throw new ParseException(IOException, cannot open the index at: + indexDir + + e.getMessage()); } int[] docIds = new int[favsBeans.size()]; int i = 0; for(Favorites favBean: favsBeans) { Term term = new Term(resourceId, favBean.getResourceId()); try { termDocs.seek(term); docIds[i] = termDocs.doc(); } catch (IOException e) { throw new ParseException(IOException, cannot seek to the primary key + favBean.getResourceId() + in : + indexDir + + e.getMessage()); } //ScoreDoc[] hits = collector.topDocs().scoreDocs; //if(hits != null hits[0] != null) { i++; //} } Arrays.sort(docIds); return docIds; } Thanks, Steve -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 14:20 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results Ahhh, you're already down in Lucene. That makes things easier... See TermDocs. Particularly seek(Term). That'll directly access the indexed unique key rather than having to form a bunch of queries. Best Erick On Thu, Dec 2, 2010 at 8:59 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: I would be interested in hearing about some ways to improve the algorithm. I have done a very straightforward Lucene query within a loop to get the docIds. 
Here's what I did to get it working where favsBean are objects returned from a query of the second core, but there is probably a better way to do it: private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, ListFavorites favsBeans) throws ParseException { // open the core get data directory String indexDir = req.getCore().getIndexDir(); FSDirectory index = null; try { index = FSDirectory.open(new File(indexDir)); } catch (IOException e) { throw new ParseException(IOException, cannot open the index at: + indexDir + + e.getMessage()); } int[] docIds = new int[favsBeans.size()]; int i = 0; for(Favorites favBean: favsBeans) { String pkQueryString = resourceId: + favBean.getResourceId(); Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, resourceId, new StandardAnalyzer()).parse(pkQueryString); IndexSearcher searcher = null; TopScoreDocCollector collector = null; try { searcher = new IndexSearcher(index, true); collector = TopScoreDocCollector.create(1, true); searcher.search(pkQuery, collector); } catch (IOException e) { throw new ParseException(IOException, cannot search the index at: + indexDir + + e.getMessage()); } ScoreDoc[] hits = collector.topDocs().scoreDocs; if(hits != null hits[0] != null) { docIds[i] = hits[0].doc; i++; } } Arrays.sort(docIds); return docIds; } -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 13:46 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results Sounds good, especially because your old scenario was fragile. The doc IDs in your first core could change as a result of a single doc deletion and optimize. So the doc IDs stored in the second core would then be wrong... Your user-defined unique key is definitely a better way to go. There are some tricks you could try if there are performance issues Best Erick On Thu, Dec 2, 2010 at 7:47 AM, Lohrenz,
Multi-valued poly fields search
Hi, (should this be on the solr-dev mailing list?) I have this kind of data, about articles in newspapers:

article A-001
. published on 2010-10-31, in newspaper N-1, edition E1
. published on 2010-10-30, in newspaper N-2, edition E2
article A-002
. published on 2010-10-30, in newspaper N-1, edition E1

I have to be able to search on those sub-fields, eg: all articles published on 2010-10-30 in newspaper N-1 (all editions). I expect to find document A-002, but not document A-001. I control the indexing, analyzers,... but I would like to use standard Solr query syntax (or an extension of it). If I index those documents:

<add>
  <doc>
    <field name="id">A-001</field>
    <field name="pubDate">2010-10-31</field>
    <field name="ns">N-1</field>
    <field name="ed">E1</field>
    <field name="pubDate">2010-10-30</field>
    <field name="ns">N-2</field>
    <field name="ed">E2</field>
  </doc>
  <doc>
    <field name="id">A-002</field>
    <field name="pubDate">2010-10-30</field>
    <field name="ns">N-1</field>
    <field name="ed">E1</field>
  </doc>
</add>

(ie: flattening the structure, losing the link between newspapers and dates) then a search for pubDate=2010-10-30 AND ns=N-1 will give me both documents (because A-001 has been published in newspaper N-1 (at another date) and has been published on 2010-10-30 (but in another newspaper)). Is there any way to index the data/express the search/... to be able to find only document A-002? In Solr terms, I believe that this is a multi-valued poly field (not yet in the current stable version 1.4...). Will this be supported by the next release? (what syntax?)

Some ideas that I've had (usable with Solr 1.4):

(1) Add fields like this for doc A-001:
<field name="combined">N-1/E1/2010-10-31</field>
<field name="combined">N-2/E2/2010-10-30</field>
and make a wildcard search N-1/*/2010-10-30. This will work for simple queries, but:
. I think that it will not allow range queries: all articles published in newspaper N-1 between 2009-08-01 and 2010-10-15
. a wildcard query on N-1/E2/* will be very inefficient!
. writing queries will be more difficult (sometimes the user has to use the field ns, sometimes the field combined,...)

(2) Make the simple query pubDate=2010-10-30 AND ns=N-1, but filter the results (the above query will give all correct results, plus some more). This is not a generic solution, and writing the filter will be difficult if the query is more complex: (pubDate=2010-10-31 AND ns=N-1) OR (text contains Barack)

(3) On the same field as (1) here above, use an analyzer that will cheat the proximity search, issuing the following terms:
term 1: ns:N-1
term 2: ed:E1
term 3: pubDate:2010-10-31
term 11: ns:N-2
term 12: ed:E2
term 13: pubDate:2010-10-30
... then a proximity search (combined:ns:N-1 AND combined:pubDate:2010-10-30)~3 will give me only document A-002, not document A-001. Again, this will cause problems with range queries, won't it?

Isn't there any better way to do this? Ideally, I would index this (with my own syntax...):

<doc>
  <field name="id">A-001</field>
  <field name="pubDate" set="1">2010-10-31</field>
  <field name="ns" set="1">N-1</field>
  <field name="ed" set="1">E1</field>
  <field name="pubDate" set="2">2010-10-30</field>
  <field name="ns" set="2">N-2</field>
  <field name="ed" set="2">E2</field>
</doc>

and then search: (pubDate=2010-10-31 AND ns=N-1){sameSet} or something like this... I've found references to similar questions, but no answer that I could use in my case (this one being the closest to my problem: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201001.mbox/%3c9b742a34aa31814594f2bed8dfd9cceec96ca...@sparky.office.techtarget.com%3e or http://tinyurl.com/3527w4u). Thanks in advance for your ideas!
(and sorry for any english mistakes)
Re: SOLR Thesaurus
Hello Lee, these bells sound like SKOS ;o) AFAIK Solr does not support thesauri, just plain flat synonym lists. One could implement a thesaurus filter and put it at the end of the analyzer chain of Solr. The filter would then do a thesaurus lookup for each token it receives and possibly
* expand the query, or
* kind of stem document tokens to some preferred variants according to the thesaurus,
maybe even taking term relations from the thesaurus into account and boosting queries or doc fields at index time. Maybe have a look at http://poolparty.punkt.at/ a full-featured SKOS thesaurus management server. It also provides web services which could feed such a Solr filter. Kind regards Michael ----- Original Message ----- From: lee carroll lee.a.carr...@googlemail.com To: solr-user@lucene.apache.org Sent: Thursday, 2 December 2010 09:55:54 Subject: SOLR Thesaurus Hi List, Coming to the end of a prototype evaluation of SOLR (all very good etc etc). Getting to the point of looking at bells and whistles. Does SOLR have a thesaurus? Can't find any reference to one in the docs or on the wiki etc. (apart from a few mail threads which describe synonym.txt as a thesaurus) I mean something like: PT: xxx, BT: xxx,xxx, NT: xxx,xxx, RT: xxx,xxx,xxx, Scope Note: xxx. Like I say, bells and whistles. cheers Lee
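For anyone wanting to try this, a minimal sketch of such a filter (the class name, the in-memory Map and how the thesaurus gets loaded are all made up for illustration; it assumes the Lucene 3.x attribute API, and to use it from schema.xml you would still need to wrap it in a TokenFilterFactory):

    import java.io.IOException;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;
    import java.util.Map;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
    import org.apache.lucene.util.AttributeSource;

    public final class ThesaurusExpansionFilter extends TokenFilter {
        private final Map<String, List<String>> thesaurus; // term -> preferred/broader/narrower/related variants
        private final Deque<String> pending = new ArrayDeque<String>();
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
        private AttributeSource.State saved;

        public ThesaurusExpansionFilter(TokenStream input, Map<String, List<String>> thesaurus) {
            super(input);
            this.thesaurus = thesaurus;
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!pending.isEmpty()) {
                // emit a thesaurus variant stacked at the same position as the original token
                restoreState(saved);
                termAtt.setEmpty().append(pending.pop());
                posIncAtt.setPositionIncrement(0);
                return true;
            }
            if (!input.incrementToken()) {
                return false;
            }
            List<String> variants = thesaurus.get(termAtt.toString());
            if (variants != null && !variants.isEmpty()) {
                pending.addAll(variants);
                saved = captureState();
            }
            return true;
        }
    }

Used at query time this expands the user's term into its thesaurus variants; at index time you might instead keep only the preferred variant, i.e. a normalising rather than expanding filter.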
Re: SOLR Thesaurus
Hi Stephen, yes sorry should have been more plain a term can have a Prefered Term (PT), many Broader Terms (BT), Many Narrower Terms (NT) Related Terms (RT) etc So User supplied Term is say : Ski Prefered term: Skiing Broader terms could be : Ski and Snow Boarding, Mountain Sports, Sports Narrower terms: down hill skiing, telemark, cross country Related terms: boarding, snow boarding, winter holidays Michael, yes exactly, SKOS, although maybe without the over wheening ambition to take over the world. By the sounds of it though out of the box you get a simple (but pretty effective synonym list and ring) Anything more we'd need to write it ourselfs ie your thesaurus filter and plus a change to the response as broader terms, narrower terms etc would be good to be suggested to the ui. No plugins out there ? On 2 December 2010 16:16, Michael Zach za...@punkt.at wrote: Hello Lee, these bells sound like SKOS ;o) AFAIK Solr does not support thesauri just plain flat synonym lists. One could implement a thesaurus filter and put it into the end of the analyzer chain of solr. The filter would then do a thesaurus lookup for each token it receives and possibly * expand the query or * kind of stem document tokens to some prefered variants according to the thesaurus Maybe even taking term relations from thesaurus into account and boost queries or doc fields at index time. Maybe have a look at http://poolparty.punkt.at/ a full features SKOS thesaurus management server. It's also providing webservices which could feed such a Solr filter. Kind regards Michael - Ursprüngliche Mail - Von: lee carroll lee.a.carr...@googlemail.com An: solr-user@lucene.apache.org Gesendet: Donnerstag, 2. Dezember 2010 09:55:54 Betreff: SOLR Thesaurus Hi List, Coming to and end of a proto type evaluation of SOLR (all very good etc etc) Getting to the point at looking at bells and whistles. Does SOLR have a thesuarus. Cant find any refrerence to one in the docs and on the wiki etc. (Apart from a few mail threads which describe the synonym.txt as a thesuarus) I mean something like: PT: BT: xxx,, NT: xxx,, RT:xxx,xxx,xxx Scope Note: xx, Like i say bells and whistles cheers Lee
Import Data Into Solr
Hi all, I am a new user of Solr. Before using it, all of my data was indexed directly with Lucene. According to Chapter 3 of the book Solr 1.4 Enterprise Search Server, written by David Smiley and Eric Pugh, data in the formats of XML, CSV and even PDF, etc. can be imported into Solr. If I wish to import the existing Lucene indexes into Solr, are there any other approaches? I know that Solr is a serverized Lucene. Thanks, Bing Li
Re: Dynamically change master
Back with my master resiliency need: talking with Upayavira we discovered we were proposing the same solution :-) This can be useful if you don't have a VIP with a master/backup polling policy. It goes like this: there are 2 hosts for indexing, one is the main one and one is the backup; the backup is a slave of the main one, and the main one is also master of N hosts which will be used for searching. If the main master goes down then the backup one will be used for indexing and/or serving search slaves. This last part can be done by defining an external properties file for each search slave which contains the URL of the master (referenced inside the replication request handler tag of solrconfig.xml), so if these search slaves run multi-core one only has to change the properties file URL and issue http://SLAVEURL/solr/admin/cores?action=RELOAD&core=core0 to start polling the backup master. Cheers, Tommaso 2010/12/1 Tommaso Teofili tommaso.teof...@gmail.com Thanks Upayavira, that sounds very good. p.s.: I read that page some weeks ago and didn't get back to check on it. 2010/12/1 Upayavira u...@odoko.co.uk Note, all extracted from http://wiki.apache.org/solr/SolrReplication You'd put: <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> <!-- Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. --> <str name="replicateAfter">startup</str> <str name="replicateAfter">commit</str> </lst> </requestHandler> into every box you want to be able to act as a master, then use: http://slave_host:port/solr/replication?command=fetchindex&masterUrl=<your master URL> As the above page says better than I can, it is possible to pass an extra attribute 'masterUrl' or other attributes like 'compression' (or any other parameter which is specified in the <lst name="slave"> tag) to do a one-time replication from a master. This obviates the need for hardcoding the master in the slave. HTH, Upayavira On Wed, 01 Dec 2010 06:24 +0100, Tommaso Teofili tommaso.teof...@gmail.com wrote: Hi Upayavira, this is a good start for solving my problem, can you please tell me what such a replication URL looks like? Thanks, Tommaso 2010/12/1 Upayavira u...@odoko.co.uk Hi Tommaso, I believe you can tell each server to act as a master (which means it can have its indexes pulled from it). You can then include the master hostname in the URL that triggers a replication process. Thus, if you triggered replication from outside Solr, you'd have control over which master you pull from. Does this answer your question? Upayavira On Tue, 30 Nov 2010 09:18 -0800, Ken Krugler kkrugler_li...@transpac.com wrote: Hi Tommaso, On Nov 30, 2010, at 7:41am, Tommaso Teofili wrote: Hi all, in a replication environment, if the host where the master is running goes down for some reason, is there a way to tell the slaves to point to a different (backup) master without manually changing configuration (and restarting the slaves or their cores)? Basically I'd like to be able to change the replication master dynamically inside the slaves. Do you have any idea of how this could be achieved? One common approach is to use VIP (virtual IP) support provided by load balancers. Your slaves are configured to use a VIP to talk to the master, so that it's easy to dynamically change which master they use, via updates to the load balancer config. -- Ken -- Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
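As a rough illustration of the failover step described above (detect that the main master is down, then issue a one-time fetchindex against the backup master), here is a sketch using only the JDK's HttpURLConnection. The host names and the ping URL are made-up examples; the fetchindex and masterUrl parameters are the ones documented on the SolrReplication wiki page quoted above.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Tiny failover helper: if the main master stops answering, point a slave
 *  at the backup master with a one-time fetchindex (all hosts are examples). */
public class ReplicationFailover {

    static boolean isAlive(String pingUrl) {
        try {
            HttpURLConnection c = (HttpURLConnection) new URL(pingUrl).openConnection();
            c.setConnectTimeout(2000);
            c.setReadTimeout(2000);
            return c.getResponseCode() == 200;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String mainMaster   = "http://master1:8983/solr";  // example host
        String backupMaster = "http://master2:8983/solr";  // example host
        String slave        = "http://slave1:8983/solr";   // example host

        if (!isAlive(mainMaster + "/admin/ping")) {
            // One-time replication from the backup master, as described on the wiki page.
            String url = slave + "/replication?command=fetchindex&masterUrl="
                    + URLEncoder.encode(backupMaster + "/replication", "UTF-8");
            HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
            System.out.println("fetchindex returned HTTP " + c.getResponseCode());
        }
    }
}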
TermsComponent prefix query with field analyzers
Hi everyone, does anyone know how to apply some analyzers to a prefix query? What I'm looking for is a way to build an autosuggest using the TermsComponent that is able to remove the accents from the query's prefix. For example, I have the term analisis in the index and I want to retrieve it with the prefix Análi (notice the accent on the third letter). I think the regexp function won't help here, so I was wondering whether specifying some analyzers (LowerCase and ASCIIFolding) in the TermsComponent configuration would make them apply to the prefix. Thanks in advance. Nestor
disabled replication setting
For Solr replication, we can send a command to disable replication. Does anyone know where I can verify the replication enabled/disabled setting? I cannot seem to find it on the dashboard or in the details command output. Thanks, Xin
Exceptions in Embedded Solr
Hi everyone, I suddenly get the exception below when using Embedded Solr. If I delete the Solr index it goes back to normal, but it obviously has to start indexing from scratch. Any idea what the cause of this is? java.lang.RuntimeException: java.io.FileNotFoundException: /home/evanthika/WSO2/CARBON/GREG/3.6.0/23-11-2010/normal/wso2greg-3.6.0/solr/data/index/segments_2 (No such file or directory) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.init(SolrCore.java:579) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.wso2.carbon.registry.indexing.solr.SolrClient.init(SolrClient.java:103) at org.wso2.carbon.registry.indexing.solr.SolrClient.getInstance(SolrClient.java:115) ... 44 more Caused by: java.io.FileNotFoundException: /home/evanthika/WSO2/CARBON/GREG/3.6.0/23-11-2010/normal/wso2greg-3.6.0/solr/data/index/segments_2 (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:212) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.init(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.init(SimpleFSDirectory.java:108) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.init(NIOFSDirectory.java:94) at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:691) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:236) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:72) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at org.apache.lucene.index.IndexReader.open(IndexReader.java:403) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057) ... 48 more [2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.SolrCore} - REFCOUNT ERROR: unreferenced org.apache.solr.core.solrc...@58f24b6 (null) has a reference count of 1 [2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.SolrCore} - REFCOUNT ERROR: unreferenced org.apache.solr.core.solrc...@654dbbf6 (null) has a reference count of 1 [2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.CoreContainer} - CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! [2010-11-23 14:14:46,568] ERROR {org.apache.solr.core.CoreContainer} - CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! -- Regards, Tharindu
RE: disabled replication setting
Does anyone know? Thanks, -Original Message- From: Xin Li [mailto:xin.li@gmail.com] Sent: Thursday, December 02, 2010 12:25 PM To: solr-user@lucene.apache.org Subject: disabled replication setting For Solr replication, we can send a command to disable replication. Does anyone know where I can verify the replication enabled/disabled setting? I cannot seem to find it on the dashboard or in the details command output. Thanks, Xin
Re: Return Lucene DocId in Solr Results
You have to call termDocs.next() after termDocs.seek. Something like termDocs.seek(). if (termDocs.next()) { // means there was a term/doc matching and your references should be valid. } On Thu, Dec 2, 2010 at 10:22 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: I must be missing something as I'm getting a NPE on the line: docIds[i] = termDocs.doc(); here's what I came up with: private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, ListFavorites favsBeans) throws ParseException { // open the core get data directory String indexDir = req.getCore().getIndexDir(); FSDirectory indexDirectory = null; try { indexDirectory = FSDirectory.open(new File(indexDir)); } catch (IOException e) { throw new ParseException(IOException, cannot open the index at: + indexDir + + e.getMessage()); } //String pkQueryString = resourceId: + favBean.getResourceId(); //Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, resourceId, new StandardAnalyzer()).parse(pkQueryString); IndexSearcher searcher = null; TopScoreDocCollector collector = null; IndexReader indexReader = null; TermDocs termDocs = null; try { searcher = new IndexSearcher(indexDirectory, true); indexReader = new FilterIndexReader(searcher.getIndexReader()); termDocs = indexReader.termDocs(); } catch (IOException e) { throw new ParseException(IOException, cannot open the index at: + indexDir + + e.getMessage()); } int[] docIds = new int[favsBeans.size()]; int i = 0; for(Favorites favBean: favsBeans) { Term term = new Term(resourceId, favBean.getResourceId()); try { termDocs.seek(term); docIds[i] = termDocs.doc(); } catch (IOException e) { throw new ParseException(IOException, cannot seek to the primary key + favBean.getResourceId() + in : + indexDir + + e.getMessage()); } //ScoreDoc[] hits = collector.topDocs().scoreDocs; //if(hits != null hits[0] != null) { i++; //} } Arrays.sort(docIds); return docIds; } Thanks, Steve -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 14:20 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results Ahhh, you're already down in Lucene. That makes things easier... See TermDocs. Particularly seek(Term). That'll directly access the indexed unique key rather than having to form a bunch of queries. Best Erick On Thu, Dec 2, 2010 at 8:59 AM, Lohrenz, Steven steven.lohr...@hmhpub.comwrote: I would be interested in hearing about some ways to improve the algorithm. I have done a very straightforward Lucene query within a loop to get the docIds. 
Here's what I did to get it working where favsBean are objects returned from a query of the second core, but there is probably a better way to do it: private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, ListFavorites favsBeans) throws ParseException { // open the core get data directory String indexDir = req.getCore().getIndexDir(); FSDirectory index = null; try { index = FSDirectory.open(new File(indexDir)); } catch (IOException e) { throw new ParseException(IOException, cannot open the index at: + indexDir + + e.getMessage()); } int[] docIds = new int[favsBeans.size()]; int i = 0; for(Favorites favBean: favsBeans) { String pkQueryString = resourceId: + favBean.getResourceId(); Query pkQuery = new QueryParser(Version.LUCENE_CURRENT, resourceId, new StandardAnalyzer()).parse(pkQueryString); IndexSearcher searcher = null; TopScoreDocCollector collector = null; try { searcher = new IndexSearcher(index, true); collector = TopScoreDocCollector.create(1, true); searcher.search(pkQuery, collector); } catch (IOException e) { throw new ParseException(IOException, cannot search the index at: + indexDir + + e.getMessage()); } ScoreDoc[] hits = collector.topDocs().scoreDocs; if(hits != null hits[0] != null) { docIds[i] = hits[0].doc; i++; } } Arrays.sort(docIds); return docIds; } -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 December 2010 13:46 To: solr-user@lucene.apache.org Subject: Re: Return Lucene DocId in Solr Results Sounds good, especially because your old scenario was fragile. The doc IDs in
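For reference, here is a minimal, self-contained sketch of the seek()/next() pattern Erick describes above (seek() only positions the enumeration; next() must succeed before doc() is valid), against the pre-4.0 Lucene TermDocs API used in this thread. The field name resourceId comes from the code in the thread; the directory handling and the skip-on-miss behaviour are assumptions made for the example.

import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.store.FSDirectory;

public class PrimaryKeyToDocId {

    /** Resolves each primary-key value to its (single) Lucene docId, skipping misses. */
    public static int[] lookup(File indexDir, List<String> resourceIds) throws IOException {
        IndexReader reader = IndexReader.open(FSDirectory.open(indexDir), true); // read-only
        TermDocs termDocs = reader.termDocs();
        try {
            int[] docIds = new int[resourceIds.size()];
            int found = 0;
            for (String id : resourceIds) {
                termDocs.seek(new Term("resourceId", id)); // unique-key field from the thread
                if (termDocs.next()) {                     // must advance before calling doc()
                    docIds[found++] = termDocs.doc();
                }
            }
            int[] result = Arrays.copyOf(docIds, found);
            Arrays.sort(result);
            return result;
        } finally {
            termDocs.close();
            reader.close();
        }
    }
}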
Re: Import Data Into Solr
You can just point your Solr instance at your Lucene index. Really, copy the Lucene index into the right place to be found by Solr. HOWEVER, you need to take great care that the field definitions you used when you built your Lucene index are compatible with the ones configured in your schema.xml file. This is NOT a trivial task. I'd recommend that you try having Solr build your index; you'll probably want to sometime in the future anyway, so you might as well bite the bullet now if possible... Plus, I'm not quite sure about index version issues. Best Erick On Thu, Dec 2, 2010 at 11:54 AM, Bing Li lbl...@gmail.com wrote: Hi all, I am a new user of Solr. Before using it, all of my data was indexed directly with Lucene. According to Chapter 3 of the book Solr 1.4 Enterprise Search Server, written by David Smiley and Eric Pugh, data in the formats of XML, CSV and even PDF, etc. can be imported into Solr. If I wish to import the existing Lucene indexes into Solr, are there any other approaches? I know that Solr is a serverized Lucene. Thanks, Bing Li
Re: SOLR Thesaurus
No, it doesn't. And it's not entirely clear what (if any) simple way there is to use Solr to expose hierarchically related documents in a way that preserves and usefully allows navigation of the relationships. At least in general, for sophisticated stuff. On 12/2/2010 3:55 AM, lee carroll wrote: Hi List, Coming to an end of a prototype evaluation of SOLR (all very good etc etc). Getting to the point of looking at bells and whistles. Does SOLR have a thesaurus? Can't find any reference to one in the docs and on the wiki etc. (Apart from a few mail threads which describe the synonym.txt as a thesaurus.) I mean something like: PT: BT: xxx,, NT: xxx,, RT: xxx,xxx,xxx Scope Note: xx, Like I say, bells and whistles. Cheers Lee
Re: TermsComponent prefix query with field analyzers
I don't believe you can. If you just need query-time transformation, can't you just do it in your client app? If you need index-time transformation... well, you can do that, but it's up to your schema.xml and will of course apply to the field as a whole, not just for TermsComponent queries, because that's just how Solr works. I'd note for your example, you'll also have to lowercase that capital A if you want it to match a lowercased a in a TermsComponent prefix query. To my mind (others may disagree), robust flexible auto-complete like this is still a somewhat unsolved problem in Solr; the TermsComponent approach has its definite limitations. On 12/2/2010 12:24 PM, Nestor Oviedo wrote: Hi everyone, does anyone know how to apply some analyzers to a prefix query? What I'm looking for is a way to build an autosuggest using the TermsComponent that is able to remove the accents from the query's prefix. For example, I have the term analisis in the index and I want to retrieve it with the prefix Análi (notice the accent on the third letter). I think the regexp function won't help here, so I was wondering whether specifying some analyzers (LowerCase and ASCIIFolding) in the TermsComponent configuration would make them apply to the prefix. Thanks in advance. Nestor
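If the query-time route mentioned above is acceptable, the prefix can be folded in the client before it is sent to the TermsComponent using only the JDK, along these lines. This is a sketch: java.text.Normalizer only approximates what LowerCaseFilter plus ASCIIFoldingFilter do to the indexed terms, so the index-side analysis has to be kept consistent with it.

import java.text.Normalizer;

public class PrefixFolder {

    /** Lowercases and strips combining accents, e.g. "Análi" -> "anali". */
    public static String fold(String prefix) {
        String decomposed = Normalizer.normalize(prefix, Normalizer.Form.NFD);
        String noMarks = decomposed.replaceAll("\\p{M}", ""); // drop combining marks
        return noMarks.toLowerCase();
    }

    public static void main(String[] args) {
        // terms.prefix=anali would then match the indexed term "analisis"
        System.out.println(fold("Análi"));
    }
}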
RE: ramBufferSizeMB not reflected in segment sizes in index
Hi Mike, We turned on infostream. Is there documentation about how to interpret it, or should I just grep through the codebase? Is the excerpt below what I am looking for as far as understanding the relationship between ramBufferSize and size on disk? is newFlushedSize the size on disk in bytes? DW: ramUsed=329.782 MB newFlushedSize=74520060 docs/MB=0.943 new/old=21.55% RAM: now balance allocations: usedMB=325.997 vs trigger=320 deletesMB=0.048 byteBlockFre e=0.125 perDocFree=0.006 charBlockFree=0 ... DW: after free: freedMB=0.225 usedMB=325.82 Dec 1, 2010 5:40:22 PM IW 0 [Wed Dec 01 17:40:22 EST 2010; http-8091-Processor12]: flush: now pause all indexing threads Dec 1, 2010 5:40:22 PM IW 0 [Wed Dec 01 17:40:22 EST 2010; http-8091-Processor12]: flush: segment=_5h docStoreSegment=_5e docStoreOffset=266 flushDocs=true flushDeletes=false flushDocStores=false numDocs=40 numBufDelTerms=40 ... Dec 1, 2010 5:40:22 PM purge field=geographic Dec 1, 2010 5:40:22 PM purge field=serialTitle_ab Dec 1, 2010 5:40:33 PM IW 0 [Wed Dec 01 17:40:33 EST 2010; http-8091-Processor12]: DW: ramUsed=325.772 MB newFlushedSize=69848046 docs/MB=0.6 new/old=20.447% Dec 1, 2010 5:40:33 PM IW 0 [Wed Dec 01 17:40:33 EST 2010; http-8091-Processor12]: flushedFiles=[_5h.frq, _5h.tis, _5h.prx, _5h.nrm, _5h.fnm, _5h.tii] Tom -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, December 01, 2010 3:43 PM To: solr-user@lucene.apache.org Subject: Re: ramBufferSizeMB not reflected in segment sizes in index On Wed, Dec 1, 2010 at 3:16 PM, Burton-West, Tom tburt...@umich.edu wrote: Thanks Mike, Yes we have many unique terms due to dirty OCR and 400 languages and probably lots of low doc freq terms as well (although with the ICUTokenizer and ICUFoldingFilter we should get fewer terms due to bad tokenization and normalization.) OK likely this explains the lowish RAM efficiency. Is this additional overhead because each unique term takes a certain amount of space compared to adding entries to a list for an existing term? Exactly. There's a highish startup cost for each term but then appending docs/positions to that term is more efficient especially for higher frequency terms. In the limit, a single unique term across all docs will have very high RAM efficiency... Does turning on IndexWriters infostream have a significant impact on memory use or indexing speed? I don't believe so Mike
Re: ramBufferSizeMB not reflected in segment sizes in index
On Wed, Dec 1, 2010 at 3:01 PM, Shawn Heisey s...@elyograg.org wrote: I have seen this. In Solr 1.4.1, the .fdt, .fdx, and the .tv* files do not segment, but all the other files do. I can't remember whether it behaves the same under 3.1, or whether it also creates these files in each segment. Yep, that's the shared doc store (where stored fields go.. the non-inverted part of the index), and it works like that in 3.x and trunk too. It's nice because when you merge segments, you don't have to re-copy the docs (provided you're within a single indexing session). There have been discussions about removing it in trunk though... we'll see. -Yonik http://www.lucidimagination.com
Joining Fields in an Index
All, I have an index that has a field with country codes in it. I have 7 million or so documents in the index, and when displaying facets the country codes don't mean a whole lot to me. Is there any way to add a field with the full country names and then join the codes to it accordingly? I suppose I can do this before updating the records in the index, but before I do that I would like to know if there is a way to do this sort of join. Example: US - United States Thanks, Adam
Re: Joining Fields in an Index
Hi, If you are able to do a full re-index then you could index the full names and not the codes. When you later facet on the Country field you'll get the actual name rather than the code. If you are not able to re-index then this conversion could probably be added at your application layer prior to displaying your results (e.g. in your DAO object). On 2 December 2010 22:05, Adam Estrada estrada.adam.gro...@gmail.com wrote: All, I have an index that has a field with country codes in it. I have 7 million or so documents in the index, and when displaying facets the country codes don't mean a whole lot to me. Is there any way to add a field with the full country names and then join the codes to it accordingly? I suppose I can do this before updating the records in the index, but before I do that I would like to know if there is a way to do this sort of join. Example: US - United States Thanks, Adam
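For the application-layer mapping suggested above, the JDK's Locale data is often enough to turn ISO 3166 country codes into display names without maintaining a separate lookup table. A minimal sketch, assuming the index really does store alpha-2 codes such as US:

import java.util.Locale;

public class CountryCodeLabels {

    /** Maps an ISO 3166 alpha-2 code (as stored in the index) to a display name. */
    public static String label(String isoCode) {
        return new Locale("", isoCode).getDisplayCountry(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(label("US")); // United States
        System.out.println(label("DE")); // Germany
    }
}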
Re: Joining Fields in an Index
Hi, I was hoping to do it directly in the index but it was more out of curiosity than anything. I can certainly map it in the DAO but again...I was hoping to learn if it was possible in the index. Thanks for the feedback! Adam On Dec 2, 2010, at 5:48 PM, Savvas-Andreas Moysidis wrote: Hi, If you are able to do a full re-index then you could index the full names and not the codes. When you later facet on the Country field you'll get the actual name rather than the code. If you are not able to re-index then probably this conversion could be added at your application layer prior to displaying your results.(e.g. in your DAO object) On 2 December 2010 22:05, Adam Estrada estrada.adam.gro...@gmail.comwrote: All, I have an index that has a field with country codes in it. I have 7 million or so documents in the index and when displaying facets the country codes don't mean a whole lot to me. Is there any way to add a field with the full country names then join the codes in there accordingly? I suppose I can do this before updating the records in the index but before I do that I would like to know if there is a way to do this sort of join. Example: US - United States Thanks, Adam
Re: spatial query parsing error: org.apache.lucene.queryParser.ParseException
It WORKED Thank you so much everybody! I feel like jumping up and down like 'Hiro' on Heroes Dennis Gearon - Original Message - From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Wednesday, December 01, 2010 7:51 PM Subject: spatial query parsing error: org.apache.lucene.queryParser.ParseException I am trying to get spatial search to work on my Solr installation. I am running version 1.4.1 with the Jayway Team spatial-solr-plugin. I am performing the search with the following url: http://localhost:8080/solr/select?wt=json&indent=true&q=title:Art%20Loft{!spatial%20lat=37.326375%20lng=-121.892639%20radius=3%20unit=km%20threadCount=3} The result that I get is the following error: HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 'title:Art Loft{!spatial lat=37.326375 lng=-121.892639 radius=3 unit=km threadCount=3}': Encountered RANGEEX_GOOP lng=-121.892639 at line 1, column 38. Was expecting: } Not sure why it would be complaining about the lng parameter in the query. I double-checked to make sure that I had the right name for the longitude field in my solrconfig.xml file. Any help/suggestions would be greatly appreciated Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
Cannot start Solr anymore
Hi, I'm new here. First, could anyone tell me how to restart solr? I started solr and killed the process. Then when I tried to start it again, it failed: $ java -jar start.jar 2010-12-02 14:28:00.011::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2010-12-02 14:28:00.099::INFO: jetty-6.1.3 2010-12-02 14:28:00.231::WARN: Failed startup of context org.mortbay.jetty.webapp.webappcont...@73901437 {/solr,jar:file:/.../solr/apache-solr-1.4.1/example/webapps/solr.war!/} java.util.zip.ZipException: invalid END header (bad central directory offset) Thanks Richard
Re: TermsComponent prefix query with field analyzers
Does anyone know how to apply some analyzers to a prefix query? Lucene has a special QueryParser for this. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html Someone provided a patch to use it in Solr. It was an attachment to a thread on Nabble; I couldn't find it now. Similar discussion: http://search-lucene.com/m/oMtRJQPgGb1/
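A rough usage sketch of that contrib parser, with an ad-hoc analyzer that lowercases and ASCII-folds so a prefix such as Análi* is analyzed before the prefix query is built. The field name title is just an example; the class and constructor names follow the Lucene 3.0 contrib-misc javadoc linked above, but verify the exact signatures against your own Lucene version.

import java.io.Reader;

import org.apache.lucene.analysis.ASCIIFoldingFilter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class AnalyzedPrefixQueryExample {

    public static void main(String[] args) throws Exception {
        // Analyzer that tokenizes, lowercases and strips accents.
        Analyzer folding = new Analyzer() {
            @Override
            public TokenStream tokenStream(String fieldName, Reader reader) {
                TokenStream ts = new StandardTokenizer(Version.LUCENE_30, reader);
                ts = new LowerCaseFilter(ts);
                ts = new ASCIIFoldingFilter(ts);
                return ts;
            }
        };
        // "title" is just an example field name.
        AnalyzingQueryParser parser =
                new AnalyzingQueryParser(Version.LUCENE_30, "title", folding);
        Query q = parser.parse("Análi*");
        System.out.println(q); // should print something like title:anali*
    }
}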
Re: solr/admin/dataimport Not Found
(10/12/03 8:58), Ruixiang Zhang wrote: I tried to import data from mysql. When I tried to run http://mydomain.com:8983/solr/admin/dataimport , I got these error message: HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin/dataimport *Powered by Jetty://http://jetty.mortbay.org/ * Any help will be appreciated!!! Thanks Richard Richard, Usually, it should be http://mydomain.com:8983/solr/dataimport Koji -- http://www.rondhuit.com/en/
Re: solr/admin/dataimport Not Found
Hi Koji, thanks for your reply. I pasted the wrong link. Actually I tried this first: http://mydomain.com:8983/solr/dataimport It didn't work. The page should be there after installation, right? Did I miss something? Thanks a lot! Richard On Thu, Dec 2, 2010 at 4:23 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/12/03 8:58), Ruixiang Zhang wrote: I tried to import data from mysql. When I tried to run http://mydomain.com:8983/solr/admin/dataimport , I got these error message: HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin/dataimport *Powered by Jetty://http://jetty.mortbay.org/ * Any help will be appreciated!!! Thanks Richard Richard, Usually, it should be http://mydomain.com:8983/solr/dataimport Koji -- http://www.rondhuit.com/en/
Re: solr/admin/dataimport Not Found
(10/12/03 9:29), Ruixiang Zhang wrote: Hi Koji, thanks for your reply. I pasted the wrong link. Actually I tried this first: http://mydomain.com:8983/solr/dataimport It didn't work. The page should be there after installation, right? Did I miss something? Thanks a lot! Richard For that URL to work, you have to have a request handler in your solrconfig.xml: <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">db-data-config.xml</str> </lst> </requestHandler> If you are trying DIH for the first time, please read solr/example/example-DIH/README.txt and try example-DIH first. Koji -- http://www.rondhuit.com/en/
Re: ramBufferSizeMB not reflected in segment sizes in index
On Thu, Dec 2, 2010 at 4:31 PM, Burton-West, Tom tburt...@umich.edu wrote: We turned on infostream. Is there documentation about how to interpret it, or should I just grep through the codebase? There isn't any documentation... and it changes over time as we add new diagnostics. Is the excerpt below what I am looking for as far as understanding the relationship between ramBufferSize and size on disk? is newFlushedSize the size on disk in bytes? Yes -- so IW's buffer was using 329.782 MB RAM, and was flushed to a 69,848,046 byte segment. Mike
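As a sanity check on the numbers in Tom's excerpt, and assuming newFlushedSize is indeed bytes as confirmed above: 69,848,046 bytes is about 66.6 MB, and 66.6 / 325.772 is roughly 0.204, i.e. 20.4%, which matches the new/old=20.447% figure printed on the same infostream line. So for this data the flushed segment ends up at around one fifth of the RAM the buffered documents occupied.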
Re: solr/admin/dataimport Not Found
Thank you so much, Koji, the example-DIH works. I'm reading it for details... Richard On Thu, Dec 2, 2010 at 4:39 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/12/03 9:29), Ruixiang Zhang wrote: Hi Koji, thanks for your reply. I pasted the wrong link. Actually I tried this first: http://mydomain.com:8983/solr/dataimport It didn't work. The page should be there after installation, right? Did I miss something? Thanks a lot! Richard For that URL to work, you have to have a request handler in your solrconfig.xml: <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">db-data-config.xml</str> </lst> </requestHandler> If you are trying DIH for the first time, please read solr/example/example-DIH/README.txt and try example-DIH first. Koji -- http://www.rondhuit.com/en/
Limit number of characters returned
Is there a way to limit the number of characters returned from a stored field? For example: Say I have a document (~2K words) and I search for a word that's somewhere in the middle. I would like the document to match the search query, but the stored field should only return the first 200 characters of the document. Is there any way to accomplish this that doesn't involve two fields? Thanks
PDF text extracted without spaces
Hello all, I know this is not the right group to ask this question, but I thought some of you might have experienced it. I am a newbie with Tika, using the latest version, 0.8. I extracted text from a PDF document but found spaces and newlines missing. Indexing the data gives wrong results. Could anyone in this group help me? I am using Tika directly to extract the contents, which later get indexed. Regards Ganesh
Solr Multi-thread Update Transaction Control
Hi, we are now using Solr 1.4.1 and have encountered a problem. When multiple threads update Solr data at the same time, can every thread have its own separate transaction? If this is possible, how can we realize it? Are there any suggestions? Waiting online. Thank you for any useful reply.
Query performance very slow even after autowarming
Hi, I am using EdgeNGramFilterFactory on Solr 1.4.1 (<filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1" />) for my indexing. Each document will have about 5 fields in it and only one field is indexed with EdgeNGramFilterFactory. I have about 1.4 million documents in my index now and my index size is approx. 296MB. I made the field that is indexed with EdgeNGramFilterFactory the default search field. All my query responses are very slow, some of them taking more than 10 seconds to respond; even queries with single letters are very slow, e.g. /select/?q=m So I tried query warming as follows. <listener event="newSearcher" class="solr.QuerySenderListener"> <arr name="queries"> <lst><str name="q">a</str></lst> <lst><str name="q">b</str></lst> <lst><str name="q">c</str></lst> <lst><str name="q">d</str></lst> <lst><str name="q">e</str></lst> <lst><str name="q">f</str></lst> <lst><str name="q">g</str></lst> <lst><str name="q">h</str></lst> <lst><str name="q">i</str></lst> <lst><str name="q">j</str></lst> <lst><str name="q">k</str></lst> <lst><str name="q">l</str></lst> <lst><str name="q">m</str></lst> <lst><str name="q">n</str></lst> <lst><str name="q">o</str></lst> <lst><str name="q">p</str></lst> <lst><str name="q">q</str></lst> <lst><str name="q">r</str></lst> <lst><str name="q">s</str></lst> <lst><str name="q">t</str></lst> <lst><str name="q">u</str></lst> <lst><str name="q">v</str></lst> <lst><str name="q">w</str></lst> <lst><str name="q">x</str></lst> <lst><str name="q">y</str></lst> <lst><str name="q">z</str></lst> </arr> </listener> The same as above is done for firstSearcher as well. My cache settings are as follows. <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/> <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/> <documentCache class="solr.LRUCache" size="16384" initialSize="16384"/> Still, after query warming, a few single-character searches are taking up to 3 seconds to respond. Am I doing anything wrong in my cache settings or autowarm settings, or am I missing anything here? Thanks, Johnny -- View this message in context: http://lucene.472066.n3.nabble.com/Query-performance-very-slow-even-after-autowarming-tp2010384p2010384.html Sent from the Solr - User mailing list archive at Nabble.com.